Using two-step machine learning to predict harmful algal bloom risk

(1) Simon G. Atkins Academic & Technology High School

https://doi.org/10.59720/24-196
Image credit: Jordan Whitfield

Water shortages are a global issue that now affects North America. Inland water quality is significantly degraded by the seasonal occurrence of harmful algal blooms (HABs), which cause over $2.3 billion in economic losses each year through cleanup costs, drinking water restrictions, lost tourism, and fishery closures. Existing machine learning (ML) models predict cyanobacteria proliferation with binary classifications, such as the presence or absence of HABs, which leaves a gap in assessing the likelihood of a potential outbreak. In this study, we explored the application of ML regression algorithms to predict HAB risk on a continuum. Using primary data from water samples collected in Maryland, North Carolina, and Virginia, we hypothesized that algal weight (an indicator of HAB risk) would be positively correlated with nitrates, phosphates, and temperature. To test this hypothesis, we trained artificial intelligence (AI) models on primary data collected from 30 inland aquatic systems. We then used the results to build Monte Carlo simulations of over 100,000 scenarios, performing a sensitivity analysis of how each variable affects predicted HAB risk. By selecting ML regression models with high validation accuracy, we achieved a 77% test accuracy in predicting HAB risk levels. We also compared our HAB forecasts for specific locations and dates against observations from the United States Geological Survey (USGS) and the National Aeronautics and Space Administration (NASA), further validating the model's accuracy.
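The abstract does not say which regression models were trained or how the Monte Carlo sensitivity analysis was scored, so the following is only a minimal sketch of the general idea under stated assumptions. A hypothetical placeholder function, `predict_algal_weight`, stands in for a trained model. Its linear coefficients and the uniform input ranges in `RANGES` are illustrative values, not values from the study. The sketch draws many random (nitrate, phosphate, temperature) scenarios, runs each one through the model, and uses the Pearson correlation between each input and the predicted algal weight as a simple, assumed sensitivity measure.

```python
import math
import random

# Hypothetical stand-in for a trained regression model. The coefficients are
# illustrative placeholders, NOT values reported in the study.
def predict_algal_weight(nitrate_mg_l, phosphate_mg_l, temp_c):
    return 0.8 * nitrate_mg_l + 2.5 * phosphate_mg_l + 0.05 * temp_c

# Assumed (min, max) ranges for uniform sampling of each input; illustrative only.
RANGES = {
    "nitrate": (0.0, 10.0),       # mg/L
    "phosphate": (0.0, 1.0),      # mg/L
    "temperature": (5.0, 35.0),   # degrees C
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def monte_carlo_sensitivity(n_scenarios=100_000, seed=0):
    """Sample random input scenarios, predict algal weight for each, and
    return the correlation between every input and the predictions."""
    rng = random.Random(seed)
    samples = {name: [] for name in RANGES}
    outputs = []
    for _ in range(n_scenarios):
        # Draw one scenario: an independent uniform value per input variable.
        draw = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
        for name, value in draw.items():
            samples[name].append(value)
        outputs.append(
            predict_algal_weight(draw["nitrate"], draw["phosphate"], draw["temperature"])
        )
    return {name: pearson(values, outputs) for name, values in samples.items()}

if __name__ == "__main__":
    for name, r in monte_carlo_sensitivity().items():
        print(f"{name}: r = {r:.3f}")
```

With these placeholder coefficients and ranges, nitrate dominates the variance of the prediction, so it shows the strongest correlation. A real analysis would replace `predict_algal_weight` with the trained model and sample from distributions fitted to the field data.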
