Comparative study of machine learning models for water potability prediction

Water quality is an increasingly pressing global issue because many people lack access to clean drinking water. To better predict the potability of water, researchers have applied many machine learning models, such as artificial neural networks (ANNs) and support vector machines (SVMs). However, many of these methods are complex and computationally expensive, so our research aimed to find a machine learning model that is not only effective at predicting water potability but also simpler and more efficient. We hypothesized that neural networks would be the most effective at this task because of their ability to recognize patterns and underlying relationships within complex datasets. We compared four machine learning models: logistic regression, k-nearest neighbors, decision trees, and neural networks. Each model was trained and validated on the same dataset. We found that logistic regression with L1 regularization had the highest precision (0.75000), and decision trees had the second-highest precision (0.74359). However, decision trees had a higher accuracy than logistic regression. This could be because L1 regularization shrinks some coefficients to exactly zero, which can discard features that carry useful signal, while the “yes or no” branching structure of decision trees is well suited to binary classification. As a result, we concluded that decision trees were the most effective at predicting water potability.
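Because the comparison above rests on two different metrics, it may help to see how precision and accuracy can rank two classifiers differently. The following is a minimal sketch in plain Python. The labels and predictions are invented toy values chosen only for illustration (they are not data or results from this study), and the names `model_a` and `model_b` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for a binary classifier (1 = potable, 0 = not potable)."""
    tp: int  # predicted potable, actually potable
    fp: int  # predicted potable, actually not potable
    tn: int  # predicted not potable, actually not potable
    fn: int  # predicted not potable, actually potable


def confusion(y_true, y_pred):
    """Tally the four confusion-matrix cells from paired labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return Confusion(tp, fp, tn, fn)


def precision(c):
    """Fraction of 'potable' predictions that were correct."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def accuracy(c):
    """Fraction of all predictions that were correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Toy data (assumption, not from the study): 5 potable and 5 non-potable samples.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical cautious model: rarely predicts "potable", but is right when it does.
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical balanced model: predicts "potable" more often, with one false alarm.
model_b = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

for name, y_pred in (("model_a", model_a), ("model_b", model_b)):
    c = confusion(y_true, y_pred)
    print(f"{name}: precision={precision(c):.2f} accuracy={accuracy(c):.2f}")
```

In this toy case the cautious model has the higher precision while the balanced model has the higher accuracy, which is the same kind of split described above between logistic regression and decision trees. Which metric should decide the ranking depends on whether false "potable" predictions or overall error matter more for the application.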