Machine Learning: Comparison of Algorithms for Determining Water Quality in the Rímac River

Juan Marroquin-Peralta, Yvan Jesus Garcia Lopez, Jose Antonio Taquia

Research output: Contribution to journal › Article › peer-review


Evaluating river water quality is necessary to manage its use efficiently. Determining whether the water is healthy requires physicochemical and biological analyses, but these involve a series of parameters measured with various analytical methods that are often tedious and time-consuming. The present study compares machine learning models, namely Multiple Linear Regression (MLR), Backpropagation Neural Network (BPNN) and Support Vector Regression (SVR), for estimating Dissolved Oxygen (DO) and Biochemical Oxygen Demand (BOD) in order to determine the water quality of the Rímac River. Water samples were collected from 26 stations and non-point sources of contamination along the Rímac River, yielding 624 records from 2010 to 2012. The physical and chemical parameters introduced in the models include pH, turbidity, total dissolved solids, temperature, electrical conductivity, dissolved oxygen, biochemical oxygen demand, chemical oxygen demand, hardness, chloride, sulfate, calcium, magnesium, and nitrate. The dependent (output) variables of the models are BOD and DO. The independent variables selected for BOD were pH, EC, turbidity, nitrites, TOC, COD, iron, and chlorides. For DO, they were temperature, nitrites, COD, nitrates, STD, chlorides and total solids. Both dependent parameters have 8 independent variables, those with the highest correlation coefficient values. The models were trained on 70% of the data set and validated on the remaining 30%. For the estimation of BOD, the BPNN with 16 hidden nodes achieved R² = 0.857 in training and 0.481 in the test phase; for the estimation of DO, with 8 hidden nodes, it achieved R² = 0.768 in training and 0.605 in the test phase. These values were higher than those of MLR and SVR, which showed that the BPNN was the best selection. Finally, the classification of water quality as Good, Fair and Poor obtained a precision of 0.88, a sensitivity of 0.86 and an F1-score of 85%, which evidenced its effectiveness in carrying out this process.
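The evaluation quantities named in the abstract (a 70%/30% split, R² for the regression models, and precision, sensitivity and F1-score for the Good/Fair/Poor classification) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the random seed, the use of a shuffled split, and the choice of macro averaging across the three classes are all assumptions made for illustration; the abstract does not state how the split was drawn or how the per-class scores were averaged.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into train and test parts.

    Assumption: a random shuffled split; the abstract only gives 70%/30%.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def macro_scores(y_true, y_pred, labels):
    """Macro-averaged precision, sensitivity (recall) and F1.

    Assumption: unweighted mean over classes (macro averaging).
    """
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical usage with made-up values (not data from the study).
train, test = train_test_split(list(range(624)))
example_r2 = r2_score([6.1, 7.4, 5.2, 8.0], [6.0, 7.0, 5.5, 7.8])
example_scores = macro_scores(
    ["Good", "Good", "Fair", "Poor"],
    ["Good", "Fair", "Fair", "Poor"],
    labels=["Good", "Fair", "Poor"],
)
```

Reporting R² separately for the training and test parts, as the abstract does, makes the gap between the two visible. For BOD, the drop from 0.857 to 0.481 suggests the network fits the training records considerably better than unseen ones.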
Translated title of the contribution: Aprendizaje automático: comparación de algoritmos para determinar la calidad del agua en el río Rímac
Original language: English
Journal: Turkish Journal of Computer and Mathematics Education, Vol. 12 No. 12 (2021), 552-572
Issue number: 12
State: Published - 21 May 2021


  • Water quality
  • artificial neural network
  • multiple linear regression
  • support vector regression
  • Rímac river

