<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Soil Sci.</journal-id>
<journal-title>Frontiers in Soil Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Soil Sci.</abbrev-journal-title>
<issn pub-type="epub">2673-8619</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fsoil.2024.1407502</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Soil Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Groundwater fluoride prediction modeling using physicochemical parameters in Punjab, India: a machine-learning approach</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Kerketta</surname>
<given-names>Anjali</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2684792"/>
<role content-type="https://credit.niso.org/contributor-roles/data-curation/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kapoor</surname>
<given-names>Harmanpreet Singh</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2411513"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
<role content-type="https://credit.niso.org/contributor-roles/supervision/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Sahoo</surname>
<given-names>Prafulla Kumar</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1201828"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/supervision/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Department of Environmental Science and Technology, Central University of Punjab</institution>, <addr-line>Bathinda, Punjab</addr-line>, <country>India</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Department of Mathematics and Statistics, Central University of Punjab</institution>, <addr-line>Bathinda, Punjab</addr-line>, <country>India</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Bifeng Hu, Jiangxi University of Finance and Economics, China</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Zihao Wu, China University of Mining and Technology, China</p>
<p>Xiaolin Jia, North China University of Water Conservancy and Electric Power, China</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Prafulla Kumar Sahoo, <email xlink:href="mailto:prafulla.iitkgp@gmail.com">prafulla.iitkgp@gmail.com</email>
</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>18</day>
<month>07</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>4</volume>
<elocation-id>1407502</elocation-id>
<history>
<date date-type="received">
<day>26</day>
<month>03</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>06</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2024 Kerketta, Kapoor and Sahoo</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Kerketta, Kapoor and Sahoo</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<sec>
<title>Introduction</title>
<p>Rising fluoride levels in groundwater resources have become a worldwide concern, presenting a significant challenge to the safe utilization of water resources and posing potential risks to human well-being. Elevated fluoride and its vast spatial variability have been documented across different districts of Punjab, India, and it is, therefore, imperative to predict the fluoride levels for efficient groundwater resources planning and management.</p>
</sec>
<sec>
<title>Methods</title>
<p>In this study, five different models, Support Vector Machine (SVM), Random Forest (RF), Extreme Gradient Boosting (Xgboost), Extreme Learning Machine (ELM), and Multilayer Perceptron (MLP), are proposed to predict groundwater fluoride using the physicochemical parameters and sampling depth as predictor variables. The performance of these five models was evaluated using the coefficient of determination (<italic>R</italic>
<sup>2</sup>), mean absolute error (MAE), and root mean square error (RMSE).</p>
</sec>
<sec>
<title>Results and discussion</title>
<p>ELM outperformed the remaining four models, thus exhibiting a strong predictive power. The <italic>R</italic>
<sup>2</sup>, MAE, and RMSE values for ELM were 0.85, 0.46, and 0.36 at the training stage and 0.95, 0.31, and 0.33 at the testing stage, respectively, while the other models yielded inferior results. Based on the relative importance scores, total dissolved solids (TDS), electrical conductivity (EC), sodium (Na<sup>+</sup>), chloride (Cl<sup>&#x2212;</sup>), and calcium (Ca<sup>2+</sup>) contributed significantly to model performance. High variability in the target (fluoride) and predictor variables might have led to the poor performance of the models, implying the need for better data pre-processing techniques to improve data quality. Nevertheless, ELM showed satisfactory results and can be considered a promising model for predicting groundwater quality.</p>
</sec>
</abstract>
<kwd-group>
<kwd>groundwater fluoride</kwd>
<kwd>machine-learning</kwd>
<kwd>prediction modeling</kwd>
<kwd>Extreme Learning Machine</kwd>
<kwd>physicochemical parameters</kwd>
<kwd>relative importance of variables</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="3"/>
<equation-count count="6"/>
<ref-count count="109"/>
<page-count count="17"/>
<word-count count="9621"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-in-acceptance</meta-name>
<meta-value>Pedometrics</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<label>1</label>
<title>Introduction</title>
<p>Consumption of groundwater with fluoride (F<sup>&#x2212;</sup>) levels of 0.5&#x2013;1.5 mg/L is essential for proper bone and tooth development. However, concentrations exceeding the recommended safe limit of 1.5 mg/L (<xref ref-type="bibr" rid="B1">1</xref>) can cause dental fluorosis (1.5&#x2013;4.0 mg/L), skeletal fluorosis (4.0&#x2013;10.0 mg/L), and several other disorders, including hypertension, renal failure, and cancer (&gt; 10 mg/L) (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B3">3</xref>). Reportedly, elevated F<sup>&#x2212;</sup> levels have already affected over 200 million people in 29 nations, including India (<xref ref-type="bibr" rid="B4">4</xref>). In India, F<sup>&#x2212;</sup> prevalence has been identified in 20 out of 29 states, with 66 million inhabitants, including 6 million children, in the grip of fluorosis (<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B6">6</xref>), and the numbers are expected to rise further (<xref ref-type="bibr" rid="B7">7</xref>). Fluoride-bearing minerals such as fluorite, amphibole, mica, apatite, and biotite, associated with host rocks like granite, mica, gneisses, etc., are the primary natural sources. Groundwater chemical conditions such as elevated alkalinity, reduced calcium levels, and a sodium bicarbonate water type favor dissolution and desorption of metal oxides, causing F<sup>&#x2212;</sup> enrichment. Additionally, increased F<sup>&#x2212;</sup> concentrations have been reported in arid and semi-arid climatic zones (<xref ref-type="bibr" rid="B8">8</xref>, <xref ref-type="bibr" rid="B9">9</xref>) due to enhanced cation exchange capacity, dissolution of F<sup>&#x2212;</sup>-bearing minerals, and longer groundwater residence times, which increase rock-water interaction (<xref ref-type="bibr" rid="B10">10</xref>, <xref ref-type="bibr" rid="B11">11</xref>).
Besides the natural factors, anthropogenic activities, including phosphate fertilizer application, sewage and sludge dumping, mining, coal combustion, and excess groundwater extraction, also contribute to high F<sup>&#x2212;</sup> levels (<xref ref-type="bibr" rid="B11">11</xref>, <xref ref-type="bibr" rid="B12">12</xref>).</p>
<p>Numerous studies across Punjab have provided an overall picture of the state&#x2019;s groundwater contamination problem. Fluoride has been reported in all the districts, particularly in the shallow aquifers, with more pronounced levels in the south and southwestern districts. For instance, F<sup>&#x2212;</sup> concentrations in this region ranged from 0.1 to 17.5 mg/L in Bathinda, 0.34 to 8.24 mg/L in Fazilka (<xref ref-type="bibr" rid="B13">13</xref>), 0.15 to 11.6 mg/L in Mansa (<xref ref-type="bibr" rid="B14">14</xref>), and 1.5 to 9.2 mg/L in Patiala (<xref ref-type="bibr" rid="B15">15</xref>). Thus, this region has emerged as a hotspot of F<sup>&#x2212;</sup>-contaminated groundwater (<xref ref-type="bibr" rid="B16">16</xref>, <xref ref-type="bibr" rid="B17">17</xref>). The abundance of F<sup>&#x2212;</sup>-bearing minerals, along with agricultural activities and industrial operations in this region, further enhances the contaminant levels in the groundwater system. The region&#x2019;s climate, surface, and sub-surface conditions are conducive to dissolving, mobilizing, and enriching this contaminant in the aquifers. Punjab experiences meagre precipitation, high temperatures, and high evaporation rates linked to high total dissolved solids (TDS)/salinity, particularly in shallow aquifers. The aquifers are oxic and alkaline due to high bicarbonate concentrations. Additionally, nitrate levels are prominent in shallow waters, probably due to agricultural runoff (<xref ref-type="bibr" rid="B17">17</xref>). All of these hydrochemical factors have a direct influence on F<sup>&#x2212;</sup> concentrations and, therefore, tend to intensify the contamination problem in this region.
Hence, it is imperative to develop methodologies by integrating the <italic>in-situ</italic> measured variables from field surveys with other advanced and efficient techniques to strategize sustainable groundwater management plans and establish robust monitoring systems (<xref ref-type="bibr" rid="B18">18</xref>). Field-based groundwater monitoring is labor-intensive and expensive (<xref ref-type="bibr" rid="B19">19</xref>), in addition to the lab-based analytical procedures, which are tedious, complicated, and add a cost burden (<xref ref-type="bibr" rid="B20">20</xref>). In this context, various numerical and physical models, along with geospatial modeling, are often applied to comprehend the groundwater contamination process and the contributing factors (<xref ref-type="bibr" rid="B21">21</xref>, <xref ref-type="bibr" rid="B22">22</xref>). However, these methods require huge datasets and an adequate hydrogeochemical understanding, which are mostly lacking in underdeveloped regions, leading to poor model performance (<xref ref-type="bibr" rid="B18">18</xref>, <xref ref-type="bibr" rid="B23">23</xref>). Furthermore, the difficulty in interpreting the outputs of classical models and poor user-friendliness widen the gap between model creators and users. To bridge this gap, state-of-the-art machine-learning (ML) techniques are now being widely used to predict groundwater contamination.</p>
<p>Machine-learning models have been adopted extensively in the past several years to forecast a variety of contaminants in groundwater due to their strong algorithms, flexible constraints, and reliable and accurate prediction performance (<xref ref-type="bibr" rid="B24">24</xref>). These techniques can also handle the non-linear relationships between the input and target variables efficiently, proving to be more robust than the conventional methods (<xref ref-type="bibr" rid="B25">25</xref>). The Random Forest (RF) classification algorithm is widely used to forecast groundwater-F<sup>&#x2212;</sup> hazard areas globally (<xref ref-type="bibr" rid="B26">26</xref>), regionally (<xref ref-type="bibr" rid="B10">10</xref>), and locally (<xref ref-type="bibr" rid="B27">27</xref>) with an accuracy of 0.89, 0.91, and 0.93, respectively. All of these studies used continuous variables such as climate, soil, geology, and topography for prediction modeling. In contrast, few studies have considered water quality parameters for predicting F<sup>&#x2212;</sup> concentrations. Regression-based modeling for groundwater fluoride prediction using hydrogeochemical variables obtained superior accuracy for RF (&gt; 0.89) over logistic regression (LR) and artificial neural network (ANN) (<xref ref-type="bibr" rid="B28">28</xref>). Groundwater fluoride was also estimated using LR, ANN, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN), where KNN and SVM performed better than the other models (<xref ref-type="bibr" rid="B29">29</xref>). Gupta and Maiti (<xref ref-type="bibr" rid="B30">30</xref>) compared six ML models: Gaussian process (GP), long short-term memory (LSTM), Extreme Learning Machine (ELM), Multilayer Perceptron (MLP), RF, and SVM. All the models achieved an overall accuracy of &gt; 0.85, implying satisfactory prediction capability.
In another study, ELM outperformed MLP and SVM in predicting F<sup>&#x2212;</sup> concentration (<xref ref-type="bibr" rid="B31">31</xref>). Furthermore, Nafouanti et&#xa0;al. (<xref ref-type="bibr" rid="B32">32</xref>) compared the prediction performance of RF, Extreme Gradient Boosting (Xgboost), Light Gradient Boosting (LightGBM), and Hybrid Random Forest Linear Model (HRFLM) in estimating the F<sup>&#x2212;</sup> levels in the Datong basin, China. They achieved an overall accuracy of &gt; 0.88 for all the models. These outcomes indicate that different ML models may give different predictions when applied to the same dataset (<xref ref-type="bibr" rid="B30">30</xref>). Moreover, predictive modeling can aid in the early detection of contamination, support remedial measures, and help allocate resources to prevent pollution efficiently. In this regard, the present study compares the performance of five different ML models, including the commonly used RF model, to predict groundwater F<sup>&#x2212;</sup> contamination using the water quality parameters as potential predictor variables.</p>
<p>Groundwater F<sup>&#x2212;</sup> contamination is a typical phenomenon in arid and semi-arid zones like Punjab. However, the lack of monitoring programs for F<sup>&#x2212;</sup> estimation in this region poses a possible health risk for humans from drinking contaminated water. In addition, the study area selected for this work lacks ML-based prediction studies for groundwater F<sup>&#x2212;</sup> estimation. Therefore, five ML models with distinct algorithms were chosen to predict F<sup>&#x2212;</sup> levels in the groundwater. The selected models have been thoroughly tested by several researchers, and we attempted to replicate them for our study region using our own data. The objective of the current work is to determine the most suitable predictive model for estimating F<sup>&#x2212;</sup> concentration in the groundwater of Punjab, India. To this end, the performance of RF, SVM, Xgboost, ELM, and MLP was evaluated and compared using hydrogeochemical variables commonly measured in the study area. The influence of different predictor variables on the model performance was also assessed to identify the most significant water quality parameters responsible for groundwater F<sup>&#x2212;</sup> contamination in Punjab. Based on these parameters, the best-performing model can help optimize data collection, transmission, and analysis time, resulting in a rapid resolution to the contamination problem. This effort will be beneficial in determining the possible F<sup>&#x2212;</sup> levels with the help of physicochemical parameters in locations lacking regular groundwater quality monitoring. This information will provide new research directions and help develop management plans to boost the availability of safe drinking water in the region.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Study area description</title>
<p>The north-western state of India, Punjab, is 200 meters above mean sea level and comprises an area of 50,362 km<sup>2</sup>. It stretches between latitudes 29&#xb0; 32&#x2032;&#x2013;32&#xb0; 28&#x2032; N and longitudes 73&#xb0; 50&#x2032;&#x2013;77&#xb0; 00&#x2032; E, sharing boundaries with Pakistan on the west, Jammu and Kashmir on the north, Himachal Pradesh on the northeast, and Haryana and Rajasthan on the south. Punjab is further subdivided into the Malwa region, consisting of 11 districts of the south and southwest, the northern sub-mountainous region of Majha, and the semi-arid central plains of Doaba (<xref ref-type="bibr" rid="B33">33</xref>). The state has three major rivers, Sutlej, Beas, and Ravi, and an extensive canal system widely used for crop irrigation. Approximately 86% of the state comprises agricultural land (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1A</bold>
</xref>) (<xref ref-type="bibr" rid="B37">37</xref>), with paddy and cotton as principal kharif crops and wheat as the major rabi crop cultivated in the region. The climate varies from semi-humid to semi-arid in the north, while arid conditions are prominent in the southern and southwestern districts. The rest of the state experiences semi-arid conditions (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1B</bold>
</xref>). The overall temperature in this region ranges from 5 to 50 &#xb0;C, with hot summers starting in mid-April and cold winters from December to February. Punjab lies on a flat alluvial plain of the Indo-Gangetic basin (IGB) surrounded by Quaternary sediments deposited by the Indus River and its tributaries. These sediments constitute a continuous groundwater system forming the north-western portion of the IGB aquifers. Hydraulic conductivity is highest in the aquifers of the central districts (approximately 10&#x2013;90 m/day) and lowest in the southwestern region (4&#x2013;8 m/day). The soil is primarily loose, consisting of sand and calcareous materials, gravel, silt, and clay. Kankar, a nodular structure of impure calcium carbonate, is often found 60&#x2013;200 cm underneath the surface and sporadically at the surface of some agricultural lands (<xref ref-type="bibr" rid="B38">38</xref>). The groundwater is found in partially confined/confined deeper aquifers and unconfined shallow aquifers fed by rainfall and canal water (<xref ref-type="bibr" rid="B16">16</xref>, <xref ref-type="bibr" rid="B39">39</xref>, <xref ref-type="bibr" rid="B40">40</xref>), with north and central districts having fresh groundwater and the southwestern region dominated by saline groundwater. The elevated mountain and hill regions of northern and northeastern Punjab act as groundwater recharge zones, from where the water flows towards the lower-elevation areas in the southwest (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1C</bold>
</xref>). Therefore, southwestern districts such as Bathinda, Muktsar, Fazilka, and Ferozepur have shallow groundwater and often experience water-logging and highly saline soil conditions, resulting from evaporation of canal water and continual movement of water from canals and distributaries (<xref ref-type="bibr" rid="B16">16</xref>). Punjab receives most of its precipitation from July to September from the southwest monsoon, which aids in replenishing the groundwater table (<xref ref-type="bibr" rid="B16">16</xref>). The rainfall ranges from 800&#x2013;1,200 mm in the north to 400&#x2013;800 mm in the central plains, with the lowest values of &lt; 400 mm in the southwestern region (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1D</bold>
</xref>).</p>
<fig id="f1" position="float">
<label>Figure&#xa0;1</label>
<caption>
<p>
<bold>(A)</bold> Location map of Punjab showing the land-use and land cover distribution in 2021 (<xref ref-type="bibr" rid="B34">34</xref>) along with the fluoride data points at high and low concentrations; <bold>(B)</bold> aridity (2002); <bold>(C)</bold> elevation (<xref ref-type="bibr" rid="B35">35</xref>); <bold>(D)</bold> rainfall map of Punjab (<xref ref-type="bibr" rid="B36">36</xref>).</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fsoil-04-1407502-g001.tif"/>
</fig>
</sec>
<sec id="s3" sec-type="materials|methods">
<label>3</label>
<title>Materials and methodology</title>
<sec id="s3_1">
<label>3.1</label>
<title>Data collection, database creation, and processing</title>
<p>Groundwater quality analyses from across the entire state of Punjab were retrieved from various published reports and research articles. We collected a total of 17,317 F<sup>&#x2212;</sup> observations: 1,705 from the Central Ground Water Board (CGWB), 433 from the Central University of Punjab (CUPB), 745 from Duggal and Sharma (<xref ref-type="bibr" rid="B41">41</xref>), 11,226 from Khattak et&#xa0;al. (<xref ref-type="bibr" rid="B8">8</xref>), 59 from Sharma et&#xa0;al. (<xref ref-type="bibr" rid="B42">42</xref>), 38 from the British Geological Survey (BGS) (<xref ref-type="bibr" rid="B43">43</xref>), and 3,111 from the Department of Water Supply and Sanitation, Government of Punjab. Besides F<sup>&#x2212;</sup>, the availability of groundwater physicochemical parameters such as pH, Electrical Conductivity (EC), Total Dissolved Solids (TDS), Chloride (Cl<sup>&#x2212;</sup>), Nitrate (NO<sub>3</sub>
<sup>&#x2212;</sup>), Sulphate (SO<sub>4</sub>
<sup>2&#x2212;</sup>), Phosphate (PO<sub>4</sub>
<sup>3&#x2212;</sup>), Bicarbonate (HCO<sub>3</sub>
<sup>&#x2212;</sup>), Sodium (Na<sup>+</sup>), Potassium (K<sup>+</sup>), Calcium (Ca<sup>2+</sup>), and Magnesium (Mg<sup>2+</sup>) is essential for model development. In conjunction with the groundwater quality determinants, the depth of the collected samples was also considered an important parameter for this study. These variables were selected based on their established or suspected association with the discharge and accumulation of F<sup>&#x2212;</sup> in groundwater and were further used to screen the data.</p>
<p>Although these attributes are often measured during groundwater monitoring assessments, some were missing from the datasets collected from different sources. Data collected from research papers (<xref ref-type="bibr" rid="B8">8</xref>, <xref ref-type="bibr" rid="B41">41</xref>, <xref ref-type="bibr" rid="B42">42</xref>) did not contain all of this information and, hence, were not included in the final database. In comparison, almost all variables were present in the data collected from BGS, CUPB, and CGWB. Although CGWB data were collected for 2013&#x2013;2015 and 2018&#x2013;2020 (<xref ref-type="bibr" rid="B44">44</xref>&#x2013;<xref ref-type="bibr" rid="B49">49</xref>), only the most recent data, from 2020, were used for prediction modeling, together with observations from locations monitored in previous years but not in 2020. This resulted in a total of 298 observations from CGWB. Furthermore, the CUPB data included some samples collected from canals and other surface water sources, which were excluded, resulting in a total of 420 data points. Besides the water quality variables and sampling depth, the availability of geographical coordinates for the sampling locations was also an essential screening criterion. For sampling points lacking georeferenced locations, the coordinates were determined in Google Earth Pro from the name of the sampling location. The final database used for developing a suitable prediction model for groundwater F<sup>&#x2212;</sup> concentration therefore contained a total of 756 data points from CGWB, CUPB, and BGS (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1A</bold>
</xref>, <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref>). The final data were also grouped into two classes according to the depth at which the samples were collected (the optimum depth was considered to be 60 m) (<xref ref-type="bibr" rid="B52">52</xref>, <xref ref-type="bibr" rid="B53">53</xref>).</p>
<table-wrap id="T1" position="float">
<label>Table&#xa0;1</label>
<caption>
<p>Detailed information of the groundwater fluoride dataset compiled from different sources.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Data source</th>
<th valign="top" align="left">Total data points</th>
<th valign="top" align="left">Districts covered</th>
<th valign="top" align="left">Max./median concentration (mg/L)</th>
<th valign="top" align="left">Year</th>
<th valign="top" align="left">Reference(s)</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Central Ground Water Board (CGWB)</td>
<td valign="top" align="left">298</td>
<td valign="top" align="left">All</td>
<td valign="top" align="left">9.2/0.6</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">(<xref ref-type="bibr" rid="B48">48</xref>)</td>
</tr>
<tr>
<td valign="top" align="left">Central University of Punjab (CUPB)</td>
<td valign="top" align="left">420</td>
<td valign="top" align="left">Barnala, Bathinda, Fatehgarh Sahib, Fazilka, Ludhiana, Roop Nagar, SBS Nagar</td>
<td valign="top" align="left">2.59/0.57</td>
<td valign="top" align="left">2016</td>
<td valign="top" align="left">(<xref ref-type="bibr" rid="B13">13</xref>, <xref ref-type="bibr" rid="B50">50</xref>, <xref ref-type="bibr" rid="B51">51</xref>)</td>
</tr>
<tr>
<td valign="top" align="left">British Geological Survey (BGS)</td>
<td valign="top" align="left">38</td>
<td valign="top" align="left">Hoshiarpur, Jalandhar, Kapurthala, SBS Nagar</td>
<td valign="top" align="left">5.76/0.62</td>
<td valign="top" align="left">2016</td>
<td valign="top" align="left">(<xref ref-type="bibr" rid="B43">43</xref>)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The values of some of the abovementioned variables were missing in the final dataset and were estimated using standard formulas. One such missing parameter was TDS in the CGWB data, which was determined from EC using the following formulas (<xref ref-type="disp-formula" rid="eq1">Equations 1</xref>, <xref ref-type="disp-formula" rid="eq2">2</xref>) (<xref ref-type="bibr" rid="B54">54</xref>):</p>
<disp-formula id="eq1">
<label>(1)</label>
<mml:math display="block" id="M1">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mi>T</mml:mi>
<mml:mi>D</mml:mi>
<mml:mi>S</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mrow>
<mml:mo stretchy="true">(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mi>L</mml:mi>
</mml:mfrac>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>p</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mo stretchy="true">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo>
<mml:mi>E</mml:mi>
<mml:mi>C</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mrow>
<mml:mo stretchy="true">(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:mfrac>
</mml:mrow>
<mml:mo stretchy="true">)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>640</mml:mn>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mrow>
<mml:mo stretchy="true">(</mml:mo>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>E</mml:mi>
<mml:mi>C</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>f</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>m</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mn>0.1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>5</mml:mn>
<mml:mfrac>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:mfrac>
</mml:mrow>
<mml:mo stretchy="true">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="eq2">
<label>(2)</label>
<mml:math display="block" id="M2">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mi>T</mml:mi>
<mml:mi>D</mml:mi>
<mml:mi>S</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mrow>
<mml:mo stretchy="true">(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mi>L</mml:mi>
</mml:mfrac>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>p</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mo stretchy="true">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo>
<mml:mi>E</mml:mi>
<mml:mi>C</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mrow>
<mml:mo stretchy="true">(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:mfrac>
</mml:mrow>
<mml:mo stretchy="true">)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>800</mml:mn>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mrow>
<mml:mo stretchy="true">(</mml:mo>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>E</mml:mi>
<mml:mi>C</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>g</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mn>5</mml:mn>
<mml:mfrac>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:mfrac>
</mml:mrow>
<mml:mo stretchy="true">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
<p>Similarly, a few other parameters were reported as below their respective detection limits (BDL). These values were replaced with half of the corresponding detection limit. Furthermore, the values of all parameters were converted to common units to ensure uniformity in the dataset: all anions, cations, and TDS were expressed in mg/L, EC in &#xb5;S/cm, and depth in meters, while pH is unitless. Of the final 756 F<sup>&#x2212;</sup> measurements, 609 (~81%) were within the permissible limit of 1.5 mg/L, 100 (~13%) ranged from 1.5 to 3 mg/L, and the remaining 47 (~6%) exceeded 3 mg/L.</p>
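<p>The pre-processing rules described above can be illustrated with a minimal Python sketch. It is not the authors&#x2019; code (the study reports using R); the function names and the 0.1 mg/L detection limit are hypothetical and chosen only for illustration. Only the high-conductivity conversion of Equation 2 is shown, together with the unit identity 1 dS/m = 1000 &#xb5;S/cm.</p>
<preformat><![CDATA[
```python
def substitute_bdl(value, detection_limit):
    """Return half the detection limit when a reading is flagged as BDL (None)."""
    if value is None:
        return detection_limit / 2.0
    return value


def ec_us_cm_to_ds_m(ec_us_cm):
    """Convert EC from microsiemens per cm to decisiemens per m."""
    return ec_us_cm / 1000.0


def tds_from_high_ec(ec_ds_m, factor=800.0):
    """Equation 2: TDS (mg/L) = EC (dS/m) x 800, stated for EC above 5 dS/m."""
    if ec_ds_m <= 5.0:
        raise ValueError("Equation 2 applies only when EC exceeds 5 dS/m")
    return ec_ds_m * factor


def class_shares(counts):
    """Percentage share of each class, rounded to the nearest whole number."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total) for name, n in counts.items()}


# Hypothetical detection limit of 0.1 mg/L, used only for illustration.
filled = substitute_bdl(None, detection_limit=0.1)

# Fluoride class counts reported in the text (756 measurements in total).
shares = class_shares({"up_to_1.5": 609, "1.5_to_3": 100, "above_3": 47})

# Maximum EC listed in Table 2 (16,760 uS/cm), converted with Equation 2.
tds_max = tds_from_high_ec(ec_us_cm_to_ds_m(16760))
```
]]></preformat>
<p>Under these assumptions, the maximum EC in Table 2 converts to roughly 13,408 mg/L, which agrees with the maximum TDS listed there, and the class shares reproduce the rounded percentages quoted in the text.</p>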
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Groundwater hydrochemical characterization and depth distribution</title>
<p>To understand how groundwater F<sup>&#x2212;</sup> levels and the corresponding physicochemical parameters are distributed at different depths, graphical and statistical inference methods were adopted. The compiled dataset was characterized by computing its descriptive statistics (minimum, maximum, mean, median, coefficient of variation, first and third quartiles, and percentage of samples exceeding the respective permissible limits). The normality of all variables was tested using the Kolmogorov&#x2013;Smirnov test. Testing for normality is necessary, especially for geochemical and other environmental data, because such data are generally skewed, contain outliers, and originate from varied sources (<xref ref-type="bibr" rid="B55">55</xref>). Normality testing further aided in selecting the appropriate statistical treatments for the data. Since most parameters were not normally distributed, Spearman&#x2019;s rank correlation coefficient was computed to identify potential associations of the F<sup>&#x2212;</sup> concentrations with the concurrently evaluated physicochemical attributes and sampling depth. All statistical analyses and graphical plotting were performed in R version 4.3.2.</p>
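<p>For readers unfamiliar with rank correlation, the following Python sketch shows one common way to compute Spearman&#x2019;s coefficient: each variable is replaced by its ranks (ties receive the average rank), and the Pearson correlation of the ranks is taken. The function names and the toy values are hypothetical; the study itself used R, and this is not the authors&#x2019; implementation.</p>
<preformat><![CDATA[
```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / sqrt(ss_a * ss_b)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values (not from the study): a monotonic but non-linear relationship.
rho = spearman([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 4.0, 9.0, 16.0, 25.0])
```
]]></preformat>
<p>Because only the ordering of the values matters, the coefficient is insensitive to the skewness and outliers typical of hydrochemical data, which is the reason given above for preferring it to a parametric measure.</p>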
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Model description and development</title>
<p>Machine learning algorithms were then applied to uncover hidden patterns between the compiled F<sup>&#x2212;</sup> concentrations and the physicochemical variables and well depth, and to develop an optimized model for predicting F<sup>&#x2212;</sup> concentration in the study domain. In this study, F<sup>&#x2212;</sup> is the output or target (<italic>y</italic>) variable, which is predicted from the input or predictor (<italic>x</italic>) variables, i.e., the abovementioned physicochemical attributes and the sampling well depth. Five machine learning models, i.e., Extreme Gradient Boosting (Xgboost), Random Forest (RF), Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Multilayer Perceptron (MLP), were implemented and tested on the final dataset. All of these models have been used frequently in the literature for groundwater-based investigations and were therefore considered for groundwater F<sup>&#x2212;</sup> prediction modeling. R version 4.3.2 was used to develop the models. Before the models were implemented, the data were standardized in a pre-processing step using the <italic>Z</italic>-score method (<xref ref-type="disp-formula" rid="eq3">Equation 3</xref>) (<xref ref-type="bibr" rid="B56">56</xref>):</p>
<disp-formula id="eq3">
<label>(3)</label>
<mml:math display="block" id="M3">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#xb4;</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
</mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im1">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#xb4;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> denotes the standardized <italic>i</italic>
<sup>th</sup> variable, <inline-formula>
<mml:math display="inline" id="im2">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is the <italic>i</italic>
<sup>th</sup> variable, <inline-formula>
<mml:math display="inline" id="im3">
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> denotes standard deviation, and <inline-formula>
<mml:math display="inline" id="im4">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula> is the mean. Following standardization, the entire dataset was randomly shuffled, and a cross-validation technique was employed to split the data for training and testing the models: 80% of the data were used for training, and the remaining 20% were used for validation.</p>
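<p>A minimal Python sketch of the standardization in Equation 3 and of a shuffled 80/20 split is given below. It rests on several assumptions that the text does not specify: the sample standard deviation is used for the scaling, the split is a single random partition, and the seed and function names are hypothetical. The study itself used R.</p>
<preformat><![CDATA[
```python
import random
from statistics import mean, stdev


def z_score(column):
    """Equation 3: subtract the mean and divide by the standard deviation.

    Assumption: the sample standard deviation is used; the text does not say
    whether the sample or population form was applied.
    """
    mu = mean(column)
    sigma = stdev(column)
    return [(x - mu) / sigma for x in column]


def shuffle_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split it into training and validation parts."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy column (not from the study).
standardized = z_score([2.0, 4.0, 6.0])

# Row indices standing in for the 756 records of the final dataset.
train_rows, valid_rows = shuffle_split(list(range(756)))
```
]]></preformat>
<p>With 756 records, an 80/20 partition of this kind yields 605 training and 151 validation rows.</p>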
<sec id="s3_3_1">
<label>3.3.1</label>
<title>Random Forest</title>
<p>Random Forest is a widely recognized and extensively implemented ensemble machine learning method that has been applied successfully to real-world problems (<xref ref-type="bibr" rid="B56">56</xref>). The RF algorithm generates numerous decision trees (hence a &#x2018;forest&#x2019;), each of which is built from a random subsample of the training data (hence &#x2018;random&#x2019;). The algorithm uses bootstrapping to select samples randomly, so that each tree sees a different combination of the information in the training dataset. This reduces the similarity among the trees and ultimately makes the model more robust. The training samples that are not drawn into a given bootstrap sample are referred to as &#x2018;out-of-bag&#x2019; (OOB) samples and are used for internal cross-validation of the trained model (<xref ref-type="bibr" rid="B57">57</xref>). Furthermore, the model selects a random subset of the predictor variables to split the data at each node, which helps to grow unbiased trees. Growing several trees and averaging their predictions reduces overfitting, which is a common problem with a single decision tree. This aggregation of decisions from different trees enhances the generalization capacity of the Random Forest model (<xref ref-type="bibr" rid="B58">58</xref>). Developing an effective RF model requires the optimization of two hyperparameters, i.e., the total number of trees and the minimum leaf size. In this work, four physicochemical parameters were considered as candidate predictor variables at each node split, and the split predictors were selected by applying a curvature test to grow unbiased trees. The number of decision trees ranged from 1 to 500, and a random search approach was employed to determine the minimum leaf size.</p>
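<p>The two sources of randomness described above, bootstrap sampling of the rows and random selection of candidate predictors at a split, can be sketched in a few lines of Python. The sketch only illustrates the sampling step and does not grow any trees; the training-set size, the predictor labels, the seed, and the function names are illustrative assumptions rather than details taken from the authors&#x2019; R implementation.</p>
<preformat><![CDATA[
```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement and list the rows left out."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def candidate_predictors(predictors, n_candidates, rng):
    """Pick the random subset of predictors considered at one node split."""
    return rng.sample(predictors, n_candidates)


rng = random.Random(0)  # hypothetical seed

# Illustrative training-set size (about 80% of 756 records).
in_bag, out_of_bag = bootstrap_indices(605, rng)

# Illustrative predictor labels based on the variables listed in Table 2.
predictors = ["depth", "pH", "EC", "TDS", "Cl", "NO3", "SO4",
              "HCO3", "Na", "K", "Ca", "Mg"]
candidates = candidate_predictors(predictors, 4, rng)

oob_fraction = len(out_of_bag) / 605.0
```
]]></preformat>
<p>For large samples, the expected out-of-bag fraction approaches 1/e, or about 37%, so each tree has a sizeable set of unseen rows available for internal validation.</p>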
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>Support Vector Machine</title>
<p>Support Vector Machine is a statistical learning method based on structural risk minimization that was first proposed by V. N. Vapnik (<xref ref-type="bibr" rid="B59">59</xref>). In contrast to the neural network (NN) technique, which may suffer from overfitting and poor generalization, SVM minimizes an upper bound on the generalization risk, which enhances its generalization capability (<xref ref-type="bibr" rid="B60">60</xref>). Rather than working with a boundary in a two-dimensional plane, SVM uses a kernel method to construct hyperplanes that specify the decision boundaries between data observations of distinct classes (<xref ref-type="bibr" rid="B61">61</xref>). An in-depth explanation of the SVM model is given by Ceryan et&#xa0;al. (<xref ref-type="bibr" rid="B62">62</xref>). In this study, several kernel functions, namely the polynomial, linear, and radial basis kernel functions, were tested, and the best-performing kernel was selected for prediction. The hyperparameters selected for this model were the epsilon value, gamma, and cost (box constraint). The epsilon value influences the number of support vectors and thereby the risk of overfitting. The hyperparameters were optimized by the Bayesian optimization technique, with the epsilon value searched in the range (10<sup>&#x2212;3</sup>, 10<sup>2</sup>) and the box constraint in the range (10<sup>&#x2212;3</sup>, 10<sup>3</sup>).</p>
</sec>
<sec id="s3_3_3">
<label>3.3.3</label>
<title>Extreme Gradient Boosting</title>
<p>Chen et&#xa0;al. (<xref ref-type="bibr" rid="B63">63</xref>) developed the Xgboost model, which is an improved version of the gradient-boosting machine (GBM). Compared with GBM, Xgboost has a faster learning speed and higher accuracy, and it can be employed for both classification and regression problems. It is an ensemble method composed of numerous decision trees in which the data are split according to the features. New trees are added during model fitting to correct the prediction errors of the previous trees. Based on the values of the input parameters, each sample is allocated to one leaf in each tree, and each leaf carries a numerical weight. The model&#x2019;s predicted output for a particular sample is calculated by summing the weights of the leaves to which that sample is allocated across all regression trees (<xref ref-type="bibr" rid="B64">64</xref>). Step-wise information about Xgboost is provided by Osman et al. (<xref ref-type="bibr" rid="B65">65</xref>). To achieve better modeling performance and prediction efficiency, the model hyperparameters must be tuned. In this study, four hyperparameter optimization algorithms, namely Grid Search, Adaptive Random Search, Genetic Algorithm, and Bayesian Optimization, were applied to optimize the model parameters (nround, eta, lambda, and alpha).</p>
</sec>
<sec id="s3_3_4">
<label>3.3.4</label>
<title>Extreme Learning Machine</title>
<p>ELM is one of the most commonly used ML models owing to its very fast learning speed and its ability to achieve the minimum training error with the smallest weight norm (<xref ref-type="bibr" rid="B66">66</xref>). It is frequently used in scientific domains such as image recognition, text classification, biomedicine, and environmental forecasting (<xref ref-type="bibr" rid="B67">67</xref>&#x2013;<xref ref-type="bibr" rid="B69">69</xref>). ELM is a feedforward neural network with a single hidden layer between the input layer and the output layer and has a strong generalization capacity. Weighted connections link the neurons of the input and hidden layers, and those of the hidden and output layers. The input weights and biases are generated randomly during the training stage, while the output weights are determined by the least-squares method. Consequently, the output weights are obtained analytically, and the model generalizes efficiently (<xref ref-type="bibr" rid="B66">66</xref>). The performance of this model can be enhanced by optimizing the number of neurons in the hidden layer and the activation function. In this study, the number of hidden-layer neurons was optimized by increasing it from 1 until the best model was obtained (<xref ref-type="bibr" rid="B70">70</xref>). The rectified linear unit, sigmoid, hard-limit, triangular basis, radial basis, satlins, and tansig activation functions were explored, and the best-performing function was selected to build the ELM model.</p>
</sec>
<sec id="s3_3_5">
<label>3.3.5</label>
<title>Multilayer Perceptron</title>
<p>The Multilayer Perceptron (MLP) is among the most popular neural network models that mimic the human brain for decision-making and problem-solving (<xref ref-type="bibr" rid="B71">71</xref>). A comprehensive explanation of the model is given by Haykin (<xref ref-type="bibr" rid="B64">64</xref>). In brief, the model is composed of an input layer and an output layer with one or more intermediate layers known as hidden layers. The input layer consists of source nodes or neurons that transfer the input information to the subsequent hidden layer. The hidden layer(s) then process the information provided by the units in the input layer and pass it on to the output layer. All the input signals are processed by the neurons of the hidden and output layers by assigning weights to them. In addition, an extra unit known as a bias node is attached to each layer, which primarily generates a signal as an output to the neurons of the current layer. Weights are applied to the inputs of each node, and the weighted inputs are integrated and processed by a transfer function that regulates the signal strength passed on through the output nodes (<xref ref-type="bibr" rid="B72">72</xref>). Among the various activation functions available for the MLP architecture, the most frequently used one, i.e., the sigmoid activation function, was adopted in this study (<xref ref-type="bibr" rid="B73">73</xref>). The MLP was trained by back-propagation using the Levenberg&#x2013;Marquardt (LM) algorithm, through which the optimal weights and biases were obtained (<xref ref-type="bibr" rid="B74">74</xref>). The random search method was applied to tune the learning rate of the LM algorithm over the range 0.1 to 0.9. A single hidden layer was used to build the MLP model. Since the number of hidden neurons is a significant factor in MLP architecture, it was also optimized to prevent the model from overfitting. The number of neurons was tuned by increasing it from unity until the model was optimized (<xref ref-type="bibr" rid="B70">70</xref>).</p>
</sec>
<sec id="s3_3_6">
<label>3.3.6</label>
<title>Model performance evaluation</title>
<p>The performance of the ML models adopted for groundwater F<sup>&#x2212;</sup> prediction was assessed using three measures: coefficient of determination (<italic>R<sup>2</sup></italic>), root mean square error (RMSE), and mean absolute error (MAE) (<xref ref-type="bibr" rid="B31">31</xref>, <xref ref-type="bibr" rid="B75">75</xref>). <italic>R<sup>2</sup></italic> measures the strength of the linear relationship between the predicted and observed values; a value close to 1 indicates good agreement between them. Conversely, RMSE and MAE values close to zero indicate an excellent fit between the predicted and observed values. The equations for all three statistical performance measures are as follows (<xref ref-type="disp-formula" rid="eq4">Equations 4</xref>&#x2013;<xref ref-type="disp-formula" rid="eq6">6</xref>) (<xref ref-type="bibr" rid="B76">76</xref>, <xref ref-type="bibr" rid="B77">77</xref>):</p>
<disp-formula id="eq4">
<label>(4)</label>
<mml:math display="block" id="M4">
<mml:mrow>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>P</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>O</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>P</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
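</mml:mrow>
</mml:mstyle>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:msup>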
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>O</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="eq5">
<label>(5)</label>
<mml:math display="block" id="M5">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mn>0.5</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="eq6">
<label>(6)</label>
<mml:math display="block" id="M6">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>N</italic> is the total number of observations, the predicted and observed values are denoted by <inline-formula>
<mml:math display="inline" id="im5">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im6">
<mml:mrow>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, respectively, and the means of the predicted and observed values are given by <inline-formula>
<mml:math display="inline" id="im7">
<mml:mover accent="true">
<mml:mi>P</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im8">
<mml:mover accent="true">
<mml:mi>O</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:math>
</inline-formula>, respectively. The entire methodology is summarized in <xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2</bold>
</xref>.</p>
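<p>The three performance measures can be written compactly in Python as a check on the definitions. In this sketch, <italic>R<sup>2</sup></italic> is computed as the squared Pearson correlation between predicted and observed values, with the denominator taken as the product of the two sums of squared deviations; the function names and the toy values are hypothetical, and the study itself used R.</p>
<preformat><![CDATA[
```python
from math import sqrt


def r_squared(pred, obs):
    """Squared Pearson correlation between predicted and observed values."""
    p_bar = sum(pred) / len(pred)
    o_bar = sum(obs) / len(obs)
    cross = sum((p - p_bar) * (o - o_bar) for p, o in zip(pred, obs))
    ss_pred = sum((p - p_bar) ** 2 for p in pred)
    ss_obs = sum((o - o_bar) ** 2 for o in obs)
    return cross ** 2 / (ss_pred * ss_obs)


def rmse(pred, obs):
    """Root mean square error (Equation 5)."""
    n = len(obs)
    return sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)


def mae(pred, obs):
    """Mean absolute error (Equation 6)."""
    n = len(obs)
    return sum(abs(p - o) for p, o in zip(pred, obs)) / n


# Toy values (not from the study).
predicted = [1.0, 2.0, 3.0]
observed = [1.0, 2.0, 4.0]
scores = {
    "R2": r_squared(predicted, observed),
    "RMSE": rmse(predicted, observed),
    "MAE": mae(predicted, observed),
}
```
]]></preformat>
<p>Because RMSE squares the residuals before averaging, it penalizes large errors more heavily than MAE, so reporting both gives a fuller picture of the error distribution.</p>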
<fig id="f2" position="float">
<label>Figure&#xa0;2</label>
<caption>
<p>Flow diagram demonstrating the methodology applied for groundwater fluoride database creation and the comparison of different machine-learning models to predict fluoride concentration.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fsoil-04-1407502-g002.tif"/>
</fig>
</sec>
<sec id="s3_3_7">
<label>3.3.7</label>
<title>Determination of variable importance</title>
<p>The influence of the different explanatory variables on the model&#x2019;s performance was determined using the &#x2018;<italic>varImp</italic>&#x2019; function of the &#x2018;<italic>caret</italic>&#x2019; package in the R environment. This commonly used function ranks all the input variables by a standardized measure of importance ranging from 0 to 100%.</p>
</sec>
</sec>
</sec>
<sec id="s4" sec-type="results">
<label>4</label>
<title>Results</title>
<sec id="s4_1">
<label>4.1</label>
<title>Hydrochemical characterization</title>
<p>Knowledge of the hydrochemical conditions of groundwater is indispensable for identifying potential contaminants to safeguard human health. The descriptive statistics for summarizing the hydrochemical characteristics of the groundwater samples compiled from different sources for this study are presented in <xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>. The pH value for the entire dataset ranged from 6.0 to 9.1 (<xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>), with median values of 8.05 and 7.33 at depths within and exceeding 60 meters (<xref ref-type="fig" rid="f3">
<bold>Figure&#xa0;3</bold>
</xref>), respectively, indicating the predominance of alkaline conditions in the aquifers of this region. Likewise, EC and TDS ranged from 41 to 16,760 &#x3bc;S cm<sup>&#x2212;1</sup> and from 29 to 13,408 mg L<sup>&#x2212;1</sup> (<xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>), respectively, with median values higher in shallow (809 &#x3bc;S cm<sup>&#x2212;1</sup> and 531.6 mg L<sup>&#x2212;1</sup>, respectively) than in deeper (601 &#x3bc;S cm<sup>&#x2212;1</sup> and 419.2 mg L<sup>&#x2212;1</sup>, respectively) waters (<xref ref-type="fig" rid="f3">
<bold>Figure&#xa0;3</bold>
</xref>). According to Freeze and Cherry&#x2019;s groundwater classification (<xref ref-type="bibr" rid="B78">78</xref>), shallow groundwaters in Punjab can largely be considered brackish (1000 &lt; TDS &lt; 10000 mg L<sup>&#x2212;1</sup>), while deeper waters are classified as freshwater (TDS &lt;1000 mg L<sup>&#x2212;1</sup>). Furthermore, the results show elevated concentrations of both cations and anions, particularly at shallow depths. The dominant cations in shallow groundwater were Ca<sup>2+</sup>, Mg<sup>2+</sup>, Na<sup>+</sup>, and K<sup>+</sup>, and the dominant anions were Cl<sup>&#x2212;</sup>, NO<sub>3</sub>
<sup>&#x2212;</sup>, SO<sub>4</sub>
<sup>2&#x2212;</sup>, F<sup>&#x2212;</sup>, and HCO<sub>3</sub>
<sup>&#x2212;</sup>, whereas in deeper waters the dominant ions were Mg<sup>2+</sup>, Na<sup>+</sup>, Cl<sup>&#x2212;</sup>, and HCO<sub>3</sub>
<sup>&#x2212;</sup>. Based on overall median concentrations, the cations and anions were arranged in the following order: Na<sup>+</sup> &gt; Mg<sup>2+</sup> &gt; Ca<sup>2+</sup> &gt; K<sup>+</sup> and HCO<sub>3</sub>
<sup>&#x2212;</sup> &gt; Cl<sup>&#x2212;</sup> &gt; SO<sub>4</sub>
<sup>2&#x2212;</sup> &gt; NO<sub>3</sub>
<sup>&#x2212;</sup> &gt; F<sup>&#x2212;</sup> &gt; PO<sub>4</sub>
<sup>3&#x2212;</sup>, respectively. Furthermore, the median concentrations of Cl<sup>&#x2212;</sup>, NO<sub>3</sub>
<sup>&#x2212;</sup>, SO<sub>4</sub>
<sup>2&#x2212;</sup>, Mg<sup>2+</sup>, Na<sup>+</sup>, Ca<sup>2+</sup>, and K<sup>+</sup> were higher in shallow waters than in deeper waters. The median values of Mg<sup>2+</sup> and F<sup>&#x2212;</sup> were slightly higher in deeper waters than in shallow aquifers (Mg<sup>2+</sup>: 35 and 34.39 mg L<sup>&#x2212;1</sup>, respectively; F<sup>&#x2212;</sup>: 0.71 and 0.57 mg L<sup>&#x2212;1</sup>, respectively). Similarly, the median concentration of HCO<sub>3</sub>
<sup>&#x2212;</sup> was much higher in deeper waters than in shallow waters. Overall, the majority of the ions, along with other water quality parameters, are higher in the shallow aquifers than in the deeper groundwater (<xref ref-type="fig" rid="f3">
<bold>Figure&#xa0;3</bold>
</xref>).</p>
<table-wrap id="T2" position="float">
<label>Table&#xa0;2</label>
<caption>
<p>Descriptive statistics for characterization of fluoride (F<sup>&#x2212;</sup>) and concurrently measured physicochemical variables in the alluvial aquifer of Punjab.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="middle" align="center">Variable</th>
<th valign="middle" align="center">Min</th>
<th valign="middle" align="center">Max</th>
<th valign="middle" align="center">Mean</th>
<th valign="middle" align="center">Median</th>
<th valign="middle" align="center">CV (%)</th>
<th valign="middle" align="center">Q1</th>
<th valign="middle" align="center">Q3</th>
<th valign="middle" align="center">% exceeding (WHO limit) (<xref ref-type="bibr" rid="B1">1</xref>)</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="middle" align="left">Depth (meter)</td>
<td valign="middle" align="center">0.34</td>
<td valign="middle" align="center">518.29</td>
<td valign="middle" align="center">42.24</td>
<td valign="middle" align="center">28.12</td>
<td valign="middle" align="center">105.87</td>
<td valign="middle" align="center">11.60</td>
<td valign="middle" align="center">60.97</td>
<td valign="middle" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="middle" align="left">pH</td>
<td valign="middle" align="center">6</td>
<td valign="middle" align="center">9.1</td>
<td valign="middle" align="center">7.75</td>
<td valign="middle" align="center">7.61</td>
<td valign="middle" align="center">8.29</td>
<td valign="middle" align="center">7.2</td>
<td valign="middle" align="center">8.35</td>
<td valign="middle" align="center">&#x2013; (6.5&#x2013;8.5)</td>
</tr>
<tr>
<td valign="middle" align="left">EC (&#x3bc;S cm<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">41</td>
<td valign="middle" align="center">16760</td>
<td valign="middle" align="center">1092.72</td>
<td valign="middle" align="center">715</td>
<td valign="middle" align="center">112.01</td>
<td valign="middle" align="center">507</td>
<td valign="middle" align="center">1218</td>
<td valign="middle" align="center">32% (1,000)</td>
</tr>
<tr>
<td valign="middle" align="left">TDS (mg L<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">29</td>
<td valign="middle" align="center">13408</td>
<td valign="middle" align="center">745.56</td>
<td valign="middle" align="center">486.76</td>
<td valign="middle" align="center">123.98</td>
<td valign="middle" align="center">345.6</td>
<td valign="middle" align="center">808</td>
<td valign="middle" align="center">48% (500)</td>
</tr>
<tr>
<td valign="middle" align="left">Cl<sup>&#x2212;</sup> (mg L<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">1.61</td>
<td valign="middle" align="center">4023</td>
<td valign="middle" align="center">104.75</td>
<td valign="middle" align="center">42</td>
<td valign="middle" align="center">221.28</td>
<td valign="middle" align="center">21</td>
<td valign="middle" align="center">97.46</td>
<td valign="middle" align="center">13% (200)</td>
</tr>
<tr>
<td valign="middle" align="left">NO<sub>3</sub>
<sup>&#x2212;</sup> (mg L<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">0.5</td>
<td valign="middle" align="center">1448</td>
<td valign="middle" align="center">41.82</td>
<td valign="middle" align="center">16</td>
<td valign="middle" align="center">246.05</td>
<td valign="middle" align="center">3.08</td>
<td valign="middle" align="center">38</td>
<td valign="middle" align="center">20% (50)</td>
</tr>
<tr>
<td valign="middle" align="left">SO<sub>4</sub>
<sup>2&#x2212;</sup> (mg L<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">0.05</td>
<td valign="middle" align="center">3354</td>
<td valign="middle" align="center">124.98</td>
<td valign="middle" align="center">36.03</td>
<td valign="middle" align="center">233.78</td>
<td valign="middle" align="center">12.32</td>
<td valign="middle" align="center">102.39</td>
<td valign="middle" align="center">26% (100)</td>
</tr>
<tr>
<td valign="middle" align="left">HCO<sub>3</sub>
<sup>&#x2212;</sup> (mg L<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">49</td>
<td valign="middle" align="center">1025</td>
<td valign="middle" align="center">350.46</td>
<td valign="middle" align="center">342</td>
<td valign="middle" align="center">45.88</td>
<td valign="middle" align="center">215.5</td>
<td valign="middle" align="center">452</td>
<td valign="middle" align="center">38% (400)</td>
</tr>
<tr>
<td valign="middle" align="left">Na<sup>+</sup> (mg L<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">0.05</td>
<td valign="middle" align="center">2200</td>
<td valign="middle" align="center">141.83</td>
<td valign="middle" align="center">68</td>
<td valign="middle" align="center">160.51</td>
<td valign="middle" align="center">33.72</td>
<td valign="middle" align="center">158.13</td>
<td valign="middle" align="center">20% (200)</td>
</tr>
<tr>
<td valign="middle" align="left">K<sup>+</sup> (mg L<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">0.05</td>
<td valign="middle" align="center">467</td>
<td valign="middle" align="center">16.94</td>
<td valign="middle" align="center">6.48</td>
<td valign="middle" align="center">285.77</td>
<td valign="middle" align="center">3.67</td>
<td valign="middle" align="center">9.84</td>
<td valign="middle" align="center">19% (12)</td>
</tr>
<tr>
<td valign="middle" align="left">Ca<sup>2+</sup> (mg L<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">0.05</td>
<td valign="middle" align="center">493</td>
<td valign="middle" align="center">64.98</td>
<td valign="middle" align="center">45</td>
<td valign="middle" align="center">92.86</td>
<td valign="middle" align="center">20</td>
<td valign="middle" align="center">100</td>
<td valign="middle" align="center">3% (200)</td>
</tr>
<tr>
<td valign="middle" align="left">Mg<sup>2+</sup> (mg L<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">0.05</td>
<td valign="middle" align="center">900</td>
<td valign="middle" align="center">125.8</td>
<td valign="middle" align="center">62.33</td>
<td valign="middle" align="center">104.69</td>
<td valign="middle" align="center">27</td>
<td valign="middle" align="center">215</td>
<td valign="middle" align="center">54% (50)</td>
</tr>
<tr>
<td valign="middle" align="left">F<sup>&#x2212;</sup> (mg L<sup>&#x2212;1</sup>)</td>
<td valign="middle" align="center">0.02</td>
<td valign="middle" align="center">9.2</td>
<td valign="middle" align="center">0.99</td>
<td valign="middle" align="center">0.6</td>
<td valign="middle" align="center">108.73</td>
<td valign="middle" align="center">0.38</td>
<td valign="middle" align="center">1.09</td>
<td valign="middle" align="center">19% (1.5)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Min, minimum; Max, maximum; StDev, standard deviation; Q1 and Q3, First and third quartile, respectively; CV, coefficient of variation; TDS, total dissolved solids; EC, electrolytic conductivity.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<fig id="f3" position="float">
<label>Figure&#xa0;3</label>
<caption>
<p>Box plots showing the variations in groundwater physicochemical parameters and F<sup>&#x2212;</sup> concentration in shallow (&lt; 60 meters) and deeper (&gt; 60 meters) aquifers.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fsoil-04-1407502-g003.tif"/>
</fig>
<p>All the hydrochemical parameters except pH also displayed a high coefficient of variation [CV (%)] (<xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>). This implies wide variability within each of the water quality parameters in the study region, arising from various natural surface and sub-surface phenomena and anthropogenic influences. In addition, the physicochemical attributes of most of the groundwater samples, particularly those from the shallow aquifers, exceeded the safe limits recommended by the World Health Organization (<xref ref-type="bibr" rid="B1">1</xref>) (<xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>). The overall percentage of samples exceeding their respective permissible limit for each parameter is as follows (total % exceeding/% exceeding in shallow groundwater samples): EC: 32%/29%; TDS: 48%/40%; Cl<sup>&#x2212;</sup>: 13%/12%; NO<sub>3</sub>
<sup>&#x2212;</sup>: 20%/18%; SO<sub>4</sub>
<sup>2&#x2212;</sup>: 26%/24%; HCO<sub>3</sub>
<sup>&#x2212;</sup>: 38%/24%; Na<sup>+</sup>: 20%/18%; K<sup>+</sup>: 19%/17%; Ca<sup>2+</sup>: 3%/2.6%; Mg<sup>2+</sup>: 54%/34%; F<sup>&#x2212;</sup>: 19%/15%.</p>
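<p>As an illustration of how a CV and the percentage of samples exceeding a permissible limit can be computed, the following minimal Python sketch uses hypothetical F<sup>&#x2212;</sup> values rather than the study data. The function names are illustrative, and the use of the sample standard deviation is an assumption, since the variant used in this study is not stated here.</p>
<preformat>
```python
from bisect import bisect_right
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return CV (%) = 100 * sample standard deviation / mean (assumed variant)."""
    return 100.0 * stdev(values) / mean(values)


def percent_exceeding(values, limit):
    """Return the percentage of values strictly above a permissible limit."""
    ordered = sorted(values)
    # bisect_right gives the count of values at or below the limit.
    n_above = len(ordered) - bisect_right(ordered, limit)
    return 100.0 * n_above / len(ordered)


# Hypothetical fluoride concentrations in mg/L (not taken from the study).
fluoride = [0.2, 0.4, 0.6, 0.9, 1.1, 1.6, 2.4, 0.3, 0.5, 1.8]
cv_percent = coefficient_of_variation(fluoride)
share_above = percent_exceeding(fluoride, 1.5)  # 1.5 mg/L fluoride limit
print(round(cv_percent, 1), share_above)
```
</preformat>
<p>A CV above 100% simply means that the standard deviation exceeds the mean, which is common for strongly right-skewed concentration data.</p>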
<p>Furthermore, the Kolmogorov&#x2013;Smirnov test indicated that none of the groundwater quality variables followed a normal distribution. In shallow waters, F<sup>&#x2212;</sup> had a weak to moderate positive correlation with most of the variables, a weak negative correlation with pH, and no significant correlation with Ca<sup>2+</sup>. The Spearman&#x2019;s rank correlation coefficients of F<sup>&#x2212;</sup> with the variables were: Depth = 0.14 (<italic>p</italic>&lt;0.01); pH = &#x2212;0.23 (<italic>p</italic>&lt;0.01); EC = 0.41 (<italic>p</italic>&lt;0.01); TDS = 0.44 (<italic>p</italic>&lt;0.01); Cl<sup>&#x2212;</sup> = 0.31 (<italic>p</italic>&lt;0.01); NO<sub>3</sub>
<sup>&#x2212;</sup> = 0.14 (<italic>p</italic>&lt;0.01); SO<sub>4</sub>
<sup>2&#x2212;</sup> = 0.35 (<italic>p</italic>&lt;0.01); HCO<sub>3</sub>
<sup>&#x2212;</sup> = 0.34 (<italic>p</italic>&lt;0.01); Na<sup>+</sup> = 0.30 (<italic>p</italic>&lt;0.01); K<sup>+</sup> = 0.22 (<italic>p</italic>&lt;0.01); Mg<sup>2+</sup> = 0.31 (<italic>p</italic>&lt;0.01); Ca<sup>2+</sup> = 0.05 (<italic>p</italic>&gt;0.01). EC and TDS showed the strongest correlations with F<sup>&#x2212;</sup>, indicating that the F<sup>&#x2212;</sup> concentration tends to increase as these parameters increase.</p>
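<p>For readers unfamiliar with rank correlation, the sketch below shows one way to compute Spearman&#x2019;s coefficient as the Pearson correlation of average ranks. It is a minimal illustration with hypothetical TDS and F<sup>&#x2212;</sup> values and illustrative function names; it does not compute <italic>p</italic>-values and is not the software used in this study.</p>
<preformat>
```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements (not taken from the study).
tds = [310, 450, 520, 800, 1200, 640]
fluoride = [0.3, 0.5, 0.4, 1.2, 2.1, 0.9]
rho = spearman_rho(tds, fluoride)
print(round(rho, 3))
```
</preformat>
<p>Because only ranks enter the calculation, the coefficient does not assume normally distributed variables, which suits data that fail a normality test.</p>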
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Predictive performance evaluation of models</title>
<p>The utilization of groundwater physicochemical parameters as predictor variables for forecasting F<sup>&#x2212;</sup> contamination levels through ML approaches has been well established (<xref ref-type="bibr" rid="B31">31</xref>). In this study, five models with diverse architectures, namely RF, SVM, Xgboost, ELM, and MLP, were employed to predict the groundwater fluoride concentration in the aquifers of Punjab. The model performance was evaluated based on the <italic>R<sup>2</sup>
</italic>, RMSE, and MAE values, which are commonly used metrics for assessing the predictive ability of ML models. For the Xgboost model, the adaptive random search function, among the functions tested, had the highest <italic>R<sup>2</sup>
</italic> value and the lowest RMSE and MAE values in the testing stage (<xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>). This implies that adaptive random search was the best-performing function for Xgboost in the current study, and it was therefore used when comparing the prediction performance of all the proposed models. For SVM, the radial basis function (RBF) kernel performed better than the polynomial, sigmoid, and linear kernels, as it can handle non-linear datasets (<xref ref-type="bibr" rid="B31">31</xref>), and was therefore selected for prediction purposes in our study. This superior performance of RBF over other kernel functions was also reported by Rajasekaran et&#xa0;al. (<xref ref-type="bibr" rid="B79">79</xref>), Wu and Wang (<xref ref-type="bibr" rid="B80">80</xref>), and Amirmojahedi et&#xa0;al. (<xref ref-type="bibr" rid="B81">81</xref>). Also, among the various activation functions in the ELM model, the &#x2018;tansig&#x2019; function gave the most satisfactory output and was therefore selected for further prediction-performance comparison between the selected models.</p>
<table-wrap id="T3" position="float">
<label>Table&#xa0;3</label>
<caption>
<p>Performance measures in the testing and training stages of proposed models.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" rowspan="2" align="left">Model</th>
<th valign="top" colspan="3" align="left">Training</th>
<th valign="top" colspan="3" align="left">Testing</th>
</tr>
<tr>
<th valign="top" align="left">
<italic>R<sup>2</sup>
</italic>
</th>
<th valign="top" align="left">RMSE</th>
<th valign="top" align="left">MAE</th>
<th valign="top" align="left">
<italic>R<sup>2</sup>
</italic>
</th>
<th valign="top" align="left">RMSE</th>
<th valign="top" align="left">MAE</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">RF</td>
<td valign="top" align="left">0.42</td>
<td valign="top" align="left">0.77</td>
<td valign="top" align="left">0.47</td>
<td valign="top" align="left">0.44</td>
<td valign="top" align="left">1.03</td>
<td valign="top" align="left">0.58</td>
</tr>
<tr>
<td valign="top" align="left">SVM</td>
<td valign="top" align="left">0.52</td>
<td valign="top" align="left">0.70</td>
<td valign="top" align="left">0.41</td>
<td valign="top" align="left">0.66</td>
<td valign="top" align="left">0.56</td>
<td valign="top" align="left">0.38</td>
</tr>
<tr>
<td valign="top" align="left">Xgboost</td>
<td valign="top" align="left">0.34</td>
<td valign="top" align="left">0.84</td>
<td valign="top" align="left">0.49</td>
<td valign="top" align="left">0.70</td>
<td valign="top" align="left">0.96</td>
<td valign="top" align="left">0.54</td>
</tr>
<tr>
<td valign="top" align="left">ELM</td>
<td valign="top" align="left">0.85</td>
<td valign="top" align="left">0.46</td>
<td valign="top" align="left">0.36</td>
<td valign="top" align="left">0.95</td>
<td valign="top" align="left">0.305</td>
<td valign="top" align="left">0.33</td>
</tr>
<tr>
<td valign="top" align="left">MLP</td>
<td valign="top" align="left">0.21</td>
<td valign="top" align="left">0.10</td>
<td valign="top" align="left">0.06</td>
<td valign="top" align="left">0.33</td>
<td valign="top" align="left">0.10</td>
<td valign="top" align="left">0.06</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>RF, Random Forest; SVM, Support Vector Machine; Xgboost, Extreme Gradient Boosting; ELM, Extreme Learning Machine; MLP, Multilayer Perceptron; <italic>R<sup>2</sup>
</italic>, Coefficient of determination or R-squared value; RMSE, Root mean square error; MAE, Mean absolute error.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>The overall statistical evaluation criteria yielded poor to satisfactory results across the models, implying that a few models outperformed the others in predicting the F<sup>&#x2212;</sup> levels. Based on the 80% of the total dataset used for training, the <italic>R<sup>2</sup>
</italic> values achieved by the different models were 0.42 (RF), 0.52 (SVM), 0.34 (Xgboost), 0.85 (ELM), and 0.21 (MLP) (<xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>). Ideally, an <italic>R<sup>2</sup>
</italic> close to unity indicates close agreement between the observed and simulated values. Although <italic>R<sup>2</sup>
</italic> indicates how well the model fits the data, it does not provide information about the magnitude of the errors between the actual and predicted values. Hence, RMSE and MAE values were computed along with <italic>R<sup>2</sup>
</italic> to assess the performance of the different ML models. The RMSE values were 0.77 (RF), 0.70 (SVM), 0.84 (Xgboost), 0.46 (ELM), and 0.10 (MLP), and the MAE values were 0.47 (RF), 0.41 (SVM), 0.49 (Xgboost), 0.36 (ELM), and 0.06 (MLP). RMSE and MAE values closer to 0 indicate smaller errors between the actual and predicted values. Based on these values in the training stage, MLP had the least error, followed by ELM, SVM, RF, and Xgboost. Despite having the lowest RMSE and MAE values, MLP had the lowest <italic>R<sup>2</sup>
</italic>, suggesting unreliable performance for F<sup>&#x2212;</sup> determination in this study. Furthermore, RF also trained very poorly, which is evident from the low <italic>R<sup>2</sup>
</italic> and significantly greater RMSE and MAE. In addition to MLP and RF, SVM and Xgboost were also trained unsatisfactorily as per the <italic>R<sup>2</sup>
</italic>, RMSE, and MAE values. In contrast, the relatively lower MAE and RMSE values and greater <italic>R<sup>2</sup>
</italic> value of the ELM model indicate superior training ability relative to the other four models.</p>
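<p>The three evaluation metrics can be sketched in a few lines of Python. The example below uses hypothetical observed and predicted values and an illustrative function name; it assumes the definition <italic>R<sup>2</sup></italic> = 1 &#x2212; SS<sub>res</sub>/SS<sub>tot</sub>, whereas some software instead reports the squared Pearson correlation, and which definition was used in this study is not stated here.</p>
<preformat>
```python
from math import sqrt
from statistics import mean


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    obs_mean = mean(observed)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot            # assumed definition of R2
    rmse = sqrt(ss_res / len(residuals))  # penalizes large errors more strongly
    mae = sum(abs(r) for r in residuals) / len(residuals)
    return r2, rmse, mae


# Hypothetical fluoride concentrations in mg/L (not taken from the study).
observed = [0.4, 0.6, 1.0, 1.8, 3.2]
predicted = [0.5, 0.7, 0.9, 1.6, 3.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(round(r2, 3), round(rmse, 3), round(mae, 3))
```
</preformat>
<p>RMSE and MAE are expressed in the units of the target variable, so they are only comparable between models evaluated on the same scale of F<sup>&#x2212;</sup>. For a given set of residuals, RMSE is never smaller than MAE, which offers a quick consistency check on reported values.</p>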
<p>After model training, the remaining 20% of the dataset was used to test the models, and the same evaluation metrics were applied to assess each model&#x2019;s predictive ability. The trend of the performance evaluation criteria was broadly similar for all the models. The order of the proposed models in terms of the <italic>R<sup>2</sup>
</italic> values was ELM (0.95) &gt; Xgboost (0.70) &gt; SVM (0.66) &gt; RF (0.44) &gt; MLP (0.33) (<xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>). Satisfactory <italic>R<sup>2</sup>
</italic> values were observed for ELM, Xgboost, and SVM. Furthermore, error metrics RMSE and MAE for MLP were 0.10 and 0.06, respectively, which were the least among all the models. However, the lowest <italic>R<sup>2</sup>
</italic> value obtained for MLP implies poor model performance, making it unreliable for groundwater fluoride prediction in this region. After MLP, the RMSE (0.31) and MAE (0.33) values for ELM in the testing phase were the lowest among the remaining models, indicating good prediction ability. Comparing the statistical performance metrics of the training and testing stages of the different models, it is evident that only ELM had the best values overall and can be considered for modeling F<sup>&#x2212;</sup> concentrations in Punjab. It is noteworthy that MLP and ELM have a relatively less complex topology and simpler training algorithms than the remaining three models (<xref ref-type="bibr" rid="B30">30</xref>). Nevertheless, their performance varied greatly in predicting the groundwater fluoride concentration in the study domain.</p>
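<p>An 80/20 partition of a dataset into training and testing subsets can be sketched as follows. This is a minimal illustration only: the function name, the fixed seed, and the use of a simple random shuffle are assumptions, as the exact splitting procedure used in this study is not described here.</p>
<preformat>
```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle row indices once and split the rows into training and testing sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Hypothetical sample identifiers standing in for groundwater records.
samples = list(range(100))
train_rows, test_rows = train_test_split(samples)
print(len(train_rows), len(test_rows))
```
</preformat>
<p>Keeping the test subset untouched during model fitting and tuning is what allows the testing-stage metrics to reflect performance on unseen samples.</p>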
<p>In order to better comprehend the accuracy of model prediction, the observed F<sup>&#x2212;</sup> concentrations and their corresponding predicted values after model training were plotted in a scatter diagram (<xref ref-type="fig" rid="f4">
<bold>Figure&#xa0;4</bold>
</xref>). From <xref ref-type="fig" rid="f4">
<bold>Figure&#xa0;4</bold>
</xref>, it is evident that the F<sup>&#x2212;</sup> values predicted by the ELM model lie close to the best-fit line in relation to the observed F<sup>&#x2212;</sup> concentrations, unlike those of the other models, which supports the robustness of the ELM model. In contrast, the predicted values of SVM, Xgboost, RF, and MLP did not closely match the actual values, which is substantiated by their poor <italic>R<sup>2</sup>
</italic> values (<xref ref-type="fig" rid="f4">
<bold>Figure&#xa0;4</bold>
</xref>).</p>
<fig id="f4" position="float">
<label>Figure&#xa0;4</label>
<caption>
<p>Observed versus predicted fluoride (F<sup>&#x2212;</sup>) concentrations in groundwater for the test data of ELM, SVM, Xgboost, RF, and MLP models. (ELM: Extreme Learning Machine; SVM: Support Vector Machine; Xgboost: Extreme Gradient Boosting; RF: Random Forest; MLP: Multilayer Perceptron).</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fsoil-04-1407502-g004.tif"/>
</fig>
<p>Spatial distribution maps were prepared to better visualize the actual and predicted F<sup>&#x2212;</sup> concentration values for all five models (<xref ref-type="fig" rid="f5">
<bold>Figure&#xa0;5</bold>
</xref>). The predicted values of ELM, Xgboost, SVM, RF, and MLP were compared with the original F<sup>&#x2212;</sup> concentrations, and considerable differences between the model outcomes were noted. The southern and southwestern regions of Punjab had elevated F<sup>&#x2212;</sup> levels (&gt; 1.5 mg/L) in their groundwater systems (<xref ref-type="fig" rid="f5">
<bold>Figure&#xa0;5A</bold>
</xref>), whereas the remaining areas exhibited relatively lower concentrations. The F<sup>&#x2212;</sup> distribution pattern of the ELM-predicted values (<xref ref-type="fig" rid="f5">
<bold>Figure&#xa0;5B</bold>
</xref>) closely resembled that of the original F<sup>&#x2212;</sup> concentrations, implying substantial prediction accuracy. In both the original F<sup>&#x2212;</sup> distribution and the ELM-predicted map, central Punjab had concentrations ranging from 0.5 to 1.0 mg/L, with parts of the northwestern districts having groundwater fluoride surpassing 1.5 mg/L. Furthermore, based on the concentration values predicted by the remaining four models, excess F<sup>&#x2212;</sup> levels were evident across the entire Punjab state. Although central and northern regions had relatively safe groundwater fluoride levels (&lt; 1.5 mg/L) (<xref ref-type="fig" rid="f5">
<bold>Figure&#xa0;5A</bold>
</xref>), contradictory F<sup>&#x2212;</sup> distribution as per the predicted values of Xgboost, SVM, RF, and MLP was observed (<xref ref-type="fig" rid="f5">
<bold>Figures&#xa0;5C&#x2013;F</bold>
</xref>), implying poor model performance. Also, the regions falling in the south and southwest had a greater magnitude of F<sup>&#x2212;</sup> content than the original F<sup>&#x2212;</sup> levels, indicating an overestimation of excess contaminant levels. These spatial distribution maps of the proposed ML models, in comparison to the original F<sup>&#x2212;</sup> distribution, are evidently in accordance with the performance evaluation metrics, i.e., <italic>R<sup>2</sup>
</italic>, RMSE, and MAE (<xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>, <xref ref-type="fig" rid="f4">
<bold>Figure&#xa0;4</bold>
</xref>). Consequently, it can be stated that ELM outperformed the remaining four models in predicting groundwater fluoride levels in the study region.</p>
<fig id="f5" position="float">
<label>Figure&#xa0;5</label>
<caption>
<p>Spatial comparison of observed F<sup>&#x2212;</sup> concentration values with the model predicted F<sup>&#x2212;</sup> concentration values. <bold>(A)</bold> Observed F<sup>&#x2212;</sup> concentration; <bold>(B)</bold> ELM predicted F<sup>&#x2212;</sup> concentration; <bold>(C)</bold> Xgboost predicted F<sup>&#x2212;</sup> concentration; <bold>(D)</bold> SVM predicted F<sup>&#x2212;</sup> concentration; <bold>(E)</bold> RF predicted F<sup>&#x2212;</sup> concentration; <bold>(F)</bold> MLP predicted F<sup>&#x2212;</sup> concentration.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fsoil-04-1407502-g005.tif"/>
</fig>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Variable importance by ML models</title>
<p>The predictor (input) variables govern the robustness and stability of the prediction models (<xref ref-type="bibr" rid="B82">82</xref>, <xref ref-type="bibr" rid="B83">83</xref>), and therefore, the relative importance ranking of these variables aids in determining the significant variables or factors influencing the contamination. The ranking of the variables was found to be consistent in ELM and SVM (<xref ref-type="fig" rid="f6">
<bold>Figure&#xa0;6</bold>
</xref>), while other models displayed certain variations. In both ELM and SVM, the variables contributing the most to model prediction were TDS, EC, Cl<sup>&#x2212;</sup>, Na<sup>+</sup>, and Ca<sup>2+</sup>, each with relative importance greater than 15% (<xref ref-type="fig" rid="f6">
<bold>Figure&#xa0;6</bold>
</xref>). These variables also displayed a significant correlation with F<sup>&#x2212;</sup>. Based on the relative importance scores, TDS, EC, and Na<sup>+</sup> were the top three variables in these two models in the order TDS &gt; EC &gt; Na<sup>+</sup>, indicating their potential role in mobilizing and enhancing F<sup>&#x2212;</sup> in the groundwater of the study domain, particularly in the shallow aquifers. The abovementioned variables were also important in the remaining three models, although not in the same order as in ELM and SVM. Based on the relative importance scores, the order of the variables was EC &gt; TDS &gt; Na<sup>+</sup> in Xgboost, TDS &gt; Na<sup>+</sup> &gt; EC in RF, and EC &gt; Ca<sup>2+</sup> &gt; Na<sup>+</sup> &gt; TDS in MLP. Cl<sup>&#x2212;</sup> and Ca<sup>2+</sup> ranked 4<sup>th</sup> and 5<sup>th</sup>, respectively, in ELM, SVM, and RF, while Xgboost (Ca<sup>2+</sup> and Cl<sup>&#x2212;</sup> ranking 4<sup>th</sup> and 5<sup>th</sup>, respectively) and MLP (Ca<sup>2+</sup> and Cl<sup>&#x2212;</sup> ranking 2<sup>nd</sup> and 5<sup>th</sup>, respectively) displayed slight variation. The variable SO<sub>4</sub>
<sup>2&#x2212;</sup> attained the least importance in ELM (0.5), SVM (0.2), Xgboost (1.1), RF (0.3), and MLP (0.3) (<xref ref-type="fig" rid="f6">
<bold>Figure&#xa0;6</bold>
</xref>). The ranking discrepancies among the five models could have resulted from differences in the model algorithms (<xref ref-type="bibr" rid="B24">24</xref>).</p>
<fig id="f6" position="float">
<label>Figure&#xa0;6</label>
<caption>
<p>Relative importance ranking of hydrochemical parameters for ELM, SVM, Xgboost, RF, and MLP models. (ELM: Extreme Learning Machine; SVM: Support Vector Machine; Xgboost: Extreme Gradient Boosting; RF: Random Forest; MLP: Multilayer Perceptron).</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fsoil-04-1407502-g006.tif"/>
</fig>
</sec>
</sec>
<sec id="s5" sec-type="discussion">
<label>5</label>
<title>Discussion</title>
<sec id="s5_1">
<label>5.1</label>
<title>Groundwater fluoride and other physicochemical characteristics</title>
<p>The occurrence of elevated F<sup>&#x2212;</sup> concentrations in the groundwater of the Punjab basin is attributed mainly to geogenic origin (<xref ref-type="bibr" rid="B84">84</xref>), which results from the interplay of multiple complex interdependent hydrogeochemical processes (<xref ref-type="bibr" rid="B85">85</xref>). Naturally occurring F<sup>&#x2212;</sup> in minerals and rocks is nearly insoluble in water. However, favorable conditions facilitate the dissolution of these minerals, releasing F<sup>&#x2212;</sup> into the groundwater (<xref ref-type="bibr" rid="B86">86</xref>). Dissolution of fluoride-bearing minerals, in particular fluorite, under suitable conditions, such as alkaline pH with excess EC and TDS, as observed in the aquifers of Punjab, favors F<sup>&#x2212;</sup> enrichment. The alkaline nature of the groundwater could be attributed to the presence of sediments containing abundant carbonate minerals (<xref ref-type="bibr" rid="B85">85</xref>). Elevated EC and TDS could be a consequence of the rapid and greater degree of rock and mineral weathering, waterlogging, and dissolution of salts (<xref ref-type="bibr" rid="B17">17</xref>, <xref ref-type="bibr" rid="B56">56</xref>). Furthermore, anthropogenic inputs from agricultural and industrial practices and groundwater recharge through shallow aquifers also contribute to high EC and TDS (<xref ref-type="bibr" rid="B87">87</xref>). An increase in EC and TDS elevates the water&#x2019;s ionic strength and major ion concentration. This results in greater competition between these ions and F<sup>&#x2212;</sup> at soil exchange sites and mineral surfaces through the ion exchange process, thus reducing the adsorption potential of F<sup>&#x2212;</sup> and enhancing its mobilization (<xref ref-type="bibr" rid="B84">84</xref>, <xref ref-type="bibr" rid="B88">88</xref>). Sodium and chloride ions, which contribute to TDS and EC, were also significantly correlated with F<sup>&#x2212;</sup>. 
Calcite precipitation (decrease in Ca<sup>2+</sup>) enriches F<sup>&#x2212;</sup> in the groundwater, which is consistent with the lack of a significant positive correlation between the two. Also, F<sup>&#x2212;</sup> and the other hydrochemical parameters surpassed their respective permissible limits more often in the shallow aquifers than in the deeper water samples. Shallow waters are easily accessible for human consumption and other activities, which raises concern for human well-being.</p>
<p>Besides the groundwater hydrogeochemical conditions, the prevailing arid and semi-arid climate in the study region increases evaporation rates relative to humid areas. High rainfall inputs and subsequent dilution effect in humid climatic zones result in lower groundwater fluoride levels compared to drier environments. Also, the groundwater movement in arid/semi-arid regions is generally slow, thereby increasing the contact time between the water and rock, which further causes F<sup>&#x2212;</sup> enrichment in water (<xref ref-type="bibr" rid="B85">85</xref>, <xref ref-type="bibr" rid="B89">89</xref>).</p>
</sec>
<sec id="s5_2">
<label>5.2</label>
<title>Model output and performance</title>
<p>Excess groundwater fluoride incidence in arid and semi-arid regions is a common phenomenon (<xref ref-type="bibr" rid="B90">90</xref>, <xref ref-type="bibr" rid="B91">91</xref>), as observed in the aquifers of Punjab. Furthermore, due to its geogenic origin, the concentration of F<sup>&#x2212;</sup> depends directly on the hydrogeochemical conditions. Also, relatively few studies have forecast F<sup>&#x2212;</sup> levels in arid and semi-arid locations using hydrochemical characteristics. Therefore, developing a predictive model for determining F<sup>&#x2212;</sup> levels using water quality variables in locations lacking monitoring assessments is essential. This study proposed five different ML models (RF, MLP, SVM, ELM, and Xgboost) and determined the best-performing model based on the evaluation metrics (<italic>R<sup>2</sup>
</italic>, RMSE, and MAE). Of all the models, MLP trained extremely poorly on the dataset and is, therefore, unsuitable for making reliable predictions of groundwater fluoride concentration in our study area. This finding is contrary to Nafouanti et&#xa0;al. (<xref ref-type="bibr" rid="B28">28</xref>) and Gupta and Maiti (<xref ref-type="bibr" rid="B30">30</xref>), where MLP accurately predicted F<sup>&#x2212;</sup> concentrations in the Datong basin (China) and Maharashtra (India), respectively. The poor performance of MLP in our study could be due to its inability to extrapolate beyond the data used for training, which further leads to overfitting issues during the training phase (<xref ref-type="bibr" rid="B92">92</xref>, <xref ref-type="bibr" rid="B93">93</xref>). The worst-performing model in both the training and testing stages was MLP, which contradicts Bui et&#xa0;al. (<xref ref-type="bibr" rid="B94">94</xref>). The MLP is based on a neural network architecture that can generate more accurate results on a badly structured dataset than tree-based models such as RF (<xref ref-type="bibr" rid="B94">94</xref>). On the contrary, the RF model overcomes overfitting issues by combining many trees and is thereby less prone to bias, resulting in enhanced prediction performance (<xref ref-type="bibr" rid="B32">32</xref>). Regardless, RF performed poorly in the training phase as well. Both MLP and RF generated values that deviated greatly from the original values, implying unsatisfactory performance. MLP tends to overfit the training data, interfering with its ability to generalize to the remaining data (test/cross-validation dataset) (<xref ref-type="bibr" rid="B28">28</xref>). Gupta and Maiti (<xref ref-type="bibr" rid="B30">30</xref>), in their work, also stated that MLP, RF, and SVM are less effective in uncovering the intricate non-linear association between the target and predictor variables. 
Unsatisfactory training values of MLP, RF, SVM, and Xgboost could have resulted from the very wide variability in both the target (output) and predictor (input) variables within the compiled dataset. This limits model fitting in our study, resulting in inaccurate prediction results. Data pre-treatment involving outlier suppression and logarithmic transformation is a possible way to further improve the prediction accuracies (<xref ref-type="bibr" rid="B30">30</xref>). However, these pre-processing steps on the raw dataset and their influence on the model performance need further evaluation. Gupta and Maiti (<xref ref-type="bibr" rid="B30">30</xref>) also emphasized the limited prediction efficiency of the ELM model due to its design and direct inverse in estimating the bias and weights. Nevertheless, the relatively higher <italic>R<sup>2</sup>
</italic> value and lower RMSE and MAE values of the ELM model in the training phase, in comparison to the remaining models, imply good generalization capability for our dataset without much pre-processing, unlike the other models tested in this study.</p>
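<p>The kind of pre-treatment mentioned above can be sketched as follows. This is a hypothetical illustration, not a procedure applied in this study: the choice of Tukey fences with a multiplier of 1.5 for outlier suppression, the log(1 + x) transform, the function names, and the sample values are all assumptions.</p>
<preformat>
```python
from math import log1p
from statistics import quantiles


def clip_to_iqr_fences(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences, assumed rule)."""
    q1, _median, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [min(max(v, low), high) for v in values]


def log_transform(values):
    """Apply log(1 + x), which is defined at zero and compresses large values."""
    return [log1p(v) for v in values]


# Hypothetical fluoride concentrations in mg/L with one extreme value.
raw = [0.2, 0.4, 0.5, 0.6, 0.7, 0.9, 1.1, 9.2]
clipped = clip_to_iqr_fences(raw)
transformed = log_transform(clipped)
print(clipped[-1], round(transformed[-1], 3))
```
</preformat>
<p>Any such transformation applied to the target would need to be inverted before computing RMSE and MAE in mg/L, and fence values should be derived from the training subset only so that the test data remain unseen.</p>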
<p>Classification-based ML models have been commonly used in groundwater fluoride level prediction (<xref ref-type="bibr" rid="B10">10</xref>, <xref ref-type="bibr" rid="B26">26</xref>, <xref ref-type="bibr" rid="B27">27</xref>, <xref ref-type="bibr" rid="B95">95</xref>, <xref ref-type="bibr" rid="B96">96</xref>), with very few studies on regression-based prediction modeling of the same (<xref ref-type="bibr" rid="B30">30</xref>, <xref ref-type="bibr" rid="B31">31</xref>). The complexity and accuracy of the datasets, diverse algorithm architectures, and type and number of input parameters significantly influence the performance of the models, and therefore, there is no universal agreement on which ML model performs the best for all prediction-related studies (<xref ref-type="bibr" rid="B94">94</xref>). For instance, groundwater fluoride concentration in the Datong basin, China, was modelled using RF, Linear regression (LR), and MLP-based Artificial neural network (ANN), where RF proved to be the best prediction model (<xref ref-type="bibr" rid="B28">28</xref>). Similarly, the RF model displayed higher prediction accuracy for other contaminants, such as nitrate, than the enhanced regression tree, classified regression tree, and multiple linear regression (<xref ref-type="bibr" rid="B97">97</xref>). Khosravi et&#xa0;al. (<xref ref-type="bibr" rid="B98">98</xref>), instead, reported that the M5P model had higher predictive power than Instance Based Learner (IBK), KStar, Locally Weighted Learning (LWL), and Regression by discretization (RBD), which were tested for predicting F<sup>&#x2212;</sup> in the aquifers of Maku plain in Iran. Similarly, F<sup>&#x2212;</sup> levels in the groundwater of Sindhudurg district in Maharashtra, India, were predicted using six different models, out of which ELM yielded the most unsatisfactory results (<xref ref-type="bibr" rid="B30">30</xref>). On the contrary, Barzegar et&#xa0;al. 
(<xref ref-type="bibr" rid="B31">31</xref>) compared the performance of three different models and determined ELM to be the best for forecasting F<sup>&#x2212;</sup> in the Maku Valley of Iran, which is in accordance with the findings of the current study. Therefore, it is advisable to test diverse algorithms with the same dataset and assess their performance in terms of prediction before selecting the best.</p>
<p>The ELM model has a simple architecture and an uncomplicated training process and is generally known for its computational efficiency, requiring fewer hyperparameters for model tuning and training. The parameters of the hidden layer in this model do not require manual adjustment and are independent of the input data. Only the output weights are determined, and this is done analytically, giving ELM a rapid learning speed and lower computational complexity (<xref ref-type="bibr" rid="B99">99</xref>) than the other models proposed in this work. Additionally, ELM has good generalization capability for high-dimensional datasets because its weights and biases are initialized stochastically, which helps avoid overfitting and makes the model more robust (<xref ref-type="bibr" rid="B100">100</xref>). The superior predictive performance of ELM over other models was also confirmed in other works (<xref ref-type="bibr" rid="B69">69</xref>, <xref ref-type="bibr" rid="B101">101</xref>&#x2013;<xref ref-type="bibr" rid="B104">104</xref>).</p>
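<p>The training scheme described above, with fixed random hidden-layer parameters and analytically determined output weights, can be sketched in plain Python. This is a minimal, hypothetical illustration rather than the implementation used in this study: the class and function names, the small ridge term added for numerical stability, the hidden-layer size, and the toy data are all assumptions. The hyperbolic tangent is used as the hidden activation because it is mathematically equivalent to the &#x2018;tansig&#x2019; function.</p>
<preformat>
```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


class TinyELM:
    """Single-hidden-layer ELM: random tanh hidden layer, ridge-solved output."""

    def __init__(self, n_inputs, n_hidden, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Hidden weights and biases are drawn once and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def _hidden(self, x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]
        k = len(self.b)
        # Normal equations: (H^T H + ridge * I) beta = H^T y
        gram = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                 for j in range(k)] for i in range(k)]
        rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = solve_linear_system(gram, rhs)
        return self

    def predict(self, xs):
        return [sum(bj * hj for bj, hj in zip(self.beta, self._hidden(x)))
                for x in xs]


# Toy data: one hypothetical predictor and a smooth target (not study data).
xs = [[i / 10.0] for i in range(30)]
ys = [math.sin(x[0]) for x in xs]
model = TinyELM(n_inputs=1, n_hidden=10).fit(xs, ys)
predictions = model.predict(xs)
sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))
print(round(sse, 4))
```
</preformat>
<p>Because fitting reduces to a single linear solve for the output weights, no iterative gradient-based training is needed, which is the source of the speed advantage noted above. In practice, predictors would typically be scaled before being passed to the hidden layer.</p>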
<p>It is worth noting that most of the studies conducted in the Indian subcontinent have reported RF and MLP to be the best models for predicting F<sup>&#x2212;</sup> concentrations in groundwater. However, these studies did not take into consideration the groundwater physicochemical parameters and used only continuous variables such as climate, soil, geological, and topological parameters as predictors (<xref ref-type="bibr" rid="B10">10</xref>, <xref ref-type="bibr" rid="B27">27</xref>, <xref ref-type="bibr" rid="B96">96</xref>). Machine learning algorithms are designed to perform both classification and regression tasks. Classification-based ML models have been commonly applied in studies in India to forecast contamination-risk-prone areas (<xref ref-type="bibr" rid="B105">105</xref>&#x2013;<xref ref-type="bibr" rid="B107">107</xref>). Nevertheless, it is equally essential to develop models for predicting the concentration of contaminants based on the driving factors that directly influence their enhancement and mobility. In this context, regression-based modeling will prove to be much more beneficial than classification models. This study attempted to achieve this goal, which may explain the contrasting results.</p>
</sec>
<sec id="s5_3">
<label>5.3</label>
<title>Hydrochemical drivers affecting the model performance</title>
<p>The different variables influencing groundwater fluoride contamination in any region are complex and require an in-depth understanding to identify the potential parameters for proper groundwater resource management. The variable importance ranking in the present work highlighted that TDS, EC, Na<sup>+</sup>, Cl<sup>&#x2212;</sup>, and Ca<sup>2+</sup> were the most crucial factors and were highly correlated with F<sup>&#x2212;</sup> content in the study region. The increase in TDS and EC results in increased ionic strength and a higher concentration of major ions dissolved in water. These factors enhance competition between these ions and F<sup>&#x2212;</sup> at mineral surfaces and soil exchange sites through the ion-exchange process, which further minimizes the adsorption of F<sup>&#x2212;</sup> and makes it more mobile (<xref ref-type="bibr" rid="B84">84</xref>, <xref ref-type="bibr" rid="B88">88</xref>). Sodium, one of the important parameters responsible for EC and TDS, forms compounds with F<sup>&#x2212;</sup>, such as NaF, which further dissolves in water and becomes more mobile (<xref ref-type="bibr" rid="B108">108</xref>). Other factors, such as Cl<sup>&#x2212;</sup> and Ca<sup>2+</sup>, contributed significantly to the model performance to varying degrees. The primary source of F<sup>&#x2212;</sup> in this region is the mineral fluorite (CaF<sub>2</sub>), which dissolves, releasing F<sup>&#x2212;</sup> and Ca<sup>2+</sup>; the latter precipitates in the presence of excess bicarbonate (HCO<sub>3</sub>
<sup>&#x2212;</sup>), thereby leaving free F<sup>&#x2212;</sup> ions in solution (<xref ref-type="bibr" rid="B109">109</xref>). The chloride ion, in turn, undergoes ion exchange with F<sup>&#x2212;</sup> on the aquifer substrate, releasing F<sup>&#x2212;</sup> ions from these surfaces. Furthermore, the significant contributions of the top five variables, i.e., TDS, EC, Na<sup>+</sup>, Cl<sup>&#x2212;</sup>, and Ca<sup>2+</sup>, in all five models, irrespective of their prediction accuracies, indicate their potential for estimating F<sup>&#x2212;</sup> levels in regions lacking groundwater quality monitoring. However, the variability in post-tuning accuracy among the models might have affected the outcomes. In other words, data quality, input parameters, the hyperparameter tuning process, and differing algorithm architectures play a significant role in the prediction of the target variable. Moreover, the top contributors and their influence on groundwater F<sup>&#x2212;</sup> concentration are clearly evident from their relative importance scores in all the models, particularly in the ELM model (<xref ref-type="fig" rid="f6">
<bold>Figure&#xa0;6</bold>
</xref>). In addition, the highest prediction accuracy of ELM among the proposed models (<xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>, <xref ref-type="fig" rid="f4">
<bold>Figures&#xa0;4</bold>
</xref>, <xref ref-type="fig" rid="f5">
<bold>5</bold>
</xref>) makes it an acceptable method for groundwater quality assessment investigations.</p>
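<p>The fluorite dissolution and bicarbonate-driven Ca<sup>2+</sup> removal described above can be summarized by the following standard reactions. They are offered as an illustrative sketch of the general mechanism, assuming that calcite (CaCO<sub>3</sub>) is the calcium phase that precipitates; they are not equilibrium calculations for the study area.</p>
<disp-formula>
<tex-math>\mathrm{CaF_2(s)} \rightleftharpoons \mathrm{Ca^{2+}} + 2\,\mathrm{F^{-}}</tex-math>
</disp-formula>
<disp-formula>
<tex-math>\mathrm{Ca^{2+}} + 2\,\mathrm{HCO_3^{-}} \rightarrow \mathrm{CaCO_3(s)} + \mathrm{CO_2} + \mathrm{H_2O}</tex-math>
</disp-formula>
<disp-formula>
<tex-math>\mathrm{CaF_2(s)} + 2\,\mathrm{HCO_3^{-}} \rightarrow \mathrm{CaCO_3(s)} + 2\,\mathrm{F^{-}} + \mathrm{CO_2} + \mathrm{H_2O}</tex-math>
</disp-formula>
<p>The third reaction is the sum of the first two. Under the stated assumption, removing Ca<sup>2+</sup> as CaCO<sub>3</sub> shifts the fluorite equilibrium toward further dissolution, which is consistent with F<sup>&#x2212;</sup> accumulating in bicarbonate-rich groundwater.</p>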
</sec>
</sec>
<sec id="s6">
<label>6</label>
<title>Limitations and future research directions</title>
<p>The study compiled a large amount of data from different sources, which introduced varying degrees of discrepancy within the complete dataset. Different analytical techniques and procedures might have been used to determine F<sup>&#x2212;</sup> and the other water quality parameters across the data sources, affecting the consistency of the final dataset. In addition, the seasonal factor, which plays a key role in contaminant levels in aquifers, was not considered as a screening criterion in our study. A small proportion of values for certain variables were missing from the complete dataset and were estimated using established formulae reported in the literature. All of these factors might have introduced inconsistencies into the data that were ultimately used for prediction modeling. Furthermore, the number of samples from each district of the state of Punjab varied greatly, providing an incomplete picture of the study region. The impact of outliers also remained considerable after data treatment. Therefore, the predictive performance of the models might have been affected by all of these factors. The uncertainty among the different models might have originated from the amount of data and from noise within the data and variables. The number of variables might also have affected the performance of the models. Nevertheless, the study yielded a satisfactory F<sup>&#x2212;</sup> prediction model, i.e., ELM. This model captures the role of the different hydrochemical parameters well and delivers concentration estimates that are reliable for groundwater fluoride estimation in the region.</p>
<p>This study highlights the significant impact of outliers on prediction model performance. This implies a need for further pre-processing of environmental datasets, which often exhibit non-normal distributions. De-noising and efficient data transformation methods should be explored to enhance data quality and predictive performance. Furthermore, more advanced and hybrid models can be applied to this kind of dataset to build a more robust contamination prediction system. The prediction performance of the models with varying numbers of potential input variables should also be assessed to enhance their applicability. The same models can also be tested on other groundwater contaminants and compared to determine the best predictive model. As mentioned earlier, classification-based ML modeling of groundwater contaminants, including F<sup>&#x2212;</sup>, is commonly applied, whereas very little work exists on regression-based contaminant concentration modeling. Therefore, this gap deserves more attention, particularly in the Indian subcontinent, where many locations lack any monitoring.</p>
</sec>
<sec id="s7" sec-type="conclusions">
<label>7</label>
<title>Conclusion</title>
<p>In this work, the performance of five different models for predicting F<sup>&#x2212;</sup> concentrations in the alluvial aquifers of Punjab was compared. ELM, SVM, MLP, RF, and Xgboost models were developed, and their performance was evaluated using <italic>R<sup>2</sup>
</italic>, RMSE, and MAE. Except for ELM, the models performed very poorly during both the training and testing phases. Excess variability within the target and predictor variables after data normalization might have affected the model performance. Although ELM performed satisfactorily, it can be improved with further pre-treatment of the dataset. Hybrid models, which can produce superior prediction accuracy for such complicated environmental problems, also need to be explored. Furthermore, similar regression-based modeling studies should be conducted to thoroughly understand the groundwater fluoride problem. Input variables such as TDS, EC, Na<sup>+</sup>, Cl<sup>&#x2212;</sup>, and Ca<sup>2+</sup> contributed significantly to the model performance. Evidently, the dynamics of groundwater chemistry are highly complex and vary from location to location. Predicting groundwater fluoride from the corresponding water quality parameters is crucial for sustainable groundwater management and planning and for safeguarding human health. Therefore, it is essential to build robust non-linear models to address this problem efficiently.</p>
</sec>
<sec id="s8" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author/s.</p>
</sec>
<sec id="s9" sec-type="author-contributions">
<title>Author contributions</title>
<p>AK: Data curation, Formal analysis, Methodology, Software, Visualization, Writing &#x2013; original draft. PKS: Methodology, Supervision, Writing &#x2013; review &amp; editing. HSK: Methodology, Software, Supervision, Writing &#x2013; review &amp; editing.</p>
</sec>
</body>
<back>
<sec id="s10" sec-type="funding-information">
<title>Funding</title>
<p>The author(s) declare financial support was received for the research, authorship, and/or publication of this article. AK would like to thank the University Grants Commission (UGC: 3810/(NET-JULY2018)), Government of India, for providing financial support in the form of a research fellowship. PKS sincerely acknowledges DST SERB New Delhi (Government of India) for supporting this work through a core research grant (CRG/2021/002567). We would also like to thank the DST-FIST lab at the Department of Environmental Science and Technology, Central University of Punjab for technical support. HSK also acknowledges the DST FIST (SR/FST/MS-I/2021/104) for supporting this work.</p>
</sec>
<sec id="s11" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
<p>The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.</p>
</sec>
<sec id="s12" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>WHO</collab>
</person-group>. <source>Guidelines for drinking water quality Vol. 1</source>. <publisher-loc>Geneva, Switzerland</publisher-loc>: <publisher-name>World Health Organization</publisher-name> (<year>2011</year>).</citation>
</ref>
<ref id="B2">
<label>2</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Neisi</surname> <given-names>A</given-names>
</name>
<name>
<surname>Mirzabeygi</surname> <given-names>M</given-names>
</name>
<name>
<surname>Zeyduni</surname> <given-names>G</given-names>
</name>
<name>
<surname>Hamzezadeh</surname> <given-names>A</given-names>
</name>
<name>
<surname>Jalili</surname> <given-names>D</given-names>
</name>
<name>
<surname>Abbasnia</surname> <given-names>A</given-names>
</name>
<etal/>
</person-group>. <article-title>Data on fluoride concentration levels in cold and warm season in City area of Sistan and Baluchistan Province, Iran</article-title>. <source>Data Brief</source>. (<year>2018</year>) <volume>18</volume>:<page-range>713&#x2013;8</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.dib.2018.03.060</pub-id>
</citation>
</ref>
<ref id="B3">
<label>3</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashrafi</surname> <given-names>SD</given-names>
</name>
<name>
<surname>Jaafari</surname> <given-names>J</given-names>
</name>
<name>
<surname>Sattari</surname> <given-names>L</given-names>
</name>
<name>
<surname>Esmaeilzadeh</surname> <given-names>N</given-names>
</name>
<name>
<surname>Safari</surname> <given-names>GH</given-names>
</name>
</person-group>. <article-title>Monitoring and health risk assessment of fluoride in drinking water of East Azerbaijan Province, Iran</article-title>. <source>Int J Environ Anal Chem</source>. (<year>2020</year>) <volume>103</volume>:<page-range>1&#x2013;15</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1080/03067319.2020.1849662</pub-id>
</citation>
</ref>
<ref id="B4">
<label>4</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ayoob</surname> <given-names>S</given-names>
</name>
<name>
<surname>Gupta</surname> <given-names>AK</given-names>
</name>
</person-group>. <article-title>Fluoride in drinking water: A review on the status and stress effects</article-title>. <source>Crit Rev Environ Sci Technol</source>. (<year>2006</year>) <volume>36</volume>:<page-range>433&#x2013;87</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1080/10643380600678112</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mukherjee</surname> <given-names>I</given-names>
</name>
<name>
<surname>Singh</surname> <given-names>UK</given-names>
</name>
</person-group>. <article-title>Groundwater fluoride contamination, probable release, and containment mechanisms: a review on Indian context</article-title>. <source>Environ Geochem Health</source>. (<year>2018</year>) <volume>40</volume>:<page-range>2259&#x2013;301</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10653-018-0096-x</pub-id>
</citation>
</ref>
<ref id="B6">
<label>6</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chakraborti</surname> <given-names>D</given-names>
</name>
<name>
<surname>Rahman</surname> <given-names>MM</given-names>
</name>
<name>
<surname>Chatterjee</surname> <given-names>A</given-names>
</name>
<name>
<surname>Das</surname> <given-names>D</given-names>
</name>
<name>
<surname>Das</surname> <given-names>B</given-names>
</name>
<name>
<surname>Nayak</surname> <given-names>B</given-names>
</name>
<etal/>
</person-group>. <article-title>Fate of over 480 million inhabitants living in arsenic and fluoride endemic Indian districts: Magnitude, health, socio-economic effects and mitigation approaches</article-title>. <source>J Trace Elem Med Biol</source>. (<year>2016</year>) <volume>38</volume>:<fpage>33</fpage>&#x2013;<lpage>45</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.jtemb.2016.05.001</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chakraborti</surname> <given-names>D</given-names>
</name>
<name>
<surname>Das</surname> <given-names>B</given-names>
</name>
<name>
<surname>Murrill</surname> <given-names>MT</given-names>
</name>
</person-group>. <article-title>Examining India&#x2019;s groundwater quality management</article-title>. <source>Environ Sci Technol</source>. (<year>2011</year>) <volume>45</volume>:<fpage>27</fpage>&#x2013;<lpage>33</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1021/es101695d</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khattak</surname> <given-names>JA</given-names>
</name>
<name>
<surname>Farooqi</surname> <given-names>A</given-names>
</name>
<name>
<surname>Hussain</surname> <given-names>I</given-names>
</name>
<name>
<surname>Kumar</surname> <given-names>A</given-names>
</name>
<name>
<surname>Singh</surname> <given-names>CK</given-names>
</name>
<name>
<surname>Mailloux</surname> <given-names>BJ</given-names>
</name>
<etal/>
</person-group>. <article-title>Groundwater fluoride across the Punjab plains of Pakistan and India: Distribution and underlying mechanisms</article-title>. <source>Sci Total Environ</source>. (<year>2022</year>) <volume>806</volume>:<elocation-id>151353</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2021.151353</pub-id>
</citation>
</ref>
<ref id="B9">
<label>9</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rasool</surname> <given-names>A</given-names>
</name>
<name>
<surname>Farooqi</surname> <given-names>A</given-names>
</name>
<name>
<surname>Xiao</surname> <given-names>T</given-names>
</name>
<name>
<surname>Ali</surname> <given-names>W</given-names>
</name>
<name>
<surname>Noor</surname> <given-names>S</given-names>
</name>
<name>
<surname>Abiola</surname> <given-names>O</given-names>
</name>
<etal/>
</person-group>. <article-title>A review of global outlook on fluoride contamination in groundwater with prominence on the Pakistan current situation</article-title>. <source>Environ Geochem Health</source>. (<year>2018</year>) <volume>40</volume>:<page-range>1265&#x2013;81</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10653-017-0054-z</pub-id>
</citation>
</ref>
<ref id="B10">
<label>10</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Podgorski</surname> <given-names>JE</given-names>
</name>
<name>
<surname>Labhasetwar</surname> <given-names>P</given-names>
</name>
<name>
<surname>Saha</surname> <given-names>D</given-names>
</name>
<name>
<surname>Berg</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Prediction modeling and mapping of groundwater fluoride contamination throughout India</article-title>. <source>Environ Sci Technol</source>. (<year>2018</year>) <volume>52</volume>:<page-range>9889&#x2013;98</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1021/acs.est.8b01679</pub-id>
</citation>
</ref>
<ref id="B11">
<label>11</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ali</surname> <given-names>W</given-names>
</name>
<name>
<surname>Aslam</surname> <given-names>MW</given-names>
</name>
<name>
<surname>Junaid</surname> <given-names>M</given-names>
</name>
<name>
<surname>Ali</surname> <given-names>K</given-names>
</name>
<name>
<surname>Guo</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Rasool</surname> <given-names>A</given-names>
</name>
<etal/>
</person-group>. <article-title>Elucidating various geochemical mechanisms drive fluoride contamination in unconfined aquifers along the major rivers in Sindh and Punjab, Pakistan</article-title>. <source>Environ Pollut</source>. (<year>2019</year>) <volume>249</volume>:<page-range>535&#x2013;49</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.envpol.2019.03.043</pub-id>
</citation>
</ref>
<ref id="B12">
<label>12</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kumar</surname> <given-names>M</given-names>
</name>
<name>
<surname>Goswami</surname> <given-names>R</given-names>
</name>
<name>
<surname>Patel</surname> <given-names>AK</given-names>
</name>
<name>
<surname>Srivastava</surname> <given-names>M</given-names>
</name>
<name>
<surname>Das</surname> <given-names>N</given-names>
</name>
</person-group>. <article-title>Scenario, perspectives and mechanism of arsenic and fluoride Co-occurrence in the groundwater: A review</article-title>. <source>Chemosphere</source>. (<year>2020</year>) <volume>249</volume>:<elocation-id>126126</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.chemosphere.2020.126126</pub-id>
</citation>
</ref>
<ref id="B13">
<label>13</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jaswal</surname> <given-names>V</given-names>
</name>
<name>
<surname>Kumar</surname> <given-names>R</given-names>
</name>
<name>
<surname>Sahoo</surname> <given-names>PK</given-names>
</name>
<name>
<surname>Mittal</surname> <given-names>S</given-names>
</name>
<name>
<surname>Kumar</surname> <given-names>A</given-names>
</name>
<name>
<surname>Sahoo</surname> <given-names>SK</given-names>
</name>
<etal/>
</person-group>. <article-title>Multi-parametric groundwater quality and human health risk assessment vis-&#xe0;-vis hydrogeochemical process in an Agri-intensive region of Indus basin, Punjab, India</article-title>. <source>Toxin Rev</source>. (<year>2022</year>) <volume>41</volume>:<page-range>768&#x2013;84</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1080/15569543.2021.1929324</pub-id>
</citation>
</ref>
<ref id="B14">
<label>14</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rishi</surname> <given-names>MS</given-names>
</name>
<name>
<surname>Keesari</surname> <given-names>T</given-names>
</name>
<name>
<surname>Sharma</surname> <given-names>DA</given-names>
</name>
<name>
<surname>Pant</surname> <given-names>D</given-names>
</name>
<name>
<surname>Sinha</surname> <given-names>UK</given-names>
</name>
</person-group>. <article-title>Spatial trends in uranium distribution in groundwaters of Southwest Punjab, India-A hydrochemical perspective</article-title>. <source>J Radioanalytical Nucl Chem</source>. (<year>2017</year>) <volume>311</volume>:<page-range>1937&#x2013;45</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10967-017-5178-1</pub-id>
</citation>
</ref>
<ref id="B15">
<label>15</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nizam</surname> <given-names>S</given-names>
</name>
<name>
<surname>Virk</surname> <given-names>HS</given-names>
</name>
<name>
<surname>Sen</surname> <given-names>IS</given-names>
</name>
</person-group>. <article-title>High levels of fluoride in groundwater from Northern parts of Indo-Gangetic plains reveals detrimental fluorosis health risks</article-title>. <source>Environ Adv</source>. (<year>2022</year>) <volume>8</volume>:<elocation-id>100200</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.envadv.2022.100200</pub-id>
</citation>
</ref>
<ref id="B16">
<label>16</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krishan</surname> <given-names>G</given-names>
</name>
<name>
<surname>Kumar</surname> <given-names>B</given-names>
</name>
<name>
<surname>Sudarsan</surname> <given-names>N</given-names>
</name>
<name>
<surname>Rao</surname> <given-names>MS</given-names>
</name>
<name>
<surname>Ghosh</surname> <given-names>NC</given-names>
</name>
<name>
<surname>Taloor</surname> <given-names>AK</given-names>
</name>
<etal/>
</person-group>. <article-title>Isotopes (&#x3b4;18O, &#x3b4;D and 3H) variations in groundwater with emphasis on salinization in the state of Punjab, India</article-title>. <source>Sci Total Environ</source>. (<year>2021</year>) <volume>789</volume>:<elocation-id>148051</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2021.148051</pub-id>
</citation>
</ref>
<ref id="B17">
<label>17</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sahoo</surname> <given-names>PK</given-names>
</name>
<name>
<surname>Virk</surname> <given-names>HS</given-names>
</name>
<name>
<surname>Powell</surname> <given-names>MA</given-names>
</name>
<name>
<surname>Kumar</surname> <given-names>R</given-names>
</name>
<name>
<surname>Pattanaik</surname> <given-names>JK</given-names>
</name>
<name>
<surname>Salom&#xe3;o</surname> <given-names>GN</given-names>
</name>
<etal/>
</person-group>. <article-title>Meta-analysis of uranium contamination in groundwater of the alluvial plains of Punjab, northwest India: Status, health risk, and hydrogeochemical processes</article-title>. <source>Sci Total Environ</source>. (<year>2022</year>) <volume>807</volume>:<elocation-id>151753</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2021.151753</pub-id>
</citation>
</ref>
<ref id="B18">
<label>18</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alagha</surname> <given-names>JS</given-names>
</name>
<name>
<surname>Said</surname> <given-names>MAM</given-names>
</name>
<name>
<surname>Mogheir</surname> <given-names>Y</given-names>
</name>
</person-group>. <article-title>Modeling of nitrate concentration in groundwater using artificial intelligence approach-a case study of Gaza coastal aquifer</article-title>. <source>Environ Monit Assess</source>. (<year>2014</year>) <volume>186</volume>:<fpage>35</fpage>&#x2013;<lpage>45</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10661-013-3353-6</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saboe</surname> <given-names>D</given-names>
</name>
<name>
<surname>Ghasemi</surname> <given-names>H</given-names>
</name>
<name>
<surname>Gao</surname> <given-names>MM</given-names>
</name>
<name>
<surname>Samardzic</surname> <given-names>M</given-names>
</name>
<name>
<surname>Hristovski</surname> <given-names>KD</given-names>
</name>
<name>
<surname>Boscovic</surname> <given-names>D</given-names>
</name>
<etal/>
</person-group>. <article-title>Real-time monitoring and prediction of water quality parameters and algae concentrations using microbial potentiometric sensor signals and machine learning tools</article-title>. <source>Sci Total Environ</source>. (<year>2021</year>) <volume>764</volume>:<elocation-id>142876</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2020.142876</pub-id>
</citation>
</ref>
<ref id="B20">
<label>20</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huynh</surname> <given-names>TMT</given-names>
</name>
<name>
<surname>Ni</surname> <given-names>CF</given-names>
</name>
<name>
<surname>Su</surname> <given-names>YS</given-names>
</name>
<name>
<surname>Nguyen</surname> <given-names>VCN</given-names>
</name>
<name>
<surname>Lee</surname> <given-names>IH</given-names>
</name>
<name>
<surname>Lin</surname> <given-names>CP</given-names>
</name>
<etal/>
</person-group>. <article-title>Predicting heavy metal concentrations in shallow aquifer systems based on low-cost physiochemical parameters using machine learning techniques</article-title>. <source>Int J Environ Res Public Health</source>. (<year>2022</year>) <volume>19</volume>:<elocation-id>12180</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.3390/ijerph191912180</pub-id>
</citation>
</ref>
<ref id="B21">
<label>21</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Javadi</surname> <given-names>AA</given-names>
</name>
<name>
<surname>AL-Najjar</surname> <given-names>MM</given-names>
</name>
</person-group>. <article-title>Finite element modeling of contaminant transport in soils including the effect of chemical reactions</article-title>. <source>J Hazard Mater</source>. (<year>2007</year>) <volume>143</volume>:<fpage>690</fpage>&#x2013;<lpage>701</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.jhazmat.2007.01.016</pub-id>
</citation>
</ref>
<ref id="B22">
<label>22</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ghosh</surname> <given-names>D</given-names>
</name>
<name>
<surname>Donselaar</surname> <given-names>ME</given-names>
</name>
</person-group>. <article-title>Predictive geospatial model for arsenic accumulation in Holocene aquifers based on interactions of oxbow-lake biogeochemistry and alluvial geomorphology</article-title>. <source>Sci Total Environ</source>. (<year>2023</year>) <volume>856</volume>:<elocation-id>158952</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2022.158952</pub-id>
</citation>
</ref>
<ref id="B23">
<label>23</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Coppola</surname> <given-names>EA</given-names>
</name>
<name>
<surname>Rana</surname> <given-names>AJ</given-names>
</name>
<name>
<surname>Poulton</surname> <given-names>MM</given-names>
</name>
<name>
<surname>Szidarovszky</surname> <given-names>F</given-names>
</name>
<name>
<surname>Uhl</surname> <given-names>VW</given-names>
</name>
</person-group>. <article-title>A neural network model for predicting aquifer water level elevations</article-title>. <source>Ground Water</source>. (<year>2005</year>) <volume>43</volume>:<page-range>231&#x2013;41</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1111/j.1745-6584.2005.0003.x</pub-id>
</citation>
</ref>
<ref id="B24">
<label>24</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname> <given-names>H</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>P</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>A</given-names>
</name>
<name>
<surname>Ye</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>Q</given-names>
</name>
<name>
<surname>Cui</surname> <given-names>R</given-names>
</name>
<etal/>
</person-group>. <article-title>Prediction of phosphorus concentrations in shallow groundwater in intensive agricultural regions based on machine learning</article-title>. <source>Chemosphere</source>. (<year>2023</year>) <volume>313</volume>:<elocation-id>137623</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.chemosphere.2022.137623</pub-id>
</citation>
</ref>
<ref id="B25">
<label>25</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cui</surname> <given-names>L</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Mapping the daily nitrous acid (HONO) concentrations across China during 2006&#x2013;2017 through ensemble machine-learning algorithm</article-title>. <source>Sci Total Environ</source>. (<year>2021</year>) <volume>785</volume>:<elocation-id>147325</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2021.147325</pub-id>
</citation>
</ref>
<ref id="B26">
<label>26</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Podgorski</surname> <given-names>J</given-names>
</name>
<name>
<surname>Berg</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Global analysis and prediction of fluoride in groundwater</article-title>. <source>Nat Commun</source>. (<year>2022</year>) <volume>13</volume>:<fpage>4232</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/s41467-022-31940-x</pub-id>
</citation>
</ref>
<ref id="B27">
<label>27</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aind</surname> <given-names>DA</given-names>
</name>
<name>
<surname>Malakar</surname> <given-names>P</given-names>
</name>
<name>
<surname>Sarkar</surname> <given-names>S</given-names>
</name>
<name>
<surname>Mukherjee</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Controls on groundwater fluoride contamination in eastern parts of India: insights from unsaturated zone fluoride profiles and AI-based modeling</article-title>. <source>Water (Switzerland)</source>. (<year>2022</year>) <volume>14</volume>:<elocation-id>3220</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.3390/w14203220</pub-id>
</citation>
</ref>
<ref id="B28">
<label>28</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nafouanti</surname> <given-names>MB</given-names>
</name>
<name>
<surname>Li</surname> <given-names>J</given-names>
</name>
<name>
<surname>Mustapha</surname> <given-names>NA</given-names>
</name>
<name>
<surname>Uwamungu</surname> <given-names>P</given-names>
</name>
<name>
<surname>AL-Alimi</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Prediction on the fluoride contamination in groundwater at the Datong Basin, Northern China: Comparison of random forest, logistic regression and artificial neural network</article-title>. <source>Appl Geochem</source>. (<year>2021</year>) <volume>132</volume>:<elocation-id>105054</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.apgeochem.2021.105054</pub-id>
</citation>
</ref>
<ref id="B29">
<label>29</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ata&#x15f;</surname> <given-names>M</given-names>
</name>
<name>
<surname>Ye&#x15f;ilnacar</surname> <given-names>M&#x130;</given-names>
</name>
<name>
<surname>Demir Yeti&#x15f;</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Novel machine learning techniques based hybrid models (LR-KNN-ANN and SVM) in prediction of dental fluorosis in groundwater</article-title>. <source>Environ Geochem Health</source>. (<year>2022</year>) <volume>44</volume>:<page-range>3891&#x2013;905</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10653-021-01148-x</pub-id>
</citation>
</ref>
<ref id="B30">
<label>30</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gupta</surname> <given-names>PK</given-names>
</name>
<name>
<surname>Maiti</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Enhancing data-driven modeling of fluoride concentration using new data mining algorithms</article-title>. <source>Environ Earth Sci</source>. (<year>2022</year>) <volume>81</volume>:<fpage>89</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s12665-022-10216-z</pub-id>
</citation>
</ref>
<ref id="B31">
<label>31</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barzegar</surname> <given-names>R</given-names>
</name>
<name>
<surname>Asghari Moghaddam</surname> <given-names>A</given-names>
</name>
<name>
<surname>Adamowski</surname> <given-names>J</given-names>
</name>
<name>
<surname>Fijani</surname> <given-names>E</given-names>
</name>
</person-group>. <article-title>Comparison of machine learning models for predicting fluoride contamination in groundwater</article-title>. <source>Stochastic Environ Res Risk Assess</source>. (<year>2017</year>) <volume>31</volume>:<page-range>2705&#x2013;18</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s00477-016-1338-z</pub-id>
</citation>
</ref>
<ref id="B32">
<label>32</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nafouanti</surname> <given-names>MB</given-names>
</name>
<name>
<surname>Li</surname> <given-names>J</given-names>
</name>
<name>
<surname>Nyakilla</surname> <given-names>EE</given-names>
</name>
<name>
<surname>Mwakipunda</surname> <given-names>GC</given-names>
</name>
<name>
<surname>Mulashani</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>A novel hybrid random forest linear model approach for forecasting groundwater fluoride contamination</article-title>. <source>Environ Sci Pollut Res</source>. (<year>2023</year>) <volume>30</volume>:<page-range>50661&#x2013;74</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s11356-023-25886-w</pub-id>
</citation>
</ref>
<ref id="B33">
<label>33</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hundal</surname> <given-names>HS</given-names>
</name>
<name>
<surname>Kumar</surname> <given-names>R</given-names>
</name>
<name>
<surname>Singh</surname> <given-names>K</given-names>
</name>
<name>
<surname>Singh</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Occurrence and geochemistry of arsenic in groundwater of Punjab, northwest India</article-title>. <source>Commun Soil Sci Plant Anal</source>. (<year>2007</year>) <volume>38</volume>:<page-range>2257&#x2013;77</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1080/00103620701588312</pub-id>
</citation>
</ref>
<ref id="B34">
<label>34</label>
<citation citation-type="web">
<person-group person-group-type="author">
<collab>Esri Microsoft IO</collab>
</person-group>. <source>Sentinel-2 10m Land Use/Land Cover timeseries downloader (Mature support)</source> (<year>2022</year>). Available at: <uri xlink:href="https://www.arcgis.com/home/item.html?id=fc92d38533d440078f17678ebc20e8e2">https://www.arcgis.com/home/item.html?id=fc92d38533d440078f17678ebc20e8e2</uri> (Accessed <access-date>4th June, 2022</access-date>).</citation>
</ref>
<ref id="B35">
<label>35</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>U.S. Geological Survey</collab>
</person-group>. <source>3D elevation program 1-meter resolution digital elevation model (published 20220439)</source>.</citation>
</ref>
<ref id="B36">
<label>36</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abatzoglou</surname> <given-names>JT</given-names>
</name>
<name>
<surname>Dobrowski</surname> <given-names>SZ</given-names>
</name>
<name>
<surname>Parks</surname> <given-names>SA</given-names>
</name>
<name>
<surname>Hegewisch</surname> <given-names>KC</given-names>
</name>
</person-group>. <article-title>TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958&#x2013;2015</article-title>. <source>Sci Data</source>. (<year>2018</year>) <volume>5</volume>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/sdata.2017.191</pub-id>
</citation>
</ref>
<ref id="B37">
<label>37</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roy</surname> <given-names>PS</given-names>
</name>
<name>
<surname>Meiyappan</surname> <given-names>P</given-names>
</name>
<name>
<surname>Joshi</surname> <given-names>PK</given-names>
</name>
<name>
<surname>Kale</surname> <given-names>MP</given-names>
</name>
<name>
<surname>Srivastav</surname> <given-names>VK</given-names>
</name>
<name>
<surname>Srivasatava</surname> <given-names>SK</given-names>
</name>
<etal/>
</person-group>. <article-title>Decadal land use and land cover classifications across India 1985, 1995, 2005</article-title>. <source>ORNL DAAC</source>. (<year>2016</year>) <volume>7</volume>:<fpage>1</fpage>&#x2013;<lpage>9</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.3334/ORNLDAAC/1336</pub-id>
</citation>
</ref>
<ref id="B38">
<label>38</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>CGWB</collab>
</person-group>. <source>Ground Water Year Book Punjab and Chandigarh (UT)</source>. (<year>2021</year>) (<publisher-loc>India</publisher-loc>: <publisher-name>Central Ground Water Board</publisher-name>).</citation>
</ref>
<ref id="B39">
<label>39</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>CGWB</collab>
</person-group>. <source>Concept note on geogenic contamination of ground water in India</source>. <publisher-loc>India</publisher-loc>: <publisher-name>Central Ground Water Board Ministry of Water Resources Govt. of India</publisher-name> (<year>2014</year>) p. <fpage>1</fpage>&#x2013;<lpage>99</lpage>.</citation>
</ref>
<ref id="B40">
<label>40</label>
<citation citation-type="web">
<person-group person-group-type="author">
<collab>CGWB</collab>
</person-group>. <article-title>Uranium occurrence in shallow aquifers in India</article-title> (<year>2020</year>). Available online at: <uri xlink:href="http://cgwb.gov.in/WQ/URANIUM_REPORT_2020.pdf">http://cgwb.gov.in/WQ/URANIUM_REPORT_2020.pdf</uri>.</citation>
</ref>
<ref id="B41">
<label>41</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Duggal</surname> <given-names>V</given-names>
</name>
<name>
<surname>Sharma</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Fluoride contamination in drinking water and associated health risk assessment in the Malwa Belt of Punjab, India</article-title>. <source>Environ Adv</source>. (<year>2022</year>) <volume>8</volume>:<elocation-id>100242</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.envadv.2022.100242</pub-id>
</citation>
</ref>
<ref id="B42">
<label>42</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname> <given-names>T</given-names>
</name>
<name>
<surname>Litoria</surname> <given-names>PK</given-names>
</name>
<name>
<surname>Bajwa</surname> <given-names>BS</given-names>
</name>
<name>
<surname>Kaur</surname> <given-names>I</given-names>
</name>
</person-group>. <article-title>Appraisal of groundwater quality and associated risks in Mansa district (Punjab, India)</article-title>. <source>Environ Monit Assess</source>. (<year>2021</year>) <volume>193</volume>:<fpage>159</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10661-021-08892-8</pub-id>
</citation>
</ref>
<ref id="B43">
<label>43</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lapworth</surname> <given-names>DJ</given-names>
</name>
<name>
<surname>Krishan</surname> <given-names>G</given-names>
</name>
<name>
<surname>MacDonald</surname> <given-names>AM</given-names>
</name>
<name>
<surname>Rao</surname> <given-names>MS</given-names>
</name>
</person-group>. <article-title>Groundwater quality in the alluvial aquifer system of northwest India: New evidence of the extent of anthropogenic and geogenic contamination</article-title>. <source>Sci Total Environ</source>. (<year>2017</year>) <volume>599&#x2013;600</volume>:<page-range>1433&#x2013;44</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2017.04.223</pub-id>
</citation>
</ref>
<ref id="B44">
<label>44</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>CGWB</collab>
</person-group>. <source>Annual Report 2014&#x2013;15; Central Ground Water Board: Faridabad</source>. <publisher-loc>India</publisher-loc>: <publisher-name>Government of India</publisher-name> (<year>2014</year>).</citation>
</ref>
<ref id="B45">
<label>45</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>CGWB</collab>
</person-group>. <source>Annual Report 2015&#x2013;16; Central Ground Water Board: Faridabad</source>. <publisher-loc>India</publisher-loc>: <publisher-name>Government of India</publisher-name> (<year>2015</year>).</citation>
</ref>
<ref id="B46">
<label>46</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>CGWB</collab>
</person-group>. <source>Annual Report 2018&#x2013;19; Central Ground Water Board: Faridabad</source>. <publisher-loc>India</publisher-loc>: <publisher-name>Government of India</publisher-name> (<year>2018</year>).</citation>
</ref>
<ref id="B47">
<label>47</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>CGWB</collab>
</person-group>. <source>Annual Report 2019&#x2013;20; Central Ground Water Board: Faridabad</source>. <publisher-loc>India</publisher-loc>: <publisher-name>Government of India</publisher-name> (<year>2019</year>).</citation>
</ref>
<ref id="B48">
<label>48</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>CGWB</collab>
</person-group>. <source>Annual Report 2020&#x2013;21; Central Ground Water Board: Faridabad</source>. <publisher-loc>India</publisher-loc>: <publisher-name>Government of India</publisher-name> (<year>2020</year>).</citation>
</ref>
<ref id="B49">
<label>49</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>CGWB</collab>
</person-group>. <source>Annual Report 2013&#x2013;14; Central Ground Water Board: Faridabad</source>. <publisher-loc>India</publisher-loc>: <publisher-name>Government of India</publisher-name> (<year>2013</year>).</citation>
</ref>
<ref id="B50">
<label>50</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mittal</surname> <given-names>S</given-names>
</name>
<name>
<surname>Sahoo</surname> <given-names>PK</given-names>
</name>
<name>
<surname>Sahoo</surname> <given-names>SK</given-names>
</name>
<name>
<surname>Kumar</surname> <given-names>R</given-names>
</name>
<name>
<surname>Tiwari</surname> <given-names>RP</given-names>
</name>
</person-group>. <article-title>Hydrochemical characteristics and human health risk assessment of groundwater in the Shivalik region of Sutlej basin, Punjab, India</article-title>. <source>Arab J Geosci</source>. (<year>2021</year>) <volume>14</volume>:<fpage>847</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s12517-021-07043-0</pub-id>
</citation>
</ref>
<ref id="B51">
<label>51</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kumar</surname> <given-names>R</given-names>
</name>
<name>
<surname>Mittal</surname> <given-names>S</given-names>
</name>
<name>
<surname>Peechat</surname> <given-names>S</given-names>
</name>
<name>
<surname>Sahoo</surname> <given-names>PK</given-names>
</name>
<name>
<surname>Sahoo</surname> <given-names>SK</given-names>
</name>
</person-group>. <article-title>Quantification of groundwater&#x2013;agricultural soil quality and associated health risks in the agri-intensive Sutlej River Basin of Punjab, India</article-title>. <source>Environ Geochem Health</source>. (<year>2020</year>) <volume>42</volume>:<page-range>4245&#x2013;68</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10653-020-00636-w</pub-id>
</citation>
</ref>
<ref id="B52">
<label>52</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chopra</surname> <given-names>RPS</given-names>
</name>
<name>
<surname>Krishan</surname> <given-names>G</given-names>
</name>
</person-group>. <article-title>Analysis of aquifer characteristics and groundwater quality in southwest Punjab, India</article-title>. <source>J Earth Sci Eng</source>. (<year>2014</year>) <volume>4</volume>(<issue>10</issue>):<fpage>597</fpage>&#x2013;<lpage>604</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.17265/2159-581X/2014.10.002</pub-id>
</citation>
</ref>
<ref id="B53">
<label>53</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bala</surname> <given-names>R</given-names>
</name>
<name>
<surname>Karanveer</surname>
</name>
<name>
<surname>Das</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Occurrence and behaviour of uranium in the groundwater and potential health risk associated in semi-arid region of Punjab, India</article-title>. <source>Groundw Sustain Dev</source>. (<year>2022</year>) <volume>17</volume>:<elocation-id>100731</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.gsd.2022.100731</pub-id>
</citation>
</ref>
<ref id="B54">
<label>54</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Grattan</surname> <given-names>SR</given-names>
</name>
</person-group>. <article-title>Irrigation Water Salinity and Crop Production</article-title>. In: <source>Irrigation Water Salinity and Crop Production</source> (<year>2002</year>) (<publisher-loc>California</publisher-loc>: <publisher-name>ANR Publication</publisher-name>). doi:&#xa0;<pub-id pub-id-type="doi">10.3733/ucanr.8066</pub-id>
</citation>
</ref>
<ref id="B55">
<label>55</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Filzmoser</surname> <given-names>P</given-names>
</name>
<name>
<surname>Reimann</surname> <given-names>C</given-names>
</name>
</person-group>. <article-title>Normal and lognormal data distribution in geochemistry: death of a myth. Consequences for the statistical treatment of geochemical and environmental data</article-title>. <source>Environ Geol</source>. (<year>1999</year>) <volume>39</volume>:<page-range>1001&#x2013;14</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s002549900081</pub-id>
</citation>
</ref>
<ref id="B56">
<label>56</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Singh</surname> <given-names>G</given-names>
</name>
<name>
<surname>Rishi</surname> <given-names>MS</given-names>
</name>
<name>
<surname>Herojeet</surname> <given-names>R</given-names>
</name>
<name>
<surname>Kaur</surname> <given-names>L</given-names>
</name>
<name>
<surname>Priyanka</surname>
</name>
<name>
<surname>Sharma</surname> <given-names>K</given-names>
</name>
</person-group>. <article-title>Multivariate analysis and geochemical signatures of groundwater in the agricultural dominated taluks of Jalandhar district, Punjab, India</article-title>. <source>J Geochem Explor</source>. (<year>2020</year>) <volume>208</volume>:<elocation-id>106395</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.gexplo.2019.106395</pub-id>
</citation>
</ref>
<ref id="B57">
<label>57</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Breiman</surname> <given-names>L</given-names>
</name>
</person-group>. <article-title>Random forests</article-title>. <source>Mach Learn</source>. (<year>2001</year>) <volume>45</volume>:<fpage>5</fpage>&#x2013;<lpage>32</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id>
</citation>
</ref>
<ref id="B58">
<label>58</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Prasad</surname> <given-names>AM</given-names>
</name>
<name>
<surname>Iverson</surname> <given-names>LR</given-names>
</name>
<name>
<surname>Liaw</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Newer classification and regression tree techniques: Bagging and random forests for ecological prediction</article-title>. <source>Ecosystems</source>. (<year>2006</year>) <volume>9</volume>:<page-range>181&#x2013;99</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10021-005-0054-1</pub-id>
</citation>
</ref>
<ref id="B59">
<label>59</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Vapnik</surname> <given-names>VN</given-names>
</name>
</person-group>. <source>The nature of statistical learning theory</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>1995</year>). doi:&#xa0;<pub-id pub-id-type="doi">10.1007/978-1-4757-2440-0</pub-id>
</citation>
</ref>
<ref id="B60">
<label>60</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gunn</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Support Vector Machines for classification and regression</article-title>. In: <source>ISIS Technical Report</source>, vol. <volume>14</volume>. (<year>1998</year>) (<publisher-loc>Southampton, U.K.</publisher-loc>: <publisher-name>Image Speech and Intelligent Systems Group</publisher-name>). doi:&#xa0;<pub-id pub-id-type="doi">10.1039/b918972f</pub-id>
</citation>
</ref>
<ref id="B61">
<label>61</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kecman</surname> <given-names>V</given-names>
</name>
</person-group>. <article-title>Support Vector Machines: Theory and Applications</article-title>. In: <source>Springer Science &amp; Business Media</source>, vol. <volume>177</volume>. (<year>2005</year>) (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>). Available at: <uri xlink:href="https://books.google.com/books?hl=en&amp;lr=&amp;id=uTzMPJjVjsMC&amp;oi=fnd&amp;pg=PA1&amp;dq=support+vector+machines&amp;ots=GFAK9w2Hfb&amp;sig=4AddZM1BrpsopEIiErlIzeys6zI">https://books.google.com/books?hl=en&amp;lr=&amp;id=uTzMPJjVjsMC&amp;oi=fnd&amp;pg=PA1&amp;dq=support+vector+machines&amp;ots=GFAK9w2Hfb&amp;sig=4AddZM1BrpsopEIiErlIzeys6zI</uri>.</citation>
</ref>
<ref id="B62">
<label>62</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ceryan</surname> <given-names>N</given-names>
</name>
<name>
<surname>Ozkat</surname> <given-names>EC</given-names>
</name>
<name>
<surname>Korkmaz Can</surname> <given-names>N</given-names>
</name>
<name>
<surname>Ceryan</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Machine learning models to estimate the elastic modulus of weathered magmatic rocks</article-title>. <source>Environ Earth Sci</source>. (<year>2021</year>) <volume>80</volume>:<fpage>448</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s12665-021-09738-9</pub-id>
</citation>
</ref>
<ref id="B63">
<label>63</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname> <given-names>T</given-names>
</name>
<name>
<surname>He</surname> <given-names>T</given-names>
</name>
<name>
<surname>Benesty</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>XGBoost: eXtreme gradient boosting</article-title>. <source>R Package version 0.71-2</source>. (<year>2018</year>). doi:&#xa0;<pub-id pub-id-type="doi">10.1145/2939672.2939785</pub-id>
</citation>
</ref>
<ref id="B64">
<label>64</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Haykin</surname> <given-names>S</given-names>
</name>
</person-group>. <source>Neural networks: a comprehensive foundation</source>. <publisher-loc>United States</publisher-loc>: <publisher-name>Prentice-Hall</publisher-name> (<year>1999</year>).</citation>
</ref>
<ref id="B65">
<label>65</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Osman</surname> <given-names>AIA</given-names>
</name>
<name>
<surname>Ahmed</surname> <given-names>AN</given-names>
</name>
<name>
<surname>Chow</surname> <given-names>MF</given-names>
</name>
<name>
<surname>Huang</surname> <given-names>YF</given-names>
</name>
<name>
<surname>El-Shafie</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Extreme gradient boosting (Xgboost) model to predict the groundwater levels in Selangor Malaysia</article-title>. <source>Ain Shams Eng J</source>. (<year>2021</year>) <volume>12</volume>:<page-range>1545&#x2013;56</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.asej.2020.11.011</pub-id>
</citation>
</ref>
<ref id="B66">
<label>66</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname> <given-names>G</given-names>
</name>
<name>
<surname>Zhu</surname> <given-names>QY</given-names>
</name>
<name>
<surname>Siew</surname> <given-names>CK</given-names>
</name>
</person-group>. <article-title>Extreme learning machine: Theory and applications</article-title>. <source>Neurocomputing</source>. (<year>2006</year>) <volume>70</volume>:<fpage>489</fpage>&#x2013;<lpage>501</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.neucom.2005.12.126</pub-id>
</citation>
</ref>
<ref id="B67">
<label>67</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ali</surname> <given-names>S</given-names>
</name>
<name>
<surname>Li</surname> <given-names>J</given-names>
</name>
<name>
<surname>Pei</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Aslam</surname> <given-names>MS</given-names>
</name>
<name>
<surname>Shaukat</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Azeem</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>An effective and improved CNN-ELM classifier for handwritten digits recognition and classification</article-title>. <source>Symmetry</source>. (<year>2020</year>) <volume>12</volume>:<fpage>1</fpage>&#x2013;<lpage>15</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.3390/sym12101742</pub-id>
</citation>
</ref>
<ref id="B68">
<label>68</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname> <given-names>X</given-names>
</name>
<name>
<surname>Luo</surname> <given-names>M</given-names>
</name>
<name>
<surname>Jin</surname> <given-names>H</given-names>
</name>
</person-group>. <article-title>Application of improved ELM algorithm in the prediction of earthquake casualties</article-title>. <source>PLoS One</source>. (<year>2020</year>) <volume>15</volume>:<elocation-id>e0235236</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1371/journal.pone.0235236</pub-id>
</citation>
</ref>
<ref id="B69">
<label>69</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Govindarajan</surname> <given-names>S</given-names>
</name>
<name>
<surname>Swaminathan</surname> <given-names>R</given-names>
</name>
</person-group>. <article-title>Extreme Learning Machine based Differentiation of Pulmonary Tuberculosis in Chest Radiographs using Integrated Local Feature Descriptors</article-title>. <source>Comput Methods Progr Biomed</source>. (<year>2021</year>) <volume>204</volume>:<elocation-id>106058</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.cmpb.2021.106058</pub-id>
</citation>
</ref>
<ref id="B70">
<label>70</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meshram</surname> <given-names>SG</given-names>
</name>
<name>
<surname>Singh</surname> <given-names>VP</given-names>
</name>
<name>
<surname>Kisi</surname> <given-names>O</given-names>
</name>
<name>
<surname>Karimi</surname> <given-names>V</given-names>
</name>
<name>
<surname>Meshram</surname> <given-names>C</given-names>
</name>
</person-group>. <article-title>Application of artificial neural networks, support vector machine and multiple model-ANN to sediment yield prediction</article-title>. <source>Water Resour Manage</source>. (<year>2020</year>) <volume>34</volume>:<page-range>4561&#x2013;75</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s11269-020-02672-8</pub-id>
</citation>
</ref>
<ref id="B71">
<label>71</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cherkassky</surname> <given-names>V</given-names>
</name>
<name>
<surname>Krasnopolsky</surname> <given-names>V</given-names>
</name>
<name>
<surname>Solomatine</surname> <given-names>DP</given-names>
</name>
<name>
<surname>Valdes</surname> <given-names>J</given-names>
</name>
</person-group>. <article-title>Computational intelligence in earth sciences and environmental applications: Issues and challenges</article-title>. <source>Neural Networks</source>. (<year>2006</year>) <volume>19</volume>:<page-range>113&#x2013;21</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.neunet.2006.01.001</pub-id>
</citation>
</ref>
<ref id="B72">
<label>72</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Malekmohamadi</surname> <given-names>I</given-names>
</name>
<name>
<surname>Bazargan-Lari</surname> <given-names>MR</given-names>
</name>
<name>
<surname>Kerachian</surname> <given-names>R</given-names>
</name>
<name>
<surname>Nikoo</surname> <given-names>MR</given-names>
</name>
<name>
<surname>Fallahnia</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Evaluating the efficacy of SVMs, BNs, ANNs and ANFIS in wave height prediction</article-title>. <source>Ocean Eng</source>. (<year>2011</year>) <volume>38</volume>:<page-range>487&#x2013;97</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.oceaneng.2010.11.020</pub-id>
</citation>
</ref>
<ref id="B73">
<label>73</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maiti</surname> <given-names>S</given-names>
</name>
<name>
<surname>Erram</surname> <given-names>VC</given-names>
</name>
<name>
<surname>Gupta</surname> <given-names>G</given-names>
</name>
<name>
<surname>Tiwari</surname> <given-names>RK</given-names>
</name>
<name>
<surname>Kulkarni</surname> <given-names>UD</given-names>
</name>
<name>
<surname>Sangpal</surname> <given-names>RR</given-names>
</name>
</person-group>. <article-title>Assessment of groundwater quality: A fusion of geochemical and geophysical information via Bayesian neural networks</article-title>. <source>Environ Monit Assess</source>. (<year>2013</year>) <volume>185</volume>:<page-range>3445&#x2013;65</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10661-012-2802-y</pub-id>
</citation>
</ref>
<ref id="B74">
<label>74</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kisi</surname> <given-names>O</given-names>
</name>
<name>
<surname>Tombul</surname> <given-names>M</given-names>
</name>
<name>
<surname>Kermani</surname> <given-names>MZ</given-names>
</name>
</person-group>. <article-title>Modeling soil temperatures at different depths by using three different neural computing techniques</article-title>. <source>Theor Appl Climatol</source>. (<year>2015</year>) <volume>121</volume>:<page-range>377&#x2013;87</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s00704-014-1232-x</pub-id>
</citation>
</ref>
<ref id="B75">
<label>75</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maiti</surname> <given-names>S</given-names>
</name>
<name>
<surname>Gupta</surname> <given-names>G</given-names>
</name>
<name>
<surname>Erram</surname> <given-names>VC</given-names>
</name>
<name>
<surname>Tiwari</surname> <given-names>RK</given-names>
</name>
</person-group>. <article-title>Inversion of schlumberger resistivity sounding data from the critically dynamic Koyna region using the hybrid Monte Carlo-based neural network approach</article-title>. <source>Nonlinear Processes Geophys</source>. (<year>2011</year>) <volume>18</volume>:<page-range>179&#x2013;92</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.5194/npg-18-179-2011</pub-id>
</citation>
</ref>
<ref id="B76">
<label>76</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Draper</surname> <given-names>NR</given-names>
</name>
</person-group>. <article-title>The Box-Wetz criterion versus R<sup>2</sup></article-title>. <source>J R Stat Soc</source>. (<year>1984</year>) <volume>147</volume>:<page-range>100&#x2013;3</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.2307/2981740</pub-id>
</citation>
</ref>
<ref id="B77">
<label>77</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sammut</surname> <given-names>C</given-names>
</name>
<name>
<surname>Webb</surname> <given-names>G</given-names>
</name>
</person-group>. <article-title>Mean Absolute Error</article-title>. In: <source>Encyclopedia of Machine Learning</source>, vol. <volume>652</volume>. (<year>2010</year>) (<publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Springer</publisher-name>). doi:&#xa0;<pub-id pub-id-type="doi">10.1007/978-1-4899-7687-1_953</pub-id>
</citation>
</ref>
<ref id="B78">
<label>78</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Freeze</surname> <given-names>RA</given-names>
</name>
<name>
<surname>Cherry</surname> <given-names>JA</given-names>
</name>
</person-group>. <article-title>Groundwater</article-title>. (<year>1979</year>). Available at: <uri xlink:href="https://www.un-igrac.org/sites/default/files/resources/files/Groundwater%20book%20-%20English.pdf">https://www.un-igrac.org/sites/default/files/resources/files/Groundwater%20book%20-%20English.pdf</uri>.</citation>
</ref>
<ref id="B79">
<label>79</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rajasekaran</surname> <given-names>S</given-names>
</name>
<name>
<surname>Gayathri</surname> <given-names>S</given-names>
</name>
<name>
<surname>Lee</surname> <given-names>TL</given-names>
</name>
</person-group>. <article-title>Support vector regression methodology for storm surge predictions</article-title>. <source>Ocean Eng</source>. (<year>2008</year>) <volume>35</volume>:<page-range>1578&#x2013;87</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.oceaneng.2008.08.004</pub-id>
</citation>
</ref>
<ref id="B80">
<label>80</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname> <given-names>KP</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>SD</given-names>
</name>
</person-group>. <article-title>Choosing the kernel parameters for support vector machines by the inter-cluster distance in the feature space</article-title>. <source>Pattern Recogn</source>. (<year>2009</year>) <volume>42</volume>:<page-range>710&#x2013;7</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.patcog.2008.08.030</pub-id>
</citation>
</ref>
<ref id="B81">
<label>81</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amirmojahedi</surname> <given-names>M</given-names>
</name>
<name>
<surname>Mohammadi</surname> <given-names>K</given-names>
</name>
<name>
<surname>Shamshirband</surname> <given-names>S</given-names>
</name>
<name>
<surname>Seyed Danesh</surname> <given-names>A</given-names>
</name>
<name>
<surname>Mostafaeipour</surname> <given-names>A</given-names>
</name>
<name>
<surname>Kamsin</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>A hybrid computational intelligence method for predicting dew point temperature</article-title>. <source>Environ Earth Sci</source>. (<year>2016</year>) <volume>75</volume>:<fpage>1</fpage>&#x2013;<lpage>12</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s12665-015-5135-7</pub-id>
</citation>
</ref>
<ref id="B82">
<label>82</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beyene</surname> <given-names>J</given-names>
</name>
<name>
<surname>Atenafu</surname> <given-names>EG</given-names>
</name>
<name>
<surname>Hamid</surname> <given-names>JS</given-names>
</name>
<name>
<surname>To</surname> <given-names>T</given-names>
</name>
<name>
<surname>Sung</surname> <given-names>L</given-names>
</name>
</person-group>. <article-title>Determining relative importance of variables in developing and validating predictive models</article-title>. <source>BMC Med Res Method</source>. (<year>2009</year>) <volume>9</volume>:<fpage>64</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/1471-2288-9-64</pub-id>
</citation>
</ref>
<ref id="B83">
<label>83</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>G&#xfc;ltekin</surname> <given-names>B</given-names>
</name>
<name>
<surname>Sakar</surname> <given-names>BE</given-names>
</name>
</person-group>. (<year>2018</year>). <article-title>Variable importance analysis in default prediction using machine learning techniques</article-title>, in: <conf-name>DATA 2018 - Proceedings of the 7th International Conference on Data Science, Technology and Applications</conf-name>, <publisher-loc>Portugal</publisher-loc>. pp. <fpage>56</fpage>&#x2013;<lpage>62</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.5220/0006872400560062</pub-id>
</citation>
</ref>
<ref id="B84">
<label>84</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paikaray</surname> <given-names>S</given-names>
</name>
<name>
<surname>Chander</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Geochemical variations in uranium and fluoride enriched saline groundwater around a semi-arid region of SW Punjab, India</article-title>. <source>Appl Geochem</source>. (<year>2022</year>) <volume>136</volume>:<fpage>105167</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.apgeochem.2021.105167</pub-id>
</citation>
</ref>
<ref id="B85">
<label>85</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname> <given-names>C</given-names>
</name>
<name>
<surname>Mahajan</surname> <given-names>A</given-names>
</name>
<name>
<surname>Kumar Garg</surname> <given-names>U</given-names>
</name>
</person-group>. <article-title>Fluoride and nitrate in groundwater of south-western Punjab, India&#x2014;occurrence, distribution and statistical analysis</article-title>. <source>Desalin Water Treat</source>. (<year>2016</year>) <volume>57</volume>:<page-range>3928&#x2013;39</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1080/19443994.2014.989415</pub-id>
</citation>
</ref>
<ref id="B86">
<label>86</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mohapatra</surname> <given-names>M</given-names>
</name>
<name>
<surname>Anand</surname> <given-names>S</given-names>
</name>
<name>
<surname>Mishra</surname> <given-names>BK</given-names>
</name>
<name>
<surname>Giles</surname> <given-names>DE</given-names>
</name>
<name>
<surname>Singh</surname> <given-names>P</given-names>
</name>
</person-group>. <article-title>Review of fluoride removal from drinking water</article-title>. <source>J Environ Manage</source>. (<year>2009</year>) <volume>91</volume>:<fpage>67</fpage>&#x2013;<lpage>77</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.jenvman.2009.08.015</pub-id>
</citation>
</ref>
<ref id="B87">
<label>87</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname> <given-names>DA</given-names>
</name>
<name>
<surname>Rishi</surname> <given-names>MS</given-names>
</name>
<name>
<surname>Keesari</surname> <given-names>T</given-names>
</name>
</person-group>. <article-title>Evaluation of groundwater quality and suitability for irrigation and drinking purposes in southwest Punjab, India using hydrochemical approach</article-title>. <source>Appl Water Sci</source>. (<year>2017</year>) <volume>7</volume>:<page-range>3137&#x2013;50</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s13201-016-0456-6</pub-id>
</citation>
</ref>
<ref id="B88">
<label>88</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chander</surname> <given-names>S</given-names>
</name>
<name>
<surname>Paikaray</surname> <given-names>S</given-names>
</name>
<name>
<surname>Bansal</surname> <given-names>S</given-names>
</name>
<name>
<surname>Sharma</surname> <given-names>K</given-names>
</name>
<name>
<surname>Dhiman</surname> <given-names>D</given-names>
</name>
<name>
<surname>Deshpande</surname> <given-names>RD</given-names>
</name>
</person-group>. <article-title>&#x3b4;<sup>18</sup>O and &#x3b4;<sup>2</sup>H isotopes, trace metals and major ions in groundwater around uranium and fluoride contaminated Indus valley Quaternary alluvial plain, SW Punjab, India: Implications on hydrogeochemical processes, irrigation use and source</article-title>. <source>Appl Geochem</source>. (<year>2023</year>) <volume>152</volume>:<elocation-id>105652</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.apgeochem.2023.105652</pub-id>
</citation>
</ref>
<ref id="B89">
<label>89</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Frencken</surname> <given-names>JE</given-names>
</name>
</person-group>. <source>Endemic Fluorosis in Developing Countries: Causes, Effects, and Possible Solutions</source>. (<year>1992</year>) (<publisher-loc>Netherlands</publisher-loc>: <publisher-name>NIPG-TNO</publisher-name>). pp. <fpage>1</fpage>&#x2013;<lpage>50</lpage>.</citation>
</ref>
<ref id="B90">
<label>90</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aull&#xf3;n Alcaine</surname> <given-names>A</given-names>
</name>
<name>
<surname>Schulz</surname> <given-names>C</given-names>
</name>
<name>
<surname>Bundschuh</surname> <given-names>J</given-names>
</name>
<name>
<surname>Jacks</surname> <given-names>G</given-names>
</name>
<name>
<surname>Thunvik</surname> <given-names>R</given-names>
</name>
<name>
<surname>Gustafsson</surname> <given-names>JP</given-names>
</name>
<etal/>
</person-group>. <article-title>Hydrogeochemical controls on the mobility of arsenic, fluoride and other geogenic co-contaminants in the shallow aquifers of northeastern La Pampa Province in Argentina</article-title>. <source>Sci Total Environ</source>. (<year>2020</year>) <volume>715</volume>:<elocation-id>136671</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2020.136671</pub-id>
</citation>
</ref>
<ref id="B91">
<label>91</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kom</surname> <given-names>KP</given-names>
</name>
<name>
<surname>Gurugnanam</surname> <given-names>B</given-names>
</name>
<name>
<surname>Bairavi</surname> <given-names>S</given-names>
</name>
<name>
<surname>Chidambaram</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Sources and geochemistry of high fluoride groundwater in hard rock aquifer of the semi-arid region. A special focus on human health risk assessment</article-title>. <source>Total Environ Res Themes</source>. (<year>2023</year>) <volume>5</volume>:<elocation-id>100026</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.totert.2023.100026</pub-id>
</citation>
</ref>
<ref id="B92">
<label>92</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Minns</surname> <given-names>AW</given-names>
</name>
<name>
<surname>Hall</surname> <given-names>MJ</given-names>
</name>
</person-group>. <article-title>Artificial neural networks as rainfall-runoff models</article-title>. <source>Hydrol Sci J</source>. (<year>1996</year>) <volume>41</volume>:<fpage>399</fpage>&#x2013;<lpage>417</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1080/02626669609491511</pub-id>
</citation>
</ref>
<ref id="B93">
<label>93</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Al-Mukhtar</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Random forest, support vector machine, and neural networks to modelling suspended sediment in Tigris River-Baghdad</article-title>. <source>Environ Monit Assess</source>. (<year>2019</year>) <volume>191</volume>:<fpage>673</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10661-019-7821-5</pub-id>
</citation>
</ref>
<ref id="B94">
<label>94</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bui</surname> <given-names>DT</given-names>
</name>
<name>
<surname>Tsangaratos</surname> <given-names>P</given-names>
</name>
<name>
<surname>Nguyen</surname> <given-names>VT</given-names>
</name>
<name>
<surname>Van Liem</surname> <given-names>N</given-names>
</name>
<name>
<surname>Trinh</surname> <given-names>PT</given-names>
</name>
</person-group>. <article-title>Comparing the prediction performance of a Deep Learning Neural Network model with conventional machine learning models in landslide susceptibility assessment</article-title>. <source>Catena</source>. (<year>2020</year>) <volume>188</volume>:<fpage>104426</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.catena.2019.104426</pub-id>
</citation>
</ref>
<ref id="B95">
<label>95</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ling</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Podgorski</surname> <given-names>J</given-names>
</name>
<name>
<surname>Sadiq</surname> <given-names>M</given-names>
</name>
<name>
<surname>Rasheed</surname> <given-names>H</given-names>
</name>
<name>
<surname>Eqani</surname> <given-names>SAMAS</given-names>
</name>
<name>
<surname>Berg</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Monitoring and prediction of high fluoride concentrations in groundwater in Pakistan</article-title>. <source>Sci Total Environ</source>. (<year>2022</year>) <volume>839</volume>:<fpage>156058</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2022.156058</pub-id>
</citation>
</ref>
<ref id="B96">
<label>96</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sarkar</surname> <given-names>S</given-names>
</name>
<name>
<surname>Mukherjee</surname> <given-names>A</given-names>
</name>
<name>
<surname>Chakraborty</surname> <given-names>M</given-names>
</name>
<name>
<surname>Quamar</surname> <given-names>MT</given-names>
</name>
<name>
<surname>Duttagupta</surname> <given-names>S</given-names>
</name>
<name>
<surname>Bhattacharya</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Prediction of elevated groundwater fluoride across India using multi-model approach: insights on the influence of geologic and environmental factors</article-title>. <source>Environ Sci Pollut Res</source>. (<year>2022</year>) <volume>30</volume>:<page-range>31998&#x2013;2013</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s11356-022-24328-3</pub-id>
</citation>
</ref>
<ref id="B97">
<label>97</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Knoll</surname> <given-names>L</given-names>
</name>
<name>
<surname>Breuer</surname> <given-names>L</given-names>
</name>
<name>
<surname>Bach</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Large scale prediction of groundwater nitrate concentrations from spatial data using machine learning</article-title>. <source>Sci Total Environ</source>. (<year>2019</year>) <volume>668</volume>:<page-range>1317&#x2013;27</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2019.03.045</pub-id>
</citation>
</ref>
<ref id="B98">
<label>98</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khosravi</surname> <given-names>K</given-names>
</name>
<name>
<surname>Barzegar</surname> <given-names>R</given-names>
</name>
<name>
<surname>Miraki</surname> <given-names>S</given-names>
</name>
<name>
<surname>Adamowski</surname> <given-names>J</given-names>
</name>
<name>
<surname>Daggupati</surname> <given-names>P</given-names>
</name>
<name>
<surname>Alizadeh</surname> <given-names>MR</given-names>
</name>
<etal/>
</person-group>. <article-title>Stochastic modeling of groundwater fluoride contamination: introducing lazy learners</article-title>. <source>Groundwater</source>. (<year>2020</year>) <volume>58</volume>:<page-range>723&#x2013;34</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1111/gwat.12963</pub-id>
</citation>
</ref>
<ref id="B99">
<label>99</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>B</given-names>
</name>
<name>
<surname>Tang</surname> <given-names>L</given-names>
</name>
<name>
<surname>Yang</surname> <given-names>J</given-names>
</name>
<name>
<surname>Zhao</surname> <given-names>B</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Visual tracking based on extreme learning machine and sparse representation</article-title>. <source>Sensors (Switzerland)</source>. (<year>2015</year>) <volume>15</volume>:<page-range>26877&#x2013;905</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.3390/s151026877</pub-id>
</citation>
</ref>
<ref id="B100">
<label>100</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>J</given-names>
</name>
<name>
<surname>Lu</surname> <given-names>S</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>SH</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>YD</given-names>
</name>
</person-group>. <article-title>A review on extreme learning machine</article-title>. <source>Multimed Tools Appl</source>. (<year>2022</year>) <volume>81</volume>:<page-range>41611&#x2013;60</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s11042-021-11007-7</pub-id>
</citation>
</ref>
<ref id="B101">
<label>101</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Heddam</surname> <given-names>S</given-names>
</name>
<name>
<surname>Kisi</surname> <given-names>O</given-names>
</name>
</person-group>. <article-title>Extreme learning machines: a new approach for modeling dissolved oxygen (DO) concentration with and without water quality variables as predictors</article-title>. <source>Environ Sci Pollut Res</source>. (<year>2017</year>) <volume>24</volume>:<page-range>16702&#x2013;24</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s11356-017-9283-z</pub-id>
</citation>
</ref>
<ref id="B102">
<label>102</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alizadeh</surname> <given-names>MJ</given-names>
</name>
<name>
<surname>Kavianpour</surname> <given-names>MR</given-names>
</name>
<name>
<surname>Danesh</surname> <given-names>M</given-names>
</name>
<name>
<surname>Adolf</surname> <given-names>J</given-names>
</name>
<name>
<surname>Shamshirband</surname> <given-names>S</given-names>
</name>
<name>
<surname>Chau</surname> <given-names>KW</given-names>
</name>
</person-group>. <article-title>Effect of river flow on the quality of estuarine and coastal waters using machine learning models</article-title>. <source>Eng Appl Comput Fluid Mech</source>. (<year>2018</year>) <volume>12</volume>:<page-range>810&#x2013;23</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1080/19942060.2018.1528480</pub-id>
</citation>
</ref>
<ref id="B103">
<label>103</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sekhar Roy</surname> <given-names>S</given-names>
</name>
<name>
<surname>Roy</surname> <given-names>R</given-names>
</name>
<name>
<surname>Balas</surname> <given-names>VE</given-names>
</name>
</person-group>. <article-title>Estimating heating load in buildings using multivariate adaptive regression splines, extreme learning machine, a hybrid model of MARS and ELM</article-title>. <source>Renewable Sustain Energy Rev</source>. (<year>2018</year>) <volume>82</volume>:<page-range>4256&#x2013;68</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.rser.2017.05.249</pub-id>
</citation>
</ref>
<ref id="B104">
<label>104</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kang</surname> <given-names>J</given-names>
</name>
<name>
<surname>Yu</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Wu</surname> <given-names>S</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Gao</surname> <given-names>P</given-names>
</name>
</person-group>. <article-title>Feasibility analysis of extreme learning machine for predicting thermal conductivity of rocks</article-title>. <source>Environ Earth Sci</source>. (<year>2021</year>) <volume>80</volume>:<fpage>455</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s12665-021-09745-w</pub-id>
</citation>
</ref>
<ref id="B105">
<label>105</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chakraborty</surname> <given-names>M</given-names>
</name>
<name>
<surname>Sarkar</surname> <given-names>S</given-names>
</name>
<name>
<surname>Mukherjee</surname> <given-names>A</given-names>
</name>
<name>
<surname>Shamsudduha</surname> <given-names>M</given-names>
</name>
<name>
<surname>Ahmed</surname> <given-names>KM</given-names>
</name>
<name>
<surname>Bhattacharya</surname> <given-names>A</given-names>
</name>
<etal/>
</person-group>. <article-title>Modeling regional-scale groundwater arsenic hazard in the transboundary Ganges River Delta, India and Bangladesh: Infusing physically-based model with machine learning</article-title>. <source>Sci Total Environ</source>. (<year>2020</year>) <volume>748</volume>:<elocation-id>141107</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2020.141107</pub-id>
</citation>
</ref>
<ref id="B106">
<label>106</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Podgorski</surname> <given-names>J</given-names>
</name>
<name>
<surname>Wu</surname> <given-names>R</given-names>
</name>
<name>
<surname>Chakravorty</surname> <given-names>B</given-names>
</name>
<name>
<surname>Polya</surname> <given-names>DA</given-names>
</name>
</person-group>. <article-title>Groundwater arsenic distribution in India by machine learning geospatial modeling</article-title>. <source>Int J Environ Res Public Health</source>. (<year>2020</year>) <volume>17</volume>:<fpage>1</fpage>&#x2013;<lpage>17</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.3390/ijerph17197119</pub-id>
</citation>
</ref>
<ref id="B107">
<label>107</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mukherjee</surname> <given-names>A</given-names>
</name>
<name>
<surname>Sarkar</surname> <given-names>S</given-names>
</name>
<name>
<surname>Chakraborty</surname> <given-names>M</given-names>
</name>
<name>
<surname>Duttagupta</surname> <given-names>S</given-names>
</name>
<name>
<surname>Bhattacharya</surname> <given-names>A</given-names>
</name>
<name>
<surname>Saha</surname> <given-names>D</given-names>
</name>
<etal/>
</person-group>. <article-title>Occurrence, predictors and hazards of elevated groundwater arsenic across India through field observations and regional-scale AI-based modeling</article-title>. <source>Sci Total Environ</source>. (<year>2021</year>) <volume>759</volume>:<elocation-id>143511</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.scitotenv.2020.143511</pub-id>
</citation>
</ref>
<ref id="B108">
<label>108</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mondal</surname> <given-names>D</given-names>
</name>
<name>
<surname>Gupta</surname> <given-names>S</given-names>
</name>
<name>
<surname>Reddy</surname> <given-names>DV</given-names>
</name>
<name>
<surname>Nagabhushanam</surname> <given-names>P</given-names>
</name>
</person-group>. <article-title>Geochemical controls on fluoride concentrations in groundwater from alluvial aquifers of the Birbhum district, West Bengal, India</article-title>. <source>J Geochem Explor</source>. (<year>2014</year>) <volume>145</volume>:<fpage>190</fpage>&#x2013;<lpage>206</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.gexplo.2014.06.005</pub-id>
</citation>
</ref>
<ref id="B109">
<label>109</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alarc&#xf3;n-Herrera</surname> <given-names>MT</given-names>
</name>
<name>
<surname>Bundschuh</surname> <given-names>J</given-names>
</name>
<name>
<surname>Nath</surname> <given-names>B</given-names>
</name>
<name>
<surname>Nicolli</surname> <given-names>HB</given-names>
</name>
<name>
<surname>Gutierrez</surname> <given-names>M</given-names>
</name>
<name>
<surname>Reyes-Gomez</surname> <given-names>VM</given-names>
</name>
<etal/>
</person-group>. <article-title>Co-occurrence of arsenic and fluoride in groundwater of semi-arid regions in Latin America: Genesis, mobility and remediation</article-title>. <source>J Hazard Mater</source>. (<year>2013</year>) <volume>262</volume>:<page-range>960&#x2013;9</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.jhazmat.2012.08.005</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>