<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Water</journal-id>
<journal-title>Frontiers in Water</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Water</abbrev-journal-title>
<issn pub-type="epub">2624-9375</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frwa.2023.1244024</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Water</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>River reach-level machine learning estimation of nutrient concentrations in Great Britain</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Tso</surname> <given-names>Chak-Hau Michael</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/839333/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Magee</surname> <given-names>Eugene</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2353956/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Huxley</surname> <given-names>David</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x02020;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Eastman</surname> <given-names>Michael</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="aff" rid="aff5"><sup>5</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Fry</surname> <given-names>Matthew</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02021;</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>UK Centre for Ecology and Hydrology</institution>, <addr-line>Lancaster</addr-line>, <country>United Kingdom</country></aff>
<aff id="aff2"><sup>2</sup><institution>Centre of Excellence for Environmental Data Science</institution>, <addr-line>Lancaster</addr-line>, <country>United Kingdom</country></aff>
<aff id="aff3"><sup>3</sup><institution>UK Centre for Ecology and Hydrology</institution>, <addr-line>Wallingford</addr-line>, <country>United Kingdom</country></aff>
<aff id="aff4"><sup>4</sup><institution>Formerly Data Science MSc Programme, School of Computing and Communications, Lancaster University</institution>, <addr-line>Lancaster</addr-line>, <country>United Kingdom</country></aff>
<aff id="aff5"><sup>5</sup><institution>Met Office</institution>, <addr-line>Exeter</addr-line>, <country>United Kingdom</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Lorenzo Antonio Picos Corrales, Universidad Aut&#x000F3;noma de Sinaloa, Mexico</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Ho-Rim Kim, Korea Institute of Geoscience and Mineral Resources, Republic of Korea; Wangshou Zhang, Nanjing Institute of Geography and Limnology (CAS), China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Chak-Hau Michael Tso <email>mtso&#x00040;ceh.ac.uk</email></corresp>
<fn fn-type="present-address" id="fn001"><p>&#x02020;Present address: David Huxley, Department of Mathematics, The University of Manchester, Alan Turing Building, Manchester, United Kingdom</p></fn>
<fn fn-type="other" id="fn002"><p>&#x02021;ORCID: Matthew Fry <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0003-1142-4039">orcid.org/0000-0003-1142-4039</ext-link></p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>19</day>
<month>09</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>5</volume>
<elocation-id>1244024</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>06</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>08</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Tso, Magee, Huxley, Eastman and Fry.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Tso, Magee, Huxley, Eastman and Fry</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>Nitrogen (N) and phosphorus (P) are essential nutrients that are necessary for plant growth and support life in aquatic ecosystems. However, excessive N and P can lead to algal blooms that deplete oxygen, cause fish death, and release toxins that are harmful to humans. Estimates of N and P levels in rivers are typically calculated at station or grid (&#x0003E;1 km) scale; therefore, it is difficult to visualise the evolution of water quality as water travels downstream. Using a high-resolution reach-scale river network and associating each reach with land cover fractions and catchment descriptors, we trained random forest models on aggregated data (2010&#x02013;2020) from the Environment Agency Open Water Quality Data Archive for 2,343 stations to predict long-term nitrate and orthophosphate concentrations at each river reach in Great Britain (GB). We separated the model training and predictions for different seasons to investigate the potential difference in feature importance. Our model predicted concentrations with an average testing coefficient of determination (<italic>R</italic><sup>2</sup>) of 0.71 for nitrate and 0.58 for orthophosphate using 5-fold cross-validation. Our model showed slightly better performance for higher Strahler stream orders, highlighting the challenges of making predictions in small streams. Our results revealed that arable and horticultural land use is the strongest and most reliable predictor for nitrate, while floodplain extents and standard percentage runoff are stronger predictors for orthophosphate. Nationally, higher orthophosphate concentrations were observed in urbanised areas. This study shows how combining a river network model with machine learning can readily provide a network-wide understanding of the spatial distribution of water quality levels.</p></abstract>
<kwd-group>
<kwd>river network</kwd>
<kwd>machine learning</kwd>
<kwd>nutrients</kwd>
<kwd>water quality</kwd>
<kwd>random forest</kwd>
</kwd-group>
<counts>
<fig-count count="7"/>
<table-count count="5"/>
<equation-count count="4"/>
<ref-count count="73"/>
<page-count count="18"/>
<word-count count="11890"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Environmental Water Quality</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Highlights</title>
<list list-type="simple">
<list-item><p>- A method to map point water quality observations to river reaches is developed.</p></list-item>
<list-item><p>- Catchment descriptors and land covers are mapped to reaches and used as input features.</p></list-item>
<list-item><p>- Random forest models perform well for nitrate and orthophosphate over Great Britain.</p></list-item>
</list></sec>
<sec id="s2">
<title>1. Introduction</title>
<p>Anthropogenic demands for food, energy, and raw materials have reshaped the abundance and recycling of nitrogen (N) and phosphorus (P). Excess N and P input to the landscape can contaminate drinking water supplies and accelerate eutrophication, though they are important nutrients for plant and algal growth. To meet the UN Sustainable Development Goals, it has been argued that we must eliminate nutrient overuse and still allow a 30% increase in the production of major cereals (Mueller et al., <xref ref-type="bibr" rid="B41">2012</xref>).</p>
<p>Reliable prediction and modelling underpin water quality management practices. However, the abundances of N and P in rivers are controlled by multiple factors, and the underlying physical processes are often not easily observed. Therefore, it is challenging to model N and P distribution for large areas or at high frequencies using physically based process models. The mapping of nutrients in rivers has traditionally been performed using statistical models. Because data are sparse, many early efforts focussed on statistical modelling in small catchments using methods such as general linear models and non-linear estimation models (Howden and Burt, <xref ref-type="bibr" rid="B26">2009</xref>) or generalised additive models (Morton and Henderson, <xref ref-type="bibr" rid="B40">2008</xref>; Yang and Moyer, <xref ref-type="bibr" rid="B70">2020</xref>). These aimed to provide a robust regression for the estimation of non-linear trends in water quality in the presence of potentially correlated errors.</p>
<p>A very different type of modelling framework is the Source Apportionment Geographical Information System (SAGIS), which uses readily available national datasets to estimate concentrations of nutrients, among other chemicals, from multiple sector sources (Comber et al., <xref ref-type="bibr" rid="B12">2013</xref>). Concentrations and loads are modelled using the Environment Agency&#x00027;s catchment river model, SIMCAT, at the locations of model features or every 1 km along each river, taking into account all upstream sources and user-defined river losses. Similarly, the GREEN model is a simple three-parameter statistical model for source apportionment of riverine nutrient loads and has been applied widely across the European Union (Grizzetti et al., <xref ref-type="bibr" rid="B23">2005</xref>). Finally, process-based models are also used to simulate the chemical and biological status of river networks (Evans et al., <xref ref-type="bibr" rid="B18">2006</xref>). For example, INCA is a semi-distributed catchment model that is widely used in the UK and globally and can account for diffuse and point sources of pollution, land use change, and climate change (Whitehead et al., <xref ref-type="bibr" rid="B65">1998</xref>). This is done by accounting for all input sources and driving data and for the process pathways in different compartments (e.g., soil horizon, groundwater zone, in-stream water column, streambed, and sediments).</p>
<p>Recently, machine learning methods have been increasingly applied to water quality predictions (see review by Najah Ahmed et al., <xref ref-type="bibr" rid="B42">2019</xref>). While a small number of studies focus on classifying waters into discrete classes (O&#x00027;Sullivan et al., <xref ref-type="bibr" rid="B47">2022</xref>), most research seeks to predict quantitative values using regression. An important distinction within water quality machine learning applications is that while some focus on high-frequency predictions (i.e., daily or sub-daily), others focus on long-term or seasonal predictions. The former focuses on capturing the rapid dynamics of the system in response to changes in input variables and could be used for near real-time monitoring and early warning systems, while the latter is often applied to a large area and seeks to improve understanding of the key controls of overall water quality trends. For the first group, examples include Xu et al. (<xref ref-type="bibr" rid="B67">2021</xref>), who compared eight machine learning regressions to predict total nitrogen (TN) in the Lianjiang River basin, Guangdong, China; Granata et al. (<xref ref-type="bibr" rid="B22">2017</xref>), who compared support vector machines and regression trees to predict wastewater quality from surrogate variables and training data from the US National Stormwater Quality Database; and Ahmed et al. (<xref ref-type="bibr" rid="B1">2019</xref>), who compared the use of 15 supervised machine learning methods for water quality index predictions. Examples of the second group include the use of random forest modelling to explore the relationships between stream N and watershed features, climate, and N input rates at nearly 5,000 US watersheds (Lin J. et al., <xref ref-type="bibr" rid="B34">2021</xref>). In another study (Frei et al., <xref ref-type="bibr" rid="B19">2021</xref>), the importance of land use and land cover for lake vs. 
stream water quality was compared using four machine learning methods. Bhattarai et al. (<xref ref-type="bibr" rid="B6">2021</xref>) used machine learning algorithms to predict nitrate and total phosphorus for five watersheds of different types draining into Lake Erie, while Shen et al. (<xref ref-type="bibr" rid="B55">2020</xref>) estimated seasonal TN and total phosphorus (TP) maps at 30 arc-second (&#x0007E;1 km) spatial resolution using 47 global gridded environmental variables and the random forest (RF) algorithm. For a review of machine learning paradigms in hydrology, see Zounemat-Kermani et al. (<xref ref-type="bibr" rid="B73">2021</xref>).</p>
<p>Most existing river modelling studies are applied at point, pixel (usually 1 km or greater), or catchment scales. One common approach for modelling rivers is to model the entire area on a grid (typically at a resolution of 1 km or less) and display only the river pixels (e.g., Lane and Kay, <xref ref-type="bibr" rid="B33">2021</xref>). The use of river network graphs has emerged to improve the understanding of the physical properties of rivers and catchments as datasets of drainage and high-quality river graphs have become increasingly available (Demir and Szczepanek, <xref ref-type="bibr" rid="B16">2017</xref>; Giachetta and Willett, <xref ref-type="bibr" rid="B21">2018</xref>; Sarker et al., <xref ref-type="bibr" rid="B54">2019</xref>; Lin P. et al., <xref ref-type="bibr" rid="B35">2021</xref>). These graphs represent river networks as a series of connected lines and nodes and can better represent the evolution of water quality as chemicals are transported across the catchment. Flexible regression models have been successfully applied to the River Tweed catchment river network to model nitrate pollution, providing valuable insight into changes in water quality in both space and time (O&#x00027;Donnell et al., <xref ref-type="bibr" rid="B45">2014</xref>). However, their method requires flows at each stream to be known to obtain flow-based distance for smoothing, which can be challenging for nationwide modelling or mapping of nutrient levels. In addition, statistical regression requires the selection of kernels for smoothing. A potential alternative is to use random forest modelling to estimate nutrient levels on a river network graph. It is also noteworthy that river flow directions extracted from river reach network graphs have been used as input for the statistical modelling of water quality (Smith et al., <xref ref-type="bibr" rid="B56">1997</xref>).</p>
<p>In this study, we present a modelling framework that maps nationwide water quality levels from point observations to the United Kingdom (UK) river network graph. This approach is motivated by the need to develop a flexible and easy-to-use approach to map point data to river reaches by incorporating readily available ancillary datasets. Specifically, we used random forest and input features that can be readily matched to the network graph. We used this modelling framework to address the following research questions:</p>
<list list-type="order">
<list-item><p>What are the most important drivers for predicting nitrate and orthophosphate variability?</p></list-item>
<list-item><p>What is the long-term seasonal distribution of nitrate and orthophosphate in each river reach in GB?</p></list-item>
<list-item><p>What is the reach-scale variability of the predicted concentrations?</p></list-item>
</list>
<p>Catchment descriptors and land covers are readily available for all river reaches in the UK, and to the best of our knowledge, these have not been used for water quality prediction. While Shen et al. (<xref ref-type="bibr" rid="B55">2020</xref>) used gridded input datasets to predict N and P at a 1 km grid, in our study, we trained and predicted concentrations at point locations (which are matched to river reaches) within the river network. The rest of the article is arranged as follows: the methods and data used are described in Section 2. We report and compare the performance of various machine learning methods in Section 3, followed by discussions and conclusions in Sections 4 and 5.</p></sec>
<sec id="s3">
<title>2. Methods and data</title>
<sec>
<title>2.1. Method overview</title>
<p>The overarching framework for the river reach-level machine learning water quality prediction described herein is as follows (<xref ref-type="fig" rid="F1">Figure 1</xref>):</p>
<list list-type="order">
<list-item><p>Obtain access to a digital river network graph.</p></list-item>
<list-item><p>Match and append water quality data and input variables (e.g., catchment characteristics and land cover) from different data sources to each reach of the river network graph.</p></list-item>
<list-item><p>Extract data tables from the river network graph (i.e., remove geographical information).</p></list-item>
<list-item><p>Perform machine learning training and predictions.</p></list-item>
<list-item><p>Match the predictions back to the river network graph for visualisation and evaluation.</p></list-item>
</list>
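<p>As a minimal sketch of steps 2 to 5 above, the data flow can be illustrated in Python with toy in-memory tables. This is not the published workflow (see the notebooks cited below for that): all identifiers, such as <italic>reach_id</italic> and <italic>build_table</italic>, and all values are hypothetical, and a simple mean predictor stands in for the random forest model.</p>
<preformat>
```python
# Hypothetical toy data keyed by a reach identifier (names and values are illustrative).
reaches = {
    "r1": {"geometry": "LINESTRING(0 0, 1 1)", "arable": 0.6, "urban": 0.1},
    "r2": {"geometry": "LINESTRING(1 1, 2 1)", "arable": 0.2, "urban": 0.5},
    "r3": {"geometry": "LINESTRING(2 1, 3 2)", "arable": 0.4, "urban": 0.3},
}
observations = {"r1": 5.2, "r2": 2.1}  # observed concentration at monitored reaches


def build_table(reaches, observations):
    """Steps 2-3: join observations to reaches and drop the geometry column."""
    rows = []
    for reach_id, attrs in reaches.items():
        features = {k: v for k, v in attrs.items() if k != "geometry"}
        rows.append({"reach_id": reach_id, **features,
                     "target": observations.get(reach_id)})
    return rows


def fit_placeholder(train_rows):
    """Step 4 stand-in: always predicts the training mean (NOT a random forest)."""
    mean = sum(r["target"] for r in train_rows) / len(train_rows)
    return lambda row: mean


def map_back(reaches, rows, model):
    """Step 5: attach predictions to the reach records for visualisation."""
    return {r["reach_id"]: {**reaches[r["reach_id"]], "prediction": model(r)}
            for r in rows}


table = build_table(reaches, observations)
train = [r for r in table if r["target"] is not None]  # monitored reaches only
model = fit_placeholder(train)
mapped = map_back(reaches, table, model)  # every reach now carries a prediction
```
</preformat>
<p>In this sketch, the only link between the spatial and tabular representations is the reach identifier, which is why the modelling step can use any standard tabular learner.</p>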
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Overall framework for river reach-level machine learning predictions of water quality. Note that this framework is statistics-free and only involves the joining of data frames. It includes the following steps: (1) Input features and water quality observations are first matched to river reaches. (2) Spatial information is then removed to obtain a standard data frame so that standard machine learning methods can be used. (3) The spatial information is appended to the results for visualisation and post-analysis.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frwa-05-1244024-g0001.tif"/>
</fig>
<p>Details of the data sources and machine learning methods used to demonstrate this method are given in the remainder of this section. Jupyter notebooks to reproduce our workflow in Python are available in Magee et al. (<xref ref-type="bibr" rid="B38">2023</xref>).</p></sec>
<sec>
<title>2.2. Data sources</title>
<sec>
<title>2.2.1. High-resolution river network graph for the UK</title>
<p>In our study, we subdivided our analysis based on 107 UK hydrometric areas (National River Flow Archive, <xref ref-type="bibr" rid="B44">2014</xref>). These 107 hydrometric areas were either integral catchments with a single outlet to the sea or tidal estuary, or they included several river catchments having topographical similarity with separate tidal outlets. We also used a UK reach-level river network digitised from OS mapping at a 1:50,000 scale (Fry et al., <xref ref-type="bibr" rid="B20">2000</xref>). Canals and other artificial water bodies were removed, and the flow paths through lakes were represented by centrelines. River stretches contain connectivity information, but this was not explicitly used in this study. Most river stretches in the network represented the entire line between confluences and included bifurcations. The river network graph also included information such as length, identifier, and name of the parent river (for larger rivers), hydrometric area, and the Strahler and Shreve stream order for each reach.</p></sec>
<sec>
<title>2.2.2. UK Environment Agency (EA) water quality data</title>
<p>The Environment Agency maintains water quality monitoring data for a multitude of water sampling sites throughout England in the Water Quality Archive (WQA, <ext-link ext-link-type="uri" xlink:href="https://environment.data.gov.uk/water-quality/view/landing">https://environment.data.gov.uk/water-quality/view/landing</ext-link>). The archive covers a range of water body types, including coastal and estuarine waters, rivers, lakes, ponds, canals, and groundwaters. Readings are taken for a variety of purposes, including compliance assessments against discharge permits, environmental monitoring, and investigations of pollution incidents. The WQA only contains complete samples where all analyses have been completed. The data analysed for this study were accessed on 23 June 2021.</p>
<p>We extracted the data from WQA for 2010&#x02013;2020 for &#x0201C;orthophosphate, reactive as P&#x0201D; and &#x0201C;nitrate as N.&#x0201D; These datasets were further filtered to consider only sample material types from rivers or running surface water bodies. We then aggregated the data by taking the mean for each season for each year at each sampling location contained within the datasets (i.e., winter: December, January, and February; spring: March, April, and May; summer: June, July, and August; autumn: September, October, and November). Note that the typical sampling interval for nitrates and orthophosphates varies considerably but is roughly between biweekly and monthly. However, it is not uncommon that at some sites, there may be periods of more than 8 weeks between samples. To minimise the effects of outliers, we used only the middle 95% of the data. The modelling was performed on log-transformed nitrate and orthophosphate data.</p></sec>
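<p>The seasonal aggregation and log transformation described in Section 2.2.2 can be sketched as follows. This is an illustrative example using only the Python standard library; the sample values and the names <italic>SEASON_BY_MONTH</italic> and <italic>seasonal_means</italic> are hypothetical. Two assumptions are made that the text does not specify: December samples are grouped with their own calendar year, and the natural logarithm is used. The trimming to the middle 95% of the data is omitted.</p>
<preformat>
```python
import math
from collections import defaultdict
from datetime import date
from statistics import mean

# Season definitions as given in the text (winter = December, January, February, etc.).
SEASON_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def seasonal_means(samples):
    """Average (site, date, value) samples per site, calendar year, and season.

    Assumption: a December sample is grouped with its own calendar year.
    """
    groups = defaultdict(list)
    for site, day, value in samples:
        groups[(site, day.year, SEASON_BY_MONTH[day.month])].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical samples for a single site (values are illustrative only).
samples = [
    ("site_A", date(2015, 1, 10), 4.0),
    ("site_A", date(2015, 2, 14), 6.0),
    ("site_A", date(2015, 7, 3), 2.0),
    ("site_A", date(2015, 12, 20), 8.0),
]
means = seasonal_means(samples)

# Assumption: natural logarithm; the base is not stated in the text.
log_means = {key: math.log(value) for key, value in means.items()}
```
</preformat>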
<sec>
<title>2.2.3. Catchment descriptors and Land Cover Map</title>
<p>Unlike some of the studies mentioned earlier, we used physical descriptions of the catchment area to aid with predictions for the machine learning models in this study. The UK Centre for Ecology and Hydrology (CEH) develops and maintains a number of catchment descriptor datasets to inform its UK freshwater research. Catchment descriptors are available on a gridded representation of the UK at 50 m resolution&#x02014;the CEH Integrated Hydrological Digital Terrain Model, IHDTM (Morris and Flavin, <xref ref-type="bibr" rid="B39">1990</xref>), where the values for each cell represent the catchment upstream of that cell. Cells are connected using topographical information but also include information from mapped contours and the digital river network to ensure consistency where mapped surface water bodies are present. The Flood Estimation Handbook (FEH) provides a dataset of landform catchment descriptors for every grid cell with a catchment area &#x0003E; 0.5 km<sup>2</sup>. Further catchment descriptors, including land cover fractions from the UK Land Cover Map 2015 (Rowland et al., <xref ref-type="bibr" rid="B53">2017</xref>), are maintained and provided at the same scale by the National River Flow Archive. For this study, the FEH descriptors and catchment land cover statistics were extracted for each individual river reach within the digital river network. A representative IHDTM grid cell was identified for each reach as that closest to the 70th percentile by area of all grid cells intersecting with the reach river line to maximise the likelihood that the cell correctly lies on the river stretch. All catchment descriptors were extracted from the raster datasets and stored alongside the river reach attributes.</p>
<p>The full list of FEH descriptors and land cover input features initially considered for the machine learning algorithms is given in <xref ref-type="table" rid="T1">Table 1</xref>. The features chosen for the final model are discussed in Section 3. The FEH descriptors were then matched to the river stretch ID for the Environment Agency&#x00027;s phosphate and nitrate readings.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Description of catchment descriptors used as input features for the construction of water quality machine learning models.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Descriptor code</bold></th>
<th valign="top" align="left"><bold>Description</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">CCAR</td>
<td valign="top" align="left">Catchment drainage area (km<sup>2</sup>), derived from the IHDTM.</td>
</tr> <tr>
<td valign="top" align="left">HGHT</td>
<td valign="top" align="left">An estimate of the depth of precipitation for some specified duration and frequency or recurrence interval.</td>
</tr> <tr>
<td valign="top" align="left">QALT</td>
<td valign="top" align="left">Mean catchment altitude (m above sea level), derived from the IHDTM.</td>
</tr> <tr>
<td valign="top" align="left">QASB</td>
<td valign="top" align="left">Index representing the dominant aspect of catchment slopes (&#x000B0;).</td>
</tr> <tr>
<td valign="top" align="left">QASV</td>
<td valign="top" align="left">Index representing the invariability in aspect of catchment slopes.</td>
</tr> <tr>
<td valign="top" align="left">QBFI</td>
<td valign="top" align="left">Base flow index, a measure of catchment responsiveness derived using the 29-class Hydrology Of Soil Types (HOST) classification.</td>
</tr> <tr>
<td valign="top" align="left">QDPB</td>
<td valign="top" align="left">Mean of distances between each node on the IHDTM grid and the catchment outlet, in kilometres.</td>
</tr> <tr>
<td valign="top" align="left">QDPS</td>
<td valign="top" align="left">This landform descriptor (mean Drainage Path Slope) provides an index of overall catchment steepness. It was developed for the Flood Estimation Handbook and is calculated as the mean of all inter-nodal slopes (derived using the IHDTM) for the catchment.</td>
</tr> <tr>
<td valign="top" align="left">QFAR</td>
<td valign="top" align="left">The Flood Attenuation by Reservoirs and Lakes (FARL) index, developed for the Flood Estimation Handbook, provides a guide to the degree of flood attenuation attributable to reservoirs and lakes in the catchment. Values close to unity indicate the absence of attenuation due to lakes and reservoirs, whereas index values below 0.8 indicate a substantial influence on flood response.</td>
</tr> <tr>
<td valign="top" align="left">QFPD</td>
<td valign="top" align="left">The mean depth of water on floodplains in a 100-year event.</td>
</tr> <tr>
<td valign="top" align="left">QFPX</td>
<td valign="top" align="left">The floodplain extent is defined as the fraction of the catchment that is estimated to be inundated by a 100-year flood.</td>
</tr> <tr>
<td valign="top" align="left">QFPL</td>
<td valign="top" align="left">The location of floodplains within the catchment is described using the same principles employed to derive values of the FEH index URBLOC.</td>
</tr> <tr>
<td valign="top" align="left">QLDP</td>
<td valign="top" align="left">Longest drainage path (in kilometres), defined by recording the greatest distance from a catchment node to the defined outlet.</td>
</tr> <tr>
<td valign="top" align="left">QPRW</td>
<td valign="top" align="left">This catchment wetness index (PROPortion of time soils are WET), developed for the Flood Estimation Handbook, provides a measure of the proportion of time that catchment soils are defined as wet. PROPWET values range from over 80% in the wettest catchments to less than 20% in the driest parts of the country.</td>
</tr> <tr>
<td valign="top" align="left">QS47</td>
<td valign="top" align="left">Average annual rainfall in the standard period (1941&#x02013;1970) in millimetres.</td>
</tr> <tr>
<td valign="top" align="left">QS69</td>
<td valign="top" align="left">Average annual rainfall in the standard period (1961&#x02013;1990) in millimetres.</td>
</tr> <tr>
<td valign="top" align="left">QSPR</td>
<td valign="top" align="left">Standard percentage runoff (%) associated with each HOST soil class.</td>
</tr> <tr>
<td valign="top" align="left">QUCO</td>
<td valign="top" align="left">Index of the location of urban and suburban land cover in 1990 expressed as a fraction.</td>
</tr>
<tr>
<td valign="top" align="left">QUEX</td>
<td valign="top" align="left">Index of urban and suburban land cover in 1990 expressed as a fraction.</td>
</tr> <tr>
<td valign="top" align="left">QULO</td>
<td valign="top" align="left">Index of the location of urban and suburban land cover in 1990 expressed as a fraction.</td>
</tr> <tr>
<td valign="top" align="left">QUC2</td>
<td valign="top" align="left">Index of the location of urban and suburban land cover in 2000 expressed as a fraction.</td>
</tr>
<tr>
<td valign="top" align="left">QUE2</td>
<td valign="top" align="left">Index of urban and suburban land cover in 2000 expressed as a fraction.</td>
</tr> <tr>
<td valign="top" align="left">QUL2</td>
<td valign="top" align="left">Index of the location of urban and suburban land cover in 2000 expressed as a fraction.</td>
</tr> <tr>
<td valign="top" align="left">QB19</td>
<td valign="top" align="left">Centroid of the catchment (km) cover. first used in HiFlows-UK Version 3.</td>
</tr> <tr>
<td valign="top" align="left">QR1D</td>
<td valign="top" align="left">1-day average rainfall.</td>
</tr> <tr>
<td valign="top" align="left">QR1H</td>
<td valign="top" align="left">1-hour average rainfall.</td>
</tr> <tr>
<td valign="top" align="left">QR2D</td>
<td valign="top" align="left">2-day average rainfall.</td>
</tr> <tr>
<td valign="top" align="left">Arable and Horticulture</td>
<td valign="top" align="left">LCM2015 % land use: Arable and Horticulture</td>
</tr> <tr>
<td valign="top" align="left">Coastal</td>
<td valign="top" align="left">LCM2015 % land use: Coastal</td>
</tr> <tr>
<td valign="top" align="left">Grassland</td>
<td valign="top" align="left">LCM2015 % land use: Grassland</td>
</tr> <tr>
<td valign="top" align="left">Heath/Bog</td>
<td valign="top" align="left">LCM2015 % land use: Heath/Bog</td>
</tr> <tr>
<td valign="top" align="left">Inland Rock</td>
<td valign="top" align="left">LCM2015 % land use: Inland Rock</td>
</tr> <tr>
<td valign="top" align="left">Unknown</td>
<td valign="top" align="left">LCM2015 % land use: Unknown</td>
</tr> <tr>
<td valign="top" align="left">Urban</td>
<td valign="top" align="left">LCM2015 % land use: Urban</td>
</tr>
<tr>
<td valign="top" align="left">Water</td>
<td valign="top" align="left">LCM2015 % land use: Water</td>
</tr>
</tbody>
</table>
</table-wrap></sec></sec>
<sec>
<title>2.3. Training data, pre-processing, and feature engineering</title>
<p>After collating all data sources, the sample point IDs from the WQA were matched to the closest river stretch in the digital river network to integrate the two data sources. Note that this matching was a statistics-free process&#x02014;all variables were matched to a river reach. As illustrated in <xref ref-type="fig" rid="F1">Figure 1</xref>, all spatial information was excluded from the machine learning model and was used only in the visualisation and post-analysis of results. Once the datasets were integrated, rows with missing data were excluded. The filtered nitrate and phosphate datasets contained 5,187 and 5,594 rows, respectively, for winter and a similar number of rows for other seasons. Once all missing values were removed, continuous features were normalised using min-max transformation. The reason for such a transformation is that large input values in a neural network can result in a model that learns large weights; models with large weight values are often unstable and may perform poorly during learning, resulting in a higher generalisation error.</p>
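<p>For illustration, min-max transformation rescales each continuous feature linearly so that its minimum maps to 0 and its maximum maps to 1. The following is a minimal sketch in plain Python, not the implementation used in this study; the function name <italic>min_max_scale</italic>, the handling of constant features, and the example values are assumptions.</p>
<preformat>
```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to zeros to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative catchment areas in square kilometres (not real data).
catchment_area_km2 = [0.5, 12.0, 250.0, 1000.0]
scaled = min_max_scale(catchment_area_km2)
```
</preformat>
<p>Because the transformation is linear and order-preserving, the ranking of reaches by any feature is unchanged after scaling.</p>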
<p>Min-max scaling was also chosen since it does not affect Pearson correlation scores between the potential features, which helps with feature reduction in machine learning models. Feature reduction is a key part of data preprocessing, as reducing the dimensionality of the input data potentially reduces the execution time of machine learning algorithms, which is especially important for the tree-based algorithms implemented in this study. Irrelevant features within training data may also mislead the learning process of the final model, resulting in unexpected predictions. Including too many features may also result in overfitting of the model to the training data, resulting in poor predictions of new data (Kantardzic, <xref ref-type="bibr" rid="B31">2019</xref>).</p>
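<p>The invariance of the Pearson correlation under min-max scaling can be checked with a minimal Python sketch. The feature names (<italic>altitude</italic>, <italic>rainfall</italic>), the helper functions, and the synthetic values are hypothetical and are not taken from the datasets used in this study.</p>
<preformat><![CDATA[
```python
import math
import random


def min_max_scale(values):
    """Rescale a list of numbers linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Synthetic, hypothetical features on very different scales.
rng = random.Random(0)
altitude = [rng.uniform(0.0, 900.0) for _ in range(200)]
rainfall = [600.0 + 1.5 * a + rng.gauss(0.0, 150.0) for a in altitude]

r_raw = pearson(altitude, rainfall)
r_scaled = pearson(min_max_scale(altitude), min_max_scale(rainfall))
print(f"raw r = {r_raw:.6f}, scaled r = {r_scaled:.6f}")
```
]]></preformat>
<p>Because min-max scaling is a positive linear map of each feature, the two printed coefficients should agree up to floating-point rounding.</p>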
<p>Highly correlated features can often be considered candidates for feature reduction, as the inclusion of highly correlated features provides little extra information from the data. Feature selection algorithms tend to fall within two categories: filter and wrapper methods. Filter methods rely on the general characteristics of the considered datasets to select features without involving the training of any machine learning methods. Therefore, filter methods are not affected by any inherent bias in the machine learning methods used. Wrapper methods, on the other hand, take up large amounts of processing time due to the training of many machine learning models (Kohavi and John, <xref ref-type="bibr" rid="B32">1997</xref>). For this reason, a basic correlation filter was applied to all continuous features. For each feature, the first feature with which its absolute correlation exceeded 0.8 was removed from the datasets. The land cover and FEH descriptors used as input features, together with their descriptor codes, are listed in <xref ref-type="table" rid="T1">Table 1</xref>.</p></sec>
<sec>
<title>2.4. Seasonal nitrate and orthophosphate predictions</title>
<sec>
<title>2.4.1. Random forest regressor for water quality modelling</title>
<p>Random forest models offer great flexibility and high predictive performance for environmental applications (e.g., Tyralis et al., <xref ref-type="bibr" rid="B59">2019</xref>; Vergopolan et al., <xref ref-type="bibr" rid="B60">2021</xref>). In this study, we trained a random forest (RF) model (Ho, <xref ref-type="bibr" rid="B25">1995</xref>; Breiman, <xref ref-type="bibr" rid="B11">2001</xref>) to predict either nitrate or orthophosphate levels for each season. Input features and water quality observations were matched to river reaches. The data were then used for training and predictions in the RF model. The final results were matched back to the river reaches. Fitting different models for each season and each chemical species allowed the RF models to select the most relevant input features for the given data.</p>
<p>RF models are ensemble learners for classification and regression tasks. Ensemble methods use multiple weak learners to obtain a better predictive performance than using any of the constituent learning algorithms alone (Zhang, <xref ref-type="bibr" rid="B71">2012</xref>). Ensemble learners such as RF models always converge by the strong law of large numbers and provide a distinct advantage over single decision trees, as overfitting is less of a problem (Ho, <xref ref-type="bibr" rid="B25">1995</xref>; Breiman, <xref ref-type="bibr" rid="B11">2001</xref>). To generate each tree in this ensemble method, bagging is often utilised. Bagging, also referred to as bootstrap aggregation, works as follows: given an initial training dataset <italic>D</italic> of size <italic>N</italic>, bagging generates new training sets <italic>D</italic><sub><italic>i</italic></sub>, each of size <italic>n</italic>, by random sampling with replacement. If <italic>N</italic> &#x0003D; <italic>n</italic>, then for large <italic>n</italic>, the set of training data in <italic>D</italic><sub><italic>i</italic></sub> is expected to have the fraction 1 &#x02212; 1/<italic>e</italic> &#x02248; 63% of the unique examples of <italic>D</italic>, with the rest being duplicates (Aslam et al., <xref ref-type="bibr" rid="B3">2007</xref>). Sampling with replacement ensures that each bootstrap is independent of other bootstrapped samples since it does not depend on the previously chosen samples when sampling. For each training dataset <italic>D</italic><sub><italic>i</italic></sub>, a tree is trained, and the outputs of the trees are combined, usually as an average of all tree outputs or as a voting system for classification. Bagging reduces variance and hence limits overfitting; however, unlike single trees, bagging and ensemble learners lose interpretability. 
Moreover, sampling and generating many learners to produce suitable bagging ensemble models can be computationally expensive. RF models can also benefit from randomisation of features, where a random subset of features is considered for splitting at each node. Bagging and the ability to consider random feature subsets when splitting tree nodes decrease the variance of the RF estimator. Moreover, for regression tasks, averaging the tree predictions mitigates errors within single trees when the number of estimators is large.</p>
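<p>The expected fraction of unique examples in a bootstrap sample can be checked numerically. The following is a minimal Python sketch; the sample size, the number of resamples, and the function name are hypothetical choices made for illustration only.</p>
<preformat><![CDATA[
```python
import math
import random


def unique_fraction(n, rng):
    """Draw one bootstrap sample of size n from n items (with replacement)
    and return the fraction of distinct original items it contains."""
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n


rng = random.Random(42)
n = 5000           # hypothetical dataset size
n_bootstraps = 50  # hypothetical number of resamples

fractions = [unique_fraction(n, rng) for _ in range(n_bootstraps)]
mean_fraction = sum(fractions) / n_bootstraps
expected = 1.0 - 1.0 / math.e  # theoretical limit for large n

print(f"simulated: {mean_fraction:.3f}, theoretical: {expected:.3f}")
```
]]></preformat>
<p>For a dataset of this size, the simulated mean should lie close to the theoretical value of 1 &#x02212; 1/<italic>e</italic>.</p>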
<p>To obtain the optimal RF model, we tested each RF model with combinations of hyperparameters, which are listed in <xref ref-type="table" rid="T2">Table 2</xref>. We have reported only the results from RF methods in this study. For a comparison of the performance of different machine learning methods, see the preliminary study of Huxley (<xref ref-type="bibr" rid="B28">2021</xref>).</p>
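<p>As a rough illustration of how a random grid search draws candidates from the values in <xref ref-type="table" rid="T2">Table 2</xref>, consider the following Python sketch. The dictionary keys follow the parameter names of scikit-learn&#x00027;s <monospace>RandomForestRegressor</monospace>, which is an assumption, and the search budget <monospace>n_iter</monospace> is hypothetical, as it is not stated here.</p>
<preformat><![CDATA[
```python
import itertools
import random

# Candidate values copied from Table 2; key names are an assumption.
param_grid = {
    "n_estimators": list(range(25, 251, 25)),
    "min_samples_split": [2, 5, 10, 15, 20],
    "min_samples_leaf": [1, 5, 10, 15, 20],
}

# Full Cartesian product of the grid.
names = list(param_grid)
all_combinations = [
    dict(zip(names, values))
    for values in itertools.product(*(param_grid[name] for name in names))
]

# A random grid search evaluates only a random subset of the combinations.
n_iter = 20  # hypothetical search budget
rng = random.Random(0)
candidates = rng.sample(all_combinations, n_iter)

print(len(all_combinations), "combinations in total")
for params in candidates[:3]:
    print(params)
```
]]></preformat>
<p>Each sampled dictionary would then be used to fit and score one candidate model; the fitting step is omitted from this sketch.</p>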
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>The combinations of hyperparameters tested in a random grid search to optimise the random forest model.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Hyperparameter</bold></th>
<th valign="top" align="left"><bold>Values tested</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Number of tree estimators</td>
<td valign="top" align="left">25, 50, 75, 100, 125, 150, 175, 200, 225, 250</td>
</tr> <tr>
<td valign="top" align="left">Minimum samples needed to split a node</td>
<td valign="top" align="left">2, 5, 10, 15, 20</td>
</tr> <tr>
<td valign="top" align="left">Minimum samples needed to form a leaf node</td>
<td valign="top" align="left">1, 5, 10, 15, 20</td>
</tr></tbody>
</table>
</table-wrap></sec>
<sec>
<title>2.4.2. Feature importance and selection</title>
<p>To avoid overfitting, we performed a two-step procedure to select input features for the final RF models. First, we ran full RF models with all available features and ranked the features by descending importance values. Subsequently, the list of features was iterated by adding one feature at a time and calculating the variance inflation factor (VIF), which is defined as <italic>VIF</italic> &#x0003D; 1/(1 &#x02212; <italic>R</italic><sup>2</sup>), where <italic>R</italic><sup>2</sup> is the coefficient of determination between two feature pairs. If the inclusion of the feature caused the VIF to exceed 10, then the feature was dropped. Otherwise, the feature was retained.</p></sec>
<sec>
<title>2.4.3. Cross-validation</title>
<p>It is not recommended to train a model on the same data it will be tested on since machine learning models tend to overfit the training data (Srivastava et al., <xref ref-type="bibr" rid="B57">2014</xref>). Machine learning algorithms should be developed to maximise predictive accuracy on new data, not necessarily the training data. Fixating on the best possible fit to the training data leads a model to fit noise by memorising the peculiarities of that data rather than finding a general predictive rule (i.e., overfitting; Dietterich, <xref ref-type="bibr" rid="B17">1995</xref>). To analyse whether a machine learning model is overfitted to the training data, we can use cross-validation and assess the performance of a machine learning algorithm on separate testing datasets.</p>
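<p>The k-fold cross-validation used in this section can be pictured with a minimal Python sketch that partitions sample indices into folds and holds out each fold once. The function names, the ten samples, and the choice of three folds are hypothetical and serve only to keep the printed output short.</p>
<preformat><![CDATA[
```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(folds):
    """Yield (train, test) index lists, holding out each fold exactly once."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


folds = k_fold_indices(n_samples=10, k=3)
for train, test in train_test_splits(folds):
    print("train:", sorted(train), "test:", sorted(test))
```
]]></preformat>
<p>In practice, a model would be fitted on each training index set and scored on the matching held-out fold, and the scores would then be averaged.</p>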
<p>To implement a random grid search, each nitrate and orthophosphate dataset was split into a training and a testing set. This was done by randomly assigning data points to the sets, with each testing set comprising 25% of the original dataset and each training set the remainder. The grid search was performed on the training sets with k-fold cross-validation where <italic>k</italic> &#x0003D; 4. In k-fold validation, the data are partitioned into k equal or nearly equal sets using a stratification process or randomisation. Training and testing are performed on these partitioned sets, referred to as folds, in k iterations such that at each iteration one fold is left out for testing the trained model, while the remaining k-1 folds are used for training (Yadav and Shukla, <xref ref-type="bibr" rid="B68">2016</xref>). The performance of the machine learning algorithm is determined by the mean of the metric scores of the k iterations. It has been shown for classification problems that k-fold validation provides a good indicator of model performance for large datasets. This is despite a trade-off between the number of cross-validation folds and the computation time for evaluating metrics, where more folds lead to increased computation time (Yadav and Shukla, <xref ref-type="bibr" rid="B68">2016</xref>). K-fold validation was selected over other validation techniques, such as &#x0201C;hold one out,&#x0201D; mainly due to time and computational constraints. &#x0201C;Hold one out&#x0201D; trains the model with the whole training set except a single point and tests with that single point. For a random grid search, this would have led to a longer search time for the best hyperparameters compared to a k-fold validation approach due to the greater number of models trained. 
Using the &#x0201C;hold one out&#x0201D; method with a large training set could also lead to the selection of a hyperparameter set that overfits, with more outlier trends being learnt, producing a final model that generalises poorly to new data. Due to the trade-off between the number of folds and computation time, k = 3 folds were used in the random grid search for hyperparameters to reduce searching time. Once the hyperparameters were selected, final models, which used all three folds as the final training data, were trained. The final performance was determined using metrics on a held-out testing dataset, as illustrated in the next section.</p></sec>
<sec>
<title>2.4.4. Performance evaluation</title>
<p>To evaluate the performance of our machine learning methods, we considered the mean squared error (MSE), Nash-Sutcliffe model efficiency coefficient (NSE; Nash and Sutcliffe, <xref ref-type="bibr" rid="B43">1970</xref>), and the Kling-Gupta efficiency (KGE; Gupta et al., <xref ref-type="bibr" rid="B24">2009</xref>). The mean squared error (MSE) is defined as:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>/</mml:mo><mml:mi>n</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>x</italic><sub><italic>i</italic></sub> represents the observation and <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the predicted value for data (<italic>i</italic>). The Nash-Sutcliffe model efficiency coefficient (NSE) is defined as:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M4"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> represents the mean of observations. The Kling-Gupta efficiency (KGE) is defined as:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>K</mml:mi><mml:mi>G</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msqrt><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B1;</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B2;</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msqrt></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>r</italic> is the linear correlation between observations and simulations, &#x003B1; is a measure of the variability error, and &#x003B2; is a bias term, which can also be written as:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>K</mml:mi><mml:mi>G</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msqrt><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:mfrac><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:mfrac><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msqrt></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
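<p>where &#x003BC; and &#x003C3; correspond to the mean and standard deviation of the observations, respectively, and the hat denotes the corresponding statistic of the simulations. NSE = 1 and KGE = 1 indicate perfect agreement between simulations and observations. NSE = 0 indicates that the simulations are only as accurate as the mean of observations, while NSE &#x0003C; 0 indicates that the mean of observations provides better estimates than the simulations.</p></sec></sec></sec>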
<sec id="s4">
<title>3. Results</title>
<sec>
<title>3.1. Feature selections and predictions</title>
<p>As discussed in the methods section, we adopted a two-step approach: we first ran full RF models with all features and then selected a subset of the features to run the final RF models. <xref ref-type="table" rid="T3">Table 3</xref> shows the features selected for the models for each water quality species and season. In all models, coastal and unknown land use had zero feature importance. The Flood Attenuation by Reservoirs and Lakes (FARL) index [QFAR], catchment wetness index [QPRW], and 1-day, 2-day, and 1-h average rainfall [QR1D, QR2D, and QR1H] were not selected in any model. Arable and horticulture land use was an important feature in all nitrate models. While five or more catchment descriptors were selected as input features in all other models, only three and four were selected for the winter and autumn nitrate models, respectively. While none of the nitrate models selected grassland as an input feature, it was included in three of the four orthophosphate models. For predicting orthophosphate in winter, fewer land use features were selected, while average annual rainfall [QS69] and arable and horticulture were selected instead.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Feature screening results.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th/>
<th valign="top" align="center" colspan="4"><bold>Nitrate</bold></th>
<th valign="top" align="center" colspan="4"><bold>Orthophosphate</bold></th>
</tr>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="center"><bold>Feature</bold></th>
<th valign="top" align="center"><bold>Spring</bold></th>
<th valign="top" align="center"><bold>Summer</bold></th>
<th valign="top" align="center"><bold>Autumn</bold></th>
<th valign="top" align="center"><bold>Winter</bold></th>
<th valign="top" align="center"><bold>Spring</bold></th>
<th valign="top" align="center"><bold>Summer</bold></th>
<th valign="top" align="center"><bold>Autumn</bold></th>
<th valign="top" align="center"><bold>Winter</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">CCAR</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
</tr> <tr>
<td valign="top" align="left">HGHT</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
</tr> <tr>
<td valign="top" align="left">QALT</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
</tr> <tr>
<td valign="top" align="left">QASB</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
</tr> <tr>
<td valign="top" align="left">QASV</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center">0.03</td>
</tr> <tr>
<td valign="top" align="left">QBFI</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.06</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.08</bold></td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
<td valign="top" align="center">0.04</td>
</tr> <tr>
<td valign="top" align="left">QDPB</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
</tr> <tr>
<td valign="top" align="left">QDPS</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center">0.05</td>
</tr> <tr>
<td valign="top" align="left">QFAR</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
</tr> <tr>
<td valign="top" align="left">QFPD</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.08</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.09</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
</tr> <tr>
<td valign="top" align="left">QFPL</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
</tr> <tr>
<td valign="top" align="left">QFPX</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.10</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
</tr> <tr>
<td valign="top" align="left">QLDP</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.01</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
</tr> <tr>
<td valign="top" align="left">QPRW</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
</tr> <tr>
<td valign="top" align="left">QR1D</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
</tr> <tr>
<td valign="top" align="left">QR1H</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
</tr> <tr>
<td valign="top" align="left">QR2D</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">0.03</td>
</tr> <tr>
<td valign="top" align="left">QS69</td>
<td valign="top" align="center">0.06</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.08</bold></td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.09</bold></td>
</tr> <tr>
<td valign="top" align="left">QSPR</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.06</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.06</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.07</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
</tr> <tr>
<td valign="top" align="left">Arable and horticulture</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.35</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.19</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.32</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.45</bold></td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
</tr> <tr>
<td valign="top" align="left">Coastal</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr> <tr>
<td valign="top" align="left">Grassland</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center">0.04</td>
</tr> <tr>
<td valign="top" align="left">Heath/bog</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.01</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.01</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center">0.03</td>
</tr> <tr>
<td valign="top" align="left">Inland rock</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.01</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.01</bold></td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.01</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center">0.02</td>
</tr> <tr>
<td valign="top" align="left">Unknown</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr> <tr>
<td valign="top" align="left">Urban</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.04</bold></td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.08</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.07</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.07</bold></td>
</tr> <tr>
<td valign="top" align="left">Water</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.02</bold></td>
</tr> <tr>
<td valign="top" align="left">Woodland</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.03</bold></td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.06</bold></td>
<td valign="top" align="center" style="background-color:&#x00023;BCDDF7"><bold>0.05</bold></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>Features with highlighted, bold values were used in the random forest models. Note that some features with higher importance were skipped because including them raised the VIF above 10.</p>
</table-wrap-foot>
</table-wrap>
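<p>The VIF-constrained selection described in the footnote of <xref ref-type="table" rid="T3">Table 3</xref> can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a greedy pass over the features in descending order of importance that skips any feature whose inclusion pushes the largest VIF above 10. The function names (vif, select_features), the feature names, and the synthetic data are hypothetical.</p>
<preformat>
```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n_samples, n_features)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on an intercept plus all other columns.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def select_features(X, importance, names, vif_limit=10.0):
    """Greedy selection (an assumption): keep features by importance unless VIF exceeds the limit."""
    order = np.argsort(importance)[::-1]  # most important first
    kept = []
    for j in order:
        trial = kept + [int(j)]
        if np.max(vif(X[:, trial])) > vif_limit:
            continue  # skipped: adding this feature inflates the VIF too much
        kept = trial
    return [names[j] for j in kept]


# Synthetic example: feat_b is almost a copy of feat_a, feat_c is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.01 * rng.normal(size=200)
c = rng.normal(size=200)
X = np.column_stack([a, b, c])
importance = np.array([0.5, 0.4, 0.1])  # hypothetical importance scores
selected = select_features(X, importance, ["feat_a", "feat_b", "feat_c"])
print(selected)
```
</preformat>
<p>In this synthetic case, the second most important feature is skipped because it is nearly collinear with the first, which mirrors the situation described in the table footnote.</p>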
<p>The selected features listed in <xref ref-type="table" rid="T3">Table 3</xref> were then used to run the final RF models and the final feature selection results are reported in <xref ref-type="fig" rid="F2">Figure 2</xref>. For nitrate models, arable and horticulture land use was by far the most important input feature, while other land use features mostly had low importance. Catchment descriptors tended to have higher importance in autumn and winter, partly because fewer of them were selected in the previous stage. In the orthophosphate models, the contributions of feature importance were much more evenly distributed. The spring, summer, and autumn models were very similar, while the winter model had a rather different set of features, and their feature importance values were non-trivial. Specifically, the longest drainage length [QLDP], average annual rainfall [QS69], and arable and horticulture land use were included, while the baseflow index [QBFI], mean distance to catchment outlet [QDPB], and grassland were excluded. Catchment drainage area [CCAR], catchment slope invariability [QASB], mean depth of water and floodplain extent of a 100-year event [QFPD, QFPX], standard percentage runoff [QSPR], and urban and woodland land use were included in all orthophosphate models.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Final feature selection results of the random forest models. Note that features with zero importance are not included in the final models.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frwa-05-1244024-g0002.tif"/>
</fig></sec>
<sec>
<title>3.2. Overall model performance</title>
<sec>
<title>3.2.1. Nitrate models</title>
<p>Once the predictions of nitrate concentrations were made by the RF model at each river reach, they were mapped back to the river network graph. <xref ref-type="fig" rid="F3">Figure 3</xref> shows the long-term predicted nitrate levels at each river reach in GB for each season. Central and eastern England were predicted to have higher nitrate concentrations, which were higher in winter and spring than in summer and autumn. The exception was the Pennines, which had a low nitrate concentration that may be attributed to its topography, lower-intensity land use, lack of sewage inputs, and low base flow index. In Scotland and northern England, higher nitrate concentrations were mostly observed only on the east coast. Although small streams in remote areas had higher nitrate concentrations in some cases, nitrate concentrations were generally higher in bigger streams.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Predictions of nitrate concentration in rivers across GB. Note line widths are proportional to Strahler stream order (i.e., thicker for larger streams downstream).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frwa-05-1244024-g0003.tif"/>
</fig>
<p><xref ref-type="fig" rid="F5">Figures 5A</xref>, <xref ref-type="fig" rid="F5">B</xref> illustrate the nitrate model performance in training and testing. For brevity, we grouped the results from all seasons together. The training data achieved a very good <italic>R</italic><sup>2</sup> value of 0.96 (NSE of 0.91 and KGE of 0.83), and there was a very high density along the 1:1 line. It was also obvious from the hexbin plot that nitrate observations were mostly concentrated between 1.0 and 2.0 of the log-transformed data, and there was a long tail for concentrations below 1.0.</p>
<p>There is evidence that the nitrate RF model exhibited slight overfitting, as the testing MSE of 0.25 (NSE of 0.51 and KGE of 0.61) was not as good as the training MSE. However, its <italic>R</italic><sup>2</sup> value of 0.71 was good, and despite some spread, the scatter points fell well along the 1:1 line.</p>
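<p>For reference, the three scores quoted above can be computed from paired observed and predicted values as in the sketch below. The source does not state which KGE variant was used; the code assumes the original 2009 form based on the correlation, the ratio of standard deviations, and the ratio of means. The function names and the toy arrays are hypothetical and do not reproduce any value in this article.</p>
<preformat>
```python
import numpy as np


def mse(obs, sim):
    """Mean squared error between observations and predictions."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.mean((sim - obs) ** 2))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance about the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))


def kge(obs, sim):
    """Kling-Gupta efficiency, 2009 form (an assumption about the variant)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


# Toy data: predictions that are perfectly correlated but offset by +1.
obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = obs + 1.0
print(mse(obs, sim), nse(obs, sim), kge(obs, sim))
```
</preformat>
<p>Computing the same scores separately on the training and testing subsets gives the kind of gap that is used above as evidence of overfitting.</p>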
<p>Spatially, <xref ref-type="fig" rid="F6">Figure 6</xref> and <xref ref-type="table" rid="T4">Table 4</xref> show that the nitrate RF model performed well and generalised well across the whole of England, based on the testing MSE values for each hydrometric area (HA, see <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 1</xref>). On average, the RF models made better predictions and showed more consistent performance in the larger HAs and in those not along the south and northeast coasts. Many HAs reported a good NSE of 0.3 or above. A few HAs reported negative NSEs, indicating that they had issues reproducing the mean. These were small HAs, so the negative NSE values can be attributed to their small sample size and are not an indication of the model&#x00027;s predictive power in general. For KGE, better-than-average performances were observed in Tweed (HA = 21) and the HAs on the southwest coast.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Model performance metrics based on hydrometric areas (HAs) in England.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="center" colspan="2"></th>
<th valign="top" align="center" colspan="2"><bold>MSE</bold></th>
<th valign="top" align="center" colspan="2"><bold>NSE</bold></th>
<th valign="top" align="center" colspan="2"><bold>KGE</bold></th>
</tr>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>HA</bold></th>
<th valign="top" align="left"><bold>HA name</bold></th>
<th valign="top" align="center"><bold>Nitrate</bold></th>
<th valign="top" align="center"><bold>Orthophosphate</bold></th>
<th valign="top" align="center"><bold>Nitrate</bold></th>
<th valign="top" align="center"><bold>Orthophosphate</bold></th>
<th valign="top" align="center"><bold>Nitrate</bold></th>
<th valign="top" align="center"><bold>Orthophosphate</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">21</td>
<td valign="top" align="left">Tweed</td>
<td valign="top" align="center">0.048</td>
<td valign="top" align="center">0.729</td>
<td valign="top" align="center">0.103</td>
<td valign="top" align="center">&#x02212;inf</td>
<td valign="top" align="center">0.419</td>
<td valign="top" align="center">nan</td>
</tr> <tr>
<td valign="top" align="left">22</td>
<td valign="top" align="left">Coquet Group</td>
<td valign="top" align="center">0.439</td>
<td valign="top" align="center">1.236</td>
<td valign="top" align="center">&#x02212;1.139</td>
<td valign="top" align="center">&#x02212;1.106</td>
<td valign="top" align="center">0.103</td>
<td valign="top" align="center">0.407</td>
</tr> <tr>
<td valign="top" align="left">23</td>
<td valign="top" align="left">Tyne (Northumberland)</td>
<td valign="top" align="center">0.39</td>
<td valign="top" align="center">1.454</td>
<td valign="top" align="center">0.25</td>
<td valign="top" align="center">&#x02212;0.235</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">&#x02212;0.101</td>
</tr> <tr>
<td valign="top" align="left">24</td>
<td valign="top" align="left">Wear</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="center">1.305</td>
<td valign="top" align="center">0.234</td>
<td valign="top" align="center">0.157</td>
<td valign="top" align="center">0.271</td>
<td valign="top" align="center">0.226</td>
</tr> <tr>
<td valign="top" align="left">25</td>
<td valign="top" align="left">Tees Group</td>
<td valign="top" align="center">0.246</td>
<td valign="top" align="center">0.798</td>
<td valign="top" align="center">0.407</td>
<td valign="top" align="center">0.481</td>
<td valign="top" align="center">0.56</td>
<td valign="top" align="center">0.548</td>
</tr> <tr>
<td valign="top" align="left">27</td>
<td valign="top" align="left">Ouse (Yorkshire)</td>
<td valign="top" align="center">0.211</td>
<td valign="top" align="center">0.943</td>
<td valign="top" align="center">0.578</td>
<td valign="top" align="center">0.299</td>
<td valign="top" align="center">0.693</td>
<td valign="top" align="center">0.451</td>
</tr> <tr>
<td valign="top" align="left">28</td>
<td valign="top" align="left">Trent</td>
<td valign="top" align="center">0.259</td>
<td valign="top" align="center">0.92</td>
<td valign="top" align="center">0.401</td>
<td valign="top" align="center">0.213</td>
<td valign="top" align="center">0.557</td>
<td valign="top" align="center">0.322</td>
</tr> <tr>
<td valign="top" align="left">29</td>
<td valign="top" align="left">Ancholme Group</td>
<td valign="top" align="center">0.226</td>
<td valign="top" align="center">1.14</td>
<td valign="top" align="center">&#x02212;0.208</td>
<td valign="top" align="center">&#x02212;0.097</td>
<td valign="top" align="center">0.378</td>
<td valign="top" align="center">&#x02212;0.006</td>
</tr> <tr>
<td valign="top" align="left">30</td>
<td valign="top" align="left">Witham and Steeping</td>
<td valign="top" align="center">0.241</td>
<td valign="top" align="center">1.06</td>
<td valign="top" align="center">0.263</td>
<td valign="top" align="center">0.081</td>
<td valign="top" align="center">0.532</td>
<td valign="top" align="center">0.124</td>
</tr> <tr>
<td valign="top" align="left">31</td>
<td valign="top" align="left">Welland</td>
<td valign="top" align="center">0.208</td>
<td valign="top" align="center">0.885</td>
<td valign="top" align="center">0.507</td>
<td valign="top" align="center">&#x02212;0.037</td>
<td valign="top" align="center">0.556</td>
<td valign="top" align="center">0.106</td>
</tr> <tr>
<td valign="top" align="left">32</td>
<td valign="top" align="left">Nene</td>
<td valign="top" align="center">0.195</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">0.553</td>
<td valign="top" align="center">&#x02212;0.007</td>
<td valign="top" align="center">0.563</td>
<td valign="top" align="center">0.236</td>
</tr> <tr>
<td valign="top" align="left">33</td>
<td valign="top" align="left">Great Ouse</td>
<td valign="top" align="center">0.2</td>
<td valign="top" align="center">0.802</td>
<td valign="top" align="center">0.213</td>
<td valign="top" align="center">0.268</td>
<td valign="top" align="center">0.436</td>
<td valign="top" align="center">0.328</td>
</tr> <tr>
<td valign="top" align="left">34</td>
<td valign="top" align="left">Norfolk Rivers Group</td>
<td valign="top" align="center">0.182</td>
<td valign="top" align="center">0.89</td>
<td valign="top" align="center">0.236</td>
<td valign="top" align="center">0.165</td>
<td valign="top" align="center">0.411</td>
<td valign="top" align="center">0.285</td>
</tr> <tr>
<td valign="top" align="left">35</td>
<td valign="top" align="left">East Suffolk Rivers</td>
<td valign="top" align="center">0.147</td>
<td valign="top" align="center">0.446</td>
<td valign="top" align="center">0.434</td>
<td valign="top" align="center">0.433</td>
<td valign="top" align="center">0.457</td>
<td valign="top" align="center">0.459</td>
</tr> <tr>
<td valign="top" align="left">36</td>
<td valign="top" align="left">Stour (Essex and Suffolk)</td>
<td valign="top" align="center">0.163</td>
<td valign="top" align="center">0.612</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">&#x02212;0.64</td>
<td valign="top" align="center">0.271</td>
<td valign="top" align="center">&#x02212;0.002</td>
</tr> <tr>
<td valign="top" align="left">37</td>
<td valign="top" align="left">Essex Rivers Group</td>
<td valign="top" align="center">0.194</td>
<td valign="top" align="center">0.446</td>
<td valign="top" align="center">0.26</td>
<td valign="top" align="center">0.159</td>
<td valign="top" align="center">0.544</td>
<td valign="top" align="center">0.293</td>
</tr> <tr>
<td valign="top" align="left">38</td>
<td valign="top" align="left">Lee</td>
<td valign="top" align="center">0.219</td>
<td valign="top" align="center">1.082</td>
<td valign="top" align="center">0.233</td>
<td valign="top" align="center">&#x02212;0.027</td>
<td valign="top" align="center">0.547</td>
<td valign="top" align="center">0.121</td>
</tr> <tr>
<td valign="top" align="left">39</td>
<td valign="top" align="left">Thames</td>
<td valign="top" align="center">0.314</td>
<td valign="top" align="center">0.79</td>
<td valign="top" align="center">0.163</td>
<td valign="top" align="center">0.237</td>
<td valign="top" align="center">0.289</td>
<td valign="top" align="center">0.318</td>
</tr> <tr>
<td valign="top" align="left">40</td>
<td valign="top" align="left">Kent Rivers Group</td>
<td valign="top" align="center">0.393</td>
<td valign="top" align="center">0.733</td>
<td valign="top" align="center">0.196</td>
<td valign="top" align="center">0.352</td>
<td valign="top" align="center">0.358</td>
<td valign="top" align="center">0.407</td>
</tr> <tr>
<td valign="top" align="left">41</td>
<td valign="top" align="left">Sussex Rivers Group</td>
<td valign="top" align="center">0.437</td>
<td valign="top" align="center">0.855</td>
<td valign="top" align="center">0.266</td>
<td valign="top" align="center">0.27</td>
<td valign="top" align="center">0.381</td>
<td valign="top" align="center">0.407</td>
</tr> <tr>
<td valign="top" align="left">42</td>
<td valign="top" align="left">Hampshire Rivers Group</td>
<td valign="top" align="center">0.355</td>
<td valign="top" align="center">0.55</td>
<td valign="top" align="center">0.34</td>
<td valign="top" align="center">0.142</td>
<td valign="top" align="center">0.468</td>
<td valign="top" align="center">0.475</td>
</tr> <tr>
<td valign="top" align="left">43</td>
<td valign="top" align="left">Avon and Stour</td>
<td valign="top" align="center">0.117</td>
<td valign="top" align="center">0.551</td>
<td valign="top" align="center">0.54</td>
<td valign="top" align="center">0.409</td>
<td valign="top" align="center">0.594</td>
<td valign="top" align="center">0.476</td>
</tr> <tr>
<td valign="top" align="left">44</td>
<td valign="top" align="left">Frome Group</td>
<td valign="top" align="center">0.138</td>
<td valign="top" align="center">0.465</td>
<td valign="top" align="center">0.327</td>
<td valign="top" align="center">&#x02212;0.101</td>
<td valign="top" align="center">0.444</td>
<td valign="top" align="center">0.16</td>
</tr> <tr>
<td valign="top" align="left">45</td>
<td valign="top" align="left">Exe Group</td>
<td valign="top" align="center">0.078</td>
<td valign="top" align="center">0.479</td>
<td valign="top" align="center">0.408</td>
<td valign="top" align="center">0.157</td>
<td valign="top" align="center">0.712</td>
<td valign="top" align="center">0.476</td>
</tr> <tr>
<td valign="top" align="left">46</td>
<td valign="top" align="left">Dart Group</td>
<td valign="top" align="center">0.269</td>
<td valign="top" align="center">0.924</td>
<td valign="top" align="center">0.445</td>
<td valign="top" align="center">&#x02212;0.338</td>
<td valign="top" align="center">0.561</td>
<td valign="top" align="center">&#x02212;0.149</td>
</tr> <tr>
<td valign="top" align="left">47</td>
<td valign="top" align="left">Tamar Group</td>
<td valign="top" align="center">0.095</td>
<td valign="top" align="center">1.407</td>
<td valign="top" align="center">0.539</td>
<td valign="top" align="center">&#x02212;0.587</td>
<td valign="top" align="center">0.7</td>
<td valign="top" align="center">&#x02212;0.004</td>
</tr> <tr>
<td valign="top" align="left">48</td>
<td valign="top" align="left">Fal Group</td>
<td valign="top" align="center">0.107</td>
<td valign="top" align="center">1.045</td>
<td valign="top" align="center">0.637</td>
<td valign="top" align="center">&#x02212;0.058</td>
<td valign="top" align="center">0.732</td>
<td valign="top" align="center">0.199</td>
</tr> <tr>
<td valign="top" align="left">49</td>
<td valign="top" align="left">Camel Group</td>
<td valign="top" align="center">0.137</td>
<td valign="top" align="center">1.323</td>
<td valign="top" align="center">0.264</td>
<td valign="top" align="center">&#x02212;0.029</td>
<td valign="top" align="center">0.493</td>
<td valign="top" align="center">0.179</td>
</tr> <tr>
<td valign="top" align="left">50</td>
<td valign="top" align="left">Taw and Torridge</td>
<td valign="top" align="center">0.112</td>
<td valign="top" align="center">0.857</td>
<td valign="top" align="center">0.609</td>
<td valign="top" align="center">&#x02212;0.077</td>
<td valign="top" align="center">0.673</td>
<td valign="top" align="center">0.45</td>
</tr> <tr>
<td valign="top" align="left">51</td>
<td valign="top" align="left">East Lyn Group</td>
<td valign="top" align="center">0.038</td>
<td valign="top" align="center">0.011</td>
<td valign="top" align="center">0.853</td>
<td valign="top" align="center">0.491</td>
<td valign="top" align="center">0.67</td>
<td valign="top" align="center">0.337</td>
</tr> <tr>
<td valign="top" align="left">52</td>
<td valign="top" align="left">Somerset Rivers Group</td>
<td valign="top" align="center">0.253</td>
<td valign="top" align="center">0.609</td>
<td valign="top" align="center">0.27</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.38</td>
<td valign="top" align="center">0.328</td>
</tr> <tr>
<td valign="top" align="left">53</td>
<td valign="top" align="left">Avon (Bristol)</td>
<td valign="top" align="center">0.099</td>
<td valign="top" align="center">0.658</td>
<td valign="top" align="center">0.447</td>
<td valign="top" align="center">0.165</td>
<td valign="top" align="center">0.536</td>
<td valign="top" align="center">0.275</td>
</tr> <tr>
<td valign="top" align="left">54</td>
<td valign="top" align="left">Severn</td>
<td valign="top" align="center">0.185</td>
<td valign="top" align="center">0.491</td>
<td valign="top" align="center">0.3</td>
<td valign="top" align="center">0.404</td>
<td valign="top" align="center">0.537</td>
<td valign="top" align="center">0.506</td>
</tr> <tr>
<td valign="top" align="left">55</td>
<td valign="top" align="left">Wye (Hereford)</td>
<td valign="top" align="center">0.134</td>
<td valign="top" align="center">0.36</td>
<td valign="top" align="center">0.529</td>
<td valign="top" align="center">0.485</td>
<td valign="top" align="center">0.64</td>
<td valign="top" align="center">0.586</td>
</tr> <tr>
<td valign="top" align="left">67</td>
<td valign="top" align="left">Dee (Cheshire)</td>
<td valign="top" align="center">0.363</td>
<td valign="top" align="center">0.364</td>
<td valign="top" align="center">0.009</td>
<td valign="top" align="center">0.321</td>
<td valign="top" align="center">0.504</td>
<td valign="top" align="center">0.586</td>
</tr> <tr>
<td valign="top" align="left">68</td>
<td valign="top" align="left">Cheshire Rivers Group</td>
<td valign="top" align="center">0.29</td>
<td valign="top" align="center">0.536</td>
<td valign="top" align="center">0.031</td>
<td valign="top" align="center">0.214</td>
<td valign="top" align="center">0.268</td>
<td valign="top" align="center">0.316</td>
</tr> <tr>
<td valign="top" align="left">69</td>
<td valign="top" align="left">Mersey and Irwell</td>
<td valign="top" align="center">0.361</td>
<td valign="top" align="center">0.454</td>
<td valign="top" align="center">0.5</td>
<td valign="top" align="center">0.634</td>
<td valign="top" align="center">0.579</td>
<td valign="top" align="center">0.619</td>
</tr> <tr>
<td valign="top" align="left">70</td>
<td valign="top" align="left">Douglas Group</td>
<td valign="top" align="center">0.392</td>
<td valign="top" align="center">0.644</td>
<td valign="top" align="center">0.068</td>
<td valign="top" align="center">&#x02212;0.097</td>
<td valign="top" align="center">0.367</td>
<td valign="top" align="center">0.211</td>
</tr> <tr>
<td valign="top" align="left">71</td>
<td valign="top" align="left">Ribble</td>
<td valign="top" align="center">0.291</td>
<td valign="top" align="center">0.505</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.459</td>
<td valign="top" align="center">0.352</td>
<td valign="top" align="center">0.633</td>
</tr>
<tr>
<td valign="top" align="left">72</td>
<td valign="top" align="left">Wyre and Lune</td>
<td valign="top" align="center">0.118</td>
<td valign="top" align="center">0.381</td>
<td valign="top" align="center">0.437</td>
<td valign="top" align="center">0.359</td>
<td valign="top" align="center">0.374</td>
<td valign="top" align="center">0.482</td>
</tr>
<tr>
<td valign="top" align="left">73</td>
<td valign="top" align="left">Kent Group</td>
<td valign="top" align="center">0.26</td>
<td valign="top" align="center">0.345</td>
<td valign="top" align="center">&#x02212;0.509</td>
<td valign="top" align="center">0.263</td>
<td valign="top" align="center">0.139</td>
<td valign="top" align="center">0.679</td>
</tr> <tr>
<td valign="top" align="left">74</td>
<td valign="top" align="left">Esk Group (Cumbria)</td>
<td valign="top" align="center">0.371</td>
<td valign="top" align="center">0.652</td>
<td valign="top" align="center">&#x02212;0.084</td>
<td valign="top" align="center">&#x02212;0.169</td>
<td valign="top" align="center">0.243</td>
<td valign="top" align="center">0.495</td>
</tr> <tr>
<td valign="top" align="left">75</td>
<td valign="top" align="left">Derwent Group (Cumbria)</td>
<td valign="top" align="center">0.165</td>
<td valign="top" align="center">1.508</td>
<td valign="top" align="center">0.375</td>
<td valign="top" align="center">&#x02212;0.047</td>
<td valign="top" align="center">0.364</td>
<td valign="top" align="center">0.129</td>
</tr> <tr>
<td valign="top" align="left">76</td>
<td valign="top" align="left">Eden (Cumbria)</td>
<td valign="top" align="center">0.222</td>
<td valign="top" align="center">0.335</td>
<td valign="top" align="center">0.296</td>
<td valign="top" align="center">&#x02212;0.003</td>
<td valign="top" align="center">0.568</td>
<td valign="top" align="center">0.486</td>
</tr> <tr>
<td valign="top" align="left">101</td>
<td valign="top" align="left">Isle of Wight</td>
<td valign="top" align="center">0.135</td>
<td valign="top" align="center">0.721</td>
<td valign="top" align="center">0.379</td>
<td valign="top" align="center">0.036</td>
<td valign="top" align="center">0.38</td>
<td valign="top" align="center">0.269</td>
</tr></tbody>
</table>
</table-wrap>
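<p>Per-group scores such as those in <xref ref-type="table" rid="T4">Table 4</xref> can be obtained by splitting the test set by group identifier and scoring each subset separately, as in the sketch below. It also illustrates one way in which non-finite entries can arise: if all observations in a group are identical, the NSE denominator is zero and the correlation in the KGE is undefined. This is a general property of the two scores and is offered only as a possible explanation; the source does not state the cause of the non-finite entries. The function names, group identifiers, and data are hypothetical.</p>
<preformat>
```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; non-finite when the observations have zero variance."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form, an assumption); nan when correlation is undefined."""
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.corrcoef(obs, sim)[0, 1]
        alpha = sim.std() / obs.std()
        beta = sim.mean() / obs.mean()
        return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


def metrics_by_group(groups, obs, sim):
    """Return {group id: (NSE, KGE)} computed on each group's samples."""
    groups = np.asarray(groups)
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        out[int(g)] = (nse(obs[mask], sim[mask]), kge(obs[mask], sim[mask]))
    return out


# Synthetic test set: group 1 has two identical observations, group 2 is well behaved.
groups = [1, 1, 2, 2, 2, 2]
obs = [0.5, 0.5, 1.0, 2.0, 3.0, 4.0]
sim = [0.4, 0.7, 1.1, 2.1, 2.9, 4.2]
scores = metrics_by_group(groups, obs, sim)
print(scores)
```
</preformat>
<p>The same grouping can be applied to any categorical attribute of the river reaches, for example the Strahler stream order.</p>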
<p>Based on the MSE, the nitrate RF models performed better on river reaches with a Strahler stream order of 4&#x02013;7 (<xref ref-type="table" rid="T5">Table 5</xref>) than on lower-order streams. Based on the NSE and KGE, streams of orders 5 and 6 outperformed other streams. In particular, based on the NSE, the performance of order 1 and order 7 streams was very similar. This indicated that nitrate predictions were more challenging for small streams and very large streams (order = 7), with the latter only having a few occurrences in the UK river network graph.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Model performance metrics based on the Strahler stream order.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th/>
<th valign="top" align="center" colspan="2"><bold>MSE</bold></th>
<th valign="top" align="center" colspan="2"><bold>NSE</bold></th>
<th valign="top" align="center" colspan="2"><bold>KGE</bold></th>
</tr>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Strahler</bold></th>
<th valign="top" align="center"><bold>Nitrate</bold></th>
<th valign="top" align="center"><bold>Orthophosphate</bold></th>
<th valign="top" align="center"><bold>Nitrate</bold></th>
<th valign="top" align="center"><bold>Orthophosphate</bold></th>
<th valign="top" align="center"><bold>Nitrate</bold></th>
<th valign="top" align="center"><bold>Orthophosphate</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="center">0.475</td>
<td valign="top" align="center">1.08</td>
<td valign="top" align="center">0.318</td>
<td valign="top" align="center">0.088</td>
<td valign="top" align="center">0.456</td>
<td valign="top" align="center">0.182</td>
</tr> <tr>
<td valign="top" align="left">2</td>
<td valign="top" align="center">0.348</td>
<td valign="top" align="center">0.973</td>
<td valign="top" align="center">0.427</td>
<td valign="top" align="center">0.25</td>
<td valign="top" align="center">0.521</td>
<td valign="top" align="center">0.311</td>
</tr> <tr>
<td valign="top" align="left">3</td>
<td valign="top" align="center">0.277</td>
<td valign="top" align="center">0.819</td>
<td valign="top" align="center">0.48</td>
<td valign="top" align="center">0.327</td>
<td valign="top" align="center">0.584</td>
<td valign="top" align="center">0.408</td>
</tr> <tr>
<td valign="top" align="left">4</td>
<td valign="top" align="center">0.194</td>
<td valign="top" align="center">0.575</td>
<td valign="top" align="center">0.631</td>
<td valign="top" align="center">0.465</td>
<td valign="top" align="center">0.703</td>
<td valign="top" align="center">0.554</td>
</tr> <tr>
<td valign="top" align="left">5</td>
<td valign="top" align="center">0.14</td>
<td valign="top" align="center">0.4</td>
<td valign="top" align="center">0.672</td>
<td valign="top" align="center">0.549</td>
<td valign="top" align="center">0.744</td>
<td valign="top" align="center">0.612</td>
</tr> <tr>
<td valign="top" align="left">6</td>
<td valign="top" align="center">0.138</td>
<td valign="top" align="center">0.458</td>
<td valign="top" align="center">0.75</td>
<td valign="top" align="center">0.428</td>
<td valign="top" align="center">0.748</td>
<td valign="top" align="center">0.607</td>
</tr> <tr>
<td valign="top" align="left">7</td>
<td valign="top" align="center">0.134</td>
<td valign="top" align="center">0.656</td>
<td valign="top" align="center">0.327</td>
<td valign="top" align="center">0.235</td>
<td valign="top" align="center">0.652</td>
<td valign="top" align="center">0.412</td>
</tr></tbody>
</table>
</table-wrap>
<p><xref ref-type="fig" rid="F3">Figure 3</xref> shows only subtle changes in nitrate levels between any two seasons. While nitrate levels between seasons are well-correlated, considerable variability within &#x000B1;0.5 order of magnitude exists (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 2A</xref>). This highlights that, even with good training and cross-validation results for the models for each season (<xref ref-type="fig" rid="F5">Figure 5</xref>), applying machine learning methods for nitrate predictions in every reach of the UK river network could lead to greater variability in predictions.</p></sec>
<sec>
<title>3.2.2. Orthophosphate models</title>
<p><xref ref-type="fig" rid="F4">Figure 4</xref> illustrates the long-term predicted orthophosphate levels at each river reach in GB for each season. Similar to nitrate, central and eastern England had higher orthophosphate levels, but regions with high orthophosphate levels appeared to be smaller. Unlike nitrate levels, orthophosphate levels were higher in summer and autumn than in spring and winter.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Predicted concentrations of orthophosphate in rivers across GB. Note line widths are proportional to Strahler stream order (i.e., thicker for larger streams downstream).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frwa-05-1244024-g0004.tif"/>
</fig>
<p><xref ref-type="fig" rid="F5">Figure 5</xref> shows the orthophosphate model&#x00027;s performance in training and testing. On the training data, the model achieved a good <italic>R</italic><sup>2</sup> value of 0.95 (NSE of 0.88 and KGE of 0.77), with a very high density of points along the 1:1 line. Furthermore, unlike nitrate, the hexbin plot for orthophosphate did not show a skewed distribution. Some bias was noticeable in the predictions; the slope of the best-fit line was slightly steeper than the 1:1 line, indicating that predicted values were higher than observed for high orthophosphate levels, while the opposite was true for low orthophosphate levels.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Hexbin plots showing the performance of random forest models. Note that models from the four seasons are plotted on a single plot. <bold>(A)</bold> Nitrate. Training data. <bold>(B)</bold> Nitrate. Testing data. <bold>(C)</bold> Orthophosphate. Training data. <bold>(D)</bold> Orthophosphate. Testing data.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frwa-05-1244024-g0005.tif"/>
</fig>
<p>There is evidence that the orthophosphate RF models exhibited slight overfitting, as the testing <italic>R</italic><sup>2</sup> of 0.58 (NSE of 0.33 and KGE of 0.43) was not as good as the training <italic>R</italic><sup>2</sup>. Despite a rather large spread of the scatter points, the <italic>R</italic><sup>2</sup> value of 0.58 indicated a good correlation between the predicted and observed data.</p>
<p>Overall, <xref ref-type="fig" rid="F6">Figure 6</xref> and <xref ref-type="table" rid="T4">Table 4</xref> demonstrate that the orthophosphate RF models performed well and generalised better across the whole of England, based on the testing MSE values for each HA. The RF models, on average, registered lower MSE at larger HAs and in the west of England (excluding the southwest coasts). The NSE of many HAs was negative, indicating that the models had issues reproducing the mean, which we partly observed in the hexbin plots in <xref ref-type="fig" rid="F5">Figure 5</xref>. For KGE, good performance (KGE &#x0003E; 0.5) could be observed in the Tees group (HA = 25), Severn (HA = 54), Wye (Hereford; HA = 55), Dee (Cheshire; HA = 67), Ribble (HA = 71), and Kent group (HA = 73), with many other HAs achieving similar performance. Meanwhile, Tyne (Northumberland; HA = 23) and Dart group (HA = 46) performed poorly (KGE &#x0003C; 0.1).</p>
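<p>As an illustration of why a model can show a strong correlation yet a negative NSE, the following minimal Python sketch computes MSE, NSE, and KGE from paired observed and predicted values. It assumes the standard textbook definitions of the three metrics; the function names, the use of population standard deviations, and the toy values are illustrative assumptions and are not taken from the code used in this study.</p>
<preformat>
```python
import math
from statistics import fmean, pstdev


def mse(observed, predicted):
    """Mean squared error between paired observations and predictions."""
    return fmean((o - p) ** 2 for o, p in zip(observed, predicted))


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus MSE divided by the variance of the observations."""
    mean_obs = fmean(observed)
    variance_obs = fmean((o - mean_obs) ** 2 for o in observed)
    return 1.0 - mse(observed, predicted) / variance_obs


def kge(observed, predicted):
    """Kling-Gupta efficiency from correlation (r), variability ratio (alpha), and bias ratio (beta)."""
    mean_obs, mean_pred = fmean(observed), fmean(predicted)
    sd_obs, sd_pred = pstdev(observed), pstdev(predicted)
    covariance = fmean(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )
    r = covariance / (sd_obs * sd_pred)  # Pearson correlation
    alpha = sd_pred / sd_obs             # ratio of standard deviations
    beta = mean_pred / mean_obs          # ratio of means
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Toy values (hypothetical): predictions track the observations perfectly
# but are offset by a constant, i.e. the mean is poorly reproduced.
toy_observed = [1.0, 2.0, 3.0, 4.0]
toy_predicted = [o + 3.0 for o in toy_observed]

toy_scores = {
    "MSE": mse(toy_observed, toy_predicted),
    "NSE": nse(toy_observed, toy_predicted),
    "KGE": kge(toy_observed, toy_predicted),
}
```
</preformat>
<p>In this toy case the predictions are perfectly correlated with the observations but offset by a constant, so NSE is strongly negative (about -6.2) and KGE is about -0.2, even though the correlation is 1. This is the kind of mean bias that a negative NSE flags.</p>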
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Spatial map of mean MSE, NSE, and KGE by hydrometric area for testing data for nitrate and orthophosphate.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frwa-05-1244024-g0006.tif"/>
</fig>
<p>Similar to the results for nitrate, the orthophosphate RF models performed better on river reaches with a Strahler stream order of 4&#x02013;7 (<xref ref-type="table" rid="T5">Table 5</xref>) than on lower-order streams based on MSE. Based on NSE and KGE, streams of order 5 and 6 outperformed other streams. Again, this indicated that the orthophosphate predictions were more challenging for small streams and very large streams (order = 7).</p>
<p><xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 2B</xref> shows a scatter plot of predicted nitrate against orthophosphate at each river reach in spring. It shows that, despite some higher correlations in high nitrate-high orthophosphate conditions and at high Strahler stream order (6 or above), nitrate and orthophosphate levels were not well-correlated. This highlighted the differences in sources for nitrate and orthophosphate and in their sensitivity to input features. This also suggested that nitrate and orthophosphate may not be suitable as proxy measurements for each other.</p></sec></sec>
<sec>
<title>3.3. Results from selected catchments</title>
<p>It can be difficult to visualise the nitrate and orthophosphate predictions at individual river reaches when all the river reaches of GB are presented in the same plot. Therefore, we focused on the results of four selected hydrometric areas in <xref ref-type="fig" rid="F7">Figure 7</xref>. In Tweed (HA = 21), we observed much higher nitrate concentrations in streams in the east of the HA in winter. However, this caused only a small increase in nitrate concentration in its main river channel (i.e., River Tweed). Orthophosphate levels in the Tweed were generally very low; however, they were higher in smaller streams in summer and autumn. In the Thames (HA = 39) area, higher nitrate levels were observed in River Mole in the southeast of the HA, while higher orthophosphate levels tended to occur in smaller tributaries in the northwest of the HA. We observed slightly higher nitrate in spring and winter than in summer and autumn. There were a few localised orthophosphate hotspots in summer and autumn, causing some increase in orthophosphate in the Thames. In Wye (Hereford; HA = 55), nitrate levels were generally seasonally invariant. Obvious increases in orthophosphate were observed in small streams near Hereford in summer and autumn. However, these did not lead to a change in the very low orthophosphate levels in its 6th order streams (River Wye and River Monnow). For the Tay (HA = 15) in Scotland (note that all training data are from England), we observed very low nitrate levels in most of the HA. These were slightly higher in the larger streams, and high levels were observed in the southeast corner of the HA, which were slightly higher in spring and winter. Orthophosphate levels were also very low for most of the HA. Slightly higher levels were observed in very small streams and some streams in the southeast corner of the HA, while higher orthophosphate levels were observed in summer and autumn.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Nitrate and orthophosphate prediction results at <bold>(A)</bold> Tweed (HA = 21), <bold>(B)</bold> Thames (HA = 39), <bold>(C)</bold> Wye (Hereford; HA = 55), and <bold>(D)</bold> Tay (HA = 15). Note line widths are proportional to Strahler stream order (i.e., thicker for larger streams downstream). In the right column, spring nitrate results are overlaid on a base map. To view results from other HAs, go to the following web application: <ext-link ext-link-type="uri" xlink:href="https://moisture-wqmlviewer.datalabs.ceh.ac.uk/wqml_viewer">https://moisture-wqmlviewer.datalabs.ceh.ac.uk/wqml_viewer</ext-link>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frwa-05-1244024-g0007.tif"/>
</fig></sec></sec>
<sec id="s5">
<title>4. Discussion</title>
<sec>
<title>4.1. Key findings</title>
<p>We presented a flexible modelling framework that mapped point observations of river water quality to more than 200,000 river reaches across GB using machine learning. Our key findings are as follows:</p>
<list list-type="bullet">
<list-item><p><italic>Model skill</italic>: The modelling approach we developed was able to estimate nitrate and orthophosphate levels with higher skill than existing statistical modelling approaches (Rothwell et al., <xref ref-type="bibr" rid="B52">2010</xref>). Testing <italic>R</italic><sup>2</sup> values of 0.71 and 0.58 were attained for nitrate and orthophosphate, respectively.</p></list-item>
<list-item><p><italic>Flexibility</italic>: Our modelling approach is highly flexible. After matching the input features and observations to the river reaches, they could be used to build machine learning models without geographical or network information. This meant that input datasets required no modifications for commonly used machine learning methods to be applied. After the machine learning predictions are made, they can be mapped back to the river network (Section 2.1). Therefore, our method is applicable in all situations where (i) a high-resolution river network graph is available and (ii) input features and observations can be mapped to the graph. Features that were not considered in this study can be easily incorporated.</p></list-item>
<list-item><p><italic>Stream order</italic>: Plotting river-reach concentration predictions with stream order information is highly informative as it allows the visualisation of the evolution of nutrient levels downstream. Further, our model performs better for streams with a higher Strahler stream order. This may be due to challenges in accurately linking catchment and land cover attributes to small streams with fewer observations.</p></list-item>
</list>
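<p>The workflow described under <italic>Flexibility</italic> can be sketched as follows. This is a minimal, self-contained Python illustration under stated assumptions: the reach identifiers, feature values, and concentrations are hypothetical, and a simple k-nearest-neighbour average in feature space stands in for the random forest models used in this study so that the example needs only the standard library.</p>
<preformat>
```python
import math

# Hypothetical reach attributes keyed by reach ID: (arable fraction, urban fraction).
# No coordinates or network topology are stored here.
reach_features = {
    "r1": (0.80, 0.05),
    "r2": (0.75, 0.10),
    "r3": (0.10, 0.02),
    "r4": (0.15, 0.03),
    "r5": (0.78, 0.07),  # unmonitored reach
    "r6": (0.12, 0.02),  # unmonitored reach
}

# Hypothetical long-term mean concentrations at monitored reaches (arbitrary units),
# already matched to reach IDs.
observations = {"r1": 8.0, "r2": 7.0, "r3": 1.0, "r4": 2.0}


def build_training_table(features, observed):
    """Pair the features of each monitored reach with its observed value."""
    return [(features[reach_id], value) for reach_id, value in observed.items()]


def knn_predict(training_table, query, k=2):
    """Stand-in regressor: mean target of the k nearest training rows in feature space."""
    nearest = sorted(training_table, key=lambda row: math.dist(row[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)


def predict_all_reaches(features, observed, k=2):
    """Train on monitored reaches, then predict every reach, keyed by reach ID."""
    table = build_training_table(features, observed)
    return {reach_id: knn_predict(table, x, k) for reach_id, x in features.items()}


# The reach ID is the only link needed to attach predictions to the network graph.
predictions = predict_all_reaches(reach_features, observations)
```
</preformat>
<p>Because the regressor only ever sees feature vectors, any tabular machine learning method could replace the stand-in, and the reach identifier is the only link needed to map predictions back onto the river network graph.</p>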
<p>The ability to estimate water quality at every point in a river network [based on the models of everywhere concept (Beven and Alcock, <xref ref-type="bibr" rid="B5">2012</xref>; Blair et al., <xref ref-type="bibr" rid="B7">2019</xref>)] has huge potential to revolutionise environmental science. For example, the chemical levels or other properties at any point in the river network can be queried, the effects along reaches on downstream biodiversity can be studied, and the cumulative exposure of an organism to a chemical can be calculated from its trajectory in a simple and straightforward manner.</p></sec>
<sec>
<title>4.2. Drivers for water quality variability in the GB river network</title>
<p>Predicting river water quality using catchment characteristics (Davies and Neal, <xref ref-type="bibr" rid="B14">2004</xref>, <xref ref-type="bibr" rid="B15">2007</xref>; Rothwell et al., <xref ref-type="bibr" rid="B52">2010</xref>; Oehler and Elliott, <xref ref-type="bibr" rid="B46">2011</xref>; Lintern et al., <xref ref-type="bibr" rid="B37">2018</xref>) and land use (Jarvie et al., <xref ref-type="bibr" rid="B30">2008</xref>; Hutchins et al., <xref ref-type="bibr" rid="B27">2010</xref>; Worrall et al., <xref ref-type="bibr" rid="B66">2012</xref>) has been common practice. However, existing methods rely on multi-linear relationships between these characteristics and water quality, and they have rarely been applied at the national level. Furthermore, in previous studies, catchment characteristics and land use attributes were not matched at a fine scale. Similar to the findings by Rothwell et al. (<xref ref-type="bibr" rid="B52">2010</xref>), we found nitrate concentrations in UK rivers to be highly linked to agricultural land use, while diffuse and point sources (Bowes et al., <xref ref-type="bibr" rid="B9">2008</xref>, <xref ref-type="bibr" rid="B10">2009</xref>) tended to play a major role in orthophosphate concentrations. This was because household sources dominate P loads in many of GB&#x00027;s waters near areas of high population density (White and Hammond, <xref ref-type="bibr" rid="B64">2009</xref>).</p></sec>
<sec>
<title>4.3. Challenges, limitations, and future work</title>
<p>It is important to note that in this study, machine learning predictions were made at each river reach without any reference to their spatial location or connectivity. The land cover and catchment descriptors of each river reach were used as input features in a non-spatial way, and the resultant predictions were mapped back on the river network graph. This offered a very flexible approach to convert point observations of river water quality to maps at the relevant spatial scale (i.e., river reach). This method could be applied to other chemical species or geographical regions. Future studies could also investigate the method&#x00027;s applicability to river biodiversity indicators such as macroinvertebrate abundance (Powell et al., <xref ref-type="bibr" rid="B49">2022</xref>).</p>
<p>A trade-off for the ease of use of our framework was that we did not make explicit assumptions on geostatistics based on distance or connectivity. However, as emerging approaches such as graph neural networks (Sun et al., <xref ref-type="bibr" rid="B58">2022</xref>) or graph Gaussian processes (Pinder et al., <xref ref-type="bibr" rid="B48">2022</xref>) have highlighted the importance of the connectivity of networks and provided more flexible tools to model them, future studies can extend our framework to include geostatistics or network connectivity.</p>
<p>Our study focused on the use of static input features for long-term predictions of water quality. A contrasting group of approaches uses very high temporal resolution driving data and sparse water quality data, together with methods such as Long Short-term Memory (LSTM), to model the dynamics of water quality variations at chemically ungauged basins (Zhi et al., <xref ref-type="bibr" rid="B72">2021</xref>). Future studies can consider both static and dynamic input features to obtain predictions that capture spatial trends and temporal variations.</p>
<p>Many physical processes that control the distribution and evolution of nitrate and orthophosphate are not explicitly considered in our study. Examples include the long-term evolution of these chemical species (Bell et al., <xref ref-type="bibr" rid="B4">2021</xref>), the migration of nitrate from the land surface to groundwater and its storage in the vadose zone (Wang et al., <xref ref-type="bibr" rid="B62">2016</xref>; Ascott et al., <xref ref-type="bibr" rid="B2">2017</xref>), and the discharge from sewage treatment works (Jarvie et al., <xref ref-type="bibr" rid="B29">2006</xref>; Bowes et al., <xref ref-type="bibr" rid="B8">2010</xref>). Future studies can also strive to improve the joint use and interpretation of process-based and machine learning water quality model results.</p>
<p>Because of the flexibility of the methods described in this study, they can potentially be applied elsewhere in the world or with different input variables. The Water Quality Portal (Read et al., <xref ref-type="bibr" rid="B50">2017</xref>) in the United States and the Global River Water Quality Archive (Virro et al., <xref ref-type="bibr" rid="B61">2021</xref>) are examples of other centralised databases for water quality measurements where the models from this study can be applied. The availability of global high-resolution river network graphs makes it possible to repeat a similar analysis globally (Linke et al., <xref ref-type="bibr" rid="B36">2019</xref>; Yan et al., <xref ref-type="bibr" rid="B69">2022</xref>). However, if river reach characteristics that are not provided in those graphs are required, users need to match those characteristics to the graphs. For GB, CAMELS-GB (Coxon et al., <xref ref-type="bibr" rid="B13">2020</xref>) may be a richer, alternative dataset that can be matched to the river network graph for an analysis similar to the one presented in this study. Future studies can also compare river network water quality predictions with remote sensing of water quality for inland waters, such as those obtained from AquaSat (Ross et al., <xref ref-type="bibr" rid="B51">2019</xref>).</p>
<p>Finally, the proposed framework may be applied iteratively to optimise the design of water quality monitoring networks. It can be used to design the placement of new point sampling locations or to assess the information content of sampling locations by comparing the resultant reach-scale water quality maps.</p></sec></sec>
<sec id="s6">
<title>5. Conclusion</title>
<p>Current methods for water quality mapping are often conducted at a grid-based level, masking the important sense of network connectivity that is intrinsic to rivers. This limits their utility to inform policy and decision-making. While some methods have been developed for mapping river quality in networks, they are often not readily applicable at a national scale.</p>
<p>With the advancement of machine learning and very high-resolution river graphs becoming available at national levels, it becomes possible to map the spatial variability of water quality variables nationally. To our knowledge, this study is the first to predict water quality at each river reach nationally for Great Britain. Our study builds on previous approaches by integrating static variables into seasonal water quality prediction and by demonstrating the use of machine learning to effectively make water quality predictions without the need to specify geostatistical constraints. Mapping the water quality of every British river reach also has the potential to serve as a new fit-for-purpose tool when evaluating the water quality in British rivers (Whelan et al., <xref ref-type="bibr" rid="B63">2022</xref>).</p>
<p>By demonstrating a practical way to map water quality monitoring data from a network of stations to river reaches in an entire country, this study provides a way for reach-scale interrogation of water quality data in decision-making, which allows much more targeted actions to improve and protect water quality in rivers.</p></sec>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>The data presented in the study are deposited in the Environmental Information Data Centre, accession number <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5285/ba208b6c-6f1a-43b1-867d-bc1adaff6445">https://doi.org/10.5285/ba208b6c-6f1a-43b1-867d-bc1adaff6445</ext-link>.</p></sec>
<sec sec-type="author-contributions" id="s8">
<title>Author contributions</title>
<p>C-HT: conceptualisation, software, visualisation, and writing&#x02013;original draft preparation. EM: formal analysis, investigation, and software. DH: methodology, investigation, and software. ME: methodology. MF: conceptualisation, data curation, methodology, resources, and supervision. All authors: writing&#x02013;reviewing and editing. All authors contributed to the article and approved the submitted version.</p></sec>
</body>
<back>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>This work was part of the UK-SCAPE: UK Status, Change and Projections of the Environment project, a National Capability award funded by the UK Natural Environment Research Council (NERC: NE/R016429/1).</p>
</sec>
<ack><p>This study was developed during DH&#x00027;s summer placement at UKCEH as part of his M.Sc. in Data Science dissertation at Lancaster University. We thank Mike Bowes (UKCEH) for his helpful feedback on the manuscript.</p>
</ack>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="s11">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frwa.2023.1244024/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frwa.2023.1244024/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/></sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahmed</surname> <given-names>U.</given-names></name> <name><surname>Mumtaz</surname> <given-names>R.</given-names></name> <name><surname>Anwar</surname> <given-names>H.</given-names></name> <name><surname>Shah</surname> <given-names>A. A.</given-names></name> <name><surname>Irfan</surname> <given-names>R.</given-names></name> <name><surname>Garc&#x000ED;a-Nieto</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>Efficient water quality prediction using supervised machine learning</article-title>. <source>Water</source> <volume>11</volume>, <fpage>2210</fpage>. <pub-id pub-id-type="doi">10.3390/w11112210</pub-id><pub-id pub-id-type="pmid">36269432</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ascott</surname> <given-names>M. J.</given-names></name> <name><surname>Gooddy</surname> <given-names>D. C.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Stuart</surname> <given-names>M. E.</given-names></name> <name><surname>Lewis</surname> <given-names>M. A.</given-names></name> <name><surname>Ward</surname> <given-names>R. S.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Global patterns of nitrate storage in the vadose zone</article-title>. <source>Nat. Commun</source>. <volume>8</volume>, <fpage>1416</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-017-01321-w</pub-id><pub-id pub-id-type="pmid">29123090</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Aslam</surname> <given-names>J. A.</given-names></name> <name><surname>Popa</surname> <given-names>R. A.</given-names></name> <name><surname>Rivest</surname> <given-names>R. L.</given-names></name></person-group> (<year>2007</year>). <article-title>&#x0201C;On estimating the size and confidence of a statistical audit,&#x0201D;</article-title> in <source>Proceedings of the USENIX Workshop on Accurate Electronic Voting Technology, EVT&#x00027;07</source> (<publisher-loc>Philadelphia, PA</publisher-loc>: <publisher-name>USENIX Association</publisher-name>), 8.</citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bell</surname> <given-names>V. A.</given-names></name> <name><surname>Naden</surname> <given-names>P. S.</given-names></name> <name><surname>Tipping</surname> <given-names>E.</given-names></name> <name><surname>Davies</surname> <given-names>H. N.</given-names></name> <name><surname>Carnell</surname> <given-names>E.</given-names></name> <name><surname>Davies</surname> <given-names>J. A. C.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Long term simulations of macronutrients (C, N and P) in UK freshwaters</article-title>. <source>Sci. Total Environ</source>. <volume>776</volume>, <fpage>145813</fpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2021.145813</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beven</surname> <given-names>K. J.</given-names></name> <name><surname>Alcock</surname> <given-names>R. E.</given-names></name></person-group> (<year>2012</year>). <article-title>Modelling everything everywhere: a new approach to decision-making for water management under uncertainty</article-title>. <source>Freshw. Biol.</source> <volume>57</volume>, <fpage>124</fpage>&#x02013;<lpage>132</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-2427.2011.02592.x</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bhattarai</surname> <given-names>A.</given-names></name> <name><surname>Dhakal</surname> <given-names>S.</given-names></name> <name><surname>Gautam</surname> <given-names>Y.</given-names></name> <name><surname>Bhattarai</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>Prediction of nitrate and phosphorus concentrations using machine learning algorithms in watersheds with different landuse</article-title>. <source>Water</source> <volume>13</volume>, <fpage>3096</fpage>. <pub-id pub-id-type="doi">10.3390/w13213096</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blair</surname> <given-names>G. S.</given-names></name> <name><surname>Beven</surname> <given-names>K.</given-names></name> <name><surname>Lamb</surname> <given-names>R.</given-names></name> <name><surname>Bassett</surname> <given-names>R.</given-names></name> <name><surname>Cauwenberghs</surname> <given-names>K.</given-names></name> <name><surname>Hankin</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Models of everywhere revisited: a technological perspective</article-title>. <source>Environ. Model. Softw.</source> <volume>122</volume>, <fpage>104521</fpage>. <pub-id pub-id-type="doi">10.1016/j.envsoft.2019.104521</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bowes</surname> <given-names>M. J.</given-names></name> <name><surname>Neal</surname> <given-names>C.</given-names></name> <name><surname>Jarvie</surname> <given-names>H. P.</given-names></name> <name><surname>Smith</surname> <given-names>J. T.</given-names></name> <name><surname>Davies</surname> <given-names>H. N.</given-names></name></person-group> (<year>2010</year>). <article-title>Predicting phosphorus concentrations in British rivers resulting from the introduction of improved phosphorus removal from sewage effluent</article-title>. <source>Sci. Total Environ</source>. <volume>408</volume>, <fpage>4239</fpage>&#x02013;<lpage>4250</lpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2010.05.016</pub-id><pub-id pub-id-type="pmid">20547413</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bowes</surname> <given-names>M. J.</given-names></name> <name><surname>Smith</surname> <given-names>J. T.</given-names></name> <name><surname>Jarvie</surname> <given-names>H. P.</given-names></name> <name><surname>Neal</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). <article-title>Modelling of phosphorus inputs to rivers from diffuse and point sources</article-title>. <source>Sci. Total Environ</source>. <volume>395</volume>, <fpage>125</fpage>&#x02013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2008.01.054</pub-id><pub-id pub-id-type="pmid">18367235</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bowes</surname> <given-names>M. J.</given-names></name> <name><surname>Smith</surname> <given-names>J. T.</given-names></name> <name><surname>Jarvie</surname> <given-names>H. P.</given-names></name> <name><surname>Neal</surname> <given-names>C.</given-names></name> <name><surname>Barden</surname> <given-names>R.</given-names></name></person-group> (<year>2009</year>). <article-title>Changes in point and diffuse source phosphorus inputs to the River Frome (Dorset, UK) from 1966 to 2006</article-title>. <source>Sci. Total Environ</source>. <volume>407</volume>, <fpage>1954</fpage>&#x02013;<lpage>1966</lpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2008.11.026</pub-id><pub-id pub-id-type="pmid">19095288</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name></person-group> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Mach. Learn</source>. <volume>45</volume>, <fpage>5</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Comber</surname> <given-names>S. D. W.</given-names></name> <name><surname>Smith</surname> <given-names>R.</given-names></name> <name><surname>Daldorph</surname> <given-names>P.</given-names></name> <name><surname>Gardner</surname> <given-names>M. J.</given-names></name> <name><surname>Constantino</surname> <given-names>C.</given-names></name> <name><surname>Ellor</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>Development of a chemical source apportionment decision support framework for catchment management</article-title>. <source>Environ. Sci. Technol</source>. <volume>47</volume>, <fpage>9824</fpage>&#x02013;<lpage>9832</lpage>. <pub-id pub-id-type="doi">10.1021/es401793e</pub-id><pub-id pub-id-type="pmid">29212057</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coxon</surname> <given-names>G.</given-names></name> <name><surname>Addor</surname> <given-names>N.</given-names></name> <name><surname>Bloomfield</surname> <given-names>J. P.</given-names></name> <name><surname>Freer</surname> <given-names>J.</given-names></name> <name><surname>Fry</surname> <given-names>M.</given-names></name> <name><surname>Hannaford</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>CAMELS-GB: hydrometeorological time series and landscape attributes for 671 catchments in Great Britain</article-title>. <source>Earth Syst. Sci. Data</source> <volume>12</volume>, <fpage>2459</fpage>&#x02013;<lpage>2483</lpage>. <pub-id pub-id-type="doi">10.5194/essd-12-2459-2020</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davies</surname> <given-names>H.</given-names></name> <name><surname>Neal</surname> <given-names>C.</given-names></name></person-group> (<year>2004</year>). <article-title>GIS-based methodologies for assessing nitrate, nitrite and ammonium distributions across a major UK basin, the Humber</article-title>. <source>Hydrol. Earth Syst. Sci</source>. <volume>8</volume>, <fpage>823</fpage>&#x02013;<lpage>833</lpage>. <pub-id pub-id-type="doi">10.5194/hess-8-823-2004</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davies</surname> <given-names>H.</given-names></name> <name><surname>Neal</surname> <given-names>C.</given-names></name></person-group> (<year>2007</year>). <article-title>Estimating nutrient concentrations from catchment characteristics across the UK</article-title>. <source>Hydrol. Earth Syst. Sci</source>. <volume>11</volume>, <fpage>550</fpage>&#x02013;<lpage>558</lpage>. <pub-id pub-id-type="doi">10.5194/hess-11-550-2007</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Demir</surname> <given-names>I.</given-names></name> <name><surname>Szczepanek</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>Optimization of river network representation data models for web-based systems</article-title>. <source>Earth Sp. Sci</source>. <volume>4</volume>, <fpage>336</fpage>&#x02013;<lpage>347</lpage>. <pub-id pub-id-type="doi">10.1002/2016EA000224</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dietterich</surname> <given-names>T.</given-names></name></person-group> (<year>1995</year>). <article-title>Overfitting and undercomputing in machine learning</article-title>. <source>ACM Comput. Surv</source>. <volume>27</volume>, <fpage>326</fpage>&#x02013;<lpage>327</lpage>. <pub-id pub-id-type="doi">10.1145/212094.212114</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Evans</surname> <given-names>C. D.</given-names></name> <name><surname>Cooper</surname> <given-names>D. M.</given-names></name> <name><surname>Juggins</surname> <given-names>S.</given-names></name> <name><surname>Jenkins</surname> <given-names>A.</given-names></name> <name><surname>Norris</surname> <given-names>D.</given-names></name></person-group> (<year>2006</year>). <article-title>A linked spatial and temporal model of the chemical and biological status of a large, acid-sensitive river network</article-title>. <source>Sci. Total Environ</source>. <volume>365</volume>, <fpage>167</fpage>&#x02013;<lpage>185</lpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2006.02.037</pub-id><pub-id pub-id-type="pmid">16580046</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frei</surname> <given-names>R. J.</given-names></name> <name><surname>Lawson</surname> <given-names>G. M.</given-names></name> <name><surname>Norris</surname> <given-names>A. J.</given-names></name> <name><surname>Cano</surname> <given-names>G.</given-names></name> <name><surname>Vargas</surname> <given-names>M. C.</given-names></name> <name><surname>Kujanp&#x000E4;&#x000E4;</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Limited progress in nutrient pollution in the U.S. caused by spatially persistent nutrient sources</article-title>. <source>PLoS ONE</source> <volume>16</volume>, <fpage>e0258952</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0258952</pub-id><pub-id pub-id-type="pmid">34843503</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Fry</surname> <given-names>M.</given-names></name> <name><surname>Moore</surname> <given-names>R. V.</given-names></name> <name><surname>Morris</surname> <given-names>D. G.</given-names></name> <name><surname>Flavin</surname> <given-names>R. W.</given-names></name></person-group> (<year>2000</year>). <source>UKCEH Digital River Network of Great Britain (1:50,000)</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://catalogue.ceh.ac.uk/documents/7d5e42b6-7729-46c8-99e9-f9e4efddde1d">https://catalogue.ceh.ac.uk/documents/7d5e42b6-7729-46c8-99e9-f9e4efddde1d</ext-link></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giachetta</surname> <given-names>E.</given-names></name> <name><surname>Willett</surname> <given-names>S. D.</given-names></name></person-group> (<year>2018</year>). <article-title>A global dataset of river network geometry</article-title>. <source>Sci. Data</source> <volume>5</volume>, <fpage>180127</fpage>. <pub-id pub-id-type="doi">10.1038/sdata.2018.127</pub-id><pub-id pub-id-type="pmid">29989592</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Granata</surname> <given-names>F.</given-names></name> <name><surname>Papirio</surname> <given-names>S.</given-names></name> <name><surname>Esposito</surname> <given-names>G.</given-names></name> <name><surname>Gargano</surname> <given-names>R.</given-names></name> <name><surname>De Marinis</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>Machine learning algorithms for the forecasting of wastewater quality indicators</article-title>. <source>Water</source> <volume>9</volume>, <fpage>105</fpage>. <pub-id pub-id-type="doi">10.3390/w9020105</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grizzetti</surname> <given-names>B.</given-names></name> <name><surname>Bouraoui</surname> <given-names>F.</given-names></name> <name><surname>de Marsily</surname> <given-names>G.</given-names></name> <name><surname>Bidoglio</surname> <given-names>G.</given-names></name></person-group> (<year>2005</year>). <article-title>A statistical method for source apportionment of riverine nitrogen loads</article-title>. <source>J. Hydrol</source>. <volume>304</volume>, <fpage>302</fpage>&#x02013;<lpage>315</lpage>. <pub-id pub-id-type="doi">10.1016/j.jhydrol.2004.07.036</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gupta</surname> <given-names>H. V.</given-names></name> <name><surname>Kling</surname> <given-names>H.</given-names></name> <name><surname>Yilmaz</surname> <given-names>K. K.</given-names></name> <name><surname>Martinez</surname> <given-names>G. F.</given-names></name></person-group> (<year>2009</year>). <article-title>Decomposition of the mean squared error and NSE performance criteria: implications for improving hydrological modelling</article-title>. <source>J. Hydrol</source>. <volume>377</volume>, <fpage>80</fpage>&#x02013;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.1016/j.jhydrol.2009.08.003</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ho</surname> <given-names>T. K.</given-names></name></person-group> (<year>1995</year>). <article-title>&#x0201C;Random decision forests,&#x0201D;</article-title> in <source>Proceedings of 3rd International Conference on Document Analysis and Recognition</source> (<publisher-loc>Montreal</publisher-loc>: <publisher-name>IEEE Computer Society Press</publisher-name>), <fpage>278</fpage>&#x02013;<lpage>282</lpage>.</citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Howden</surname> <given-names>N. J. K.</given-names></name> <name><surname>Burt</surname> <given-names>T. P.</given-names></name></person-group> (<year>2009</year>). <article-title>Statistical analysis of nitrate concentrations from the Rivers Frome and Piddle (Dorset, UK) for the period 1965&#x02013;2007</article-title>. <source>Ecohydrology</source> <volume>2</volume>, <fpage>55</fpage>&#x02013;<lpage>65</lpage>. <pub-id pub-id-type="doi">10.1002/eco.39</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hutchins</surname> <given-names>M. G.</given-names></name> <name><surname>Deflandre-Vlandas</surname> <given-names>A.</given-names></name> <name><surname>Posen</surname> <given-names>P. E.</given-names></name> <name><surname>Davies</surname> <given-names>H. N.</given-names></name> <name><surname>Neal</surname> <given-names>C.</given-names></name></person-group> (<year>2010</year>). <article-title>How do river nitrate concentrations respond to changes in land-use? A modelling case study of headwaters in the River Derwent Catchment, North Yorkshire, UK</article-title>. <source>Environ. Model. Assess</source>. <volume>15</volume>, <fpage>93</fpage>&#x02013;<lpage>109</lpage>. <pub-id pub-id-type="doi">10.1007/s10666-009-9218-2</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Huxley</surname> <given-names>D.</given-names></name></person-group> (<year>2021</year>). <source>Spatiotemporal Analysis of Nitrate and Phosphate in UK River Stretches Using Machine Learning</source>. <publisher-loc>Lancaster</publisher-loc>: <publisher-name>Lancaster University</publisher-name>.</citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jarvie</surname> <given-names>H. P.</given-names></name> <name><surname>Neal</surname> <given-names>C.</given-names></name> <name><surname>Withers</surname> <given-names>P. J. A.</given-names></name></person-group> (<year>2006</year>). <article-title>Sewage-effluent phosphorus: a greater risk to river eutrophication than agricultural phosphorus?</article-title> <source>Sci. Total Environ</source>. <volume>360</volume>, <fpage>246</fpage>&#x02013;<lpage>253</lpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2005.08.038</pub-id><pub-id pub-id-type="pmid">16226299</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jarvie</surname> <given-names>H. P.</given-names></name> <name><surname>Withers</surname> <given-names>P. J. A.</given-names></name> <name><surname>Hodgkinson</surname> <given-names>R.</given-names></name> <name><surname>Bates</surname> <given-names>A.</given-names></name> <name><surname>Neal</surname> <given-names>M.</given-names></name> <name><surname>Wickham</surname> <given-names>H. D.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>Influence of rural land use on streamwater nutrients and their ecological significance</article-title>. <source>J. Hydrol</source>. <volume>350</volume>, <fpage>166</fpage>&#x02013;<lpage>186</lpage>. <pub-id pub-id-type="doi">10.1016/j.jhydrol.2007.10.042</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kantardzic</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <source>Data Mining: Concepts, Models, Methods, and Algorithms, 3rd Edn</source>. <publisher-loc>Hoboken, NJ</publisher-loc>: <publisher-name>Wiley-IEEE Press</publisher-name>.</citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kohavi</surname> <given-names>R.</given-names></name> <name><surname>John</surname> <given-names>G. H.</given-names></name></person-group> (<year>1997</year>). <article-title>Wrappers for feature subset selection</article-title>. <source>Artif. Intell</source>. <volume>97</volume>, <fpage>273</fpage>&#x02013;<lpage>324</lpage>. <pub-id pub-id-type="doi">10.1016/S0004-3702(97)00043-X</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lane</surname> <given-names>R. A.</given-names></name> <name><surname>Kay</surname> <given-names>A. L.</given-names></name></person-group> (<year>2021</year>). <article-title>Climate change impact on the magnitude and timing of hydrological extremes across Great Britain</article-title>. <source>Front. Water</source> <volume>3</volume>, <fpage>684982</fpage>. <pub-id pub-id-type="doi">10.3389/frwa.2021.684982</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>J.</given-names></name> <name><surname>Compton</surname> <given-names>J. E.</given-names></name> <name><surname>Hill</surname> <given-names>R. A.</given-names></name> <name><surname>Herlihy</surname> <given-names>A. T.</given-names></name> <name><surname>Sabo</surname> <given-names>R. D.</given-names></name> <name><surname>Brooks</surname> <given-names>J. R.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Context is everything: interacting inputs and landscape characteristics control stream nitrogen</article-title>. <source>Environ. Sci. Technol</source>. <volume>55</volume>, <fpage>7890</fpage>&#x02013;<lpage>7899</lpage>. <pub-id pub-id-type="doi">10.1021/acs.est.0c07102</pub-id><pub-id pub-id-type="pmid">34060819</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>P.</given-names></name> <name><surname>Pan</surname> <given-names>M.</given-names></name> <name><surname>Wood</surname> <given-names>E. F.</given-names></name> <name><surname>Yamazaki</surname> <given-names>D.</given-names></name> <name><surname>Allen</surname> <given-names>G. H.</given-names></name></person-group> (<year>2021</year>). <article-title>A new vector-based global river network dataset accounting for variable drainage density</article-title>. <source>Sci. Data</source> <volume>8</volume>, <fpage>28</fpage>. <pub-id pub-id-type="doi">10.1038/s41597-021-00819-9</pub-id><pub-id pub-id-type="pmid">33500418</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Linke</surname> <given-names>S.</given-names></name> <name><surname>Lehner</surname> <given-names>B.</given-names></name> <name><surname>Ouellet Dallaire</surname> <given-names>C.</given-names></name> <name><surname>Ariwi</surname> <given-names>J.</given-names></name> <name><surname>Grill</surname> <given-names>G.</given-names></name> <name><surname>Anand</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Global hydro-environmental sub-basin and river reach characteristics at high spatial resolution</article-title>. <source>Sci. Data</source> <volume>6</volume>, <fpage>283</fpage>. <pub-id pub-id-type="doi">10.1038/s41597-019-0300-6</pub-id><pub-id pub-id-type="pmid">31819059</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lintern</surname> <given-names>A.</given-names></name> <name><surname>Webb</surname> <given-names>J. A.</given-names></name> <name><surname>Ryu</surname> <given-names>D.</given-names></name> <name><surname>Liu</surname> <given-names>S.</given-names></name> <name><surname>Waters</surname> <given-names>D.</given-names></name> <name><surname>Leahy</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>What are the key catchment characteristics affecting spatial differences in riverine water quality?</article-title> <source>Water Resour. Res</source>. <volume>54</volume>, <fpage>7252</fpage>&#x02013;<lpage>7272</lpage>. <pub-id pub-id-type="doi">10.1029/2017WR022172</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Magee</surname> <given-names>E.</given-names></name> <name><surname>Huxley</surname> <given-names>D.</given-names></name> <name><surname>Tso</surname> <given-names>C. M.</given-names></name></person-group> (<year>2023</year>). <source>Random Forest Model to Predict Long-Term Seasonal Nitrate and Orthophosphate Concentrations in British River Reaches. NERC EDS Environmental Information Data Centre</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://catalogue.ceh.ac.uk/documents/ba208b6c-6f1a-43b1-867d-bc1adaff6445">https://catalogue.ceh.ac.uk/documents/ba208b6c-6f1a-43b1-867d-bc1adaff6445</ext-link></citation>
</ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Morris</surname> <given-names>D. G.</given-names></name> <name><surname>Flavin</surname> <given-names>R. W.</given-names></name></person-group> (<year>1990</year>). <article-title>&#x0201C;A digital terrain model for hydrology,&#x0201D;</article-title> in <source>Proc 4th International Symposium on Spatial Data Handling</source> (<publisher-loc>Z&#x000FC;rich</publisher-loc>), <fpage>250</fpage>&#x02013;<lpage>262</lpage>.</citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morton</surname> <given-names>R.</given-names></name> <name><surname>Henderson</surname> <given-names>B. L.</given-names></name></person-group> (<year>2008</year>). <article-title>Estimation of nonlinear trends in water quality: an improved approach using generalized additive models</article-title>. <source>Water Resour. Res</source>. <volume>44</volume>. <pub-id pub-id-type="doi">10.1029/2007WR006191</pub-id><pub-id pub-id-type="pmid">19783355</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mueller</surname> <given-names>N. D.</given-names></name> <name><surname>Gerber</surname> <given-names>J. S.</given-names></name> <name><surname>Johnston</surname> <given-names>M.</given-names></name> <name><surname>Ray</surname> <given-names>D. K.</given-names></name> <name><surname>Ramankutty</surname> <given-names>N.</given-names></name> <name><surname>Foley</surname> <given-names>J. A.</given-names></name></person-group> (<year>2012</year>). <article-title>Closing yield gaps through nutrient and water management</article-title>. <source>Nature</source> <volume>490</volume>, <fpage>254</fpage>&#x02013;<lpage>257</lpage>. <pub-id pub-id-type="doi">10.1038/nature11420</pub-id><pub-id pub-id-type="pmid">22932270</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Najah Ahmed</surname> <given-names>A.</given-names></name> <name><surname>Binti Othman</surname> <given-names>F.</given-names></name> <name><surname>Abdulmohsin Afan</surname> <given-names>H.</given-names></name> <name><surname>Khaleel Ibrahim</surname> <given-names>R.</given-names></name> <name><surname>Ming Fai</surname> <given-names>C.</given-names></name> <name><surname>Shabbir Hossain</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Machine learning methods for better water quality prediction</article-title>. <source>J. Hydrol</source>. <volume>578</volume>, <fpage>124084</fpage>. <pub-id pub-id-type="doi">10.1016/j.jhydrol.2019.124084</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nash</surname> <given-names>J. E.</given-names></name> <name><surname>Sutcliffe</surname> <given-names>J. V.</given-names></name></person-group> (<year>1970</year>). <article-title>River flow forecasting through conceptual models part I&#x02014;a discussion of principles</article-title>. <source>J. Hydrol</source>. <volume>10</volume>, <fpage>282</fpage>&#x02013;<lpage>290</lpage>. <pub-id pub-id-type="doi">10.1016/0022-1694(70)90255-6</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="web"><person-group person-group-type="author"><collab>National River Flow Archive</collab></person-group> (<year>2014</year>). <source>Hydrometric Areas for Great Britain and Northern Ireland. National River Flow Archive</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://nrfa.ceh.ac.uk/">https://nrfa.ceh.ac.uk/</ext-link></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x00027;Donnell</surname> <given-names>D.</given-names></name> <name><surname>Rushworth</surname> <given-names>A.</given-names></name> <name><surname>Bowman</surname> <given-names>A. W.</given-names></name> <name><surname>Scott</surname> <given-names>E. M.</given-names></name> <name><surname>Hallard</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>Flexible regression models over river networks</article-title>. <source>J. R. Stat. Soc. Ser. C</source> <volume>63</volume>, <fpage>47</fpage>&#x02013;<lpage>71</lpage>. <pub-id pub-id-type="doi">10.1111/rssc.12024</pub-id><pub-id pub-id-type="pmid">25653460</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oehler</surname> <given-names>F.</given-names></name> <name><surname>Elliott</surname> <given-names>A. H.</given-names></name></person-group> (<year>2011</year>). <article-title>Predicting stream N and P concentrations from loads and catchment characteristics at regional scale: a concentration ratio method</article-title>. <source>Sci. Total Environ</source>. <volume>409</volume>, <fpage>5392</fpage>&#x02013;<lpage>5402</lpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2011.08.025</pub-id><pub-id pub-id-type="pmid">21962928</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x00027;Sullivan</surname> <given-names>C. M.</given-names></name> <name><surname>Ghahramani</surname> <given-names>A.</given-names></name> <name><surname>Deo</surname> <given-names>R. C.</given-names></name> <name><surname>Pembleton</surname> <given-names>K.</given-names></name> <name><surname>Khan</surname> <given-names>U.</given-names></name> <name><surname>Tuteja</surname> <given-names>N.</given-names></name></person-group> (<year>2022</year>). <article-title>Classification of catchments for nitrogen using Artificial Neural Network Pattern Recognition and spatial data</article-title>. <source>Sci. Total Environ</source>. <volume>809</volume>, <fpage>151139</fpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2021.151139</pub-id><pub-id pub-id-type="pmid">34757101</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pinder</surname> <given-names>T.</given-names></name> <name><surname>Turnbull</surname> <given-names>K.</given-names></name> <name><surname>Nemeth</surname> <given-names>C.</given-names></name> <name><surname>Leslie</surname> <given-names>D.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Street-level air pollution modelling with graph Gaussian processes,&#x0201D;</article-title> in <source>ICLR: AI for Earth and Space Science</source>.</citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Powell</surname> <given-names>K. E.</given-names></name> <name><surname>Oliver</surname> <given-names>T. H.</given-names></name> <name><surname>Johns</surname> <given-names>T.</given-names></name> <name><surname>Gonz&#x000E1;lez-Su&#x000E1;rez</surname> <given-names>M.</given-names></name> <name><surname>England</surname> <given-names>J.</given-names></name> <name><surname>Roy</surname> <given-names>D. B.</given-names></name></person-group> (<year>2022</year>). <article-title>Abundance trends for river macroinvertebrates vary across taxa, trophic group and river typology</article-title>. <source>Glob. Chang. Biol</source>. <volume>29</volume>, <fpage>1282</fpage>&#x02013;<lpage>1295</lpage>. <pub-id pub-id-type="doi">10.1111/gcb.16549</pub-id><pub-id pub-id-type="pmid">36462155</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Read</surname> <given-names>E. K.</given-names></name> <name><surname>Carr</surname> <given-names>L.</given-names></name> <name><surname>De Cicco</surname> <given-names>L.</given-names></name> <name><surname>Dugan</surname> <given-names>H. A.</given-names></name> <name><surname>Hanson</surname> <given-names>P. C.</given-names></name> <name><surname>Hart</surname> <given-names>J. A.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Water quality data for national-scale aquatic research: the Water Quality Portal</article-title>. <source>Water Resour. Res</source>. <volume>53</volume>, <fpage>1735</fpage>&#x02013;<lpage>1745</lpage>. <pub-id pub-id-type="doi">10.1002/2016WR019993</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ross</surname> <given-names>M. R. V.</given-names></name> <name><surname>Topp</surname> <given-names>S. N.</given-names></name> <name><surname>Appling</surname> <given-names>A. P.</given-names></name> <name><surname>Yang</surname> <given-names>X.</given-names></name> <name><surname>Kuhn</surname> <given-names>C.</given-names></name> <name><surname>Butman</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>AquaSat: a data set to enable remote sensing of water quality for inland waters</article-title>. <source>Water Resour. Res</source>. <volume>55</volume>, <fpage>10012</fpage>&#x02013;<lpage>10025</lpage>. <pub-id pub-id-type="doi">10.1029/2019WR024883</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rothwell</surname> <given-names>J. J.</given-names></name> <name><surname>Dise</surname> <given-names>N. B.</given-names></name> <name><surname>Taylor</surname> <given-names>K. G.</given-names></name> <name><surname>Allott</surname> <given-names>T. E. H.</given-names></name> <name><surname>Scholefield</surname> <given-names>P.</given-names></name> <name><surname>Davies</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Predicting river water quality across North West England using catchment characteristics</article-title>. <source>J. Hydrol</source>. <volume>395</volume>, <fpage>153</fpage>&#x02013;<lpage>162</lpage>. <pub-id pub-id-type="doi">10.1016/j.jhydrol.2010.10.015</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rowland</surname> <given-names>C. S.</given-names></name> <name><surname>Morton</surname> <given-names>R. D.</given-names></name> <name><surname>Carrasco</surname> <given-names>L.</given-names></name> <name><surname>McShane</surname> <given-names>G.</given-names></name> <name><surname>O&#x00027;Neil</surname> <given-names>A. W.</given-names></name> <name><surname>Wood</surname> <given-names>C. M.</given-names></name></person-group> (<year>2017</year>). <source>Land Cover Map 2015 (1 km Percentage Aggregate Class, GB)</source>. NERC EDS Environmental Information Data Centre (Dataset). <pub-id pub-id-type="doi">10.5285/7115bc48-3ab0-475d-84ae-fd3126c20984</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sarker</surname> <given-names>S.</given-names></name> <name><surname>Veremyev</surname> <given-names>A.</given-names></name> <name><surname>Boginski</surname> <given-names>V.</given-names></name> <name><surname>Singh</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Critical nodes in river networks</article-title>. <source>Sci. Rep</source>. <volume>9</volume>, <fpage>11178</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-019-47292-4</pub-id><pub-id pub-id-type="pmid">31371735</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname> <given-names>L. Q.</given-names></name> <name><surname>Amatulli</surname> <given-names>G.</given-names></name> <name><surname>Sethi</surname> <given-names>T.</given-names></name> <name><surname>Raymond</surname> <given-names>P.</given-names></name> <name><surname>Domisch</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Estimating nitrogen and phosphorus concentrations in streams and rivers, within a machine learning framework</article-title>. <source>Sci. Data</source> <volume>7</volume>, <fpage>161</fpage>. <pub-id pub-id-type="doi">10.1038/s41597-020-0478-7</pub-id><pub-id pub-id-type="pmid">32467642</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>R. A.</given-names></name> <name><surname>Schwarz</surname> <given-names>G. E.</given-names></name> <name><surname>Alexander</surname> <given-names>R. B.</given-names></name></person-group> (<year>1997</year>). <article-title>Regional interpretation of water-quality monitoring data</article-title>. <source>Water Resour. Res</source>. <volume>33</volume>, <fpage>2781</fpage>&#x02013;<lpage>2798</lpage>. <pub-id pub-id-type="doi">10.1029/97WR02171</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Srivastava</surname> <given-names>N.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name> <name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>. <source>J. Mach. Learn. Res</source>. <volume>15</volume>, <fpage>1929</fpage>&#x02013;<lpage>1958</lpage>.<pub-id pub-id-type="pmid">33259321</pub-id></citation></ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>A. Y.</given-names></name> <name><surname>Jiang</surname> <given-names>P.</given-names></name> <name><surname>Yang</surname> <given-names>Z.-L.</given-names></name> <name><surname>Xie</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name></person-group> (<year>2022</year>). <article-title>A graph neural network approach to basin-scale river network learning: the role of physics-based connectivity and data fusion</article-title>. <source>Hydrol. Earth Syst. Sci. Discuss</source>. <volume>2022</volume>, <fpage>1</fpage>&#x02013;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.5194/hess-26-5163-2022</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tyralis</surname> <given-names>H.</given-names></name> <name><surname>Papacharalampous</surname> <given-names>G.</given-names></name> <name><surname>Langousis</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>A brief review of random forests for water scientists and practitioners and their recent history in water resources</article-title>. <source>Water</source> <volume>11</volume>, <fpage>910</fpage>. <pub-id pub-id-type="doi">10.3390/w11050910</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vergopolan</surname> <given-names>N.</given-names></name> <name><surname>Xiong</surname> <given-names>S.</given-names></name> <name><surname>Estes</surname> <given-names>L.</given-names></name> <name><surname>Wanders</surname> <given-names>N.</given-names></name> <name><surname>Chaney</surname> <given-names>N. W.</given-names></name> <name><surname>Wood</surname> <given-names>E. F.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Field-scale soil moisture bridges the spatial-scale gap between drought monitoring and agricultural yields</article-title>. <source>Hydrol. Earth Syst. Sci</source>. <volume>25</volume>, <fpage>1827</fpage>&#x02013;<lpage>1847</lpage>. <pub-id pub-id-type="doi">10.5194/hess-25-1827-2021</pub-id></citation>
</ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Virro</surname> <given-names>H.</given-names></name> <name><surname>Amatulli</surname> <given-names>G.</given-names></name> <name><surname>Kmoch</surname> <given-names>A.</given-names></name> <name><surname>Shen</surname> <given-names>L.</given-names></name> <name><surname>Uuemaa</surname> <given-names>E.</given-names></name></person-group> (<year>2021</year>). <article-title>GRQA: global river water quality archive</article-title>. <source>Earth Syst. Sci. Data</source> <volume>13</volume>, <fpage>5483</fpage>&#x02013;<lpage>5507</lpage>. <pub-id pub-id-type="doi">10.5194/essd-13-5483-2021</pub-id></citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Stuart</surname> <given-names>M. E.</given-names></name> <name><surname>Lewis</surname> <given-names>M. A.</given-names></name> <name><surname>Ward</surname> <given-names>R. S.</given-names></name> <name><surname>Skirvin</surname> <given-names>D.</given-names></name> <name><surname>Naden</surname> <given-names>P. S.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>The changing trend in nitrate concentrations in major aquifers due to historical nitrate loading from agricultural land across England and Wales from 1925 to 2150</article-title>. <source>Sci. Total Environ</source>. <volume>542</volume>, <fpage>694</fpage>&#x02013;<lpage>705</lpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2015.10.127</pub-id><pub-id pub-id-type="pmid">26546765</pub-id></citation></ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whelan</surname> <given-names>M. J.</given-names></name> <name><surname>Linstead</surname> <given-names>C.</given-names></name> <name><surname>Worrall</surname> <given-names>F.</given-names></name> <name><surname>Ormerod</surname> <given-names>S. J.</given-names></name> <name><surname>Durance</surname> <given-names>I.</given-names></name> <name><surname>Johnson</surname> <given-names>A. C.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Is water quality in British rivers &#x0201C;better than at any time since the end of the Industrial Revolution&#x0201D;?</article-title> <source>Sci. Total Environ</source>. <volume>843</volume>, <fpage>157014</fpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2022.157014</pub-id><pub-id pub-id-type="pmid">35772542</pub-id></citation></ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>White</surname> <given-names>P. J.</given-names></name> <name><surname>Hammond</surname> <given-names>J. P.</given-names></name></person-group> (<year>2009</year>). <article-title>The sources of phosphorus in the waters of Great Britain</article-title>. <source>J. Environ. Qual</source>. <volume>38</volume>, <fpage>13</fpage>&#x02013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.2134/jeq2007.0658</pub-id><pub-id pub-id-type="pmid">19141791</pub-id></citation></ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whitehead</surname> <given-names>P.</given-names></name> <name><surname>Wilson</surname> <given-names>E.</given-names></name> <name><surname>Butterfield</surname> <given-names>D.</given-names></name></person-group> (<year>1998</year>). <article-title>A semi-distributed Integrated Nitrogen model for multiple source assessment in Catchments (INCA): part I&#x02014;model structure and process equations</article-title>. <source>Sci. Total Environ</source>. <volume>210&#x02013;211</volume>, <fpage>547</fpage>&#x02013;<lpage>558</lpage>. <pub-id pub-id-type="doi">10.1016/S0048-9697(98)00037-0</pub-id></citation>
</ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Worrall</surname> <given-names>F.</given-names></name> <name><surname>Davies</surname> <given-names>H.</given-names></name> <name><surname>Burt</surname> <given-names>T.</given-names></name> <name><surname>Howden</surname> <given-names>N. J. K.</given-names></name> <name><surname>Whelan</surname> <given-names>M. J.</given-names></name> <name><surname>Bhogal</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>The flux of dissolved nitrogen from the UK&#x02014;evaluating the role of soils and land use</article-title>. <source>Sci. Total Environ</source>. <volume>434</volume>, <fpage>90</fpage>&#x02013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2012.01.035</pub-id><pub-id pub-id-type="pmid">22424770</pub-id></citation></ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>Z.</given-names></name> <name><surname>Kuang</surname> <given-names>J.</given-names></name> <name><surname>Lin</surname> <given-names>C.</given-names></name> <name><surname>Xiao</surname> <given-names>L.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>An alternative to laboratory testing: random forest-based water quality prediction framework for inland and nearshore water bodies</article-title>. <source>Water</source> <volume>13</volume>, <fpage>3262</fpage>. <pub-id pub-id-type="doi">10.3390/w13223262</pub-id></citation>
</ref>
<ref id="B68">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yadav</surname> <given-names>S.</given-names></name> <name><surname>Shukla</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification,&#x0201D;</article-title> in <source>2016 IEEE 6th International Conference on Advanced Computing (IACC)</source> (<publisher-loc>Bhimavaram</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>78</fpage>&#x02013;<lpage>83</lpage>.</citation>
</ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yan</surname> <given-names>D.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Feng</surname> <given-names>J.</given-names></name> <name><surname>Dong</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>A data set of global river networks and corresponding water resources zones divisions v2</article-title>. <source>Sci. Data</source> <volume>9</volume>, <fpage>770</fpage>. <pub-id pub-id-type="doi">10.1038/s41597-022-01888-0</pub-id><pub-id pub-id-type="pmid">36522353</pub-id></citation></ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>G.</given-names></name> <name><surname>Moyer</surname> <given-names>D. L.</given-names></name></person-group> (<year>2020</year>). <article-title>Estimation of nonlinear water-quality trends in high-frequency monitoring data</article-title>. <source>Sci. Total Environ</source>. <volume>715</volume>, <fpage>136686</fpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2020.136686</pub-id><pub-id pub-id-type="pmid">32032984</pub-id></citation></ref>
<ref id="B71">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>C.</given-names></name></person-group> (<year>2012</year>). <source>Ensemble Machine Learning: Methods and Applications</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhi</surname> <given-names>W.</given-names></name> <name><surname>Feng</surname> <given-names>D.</given-names></name> <name><surname>Tsai</surname> <given-names>W. P.</given-names></name> <name><surname>Sterle</surname> <given-names>G.</given-names></name> <name><surname>Harpold</surname> <given-names>A.</given-names></name> <name><surname>Shen</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>From hydrometeorology to river water quality: can a deep learning model predict dissolved oxygen at the continental scale?</article-title> <source>Environ. Sci. Technol</source>. <volume>55</volume>, <fpage>2357</fpage>&#x02013;<lpage>2368</lpage>. <pub-id pub-id-type="doi">10.1021/acs.est.0c06783</pub-id><pub-id pub-id-type="pmid">33533608</pub-id></citation></ref>
<ref id="B73">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zounemat-Kermani</surname> <given-names>M.</given-names></name> <name><surname>Batelaan</surname> <given-names>O.</given-names></name> <name><surname>Fadaee</surname> <given-names>M.</given-names></name> <name><surname>Hinkelmann</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>Ensemble machine learning paradigms in hydrology: a review</article-title>. <source>J. Hydrol</source>. <volume>598</volume>, <fpage>126266</fpage>. <pub-id pub-id-type="doi">10.1016/j.jhydrol.2021.126266</pub-id></citation>
</ref>
</ref-list> 
</back>
</article> 