<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Remote Sens.</journal-id>
<journal-title>Frontiers in Remote Sensing</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Remote Sens.</abbrev-journal-title>
<issn pub-type="epub">2673-6187</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1531097</article-id>
<article-id pub-id-type="doi">10.3389/frsen.2025.1531097</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Remote Sensing</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Choosing blocks for spatial cross-validation: lessons from a marine remote sensing case study</article-title>
<alt-title alt-title-type="left-running-head">Stock</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/frsen.2025.1531097">10.3389/frsen.2025.1531097</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Stock</surname>
<given-names>Andy</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2900337/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/conceptualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/data-curation/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
<role content-type="https://credit.niso.org/contributor-roles/investigation/"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/project-administration/"/>
<role content-type="https://credit.niso.org/contributor-roles/resources/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
<role content-type="https://credit.niso.org/contributor-roles/validation/"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/Writing - review &#x26; editing/"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>NIVA Denmark Water Research</institution>, <addr-line>Copenhagen</addr-line>, <country>Denmark</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Norwegian Institute for Water Research</institution>, <institution>Section for Environmental Informatics</institution>, <addr-line>Oslo</addr-line>, <country>Norway</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1857069/overview">Tongwen Li</ext-link>, Sun Yat-sen University, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/116252/overview">Thomas Groen</ext-link>, University of Twente, Netherlands</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/2121190/overview">Qianqian Yang</ext-link>, Wuhan University, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Andy Stock, <email>anc@niva.no</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>21</day>
<month>03</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>6</volume>
<elocation-id>1531097</elocation-id>
<history>
<date date-type="received">
<day>19</day>
<month>11</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>03</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2025 Stock.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Stock</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Supervised learning allows broad-scale mapping of variables measured at discrete points in space and time, e.g., by combining satellite and <italic>in situ</italic> data. However, it can fail to make accurate predictions in new locations without training data. Training and testing data must be sufficiently separated to detect such failures and select models that make good predictions across the study region. Spatial block cross-validation, which splits the data into spatial blocks left out for testing one after the other, is a key tool for this purpose. However, it requires choices such as the size and shape of spatial blocks. Here, we ask how such choices affect estimates of prediction accuracy. We tested spatial cross-validation strategies differing in block size, shape, number of folds, and assignment of blocks to folds with 1,426 synthetic data sets mimicking a marine remote sensing application (satellite mapping of chlorophyll <italic>a</italic> in the Baltic Sea). With synthetic data, prediction errors were known across the study region, allowing comparisons of how well spatial cross-validation with different blocks estimated them. The most important methodological choice was the block size. The block shape, number of folds, and assignment to folds had minor effects on the estimated errors. Overall, the best blocking strategy was the one that best reflected the data and application: leaving out whole subbasins of the study region for testing. Correlograms of the predictors helped choose a good block size. While all approaches with sufficiently large blocks worked well, none gave unbiased error estimates in all tests, and large blocks sometimes led to an overestimation of errors. Furthermore, even the best choice of blocks reduced but did not eliminate a bias toward selecting overly complex models. 
These results 1) yield practical lessons for testing spatial predictive models in remote sensing and other applications; 2) highlight the limitations of model testing by splitting a single data set, even when following elaborate and theoretically sound splitting strategies; and 3) help explain contradictions between past studies evaluating cross-validation methods and model transferability in remote sensing and other spatial applications of supervised learning.</p>
</abstract>
<kwd-group>
<kwd>machine learning</kwd>
<kwd>satellite</kwd>
<kwd>ocean color</kwd>
<kwd>random forest</kwd>
<kwd>Baltic Sea</kwd>
<kwd>accuracy</kwd>
<kwd>autocorrelation</kwd>
<kwd>supervised learning</kwd>
</kwd-group>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Data Fusion and Assimilation</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Supervised learning is a critical tool for mapping environmental variables like marine chlorophyll <italic>a</italic>, land cover types, and species distributions at broad spatial scales (<xref ref-type="bibr" rid="B18">Elith and Leathwick, 2009</xref>; <xref ref-type="bibr" rid="B28">Kerr and Ostrovsky, 2003</xref>; <xref ref-type="bibr" rid="B64">Tuia et al., 2022</xref>). In supervised learning, training a model involves extracting relationships between output (response) and input (predictor) variables from example data. In this way, supervised learning allows the continuous mapping of variables measured at discrete points in space and time. In marine satellite remote sensing, which serves as a case study here, common supervised learning approaches range from simple linear regression (e.g., <xref ref-type="bibr" rid="B13">Darecki et al., 2005</xref>; <xref ref-type="bibr" rid="B29">Kratzer et al., 2003</xref>; <xref ref-type="bibr" rid="B41">O&#x27;Reilly et al., 1998</xref>; <xref ref-type="bibr" rid="B42">O&#x27;Reilly and Werdell, 2019</xref>) to complicated machine learning methods (e.g., <xref ref-type="bibr" rid="B27">Kattenborn et al., 2021</xref>; <xref ref-type="bibr" rid="B73">Yuan et al., 2020</xref>; <xref ref-type="bibr" rid="B74">Zhang et al., 2023</xref>).</p>
<p>These models typically rely on <italic>in situ</italic> observations of the response variable for training and validation. A sound sampling design is critical when collecting <italic>in situ</italic> data for this purpose (<xref ref-type="bibr" rid="B52">Rocha et al., 2020</xref>). However, collecting data at sea over broad spatial scales and according to a sound sampling design would be extremely expensive. Therefore, to obtain sufficiently large <italic>in situ</italic> data sets, many broad-scale marine studies rely on databases that compile measurements from individual field campaigns with different objectives and without an overarching sampling strategy. Such data often have substantial spatial biases, i.e., some places are well-covered by data, whereas others have little or no data (<xref ref-type="bibr" rid="B8">Boakes et al., 2010</xref>; <xref ref-type="bibr" rid="B9">Bowler et al., 2022</xref>; <xref ref-type="bibr" rid="B59">Stock and Subramaniam, 2020</xref>). The spatial biases in such databases pose a critical statistical challenge in supervised-learning-based marine remote sensing (<xref ref-type="bibr" rid="B56">Stock, 2022</xref>).</p>
<p>A key question about models intended to generate broad-scale maps is how well they make predictions across the whole region of interest, including data-poor subregions (<xref ref-type="bibr" rid="B45">Peterson et al., 2007</xref>; <xref ref-type="bibr" rid="B49">Qiao et al., 2019</xref>; <xref ref-type="bibr" rid="B59">Stock and Subramaniam, 2020</xref>; <xref ref-type="bibr" rid="B72">Yates et al., 2018</xref>). Researchers traditionally evaluate and compare models by randomly splitting the available data into a training set for fitting the model and a test (or validation) set for estimating its prediction accuracy (sometimes, an additional development set is used for model selection and fine-tuning). This split can be done once or repeatedly in cross-validation. However, evaluating models based on random splits produces misleading results in many remote sensing and other environmental applications that involve spatial data (<xref ref-type="bibr" rid="B20">Fourcade et al., 2018</xref>; <xref ref-type="bibr" rid="B47">Ploton et al., 2020</xref>; <xref ref-type="bibr" rid="B51">Roberts et al., 2017</xref>). In particular, environmental variables are often spatially autocorrelated (<xref ref-type="bibr" rid="B31">Legendre, 1993</xref>), making nearby observations dependent. Dependence between training and testing data violates a core assumption of many statistical methods (<xref ref-type="bibr" rid="B2">Arlot and Celisse, 2010</xref>; <xref ref-type="bibr" rid="B39">Nikparvar and Thill, 2021</xref>), causes the selection of overly complex models that do not generalize well (<xref ref-type="bibr" rid="B23">Gregr et al., 2019</xref>), and is a key driver of data leakage, a common cause of wrong results in scientific applications of supervised learning (<xref ref-type="bibr" rid="B25">Kapoor and Narayanan, 2023</xref>).</p>
<p>Two factors exacerbate these statistical problems as machine learning grows in popularity as a scientific tool and is claimed to be superior to simpler statistical approaches (<xref ref-type="bibr" rid="B46">Pichler and Hartig, 2023</xref>). First, machine learning models can easily pick up location-specific relationships that fail to transfer to new locations (<xref ref-type="bibr" rid="B5">Beery et al., 2018</xref>), yet such failures are missed when training and testing data come from the same locations (<xref ref-type="bibr" rid="B57">Stock et al., 2023</xref>). Second, machine learning methods are rarely tailored to the limitations of typical environmental data, such as autocorrelated observations taken near each other. Ideally, models intended to make predictions for data-poor locations or to yield generalizable insights should be tested with independent, out-of-distribution data (<xref ref-type="bibr" rid="B1">Ara&#xfa;jo et al., 2005</xref>; <xref ref-type="bibr" rid="B21">Geirhos et al., 2020</xref>; <xref ref-type="bibr" rid="B23">Gregr et al., 2019</xref>), yet such data are rarely available.</p>
<p>When only a single data set is available for model training and testing, cross-validation can mimic tests with independent data and extrapolation to data-poor regions by separating training and testing data spatially, temporally, or in predictor space (<xref ref-type="bibr" rid="B51">Roberts et al., 2017</xref>; <xref ref-type="bibr" rid="B70">Wenger and Olden, 2012</xref>). However, separating training and testing data does not guarantee sound error estimates for two reasons. First, if some subregions of the study area have no data, error estimates calculated for held-out subregions with data are not necessarily valid for subregions without data (for a method to estimate the area where a cross-validated error estimate applies, see <xref ref-type="bibr" rid="B36">Meyer and Pebesma, 2021</xref>). Second, the data being split might contain non-spatial biases and shortcuts. A sound data separation strategy is therefore necessary, but not sufficient, to avoid data leakage and obtain sound estimates of a spatial model&#x2019;s prediction accuracy (<xref ref-type="bibr" rid="B25">Kapoor and Narayanan, 2023</xref>; <xref ref-type="bibr" rid="B57">Stock et al., 2023</xref>).</p>
<p>Two main approaches exist for separating training and testing data spatially. First, one can leave out one observation at a time for testing and withhold all data within a spatial buffer around the test observation from training (<xref ref-type="bibr" rid="B32">Le Rest et al., 2013</xref>; <xref ref-type="bibr" rid="B33">2014</xref>; <xref ref-type="bibr" rid="B48">Pohjankukka et al., 2017</xref>). Second, one can split the data into blocks based on geographical space (block cross-validation; <xref ref-type="bibr" rid="B51">Roberts et al., 2017</xref>; <xref ref-type="bibr" rid="B62">Sweet et al., 2023</xref>). Spatial cross-validation strategies yield better error estimates under spatial dependence and are hence a key tool in many environmental applications (<xref ref-type="bibr" rid="B4">Bald et al., 2023</xref>; <xref ref-type="bibr" rid="B12">Crego et al., 2022</xref>; <xref ref-type="bibr" rid="B17">El-Gabbas et al., 2021</xref>; <xref ref-type="bibr" rid="B54">Smith et al., 2021</xref>; <xref ref-type="bibr" rid="B58">Stock et al., 2018</xref>). An R package for spatial cross-validation is available (<xref ref-type="bibr" rid="B66">Valavi et al., 2019</xref>). However, spatial cross-validation remains underused in marine remote sensing and requires methodological choices such as the size and shape of spatial blocks.</p>
<p>Here, we explore how such choices affect error estimates with synthetic data that mimic a marine remote sensing application. With this example, we aim to inform the evaluation of predictive models in applications that 1) use supervised learning in satellite remote sensing or to create other broad-scale maps from point data, 2) must split a single data set for training and testing, and 3) rely on point data that were collected without an overarching sampling strategy, e.g., obtained from databases combining measurements from many individual field campaigns. Specifically, we ask: How do block size, shape, the number of cross-validation folds, and assignment of blocks to folds affect prediction error estimates and model selection? Which of these choices is most important? Might such choices explain contradictory results between prior studies comparing spatial cross-validation methods and testing the spatial transferability of models?</p>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>2 Materials and methods</title>
<sec id="s2-1">
<title>2.1 Overview</title>
<p>To answer our research questions, we exploit synthetic data that mimic a remote sensing application in marine biology (<xref ref-type="bibr" rid="B56">Stock, 2022</xref>). These data cover the Baltic Sea in northern Europe from 2003 to 2019. They consist of many individual data sets (henceforth, subsets) with geographic points (measurement locations and dates only) extracted from an oceanographic database. Each data point contains a response variable (synthetic chlorophyll <italic>a</italic> concentration) and satellite-based predictors (remote sensing reflectance in different wavelength bands) for the locations and dates where actual, <italic>in situ</italic> chlorophyll measurements existed. With each subset, three models of different complexity were trained and evaluated with various cross-validation strategies. Using a synthetic response variable that was generated with a model instead of values measured <italic>in situ</italic> allowed for calculating the models&#x2019; &#x201c;true&#x201d; prediction errors across the study region and period, which were compared to cross-validated estimates limited to using the subsets, i.e., locations and dates where real <italic>in situ</italic> data existed (<xref ref-type="fig" rid="F1">Figure 1</xref>). Importantly, &#x201c;true&#x201d; error here refers to a model&#x2019;s prediction error in its intended task (generating daily maps of synthetic chlorophyll <italic>a</italic> for the whole Baltic Sea), not its skill predicting real-world, <italic>in situ</italic> chlorophyll <italic>a</italic> concentration.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Overview of data sources and study design.</p>
</caption>
<graphic xlink:href="frsen-06-1531097-g001.tif"/>
</fig>
</sec>
<sec id="s2-2">
<title>2.2 Synthetic data</title>
<p>The synthetic data were developed in four steps outlined below to support the comparison of validation methods in a realistic use case of supervised learning. Additional details are provided in <xref ref-type="bibr" rid="B56">Stock (2022)</xref>.</p>
<p>First, to create synthetic data with realistic distributions in space and time, we extracted locations and times of <italic>in situ</italic> chlorophyll <italic>a</italic> measurements from an oceanographic database (<ext-link ext-link-type="uri" xlink:href="http://ocean.ices.dk/HydChem">http://ocean.ices.dk/HydChem</ext-link>, accessed 31 August 2020). Such data are typically collected from ships during research cruises over many years. During cruises, researchers choose measurement locations based on the cruise&#x2019;s scientific objectives instead of an overarching sampling strategy for the database. We excluded <italic>in situ</italic> measurements made within 5&#xa0;km of the coastline, at depths &#x3e;2&#xa0;m, or with chlorophyll <italic>a</italic> concentrations &#x3e;30&#xa0;mg&#xa0;m<sup>&#x2212;3</sup>.</p>
<p>Second, for predictors, each <italic>in situ</italic> observation was matched with satellite measurements of remote sensing reflectance in five wavelength bands (412&#xa0;nm, 443&#xa0;nm, 490&#xa0;nm, 555&#xa0;nm and 670&#xa0;nm: <ext-link ext-link-type="uri" xlink:href="http://globcolour.info/">http://globcolour.info</ext-link>, accessed 4 September 2020). The satellite data came from the GlobColour project, which combines data from several satellite-borne instruments to improve spatiotemporal coverage (<xref ref-type="bibr" rid="B19">Fanton d&#x2019;Andon et al., 2009</xref>; <xref ref-type="bibr" rid="B35">Maritorena et al., 2010</xref>). The spatial resolution was 4&#xa0;km, and the temporal resolution was 1&#xa0;day. Because clouds often obscure satellite views of the sea surface, many field observations had no matching satellite data. This reduces the number of usable observations and can introduce additional spatiotemporal biases due to uneven cloud cover (<xref ref-type="bibr" rid="B61">Stock et al., 2020</xref>). We matched the <italic>in situ</italic> and the satellite data with a same-calendar-day temporal window and bilinear interpolation from the four surrounding pixels, yielding 2,728 <italic>in situ</italic> observations with matching satellite data (henceforth, matchups: <xref ref-type="fig" rid="F2">Figure 2A</xref>).</p>
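<p>For readers unfamiliar with the interpolation step, the following minimal Python sketch shows bilinear interpolation from four surrounding pixel centres. It is an illustration only and not the code used in this study; the function name, the argument layout, and the rule that a matchup is dropped when any of the four pixels is missing (e.g., because of clouds) are assumptions made for this example.</p>
<preformat>
```python
def bilinear(x, y, x0, y0, cell, q00, q10, q01, q11):
    """Interpolate a gridded value at (x, y) from four surrounding pixel centres.

    (x0, y0) is the centre of the lower-left pixel and cell is the pixel spacing
    (e.g., 4 km). q00, q10, q01, q11 are the values at (x0, y0), (x0 + cell, y0),
    (x0, y0 + cell) and (x0 + cell, y0 + cell). Hypothetical helper for illustration.
    """
    corners = (q00, q10, q01, q11)
    if any(q is None for q in corners):
        # Assumption: no matchup if any surrounding pixel has no data.
        return None
    tx = (x - x0) / cell  # fractional position between left and right pixels
    ty = (y - y0) / cell  # fractional position between lower and upper pixels
    bottom = q00 * (1 - tx) + q10 * tx
    top = q01 * (1 - tx) + q11 * tx
    return bottom * (1 - ty) + top * ty
```
</preformat>
<p>At a point midway between the four pixel centres, the result is their plain average; at a pixel centre, it equals that pixel&#x2019;s value.</p>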
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Map and selected statistics of the synthetic data used to evaluate cross-validation methods. <bold>(A)</bold> Locations of <italic>in situ</italic> observations of chlorophyll <italic>a</italic> with matching satellite data (which the subsets were sampled from) and the number of subsets that had observations within 50&#xa0;km. <bold>(B, C)</bold> Two example subsets. <bold>(D)</bold> Histogram of synthetic chlorophyll <italic>a</italic> concentration used as response variable for the data from which the subsets were sampled. <bold>(E)</bold> Number of observations in the subsets. <bold>(F)</bold> Percent of the study area with at least one observation within 50&#xa0;km across the generated subsets.</p>
</caption>
<graphic xlink:href="frsen-06-1531097-g002.tif"/>
</fig>
<p>Third, to compare how well cross-validated error estimates approximated &#x201c;true&#x201d; prediction errors for the whole study region and period, the <italic>in situ</italic> chlorophyll <italic>a</italic> concentrations were replaced with synthetic values. These values were the weighted average of two sources with 4&#xa0;km spatial and 1-day temporal resolution: 1) a biogeochemical simulation model of the Baltic Sea with 60% weight (Baltic Sea Biogeochemical Reanalysis, <ext-link ext-link-type="uri" xlink:href="https://marine.copernicus.eu/">https://marine.copernicus.eu</ext-link>, accessed 31 August 2020), and 2) existing satellite-based maps of chlorophyll <italic>a</italic>, also from the GlobColour project, with 40% weight (these maps were previously generated with the same remote sensing reflectance data but another algorithm, and hence reflected some spatial patterns of the predictors). The averaging was necessary because simulated chlorophyll <italic>a</italic> was less correlated with remote sensing reflectance and with the original <italic>in situ</italic> measurements than in most real applications, whereas the satellite product could have been too easily reconstructed by flexible machine learning methods with remote sensing reflectance as predictors. The weights were chosen manually to correct for these unrealistically small correlations while keeping the biogeochemical simulation dominant (correlation of log<sub>10</sub>-transformed <italic>in situ</italic> chlorophyll with simulated values: Pearson correlation coefficient r &#x3d; 0.16; with satellite chlorophyll from GlobColour: r &#x3d; 0.49; with weighted average: r &#x3d; 0.46). 
The Spearman rank correlation of the band ratio <italic>R</italic> (a common predictor of chlorophyll <italic>a</italic>, see <xref ref-type="sec" rid="s2-3">Section 2.3</xref>) was &#x3c1; &#x3d; 0.26 with <italic>in situ</italic> chlorophyll <italic>a</italic>, &#x3c1; &#x3d; 0.03 with simulated chlorophyll, and &#x3c1; &#x3d; 0.25 with merged chlorophyll. The moderate but significant (p &#x3c; 0.001) correlations reflect high concentrations of other optical water constituents that make remote sensing of the Baltic Sea challenging (<xref ref-type="bibr" rid="B14">Darecki and Stramski, 2004</xref>; <xref ref-type="bibr" rid="B53">Siegel and Gerth, 2008</xref>; <xref ref-type="bibr" rid="B55">Stock, 2015</xref>). Furthermore, as is typical in real applications, the merged, synthetic chlorophyll <italic>a</italic> was roughly log-normally distributed (<xref ref-type="fig" rid="F2">Figure 2D</xref>). Therefore, while chosen manually, the selected weights resulted in a synthetic response variable with statistical properties and relationships similar to the <italic>in situ</italic> measurements it replaced. Henceforth, &#x201c;synthetic concentrations&#x201d; refer to this weighted average.</p>
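<p>The merging step can be sketched in a few lines of Python. This is a hedged illustration, not the code used in this study: the function names are hypothetical, and applying the 60% and 40% weights to untransformed concentrations (rather than to log<sub>10</sub>-transformed values) is an assumption of this example. The second function computes the Pearson correlation coefficient that could be used to check a merged variable against reference values.</p>
<preformat>
```python
import math


def weighted_chl(simulated, satellite, w_sim=0.6, w_sat=0.4):
    """Weighted average of two co-located chlorophyll series (hypothetical helper).

    Assumption: the weights are applied to untransformed concentrations.
    """
    return [w_sim * s + w_sat * t for s, t in zip(simulated, satellite)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```
</preformat>
<p>To mirror the correlations reported above, both arguments of <monospace>pearson_r</monospace> would be log<sub>10</sub>-transformed first.</p>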
<p>Fourth, to create many synthetic yet realistic data sets with different sizes and spatial biases, 2,000 random subsets were sampled from the 2,728 matchups (<xref ref-type="fig" rid="F2">Figures 2B, C</xref>). To mimic oceanographic data collection, whole cruises were sampled (not individual observations). However, the automatic generation of spatial blocks with a common R package (<xref ref-type="bibr" rid="B66">Valavi et al., 2019</xref>) included in our test of cross-validation approaches failed for larger blocks in some small subsets (see <xref ref-type="sec" rid="s2-4">Section 2.4</xref>). These subsets were excluded from the analyses to allow a comparison of all tested cross-validation methods. The remaining 1,426 subsets contained between 200 and 1,500 observations and exhibited different degrees of spatial bias (<xref ref-type="fig" rid="F2">Figures 2E, F</xref>).</p>
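<p>Sampling whole cruises rather than individual observations can be illustrated with the following Python sketch. It is not the procedure used to generate the subsets in this study: the record layout (a dictionary with a <monospace>cruise</monospace> key), the function name, and the fixed number of cruises per subset are hypothetical choices for this example.</p>
<preformat>
```python
import random


def sample_subset(matchups, n_cruises, rng):
    """Draw a subset made of all observations from n_cruises random cruises.

    matchups: list of dicts, each with a "cruise" identifier (hypothetical layout).
    rng: a random.Random instance, so that results are reproducible.
    """
    cruises = sorted({m["cruise"] for m in matchups})
    chosen = set(rng.sample(cruises, n_cruises))
    # Keep every observation of a chosen cruise; never split a cruise.
    return [m for m in matchups if m["cruise"] in chosen]
```
</preformat>
<p>Because cruises differ in length and area covered, subsets drawn this way vary in size and spatial coverage even when the number of cruises is fixed.</p>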
</sec>
<sec id="s2-3">
<title>2.3 Predictive models</title>
<p>With each subset, we trained and tested three predictive models common in marine remote sensing. The response was always synthetic, log<sub>10</sub>-transformed chlorophyll <italic>a</italic>, but the models used different predictors and underlying mathematical structures.</p>
<p>The first model was a simple linear model:<disp-formula id="equ1">
<mml:math id="m1">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="italic">log</mml:mi>
<mml:mn>10</mml:mn>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mi>a</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="equ2">
<mml:math id="m2">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi mathvariant="italic">log</mml:mi>
<mml:mn>10</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi mathvariant="italic">max</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="&#x7c;">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>R</mml:mi>
<mml:mi>S</mml:mi>
<mml:mn>443</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>R</mml:mi>
<mml:mi>S</mml:mi>
<mml:mn>490</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>R</mml:mi>
<mml:mi>S</mml:mi>
<mml:mn>555</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>Here, RRSxxx is the remote sensing reflectance in the respective wavelength band. Such models are called maximum band ratio (MBR) algorithms and are among the longest-established statistical models for mapping chlorophyll <italic>a</italic> from satellites (<xref ref-type="bibr" rid="B41">O&#x2019;Reilly et al., 1998</xref>).</p>
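<p>The two equations above translate directly into code. The following Python sketch computes the band ratio <italic>R</italic> and fits the coefficients a<sub>0</sub> and a<sub>1</sub> by ordinary least squares. It is a minimal illustration under stated assumptions, not the implementation used in this study; the function names are hypothetical, and least squares is assumed as the fitting method.</p>
<preformat>
```python
import math


def band_ratio(rrs443, rrs490, rrs555):
    """R = log10(max(RRS443, RRS490) / RRS555)."""
    return math.log10(max(rrs443, rrs490) / rrs555)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = my - a1 * mx
    return a0, a1


def predict_chl(a0, a1, rrs443, rrs490, rrs555):
    """Back-transform the predicted log10 chlorophyll a to a concentration."""
    return 10 ** (a0 + a1 * band_ratio(rrs443, rrs490, rrs555))
```
</preformat>
<p>In use, <monospace>fit_line</monospace> would receive the band ratios of the training observations as <monospace>xs</monospace> and the log<sub>10</sub>-transformed chlorophyll <italic>a</italic> values as <monospace>ys</monospace>.</p>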
<p>The second model was a random forest (RF) using remote sensing reflectances in different wavelength bands and the band ratio <italic>R</italic> as predictors. Random forests are a basic machine-learning approach. They consist of many regression trees (here: 300) fitted to bootstrap samples of the training data while using only some predictors when fitting each tree (<xref ref-type="bibr" rid="B10">Breiman, 2001</xref>). Random forests work well for smaller data sets with correlated predictors and are a common choice in remote sensing applications (<xref ref-type="bibr" rid="B6">Belgiu and Dr&#x103;gu, 2016</xref>).</p>
<p>The third model was a random forest with projected X and Y coordinates as additional predictors (RFXY). These spatial predictors allow the model to harness spatial structures in the data for predictions (<xref ref-type="bibr" rid="B74">Zhang et al., 2023</xref>). However, including them risks overfitting the model to these structures and limits its applicability when spatial structures change over time, e.g., because of climate change. <xref ref-type="bibr" rid="B56">Stock (2022)</xref> found that including spatial coordinates in a random forest caused large prediction errors that spatial, temporal, and environmental block cross-validation methods underestimated. Hence, the RFXY model is a &#x201c;worst case&#x201d; illustrating the limits of estimating prediction errors with spatial block cross-validation.</p>
</sec>
<sec id="s2-4">
<title>2.4 Spatial blocks</title>
<p>We tested two kinds of spatial blocks: (1) blocks and folds automatically generated with the R package <italic>blockCV</italic> (<xref ref-type="fig" rid="F3">Figures 3A&#x2013;F</xref>; <xref ref-type="bibr" rid="B66">Valavi et al., 2019</xref>), and (2) blocks manually created for the Baltic Sea (<xref ref-type="fig" rid="F3">Figures 3G&#x2013;I</xref>).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Examples of spatial blocks used for cross-validation. The blocks were either created automatically with the R package <italic>blockCV</italic> [examples in <bold>(A&#x2013;F)</bold>, with plot headings reflecting key parameters described in the text] or created manually for the Baltic Sea: subbasins <bold>(G)</bold> and latitudinal blocks reflecting environmental gradients in the study region <bold>(H, I)</bold>.</p>
</caption>
<graphic xlink:href="frsen-06-1531097-g003.tif"/>
</fig>
<p>The <italic>blockCV</italic> package allows the automatic generation of spatial blocks based on user-provided parameters. Here, we varied the following parameters: 1) block size (2&#xa0;km&#x2013;300&#xa0;km); 2) block shape (squares or hexagons); 3) how blocks were assigned to folds (randomly, systematically, or in a checkerboard pattern); and 4) the number of folds (5 or 10 for random and systematic assignment, 2 for checkerboard assignment).</p>
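<p>To make the roles of these parameters concrete, the following Python sketch assigns points to square blocks and blocks to folds. It is a simplified illustration and does not reproduce the <italic>blockCV</italic> implementation: the function names are hypothetical, coordinates are assumed to be projected (in the same unit as the block size), hexagonal blocks are omitted, and the exact ordering used for systematic assignment is an assumption.</p>
<preformat>
```python
import random


def block_index(x, y, size):
    """Column and row of the square block containing the projected point (x, y)."""
    return (int(x // size), int(y // size))


def assign_folds(blocks, k, method, rng=None):
    """Map each block (column, row) to a fold number.

    method: "random", "systematic", or "checkerboard" (always two folds).
    rng: a random.Random instance, needed for random assignment only.
    """
    ordered = sorted(set(blocks))
    if method == "checkerboard":
        # Neighbouring blocks alternate between fold 0 and fold 1.
        return {b: (b[0] + b[1]) % 2 for b in ordered}
    if method == "systematic":
        # Assumption: folds are cycled through in sorted block order.
        return {b: i % k for i, b in enumerate(ordered)}
    if method == "random":
        shuffled = ordered[:]
        rng.shuffle(shuffled)
        return {b: i % k for i, b in enumerate(shuffled)}
    raise ValueError("unknown method: " + method)
```
</preformat>
<p>During cross-validation, all observations whose block maps to the current fold are held out for testing, and the model is trained on the remaining observations.</p>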
<p>In addition, we manually created three sets of blocks. The first set was subbasins of the Baltic Sea, defined by HELCOM (the intergovernmental organization governing environmental issues in the Baltic Sea region). The second and third sets reflected the Baltic Sea&#x2019;s environmental gradients from its connection with the Atlantic Ocean in the southwest to its northernmost bays, with north-south block sizes of 80&#xa0;km and 200&#xa0;km. In these manual designs, each block served as a fold. In each subset, folds with fewer than 20 observations were merged with the next-smallest fold until all folds had at least 20 observations.</p>
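<p>The merging rule for small folds can be sketched as follows. This Python example reflects one reading of the rule (the smallest fold is repeatedly merged with the next-smallest one until every fold reaches the minimum size) and ignores spatial adjacency; it is not the code used in this study, and the function name and data layout are hypothetical.</p>
<preformat>
```python
def merge_small_folds(fold_sizes, min_obs=20):
    """Merge undersized folds until all folds have at least min_obs observations.

    fold_sizes: dict mapping a fold identifier to its number of observations.
    Returns a list of (set of merged fold identifiers, number of observations).
    """
    folds = [({fid}, n) for fid, n in fold_sizes.items()]
    while len(folds) > 1:
        folds.sort(key=lambda f: f[1])  # smallest fold first
        if folds[0][1] >= min_obs:
            break  # every fold is large enough
        (ids_a, n_a), (ids_b, n_b) = folds[0], folds[1]
        folds = [(ids_a | ids_b, n_a + n_b)] + folds[2:]
    return folds
```
</preformat>
<p>For example, folds with 5, 12, 40, and 100 observations become two folds with 57 and 100 observations: the two smallest are merged first (17 observations), and the result, still below 20, is merged with the 40-observation fold.</p>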
</sec>
<sec id="s2-5">
<title>2.5 Spatial autocorrelation</title>
<p>To be considered independent, training and testing data must be farther apart than the autocorrelation range (<xref ref-type="bibr" rid="B63">Trachsel and Telford, 2016</xref>). This range is thus critical information for spatial block cross-validation. It is traditionally estimated for residuals of the fitted model (<xref ref-type="bibr" rid="B33">Le Rest et al., 2014</xref>). However, fitting the model first precludes model selection, and residuals may be underestimated for flexible models overfitted to spatial structures (<xref ref-type="bibr" rid="B51">Roberts et al., 2017</xref>). Furthermore, with three models and 1,426 synthetic subsets, this study involved over 4,000 fitted models. Exploring residual autocorrelation for all was impractical. Consequently, we followed <xref ref-type="bibr" rid="B66">Valavi et al. (2019)</xref> and examined spatial autocorrelation of the predictors, assuming that they reflect the spatial structure of relevant environmental variables. Spatial autocorrelation can be examined, e.g., through variograms or correlograms, which provide similar information (<xref ref-type="bibr" rid="B16">Dormann et al., 2007</xref>). While variograms are a fundamental tool of geostatistics, correlograms are common in other fields like ecology and can be more robust when data are clustered (<xref ref-type="bibr" rid="B71">Wilde and Deutsch, 2006</xref>). Here, some clustering of available predictor data might have occurred because of differences in cloud cover across the study region. We hence calculated variograms as well as correlograms.</p>
<p>Spatiotemporal sample variograms were calculated for each predictor in two selected years (2005 and 2018) with the <italic>R</italic> package <italic>gstat</italic> (<xref ref-type="bibr" rid="B22">Gr&#xe4;ler et al., 2016</xref>; <xref ref-type="bibr" rid="B44">Pebesma, 2012</xref>; <xref ref-type="bibr" rid="B43">Pebesma, 2004</xref>). For computational efficiency, each variogram calculation used a sample consisting of 5% of the pixels with data from the respective year. We calculated and averaged spatial correlograms with Moran&#x2019;s <italic>I</italic> as a measure of spatial dependence for 100 randomly selected days during the study period with the <italic>R</italic> package <italic>ncf</italic> (<xref ref-type="bibr" rid="B7">Bjornstad, 2022</xref>).</p>
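<p>To illustrate what a Moran&#x2019;s <italic>I</italic> correlogram measures, the following sketch computes Moran&#x2019;s <italic>I</italic> per distance class with binary weights. It is a plain Python illustration under simplifying assumptions (planar coordinates in km, equal-width distance classes, synthetic data); it is not the <italic>ncf</italic> implementation, whose details may differ, and all names in it are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random


def moran_correlogram(x, y, z, bin_width, n_bins):
    """Moran's I per distance class for values z observed at planar coordinates (x, y)."""
    n = len(z)
    mean_z = sum(z) / n
    dev = [v - mean_z for v in z]
    ss = sum(d * d for d in dev)   # total sum of squared deviations
    cross = [0.0] * n_bins         # sum of cross-products per distance class
    pairs = [0] * n_bins           # number of unordered pairs per distance class
    for i in range(n):
        for j in range(i + 1, n):
            b = int(math.hypot(x[i] - x[j], y[i] - y[j]) // bin_width)
            if b < n_bins:
                cross[b] += dev[i] * dev[j]
                pairs[b] += 1
    # I(d) = (n / W_d) * sum_ij w_ij * dev_i * dev_j / ss with symmetric binary weights,
    # so each unordered pair counts twice in both W_d and the cross-product sum.
    return [(n / (2 * p)) * (2 * c) / ss if p > 0 else float("nan")
            for c, p in zip(cross, pairs)]


# Synthetic example: a west-east gradient plus noise at 150 random locations
random.seed(42)
n = 150
x = [random.uniform(0, 500) for _ in range(n)]  # km
y = [random.uniform(0, 500) for _ in range(n)]  # km
z = [xi / 100 + random.gauss(0, 0.2) for xi in x]

morans = moran_correlogram(x, y, z, bin_width=100, n_bins=5)
for b, value in enumerate(morans):
    print(f"{b * 100}-{(b + 1) * 100} km: I = {value:.2f}")
```
]]></preformat>
<p>For a gradient like this one, Moran&#x2019;s <italic>I</italic> is positive in the shortest distance class and decreases with distance. Averaging such correlograms over many days, as done in this study, smooths out day-to-day differences in data coverage.</p>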
</sec>
<sec id="s2-6">
<title>2.6 &#x201c;True&#x201d; errors vs. cross-validation errors</title>
<p>Predictive models should be tested with data reflecting their target application (<xref ref-type="bibr" rid="B25">Kapoor and Narayanan, 2023</xref>). Because the target application was to create maps for the whole Baltic Sea, we compared cross-validated error estimates calculated with the spatial block options described in <xref ref-type="sec" rid="s2-4">Section 2.4</xref> and with standard 10-fold cross-validation to &#x201c;true&#x201d; prediction errors calculated for the whole study region and period. These &#x201c;true&#x201d; errors were calculated in three steps, as described below. Importantly, all prediction errors were calculated with the synthetic chlorophyll concentrations (which are known everywhere) as response variable. Hence, &#x201c;true&#x201d; refers to errors that are valid for the whole study region and period, not errors that reflect the real-world chlorophyll <italic>a</italic> concentration (which are only known where <italic>in situ</italic> data exist).</p>
<p>First, we trained each model (MBR, RF, RFXY) with each complete subset, i.e., without withholding any data from the subset (<xref ref-type="bibr" rid="B30">Kuhn and Johnson, 2013</xref>). Each subset contained synthetic chlorophyll <italic>a</italic> values as the response variable and the predictor variables as described in <xref ref-type="sec" rid="s2-2">Section 2.2</xref>. This process yielded 4,278 trained models (three kinds of models trained on 1,426 subsets). Because the subsets were sampled from a database of field campaigns (see <xref ref-type="sec" rid="s2-2">Section 2.2</xref>), training the models relied exclusively on locations and times where real <italic>in situ</italic> data existed.</p>
<p>Second, we created validation data covering the whole study region and period to calculate the &#x201c;true&#x201d; errors. Because making pixel-by-pixel predictions for 18&#xa0;years of daily satellite data with over 4,000 models was computationally too expensive, we randomly sampled 1% of pixels in each daily satellite image. This sample comprised over 380,000 observations. Each observation contained a synthetic chlorophyll <italic>a</italic> value as response and predictor variables as described in <xref ref-type="sec" rid="s2-2">Section 2.2</xref>. Hence, the data used to calculate &#x201c;true&#x201d; errors, in contrast to the test sets of the various cross-validation methods, contained observations from randomly sampled locations and times and covered the whole study region and period (as far as cloud cover allowed).</p>
<p>Third, with each of the 4,278 trained models, we made predictions for this test set covering the whole study region and period, yielding &#x201c;true&#x201d; error estimates in the sense that they reflected the purpose of broad-scale, satellite-based mapping precisely (making daily maps for the whole study region and period).</p>
<p>Finally, we applied the various cross-validation methods (<xref ref-type="sec" rid="s2-4">Section 2.4</xref>) to each model and subset, resulting in 4,278 error estimates from each cross-validation method. Comparing these cross-validation estimates to the &#x201c;true&#x201d; errors revealed how well each method estimated the models&#x2019; prediction accuracy in the intended application.</p>
<p>As error measures, we used the root mean squared error (RMSE) and the absolute percentage difference (APD), calculated with the standard equations (as in <xref ref-type="bibr" rid="B56">Stock, 2022</xref>).</p>
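<p>A minimal sketch of the two error measures follows. The RMSE is standard. For the APD, the sketch assumes the mean absolute difference relative to the observed value, expressed in percent; this definition, the function names, and the example values are assumptions for illustration, and the cited reference should be consulted for the exact equations (including any transformation of the response) used in the study.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def apd(observed, predicted):
    """Absolute percentage difference, assumed here to be the mean of |p - o| / |o| in percent."""
    return 100.0 * sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical values: the largest absolute error occurs at the largest observed value
obs = [1.0, 2.0, 10.0]
pred = [1.5, 2.0, 12.0]
print(f"RMSE = {rmse(obs, pred):.3f}")
print(f"APD  = {apd(obs, pred):.1f}%")
```
]]></preformat>
<p>In this example, the error of 2.0 at an observed value of 10.0 dominates the RMSE but contributes less to the APD than the error of 0.5 at an observed value of 1.0, which illustrates why a relative measure is less sensitive to differences between larger numbers.</p>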
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>3 Results</title>
<sec id="s3-1">
<title>3.1 Error estimates and model selection</title>
<p>Synthetic chlorophyll <italic>a</italic> concentrations predicted with the MBR model had smaller &#x201c;true&#x201d; errors than those of the random forests (RF and RFXY) in 99% (RMSE) and 97% (APD) of subsets. Prediction errors were highest (1) in the Bothnian Bay, where the fewest training data were available (RMSE and APD), and (2) in the eastern Gulf of Finland, the Gulf of Riga, and some smaller areas with very high synthetic chlorophyll <italic>a</italic> concentrations (RMSE only) (<xref ref-type="fig" rid="F4">Figure 4</xref>). The APD&#x2019;s comparatively small values in these high-chlorophyll areas might reflect this error measure&#x2019;s low sensitivity to differences between larger numbers. Moderate &#x201c;true&#x201d; errors also occurred in large offshore areas where relatively low chlorophyll <italic>a</italic> concentrations and sparse data coverage coincided, like the Bothnian Sea (APD and RMSE).</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Spatial distribution of mean &#x201c;true&#x201d; errors (RMSE and APD of the three model types predicting synthetic chlorophyll <italic>a</italic>) averaged over 300 randomly sampled, partly cloud-free days across the whole study period. For each of the three model types, each subset yielded a different trained model, and prediction errors were averaged across subsets for this figure. &#x201c;True&#x201d; errors refers to errors when predicting synthetic chlorophyll <italic>a</italic> concentrations for the whole study region and period, not real-world concentrations.</p>
</caption>
<graphic xlink:href="frsen-06-1531097-g004.tif"/>
</fig>
<p>The tested cross-validation methods often underestimated errors, especially for the RFXY model (<xref ref-type="fig" rid="F5">Figure 5</xref>; <xref ref-type="table" rid="T1">Table 1</xref>). Overall, spatial block cross-validation yielded better error estimates than 10-fold cross-validation but sometimes overestimated errors. Error estimates from the blockCV package depended on the specific options, especially block size (see <xref ref-type="sec" rid="s3-2">Section 3.2</xref>). They were larger than estimates from 10-fold cross-validation and smaller than estimates from large, manually created blocks (subbasins). Blocks generated with the blockCV package and good options led to a stronger underestimation than large manually created blocks in some cases but avoided an overestimation in others.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Estimated errors generated with different options in the <italic>blockCV</italic> R package as a function of block size. The solid black lines show the models&#x2019; &#x201c;true&#x201d; errors (mean error predicting synthetic Chl <italic>a</italic> concentration for the whole study region and period across all subsets). The dashed black line shows errors estimated with 10-fold cross-validation. The dotted line shows errors estimated using subbasins as spatial blocks.</p>
</caption>
<graphic xlink:href="frsen-06-1531097-g005.tif"/>
</fig>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>&#x201c;True&#x201d; errors and estimated errors with different cross-validation approaches. With the <italic>blockCV</italic> package&#x2019;s various settings, there were too many combinations to show in the table. Instead, the table shows the best estimate obtained with the package (i.e., the one closest to the &#x201c;true&#x201d; error, representing an optimal choice of parameters) and the 25th percentile of absolute difference to the &#x201c;true&#x201d; error (P<sub>25</sub>, representing a good but not optimal choice of parameters). The estimates closest to the &#x201c;true&#x201d; errors are highlighted in bold font.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Model</th>
<th align="left">&#x201c;True&#x201d;</th>
<th align="left">10-fold</th>
<th align="left">Best blockCV</th>
<th align="left">P<sub>25</sub> blockCV</th>
<th align="left">Subbasins</th>
<th align="left">Lat. bl. 80&#xa0;km</th>
<th align="left">Lat. bl. 300&#xa0;km</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td colspan="8" align="left">
<italic>APD</italic>
</td>
</tr>
<tr>
<td align="left">MBR</td>
<td align="left">40%</td>
<td align="left">31%</td>
<td align="left">35%</td>
<td align="left">32%</td>
<td align="left">
<bold>38%</bold>
</td>
<td align="left">34%</td>
<td align="left">36%</td>
</tr>
<tr>
<td align="left">RF</td>
<td align="left">49%</td>
<td align="left">33%</td>
<td align="left">41%</td>
<td align="left">36%</td>
<td align="left">
<bold>47%</bold>
</td>
<td align="left">40%</td>
<td align="left">45%</td>
</tr>
<tr>
<td align="left">RFXY</td>
<td align="left">54%</td>
<td align="left">24%</td>
<td align="left">31%</td>
<td align="left">28%</td>
<td align="left">
<bold>37%</bold>
</td>
<td align="left">30%</td>
<td align="left">34%</td>
</tr>
<tr>
<td colspan="8" align="left">
<italic>RMSE</italic>
</td>
</tr>
<tr>
<td align="left">MBR</td>
<td align="left">0.18</td>
<td align="left">0.17</td>
<td align="left">
<bold>0.18</bold>
</td>
<td align="left">0.18</td>
<td align="left">0.20</td>
<td align="left">0.18</td>
<td align="left">0.19</td>
</tr>
<tr>
<td align="left">RF</td>
<td align="left">0.21</td>
<td align="left">0.18</td>
<td align="left">
<bold>0.21</bold>
</td>
<td align="left">0.19</td>
<td align="left">0.23</td>
<td align="left">0.20</td>
<td align="left">0.22</td>
</tr>
<tr>
<td align="left">RFXY</td>
<td align="left">0.22</td>
<td align="left">0.14</td>
<td align="left">0.17</td>
<td align="left">0.16</td>
<td align="left">
<bold>0.19</bold>
</td>
<td align="left">0.16</td>
<td align="left">0.17</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Depending on the model and error measure, 10-fold cross-validation underestimated prediction errors by 5% (RMSE of MBR) to 54% (APD of RFXY). The different block cross-validation methods yielded more accurate error estimates than 10-fold cross-validation, but the RMSE was sometimes overestimated. The best RMSE and APD estimates for RFXY were achieved with subbasins as blocks. The best RMSE estimates for MBR and RF were achieved by blocks generated with <italic>blockCV</italic> when optimal options were chosen; with a solid but not optimal choice of options, the 80&#xa0;km north-south blocks estimated the RMSE of these models best.</p>
<p>When choosing between the MBR and the RF models, all spatial cross-validation methods with large block sizes led to correct model selection for &#x3e;98% of subsets. Ten-fold cross-validation selected the best model for fewer subsets (APD: 86%, RMSE: 93%). In contrast, model selection failed even with the best methods when choosing between all three models (MBR, RF, and RFXY). Ten-fold cross-validation incorrectly chose RFXY for over 99% of subsets. Spatial cross-validation with subbasins as blocks worked best, but RFXY was still incorrectly chosen in over 50% (APD) and 80% (RMSE) of subsets.</p>
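<p>The model selection logic can be illustrated with the mean RMSE values from <xref ref-type="table" rid="T1">Table 1</xref>. The sketch below selects the model with the smallest estimated error for each validation approach and compares it with the model that has the smallest &#x201c;true&#x201d; error. It uses averages across subsets, whereas the percentages reported above were calculated subset by subset, so it only illustrates the mechanism; the function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Mean RMSE values copied from Table 1 (averages across subsets, rounded)
rmse_table = {
    "true":      {"MBR": 0.18, "RF": 0.21, "RFXY": 0.22},
    "10-fold":   {"MBR": 0.17, "RF": 0.18, "RFXY": 0.14},
    "subbasins": {"MBR": 0.20, "RF": 0.23, "RFXY": 0.19},
}


def select_model(errors):
    """Return the model with the smallest estimated error."""
    return min(errors, key=errors.get)


def relative_bias(estimate, truth):
    """Signed relative difference in percent; negative values indicate underestimation."""
    return 100.0 * (estimate - truth) / truth


best_true = select_model(rmse_table["true"])
for method in ("10-fold", "subbasins"):
    chosen = select_model(rmse_table[method])
    bias = relative_bias(rmse_table[method]["RFXY"], rmse_table["true"]["RFXY"])
    print(f"{method}: selects {chosen} (lowest true error: {best_true}); "
          f"RFXY RMSE bias = {bias:.0f}%")
```
]]></preformat>
<p>With these rounded averages, both approaches select RFXY although MBR has the lowest &#x201c;true&#x201d; error, because the error of RFXY is underestimated more strongly than that of the other models.</p>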
</sec>
<sec id="s3-2">
<title>3.2 Options when generating blocks with <italic>blockCV</italic>
</title>
<p>When creating square or hexagonal blocks automatically, choosing a large block size was the most important decision (<xref ref-type="table" rid="T2">Table 2</xref>). On average, cross-validation with ten folds yielded slightly better error estimates than five folds, square blocks yielded slightly better error estimates than hexagonal blocks, and systematic or checkerboard assignment of blocks to folds yielded slightly better error estimates than random assignment. However, except for the block size, the differences between the options were small. For example, averaged over all subsets and block sizes &#x2265;200&#xa0;km, the random forest&#x2019;s APD was underestimated by 25% with hexagonal blocks and 24% with square blocks. Nevertheless, large square blocks with systematic assignment to folds were always among the best choices, and often the best, across models and error measures (<xref ref-type="fig" rid="F5">Figure 5</xref>).</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Percentage of subsets for which different options were in the set of parameters yielding the most accurate blockCV-based error estimate. The highest percentages in each parameter group are shown in bold font.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th colspan="3" align="left">Blocks to folds</th>
<th colspan="2" align="left">Block shape</th>
<th colspan="2" align="left">&#x23; of folds</th>
<th colspan="3" align="left">Block sizes</th>
</tr>
<tr>
<th align="left">Model</th>
<th align="left">Random</th>
<th align="left">Systematic</th>
<th align="left">Checkerboard</th>
<th align="left">Hexagons</th>
<th align="left">Squares</th>
<th align="left">10</th>
<th align="left">5</th>
<th align="left">&#x2264;100&#xa0;km</th>
<th align="left">100&#x2013;200&#xa0;km</th>
<th align="left">&#x3e;200&#xa0;km</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td colspan="11" align="left">
<italic>APD</italic>
</td>
</tr>
<tr>
<td align="left">MBR</td>
<td align="left">28%</td>
<td align="left">
<bold>55%</bold>
</td>
<td align="left">17%</td>
<td align="left">36%</td>
<td align="left">
<bold>64%</bold>
</td>
<td align="left">
<bold>85%</bold>
</td>
<td align="left">15%</td>
<td align="left">2%</td>
<td align="left">13%</td>
<td align="left">
<bold>85%</bold>
</td>
</tr>
<tr>
<td align="left">RF</td>
<td align="left">35%</td>
<td align="left">
<bold>45%</bold>
</td>
<td align="left">21%</td>
<td align="left">40%</td>
<td align="left">
<bold>60%</bold>
</td>
<td align="left">
<bold>91%</bold>
</td>
<td align="left">9%</td>
<td align="left">3%</td>
<td align="left">22%</td>
<td align="left">
<bold>75%</bold>
</td>
</tr>
<tr>
<td align="left">RFXY</td>
<td align="left">35%</td>
<td align="left">29%</td>
<td align="left">
<bold>36%</bold>
</td>
<td align="left">43%</td>
<td align="left">
<bold>57%</bold>
</td>
<td align="left">
<bold>91%</bold>
</td>
<td align="left">9%</td>
<td align="left">2%</td>
<td align="left">15%</td>
<td align="left">
<bold>82%</bold>
</td>
</tr>
<tr>
<td colspan="11" align="left">
<italic>RMSE</italic>
</td>
</tr>
<tr>
<td align="left">MBR</td>
<td align="left">32%</td>
<td align="left">
<bold>46%</bold>
</td>
<td align="left">22%</td>
<td align="left">48%</td>
<td align="left">
<bold>52%</bold>
</td>
<td align="left">
<bold>59%</bold>
</td>
<td align="left">41%</td>
<td align="left">26%</td>
<td align="left">
<bold>43%</bold>
</td>
<td align="left">31%</td>
</tr>
<tr>
<td align="left">RF</td>
<td align="left">25%</td>
<td align="left">34%</td>
<td align="left">
<bold>40%</bold>
</td>
<td align="left">34%</td>
<td align="left">
<bold>66%</bold>
</td>
<td align="left">
<bold>61%</bold>
</td>
<td align="left">39%</td>
<td align="left">15%</td>
<td align="left">41%</td>
<td align="left">
<bold>44%</bold>
</td>
</tr>
<tr>
<td align="left">RFXY</td>
<td align="left">30%</td>
<td align="left">34%</td>
<td align="left">
<bold>36%</bold>
</td>
<td align="left">47%</td>
<td align="left">
<bold>53%</bold>
</td>
<td align="left">
<bold>57%</bold>
</td>
<td align="left">43%</td>
<td align="left">2%</td>
<td align="left">16%</td>
<td align="left">
<bold>82%</bold>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3-3">
<title>3.3 Spatiotemporal autocorrelation</title>
<p>Spatiotemporal variograms (<xref ref-type="fig" rid="F6">Figure 6</xref>) showed that all predictors were spatially autocorrelated over several hundred kilometers, and none of the variograms reached their sill within 500&#xa0;km (already beyond a practical block size). Variograms calculated for 2005 (not shown) were similar to those for 2018. While correctly suggesting the need for large blocks to achieve independent training and testing data, the variograms did not suggest an optimal block size.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Empirical spatiotemporal variograms of the predictors for 2018.</p>
</caption>
<graphic xlink:href="frsen-06-1531097-g006.tif"/>
</fig>
<p>Correlograms showed a more apparent autocorrelation range of the predictors (<xref ref-type="fig" rid="F7">Figure 7</xref>). The spatial correlation dropped sharply within the first 100&#xa0;km. It plateaued near 200&#xa0;km for the 412&#xa0;nm, 443&#xa0;nm, and 490&#xa0;nm wavelength bands and near 300&#xa0;km for the 555&#xa0;nm and 670&#xa0;nm wavelength bands. Hence, the correlograms suggested a sound range for the block size in this application.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Correlograms for the predictors on 100 randomly sampled days (thin gray lines) and their average (thick black lines). On some days, there are no data for the largest distances shown due to cloud cover.</p>
</caption>
<graphic xlink:href="frsen-06-1531097-g007.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>4 Discussion</title>
<sec id="s4-1">
<title>4.1 Block size and spatial distribution of data explain contradictions between prior studies evaluating spatial cross-validation methods</title>
<p>Several past studies have evaluated cross-validation methods with sometimes contradictory results.</p>
<p>On the one hand, several studies found that separating training and testing data spatially yields higher estimated errors than random data splits (<xref ref-type="bibr" rid="B3">Bahn and McGill, 2013</xref>; <xref ref-type="bibr" rid="B26">Karasiak et al., 2022</xref>; <xref ref-type="bibr" rid="B37">Meyer et al., 2018</xref>; <xref ref-type="bibr" rid="B38">2019</xref>; <xref ref-type="bibr" rid="B58">Stock et al., 2018</xref>; <xref ref-type="bibr" rid="B59">Stock and Subramaniam, 2020</xref>). For example, <xref ref-type="bibr" rid="B47">Ploton et al. (2020)</xref> evaluated a random forest predicting above-ground forest biomass with random splits and two spatial cross-validation approaches. Random splits suggested good predictive skill, but spatial cross-validation suggested no predictive skill, reflecting the known effects of data leakage when training and testing data are insufficiently separated (<xref ref-type="bibr" rid="B25">Kapoor and Narayanan, 2023</xref>). Other tests with synthetic, autocorrelated data also show that error estimates from spatial block cross-validation are more accurate than random splits (<xref ref-type="bibr" rid="B51">Roberts et al., 2017</xref>; <xref ref-type="bibr" rid="B56">Stock, 2022</xref>). Furthermore, models selected with spatial block cross-validation can transfer better to new geographic locations (<xref ref-type="bibr" rid="B65">Tziachris et al., 2023</xref>). These prior results are consistent with this study.</p>
<p>On the other hand, several studies found that differences between spatial and random cross-validation were small and supported the same conclusions (<xref ref-type="bibr" rid="B34">Lyons et al., 2018</xref>; <xref ref-type="bibr" rid="B68">Valavi et al., 2023</xref>; <xref ref-type="bibr" rid="B74">Zhang et al., 2023</xref>). For example, <xref ref-type="bibr" rid="B68">Valavi et al. (2023)</xref> found that random and spatial block cross-validation yielded a similar ranking of models and that flexible models transferred well to new locations, contrary to, e.g., <xref ref-type="bibr" rid="B23">Gregr et al. (2019)</xref>, where more flexible models failed when applied to independent data.</p>
<p>These <italic>prima facie</italic> contradictory results are explained by two aspects of the studies&#x2019; design. First, the studies used different block sizes, a critical choice according to our results. For example, <xref ref-type="bibr" rid="B68">Valavi et al. (2023)</xref> used a block size of 75&#xa0;km to mimic extrapolation over comparatively short distances. As these authors correctly argue, results for extrapolation over larger distances might have been different. Second, spatial cross-validation is most important when data are unevenly distributed in space and time. For example, <xref ref-type="bibr" rid="B34">Lyons et al. (2018)</xref> compared cross-validation methods in a terrestrial vegetation mapping case study. They had a small study area (50&#xa0;km<sup>2</sup>) and collected data specifically for their study with sound spatial sampling methods. With sound spatial sampling covering the whole study region, the biases of random cross-validation demonstrated in this and other studies become negligible, because randomly held-out test observations are not systematically closer to training observations than the locations for which predictions are needed (<xref ref-type="bibr" rid="B50">Ramezan et al., 2019</xref>; <xref ref-type="bibr" rid="B56">Stock, 2022</xref>; <xref ref-type="bibr" rid="B69">Wadoux et al., 2021</xref>). In contrast, with data resembling the synthetic data here (i.e., databases that compile data from various sources without an overarching sampling strategy), cross-validation with random splits or blocks that are too small yields biased error estimates.</p>
<p>Together, the importance of block size highlighted here and the spatiotemporal distribution of data adequately explain these contradictions in previously published research.</p>
</sec>
<sec id="s4-2">
<title>4.2 How to choose blocks for spatial cross-validation</title>
<p>The most important parameter when automatically generating square or hexagonal blocks for spatial cross-validation was the block size. This choice is implicit but equally important when using existing regions as blocks (for example, when choosing between broad biogeographical regions or finer-scale subregions).</p>
<p>The first step in choosing a block size is analyzing spatial autocorrelation (<xref ref-type="bibr" rid="B33">Le Rest et al., 2014</xref>; <xref ref-type="bibr" rid="B51">Roberts et al., 2017</xref>). Here, correlograms showed autocorrelation ranges reflecting a suitable block size, whereas sample variograms showed that large blocks were needed but did not allow choosing a specific size. Hence, determining a good block size can require data exploration with several analytical tools. In addition, modelers must choose a cross-validation strategy that reflects the model&#x2019;s intended application (<xref ref-type="bibr" rid="B11">Christin et al., 2020</xref>; <xref ref-type="bibr" rid="B25">Kapoor and Narayanan, 2023</xref>; <xref ref-type="bibr" rid="B57">Stock et al., 2023</xref>), especially whether predictions are needed beyond locations that are well covered by data.</p>
<p>Iterating over a plausible range of block sizes can yield additional insights, for example, exploring how error estimates change with increasing separation distance (<xref ref-type="bibr" rid="B48">Pohjankukka et al., 2017</xref>; <xref ref-type="bibr" rid="B60">Stock and Subramaniam, 2022</xref>). While a single set of manually crafted blocks is computationally more efficient and can reflect characteristics of the study region (such as biogeographical boundaries), an iterative approach avoids the need to select a block size <italic>a priori</italic>. Thus, it helps resolve situations where geostatistical analyses and domain knowledge do not clearly suggest which block size to use.</p>
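<p>The iterative approach can be sketched as follows: square blocks of a given size are laid over the data, blocks are assigned to folds systematically, and the cross-validated error is recalculated for each candidate block size. This is an illustration in plain Python under simplifying assumptions (planar coordinates in km, blocks numbered row by row and assigned to folds by block number modulo the number of folds, a one-predictor linear model as a stand-in for the study&#x2019;s models, synthetic data). It is not the <italic>blockCV</italic> implementation, and all function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random


def systematic_block_folds(x, y, block_size, k):
    """Assign points to k folds via square blocks numbered row by row (block number mod k)."""
    x0, y0 = min(x), min(y)
    n_cols = int((max(x) - x0) // block_size) + 1
    folds = []
    for xi, yi in zip(x, y):
        col = int((xi - x0) // block_size)
        row = int((yi - y0) // block_size)
        folds.append((row * n_cols + col) % k)
    return folds


def fit_line(u, v):
    """Ordinary least squares for v = a + b * u; returns (a, b)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suu = sum((ui - mu) ** 2 for ui in u)
    b = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / suu
    return mv - b * mu, b


def block_cv_rmse(x, y, predictor, response, block_size, k=10):
    """RMSE pooled over all held-out folds of a spatial block cross-validation."""
    folds = systematic_block_folds(x, y, block_size, k)
    if len(set(folds)) < 2:
        raise ValueError("block size too large: fewer than two folds contain data")
    sq_err, n_test = 0.0, 0
    for f in sorted(set(folds)):
        train = [i for i, g in enumerate(folds) if g != f]
        test = [i for i, g in enumerate(folds) if g == f]
        a, b = fit_line([predictor[i] for i in train], [response[i] for i in train])
        sq_err += sum((a + b * predictor[i] - response[i]) ** 2 for i in test)
        n_test += len(test)
    return math.sqrt(sq_err / n_test)


# Synthetic data: 400 points in a 1000 km x 1000 km region
random.seed(1)
n = 400
x = [random.uniform(0, 1000) for _ in range(n)]  # km
y = [random.uniform(0, 1000) for _ in range(n)]  # km
predictor = [random.gauss(0, 1) for _ in range(n)]
# Response = linear signal + smooth spatial trend unknown to the model + noise
response = [2 * p + math.sin(xi / 150) + random.gauss(0, 0.3)
            for p, xi in zip(predictor, x)]

results = {size: block_cv_rmse(x, y, predictor, response, size)
           for size in (50, 100, 200, 300)}
for size, err in results.items():
    print(f"block size {size} km: RMSE = {err:.3f}")
```
]]></preformat>
<p>The rigid linear model in this sketch cannot overfit to spatial structure, so its error estimates are not expected to change much with block size. With flexible models and clustered data, as in this study, plotting the estimated error against block size shows how strongly the estimate depends on the separation between training and testing data.</p>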
<p>The block shape, the number of folds, and the assignment of blocks to folds were less important here, likely because they did not directly influence how the model testing reflected the target application. For example, while the spatial boundaries of statistical analysis units can affect results (the modifiable areal unit problem; <xref ref-type="bibr" rid="B40">Openshaw and Taylor, 1979</xref>), the shape of the blocks had minor effects on whether model testing reflected extrapolation to subregions without data. As another example, the number of folds influences the size of the training sets and, thus, the estimated prediction errors. The smallest data sets in this study had 200 observations. With 10 folds, each training set had 180 observations, and with 5 folds, 160 observations, with minor effects on the error estimates. While these options were unimportant here, they can matter in other applications. For example, it can be best to keep the training set as large as possible for very small data sets by using many folds or spatial buffers around single, held-out observations. Without such special considerations, when using a blocking strategy like those in the <italic>blockCV</italic> R package, square blocks, 10 folds, and systematic assignment of blocks to folds were good default choices.</p>
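<p>The training set sizes quoted above follow from simple arithmetic, sketched below under the assumption of equally sized folds. With spatial blocks, fold sizes vary, so the actual training set sizes differ between folds; the function name is hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def training_set_size(n_observations, k_folds):
    """Training-set size per fold when all k folds are equally sized."""
    return n_observations * (k_folds - 1) // k_folds


# Smallest data sets in this study: 200 observations
for k in (5, 10):
    print(f"{k} folds: {training_set_size(200, k)} training observations")
```
]]></preformat>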
</sec>
<sec id="s4-3">
<title>4.3 Limitations and generalizability</title>
<p>This study&#x2019;s main limitation is that it presents a single supervised learning application in one study region. Nevertheless, the results are theoretically plausible and sufficiently broad to explain apparent contradictions between prior studies (see <xref ref-type="sec" rid="s4-1">Section 4.1</xref>). This study&#x2019;s marine remote sensing example can, therefore, inform other supervised learning applications with spatially biased point data. However, as the conflicting past results discussed above show, the relevance of our recommendations must be carefully judged in other applications and data contexts.</p>
<p>Environmental data might be autocorrelated in space and time, but this study tested only spatial blocks. <xref ref-type="bibr" rid="B62">Sweet et al. (2023)</xref> found that using clusters in predictor space as blocks worked best in a crop modeling example with spatiotemporal autocorrelation. In contrast, for synthetic chlorophyll <italic>a</italic> data like those used here, spatial blocks produced better error estimates than blocks in time or predictor space (<xref ref-type="bibr" rid="B56">Stock, 2022</xref>). Exploring the nuances of choosing spatial blocks was thus most critical for this study&#x2019;s example application.</p>
<p>Basing the study on synthetic data allowed the evaluation of error estimates across the whole study region (not only locations where <italic>in situ</italic> data existed); such &#x201c;simulation experiments&#x201d; are a common tool to evaluate statistical methods (e.g., <xref ref-type="bibr" rid="B15">Dormann et al., 2012</xref>; <xref ref-type="bibr" rid="B75">Strobl et al., 2007</xref>; <xref ref-type="bibr" rid="B51">Roberts et al., 2017</xref>). However, simulated data from the biogeochemical model used to build the synthetic data were only weakly correlated with chlorophyll <italic>a</italic> and a maximum band ratio, a key predictor in many chlorophyll remote sensing algorithms. This was alleviated by using a weighted average with an independent satellite data product, as opposed to the biogeochemical simulation results alone, as the synthetic response variable. The synthetic data represented &#x201c;real&#x201d; marine remote sensing applications realistically for three reasons, hence allowing relevant insights into the performance of cross-validation methods. First, remote sensing reflectances and the band ratio serving as predictors were the same data used in many ocean color remote sensing studies. Second, the locations and dates of observations for model training and testing came from actual field campaigns, resampled to reflect the campaign-by-campaign growth of oceanographic databases. Third, the synthetic chlorophyll concentrations (averaged from biogeochemical simulations and a different satellite data product) had statistical properties similar to <italic>in situ</italic> chlorophyll concentrations. Therefore, the synthetic data were realistic regarding the predictors and the spatial and temporal distribution of data.</p>
<p>Although this study focused on a single region, the Baltic Sea is typical of Case 2 waters, where remote sensing often relies on supervised learning with local to regional-scale data (<xref ref-type="bibr" rid="B24">Hafeez et al., 2019</xref>). Remote sensing reflectance is the foundation of many satellite algorithms besides mapping chlorophyll <italic>a</italic>. Therefore, the results are most relevant for other marine remote sensing applications in Case 2 waters.</p>
</sec>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s5">
<title>Data availability statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: <ext-link ext-link-type="uri" xlink:href="https://figshare.com/s/132c0a410cc2800ca68f">https://figshare.com/s/132c0a410cc2800ca68f</ext-link>.</p>
</sec>
<sec sec-type="author-contributions" id="s6">
<title>Author contributions</title>
<p>AS: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing&#x2013;original draft, Writing&#x2013;review and editing.</p>
</sec>
<sec sec-type="funding-information" id="s7">
<title>Funding</title>
<p>The author(s) declare that no financial support was received for the research and/or publication of this article.</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="ai-statement" id="s9">
<title>Generative AI statement</title>
<p>The author(s) declare that no Generative AI was used in the creation of this manuscript.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ara&#xfa;jo</surname>
<given-names>M. B.</given-names>
</name>
<name>
<surname>Pearson</surname>
<given-names>R. G.</given-names>
</name>
<name>
<surname>Thuiller</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Erhard</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Validation of species&#x2013;climate impact models under climate change</article-title>. <source>Glob. Change Biol.</source> <volume>11</volume> (<issue>9</issue>), <fpage>1504</fpage>&#x2013;<lpage>1513</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-2486.2005.01000.x</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arlot</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Celisse</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>A survey of cross-validation procedures for model selection</article-title>. <source>Stat. Surv.</source> <volume>4</volume>, <fpage>40</fpage>&#x2013;<lpage>79</lpage>. <pub-id pub-id-type="doi">10.1214/09-SS054A</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bahn</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>McGill</surname>
<given-names>B. J.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Testing the predictive performance of distribution models</article-title>. <source>Oikos</source> <volume>122</volume> (<issue>3</issue>), <fpage>321</fpage>&#x2013;<lpage>331</lpage>. <pub-id pub-id-type="doi">10.1111/j.1600-0706.2012.00299.x</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bald</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Gottwald</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zeuss</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>spatialMaxent: adapting species distribution modeling to spatial data</article-title>. <source>Ecol. Evol.</source> <volume>13</volume> (<issue>10</issue>), <fpage>e10635</fpage>. <pub-id pub-id-type="doi">10.1002/ece3.10635</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Beery</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Van Horn</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Perona</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Recognition in <italic>terra incognita</italic>
</article-title>,&#x201d; in <source>Computer Vision &#x2013; ECCV 2018</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Ferrari</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Hebert</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sminchisescu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Weiss</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<publisher-name>Springer International Publishing</publisher-name>), <fpage>472</fpage>&#x2013;<lpage>489</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-01270-0_28</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Belgiu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Dr&#x103;gu&#x163;</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Random forest in remote sensing: a review of applications and future directions</article-title>. <source>ISPRS J. Photogrammetry Remote Sens.</source> <volume>114</volume>, <fpage>24</fpage>&#x2013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1016/j.isprsjprs.2016.01.011</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Bjornstad</surname>
<given-names>O. N.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>ncf: spatial covariance functions</article-title>. <comment>Available online at: <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=ncf">https://CRAN.R-project.org/package&#x3d;ncf</ext-link>.</comment>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boakes</surname>
<given-names>E. H.</given-names>
</name>
<name>
<surname>McGowan</surname>
<given-names>P. J. K.</given-names>
</name>
<name>
<surname>Fuller</surname>
<given-names>R. A.</given-names>
</name>
<name>
<surname>Chang-qing</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>N. E.</given-names>
</name>
<name>
<surname>O&#x2019;Connor</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2010</year>). <article-title>Distorted views of biodiversity: spatial and temporal bias in species occurrence data</article-title>. <source>PLOS Biol.</source> <volume>8</volume> (<issue>6</issue>), <fpage>e1000385</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pbio.1000385</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bowler</surname>
<given-names>D. E.</given-names>
</name>
<name>
<surname>Callaghan</surname>
<given-names>C. T.</given-names>
</name>
<name>
<surname>Bhandari</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Henle</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Benjamin Barth</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Koppitz</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Temporal trends in the spatial bias of species occurrence records</article-title>. <source>Ecography</source> <volume>2022</volume> (<issue>8</issue>), <fpage>e06219</fpage>. <pub-id pub-id-type="doi">10.1111/ecog.06219</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Breiman</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Mach. Learn.</source> <volume>45</volume> (<issue>1</issue>), <fpage>5</fpage>&#x2013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1023/a:1010933404324</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Christin</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hervet</surname>
<given-names>&#xc9;.</given-names>
</name>
<name>
<surname>Lecomte</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Going further with model verification and deep learning</article-title>. <source>Methods Ecol. Evol.</source> <volume>12</volume> (<issue>1</issue>), <fpage>130</fpage>&#x2013;<lpage>134</lpage>. <pub-id pub-id-type="doi">10.1111/2041-210X.13494</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crego</surname>
<given-names>R. D.</given-names>
</name>
<name>
<surname>Stabach</surname>
<given-names>J. A.</given-names>
</name>
<name>
<surname>Connette</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Implementation of species distribution models in Google Earth Engine</article-title>. <source>Divers. Distributions</source> <volume>28</volume> (<issue>5</issue>), <fpage>904</fpage>&#x2013;<lpage>916</lpage>. <pub-id pub-id-type="doi">10.1111/ddi.13491</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Darecki</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kaczmarek</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Olszewski</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>SeaWiFS ocean colour chlorophyll algorithms for the southern Baltic Sea</article-title>. <source>Int. J. Remote Sens.</source> <volume>26</volume> (<issue>2</issue>), <fpage>247</fpage>&#x2013;<lpage>260</lpage>. <pub-id pub-id-type="doi">10.1080/01431160410001720298</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Darecki</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Stramski</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>An evaluation of MODIS and SeaWiFS bio-optical algorithms in the Baltic Sea</article-title>. <source>Remote Sens. Environ.</source> <volume>89</volume> (<issue>3</issue>), <fpage>326</fpage>&#x2013;<lpage>350</lpage>. <pub-id pub-id-type="doi">10.1016/j.rse.2003.10.012</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dormann</surname>
<given-names>C. F.</given-names>
</name>
<name>
<surname>Elith</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bacher</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Buchmann</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Carl</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Carr&#xe9;</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Collinearity: a review of methods to deal with it and a simulation study evaluating their performance</article-title>. <source>Ecography</source> <volume>36</volume> (<issue>1</issue>), <fpage>27</fpage>&#x2013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1111/j.1600-0587.2012.07348.x</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dormann</surname>
<given-names>C. F.</given-names>
</name>
<name>
<surname>McPherson</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Ara&#xfa;jo</surname>
<given-names>M. B.</given-names>
</name>
<name>
<surname>Bivand</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Bolliger</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Carl</surname>
<given-names>G.</given-names>
</name>
<etal/>
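</person-group> (<year>2007</year>). <article-title>Methods to account for spatial autocorrelation in the analysis of species distributional data: a review</article-title>. <source>Ecography</source> <volume>30</volume> (<issue>5</issue>), <fpage>609</fpage>&#x2013;<lpage>628</lpage>. <pub-id pub-id-type="doi">10.1111/j.2007.0906-7590.05171.x</pub-id>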
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>El-Gabbas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Van Opzeeland</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Burkhardt</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Boebel</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Static species distribution models in the marine realm: the case of baleen whales in the Southern Ocean</article-title>. <source>Divers. Distributions</source> <volume>27</volume> (<issue>8</issue>), <fpage>1536</fpage>&#x2013;<lpage>1552</lpage>. <pub-id pub-id-type="doi">10.1111/ddi.13300</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elith</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Leathwick</surname>
<given-names>J. R.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Species distribution models: ecological explanation and prediction across space and time</article-title>. <source>Annu. Rev. Ecol. Evol. Syst.</source> <volume>40</volume> (<issue>1</issue>), <fpage>677</fpage>&#x2013;<lpage>697</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.ecolsys.110308.120159</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fanton d&#x2019;Andon</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Mangin</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lavender</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Antoine</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Maritorena</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Morel</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2009</year>). <article-title>GlobColour&#x2014;the European service for ocean colour</article-title>. <source>Proc. 2009 IEEE Int. Geoscience and Remote Sens. Symposium</source>. <pub-id pub-id-type="doi">10.1029/2006JC004007</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fourcade</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Besnard</surname>
<given-names>A. G.</given-names>
</name>
<name>
<surname>Secondi</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Paintings predict the distribution of species, or the challenge of selecting environmental predictors and evaluation statistics</article-title>. <source>Glob. Ecol. Biogeogr.</source> <volume>27</volume> (<issue>2</issue>), <fpage>245</fpage>&#x2013;<lpage>256</lpage>. <pub-id pub-id-type="doi">10.1111/geb.12684</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Geirhos</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Jacobsen</surname>
<given-names>J.-H.</given-names>
</name>
<name>
<surname>Michaelis</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zemel</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Brendel</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Bethge</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Shortcut learning in deep neural networks</article-title>. <source>Nat. Mach. Intell.</source> <volume>2</volume>, <fpage>665</fpage>&#x2013;<lpage>673</lpage>. <pub-id pub-id-type="doi">10.1038/s42256-020-00257-z</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gr&#xe4;ler</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Pebesma</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Heuvelink</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Spatio-temporal geostatistics using gstat</article-title>. <source>R J.</source> <volume>8</volume> (<issue>1</issue>), <fpage>204</fpage>&#x2013;<lpage>218</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-17885-1</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gregr</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Palacios</surname>
<given-names>D. M.</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>K. M. A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Why less complexity produces better forecasts: an independent data evaluation of kelp habitat models</article-title>. <source>Ecography</source> <volume>42</volume>, <fpage>428</fpage>&#x2013;<lpage>443</lpage>. <pub-id pub-id-type="doi">10.1111/ecog.03470</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hafeez</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Nazeer</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Nichol</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Abbas</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Comparison of machine learning algorithms for retrieval of water quality indicators in case-II waters: a case study of Hong Kong</article-title>. <source>Remote Sens.</source> <volume>11</volume> (<issue>6</issue>), <fpage>617</fpage>. <pub-id pub-id-type="doi">10.3390/rs11060617</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kapoor</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Narayanan</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Leakage and the reproducibility crisis in machine-learning-based science</article-title>. <source>Patterns</source> <volume>4</volume> (<issue>9</issue>), <fpage>100804</fpage>. <pub-id pub-id-type="doi">10.1016/j.patter.2023.100804</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karasiak</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Dejoux</surname>
<given-names>J.-F.</given-names>
</name>
<name>
<surname>Monteil</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sheeren</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Spatial dependence between training and test sets: another pitfall of classification accuracy assessment in remote sensing</article-title>. <source>Mach. Learn.</source> <volume>111</volume> (<issue>7</issue>), <fpage>2715</fpage>&#x2013;<lpage>2740</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-021-05972-1</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kattenborn</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Leitloff</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Schiefer</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Hinz</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Review on convolutional neural networks (CNN) in vegetation remote sensing</article-title>. <source>ISPRS J. Photogrammetry Remote Sens.</source> <volume>173</volume>, <fpage>24</fpage>&#x2013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1016/j.isprsjprs.2020.12.010</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kerr</surname>
<given-names>J. T.</given-names>
</name>
<name>
<surname>Ostrovsky</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>From space to species: ecological applications for remote sensing</article-title>. <source>Trends Ecol. Evol.</source> <volume>18</volume> (<issue>6</issue>), <fpage>299</fpage>&#x2013;<lpage>305</lpage>. <pub-id pub-id-type="doi">10.1016/S0169-5347(03)00071-5</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kratzer</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>H&#xe5;kansson</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Sahlin</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Assessing Secchi and photic zone depth in the Baltic Sea from satellite data</article-title>. <source>Ambio</source> <volume>32</volume> (<issue>8</issue>), <fpage>577</fpage>&#x2013;<lpage>585</lpage>. <pub-id pub-id-type="doi">10.1579/0044-7447-32.8.577</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kuhn</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2013</year>). <source>Applied predictive modeling</source>. <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Legendre</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1993</year>). <article-title>Spatial autocorrelation: trouble or new paradigm?</article-title> <source>Ecology</source> <volume>74</volume> (<issue>6</issue>), <fpage>1659</fpage>&#x2013;<lpage>1673</lpage>. <pub-id pub-id-type="doi">10.2307/1939924</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le Rest</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Pinaud</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bretagnolle</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Accounting for spatial autocorrelation from model selection to statistical inference: application to a national survey of a diurnal raptor</article-title>. <source>Ecol. Inf.</source> <volume>14</volume>, <fpage>17</fpage>&#x2013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1016/j.ecoinf.2012.11.008</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le Rest</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Pinaud</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Monestiez</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Chadoeuf</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bretagnolle</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation</article-title>. <source>Glob. Ecol. Biogeogr.</source> <volume>23</volume> (<issue>7</issue>), <fpage>811</fpage>&#x2013;<lpage>820</lpage>. <pub-id pub-id-type="doi">10.1111/geb.12161</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lyons</surname>
<given-names>M. B.</given-names>
</name>
<name>
<surname>Keith</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Phinn</surname>
<given-names>S. R.</given-names>
</name>
<name>
<surname>Mason</surname>
<given-names>T. J.</given-names>
</name>
<name>
<surname>Elith</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A comparison of resampling methods for remote sensing classification and accuracy assessment</article-title>. <source>Remote Sens. Environ.</source> <volume>208</volume>, <fpage>145</fpage>&#x2013;<lpage>153</lpage>. <pub-id pub-id-type="doi">10.1016/j.rse.2018.02.026</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maritorena</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>d&#x2019;Andon</surname>
<given-names>O. H. F.</given-names>
</name>
<name>
<surname>Mangin</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Siegel</surname>
<given-names>D. A.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Merged satellite ocean color data products using a bio-optical model: characteristics, benefits and issues</article-title>. <source>Remote Sens. Environ.</source> <volume>114</volume> (<issue>8</issue>), <fpage>1791</fpage>&#x2013;<lpage>1804</lpage>. <pub-id pub-id-type="doi">10.1016/j.rse.2010.04.002</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meyer</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Pebesma</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Predicting into unknown space? Estimating the area of applicability of spatial prediction models</article-title>. <source>Methods Ecol. Evol.</source> <volume>12</volume> (<issue>9</issue>), <fpage>1620</fpage>&#x2013;<lpage>1633</lpage>. <pub-id pub-id-type="doi">10.1111/2041-210X.13650</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meyer</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Reudenbach</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Hengl</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Katurji</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Nauss</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation</article-title>. <source>Environ. Model. Softw.</source> <volume>101</volume>, <fpage>1</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/j.envsoft.2017.12.001</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meyer</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Reudenbach</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>W&#xf6;llauer</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Nauss</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Importance of spatial predictor variable selection in machine learning applications &#x2013; moving from data reproduction to spatial prediction</article-title>. <source>Ecol. Model.</source> <volume>411</volume>, <fpage>108815</fpage>. <pub-id pub-id-type="doi">10.1016/j.ecolmodel.2019.108815</pub-id>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nikparvar</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Thill</surname>
<given-names>J.-C.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Machine learning of spatial data</article-title>. <source>ISPRS Int. J. Geo-Information</source> <volume>10</volume> (<issue>9</issue>), <fpage>600</fpage>. <pub-id pub-id-type="doi">10.3390/ijgi10090600</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Openshaw</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1979</year>). &#x201c;<article-title>A million or so correlation coefficients: three experiments on the modifiable areal unit problem</article-title>,&#x201d; in <source>Statistical applications in the spatial sciences</source>. Editor <person-group person-group-type="editor">
<name>
<surname>Wrigley</surname>
<given-names>N.</given-names>
</name>
</person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>Pion</publisher-name>), <fpage>127</fpage>&#x2013;<lpage>144</lpage>.</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>O&#x2019;Reilly</surname>
<given-names>J. E.</given-names>
</name>
<name>
<surname>Maritorena</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Mitchell</surname>
<given-names>B. G.</given-names>
</name>
<name>
<surname>Siegel</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Carder</surname>
<given-names>K. L.</given-names>
</name>
<name>
<surname>Garver</surname>
<given-names>S. A.</given-names>
</name>
<etal/>
</person-group> (<year>1998</year>). <article-title>Ocean color chlorophyll algorithms for SeaWiFS</article-title>. <source>J. Geophys. Res. Oceans</source> <volume>103</volume> (<issue>C11</issue>), <fpage>24937</fpage>&#x2013;<lpage>24953</lpage>. <pub-id pub-id-type="doi">10.1029/98jc02160</pub-id>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>O&#x2019;Reilly</surname>
<given-names>J. E.</given-names>
</name>
<name>
<surname>Werdell</surname>
<given-names>P. J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Chlorophyll algorithms for ocean color sensors&#x2014;OC4, OC5 and OC6</article-title>. <source>Remote Sens. Environ.</source> <volume>229</volume>, <fpage>32</fpage>&#x2013;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1016/j.rse.2019.04.021</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pebesma</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Multivariable geostatistics in S: the gstat package</article-title>. <source>Comput. Geosciences</source> <volume>30</volume> (<issue>7</issue>), <fpage>683</fpage>&#x2013;<lpage>691</lpage>. <pub-id pub-id-type="doi">10.1016/j.cageo.2004.03.012</pub-id>
</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pebesma</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>spacetime: spatio-temporal data in R</article-title>. <source>J. Stat. Softw.</source> <volume>51</volume> (<issue>7</issue>), <fpage>1</fpage>&#x2013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v051.i07</pub-id>
</citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peterson</surname>
<given-names>A. T.</given-names>
</name>
<name>
<surname>Pape&#x15f;</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Eaton</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Transferability and model evaluation in ecological niche modeling: a comparison of GARP and Maxent</article-title>. <source>Ecography</source> <volume>30</volume> (<issue>4</issue>), <fpage>550</fpage>&#x2013;<lpage>560</lpage>. <pub-id pub-id-type="doi">10.1111/j.0906-7590.2007.05102.x</pub-id>
</citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pichler</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hartig</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Machine learning and deep learning&#x2014;a review for ecologists</article-title>. <source>Methods Ecol. Evol.</source> <volume>14</volume> (<issue>4</issue>), <fpage>994</fpage>&#x2013;<lpage>1016</lpage>. <pub-id pub-id-type="doi">10.1111/2041-210X.14061</pub-id>
</citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ploton</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Mortier</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>R&#xe9;jou-M&#xe9;chain</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Barbier</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Picard</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Rossi</surname>
<given-names>V.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Spatial validation reveals poor predictive performance of large-scale ecological mapping models</article-title>. <source>Nat. Commun.</source> <volume>11</volume> (<issue>1</issue>), <fpage>4540</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-020-18321-y</pub-id>
</citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pohjankukka</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Pahikkala</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Nevalainen</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Heikkonen</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Estimating the prediction performance of spatial models via spatial k-fold cross validation</article-title>. <source>Int. J. Geogr. Inf. Sci.</source> <volume>31</volume> (<issue>10</issue>), <fpage>2001</fpage>&#x2013;<lpage>2019</lpage>. <pub-id pub-id-type="doi">10.1080/13658816.2017.1346255</pub-id>
</citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qiao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Escobar</surname>
<given-names>L. E.</given-names>
</name>
<name>
<surname>Peterson</surname>
<given-names>A. T.</given-names>
</name>
<name>
<surname>Sober&#xf3;n</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>An evaluation of transferability of ecological niche models</article-title>. <source>Ecography</source> <volume>42</volume> (<issue>3</issue>), <fpage>521</fpage>&#x2013;<lpage>534</lpage>. <pub-id pub-id-type="doi">10.1111/ecog.03986</pub-id>
</citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ramezan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Warner</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Maxwell</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Evaluation of sampling and cross-validation tuning strategies for regional-scale machine learning classification</article-title>. <source>Remote Sens.</source> <volume>11</volume> (<issue>2</issue>), <fpage>185</fpage>. <pub-id pub-id-type="doi">10.3390/rs11020185</pub-id>
</citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roberts</surname>
<given-names>D. R.</given-names>
</name>
<name>
<surname>Bahn</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Ciuti</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Boyce</surname>
<given-names>M. S.</given-names>
</name>
<name>
<surname>Elith</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guillera-Arroita</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure</article-title>. <source>Ecography</source> <volume>40</volume> (<issue>8</issue>), <fpage>913</fpage>&#x2013;<lpage>929</lpage>. <pub-id pub-id-type="doi">10.1111/ecog.02881</pub-id>
</citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rocha</surname>
<given-names>A. D.</given-names>
</name>
<name>
<surname>Groen</surname>
<given-names>T. A.</given-names>
</name>
<name>
<surname>Skidmore</surname>
<given-names>A. K.</given-names>
</name>
<name>
<surname>Willemen</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Role of sampling design when predicting spatially dependent ecological data with remote sensing</article-title>. <source>IEEE Trans. Geoscience Remote Sens.</source> <volume>59</volume> (<issue>1</issue>), <fpage>663</fpage>&#x2013;<lpage>674</lpage>. <pub-id pub-id-type="doi">10.1109/tgrs.2020.2989216</pub-id>
</citation>
</ref>
<ref id="B53">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Siegel</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Gerth</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2008</year>). &#x201c;<article-title>Optical remote sensing applications in the Baltic Sea</article-title>,&#x201d; in <source>Remote sensing of the European seas</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Barale</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Gade</surname>
<given-names>M.</given-names>
</name>
</person-group> (<publisher-name>Springer</publisher-name>), <fpage>91</fpage>&#x2013;<lpage>102</lpage>.</citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>J. N.</given-names>
</name>
<name>
<surname>Kelly</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Renner</surname>
<given-names>I. W.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Validation of presence-only models for conservation planning and the application to whales in a multiple-use marine park</article-title>. <source>Ecol. Appl.</source> <volume>31</volume> (<issue>1</issue>), <fpage>e02214</fpage>. <pub-id pub-id-type="doi">10.1002/eap.2214</pub-id>
</citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stock</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Satellite mapping of Baltic Sea Secchi depth with multiple regression models</article-title>. <source>Int. J. Appl. Earth Observation Geoinformation</source> <volume>40</volume>, <fpage>55</fpage>&#x2013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1016/j.jag.2015.04.002</pub-id>
</citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stock</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Spatiotemporal distribution of labeled data can bias the validation and selection of supervised learning algorithms: a marine remote sensing example</article-title>. <source>ISPRS J. Photogrammetry Remote Sens.</source> <volume>187</volume>, <fpage>46</fpage>&#x2013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1016/j.isprsjprs.2022.02.023</pub-id>
</citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stock</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gregr</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>K. M. A.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Data leakage jeopardizes ecological applications of machine learning</article-title>. <source>Nat. Ecol. Evol.</source> <volume>7</volume>, <fpage>1743</fpage>&#x2013;<lpage>1745</lpage>. <pub-id pub-id-type="doi">10.1038/s41559-023-02162-1</pub-id>
</citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stock</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Haupt</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Mach</surname>
<given-names>M. E.</given-names>
</name>
<name>
<surname>Micheli</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Mapping ecological indicators of human impact with statistical and machine learning methods: tests on the California coast</article-title>. <source>Ecol. Inf.</source> <volume>48</volume>, <fpage>37</fpage>&#x2013;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1016/j.ecoinf.2018.07.007</pub-id>
</citation>
</ref>
<ref id="B59">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stock</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Subramaniam</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Accuracy of empirical satellite algorithms for mapping phytoplankton diagnostic pigments in the open ocean: a supervised learning perspective</article-title>. <source>Front. Mar. Sci.</source> <volume>7</volume> (<issue>599</issue>). <pub-id pub-id-type="doi">10.3389/fmars.2020.00599</pub-id>
</citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stock</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Subramaniam</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Iterative spatial leave-one-out cross-validation and gap-filling based data augmentation for supervised learning applications in marine remote sensing</article-title>. <source>GIScience Remote Sens.</source> <volume>59</volume> (<issue>1</issue>), <fpage>1281</fpage>&#x2013;<lpage>1300</lpage>. <pub-id pub-id-type="doi">10.1080/15481603.2022.2107113</pub-id>
</citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stock</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Subramaniam</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Van Dijken</surname>
<given-names>G. L.</given-names>
</name>
<name>
<surname>Wedding</surname>
<given-names>L. M.</given-names>
</name>
<name>
<surname>Arrigo</surname>
<given-names>K. R.</given-names>
</name>
<name>
<surname>Mills</surname>
<given-names>M. M.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Comparison of cloud-filling algorithms for marine satellite data</article-title>. <source>Remote Sens.</source> <volume>12</volume> (<issue>20</issue>), <fpage>3313</fpage>. <pub-id pub-id-type="doi">10.3390/rs12203313</pub-id>
</citation>
</ref>
<ref id="B75">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Strobl</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Boulesteix</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Zeileis</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hothorn</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Bias in random forest variable importance measures: illustrations, sources and a solution</article-title>. <source>BMC Bioinformatics</source> <volume>8</volume>, <fpage>1</fpage>&#x2013;<lpage>21</lpage>.</citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sweet</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>M&#xfc;ller</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Anand</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zscheischler</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Cross-validation strategy impacts the performance and interpretation of machine learning models</article-title>. <source>Artif. Intell. Earth Syst.</source> <volume>2</volume> (<issue>4</issue>). <pub-id pub-id-type="doi">10.1175/AIES-D-23-0026.1</pub-id>
</citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Trachsel</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Telford</surname>
<given-names>R. J.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Technical note: estimating unbiased transfer-function performances in spatially structured environments</article-title>. <source>Clim. Past</source> <volume>12</volume>, <fpage>1215</fpage>&#x2013;<lpage>1223</lpage>. <pub-id pub-id-type="doi">10.5194/cp-12-1215-2016</pub-id>
</citation>
</ref>
<ref id="B64">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tuia</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Kellenberger</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Beery</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Costelloe</surname>
<given-names>B. R.</given-names>
</name>
<name>
<surname>Zuffi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Risse</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Perspectives in machine learning for wildlife conservation</article-title>. <source>Nat. Commun.</source> <volume>13</volume> (<issue>1</issue>), <fpage>792</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-022-27980-y</pub-id>
</citation>
</ref>
<ref id="B65">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tziachris</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Nikou</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Aschonitis</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Kallioras</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sachsamanoglou</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Fidelibus</surname>
<given-names>M. D.</given-names>
</name>
<etal/>
</person-group> (<year>2023</year>). <article-title>Spatial or random cross-validation? The effect of resampling methods in predicting groundwater salinity with machine learning in Mediterranean region</article-title>. <source>Water</source> <volume>15</volume> (<issue>12</issue>), <fpage>2278</fpage>. <pub-id pub-id-type="doi">10.3390/w15122278</pub-id>
</citation>
</ref>
<ref id="B66">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Valavi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Elith</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Lahoz-Monfort</surname>
<given-names>J. J.</given-names>
</name>
<name>
<surname>Guillera-Arroita</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models</article-title>. <source>Methods Ecol. Evol.</source> <volume>10</volume> (<issue>2</issue>), <fpage>225</fpage>&#x2013;<lpage>232</lpage>. <pub-id pub-id-type="doi">10.1111/2041-210X.13107</pub-id>
</citation>
</ref>
<ref id="B68">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Valavi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Elith</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Lahoz-Monfort</surname>
<given-names>J. J.</given-names>
</name>
<name>
<surname>Guillera-Arroita</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Flexible species distribution modelling methods perform well on spatially separated testing data</article-title>. <source>Glob. Ecol. Biogeogr.</source> <volume>32</volume> (<issue>3</issue>), <fpage>369</fpage>&#x2013;<lpage>383</lpage>. <pub-id pub-id-type="doi">10.1111/geb.13639</pub-id>
</citation>
</ref>
<ref id="B69">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wadoux</surname>
<given-names>A. M. J.-C.</given-names>
</name>
<name>
<surname>Heuvelink</surname>
<given-names>G. B. M.</given-names>
</name>
<name>
<surname>de Bruin</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Brus</surname>
<given-names>D. J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Spatial cross-validation is not the right way to evaluate map accuracy</article-title>. <source>Ecol. Model.</source> <volume>457</volume>, <fpage>109692</fpage>. <pub-id pub-id-type="doi">10.1016/j.ecolmodel.2021.109692</pub-id>
</citation>
</ref>
<ref id="B70">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wenger</surname>
<given-names>S. J.</given-names>
</name>
<name>
<surname>Olden</surname>
<given-names>J. D.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Assessing transferability of ecological models: an underappreciated aspect of statistical validation</article-title>. <source>Methods Ecol. Evol.</source> <volume>3</volume> (<issue>2</issue>), <fpage>260</fpage>&#x2013;<lpage>267</lpage>. <pub-id pub-id-type="doi">10.1111/j.2041-210X.2011.00170.x</pub-id>
</citation>
</ref>
<ref id="B71">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilde</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Deutsch</surname>
<given-names>C. V.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Robust alternatives to the traditional variogram</article-title>. <source>CCG Annu. Rep.</source> <volume>116</volume>.</citation>
</ref>
<ref id="B72">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yates</surname>
<given-names>K. L.</given-names>
</name>
<name>
<surname>Bouchet</surname>
<given-names>P. J.</given-names>
</name>
<name>
<surname>Caley</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Mengersen</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Randin</surname>
<given-names>C. F.</given-names>
</name>
<name>
<surname>Parnell</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Outstanding challenges in the transferability of ecological models</article-title>. <source>Trends Ecol. Evol.</source> <volume>33</volume> (<issue>10</issue>), <fpage>790</fpage>&#x2013;<lpage>802</lpage>. <pub-id pub-id-type="doi">10.1016/j.tree.2018.08.001</pub-id>
</citation>
</ref>
<ref id="B73">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yuan</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>Y.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Deep learning in environmental remote sensing: achievements and challenges</article-title>. <source>Remote Sens. Environ.</source> <volume>241</volume>, <fpage>111716</fpage>. <pub-id pub-id-type="doi">10.1016/j.rse.2020.111716</pub-id>
</citation>
</ref>
<ref id="B74">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Marine big data-driven ensemble learning for estimating global phytoplankton group composition over two decades (1997&#x2013;2020)</article-title>. <source>Remote Sens. Environ.</source> <volume>294</volume>, <fpage>113596</fpage>. <pub-id pub-id-type="doi">10.1016/j.rse.2023.113596</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>