<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article article-type="methods-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Environ. Sci.</journal-id>
<journal-title>Frontiers in Environmental Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Environ. Sci.</abbrev-journal-title>
<issn pub-type="epub">2296-665X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1392496</article-id>
<article-id pub-id-type="doi">10.3389/fenvs.2024.1392496</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Environmental Science</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Identification of mine water source by random forest combined with laser-induced fluorescence spectra</article-title>
<alt-title alt-title-type="left-running-head">Ma et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fenvs.2024.1392496">10.3389/fenvs.2024.1392496</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ma</surname>
<given-names>Xiaona</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2665698/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Yan</surname>
<given-names>Pengcheng</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<role content-type="https://credit.niso.org/contributor-roles/data-curation/"/>
<role content-type="https://credit.niso.org/contributor-roles/Writing - review &#x26; editing/"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Kun</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<role content-type="https://credit.niso.org/contributor-roles/conceptualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/investigation/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
<role content-type="https://credit.niso.org/contributor-roles/Writing - review &#x26; editing/"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>School of Spatial Informatics and Geomatics Engineering</institution>, <institution>Anhui University of Science and Technology</institution>, <addr-line>Huainan</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>School of Electrical and Information Engineering</institution>, <institution>Anhui University of Science and Technology</institution>, <addr-line>Huainan</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1304299/overview">Ahmed El Nemr</ext-link>, National Institute of Oceanography and Fisheries (NIOF), Egypt</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1986247/overview">Zhenkun Liu</ext-link>, Nanjing University of Posts and Telecommunications, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1838911/overview">Mohammad Yazdi</ext-link>, Shahid Beheshti University, Iran</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Pengcheng Yan, <email>pcyan1988@126.com</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>07</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>12</volume>
<elocation-id>1392496</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>02</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>01</day>
<month>07</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2024 Ma, Yan and Wang.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Ma, Yan and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Mine water inrush disasters can quickly cause significant economic losses and casualties because they are highly concealed and develop rapidly. Quickly identifying the source of mine water inrush is therefore of great practical significance. Compared with the traditional hydrochemical analysis method, laser-induced fluorescence (LIF) technology offers fast response, high sensitivity, and strong stability, which makes up for the shortcomings of the traditional method. As an ensemble algorithm, random forest (RF) has the advantage of high accuracy. A combination of LIF technology and the RF algorithm is proposed to identify the source of mine water inrush rapidly. The experimental samples were collected from a coal mine in Huainan City, Anhui Province. A total of 525 water samples were prepared by mixing goaf water and sandstone water in different proportions to form seven groups (A-G). Moving average smoothing (MA), Savitzky-Golay smoothing (SG), first derivative (FD), and second derivative (SD) methods are used to preprocess the original spectral data to reduce noise and interference. By comparison, the MA method gives the highest classification accuracy and is selected as the final noise reduction method. Then, the RF algorithm is used to remove the less important wavelengths from the denoised spectra and to select the characteristic wavelengths that give a minimum classification error of 0. Finally, SVM, PCA-SVM, MA-SVM, MA-PCA-SVM, and MA-RF recognition models were established. Comparing the prediction accuracy on the test set, the MA-RF model reached an accuracy of 100<inline-formula id="inf1">
<mml:math id="m1">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula>, so it can quickly and accurately identify the source of mine water inrush.</p>
</abstract>
<kwd-group>
<kwd>laser induced fluorescence spectroscopy</kwd>
<kwd>mine water source</kwd>
<kwd>water source identification</kwd>
<kwd>random forest</kwd>
<kwd>preprocess</kwd>
</kwd-group>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Water and Wastewater Management</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Mine water inrush is currently one of the most threatening disasters in coal mine production. It is highly concealed and develops rapidly, and it can easily cause substantial economic losses and heavy casualties <xref ref-type="bibr" rid="B31">Zhang et al. (2009)</xref>. Therefore, identifying water inrush sources quickly and taking adequate preventive measures is the key to mine water disaster control.</p>
<p>The chemical composition of groundwater is relatively complex, and water quality analysis is the basic means of studying it. At present, the traditional methods of water inrush source identification include hydrochemical characteristics analysis <xref ref-type="bibr" rid="B15">Li et al. (2014)</xref> and isotope tracing <xref ref-type="bibr" rid="B10">Huang and Wang (2018)</xref>. The hydrochemical characteristics of the aquifers are analyzed with a computer-drawn Piper trilinear diagram, and fuzzy comprehensive evaluation and systematic cluster analysis are applied to analyze, compare, and determine the water inrush source. The discrimination accuracy is about 80<inline-formula id="inf2">
<mml:math id="m2">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula>. The hydrochemical method is also time-consuming, which is one of its drawbacks.</p>
<p>Other traditional methods include dynamic change analysis of water levels <xref ref-type="bibr" rid="B23">Parras-Berrocal et al. (2022)</xref> and methods based on GIS, water quality, and temperature <xref ref-type="bibr" rid="B20">Oseke et al. (2021)</xref>. The variation of groundwater level differs between coal seams, with some dynamic changes being serrated and others wavy <xref ref-type="bibr" rid="B16">Li et al. (2015)</xref>. GIS-based methods conduct a systematic three-dimensional search and identification of potential water inrush layers in coal seams, determine calculation parameters based on GIS data and indoor experiments, and evaluate and compare the water level stability under normal and abnormal working conditions <xref ref-type="bibr" rid="B30">Wu et al. (2011)</xref>. Dynamic change analysis is strongly influenced by the geological environment, with a long data analysis cycle and low accuracy. GIS requires data processing and analysis before monitoring and prediction, so the process is relatively slow and inefficient.</p>
<p>Building on previous research, some scholars have studied identification methods based on coupled principal component analysis. The identification index variables of water inrush sources were determined from the differences between the chemical components of the water sources. The correlation between the water source groups was obtained through coupled principal component analysis. Combining the covariance matrix and Fisher discrimination with principal component analysis can improve the recognition rate of water inrush sources to 90<inline-formula id="inf3">
<mml:math id="m3">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula> <xref ref-type="bibr" rid="B9">Huang and Chen (2011)</xref>. Combinations of other machine learning and optimization algorithms have also been widely applied to the identification of mine water inrush sources, for example recognition based on models such as the BP neural network <xref ref-type="bibr" rid="B18">Liu et al. (2015)</xref>. These approaches are all coupled with principal component analysis and often use a single classifier. Recognition accuracy can still be improved, and further research is needed on the generalization ability of the models and on their ability to escape local optima. Building on these studies, we introduce LIF technology combined with an RF model to improve accuracy.</p>
<p>Laser-induced fluorescence (LIF) technology detects the fluorescence emitted by a sample after laser irradiation. It has the advantages of fast response, low interference, and high sensitivity. In recent years, LIF technology has been widely used in various fields <xref ref-type="bibr" rid="B8">Hu et al. (2019)</xref>. For example, <xref ref-type="bibr" rid="B1">Bukin et al. (2020)</xref> used LIF technology to detect oil pollutants in soil, and <xref ref-type="bibr" rid="B5">Ghasemi et al. (2017)</xref> applied LIF technology in the medical field for specific screening of breast tumors. <xref ref-type="bibr" rid="B27">Si-ying et al. (2022)</xref> used LIF technology to study the classification and recognition of Manuka honey adulterated with syrup, which shows that LIF technology is also used in the food field. LIF has also been applied to the identification of mine water inrush sources, with some research results, but the approach still needs to be improved.</p>
<p>Random forest (RF) is a supervised ensemble learning model for classification and regression <xref ref-type="bibr" rid="B19">Mantas et al. (2019)</xref>. It has the advantages of handling high-dimensional data, high accuracy, and reasonable decision rules. <xref ref-type="bibr" rid="B28">Stevens et al. (2015)</xref> studied regional population distribution patterns and influence mechanisms using RF models. <xref ref-type="bibr" rid="B2">Ceccato et al. (2021)</xref> used RF models to assess car-sharing switching rates for traditional transportation modes. <xref ref-type="bibr" rid="B33">Zhu et al. (2017)</xref> conducted rapeseed pest detection based on the RF model. <xref ref-type="bibr" rid="B21">Paing et al. (2020)</xref> established RF models for classifying benign and malignant lung nodules. RF algorithms are widely used in fields such as big data analytics, bioinformatics, financial risk control, and healthcare.</p>
<p>In this paper, the original spectral data are first preprocessed using the MA, SG, FD, and SD methods, and the MA method, which gives the best classification accuracy, is chosen as the final denoising method. Then, the RF algorithm is used to remove the wavelengths with lower importance from the denoised spectra, and the feature wavelengths that give a minimum classification error of 0 are selected. Finally, five recognition models were constructed: SVM, PCA-SVM, MA-SVM, MA-PCA-SVM, and MA-RF. The MA-RF model, which has the highest accuracy, was selected to identify the source of mine water inrush quickly and accurately.</p>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>2 Materials and methods</title>
<p>This paper aims to identify the source of mine water inrush. Firstly, the fluorescence spectrum of the mine water source is obtained by the LIF system, and then the original fluorescence spectrum is pretreated. Then RF is used to identify the fluorescence spectrum of mine water source, and finally, the type of mine water source is identified. In particular, a desktop computer configured with Intel(R) Core(TM) i7-10700K was used as the data processing platform, and Matlab R2021a was used to complete the fluorescence spectrum analysis.</p>
<sec id="s2-1">
<title>2.1 Experimental materials</title>
<p>Goaf water is acidic, corrosive, and usually rich in harmful gases such as hydrogen sulfide. It is the most important and harmful source of water inrush in coal mines. This experiment mainly takes goaf water mixed with sandstone water as the research object. The experimental materials were goaf water and sandstone water collected from a coal mine in Huainan City, Anhui Province, in July 2022. The goaf water and sandstone water were mixed at different volume ratios, and 75 water samples were selected from each group to form the following sample set, where each ratio is the volume ratio of goaf water to sandstone water: 1) Group A, 4:1; 2) Group B, 3:1; 3) Group C, 2:1; 4) Group D, 1:1; 5) Group E, 1:2; 6) Group F, 1:3; 7) Group G, 1:4.</p>
<p>In order to ensure that the experimental data is more accurate and reliable, the water samples collected at the site are placed in a dark room, sealed, and stored away from light. According to the different mixing ratios, a total of 525 sets of spectral data were obtained as experimental samples.</p>
</sec>
<sec id="s2-2">
<title>2.2 LIF spectroscopy acquisition</title>
<p>The structure of the multispectral acquisition system adopted in this paper is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. The main components of the system include a laser, a spectrometer, a fluorescence probe, and a computer equipped with spectral acquisition software. A 405&#xa0;nm semiconductor laser (Beijing Huayuan Toda Laser Technology Co., LTD.) was used to excite the fluorescence of the mine water source. The spectrometer is a USB2000&#x2b; miniature spectrometer (Ocean Optics, United States), which is equipped with a 2048-element linear CCD array for fluorescence spectra measurement. The immersion micro-fluorescence probe, model FPB-405-V3 (Guangdong Koskai Company), can be inserted into the sample to obtain fluorescence signals. SpectraSuite software is installed on the computer for the acquisition, display, and saving of fluorescence spectra. The algorithm simulation is run in the Matlab R2021a environment.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Schematic diagram of LIF spectral system.</p>
</caption>
<graphic xlink:href="fenvs-12-1392496-g001.tif"/>
</fig>
<p>The fluorescence spectra of the mine water sources were collected in the same environment to reduce the interference of external factors. During the experiment, the laser power was set to 100&#xa0;mW, the spectrum acquisition range of the spectrometer was set to 340&#x2013;1,021&#xa0;nm, and the integration time was set to 1&#xa0;ms. After the equipment was ready, the fluorescence spectra of the mine water source samples were collected by the LIF system: 75 fluorescence spectra were obtained for each of the seven kinds of mine water source, totaling 525 spectra. To establish the identification model, 60 samples of each kind were randomly selected as the training set and the remaining 15 as the test set. That is, the training set contains 420 fluorescence spectrum samples and the test set contains 105. In addition, ten-fold cross-validation was introduced when establishing the model to make the classification model more reliable.</p>
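<p>The per-class random split described above can be sketched as follows. This is an illustrative Python sketch only, not the Matlab procedure used in this work; the function name <monospace>stratified_split</monospace>, the fixed seed, and the label encoding are assumptions introduced for the example.</p>
<preformat>
```python
import random


def stratified_split(labels, n_train_per_class, seed=0):
    """Return (train_idx, test_idx), drawing n_train_per_class samples per label."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)  # random selection within each class
        train_idx.extend(members[:n_train_per_class])
        test_idx.extend(members[n_train_per_class:])
    return train_idx, test_idx


# Seven groups (A-G) with 75 spectra each, as stated in the text.
labels = [group for group in "ABCDEFG" for _ in range(75)]
train_idx, test_idx = stratified_split(labels, n_train_per_class=60)
print(len(train_idx), len(test_idx))  # 420 105
```
</preformat>
<p>Splitting within each group keeps the seven classes balanced in both sets, which matches the 420/105 counts reported above.</p>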
</sec>
<sec id="s2-3">
<title>2.3 Pretreatment of fluorescence spectra</title>
<p>Because of system noise and external noise during the collection of laser-induced fluorescence spectra, the original fluorescence spectra of the mine water sources contain useless noise and interference, which has a great impact on the experimental results. Therefore, it is necessary to preprocess the original fluorescence spectral data. Common spectral preprocessing methods include moving average smoothing (MA), Savitzky-Golay smoothing (SG) <xref ref-type="bibr" rid="B26">Schettino et al. (2016)</xref>, first derivative (FD) <xref ref-type="bibr" rid="B12">Jin et al. (2012)</xref>, and second derivative (SD) <xref ref-type="bibr" rid="B3">Czarnecki (2015)</xref>. These methods are used to denoise the fluorescence spectral data. The prediction ability of the original and denoised spectra is compared according to the evaluation index of the selected classification model, and the appropriate denoising method is selected.</p>
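<p>As a rough illustration of two of these operations, the following Python sketch applies a centered moving average and a first-order finite difference to a short synthetic signal. The window width, the edge handling, the forward-difference form, and the sample values are all assumptions made for the example; the settings actually used in this work are not stated in the text.</p>
<preformat>
```python
def moving_average(spectrum, window=5):
    """Centered moving average; the window shrinks at the edges (assumed handling)."""
    half = window // 2
    n = len(spectrum)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        segment = spectrum[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def first_derivative(spectrum, step=1.0):
    """Forward finite difference; the result is one element shorter than the input."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]


# Synthetic rising signal with alternating noise (hypothetical values).
noisy = [0.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0]
print(moving_average(noisy, window=3))  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.5]
print(first_derivative(noisy))          # [2.0, -1.0, 2.0, -1.0, 2.0, -1.0]
```
</preformat>
<p>On this toy signal the moving average recovers the rising trend, while the difference alternates in sign, which illustrates why derivative preprocessing can emphasize point-to-point noise.</p>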
</sec>
<sec id="s2-4">
<title>2.4 Random forest for fluorescence spectrum analysis</title>
<p>Random forest consists of many decision trees. It is a supervised algorithm for classification and regression that uses classification and regression trees (CART) as base learners, and it was proposed by Breiman <xref ref-type="bibr" rid="B22">Parcha et al. (2007)</xref>. The random forest algorithm process is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. During tree construction, a binary splitting rule divides the training sample set into two subsets at each step, so each non-leaf node has two branches. Each subset is split again in the same way until it can no longer be split. The randomness of the random forest is reflected in two ways. First, each tree is trained on a dataset of size N drawn with replacement from the full sample of size N, so it may contain repeated samples; this is called Bootstrap sampling. Second, at each node, a subset of all features is randomly selected to calculate the optimal split <xref ref-type="bibr" rid="B6">Goehry et al. (2021)</xref>. Bagging is a sampling-with-replacement technique based on Bootstrap. According to the sampling probability, about 36.79<inline-formula id="inf4">
<mml:math id="m4">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula> of the original data will not be selected by Bootstrap sampling for a given tree and will not participate in building that decision tree. These data constitute the out-of-bag (OOB) data set <xref ref-type="bibr" rid="B14">Kotsiantis (2011)</xref>. They can be used to evaluate the performance of the decision tree and to calculate the prediction error rate of the model, which is called the out-of-bag error. A single decision tree has low precision and is prone to overfitting. To improve accuracy, multiple decision trees are combined to form a random forest model. The structure of its prediction model is <inline-formula id="inf5">
<mml:math id="m5">
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
<mml:mi>k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1,2,3</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>K</mml:mi>
</mml:math>
</inline-formula>, where K is the number of decision trees and <inline-formula id="inf6">
<mml:math id="m6">
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>, &#x2026; are independent and identically distributed random vectors. For a given input x, the final prediction is decided by majority voting <xref ref-type="bibr" rid="B24">Quadrianto and Ghahramani (2014)</xref>.</p>
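<p>The out-of-bag share of about 36.79% quoted above follows from the probability that a given sample is missed in all N draws with replacement, (1 &#x2212; 1/N)^N, which tends to 1/e as N grows. The short Python sketch below checks this numerically; the function name <monospace>oob_fraction</monospace> and the sample sizes are chosen only for illustration.</p>
<preformat>
```python
import math


def oob_fraction(n_samples):
    """Probability that one sample is never drawn in n_samples draws with replacement."""
    return (1.0 - 1.0 / n_samples) ** n_samples


# 420 is the training-set size in this paper; the other sizes are arbitrary.
for n in (10, 420, 100000):
    print(n, round(oob_fraction(n), 4))

print(round(math.exp(-1), 4))  # 0.3679, the limiting value 1/e
```
</preformat>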
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Random forest algorithm process.</p>
</caption>
<graphic xlink:href="fenvs-12-1392496-g002.tif"/>
</fig>
<p>In the random forest model, the feature importance can be measured by the OOB error. For each decision tree, the out-of-bag data error is recorded as errOOB1. Then, noise is randomly added to feature x in all samples of the OOB data, the out-of-bag error is calculated again, and it is recorded as errOOB2. If there are N trees in the random forest, the importance W of feature x is expressed as <inline-formula id="inf7">
<mml:math id="m7">
<mml:mi>W</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>B</mml:mi>
<mml:mn>2</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>B</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>N</mml:mi>
</mml:math>
</inline-formula> <xref ref-type="bibr" rid="B7">Gupta et al. (2022)</xref>. If the out-of-bag accuracy drops sharply after noise is added, the feature has a significant influence on the classification result and is of high importance.</p>
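<p>The importance measure W can be written out directly. The Python sketch below is a minimal illustration, assuming that one (errOOB1, errOOB2) pair has already been computed for each tree; the function name and the error values are hypothetical, and the step that perturbs the feature and re-evaluates each tree is not shown.</p>
<preformat>
```python
def feature_importance(err_oob1, err_oob2):
    """W = sum(errOOB2 - errOOB1) / N, where N is the number of trees."""
    if len(err_oob1) != len(err_oob2):
        raise ValueError("one (errOOB1, errOOB2) pair per tree is required")
    n_trees = len(err_oob1)
    return sum(after - before for before, after in zip(err_oob1, err_oob2)) / n_trees


# Hypothetical per-tree OOB errors before and after perturbing feature x.
err_before = [0.10, 0.20, 0.10, 0.20]
err_after_informative = [0.30, 0.40, 0.30, 0.40]   # error rises: important feature
err_after_irrelevant = [0.10, 0.20, 0.10, 0.20]    # error unchanged: W is zero

print(feature_importance(err_before, err_after_informative))
print(feature_importance(err_before, err_after_irrelevant))
```
</preformat>
<p>A feature whose perturbation leaves the OOB error unchanged receives W = 0, which is consistent with most wavelengths having zero importance when they carry no discriminating information.</p>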
</sec>
</sec>
<sec sec-type="results|discussion" id="s3">
<title>3 Results and discussion</title>
<sec id="s3-1">
<title>3.1 The original fluorescence spectrum of mine water source</title>
<p>To ensure that no other factors affected the results, the experiment was conducted in a dark laboratory at a constant temperature. The fluorescence spectrum data of the mine water sources were collected with a micro-optical fiber and a fluorescence spectrometer. After laser irradiation, the fluorescent substances in the mine water absorb light energy, become excited, and release energy as fluorescence, forming the fluorescence spectrum. The spectral peaks lie between 420 and 650&#xa0;nm, and the differences between samples are concentrated in this range. The original fluorescence spectra are shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. <xref ref-type="fig" rid="F3">Figure 3A</xref> shows the original fluorescence spectra of all mine water samples, and <xref ref-type="fig" rid="F3">Figure 3B</xref> shows the fluorescence spectrum of one sample randomly selected from each of the groups A-G of mixed goaf water and sandstone water. The shapes and peaks of the original fluorescence spectra are very similar, and different spectra cross one another. Because the spectral differences are small, it is difficult to distinguish the experimental samples by observation. Therefore, the original fluorescence spectral data should be preprocessed.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>The original fluorescence spectrum of mine water sample.</p>
</caption>
<graphic xlink:href="fenvs-12-1392496-g003.tif"/>
</fig>
</sec>
<sec id="s3-2">
<title>3.2 Selection of spectral pretreatment method</title>
<p>To eliminate noise interference in the original fluorescence spectra, reduce errors, and retain useful information, MA, SG, FD, and SD were each used to preprocess the original fluorescence spectrum data of the mine water samples. The preprocessed fluorescence spectra are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Fluorescence spectra after different pretreatments.</p>
</caption>
<graphic xlink:href="fenvs-12-1392496-g004.tif"/>
</fig>
<p>The four groups of diagrams in <xref ref-type="fig" rid="F4">Figure 4</xref> show that the MA and SG preprocessed spectra compare favorably with the original fluorescence spectra, although some overlap remains: they are more dispersed, with less noise interference, and the water samples are easier to distinguish. In contrast, the FD and SD processed data are redundant and contain large noise interference, which affects the accuracy of the spectral data.</p>
<p>The classification accuracy and training time of the four preprocessing methods were obtained by RF classification, and the results are shown in <xref ref-type="table" rid="T1">Table 1</xref>. The classification accuracy of MA reached 99.24%, with a training time of 0.3434&#xa0;s. The classification accuracy of SG was 99.05%, with a training time of 0.3431&#xa0;s. Both methods improve on the original spectrum, which gave a classification accuracy of 98.10% and a training time of 0.3660&#xa0;s. Overall, the pretreatment effect of MA is the best. The classification accuracies of FD and SD were 80.00% and 61.52%, with training times of 0.3975&#xa0;s and 0.3884&#xa0;s, respectively, which indicates a poor processing effect.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Classification results of different preprocessing methods under decision tree.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center"/>
<th align="center">Original spectra</th>
<th align="center">MA</th>
<th align="center">SG</th>
<th align="center">FD</th>
<th align="center">SD</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">Classification accuracy (%)</td>
<td align="center">98.10</td>
<td align="center">99.24</td>
<td align="center">99.05</td>
<td align="center">80.00</td>
<td align="center">61.52</td>
</tr>
<tr>
<td align="center">Training time (s)</td>
<td align="center">0.3660</td>
<td align="center">0.3434</td>
<td align="center">0.3431</td>
<td align="center">0.3975</td>
<td align="center">0.3884</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3-3">
<title>3.3 Random forest analysis of fluorescence spectra</title>
<p>Random forest is an ensemble learning algorithm that improves classification and regression accuracy by combining multiple decision trees. Its parameters need to be tuned, and the first is the minimum number of leaves. Each decision tree divides the data set into smaller subsets through successive splits; each subset corresponds to a node in the tree, and the leaf nodes hold the final subsets <xref ref-type="bibr" rid="B13">Karabadji et al. (2023)</xref>. Under the minimum number of leaves setting, a node is split further only if the resulting subsets contain at least the required number of samples; otherwise, it becomes a leaf node. This setting strongly affects the classification and regression results of the random forest. If it is too large, the decision trees will be shallow, resulting in underfitting. If it is too small, the decision trees will be deep, and overfitting will occur <xref ref-type="bibr" rid="B25">Santra et al. (2020)</xref>. The error on out-of-bag (OOB) data can be used as an estimate of the generalization error to evaluate the model. After training, the out-of-bag error rate of RF is shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. With the minimum number of leaves set to 6 and the number of trees set to 24, the out-of-bag error rate falls to 0, achieving 100<inline-formula id="inf8">
<mml:math id="m8">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula> identification accuracy and ensuring the stability and reliability of results.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>RF classification errors under different minimum number of leaves.</p>
</caption>
<graphic xlink:href="fenvs-12-1392496-g005.tif"/>
</fig>
<p>The original fluorescence spectral data contain 2048 attributes, each carrying different spectral information, and their importance to spectral analysis differs markedly. Attributes with low importance interfere with the classification model for mine water inrush samples and degrade its prediction performance. The feature importance is therefore analyzed with random forest. As shown in <xref ref-type="fig" rid="F6">Figure 6</xref>, the importance of most attributes is 0. After MA preprocessing, interference is removed from some attributes, which makes their importance prominent. The fluorescence spectrum at wavelengths between 420 and 620&#xa0;nm has obvious characteristics, with feature importance between 0.1 and 0.3.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Characteristic importance of fluorescence spectra at different wavelength positions.</p>
</caption>
<graphic xlink:href="fenvs-12-1392496-g006.tif"/>
</fig>
<p>The RF classification model can be optimized by selecting features whose importance exceeds a threshold <xref ref-type="bibr" rid="B11">Hwang et al. (2023)</xref>. Features of low importance are removed to improve the accuracy and efficiency of the model. When the threshold is 0.25, the number of selected feature wavelengths is 2, the minimum classification error is 1.71<inline-formula id="inf9">
<mml:math id="m9">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula>, the number of trees is 60, and the training time is 0.202s. Although the training time is short, the minimum classification error value is too large. When the threshold is set to 0.10 and 0.05, the number of selected feature wavelengths is 281, and the number of decision trees is 18, the minimum classification error can reach 0. Then, the training time is 0.3790s and 0.3951s, respectively. Overall, when the threshold is 0.1, the classification effect is good, and the training time is short.</p>
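<p>The threshold-based selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper select_by_importance is hypothetical, the importance values are invented for eight features rather than the 2048 spectral attributes, and keeping features whose importance is greater than or equal to the threshold is an assumed convention.</p>
<preformat>
```python
def select_by_importance(importances, threshold):
    """Return the indices of features whose importance is at least `threshold`."""
    return [i for i, score in enumerate(importances) if score >= threshold]


# Hypothetical importance scores for eight features (most are zero, as in Figure 6).
importances = [0.0, 0.0, 0.12, 0.27, 0.18, 0.0, 0.22, 0.31]

thresholds = (0.25, 0.20, 0.15, 0.10, 0.05)
for threshold in thresholds:
    kept = select_by_importance(importances, threshold)
    # Lower thresholds keep more features, until no further scores fall in the gap.
    print(threshold, len(kept), kept)
```
</preformat>
<p>In this toy example, the thresholds 0.10 and 0.05 select the same features because no importance value lies between them, which mirrors the identical wavelength counts reported for these two thresholds.</p>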
<p>Seven groups of samples (A&#x2013;G), prepared by mixing goaf water and sandstone water at different volume ratios, were used for training. Because the test set and training set were generated randomly, the results differ from run to run. The RF prediction results of one run are shown in <xref ref-type="fig" rid="F7">Figure 7</xref>. After MA preprocessing, the RF model threshold was set to 0.10, giving 281 characteristic wavelengths. The predicted categories were essentially consistent with the actual categories. This agrees with the results under different feature importance thresholds in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Actual category and predicted category of the test set.</p>
</caption>
<graphic xlink:href="fenvs-12-1392496-g007.tif"/>
</fig>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Results for different feature importance thresholds.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Threshold</th>
<th align="center">Number of wavelengths</th>
<th align="center">Minimum classification error (%)</th>
<th align="center">Number of trees</th>
<th align="center">Training time (s)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">0.25</td>
<td align="center">2</td>
<td align="center">1.71</td>
<td align="center">60</td>
<td align="center">0.2022</td>
</tr>
<tr>
<td align="center">0.20</td>
<td align="center">23</td>
<td align="center">0.19</td>
<td align="center">18</td>
<td align="center">0.2341</td>
</tr>
<tr>
<td align="center">0.15</td>
<td align="center">85</td>
<td align="center">0</td>
<td align="center">37</td>
<td align="center">0.2861</td>
</tr>
<tr>
<td align="center">0.10</td>
<td align="center">281</td>
<td align="center">0</td>
<td align="center">18</td>
<td align="center">0.3790</td>
</tr>
<tr>
<td align="center">0.05</td>
<td align="center">281</td>
<td align="center">0</td>
<td align="center">18</td>
<td align="center">0.3951</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3-4">
<title>3.4 Comparison with traditional methods</title>
<p>To verify the effectiveness and reliability of the RF model for mine water inrush prediction, support vector machine (SVM) <xref ref-type="bibr" rid="B4">Ding et al. (2017)</xref>, principal component analysis (PCA) <xref ref-type="bibr" rid="B32">Zhou et al. (2020)</xref>, and moving average smoothing (MA) were used, alone or in combination, to build comparison models for identifying mine water inrush. The training data were randomly selected, and each experiment was repeated three times independently. The reported results are the averages of the three runs, as shown in <xref ref-type="fig" rid="F8">Figure 8</xref>.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Comparison with traditional recognition methods.</p>
</caption>
<graphic xlink:href="fenvs-12-1392496-g008.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="F8">Figure 8</xref>, when the SVM algorithm alone is used to identify mine water inrush, redundant information in the spectral data interferes with the model, and the recognition accuracy is the lowest, at 98.1<inline-formula id="inf10">
<mml:math id="m10">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula>. With the combined PCA-SVM algorithm, the recognition accuracy improves to 99.05<inline-formula id="inf11">
<mml:math id="m11">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula>. With MA preprocessing, the MA-SVM and MA-PCA-SVM algorithms reduce the noise interference in the spectrum and further improve the recognition accuracy, both reaching 99.81<inline-formula id="inf12">
<mml:math id="m12">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula>. Finally, the MA-RF algorithm gives the best prediction and evaluation performance, with an accuracy of 100<inline-formula id="inf13">
<mml:math id="m13">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula>. It has the highest accuracy and stable results.</p>
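<p>To make the role of MA preprocessing in these comparisons concrete, the following Python sketch applies a centered moving average to a short list of intensities. It is an illustration only: the window width, the shrinking window at the spectrum edges, the function name moving_average, and the sample values are assumptions and do not reproduce the settings used in this study.</p>
<preformat>
```python
def moving_average(signal, window=5):
    """Centered moving average that keeps the input length.

    Near the edges the window shrinks to the points that are available
    (an assumed edge-handling choice).
    """
    if not (window >= 1 and window % 2 == 1):
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        segment = signal[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical noisy intensities at seven consecutive wavelengths.
noisy = [1.0, 3.0, 2.0, 6.0, 4.0, 5.0, 9.0]
smooth = moving_average(noisy, window=3)
print(smooth)
```
</preformat>
<p>A wider window suppresses more random noise but also flattens narrow spectral features, so the width would need to be chosen with the spectral resolution in mind.</p>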
</sec>
</sec>
<sec sec-type="conclusion" id="s4">
<title>4 Conclusion</title>
<p>This study proposes a laser-induced fluorescence (LIF) method for identifying mine water sources, combined with a random forest identification model. First, an LIF spectral acquisition system was established, and spectral data were obtained by laser irradiation of water inrush samples. The original samples were sandstone water and goaf water collected from a coal mine in Huainan, and the two types of water were mixed in different proportions to form seven water samples in total. The laser-induced fluorescence spectra of the seven water samples were then identified and analyzed.</p>
<p>The spectral data were then preprocessed with different methods. Moving average smoothing (MA), Savitzky-Golay smoothing (SG), first derivative (FD), and second derivative (SD) methods were applied to the original spectral data to reduce the noise and interference information they contain. In this comparison, the MA method gave high classification accuracy and was chosen as the final noise reduction method. Based on the RF feature importance analysis of the fluorescence spectra at different wavelengths, when the threshold is set to 0.1, 281 characteristic wavelengths are selected, the minimum classification error is 0, and the best classification effect is obtained. Finally, compared with SVM, PCA-SVM, MA-SVM, and MA-PCA-SVM, the MA-RF algorithm reaches 100<inline-formula id="inf14">
<mml:math id="m14">
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula> recognition accuracy. The accuracies of the other four models are 98.1<inline-formula id="inf15">
<mml:math id="m15">
<mml:mi>%</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>99.05</mml:mn>
<mml:mi>%</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>99.81</mml:mn>
<mml:mi>%</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>99.81</mml:mn>
<mml:mi>%</mml:mi>
</mml:math>
</inline-formula>, respectively.</p>
<p>The experimental analysis shows that combining RF with laser-induced fluorescence technology is feasible for the prediction and evaluation of mine water inrush. Compared with traditional hydrochemical analysis, principal component analysis, and dynamic water level analysis, laser-induced fluorescence technology enables non-disturbing, real-time <italic>in-situ</italic> measurement, and fluorescence spectroscopy offers advantages such as high sensitivity and speed. The RF recognition model built on MA-preprocessed spectral data recognizes the water samples best. This is because the four preprocessing methods affect the spectral data differently: SG mainly eliminates the influence of large scale differences in the spectral data, MA eliminates random noise and improves the signal-to-noise ratio, and FD and SD mainly reduce the influence of uneven distribution. The MA preprocessing method performed best in this work. The MA-RF classification model performs well in identifying water sources, and its recognition accuracy is higher than that of the SVM, PCA-SVM, MA-SVM, and MA-PCA-SVM models. This work offers a new avenue for applying artificial intelligence to water source identification in mines.</p>
<p>Going forward, three areas need further exploration. First, in subsequent experiments, we will expand the range of coal mining areas and the categories of aquifer water samples studied and improve the model database, because a water source identification model for coal mines needs a large number of representative aquifer water samples as its foundation to achieve the best adaptability and reliability. Second, given the extensive experience accumulated with hydrochemical analysis, comprehensive water source identification can be carried out in practice by adding online measurements of pH, conductivity, <italic>etc.</italic>, together with water pressure and inflow as characteristic values, and water source warning can be studied on the basis of these data. Finally, more effective methods for determining the weights of weighted classifiers should be investigated to enhance the predictive performance of the model.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s5">
<title>Data availability statement</title>
<p>The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s6">
<title>Author contributions</title>
<p>XM: Methodology, Writing&#x2013;original draft. PY: Data curation, Writing&#x2013;review and editing. KW: Conceptualization, Investigation, Software, Writing&#x2013;review and editing.</p>
</sec>
<sec sec-type="funding-information" id="s7">
<title>Funding</title>
<p>The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bukin</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Proschenko</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Alexey</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Korovetskiy</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bukin</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Yurchik</surname>
<given-names>V.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>New solutions of laser-induced fluorescence for oil pollution monitoring at sea</article-title>. <source>Photonics (MDPI)</source> <volume>7</volume>, <fpage>36</fpage>. <pub-id pub-id-type="doi">10.3390/photonics7020036</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ceccato</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Chicco</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Diana</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Evaluating car-sharing switching rates from traditional transport means through logit models and random forest classifiers</article-title>. <source>Transportation Planning and Technology</source> <volume>44</volume>, <fpage>160</fpage>&#x2013;<lpage>175</lpage>. <pub-id pub-id-type="doi">10.1080/03081060.2020.1868084</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Czarnecki</surname>
<given-names>M. A.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Resolution enhancement in second-derivative spectra</article-title>. <source>Applied Spectroscopy</source> <volume>69</volume>, <fpage>67</fpage>&#x2013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1366/14-07568</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Twin support vector machine: theory, algorithm and applications</article-title>. <source>Neural Computing and Applications</source> <volume>28</volume>, <fpage>3119</fpage>&#x2013;<lpage>3130</lpage>. <pub-id pub-id-type="doi">10.1007/s00521-016-2245-4</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ghasemi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Parvin</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Motlagh</surname>
<given-names>N. S. H.</given-names>
</name>
<name>
<surname>Abachi</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>LIF spectroscopy of stained malignant breast tissues</article-title>. <source>Biomedical Optics Express</source> <volume>8</volume>, <fpage>512</fpage>&#x2013;<lpage>523</lpage>. <pub-id pub-id-type="doi">10.1364/BOE.8.000512</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Goehry</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Goude</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Massart</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Poggi</surname>
<given-names>J.-M.</given-names>
</name>
</person-group> (<year>2021</year>). <source>Random forests for time series</source>. <pub-id pub-id-type="doi">10.57805/revstat.v21i2.400</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gupta</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Kuchibhotla</surname>
<given-names>A. K.</given-names>
</name>
<name>
<surname>Ramdas</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Nested conformal prediction and quantile out-of-bag ensemble methods</article-title>. <source>Pattern Recognition</source> <volume>127</volume>, <fpage>108496</fpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2021.108496</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Lai</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Selection of characteristic wavelengths using SPA for laser induced fluorescence spectroscopy of mine water inrush</article-title>. <source>Spectrochimica Acta Part A Molecular and Biomolecular Spectroscopy</source> <volume>219</volume>, <fpage>367</fpage>&#x2013;<lpage>374</lpage>. <pub-id pub-id-type="doi">10.1016/j.saa.2019.04.045</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>P. H.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J. S.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Prediction of water inrush from coal floor based on Fisher discriminant analysis</article-title>. <source>Applied Mechanics and Materials</source> <volume>71</volume>, <fpage>4211</fpage>&#x2013;<lpage>4214</lpage>. <pub-id pub-id-type="doi">10.4028/www.scientific.net/amm.71-78.4211</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Groundwater-mixing mechanism in a multiaquifer system based on isotopic tracing theory: a case study in a coal mine district, China</article-title>. <source>Geofluids</source> <volume>2018</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1155/2018/9549141</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hwang</surname>
<given-names>S.-W.</given-names>
</name>
<name>
<surname>Chung</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.-C.</given-names>
</name>
<etal/>
</person-group> (<year>2023</year>). <article-title>Feature importance measures from random forest regressor using near-infrared spectra for predicting carbonization characteristics of kraft lignin-derived hydrochar</article-title>. <source>Journal of Wood Science</source> <volume>69</volume>, <fpage>1</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1186/s10086-022-02073-y</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jin</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lavery</surname>
<given-names>J. E.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>S.-C.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Univariate cubic L1 interpolating splines based on the first derivative and on 5-point windows: analysis, algorithm and shape-preserving properties</article-title>. <source>Computational Optimization and Applications</source> <volume>51</volume>, <fpage>575</fpage>&#x2013;<lpage>600</lpage>. <pub-id pub-id-type="doi">10.1007/s10589-011-9426-y</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karabadji</surname>
<given-names>N. E. I.</given-names>
</name>
<name>
<surname>Korba</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Assi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Seridi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Aridhi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Dhifli</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Accuracy and diversity-aware multi-objective approach for random forest construction</article-title>. <source>Expert Systems with Applications</source> <volume>225</volume>, <fpage>120138</fpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2023.120138</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kotsiantis</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Combining bagging, boosting, rotation forest and random subspace methods</article-title>. <source>Artificial Intelligence Review</source> <volume>35</volume>, <fpage>223</fpage>&#x2013;<lpage>240</lpage>. <pub-id pub-id-type="doi">10.1007/s10462-010-9192-8</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Seawater inrush assessment based on hydrochemical analysis enhanced by hierarchy clustering in an undersea goldmine pit, China</article-title>. <source>Environmental Earth Sciences</source> <volume>71</volume>, <fpage>4977</fpage>&#x2013;<lpage>4987</lpage>. <pub-id pub-id-type="doi">10.1007/s12665-013-2888-8</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lei</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Xue</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Dynamic risk assessment of water inrush in tunnelling and software development</article-title>. <source>Geomechanics and Engineering</source> <volume>9</volume>, <fpage>57</fpage>&#x2013;<lpage>81</lpage>. <pub-id pub-id-type="doi">10.12989/gae.2015.9.1.057</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>W. T.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>S. L.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Y. S.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Risk evaluation of water inrush from coal floor based on BP neural network</article-title>. <source>Applied Mechanics and Materials</source> <volume>744</volume>, <fpage>1728</fpage>&#x2013;<lpage>1732</lpage>. <pub-id pub-id-type="doi">10.4028/www.scientific.net/amm.744-746.1728</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mantas</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Castellano</surname>
<given-names>J. G.</given-names>
</name>
<name>
<surname>Moral-Garc&#xed;a</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Abell&#xe1;n</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A comparison of random forest based algorithms: random credal random forest versus oblique random forest</article-title>. <source>Soft Computing</source> <volume>23</volume>, <fpage>10739</fpage>&#x2013;<lpage>10754</lpage>. <pub-id pub-id-type="doi">10.1007/s00500-018-3628-5</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oseke</surname>
<given-names>F. I.</given-names>
</name>
<name>
<surname>Anornu</surname>
<given-names>G. K.</given-names>
</name>
<name>
<surname>Adjei</surname>
<given-names>K. A.</given-names>
</name>
<name>
<surname>Eduvie</surname>
<given-names>M. O.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Assessment of water quality using GIS techniques and water quality index in reservoirs affected by water diversion</article-title>. <source>Water-Energy Nexus</source> <volume>4</volume>, <fpage>25</fpage>&#x2013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1016/j.wen.2020.12.002</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paing</surname>
<given-names>M. P.</given-names>
</name>
<name>
<surname>Hamamoto</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Tungjitkusolmun</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Visitsattapongse</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pintavirooj</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Automatic detection of pulmonary nodules using three-dimensional chain coding and optimized random forest</article-title>. <source>Applied Sciences</source> <volume>10</volume>, <fpage>2346</fpage>. <pub-id pub-id-type="doi">10.3390/app10072346</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Parcha</surname>
<given-names>S. K.</given-names>
</name>
<name>
<surname>Sabnis</surname>
<given-names>S. V.</given-names>
</name>
<name>
<surname>Saraswati</surname>
<given-names>P. K.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Taxonomic application of classification and regression tree (CART) and random forests (RF): a case study of Middle Cambrian trilobites</article-title>. <source>Geological Society of India</source> <volume>70</volume> (<issue>6</issue>), <fpage>1033</fpage>&#x2013;<lpage>1038</lpage>. <pub-id pub-id-type="doi">10.1111/j.1752-1688.2007.00132.x</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parras-Berrocal</surname>
<given-names>I. M.</given-names>
</name>
<name>
<surname>V&#xe1;zquez</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Cabos</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Sein</surname>
<given-names>D. V.</given-names>
</name>
<name>
<surname>&#xc1;lvarez</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Bruno</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Surface and intermediate water changes triggering the future collapse of deep water formation in the North Western Mediterranean</article-title>. <source>Geophysical Research Letters</source> <volume>49</volume>, <fpage>e2021GL095404</fpage>. <pub-id pub-id-type="doi">10.1029/2021GL095404</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Quadrianto</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Ghahramani</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>A very simple safe-Bayesian random forest</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source> <volume>37</volume>, <fpage>1297</fpage>&#x2013;<lpage>1303</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2014.2362751</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Santra</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Paul</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Mukherjee</surname>
<given-names>D. P.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Deterministic dropout for deep neural networks using composite random forest</article-title>. <source>Pattern Recognition Letters</source> <volume>131</volume>, <fpage>205</fpage>&#x2013;<lpage>212</lpage>. <pub-id pub-id-type="doi">10.1016/j.patrec.2019.12.023</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schettino</surname>
<given-names>B. M.</given-names>
</name>
<name>
<surname>Duque</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Silveira</surname>
<given-names>P. M.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Current-transformer saturation detection using Savitzky-Golay filter</article-title>. <source>IEEE Transactions on Power Delivery</source> <volume>31</volume>, <fpage>1400</fpage>&#x2013;<lpage>1401</lpage>. <pub-id pub-id-type="doi">10.1109/TPWRD.2016.2521327</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Si-ying</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Yi-wen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yu-rong</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wen-hui</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yu-peng</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Classification and recognition of adulterated manuka honey by multi-wavelength laser-induced fluorescence</article-title>. <source>Spectroscopy and Spectral Analysis</source> <volume>42</volume>, <fpage>2807</fpage>&#x2013;<lpage>2812</lpage>. <pub-id pub-id-type="doi">10.3964/j.issn.1000-0593(2022)09-2807-06</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stevens</surname>
<given-names>F. R.</given-names>
</name>
<name>
<surname>Gaughan</surname>
<given-names>A. E.</given-names>
</name>
<name>
<surname>Linard</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Tatem</surname>
<given-names>A. J.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data</article-title>. <source>PLOS ONE</source> <volume>10</volume>, <fpage>e0107042</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0107042</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Prediction of floor water inrush: the application of GIS-based AHP vulnerable index method to Donghuantuo coal mine, China</article-title>. <source>Rock Mechanics and Rock Engineering</source> <volume>44</volume>, <fpage>591</fpage>&#x2013;<lpage>600</lpage>. <pub-id pub-id-type="doi">10.1007/s00603-011-0146-5</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ahmad</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Application of an improved flow-stress-damage model to the criticality assessment of water inrush in a mine: a case study</article-title>. <source>Rock Mechanics and Rock Engineering</source> <volume>42</volume>, <fpage>911</fpage>&#x2013;<lpage>930</lpage>. <pub-id pub-id-type="doi">10.1007/s00603-008-0004-2</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Glycerol&#x2019;s generalized two-dimensional correlation IR/NIR spectroscopy and its principal component analysis</article-title>. <source>Spectrochimica Acta Part A Molecular and Biomolecular Spectroscopy</source> <volume>228</volume>, <fpage>117824</fpage>. <pub-id pub-id-type="doi">10.1016/j.saa.2019.117824</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wan</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Xiong</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Image recognition of rapeseed pests based on random forest classifier</article-title>. <source>International Journal of Information Technology and Web Engineering (IJITWE)</source> <volume>12</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.4018/IJITWE.2017070101</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>