<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Plant Sci.</journal-id>
<journal-title>Frontiers in Plant Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Plant Sci.</abbrev-journal-title>
<issn pub-type="epub">1664-462X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpls.2023.1081050</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Plant Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Automatic acoustic recognition of pollinating bee species can be highly improved by Deep Learning models accompanied by pre-training and strong data augmentation</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ferreira</surname>
<given-names>Alef Iury Siqueira</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2251916"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>da Silva</surname>
<given-names>N&#xe1;dia Felix Felipe</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mesquita</surname>
<given-names>Fernanda Neiva</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2071482"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Rosa</surname>
<given-names>Thierson Couto</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Monz&#xf3;n</surname>
<given-names>Victor Hugo</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2071147"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Mesquita-Neto</surname>
<given-names>Jos&#xe9; Neiva</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/815488"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Instituto de Inform&#xe1;tica, Universidade Federal de Goi&#xe1;s</institution>, <addr-line>Goi&#xe2;nia, Goi&#xe1;s</addr-line>, <country>Brazil</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Laboratorio Ecolog&#xed;a de Abejas, Departamento de Biolog&#xed;a y Qu&#xed;mica, Facultad de Ciencias B&#xe1;sicas, Universidad Cat&#xf3;lica del Maule</institution>, <addr-line>Talca</addr-line>, <country>Chile</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Dun Wang, Northwest A&amp;F University, China</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Chunsheng Hou, Institute of Bast Fiber Crops (CAAS), China; Jenni Stockan, James Hutton Institute, United Kingdom</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Jos&#xe9; Neiva Mesquita-Neto, <email xlink:href="mailto:jmesquita@ucm.cl">jmesquita@ucm.cl</email>
</p>
</fn>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Sustainable and Intelligent Phytoprotection, a section of the journal Frontiers in Plant Science</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>14</day>
<month>04</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1081050</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>10</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>03</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Ferreira, da Silva, Mesquita, Rosa, Monz&#xf3;n and Mesquita-Neto</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Ferreira, da Silva, Mesquita, Rosa, Monz&#xf3;n and Mesquita-Neto</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<sec>
<title>Introduction</title>
<p>Bees capable of floral sonication (buzz-pollination) are among the most effective pollinators of blueberries. However, the quality of pollination varies greatly among the species visiting the flowers, so correctly identifying flower visitors is indispensable for distinguishing the most efficient blueberry pollinators. Taxonomic identification, however, normally depends on microscopic characteristics and the active participation of experts in the decision-making process. Moreover, the large number of bee species (20,507 worldwide) and other insects poses a challenge for a decreasing number of insect taxonomists. To overcome the limitations of traditional taxonomy, automatic insect classification systems based on machine learning (ML) have been developed to detect and distinguish a wide variety of bioacoustic signals, including bee buzzing sounds. However, classical ML algorithms fed with spectrogram-type data have achieved only marginal performance in bee species recognition. In contrast, emerging deep learning (DL) systems, especially Convolutional Neural Networks (CNNs), have substantially boosted classification performance in other audio domains but have yet to be tested for acoustic bee species recognition. Therefore, we aimed to automatically identify blueberry-pollinating bee species from the characteristics of their buzzing sounds using DL algorithms.</p>
</sec>
<sec>
<title>Methods</title>
<p>We designed CNN models combined with Log Mel-Spectrogram representations and strong data augmentation and compared their performance at recognizing blueberry pollinating bee species with the current state-of-the-art models for automatic recognition of bee species.</p>
</sec>
<sec>
<title>Results and Discussion</title>
<p>We found that CNN models assigned bee buzzing sounds to their respective taxa better than expected by chance. However, CNN models were highly dependent on pre-training with acoustic data and on data augmentation to outperform classical ML classifiers in recognizing bee buzzing sounds. Under these conditions, CNN models could enable automated taxonomic recognition of flower-visiting bees in blueberry crops. Nevertheless, there is still room to improve the performance of CNN models, particularly by recording more samples of poorly represented bee species. Combining automatic acoustic recognition with information on how efficiently each bee species pollinates a particular crop would provide a comprehensive and powerful tool for identifying the species that pollinate best and increase fruit yields.</p>
</sec>
</abstract>
<kwd-group>
<kwd>buzz-pollinated crops</kwd>
<kwd>ecosystem services</kwd>
<kwd>crop pollination</kwd>
<kwd>machine-learning</kwd>
<kwd>blueberry crops</kwd>
</kwd-group>
<contract-num rid="cn001">Fondecyt Iniciaci&#xf3;n en Investigaci&#xf3;n 11190013</contract-num>
<contract-sponsor id="cn001">Agencia Nacional de Investigaci&#xf3;n y Desarrollo<named-content content-type="fundref-id">10.13039/501100020884</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">Fondo de Innovaci&#xf3;n para la Competitividad<named-content content-type="fundref-id">10.13039/501100016014</named-content>
</contract-sponsor>
<contract-sponsor id="cn003">Agencia Nacional de Investigaci&#xf3;n y Desarrollo<named-content content-type="fundref-id">10.13039/501100020884</named-content>
</contract-sponsor>
<counts>
<fig-count count="3"/>
<table-count count="4"/>
<equation-count count="6"/>
<ref-count count="79"/>
<page-count count="12"/>
<word-count count="7334"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<label>1</label>
<title>Introduction</title>
<p>Highbush blueberry (<italic>Vaccinium corymbosum</italic> L.: Ericaceae) requires insect-mediated pollination to enhance fruit quality (<xref ref-type="bibr" rid="B6">Brewer and Dobson, 1969</xref>; <xref ref-type="bibr" rid="B5">Benjamin and Winfree, 2014</xref>). The flow of pollen among flowers promoted by biotic vectors increases fruit set and berry size (<xref ref-type="bibr" rid="B22">Dogterom et&#xa0;al., 2000</xref>; <xref ref-type="bibr" rid="B44">Nicholson and Ricketts, 2019</xref>). However, the specialized morphology of blueberry flowers, characterized by poricidal anthers and narrow, bell-shaped corollas, restricts pollen access to certain floral visitors (<xref ref-type="bibr" rid="B8">Buchmann, 1983</xref>; <xref ref-type="bibr" rid="B18">De Luca et&#xa0;al., 2013</xref>; <xref ref-type="bibr" rid="B16">Corbet and Huang, 2014</xref>; <xref ref-type="bibr" rid="B55">Russell et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B15">Cooley and Vallejo-Mar&#xed;n, 2021</xref>). To extract pollen efficiently, a floral visitor needs to vibrate the flower so that the vibrations are transmitted to the pollen inside the anthers, causing it to exit <italic>via</italic> small openings. These vibrations produce a characteristic audible sound that gives the phenomenon its name: buzz-pollination or sonication (<xref ref-type="bibr" rid="B8">Buchmann, 1983</xref>). Probably for this reason, bees that perform floral sonication are among the most effective pollinators of blueberries (<xref ref-type="bibr" rid="B10">Cane et&#xa0;al., 1985</xref>; <xref ref-type="bibr" rid="B31">Javorek et&#xa0;al., 2002</xref>; <xref ref-type="bibr" rid="B44">Nicholson and Ricketts, 2019</xref>). Indeed, only a subset of all flower visitors actually pollinate (<xref ref-type="bibr" rid="B59">Schemske and Horvitz, 1984</xref>; <xref ref-type="bibr" rid="B33">Kandori, 2002</xref>). 
The quality of the pollination provided varies greatly and is partly related to the taxonomic identity of the flower visitors (<xref ref-type="bibr" rid="B46">Nunes-Silva et&#xa0;al., 2013</xref>; <xref ref-type="bibr" rid="B58">Santos et&#xa0;al., 2014</xref>; <xref ref-type="bibr" rid="B61">Silva-Neto et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B72">Vin&#xed;cius-Silva et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B68">Toni et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B15">Cooley and Vallejo-Mar&#xed;n, 2021</xref>; <xref ref-type="bibr" rid="B17">Cort&#xe9;s-rivas et&#xa0;al., 2022</xref>). Consequently, the taxonomic identification of species is indispensable for distinguishing the most efficient pollinators of blueberry.</p>
<p>Nevertheless, traditional taxonomic identification of bees and other insects normally depends on microscopic morphological characteristics or specialized molecular biology methods, which require time, costly technology, and the active participation of experts in the decision-making process (<xref ref-type="bibr" rid="B32">Jinbo et&#xa0;al., 2011</xref>; <xref ref-type="bibr" rid="B28">Gradi&#x161;ek et&#xa0;al., 2017</xref>). Moreover, the sheer number of bee and other insect species is a challenge for taxonomists. There are estimated to be about 20,000 bee species worldwide (<xref ref-type="bibr" rid="B47">Orr et&#xa0;al., 2020</xref>), of which about 58% (roughly 11,600 species in 74 genera) are able to buzz-pollinate (<xref ref-type="bibr" rid="B11">Cardinal et&#xa0;al., 2018</xref>). Given the limitations of traditional taxonomy, new technologies that also fulfill taxonomic requirements need to be developed and implemented (<xref ref-type="bibr" rid="B25">Gaston and O&#x2019;Neill, 2004</xref>; <xref ref-type="bibr" rid="B38">Lewis and Basset, 2007</xref>).</p>
<p>To meet this need, automatic classification of plants and animals based on images and sounds has been developed and tested over the last two decades (<xref ref-type="bibr" rid="B60">Schroder, 2002</xref>; <xref ref-type="bibr" rid="B70">Valliammal and Geethalakshmi, 2011</xref>; <xref ref-type="bibr" rid="B57">Santana et&#xa0;al., 2014</xref>; <xref ref-type="bibr" rid="B75">Yanikoglu et&#xa0;al., 2014</xref>) and is proving more practical than traditional approaches. For instance, classifying bee species from wing images can achieve accuracies above 98%, similar to or even higher than classifications by human experts (<xref ref-type="bibr" rid="B51">Rebelo et&#xa0;al., 2021</xref>). Besides its high accuracy, automatic insect classification can be easily measured, tested, and replicated, and it is relatively inexpensive and time- and cost-efficient compared with traditional manual classification (<xref ref-type="bibr" rid="B25">Gaston and O&#x2019;Neill, 2004</xref>; <xref ref-type="bibr" rid="B40">Lorenz et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B41">Martineau et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B51">Rebelo et&#xa0;al., 2021</xref>). Nonetheless, image-based classification is complicated by object size and orientation, image quality, and lighting and background conditions (<xref ref-type="bibr" rid="B25">Gaston and O&#x2019;Neill, 2004</xref>). Sound, on the other hand, is relatively easy to acquire and can, in principle, be picked up remotely and continuously (<xref ref-type="bibr" rid="B28">Gradi&#x161;ek et&#xa0;al., 2017</xref>). 
Automatic species recognition based on machine learning (ML), a widely used branch of artificial intelligence, offers an automated approach to such classification tasks and is a powerful tool for detecting and distinguishing vocal signals [e.g., (<xref ref-type="bibr" rid="B2">Acevedo et&#xa0;al., 2009</xref>; <xref ref-type="bibr" rid="B7">Briggs et&#xa0;al., 2013</xref>; <xref ref-type="bibr" rid="B30">Hershey et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B64">Stowell et&#xa0;al., 2019</xref>; <xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al., 2021</xref>)]. Recognizers can be used to process recordings of any acoustically active wildlife species, including bee buzzing sounds (<xref ref-type="bibr" rid="B28">Gradi&#x161;ek et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B45">Nolasco et&#xa0;al., 2018</xref>; <xref ref-type="bibr" rid="B67">Terenzi et&#xa0;al., 2019</xref>; <xref ref-type="bibr" rid="B12">Cejrowski et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al., 2021</xref>). Although the origin and purpose of buzzing sounds are completely different from those of animal vocal signals, the characteristics of buzzing sounds (frequency, amplitude, duration) vary widely and may also differ between species and groups of bees (<xref ref-type="bibr" rid="B19">De Luca and Vallejo-Marin, 2013</xref>; <xref ref-type="bibr" rid="B54">Rosi-Denadai et&#xa0;al., 2018</xref>; <xref ref-type="bibr" rid="B51">Rebelo et&#xa0;al., 2021</xref>). Four studies have addressed automatic bee species classification, dealing with twelve, two, four, and fifteen classes, respectively (<xref ref-type="bibr" rid="B28">Gradi&#x161;ek et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B4">Arruda et&#xa0;al., 2018</xref>; <xref ref-type="bibr" rid="B34">Kawakita and Ichikawa, 2019</xref>; <xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al., 2021</xref>). 
These studies indicated that ML algorithms can generate classifiers able to quickly recognize bee species based solely on their buzzing sounds. However, small data sets with few bee species, as well as manual audio segmentation and noise attenuation, were also reported to limit ML performance and practical applicability. Moreover, these studies used classical ML algorithms (e.g., Random Forest, Support Vector Machines, and Logistic Regression) fed with spectrogram-type data, namely Mel-frequency cepstral coefficients (MFCCs), a manually designed summary of spectral information, which was the only sound feature extraction method used. MFCCs can often lead to worse performance than the raw Mel spectral data from which they are derived (<xref ref-type="bibr" rid="B63">Stowell and Plumbley, 2014</xref>; <xref ref-type="bibr" rid="B69">Valletta et&#xa0;al., 2017</xref>). Furthermore, deep learning (DL), an emerging field of ML, has been outperforming classical ML and has led to significant advances in a wide range of bioacoustic tasks, including the recognition of animal vocalizations (<xref ref-type="bibr" rid="B74">Xie et&#xa0;al., 2019</xref>; <xref ref-type="bibr" rid="B79">Zor et&#xa0;al., 2019</xref>; <xref ref-type="bibr" rid="B43">Nanni et&#xa0;al., 2020</xref>).</p>
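<p>To make the difference between raw (log) Mel spectral data and MFCCs concrete, the following minimal NumPy sketch derives both from a waveform. MFCCs are obtained by applying a discrete cosine transform (DCT) to each log-Mel frame and keeping only the first coefficients, which discards part of the spectral detail that the log-Mel representation retains. This is an illustration under stated assumptions, not the pipeline of the cited studies: the HTK-style Mel formula, FFT size, hop length, number of Mel bands, number of coefficients, the function names, and the synthetic 400 Hz tone standing in for a buzz are all illustrative choices.</p>
<preformat preformat-type="code">
```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    bin_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, bin_freqs.size))
    for i in range(n_mels):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr, n_fft=1024, hop=512, n_mels=64, eps=1e-10):
    """Hann-windowed frames -> power spectrum -> Mel filterbank -> natural log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    return np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + eps)  # (n_frames, n_mels)


def mfcc(log_mel, n_mfcc=13):
    """Orthonormal DCT-II over the Mel axis of each frame; keep the first n_mfcc coefficients."""
    n_mels = log_mel.shape[1]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T  # (n_frames, n_mfcc)


# Usage with a synthetic 2 s, 400 Hz tone standing in for a recorded buzz
sr = 22050
t = np.arange(2 * sr) / sr
y = np.sin(2 * np.pi * 400.0 * t)
S = log_mel_spectrogram(y, sr)
C = mfcc(S)
print(S.shape, C.shape)  # (85, 64) (85, 13)
```
</preformat>
<p>With these illustrative settings each frame carries 64 log-Mel values but only 13 MFCCs, which shows how much the MFCC summary compresses the spectral information a CNN could otherwise learn from.</p>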
<p>Although buzzing sounds differ substantially from vocal signals in both origin and function, automatic sound-based recognition with DL models, which use multi-layered artificial neural networks, in particular convolutional neural networks (CNNs), should be especially relevant for recognizing blueberry pollinators. This may be possible because the vibrations required to efficiently extract pollen from flowers produce characteristic audible buzzing sounds that differ among bee species (<xref ref-type="bibr" rid="B9">Burkart et&#xa0;al., 2011</xref>; <xref ref-type="bibr" rid="B28">Gradi&#x161;ek et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B34">Kawakita and Ichikawa, 2019</xref>; <xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al., 2021</xref>). Thus, we aimed to apply DL models to automatically identify blueberry-pollinating bees based on the characteristics of their buzzing sounds. However, neural networks, like traditional ML algorithms, have some limitations: both usually require large amounts of training data to capture the natural variability of the data being modeled. Data augmentation mitigates this by synthetically expanding the training set; several augmentation methods simulate the overlap between multiple sound events and the resulting occlusion effects in the spectrogram. For example, mixup data augmentation creates new training instances by mixing pairs of features and their corresponding targets according to a given mixing ratio (<xref ref-type="bibr" rid="B1">Abe&#xdf;er, 2020</xref>). Consequently, data augmentation can significantly enhance network performance. We therefore also compared the performance of CNN models combined with Mel-spectrogram inputs and audio data augmentation against that of classical ML models at recognizing bee buzzing sounds. 
Given the high efficiency and accuracy demonstrated by CNN models for automatic sound classification in other audio domains, we expected that such models, using Log Mel-Spectrogram representations and strong data augmentation, would outperform classic ML classifiers at recognizing bee species.</p>
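<p>The two augmentation ideas mentioned above can be sketched in a few lines of NumPy. The hypothetical <monospace>mixup</monospace> function blends two spectrograms and their one-hot targets with a single mixing ratio, here drawn from a Beta(alpha, alpha) distribution as in the original mixup formulation. The hypothetical <monospace>mask_spectrogram</monospace> function is a SpecAugment-style time and frequency masking step, substituted here as one simple way to simulate occlusion. The function names, alpha = 0.4, the mask widths, and the random stand-in spectrograms are assumptions for illustration, not the settings used in this study.</p>
<preformat preformat-type="code">
```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def mixup(x1, y1, x2, y2, alpha=0.4):
    """Blend two inputs and their one-hot targets with one ratio lam ~ Beta(alpha, alpha)."""
    lam = float(rng.beta(alpha, alpha))
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam


def mask_spectrogram(spec, max_t=10, max_f=8):
    """SpecAugment-style masking: fill one random run of time frames and one run of
    Mel bands with the spectrogram mean, imitating a partially occluded sound event."""
    out = spec.copy()
    fill = out.mean()
    n_t, n_f = out.shape
    t_w = int(rng.integers(0, max_t + 1))  # mask widths, possibly zero
    f_w = int(rng.integers(0, max_f + 1))
    t0 = int(rng.integers(0, n_t - t_w + 1))
    f0 = int(rng.integers(0, n_f - f_w + 1))
    out[t0:t0 + t_w, :] = fill
    out[:, f0:f0 + f_w] = fill
    return out


# Usage: random arrays stand in for two log-Mel spectrograms (frames x Mel bands)
n_classes = 3
spec_a = rng.normal(size=(85, 64))
spec_b = rng.normal(size=(85, 64))
y_a, y_b = np.eye(n_classes)[0], np.eye(n_classes)[2]
x_mix, y_mix, lam = mixup(spec_a, y_a, spec_b, y_b)
x_aug = mask_spectrogram(x_mix)
print(round(lam, 3), y_mix)
```
</preformat>
<p>Because the same ratio weights both the inputs and the targets, a mixed example of species 0 and species 2 is labeled as lam parts of the first and (1 - lam) parts of the second, so the network is trained on soft targets rather than a single class.</p>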
</sec>
<sec id="s2" sec-type="materials|methods">
<label>2</label>
<title>Materials and methods</title>
<sec id="s2_1">
<label>2.1</label>
<title>Buzzing sound acquisition</title>
<p>Acoustic recordings of bee buzzes were made in five highbush blueberry (<italic>V. corymbosum</italic>) orchards located in southern Chile (Maule and Los R&#xed;os Regions) between September and November of 2020 and 2021. The total cultivated blueberry area per orchard, under either organic or conventional farming, ranged from 3.2 to 141 ha. The most commonly grown cultivars were Legacy, Brigitta, Duke, and Elliot. Four of the five orchards were supplemented with colonies of the managed exotic bees <italic>Bombus terrestris</italic> and/or <italic>Apis mellifera</italic> (<xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref>).</p>
<table-wrap id="T1" position="float">
<label>Table&#xa0;1</label>
<caption>
<p>Information for the studied highbush blueberry orchards where bee buzzing sounds were acquired between the months of September and November in 2020 and 2021.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Orchard</th>
<th valign="top" align="left">Locality</th>
<th valign="top" align="left">Latitude</th>
<th valign="top" align="left">Longitude</th>
<th valign="top" align="left">Farming</th>
<th valign="top" align="left">Area</th>
<th valign="top" align="left">Cultivars</th>
<th valign="top" align="left">Managed bees</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Agr&#xed;cola Aguas Negras</td>
<td valign="top" align="left">Paillaco</td>
<td valign="top" align="left">40&#xb0;2&#x2032;55.62&#x2033; S</td>
<td valign="top" align="left">72&#xb0;45&#x2032;15.20&#x2033; W</td>
<td valign="top" align="left">Conventional</td>
<td valign="top" align="left">28 ha</td>
<td valign="top" align="left">Brigitta, Legacy, Elliot, Draper, Duke</td>
<td valign="top" align="left">
<italic>Bombus terrestris</italic>, <italic>Apis mellifera</italic>
</td>
</tr>
<tr>
<td valign="top" align="left">Shine Liucura</td>
<td valign="top" align="left">Paillaco</td>
<td valign="top" align="left">40&#xb0;2&#x2032;49.89&#x2033; S</td>
<td valign="top" align="left">72&#xb0;46&#x2032;49.21&#x2033; W</td>
<td valign="top" align="left">Organic</td>
<td valign="top" align="left">8.1 ha</td>
<td valign="top" align="left">Brigitta, Bluecrop, Coville, Elliot, Legacy</td>
<td valign="top" align="left">
<italic>Bombus terrestris</italic>
</td>
</tr>
<tr>
<td valign="top" align="left">Agroberries Asque</td>
<td valign="top" align="left">Mariquina</td>
<td valign="top" align="left">39&#xb0;33&#x2032;59.4&#x2033; S</td>
<td valign="top" align="left">72&#xb0;59&#x2032;28.4&#x2033; W</td>
<td valign="top" align="left">Organic</td>
<td valign="top" align="left">141 ha</td>
<td valign="top" align="left">Brigitta, Duke, Elliot, Legacy, Topshelf</td>
<td valign="top" align="left">
<italic>Apis mellifera</italic>
</td>
</tr>
<tr>
<td valign="top" align="left">Agroberries Cun Cun</td>
<td valign="top" align="left">Mariquina</td>
<td valign="top" align="left">39&#xb0;33&#x2032;44.0&#x2033; S</td>
<td valign="top" align="left">73&#xb0;02&#x2032;33.8&#x2033; W</td>
<td valign="top" align="left">Conventional</td>
<td valign="top" align="left">114 ha</td>
<td valign="top" align="left">Brigitta, Duke, Elliot, Legacy, Topshelf</td>
<td valign="top" align="left">
<italic>Apis mellifera</italic>
</td>
</tr>
<tr>
<td valign="top" align="left">Agr&#xed;cola Campos &#xc1;lvarez</td>
<td valign="top" align="left">Linares</td>
<td valign="top" align="left">35&#xb0;55&#x2032;45.8&#x2033; S</td>
<td valign="top" align="left">71&#xb0;29&#x2032;37.9&#x2033; W</td>
<td valign="top" align="left">Conventional</td>
<td valign="top" align="left">3.2 ha</td>
<td valign="top" align="left">Duke, Legacy</td>
<td valign="top" align="left">none</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Visual searches for foraging bees began at 10:00 h and ended at 18:30 h, when bee activity declined. To record buzzing sounds, three to four researchers constantly walked through the blueberry rows, hand-holding a recorder while searching for flower-visiting bees. When a bee was observed approaching a flower, it was followed with a digital acoustic recorder (Zoom H4n Pro Handy Recorder) held within 1&#x2013;5 cm of the bee when it landed on the flower, with the microphone head pointed at the dorsal surface of the bee's thorax. All bee individuals that could not be immediately identified were captured with an entomological net just after leaving the flower and placed in glass vials with ethyl acetate for taxonomic identification in the laboratory. As a consequence, we assume that the number of audio samples corresponds to the number of bee individuals. All sampled bee individuals were identified by experts to the lowest possible taxonomic level.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Acoustic pre-processing</title>
<p>We performed data pre-processing before the training step to improve the performance of the ML models. The original sound recordings (.wav files) were manually classified, and segments containing bee buzzing sounds were selected. We categorized as <italic>sonication</italic> all segments of buzzing sounds produced by bees vibrating blueberry flowers, and as <italic>flight</italic> the sounds produced by bees flying between flowers. An experienced user could easily distinguish flight and sonication sounds in the recordings afterward because they differ markedly in their acoustic characteristics. <xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al. (2021)</xref> showed that <italic>sonication</italic> and <italic>flight</italic> sounds contribute equally to training a bee species classifier. We therefore used both sound types together in all trials, since together they provide a larger number of audio samples and include bee species that are not capable of sonication. Recording segments without bee sounds were not selected but were kept for subsequent steps. The set of recordings contained 518 audio samples (corresponding to 518 bee individuals) lasting on average 2 seconds, with 1,867 flight segments and 1,728 floral sonication segments (see <xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>). We performed these analyses using Raven Lite software (Cornell Laboratory of Ornithology, Ithaca, New York).</p>
<table-wrap id="T2" position="float">
<label>Table&#xa0;2</label>
<caption>
<p>Species richness and corresponding recording samples of flower-visiting bees of highbush blueberry cultivars in southern Chile in 2020 and 2021.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left"/>
<th valign="top" align="left">Family</th>
<th valign="top" align="left">Species</th>
<th valign="top" align="left">N recordings</th>
<th valign="top" align="left">Flight segments</th>
<th valign="top" align="left">Sonication segments</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Apidae</td>
<td valign="top" align="left">
<italic>Apis mellifera</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im30">
<mml:mrow>
<mml:mn>29</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im31">
<mml:mrow>
<mml:mn>94</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im32">
<mml:mn>0</mml:mn>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">Apidae</td>
<td valign="top" align="left">
<italic>Bombus dahlbomii</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im33">
<mml:mrow>
<mml:mn>77</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im34">
<mml:mrow>
<mml:mn>327</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im35">
<mml:mrow>
<mml:mn>77</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">Apidae</td>
<td valign="top" align="left">
<italic>Bombus ruderatus</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im36">
<mml:mrow>
<mml:mn>29</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im37">
<mml:mrow>
<mml:mn>150</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im38">
<mml:mrow>
<mml:mn>48</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left">Apidae</td>
<td valign="top" align="left">
<italic>Bombus terrestris</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im39">
<mml:mrow>
<mml:mn>88</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im40">
<mml:mrow>
<mml:mn>387</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im41">
<mml:mrow>
<mml:mn>468</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="left">Colletidae</td>
<td valign="top" align="left">
<italic>Cadeguala occidentalis</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im42">
<mml:mrow>
<mml:mn>103</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im43">
<mml:mrow>
<mml:mn>371</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im44">
<mml:mrow>
<mml:mn>696</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="left">Colletidae</td>
<td valign="top" align="left">
<italic>Cadeguala albopilosa</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im45">
<mml:mn>5</mml:mn>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im46">
<mml:mrow>
<mml:mn>12</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im47">
<mml:mrow>
<mml:mn>32</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="left">Halictidae</td>
<td valign="top" align="left">
<italic>Callistochlora chloris</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im48">
<mml:mn>8</mml:mn>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im49">
<mml:mrow>
<mml:mn>29</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im50">
<mml:mrow>
<mml:mn>13</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="left">Apidae</td>
<td valign="top" align="left">
<italic>Centris cineraria</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im51">
<mml:mrow>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im52">
<mml:mrow>
<mml:mn>179</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im53">
<mml:mrow>
<mml:mn>159</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="left">Colletidae</td>
<td valign="top" align="left">
<italic>Colletes cyanescens</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im54">
<mml:mrow>
<mml:mn>34</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im55">
<mml:mrow>
<mml:mn>78</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im56">
<mml:mrow>
<mml:mn>40</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="left">Colletidae</td>
<td valign="top" align="left">
<italic>Colletes nigritulus</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im57">
<mml:mrow>
<mml:mn>32</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im58">
<mml:mrow>
<mml:mn>60</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im59">
<mml:mrow>
<mml:mn>23</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">11</td>
<td valign="top" align="left">Halictidae</td>
<td valign="top" align="left">
<italic>Corynura</italic> sp.</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im60">
<mml:mrow>
<mml:mn>19</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im61">
<mml:mrow>
<mml:mn>28</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im62">
<mml:mrow>
<mml:mn>35</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">12</td>
<td valign="top" align="left">Colletidae</td>
<td valign="top" align="left">
<italic>Diphaglossa gayi</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im63">
<mml:mrow>
<mml:mn>15</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im64">
<mml:mrow>
<mml:mn>50</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im65">
<mml:mrow>
<mml:mn>37</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">13</td>
<td valign="top" align="left">Halictidae</td>
<td valign="top" align="left">
<italic>Lasioglossum</italic> sp.</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im66">
<mml:mrow>
<mml:mn>13</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im67">
<mml:mrow>
<mml:mn>13</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im68">
<mml:mrow>
<mml:mn>66</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">14</td>
<td valign="top" align="left">Apidae</td>
<td valign="top" align="left">
<italic>Manuelia postica</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im69">
<mml:mrow>
<mml:mn>12</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im70">
<mml:mrow>
<mml:mn>15</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im71">
<mml:mn>5</mml:mn>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">15</td>
<td valign="top" align="left">Halictidae</td>
<td valign="top" align="left">
<italic>Ruizantheda mutabilis</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im72">
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im73">
<mml:mrow>
<mml:mn>13</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im74">
<mml:mn>4</mml:mn>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">16</td>
<td valign="top" align="left">Halictidae</td>
<td valign="top" align="left">
<italic>Ruizantheda proxima</italic>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im75">
<mml:mrow>
<mml:mn>23</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im76">
<mml:mrow>
<mml:mn>61</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im77">
<mml:mrow>
<mml:mn>25</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The &#x201c;<inline-formula>
<mml:math display="inline" id="im78">
<mml:mi>N</mml:mi>
</mml:math>
</inline-formula> recordings&#x201d; denotes the number of audio recordings sampled per bee species; the right columns present the total number of flight and sonication segments in the audio samples.</p>
</fn>
</table-wrap-foot>
</table-wrap>
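<p>As a quick consistency check, the per-species counts of Table&#xa0;2 can be summed to recover the totals reported above (518 recordings, 1,867 flight segments and 1,728 sonication segments), and the same counts quantify the class imbalance that motivates data augmentation in Section 2.4.1. The Python sketch below is illustrative only: the counts are transcribed from Table&#xa0;2, the helper names are hypothetical, and the imbalance ratio (largest over smallest number of recordings per species) is just one simple measure among several.</p>

```python
# Per-species counts transcribed from Table 2:
# (N recordings, flight segments, sonication segments)
TABLE_2 = {
    "Apis mellifera": (29, 94, 0),
    "Bombus dahlbomii": (77, 327, 77),
    "Bombus ruderatus": (29, 150, 48),
    "Bombus terrestris": (88, 387, 468),
    "Cadeguala occidentalis": (103, 371, 696),
    "Cadeguala albopilosa": (5, 12, 32),
    "Callistochlora chloris": (8, 29, 13),
    "Centris cineraria": (20, 179, 159),
    "Colletes cyanescens": (34, 78, 40),
    "Colletes nigritulus": (32, 60, 23),
    "Corynura sp.": (19, 28, 35),
    "Diphaglossa gayi": (15, 50, 37),
    "Lasioglossum sp.": (13, 13, 66),
    "Manuelia postica": (12, 15, 5),
    "Ruizantheda mutabilis": (11, 13, 4),
    "Ruizantheda proxima": (23, 61, 25),
}


def column_totals(table):
    """Sum each column (recordings, flight, sonication) over all species."""
    return tuple(sum(row[k] for row in table.values()) for k in range(3))


def imbalance_ratio(table):
    """Largest divided by smallest number of recordings per species."""
    recordings = [row[0] for row in table.values()]
    return max(recordings) / min(recordings)


recordings, flight, sonication = column_totals(TABLE_2)
print(recordings, flight, sonication)      # 518 1867 1728
print(round(imbalance_ratio(TABLE_2), 1))  # 20.6
```

<p>With these counts, <italic>Cadeguala occidentalis</italic> has roughly 20 times as many recordings as <italic>Cadeguala albopilosa</italic>.</p>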
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>Audio feature extraction</title>
<p>Audio feature extraction transforms the raw audio segments produced by acoustic pre-processing into features that explicitly represent properties of the data that may be relevant for ML classification. We compared two audio feature extraction techniques separately: the Log Mel-Spectrogram and Mel-Frequency Cepstral Coefficients (MFCC). The Mel-Spectrogram is a compressed time-frequency representation of sound from which DL and ML algorithms can learn. It is obtained by applying Mel-filters (a bank of bandpass filters with bandwidths modeled after the Mel-scale) to the spectrum computed by the Short-Time Fourier Transform (STFT); taking the logarithm of the result gives the Log Mel-Spectrogram. The Mel-scale is a nonlinear, approximately logarithmic transformation of the signal frequency. The conversion of a frequency in hertz (<inline-formula>
<mml:math display="inline" id="im79">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula>) to the Mel-scale is given by Eq. 1.</p>
<disp-formula>
<label>(1)</label>
<mml:math display="block" id="M1">
<mml:mrow>
<mml:mi>mel</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>2595</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mn>700</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
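<p>To make the pipeline concrete, the following NumPy sketch implements Eq. 1, its inverse, and a basic Log Mel-Spectrogram: a Hann-windowed STFT, a bank of triangular filters whose centre frequencies are equally spaced on the Mel-scale, and a logarithmic (decibel) compression. It is a minimal illustration under assumed settings (16 kHz sampling rate, 1,024-point FFT, hop of 256 samples, 64 Mel bands), not the feature extractor or parameter values used in this study; all function names are hypothetical, and established audio libraries provide more complete implementations.</p>

```python
import numpy as np


def hz_to_mel(f):
    """Eq. 1: convert a frequency in hertz to the Mel-scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of Eq. 1."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters with centre frequencies equally spaced on the Mel-scale."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)              # bin frequencies in Hz
    mel_edges = np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, centre, right = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64, eps=1e-10):
    """Hann-windowed STFT power -> Mel filterbank -> decibels; returns (n_mels, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2           # (frames, n_fft // 2 + 1)
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T    # (frames, n_mels)
    return 10.0 * np.log10(mel_power + eps).T


# Usage: a toy 2 s, 300 Hz tone standing in for a buzzing segment
sr = 16000
t = np.arange(2 * sr) / sr
toy_buzz = np.sin(2 * np.pi * 300.0 * t)
S = log_mel_spectrogram(toy_buzz, sr)
print(S.shape)  # (64, 122)
```

<p>With Eq. 1, 1,000 Hz maps to approximately 1,000 mel, which is the usual anchor point of this form of the Mel-scale.</p>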
<sec id="s2_3_1">
<label>2.3.1</label>
<title>Data splitting</title>
<p>We partitioned the data set of audio samples for cross-validation. In each replication, the data set was split into two equal-sized halves for training and testing; unlike in the original procedure (<xref ref-type="bibr" rid="B3">Alpayd&#x131;n, 1999</xref>), however, the training half was further divided into two parts, with <inline-formula>
<mml:math display="inline" id="im80">
<mml:mrow>
<mml:mn>80</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> of it used for training and 20% used for validation. Each replication therefore splits the data as follows: <inline-formula>
<mml:math display="inline" id="im81">
<mml:mrow>
<mml:mn>40</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> for training, <inline-formula>
<mml:math display="inline" id="im82">
<mml:mrow>
<mml:mn>10</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> for validation and <inline-formula>
<mml:math display="inline" id="im83">
<mml:mrow>
<mml:mn>50</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> for testing, for a total of <inline-formula>
<mml:math display="inline" id="im84">
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> runs (five replications of two-fold cross-validation). Because each replication was created using a distinct seed, the distribution of data differs between replications. We applied the combined <inline-formula>
<mml:math display="inline" id="im85">
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> cross-validated F-test (<xref ref-type="bibr" rid="B3">Alpayd&#x131;n, 1999</xref>), a more reliable alternative to the <inline-formula>
<mml:math display="inline" id="im86">
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>c</mml:mi>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> t-test (<xref ref-type="bibr" rid="B21">Dietterich, 1998</xref>) for comparing the performance of supervised classification learning algorithms. The combined <inline-formula>
<mml:math display="inline" id="im87">
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>c</mml:mi>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> F-test, which also requires five replications of two-fold cross-validation, mitigates the drawbacks of the cross-validated t-test and has higher power.</p>
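<p>The splitting scheme and the combined test can be sketched in a few lines of Python. The sketch assumes plain (non-stratified) random halves with one seed per replication, takes <italic>p</italic><sub>ij</sub> as the difference between the error rates of two classifiers on fold <italic>j</italic> of replication <italic>i</italic>, and compares the statistic with the tabulated 5% critical value of the F distribution with 10 and 5 degrees of freedom (approximately 4.74). The function names are hypothetical; this is not the code used in the study.</p>

```python
import numpy as np

F_CRIT_10_5_ALPHA_005 = 4.74  # upper 5% point of F(10, 5), from standard F tables


def five_by_two_splits(n_samples, n_replications=5, val_fraction=0.2, seed=0):
    """Yield (train, val, test) index arrays for 5x2 cross-validation.

    Each replication shuffles the data with its own seed and cuts it into two
    halves; each half serves once as the test set while the other half is
    divided into training (1 - val_fraction) and validation (val_fraction).
    """
    for r in range(n_replications):
        perm = np.random.default_rng(seed + r).permutation(n_samples)
        half = n_samples // 2
        folds = (perm[:half], perm[half:])
        for k in (0, 1):
            train_full, test = folds[k], folds[1 - k]
            n_val = int(round(val_fraction * len(train_full)))
            yield train_full[n_val:], train_full[:n_val], test


def combined_5x2cv_f_test(p):
    """Combined 5x2cv F-test (Alpaydin, 1999).

    p[i, j]: difference between the error rates of two classifiers on fold j
    of replication i (shape 5 x 2). Under the null hypothesis the statistic is
    approximately F-distributed with 10 and 5 degrees of freedom.
    """
    p = np.asarray(p, dtype=float)
    p_bar = p.mean(axis=1, keepdims=True)   # mean difference per replication
    s2 = ((p - p_bar) ** 2).sum(axis=1)     # variance estimate per replication
    f_stat = (p ** 2).sum() / (2.0 * s2.sum())
    return f_stat, bool(f_stat > F_CRIT_10_5_ALPHA_005)  # (statistic, reject H0 at 5%)
```

<p>For the 518 audio samples of this study, each of the ten runs in this sketch yields 207 training, 52 validation and 259 test samples, i.e., approximately 40%, 10% and 50% of the data.</p>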
</sec>
</sec>
<sec id="s2_4">
<label>2.4</label>
<title>Machine-learning classification</title>
<p>To compare the performance of different ML classification techniques, we evaluated our bee buzzing sound dataset with both classical ML and DL classifiers.</p>
<sec id="s2_4_1">
<label>2.4.1</label>
<title>Data augmentation</title>
<p>CNNs generally benefit from large training data sets, which improve their ability to recognize the acoustic patterns of bees, whereas small training sets tend to cause overfitting. Moreover, our data set is highly imbalanced, meaning that some classes (bee species) have very few audio samples. To mitigate overfitting, we applied data augmentation to the data set used for CNN classification. Data augmentation tends to improve the performance of ML algorithms by generating additional data for the training set of the model (<xref ref-type="bibr" rid="B14">Chlap et&#xa0;al., 2021</xref>). We applied three data augmentation techniques to the CNN training data: mixup (<xref ref-type="bibr" rid="B76">Zhang et&#xa0;al., 2017</xref>), SpecAugment (<xref ref-type="bibr" rid="B48">Park et&#xa0;al., 2019a</xref>), and random truncation.</p>
<p>Mixup (<xref ref-type="bibr" rid="B76">Zhang et&#xa0;al., 2017</xref>) is a simple method that generates new training examples by mixing pairs of audio samples, for example of two different bee species, interpolating both their features and their labels. If <inline-formula>
<mml:math display="inline" id="im88">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im89">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are two different input samples (spectrograms in our case), and <inline-formula>
<mml:math display="inline" id="im90">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math display="inline" id="im91">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> their respective one-hot encoded labels, then the mixed sample and target are obtained by a simple convex combination:</p>
<disp-formula>
<mml:math display="block" id="M2">
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mtext>mix</mml:mtext>
</mml:mrow>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula>
<mml:math display="block" id="M3">
<mml:mrow>
<mml:msup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mtext>mix</mml:mtext>
</mml:mrow>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im92">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula> is a scalar sampled from a symmetric Beta distribution at each mini-batch generation:</p>
<disp-formula>
<mml:math display="block" id="M4">
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:mtext>Beta</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im93">
<mml:mi>&#x3b1;</mml:mi>
</mml:math>
</inline-formula> is a real-valued hyperparameter to be tuned.</p>
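<p>A minimal NumPy sketch of mixup at the mini-batch level is shown below. Following a common implementation choice (an assumption here, not a detail reported in this section), each spectrogram in the batch is mixed with a randomly permuted partner from the same batch, and a single λ is drawn from Beta(α, α) per mini-batch. The value α = 0.4, the batch shape and the 16-class one-hot labels (one per species in Table&#xa0;2) are illustrative, and the function name is hypothetical.</p>

```python
import numpy as np


def mixup_batch(x, y, alpha, rng):
    """Mix each example of a mini-batch with a randomly chosen partner from the same batch.

    x: spectrograms, shape (batch, n_mels, frames); y: one-hot labels, shape (batch, n_classes).
    A single lambda ~ Beta(alpha, alpha) is drawn for the whole mini-batch.
    """
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(len(x))                  # partner index for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]         # soft labels still sum to one
    return x_mix, y_mix, lam


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64, 122))                   # toy batch of Log Mel-Spectrograms
y = np.eye(16)[rng.integers(0, 16, size=8)]         # one-hot labels for 16 species
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.4, rng=rng)
```

<p>Because the mixed targets are soft labels, the network is typically trained with a cross-entropy loss that accepts probability vectors rather than class indices.</p>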
<p>SpecAugment (<xref ref-type="bibr" rid="B50">Park et&#xa0;al., 2019b</xref>) is an occlusion-based augmentation technique applied to Log Mel-Spectrograms. Here, SpecAugment is applied at the mini-batch level, meaning that the same randomly chosen stripes are masked in all the samples of a given mini-batch. Frequency masking is applied such that <inline-formula>
<mml:math display="inline" id="im94">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> consecutive Mel frequency bins <inline-formula>
<mml:math display="inline" id="im95">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> are masked, where <inline-formula>
<mml:math display="inline" id="im96">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> is chosen from a uniform distribution from <inline-formula>
<mml:math display="inline" id="im97">
<mml:mn>0</mml:mn>
</mml:math>
</inline-formula> to a frequency mask parameter <inline-formula>
<mml:math display="inline" id="im98">
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup> </mml:math>
</inline-formula>, and <inline-formula>
<mml:math display="inline" id="im99">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is chosen from <inline-formula>
<mml:math display="inline" id="im100">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>F</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula>
<mml:math display="inline" id="im101">
<mml:mi>F</mml:mi>
</mml:math>
</inline-formula> is the number of Mel frequency bins (<xref ref-type="bibr" rid="B48">Park et&#xa0;al., 2019a</xref>). SpecAugment was originally proposed for automatic speech recognition but has since been applied successfully to other audio tasks, such as audio tagging (<xref ref-type="bibr" rid="B48">Park et&#xa0;al., 2019a</xref>).</p>
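<p>The frequency masking step can be sketched as follows, assuming a mini-batch stored as a NumPy array of shape (batch, <italic>F</italic>, frames) and masked values replaced by zero (the mean value if the spectrograms are standardized). As described above, one mask is drawn per mini-batch and applied to every sample. The function name and the mask parameter value of 8 are hypothetical, and time masking is not shown.</p>

```python
import numpy as np


def freq_mask_batch(batch, f_prime, rng, fill=0.0):
    """Mask f consecutive Mel bins f0, ..., f0 + f - 1 in every sample of a mini-batch.

    batch: array of shape (batch_size, F, frames); f ~ U{0, ..., f_prime},
    f0 ~ U{0, ..., F - f}. The same (f0, f) is shared by the whole mini-batch.
    """
    out = batch.copy()
    n_bins = batch.shape[1]                        # F, the number of Mel bins
    f = int(rng.integers(0, f_prime + 1))          # mask width
    f0 = int(rng.integers(0, n_bins - f + 1))      # first masked bin
    out[:, f0:f0 + f, :] = fill
    return out, f0, f


rng = np.random.default_rng(0)
spectrograms = rng.normal(size=(8, 64, 122))       # toy mini-batch of Log Mel-Spectrograms
masked, f0, f = freq_mask_batch(spectrograms, f_prime=8, rng=rng)
```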
<p>Lastly, random truncation (RT) consists of sampling <italic>N</italic> seconds from random positions within the segments of an audio sample that contain buzzing sounds at each forward pass of the DL model, instead of always taking the same fixed segment.</p>
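<p>A sketch of random truncation under stated assumptions: the buzzing segments of each recording are given as (start, end) times in seconds, one segment is chosen at random, an <italic>N</italic>-second window is cut at a random offset within it, and segments shorter than <italic>N</italic> seconds are zero-padded. The padding rule, the segment format and the function signature are assumptions for illustration only.</p>

```python
import numpy as np


def random_truncate(signal, sr, n_seconds, buzz_segments, rng):
    """Return an n_seconds excerpt drawn at random from the buzzing parts of a recording.

    buzz_segments: list of (start_s, end_s) times, in seconds, of segments annotated
    as flight or sonication. One segment is chosen at random and a window of
    n_seconds is cut at a random offset; shorter segments are zero-padded.
    """
    n = int(round(n_seconds * sr))
    start_s, end_s = buzz_segments[int(rng.integers(len(buzz_segments)))]
    seg = signal[int(round(start_s * sr)):int(round(end_s * sr))]
    if len(seg) > n:
        offset = int(rng.integers(0, len(seg) - n + 1))
        return seg[offset:offset + n]
    return np.pad(seg, (0, n - len(seg)))


# Usage: a new random excerpt is drawn every time the sample is fed to the network
rng = np.random.default_rng(0)
sr = 16000
recording = rng.normal(size=10 * sr)                 # toy 10 s recording
segments = [(1.2, 3.5), (6.0, 6.8)]                  # hypothetical buzzing segments
clip = random_truncate(recording, sr, 1.0, segments, rng)
```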
</sec>
<sec id="s2_4_2">
<label>2.4.2</label>
<title>Classical machine-learning algorithms</title>
<p>For the classical ML approach, we chose classifiers that are widely used and that were among the most successful at recognizing the taxonomic identity of bees from their buzzing sounds (<xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al., 2021</xref>): Logistic Regression, Support Vector Machines, Random Forest, Decision Trees, and a classifier ensemble. Ensemble learning is a general meta-approach to ML that seeks better predictive performance by combining the predictions of multiple models (for more details, see <xref ref-type="bibr" rid="B56">Sagi and Rokach (2018)</xref>). Ensemble methods train multiple ML classifiers to solve the same problem and select the class by taking a (weighted) vote of their predictions (<xref ref-type="bibr" rid="B36">Kuncheva, 2004</xref>).</p>
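<p>The (weighted) voting rule can be illustrated with a short NumPy sketch. It assumes that each base classifier has already produced hard class labels for the test samples; with equal weights it reduces to plain majority voting, and ties are resolved in favour of the lowest class index. The predictions below are made up for illustration, and the function is not the ensemble implementation used in this study.</p>

```python
import numpy as np


def weighted_vote(predictions, weights, n_classes):
    """Combine hard labels from several classifiers by a weighted vote.

    predictions: (n_models, n_samples) integer labels from each base classifier.
    weights: (n_models,) non-negative vote weights (all ones = plain majority vote).
    Ties go to the smallest label index, because argmax returns the first maximum.
    """
    predictions = np.asarray(predictions)
    weights = np.asarray(weights, dtype=float)
    n_samples = predictions.shape[1]
    scores = np.zeros((n_samples, n_classes))
    for model_preds, w in zip(predictions, weights):
        scores[np.arange(n_samples), model_preds] += w
    return scores.argmax(axis=1)


# Hypothetical labels from four base classifiers (e.g. LR, SVM, RF, DT) on five samples
preds = [[0, 1, 2, 2, 1],
         [0, 1, 1, 2, 0],
         [1, 1, 2, 0, 0],
         [0, 2, 2, 2, 1]]
print(weighted_vote(preds, [1, 1, 1, 1], n_classes=3))  # [0 1 2 2 0]
print(weighted_vote(preds, [1, 1, 1, 2], n_classes=3))  # [0 1 2 2 1]
```

<p>Giving the fourth classifier double weight flips the tied last sample from class 0 to class 1, which shows how the weights can favour more reliable base models.</p>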
</sec>
<sec id="s2_4_3">
<label>2.4.3</label>
<title>Deep Learning algorithms</title>
<p>Unlike classical ML, Deep Learning (DL), and CNNs in particular, relies on computational models composed of several processing layers that learn representations of data with multiple levels of abstraction. We chose two CNN classifiers that have achieved high performance in other audio domains: EfficientNet V2 and Pre-trained Audio Neural Networks (PANNs).</p>
<p>EfficientNet is a family of models that are optimized for FLOPs<xref ref-type="fn" rid="fn1">
<sup>1</sup>
</xref> and parameter efficiency (<xref ref-type="bibr" rid="B66">Tan and Le, 2019</xref>) and has shown good performance in other audio domains (<xref ref-type="bibr" rid="B27">Gong et&#xa0;al., 2021</xref>). EfficientNet uses neural architecture search to find a baseline network, EfficientNet-B0, which is then scaled up with a compound scaling strategy to obtain the B1-B7 models. The EfficientNet V2 family improves on these models, outperforming them in both training speed and parameter efficiency. In this work, we used the Small variant of the EfficientNet V2 family. Rather than being pre-trained on audio data, the model was pre-trained on the ImageNet dataset (<xref ref-type="bibr" rid="B20">Deng et&#xa0;al., 2009</xref>). ImageNet pre-trained models have been successfully used to boost the performance of CNN models in audio classification tasks in recent years (<xref ref-type="bibr" rid="B29">Gwardys and Grzywczak, 2014</xref>; <xref ref-type="bibr" rid="B42">M&#xfc;ller et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B49">Palanisamy et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B78">Zhong et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B27">Gong et&#xa0;al., 2021</xref>).</p>
<p>PANNs are CNN models trained on Log Mel-Spectrogram representations of AudioSet recordings (<xref ref-type="bibr" rid="B26">Gemmeke et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B35">Kong et&#xa0;al., 2020</xref>). AudioSet is a large-scale dataset of manually annotated audio events that aims to bridge the gap in data availability between image and audio research. Using a carefully structured hierarchical ontology of <inline-formula>
<mml:math display="inline" id="im102">
<mml:mrow>
<mml:mn>632</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> audio classes, guided by the literature and by manual curation, human labelers annotated the presence of specific audio classes in <inline-formula>
<mml:math display="inline" id="im103">
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>-second segments of YouTube videos. Pre-trained PANNs can be transferred to a wide range of audio pattern recognition tasks, which makes them useful in scenarios such as ours, where the amount of training data is scarce.</p>
</sec>
</sec>
<sec id="s2_5">
<label>2.5</label>
<title>Evaluation metrics</title>
<p>We used the following metrics to evaluate the classifiers: Accuracy (Acc), Macro-Precision (MacPrec), Macro-Recall (MacRec), and Macro-F1 (MacF1). All four are computed from the confusion matrix of the classifier output, in which the diagonal elements count the test samples whose predicted label matches the actual label and the off-diagonal elements count the misclassifications.</p>
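<p>The four metrics can be computed from the confusion matrix as in the following NumPy sketch, in which rows index the ground-truth class and columns the predicted class. Macro averages are unweighted means over classes; Macro-F1 is assumed here to be the mean of the per-class F1 scores (the harmonic mean of MacPrec and MacRec is another definition in use), and classes with an empty denominator contribute zero. The function names and toy labels are illustrative.</p>

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the ground-truth label g(t), columns the predicted label c(t)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def safe_div(num, den):
    """Element-wise num / den, with 0 wherever den == 0."""
    return np.divide(num, den, out=np.zeros_like(num, dtype=float), where=den > 0)


def metrics_from_confusion(cm):
    tp = np.diag(cm).astype(float)          # TP_i
    fp = cm.sum(axis=0) - tp                # FP_i: predicted as i, truly another class
    fn = cm.sum(axis=1) - tp                # FN_i: truly i, predicted as another class
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2.0 * precision * recall, precision + recall)
    return {"Acc": tp.sum() / cm.sum(), "MacPrec": precision.mean(),
            "MacRec": recall.mean(), "MacF1": f1.mean()}


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm.tolist())               # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
scores = metrics_from_confusion(cm)
print(round(scores["Acc"], 3))   # 0.667
```

<p>Unlike accuracy, the macro-averaged metrics weight every species equally, so rare classes such as those with only a handful of recordings in Table&#xa0;2 influence the score as much as the common ones.</p>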
<p>Let <inline-formula>
<mml:math display="inline" id="im104">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> be a class from the set of classes <inline-formula>
<mml:math display="inline" id="im105">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula>. Let <inline-formula>
<mml:math display="inline" id="im106">
<mml:mi mathvariant="script">T</mml:mi>
</mml:math>
</inline-formula> be the test set and let <inline-formula>
<mml:math display="inline" id="im107">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula> be a classifier, such that <inline-formula>
<mml:math display="inline" id="im108">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula>
<mml:math display="inline" id="im109">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula> is an element of the test set <inline-formula>
<mml:math display="inline" id="im110">
<mml:mi mathvariant="script">T</mml:mi>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im111">
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is a label corresponding to a class in <inline-formula>
<mml:math display="inline" id="im112">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula> assigned to <inline-formula>
<mml:math display="inline" id="im113">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula> by <inline-formula>
<mml:math display="inline" id="im114">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula>. Let <inline-formula>
<mml:math display="inline" id="im115">
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> be the ground truth class label of <inline-formula>
<mml:math display="inline" id="im116">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula>. For classifier <inline-formula>
<mml:math display="inline" id="im117">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula>, we define:</p>
<list list-type="bullet">
<list-item>
<p>True Positives of class <inline-formula>
<mml:math display="inline" id="im118">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula>, denoted by <inline-formula>
<mml:math display="inline" id="im119">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, as the number of elements in <inline-formula>
<mml:math display="inline" id="im120">
<mml:mi mathvariant="script">T</mml:mi>
</mml:math>
</inline-formula> correctly labeled with class <inline-formula>
<mml:math display="inline" id="im121">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> by <inline-formula>
<mml:math display="inline" id="im122">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula>, i.e., <inline-formula>
<mml:math display="inline" id="im123">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:mo>{</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">T</mml:mi>
<mml:mo>|</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>}</mml:mo>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>False Positives of class <inline-formula>
<mml:math display="inline" id="im124">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula>, denoted by <inline-formula>
<mml:math display="inline" id="im125">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, as the number of elements in <inline-formula>
<mml:math display="inline" id="im126">
<mml:mi mathvariant="script">T</mml:mi>
</mml:math>
</inline-formula> that were wrongly classified by <inline-formula>
<mml:math display="inline" id="im127">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula> as belonging to class <inline-formula>
<mml:math display="inline" id="im128">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula>. Formally, <inline-formula>
<mml:math display="inline" id="im129">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mo>|</mml:mo>
<mml:mo>{</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">T</mml:mi>
<mml:mo>|</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>}</mml:mo>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>False Negatives of class <inline-formula>
<mml:math display="inline" id="im130">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula>, denoted by <inline-formula>
<mml:math display="inline" id="im131">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, as the number of elements in <inline-formula>
<mml:math display="inline" id="im132">
<mml:mi mathvariant="script">T</mml:mi>
</mml:math>
</inline-formula> belonging to class <inline-formula>
<mml:math display="inline" id="im133">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> but classified by <inline-formula>
<mml:math display="inline" id="im134">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula> with a label different from <inline-formula>
<mml:math display="inline" id="im135">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula>, that is, <inline-formula>
<mml:math display="inline" id="im136">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:mo>{</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">T</mml:mi>
<mml:mo>|</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>}</mml:mo>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
</list>
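<p>The following is a minimal Python sketch of these three counts. It assumes that the ground-truth labels <italic>g</italic>(<italic>t</italic>) and the predictions <italic>c</italic>(<italic>t</italic>) are available as two parallel lists. The function name <monospace>confusion_counts</monospace> is a hypothetical, illustrative choice and is not part of the study's code.</p>

```python
def confusion_counts(y_true, y_pred, label):
    """Return (TP_i, FP_i, FN_i) for class `label`.

    y_true[k] plays the role of g(t) and y_pred[k] the role of c(t)
    for the k-th element t of the test set.
    """
    pairs = list(zip(y_true, y_pred))
    # TP_i: predicted as i and truly of class i
    tp = sum(1 for g, c in pairs if c == label and g == label)
    # FP_i: predicted as i but truly of another class
    fp = sum(1 for g, c in pairs if c == label and g != label)
    # FN_i: truly of class i but predicted as another class
    fn = sum(1 for g, c in pairs if c != label and g == label)
    return tp, fp, fn
```

<p>Take ground-truth labels A, A, B, C, B with predictions A, B, B, B, B. Class B then has two true positives, two false positives and no false negatives.</p>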
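<p>These counts are used to define the traditional effectiveness measures of classifiers: Precision, Recall and F1-score [for more detail see <xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al. (2021)</xref>]. For class <italic>i</italic>, precision is <italic>p</italic>(<italic>c</italic>, <italic>i</italic>) = <italic>TP<sub>i</sub></italic>/(<italic>TP<sub>i</sub></italic> + <italic>FP<sub>i</sub></italic>), the fraction of elements labeled <italic>i</italic> by <italic>c</italic> that truly belong to class <italic>i</italic>. Recall is <italic>r</italic>(<italic>c</italic>, <italic>i</italic>) = <italic>TP<sub>i</sub></italic>/(<italic>TP<sub>i</sub></italic> + <italic>FN<sub>i</sub></italic>), the fraction of elements of class <italic>i</italic> that <italic>c</italic> labels correctly.</p>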
<p>We relied mainly on the F1-score because the classes were unbalanced, and Accuracy tends to underweight classes with fewer samples relative to those with more samples (<xref ref-type="bibr" rid="B62">Steiniger et&#xa0;al., 2020</xref>). The <inline-formula>
<mml:math display="inline" id="im137">
<mml:mrow>
<mml:msub>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> measure is the harmonic mean of precision and recall and is defined by Eq. 2.</p>
<disp-formula>
<label>(2)</label>
<mml:math display="block" id="M5">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
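<p>Given the counts defined above, Eq. 2 can be evaluated as in the sketch below. When a denominator is zero (for example, for a class that is never predicted), the sketch returns 0. This is a common convention that we assume here; the source does not state how such cases were handled.</p>

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision p(c, i), recall r(c, i) and F1(c, i) (Eq. 2).

    Any ratio with a zero denominator is set to 0.0 (assumed convention).
    """
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1
```

<p>With two true positives, two false positives and no false negatives, precision is 0.5, recall is 1.0 and F1 is 2/3.</p>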
<p>When comparing the performance of classifiers generated by distinct learning methods, it is common to use a global measure, which summarizes the performance of a classifier over all classes in the test set. In this work, we compare the classifiers using the following global measures: Accuracy (<inline-formula>
<mml:math display="inline" id="im138">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>) (which is equivalent to Micro-F1), Macro-Precision (<inline-formula>
<mml:math display="inline" id="im139">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>), Macro-Recall (<inline-formula>
<mml:math display="inline" id="im140">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>) and Macro-F1 (<inline-formula>
<mml:math display="inline" id="im141">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
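</inline-formula>) [for more detail see <xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al. (2021)</xref>]. Each Macro measure is the unweighted average of the corresponding per-class metric over all classes, so every class contributes equally regardless of its size. Macro-F1 is defined by Eq. 3.</p>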
<disp-formula>
<label>(3)</label>
<mml:math display="block" id="M6">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
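<p>The sketch below uses the same list-based assumptions to compute Macro-F1 (Eq. 3) and Accuracy. It applies both to a hypothetical, highly unbalanced toy test set in which every sample is assigned to the majority class, as in the majority baseline. The per-class F1 uses the identity F1 = 2TP/(2TP + FP + FN). This identity is algebraically equivalent to Eq. 2 whenever TP is positive, and it gives 0 otherwise.</p>

```python
def per_class_f1(y_true, y_pred, label):
    """F1(c, i) for one class, via F1 = 2TP / (2TP + FP + FN)."""
    tp = sum(1 for g, c in zip(y_true, y_pred) if g == label and c == label)
    fp = sum(1 for g, c in zip(y_true, y_pred) if g != label and c == label)
    fn = sum(1 for g, c in zip(y_true, y_pred) if g == label and c != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes):
    """Eq. 3: unweighted mean of the per-class F1 over all classes."""
    return sum(per_class_f1(y_true, y_pred, k) for k in classes) / len(classes)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the ground truth."""
    return sum(1 for g, c in zip(y_true, y_pred) if g == c) / len(y_true)


# Hypothetical unbalanced toy set; every sample is predicted as majority class "A".
y_true = ["A"] * 8 + ["B", "C"]
y_pred = ["A"] * len(y_true)
print(accuracy(y_true, y_pred))                             # 0.8
print(round(macro_f1(y_true, y_pred, ["A", "B", "C"]), 3))  # 0.296
```

<p>On this toy set, the constant predictor reaches an Accuracy of 0.8 but a Macro-F1 of only about 0.296, because classes B and C each contribute an F1 of 0. This illustrates why Macro-F1 is the more informative measure for unbalanced classes.</p>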
</sec>
<sec id="s2_6">
<label>2.6</label>
<title>Baselines establishment</title>
<p>We used the majority baseline as a reference for the performance metrics of the CNN recognizers. This baseline assigns all audio samples to the majority class, that is, the bee species with the most audio samples: <italic>Cadeguala occidentalis</italic> and <italic>Bombus terrestris</italic>. Additionally, we used the best classical ML algorithm (the one with the highest Macro-F1 score) as an ML baseline. We compared the performance metrics of each CNN classifier with those of the two baselines (majority baseline and best ML classifier) using the combined <inline-formula>
<mml:math display="inline" id="im142">
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> cross-validated F-test (detailed in the &#x201c;Data splitting&#x201d; section). We assumed a significance level of <inline-formula>
<mml:math display="inline" id="im143">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0.05</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. If the <inline-formula>
<mml:math display="inline" id="im144">
<mml:mi>p</mml:mi>
</mml:math>
</inline-formula>-value was smaller than <inline-formula>
<mml:math display="inline" id="im145">
<mml:mi>&#x3b1;</mml:mi>
</mml:math>
</inline-formula>, we rejected the null hypothesis of equal performance and concluded that the two models differed significantly.</p>
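<p>The following is a minimal sketch of the combined 5&#xd7;2 cv F statistic. It assumes the standard formulation of Alpayd&#x131;n (1999), in which each of five replications of 2-fold cross-validation yields one score difference per fold between the two models being compared. The function name and input layout are hypothetical. The exact procedure used in this study is described in the &#x201c;Data splitting&#x201d; section.</p>

```python
def combined_5x2cv_f(diffs):
    """Combined 5x2cv F statistic (formulation of Alpaydin, 1999).

    diffs: five pairs (d1, d2); d_j is the difference in score (e.g. Macro-F1)
    between the two compared models on fold j of one replication of 2-fold
    cross-validation. Under the null hypothesis of equal performance the
    statistic approximately follows an F distribution with (10, 5) degrees
    of freedom. Raises ZeroDivisionError if d1 == d2 in every replication.
    """
    if len(diffs) != 5:
        raise ValueError("expected five replications of 2-fold cross-validation")
    # Numerator: sum of all ten squared differences.
    numerator = sum(d1 ** 2 + d2 ** 2 for d1, d2 in diffs)
    # Denominator: twice the sum of the per-replication variance estimates.
    variance_sum = 0.0
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variance_sum += (d1 - mean) ** 2 + (d2 - mean) ** 2
    return numerator / (2 * variance_sum)
```

<p>The statistic is then compared with an F distribution with 10 and 5 degrees of freedom. In Python, for example, the <italic>p</italic>-value could be obtained with <monospace>scipy.stats.f.sf(f_stat, 10, 5)</monospace> and compared with &#x3b1; = 0.05.</p>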
</sec>
</sec>
<sec id="s3" sec-type="results">
<label>3</label>
<title>Results</title>
<sec id="s3_1">
<label>3.1</label>
<title>Characteristics of buzzing sounds</title>
<p>During <inline-formula>
<mml:math display="inline" id="im146">
<mml:mrow>
<mml:mn>990</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hours of sampling effort distributed among <inline-formula>
<mml:math display="inline" id="im147">
<mml:mrow>
<mml:mn>554</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> non-consecutive days of <inline-formula>
<mml:math display="inline" id="im148">
<mml:mrow>
<mml:mn>2020</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im149">
<mml:mrow>
<mml:mn>2021</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, we recorded <inline-formula>
<mml:math display="inline" id="im150">
<mml:mrow>
<mml:mn>518</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> audio samples of <inline-formula>
<mml:math display="inline" id="im151">
<mml:mrow>
<mml:mn>16</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> bee species visiting flowers of highbush blueberry cultivars in five orchards of southern Chile (see <xref ref-type="table" rid="T1">
<bold>Tables&#xa0;1</bold>
</xref>, <xref ref-type="table" rid="T2">
<bold>2</bold>
</xref>). Most of them (<inline-formula>
<mml:math display="inline" id="im152">
<mml:mrow>
<mml:mn>13</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> species) were native Chilean bees, and three were exotic. In the set of <inline-formula>
<mml:math display="inline" id="im153">
<mml:mrow>
<mml:mn>518</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> audio samples, we identified <inline-formula>
<mml:math display="inline" id="im154">
<mml:mrow>
<mml:mn>3</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>595</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> buzzing-sound segments: <inline-formula>
<mml:math display="inline" id="im155">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>728</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> of sonication and <inline-formula>
<mml:math display="inline" id="im156">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>867</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> of flight (see <xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>). The distribution of samples per bee species was highly unbalanced, ranging from five (<italic>Cadeguala albopilosa</italic>) to <inline-formula>
<mml:math display="inline" id="im157">
<mml:mrow>
<mml:mn>103</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> (<italic>Cadeguala occidentalis</italic>). The length of recordings ranged from <inline-formula>
<mml:math display="inline" id="im158">
<mml:mn>5</mml:mn>
</mml:math>
</inline-formula> seconds to over one minute.</p>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Performance of classical machine learning algorithms</title>
<p>The performance (based on Macro-F1 score) of the classical ML algorithms at recognizing flower-visiting bees of blueberry crops was low, ranging between <inline-formula>
<mml:math display="inline" id="im159">
<mml:mrow>
<mml:mn>17.24</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im160">
<mml:mrow>
<mml:mn>34.97</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. Performance nonetheless depended on the algorithm employed. Support Vector Machines (SVM) reached the highest Macro-F1 among the classical ML classifiers (<xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>), correctly predicting most audio samples of the majority classes (above <inline-formula>
<mml:math display="inline" id="im164">
<mml:mrow>
<mml:mn>50</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>): <inline-formula>
<mml:math display="inline" id="im165">
<mml:mrow>
<mml:mn>61.3</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>Apis mellifera</italic>, <inline-formula>
<mml:math display="inline" id="im166">
<mml:mrow>
<mml:mn>67.3</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>Bombus dahlbomii</italic>, <inline-formula>
<mml:math display="inline" id="im167">
<mml:mrow>
<mml:mn>84.6</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>Cadeguala occidentalis</italic> and <inline-formula>
<mml:math display="inline" id="im168">
<mml:mrow>
<mml:mn>69.8</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>Bombus terrestris</italic> (see <xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1</bold>
</xref>). However, the SVM failed to recognize most audio samples of minority classes (below <inline-formula>
<mml:math display="inline" id="im169">
<mml:mrow>
<mml:mn>50</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>): <italic>Manuelia postica</italic> <inline-formula>
<mml:math display="inline" id="im170">
<mml:mrow>
<mml:mn>28.6</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, <italic>Ruizantheda mutabilis</italic> <inline-formula>
<mml:math display="inline" id="im171">
<mml:mrow>
<mml:mn>19.6</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, and <italic>Ruizantheda proxima</italic> <inline-formula>
<mml:math display="inline" id="im172">
<mml:mrow>
<mml:mn>34.2</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> (see <xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1</bold>
</xref>).</p>
<table-wrap id="T3" position="float">
<label>Table&#xa0;3</label>
<caption>
<p>Average predictive performance of different classical Machine-Learning algorithms combined with different audio feature extraction techniques (MFCC and Log Mel-Spectrogram) to recognize bee species based on buzzing sounds recorded during visits to flowers of blueberry cultivars in southern Chile.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" colspan="3" align="center">Flight + Sonication</th>
</tr>
<tr>
<th valign="top" align="center">Algorithms</th>
<th valign="top" align="center">MacF1 (%)</th>
<th valign="top" align="center">MacF1 (%)</th>
</tr>
<tr>
<th valign="top" align="left"/>
<th valign="top" align="center">MFCC</th>
<th valign="top" align="center">Log Mel-Spectrogram</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="center">LR</td>
<td valign="top" align="center">32.52 ( &#xb1; 1.86)<italic>
<sup>a</sup>
</italic>
</td>
<td valign="top" align="center">28.60 ( &#xb1; 1.07)<italic>
<sup>a,b</sup>
</italic>
</td>
</tr>
<tr>
<td valign="top" align="center">
<bold>SVM</bold>
</td>
<td valign="top" align="center">
<bold>34.97 ( &#xb1; 1.52)<italic>
<sup>a</sup>
</italic>
</bold>
</td>
<td valign="top" align="center">
<bold>33.11 ( &#xb1; 1.65)<italic>
<sup>a</sup>
</italic>
</bold>
</td>
</tr>
<tr>
<td valign="top" align="center">RF</td>
<td valign="top" align="center">24.79 ( &#xb1; 0.46)<italic>
<sup>b</sup>
</italic>
</td>
<td valign="top" align="center">23.66 ( &#xb1; 1.73)<italic>
<sup>b,c</sup>
</italic>
</td>
</tr>
<tr>
<td valign="top" align="center">DTree</td>
<td valign="top" align="center">18.88 ( &#xb1; 2.88)<italic>
<sup>c,d</sup>
</italic>
</td>
<td valign="top" align="center">17.24 ( &#xb1; 2.02)<italic>
<sup>c,d</sup>
</italic>
</td>
</tr>
<tr>
<td valign="top" align="center">Ensemble</td>
<td valign="top" align="center">26.43 ( &#xb1; 0.73)<italic>
<sup>c</sup>
</italic>
</td>
<td valign="top" align="center">21.37 ( &#xb1; 0.66)<italic>
<sup>c</sup>
</italic>
</td>
</tr>
<tr>
<td valign="top" align="center">Mean ( &#xb1; SD)</td>
<td valign="top" align="center">27.51 ( &#xb1; 5.72)</td>
<td valign="top" align="center">24.79 ( &#xb1; 5.54)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The performance of the classical ML algorithms was measured by the average Macro-F1 score (MacF1) (mean <inline-formula>
<mml:math display="inline" id="im161">
<mml:mo>&#xb1;</mml:mo>
</mml:math>
</inline-formula> standard deviation). Bold numbers represent the best result within each audio feature extraction technique. Different superscript letters denote significant differences in Macro-F1 score among algorithms (<inline-formula>
<mml:math display="inline" id="im162">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>0.05</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math display="inline" id="im163">
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>c</mml:mi>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> Combined F test).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<fig id="f1" position="float">
<label>Figure&#xa0;1</label>
<caption>
<p>Confusion matrix showing the log-transformed number of audio segments correctly assigned to their respective bee species (diagonal elements) versus those erroneously assigned (off-diagonal elements) by the ML classifiers. SVM fed by MFCC achieved the best performance among the classical ML algorithms. Cell color represents the corresponding number (log-transformed) of audio segments predicted in a given cell, ranging from gray (zero predicted audio segments) to dark blue.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-14-1081050-g001.tif"/>
</fig>
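<p>The per-species percentages reported above correspond to the diagonal of a row-normalized confusion matrix. The following sketch computes these per-class recalls, together with a log-transformed copy of the matrix for plotting. It assumes a square count matrix whose rows are true classes and whose columns are predicted classes. The use of log(1 + <italic>n</italic>), which keeps empty cells defined, is our assumption, since the exact transform used for Figure&#xa0;1 is not specified.</p>

```python
import math


def per_class_recall(cm):
    """cm[g][c] = number of segments of true class g predicted as class c.

    Returns, for each true class, the fraction of its segments that were
    correctly classified (diagonal cell divided by the row sum).
    """
    recalls = []
    for k, row in enumerate(cm):
        total = sum(row)
        recalls.append(row[k] / total if total else 0.0)
    return recalls


def log_counts(cm):
    """Log-transformed counts for display; log(1 + n) is an assumed choice."""
    return [[math.log1p(n) for n in row] for row in cm]
```

<p>For a two-class matrix [[8, 2], [1, 3]], the per-class recalls are 0.8 and 0.75.</p>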
<p>In contrast, the audio feature extraction technique had little effect on the performance of the ML algorithms, whose Macro-F1 scores ranged from <inline-formula>
<mml:math display="inline" id="im173">
<mml:mrow>
<mml:mn>18.88</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula>
<mml:math display="inline" id="im174">
<mml:mrow>
<mml:mn>34.97</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> with MFCC and from <inline-formula>
<mml:math display="inline" id="im175">
<mml:mrow>
<mml:mn>17.24</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula>
<mml:math display="inline" id="im176">
<mml:mrow>
<mml:mn>33.11</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> with Log Mel-Spectrogram. The ML algorithms performed slightly better (based on Macro-F1 score) when fed by MFCC than when fed by Log Mel-Spectrogram (<xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>).</p>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Performance of the Deep Learning classifiers</title>
<p>Both tested CNNs (EfficientNet V2 Small and PANNs) recognized buzzing bee sounds better than the majority baseline (which assigns all audio samples to the majority class). This held whether the CNNs were used alone or combined with pre-training, audio data augmentation, and/or sampling techniques (see <xref ref-type="table" rid="T4">
<bold>Table&#xa0;4</bold>
</xref>). However, without data pre-processing (audio data augmentation, sampling technique, or pre-training), the CNNs did not perform significantly better (based on Macro-F1 score; <inline-formula>
<mml:math display="inline" id="im199">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>0.05</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, combined <inline-formula>
<mml:math display="inline" id="im200">
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>c</mml:mi>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> F-test) than the best classical ML classifier (SVM) (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2</bold>
</xref>). EfficientNet V2 Small outperformed SVM only when pre-training was combined with audio data augmentation (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2</bold>
</xref>). In contrast, PANNs outperformed SVM even without pre-training, although data pre-processing usually increased its Macro-F1 score further (see <xref ref-type="table" rid="T4">
<bold>Table&#xa0;4</bold>
</xref>; <xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2</bold>
</xref>).</p>
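<p>Mixup, one of the augmentations compared above, trains on convex combinations of pairs of examples and of their labels. The sketch below is a minimal NumPy implementation of the standard formulation for one pair of log mel-spectrograms. The mixing weight is drawn from a Beta(&#x3b1;, &#x3b1;) distribution. The default &#x3b1; = 0.2 is an illustrative assumption rather than the value used in this study.</p>

```python
import numpy as np


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix one pair of examples and their one-hot labels.

    x1, x2: log mel-spectrograms of equal shape (n_mels, n_frames).
    y1, y2: one-hot label vectors of equal length.
    alpha: Beta(alpha, alpha) parameter (illustrative default).
    Returns the mixed spectrogram, the mixed (soft) label and the weight.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)           # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2        # convex combination of inputs
    y = lam * y1 + (1.0 - lam) * y2        # same combination of labels
    return x, y, lam
```

<p>Because the label is mixed with the same weight as the input, the model is trained to output soft class probabilities for the blended spectrogram rather than a single species.</p>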
<table-wrap id="T4" position="float">
<label>Table&#xa0;4</label>
<caption>
<p>Average predictive performance of Deep Learning models combined with an audio feature extraction technique (Log Mel-Spectrogram) to recognize bee species based on buzzing sounds recorded during visits to flowers of blueberry cultivars in southern Chile.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Methods</th>
<th valign="top" align="left">MacF1 (%)</th>
<th valign="top" align="left">MacF1 (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" style="background-color:#D3D3D3"/>
<td valign="top" align="left" style="background-color:#D3D3D3">Without Pre-training</td>
<td valign="top" align="left" style="background-color:#D3D3D3">With Pre-training</td>
</tr>
<tr>
<td valign="top" align="left">EfficientNet V2 Small</td>
<td valign="top" align="left">25.08 ( &#xb1; 5.25)</td>
<td valign="top" align="left">22.70 ( &#xb1; 6.04)</td>
</tr>
<tr>
<td valign="top" align="left">EfficientNet V2 Small + Mixup</td>
<td valign="top" align="left">31.91 ( &#xb1; 5.32)</td>
<td valign="top" align="left">43.39 ( &#xb1; 3.03) <bold><sup>**</sup></bold></td>
</tr>
<tr>
<td valign="top" align="left">EfficientNet V2 Small + SpecAugment</td>
<td valign="top" align="left">20.54 ( &#xb1; 4.61)</td>
<td valign="top" align="left">31.33 ( &#xb1; 2.80)</td>
</tr>
<tr>
<td valign="top" align="left">EfficientNet V2 Small + RT</td>
<td valign="top" align="left">37.12 ( &#xb1; 5.62)</td>
<td valign="top" align="left">47.39 ( &#xb1; 4.80) <bold><sup>**</sup></bold></td>
</tr>
<tr>
<td valign="top" align="left">EfficientNet V2 Small + Mixup + SpecAugment</td>
<td valign="top" align="left">14.74 ( &#xb1; 4.14)</td>
<td valign="top" align="left">39.32 ( &#xb1; 1.74) <bold><sup>**</sup></bold></td>
</tr>
<tr>
<td valign="top" align="left">
<bold>EfficientNet V2 Small + Mixup + RT</bold>
</td>
<td valign="top" align="left">
<bold>47.55 ( &#xb1; 9.27)</bold>
</td>
<td valign="top" align="left">
<bold>58.04 ( &#xb1; 2.47)<sup>**</sup></bold></td>
</tr>
<tr>
<td valign="top" align="left">EfficientNet V2 Small + SpecAugment + RT</td>
<td valign="top" align="left">20.69 ( &#xb1; 4.88)</td>
<td valign="top" align="left">41.59 ( &#xb1; 4.76) <bold><sup>**</sup></bold></td>
</tr>
<tr>
<td valign="top" align="left">EfficientNet V2 Small + Mixup + SpecAugment + RT</td>
<td valign="top" align="left">16.63 ( &#xb1; 4.96)</td>
<td valign="top" align="left">48.48 ( &#xb1; 2.11) <bold><sup>**</sup></bold></td>
</tr>
<tr>
<td valign="top" align="left">
<bold>Mean ( &#xb1; SD)</bold>
</td>
<td valign="top" align="left">26.78 ( &#xb1; 10.55)</td>
<td valign="top" align="left">39.19 ( &#xb1; 7.98)</td>
</tr>
<tr>
<td valign="top" align="left">PANNs</td>
<td valign="top" align="left">42.66 ( &#xb1; 6.20) <bold><sup>**</sup></bold></td>
<td valign="top" align="left">35.25 ( &#xb1; 4.14)</td>
</tr>
<tr>
<td valign="top" align="left">PANNs + Mixup</td>
<td valign="top" align="left">52.50 ( &#xb1; 2.36) <bold><sup>**</sup></bold></td>
<td valign="top" align="left">51.95 ( &#xb1; 1.64) <bold><sup>**</sup></bold></td>
</tr>
<tr>
<td valign="top" align="left">PANNs + SpecAugment</td>
<td valign="top" align="left">52.95 ( &#xb1; 1.84) <bold><sup>**</sup></bold></td>
<td valign="top" align="left">44.85 ( &#xb1; 5.63)</td>
</tr>
<tr>
<td valign="top" align="left">PANNs + RT</td>
<td valign="top" align="left">46.58 ( &#xb1; 4.68) <bold><sup>**</sup></bold></td>
<td valign="top" align="left">42.55 ( &#xb1; 5.47)</td>
</tr>
<tr>
<td valign="top" align="left">
<bold>PANNs + Mixup + SpecAugment</bold>
</td>
<td valign="top" align="left">43.11 ( &#xb1; 3.62) <bold><sup>**</sup></bold></td>
<td valign="top" align="left">
<bold>56.96 ( &#xb1; 2.30)<sup>**</sup></bold></td>
</tr>
<tr>
<td valign="top" align="left">PANNs + Mixup + RT</td>
<td valign="top" align="left">50.07 ( &#xb1; 1.92)<sup>**</sup></td>
<td valign="top" align="left">52.33 ( &#xb1; 2.71)<sup>**</sup></td>
</tr>
<tr>
<td valign="top" align="left">
<bold>PANNs + SpecAugment + RT</bold>
</td>
<td valign="top" align="left">
<bold>55.00 ( &#xb1; 3.81)</bold><sup>**</sup></td>
<td valign="top" align="left">52.18 ( &#xb1; 3.58)<sup>**</sup></td>
</tr>
<tr>
<td valign="top" align="left">PANNs + Mixup + SpecAugment + RT</td>
<td valign="top" align="left">35.95 ( &#xb1; 3.40)</td>
<td valign="top" align="left">53.33 ( &#xb1; 2.80)<sup>**</sup></td>
</tr>
<tr>
<td valign="top" align="left">
<bold>Mean ( &#xb1; SD)</bold>
</td>
<td valign="top" align="left">47.35 ( &#xb1; 6.06)</td>
<td valign="top" align="left">48.67 ( &#xb1; 6.69)</td>
</tr>
<tr>
<th valign="top" align="center" colspan="3">Baselines</th>
</tr>
<tr>
<td valign="top" align="left">
<bold>Methods</bold>
</td>
<td valign="top" align="left" colspan="2">
<bold>MacF1 (%)</bold>
</td>
</tr>
<tr>
<td valign="top" align="left">Majority Class</td>
<td valign="top" align="left" colspan="2">2.98 ( &#xb1; 0.00)</td>
</tr>
<tr>
<td valign="top" align="left">SVM</td>
<td valign="top" align="left" colspan="2">34.97 ( &#xb1; 1.52)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The performance of the CNN algorithms was measured by the average Macro-F1 score (MacF1) (mean <inline-formula>
<mml:math display="inline" id="im195">
<mml:mo>&#xb1;</mml:mo>
</mml:math>
</inline-formula> standard deviation). Bold numbers represent the best result for each CNN within each pre-training condition. <inline-formula>
<mml:math display="inline" id="im196">
<mml:mrow>
<mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>*</mml:mo>
<mml:mo>*</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> denotes that the performance of the algorithm is higher than the baselines (based on MacF1 score; <inline-formula>
<mml:math display="inline" id="im197">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>0.05</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math display="inline" id="im198">
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>c</mml:mi>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> Combined F test). RT, Randomly Truncated Technique; PANNs, Pretrained Audio Neural Networks.</p>
</fn>
</table-wrap-foot>
</table-wrap>
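<p>SpecAugment-style augmentation masks random bands of frequency channels and time frames in the spectrogram. The sketch below implements only frequency and time masking; the time-warping step of the original SpecAugment recipe is omitted. The number and maximum width of the masks, and the choice of filling masked cells with the spectrogram mean, are illustrative assumptions.</p>

```python
import numpy as np


def spec_augment(spec, n_freq_masks=2, max_freq_width=8,
                 n_time_masks=2, max_time_width=20, rng=None):
    """Frequency and time masking in the style of SpecAugment.

    spec: log mel-spectrogram of shape (n_mels, n_frames); it is not modified.
    Returns a masked copy. All counts and widths are illustrative defaults.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_mels, n_frames = out.shape
    fill = out.mean()                      # value written into masked cells
    for _ in range(n_freq_masks):
        width = rng.integers(0, max_freq_width + 1)
        start = rng.integers(0, max(1, n_mels - width + 1))
        out[start:start + width, :] = fill     # mask a band of mel channels
    for _ in range(n_time_masks):
        width = rng.integers(0, max_time_width + 1)
        start = rng.integers(0, max(1, n_frames - width + 1))
        out[:, start:start + width] = fill     # mask a block of time frames
    return out
```

<p>Masking whole bands encourages the network not to rely on any single frequency range or short time interval of a buzz.</p>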
<fig id="f2" position="float">
<label>Figure&#xa0;2</label>
<caption>
<p>Violin plots representing the performance of the best classical ML (SVM) and DL (EfficientNet V2 Small and PANNs) models combined with different pre-processing techniques (sound feature extraction, pre-training, and/or data augmentation) at recognizing bee species based on buzzing sounds recorded during visits to flowers of blueberry cultivars in southern Chile. Classifier performance was based on Macro-F1 score (MacF1); each dot represents the Macro-F1 score achieved by an independent model run (<inline-formula>
<mml:math display="inline" id="im201">
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> runs per model, 120 epochs). Note that pre-training increased the performance of the DL classifiers while reducing the scatter of F1-scores.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-14-1081050-g002.tif"/>
</fig>
<p>Accordingly, pre-training increased the performance of the CNNs at acoustic recognition of bee taxa (<xref ref-type="table" rid="T4">
<bold>Table&#xa0;4</bold>
</xref>) and reduced the variability of the F1-scores reached across model runs (see <xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2</bold>
</xref>). The average performance of both EfficientNet V2 Small and PANNs was higher with pre-training (see <xref ref-type="table" rid="T4">
<bold>Table&#xa0;4</bold>
</xref>): the Macro-F1 score of EfficientNet V2 Small ranged from <inline-formula>
<mml:math display="inline" id="im202">
<mml:mrow>
<mml:mn>14.74</mml:mn>
<mml:mo>%</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>4.14</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula>
<mml:math display="inline" id="im203">
<mml:mrow>
<mml:mn>47.55</mml:mn>
<mml:mo>%</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>9.27</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> without pre-training and from <inline-formula>
<mml:math display="inline" id="im204">
<mml:mrow>
<mml:mn>22.70</mml:mn>
<mml:mo>%</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>6.04</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula>
<mml:math display="inline" id="im205">
<mml:mrow>
<mml:mn>58.04</mml:mn>
<mml:mo>%</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.47</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> with pre-training; for PANNs, it ranged from <inline-formula>
<mml:math display="inline" id="im206">
<mml:mrow>
<mml:mn>35.95</mml:mn>
<mml:mo>%</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.40</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula>
<mml:math display="inline" id="im207">
<mml:mrow>
<mml:mn>55.00</mml:mn>
<mml:mo>%</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.81</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> without pre-training and from <inline-formula>
<mml:math display="inline" id="im208">
<mml:mrow>
<mml:mn>35.25</mml:mn>
<mml:mo>%</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>4.14</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula>
<mml:math display="inline" id="im209">
<mml:mrow>
<mml:mn>56.96</mml:mn>
<mml:mo>%</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.30</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> with pre-training.</p>
<p>Although PANNs recognized more audio samples of the majority classes, EfficientNet V2 Small was better at recognizing samples of the minority classes. EfficientNet V2 Small combined with Mixup and RT data augmentation and pre-training also correctly predicted most audio samples of the majority classes (above <inline-formula>
<mml:math display="inline" id="im210">
<mml:mrow>
<mml:mn>50</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>): <inline-formula>
<mml:math display="inline" id="im211">
<mml:mrow>
<mml:mn>61.3</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>Apis mellifera</italic>, <inline-formula>
<mml:math display="inline" id="im212">
<mml:mrow>
<mml:mn>67.3</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>Bombus dahlbomii</italic>, <inline-formula>
<mml:math display="inline" id="im213">
<mml:mrow>
<mml:mn>84.6</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>Cadeguala occidentalis</italic> and <inline-formula>
<mml:math display="inline" id="im214">
<mml:mrow>
<mml:mn>69.8</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>Bombus terrestris</italic> (see <xref ref-type="fig" rid="f3">
<bold>Figure&#xa0;3</bold>
</xref>). On the other hand, EfficientNet V2 Small still failed to recognize most audio samples of the less represented classes (below <inline-formula>
<mml:math display="inline" id="im215">
<mml:mrow>
<mml:mn>50</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>): <italic>Manuelia postica</italic> <inline-formula>
<mml:math display="inline" id="im216">
<mml:mrow>
<mml:mn>19</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, <italic>Ruizantheda mutabilis</italic> <inline-formula>
<mml:math display="inline" id="im217">
<mml:mrow>
<mml:mn>14.1</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, <italic>Ruizantheda proxima</italic> <inline-formula>
<mml:math display="inline" id="im218">
<mml:mrow>
<mml:mn>39.9</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> (see <xref ref-type="fig" rid="f3">
<bold>Figure&#xa0;3</bold>
</xref>).</p>
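<p>The per-class percentages reported above are the diagonal counts of a confusion matrix divided by the corresponding row totals, that is, per-class recall. The following minimal Python sketch illustrates this computation together with a cell-wise log transform of the counts of the kind used to color Figure&#xa0;3. The 3-class matrix is made up, rows are assumed to be true classes and columns predicted classes, and log(1 + count) is an assumed form of the transform that keeps empty cells at zero.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def per_class_recall(confusion):
    """Diagonal count divided by row total: share of each true class predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def log_counts(confusion):
    """Cell-wise log(1 + count); empty cells stay at 0 (assumed transform)."""
    return [[math.log1p(count) for count in row] for row in confusion]


# Hypothetical 3-species confusion matrix (rows: true class, columns: predicted class).
cm = [
    [80, 15, 5],   # well-represented species
    [10, 30, 10],  # intermediate species
    [6, 8, 4],     # poorly represented species
]

print([round(r, 3) for r in per_class_recall(cm)])  # [0.8, 0.6, 0.222]
print(round(log_counts(cm)[0][0], 2))               # 4.39
```
]]></preformat>
<p>A poorly represented class can have low recall even when most of its errors fall into a single majority class, which is why the per-class values are reported alongside the overall score.</p>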
<fig id="f3" position="float">
<label>Figure&#xa0;3</label>
<caption>
<p>Confusion matrices showing the log-transformed number of audio samples correctly assigned to their respective bee species (diagonal elements) versus those erroneously assigned (off-diagonal elements) by the DL classifiers. <bold>(A)</bold> EfficientNet V2 Small combined with Mixup and RT data augmentation techniques and pre-training achieved the best performance at acoustic recognition of bees visiting flowers of blueberry crops, followed by <bold>(B)</bold> PANNs combined with Mixup and SpecAugment and pre-training. Cell color represents the corresponding (log-transformed) number of audio segments predicted in a given cell, ranging from gray (zero predicted audio segments) to dark blue.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-14-1081050-g003.tif"/>
</fig>
<p>Despite the differences above, the overall performance for acoustic recognition of bee species did not vary significantly between the CNN architectures employed (EfficientNet V2 Small and PANNs). Nevertheless, EfficientNet V2 Small combined with Mixup, random truncation of the audio (RT), and pre-training scored higher than PANNs and achieved the highest Macro F1-score of <inline-formula>
<mml:math display="inline" id="im219">
<mml:mrow>
<mml:mn>58.04</mml:mn>
<mml:mo>%</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.47</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> among all CNN models and baselines tested (see <xref ref-type="table" rid="T4">
<bold>Table&#xa0;4</bold>
</xref>).</p>
</sec>
</sec>
<sec id="s4" sec-type="discussion">
<label>4</label>
<title>Discussion</title>
<p>The CNNs studied here can contribute towards automating the recognition of blueberry-pollinating bee species. These popular DL models assigned bee buzzing sounds to their respective taxa better than expected by chance. However, the CNNs depended heavily on pre-training on acoustic data and on data augmentation to outperform classical ML classifiers at recognizing bee buzzing sounds.</p>
<sec id="s4_1">
<label>4.1</label>
<title>Sound feature extraction type did not influence classical ML performance</title>
<p>Although Mel-frequency cepstral coefficients (MFCC) can often lead to worse performance than raw Mel spectral data (<xref ref-type="bibr" rid="B63">Stowell and Plumbley, 2014</xref>; <xref ref-type="bibr" rid="B69">Valletta et&#xa0;al., 2017</xref>), our results indicated no difference between MFCC and the Log Mel-Spectrogram in the performance of classical ML algorithms at assigning bee buzzing sounds to their species. MFCC is more time-consuming to compute than the Log Mel-Spectrogram, since MFCC is a manually designed summary of spectral information, whereas the Log Mel-Spectrogram is a much simpler representation of a raw spectrogram. Nevertheless, MFCC has some advantages, including a substantially dimension-reduced summary of spectral data, which suits classical ML systems since they cannot cope well with high-dimensional data (<xref ref-type="bibr" rid="B63">Stowell and Plumbley, 2014</xref>). However, dimension reduction necessarily implies a loss of information, with the consequent risk of discarding cues that a classifier could have used in later processing. Although MFCC was originally designed to represent human speech (<xref ref-type="bibr" rid="B24">Fayek, 2016</xref>; <xref ref-type="bibr" rid="B39">Logan, 2000</xref>), which differs from bee buzzing both perceptually and in the way it is produced, it can be applied to acoustic bee recognition with classical ML algorithms. Our results further indicated that the Log Mel-Spectrogram, and not only MFCC, can be a suitable sound feature extraction method for recognizing buzzing bee species.</p>
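<p>The relation between the two feature types can be illustrated with a minimal sketch. MFCCs are commonly computed by applying a discrete cosine transform (DCT) to each frame of the log Mel spectrum and keeping only the first few coefficients, which is the dimension reduction discussed above. The Python code below assumes an unnormalized DCT-II, a hypothetical 64-band log Mel frame and 13 retained coefficients; these are illustrative choices, not the settings used in this study.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_m x[m] * cos(pi / N * (m + 0.5) * k)."""
    n = len(x)
    return [
        sum(x[m] * math.cos(math.pi / n * (m + 0.5) * k) for m in range(n))
        for k in range(n)
    ]


def mfcc_from_log_mel(log_mel_frame, n_mfcc=13):
    """Keep only the first n_mfcc DCT coefficients of one log Mel frame.

    The coefficients beyond n_mfcc are discarded: this is the dimension
    reduction (and information loss) that distinguishes MFCC from the
    full Log Mel-Spectrogram.
    """
    return dct_ii(log_mel_frame)[:n_mfcc]


# Hypothetical 64-band log Mel frame (made-up values, for illustration only).
frame = [math.log(1.0 + 0.5 * math.sin(0.3 * b) ** 2 + 0.01 * b) for b in range(64)]
coeffs = mfcc_from_log_mel(frame, n_mfcc=13)
print(len(frame), "->", len(coeffs))  # 64 -> 13
```
]]></preformat>
<p>Low-order coefficients capture the coarse shape of the spectrum (for a constant frame, only the first coefficient is non-zero), while the discarded higher-order coefficients carry finer spectral detail that a CNN working on the full Log Mel-Spectrogram can still exploit.</p>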
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>CNNs were highly dependent on pre-training and data augmentation to outperform classical ML classifiers at recognizing bees&#x2019; buzzing sounds</title>
<p>To our knowledge, this is the first application of CNNs to the acoustic classification of bee species. Although Support-Vector Machines (SVMs) were the best classical ML algorithm for bee sound recognition in previous work (<xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al., 2021</xref>), our results indicated that convolutional neural networks (CNNs) can outperform them. SVMs and other classical classifiers are designed to model small variations and consequently lack time and frequency invariance (<xref ref-type="bibr" rid="B71">Van Noord and Postma, 2017</xref>), which often makes them insufficient for the high-dimensional audio data of bee buzzing sounds. For this reason, CNNs have become a primary choice in other DL applications to audio recognition (<xref ref-type="bibr" rid="B65">Takahashi et&#xa0;al., 2016</xref>). In contrast to classical ML, CNNs were designed to handle high-dimensional data such as the Log Mel-Spectrogram, a direct representation of raw audio data (<xref ref-type="bibr" rid="B63">Stowell and Plumbley, 2014</xref>; <xref ref-type="bibr" rid="B37">LeCun et&#xa0;al., 2015</xref>).</p>
<p>However, our results did not indicate that CNN models alone outperformed classical ML; this only became evident when CNNs were combined with the Log Mel-Spectrogram and data augmentation techniques. CNNs can address the limitations described above by learning filters that are shifted in both time and frequency (<xref ref-type="bibr" rid="B77">Zhang et&#xa0;al., 2015</xref>). However, this capacity also led to very fast overfitting, resulting in models excessively adapted to the training set and with reduced capacity to generalize to the validation and test sets (<xref ref-type="bibr" rid="B13">Chicco, 2017</xref>). To mitigate overfitting and improve generalization, we used spectrogram augmentation and cross-validation: spectrogram augmentation generates additional training samples, with acoustic noise, by applying random time-frequency masks to the Log Mel spectrograms. Cross-validation, a well-known technique for dealing with overfitting, was implemented for all ML classifiers here; the trained model does not overfit to a specific training subset, but instead learns from each data fold in turn (<xref ref-type="bibr" rid="B13">Chicco, 2017</xref>). Moreover, data augmentation techniques can significantly improve the performance of DL classifiers, but not of classical ML. DL models can take advantage of the iterative, epoch-based nature of their optimization, in which the same data set can be presented to the classifier in a different form at each epoch; in practical terms, the model is exposed to different data. In contrast, augmented data for classical ML algorithms are static. We therefore suppose that this is why CNN models most clearly outperformed classical ML in acoustic recognition of bee species when using Mel-spectrogram information and Mixup data augmentation. 
Even with this improvement, however, our results indicated that the performance of CNNs at recognizing buzzing bees is still unsatisfactory by ML standards (maximum Macro F1-score of <inline-formula>
<mml:math display="inline" id="im220">
<mml:mrow>
<mml:mn>58.04</mml:mn>
<mml:mo>%</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.47</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>). Hence, the CNNs tested here are not the final solution and still have room for improvement; in particular, novel attention-based neural network architectures, such as transformers and perceivers, are likely to achieve higher performance for the task of bee species identification (<xref ref-type="bibr" rid="B23">Elliott et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B73">Wolters et&#xa0;al., 2021</xref>).</p>
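<p>The time-frequency masking and the dynamic nature of augmentation during DL training can be sketched as follows. This is a simplified sketch in the spirit of SpecAugment, not the implementation used in PANNs or in this study: it assumes a spectrogram stored as a list of Mel bands, one frequency mask and one time mask per call, hypothetical maximum widths of 8 bands and 20 frames, and a fill value of 0.0. Because a new mask is drawn every time a spectrogram is served, each epoch presents a different version of the same recording, whereas augmentation computed once for a classical ML classifier stays fixed.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def mask_spectrogram(spec, max_freq_width=8, max_time_width=20, rng=random):
    """Return a copy of spec (list of Mel bands, each a list of frames) with
    one random block of Mel bands and one random span of frames set to 0.0.

    Widths are drawn uniformly from [0, max_width]; all values are
    illustrative assumptions, not the settings used in the study.
    """
    n_mels, n_frames = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # leave the original spectrogram untouched

    f = rng.randint(0, max_freq_width)   # frequency-mask width (bands)
    f0 = rng.randint(0, n_mels - f)      # first masked Mel band
    for b in range(f0, f0 + f):
        out[b] = [0.0] * n_frames

    t = rng.randint(0, max_time_width)   # time-mask width (frames)
    t0 = rng.randint(0, n_frames - t)    # first masked frame
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0
    return out


# Hypothetical 64-band x 100-frame log Mel spectrogram filled with ones.
spec = [[1.0] * 100 for _ in range(64)]
rng = random.Random(0)
for epoch in range(3):
    # A fresh mask per epoch: the network rarely sees exactly the same input twice.
    augmented = mask_spectrogram(spec, rng=rng)
    masked = sum(v == 0.0 for row in augmented for v in row)
    print(f"epoch {epoch}: {masked} masked cells")
```
]]></preformat>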
<p>It is important to highlight that the two DL classifiers tested here apply the Mixup data augmentation technique slightly differently. In the PANNs model, Mixup was applied directly to the Log Mel spectrogram representation, as described in the original work. In the EfficientNet V2 Small model, by contrast, Mixup was applied to the waveform; based on previous experiments, we found that, for this specific model, applying Mixup to the waveform provides better overall results.</p>
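<p>Mixup is the same operation whether it is applied to the waveform or to the Log Mel spectrogram; only the input differs. A minimal sketch, assuming inputs given as flat lists of floats (a waveform, or a flattened spectrogram), one-hot labels and a hypothetical Beta parameter alpha = 0.2; the hyperparameters used in this study are not reproduced here.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Convex combination of two examples and of their one-hot labels.

    lam ~ Beta(alpha, alpha); the same lam weights both the inputs and the
    labels, so the mixed label reflects how much of each example is present.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Hypothetical 4-sample waveforms of two species and 3-class one-hot labels.
wave_a, label_a = [0.1, 0.4, -0.2, 0.0], [1.0, 0.0, 0.0]
wave_b, label_b = [0.3, -0.1, 0.2, 0.5], [0.0, 0.0, 1.0]

x, y, lam = mixup(wave_a, label_a, wave_b, label_b, rng=random.Random(42))
print(round(lam, 3), [round(v, 3) for v in y])
```
]]></preformat>
<p>Calling the same function on waveforms before feature extraction, or on spectrograms after it, corresponds to the two variants described above. Because the Log Mel transform is non-linear, the two variants do not produce the same network input, which is consistent with their giving different results for different models.</p>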
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Imbalanced data bias and noise corruption</title>
<p>In general, machine learning can review large volumes of data and discover specific trends and patterns that would not be apparent to humans. To produce suitable classifications, however, ML models need large amounts of accurate and relevant data. Bioacoustic data sets, including ours, are usually imbalanced and contain background noise (<xref ref-type="bibr" rid="B53">Rodrigues, 2019</xref>). Consequently, class imbalance was the main challenge in handling our data set with ML models. In-field bee audio data collection and acoustic pre-processing require domain knowledge and were exhausting and very time-consuming: we spent <inline-formula>
<mml:math display="inline" id="im221">
<mml:mrow>
<mml:mn>990</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hours of fieldwork to record <inline-formula>
<mml:math display="inline" id="im222">
<mml:mrow>
<mml:mn>518</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> audio samples, which corresponds to an average of about 1.9 hours of fieldwork per sound file. Moreover, audio data collection was sensitive to differences in bee species richness and abundance, which limited the number of samples for rare species (see also <xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al. (2021)</xref>). This not only restricts the applicability domain of the implemented ML models but also their utility for prospective use (<xref ref-type="bibr" rid="B53">Rodrigues, 2019</xref>). A data set is imbalanced when one class is over-represented with respect to the others, causing the model to return sub-optimal solutions biased towards the majority class (<xref ref-type="bibr" rid="B13">Chicco, 2017</xref>). As a result, such classifiers tend to ignore small classes while concentrating on classifying the large ones accurately. Here, we dealt with data imbalance by employing pre-training and data augmentation (as discussed in the previous section) and by measuring classifier performance with the Macro F1-score instead of accuracy. We used the Macro F1-score because accuracy underweights classes with fewer samples relative to the larger ones. The Macro F1-score is considered a suitable metric for an imbalanced test set because it averages performance over classes rather than over samples (<xref ref-type="bibr" rid="B62">Steiniger et&#xa0;al., 2020</xref>).</p>
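<p>The difference between accuracy and the Macro F1-score on imbalanced data can be made concrete with a toy example using made-up labels, not our data: a classifier that always predicts the majority class reaches high accuracy but a low Macro F1-score, because the F1-score of every class is averaged with equal weight. This minimal plain-Python sketch treats an undefined precision or recall as 0, which is an assumption about how empty classes are scored.</p>
<preformat preformat-type="code"><![CDATA[
```python
def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly (dominated by large classes)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Undefined precision/recall (0/0) is scored as 0 (assumption).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical imbalanced test set: 90 samples of species "A", 5 of "B", 5 of "C".
y_true = ["A"] * 90 + ["B"] * 5 + ["C"] * 5
y_pred = ["A"] * 100  # a classifier that always predicts the majority class

print(accuracy(y_true, y_pred))            # 0.9
print(round(macro_f1(y_true, y_pred), 3))  # 0.316
```
]]></preformat>
<p>Here accuracy rewards ignoring the two minority species entirely, whereas their F1-scores of zero pull the macro average down to about a third of the majority-class score.</p>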
<p>However, when relating the recognition performance for each bee species to its pollination efficiency (<xref ref-type="bibr" rid="B17">Cort&#xe9;s-Rivas et&#xa0;al., 2023</xref>), we found that the most efficient pollinators were also majority classes here (e.g., <italic>B. terrestris</italic>, <italic>C. occidentalis</italic>, <italic>B. dahlbomii</italic>). In practice, this mitigates the imbalanced data bias, since the majority and most correctly recognized classes are also the most efficient pollinators. Thus, we suppose that the ML algorithms can recognize the most efficient pollinators of highbush blueberry crops in Chile well.</p>
<p>Besides imbalanced data bias, background noise was another frequent problem in our data set. Nevertheless, we decided to input the original audio recordings without noise removal or attenuation. Since noise is unavoidable in practical situations, recordings without noise removal or attenuation yield more realistic model projections. In addition, by not removing noise from the input data, we gain two advantages: <inline-formula>
<mml:math display="inline" id="im223">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> we keep more data with which to train our deep neural networks; and <inline-formula>
<mml:math display="inline" id="im224">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> we train our neural networks on noisy data, which should help them generalize to noisy test data as well.</p>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Consequences of automating bee recognition for blueberry fruit yields</title>
<p>Automating the taxonomic recognition of flower-visiting bees would be especially relevant for blueberry fruit set and size, since the quality of the pollination provided depends, among other factors, on the taxonomic identity of a flower visitor (<xref ref-type="bibr" rid="B6">Brewer and Dobson, 1969</xref>; <xref ref-type="bibr" rid="B22">Dogterom et&#xa0;al., 2000</xref>; <xref ref-type="bibr" rid="B5">Benjamin and Winfree, 2014</xref>; <xref ref-type="bibr" rid="B44">Nicholson and Ricketts, 2019</xref>). A parallel study of pollinator performance, covering most of the species analyzed here, revealed that only a subset of the flower-visiting bee species achieved high performance at pollinating blueberry cultivars, while others were poor pollinators or even considered floral resource thieves (<xref ref-type="bibr" rid="B17">Cort&#xe9;s-Rivas et&#xa0;al., 2023</xref>). Therefore, automating the acoustic recognition of bee species, especially distinguishing pollinators from nectar and pollen thieves, could result in a comprehensive and powerful tool for agricultural decision-making. Farmers could recognize the best pollinators without needing an expert in insect taxonomy. Aware of the value of bees to crop income, farmers could be encouraged to consider pollination in their crop management, which could favor the conservation of local wild bee species and thereby contribute to more sustainable and higher-yield agriculture.</p>
<p>In summary, we compared the performance of CNN models at recognizing blueberry-pollinating bees with the current state-of-the-art models for automatic bee recognition. We found advantages for CNN classifiers over the classical ML algorithms used previously (<xref ref-type="bibr" rid="B52">Ribeiro et&#xa0;al., 2021</xref>) in recognizing bee species from their buzzing sounds. CNN algorithms powered by a combination of transforming sound events into Mel-spectrogram images and strong data augmentation outperformed classical ML algorithms and could lead to automating the taxonomic recognition of flower-visiting bees in blueberry crops. As far as we know, the use of DL classifiers for identifying bee taxa from their buzzing sounds has not been reported previously. However, there is still room to improve the performance of DL models. Further studies focusing on recording samples of poorly represented classes, and/or applying algorithms that can perform more complex processing tasks, such as unsupervised learning systems, could help to achieve better classification results.</p>
</sec>
</sec>
<sec id="s5" sec-type="data-availability">
<title>Data availability statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s6" sec-type="author-contributions">
<title>Author contributions</title>
<p>NS, FM, TR, and JM-N contributed to conception and design of the study. VM and JM-N carried out the experiment. AF organized the database. AF performed the data and statistical analysis. FM and JM-N wrote the first draft of the manuscript. TR, AF, and NS wrote sections of the manuscript. All authors contributed to the article and approved the submitted version.</p>
</sec>
</body>
<back>
<sec id="s7" sec-type="funding-information">
<title>Funding</title>
<p>This work was supported by the ANID/Fondecyt Iniciaci&#xf3;n en Investigaci&#xf3;n under grant No. 11190013, FIC GORE Maule under grant No. BIP- 40.019.177&#x2013;0, and ANID/CONICYT FONDECYT Regular under grant No. 1201893.</p>
</sec>
<ack>
<title>Acknowledgments</title>
<p>We thank the staff of Agr&#xed;cola Aguas Negras S.A. and Shine Liucura, especially Claudio Troncoso and Andr&#xe9; Didier, for assistance during fieldwork and the two reviewers for their constructive comments that improved the manuscript.</p>
</ack>
<sec id="s8" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s9" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="fn1">
<label>1</label>
<p>Acronym for Floating-point Operations per Second.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abe&#xdf;er</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A review of deep learning based methods for acoustic scene classification</article-title>. <source>Appl. Sci.</source> <volume>10</volume>, <page-range>1&#x2013;16</page-range>.</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Acevedo</surname> <given-names>M. A.</given-names>
</name>
<name>
<surname>Corrada-Bravo</surname> <given-names>C. J.</given-names>
</name>
<name>
<surname>Corrada-Bravo</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Villanueva-Rivera</surname> <given-names>L. J.</given-names>
</name>
<name>
<surname>Aide</surname> <given-names>T. M.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Automated classification of bird and amphibian calls using machine learning: A comparison of methods</article-title>. <source>Ecol. Inf.</source> <volume>4</volume>, <fpage>206</fpage>&#x2013;<lpage>214</lpage>.</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alpayd&#x131;n</surname> <given-names>E.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>Combined 5&#xd7;2 cv F test for comparing supervised classification learning algorithms</article-title>. <source>Neural Comput.</source> <volume>11</volume>, <fpage>1885</fpage>&#x2013;<lpage>1892</lpage>.</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arruda</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Imperatriz-Fonseca</surname> <given-names>V.</given-names>
</name>
<name>
<surname>de Souza</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Pessin</surname> <given-names>G.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Identifying bee species by means of the foraging pattern using machine learning</article-title>. <source>In 2018 Int. Joint Conf. Neural Networks (IJCNN).</source> <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/IJCNN.2018.8489608</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Benjamin</surname> <given-names>F. E.</given-names>
</name>
<name>
<surname>Winfree</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Lack of pollinators limits fruit production in commercial blueberry (<italic>Vaccinium corymbosum</italic>)</article-title>. <source>Environ. Entomol.</source> <volume>43</volume>, <fpage>1574</fpage>&#x2013;<lpage>1583</lpage>.</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brewer</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Dobson</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>1969</year>). <article-title>Seed count and berry size in relation to pollinator level and harvest date for the highbush blueberry, <italic>Vaccinium corymbosum</italic></article-title>. <source>J. Econ. Entomol.</source> <volume>62</volume>, <fpage>1353</fpage>&#x2013;<lpage>1356</lpage>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Briggs</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Fern</surname> <given-names>X. Z.</given-names>
</name>
<name>
<surname>Raich</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Lou</surname> <given-names>Q.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Instance annotation for multi-instance multi-label learning</article-title>. <source>ACM Trans. Knowledge Discovery Data (TKDD)</source> <volume>7</volume>, <fpage>1</fpage>&#x2013;<lpage>30</lpage>.</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buchmann</surname> <given-names>S. L.</given-names>
</name>
</person-group> (<year>1983</year>). <article-title>Buzz pollination in angiosperms</article-title>. <source>Buzz Pollination Angiosperms.</source> <volume>28</volume>, <fpage>73</fpage>&#x2013;<lpage>113</lpage>.</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Burkart</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Lunau</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Schlindwein</surname> <given-names>C.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Comparative bioacoustical studies on flight and buzzing of neotropical bees</article-title>. <source>J. Pollination Ecol.</source> <volume>6</volume>, <fpage>491</fpage>&#x2013;<lpage>596</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cane</surname> <given-names>J. H.</given-names>
</name>
<name>
<surname>Eickwort</surname> <given-names>G. C.</given-names>
</name>
<name>
<surname>Wesley</surname> <given-names>F. R.</given-names>
</name>
<name>
<surname>Spielholz</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>1985</year>). <article-title>Pollination ecology of <italic>Vaccinium stamineum</italic> (Ericaceae: Vaccinioideae)</article-title>. <source>Am. J. Bot.</source> <volume>72</volume>, <fpage>135</fpage>&#x2013;<lpage>142</lpage>.</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cardinal</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Buchmann</surname> <given-names>S. L.</given-names>
</name>
<name>
<surname>Russell</surname> <given-names>A. L.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>The evolution of floral sonication, a pollen foraging behavior used by bees (Anthophila)</article-title>. <source>Evolution</source> <volume>72</volume>, <fpage>590</fpage>&#x2013;<lpage>600</lpage>.</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cejrowski</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Szyma&#x144;ski</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Logof&#x103;tu</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Buzz-based recognition of the honeybee colony circadian rhythm</article-title>. <source>Comput. Electron. Agric.</source> <volume>175</volume>, <fpage>505</fpage>&#x2013;<lpage>486</lpage>.</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chicco</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Ten quick tips for machine learning in computational biology</article-title>. <source>BioData Min.</source> <volume>10</volume>, <fpage>1</fpage>&#x2013;<lpage>17</lpage>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chlap</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Min</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Vandenberg</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Dowling</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Holloway</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Haworth</surname> <given-names>A.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>A review of medical image data augmentation techniques for deep learning applications</article-title>. <source>J. Med. Imaging Radiat. Oncol.</source> <volume>65</volume>, <fpage>545</fpage>&#x2013;<lpage>563</lpage>.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cooley</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Vallejo-Mar&#xed;n</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Buzz-pollinated crops: A global review and meta-analysis of the effects of supplemental bee pollination in tomato</article-title>. <source>J. Econ. Entomol.</source> <volume>14</volume>, <fpage>179</fpage>&#x2013;<lpage>213</lpage>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Corbet</surname> <given-names>S. A.</given-names>
</name>
<name>
<surname>Huang</surname> <given-names>S.-Q.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Buzz pollination in eight bumblebee-pollinated <italic>Pedicularis</italic> species: does it involve vibration-induced triboelectric charging of pollen grains?</article-title> <source>Ann. Bot.</source> <volume>114</volume>, <fpage>1665</fpage>&#x2013;<lpage>1674</lpage>.</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cort&#xe9;s-Rivas</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Smith-Ramirez</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Monz&#xf3;n</surname> <given-names>V. H.</given-names>
</name>
<name>
<surname>Mesquita-Neto</surname> <given-names>J. N.</given-names>
</name>
</person-group> (<year>2023</year>). <article-title>Native bee species with buzz-behavior can achieve high-performance pollination of highbush blueberry crops of Chile</article-title>. <source>Agric. For. Entomol.</source> <volume>25</volume>, <fpage>91</fpage>&#x2013;<lpage>102</lpage>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Luca</surname> <given-names>P. A.</given-names>
</name>
<name>
<surname>Bussiere</surname> <given-names>L. F.</given-names>
</name>
<name>
<surname>Souto-Vilaros</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Goulson</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Mason</surname> <given-names>A. C.</given-names>
</name>
<name>
<surname>Vallejo-Mar&#xed;n</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Variability in bumblebee pollination buzzes affects the quantity of pollen released from flowers</article-title>. <source>Oecologia</source> <volume>172</volume>, <fpage>805</fpage>&#x2013;<lpage>816</lpage>.</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Luca</surname> <given-names>P. A.</given-names>
</name>
<name>
<surname>Vallejo-Mar&#xed;n</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>What&#x2019;s the &#x2018;buzz&#x2019; about? The ecology and evolutionary significance of buzz-pollination</article-title>. <source>Curr. Opin. Plant Biol.</source> <volume>16</volume>, <fpage>429</fpage>&#x2013;<lpage>435</lpage>.</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Deng</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Dong</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Socher</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>L.-J.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Fei-Fei</surname> <given-names>L.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>ImageNet: A large-scale hierarchical image database</article-title>. <source>In 2009 IEEE Conf. Comput. Vision Pattern Recognit.</source> <fpage>248</fpage>&#x2013;<lpage>255</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/CVPR.2009.5206848</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dietterich</surname> <given-names>T. G.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Approximate statistical tests for comparing supervised classification learning algorithms</article-title>. <source>Neural Comput.</source> <volume>10</volume>, <fpage>1895</fpage>&#x2013;<lpage>1923</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1162/089976698300017197</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dogterom</surname> <given-names>M. H.</given-names>
</name>
<name>
<surname>Winston</surname> <given-names>M. L.</given-names>
</name>
<name>
<surname>Mukai</surname> <given-names>A.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Effect of pollen load size and source (self, outcross) on seed and fruit production in highbush blueberry cv. &#x2018;Bluecrop&#x2019; (<italic>Vaccinium corymbosum</italic>; Ericaceae)</article-title>. <source>Am. J. Bot.</source> <volume>87</volume>, <fpage>1584</fpage>&#x2013;<lpage>1591</lpage>.</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elliott</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Otero</surname> <given-names>C. E.</given-names>
</name>
<name>
<surname>Wyatt</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Martino</surname> <given-names>E.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Tiny transformers for environmental sound classification at the edge</article-title>. <source>arXiv preprint</source>. <page-range>1&#x2013;12</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.2103.12157</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fayek</surname> <given-names>H. M.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Speech processing for machine learning: Filter banks, mel-frequency cepstral coefficients (MFCCs) and what&#x2019;s in-between</article-title>. Available at: <uri xlink:href="https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html">https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html</uri> (accessed on 6 August 2022).</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gaston</surname> <given-names>K. J.</given-names>
</name>
<name>
<surname>O&#x2019;Neill</surname> <given-names>M. A.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Automated species identification: why not?</article-title> <source>Philos. Trans. R. Soc. Lond. Ser. B: Biol. Sci.</source> <volume>359</volume>, <fpage>655</fpage>&#x2013;<lpage>667</lpage>.</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gemmeke</surname> <given-names>J. F.</given-names>
</name>
<name>
<surname>Ellis</surname> <given-names>D. P. W.</given-names>
</name>
<name>
<surname>Freedman</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Jansen</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Lawrence</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Moore</surname> <given-names>R. C.</given-names>
</name>
<etal/>
</person-group>. (<year>2017</year>). <article-title>Audio Set: An ontology and human-labeled dataset for audio events</article-title>. <source>In 2017 IEEE Int. Conf. Acoustics Speech Signal Process. (ICASSP).</source> <fpage>776</fpage>&#x2013;<lpage>780</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/ICASSP.2017.7952261</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gong</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Chung</surname> <given-names>Y.-A.</given-names>
</name>
<name>
<surname>Glass</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>PSLA: Improving audio tagging with pretraining, sampling, labeling, and aggregation</article-title>. <source>IEEE/ACM Trans. Audio Speech Lang. Process.</source> <volume>29</volume>, <fpage>3292</fpage>&#x2013;<lpage>3306</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/TASLP.2021.3120633</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gradi&#x161;ek</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Slapni&#x10d;ar</surname> <given-names>G.</given-names>
</name>
<name>
<surname>&#x160;orn</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Lu&#x161;trek</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Gams</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Grad</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Predicting species identity of bumblebees through analysis of flight buzzing sounds</article-title>. <source>Bioacoustics</source> <volume>26</volume>, <fpage>63</fpage>&#x2013;<lpage>76</lpage>.</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gwardys</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Grzywczak</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Deep image features in music information retrieval</article-title>. <source>Int. J. Electron. Telecommun.</source> <volume>60</volume>, <fpage>321</fpage>&#x2013;<lpage>326</lpage>.</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hershey</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Chaudhuri</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Ellis</surname> <given-names>D. P.</given-names>
</name>
<name>
<surname>Gemmeke</surname> <given-names>J. F.</given-names>
</name>
<name>
<surname>Jansen</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Moore</surname> <given-names>R. C.</given-names>
</name>
<etal/>
</person-group>. (<year>2017</year>). <article-title>CNN architectures for large-scale audio classification</article-title>. <source>In 2017 IEEE Int. Conf. Acoustics Speech Signal Process. (ICASSP) (IEEE)</source>, <fpage>131</fpage>&#x2013;<lpage>135</lpage>.</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Javorek</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Mackenzie</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Vander Kloet</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Comparative pollination effectiveness among bees (Hymenoptera: Apoidea) on lowbush blueberry (Ericaceae: <italic>Vaccinium angustifolium</italic>)</article-title>. <source>Ann. Entomol. Soc. Am.</source> <volume>95</volume>, <fpage>345</fpage>&#x2013;<lpage>351</lpage>.</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jinbo</surname> <given-names>U.</given-names>
</name>
<name>
<surname>Kato</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Ito</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Current progress in DNA barcoding and future implications for entomology</article-title>. <source>Entomol. Sci.</source> <volume>14</volume>, <fpage>107</fpage>&#x2013;<lpage>124</lpage>.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kandori</surname> <given-names>I.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Diverse visitors with various pollinator importance and temporal change in the important pollinators of <italic>Geranium thunbergii</italic> (Geraniaceae)</article-title>. <source>Ecol. Res.</source> <volume>17</volume>, <fpage>283</fpage>&#x2013;<lpage>294</lpage>.</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kawakita</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Ichikawa</surname> <given-names>K.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Automated classification of bees and hornet using acoustic analysis of their flight sounds</article-title>. <source>Apidologie</source> <volume>50</volume>, <fpage>71</fpage>&#x2013;<lpage>79</lpage>.</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kong</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Cao</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Iqbal</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Plumbley</surname> <given-names>M. D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>PANNs: Large-scale pretrained audio neural networks for audio pattern recognition</article-title>. <source>IEEE/ACM Trans. Audio Speech Lang. Process.</source> <volume>28</volume>, <fpage>2880</fpage>&#x2013;<lpage>2894</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/TASLP.2020.3030497</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kuncheva</surname> <given-names>L. I.</given-names>
</name>
</person-group> (<year>2004</year>). <source>Combining pattern classifiers: Methods and algorithms</source> (<publisher-loc>USA</publisher-loc>: <publisher-name>Wiley-Interscience</publisher-name>).</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>LeCun</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Bengio</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Hinton</surname> <given-names>G.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Deep learning</article-title>. <source>Nature</source> <volume>521</volume>, <fpage>436</fpage>&#x2013;<lpage>444</lpage>.</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname> <given-names>O. T.</given-names>
</name>
<name>
<surname>Basset</surname> <given-names>Y.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Insect conservation in tropical forests</article-title>. <source>Insect Conserv. Biol.</source> <volume>456</volume>, <fpage>34</fpage>&#x2013;<lpage>56</lpage>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Logan</surname> <given-names>B.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Mel frequency cepstral coefficients for music modeling</article-title>. <source>In ISMIR (Citeseer)</source> <volume>270</volume>, <fpage>1</fpage>&#x2013;<lpage>11</lpage>.</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lorenz</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Almeida</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Almeida-Lopes</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Louise</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Pereira</surname> <given-names>S. N.</given-names>
</name>
<name>
<surname>Petersen</surname> <given-names>V.</given-names>
</name>
<etal/>
</person-group>. (<year>2017</year>). <article-title>Geometric morphometrics in mosquitoes: What has been measured?</article-title> <source>Infection Genet. Evol.</source> <volume>54</volume>, <fpage>205</fpage>&#x2013;<lpage>215</lpage>.</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martineau</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Conte</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Raveaux</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Arnault</surname> <given-names>I.</given-names>
</name>
<name>
<surname>Munier</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Venturini</surname> <given-names>G.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A survey on image-based insect classification</article-title>. <source>Pattern Recognition</source> <volume>65</volume>, <fpage>273</fpage>&#x2013;<lpage>284</lpage>.</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>M&#xfc;ller</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Ritz</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Illium</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Linnhoff-Popien</surname> <given-names>C.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Acoustic anomaly detection for machine sounds based on image transfer learning</article-title>. <source>CoRR</source>. <page-range>1&#x2013;8</page-range>.</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nanni</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Maguolo</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Paci</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Data augmentation approaches for improving animal audio classification</article-title>. <source>Ecol. Inf.</source> <volume>57</volume>, <fpage>101084</fpage>.</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nicholson</surname> <given-names>C. C.</given-names>
</name>
<name>
<surname>Ricketts</surname> <given-names>T. H.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Wild pollinators improve production, uniformity, and timing of blueberry crops</article-title>. <source>Agriculture Ecosyst. Environ.</source> <volume>272</volume>, <fpage>29</fpage>&#x2013;<lpage>37</lpage>.</citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nolasco</surname> <given-names>I.</given-names>
</name>
<name>
<surname>Terenzi</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Cecchi</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Orcioni</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Bear</surname> <given-names>H. L.</given-names>
</name>
<name>
<surname>Benetos</surname> <given-names>E.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Audio-based identification of beehive states</article-title>. <source>CoRR</source>. <page-range>1&#x2013;5</page-range>.</citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nunes-Silva</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Hrncir</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Shipp</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Imperatriz-Fonseca</surname> <given-names>V. L.</given-names>
</name>
<name>
<surname>Kevan</surname> <given-names>P. G.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>The behaviour of bombus impatiens (apidae, bombini) on tomato (lycopersicon esculentum mill., solanaceae) flowers: pollination and reward perception</article-title>. <source>J. Pollination Ecol.</source> <volume>11</volume>, <fpage>33</fpage>&#x2013;<lpage>40</lpage>.</citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Orr</surname> <given-names>M. C.</given-names>
</name>
<name>
<surname>Hughes</surname> <given-names>A. C.</given-names>
</name>
<name>
<surname>Chesters</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Pickering</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Zhu</surname> <given-names>C.-D.</given-names>
</name>
<name>
<surname>Ascher</surname> <given-names>J. S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Global patterns and drivers of bee distribution</article-title>. <source>Curr. Biol.</source> <volume>50</volume>, <fpage>53</fpage>&#x2013;<lpage>78</lpage>.</citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname> <given-names>D. S.</given-names>
</name>
<name>
<surname>Chan</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Chiu</surname> <given-names>C.-C.</given-names>
</name>
<name>
<surname>Zoph</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Cubuk</surname> <given-names>E. D.</given-names>
</name>
<etal/>
</person-group>. (<year>2019</year>a). <article-title>SpecAugment: A simple data augmentation method for automatic speech recognition</article-title>. <source>Interspeech</source>. <page-range>2613&#x2013;2617</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.21437/interspeech.2019-2680</pub-id>
</citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Palanisamy</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Singhania</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Yao</surname> <given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Rethinking CNN models for audio classification</article-title>. <source>arXiv preprint</source>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/ARXIV.2007.11154</pub-id>
</citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname> <given-names>D. S.</given-names>
</name>
<name>
<surname>Chan</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Chiu</surname> <given-names>C.-C.</given-names>
</name>
<name>
<surname>Zoph</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Cubuk</surname> <given-names>E. D.</given-names>
</name>
<etal/>
</person-group>. (<year>2019</year>b). <article-title>SpecAugment: A simple data augmentation method for automatic speech recognition</article-title>. <source>arXiv preprint</source>. <page-range>1&#x2013;6</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.1904.08779</pub-id>
</citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rebelo</surname> <given-names>A. R.</given-names>
</name>
<name>
<surname>Fagundes</surname> <given-names>J. M.</given-names>
</name>
<name>
<surname>Digiampietri</surname> <given-names>L. A.</given-names>
</name>
<name>
<surname>Francoy</surname> <given-names>T. M.</given-names>
</name>
<name>
<surname>B&#xed;scaro</surname> <given-names>H. H.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>A fully automatic classification of bee species from wing images</article-title>. <source>Apidologie</source> <page-range>1060&#x2013;1074</page-range>.</citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ribeiro</surname> <given-names>A. P.</given-names>
</name>
<name>
<surname>da Silva</surname> <given-names>N. F. F.</given-names>
</name>
<name>
<surname>Mesquita</surname> <given-names>F. N.</given-names>
</name>
<name>
<surname>Ara&#xfa;jo</surname> <given-names>P. d. C. S.</given-names>
</name>
<name>
<surname>Rosa</surname> <given-names>T. C.</given-names>
</name>
<etal/>
</person-group>. (<year>2021</year>). <article-title>Machine learning approach for automatic recognition of tomato-pollinating bees based on their buzzing-sounds</article-title>. <source>PLoS Comput. Biol.</source> <volume>17</volume>, <elocation-id>e1009426</elocation-id>.</citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodrigues</surname> <given-names>T.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>The good, the bad, and the ugly in chemical and biological data for machine learning</article-title>. <source>Drug Discovery Today: Technol.</source> <volume>32</volume>, <fpage>3</fpage>&#x2013;<lpage>8</lpage>.</citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rosi-Denadai</surname> <given-names>C. A.</given-names>
</name>
<name>
<surname>Ara&#xfa;jo</surname> <given-names>P. C. S.</given-names>
</name>
<name>
<surname>Campos</surname> <given-names>L. A. d. O.</given-names>
</name>
<name>
<surname>Cosme</surname> <given-names>L.</given-names>
<suffix>Jr.</suffix>
</name>
<name>
<surname>Guedes</surname> <given-names>R. N. C.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Buzz-pollination in neotropical bees: genus-dependent frequencies and lack of optimal frequency for pollen release</article-title>. <source>Insect Sci.</source> <volume>27</volume>, <fpage>133</fpage>&#x2013;<lpage>142</lpage>.</citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Russell</surname> <given-names>A. L.</given-names>
</name>
<name>
<surname>Buchmann</surname> <given-names>S. L.</given-names>
</name>
<name>
<surname>Papaj</surname> <given-names>D. R.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>How a generalist bee achieves high efficiency of pollen collection on diverse floral resources</article-title>. <source>Behav. Ecol.</source> <volume>28</volume>, <fpage>991</fpage>&#x2013;<lpage>1003</lpage>.</citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sagi</surname> <given-names>O.</given-names>
</name>
<name>
<surname>Rokach</surname> <given-names>L.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Ensemble learning: A survey</article-title>. <source>Wiley Interdiscip. Reviews: Data Min. Knowledge Discovery</source> <volume>8</volume>, <elocation-id>e1249</elocation-id>.</citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Santana</surname> <given-names>F. S.</given-names>
</name>
<name>
<surname>Costa</surname> <given-names>A. H. R.</given-names>
</name>
<name>
<surname>Truzzi</surname> <given-names>F. S.</given-names>
</name>
<name>
<surname>Silva</surname> <given-names>F. L.</given-names>
</name>
<name>
<surname>Santos</surname> <given-names>S. L.</given-names>
</name>
<name>
<surname>Francoy</surname> <given-names>T. M.</given-names>
</name>
<etal/>
</person-group>. (<year>2014</year>). <article-title>A reference process for automating bee species identification based on wing images and digital image processing</article-title>. <source>Ecol. Inf.</source> <volume>24</volume>, <fpage>248</fpage>&#x2013;<lpage>260</lpage>.</citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Santos</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Bartelli</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Nogueira-Ferreira</surname> <given-names>F.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Potential pollinators of tomato, <italic>Lycopersicon esculentum</italic> (Solanaceae), in open crops and the effect of a solitary bee in fruit set and quality</article-title>. <source>J. Econ. Entomol.</source> <volume>107</volume>, <fpage>987</fpage>&#x2013;<lpage>994</lpage>.</citation>
</ref>
<ref id="B59">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schemske</surname> <given-names>D. W.</given-names>
</name>
<name>
<surname>Horvitz</surname> <given-names>C. C.</given-names>
</name>
</person-group> (<year>1984</year>). <article-title>Variation among floral visitors in pollination ability: a precondition for mutualism specialization</article-title>. <source>Science</source> <volume>225</volume>, <fpage>519</fpage>&#x2013;<lpage>521</lpage>.</citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schroder</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>The new key to bees: automated identification by image analysis of wings</article-title>. <source>In Pollinating bees: the conservation link between agriculture and nature (Brasilia: Ministry of Environment)</source> <volume>94</volume>, <fpage>691</fpage>&#x2013;<lpage>596</lpage>.</citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Silva-Neto</surname> <given-names>C. d. M.</given-names>
</name>
<name>
<surname>Bergamini</surname> <given-names>L. L.</given-names>
</name>
<name>
<surname>d. S.</surname> <given-names>M. A.</given-names>
</name>
<name>
<surname>Moreira</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Morais</surname> <given-names>J.</given-names>
</name>
<etal/>
</person-group>. (<year>2017</year>). <article-title>High species richness of native pollinators in Brazilian tomato crops</article-title>. <source>Braz. J. Biol.</source> <volume>77</volume>, <fpage>506</fpage>&#x2013;<lpage>513</lpage>.</citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Steiniger</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Stoppe</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Meisen</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Kraus</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Dealing with highly unbalanced sidescan sonar image datasets for deep learning classification tasks</article-title>. <source>In Global Oceans 2020: Singapore&#x2013;US Gulf Coast. (IEEE)</source>, <fpage>1</fpage>&#x2013;<lpage>7</lpage>.</citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stowell</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Plumbley</surname> <given-names>M. D.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning</article-title>. <source>PeerJ</source> <volume>2</volume>, <elocation-id>e488</elocation-id>.</citation>
</ref>
<ref id="B64">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stowell</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Wood</surname> <given-names>M. D.</given-names>
</name>
<name>
<surname>Pamu&#x142;a</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Stylianou</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Glotin</surname> <given-names>H.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Automatic acoustic detection of birds through deep learning: the first bird audio detection challenge</article-title>. <source>Methods Ecol. Evol.</source> <volume>10</volume>, <fpage>368</fpage>&#x2013;<lpage>380</lpage>.</citation>
</ref>
<ref id="B65">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Takahashi</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Gygli</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Pfister</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Van Gool</surname> <given-names>L.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Deep convolutional neural networks and data augmentation for acoustic event detection</article-title>. <source>arXiv preprint</source>. <page-range>1&#x2013;5</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.1604.07160</pub-id>
</citation>
</ref>
<ref id="B66">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tan</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Le</surname> <given-names>Q. V.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>EfficientNet: Rethinking model scaling for convolutional neural networks</article-title>. <source>arXiv preprint</source>. <page-range>1&#x2013;10</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.1905.11946</pub-id>
</citation>
</ref>
<ref id="B67">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Terenzi</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Cecchi</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Orcioni</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Piazza</surname> <given-names>F.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Features extraction applied to the analysis of the sounds emitted by honey bees in a beehive</article-title>. <source>In 2019 11th Int. Symposium Image Signal Process. Anal. (ISPA)</source>, <fpage>3</fpage>&#x2013;<lpage>8</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/ISPA.2019.8868934</pub-id>
</citation>
</ref>
<ref id="B68">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Toni</surname> <given-names>H. C.</given-names>
</name>
<name>
<surname>Djossa</surname> <given-names>B. A.</given-names>
</name>
<name>
<surname>Ayenan</surname> <given-names>M. A. T.</given-names>
</name>
<name>
<surname>Teka</surname> <given-names>O.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Tomato (Solanum lycopersicum) pollinators and their effect on fruit set and quality</article-title>. <source>J. Hortic. Sci. Biotechnol.</source> <volume>96</volume>, <fpage>1</fpage>&#x2013;<lpage>13</lpage>.</citation>
</ref>
<ref id="B69">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Valletta</surname> <given-names>J. J.</given-names>
</name>
<name>
<surname>Torney</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Kings</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Thornton</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Madden</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Applications of machine learning in animal behaviour studies</article-title>. <source>Anim. Behav.</source> <volume>124</volume>, <fpage>203</fpage>&#x2013;<lpage>220</lpage>.</citation>
</ref>
<ref id="B70">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Valliammal</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Geethalakshmi</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Automatic recognition system using preferential image segmentation for leaf and flower images</article-title>. <source>Comput. Sci. Eng.</source> <volume>1</volume>, <fpage>13</fpage>.</citation>
</ref>
<ref id="B71">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Van Noord</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Postma</surname> <given-names>E.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Learning scale-variant and scale-invariant features for deep image classification</article-title>. <source>Pattern Recognit.</source> <volume>61</volume>, <fpage>583</fpage>&#x2013;<lpage>592</lpage>.</citation>
</ref>
<ref id="B72">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vin&#xed;cius-Silva</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Parma</surname> <given-names>D. d. F.</given-names>
</name>
<name>
<surname>Tostes</surname> <given-names>R. B.</given-names>
</name>
<name>
<surname>Arruda</surname> <given-names>V. M.</given-names>
</name>
<name>
<surname>Werneck</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Importance of bees in pollination of Solanum lycopersicum L. (Solanaceae) in open-field of the southeast of Minas Gerais State, Brazil</article-title>. <source>Hoehnea</source> <volume>44</volume>, <fpage>349</fpage>&#x2013;<lpage>360</lpage>.</citation>
</ref>
<ref id="B73">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wolters</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Daw</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Hutchinson</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Phillips</surname> <given-names>L.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Proposal-based few-shot sound event detection for speech and environmental sounds with perceivers</article-title>. <source>arXiv preprint</source>. <page-range>1&#x2013;7</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.2107.13616</pub-id>
</citation>
</ref>
<ref id="B74">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Hu</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Zhu</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Yu</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Zhu</surname> <given-names>Q.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Investigation of different CNN-based models for improved bird sound classification</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>175353</fpage>&#x2013;<lpage>175361</lpage>.</citation>
</ref>
<ref id="B75">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yanikoglu</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Aptoula</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Tirkaz</surname> <given-names>C.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Automatic plant identification from photographs</article-title>. <source>Mach. Vision Appl.</source> <volume>25</volume>, <fpage>1369</fpage>&#x2013;<lpage>1383</lpage>.</citation>
</ref>
<ref id="B76">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Cisse</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Dauphin</surname> <given-names>Y. N.</given-names>
</name>
<name>
<surname>Lopez-Paz</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Mixup: Beyond empirical risk minimization</article-title>. <source>arXiv preprint</source>. <page-range>1&#x2013;13</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.1710.09412</pub-id>
</citation>
</ref>
<ref id="B77">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname> <given-names>H.</given-names>
</name>
<name>
<surname>McLoughlin</surname> <given-names>I.</given-names>
</name>
<name>
<surname>Song</surname> <given-names>Y.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Robust sound event recognition using convolutional neural networks</article-title>. <source>In 2015 IEEE Int. Conf. Acoustics Speech Signal Process. (ICASSP) (IEEE)</source>, <fpage>559</fpage>&#x2013;<lpage>563</lpage>.</citation>
</ref>
<ref id="B78">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhong</surname> <given-names>M.</given-names>
</name>
<name>
<surname>LeBien</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Campos-Cerqueira</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Dodhia</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Lavista Ferres</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Velev</surname> <given-names>J. P.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Multispecies bioacoustic classification using transfer learning of deep convolutional neural networks with pseudo-labeling</article-title>. <source>Appl. Acoustics</source> <volume>166</volume>, <elocation-id>107375</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.apacoust.2020.107375</pub-id>
</citation>
</ref>
<ref id="B79">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zor</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Awais</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Kittler</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Bober</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Husain</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Kong</surname> <given-names>Q.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Divergence based weighting for information channels in deep convolutional neural networks for bird audio detection</article-title>. <source>In ICASSP 2019-2019 IEEE Int. Conf. Acoustics Speech Signal Process. (ICASSP) (IEEE)</source>, <fpage>3052</fpage>&#x2013;<lpage>3056</lpage>.</citation>
</ref>
</ref-list>
</back>
</article>