<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Plant Sci.</journal-id>
<journal-title>Frontiers in Plant Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Plant Sci.</abbrev-journal-title>
<issn pub-type="epub">1664-462X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpls.2022.844522</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Plant Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A Segmentation-Guided Deep Learning Framework for Leaf Counting</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Fan</surname> <given-names>Xijian</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1604843/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhou</surname> <given-names>Rui</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1604874/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Tjahjadi</surname> <given-names>Tardi</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Das Choudhury</surname> <given-names>Sruti</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/641888/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Ye</surname> <given-names>Qiaolin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>College of Information Science and Technology, Nanjing Forestry University</institution>, <addr-line>Nanjing</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>School of Engineering, University of Warwick</institution>, <addr-line>Coventry</addr-line>, <country>United Kingdom</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Biological Systems Engineering, University of Nebraska-Lincoln</institution>, <addr-line>Lincoln, NE</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Yu Xue, Nanjing University of Information Science and Technology, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Marcin Wozniak, Silesian University of Technology, Poland; Li Chaorong, Yibin University, China</p></fn>
<corresp id="c001">&#x002A;Correspondence: Xijian Fan, <email>xijian.fan@njfu.edu.cn</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Sustainable and Intelligent Phytoprotection, a section of the journal Frontiers in Plant Science</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>19</day>
<month>05</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>844522</elocation-id>
<history>
<date date-type="received">
<day>11</day>
<month>01</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>04</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2022 Fan, Zhou, Tjahjadi, Das Choudhury and Ye.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Fan, Zhou, Tjahjadi, Das Choudhury and Ye</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Deep learning-based methods have recently provided a means to rapidly and effectively extract various plant traits due to their powerful ability to depict a plant image across a variety of species and growth conditions. In this study, we focus on two fundamental tasks in plant phenotyping, i.e., plant segmentation and leaf counting, and propose a two-stream deep learning framework for segmenting plants and counting leaves of various sizes and shapes from two-dimensional plant images. In the first stream, a multi-scale segmentation model using a spatial pyramid is developed to extract leaves of different sizes and shapes, where the fine-grained details of leaves are captured using a deep feature extractor. In the second stream, a regression counting model is proposed to estimate the number of leaves without any pre-detection, where an auxiliary binary mask from the segmentation stream is introduced to enhance the counting performance by effectively alleviating the influence of the complex background. Extensive pot experiments are conducted on the CVPPP 2017 Leaf Counting Challenge dataset, which contains images of Arabidopsis and tobacco plants. The experimental results demonstrate that the proposed framework achieves promising performance in both plant segmentation and leaf counting, providing a reference for the automatic analysis of plant phenotypes.</p>
</abstract>
<kwd-group>
<kwd>plant phenotyping</kwd>
<kwd>segmentation</kwd>
<kwd>deep CNN architecture</kwd>
<kwd>leaf counting</kwd>
<kwd>multiple traits</kwd>
</kwd-group>
<counts>
<fig-count count="8"/>
<table-count count="5"/>
<equation-count count="14"/>
<ref-count count="42"/>
<page-count count="13"/>
<word-count count="9283"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>Introduction</title>
<p>Plant phenotype is the set of observable traits of a plant, which is heavily influenced by the interaction between plant gene expression and environmental factors (<xref ref-type="bibr" rid="B34">Siebner et al., 2009</xref>). The accurate and efficient monitoring of phenotypes is essential for plant cultivation and a prerequisite for intelligent production and planting, and information/data management. The traditional monitoring of plant phenotype mainly relies on manual observation and measurement to analyse the appearance of plants in terms of their shape, texture, colour, and other characteristic morphological phenotypes (<xref ref-type="bibr" rid="B28">Montero et al., 2000</xref>; <xref ref-type="bibr" rid="B26">Minervini et al., 2015</xref>). Such an approach is labour-intensive, time-consuming, and prone to error due to its reliance on subjective perception (<xref ref-type="bibr" rid="B39">Yang et al., 2020</xref>). Image-based plant phenotyping allows non-invasive and remote observation, reducing the effects of manual interference and vastly increasing the scale and throughput of plant phenotyping activities. However, it still requires a robust algorithm to automatically process the input image and provide accurate and reliable phenotypic estimation (<xref ref-type="bibr" rid="B33">Scharr et al., 2016</xref>). In addition, such an algorithm should be able to estimate a wide diversity of phenotypes, enabling a range of different scientific applications. The current trend in image-based plant phenotyping is to combine image processing (e.g., noise removal and image enhancement), feature extraction, and machine learning to obtain effective and efficient estimation (<xref ref-type="bibr" rid="B36">Tsaftaris et al., 2016</xref>). 
In recent years, deep learning-based methods have made remarkable progress in computer vision tasks such as semantic segmentation, classification, and object detection (<xref ref-type="bibr" rid="B22">Lecun et al., 2015</xref>). They integrate feature extraction and classification in a single convolutional neural network (CNN) based framework, which is trained in an end-to-end fashion. Due to their powerful ability to capture meaningful feature representations, deep learning-based methods are drawing increasing attention in the plant research community (<xref ref-type="bibr" rid="B6">Dhaka et al., 2021</xref>; <xref ref-type="bibr" rid="B20">Kundu et al., 2021</xref>) and have also been applied to various tasks in plant phenotyping (<xref ref-type="bibr" rid="B4">Choudhury et al., 2019</xref>).</p>
<p>Plant segmentation and leaf counting are two fundamental tasks of plant phenotyping: they relate to the developmental stage of a plant and are considered essential means of providing vital indicators for the evaluation of plant growth (e.g., growth regulation and flowering time), yield potential, and plant health. Moreover, they help farmers and horticulturists to make better decisions regarding cultivation strategies and timely horticultural adjustments. Plant segmentation aims to extract the plant area, shape, and size from a visual perspective by segmenting an entire plant from the scene background in an image. This task closely relates to the semantic/instance segmentation problem, and some researchers have addressed it using instance/semantic segmentation (<xref ref-type="bibr" rid="B31">Romera-Paredes and Torr, 2016</xref>; <xref ref-type="bibr" rid="B30">Ren and Zemel, 2017</xref>; <xref ref-type="bibr" rid="B38">Ward et al., 2018</xref>; <xref ref-type="bibr" rid="B42">Zhu et al., 2018</xref>), achieving promising performance. Leaf counting aims to estimate the precise number of leaves of a plant. 
There are two mainstream ways to infer the leaf count: (1) estimating the leaf number as a by-product of leaf segmentation or detection (<xref ref-type="bibr" rid="B9">Girshick, 2015</xref>; <xref ref-type="bibr" rid="B17">Kong et al., 2020</xref>; <xref ref-type="bibr" rid="B19">Kumar and Domnic, 2020</xref>; <xref ref-type="bibr" rid="B23">Lin and Guo, 2020</xref>; <xref ref-type="bibr" rid="B25">Lu and Cao, 2020</xref>; <xref ref-type="bibr" rid="B35">Tassis et al., 2021</xref>); and (2) directly treating the task as a holistic regression problem (<xref ref-type="bibr" rid="B8">Dobrescu et al., 2017</xref>; <xref ref-type="bibr" rid="B10">Giuffrida et al., 2018</xref>; <xref ref-type="bibr" rid="B16">Itzhaky et al., 2018</xref>; <xref ref-type="bibr" rid="B37">Ubbens et al., 2018</xref>; <xref ref-type="bibr" rid="B27">Mishra et al., 2021</xref>). These methods have successfully addressed leaf segmentation and counting using machine learning and especially deep learning, which uncovers the intrinsic information in plant images even when they contain complex structures. However, they merely focus on a single task, i.e., they learn one plant trait at a time. Thus, they may ignore the fact that plant phenotypic traits tend to be associated with each other, and they lack insight into the potential relationships between different traits (<xref ref-type="bibr" rid="B12">Gomes and Zheng, 2020</xref>). For instance, the leaf number is associated with the leaf area, age, and genotype. We believe that incorporating multiple traits into the deep CNN architecture could be beneficial for learning more reliable and discriminative information than using only one trait. <xref ref-type="bibr" rid="B7">Dobrescu et al. (2020)</xref> presented a multi-task framework for leaf counting, projected leaf area, and genotyping, where three plant traits are computed at the same time using shared representation layers. 
However, they did not address the task of plant segmentation, which is more challenging because it requires classifying all the leaf (foreground) pixels one by one.</p>
<p>Convolutional neural network based methods have been applied to plant and leaf segmentation in plant phenotyping. <xref ref-type="bibr" rid="B1">Aich and Stavness (2017)</xref> used a CNN based deconvolutional network for plant (foreground) and leaf segmentation. <xref ref-type="bibr" rid="B21">Kuznichov et al. (2019)</xref> utilised data augmentation to maintain the geometric structure and physical appearance of plants in images and thereby improve leaf segmentation. <xref ref-type="bibr" rid="B2">Bell and Dee (2019)</xref> employed a relatively shallow CNN model to classify image edges extracted using the Canny edge detector, which distinguishes occluding pairs of leaves. <xref ref-type="bibr" rid="B30">Ren and Zemel (2017)</xref> adopted a recurrent neural network (RNN) to generate a single segmented template for each leaf and combined it with a convolutional long short-term memory (LSTM) network using spatial inhibition modules. They then used dynamic non-maximal suppression to leverage the previously segmented instances to enhance the segmentation. Although achieving promising results, these methods use shallow CNN models, which are inadequate to capture the meaningful information in diverse plant images. Moreover, they all concentrate on a single task, i.e., leaf/plant segmentation, in an independent pipeline.</p>
<p>Image segmentation using deep learning has advanced significantly, and a few benchmark methods have been proposed. Fully convolutional networks (FCN) (<xref ref-type="bibr" rid="B24">Long et al., 2015</xref>) and U-Net (<xref ref-type="bibr" rid="B32">Ronneberger et al., 2015</xref>) are two representative models based on the encoder-decoder network architecture. Both share a similar idea, i.e., the skip connection, which shows the capability to capture the fine-grained characteristics of the target images. FCN sums the up-sampled feature maps with the feature maps skipped from the encoder, while U-Net modifies the feature concatenation by adding convolutions and non-linearities during each up-sampling step. Another mainstream line of work uses spatial pyramid pooling. PSPNet employs a pyramid parsing operation that captures global context information by region feature aggregation (<xref ref-type="bibr" rid="B40">Zhao et al., 2017</xref>). DeepLab (<xref ref-type="bibr" rid="B3">Chen et al., 2017</xref>) introduced the atrous convolution with up-sampling filters for feature extraction, and extended it using spatial pyramid pooling to encode multi-scale contextual semantics. However, pooling operations at various scales tend to lose local spatial details and fail to preserve densely packed leaf targets if a small input size is adopted. The Mask Region Convolutional Neural Network (Mask-RCNN), proposed by <xref ref-type="bibr" rid="B13">He et al. (2017)</xref>, extended the region proposal network by integrating a branch to predict a segmentation mask on each ROI. Mask-RCNN can segment objects from a complicated background with pixel-wise masks, which makes it suitable for leaf segmentation. 
Thus, we developed our network model based on the backbone architecture of Mask-RCNN, simply replacing the plain skip connections with nested dense skip pathways to enhance the ability to extract fine-grained features from leaf images.</p>
<p>Leaf counting is also an important task in plant phenotyping, since leaf count is considered as an indicator for yield potential and plant health (<xref ref-type="bibr" rid="B29">Rahnemoonfar and Sheppard, 2017</xref>). From the perspective of computer vision, leaf counting can be addressed along two different lines: (1) Regarding leaf counting as the sub-product of leaf segmentation or detection, leading to the leaf number following the segmentation module; and (2) Directly learning an image-to-count model to estimate the leaf number using training samples.</p>
<sec id="S1.SS1">
<title>Direct Count</title>
<p>Leaf counting is regarded as a holistic regression task, in which a counting model estimates the leaf number for a given plant image. In this setting, the machine learning based regression model solely needs the annotation of the leaf number, which is easier to obtain than the pixel-wise annotations required for segmentation. <xref ref-type="bibr" rid="B8">Dobrescu et al. (2017)</xref> presented a counting framework employing the ResNet50 backbone (<xref ref-type="bibr" rid="B14">He et al., 2016</xref>), in which the learning of leaf counting is performed by gathering samples from multiple sources. <xref ref-type="bibr" rid="B16">Itzhaky et al. (2018)</xref> proposed to estimate the leaf number using multi-scale representations and fuse them to make the final predictions. <xref ref-type="bibr" rid="B37">Ubbens et al. (2018)</xref> presented an open-source platform which aims to provide a more generalised system for plant breeders, which can be used to count leaves across different datasets as well as to assist other tasks, e.g., projected leaf area estimation and genotype classification. <xref ref-type="bibr" rid="B5">da Silva and Goncalves (2019)</xref> constructed a CNN based regression model to learn from images, where the skip connections of ResNet50 (<xref ref-type="bibr" rid="B14">He et al., 2016</xref>) are considered efficient for leaf counting. Direct count is a natural and easy choice as it does not require pixel-wise annotation of the images for training.</p>
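To make the direct-count formulation concrete, the sketch below shows how a backbone's pooled features can feed a single-output regression head, following the ResNet50 convention of a 7 &#x00D7; 7 &#x00D7; 2048 final feature map. The function names and the toy data are illustrative assumptions, not the cited authors' code.

```python
import numpy as np

def global_average_pool(features):
    # Collapse an (H, W, C) feature map to a C-dim vector, as done
    # before the final fully connected layer of a ResNet-style backbone.
    return features.mean(axis=(0, 1))

def count_head(pooled, weights, bias):
    # A single linear unit replacing the classification layer: it outputs
    # one real-valued number, the estimated leaf count.
    return float(pooled @ weights + bias)

# Toy example: a 7x7x2048 map, the output size of a ResNet50 backbone.
rng = np.random.default_rng(0)
features = rng.random((7, 7, 2048))
weights = rng.standard_normal(2048) * 0.01
prediction = count_head(global_average_pool(features), weights, bias=0.0)
```

Training such a head only needs a scalar leaf-number label per image, which is why the annotation burden is so much lower than for segmentation.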
</sec>
<sec id="S1.SS2">
<title>Counting via Detection or Segmentation</title>
<p>This approach regards the leaf counting problem as a by-product of detection or segmentation, where the exact locations and number of the leaves are obtained after detection or segmentation. <xref ref-type="bibr" rid="B31">Romera-Paredes and Torr (2016)</xref> proposed to learn an end-to-end segmentation model using an RNN, which segments each leaf sequentially and then estimates the number of segmented leaves. <xref ref-type="bibr" rid="B1">Aich and Stavness (2017)</xref> used a CNN based deconvolutional network for leaf segmentation and a convolutional network for leaf counting. <xref ref-type="bibr" rid="B18">Kumar and Domnic (2019)</xref> developed a counting model combining CNNs and traditional methods, where a graph-based method is used alongside U-Net for segmentation and a fine-tuned AlexNet is then used for leaf counting. <xref ref-type="bibr" rid="B30">Ren and Zemel (2017)</xref> proposed a neural network with a visual attention operation to jointly learn the instance segmentation and counting model, where sequential attention using LSTM cells outputs one instance at a time via a temporal chain. However, such segmentation or detection-based methods have one limitation for counting: only successfully segmented leaves are counted, and imperfect detection results in reduced counting accuracy. Unlike the aforementioned methods, we employ the segmented binary image to guide the learning of leaf counting, i.e., we do not count directly from the segmented image, thus avoiding the effect of inaccurate detection or segmentation on the counting task.</p>
<p>In this article, we present a two-stream framework, with one stream for plant segmentation and the other for regression-based leaf counting. The resultant mask from the segmentation stream is leveraged to guide the learning of leaf counting, which helps to alleviate the influence of the complex background. In order to obtain more semantic and meaningful feature representations of plant images, we employ deep CNNs as the backbones of both streams. By using the CNN paradigm, the two-stream model is robust and generalises well regardless of the plant species and the quality of the acquired image data. This is achieved by one stream supervising the training of the other <italic>via</italic> sharing certain knowledge. To this end, we employ the segmented binary mask from the plant segmentation stream as an auxiliary cue to optimise the training process of the leaf counting stream. Introducing the binary mask to supervise the learning of leaf counting is motivated by two issues specific to plant leaf counting: (1) some leaves may be partially occluded by other leaves, or are incomplete and fragmentary on their own, making them difficult to detect; and (2) the images sometimes contain a complex background, increasing the challenge in leaf counting. These two issues lead to incorrect or missed counts, as the meaningful and useful information of a leaf is hard to maintain during counting. The binary mask effectively deals with both issues by precisely locating all individual leaves while alleviating the effect of the complex background. In addition, the binary masks of image samples bring more diversity to the input by increasing the number of samples, which can be regarded as an implicit data augmentation.</p>
<p>Specifically, in our proposed framework, a two-stream deep neural network model segments the leaves and counts their number, where the segmented binary mask is employed as an auxiliary cue to supervise the learning of leaf counting. In the segmentation stream, a multi-scale segmentation network is proposed to extract fine-grained characteristics of leaves. In the counting stream, we propose to learn a regression model based on a fine-tuned CNN model. During the learning of leaf counting, the segmented mask is utilised to highlight the target leaf region of interest (ROI) in the entire image by removing the disturbance of the complex background (i.e., the non-leaf area), thus facilitating the counting process.</p>
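A minimal sketch of how a binary mask can suppress the non-leaf background before regression. The element-wise product used here is an assumed, simple realisation of the mask guidance described above, not necessarily the exact operation in the proposed framework.

```python
import numpy as np

def mask_guided_input(image, mask):
    # Broadcast the (H, W) binary mask over the RGB channels so that
    # background (non-leaf) pixels are zeroed before counting.
    return image * mask[..., None]

image = np.ones((4, 4, 3))   # toy RGB plant image
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0         # assumed 2x2 foreground (leaf) region
masked = mask_guided_input(image, mask)
```

After masking, only the leaf region retains non-zero intensities, so the counting regressor no longer has to learn to ignore the background on its own.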
<p>The contributions of this study are summarized as follows:</p>
<list list-type="simple">
<list-item>
<label>1.</label>
<p>We propose to address the fine-grained characteristics, i.e., high inter-class similarity and low intra-class variations, widely existing in high-throughput plant phenotyping, which cause failures in localising leaves within small areas during segmentation. To address this issue, we introduce a multi-scale U-Net segmentation model which compensates for the semantic difference between upper and lower layers by concatenating features at various scales. This model is learned in an end-to-end fashion, allowing for efficient segmentation of leaves of different areas.</p>
</list-item>
<list-item>
<label>2.</label>
<p>We propose a two-stream network based on deep CNN architecture to complete the leaf counting together with plant segmentation, in which the model outputs the segmentation results and directly estimates the leaf number.</p>
</list-item>
<list-item>
<label>3.</label>
<p>We enhance leaf counting by introducing auxiliary binary information. The binary mask is utilised to supervise the leaf counting, which increases the contrast between the leaf target and the background interference, and significantly aids the convergence of the counting regression model.</p>
</list-item>
</list>
<p>The remainder of the article is presented as follows: we review related work in Section &#x201C;Introduction,&#x201D; present our method in Section &#x201C;Proposed Method,&#x201D; provide the experimental results in Section &#x201C;Experiments&#x201D; and discuss the conclusions and further work in Section &#x201C;Conclusion.&#x201D;</p>
</sec>
</sec>
<sec id="S2">
<title>Proposed Method</title>
<p>We present a parallel two-stream network that determines the leaf count and undertakes segmentation simultaneously for rosette-shaped plants, as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. The stream for segmentation adopts the nested U-Net (U-Net++) architecture (<xref ref-type="bibr" rid="B41">Zhou et al., 2018</xref>) as its backbone to extract the target leaf region from the entire image as a binary mask. The stream for leaf counting learns a CNN based regression model, customised by modifying its last layer to directly estimate the number of leaves, where the segmented masks and the original colour images with the leaf number labels are mixed as input to the regression model. The streams for plant segmentation and counting are first designed separately. The segmented binary mask denoting the area of the leaves is then used as a complementary cue to supervise the learning of the counting regression stream. This is because the two key traits of the two streams, i.e., the leaf area and the leaf number, are often related to each other. Incorporating the leaf area into the estimation of the leaf number during the learning of the deep neural network not only helps to learn more meaningful and essential information, but also alleviates the influence of the complex background.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>The proposed parallel two-stream network combines leaf counting and segmentation tasks. Top row: the modified Resnet50 regression model for leaf counting with 16 residual blocks. Remaining rows: U-Net++ for segmentation <italic>via</italic> multi-use of the features from different semantic levels (layers). Each blue box corresponds to a multi-channel feature map, and the green boxes represent copied feature maps. The arrows denote various operations.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-844522-g001.tif"/>
</fig>
<sec id="S2.SS1">
<title>Plant Segmentation Module</title>
<p>The segmentation module aims to extract the whole leaf area from the background. In order to enhance the robustness and accuracy of the extraction, the module must be able to capture the characteristics of a plant image, i.e., fine-grained details and variations in shape and size. To this end, we adopt the nested U-Net as our backbone network for segmentation. The nested U-Net model is based on U-Net, which was originally proposed to meet the requirement of accurately segmenting medical images. Compared with the original U-Net model proposed by <xref ref-type="bibr" rid="B32">Ronneberger et al. (2015)</xref>, the nested U-Net architecture replaces the plain skip connections with nested and dense skip connections, which can capture fine-grained information about the object in an image. Moreover, due to its up-sampling scheme, the U-Net model can locate leaves of different sizes and shapes by using feature maps at different scales. By handling these characteristics of leaves, the nested U-Net is thus suitable for plant segmentation. Another problem needs to be addressed during training, namely that the ROIs of plant segmentation comprise a relatively small portion of the entire image. Thus, negative samples (i.e., background pixels) far outnumber positive samples (i.e., leaf pixels), resulting in an unbalanced binary classification problem. To address this problem, we combine the binary cross-entropy (BCE) loss with the Dice loss to jointly guide the learning process of the segmentation. Generally, the nested U-Net consists of three main modules: encoding, decoding, and cross-layer dense concatenation. Feature maps of the same size are defined to be of the same layer, and the layers are denoted L1&#x2013;L5 from top to bottom. Each node represents a feature extraction module consisting of two 3 &#x00D7; 3 convolutional layers, each followed by a rectified linear unit (ReLU), and a 2 &#x00D7; 2 max pooling with stride 2 for down-sampling.</p>
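The down-sampling step of each encoder node can be illustrated in isolation. This is a didactic numpy sketch of 2 &#x00D7; 2 max pooling with stride 2; the learned 3 &#x00D7; 3 convolutions and ReLU of the full node are omitted here.

```python
import numpy as np

def maxpool_2x2(x):
    # 2x2 max pooling with stride 2 over an (H, W, C) feature map,
    # matching the down-sampling step of each encoder node (H, W even).
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)
y = maxpool_2x2(x)   # each output value is the max of one 2x2 block
```

Each application halves the spatial resolution, which is what produces the successively coarser layers L1&#x2013;L5 of the encoder.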
<p>The output features from the encoder are fused with the next encoder layer <italic>via</italic> up-sampling of features across layers from top to bottom. The fusion outputs are concatenated with the corresponding up-sampled features of the next layer, and the process is iterated until there is no corresponding module in the next layer. The integrated feature maps are defined as</p>
<disp-formula id="S2.E1">
<label>(1)</label>
<mml:math id="M1">
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mtable columnspacing="5pt" displaystyle="true" rowspacing="0pt">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>&#x210B;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mpadded width="+1.7pt">
<mml:mi>j</mml:mi>
</mml:mpadded>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>&#x210B;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mi>&#x1D4B0;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mpadded width="+1.7pt">
<mml:mi>j</mml:mi>
</mml:mpadded>
<mml:mo>&gt;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
<mml:mi/>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where &#x210B;(&#x22C5;) denotes a convolution operation followed by an activation function, &#x1D4B0;(&#x22C5;) denotes an up-sampling layer, and [] denotes the concatenation layer. Nodes at level <italic>j</italic> = 0 only receive input from the previous encoder layer; nodes at level <italic>j</italic> = 1 receive the encoder and sub-network inputs from two consecutive levels; and nodes at level <italic>j</italic> &#x003E; 1 receive <italic>j</italic> + 1 inputs, of which <italic>j</italic> inputs are the outputs of the previous <italic>j</italic> nodes in the same skip pathway and the last input is the up-sampled output from the lower skip pathway.</p>
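A compact sketch of the node computation in Equation (1), assuming nearest-neighbour up-sampling for &#x1D4B0;(&#x22C5;) and reducing &#x210B;(&#x22C5;) to a bare activation for brevity (a real model applies learned 3 &#x00D7; 3 convolutions, and encoder nodes also down-sample). The function names are illustrative, not the authors' implementation.

```python
import numpy as np

def upsample_2x(x):
    # Nearest-neighbour 2x up-sampling of an (H, W, C) map: stands in for U(.).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def H(x):
    # Stand-in for the conv + activation block H(.): only the ReLU is kept.
    return np.maximum(x, 0.0)

def node(i, j, X):
    # Compute x^{i,j} of Eq. (1). X maps (level, position) -> feature map.
    if j == 0:
        # Encoder nodes take only the previous layer's output
        # (the pooling/down-sampling step is omitted here).
        return H(X[(i - 1, j)])
    # Dense skip pathway: concatenate the j earlier nodes of this level
    # with the up-sampled output of the node below, then apply H.
    parts = [X[(i, k)] for k in range(j)] + [upsample_2x(X[(i + 1, j - 1)])]
    return H(np.concatenate(parts, axis=-1))

# Toy maps: level-1 features are 8x8, level-2 features are 4x4.
X = {(1, 0): np.ones((8, 8, 4)), (2, 0): np.ones((4, 4, 4))}
x11 = node(1, 1, X)   # concatenation of x^{1,0} and U(x^{2,0})
```

The channel dimension of `x11` is the sum of the concatenated inputs' channels, which is why the dense pathways grow wider as <italic>j</italic> increases.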
<p>The dense skip connections between layers in the same dimension pass the output of the current module to all subsequent modules and fuse it with other input features. Thus, the overall U-Net++ feature fusion structure is in the form of an inverted pyramid, where the intermediate layer contains more accurate localisation information, while the in-depth layer captures pixel-level category information.</p>
<p>As a typical binary classification task, the core objective is to segment the plant image into a binary image by labelling the foreground and background pixels as 1 and 0, respectively. To overcome the class imbalance problem, the BCE loss and the Dice loss are combined to form the objective function, which mitigates the imbalance between the foreground and background pixels through back-propagation. The Dice coefficient measures the degree of overlap between two pixel sets, and its original expression takes the form of</p>
<disp-formula id="S2.E2">
<label>(2)</label>
<mml:math id="M2">
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>d</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>X</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">&#x2229;</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo rspace="5.8pt" stretchy="false">|</mml:mo>
</mml:mrow>
<mml:mo rspace="5.8pt">+</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>X</italic> and <italic>Y</italic> are sets, <italic>d</italic> &#x2208; [0, 1], and the value of <italic>d</italic> reflects the similarity between the sets <italic>X</italic> and <italic>Y</italic>.</p>
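<p>A minimal NumPy illustration of the Dice coefficient on binary masks follows; the small <code>eps</code> guard against empty masks is our addition:</p>

```python
import numpy as np

def dice(x, y, eps=1e-7):
    # d = 2|X ∩ Y| / (|X| + |Y|); eps (our addition) guards empty masks.
    inter = np.logical_and(x, y).sum()
    return 2.0 * inter / (x.sum() + y.sum() + eps)

x = np.array([[1, 1, 0], [0, 1, 0]])
y = np.array([[1, 0, 0], [0, 1, 1]])
print(round(float(dice(x, y)), 3))  # 0.667: 2·2 matching pixels over 3 + 3
```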
<p>The binary cross-entropy and dice coefficient are combined to form the final loss function, which is defined as</p>
<disp-formula id="S2.E3">
<label>(3)</label>
<mml:math id="M3">
<mml:mrow>
<mml:mrow>
<mml:mi>&#x2112;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo rspace="5.8pt" stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:munderover>
<mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>b</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
<mml:mo>&#x22C5;</mml:mo>
<mml:msubsup>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>b</mml:mi>
</mml:msubsup>
<mml:mo>&#x22C5;</mml:mo>
<mml:mtext>log</mml:mtext>
</mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:msubsup>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mi>b</mml:mi>
</mml:msubsup>
</mml:mpadded>
</mml:mrow>
<mml:mo rspace="5.8pt">+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x22C5;</mml:mo>
<mml:msubsup>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>b</mml:mi>
</mml:msubsup>
<mml:mo>&#x22C5;</mml:mo>
<mml:msubsup>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mi>b</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:msubsup>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>b</mml:mi>
</mml:msubsup>
</mml:mpadded>
<mml:mo rspace="5.8pt">+</mml:mo>
<mml:msubsup>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mi>b</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>Y</italic><sup><italic>b</italic></sup><sub><italic>gt</italic></sub> and <italic>Y</italic><sup><italic>b</italic></sup><sub><italic>pred</italic></sub> denote the ground truth map and the prediction map of the <italic>b</italic>-th image, respectively, and <italic>N</italic> denotes the batch size.</p>
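<p>Equation (3) can be sketched in NumPy as below. The per-pixel averaging is our assumption, since the equation only makes the batch mean explicit; under this form the loss decreases (toward more negative values) as predictions approach the ground truth:</p>

```python
import numpy as np

def bce_dice_loss(y_gt, y_pred, eps=1e-7):
    # Eq. (3): 0.5 * y_gt * log(y_pred) (foreground BCE term) plus a
    # soft Dice term, negated and averaged over the batch; per-pixel
    # averaging and the eps guard are our assumptions.
    bce = 0.5 * y_gt * np.log(y_pred + eps)
    dice = 2.0 * y_gt * y_pred / (y_gt + y_pred + eps)
    return -np.mean(bce + dice)

y_gt = np.array([[[1.0, 0.0], [1.0, 1.0]]])          # one 2x2 ground truth map
good = bce_dice_loss(y_gt, np.clip(y_gt, 0.01, 0.99))
bad = bce_dice_loss(y_gt, 1.0 - np.clip(y_gt, 0.01, 0.99))
print(good < bad)  # True: better predictions yield a lower loss
```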
<p>The objective function takes the form of a logarithmic logistic function, replacing the more complex softmax multi-class prediction function. Forward propagation infers the prediction results and compares them with the ground truth annotations to generate the cross-entropy loss; backward propagation then updates the model weight parameters. In this way, the task of plant segmentation is transformed into a binary classification problem. The re-designed skip pathways act on the fused feature outputs and simplify, to varying degrees, the optimisation of the shallow, middle, and deep outputs <italic>via</italic> tuning the overall parameters of the network.</p>
</sec>
<sec id="S2.SS2">
<title>Learning Count Model With Segmentation</title>
<p>During leaf counting, the estimated number of leaves tends to deviate from its ground truth. This is because the lower part of a leaf might be occluded by other leaves, or leaves may be incomplete and fragmentary on their own, and such leaves tend to be ignored by the counting model. To address this problem, we introduced an auxiliary cue, i.e., the segmented mask, to guide the learning of the counting model. It is also widely acknowledged that a counting model can fail owing to the lack of available samples of certain classes in the training dataset, and labelling for leaf counting is time-consuming. Such data scarcity is often encountered in data-driven methods such as deep learning. We therefore augmented the samples by combining the segmented masks with the original images, which helps the model effectively capture occluded and hard-to-detect leaves in plant images with the assistance of the segmented binary mask.</p>
<p>Inspired by the work of <xref ref-type="bibr" rid="B14">He et al. (2016)</xref>, we employed the ResNet50 network as our backbone architecture owing to its superb performance in image recognition. For our regression task, we modified ResNet50 by replacing the last layer with a fully connected layer with one-dimensional output, which acts as a regression model for leaf counting. The modified network takes the combination of the segmentation mask and the original image as input, and applies a convolution with a 7 &#x00D7; 7 filter followed by a series of convolutions, ending with fully connected layers that predict the leaf count. Residual learning is also used to overcome inefficient learning and the possibility of over-fitting in deep networks, where the skip connections resolve the degradation problem by taking the output of the previous layers as the input of the latter. For instance, if an input is <italic>x</italic> and the learned features are denoted as <italic>H(x)</italic>, then the residual features are <italic>F(x) = H(x) - x</italic>. The stacked layers learn new features on top of the input features, and a residual unit is given by</p>
<disp-formula id="S2.E4">
<label>(4)</label>
<mml:math id="M4">
<mml:mrow>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
<mml:mo rspace="5.8pt">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="5.8pt">+</mml:mo>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>l</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>x</italic><sub><italic>l</italic></sub> and <italic>x</italic><sub><italic>l + 1</italic></sub>, respectively, represent the input and output of the <italic>l</italic>-th residual unit, each of which contains multiple layers. <italic>F</italic> represents the learned residual block, <italic>h(x<sub><italic>l</italic></sub>) = x<sub><italic>l</italic></sub></italic> is the identity mapping, and <italic>f</italic> is the ReLU activation function. Thus, the learned features from a shallow layer <italic>l</italic> to a deep layer <italic>L</italic> are</p>
<disp-formula id="S2.E5">
<label>(5)</label>
<mml:math id="M5">
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>L</mml:mi>
</mml:msub>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mpadded>
<mml:mo rspace="5.8pt">+</mml:mo>
<mml:mrow>
<mml:munderover>
<mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>i</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
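<p>The residual unit of Equation (4) can be sketched in NumPy as follows, assuming an identity shortcut <italic>h</italic>, ReLU as <italic>f</italic>, and an illustrative two-layer residual branch <italic>F</italic> (the weight shapes are hypothetical):</p>

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_unit(x, w1, w2):
    # Eq. (4) with h(x_l) = x_l (identity) and f = ReLU:
    # y_l = x_l + F(x_l, W_l);  x_{l+1} = relu(y_l).
    f_x = relu(x @ w1) @ w2      # two-layer residual branch F
    return relu(x + f_x)         # identity shortcut + activation

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
w1 = rng.standard_normal((4, 4)) * 0.1
w2 = rng.standard_normal((4, 4)) * 0.1
y = residual_unit(x, w1, w2)
print(y.shape)  # (4,)
```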
<p>A chain rule is used to aid the reverse process of gradients, i.e.,</p>
<disp-formula id="S2.E6">
<label>(6)</label>
<mml:math id="M6">
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mfrac>
<mml:mrow>
<mml:mo rspace="7.5pt">&#x2202;</mml:mo>
<mml:mi>loss</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2202;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>&#x2202;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:mtext>loss</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2202;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>L</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x22C5;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mfrac>
<mml:mrow>
<mml:mo>&#x2202;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>L</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2202;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mpadded>
</mml:mrow>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>&#x2202;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:mtext>loss</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2202;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>L</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x22C5;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mn>1</mml:mn>
</mml:mpadded>
<mml:mo rspace="5.8pt">+</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mo>&#x2202;</mml:mo>
<mml:mrow>
<mml:mo>&#x2202;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:munderover>
<mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>i</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula><mml:math id="INEQ3"><mml:mfrac><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mtext>loss</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>L</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:math></inline-formula> denotes the gradient of the loss function with respect to <italic>x</italic><sub><italic>L</italic></sub>. The value 1 in the parentheses indicates that the shortcut connection propagates the gradient without attenuation, while the residual gradient passes indirectly through the weighted layers. In this context, the constant 1 makes the residual gradient easier to learn and thus avoids gradient vanishing.</p>
<p>To better train the regression model, we employed mean squared error (MSE) as the loss function. Given an image <italic>i</italic> and the ground truth leaf count <italic>y</italic><sup><italic>i</italic></sup><sub><italic>gt,c</italic></sub>, the loss function <italic>L</italic><sub><italic>c</italic></sub> is determined by</p>
<disp-formula id="S2.E7">
<label>(7)</label>
<mml:math id="M7">
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>m</mml:mi>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:munderover>
<mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>i</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msubsup>
<mml:mo>-</mml:mo>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>m</italic> is the number of images and <italic>y</italic><sup><italic>i</italic></sup><sub><italic>pred,c</italic></sub> denotes the predicted leaf count.</p>
<p>With respect to our regression task, the last fully-connected layer with 1,000 neurons, originally used for classification, is replaced by a layer with a single neuron, which outputs the estimated leaf number. This neuron regresses the correct leaf count from the input images. To exploit rich prior knowledge, the regression network is pre-trained on ImageNet for parameter initialization and then fine-tuned on the datasets used. Our regression model is shown in the top row of <xref ref-type="fig" rid="F1">Figure 1</xref>. Note that combining the segmentation mask with the RGB image extends the input from 3 to 4 channels. The additional binary channel conveys pure semantic information about the leaves and suppresses bias from background features in the training images, e.g., the soil, moss, and pot, which differ between datasets. At the same time, the RGB channels enable the network to retain the rich local texture and context information that the binary mask fails to capture, thus enhancing the robustness of our model. In addition, our regression model does not require any bounding box or centre point annotation, and can therefore be efficiently applied to more complex scenes.</p>
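<p>The 4-channel input described above can be illustrated as follows; the array shapes are hypothetical, and in practice the first convolution of the counting network must be altered to accept 4 input channels instead of 3:</p>

```python
import numpy as np

# Hypothetical input: a normalized RGB image and its binary segmentation
# mask; stacking the mask as a fourth channel gives the 4-channel input
# of the counting network.
rgb = np.random.rand(3, 480, 480)
mask = (np.random.rand(480, 480) > 0.5).astype(np.float64)
x = np.concatenate([rgb, mask[None]], axis=0)
print(x.shape)  # (4, 480, 480)
```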
<p>U-Net remains the preferred choice for binary segmentation that preserves fine edges. The design of skip connections greatly enriches the information received by the decoder, and, when trained end-to-end, U-Net performs high-precision segmentation from small training samples. When applied to leaf segmentation, the architecture extracts the edge details, size, and shape diversity in the low-level information and uncovers the discriminative high-level information of the target leaf. This advantage reduces the overall size of the dataset required for training. Furthermore, owing to the effective reuse of extracted features and its ability to capture the targets, the architecture achieves an implicit data augmentation and speeds up convergence for binary tasks during training.</p>
<p>However, since the leaf dataset (with sub-datasets A1&#x2013;A4) varies in the degree of occlusion, leaf number, and leaf size, combining only same-scale information cannot cover variations not previously encountered. Designing a U-Net with a different depth for each layer may be an option, but such an approach has not been widely applied. To address this, we adopt U-Net++ (remaining rows of <xref ref-type="fig" rid="F1">Figure 1</xref>) as the feature extractor for segmentation, which extends U-Net with denser cross-layer concatenation and shortens the semantic gap between the encoder and decoder by fusing spatial information from shallow to deep cross layers. The architecture makes full use of contextual features and semantic information from the same dimension, and captures the detailed features of the target. Moreover, using a pruning scheme based on the module that yields the best estimation during training, the network is adjustable and customisable; for instance, it can be pruned to the most suitable size, saving unnecessary storage space. This is equivalent to retaining every useful feature acquired while providing a distinctive design for each dataset in one end-to-end network.</p>
</sec>
</sec>
<sec id="S3">
<title>Experiments</title>
<p>We thoroughly assessed the effectiveness of our proposed framework on the widely used plant phenotyping dataset, including its four sub-datasets (see Section &#x201C;Dataset and Data Pre-processing&#x201D;). We conducted extensive experiments on both plant segmentation and leaf counting, and compared the performance of our method with state-of-the-art methods for validation. We explored three segmentation architectures using three different backbone networks, i.e., MobileNet, ResNet, and VGGNet, on the four sub-datasets, and compared our method with state-of-the-art leaf segmentation methods. We also performed experiments to demonstrate the effectiveness of the proposed leaf counting method, comparing it with state-of-the-art leaf counting methods.</p>
<sec id="S3.SS1">
<title>Dataset and Data Pre-processing</title>
<p>The dataset used in our experiments belongs to the Leaf Counting and Leaf Segmentation Challenges (LCC and LSC) held as part of the Computer Vision Problems in Plant Phenotyping (CVPPP 2017) workshop (<xref ref-type="bibr" rid="B11">Giuffrida et al., 2015</xref>). The dataset is divided into a training set and a testing set, which consist of 810 and 275 top-down view RGB images of either tobacco or Arabidopsis plants, respectively. Both training and testing images are grouped into four folders, i.e., four sub-datasets which vary in species and means of collection, such as imaging setups and labs. The training sets include 128, 31, 27, and 624 images and the testing sets contain 33, 9, 65, and 168 images for A1, A2, A3, and A4, respectively. Sub-datasets A1 and A2 include Arabidopsis images collected from growth chamber experiments with different fields of view covering many plants, which are then cropped to single-plant images of approximately 500 &#x00D7; 500 pixels. Sub-dataset A3 contains tobacco images of 2,000 &#x00D7; 2,500 pixels with the field of view chosen to encompass a single plant. Sub-dataset A4 is a subset of another public Arabidopsis dataset. The dataset provides the corresponding annotations as binary segmentation masks, with 1 and 0 denoting plant and background pixels, respectively. All the folders contain the ground truth binary mask used for whole plant segmentation (i.e., semantic segmentation). For the plant segmentation experiment, we follow the training strategy of <xref ref-type="bibr" rid="B1">Aich and Stavness (2017)</xref>, and also use the combination of all sub-datasets (referred to as <italic>All</italic>) for training to achieve a more robust model.</p>
<p>In our work, we addressed two dataset-related problems: (1) deep learning based methods require a huge number of training samples, while the availability of annotated plant leaf datasets is limited, causing data scarcity; and (2) small and overlapping leaf instances pose a challenge for plant segmentation and leaf counting. Data augmentation is a widely used technique in deep learning to increase the number of samples and expose the network to more diversity. In this context, we employed data augmentation to address both problems.</p>
<p>We first resized the training images to 480 &#x00D7; 480 pixels and normalized them. Following the resize operation, we applied the following data augmentation scheme: (1) random rotation in steps of 90&#x00B0; to increase the network invariance to slight angular changes; (2) flips: horizontal, vertical, and horizontal + vertical; (3) resizing the images to increase the network invariance to different image resolutions; (4) gamma transform to extend the data by changing the image greyscale; (5) random brightness: since the clarity of an object depends on scene lighting and camera sensitivity, randomly changing the image brightness improves the illumination invariance of the network; (6) random changes in the contrast range to increase the network invariance to shadows and improve performance in low-light conditions; (7) Hue Saturation Value (HSV): changes in colour channels and in the degree of lightness or darkness of a colour; and (8) normalisation, a linear transformation which scales data values to a specific range while retaining the original data distribution. Selected augmentation processes are shown in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
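<p>A minimal sketch of part of this augmentation scheme (rotation, flip, gamma, and brightness) is given below, applying geometric transforms identically to image and mask while restricting photometric transforms to the image; the parameter ranges are illustrative, not the authors' settings:</p>

```python
import numpy as np

def augment(img, mask, rng):
    # Geometric transforms (rotation, flip) are applied to both image and
    # mask; photometric ones (gamma, brightness) only to the image.
    k = int(rng.integers(0, 4))
    img, mask = np.rot90(img, k, (0, 1)), np.rot90(mask, k, (0, 1))  # rotate in 90° steps
    if rng.random() < 0.5:                                           # horizontal flip
        img, mask = img[:, ::-1], mask[:, ::-1]
    img = np.clip(img ** rng.uniform(0.8, 1.2), 0.0, 1.0)            # gamma transform
    img = np.clip(img * rng.uniform(0.9, 1.1), 0.0, 1.0)             # random brightness
    return img, mask

rng = np.random.default_rng(42)
img = rng.random((480, 480, 3))                         # normalized RGB image
mask = (rng.random((480, 480)) > 0.5).astype(np.uint8)  # binary annotation
aug_img, aug_mask = augment(img, mask, rng)
print(aug_img.shape, aug_mask.shape)  # (480, 480, 3) (480, 480)
```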
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Augmentation samples for training the segmentation network to avoid the risk of over-fitting.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-844522-g002.tif"/>
</fig>
</sec>
<sec id="S3.SS2">
<title>Implementation Details and Evaluation Protocol</title>
<p>All images from the training set are randomly split into two sets for training and validation with split ratios of 0.8 and 0.2, respectively. Images from the testing set are used for evaluating the segmentation performance. We used the validation set to tune the hyper-parameters (see <xref ref-type="table" rid="T1">Table 1</xref>) during the initial training experiments.</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Hyper-parameters used for training.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left">Epochs</td>
<td valign="top" align="left">100</td>
</tr>
<tr>
<td valign="top" align="left">Batch-size</td>
<td valign="top" align="left">4</td>
</tr>
<tr>
<td valign="top" align="left">Optimizer</td>
<td valign="top" align="left">Adam</td>
</tr>
<tr>
<td valign="top" align="left">Learning-rate</td>
<td valign="top" align="left">1e-3</td>
</tr>
<tr>
<td valign="top" align="left">Weight-decay</td>
<td valign="top" align="left">1e-4</td>
</tr>
<tr>
<td valign="top" align="left">Factor</td>
<td valign="top" align="left">0.1</td>
</tr>
</tbody>
</table></table-wrap>
<sec id="S3.SS2.SSS1">
<title>Network Parameter Setting</title>
<p>All our experiments were performed on the PyTorch platform with an NVIDIA 2080Ti GPU. We used data augmentation to increase the number of samples, as in Section &#x201C;Dataset and Data Pre-processing.&#x201D; This module helps prevent over-fitting on the relatively small plant datasets and ensures the model produces promising results when segmenting new data <italic>via</italic> learning multiple variations (<xref ref-type="bibr" rid="B15">Holmberg, 2020</xref>). The binary mask is transformed in the same way to maintain consistency between images and annotations (except for colour-related transformations).</p>
<p>We randomly sampled four samples to form a mini-batch (batch size of four) to guarantee the convergence of training. Adam is adopted as the optimizer for its fast convergence, and the model is trained for a total of 100 epochs, after which the results remain stable with no further improvement. The weight decay factor is set to 0.0001 and the learning rate is fixed at 0.001.</p>
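<p>For illustration, a single Adam update with the listed hyper-parameters (learning rate 1e-3, weight decay 1e-4 folded into the gradient as an L2 term) can be sketched in NumPy; this is a conceptual sketch of the optimizer, not the authors' training code:</p>

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-4):
    # One Adam update: first and second moment estimates with bias
    # correction; weight decay 1e-4 is applied as an L2 term on g.
    g = g + wd * w
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
w, m, v = adam_step(w, np.array([0.5, -0.5, 0.0]), m, v, t=1)
print(w)  # each weight moves by ~lr opposite the sign of its gradient
```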
</sec>
<sec id="S3.SS2.SSS2">
<title>Metrics for Segmentation</title>
<p>We employed the intersection over union (IoU) as the evaluation metric, which is widely used in segmentation. IoU measures the spatial overlap between the segmented leaf region and its ground truth, i.e.,</p>
<disp-formula id="S3.E8">
<label>(8)</label>
<mml:math id="M8">
<mml:mrow>
<mml:mtext>IoU</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>%</mml:mo>
<mml:mo rspace="5.8pt" stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mtext>gt</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mpadded>
<mml:mo rspace="5.8pt">&#x2229;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mtext>pred</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mtext>gt</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mpadded>
<mml:mo rspace="5.8pt">&#x222A;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mtext>pred</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>P</italic><sub><italic>gt</italic></sub> and <italic>P</italic><sub><italic>pred</italic></sub>, respectively, denote the ground truth mask and the prediction mask. Owing to the class imbalance between positive and negative samples, accuracy alone is an insufficient evaluation metric. For better evaluation, we introduced two more metrics: Precision and Recall. Precision is the portion of segmented leaf region pixels that match the ground truth, i.e.,</p>
<disp-formula id="S3.E9">
<label>(9)</label>
<mml:math id="M9">
<mml:mrow>
<mml:mtext>Precision</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>%</mml:mo>
<mml:mo rspace="5.8pt" stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>P</mml:mi>
</mml:mpadded>
</mml:mrow>
<mml:mo rspace="5.8pt">+</mml:mo>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mpadded>
<mml:mo rspace="5.8pt">&#x00D7;</mml:mo>
<mml:mn>100</mml:mn>
</mml:mrow>
</mml:math>
</disp-formula>
<p>Recall is used to determine the portion of ground-truth pixels present in the segmented leaf region, i.e.,</p>
<disp-formula id="S3.E10">
<label>(10)</label>
<mml:math id="M10">
<mml:mrow>
<mml:mtext>Recall</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>%</mml:mo>
<mml:mo rspace="5.8pt" stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>P</mml:mi>
</mml:mpadded>
</mml:mrow>
<mml:mo rspace="5.8pt">+</mml:mo>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mpadded>
<mml:mo rspace="5.8pt">&#x00D7;</mml:mo>
<mml:mn>100</mml:mn>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where True Positive (TP), False Negative (FN), and False Positive (FP), respectively, denote the number of leaf region pixels correctly identified, the number of leaf region pixels missed, and the number of background pixels falsely identified as leaf region.</p>
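<p>These metrics can be computed directly from binary masks, as sketched below; the code uses the standard IoU definition TP/(TP + FP + FN), i.e., intersection over union:</p>

```python
import numpy as np

def seg_metrics(gt, pred):
    # TP/FP/FN counted on binary masks; IoU, Precision, and Recall
    # reported as percentages.
    tp = np.logical_and(gt == 1, pred == 1).sum()
    fp = np.logical_and(gt == 0, pred == 1).sum()
    fn = np.logical_and(gt == 1, pred == 0).sum()
    iou = 100.0 * tp / (tp + fp + fn)        # intersection over union
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    return iou, precision, recall

gt = np.array([[1, 1, 0], [0, 1, 0]])
pred = np.array([[1, 0, 0], [0, 1, 1]])
iou, p, r = seg_metrics(gt, pred)
print(iou, round(p, 1), round(r, 1))  # 50.0 66.7 66.7
```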
</sec>
<sec id="S3.SS2.SSS3">
<title>Metrics for Count</title>
<p>To evaluate how well a leaf counting method estimates the correct number of leaves, we employed the regression metrics Difference in Count (DiC), Absolute Difference in Count (ADiC), and mean squared error (MSE), calculated as follows:</p>
<disp-formula id="S3.E11">
<label>(11)</label>
<mml:math id="M11">
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mtext>DiC</mml:mtext>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>m</mml:mi>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:munderover>
<mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>i</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>gt</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>-</mml:mo>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>pred</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S3.E12">
<label>(12)</label>
<mml:math id="M12">
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mtext>ADiC</mml:mtext>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>m</mml:mi>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:munderover>
<mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>i</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>gt</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>-</mml:mo>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>pred</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S3.E13">
<label>(13)</label>
<mml:math id="M13">
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mtext>MSE</mml:mtext>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>m</mml:mi>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:munderover>
<mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>i</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>gt</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>-</mml:mo>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>pred</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
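<p>A minimal sketch of Equations (11)&#x2013;(13) over <italic>m</italic> test images (our own illustration; the function name is assumed, not from the published code):</p>

```python
import numpy as np

def count_metrics(y_gt, y_pred):
    """DiC, ADiC, and MSE of Eqs. (11)-(13): mean signed difference,
    mean absolute difference, and mean squared difference of the counts."""
    diff = np.asarray(y_gt, dtype=float) - np.asarray(y_pred, dtype=float)
    return diff.mean(), np.abs(diff).mean(), (diff ** 2).mean()
```

Note that DiC can be near zero even when individual errors are large, since over- and under-counts cancel; ADiC and MSE do not cancel, which is why all three are reported.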
</sec>
</sec>
<sec id="S3.SS3">
<title>Experimental Analysis</title>
<sec id="S3.SS3.SSS1">
<title>Segmentation Analysis</title>
<p>In the first experiment, we evaluated the effectiveness of our segmentation model on plant images by comparing different segmentation architectures and backbones. FCN8, PSPNet, and U-Net were selected as the basic encoder-decoder architectures, with ResNet and VGG used as backbones owing to their strong ability to represent 2D images. The comparative segmentation performance in terms of IoU on the combination of all sub-datasets is provided in <xref ref-type="fig" rid="F3">Figure 3</xref>. It is evident from <xref ref-type="fig" rid="F3">Figure 3</xref> that the segmentation results generated by our segmentation model outperform those of the other architectures. Across the different models, using VGG as the backbone consistently performs better than using ResNet. To evaluate the performance in a variety of scenes, we also evaluated our model on the four individual sub-datasets; the results are shown in <xref ref-type="table" rid="T2">Table 2</xref>. U-Net++ performs significantly better than the state-of-the-art segmentation methods. For better illustration, the segmentation results for images in sub-dataset A1 using different models, together with the ground truth, are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. Although all three semantic segmentation methods obtain clear segmentation results on A1, U-Net++ best retains the boundary and detail information. For the relatively scarce sub-dataset A3, which contains only 27 tobacco images, the proposed method still achieves a stable IoU. For each sub-dataset, the network generates segmentation results that are almost consistent with the corresponding binary template, from both quantitative and qualitative standpoints.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Results of segmentation using ResNet50 and VGG16 as backbones in the FCN, PSPNet, U-Net, and U-Net++ architectures.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-844522-g003.tif"/>
</fig>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Segmentation results on each sub-dataset and their combination using different basic architectures.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">IoU (%)</td>
<td valign="top" align="center">All</td>
<td valign="top" align="center">A1</td>
<td valign="top" align="center">A2</td>
<td valign="top" align="center">A3</td>
<td valign="top" align="center">A4</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">FCN</td>
<td valign="top" align="center">93.95</td>
<td valign="top" align="center">93.45</td>
<td valign="top" align="center">89.17</td>
<td valign="top" align="center">88.51</td>
<td valign="top" align="center">92.23</td>
</tr>
<tr>
<td valign="top" align="left">PSPNet</td>
<td valign="top" align="center">90.17</td>
<td valign="top" align="center">94.34</td>
<td valign="top" align="center">90.55</td>
<td valign="top" align="center">91.19</td>
<td valign="top" align="center">93.83</td>
</tr>
<tr>
<td valign="top" align="left">U-Net</td>
<td valign="top" align="center">98.32</td>
<td valign="top" align="center">98.51</td>
<td valign="top" align="center">97.76</td>
<td valign="top" align="center">94.72</td>
<td valign="top" align="center">97.17</td>
</tr>
<tr>
<td valign="top" align="left">U-Net++</td>
<td valign="top" align="center">99.11</td>
<td valign="top" align="center">98.29</td>
<td valign="top" align="center">97.98</td>
<td valign="top" align="center">95.90</td>
<td valign="top" align="center">97.23</td>
</tr>
</tbody>
</table></table-wrap>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Comparing segmentation results on the same RGB image.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-844522-g004.tif"/>
</fig>
<p>During the training for segmentation, the sigmoid function produces outputs in the range [0, 1]. When calculating the loss, a greater weight is assigned to the boundary pixels. The weight map is calculated using</p>
<disp-formula id="S3.E14">
<label>(14)</label>
<mml:math id="M14">
<mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo rspace="5.8pt" stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo rspace="5.8pt" stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="5.8pt">+</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>&#x22C5;</mml:mo>
<mml:mtext>exp</mml:mtext>
</mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2062;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x03C3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>w</italic><sub><italic>c</italic></sub>(x) is the category weight based on the frequency of occurrence of each category in the training dataset, <italic>d</italic><sub>1</sub>(x) is the distance from pixel x to the nearest leaf boundary, and <italic>d</italic><sub>2</sub>(x) is the distance to the second nearest leaf boundary. In our work, we set the parameter &#x03C3; to 0.5 to obtain the segmentation weight map. The segmentation results using our method on the different sub-datasets are shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. Our model generates segmentation results that are almost visually coincident with the ground-truth mask for each sub-dataset. For the A3 sub-dataset, which contains only 27 tobacco images with small leaf areas, our method still produces stable segmentation results. The results show that our method effectively handles segmentation in various scenes, i.e., with occlusions, small leaf areas, and large leaf areas, demonstrating good robustness.</p>
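<p>As a rough illustration of this weight map, the sketch below evaluates w(x) = w<sub>c</sub>(x) + w<sub>0</sub> exp(&#x2212;(d<sub>1</sub>(x) + d<sub>2</sub>(x))<sup>2</sup>/(2&#x03C3;<sup>2</sup>)) on small arrays. The value of w<sub>0</sub> and the category weights are illustrative assumptions (the text only specifies &#x03C3; = 0.5), and the brute-force distance computation approximates d<sub>1</sub> and d<sub>2</sub> by the distances to the two nearest leaf instances.</p>

```python
import numpy as np

def dist_to_instance(inst):
    """Euclidean distance from every pixel to the nearest pixel of instance
    `inst` (brute force; fine for small illustrative arrays)."""
    ys, xs = np.nonzero(inst)
    pts = np.stack([ys, xs], axis=1).astype(float)          # instance pixel coords
    h, w = inst.shape
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"),
                    axis=-1).astype(float)                   # (h, w, 2) pixel coords
    d = np.sqrt(((grid[:, :, None, :] - pts[None, None, :, :]) ** 2).sum(-1))
    return d.min(axis=2)                                     # nearest instance pixel

def unet_weight_map(instance_masks, w0=10.0, sigma=0.5, wc_fg=1.0, wc_bg=1.0):
    """Weight map w(x) = w_c(x) + w0 * exp(-(d1(x)+d2(x))^2 / (2 sigma^2)),
    with d1/d2 the distances to the two nearest instances."""
    dists = np.stack([dist_to_instance(m) for m in instance_masks])
    dists.sort(axis=0)                                       # ascending per pixel
    d1 = dists[0]
    d2 = dists[1] if len(instance_masks) > 1 else dists[0]
    fg = np.any(np.stack(instance_masks).astype(bool), axis=0)
    wc = np.where(fg, wc_fg, wc_bg)                          # per-category weight
    return wc + w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))
```

The exponential term is largest where two leaves nearly touch (both distances small), which is exactly where boundary pixels need the extra loss weight.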
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Segmentation result for each sub-dataset, with the corresponding IoU provided at the right.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-844522-g005.tif"/>
</fig>
<p>We also compared the convergence rates of the different segmentation models. The curves of precision, recall, training cross-entropy (CE) loss, and IoU are shown in <xref ref-type="fig" rid="F6">Figure 6</xref>. The figure shows that all four networks using VGG16 as the encoder for feature extraction consistently achieve good IoU scores. In addition, <xref ref-type="fig" rid="F7">Figure 7</xref> visualises the feature extraction process of our method using U-Net++ with VGG from the early to late epochs. Feature extraction proceeds smoothly and converges quickly, showing that VGG captures meaningful representations of leaf images.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>Convergence curves for accuracy, loss, and IoU score on the validation set during the training process for comparison in terms of accuracy and convergence rate.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-844522-g006.tif"/>
</fig>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p>Visualization of the feature extraction process of our method, arranged as a time series from the early to late epochs. The first to third rows respectively show the predicted images, ground-truth images, and transformed RGB images.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-844522-g007.tif"/>
</fig>
<p>We also compared the proposed segmentation model with another state-of-the-art method for plant (foreground) segmentation, SRGB (<xref ref-type="bibr" rid="B1">Aich and Stavness, 2017</xref>), using three metrics, i.e., Precision, Recall, and IoU; the results are shown in <xref ref-type="table" rid="T3">Table 3</xref>. Our method outperforms SRGB on both reported metrics and achieves a high IoU. The results suggest that our approach is highly effective for the plant segmentation task in plant phenotyping.</p>
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Comparison with SRGB on each sub-dataset and their combination in terms of Precision, Recall, and IoU.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center" colspan="5">SRGB<hr/></td>
<td valign="top" align="center" colspan="5">Ours<hr/></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">All</td>
<td valign="top" align="center">A1</td>
<td valign="top" align="center">A2</td>
<td valign="top" align="center">A3</td>
<td valign="top" align="center">A4</td>
<td valign="top" align="center">All</td>
<td valign="top" align="center">A1</td>
<td valign="top" align="center">A2</td>
<td valign="top" align="center">A3</td>
<td valign="top" align="center">A4</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Precision</td>
<td valign="top" align="center">0.92</td>
<td valign="top" align="center">0.98</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">0.80</td>
<td valign="top" align="center">0.96</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.99</td>
</tr>
<tr>
<td valign="top" align="left">Recall</td>
<td valign="top" align="center">0.97</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">0.98</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.98</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.99</td>
</tr>
<tr>
<td valign="top" align="left">IoU</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.98</td>
<td valign="top" align="center">0.98</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.98</td>
<td valign="top" align="center">0.98</td>
</tr>
</tbody>
</table></table-wrap>
</sec>
<sec id="S3.SS3.SSS2">
<title>Leaf Count Evaluations</title>
<p>In the second experiment, we evaluated the effectiveness of the proposed leaf counting method using the segmented binary mask (referred to as RGB+SBM). During the experiment, the number of input channels must be consistent with the input size of the backbone models, i.e., 3 channels. Thus, when a single-channel binary image is fed into the model, its values are duplicated across three channels to form a three-channel image. The resulting three-channel images are mixed with the RGB image samples to increase the number of training samples, improving the stability of leaf counting. To validate the effectiveness of our counting model, we adopted different backbones for the leaf counting task, e.g., MobileNet, VGGNet, InceptionNet, and ResNet, and report the results in <xref ref-type="table" rid="T4">Table 4</xref>. Moreover, to further explore the potential benefit of the auxiliary binary mask, we conducted an ablation experiment with and without the binary channel; the result is also shown in <xref ref-type="table" rid="T4">Table 4</xref>, where RGB denotes the method without the binary mask and RGB+SBM denotes our method with the auxiliary binary mask. The table shows that the counting model with the ResNet50 backbone performs best among the backbones. The binary mask improves the counting performance on all metrics: the MSE drops from 0.89 to 0.42, the DiC changes from &#x2013;0.12 to 0.11, and the ADiC drops from 0.60 to 0.36. These results validate our assumption that the binary mask improves the accuracy and robustness of the leaf counting model, owing to its capability to deal with background interference.</p>
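<p>The channel-duplication step described above can be sketched as follows (a minimal illustration; the function names are our own, not from the published code):</p>

```python
import numpy as np

def binary_to_three_channel(mask):
    """Duplicate a single-channel binary mask (H, W) into a 3-channel image
    so it matches the backbone's expected RGB input shape."""
    return np.repeat(mask[..., None], 3, axis=2)

def mix_with_rgb(rgb_images, binary_masks):
    """Augment the RGB training set with 3-channel copies of the masks."""
    return list(rgb_images) + [binary_to_three_channel(m) for m in binary_masks]
```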
<table-wrap position="float" id="T4">
<label>TABLE 4</label>
<caption><p>Counting results using different backbones with or without the auxiliary binary mask on CVPPP 2017 dataset (Bold values denote the best performance).</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Metric</td>
<td valign="top" align="center">DiC</td>
<td valign="top" align="center">ADiC</td>
<td valign="top" align="center">MSE</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="4">&#x2006;<bold>Mobilenet</bold></td>
</tr>
<tr>
<td valign="top" align="left">RGB</td>
<td valign="top" align="center">&#x2013;0.30</td>
<td valign="top" align="center">0.66</td>
<td valign="top" align="center">0.98</td>
</tr>
<tr>
<td valign="top" align="left">RGB+SBM</td>
<td valign="top" align="center">0.13</td>
<td valign="top" align="center">0.46</td>
<td valign="top" align="center">0.64</td>
</tr>
<tr>
<td valign="top" align="left" colspan="4">&#x2006;<bold>InceptionNet</bold></td>
</tr>
<tr>
<td valign="top" align="left">Rgb</td>
<td valign="top" align="center">0.29</td>
<td valign="top" align="center">0.61</td>
<td valign="top" align="center">1.20</td>
</tr>
<tr>
<td valign="top" align="left">RGB+SBM</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.54</td>
</tr>
<tr>
<td valign="top" align="left" colspan="4">&#x2006;<bold>VGGNet</bold></td>
</tr>
<tr>
<td valign="top" align="left">RGB</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.79</td>
<td valign="top" align="center">1.44</td>
</tr>
<tr>
<td valign="top" align="left">RGB+SBM</td>
<td valign="top" align="center">&#x2013;0.12</td>
<td valign="top" align="center">0.37</td>
<td valign="top" align="center">0.44</td>
</tr>
<tr>
<td valign="top" align="left" colspan="4">&#x2006;<bold>ResNet50</bold></td>
</tr>
<tr>
<td valign="top" align="left">RGB</td>
<td valign="top" align="center">&#x2013;0.12</td>
<td valign="top" align="center">0.60</td>
<td valign="top" align="center">0.89</td>
</tr>
<tr>
<td valign="top" align="left">RGB+SBM</td>
<td valign="top" align="center"><bold>0.11</bold></td>
<td valign="top" align="center"><bold>0.36</bold></td>
<td valign="top" align="center">0.42</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p><italic>For DiC, ADiC, and MSE, a lower value is better.</italic></p></fn>
</table-wrap-foot>
</table-wrap>
<p>We used a scatter diagram to visually illustrate the correlation between the estimated leaf numbers and their ground truth; the results, which also serve to evaluate our regression model, are shown in <xref ref-type="fig" rid="F8">Figure 8</xref>. A higher overlap between the scatter plots of the estimates and the ground truth indicates better agreement. <xref ref-type="fig" rid="F8">Figure 8</xref> shows that the binary mask significantly enhances the agreement between the ground truth and the estimates, as the error distribution of the leaf count is consistently confined within a smaller region. When the number of input samples is instead directly doubled by simple copying (referred to as RGB&#x002A;2), the performance is almost the same as with the mixture of RGB and binary mask images; however, in our experiments the time cost of using doubled RGB images is higher than that of the combination of RGB and binary mask images. Thus, we conclude that using the auxiliary binary mask to guide leaf counting is a simple but effective way to improve counting performance.</p>
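<p>The agreement visualised in the scatter plots can be quantified by the coefficient of determination; a minimal sketch of the standard R&#x00B2; computation on count vectors (our own illustration, not the published evaluation code):</p>

```python
import numpy as np

def r_squared(y_gt, y_pred):
    """Coefficient of determination between ground-truth and estimated counts:
    1 - (residual sum of squares) / (total sum of squares)."""
    y_gt = np.asarray(y_gt, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_gt - y_pred) ** 2)
    ss_tot = np.sum((y_gt - y_gt.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

A value close to 1 means the scatter points lie near the identity line, i.e., the estimated counts track the ground truth closely.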
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption><p>Comparison between the coefficient of determination in the implementation of scatter graphics, where <bold>(A)</bold> denotes using only RGB image, <bold>(B)</bold> denotes using the mixture of RGB and segmented binary mask, and <bold>(C)</bold> denotes using the double RGB images by simple copy.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-844522-g008.tif"/>
</fig>
<p>In addition, we report a quantitative comparison of our leaf counting method with state-of-the-art methods, i.e., GLC (<xref ref-type="bibr" rid="B11">Giuffrida et al., 2015</xref>), IPK (Pape and Klukas, 2015), Nottingham (<xref ref-type="bibr" rid="B33">Scharr et al., 2016</xref>), MSU (<xref ref-type="bibr" rid="B33">Scharr et al., 2016</xref>), and Wageningen (<xref ref-type="bibr" rid="B33">Scharr et al., 2016</xref>), as shown in <xref ref-type="table" rid="T5">Table 5</xref>. For a fair comparison, we used sub-datasets A1, A2, and A3 from the testing set to evaluate the counting performance. Overall, the proposed leaf counting model using the segmented binary mask achieves the best performance, with the lowest values of DiC, ADiC, and MSE. This shows that the proposed counting model estimates the number of leaves with adequate accuracy and stability.</p>
<table-wrap position="float" id="T5">
<label>TABLE 5</label>
<caption><p>Comparative evaluation of the proposed counting model with state-of-the-art methods.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center">DiC</td>
<td valign="top" align="center">ADiC</td>
<td valign="top" align="center">MSE</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">IPK</td>
<td valign="top" align="center">&#x2013;1.9 (2.7)</td>
<td valign="top" align="center">2.4 (2.1)</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">GLC</td>
<td valign="top" align="center">&#x2013;0.51 (2.02)</td>
<td valign="top" align="center">1.43 (1.51)</td>
<td valign="top" align="center">4.31</td>
</tr>
<tr>
<td valign="top" align="left">Nottingham</td>
<td valign="top" align="center">&#x2013;2.4 (2.8)</td>
<td valign="top" align="center">2.9 (2.3)</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">MSU</td>
<td valign="top" align="center">&#x2013;2.3(1.8)</td>
<td valign="top" align="center">2.4 (1.7)</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">Wageningen</td>
<td valign="top" align="center">1.5 (4.4)</td>
<td valign="top" align="center">2.5 (3.9)</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">Proposed RGB+SBM</td>
<td valign="top" align="center">0.11 (0.98)&#x2013;</td>
<td valign="top" align="center">0.36 (0.93)</td>
<td valign="top" align="center">0.42</td>
</tr>
</tbody>
</table></table-wrap>
</sec>
</sec>
</sec>
<sec id="S4" sec-type="conclusion">
<title>Conclusion</title>
<p>In this study, we address two fundamental tasks in plant phenotyping, i.e., plant segmentation and leaf counting, and propose a two-stream deep learning framework for automatically segmenting and counting leaves of various sizes and shapes in two-dimensional plant images. In the first stream, a multi-scale segmentation model using a spatial pyramid is developed to extract whole plants of different sizes and shapes, where the fine-grained details of leaves are captured by a deep feature extractor. In the second stream, a regression counting model is proposed to estimate the number of leaves without any pre-detection, where an auxiliary binary mask is introduced to enhance the counting performance by effectively alleviating the influence of complex backgrounds. Extensive experiments on a publicly available plant phenotyping dataset show that the proposed framework achieves promising performance in both plant segmentation and leaf counting, providing a reference for the automatic analysis of plants. Future work will focus on increasing the robustness of the segmentation and counting tasks to varying environments.</p>
</sec>
<sec id="S5" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: <ext-link ext-link-type="uri" xlink:href="https://www.plant-phenotyping.org/CVPPP2017">https://www.plant-phenotyping.org/CVPPP2017</ext-link>.</p>
</sec>
<sec id="S6">
<title>Author Contributions</title>
<p>XF contributed to writing the draft and designing the ideas. RZ contributed to conducting experiments. TT contributed to editing the draft. SD and QY contributed to algorithm supervision. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="pudiscl1" sec-type="disclaimer">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aich</surname> <given-names>S.</given-names></name> <name><surname>Stavness</surname> <given-names>I.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>Leaf counting with deep convolutional and deconvolutional networks</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE International Conference on Computer Vision Workshops</italic></source>, <publisher-loc>Washington, DC</publisher-loc>, <fpage>2080</fpage>&#x2013;<lpage>2089</lpage>.</citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bell</surname> <given-names>J.</given-names></name> <name><surname>Dee</surname> <given-names>H. M.</given-names></name></person-group> (<year>2019</year>). <article-title>Leaf segmentation through the classification of edges.</article-title> <source><italic>arXiv [Preprint]</italic></source> <pub-id pub-id-type="doi">10.48550/arXiv.1904.03124</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>L.-C.</given-names></name> <name><surname>Papandreou</surname> <given-names>G.</given-names></name> <name><surname>Kokkinos</surname> <given-names>I.</given-names></name> <name><surname>Murphy</surname> <given-names>K.</given-names></name> <name><surname>Yuille</surname> <given-names>A. L.</given-names></name></person-group> (<year>2017</year>). <article-title>Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs.</article-title> <source><italic>IEEE Trans. Pattern Anal. Mach. Intell.</italic></source> <volume>40</volume> <fpage>834</fpage>&#x2013;<lpage>848</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2017.2699184</pub-id> <pub-id pub-id-type="pmid">28463186</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choudhury</surname> <given-names>S. D.</given-names></name> <name><surname>Samal</surname> <given-names>A.</given-names></name> <name><surname>Awada</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>Leveraging image analysis for high-throughput plant phenotyping.</article-title> <source><italic>Front. Plant Sci.</italic></source> <volume>10</volume>:<fpage>508</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2019.00508</pub-id> <pub-id pub-id-type="pmid">31068958</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>da Silva</surname> <given-names>N. B.</given-names></name> <name><surname>Goncalves</surname> <given-names>W. N.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Regression in convolutional neural networks applied to plant leaf counting</article-title>,&#x201D; in <source><italic>Proceedings of the Anais do XV Workshop de Vis&#x00E3;o Computacional, SBC</italic></source>, <publisher-loc>S&#x00E3;o Bernardo do Campo</publisher-loc>, <fpage>49</fpage>&#x2013;<lpage>54</lpage>.</citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dhaka</surname> <given-names>V. S.</given-names></name> <name><surname>Meena</surname> <given-names>S. V.</given-names></name> <name><surname>Rani</surname> <given-names>G.</given-names></name> <name><surname>Sinwar</surname> <given-names>D.</given-names></name> <name><surname>Ijaz</surname> <given-names>K. M. F.</given-names></name> <name><surname>Wo&#x017A;niak</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>A survey of deep convolutional neural networks applied for prediction of plant leaf diseases.</article-title> <source><italic>Sensors</italic></source> <volume>21</volume>:<fpage>4749</fpage>. <pub-id pub-id-type="doi">10.3390/s21144749</pub-id> <pub-id pub-id-type="pmid">34300489</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dobrescu</surname> <given-names>A.</given-names></name> <name><surname>Giuffrida</surname> <given-names>M. V.</given-names></name> <name><surname>Tsaftaris</surname> <given-names>S. A.</given-names></name></person-group> (<year>2020</year>). <article-title>Doing more with less: a multitask deep learning approach in plant phenotyping.</article-title> <source><italic>Front. Plant Sci.</italic></source> <volume>11</volume>:<fpage>141</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2020.00141</pub-id> <pub-id pub-id-type="pmid">32256503</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dobrescu</surname> <given-names>A.</given-names></name> <name><surname>Valerio Giuffrida</surname> <given-names>M.</given-names></name> <name><surname>Tsaftaris</surname> <given-names>S. A.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>Leveraging multiple datasets for deep leaf counting</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE International Conference on Computer Vision Workshops</italic></source>, <publisher-loc>Venice</publisher-loc>, <fpage>2072</fpage>&#x2013;<lpage>2079</lpage>.</citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Girshick</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). &#x201C;<article-title>Fast r-cnn</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE International Conference on Computer Vision</italic></source>, <publisher-loc>Santiago</publisher-loc>, <fpage>1440</fpage>&#x2013;<lpage>1448</lpage>.</citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giuffrida</surname> <given-names>M. V.</given-names></name> <name><surname>Doerner</surname> <given-names>P.</given-names></name> <name><surname>Tsaftaris</surname> <given-names>S. A.</given-names></name></person-group> (<year>2018</year>). <article-title>Pheno-deep counter: a unified and versatile deep learning architecture for leaf counting.</article-title> <source><italic>Plant J.</italic></source> <volume>96</volume> <fpage>880</fpage>&#x2013;<lpage>890</lpage>. <pub-id pub-id-type="doi">10.1111/tpj.14064</pub-id> <pub-id pub-id-type="pmid">30101442</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giuffrida</surname> <given-names>M. V.</given-names></name> <name><surname>Minervini</surname> <given-names>M.</given-names></name> <name><surname>Tsaftaris</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). &#x201C;<article-title>Learning to count leaves in rosette plants</article-title>,&#x201D; in <source><italic>Proceedings of the Computer Vision Problems in Plant Phenotyping (CVPPP)</italic></source>, <role>eds</role> <person-group person-group-type="editor"><name><surname>Tsaftaris</surname> <given-names>S. A.</given-names></name> <name><surname>Scharr</surname> <given-names>H.</given-names></name> <name><surname>Pridmore</surname> <given-names>T.</given-names></name></person-group> (<publisher-loc>Swansea</publisher-loc>: <publisher-name>BMVA Press</publisher-name>), <fpage>1.1</fpage>&#x2013;<lpage>1.13</lpage>. <pub-id pub-id-type="doi">10.5244/C.29.CVPPP.1</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gomes</surname> <given-names>D. P. S.</given-names></name> <name><surname>Zheng</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>Leaf segmentation and counting with deep learning: on model certainty, test-time augmentation, trade-offs.</article-title> <source><italic>arXiv [Preprint]</italic></source> <pub-id pub-id-type="doi">10.48550/arXiv.2012.11486</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Gkioxari</surname> <given-names>G.</given-names></name> <name><surname>Doll&#x00E1;r</surname> <given-names>P.</given-names></name> <name><surname>Girshick</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>Mask R-CNN</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE International Conference on Computer Vision</italic></source>, <publisher-loc>Venice</publisher-loc>, <fpage>2961</fpage>&#x2013;<lpage>2969</lpage>.</citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). &#x201C;<article-title>Deep residual learning for image recognition</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</italic></source>, <publisher-loc>Las Vegas, NV</publisher-loc>, <fpage>770</fpage>&#x2013;<lpage>778</lpage>.</citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holmberg</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <source><italic>Targeting the Zebrafish Eye using Deep Learning-Based Image Segmentation.</italic></source> <publisher-loc>Uppsala</publisher-loc>: <publisher-name>Uppsala University Publications</publisher-name>.</citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Itzhaky</surname> <given-names>Y.</given-names></name> <name><surname>Farjon</surname> <given-names>G.</given-names></name> <name><surname>Khoroshevsky</surname> <given-names>F.</given-names></name> <name><surname>Shpigler</surname> <given-names>A.</given-names></name> <name><surname>Bar-Hillel</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <source><italic>Leaf Counting: Multiple Scale Regression and Detection using Deep CNNs.</italic></source> <publisher-loc>Beer Sheva</publisher-loc>: <publisher-name>Ben Gurion University of the Negev</publisher-name>, <fpage>328</fpage>.</citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kong</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Ren</surname> <given-names>Y.</given-names></name> <name><surname>Genchev</surname> <given-names>G. Z.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Zhao</surname> <given-names>H.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Automated yeast cells segmentation and counting using a parallel U-Net based two-stage framework.</article-title> <source><italic>OSA Continuum</italic></source> <volume>3</volume> <fpage>982</fpage>&#x2013;<lpage>992</lpage>. <pub-id pub-id-type="doi">10.1364/osac.388082</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kumar</surname> <given-names>J. P.</given-names></name> <name><surname>Domnic</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Image based leaf segmentation and counting in rosette plants.</article-title> <source><italic>Inform. Process. Agric.</italic></source> <volume>6</volume> <fpage>233</fpage>&#x2013;<lpage>246</lpage>. <pub-id pub-id-type="doi">10.1016/j.inpa.2018.09.005</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kumar</surname> <given-names>J. P.</given-names></name> <name><surname>Domnic</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Rosette plant segmentation with leaf count using orthogonal transform and deep convolutional neural network.</article-title> <source><italic>Mach. Vis. Appl.</italic></source> <volume>31</volume> <fpage>1</fpage>&#x2013;<lpage>14</lpage>.</citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kundu</surname> <given-names>N.</given-names></name> <name><surname>Rani</surname> <given-names>G.</given-names></name> <name><surname>Dhaka</surname> <given-names>V. S.</given-names></name> <name><surname>Gupta</surname> <given-names>K.</given-names></name> <name><surname>Nayak</surname> <given-names>S. C.</given-names></name> <name><surname>Verma</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>IoT and interpretable machine learning based framework for disease prediction in pearl millet.</article-title> <source><italic>Sensors</italic></source> <volume>21</volume>:<fpage>5386</fpage>. <pub-id pub-id-type="doi">10.3390/s21165386</pub-id> <pub-id pub-id-type="pmid">34450827</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuznichov</surname> <given-names>D.</given-names></name> <name><surname>Zvirin</surname> <given-names>A.</given-names></name> <name><surname>Honen</surname> <given-names>Y.</given-names></name> <name><surname>Kimmel</surname> <given-names>R.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Data augmentation for leaf segmentation and counting tasks in rosette plants</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops</italic></source>, <publisher-loc>Long Beach, CA</publisher-loc>.</citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep learning.</article-title> <source><italic>Nature</italic></source> <volume>521</volume> <fpage>436</fpage>&#x2013;<lpage>444</lpage>.</citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>Z.</given-names></name> <name><surname>Guo</surname> <given-names>W.</given-names></name></person-group> (<year>2020</year>). <article-title>Sorghum panicle detection and counting using unmanned aerial system images and deep learning.</article-title> <source><italic>Front. Plant Sci.</italic></source> <volume>11</volume>:<fpage>534853</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2020.534853</pub-id> <pub-id pub-id-type="pmid">32983210</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Long</surname> <given-names>J.</given-names></name> <name><surname>Shelhamer</surname> <given-names>E.</given-names></name> <name><surname>Darrell</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). &#x201C;<article-title>Fully convolutional networks for semantic segmentation</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</italic></source>, <publisher-loc>Boston, MA</publisher-loc>, <fpage>3431</fpage>&#x2013;<lpage>3440</lpage>.</citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>H.</given-names></name> <name><surname>Cao</surname> <given-names>Z.</given-names></name></person-group> (<year>2020</year>). <article-title>TasselNetV2+: a fast implementation for high-throughput plant counting from high-resolution RGB imagery.</article-title> <source><italic>Front. Plant Sci.</italic></source> <volume>11</volume>:<fpage>541960</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2020.541960</pub-id> <pub-id pub-id-type="pmid">33365037</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Minervini</surname> <given-names>M.</given-names></name> <name><surname>Scharr</surname> <given-names>H.</given-names></name> <name><surname>Tsaftaris</surname> <given-names>S. A.</given-names></name></person-group> (<year>2015</year>). <article-title>Image analysis: the new bottleneck in plant phenotyping [applications corner].</article-title> <source><italic>IEEE Signal Process. Mag.</italic></source> <volume>32</volume> <fpage>126</fpage>&#x2013;<lpage>131</lpage>. <pub-id pub-id-type="doi">10.1109/msp.2015.2405111</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mishra</surname> <given-names>P.</given-names></name> <name><surname>Sadeh</surname> <given-names>R.</given-names></name> <name><surname>Bino</surname> <given-names>E.</given-names></name> <name><surname>Polder</surname> <given-names>G.</given-names></name> <name><surname>Boer</surname> <given-names>M. P.</given-names></name> <name><surname>Rutledge</surname> <given-names>D. N.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Complementary chemometrics and deep learning for semantic segmentation of tall and wide visible and near-infrared spectral images of plants.</article-title> <source><italic>Comput. Electron. Agric.</italic></source> <volume>186</volume>:<fpage>106226</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2021.106226</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Montero</surname> <given-names>F.</given-names></name> <name><surname>De Juan</surname> <given-names>J.</given-names></name> <name><surname>Cuesta</surname> <given-names>A.</given-names></name> <name><surname>Brasa</surname> <given-names>A.</given-names></name></person-group> (<year>2000</year>). <article-title>Nondestructive methods to estimate leaf area in <italic>Vitis vinifera</italic> L.</article-title> <source><italic>Hort Sci.</italic></source> <volume>35</volume> <fpage>696</fpage>&#x2013;<lpage>698</lpage>. <pub-id pub-id-type="doi">10.1093/aob/mcf059</pub-id> <pub-id pub-id-type="pmid">12096800</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rahnemoonfar</surname> <given-names>M.</given-names></name> <name><surname>Sheppard</surname> <given-names>C.</given-names></name></person-group> (<year>2017</year>). <article-title>Deep count: fruit counting based on deep simulated learning.</article-title> <source><italic>Sensors</italic></source> <volume>17</volume>:<fpage>905</fpage>. <pub-id pub-id-type="doi">10.3390/s17040905</pub-id> <pub-id pub-id-type="pmid">28425947</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ren</surname> <given-names>M.</given-names></name> <name><surname>Zemel</surname> <given-names>R. S.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>End-to-end instance segmentation with recurrent attention</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</italic></source>, <publisher-loc>Honolulu, HI</publisher-loc>, <fpage>6656</fpage>&#x2013;<lpage>6664</lpage>.</citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Romera-Paredes</surname> <given-names>B.</given-names></name> <name><surname>Torr</surname> <given-names>P. H. S.</given-names></name></person-group> (<year>2016</year>). &#x201C;<article-title>Recurrent instance segmentation</article-title>,&#x201D; in <source><italic>Proceedings of the European Conference on Computer Vision</italic></source>, (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>312</fpage>&#x2013;<lpage>329</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-46466-4_19</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ronneberger</surname> <given-names>O.</given-names></name> <name><surname>Fischer</surname> <given-names>P.</given-names></name> <name><surname>Brox</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). &#x201C;<article-title>U-Net: convolutional networks for biomedical image segmentation</article-title>,&#x201D; in <source><italic>International Conference on Medical Image Computing and Computer-Assisted Intervention</italic></source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>234</fpage>&#x2013;<lpage>241</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-24574-4_28</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scharr</surname> <given-names>H.</given-names></name> <name><surname>Minervini</surname> <given-names>M.</given-names></name> <name><surname>French</surname> <given-names>A.</given-names></name> <name><surname>Klukas</surname> <given-names>P. C.</given-names></name> <name><surname>Kramer</surname> <given-names>D. M.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Leaf segmentation in plant phenotyping: a collation study.</article-title> <source><italic>Mach. Vis. Appl.</italic></source> <volume>27</volume> <fpage>585</fpage>&#x2013;<lpage>606</lpage>. <pub-id pub-id-type="doi">10.1007/s00138-015-0737-3</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Siebner</surname> <given-names>H.</given-names></name> <name><surname>Callicott</surname> <given-names>J.</given-names></name> <name><surname>Sommer</surname> <given-names>T.</given-names></name> <name><surname>Mattay</surname> <given-names>V.</given-names></name></person-group> (<year>2009</year>). <article-title>From the genome to the phenome and back: linking genes with human brain function and structure using genetically informed neuroimaging.</article-title> <source><italic>Neuroscience</italic></source> <volume>164</volume> <fpage>1</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroscience.2009.09.009</pub-id> <pub-id pub-id-type="pmid">19751805</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tassis</surname> <given-names>L. M.</given-names></name> <name><surname>de Souza</surname> <given-names>J. E. T.</given-names></name> <name><surname>Krohling</surname> <given-names>R. A.</given-names></name></person-group> (<year>2021</year>). <article-title>A deep learning approach combining instance and semantic segmentation to identify diseases and pests of coffee leaves from in-field images.</article-title> <source><italic>Comput. Electron. Agric.</italic></source> <volume>186</volume>:<fpage>106191</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2021.106191</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsaftaris</surname> <given-names>S. A.</given-names></name> <name><surname>Minervini</surname> <given-names>M.</given-names></name> <name><surname>Scharr</surname> <given-names>H.</given-names></name></person-group> (<year>2016</year>). <article-title>Machine learning for plant phenotyping needs image processing.</article-title> <source><italic>Trends Plant Sci.</italic></source> <volume>21</volume> <fpage>989</fpage>&#x2013;<lpage>991</lpage>. <pub-id pub-id-type="doi">10.1016/j.tplants.2016.10.002</pub-id> <pub-id pub-id-type="pmid">27810146</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ubbens</surname> <given-names>J.</given-names></name> <name><surname>Cieslak</surname> <given-names>M.</given-names></name> <name><surname>Prusinkiewicz</surname> <given-names>P.</given-names></name> <name><surname>Stavness</surname> <given-names>I.</given-names></name></person-group> (<year>2018</year>). <article-title>The use of plant models in deep learning: an application to leaf counting in rosette plants.</article-title> <source><italic>Plant Methods</italic></source> <volume>14</volume> <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1186/s13007-018-0273-z</pub-id> <pub-id pub-id-type="pmid">29375647</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ward</surname> <given-names>D.</given-names></name> <name><surname>Moghadam</surname> <given-names>P.</given-names></name> <name><surname>Hudson</surname> <given-names>N.</given-names></name></person-group> (<year>2018</year>). <article-title>Deep leaf segmentation using synthetic data.</article-title> <source><italic>arXiv [Preprint]</italic></source> <pub-id pub-id-type="doi">10.48550/arXiv.1807.10931</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>W.</given-names></name> <name><surname>Feng</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Doonan</surname> <given-names>J. H.</given-names></name> <name><surname>Batchelor</surname> <given-names>W. D.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Crop phenomics and high-throughput phenotyping: past decades, current challenges, and future perspectives.</article-title> <source><italic>Mol. Plant</italic></source> <volume>13</volume> <fpage>187</fpage>&#x2013;<lpage>214</lpage>. <pub-id pub-id-type="doi">10.1016/j.molp.2020.01.008</pub-id> <pub-id pub-id-type="pmid">31981735</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>H.</given-names></name> <name><surname>Shi</surname> <given-names>J.</given-names></name> <name><surname>Qi</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Jia</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>Pyramid scene parsing network</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</italic></source>, <publisher-loc>Honolulu, HI</publisher-loc>, <fpage>2881</fpage>&#x2013;<lpage>2890</lpage>.</citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Z.</given-names></name> <name><surname>Siddiquee</surname> <given-names>M. M. R.</given-names></name> <name><surname>Tajbakhsh</surname> <given-names>N.</given-names></name> <name><surname>Liang</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). &#x201C;<article-title>UNet++: a nested U-Net architecture for medical image segmentation</article-title>,&#x201D; in <source><italic>Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support</italic></source>, <role>eds</role> <person-group person-group-type="editor"><name><surname>Maier-Hein</surname> <given-names>L.</given-names></name> <name><surname>Syeda-Mahmood</surname> <given-names>T.</given-names></name> <name><surname>Taylor</surname> <given-names>Z.</given-names></name> <name><surname>Lu</surname> <given-names>Z.</given-names></name> <name><surname>Stoyanov</surname> <given-names>D.</given-names></name> <name><surname>Madabhushi</surname> <given-names>A.</given-names></name><etal/></person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>3</fpage>&#x2013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-00889-5_1</pub-id> <pub-id pub-id-type="pmid">32613207</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>Aoun</surname> <given-names>M.</given-names></name> <name><surname>Krijn</surname> <given-names>M.</given-names></name> <name><surname>Vanschoren</surname> <given-names>J.</given-names></name> <name><surname>Campus</surname> <given-names>H. T.</given-names></name></person-group> (<year>2018</year>). &#x201C;<article-title>Data augmentation using conditional generative adversarial networks for leaf counting in Arabidopsis plants</article-title>,&#x201D; in <source><italic>Proceedings of the British Machine Vision Conference (BMVC)</italic></source>, <publisher-loc>Newcastle</publisher-loc>, <fpage>324</fpage>.</citation></ref>
</ref-list>
</back>
</article>