<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Plant Sci.</journal-id>
<journal-title>Frontiers in Plant Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Plant Sci.</abbrev-journal-title>
<issn pub-type="epub">1664-462X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpls.2022.874035</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Plant Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Detection and Localization of Tip-Burn on Large Lettuce Canopies</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Franchetti</surname> <given-names>Benjamin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Pirri</surname> <given-names>Fiora</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1669108/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Agricola Moderna</institution>, <addr-line>Milan</addr-line>, <country>Italy</country></aff>
<aff id="aff2"><sup>2</sup><institution>Alcor Lab, DIAG, Sapienza University of Rome</institution>, <addr-line>Rome</addr-line>, <country>Italy</country></aff>
<aff id="aff3"><sup>3</sup><institution>Deep Plants</institution>, <addr-line>Rome</addr-line>, <country>Italy</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Shawn Carlisle Kefauver, University of Barcelona, Spain</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Jianjun Du, Beijing Research Center for Information Technology in Agriculture, China; Ana Mar&#x000ED;a Mendez-Espinoza, Instituto de Investigaciones Agropecuarias, Chile; Kun Li, Chinese Academy of Agricultural Sciences (CAAS), China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Fiora Pirri <email>pirri&#x00040;diag.uniroma1.it</email>; <email>fiora&#x00040;deepplants.com</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Technical Advances in Plant Science, a section of the journal Frontiers in Plant Science</p></fn></author-notes>
<pub-date pub-type="epub">
<day>12</day>
<month>05</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>874035</elocation-id>
<history>
<date date-type="received">
<day>11</day>
<month>02</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>03</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Franchetti and Pirri.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Franchetti and Pirri</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Recent years have seen an increased effort in the detection of plant stresses and diseases using non-invasive sensors and deep learning methods. Nonetheless, no studies have addressed dense plant canopies, owing to the difficulty of automatically zooming into each plant, especially in outdoor conditions. Zooming in and out is necessary to focus on the plant stress and to precisely localize it within the canopy for further analysis and intervention. This work concentrates on tip-burn, a plant stress affecting lettuce grown in controlled environmental conditions, such as in plant factories. We present a new method for tip-burn stress detection and localization, combining classification and self-supervised segmentation to detect, localize, and closely segment the stressed regions. Starting with images of a dense canopy containing about 1,000 plants, the proposed method is able to zoom into the tip-burn region of a single plant, covering less than 1/10th of the plant itself. The method is crucial for replacing the manual phenotyping currently required in plant factories. The precise localization of the stress within the plant, of the plant within the tray, and of the tray within the table canopy makes it possible to automatically deliver statistics and causal annotations. We have tested our method on different datasets that provide no ground-truth segmentation masks, either for the leaves or for the stresses, which makes the self-supervised segmentation results all the more notable. Results show that both classification and self-supervised segmentation reach new, effective levels of accuracy. Finally, the dataset used for training, testing, and validation is available on demand.</p>
</abstract>
<kwd-group>
<kwd>tip-burn detection and localization</kwd>
<kwd>self supervised segmentation</kwd>
<kwd>plant disease classification</kwd>
<kwd>segmentation of large canopies</kwd>
<kwd>indoor farming</kwd>
</kwd-group>
<counts>
<fig-count count="9"/>
<table-count count="2"/>
<equation-count count="11"/>
<ref-count count="75"/>
<page-count count="15"/>
<word-count count="10941"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Plant stress detection is a long-standing research field and, among the stresses, tip-burn affecting lettuce in particular has been intensively studied; see, for example, Termohlen and Hoeven (<xref ref-type="bibr" rid="B59">1965</xref>), Lutman (<xref ref-type="bibr" rid="B33">1919</xref>), Cox et al. (<xref ref-type="bibr" rid="B15">1976</xref>), and Gozzovelli et al. (<xref ref-type="bibr" rid="B21">2021</xref>).</p>
<p>Nowadays, the combination of new methods arising from computer vision and deep learning, the availability of new low-cost sensors together with increased attention on the transparency, quality, and healthiness of the farm to fork process is making plant stress analysis a challenging research topic.</p>
<p>Classification of plant diseases is becoming a relevant topic thanks to a number of new datasets, such as PlantLeaves (Chouhan et al., <xref ref-type="bibr" rid="B13">2019</xref>), PlantsDoc (Singh et al., <xref ref-type="bibr" rid="B51">2020</xref>), PlantsVillage (Hughes and Salathe, <xref ref-type="bibr" rid="B26">2016</xref>), Plantae-K (Vippon Preet Kour, <xref ref-type="bibr" rid="B61">2019</xref>), Cassava (Mwebaze et al., <xref ref-type="bibr" rid="B38">2019</xref>), and Citrus leaves (Rauf et al., <xref ref-type="bibr" rid="B45">2019</xref>), made available as TensorFlow datasets at <italic>tensorflow.org</italic>. Examples from these datasets are shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. These new datasets and their ease of access have spurred research, improving deep learning models for stress detection applications.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Images from the plant disease classification datasets: Cassava (Mwebaze et al., <xref ref-type="bibr" rid="B38">2019</xref>), Citrus leaves (Rauf et al., <xref ref-type="bibr" rid="B45">2019</xref>), PlantLeaves (Chouhan et al., <xref ref-type="bibr" rid="B13">2019</xref>), and PlantVillage (Hughes and Salathe, <xref ref-type="bibr" rid="B26">2016</xref>). The images clearly illustrate the difference with the proposed task of stress detection on large canopies. <bold>(A)</bold> Cassava, <bold>(B)</bold> Citrus leaves, <bold>(C)</bold> PlantLeaves, and <bold>(D)</bold> PlantVillage.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-874035-g0001.tif"/>
</fig>
<p>A limitation of the currently available datasets is their inadequacy for stress analysis in Controlled Environment Agriculture (CEA), and specifically in plant factories, where plants are grown indoors under artificial lights, densely packed together, and stacked on multiple layers. In such high-density growing conditions, the plants are compacted on tables of trays, and stress problems need to be studied from this specific perspective, as shown in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Large canopies of plants grown in plant factories <bold>(A)</bold>. <bold>(B)</bold> shows operators inspecting the canopy on the rolling tables to visually detect tip-burn.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-874035-g0002.tif"/>
</fig>
<p>The detection and localization of stress in plant factories must deal with complex surfaces agglomerating several plants, where the shape of a single leaf is not specifically relevant, while stresses such as tip-burn occur on the leaf tip. Moreover, plants affected by tip-burn are typically few, sparse, and hidden in the canopy of other healthy leaves. The underlying cause of tip-burn is a lack of calcium intake by the plants. This, however, results from multiple factors, such as lack of airflow, high humidity, excessive lighting, and inadequate watering and nutrient supply. A key advantage of growing plants indoors is the possibility of controlling all aspects of plant growth, including the light recipe and climate, thereby providing the optimal mix of conditions for plant development and quality. However, high-density crop production, limited dimensions, lack of natural ventilation, and the need for artificial lighting for photosynthesis make plants grown in plant factories especially vulnerable to tip-burn. Consequently, tip-burn has become a metric for plant health, and being able to monitor its onset is extremely relevant in indoor growing conditions. By automatically detecting tip-burn, the vertical farm control software can adjust the growing recipes in real time to provide the plants with optimal growing conditions.</p>
<p>In this work, we propose a novel model for tip-burn detection in lettuce that fills the gap between already explored deep learning techniques for plant stress detection and their practical implementation in plant factories. Our work includes the creation of an adequate dataset made of real and generated images. To emphasize the generality of our contribution, we have also tested our model on PlantLeaves (Chouhan et al., <xref ref-type="bibr" rid="B13">2019</xref>), PlantsVillage (Hughes and Salathe, <xref ref-type="bibr" rid="B26">2016</xref>), and Citrus leaves (Rauf et al., <xref ref-type="bibr" rid="B45">2019</xref>), and compared it with other works whose results represent the state of the art.</p>
</sec>
<sec id="s2">
<title>2. Related Works on Disease Detection</title>
<p>Plant disease detection is nowadays a thriving research field, collecting methods and studies on a wide diversity of diseases, crops, plant species, conditions, and contexts. In particular, most recent studies are based on deep learning methods, yet they consider different cameras and datasets.</p>
<p><bold>Disease detection</bold>. A number of approaches are based on dedicated sensors, such as hyperspectral cameras, or generate their own datasets. For example, Nagasubramanian et al. (<xref ref-type="bibr" rid="B39">2018</xref>) studied charcoal rot disease identification in soybean leaves by implementing a 3D Deep-CNN on data collected by a hyperspectral camera. Zhang et al. (<xref ref-type="bibr" rid="B71">2019</xref>) carried out a similar study using high-resolution hyperspectral images to detect the presence of yellow rust in winter wheat. Refer to Terentev et al. (<xref ref-type="bibr" rid="B58">2022</xref>) for a recent overview of hyperspectral approaches.</p>
<p>On the other hand, the publicly available datasets designed for disease classification, such as those introduced above, have played a crucial role in most of the deep learning methods.</p>
<p>Approaches exploiting the publicly available datasets have obtained very high classification accuracy. For example, Agarwal et al. (<xref ref-type="bibr" rid="B5">2020</xref>) trained a CNN on tomato leaf images taken from the PlantVillage dataset, obtaining 91.20% accuracy on 10 classes of diseases, while on the same set of tomato classes, Abbas et al. (<xref ref-type="bibr" rid="B2">2021</xref>) obtained 97.11% accuracy with DenseNet121 &#x0002B; synthetic images.</p>
<p>Patidar et al. (<xref ref-type="bibr" rid="B43">2020</xref>) obtained 95.38% accuracy in disease classification on the Rice Leaf Disease Dataset (Prajapati et al., <xref ref-type="bibr" rid="B44">2017</xref>) from the UCI Machine Learning Repository. Mishra et al. (<xref ref-type="bibr" rid="B35">2020</xref>) achieved 88.46% accuracy on corn plant disease detection while obtaining real-time performance with a deep model capable of running on smart devices. Saleem et al. (<xref ref-type="bibr" rid="B47">2020</xref>) experimented with a number of deep networks on the PlantVillage dataset, proposing a comparative evaluation of multiple CNNs and optimizers for plant disease classification in order to find the combination with the best performance, obtaining competitive results. Sharma et al. (<xref ref-type="bibr" rid="B48">2020</xref>) obtained 98.6% accuracy on PlantVillage by manually segmenting a subset of the images. Hassan and Maji (<xref ref-type="bibr" rid="B24">2022</xref>) obtained significant results on three datasets: 99.39% on PlantVillage, 99.66% on Rice, and 76.59% on the imbalanced Cassava dataset. Syed-Ab-Rahman et al. (<xref ref-type="bibr" rid="B57">2022</xref>) obtained 94.37% detection accuracy and an average precision of 95.8% on the Citrus leaves dataset, distinguishing between three citrus diseases, namely citrus black spot, citrus bacterial canker, and Huanglongbing.</p>
<p>Overall, results on the publicly available datasets are saturating toward superhuman performance, showing that disease detection research needs to take new steps.</p>
<p>Other deep learning approaches based on digital images have experimented with their own datasets, for example, DeChant et al. (<xref ref-type="bibr" rid="B16">2017</xref>) and Shrivastava et al. (<xref ref-type="bibr" rid="B50">2019</xref>). DeChant et al. (<xref ref-type="bibr" rid="B16">2017</xref>) consider the classification of Northern Leaf Blight in maize plants, taking images of leaves in the field, while Shrivastava et al. (<xref ref-type="bibr" rid="B50">2019</xref>) studied the strength of transfer learning for the identification of three different rice plant diseases. Recent reviews of computer vision and machine learning methods for disease detection are given in Barbedo (<xref ref-type="bibr" rid="B8">2019a</xref>), Abade et al. (<xref ref-type="bibr" rid="B1">2020</xref>), and Lu et al. (<xref ref-type="bibr" rid="B32">2021</xref>).</p>
<p><bold>Large canopies and tip-burn studies</bold>. Tip-burn studies date back decades (Lutman, <xref ref-type="bibr" rid="B33">1919</xref>; Termohlen and Hoeven, <xref ref-type="bibr" rid="B59">1965</xref>; Cox and McKee, <xref ref-type="bibr" rid="B14">1976</xref>), essentially exploring causes induced by a lack of nutrient absorption, as in Son and Takakura (<xref ref-type="bibr" rid="B53">1989</xref>) and Watchareeruetai et al. (<xref ref-type="bibr" rid="B63">2018</xref>). As far as we know, only Shimamura et al. (<xref ref-type="bibr" rid="B49">2019</xref>) conducted tip-burn identification in plant factories, using GoogLeNet for binary classification of single lettuce images. They check, from manually collected images of single plants, whether each plant has tip-burn.</p>
<p>Similarly, in Gozzovelli et al. (<xref ref-type="bibr" rid="B21">2021</xref>), a dataset for tip-burn detection on large dense canopies of indoor-grown plants is generated with specific attention to data imbalance. To cope with the imbalance, a large amount of data was generated with Wasserstein Generative Adversarial Networks (GANs) and verified using the realism score of Kynk&#x000E4;&#x000E4;nniemi et al. (<xref ref-type="bibr" rid="B30">2019</xref>). Classification was performed with a two-class classifier architecture inspired by DarkNet-19, the YOLOv2 backbone (Redmon and Farhadi, <xref ref-type="bibr" rid="B46">2016</xref>), while the tip-burn region was identified by preparing a ground truth with a conditional random field, further generalized with a U-Net (Noh et al., <xref ref-type="bibr" rid="B40">2015</xref>).</p>
<p>GANs were already used in Giuffrida et al. (<xref ref-type="bibr" rid="B20">2017</xref>) to generate Arabidopsis leaf images using the number of leaves as the label. Similarly to Gozzovelli et al. (<xref ref-type="bibr" rid="B21">2021</xref>), Douarre et al. (<xref ref-type="bibr" rid="B19">2019</xref>) explore segmentation of apple scab at the canopy level, augmenting the segmentation training set with conditional GANs.</p>
<p><bold>Plants stress and disease segmentation</bold>. Segmentation for enhancing plant stress and disease detection has been explored since the works of Tian and Li (<xref ref-type="bibr" rid="B60">2004</xref>) and Zhang and Wang (<xref ref-type="bibr" rid="B70">2007</xref>). Most methods, even recently, tend to use image processing techniques, such as filtering, thresholding, Gaussian mixtures, and color transforms, to segment the disease or part of the leaf. Barbedo (<xref ref-type="bibr" rid="B7">2017</xref>) noted that when disease symptoms show a difference in color with respect to surrounding areas, ROI segmentation can be easily exploited. This observation has led to the study of the improvements in disease classification brought by segmentation. This was indeed the choice in Gozzovelli et al. (<xref ref-type="bibr" rid="B21">2021</xref>) and Sharma et al. (<xref ref-type="bibr" rid="B48">2020</xref>), although in the latter segmentation is done manually. A leaf-segmented version of PlantVillage is used by Abdu et al. (<xref ref-type="bibr" rid="B3">2018</xref>) to introduce an automatic extended region of interest (EROI) algorithm for simplified detection. The segmentation of the disease is obtained by thresholding, while leaf segmentation is not treated and segmented leaf images are provided as a dataset. Following Abdu et al. (<xref ref-type="bibr" rid="B3">2018</xref>), Abdu et al. (<xref ref-type="bibr" rid="B4">2019</xref>) provide an extended EROI version to study individual diseased segments, still based on a segmented version of PlantVillage, provided as a dataset.</p>
<p>In Douarre et al. (<xref ref-type="bibr" rid="B19">2019</xref>), the authors segment a canopy of apple leaves, extending the manual training set with cGAN-generated images (Mirza and Osindero, <xref ref-type="bibr" rid="B34">2014</xref>). Sodjinou et al. (<xref ref-type="bibr" rid="B52">2021</xref>) propose a segmentation method to separate plants and weeds, based on an initial semi-manual preprocessing step using cropping and thresholding; U-Net semantic segmentation then refines the result, which is finally post-processed with a subtractive clustering algorithm.</p>
<p>As a matter of fact, despite the observation of Barbedo (<xref ref-type="bibr" rid="B7">2017</xref>), better and more general results can be obtained using deep learning methods that do not rely on specific image processing practices to produce a segmentation result, as shown in other application fields.</p>
<p><bold>Weakly Self Supervised segmentation</bold>. As far as we know, no method has so far explored self-supervised segmentation of plant diseases based on class annotation only. Our work is the first to provide both the leaf segmentation (for PlantLeaves, PlantVillage, and Citrus Leaves) and the tip-burn stress segmentation without any manual annotation of pixel labels for segmentation.</p>
<p>We recall that weakly self-supervised segmentation (WSSS) is self-supervised segmentation using only image-level annotation. This means that only the information about the category in the image (e.g., &#x0201C;diseased&#x0201D; or &#x0201C;healthy&#x0201D;) is used to segment the object(s) of interest. Namely, the method predicts a pseudo-label mask of the objects belonging to the class of interest, relying only on the image class label. Recent research has dedicated significant attention to the problem, introducing new methods based on weakly supervised learning, such as self-training (Zou et al., <xref ref-type="bibr" rid="B75">2018</xref>; Gu et al., <xref ref-type="bibr" rid="B23">2020</xref>; Wang et al., <xref ref-type="bibr" rid="B62">2020</xref>), domain adaptation (Pan et al., <xref ref-type="bibr" rid="B42">2020</xref>; Yang and Soatto, <xref ref-type="bibr" rid="B66">2020</xref>), noisy label learning (Xie and Huang, <xref ref-type="bibr" rid="B65">2021</xref>), and class activation maps (CAM). CAM, introduced by Zeiler and Fergus (<xref ref-type="bibr" rid="B68">2014</xref>) and Zhou et al. (<xref ref-type="bibr" rid="B73">2016</xref>), localizes the object of interest relying only on the image classes, backpropagating the class probability to layers before the logits. CAM-based methods have motivated a large number of works, such as Sun et al. (<xref ref-type="bibr" rid="B55">2020</xref>), Chan et al. (<xref ref-type="bibr" rid="B10">2021</xref>), Araslanov and Roth (<xref ref-type="bibr" rid="B6">2020</xref>), Yao and Gong (<xref ref-type="bibr" rid="B67">2020</xref>), and Wang et al. (<xref ref-type="bibr" rid="B62">2020</xref>). The method we propose in this work is WSSS using only the image class label to segment the plants&#x00027; lesions. The only available knowledge is whether the image represents a stressed or a healthy region. Our method works on domains where the task is to generate pseudo-label masks of quite small, highly deformable shapes. Although our elective application domain is large canopies of plants grown in plant factories, the method can be used for other applications, as we show by applying it to publicly available datasets.</p>
</sec>
<sec sec-type="materials and methods" id="s3">
<title>3. Materials and Methods</title>
<sec>
<title>3.1. Data Collection</title>
<p>Since tip-burn manifests on the leaf tips, it is mandatory to acquire images with a top view of the whole table. We do so by taking images with an HR digital camera fixed above the rolling table shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. A table is a base on which plants are grown. Each table is assembled from multiple trays, which in turn are further divided into multiple cells where plant seeds are placed.</p>
<p>We collected images of the tables, of size 4,640&#x000D7;6,960&#x000D7;3, using a 32.5-megapixel Canon APS-C camera located above the rolling tables (shown in <xref ref-type="fig" rid="F2">Figure 2</xref>). The whole set is made of 43 images, 30 for training and 13 for validation and testing. Images were collected during a period of tip-burn spread. As a tip-burn spot covers only about 5&#x000D7;5 pixels in the camera image, we devised a splitting process that allows us to zoom into the table image. We split the 43 images of size 4,640&#x000D7;6,960&#x000D7;3 into smaller images of size 64&#x000D7;64&#x000D7;3, with an interface we prepared for the task, and collected 2,127 images of tip-burn. We automatically selected the same number of images of healthy plants, verifying that they are healthy by correlating them with the stressed images from the same table. The images collected by splitting the original table images were then augmented to finally obtain a training set of 16,323 images of tip-burn-stressed and healthy plants, a validation set of 5,596 images, and a test set of 1,399 images.</p>
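<p>As an illustration of the splitting step described above, the following sketch (function and variable names are ours, not the authors' interface) breaks a full-table image into non-overlapping 64&#x000D7;64&#x000D7;3 crops with NumPy:</p>

```python
import numpy as np

def split_table_image(table, tile=64):
    """Split a full-table image (H x W x 3) into non-overlapping
    tile x tile x 3 crops, discarding the partial border."""
    H, W, _ = table.shape
    crops = [
        table[y:y + tile, x:x + tile]
        for y in range(0, H - tile + 1, tile)
        for x in range(0, W - tile + 1, tile)
    ]
    return np.stack(crops)

# A 4,640 x 6,960 x 3 table image yields 72 x 108 = 7,776 crops.
table = np.zeros((4640, 6960, 3), dtype=np.uint8)
patches = split_table_image(table)
print(patches.shape)  # (7776, 64, 64, 3)
```

<p>In practice, the crops selected through the interface would then be labeled as tip-burn or healthy before augmentation.</p>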
<p>For the purpose of illustrating our method on other datasets, we used the PlantVillage, PlantLeaves, and CitrusLeaves datasets available on <ext-link ext-link-type="uri" xlink:href="https://Tensorflow.org">Tensorflow.org</ext-link>.</p>
</sec>
<sec>
<title>3.2. Method</title>
<p><bold>Preliminaries</bold>. The main practical advantage of weakly-supervised semantic segmentation (WSSS) methods is that they avoid the resource-demanding manual labeling of each pixel of the categories of interest in an image, which is an impossible task for large canopies. Indeed, WSSS transforms the semantic segmentation task into the much less demanding effort of image-level class annotation. The problem is ill-conditioned and difficult, and a large literature is dedicated to its solution, starting from Zeiler and Fergus (<xref ref-type="bibr" rid="B68">2014</xref>) and Zhou et al. (<xref ref-type="bibr" rid="B73">2016</xref>) up to the most recent contributions (Chang et al., <xref ref-type="bibr" rid="B11">2020</xref>; Sun et al., <xref ref-type="bibr" rid="B55">2020</xref>, <xref ref-type="bibr" rid="B56">2022</xref>; Wang et al., <xref ref-type="bibr" rid="B62">2020</xref>; Wu et al., <xref ref-type="bibr" rid="B64">2021</xref>; Zhang et al., <xref ref-type="bibr" rid="B69">2021</xref>). Semantic segmentation is critical for detecting tip-burn on large canopies due to the difficulty of both identifying it on a dense set of plants and individually localizing each tip-burned plant within the canopy, as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. To ensure both identification and localization, we develop a new method for weakly-supervised semantic segmentation of the tip-burn stress (and of the visible disease in plant disease datasets) by defining a network pipeline using attention-based splitting, classification, and graph convolution.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>The problem: given the image of a large canopy, find all regions with tip-burn. Because tip-burn regions are very small and may be close to each other, segmentation is better than simple localization with bounding boxes. We propose a novel method for weakly supervised semantic segmentation, with only image-level class-label annotations (classification accuracy 97.3%).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-874035-g0003.tif"/>
</fig>
<p>A crucial aspect of our model is that we adopt the same classifier for both the image and the patches, suitably resized. For this idea to work, feature properties must be shared between the image of the object as a whole and the images of its sub-parts. For example, any subset of the image of a canopy shares similar features with the image as a whole; see <xref ref-type="fig" rid="F4">Figure 4</xref>, the last image of the upper strip, captioned &#x02018;input image&#x02019;. Similarly, a leaf and part of a leaf have the same feature properties. This often occurs in natural images, though it does not hold, for example, for a tree, whose trunk and crown have different features. We call this characteristic of an object's feature properties the <italic>principle of decomposition</italic>. In this work, we show that this principle is valid both for stress detection and segmentation on large canopies and for disease detection and segmentation on leaves (from the cited datasets), which is the domain of interest in this work.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Main idea of the tip-burn semantic segmentation requiring only image-level class annotation: decompose table canopy images up to an image <italic>X</italic> of size 352&#x000D7;576&#x000D7;3. Split <italic>X</italic> into overlapping patches of size 64&#x000D7;64&#x000D7;3 and use classification trained on these patches to obtain an attention map. Use the attention map as supervision for training a convolutional graph transferring probabilities on similar patches. Finally, results are automatically merged together forming the segmentation map of the canopy.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-874035-g0004.tif"/>
</fig>
<p>We consider a classification model <italic>f</italic><sub>&#x02113;</sub>(<italic>C</italic>|<italic>X, Y</italic>, &#x003B8;), where <italic>C</italic> indicates the class a sample image <italic>X</italic> belongs to, <italic>Y</italic> &#x0003D; {1, &#x02026;, <italic>c</italic>} is the vector of training labels, &#x02113; indicates the size of the images accepted by the network, and &#x003B8; are the network parameters. The classification model maps each sample <italic>X</italic> to <inline-formula><mml:math id="M1"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>p</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, which is the probability vector for the class <italic>C</italic> given <italic>X</italic>, as estimated by the softmax activation.</p>
<p>Let <italic>X</italic> be an <italic>N</italic>&#x000D7;<italic>M</italic>&#x000D7;<italic>d</italic> tensor specifying an image, and <italic>X</italic><sup>&#x022C6;</sup> be any connected sub-tensor of it of size <italic>n</italic>&#x000D7;<italic>m</italic>&#x000D7;<italic>d</italic>, with <italic>m</italic> &#x02264; <italic>M</italic> and <italic>n</italic> &#x02264; <italic>N</italic>, where connected means that the <italic>n</italic> row and <italic>m</italic> column elements chosen from <italic>X</italic> are consecutive. We say that <italic>X</italic> enjoys the principle of decomposition if, given a deep classifier <italic>f</italic><sub><italic>M</italic>&#x000D7;<italic>N</italic></sub>(<italic>C</italic>|<italic>X, Y</italic>, &#x003B8;) that correctly classifies <italic>X</italic> with probability <italic>p</italic> with respect to classes <italic>C</italic>, it correctly classifies <italic>S</italic>(<italic>X</italic><sup>&#x022C6;</sup>) with approximately the same probability <italic>p</italic>. Here, <italic>S</italic> is a suitable scaling transformation, including appropriate filtering, transforming <italic>X</italic><sup>&#x022C6;</sup> into <italic>X</italic><sup>&#x022C6;</sup>&#x02032;, which has the same size as <italic>X</italic>.</p>
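<p>A minimal numeric sketch of the scaling transformation <italic>S</italic>, using nearest-neighbour index mapping only (the appropriate filtering mentioned above is omitted; names are illustrative):</p>

```python
import numpy as np

def scale_subtensor(X, top, left, n, m):
    """A sketch of S: crop the connected n x m sub-tensor starting at
    (top, left) and resize it back to the size of X by nearest-neighbour
    index mapping (the filtering a real S would apply is omitted)."""
    Xs = X[top:top + n, left:left + m]
    N, M, _ = X.shape
    rows = np.arange(N) * n // N  # map full-size rows onto crop rows
    cols = np.arange(M) * m // M  # map full-size cols onto crop cols
    return Xs[rows][:, cols]

X = np.arange(64 * 64 * 3, dtype=float).reshape(64, 64, 3)
Xp = scale_subtensor(X, top=16, left=16, n=32, m=32)
print(Xp.shape)  # (64, 64, 3): same size as X, usable by the same classifier
```

<p>The principle of decomposition then states that the same classifier should assign <italic>X</italic> and the rescaled sub-tensor approximately the same class probability.</p>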
<p><bold>Pre-processing and classification model for tip-burn on large canopies</bold>. Tip-burn pre-processing comprises three components. The first component is splitting the canopy image into two images of size 4,640&#x000D7;6,960&#x000D7;3, each representing a half-table, then into all the sub-trays, and further each tray into 16 input images of size 352&#x000D7;576&#x000D7;3. The second component is the augmentation of the training images of size 64&#x000D7;64&#x000D7;3 by random rotation between 0 and 90 degrees, flipping up-down and left-right, color quantization to 8 colors, zooming in by scaling and cropping, zooming out by padding, and finally Gaussian blurring with random variance &#x003C3; &#x02208; (0.5, 2). The third component is the classification model: we used ResNet50V2 as backbone, trained on ImageNet 1000 and fine-tuned with global average pooling, drop-out (to introduce stochasticity in the training), and dense layers.</p>
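<p>The augmentation step can be sketched as follows, using NumPy only (the arbitrary-angle rotation, zooming, and Gaussian blur described above would use, e.g., scipy.ndimage; the uniform-bin quantization here is a simple stand-in for the 8-color quantization):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_colors(img, levels=8):
    """Uniform-bin color quantization (a stand-in for the paper's
    quantization to 8 colors)."""
    bins = np.linspace(0, 256, levels + 1)
    idx = np.clip(np.digitize(img, bins) - 1, 0, levels - 1)
    centers = (bins[:-1] + bins[1:]) / 2
    return centers[idx].astype(np.uint8)

def augment(img):
    """One random augmentation pass over a 64 x 64 x 3 uint8 patch:
    flips, right-angle rotation (the paper uses arbitrary 0-90 degree
    rotations), and color quantization."""
    if rng.random() > 0.5:
        img = img[:, ::-1]   # left-right flip
    if rng.random() > 0.5:
        img = img[::-1, :]   # up-down flip
    img = np.rot90(img, k=int(rng.integers(0, 2)))
    return quantize_colors(img)

out = augment(np.zeros((64, 64, 3), dtype=np.uint8))
print(out.shape)  # (64, 64, 3)
```

<p>Each 64&#x000D7;64&#x000D7;3 training patch would pass through such a pipeline several times to build the augmented sets reported above.</p>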
<p><bold>Pre-processing and classification model for the single leaf image datasets</bold>. For classifying the single-leaf images of datasets such as PlantVillage, CitrusLeaves, and PlantLeaves, we used the same backbone as for tip-burn. For weakly supervised semantic segmentation, on the other hand, we also used a multi-layer perceptron (MLP) to separate the background from the foreground. Profiting from the simple arrangement in these datasets of a single leaf on a plain background, we automatically sampled from each image a patch of size 8&#x000D7;8&#x000D7;3 from each corner, labeling them background, and 6 patches of the same size from the image center, labeled foreground, and gave these data to the MLP to learn to separate the background from the foreground.</p>
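For illustration, the corner/center sampling just described might be sketched as below. The helper name is ours, and the exact placement of the six center patches is our assumption, since the text only specifies "from the image center".

```python
import numpy as np

def sample_bg_fg(img, p=8):
    """Sample four p x p x 3 corner patches (labeled background) and six
    patches around the image center (labeled foreground), as training data
    for the background/foreground MLP."""
    H, W, _ = img.shape
    background = [img[:p, :p], img[:p, W-p:], img[H-p:, :p], img[H-p:, W-p:]]
    cy, cx = H // 2, W // 2
    # six patches tiled around the center; this placement is our assumption
    offsets = [(-p, -p), (-p, 0), (-p, p), (0, -p), (0, 0), (0, p)]
    foreground = [img[cy+dy:cy+dy+p, cx+dx:cx+dx+p] for dy, dx in offsets]
    return background, foreground
```

The flattened patches and their binary labels would then form the MLP training set.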
<p><bold>Local attention by splitting with hard strides</bold>. The main interest of splitting an image into patches with hard strides is to obtain the attention map, in a way similar to how the human gaze glimpses a scene, focusing on interesting regions. Here, by hard strides, we mean strides that allow for a significant overlapping of the patches or, more specifically, strides whose dimension is much lower than the patch size.</p>
<p>Most of the work for attention estimation is done by the overlapping induced by the strides, much as the gaze returns to an interesting region of the scene several times. Yet, this kind of attention is <italic>local</italic>, as it does not capture the whole context. To obtain the context, we refine this splitting-based attention with spectral graph convolution, described in the next paragraph.</p>
<p><bold>The splitting process and patch classification</bold>. Breaking an image into patches is a well-known technique (see Nowak et al., <xref ref-type="bibr" rid="B41">2006</xref>; Zhou et al., <xref ref-type="bibr" rid="B74">2009</xref>; Dong et al., <xref ref-type="bibr" rid="B18">2011</xref>), requiring only algebraic manipulations of tensors. Consider the image <italic>X</italic> of size <italic>M</italic>&#x000D7;<italic>N</italic>&#x000D7;<italic>d</italic>. The splitting operation, along the spatial dimensions, extracts from <italic>X</italic> patches of dimension (<italic>p</italic><sub><italic>x</italic></sub>, <italic>p</italic><sub><italic>y</italic></sub>, <italic>d</italic>). Here, the splitting combined with strides allows neighboring patches to overlap according to the stride values (<italic>s</italic><sub><italic>x</italic></sub>, <italic>s</italic><sub><italic>y</italic></sub>). In some sense, it is like taking the inner product of <italic>X</italic> with the lower and upper shift matrices <italic>A</italic>, <italic>A</italic>&#x02032;, and their transposes <italic>A</italic><sup>&#x022A4;</sup> and <italic>A</italic>&#x02032;<sup>&#x022A4;</sup> with suitable shifts, and then cropping the non-zero values. Or, similarly, convolving <italic>X</italic> with a shifting kernel and cropping the non-zero elements. The number of obtained patches and their configuration depend on the sizes <italic>M, N</italic> of the image, the number of channels <italic>d</italic>, the patch sizes (<italic>p</italic><sub><italic>x</italic></sub>, <italic>p</italic><sub><italic>y</italic></sub>, <italic>d</italic>), and the spatial strides (<italic>s</italic><sub><italic>x</italic></sub>, <italic>s</italic><sub><italic>y</italic></sub>). The grid dimensions <italic>k</italic><sub>1</sub> and <italic>k</italic><sub>2</sub> are obtained as for the output size of a convolution, though here we do not consider padding:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M3"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>&#x02308;</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>M</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x02309;</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn><mml:mtext class="textrm" mathvariant="normal">&#x000A0;and&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>&#x02308;</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x02309;</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We denote by CO a configuration of <italic>k</italic><sub>1</sub>&#x000D7;<italic>k</italic><sub>2</sub> patches, each of size <italic>p</italic><sub><italic>x</italic></sub>&#x000D7;<italic>p</italic><sub><italic>y</italic></sub>&#x000D7;<italic>d</italic>. Namely, it is the arrangement of the patches into <italic>k</italic><sub>1</sub> rows and <italic>k</italic><sub>2</sub> columns, shown in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
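As a concrete sketch (ours, not the authors' code), the strided splitting can be written with numpy. Note that, without padding, the last partial window along each axis is dropped, i.e., each grid dimension equals the floor of (dimension &#x02212; patch)/stride, plus one; this reproduces the 13&#x000D7;17 grid of the Figure 5 example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split(X, p, s):
    """Split X (M x N x d) into overlapping p[0] x p[1] patches with strides
    s = (sx, sy), no padding.  Without padding, the last partial window along
    each axis is dropped, so k = floor((dim - patch)/stride) + 1; for the
    Figure 5 example (M=160, N=195, patch 64, stride 8) this yields 13 x 17."""
    M, N, d = X.shape
    # every unit-stride window, then subsample by the strides (sx, sy)
    windows = sliding_window_view(X, (p[0], p[1], d))
    return windows[::s[0], ::s[1], 0]       # shape (k1, k2, px, py, d)
```

Because `sliding_window_view` returns views rather than copies, the splitting itself costs no extra memory; only the downstream resizing and classification touch the data.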
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>The figure illustrates the splitting process and patch classification taking as an example an image <italic>X</italic>&#x02208;&#x0211D;<sup><italic>M</italic>&#x000D7;<italic>N</italic>&#x000D7;3</sup>, with <italic>M</italic>&#x0003D;160 and <italic>N</italic>&#x0003D;195. The first plate above shows the <italic>configuration</italic> CO obtained by splitting <italic>X</italic> into patches of size <italic>p</italic><sub><italic>x</italic></sub>&#x000D7;<italic>p</italic><sub><italic>y</italic></sub>&#x000D7;<italic>d</italic> and a stride (<italic>s</italic><sub><italic>x</italic></sub>, <italic>s</italic><sub><italic>y</italic></sub>) with <italic>p</italic><sub><italic>x</italic></sub>&#x0003D;<italic>p</italic><sub><italic>y</italic></sub>&#x0003D;64, <italic>d</italic>&#x0003D;3 and <italic>s</italic><sub><italic>x</italic></sub>&#x0003D;<italic>s</italic><sub><italic>y</italic></sub>&#x0003D;8. In the example, CO has a configuration of <italic>k</italic><sub>1</sub>&#x000D7;<italic>k</italic><sub>2</sub> patches, with <italic>k</italic><sub>1</sub>&#x0003D;13 and <italic>k</italic><sub>2</sub>&#x0003D;17. The plate on the upper-right shows the configuration of probabilities CoP, which has the same shape as CO, obtained <italic>via</italic> the softmax of the fine-tuned Resnet50V2 by classifying each patch in CO.
The plates below, on the right of the image <italic>X</italic>, show the matrix <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, collecting the number of times patches overlap on a pixel when collapsing the configuration into the reconstructed image, according to the stride. The Reconstructed Attention Map (RAM) is obtained by collapsing CoP, and it has the same size as <italic>X</italic>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-874035-g0005.tif"/>
</fig>
<p>Given <italic>X</italic>&#x02208;&#x0211D;<sup><italic>M</italic>&#x000D7;<italic>N</italic>&#x000D7;<italic>d</italic></sup>, the configuration <italic>CO</italic>, and a patch <italic>X</italic><sub><italic>j</italic></sub> in <italic>CO</italic>, with 0&#x02264;<italic>j</italic>&#x02264;<italic>k</italic><sub>1</sub>&#x000B7;<italic>k</italic><sub>2</sub>, the patch <italic>X</italic><sub><italic>j</italic></sub> is resized as <italic>S</italic><sub>&#x02113;</sub>(<italic>X</italic><sub><italic>j</italic></sub>) (including the required filtering modes), where &#x02113; indicates the size of input images accepted by the network <italic>f</italic><sub>&#x02113;</sub>(<italic>C</italic>|<italic>X, Y</italic>, &#x003B8;). The value of &#x02113; changes according to the considered dataset. For the plant disease datasets, the classification input size corresponds to the size of the images in the dataset, possibly reduced as for PlantLeaves, while the patch size is proportional to the image size. This shows the extreme flexibility of the splitting process followed by classification, which is adaptable to several kinds of backbones.</p>
<p>For each patch in the configuration CO, obtained by splitting the original image, the probability that it belongs to the class of interest (e.g., tip-burn) is estimated by the network <italic>f</italic><sub>&#x02113;</sub>, after resizing the patch to the input size &#x02113; accepted by the classification network.</p>
<p>The estimation amounts to the softmax applied to the logits of the classifier <italic>f</italic><sub>&#x02113;</sub>(<italic>C</italic>|<italic>X, Y</italic>, &#x003B8;); here we used Resnet50V2 as a backbone. After the probability of each patch belonging to <italic>C</italic> is estimated, a configuration of probabilities (CoP) is obtained, as shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. CoP has the same configuration as CO, though each patch &#x003C0; is defined by repeating at each pixel the probability <italic>p</italic> computed by the softmax on classifying the patch. When we indicate the probability <italic>p</italic><sub><italic>r,c</italic></sub> of the patch located at indexes (<italic>r, c</italic>), we mean the probability <italic>p</italic>.</p>
<p>A mapping <italic>h</italic> from CoP to the reconstructed attention map (RAM) is defined by collapsing the patches in CoP into the corresponding pixels of the matrix RAM. Note that while the whole size of CoP is <italic>k</italic><sub>1</sub>&#x000B7;<italic>p</italic><sub><italic>x</italic></sub>&#x000D7;<italic>k</italic><sub>2</sub>&#x000B7;<italic>p</italic><sub><italic>y</italic></sub>, namely (<italic>M</italic>&#x000B7;&#x02308;<italic>p</italic><sub><italic>x</italic></sub>/<italic>s</italic><sub><italic>x</italic></sub>&#x02309;)&#x000D7;(<italic>N</italic>&#x000B7;&#x02308;<italic>p</italic><sub><italic>y</italic></sub>/<italic>s</italic><sub><italic>y</italic></sub>&#x02309;), with <italic>s</italic><sub><italic>x</italic></sub> &#x0226A; <italic>p</italic><sub><italic>x</italic></sub> and <italic>s</italic><sub><italic>y</italic></sub> &#x0226A; <italic>p</italic><sub><italic>y</italic></sub>, RAM has the same spatial dimension as the original image, namely <italic>M</italic>&#x000D7;<italic>N</italic>. Given a patch &#x003C0;<sub><italic>r,c</italic></sub> in row <italic>r</italic> and column <italic>c</italic> in CoP, and a pixel at location (<italic>i, j</italic>) in &#x003C0;<sub><italic>r,c</italic></sub>, the tuple ((<italic>r, i</italic>), (<italic>c, j</italic>)) is mapped by <italic>h</italic> to the pixel (<italic>x, y</italic>) in RAM, as follows:</p>
<disp-formula id="E3"><label>(2)</label><mml:math id="M5"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mi>A</mml:mi><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>r</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">for&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>P</mml:mi><mml:mtext 
class="textrm" mathvariant="normal">&#x000A0;and&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Given Equation (2), we also obtain a matrix <italic>A</italic><sub><italic>overlap</italic></sub> by counting the number of times a pixel from CoP hits the corresponding pixel of RAM. Indeed, this matrix specifies how many overlapping patches contribute to a pixel in RAM:</p>
<disp-formula id="E4"><label>(3)</label><mml:math id="M6"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The matrix <italic>A</italic><sub><italic>overlap</italic></sub> is used to assess the accuracy of the classification at the pixel level, and it allows RAM to be suitably averaged. The averaged RAM is obtained as follows:</p>
<disp-formula id="E5"><label>(4)</label><mml:math id="M7"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mi>R</mml:mi><mml:mi>A</mml:mi><mml:msup><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022C6;</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>R</mml:mi><mml:mi>A</mml:mi><mml:mi>M</mml:mi><mml:mo>/</mml:mo><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
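A minimal sketch of the reconstruction defined by Equations (2)&#x02013;(4), with our own function and variable names: each patch probability is projected back onto the pixels the patch covers, <italic>A</italic><sub><italic>overlap</italic></sub> counts the contributing patches per pixel, and the averaged RAM is the element-wise ratio.

```python
import numpy as np

def reconstruct_ram(cop, p, s, M, N):
    """Collapse a k1 x k2 grid of patch probabilities CoP into the M x N
    Reconstructed Attention Map, following Equations (2)-(4): each patch
    probability is written to the pixels the patch covers, A_overlap counts
    how many patches contribute to each pixel, and the averaged RAM is the
    element-wise ratio RAM / A_overlap."""
    ram = np.zeros((M, N))
    overlap = np.zeros((M, N))              # A_overlap
    k1, k2 = cop.shape
    for r in range(k1):
        for c in range(k2):
            x, y = r * s[0], c * s[1]       # Equation (2) with (i, j) = (0, 0)
            ram[x:x + p[0], y:y + p[1]] += cop[r, c]
            overlap[x:x + p[0], y:y + p[1]] += 1
    return ram / np.maximum(overlap, 1)     # avoid division by zero on uncovered pixels
```

With a constant CoP, interior pixels average back to exactly the same probability, which is the sanity check that the overlap normalization is correct.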
<p><xref ref-type="fig" rid="F5">Figure 5</xref> shows an example of the CO representation, of CoP, of the matrix <italic>A</italic><sub><italic>overlap</italic></sub>, and of RAM, given a random image <italic>X</italic> with tip-burn highlighted. Here and from now on, we denote <italic>RAM</italic><sup>&#x022C6;</sup> simply as RAM.</p>
<p>We can see that the accuracy of RAM is determined by the strides. For example, if the stride is <italic>s</italic><sub><italic>x</italic></sub> &#x0003D; <italic>s</italic><sub><italic>y</italic></sub> &#x0003D; 10, we have accuracy at the level of a region of size 10&#x000D7;10, and if the stride is <italic>s</italic><sub><italic>x</italic></sub> &#x0003D; <italic>s</italic><sub><italic>y</italic></sub> &#x0003D; 1, the accuracy is at the pixel level, allowing each pixel to be effectively labeled. Clearly, reducing the stride increases the number of patches extracted from the same image; on average, the number of pixels processed increases by a factor of 9.</p>
<p><bold>Refining by graph convolution</bold>. The RAM results in pseudo segmentation masks for the tip-burn stressed leaves found in the RGB image, following the pipeline splitting-classification-reconstruction. Unlike CAM (Zeiler and Fergus, <xref ref-type="bibr" rid="B68">2014</xref>; Zhou et al., <xref ref-type="bibr" rid="B73">2016</xref>), RAM highlights in the same map all objects of interest quite accurately. Moreover, while in CAM the result is obtained from the gradient of the softmax outcome with respect to the last feature map, which has very low resolution and thus requires significant resizing, inducing blurring, here we do not need any resizing, as we can recover the original image size by a single merging step, according to Equation (2). Although the classification accuracy for tip-burn is 98.3%, and for the other datasets no less than 96%, there is still noise in the attention map because classification is done on <italic>S</italic><sub>&#x02113;</sub>(<italic>X</italic>), namely on the resized patch, given the decomposition principle.</p>
<p>Comparing the size of a patch in CoP with the size of the probability regions highlighted in RAM in <xref ref-type="fig" rid="F5">Figure 5</xref>, we can note that they differ. This is due to overlapping and projection, which increase the resolution of the probability from uniform on a patch of size 64&#x000D7;64 to uniform on a patch of size 8&#x000D7;8. Indeed, the RAM probability resolution is 8&#x000D7;8. Since RAM has this higher probability resolution, we repeat the splitting with sub-patches of size (<italic>s</italic><sub><italic>x</italic></sub>, <italic>s</italic><sub><italic>y</italic></sub>, <italic>d</italic>), namely of size 8&#x000D7;8&#x000D7;<italic>d</italic>, <italic>d</italic> &#x02208; {1, 3}, for both the RGB image of size 352&#x000D7;576&#x000D7;3 (see the paragraph above on pre-processing and classification, and <xref ref-type="fig" rid="F6">Figure 6</xref>) and the RAM of size 352&#x000D7;576&#x000D7;1. A schema of this further splitting follows:</p>
<disp-formula id="E7"><label>(5)</label><mml:math id="M9"><mml:mtable columnalign="right"><mml:mtr><mml:mtd><mml:mi>C</mml:mi><mml:msub><mml:mrow><mml:mi>O</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:mtext class="textrm" mathvariant="normal">size</mml:mtext><mml:mo>=</mml:mo><mml:mn>42</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>72</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mtext class="textrm" mathvariant="normal">sub-patch size</mml:mtext><mml:mo>=</mml:mo><mml:mn>8</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>8</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">num of sub-patches</mml:mtext><mml:mo>=</mml:mo><mml:mn>3168</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:mtext class="textrm" mathvariant="normal">size</mml:mtext><mml:mo>=</mml:mo><mml:mn>42</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>72</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext class="textrm" mathvariant="normal">sub-patch size</mml:mtext><mml:mo>=</mml:mo><mml:mn>8</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>8</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">num of sub-patches</mml:mtext><mml:mo>=</mml:mo><mml:mn>3168</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The goal is to use CoP<sub><italic>new</italic></sub> as supervision for training a graph convolutional network (GCN), improving the semantic segmentation accuracy obtained by classifying the patches. This further splitting step obtains a <italic>CO</italic><sub><italic>new</italic></sub> and a <italic>CoP</italic><sub><italic>new</italic></sub>, as specified in Equation (5), from which we obtain the features and the labels for the GCN. The softmax of the GCN classifies the nodes of the graph, inducing an effective semantic segmentation of tip-burn for each image in the dataset, and similarly for the other datasets.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Overview of the weakly supervised semantic segmentation process of tip-burn leaf stress, supervised with image-level class annotation only. Above, the splitting process starts from images <italic>X</italic> &#x02208; &#x0211D;<sup><italic>M</italic>&#x000D7;<italic>N</italic>&#x000D7;3</sup>, <italic>M</italic>&#x0003D;4,640, <italic>N</italic>&#x0003D;6,960. Note that, starting from patches of size 352&#x000D7;576&#x000D7;3, the splitting process works with overlapping patches of size 64&#x000D7;64&#x000D7;3 with a hard stride of 8 pixels. The classification is trained to detect tip-burn stress and, <italic>via</italic> the softmax, to predict a probability for each patch to belong to the class tip-burn. After classification, a reconstruction step obtains the Reconstructed Attention Map (RAM) for patches of size 352&#x000D7;576&#x000D7;3. These are again split into patches of size 8&#x000D7;8&#x000D7;3 and used to supervise the graph convolutional network (GCN). Namely, the GCN node features are the flattened 8&#x000D7;8&#x000D7;3 patches and the labels are one-hot encoded vectors obtained from the classification predictions; see the plate with the GCN. The GCN estimates a refined semantic segmentation of pixels. A final reconstruction applies the inverse splitting process, reconstructing the table canopies from patches.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-874035-g0006.tif"/>
</fig>
<p>Following Kipf and Welling (<xref ref-type="bibr" rid="B29">2016</xref>), a number of approaches have experimented with graph convolution (GCN), especially on non-grid structures. Recently, however, increasing interest has been devoted to applying graph convolution to images, for segmentation and attention purposes, as in Li and Gupta (<xref ref-type="bibr" rid="B31">2018</xref>) and Hu et al. (<xref ref-type="bibr" rid="B25">2020</xref>). Here, we apply unsupervised node classification, conditioning the graph model both on the data and on the adjacency matrix <italic>via</italic> graph convolution. Indeed, we generate the adjacency matrix for the graph <inline-formula><mml:math id="M24"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">G</mml:mi></mml:mrow></mml:math></inline-formula> &#x0003D; (<inline-formula><mml:math id="M25"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M26"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">E</mml:mi></mml:mrow></mml:math></inline-formula>) in a fully unsupervised way. We construct a graph for each RGB input image <italic>X</italic> of size 352&#x000D7;576&#x000D7;3, fixing the number of nodes so that the graphs can be put in a batch.</p>
<p>We take the patches as node features, labeling them with a one-hot encoding vector obtained by thresholding the score <italic>p</italic><sub><italic>c,r</italic></sub> in <italic>CoP</italic><sub><italic>new</italic></sub> of patch &#x003C0;<sub><italic>c,r</italic></sub> &#x02208; <italic>CO</italic><sub><italic>new</italic></sub>. More precisely, we flatten each mini-patch of size 8&#x000D7;8&#x000D7;3 into a vector <bold>x</bold> &#x02208; &#x0211D;<sup><italic>k</italic></sup>, <italic>k</italic> &#x0003D; 192, and stack all the flattened patches into a matrix <inline-formula><mml:math id="M11"><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C6;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, <italic>n</italic> &#x0003D; 3168. To obtain a corresponding ordering, we use an index function <italic>idx</italic>:(<italic>r, c</italic>) &#x02192; <italic>i</italic>, with <italic>idx</italic>(<italic>r, c</italic>)&#x0003D;<italic>w</italic>(<italic>c</italic>&#x02212;1)&#x0002B;<italic>r</italic>&#x0003D;<italic>i</italic>, where <italic>w</italic> is the number of rows, <italic>r</italic> and <italic>c</italic> are the row and column indexes in <italic>CO</italic><sub><italic>new</italic></sub> and <italic>CoP</italic><sub><italic>new</italic></sub>, and <italic>i</italic> is the corresponding index in <italic>X</italic><sub>&#x003C6;</sub>. <italic>X</italic><sub>&#x003C6;</sub> is the input matrix to the network. At each layer of the network, a feature matrix is generated, starting with <italic>X</italic><sub>&#x003C6;</sub>.</p>
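The flattening and the index function idx can be sketched as follows (helper name is ours; we use 0-based indexes, so the text's idx(r, c) = w(c &#x02212; 1) + r becomes idx = w&#x000B7;c + r with w = k1 the number of rows):

```python
import numpy as np

def flatten_nodes(patches):
    """Stack a k1 x k2 grid of 8 x 8 x 3 sub-patches into the node-feature
    matrix X_phi of shape (n, 192), following the column-major index
    idx(r, c) = w*(c - 1) + r of the text (0-based here: idx = w*c + r,
    with w = k1 the number of rows)."""
    k1, k2 = patches.shape[:2]
    k = patches.shape[2] * patches.shape[3] * patches.shape[4]
    X_phi = np.empty((k1 * k2, k))
    for c in range(k2):
        for r in range(k1):
            X_phi[k1 * c + r] = patches[r, c].ravel()
    return X_phi
```

Keeping this single ordering for both features and labels is what guarantees that node i in the graph corresponds to the same sub-patch in CO_new and in CoP_new.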
<p>To connect subsets of nodes based on their feature similarity, we generate the adjacency matrix <italic>Adj</italic>, which is symmetric and of size <italic>n</italic>&#x000D7;<italic>n</italic>, as follows. We keep the indexing <italic>idx</italic> to maintain the correspondence between <italic>CO</italic><sub><italic>new</italic></sub> and <italic>X</italic><sub>&#x003C6;</sub> and between <italic>CoP</italic><sub><italic>new</italic></sub> and the labels. For each sub-patch, we estimate a non-parametric probability by computing the histogram using both the RGB and the HSV color transformations of the sub-patch and collapsing the 64&#x000B7;3&#x000B7;2 vector into a histogram with 64 bin edges. For each pair of histograms <italic>q</italic><sub><italic>i</italic></sub>, <italic>q</italic><sub><italic>j</italic></sub>, we compute the Jensen-Shannon divergence:</p>
<disp-formula id="E9"><label>(6)</label><mml:math id="M12"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">JSD</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02016;</mml:mo><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>5</mml:mn><mml:mtext class="textrm" mathvariant="normal">KL</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02016;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>5</mml:mn><mml:mtext class="textrm" mathvariant="normal">KL</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02016;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">with</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>5</mml:mn><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;and&#x000A0;KL</mml:mtext><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02016;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>X</mml:mi></mml:mrow></mml:msub></mml:mstyle><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo class="qopname">log</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>m</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The choice of JSD is required by the need for the adjacency matrix <italic>Adj</italic> to be symmetric. Then, two nodes <italic>v</italic><sub><italic>i</italic></sub>, <italic>v</italic><sub><italic>j</italic></sub> with feature vectors <bold>x</bold><sub><italic>i</italic></sub> and <bold>x</bold><sub><italic>j</italic></sub> are similar, hence connected by an edge <italic>e</italic><sub><italic>i,j</italic></sub>&#x02208;<inline-formula><mml:math id="M27"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">E</mml:mi></mml:mrow></mml:math></inline-formula>, if <italic>JSD</italic>&#x0003C;&#x003B2;; we chose &#x003B2;&#x0003D;0.005 for the tip-burn dataset. Given the <italic>n</italic> nodes in <inline-formula><mml:math id="M28"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:math></inline-formula>, the diagonal degree matrix <italic>D</italic> records for each node the number of nodes connected to it. The normalized graph Laplacian matrix is <inline-formula><mml:math id="M13"><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup><mml:mi>A</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>U</mml:mi><mml:mi>&#x0039B;</mml:mi><mml:msup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>. 
Here, &#x0039B; is the diagonal matrix of the eigenvalues and <italic>U</italic> is the matrix of the orthogonal eigenvectors. The graph convolution <italic>g</italic><sub>&#x003B8;</sub>(<italic>L</italic><sub><italic>norm</italic></sub>) &#x022C6; <italic>X</italic><sub><italic>j</italic></sub> is the spectral convolution, which obtains parametrized filters from the eigenvectors of <italic>L</italic><sub><italic>norm</italic></sub> in the Fourier domain. Several simplifications have been proposed; we refer the reader to Defferrard et al. (<xref ref-type="bibr" rid="B17">2016</xref>) for spectral convolution in the Fourier domain and for the approximation of the <italic>L</italic> eigenvectors by Chebyshev polynomials up to the <italic>K</italic>-th order. Kipf and Welling (<xref ref-type="bibr" rid="B29">2016</xref>) obtain a GCN by a first-order approximation of the spectral graph convolution: they set <italic>K</italic> &#x0003D; 1 and reduce the Chebyshev coefficients to a matrix of filter parameters.</p>
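As an illustration of the graph construction, the following is a minimal NumPy sketch, not the authors' implementation (function and variable names are ours), of how a symmetric adjacency matrix can be built by thresholding the Jensen-Shannon divergence between patch feature distributions at &#x003B2; = 0.005:

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def build_adjacency(features, beta=0.005):
    """Symmetric 0/1 adjacency: connect two nodes iff their JSD is below beta."""
    n = len(features)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if jsd(features[i], features[j]) < beta:
                adj[i, j] = adj[j, i] = 1.0
    return adj

# Toy example: three patch distributions, the first two nearly identical.
f = [np.array([0.5, 0.3, 0.2]),
     np.array([0.5, 0.31, 0.19]),
     np.array([0.1, 0.1, 0.8])]
A = build_adjacency(f, beta=0.005)
```

Because JSD(<italic>q</italic><sub><italic>i</italic></sub>, <italic>q</italic><sub><italic>j</italic></sub>) = JSD(<italic>q</italic><sub><italic>j</italic></sub>, <italic>q</italic><sub><italic>i</italic></sub>), the resulting <italic>Adj</italic> is symmetric by construction, which is exactly the property required to define the graph Laplacian above.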
<p>The feed forward propagation of the GCN is recursively defined as:</p>
<disp-formula id="E10"><label>(7)</label><mml:math id="M14"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x000C2;</mml:mi><mml:msup><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;with&#x000A0;</mml:mtext><mml:msup><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C6;</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Here &#x003C3; is an activation function, <inline-formula><mml:math id="M15"><mml:msup><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> are the hidden vectors of the <italic>t</italic>-th layer, with <inline-formula><mml:math id="M16"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> the hidden feature vector of the node <italic>v</italic><sub><italic>i</italic></sub>. &#x000C2; is defined as follows. 
<italic>A</italic> &#x0003D; <italic>Adj</italic> &#x0002B; <italic>I</italic><sub><italic>n</italic></sub> includes self-loops, since each node is similar to itself; <inline-formula><mml:math id="M17"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the corresponding degree matrix; and <inline-formula><mml:math id="M18"><mml:mi>&#x000C2;</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup><mml:mi>A</mml:mi><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup></mml:math></inline-formula> is its normalized version. Here, <italic>I</italic><sub><italic>n</italic></sub> is the identity matrix of size <italic>n</italic>&#x000D7;<italic>n</italic>. The role of &#x000C2; is to aggregate information from connected nodes, and <italic>W</italic><sup>(<italic>t</italic>)</sup> is the weight matrix to be learned. The dimensions are as follows:</p>
<disp-formula id="E11"><label>(8)</label><mml:math id="M19"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mi>&#x000C2;</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Here, <italic>u</italic><sub><italic>t</italic></sub> and <italic>u</italic><sub><italic>t</italic>&#x0002B;1</sub> are the sizes of the hidden layers. A three-layer GCN has the form:</p>
<disp-formula id="E12"><label>(9)</label><mml:math id="M20"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mi>Z</mml:mi><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C6;</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mi>&#x000C2;</mml:mi><mml:mo>,</mml:mo><mml:mi>W</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">softmax</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Where the softmax is applied row-wise and</p>
<disp-formula id="E13"><label>(10)</label><mml:math id="M21"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mi>&#x000C2;</mml:mi><mml:mtext>&#x000A0;ReLU</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x000C2;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">ReLU</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x000C2;</mml:mi><mml:msup><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The optimization of the GCN uses the cross-entropy loss on all labeled nodes (Kipf and Welling, <xref ref-type="bibr" rid="B29">2016</xref>), where the labels are the one-hot encoded values obtained from <italic>RAM</italic><sub><italic>new</italic></sub>. Let us denote by <inline-formula><mml:math id="M29"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">I</mml:mi></mml:mrow></mml:math></inline-formula> the indexes of the nodes and by <italic>Y</italic><sub><italic>i,l</italic></sub> an indicator that has value 1 if node <italic>v</italic><sub><italic>i</italic></sub> has label <italic>l</italic>:</p>
<disp-formula id="E14"><label>(11)</label><mml:math id="M22"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>I</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo class="qopname">log</mml:mo><mml:msub><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>According to the number <italic>K</italic> of layers, a GCN convolves the <italic>K</italic>-hop neighbors of a node, essentially clustering similar nodes according to their label probabilities and features. We use a simple three-layer GCN, since tip-burn stresses on leaves are, in the end, very small and rare. An overview of the whole learning process is given in <xref ref-type="fig" rid="F6">Figure 6</xref>.</p>
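Equations (7)&#x02013;(11) can be condensed into a short NumPy sketch. This is an illustrative toy with random weights and a four-node graph, not the trained model; all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(adj):
    """Normalized adjacency with self-loops: D~^{-1/2} (Adj + I) D~^{-1/2}."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

relu = lambda x: np.maximum(x, 0.0)

def gcn_forward(a_hat, x, weights):
    """Equations (9)-(10): Z = softmax(A^ ReLU(A^ ReLU(A^ X W0) W1) W2)."""
    h = x
    for w in weights[:-1]:
        h = relu(a_hat @ h @ w)           # hidden layers, Equation (7)
    return softmax(a_hat @ h @ weights[-1])

def cross_entropy(z, y, labeled):
    """Equation (11): cross-entropy over the labeled nodes only."""
    return -np.sum(y[labeled] * np.log(z[labeled] + 1e-12))

# Toy graph: 4 nodes, 192-dim features (an 8x8x3 patch flattened), 2 classes.
adj = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], float)
x = rng.normal(size=(4, 192))
ws = [rng.normal(scale=0.1, size=s) for s in [(192, 32), (32, 16), (16, 2)]]
z = gcn_forward(normalize_adj(adj), x, ws)   # rows of z sum to 1
```

Each row of <italic>Z</italic> is a probability distribution over the two classes (tip-burn or not) for the corresponding node, so the row-wise softmax of Equation (9) is reproduced directly.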
<p>The GCN adjusts the RAM by looking at the context, going beyond the localized estimation of splitting plus classification. The GCN estimates the probability that a node, corresponding to the features of an 8&#x000D7;8&#x000D7;3 patch, belongs to tip-burn or not, by updating the belief that two patches are similar. At the end of the training, <italic>CoP</italic><sub><italic>new</italic></sub> is updated with the new distribution. In <xref ref-type="table" rid="T2">Table 2</xref>, in Section 5, we show the advantages of the GCN by ablation.</p>
<p><bold>Reconstruction</bold>. Given the initial image of the dense canopy, the question to be explored is &#x0201C;which plant suffers tip-burn stress, and where is it?&#x0201D; Counting alone would not be very useful. Consider that when the tables are unrolled, from the position of a plant on the table it is possible to trace back the cell the plant comes from, and possibly revise its growing conditions or compile useful statistics. It is therefore pivotal to localize the stress segment on the table image, and with the proposed model this turns out to be extremely easy.</p>
<p>In fact, as noted in the paragraph on splitting, reconstruction is done automatically by projecting a pixel in CoP back into a pixel in RAM via Equation (2). Obviously, this can be done for any image: not only for the maps but also for the RGB images.</p>
<p>Reconstruction is done both when the strides <italic>s</italic><sub><italic>x</italic></sub>&#x0003E;1 and <italic>s</italic><sub><italic>y</italic></sub>&#x0003E;1, that is, when the splitting generates sub-images that overlap, and, obviously, when they do not. This can be done for all the steps of the splitting, from the table canopy up to <italic>RAM</italic><sub><italic>new</italic></sub> and its dual RGB image, and back again to the large table canopy.</p>
<p>The backward process requires preserving just the patch sizes for the maps and the scores at each layer of the splitting. The process is then applied recursively to go from the patch up to the image of the whole canopy. Note that for the semantic segmentation we need to preserve only the score vectors estimated by the GCN. An image of a partial reconstruction of the half table is given in <xref ref-type="fig" rid="F4">Figures 4</xref>, <xref ref-type="fig" rid="F6">6</xref>.</p>
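The split/reconstruct book-keeping described above can be sketched as follows. This is a simplified illustration under our own naming: each patch's top-left corner is recorded at split time, and per-patch scores are projected back, averaging where strided patches overlap:

```python
import numpy as np

def split(image, patch, stride):
    """Slide a window over the image; keep each patch with its origin."""
    H, W = image.shape[:2]
    patches, origins = [], []
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            patches.append(image[y:y + patch, x:x + patch])
            origins.append((y, x))
    return patches, origins

def reconstruct(scores, origins, patch, shape):
    """Project per-patch scores back; average where strided patches overlap."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for s, (y, x) in zip(scores, origins):
        acc[y:y + patch, x:x + patch] += s
        cnt[y:y + patch, x:x + patch] += 1.0
    return acc / np.maximum(cnt, 1.0)

# Toy run: a 16x16 "canopy", 8x8 patches with stride 4 (overlapping).
img = np.arange(16 * 16, dtype=float).reshape(16, 16)
patches, origins = split(img, patch=8, stride=4)
seg = reconstruct([1.0] * len(patches), origins, 8, img.shape)
```

Applying the same two functions at each splitting layer, with only the patch size and score vectors preserved, yields the recursive reconstruction from patch back to the whole table canopy.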
</sec>
</sec>
<sec id="s4">
<title>4. Application of the Model to Other Datasets</title>
<p>As mentioned in the introduction, we have considered three public datasets, namely PlantLeaves (Chouhan et al., <xref ref-type="bibr" rid="B13">2019</xref>), PlantVillage (Hughes and Salathe, <xref ref-type="bibr" rid="B26">2016</xref>), and CitrusLeaves (Rauf et al., <xref ref-type="bibr" rid="B45">2019</xref>), to evaluate our approach. These datasets are usually tested for classification, which nowadays obtains striking results. Here, instead, we consider the semantic segmentation of the leaf lesions using only image class labels, which is actually the only information available for these datasets.</p>
<p>Our goal here is to discuss classification only briefly and, above all, the whole pipeline we used to segment both the leaves and the disease spots and lesions. Clearly, segmenting the disease is more difficult when the leaf is almost completely covered by the disease spots, which are discolored regions or dark necrotic spots. As we shall see, the best results are indeed obtained for CitrusLeaves and PlantLeaves, where the disease spots are localized.</p>
<p>From each dataset, we have chosen a class of diseases and the corresponding healthy images for segmentation. For PlantVillage, we used the whole dataset for classification but chose only Pepper Bell bacterial and Pepper Bell healthy for segmentation. Consider that, once the model is defined, the only burden for classification is to load the data; the model is just a fine-tuning of an already existing backbone, such as Resnet50V2. All parameters and accuracies for each network model available for fine-tuning classification are provided on the Keras Applications page.</p>
<p>PlantLeaves consists of 4,502 images of healthy and diseased leaves divided into 22 categories including species and disease. From this dataset, we have chosen Pomegranate (P9) both diseased and healthy. There are 272 images of diseased Pomegranate (P9) and 287 healthy ones.</p>
<p>PlantVillage consists of 54,303 images of healthy and diseased leaves divided into 38 categories including species and disease. It is possible to download either the augmented or the non-augmented set of images. As noted above, we have considered the whole dataset for classification, and PepperBell healthy and PepperBell Bacterial spot for segmentation. PepperBell Bacterial spot comprises 998 images, and PepperBell healthy 1,478 images.</p>
<p>CitrusLeaves consists of 594 images of healthy and diseased leaves, with four disease categories and one healthy category. We have chosen healthy and Canker. Canker contains 163 images, while healthy contains 58 images.</p>
<p>The model for the datasets indicated above, from splitting up to segmentation, is similar to that for tip-burn stress segmentation. The preprocessing, however, is quite different: for the three datasets, it consists of removing the background and tightening the image within its bounding box. This last step is crucial for complying with the principle of decomposition, discussed at the beginning of the method, and also for avoiding overfitting due to the background. We automatically sample from each image the four corners, with 8&#x000D7;8&#x000D7;3 pixels, and 6 patches of the same size from the image center. We have then defined an MLP that accepts sub-patches of size 8&#x000D7;8&#x000D7;3 to separate background and foreground. Some results, compared with the original images, are shown in <xref ref-type="fig" rid="F7">Figure 7</xref>. Note that the background has value (0, 0, 0), thus not influencing the CNN classification.</p>
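A possible sketch of the corner/center sampling used to gather training data for the background-foreground MLP follows; the exact offsets of the six central patches are our assumption, as they are not specified in the text:

```python
import numpy as np

def sample_patches(img, k=8):
    """Four k x k corner patches (assumed background) and six patches
    around the image center (assumed foreground), as training data
    for a small background/foreground MLP."""
    H, W = img.shape[:2]
    corners = [img[:k, :k], img[:k, -k:], img[-k:, :k], img[-k:, -k:]]
    cy, cx = H // 2, W // 2
    # Hypothetical offsets around the center; the paper does not fix them.
    offs = [(-k, -k), (-k, 0), (0, -k), (0, 0), (-k // 2, -k // 2), (0, k // 2)]
    center = [img[cy + dy:cy + dy + k, cx + dx:cx + dx + k] for dy, dx in offs]
    return corners, center

# Toy image: black background with a bright "leaf" in the middle.
img = np.zeros((64, 64, 3))
img[16:48, 16:48] = 1.0
bg, fg = sample_patches(img)
```

On typical leaf photographs, the corners land on background and the central patches on the leaf, which is exactly the weak labeling the MLP needs.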
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Preliminary leaves segmentation for PlantLeaves, PlantVillage, and CitrusLeaves, from left to right in the order.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-874035-g0007.tif"/>
</fig>
<p>The images of PlantLeaves, being of size 4,000&#x000D7;6,000, are reduced to 264&#x000D7;400 after automatic cropping with the MLP. On the other hand, we resize both CitrusLeaves and PlantVillage to their original size 256&#x000D7;256 after automatic cropping with the MLP. We do augmentation by flipping up and down and left and right, and by blurring with a Gaussian filter with random &#x003C3; &#x02208; (0.5, 2), for both CitrusLeaves and PlantLeaves. We have not augmented PlantVillage, since it comes already augmented.</p>
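The augmentation step can be sketched as follows, assuming SciPy's <monospace>gaussian_filter</monospace> for the blur; the original implementation may differ:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(42)

def augment(img):
    """Up-down flip, left-right flip, and Gaussian blur with random
    sigma drawn from (0.5, 2), as described for CitrusLeaves/PlantLeaves."""
    out = [np.flipud(img), np.fliplr(img)]
    sigma = rng.uniform(0.5, 2.0)
    # Blur the spatial dimensions only; leave the channel axis untouched.
    out.append(gaussian_filter(img, sigma=(sigma, sigma, 0)))
    return out

img = rng.random((256, 256, 3))
aug = augment(img)   # three augmented variants per input image
```

Each input image thus yields three extra samples; the blur changes texture but approximately preserves the image's overall intensity.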
<p>Also, differently from the model for tip-burn stress detection, for these datasets we do the first splitting to a size of 70&#x000D7;70, with the same stride <italic>s</italic><sub><italic>x</italic></sub> &#x0003D; <italic>s</italic><sub><italic>y</italic></sub> &#x0003D; 8 as for tip-burn, and then we resize each patch to the original image size for classification. For classification, we have fine-tuned Resnet50V2, as for the tip-burn data. The remainder of the model, from further splitting up to the GCN and the reconstruction, here applied just to the leaves, follows the same steps, which are the relevant novelties.</p>
</sec>
<sec id="s5">
<title>5. Experiments and Results</title>
<sec>
<title>5.1. Setup</title>
<p>The whole model is implemented in TensorFlow 2.5, on a GeForce RTX 3080 (300 Hz). For Resnet50V2, we use the Keras API in TensorFlow. We used the Keras functional API for fine-tuning the model, with all the provided facilities, such as early stopping and learning-rate decay. For early stopping, we used <italic>patience</italic> 4, with delta 0.001. For reducing the learning rate on a plateau, we used a factor of 0.2 and a minimum learning rate of 0.001, starting from an initial learning rate of 0.1. For the loss, we used categorical cross-entropy with Adam as the optimizer (Kingma and Ba, <xref ref-type="bibr" rid="B28">2014</xref>).</p>
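For illustration, the combined effect of the two stopping rules (patience 4, delta 0.001, plateau factor 0.2) can be replayed on a validation-loss history with a toy scheduler. In Keras, the EarlyStopping and ReduceLROnPlateau callbacks act independently, so this compressed replay is only a sketch:

```python
def schedule(val_losses, patience=4, min_delta=0.001,
             factor=0.2, lr0=0.1, min_lr=0.001):
    """Replay early stopping and reduce-on-plateau on a loss history.
    Training stops after `patience` epochs without an improvement larger
    than `min_delta`; on that plateau the learning rate is multiplied by
    `factor`, never dropping below `min_lr`. Returns (stop_epoch, lr)."""
    best, wait, lr = float("inf"), 0, lr0
    for epoch, loss in enumerate(val_losses):
        if best - loss > min_delta:          # genuine improvement
            best, wait = loss, 0
        else:                                # plateau epoch
            wait += 1
            if wait == patience:
                lr = max(lr * factor, min_lr)
                return epoch + 1, lr
    return len(val_losses), lr

# Loss improves twice, then plateaus: stop 4 epochs after the last gain.
stop, lr = schedule([1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.5])
```

With the stated settings, the first plateau reduces the rate from 0.1 to 0.02 and halts the replay, mirroring the small epoch counts reported in Figure 8.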
<p>For the splitting and reconstruction, we use TensorFlow GradientTape, since the gradient computes both <italic>A</italic><sub><italic>overlap</italic></sub> and the mapping between <italic>CoP</italic> and <italic>RAM</italic> for both Equations (2) and (3).</p>
<p>We have implemented a good part of the GCN, including the adjacency matrix, the feature vectors, and the joining step that transfers the probabilities, in TensorFlow. We drew much intuition from DGL, an open-source graph library introduced by Zheng et al. (<xref ref-type="bibr" rid="B72">2021</xref>), though DGL is implemented in PyTorch. We were also inspired by Spektral (Grattarola and Alippi, <xref ref-type="bibr" rid="B22">2021</xref>), an open-source Python library for building graph neural networks with a TensorFlow and Keras interface. As specified in the Method section, we use the cross-entropy loss and the Adam optimizer, as in the classifier, together with early stopping.</p>
</sec>
<sec>
<title>5.2. Comparison With State of the Art</title>
<p>The main contribution of our work is weakly-supervised semantic segmentation with the only supervision being the image class labels, i.e., whether there is tip-burn or not. For classification, we used Resnet50V2, because it is quite flexible, and fine-tuned it. As already mentioned, we expect that if <italic>f</italic> is a classifier that correctly classifies <italic>X</italic> with probability <italic>p</italic>, and if the classifier generalizes well, then it classifies the resized <italic>X</italic>, namely <italic>S</italic>(<italic>X</italic>), with approximately the same probability <italic>p</italic>. This is shown to hold for the tip-burn dataset and for the plant disease datasets CitrusLeaves, PlantLeaves, and PlantVillage, in accordance with the principle of decomposition.</p>
<p><bold>Stress and disease detection</bold>. For training the tip-burn CNN classifier, we used 30 out of 43 images and compared our work with Gozzovelli et al. (<xref ref-type="bibr" rid="B21">2021</xref>), where DarkNet was used. For the classification of the plant disease datasets, we considered the following recent works: Sujatha et al. (<xref ref-type="bibr" rid="B54">2021</xref>), Khattak et al. (<xref ref-type="bibr" rid="B27">2021</xref>), and Syed-Ab-Rahman et al. (<xref ref-type="bibr" rid="B57">2022</xref>) for CitrusLeaves; Mohameth et al. (<xref ref-type="bibr" rid="B36">2020</xref>), Mohanty et al. (<xref ref-type="bibr" rid="B37">2016</xref>), Chen et al. (<xref ref-type="bibr" rid="B12">2020</xref>), Agarwal et al. (<xref ref-type="bibr" rid="B5">2020</xref>), and Abbas et al. (<xref ref-type="bibr" rid="B2">2021</xref>) for PlantVillage. For PlantLeaves, we report only our approach, as there are no recent contributions.</p>
<p>Results are shown in <xref ref-type="table" rid="T1">Table 1</xref>, considering validation accuracy, as usual. The best accuracy in each class is highlighted in bold. We note that for a number of species in PlantVillage, we obtained a validation accuracy of 1.0 in a few epochs. Since we use early stopping, this was not caused by overfitting, as shown in <xref ref-type="fig" rid="F8">Figure 8</xref>, where it can be observed that for all datasets the number of epochs is small. Note that we have also removed the background: according to Barbedo (<xref ref-type="bibr" rid="B9">2019b</xref>), accuracy drops when the background is removed, which suggests that the background induces overfitting. In <xref ref-type="fig" rid="F8">Figure 8</xref>, we also show some results obtained by changing the patience value in early stopping.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Tip-burn stress and plant disease detection.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>References</bold></th>
<th valign="top" align="center"><bold>Method</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Tip-Burn</bold></th>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>CitrusLeaves</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>PlantVillage</bold></th>
<th valign="top" align="center"><bold>PlantLeaves</bold></th>
</tr>
<tr>
<th/>
<th/>
<th valign="top" align="center"><bold>Acc</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
<th valign="top" align="center"><bold>Acc</bold></th>
<th valign="top" align="center"><bold>Acc</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
<th valign="top" align="center"><bold>Acc</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
<th valign="top" align="center"><bold>Acc</bold></th>
</tr>
<tr>
<th/>
<th/>
<th/>
<th/>
<th valign="top" align="center"><bold>Canker</bold></th>
<th valign="top" align="center"><bold>Healthy</bold></th>
<th valign="top" align="center"><bold>Canker</bold></th>
<th valign="top" align="center"><bold>Healthy</bold></th>
<th valign="top" align="center"><bold>Whole</bold></th>
<th valign="top" align="center"><bold>Pommg</bold>.</th>
<th/>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Our approach</td>
<td valign="top" align="left">Resnet50V2 (fine t.)</td>
<td valign="top" align="center"><bold>0.978</bold></td>
<td valign="top" align="center">0.983</td>
<td valign="top" align="center"><bold>0.964</bold></td>
<td valign="top" align="center"><bold>0.975</bold></td>
<td valign="top" align="center">0.981</td>
<td valign="top" align="center">0.963</td>
<td valign="top" align="center">0.989</td>
<td valign="top" align="center">0.958</td>
<td valign="top" align="center"><bold>0.984</bold></td>
</tr>
<tr>
<td valign="top" align="left">Gozzovelli et al., <xref ref-type="bibr" rid="B21">2021</xref></td>
<td valign="top" align="left">DarkNet</td>
<td valign="top" align="center">0.961</td>
<td valign="top" align="center">0.960</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Sujatha et al., <xref ref-type="bibr" rid="B54">2021</xref></td>
<td valign="top" align="left">InceptionV3/</td>
<td/>
<td/>
<td valign="top" align="center">0.937</td>
<td valign="top" align="center">0.965</td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">VGG16</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Khattak et al., <xref ref-type="bibr" rid="B27">2021</xref></td>
<td valign="top" align="left">Own method</td>
<td/>
<td/>
<td valign="top" align="center">0.945</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Syed-Ab-Rahman et al., <xref ref-type="bibr" rid="B57">2022</xref></td>
<td valign="top" align="left">Faster R-CNN</td>
<td/>
<td/>
<td valign="top" align="center">0.945</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Mohanty et al., <xref ref-type="bibr" rid="B37">2016</xref></td>
<td valign="top" align="left">AlexNet</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center"><bold>0.993</bold></td>
<td valign="top" align="center">0.972</td>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">GoogLeNet</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Chen et al., <xref ref-type="bibr" rid="B12">2020</xref></td>
<td valign="top" align="left">Own Method</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center">0.918</td>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Mohameth et al., <xref ref-type="bibr" rid="B36">2020</xref></td>
<td valign="top" align="left">Resnet50</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">InceptionV3</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">MobileNet</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Agarwal et al., <xref ref-type="bibr" rid="B5">2020</xref></td>
<td valign="top" align="left">VGG16</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center">0.912</td>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Abbas et al., <xref ref-type="bibr" rid="B2">2021</xref></td>
<td valign="top" align="left">DenseNet121 &#x0002B;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center">0.971</td>
<td valign="top" align="center">0.97</td>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">Synthetic Images</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Sharma et al., <xref ref-type="bibr" rid="B48">2020</xref></td>
<td valign="top" align="left">Own method</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center">0.986</td>
<td/>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Best results are highlighted in bold</italic>.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>The first graph on the left shows the validation accuracy and the loss for Tip-burn, PlantVillage, CitrusLeaves, and PlantLeaves, with patience 4 both for early stopping and for updating the learning rate. We can observe that the maximum number of epochs is 13, for PlantVillage. In the central graph, we see a seeming paradox: validation accuracy reaches 1.0 before the training accuracy does, for the grape class in PlantVillage. On the right, we observe the convergence for tip-burn at 6 epochs.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-874035-g0008.tif"/>
</fig>
<p>We split the leaves datasets 70/30% between training and validation plus test, as in Mohanty et al. (<xref ref-type="bibr" rid="B37">2016</xref>). As shown by Mohanty et al. (<xref ref-type="bibr" rid="B37">2016</xref>), GoogLeNet is the best backbone for plant disease detection, and we recall that the assessed performance of Resnet50V2 on the ImageNet validation set is 0.760 Top-1 accuracy and 0.930 Top-5 accuracy.</p>
</sec>
<sec>
<title>5.3. Tip-Burn Segmentation and Plant Datasets Segmentation</title>
<p><bold>Testing accuracy by manually labeling ground-truth</bold>. Typical accuracy metrics for segmentation are <italic>F</italic><sub>1</sub>, in this context referred to as the Dice similarity coefficient, and Intersection over Union (IoU); both require counting the matching pixels (true positives), the exceeding pixels (false positives), and the missing pixels (false negatives) between the ground-truth mask and the estimated mask. Since none of the available datasets provides ground-truth masks, we introduce a patch-based method, which is not too demanding, to obtain an approximate Dice coefficient. Here, approximate means that instead of counting pixels we count super-pixels, and that we use a reduced number of test samples.</p>
<p>We consider 1 test image from the table canopy images and 20 test images for each of the plant disease datasets. Note that 1 test image is the half image of a table canopy, and it amounts to 346,464 images of size 64&#x000D7;64&#x000D7;3, that is, 2,406&#x000B7;16&#x000B7;9.</p>
<p>Now, assuming the test images have been segmented by the automatic decomposition and recomposition process, by construction of the model all the patches contributing to the final estimated segmentation are available. These patches are actually vectors <italic>Z</italic> holding the probability that the corresponding RGB vector is of class tip-burn or not, as estimated by the GCN, and similarly for the other datasets. At the same time, according to the described model, there is a one-to-one correspondence between the patches in <italic>CoP</italic><sub><italic>new</italic></sub> and <italic>CO</italic><sub><italic>new</italic></sub> and, by Equation (2), a correspondence between the patches and the attention map <italic>RAM</italic>, hence the image, and the final estimated segmentation map. It is therefore enough to choose the patches in <italic>CO</italic><sub><italic>new</italic></sub>: the manually chosen patches are immediately aligned with <italic>CoP</italic><sub><italic>new</italic></sub> and the final segmentation. That is, by choosing a patch <italic>X</italic><sub><italic>j</italic></sub>, which by definition of the model has size 8&#x000D7;8&#x000D7;3, we automatically select 64 pixels, which significantly speeds up the labeling. Once the patches are selected, we know the corresponding values in the segmentation map and can compute both the Dice similarity coefficient, at sub-patch (super-pixel) level instead of pixel level, and the IoU.
Let <italic>X</italic> &#x0003D; {<italic>X</italic><sub><italic>j</italic></sub>|<italic>X</italic><sub><italic>j</italic></sub> &#x02208; selected}, where each selected patch has value 1, and let <italic>Y</italic> be the corresponding patches in <inline-formula><mml:math id="M23"><mml:mi>C</mml:mi><mml:mi>O</mml:mi><mml:msubsup><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> with value <italic>Z</italic> computed by the GCN. Then <italic>DSC</italic><sub><italic>patch</italic></sub> &#x0003D; 2|<italic>X</italic>&#x02229;<italic>Y</italic>|/(|<italic>X</italic>| &#x0002B; |<italic>Y</italic>|), with | &#x000B7; | denoting cardinality.</p>
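<p>As a concrete sketch, the approximate patch-level metrics can be computed as below. This is an illustrative implementation under our own assumptions (a boolean selection mask and a 0.5 threshold turning the GCN probabilities <italic>Z</italic> into a predicted mask <italic>Y</italic>), not the paper's code.</p>

```python
import numpy as np

def patch_dice_iou(x_sel, z_prob, thr=0.5):
    """Approximate Dice (DSC) and IoU computed over patches instead of pixels.

    x_sel  : array with value 1 for manually selected (ground-truth) patches
    z_prob : GCN probabilities Z for the same patches
    thr    : hypothetical probability threshold producing the predicted mask Y
    """
    x = np.asarray(x_sel, dtype=bool)
    y = np.asarray(z_prob) >= thr
    inter = np.logical_and(x, y).sum()
    dsc = 2.0 * inter / (x.sum() + y.sum())   # DSC = 2|X ∩ Y| / (|X| + |Y|)
    iou = inter / np.logical_or(x, y).sum()   # IoU = |X ∩ Y| / |X ∪ Y|
    return dsc, iou

# Toy example: 4 patches, 2 selected as tip-burn, GCN agrees on 1 of them.
dsc, iou = patch_dice_iou([1, 1, 0, 0], [0.9, 0.4, 0.8, 0.1])
print(dsc, iou)  # 0.5 0.333...
```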
<p><xref ref-type="table" rid="T2">Table 2</xref> reports the approximate mean IoU and <italic>F</italic><sub>1</sub> (Dice similarity coefficient), computed over sub-patches (super-pixels) in place of pixels, and over a limited subset of the test images. We also report some ablations.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Segmentation and ablations.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Similarity metrics</bold></th>
<th valign="top" align="center"><bold>Tip-burn</bold></th>
<th valign="top" align="center"><bold>Citrus leaves</bold></th>
<th valign="top" align="center"><bold>PlantVillage</bold></th>
<th valign="top" align="center"><bold>PlantLeaves</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="5"><bold>Segmentation by thresholding the Attention Map RAM</bold></td>
</tr>
<tr>
<td valign="top" align="left">Dice similarity coefficient (DSC/<italic>F</italic><sub>1</sub>)</td>
<td valign="top" align="center">0.7827</td>
<td valign="top" align="center">0.6996</td>
<td valign="top" align="center">0.6799</td>
<td valign="top" align="center">0.7326</td>
</tr>
<tr>
<td valign="top" align="left">Intersection over union (IoU)</td>
<td valign="top" align="center">0.6430</td>
<td valign="top" align="center">0.5380</td>
<td valign="top" align="center">0.5150</td>
<td valign="top" align="center">0.5780</td>
</tr>
<tr>
<td valign="top" align="left" colspan="5"><bold>2 Layers GCN</bold></td>
</tr>
<tr>
<td valign="top" align="left">Dice similarity coefficient (DSC/<italic>F</italic><sub>1</sub>)</td>
<td valign="top" align="center">0.8386</td>
<td valign="top" align="center">0.7277</td>
<td valign="top" align="center">0.6868</td>
<td valign="top" align="center">0.7908</td>
</tr>
<tr>
<td valign="top" align="left">Intersection over union (IoU)</td>
<td valign="top" align="center">0.7220</td>
<td valign="top" align="center">0.5720</td>
<td valign="top" align="center">0.5230</td>
<td valign="top" align="center">0.6540</td>
</tr>
<tr>
<td valign="top" align="left" colspan="5"><bold>3 Layers GCN</bold></td>
</tr>
<tr>
<td valign="top" align="left">Dice similarity coefficient (DSC/<italic>F</italic><sub>1</sub>)</td>
<td valign="top" align="center">0.8499</td>
<td valign="top" align="center">0.7326</td>
<td valign="top" align="center">0.6292</td>
<td valign="top" align="center">0.7974</td>
</tr>
<tr>
<td valign="top" align="left">Intersection over union (IoU)</td>
<td valign="top" align="center">0.7390</td>
<td valign="top" align="center">0.5780</td>
<td valign="top" align="center">0.4590</td>
<td valign="top" align="center">0.6630</td>
</tr>
<tr>
<td valign="top" align="left" colspan="5"><bold>Doubling the number of nodes in the graph</bold></td>
</tr>
<tr>
<td valign="top" align="left">Dice similarity coefficient (DSC/<italic>F</italic><sub>1</sub>)</td>
<td valign="top" align="center">0.7797</td>
<td valign="top" align="center">0.7105</td>
<td valign="top" align="center">0.6217</td>
<td valign="top" align="center">0.7021</td>
</tr>
<tr>
<td valign="top" align="left">Intersection over union (IoU)</td>
<td valign="top" align="center">0.6540</td>
<td valign="top" align="center">0.5510</td>
<td valign="top" align="center">0.4511</td>
<td valign="top" align="center">0.5410</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><bold>Ablation</bold>. Consider first the segmentation obtained by simply thresholding the attention map RAM. Introducing the GCN with two layers improves the results on all datasets. Extending the GCN to three layers further improves accuracy on all datasets except PlantVillage. It is also interesting to note that doubling the number of nodes in the graph lowers the accuracy on all datasets, which is reasonable because the larger graph must include patches with a lower probability of being tip-burn.</p>
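<p>The GCN depths compared above follow the standard propagation rule of Kipf and Welling (2016), <italic>H</italic> &#x02190; ReLU(<italic>&#x000C2;HW</italic>), where <italic>&#x000C2;</italic> is the symmetrically normalized adjacency with self-loops. The sketch below is a plain NumPy illustration of stacking two versus three such layers; the toy graph, feature, and weight shapes are our own assumptions, not those of the paper.</p>

```python
import numpy as np

def normalize_adj(a):
    """Symmetric normalization of A + I: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = np.diag(a_hat.sum(axis=1) ** -0.5)
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

def gcn_forward(a, h, weights):
    """Stack len(weights) GCN layers: H <- ReLU(A_hat @ H @ W)."""
    a_norm = normalize_adj(a)
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)
    return h

rng = np.random.default_rng(0)
a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy 3-node graph
h = rng.normal(size=(3, 8))                                   # node features
w2 = [rng.normal(size=(8, 4)), rng.normal(size=(4, 2))]       # 2-layer GCN
w3 = [rng.normal(size=(8, 4)), rng.normal(size=(4, 4)),
      rng.normal(size=(4, 2))]                                # 3-layer GCN
print(gcn_forward(a, h, w2).shape, gcn_forward(a, h, w3).shape)  # (3, 2) (3, 2)
```

<p>In a framework such as Spektral, cited in the references, such layers correspond, for example, to its GCNConv layer.</p>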
<p>In <xref ref-type="fig" rid="F9">Figure 9</xref>, we provide qualitative results that illustrate the quality of the model's segmentations.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>Qualitative results of the weakly supervised semantic segmentation of tip-burn stress and of disease spot and lesions on PlantVillage, PlantLeaves, and CitrusLeaves.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-874035-g0009.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="conclusions" id="s6">
<title>6. Conclusion</title>
<p>In this paper, we have introduced a new method for the detection and localization of tip-burn stress in large plant canopies grown in plant factories. The method is simple to implement, and the only supervised step is image-level classification, namely knowing only the class present in the image. We have shown that the method obtains refined weakly supervised segmentation of tip-burn stress.</p>
<p>We have tested our method both on publicly available datasets, such as PlantVillage, PlantLeaves, and CitrusLeaves, and in operating conditions inside a plant factory, demonstrating the flexibility of our model. The results show that plant stress detection and localization can be performed automatically under Controlled Environment Agriculture conditions.</p>
</sec>
<sec sec-type="data-availability" id="s7">
<title>Data Availability Statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s8">
<title>Ethics Statement</title>
<p>Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.</p>
</sec>
<sec id="s9">
<title>Author Contributions</title>
<p>All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.</p>
</sec>
<sec sec-type="funding-information" id="s10">
<title>Funding</title>
<p>The research has been partly funded by Agricola Moderna.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>BF was employed by Agricola Moderna. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack><p>The authors would like to acknowledge Malik Bekmurat for having collected the data set, and the plant scientists at Agricola Moderna for their insight into plant physiology.</p></ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abade</surname> <given-names>A. S.</given-names></name> <name><surname>Ferreira</surname> <given-names>P. A.</given-names></name> <name><surname>Vidal</surname> <given-names>F., d. B.</given-names></name></person-group> (<year>2020</year>). <article-title>Plant diseases recognition on images using convolutional neural networks: a systematic review</article-title>. <source>arXiv preprint</source> <fpage>arXiv:2009.04365</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2021.106125</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abbas</surname> <given-names>A.</given-names></name> <name><surname>Jain</surname> <given-names>S.</given-names></name> <name><surname>Gour</surname> <given-names>M.</given-names></name> <name><surname>Vankudothu</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Tomato plant disease detection using transfer learning with c-gan synthetic images</article-title>. <source>Comput. Electron. Agric</source>. <volume>187</volume>, <fpage>106279</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2021.106279</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abdu</surname> <given-names>A. M.</given-names></name> <name><surname>Mokji</surname> <given-names>M.</given-names></name> <name><surname>Sheikh</surname> <given-names>U.</given-names></name></person-group> (<year>2018</year>). <article-title>An investigation into the effect of disease symptoms segmentation boundary limit on classifier performance in application of machine learning for plant disease detection</article-title>. <source>Int. J. Agric. Forestry Plantation</source> <volume>7</volume>, <fpage>33</fpage>&#x02013;<lpage>40</lpage>.</citation>
</ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Abdu</surname> <given-names>A. M.</given-names></name> <name><surname>Mokji</surname> <given-names>M. M.</given-names></name> <name><surname>Sheikh</surname> <given-names>U. U.</given-names></name></person-group> (<year>2019</year>). <article-title>An automatic plant disease symptom segmentation concept based on pathological analogy</article-title>, in <source>2019 IEEE 10th Control and System Graduate Research Colloquium (ICSGRC)</source> (<publisher-loc>Shah Alam</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>94</fpage>&#x02013;<lpage>99</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Agarwal</surname> <given-names>M.</given-names></name> <name><surname>Singh</surname> <given-names>A.</given-names></name> <name><surname>Arjaria</surname> <given-names>S.</given-names></name> <name><surname>Sinha</surname> <given-names>A.</given-names></name> <name><surname>Gupta</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>ToLeD: tomato leaf disease detection using convolution neural network</article-title>. <source>Procedia Comput. Sci</source>. <volume>167</volume>, <fpage>293</fpage>&#x02013;<lpage>301</lpage>. <pub-id pub-id-type="doi">10.1016/j.procs.2020.03.225</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Araslanov</surname> <given-names>N.</given-names></name> <name><surname>Roth</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Single-stage semantic segmentation from image labels</article-title>, in <source>CVPR</source>, <fpage>4253</fpage>&#x02013;<lpage>4262</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR42600.2020.00431</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barbedo</surname> <given-names>J. G. A.</given-names></name></person-group> (<year>2017</year>). <article-title>A new automatic method for disease symptom segmentation in digital photographs of plant leaves</article-title>. <source>Eur. J. Plant Pathol</source>. <volume>147</volume>, <fpage>349</fpage>&#x02013;<lpage>364</lpage>. <pub-id pub-id-type="doi">10.1007/s10658-016-1007-6</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barbedo</surname> <given-names>J. G. A.</given-names></name></person-group> (<year>2019a</year>). <article-title>Detection of nutrition deficiencies in plants using proximal images and machine learning: a review</article-title>. <source>Comput. Electron. Agric</source>. <volume>162</volume>, <fpage>482</fpage>&#x02013;<lpage>492</lpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2019.04.035</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barbedo</surname> <given-names>J. G. A.</given-names></name></person-group> (<year>2019b</year>). <article-title>Plant disease identification from individual lesions and spots using deep learning</article-title>. <source>Biosyst. Eng</source>. <volume>180</volume>, <fpage>96</fpage>&#x02013;<lpage>107</lpage>. <pub-id pub-id-type="doi">10.1016/j.biosystemseng.2019.02.002</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chan</surname> <given-names>L.</given-names></name> <name><surname>Hosseini</surname> <given-names>M. S.</given-names></name> <name><surname>Plataniotis</surname> <given-names>K. N.</given-names></name></person-group> (<year>2021</year>). <article-title>A comprehensive analysis of weakly-supervised semantic segmentation in different image domains</article-title>. <source>Int. J. Comput. Vision</source> <volume>129</volume>, <fpage>361</fpage>&#x02013;<lpage>384</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-020-01373-4</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>Y.-T.</given-names></name> <name><surname>Wang</surname> <given-names>Q.</given-names></name> <name><surname>Hung</surname> <given-names>W.-C.</given-names></name> <name><surname>Piramuthu</surname> <given-names>R.</given-names></name> <name><surname>Tsai</surname> <given-names>Y.-H.</given-names></name> <name><surname>Yang</surname> <given-names>M.-H.</given-names></name></person-group> (<year>2020</year>). <article-title>Weakly-supervised semantic segmentation via sub-category exploration</article-title>, in <source>CVPR</source>, <fpage>8991</fpage>&#x02013;<lpage>9000</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR42600.2020.00901</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Nanehkaran</surname> <given-names>Y. A.</given-names></name></person-group> (<year>2020</year>). <article-title>Using deep transfer learning for image-based plant disease identification</article-title>. <source>Comput. Electron. Agric</source>. <volume>173</volume>, <fpage>105393</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2020.105393</pub-id><pub-id pub-id-type="pmid">33121188</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chouhan</surname> <given-names>S.</given-names></name> <name><surname>Koul</surname> <given-names>A.</given-names></name> <name><surname>Singh</surname> <given-names>D. U.</given-names></name> <name><surname>Jain</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>A data repository of leaf images: Practice towards plant conservation with plant Pathology</article-title>, in <source>2019 4th International Conference on Information Systems and Computer Networks (ISCON)</source> (<publisher-loc>Mathura</publisher-loc>: <publisher-name>IEEE</publisher-name>).</citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cox</surname> <given-names>E.</given-names></name> <name><surname>McKee</surname> <given-names>J.</given-names></name></person-group> (<year>1976</year>). <article-title>A comparison of tipburn susceptibility in lettuce under field and glasshouse conditions</article-title>. <source>J. Hortic. Sci</source>. <volume>51</volume>, <fpage>117</fpage>&#x02013;<lpage>122</lpage>. <pub-id pub-id-type="doi">10.1080/00221589.1976.11514671</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cox</surname> <given-names>E.</given-names></name> <name><surname>McKee</surname> <given-names>J.</given-names></name> <name><surname>Dearman</surname> <given-names>A.</given-names></name></person-group> (<year>1976</year>). <article-title>The effect of growth rate on tipburn occurrence in lettuce</article-title>. <source>J. Hortic. Sci</source>. <volume>51</volume>, <fpage>297</fpage>&#x02013;<lpage>309</lpage>. <pub-id pub-id-type="doi">10.1080/00221589.1976.11514693</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>DeChant</surname> <given-names>C.</given-names></name> <name><surname>Wiesner-Hanks</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>S.</given-names></name> <name><surname>Stewart</surname> <given-names>E.</given-names></name> <name><surname>Yosinski</surname> <given-names>J.</given-names></name> <name><surname>Gore</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Automated identification of northern leaf blight-infected maize plants from field imagery using deep learning</article-title>. <source>Phytopathology</source> <volume>107</volume>, <fpage>1426</fpage>&#x02013;<lpage>1432</lpage>. <pub-id pub-id-type="doi">10.1094/PHYTO-11-16-0417-R</pub-id><pub-id pub-id-type="pmid">28653579</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Defferrard</surname> <given-names>M.</given-names></name> <name><surname>Bresson</surname> <given-names>X.</given-names></name> <name><surname>Vandergheynst</surname> <given-names>P.</given-names></name></person-group> (<year>2016</year>). <article-title>Convolutional neural networks on graphs with fast localized spectral filtering</article-title>. <source>Adv. Neural Inf. Process. Syst</source>. <volume>29</volume>, <fpage>3844</fpage>&#x02013;<lpage>3852</lpage>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>W.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Shi</surname> <given-names>G.</given-names></name></person-group> (<year>2011</year>). <article-title>Sparsity-based image denoising via dictionary learning and structural clustering</article-title>, in <source>CVPR</source>, <fpage>457</fpage>&#x02013;<lpage>464</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2011.5995478</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Douarre</surname> <given-names>C.</given-names></name> <name><surname>Crispim-Junior</surname> <given-names>C. F.</given-names></name> <name><surname>Gelibert</surname> <given-names>A.</given-names></name> <name><surname>Tougne</surname> <given-names>L.</given-names></name> <name><surname>Rousseau</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Novel data augmentation strategies to boost supervised segmentation of plant disease</article-title>. <source>Comput. Electron. Agric</source>. <volume>165</volume>, <fpage>104967</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2019.104967</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Giuffrida</surname> <given-names>V.</given-names></name> <name><surname>Scharr</surname> <given-names>H.</given-names></name> <name><surname>Tsaftaris</surname> <given-names>S. A.</given-names></name></person-group> (<year>2017</year>). <article-title>Arigan: Synthetic arabidopsis plants using generative adversarial network</article-title>, in <source>Proceedings of the IEEE International Conference on Computer Vision Workshops</source> (<publisher-loc>Venice</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>2064</fpage>&#x02013;<lpage>2071</lpage>.</citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gozzovelli</surname> <given-names>R.</given-names></name> <name><surname>Franchetti</surname> <given-names>B.</given-names></name> <name><surname>Bekmurat</surname> <given-names>M.</given-names></name> <name><surname>Pirri</surname> <given-names>F.</given-names></name></person-group> (<year>2021</year>). <article-title>Tip-burn stress detection of lettuce canopy grown in plant factories</article-title>, in <source>Proceedings of the IEEE/CVF International Conference on Computer Vision</source> (<publisher-loc>Montreal, BC</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1259</fpage>&#x02013;<lpage>1268</lpage>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grattarola</surname> <given-names>D.</given-names></name> <name><surname>Alippi</surname> <given-names>C.</given-names></name></person-group> (<year>2021</year>). <article-title>Graph neural networks in tensorflow and keras with spektral [application notes]</article-title>. <source>IEEE Comput. Intell. Mag</source>. <volume>16</volume>, <fpage>99</fpage>&#x02013;<lpage>106</lpage>. <pub-id pub-id-type="doi">10.1109/MCI.2020.3039072</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gu</surname> <given-names>J.</given-names></name> <name><surname>Kuen</surname> <given-names>J.</given-names></name> <name><surname>Joty</surname> <given-names>S.</given-names></name> <name><surname>Cai</surname> <given-names>J.</given-names></name> <name><surname>Morariu</surname> <given-names>V.</given-names></name> <name><surname>Zhao</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Self-supervised relationship probing</article-title>. <source>Adv. Neural Inf. Process. Syst</source>. <volume>33</volume>, <fpage>1841</fpage>&#x02013;<lpage>1853</lpage>.</citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hassan</surname> <given-names>S. M.</given-names></name> <name><surname>Maji</surname> <given-names>A. K.</given-names></name></person-group> (<year>2022</year>). <article-title>Plant disease identification using a novel convolutional neural network</article-title>. <source>IEEE Access</source> <volume>10</volume>, <fpage>5390</fpage>&#x02013;<lpage>5401</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2022.3141371</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>H.</given-names></name> <name><surname>Ji</surname> <given-names>D.</given-names></name> <name><surname>Gan</surname> <given-names>W.</given-names></name> <name><surname>Bai</surname> <given-names>S.</given-names></name> <name><surname>Wu</surname> <given-names>W.</given-names></name> <name><surname>Yan</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Class-wise dynamic graph convolution for semantic segmentation</article-title>, in <source>European Conference on Computer Vision</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>17</lpage>.</citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hughes</surname> <given-names>D. P.</given-names></name> <name><surname>Salathe</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>An open access repository of images on plant health to enable the development of mobile disease diagnostics</article-title>. <source>arXiv preprint</source>.</citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khattak</surname> <given-names>A.</given-names></name> <name><surname>Asghar</surname> <given-names>M. U.</given-names></name> <name><surname>Batool</surname> <given-names>U.</given-names></name> <name><surname>Asghar</surname> <given-names>M. Z.</given-names></name> <name><surname>Ullah</surname> <given-names>H.</given-names></name> <name><surname>Al-Rakhami</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Automatic detection of citrus fruit and leaves diseases using deep neural network model</article-title>. <source>IEEE Access</source> <volume>9</volume>, <fpage>112942</fpage>&#x02013;<lpage>112954</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2021.3096895</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Adam: a method for stochastic optimization</article-title>. <source>arXiv preprint</source> arXiv:1412.6980.</citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kipf</surname> <given-names>T. N.</given-names></name> <name><surname>Welling</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>Semi-supervised classification with graph convolutional networks</article-title>. <source>arXiv preprint</source> arXiv:1609.02907.</citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kynk&#x000E4;&#x000E4;nniemi</surname> <given-names>T.</given-names></name> <name><surname>Karras</surname> <given-names>T.</given-names></name> <name><surname>Laine</surname> <given-names>S.</given-names></name> <name><surname>Lehtinen</surname> <given-names>J.</given-names></name> <name><surname>Aila</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>Improved precision and recall metric for assessing generative models</article-title>. <source>arXiv preprint</source> arXiv:1904.06991.</citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Gupta</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Beyond grids: learning graph representations for visual recognition</article-title>. <source>Adv. Neural Inf. Process. Syst</source>. <volume>31</volume>, <fpage>9225</fpage>&#x02013;<lpage>9235</lpage>.</citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>J.</given-names></name> <name><surname>Tan</surname> <given-names>L.</given-names></name> <name><surname>Jiang</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>Review on convolutional neural network (cnn) applied to plant leaf disease classification</article-title>. <source>Agriculture</source> <volume>11</volume>, <fpage>707</fpage>. <pub-id pub-id-type="doi">10.3390/agriculture11080707</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lutman</surname> <given-names>B. F.</given-names></name></person-group> (<year>1919</year>). <article-title>Tip burn of the potato and other plants</article-title>. <source>Vermont Agric. Exp. Station</source>.</citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mirza</surname> <given-names>M.</given-names></name> <name><surname>Osindero</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>Conditional generative adversarial nets</article-title>. <source>arXiv preprint</source> arXiv:1411.1784.</citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mishra</surname> <given-names>S.</given-names></name> <name><surname>Sachan</surname> <given-names>R.</given-names></name> <name><surname>Rajpal</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>Deep convolutional neural network based detection system for real-time corn plant disease recognition</article-title>. <source>Procedia Comput. Sci</source>. <volume>167</volume>, <fpage>2003</fpage>&#x02013;<lpage>2010</lpage>. <pub-id pub-id-type="doi">10.1016/j.procs.2020.03.236</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mohameth</surname> <given-names>F.</given-names></name> <name><surname>Bingcai</surname> <given-names>C.</given-names></name> <name><surname>Sada</surname> <given-names>K. A.</given-names></name></person-group> (<year>2020</year>). <article-title>Plant disease detection with deep learning and feature extraction using plant village</article-title>. <source>J. Comput. Commun</source>. <volume>8</volume>, <fpage>10</fpage>&#x02013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.4236/jcc.2020.86002</pub-id><pub-id pub-id-type="pmid">35062534</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mohanty</surname> <given-names>S.</given-names></name> <name><surname>Hughes</surname> <given-names>D.</given-names></name> <name><surname>Salathe</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>Using deep learning for image-based plant disease detection</article-title>. <source>Front Plant Sci</source>. <volume>7</volume>, <fpage>1419</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2016.01419</pub-id><pub-id pub-id-type="pmid">27713752</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mwebaze</surname> <given-names>E.</given-names></name> <name><surname>Gebru</surname> <given-names>T.</given-names></name> <name><surname>Frome</surname> <given-names>A.</given-names></name> <name><surname>Nsumba</surname> <given-names>S.</given-names></name> <name><surname>Tusubira</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>iCassava 2019 fine-grained visual categorization challenge</article-title>. <source>arXiv preprint</source>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nagasubramanian</surname> <given-names>K.</given-names></name> <name><surname>Jones</surname> <given-names>S.</given-names></name> <name><surname>Singh</surname> <given-names>A.</given-names></name> <name><surname>Singh</surname> <given-names>A.</given-names></name> <name><surname>Ganapathysubramanian</surname> <given-names>B.</given-names></name> <name><surname>Sarkar</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Explaining hyperspectral imaging based plant disease identification: 3d cnn and saliency maps</article-title>. <source>arXiv preprint</source>.</citation>
</ref>
<ref id="B40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Noh</surname> <given-names>H.</given-names></name> <name><surname>Hong</surname> <given-names>S.</given-names></name> <name><surname>Han</surname> <given-names>B.</given-names></name></person-group> (<year>2015</year>). <article-title>Learning deconvolution network for semantic segmentation</article-title>, in <source>Proceedings of the IEEE International Conference on Computer Vision</source> (<publisher-loc>Santiago</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1520</fpage>&#x02013;<lpage>1528</lpage>.</citation>
</ref>
<ref id="B41">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Nowak</surname> <given-names>E.</given-names></name> <name><surname>Jurie</surname> <given-names>F.</given-names></name> <name><surname>Triggs</surname> <given-names>B.</given-names></name></person-group> (<year>2006</year>). <article-title>Sampling strategies for bag-of-features image classification</article-title>, in <source>ECCV</source> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>490</fpage>&#x02013;<lpage>503</lpage>.</citation>
</ref>
<ref id="B42">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Pan</surname> <given-names>F.</given-names></name> <name><surname>Shin</surname> <given-names>I.</given-names></name> <name><surname>Rameau</surname> <given-names>F.</given-names></name> <name><surname>Lee</surname> <given-names>S.</given-names></name> <name><surname>Kweon</surname> <given-names>I. S.</given-names></name></person-group> (<year>2020</year>). <article-title>Unsupervised intra-domain adaptation for semantic segmentation through self-supervision</article-title>, in <source>CVPR</source>, <fpage>3764</fpage>&#x02013;<lpage>3773</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR42600.2020.00382</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Patidar</surname> <given-names>S.</given-names></name> <name><surname>Pandey</surname> <given-names>A.</given-names></name> <name><surname>Shirish</surname> <given-names>B. A.</given-names></name> <name><surname>Sriram</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>Rice plant disease detection and classification using deep residual learning</article-title>, in <source>International Conference on Machine Learning, Image Processing, Network Security and Data Sciences</source> (<publisher-name>Springer</publisher-name>), <fpage>278</fpage>&#x02013;<lpage>293</lpage>.</citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prajapati</surname> <given-names>H. B.</given-names></name> <name><surname>Shah</surname> <given-names>J. P.</given-names></name> <name><surname>Dabhi</surname> <given-names>V. K.</given-names></name></person-group> (<year>2017</year>). <article-title>Detection and classification of rice plant diseases</article-title>. <source>Intell. Decis. Technol</source>. <volume>11</volume>, <fpage>357</fpage>&#x02013;<lpage>373</lpage>. <pub-id pub-id-type="doi">10.3233/IDT-170301</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rauf</surname> <given-names>H. T.</given-names></name> <name><surname>Saleem</surname> <given-names>B. A.</given-names></name> <name><surname>Lali</surname> <given-names>M. I. U.</given-names></name> <name><surname>Khan</surname> <given-names>M. A.</given-names></name> <name><surname>Sharif</surname> <given-names>M.</given-names></name> <name><surname>Bukhari</surname> <given-names>S. A. C.</given-names></name></person-group> (<year>2019</year>). <article-title>A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning</article-title>. <source>Data Brief</source> <volume>26</volume>, <fpage>104340</fpage>. <pub-id pub-id-type="doi">10.1016/j.dib.2019.104340</pub-id><pub-id pub-id-type="pmid">31516936</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Redmon</surname> <given-names>J.</given-names></name> <name><surname>Farhadi</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>YOLO9000: better, faster, stronger</article-title>, in <source>2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>IEEE</publisher-name>).</citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saleem</surname> <given-names>M. H.</given-names></name> <name><surname>Khanchi</surname> <given-names>S.</given-names></name> <name><surname>Potgieter</surname> <given-names>J.</given-names></name> <name><surname>Arif</surname> <given-names>K. M.</given-names></name></person-group> (<year>2020</year>). <article-title>Image-based plant disease identification by deep learning meta-architectures</article-title>. <source>Plants</source> <volume>9</volume>, <fpage>1451</fpage>. <pub-id pub-id-type="doi">10.3390/plants9111451</pub-id><pub-id pub-id-type="pmid">33121188</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sharma</surname> <given-names>P.</given-names></name> <name><surname>Berwal</surname> <given-names>Y. P. S.</given-names></name> <name><surname>Ghai</surname> <given-names>W.</given-names></name></person-group> (<year>2020</year>). <article-title>Performance analysis of deep learning cnn models for disease detection in plants using image segmentation</article-title>. <source>Inf. Process. Agric</source>. <volume>7</volume>, <fpage>566</fpage>&#x02013;<lpage>574</lpage>. <pub-id pub-id-type="doi">10.1016/j.inpa.2019.11.001</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shimamura</surname> <given-names>S.</given-names></name> <name><surname>Uehara</surname> <given-names>K.</given-names></name> <name><surname>Koakutsu</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Automatic identification of plant physiological disorders in plant factories with artificial light using convolutional neural networks</article-title>. <source>Int. J. New Comput. Archit. Appl</source>. <volume>9</volume>, <fpage>25</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.17781/P002611</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Shrivastava</surname> <given-names>V.</given-names></name> <name><surname>Pradhan</surname> <given-names>M.</given-names></name> <name><surname>Minz</surname> <given-names>S.</given-names></name> <name><surname>Thakur</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Rice plant disease classification using transfer learning of deep convolution neural network</article-title>, in <source>ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLII-3/W6</source>, <fpage>631</fpage>&#x02013;<lpage>635</lpage>. <pub-id pub-id-type="doi">10.5194/isprs-archives-XLII-3-W6-631-2019</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Singh</surname> <given-names>D.</given-names></name> <name><surname>Jain</surname> <given-names>N.</given-names></name> <name><surname>Jain</surname> <given-names>P.</given-names></name> <name><surname>Kayal</surname> <given-names>P.</given-names></name> <name><surname>Kumawat</surname> <given-names>S.</given-names></name> <name><surname>Batra</surname> <given-names>N.</given-names></name></person-group> (<year>2020</year>). <article-title>PlantDoc: a dataset for visual plant disease detection</article-title>, in <source>Proceedings of the 7th ACM IKDD CoDS and 25th COMAD</source>, <fpage>249</fpage>&#x02013;<lpage>253</lpage>. <pub-id pub-id-type="doi">10.1145/3371158.3371196</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sodjinou</surname> <given-names>S. G.</given-names></name> <name><surname>Mohammadi</surname> <given-names>V.</given-names></name> <name><surname>Mahama</surname> <given-names>A. T. S.</given-names></name> <name><surname>Gouton</surname> <given-names>P.</given-names></name></person-group> (<year>2021</year>). <article-title>A deep semantic segmentation-based algorithm to segment crops and weeds in agronomic color images</article-title>. <source>Inf. Process. Agric</source>. <pub-id pub-id-type="doi">10.1016/j.inpa.2021.08.003</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Son</surname> <given-names>J. E.</given-names></name> <name><surname>Takakura</surname> <given-names>T.</given-names></name></person-group> (<year>1989</year>). <article-title>Effect of EC of nutrient solution and light condition on transpiration and tipburn injury of lettuce in a plant factory</article-title>. <source>J. Agric. Meteorol</source>. <volume>44</volume>, <fpage>253</fpage>&#x02013;<lpage>258</lpage>. <pub-id pub-id-type="doi">10.2480/agrmet.44.253</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sujatha</surname> <given-names>R.</given-names></name> <name><surname>Chatterjee</surname> <given-names>J. M.</given-names></name> <name><surname>Jhanjhi</surname> <given-names>N.</given-names></name> <name><surname>Brohi</surname> <given-names>S. N.</given-names></name></person-group> (<year>2021</year>). <article-title>Performance of deep learning vs machine learning in plant leaf disease detection</article-title>. <source>Microprocess. Microsyst</source>. <volume>80</volume>, <fpage>103615</fpage>. <pub-id pub-id-type="doi">10.1016/j.micpro.2020.103615</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>G.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Dai</surname> <given-names>J.</given-names></name> <name><surname>Van Gool</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>Mining cross-image semantics for weakly supervised semantic segmentation</article-title>, in <source>ECCV</source>, <fpage>347</fpage>&#x02013;<lpage>365</lpage>. <pub-id pub-id-type="pmid">35439127</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>W.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Barnes</surname> <given-names>N.</given-names></name></person-group> (<year>2022</year>). <article-title>Inferring the class conditional response map for weakly supervised semantic segmentation</article-title>, in <source>WACV</source>, <fpage>2878</fpage>&#x02013;<lpage>2887</lpage>.</citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Syed-Ab-Rahman</surname> <given-names>S. F.</given-names></name> <name><surname>Hesamian</surname> <given-names>M. H.</given-names></name> <name><surname>Prasad</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>Citrus disease detection and classification using end-to-end anchor-based deep learning model</article-title>. <source>Appl. Intell</source>. <volume>52</volume>, <fpage>927</fpage>&#x02013;<lpage>938</lpage>. <pub-id pub-id-type="doi">10.1007/s10489-021-02452-w</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Terentev</surname> <given-names>A.</given-names></name> <name><surname>Dolzhenko</surname> <given-names>V.</given-names></name> <name><surname>Fedotov</surname> <given-names>A.</given-names></name> <name><surname>Eremenko</surname> <given-names>D.</given-names></name></person-group> (<year>2022</year>). <article-title>Current state of hyperspectral remote sensing for early plant disease detection: a review</article-title>. <source>Sensors</source> <volume>22</volume>, <fpage>757</fpage>. <pub-id pub-id-type="doi">10.3390/s22030757</pub-id><pub-id pub-id-type="pmid">35161504</pub-id></citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Termohlen</surname> <given-names>G.</given-names></name> <name><surname>Hoeven</surname> <given-names>A. V.</given-names></name></person-group> (<year>1965</year>). <article-title>Tipburn symptoms in lettuce</article-title>. <source>Sympos. Veget. Growing Glass</source> <volume>4</volume>, <fpage>105</fpage>&#x02013;<lpage>110</lpage>. <pub-id pub-id-type="doi">10.17660/ActaHortic.1966.4.21</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tian</surname> <given-names>Y.-W.</given-names></name> <name><surname>Li</surname> <given-names>C.-H.</given-names></name></person-group> (<year>2004</year>). <article-title>Color image segmentation method based on statistical pattern recognition for plant disease diagnose</article-title>. <source>J. Jilin Univ. Technol</source>. <volume>2</volume>, <fpage>28</fpage>.</citation>
</ref>
<ref id="B61">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kour</surname> <given-names>V. P.</given-names></name> <name><surname>Arora</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <source>Plantaek: A Leaf Database of Native Plants of Jammu and Kashmir</source> (<publisher-loc>Singapore</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>359</fpage>&#x02013;<lpage>368</lpage>.</citation>
</ref>
<ref id="B62">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Kan</surname> <given-names>M.</given-names></name> <name><surname>Shan</surname> <given-names>S.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <article-title>Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation</article-title>, in <source>CVPR</source>, <fpage>12275</fpage>&#x02013;<lpage>12284</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR42600.2020.01229</pub-id></citation>
</ref>
<ref id="B63">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Watchareeruetai</surname> <given-names>U.</given-names></name> <name><surname>Noinongyao</surname> <given-names>P.</given-names></name> <name><surname>Wattanapaiboonsuk</surname> <given-names>C.</given-names></name> <name><surname>Khantiviriya</surname> <given-names>P.</given-names></name> <name><surname>Duangsrisai</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Identification of plant nutrient deficiencies using convolutional neural networks</article-title>, in <source>2018 International Electrical Engineering Congress (iEECON)</source> (<publisher-loc>Krabi</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>4</lpage>.</citation>
</ref>
<ref id="B64">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>T.</given-names></name> <name><surname>Huang</surname> <given-names>J.</given-names></name> <name><surname>Gao</surname> <given-names>G.</given-names></name> <name><surname>Wei</surname> <given-names>X.</given-names></name> <name><surname>Wei</surname> <given-names>X.</given-names></name> <name><surname>Luo</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Embedded discriminative attention mechanism for weakly supervised semantic segmentation</article-title>, in <source>CVPR</source>, <fpage>16765</fpage>&#x02013;<lpage>16774</lpage>.</citation>
</ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>M.-K.</given-names></name> <name><surname>Huang</surname> <given-names>S.-J.</given-names></name></person-group> (<year>2021</year>). <article-title>Partial multi-label learning with noisy label identification</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <pub-id pub-id-type="pmid">33587695</pub-id></citation></ref>
<ref id="B66">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Soatto</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Fda: fourier domain adaptation for semantic segmentation</article-title>, in <source>CVPR</source>, <fpage>4085</fpage>&#x02013;<lpage>4095</lpage>.</citation>
</ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yao</surname> <given-names>Q.</given-names></name> <name><surname>Gong</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <article-title>Saliency guided self-attention network for weakly and semi-supervised semantic segmentation</article-title>. <source>IEEE Access</source> <volume>8</volume>, <fpage>14413</fpage>&#x02013;<lpage>14423</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.2966647</pub-id></citation>
</ref>
<ref id="B68">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zeiler</surname> <given-names>M. D.</given-names></name> <name><surname>Fergus</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Visualizing and understanding convolutional networks</article-title>, in <source>ECCV</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>818</fpage>&#x02013;<lpage>833</lpage>.</citation>
</ref>
<ref id="B69">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>F.</given-names></name> <name><surname>Gu</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>C.</given-names></name> <name><surname>Dai</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <article-title>Complementary patch for weakly supervised semantic segmentation</article-title>, in <source>ICCV</source>, <fpage>7242</fpage>&#x02013;<lpage>7251</lpage>.</citation>
</ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>S.-X.</given-names></name></person-group> (<year>2007</year>). <article-title>A study on the segmentation method in image processing for plant disease of greenhouse</article-title>. <source>J. Inner Mongolia Agric. Univ</source>. <volume>3</volume>, <fpage>1009</fpage>&#x02013;<lpage>3575</lpage>.</citation>
</ref>
<ref id="B71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Han</surname> <given-names>L.</given-names></name> <name><surname>Dong</surname> <given-names>Y.</given-names></name> <name><surname>Shi</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>W.</given-names></name> <name><surname>Han</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>A deep learning-based approach for automated yellow rust disease detection from high-resolution hyperspectral uav images</article-title>. <source>Remote Sens</source>. <volume>11</volume>, <fpage>1554</fpage>. <pub-id pub-id-type="doi">10.3390/rs11131554</pub-id></citation>
</ref>
<ref id="B72">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zheng</surname> <given-names>D.</given-names></name> <name><surname>Wang</surname> <given-names>M.</given-names></name> <name><surname>Gan</surname> <given-names>Q.</given-names></name> <name><surname>Song</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Karypis</surname> <given-names>G.</given-names></name></person-group> (<year>2021</year>). <article-title>Scalable graph neural networks with deep graph library</article-title>, in <source>Proceedings of the 14th ACM International Conference on Web Search and Data Mining</source>, <fpage>1141</fpage>&#x02013;<lpage>1142</lpage>. <pub-id pub-id-type="pmid">31734815</pub-id></citation></ref>
<ref id="B73">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>B.</given-names></name> <name><surname>Khosla</surname> <given-names>A.</given-names></name> <name><surname>Lapedriza</surname> <given-names>A.</given-names></name> <name><surname>Oliva</surname> <given-names>A.</given-names></name> <name><surname>Torralba</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Learning deep features for discriminative localization</article-title>, in <source>CVPR</source>, <fpage>2921</fpage>&#x02013;<lpage>2929</lpage>.</citation>
</ref>
<ref id="B74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>M.</given-names></name> <name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>Paisley</surname> <given-names>J. W.</given-names></name> <name><surname>Ren</surname> <given-names>L.</given-names></name> <name><surname>Sapiro</surname> <given-names>G.</given-names></name> <name><surname>Carin</surname> <given-names>L.</given-names></name></person-group> (<year>2009</year>). <article-title>Non-parametric bayesian dictionary learning for sparse image representations</article-title>, in <source>NIPS, Vol</source>. <volume>9</volume>, <fpage>2295</fpage>&#x02013;<lpage>2303</lpage>.</citation>
</ref>
<ref id="B75">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zou</surname> <given-names>Y.</given-names></name> <name><surname>Yu</surname> <given-names>Z.</given-names></name> <name><surname>Kumar</surname> <given-names>B.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Unsupervised domain adaptation for semantic segmentation via class-balanced self-training</article-title>, in <source>ECCV</source>, <fpage>289</fpage>&#x02013;<lpage>305</lpage>.</citation>
</ref>
</ref-list> 
</back>
</article>