<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frai.2023.1200977</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Inside out: transforming images of lab-grown plants for machine learning applications in agriculture</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Krosney</surname> <given-names>Alexander E.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2304157/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Sotoodeh</surname> <given-names>Parsa</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Henry</surname> <given-names>Christopher J.</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1708710/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Beck</surname> <given-names>Michael A.</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1787761/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Bidinosti</surname> <given-names>Christopher P.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Computer Science, University of Manitoba</institution>, <addr-line>Winnipeg, MB</addr-line>, <country>Canada</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Physics, University of Winnipeg</institution>, <addr-line>Winnipeg, MB</addr-line>, <country>Canada</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Applied Computer Science, University of Winnipeg</institution>, <addr-line>Winnipeg, MB</addr-line>, <country>Canada</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Ribana Roscher, Institute for Bio- and Geosciences Plant Sciences (IBG-2), Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Manya Afonso, Wageningen University and Research, Netherlands; Kamil Dimililer, Near East University, Cyprus</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Christopher J. Henry <email>ch.henry&#x00040;uwinnipeg.ca</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>06</day>
<month>07</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>6</volume>
<elocation-id>1200977</elocation-id>
<history>
<date date-type="received">
<day>05</day>
<month>04</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>05</day>
<month>06</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Krosney, Sotoodeh, Henry, Beck and Bidinosti.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Krosney, Sotoodeh, Henry, Beck and Bidinosti</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<sec>
<title>Introduction</title>
<p>Machine learning tasks often require a significant amount of training data for the resultant network to perform suitably for a given problem in any domain. In agriculture, dataset sizes are further limited by phenotypical differences between two plants of the same genotype, often as a result of different growing conditions. Synthetically augmented datasets have shown promise in improving existing models when real data is not available.</p></sec>
<sec>
<title>Methods</title>
<p>In this paper, we employ a contrastive unpaired translation (CUT) generative adversarial network (GAN) and simple image processing techniques to translate indoor plant images to appear as field images. While we train our network to translate an image containing only a single plant, we show that our method is easily extendable to produce multiple-plant field images.</p></sec>
<sec>
<title>Results</title>
<p>Furthermore, we use our synthetic multi-plant images to train several YoloV5 nano object detection models to perform the task of plant detection and measure the accuracy of the model on real field data images.</p></sec>
<sec>
<title>Discussion</title>
<p>The inclusion of training data generated by the CUT-GAN leads to better plant detection performance compared to a network trained solely on real data.</p></sec></abstract>
<kwd-group>
<kwd>digital agriculture</kwd>
<kwd>agriculture 4.0</kwd>
<kwd>deep learning</kwd>
<kwd>convolutional neural networks</kwd>
<kwd>generative adversarial networks</kwd>
<kwd>data augmentation</kwd>
<kwd>image augmentation</kwd>
</kwd-group>
<counts>
<fig-count count="18"/>
<table-count count="3"/>
<equation-count count="14"/>
<ref-count count="37"/>
<page-count count="15"/>
<word-count count="8933"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>AI in Food, Agriculture and Water</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Machine learning (ML) tasks are often limited by the availability and quality of training data for a given model (LeCun et al., <xref ref-type="bibr" rid="B21">2015</xref>; Goodfellow et al., <xref ref-type="bibr" rid="B15">2016</xref>). Enabling ML-based applications in agriculture, such as the automatic detection and classification of plants or crop health monitoring, ultimately requires large quantities of labeled image data with which to train deep neural networks (DNNs) (Lobet, <xref ref-type="bibr" rid="B24">2017</xref>; Liakos et al., <xref ref-type="bibr" rid="B22">2018</xref>; W&#x000E4;ldchen et al., <xref ref-type="bibr" rid="B32">2018</xref>; Lu et al., <xref ref-type="bibr" rid="B25">2022</xref>). The present lack of such data, and the challenge of generating it, may ultimately limit the broad application of these techniques across the immense variety of crop plants. Lobet (<xref ref-type="bibr" rid="B24">2017</xref>), for example, emphasizes that this process &#x0201C;is hampered by the difficulty of finding good-quality ground-truth datasets.&#x0201D; Similar sentiments are echoed in general reviews (Liakos et al., <xref ref-type="bibr" rid="B22">2018</xref>; W&#x000E4;ldchen et al., <xref ref-type="bibr" rid="B32">2018</xref>; Lu et al., <xref ref-type="bibr" rid="B25">2022</xref>), as well as in publications on specific applications such as weed detection (Binch and Fox, <xref ref-type="bibr" rid="B6">2017</xref>; Bah et al., <xref ref-type="bibr" rid="B2">2018</xref>; Bosilj et al., <xref ref-type="bibr" rid="B8">2018</xref>) and high-throughput phenotyping (Fahlgren et al., <xref ref-type="bibr" rid="B11">2015</xref>; Singh et al., <xref ref-type="bibr" rid="B30">2016</xref>; Gehan and Kellogg, <xref ref-type="bibr" rid="B12">2017</xref>; Shakoor et al., <xref ref-type="bibr" rid="B29">2017</xref>; Tardieu et al., <xref ref-type="bibr" rid="B31">2017</xref>; Giuffrida et al., <xref ref-type="bibr" rid="B13">2018</xref>). The difficulty of generating or collecting plant image data for agricultural applications is further exacerbated by the many differences in growing conditions and the physical dissimilarities between any two plants, even those belonging to the same genotype. Covering a wide variety of phenotypes with a sufficient volume of labeled training data is therefore a task of massive scope. It is made harder still by the expert knowledge that is often needed to label plant data accurately (for example, to distinguish oats from wild oats) (Beck et al., <xref ref-type="bibr" rid="B5">2021</xref>).</p>
<p>Image transformation and synthesis through the use of generative adversarial networks (GANs) are gaining interest in agriculture as a means to expedite the development of large-scale, balanced, and ground-truthed datasets (Lu et al., <xref ref-type="bibr" rid="B25">2022</xref>). GANs were originally used to create synthetic MNIST digits, human faces, and other image types (Goodfellow et al., <xref ref-type="bibr" rid="B16">2014</xref>). They have since proven successful for a wide variety of image translation problems, including transforming a horse to/from a zebra, a dog to/from a cat, and a summer scene to/from a winter scene (Isola et al., <xref ref-type="bibr" rid="B18">2016</xref>; Zhu et al., <xref ref-type="bibr" rid="B36">2017</xref>; Park et al., <xref ref-type="bibr" rid="B27">2020</xref>). In the agricultural domain, GANs have been applied to areas such as plant localization, plant health, weed control, and phenotyping (Lu et al., <xref ref-type="bibr" rid="B25">2022</xref>). Specific examples that demonstrate the promise of GANs in agriculture include the improvement of plant image segmentation masks (Barth et al., <xref ref-type="bibr" rid="B3">2018</xref>), leaf disease detection (Zeng et al., <xref ref-type="bibr" rid="B33">2020</xref>; Cap et al., <xref ref-type="bibr" rid="B9">2022</xref>), leaf counting (Giuffrida et al., <xref ref-type="bibr" rid="B14">2017</xref>; Zhu et al., <xref ref-type="bibr" rid="B37">2018</xref>; Kuznichov et al., <xref ref-type="bibr" rid="B20">2019</xref>), modeling of seedlings (Madsen et al., <xref ref-type="bibr" rid="B26">2019</xref>), and rating plant vigor (Zhu et al., <xref ref-type="bibr" rid="B35">2020</xref>).</p>
<p>The contribution of this work is not an automated task or application that solves a particular agricultural problem. Instead, it is the development of a method to translate real indoor images of plants so that they appear to be in field settings. As such, this work serves as an important precursor to the development of such applications. This approach can, in principle, be used to create bespoke, labeled datasets to support the training needs of a wide variety of ML tasks in agriculture. For example, appropriately constructed collages of two or more species on a soil background could ultimately be used to synthesize very large numbers of images of crop plants interspersed with weeds. The result would be labeled, ground-truthed datasets suitable for developing ML models for automated weed detection. For this initial study, however, testing is limited to simpler demonstrations of object detection.</p>
<p>The impetus for this work comes from our previous development of an embedded system for the automated generation of labeled plant images taken indoors (Beck et al., <xref ref-type="bibr" rid="B4">2020</xref>). In this system, a camera mounted on a computer-controlled gantry photographs plants against blue keying fabric from multiple positions and angles. Because the camera and plant positions are always known, single-plant images can be automatically cropped and labeled. In addition, we have collected outdoor images of plants and soil (Beck et al., <xref ref-type="bibr" rid="B5">2021</xref>), which at present must be cropped and annotated by hand. Both datasets contain four crop species: canola, oat, soybean, and wheat. The presence of these plants in both datasets makes it possible to synthesize outdoor images through image-to-image translation with GANs (Isola et al., <xref ref-type="bibr" rid="B18">2016</xref>). Our goal, then, is to create fully labeled training datasets that are visually consistent with real field data. This procedure eliminates the need to manually label outdoor-grown plants (which is time-consuming and prone to error), while remaining scalable and adaptable to new environments (e.g., different soil backgrounds, plant varieties, or weather conditions). Creating such bespoke datasets could, in turn, improve the accuracy of plant detection and other models in real field settings.</p>
<p>This paper is structured as follows. Section 2 provides an overview of the GAN architecture used in our image translation experiments and describes the construction process for the GAN training datasets. Section 3 presents visual results for several single-plant translation experiments and discusses the benefits and limitations of each training dataset. Section 4 describes our method for producing augmented outdoor multi-plant images with automated labeling. Section 5 presents plant detection results achieved using a YoloV5 nano model trained on our augmented datasets. Section 6 concludes the paper and discusses potential extensions to the image synthesis methods.</p></sec>
<sec sec-type="materials and methods" id="s2">
<title>2. Materials and methods</title>
<p>Many GAN architectures require paired data for training. In our case, an image pair would consist of a plant placed in front of a blue screen and an identical plant, in the same position in the image, placed in soil. Such image pairs are difficult to obtain in large volumes. We therefore focus on GAN architectures that can be trained on unpaired data (Zhu et al., <xref ref-type="bibr" rid="B36">2017</xref>), such as the examples from our indoor and outdoor plant datasets shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref></p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Images of soybean plants in indoor <bold>(A)</bold> and outdoor <bold>(B)</bold> settings. Indoor plants are photographed against blue keying fabric to enable background removal. Outdoor plants are more susceptible to leaf damage. Differences in lighting make the leaves appear darker in the indoor images.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0001.tif"/>
</fig>
<p>Some GANs, such as CycleGAN, consist of two-sided networks that not only translate an image from one domain to another but perform the reverse translation as well (Zhu et al., <xref ref-type="bibr" rid="B36">2017</xref>). For our specific problem, we are interested in single-directional translation and only consider one-sided networks to reduce training duration and model sizes.</p>
<p>As a result, the approach taken here follows that of contrastive unpaired translation (CUT) as presented in Park et al. (<xref ref-type="bibr" rid="B27">2020</xref>). For this, one considers two image domains <italic>X</italic> and <italic>Y</italic> (with samples <inline-formula><mml:math id="M1"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>&#x02286;</mml:mo><mml:mi>X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M2"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>&#x02286;</mml:mo><mml:mi>Y</mml:mi></mml:math></inline-formula>) and seeks to find a function that takes a sample from the domain <italic>X</italic> and outputs an image that can plausibly come from the distribution <italic>Y</italic>.</p>
<p>Generative adversarial networks typically consist of two separate networks that are trained simultaneously. The generator <italic>G</italic> learns the mapping <italic>G</italic>:<italic>X</italic>&#x02192;<italic>Y</italic>, and the discriminator <italic>D</italic> is trained to differentiate between the real images <bold><italic>y</italic></bold> of domain <italic>Y</italic> and the fake images <italic>G</italic>(<bold><italic>x</italic></bold>) &#x0003D; <bold>&#x00177;</bold> produced by the generator. Note that we refer to the real and fake images of domain <italic>Y</italic> as <bold><italic>y</italic></bold> and <bold>&#x00177;</bold>, respectively. The discriminator returns a probability in [0.0, 1.0] that the input image came from domain <italic>Y</italic>. The two networks are trained adversarially. The discriminator is updated to maximize the adversarial loss, and the generator is updated to minimize it, that is, to produce images that fool the discriminator (Goodfellow et al., <xref ref-type="bibr" rid="B16">2014</xref>; Isola et al., <xref ref-type="bibr" rid="B18">2016</xref>; Giuffrida et al., <xref ref-type="bibr" rid="B14">2017</xref>; Zhu et al., <xref ref-type="bibr" rid="B36">2017</xref>, <xref ref-type="bibr" rid="B37">2018</xref>; Park et al., <xref ref-type="bibr" rid="B27">2020</xref>). The adversarial loss is defined as</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>G</mml:mi><mml:mi>A</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mo>,</mml:mo><mml:mi>D</mml:mi><mml:mo>,</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D53C;</mml:mi></mml:mrow><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>y</mml:mi></mml:mstyle><mml:mo>&#x0007E;</mml:mo><mml:mi>Y</mml:mi></mml:mrow></mml:msub><mml:mo class="qopname">log</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D53C;</mml:mi></mml:mrow><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>&#x0007E;</mml:mo><mml:mi>X</mml:mi></mml:mrow></mml:msub><mml:mo class="qopname">log</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
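<p>The following is a minimal illustrative sketch, not the authors' implementation. It estimates the adversarial loss in Equation (1) on a mini-batch by replacing each expectation with a sample mean. The function name <monospace>adversarial_loss</monospace> and its inputs (lists of discriminator probabilities for real and translated images) are hypothetical. The clipping constant <monospace>eps</monospace> is an assumption, added only to avoid taking the logarithm of zero.</p>
<preformat preformat-type="code">
```python
import math
from typing import Sequence


def adversarial_loss(d_real: Sequence[float], d_fake: Sequence[float],
                     eps: float = 1e-12) -> float:
    """Mini-batch estimate of the adversarial loss in Equation (1).

    d_real: discriminator outputs D(y) for real domain-Y images.
    d_fake: discriminator outputs D(G(x)) for translated images.
    Each expectation is replaced by a sample mean; eps (an assumption)
    clips probabilities so that log(0) is never evaluated.
    """
    # First term: E_y[log D(y)], which depends only on D.
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    # Second term: E_x[log(1 - D(G(x)))], the only term that depends on G.
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```
</preformat>
<p>A perfect discriminator (<italic>D</italic>(<bold><italic>y</italic></bold>) &#x0003D; 1 and <italic>D</italic>(<bold>&#x00177;</bold>) &#x0003D; 0) attains the maximum value of 0. A discriminator that outputs 0.5 everywhere gives &#x02212;2 ln 2 &#x02248; &#x02212;1.386. Because only the second term depends on <italic>G</italic>, the generator can lower the loss only by pushing <italic>D</italic>(<italic>G</italic>(<bold><italic>x</italic></bold>)) toward 1.</p>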
<p>A visual overview of a GAN structure is given in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Visual representation of an image-translating GAN. The discriminator learns to differentiate between real images from domain <italic>Y</italic> and translated images produced by the generator. Both networks are updated by backpropagating gradients of the adversarial loss from the discriminator output. The discriminator is updated using both terms of the loss, whereas the generator receives gradients only through the second term, which is the only term that depends on <italic>G</italic>. The image is adapted from the Google Developers Website (2022) (<ext-link ext-link-type="uri" xlink:href="https://developers.google.com/machine-learning/gan/gan_structure">https://developers.google.com/machine-learning/gan/gan_structure</ext-link>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0002.tif"/>
</fig>
<p>The generator is composed of two networks that are applied sequentially to an image. The first half, the encoder <italic>G</italic><sub><italic>enc</italic></sub>, receives the input and constructs a feature stack, primarily through down-sampling operations. The second half, the decoder <italic>G</italic><sub><italic>dec</italic></sub>, takes this feature stack and constructs a new image through up-sampling operations. The full translation is therefore <italic>G</italic>(<bold><italic>x</italic></bold>) &#x0003D; <italic>G</italic><sub><italic>dec</italic></sub>(<italic>G</italic><sub><italic>enc</italic></sub>(<bold><italic>x</italic></bold>)) &#x0003D; <bold>&#x00177;</bold> (Park et al., <xref ref-type="bibr" rid="B27">2020</xref>).</p>
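<p>The encoder/decoder composition can be illustrated with a deliberately simplified Python sketch. Here, fixed 2&#x000D7;2 average pooling stands in for the learned down-sampling layers, and nearest-neighbour up-sampling stands in for the learned up-sampling layers. All function names are hypothetical, and the actual CUT generator uses learned convolutional layers rather than these fixed operations. The sketch is meant only to show that <italic>G</italic>(<bold><italic>x</italic></bold>) &#x0003D; <italic>G</italic><sub><italic>dec</italic></sub>(<italic>G</italic><sub><italic>enc</italic></sub>(<bold><italic>x</italic></bold>))). It also shows that the encoder keeps its whole feature stack, so the intermediate layers remain available for the PatchNCE loss.</p>
<preformat preformat-type="code">
```python
from typing import List

Image = List[List[float]]  # toy single-channel image stored as a 2D list


def downsample(img: Image) -> Image:
    """Toy 2x2 average pooling (stand-in for a learned strided convolution)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]


def upsample(img: Image) -> Image:
    """Toy nearest-neighbour 2x up-sampling (stand-in for a learned layer)."""
    return [[img[i // 2][j // 2] for j in range(2 * len(img[0]))]
            for i in range(2 * len(img))]


def g_enc(x: Image, n_down: int = 2) -> List[Image]:
    """Encoder: returns the whole feature stack, one entry per layer.

    Entry 0 is the input itself; each later entry halves the resolution.
    """
    stack = [x]
    for _ in range(n_down):
        stack.append(downsample(stack[-1]))
    return stack


def g_dec(stack: List[Image]) -> Image:
    """Decoder: rebuilds an image at input resolution from the deepest layer."""
    out = stack[-1]
    for _ in range(len(stack) - 1):
        out = upsample(out)
    return out


def generator(x: Image) -> Image:
    # G(x) = G_dec(G_enc(x))
    return g_dec(g_enc(x))
```
</preformat>
<p>For a 4&#x000D7;4 input, the feature stack holds layers of size 4&#x000D7;4, 2&#x000D7;2 and 1&#x000D7;1, and the decoder returns a 4&#x000D7;4 image. Deeper layers summarize larger regions of the input, just as deeper encoder layers correspond to larger patches.</p>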
<p>Contrastive unpaired translation is a GAN architecture that enables one-sided image-to-image translation (Park et al., <xref ref-type="bibr" rid="B27">2020</xref>). In addition to the adversarial loss, which depends on the networks <italic>G</italic> and <italic>D</italic>, CUT introduces the PatchNCE loss and a feature network <italic>H</italic>. The PatchNCE loss serves two purposes. First, it maximizes the mutual information between corresponding patches of the input image <bold><italic>x</italic></bold> and the output <inline-formula><mml:math id="M4"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>. Second, when applied to real images from domain <italic>Y</italic>, it encourages the identity translation <italic>G</italic>(<bold><italic>y</italic></bold>) &#x0003D; <bold><italic>y</italic></bold> (Park et al., <xref ref-type="bibr" rid="B27">2020</xref>). The feature network consists of the first half of the generator, the encoder, followed by a two-layer multilayer perceptron (MLP). It is used to encode the input and output images into feature tensors. Patches sampled from the output image <bold>&#x00177;</bold> are encoded by the feature network and compared with the corresponding (positive) patch from the input image, as well as with <italic>N</italic> other (negative) patches from the input image. The process is shown in <xref ref-type="fig" rid="F3">Figure 3</xref> (Park et al., <xref ref-type="bibr" rid="B27">2020</xref>).</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>A visual demonstration of the features extracted from an input image <bold><italic>x</italic></bold> and output image <inline-formula><mml:math id="M5"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> (Park et al., <xref ref-type="bibr" rid="B27">2020</xref>). Corresponding patches are sampled from both images as well as <italic>N</italic> other patches from the input image. These patches are used to calculate the PatchNCE loss in Equation (2) (Park et al., <xref ref-type="bibr" rid="B27">2020</xref>). In the example presented here, high similarity between positive patches is desired to retain the shape of the animal head, while allowing re-coloring of the fur. Conversely, one should not expect to retain similarities between the head and other parts of the body. The image is taken from Park et al. (<xref ref-type="bibr" rid="B27">2020</xref>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0003.tif"/>
</fig>
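<p>A minimal sketch of the patchwise contrastive comparison described above, formalized in Equations (2) and (3) of this section, is given here for a single image. Feature vectors are represented as plain Python lists. The sketch makes several assumptions. The features are taken to be already produced by the feature network. Every other spatial location of the same input layer is used as a negative. The temperature is &#x003C4; &#x0003D; 0.07. In addition, each feature vector is L2-normalized, an assumption made in line with common contrastive-learning practice. The function names are hypothetical and are not taken from the CUT code base.</p>
<preformat preformat-type="code">
```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """L2-normalize a feature vector (assumed preprocessing step)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def patch_loss(z: Vector, z_pos: Vector, z_negs: List[Vector],
               tau: float = 0.07) -> float:
    """Equation (3): cross-entropy of picking the positive among 1 + N candidates."""
    logits = [dot(z, z_pos) / tau] + [dot(z, zn) / tau for zn in z_negs]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
    return -(logits[0] - log_denominator)


def patch_nce(out_feats: List[List[Vector]], in_feats: List[List[Vector]],
              tau: float = 0.07) -> float:
    """Equation (2) for one image: sum over layers l and spatial locations s.

    out_feats[l][s] plays the role of z-hat_l^s (translated image) and
    in_feats[l][s] the role of z_l^s (input image); all other locations
    of the same input layer act as negatives, i.e. z_l^(S minus s).
    """
    total = 0.0
    for out_layer, in_layer in zip(out_feats, in_feats):
        queries = [normalize(v) for v in out_layer]
        keys = [normalize(v) for v in in_layer]
        for s, query in enumerate(queries):
            negatives = [v for t, v in enumerate(keys) if t != s]
            total += patch_loss(query, keys[s], negatives, tau)
    return total
```
</preformat>
<p>When the translated features match their input counterparts, the loss of each patch approaches zero. When the features are shuffled so that each query most resembles a negative, each patch contributes roughly 1/&#x003C4; &#x02248; 14.3 for orthonormal one-hot features. This illustrates how a small temperature sharpens the penalty.</p>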
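<p>Since <italic>G</italic><sub><italic>enc</italic></sub> is already used to translate a given image, its feature stack is readily available. Each layer and spatial position in the stack corresponds to a patch of the input image, with deeper layers corresponding to larger patches. We select <italic>L</italic> layers from the feature stack and pass each one through the corresponding two-layer MLP <italic>H</italic><sub><italic>l</italic></sub> to produce features <inline-formula><mml:math id="M6"><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where <inline-formula><mml:math id="M7"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the output of the <italic>l</italic>-th selected encoder layer. The translated image is encoded in the same way, giving <inline-formula><mml:math id="M8a"><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The feature <inline-formula><mml:math id="M8"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> then represents the <italic>s</italic>-th spatial location of the <italic>l</italic>-th layer, and <inline-formula><mml:math id="M9"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mo>\</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represents all other locations in that layer. The PatchNCE loss is given by Park et al. (<xref ref-type="bibr" rid="B27">2020</xref>) as</p>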
<disp-formula id="E2"><label>(2)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>c</mml:mi><mml:mi>h</mml:mi><mml:mi>N</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mo>,</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D53C;</mml:mi></mml:mrow><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>&#x0007E;</mml:mo><mml:mi>X</mml:mi></mml:mrow></mml:msub><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:mi>&#x02113;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mo>\</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
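<p>where the sums are taken over the <italic>L</italic> selected layers <italic>l</italic> and the <italic>S</italic><sub><italic>l</italic></sub> spatial locations <italic>s</italic> within each layer, and &#x02113; is the cross-entropy loss for identifying the positive sample among <italic>N</italic> &#x0002B; 1 candidates,</p>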
<disp-formula id="E3"><label>(3)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>&#x02113;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold-italic"><mml:msup><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold-italic"><mml:msup><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msup></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mo class="qopname">log</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mo class="qopname">exp</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle><mml:mo>&#x000B7;</mml:mo><mml:mstyle mathvariant="bold-italic"><mml:msup><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup></mml:mstyle><mml:mo>/</mml:mo><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo class="qopname">exp</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle><mml:mo>&#x000B7;</mml:mo><mml:mstyle mathvariant="bold-italic"><mml:msup><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup></mml:mstyle><mml:mo>/</mml:mo><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" 
accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mo class="qopname">exp</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>z</mml:mi></mml:mstyle><mml:mo>&#x000B7;</mml:mo><mml:mstyle mathvariant="bold-italic"><mml:msubsup><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mstyle><mml:mo>/</mml:mo><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>N</italic> is the number of negative samples and the temperature &#x003C4; &#x0003D; 0.07 scales the magnitude of penalties on the negative samples (Park et al., <xref ref-type="bibr" rid="B27">2020</xref>). The overall loss used for network training is</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mo>,</mml:mo><mml:mi>D</mml:mi><mml:mo>,</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>G</mml:mi><mml:mi>A</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mo>,</mml:mo><mml:mi>D</mml:mi><mml:mo>,</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>X</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>c</mml:mi><mml:mi>h</mml:mi><mml:mi>N</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mo>,</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mtd><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>Y</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mrow><mml:mi 
mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>c</mml:mi><mml:mi>h</mml:mi><mml:mi>N</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mo>,</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where the weighting factors &#x003BB;<sub><italic>X</italic></sub> &#x0003D; &#x003BB;<sub><italic>Y</italic></sub> &#x0003D; 1 when using the default CUT options (Park et al., <xref ref-type="bibr" rid="B27">2020</xref>). The first PatchNCE loss term retains mutual information between the input image <bold><italic>x</italic></bold> and the output <inline-formula><mml:math id="M14"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>. The second acts as an identity loss term: translating an input image <bold><italic>y</italic></bold> from the target domain should produce <bold><italic>y</italic></bold> as the output.</p>
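<p>To make Equations (3) and (4) concrete, the NumPy sketch below computes the contrastive loss for a single query patch and then combines the loss terms. It is an illustration, not the CUT implementation: the function names <monospace>patch_nce_loss</monospace> and <monospace>total_loss</monospace> are hypothetical, and L2-normalizing the feature vectors before the dot products is an assumption of this sketch. In CUT, the full PatchNCE term aggregates this per-patch loss over many sampled patch locations and encoder layers (Park et al., <xref ref-type="bibr" rid="B27">2020</xref>).</p>
<preformat preformat-type="code">
```python
import numpy as np


def patch_nce_loss(z, z_pos, z_neg, tau=0.07):
    """Per-patch contrastive loss of Eq. (3).

    z, z_pos: query and positive feature vectors, shape (C,).
    z_neg:    the N negative feature vectors, shape (N, C).
    L2-normalizing the features first is an assumption of this sketch.
    """
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    z, z_pos, z_neg = unit(z), unit(z_pos), unit(z_neg)
    # Logits: positive similarity first, followed by the N negative similarities.
    logits = np.concatenate([[z @ z_pos], z_neg @ z]) / tau
    # -log(exp(l_pos) / sum(exp(l_all))), evaluated stably with log-sum-exp.
    m = logits.max()
    log_denominator = m + np.log(np.exp(logits - m).sum())
    return log_denominator - logits[0]


def total_loss(l_gan, l_nce_x, l_nce_y, lambda_x=1.0, lambda_y=1.0):
    """Overall objective of Eq. (4); weights default to the CUT defaults of 1."""
    return l_gan + lambda_x * l_nce_x + lambda_y * l_nce_y
```
</preformat>
<p>Writing the loss as a negative log-softmax of the positive logit avoids overflow when &#x003C4; is small: with unit-length features, each logit can reach 1/&#x003C4; &#x02248; 14.</p>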
<p>The CUT training scripts provide several network definitions for the generator, discriminator, and feature network. For the work in this paper, we use the default options: the generator is a ResNet-based architecture that consists of 9 ResNet blocks between the downsampling and upsampling layers, and the discriminator is a 70 &#x000D7; 70 PatchGAN network that originates from the work of Isola et al. (<xref ref-type="bibr" rid="B18">2016</xref>).</p>
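<p>The name &#x0201C;70 &#x000D7; 70 PatchGAN&#x0201D; refers to the receptive field of each discriminator output, i.e., each output judges one 70 &#x000D7; 70 patch of the input. The short sketch below reproduces that number, assuming a pix2pix-style layout of three stride-2 and two stride-1 convolutions with 4 &#x000D7; 4 kernels and a padding of 1 (assumptions of this sketch, not details stated above). Under the same assumptions, a 256 &#x000D7; 256 input yields a 30 &#x000D7; 30 grid of patch predictions.</p>
<preformat preformat-type="code">
```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit of a conv stack.

    `layers` lists (kernel_size, stride) pairs ordered from input to output.
    """
    r = 1
    # Walk backwards from a single output unit towards the input image.
    for kernel, stride in reversed(layers):
        r = (r - 1) * stride + kernel
    return r


def output_size(size, layers, padding=1):
    """Spatial side length of the output map for a square input of side `size`."""
    for kernel, stride in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


# Assumed 70 x 70 PatchGAN layout: three stride-2 and two stride-1 4x4 convolutions.
PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

print(receptive_field(PATCHGAN_70))   # 70
print(output_size(256, PATCHGAN_70))  # 30
```
</preformat>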
<p>The remainder of this section provides an overview of the datasets used to train several CUT generators.</p>
<sec>
<title>2.1. Target domain&#x02014;Outdoor image dataset</title>
<p>Generator training requires a dataset of outdoor images to represent the target domain <italic>Y</italic> for image translation. We constructed our outdoor dataset by sampling from the 540,000 images available in the outdoor image database (Beck et al., <xref ref-type="bibr" rid="B5">2021</xref>). The images are individual frames from videos recorded by a camera mounted on a tractor traveling through a field. The full-resolution images (2208 &#x000D7; 1242 px) often contain several plants at unknown locations, so they must be cropped by hand to obtain single-plant photos suitable for image translation. Manual cropping is time-consuming and therefore limits the overall size of our training datasets. For initial experiments, the target dataset consisted of 64 hand-cropped single-plant field images. Larger-scale experiments that include image translation of several species used 512 single-plant field images. <xref ref-type="table" rid="T1">Table 1</xref> summarizes the relevant parameters for all training datasets. Example multi- and cropped single-plant outdoor images are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
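<caption><p>Construction parameters for each training dataset. <italic>N</italic><sub><italic>x</italic></sub> and <italic>N</italic><sub><italic>y</italic></sub> are the numbers of images in the input domain <italic>X</italic> and the target domain <italic>Y</italic>; <italic>N</italic><sub>backgrounds</sub> is the number of soil backgrounds used to form composites; <italic>S</italic><sub>min</sub> and <italic>S</italic><sub>max</sub> are the minimum and maximum plant scales relative to the 256 &#x000D7; 256 px output image.</p></caption> 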
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Dataset name</bold></th>
<th valign="top" align="center"><bold><italic>N</italic><sub><italic>x</italic></sub></bold></th>
<th valign="top" align="center"><bold><italic>N</italic><sub><italic>y</italic></sub></bold></th>
<th valign="top" align="center"><bold>Species</bold></th>
<th valign="top" align="center"><bold>Age (days)</bold></th>
<th valign="top" align="center"><bold><italic>N</italic><sub>backgrounds</sub></bold></th>
<th valign="top" align="center"><bold><italic>S</italic><sub>min</sub></bold></th>
<th valign="top" align="center"><bold><italic>S</italic><sub>max</sub></bold></th>
<th valign="top" align="center"><bold>Figure</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Cropped lab</td>
<td valign="top" align="center">64</td>
<td valign="top" align="center">64</td>
<td valign="top" align="center">Soybean</td>
<td valign="top" align="center">10&#x02013;40</td>
<td valign="top" align="center">N/A</td>
<td valign="top" align="center">1.00</td>
<td valign="top" align="center">1.00</td>
<td valign="top" align="center"><xref ref-type="fig" rid="F8">Figure 8</xref></td>
</tr> <tr>
<td valign="top" align="left">Composites</td>
<td valign="top" align="center">64</td>
<td valign="top" align="center">64</td>
<td valign="top" align="center">Soybean</td>
<td valign="top" align="center">10&#x02013;40</td>
<td valign="top" align="center">32</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.85</td>
<td valign="top" align="center"><xref ref-type="fig" rid="F9">Figure 9</xref></td>
</tr> <tr>
<td valign="top" align="left">Color-corrected composites 1</td>
<td valign="top" align="center">64</td>
<td valign="top" align="center">64</td>
<td valign="top" align="center">Soybean</td>
<td valign="top" align="center">10&#x02013;40</td>
<td valign="top" align="center">32</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.85</td>
<td valign="top" align="center"><xref ref-type="fig" rid="F10">Figure 10</xref></td>
</tr>
<tr>
<td valign="top" align="left">Color-corrected composites 2</td>
<td valign="top" align="center">512</td>
<td valign="top" align="center">512</td>
<td valign="top" align="center">All</td>
<td valign="top" align="center">Varies</td>
<td valign="top" align="center">128</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.85</td>
<td valign="top" align="center"><xref ref-type="fig" rid="F11">Figure 11</xref></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>For the <italic>Cropped Lab</italic> dataset, background replacement is not applicable since composite images are not created. Furthermore, the plant always makes up the entire image, so the scale is always 1.00. The <italic>Color-Corrected Composites 2</italic> dataset contains each of the four available species: canola, oat, soybean, and wheat. These plants have minimum&#x02013;maximum age ranges of 10&#x02013;40, 0&#x02013;365, 10&#x02013;40, and 0&#x02013;365 days, respectively.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Outdoor multi-plant image of soybean <bold>(A)</bold> and cropped single-plant image <bold>(B)</bold>. The cropping bounds, shown by the green bounding box, are determined by hand. The single-plant images constitute the target domain <italic>Y</italic> for generator training.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0004.tif"/>
</fig>
<p>The outdoor plant datasets used for model training contain single-plant images of canola, oat, soybean, and wheat. The cropping bounds are varied to provide plants of several sizes relative to the image; this helps prevent the generator from enlarging or shrinking the plant during translation. Similarly, plant locations (e.g., center, top-left, bottom-right) are varied across the single-plant images to minimize positional drift. Additional images sampled from this dataset can be seen in <bold>Figures 8A</bold>&#x02013;<bold>10A</bold>.</p>
</sec>
<sec>
<title>2.2. Input domain&#x02014;Cropped lab image dataset</title>
<p>The first dataset used as the input domain <italic>X</italic> to the generator consists of several indoor single-plant images with a blue screen background. These images are provided by the EAGL-I system, as described in Beck et al. (<xref ref-type="bibr" rid="B4">2020</xref>). EAGL-I employs a GoPro Hero 7 camera mounted on a movable gantry that can capture images from positions varying in all three spatial dimensions. Additionally, the gantry includes a pan-tilt system to provide different imaging angles. In this work, we translate only top-down images, so we include only images in which the camera is perpendicular to the floor to within &#x000B1;10&#x000B0;.</p>
<p>As in Section 2.1, the full-resolution images (4000 &#x000D7; 3000 px) contain several plants and must be cropped to obtain single-plant photos for training. In this case, however, both the plant and camera positions are known, so loose bounding boxes can be found through geometric calculation. Tighter bounding boxes are then obtained with an algorithm given in <xref ref-type="supplementary-material" rid="SM1">Appendix 1</xref> (<xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref>), avoiding the need for hand-cropping. <xref ref-type="fig" rid="F5">Figure 5</xref> shows an example of both a multi- and a single-plant lab image.</p>
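<p>The geometric calculation of the loose bounding boxes is not detailed here. One plausible version is sketched below under strong simplifying assumptions: an ideal pinhole camera pointing straight down (ignoring the &#x000B1;10&#x000B0; tolerance and lens distortion), the principal point at the image center, image axes aligned with the floor axes, and a known focal length in pixels. The function <monospace>loose_bbox</monospace> and all of its parameters are hypothetical, and the authors' actual procedure may differ.</p>
<preformat preformat-type="code">
```python
import math


def loose_bbox(plant_xy, plant_radius, cam_xy, cam_height, focal_px,
               image_size, plant_height=0.0):
    """Hypothetical loose (x0, y0, x1, y1) pixel box around a plant seen from above.

    Assumes an ideal, distortion-free pinhole camera looking straight down,
    with the principal point at the image center and image axes aligned
    with the floor axes. All lengths share one unit (e.g., metres).
    """
    width, height = image_size
    depth = cam_height - plant_height  # distance from the lens to the canopy
    # Pinhole projection: pixel offset = focal length * ground offset / depth.
    u = width / 2 + focal_px * (plant_xy[0] - cam_xy[0]) / depth
    v = height / 2 + focal_px * (plant_xy[1] - cam_xy[1]) / depth
    half = focal_px * plant_radius / depth
    # Clamp to the image so partially visible plants still get a valid box.
    x0 = max(0, math.floor(u - half))
    y0 = max(0, math.floor(v - half))
    x1 = min(width, math.ceil(u + half))
    y1 = min(height, math.ceil(v + half))
    return x0, y0, x1, y1


# Plant directly below a camera 1 m up, 2000 px focal length, 4000 x 3000 image.
print(loose_bbox((0.0, 0.0), 0.25, (0.0, 0.0), 1.0, 2000.0, (4000, 3000)))
# (1500, 1000, 2500, 2000)
```
</preformat>
<p>A deliberately generous <monospace>plant_radius</monospace> keeps the box loose, leaving the tightening to a second, image-based step.</p>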
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Indoor multi- <bold>(A)</bold> and single-plant <bold>(B)</bold> images of soybean. The cropping bounds (shown by the green bounding box) are found automatically and do not require user input. The single-plant images form the input domain <italic>X</italic> for generator training.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0005.tif"/>
</fig>
<p>To construct our dataset for generator input, we use 64 single-plant lab images such as the one shown in <xref ref-type="fig" rid="F5">Figure 5B</xref>. We expect a translated image not only to contain a plant with characteristics that reflect the outdoor plant domain, but also to contain soil generated in place of the blue screen background.</p></sec>
<sec>
<title>2.3. Input domain&#x02014;Composite image dataset</title>
<p>As an alternative to the single-plant indoor photos of Section 2.2, image processing techniques can be used to remove the blue screen in a single-plant image and replace it with a real image of soil. These images, henceforth called composite images, ideally require minimal translation of the background, allowing the generator to focus on translating plant characteristics. Such a network would be beneficial because it lets the user influence the appearance of the background, and that appearance is preserved after passing through the generator.</p>
<p>Soil backgrounds are randomly sampled from a set of images to provide sufficient variation in the training data. These backgrounds are hand-cropped from real outdoor data, so the soil is visually consistent with our target domain. Typically, 32 soil backgrounds were used for datasets composed of 64 composites. See <xref ref-type="table" rid="T1">Table 1</xref> for the exact number of backgrounds used to construct each dataset.</p>
<p>Given an indoor multi-plant image and a soil background, the steps for composite formation are listed below. This process is fully automated through Python scripts that use the OpenCV and NumPy libraries for image processing. Example images that depict each step are shown in <xref ref-type="fig" rid="F6">Figure 6</xref>.</p>
<list list-type="order">
<list-item><p>Create a binary mask and bounding boxes from a multi-plant image in the indoor plant database. Because of the blue screen, background removal is a trivial problem that can be solved by image thresholding (Beck et al., <xref ref-type="bibr" rid="B4">2020</xref>).</p></list-item>
<list-item><p>Crop the multi-plant image and masked multi-plant image to create single-plant images.</p></list-item>
<list-item><p>Remove the blue screen in the single-plant image with the single-plant mask.</p></list-item>
<list-item><p>Randomly rescale the single-plant image relative to the final output size. We choose the minimum and maximum scale values to be <italic>S</italic><sub><italic>min</italic></sub> &#x0003D; 0.50 and <italic>S</italic><sub><italic>max</italic></sub> &#x0003D; 0.85, respectively. The image size required for the CUT generator is 256 &#x000D7; 256 px, so the plant is scaled relative to this size. Plant scale is defined as the longest side of the single-plant image divided by the output image size. For example, if the single-plant image has dimensions 150 &#x000D7; 200 px and the output image has dimensions 256 &#x000D7; 256 px, then the scale is 200/256 &#x02248; 0.78. Finally, the resized image is padded with black pixels so that the final image is 256 &#x000D7; 256 px. Randomizing the amount of padding on each side places the plant at a random location within the composite.</p></list-item>
<list-item><p>Combine the resized plant image with a randomly selected soil background.</p></list-item>
</list>
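<p>The five steps above can be sketched as follows. This is a NumPy-only illustration rather than the authors' scripts: the blue-screen threshold rule in <monospace>blue_screen_mask</monospace> and its <monospace>margin</monospace> parameter are hypothetical, nearest-neighbor resizing stands in for an OpenCV call such as <monospace>cv2.resize</monospace>, and the sketch assumes each plant region has already been isolated (separating several plants in one mask would need, e.g., connected-component labeling). The default scale limits follow <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<preformat preformat-type="code">
```python
import numpy as np


def blue_screen_mask(img_rgb, margin=30):
    """Step 1 (hypothetical rule): True for plant pixels, False for blue screen."""
    r, g, b = (img_rgb[..., i].astype(np.int16) for i in range(3))
    # A pixel counts as blue screen if blue exceeds both red and green by `margin`.
    return ~((b - np.maximum(r, g)) > margin)


def crop_to_mask(img, mask):
    """Step 2: crop an image and its mask to the mask's tight bounding box."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1, c0, c1 = rows[0], rows[-1] + 1, cols[0], cols[-1] + 1
    return img[r0:r1, c0:c1], mask[r0:r1, c0:c1]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize in pure NumPy (works for 2-D and 3-D arrays)."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows][:, cols]


def make_composite(plant_rgb, plant_mask, soil_rgb, rng,
                   size=256, s_min=0.50, s_max=0.85):
    """Steps 3-5 for one cropped plant; soil_rgb must have shape (size, size, 3)."""
    # Step 3: black out blue-screen pixels using the single-plant mask.
    plant = plant_rgb * plant_mask[..., None]
    # Step 4: scale = longest side / output size, drawn from [s_min, s_max).
    scale = rng.uniform(s_min, s_max)
    h, w = plant_mask.shape
    factor = scale * size / max(h, w)
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    plant = resize_nearest(plant, new_h, new_w)
    mask = resize_nearest(plant_mask, new_h, new_w)
    # Random padding up to size x size places the plant at a random location.
    top = int(rng.integers(0, size - new_h + 1))
    left = int(rng.integers(0, size - new_w + 1))
    canvas = np.zeros((size, size, 3), dtype=plant_rgb.dtype)
    full_mask = np.zeros((size, size), dtype=bool)
    canvas[top:top + new_h, left:left + new_w] = plant
    full_mask[top:top + new_h, left:left + new_w] = mask
    # Step 5: plant pixels where the mask is True, soil everywhere else.
    composite = np.where(full_mask[..., None], canvas, soil_rgb)
    return composite, full_mask
```
</preformat>
<p>Masking before resizing mirrors the order of <xref ref-type="fig" rid="F6">Figure 6</xref> (A3 before A4); with nearest-neighbor resizing the mask stays binary, and the returned <monospace>full_mask</monospace> directly gives the plant's location in the composite.</p>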
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Intermediate images created during composite image formation <bold>(A)</bold> and their associated binary masks <bold>(B)</bold>. The multi-plant image <bold>(A1, B1)</bold> is cropped to obtain a single-plant image <bold>(A2, B2)</bold>. The single-plant binary mask is used to remove the blue screen in the single-plant image <bold>(A3, B3)</bold>. The single-plant image is then resized and padded to give the effect of random placement within the final image <bold>(A4, B4)</bold>. A soil background is joined to the resized image to create the final composite <bold>(A5, B5)</bold>. Images such as <bold>(A5)</bold> are used to form the input domain <italic>X</italic> for generator training.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0006.tif"/>
</fig>
<p>In many cases, differences in lighting between indoor and outdoor images leave little contrast between the plant and the soil background in a composite image. In preliminary generator training experiments, insufficient contrast could cause the generator to translate plant leaves into soil and to construct a fully synthetic plant in a different region of the image. Such behavior is detrimental for creating synthetic object detection data, because the plant's bounding box can no longer be provided accurately. In the datasets described below, color-correction is applied during composite formation to match the plant color to real field plants and to increase the contrast between plant and soil, so that the generator maintains semantic information after domain translation. We explored color-correction techniques such as histogram matching but chose simple mean matching instead, since the generator is still expected to adjust plant color. The chosen color-correction process is described as follows.</p>
<p>Given an (<italic>M</italic>&#x000D7;<italic>N</italic>&#x000D7;3) CIELAB single-plant image <bold><italic>x</italic></bold> and the associated (<italic>M</italic>&#x000D7;<italic>N</italic>) binary mask <bold><italic>m</italic></bold>, we want to adjust the pixels belonging to the plant so that their color is consistent with the real outdoor plants in the domain <italic>Y</italic>, while leaving all other pixels unchanged. To do this, we first determine the average value of the <italic>k</italic>-th channel of <bold><italic>x</italic></bold> over only the pixels where the mask is true, given by</p>
<disp-formula id="E6"><label>(5)</label><mml:math id="M15"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Note that Equation (5) is simply a weighted average over all pixels of a given channel, where the weight of each pixel is the value of the corresponding pixel in the binary mask (0 for false and 1 for true); it is therefore the mean of the channel over the plant pixels only. We now construct our color-corrected image <bold><italic>x</italic></bold>&#x02032;, where the individual pixels have values</p>
<disp-formula id="E7"><label>(6)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M17"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the desired average value for the <italic>k</italic>-th channel of <bold><italic>x</italic></bold>&#x02032;. We found that the average values <inline-formula><mml:math id="M18"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>170</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M19"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>100</mml:mn></mml:math></inline-formula>, and <inline-formula><mml:math id="M20"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>160</mml:mn></mml:math></inline-formula> for the CIELAB channels (stored as 8-bit values) provide a suitable plant color. Note that the definition of <inline-formula><mml:math id="M21"><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> can assign pixel values outside the valid range [0, 255]; in this case, <inline-formula><mml:math id="M22"><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> must also be constrained to that range. In general, color-correction yields a composite image in which the plant has enhanced lightness, greenness, and yellowness. An example of a composite image with and without color-correction is shown in <xref ref-type="fig" rid="F7">Figure 7</xref>.</p>
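<p>Equations (5) and (6) amount to a per-channel mean shift restricted to the plant mask. The NumPy sketch below assumes the image is already in 8-bit CIELAB (for example, as produced by OpenCV's <monospace>cv2.cvtColor</monospace> with <monospace>cv2.COLOR_BGR2LAB</monospace>); clipping to [0, 255] implements the range constraint, and rounding back to integers is an assumption of this sketch. The function name <monospace>mean_match</monospace> is hypothetical, and the default targets are the values given above.</p>
<preformat preformat-type="code">
```python
import numpy as np


def mean_match(lab, mask, target=(170, 100, 160)):
    """Color-correct plant pixels by matching per-channel masked means.

    lab:    (M, N, 3) uint8 image in 8-bit CIELAB.
    mask:   (M, N) boolean plant mask; other pixels are left unchanged.
    target: desired mean (L, A, B) values for the plant pixels.
    """
    x = lab.astype(np.float64)
    m = mask.astype(np.float64)[..., None]
    # Eq. (5): mean of each channel over the masked (plant) pixels only.
    means = (x * m).sum(axis=(0, 1)) / m.sum()
    # Eq. (6): shift masked pixels by (target mean - current mean).
    shifted = x + m * (np.asarray(target, dtype=np.float64) - means)
    # Constrain to the valid [0, 255] range and round back to 8-bit values.
    return np.clip(np.rint(shifted), 0, 255).astype(np.uint8)
```
</preformat>
<p>If clipping is triggered, the masked mean of the result will no longer equal the target exactly; for typical plant colors the shift stays within range.</p>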
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Composite image of a soybean plant without <bold>(A)</bold> and with <bold>(B)</bold> color-correction. The color-corrected image contains plant leaves that better match the outdoor domain <bold>(C)</bold>. Images such as <bold>(A)</bold> or <bold>(B)</bold> can be used to provide the input domain <italic>X</italic> for generator training.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0007.tif"/>
</fig>
<p>For small-scale translation experiments, our input domain consists of 64 (optionally color-corrected) composite images. Larger-scale experiments use datasets composed of 512 composites. The method and parameters used to construct each dataset are summarized in <xref ref-type="table" rid="T1">Table 1</xref>.</p></sec></sec>
<sec id="s3">
<title>3. Single-plant image translation results</title>
<p>In this section, we provide single-plant translation results for several GANs trained on the datasets described in Section 2. Training hyperparameters are the same for all models: we train for a total of 400 epochs, keeping the learning rate constant for the first 200 epochs and then decaying it linearly to zero.</p>
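<p>The learning-rate schedule can be written as a multiplier on the base rate. The sketch below is an illustration of the schedule as described, with a hypothetical function name; the base learning rate of 2 &#x000D7; 10<sup>&#x02212;4</sup> is an assumed value, since it is not stated here.</p>
<preformat preformat-type="code">
```python
def lr_factor(epoch, n_constant=200, n_decay=200):
    """Multiplier on the base learning rate at a 1-indexed epoch.

    Constant for the first `n_constant` epochs, then a linear decay that
    reaches zero at epoch n_constant + n_decay (400 with the defaults).
    """
    return 1.0 - max(0, epoch - n_constant) / n_decay


base_lr = 2e-4  # assumed base rate, for illustration only
for epoch in (1, 200, 300, 400):
    print(epoch, base_lr * lr_factor(epoch))
```
</preformat>
<p>In PyTorch, a function like this could be wrapped in <monospace>torch.optim.lr_scheduler.LambdaLR</monospace>; that scheduler calls its <monospace>lr_lambda</monospace> with a zero-based epoch counter, so the index would need to be shifted by one.</p>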
<sec>
<title>3.1. Cropped lab dataset</title>
<p>Training a generator on the <italic>Cropped Lab</italic> dataset allows us to translate indoor plants directly to outdoor ones, without the intermediate step of creating composite images. This dataset contains 64 indoor and 64 outdoor images of soybean (128 total), aged 10&#x02013;40 days. Twenty additional indoor photos of soybean from the same age range, unseen during training, form the distribution <italic>V</italic> used for qualitative evaluation of the model. <xref ref-type="fig" rid="F8">Figure 8</xref> shows translation results for this generator.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Soybean images sampled from the distribution <italic>Y</italic> <bold>(A)</bold>, images sampled from the distribution <italic>X</italic> <bold>(B)</bold>, translated images <italic>G</italic>(<bold><italic>x</italic></bold>) <bold>(C)</bold>, images sampled from the distribution <italic>V</italic> <bold>(D)</bold>, and translated images <italic>G</italic>(<bold><italic>v</italic></bold>) <bold>(E)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0008.tif"/>
</fig>
<p>Across both the training and evaluation images, the generator appears capable of translating between these two domains with little error. Details such as leaf color, leaf damage, and soil are convincingly added in each translated image. Leaf veins, which are prominent in some indoor images, are removed in the translated images, consistent with the images from the outdoor domain. Despite this ability to make indoor plants appear as true outdoor plants, the greatest drawback of a generator trained on these images is that the background of the translated images cannot be controlled. As discussed in Section 4, this is problematic when constructing images with multiple plants, where the background should be consistent throughout the image.</p></sec>
<sec>
<title>3.2. Composites dataset</title>
<p>Training a generator on the <italic>Composites</italic> dataset allows us to translate uncorrected composite images into outdoor-appearing plants. Here, we expect the background of the output image to remain consistent with the input. This dataset contains 64 composite and 64 outdoor images of soybean (128 total), aged 10&#x02013;40 days. Twenty additional composite images of soybean from the same age range, unseen during training, form the distribution <italic>V</italic> used for qualitative evaluation of the model. <xref ref-type="fig" rid="F9">Figure 9</xref> shows translation results for this generator.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>Soybean images sampled from the distribution <italic>Y</italic> <bold>(A)</bold>, images sampled from the distribution <italic>X</italic> <bold>(B)</bold>, translated images <italic>G</italic>(<bold><italic>x</italic></bold>) <bold>(C)</bold>, images sampled from the distribution <italic>V</italic> <bold>(D)</bold>, and translated images <italic>G</italic>(<bold><italic>v</italic></bold>) <bold>(E)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0009.tif"/>
</fig>
<p>Across both the training and evaluation images, the generator is capable of translating between these two domains. As in the previous section, plant details are suitably adjusted in the translated images: leaf color and leaf damage are added, and leaf veins are removed. In general, the image backgrounds are consistent before and after translation. Small changes, such as the addition or removal of straw or pebbles in the background, are seen in the translated images. However, these changes are small enough that we do not expect them to adversely affect the construction of a synthetic multi-plant image.</p></sec>
<sec>
<title>3.3. Color-corrected composite datasets</title>
<p>Training a generator on the <italic>Color-Corrected Composite</italic> datasets allows us to translate color-corrected composite images into outdoor-appearing plants. As in the previous section, we expect the background of the output image to remain consistent with the input. The first generator in this section is trained on a dataset that contains 64 color-corrected composite and 64 outdoor images of soybean (128 total), aged 10&#x02013;40 days. Twenty additional color-corrected composites of soybean from the same age range, unseen during training, form the distribution <italic>V</italic> used for qualitative evaluation of the model. <xref ref-type="fig" rid="F10">Figure 10</xref> shows translation results for this generator. Additional translation results for generators trained on similar datasets of different species are shown in <xref ref-type="supplementary-material" rid="SM1">Appendix 2</xref> (<xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref>).</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>Soybean images sampled from the distribution <italic>Y</italic> <bold>(A)</bold>, images sampled from the distribution <italic>X</italic> <bold>(B)</bold>, translated images <italic>G</italic>(<bold><italic>x</italic></bold>) <bold>(C)</bold>, images sampled from the distribution <italic>V</italic> <bold>(D)</bold>, and translated images <italic>G</italic>(<bold><italic>v</italic></bold>) <bold>(E)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0010.tif"/>
</fig>
<p>Across both the training and evaluation images, the generator is capable of translating between these two domains. As in the previous section, plant details are suitably added or removed in the translated images, and the image background is consistent before and after translation. Additionally, little to no positional drift is seen between the plants in the input and output images, suggesting that such a model could be used for multiple-plant image synthesis.</p>
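<p>Positional drift is assessed visually here. As a hypothetical quantitative check, not used in this paper, one could compare the plant's bounding box in the input composite with a box obtained from the translated image (e.g., by segmenting the output); an intersection over union (IoU) close to 1 would indicate little drift. A minimal sketch of the IoU computation, with a hypothetical function name, is:</p>
<preformat preformat-type="code">
```python
def box_iou(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) pixel boxes."""
    # Overlap along each axis, floored at zero when the boxes are disjoint.
    ix = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# A 10 x 10 px plant box that drifts 5 px to the right after translation.
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```
</preformat>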
<p>We now turn our focus to translating images of additional plant species. The <italic>Color-Corrected Composites 2</italic> training dataset contains 128 color-corrected composite and 128 field images of each of the four available species: canola, oat, soybean, and wheat (1,024 images total). Twenty additional color-corrected composites for each species compose the distribution <italic>V</italic> for qualitative evaluation of the model. Image translation results are given in <xref ref-type="fig" rid="F11">Figure 11</xref>.</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Field images sampled from the distribution <italic>Y</italic> <bold>(A)</bold>, images sampled from the distribution <italic>X</italic> <bold>(B)</bold>, translated images <italic>G</italic>(<bold><italic>x</italic></bold>) <bold>(C)</bold>, images sampled from the distribution <italic>V</italic> <bold>(D)</bold>, and translated images <italic>G</italic>(<bold><italic>v</italic></bold>) <bold>(E)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0011.tif"/>
</fig>
<p>These translation results show that our method extends readily to additional species. Translated plants have no noticeable positional drift, and the background appearance is consistent between the input and output. A model such as the one that produced the translations in <xref ref-type="fig" rid="F11">Figure 11</xref> could be used to construct synthetic multiple-plant field images containing several plant species, while retaining the original bounding boxes needed to produce object detection data.</p></sec></sec>
<sec id="s4">
<title>4. Multiple-plant image translation results</title>
<p>The results in the previous section suggest that a generator trained to translate composite images would be suitable for multi-plant image construction. Specifically, one expects to be able to place several synthetic plants within a real soil background to produce plausible outdoor images of multiple plants. Domain translation by the GAN should produce outdoor plant images whose lightness is similar to that typically seen in our outdoor data, with sufficient blurring of the plant relative to the real background. Using a single-plant translation generator, multi-plant images are constructed via the following algorithm, which makes use of existing full-scale soil images (see <xref ref-type="fig" rid="F12">Figure 12A</xref> for an example) and indoor single-plant images with their binary masks (see <xref ref-type="fig" rid="F6">Figures 6A2, B2</xref> for examples).</p>
<fig id="F12" position="float">
<label>Figure 12</label>
<caption><p>Full-scale soil image <bold>(A)</bold>, cropped section of the image <bold>(B)</bold>, and the color-corrected composite image to be passed to the generator <bold>(C)</bold>. The cropping bounds are given by the green bounding box in <bold>(A)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0012.tif"/>
</fig>
<p><xref ref-type="table" rid="T4">Algorithm 1</xref> creates several composites, translates each one, and places the results back into the full-scale image. Instead of being selected from a predefined set, composite backgrounds are small sections randomly cropped from 20 larger soil images. <xref ref-type="fig" rid="F12">Figure 12</xref> provides an example of composite background selection from a large-scale soil image.</p>
<table-wrap position="float" id="T4">
<label>Algorithm 1</label>
<caption><p>Algorithm for constructing a synthetic outdoor multiple-plant image.</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace> 1: &#x000A0;randomly select a full-scale soil image</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 2: &#x000A0;for number of plants <bold>do</bold></monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 3: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;randomly select an indoor plant image</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 4: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;select corresponding binary mask</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 5: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;select a sub-section of the soil image as the composite background</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 6: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;form a composite image using the image, mask, and background section</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 7: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;if color-correct is true <bold>then</bold></monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 8: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;color-correct the composite image</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 9: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;end <bold>if</bold></monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 10: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;translate composite image</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 11: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;color-correct the translated image</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 12: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;replace background sub-section with translated composite image</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 13: &#x000A0;end <bold>for</bold></monospace></td></tr>  
</tbody>
</table>
</table-wrap>
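<p>The steps of <xref ref-type="table" rid="T4">Algorithm 1</xref> can be sketched in Python as follows. This is a minimal, illustrative sketch rather than the authors&#x00027; implementation: images are assumed to be nested lists of RGB pixels, and the callables <monospace>translate</monospace>, <monospace>correct_composite</monospace>, and <monospace>correct_background</monospace> are hypothetical stand-ins for the trained CUT-GAN generator and the two color-correction steps. Overlap avoidance between plants is omitted here.</p>

```python
import random


def crop(image, top, left, h, w):
    """Return an h x w copy of `image` whose top-left corner is (top, left)."""
    return [[list(px) for px in row[left:left + w]] for row in image[top:top + h]]


def composite(plant, mask, background):
    """Keep plant pixels where the mask is True and background pixels elsewhere."""
    return [[list(p if m else b) for p, m, b in zip(p_row, m_row, b_row)]
            for p_row, m_row, b_row in zip(plant, mask, background)]


def paste(image, patch, top, left):
    """Overwrite the region of `image` at (top, left) with `patch`, in place."""
    for i, row in enumerate(patch):
        image[top + i][left:left + len(row)] = [list(px) for px in row]


def build_multi_plant_image(soil_images, plants, n_plants, translate,
                            correct_background, correct_composite=None,
                            rng=random):
    """Sketch of Algorithm 1; returns the image and (left, top, w, h) boxes.

    `plants` is a list of (plant_image, binary_mask) pairs. Passing
    correct_composite=None corresponds to "color-correct is false".
    """
    # Line 1: randomly select a full-scale soil image (copied, not modified).
    soil = [[list(px) for px in row] for row in rng.choice(soil_images)]
    height, width = len(soil), len(soil[0])
    boxes = []
    for _ in range(n_plants):                        # line 2
        plant, mask = rng.choice(plants)             # lines 3-4
        h, w = len(plant), len(plant[0])
        top = rng.randrange(height - h + 1)          # line 5: background sub-section
        left = rng.randrange(width - w + 1)
        background = crop(soil, top, left, h, w)
        comp = composite(plant, mask, background)    # line 6
        if correct_composite is not None:            # lines 7-9
            comp = correct_composite(comp, mask)
        out = translate(comp)                        # line 10
        out = correct_background(comp, out)          # line 11
        paste(soil, out, top, left)                  # line 12
        boxes.append((left, top, w, h))
    return soil, boxes
```

Because each plant&#x00027;s box is recorded at the moment it is pasted, the sketch returns bounding boxes alongside the synthetic image without any further annotation step.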
<p>To improve generator performance when translating images with these new backgrounds, we construct an additional training dataset of 512 new color-corrected composite images and train a new CUT-GAN. Here, the composite backgrounds are randomly selected sub-sections of the large-scale backgrounds, rather than being randomly selected from the set of 128 small backgrounds used in the <italic>Color-Corrected Composites 2</italic> dataset. For the target domain <italic>Y</italic>, this training dataset uses the same 512 field images as <italic>Color-Corrected Composites 2</italic>. <xref ref-type="table" rid="T2">Table 2</xref> summarizes the parameters used to construct the new dataset.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Parameters for the new training dataset.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Dataset name</bold></th>
<th valign="top" align="center"><bold><italic>N</italic><sub><italic>x</italic></sub></bold></th>
<th valign="top" align="center"><bold><italic>N</italic><sub><italic>y</italic></sub></bold></th>
<th valign="top" align="center"><bold>Species</bold></th>
<th valign="top" align="center"><bold>Age (days)</bold></th>
<th valign="top" align="center"><bold><italic>N</italic><sub>backgrounds</sub></bold></th>
<th valign="top" align="center"><bold><italic>S</italic><sub>min</sub></bold></th>
<th valign="top" align="center"><bold><italic>S</italic><sub>max</sub></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Color-corrected composites 3</td>
<td valign="top" align="center">512</td>
<td valign="top" align="center">512</td>
<td valign="top" align="center">All</td>
<td valign="top" align="center">Varies</td>
<td valign="top" align="center">512</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.85</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The dataset contains each of the four available species: canola, oat, soybean, and wheat. These plants are aged 10&#x02013;40, 0&#x02013;365, 10&#x02013;40, and 0&#x02013;365 days, respectively. Each composite image has a unique background obtained by randomly cropping a small section from one of 20 large soil images.</p>
</table-wrap-foot>
</table-wrap>
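<p>The background sampling used for this dataset, in which a small window is randomly cropped from one of the 20 large soil images for each composite, can be sketched as below. The square crop size, the uniform sampling of window positions, and the rejection of repeated windows (to keep every background unique) are assumptions made for illustration; images are again treated as nested lists of pixels.</p>

```python
import random


def sample_backgrounds(soil_images, n, size, seed=None):
    """Crop n distinct size x size windows, each from a randomly chosen soil image."""
    rng = random.Random(seed)
    # Count the distinct windows available, to guard against an endless loop.
    available = sum((len(s) - size + 1) * (len(s[0]) - size + 1) for s in soil_images)
    if n > available:
        raise ValueError("not enough distinct windows for the requested n")
    seen, crops = set(), []
    while n > len(crops):
        idx = rng.randrange(len(soil_images))
        soil = soil_images[idx]
        top = rng.randrange(len(soil) - size + 1)
        left = rng.randrange(len(soil[0]) - size + 1)
        if (idx, top, left) in seen:
            continue  # resample so that every background window is unique
        seen.add((idx, top, left))
        crops.append([row[left:left + size] for row in soil[top:top + size]])
    return crops
```

Identifying each window by its source image and top-left corner makes uniqueness cheap to check, even when the crops themselves are large.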
<p>In line 11 of <xref ref-type="table" rid="T4">Algorithm 1</xref>, composite images are color-corrected following translation. This step is necessary to help ensure that the background of our multi-plant images remains consistent after translation, and it becomes especially useful if one chooses an image background that differs from our training data. Without correction, a model trained to detect plants in our synthetic images could learn to locate plants from background inconsistencies where each translated plant is placed. Background color-correction is intended to mitigate this risk. Additional processing could further improve the blending of the translated composite with the surrounding background, but this is not explored in this article.</p>
<p>The procedure for background correction is similar to that of color-correction for composite images. However, rather than correcting only the pixels belonging to the plant, we correct all pixels in the translated image by adding a uniform offset to every pixel of a given channel. For each RGB channel, this offset is the difference between the average values of the composite and translated images, computed only over pixels in the small region outside the plant bounding box. We therefore assume that the translated plant does not extend beyond the bounding box; in practice, small patches of green outside the bounding box (see <xref ref-type="fig" rid="F13">Figure 13B</xref>) make little difference to the result.</p>
<fig id="F13" position="float">
<label>Figure 13</label>
<caption><p>Color-corrected composite single-plant image <bold>(A)</bold>, translated single-plant image <bold>(B)</bold>, and the associated binary mask <bold>(C)</bold>. The binary mask is false for all pixels within the bounding box and true for all other pixels.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0013.tif"/>
</fig>
<p>To achieve background correction, three inputs are required: the (<italic>M</italic>&#x000D7;<italic>N</italic>&#x000D7;3) composite single-plant image <bold><italic>x</italic></bold>, (<italic>M</italic>&#x000D7;<italic>N</italic>&#x000D7;3) translated single-plant image <inline-formula><mml:math id="M23"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>, and the (<italic>M</italic>&#x000D7;<italic>N</italic>) binary mask <bold><italic>m</italic></bold>. Example images are given in <xref ref-type="fig" rid="F13">Figure 13</xref>.</p>
<p>The offset <italic>d</italic><sub><italic>k</italic></sub> for the <italic>k</italic>-th channel of the corrected image <inline-formula><mml:math id="M24"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is calculated as</p>
<disp-formula id="E8"><label>(7)</label><mml:math id="M25"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>&#x00177;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" 
accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Equation (7) can equivalently be viewed as the difference between the weighted channel averages of the input and output images, where every pixel within the bounding box has weight zero and every other pixel has weight one. The pixels of <inline-formula><mml:math id="M26"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are then given by</p>
<disp-formula id="E9"><label>(8)</label><mml:math id="M27"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x00177;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x00177;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>constrained to the range [0, 255]. <xref ref-type="fig" rid="F14">Figure 14</xref> shows the effect of translating a single-plant image with and without applying background correction afterwards. <xref ref-type="fig" rid="F15">Figure 15</xref> demonstrates the difference between multi-plant images with and without background correction.</p>
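<p>Equations (7) and (8) translate directly into code. The following is a minimal pure-Python sketch, not the authors&#x00027; implementation. It assumes images stored as nested lists indexed <monospace>[i][j][k]</monospace> and a mask of 0/1 (or boolean) values that is one outside the plant bounding box. It also rounds the shifted values to integers, which the text does not specify.</p>

```python
def background_correct(x, y_hat, m):
    """Apply Eqs. (7)-(8): shift each channel of y_hat by d_k, then clip to [0, 255].

    x, y_hat: M x N x 3 nested lists (composite and translated images).
    m: M x N nested list of weights, 1 outside the bounding box and 0 inside.
    """
    rows, cols = len(m), len(m[0])
    weight = sum(sum(row) for row in m)  # denominator of Eq. (7)
    if weight == 0:
        raise ValueError("mask selects no background pixels")
    # Eq. (7): masked mean difference between composite and translated image.
    d = [sum((x[i][j][k] - y_hat[i][j][k]) * m[i][j]
             for i in range(rows) for j in range(cols)) / weight
         for k in range(3)]
    # Eq. (8): uniform per-channel offset, constrained to the range [0, 255].
    return [[[min(255, max(0, round(px[k] + d[k]))) for k in range(3)]
             for px in row]
            for row in y_hat]
```

Because the offset is uniform per channel, plant pixels inside the bounding box are shifted by the same amount as the soil. This is what distinguishes the step from the plant-only color-correction applied to composites.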
<fig id="F14" position="float">
<label>Figure 14</label>
<caption><p>Single-plant images before <bold>(A)</bold> and after <bold>(B)</bold> translation, and the corrected translated image <bold>(C)</bold>. The soil color and lightness in <bold>(C)</bold> are more consistent with the input <bold>(A)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0014.tif"/>
</fig>
<fig id="F15" position="float">
<label>Figure 15</label>
<caption><p>Translated multi-plant image without <bold>(A)</bold> and with <bold>(B)</bold> background correction. Note that, without correction, the backgrounds of the translated sections of the image differ in color and lightness from the soil in the rest of the image. Background correction lessens this effect, as is especially visible for the soybean plant in the bottom-right.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0015.tif"/>
</fig>
<p>In general, the generator&#x00027;s translation is insensitive to the position of plants in a given image. As a result, plant placement is unrestricted, aside from avoiding overlapping plants. Our approach provides the option to place plants either randomly or aligned into rows, as they would be in a production field. Additionally, the user can choose the minimum and maximum scales of the plants relative to the background. Datasets can be made to contain plants of all species, or individual species can be selected. Examples of labeled synthetic multi-plant images are given in <xref ref-type="fig" rid="F16">Figure 16</xref>. These examples show that the position of each plant is maintained after translation, suggesting that such data could be useful for training a plant detection network.</p>
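<p>Random placement without overlap, and alignment into rows, can be sketched as follows. This is an illustrative sketch only, under three assumptions. Boxes are <monospace>(left, top, width, height)</monospace> tuples in pixels. Random placement uses simple rejection sampling with a hypothetical <monospace>max_tries</monospace> limit. Rows are evenly spaced horizontal lines, with plants centered in equal-width cells.</p>

```python
import random


def overlaps(a, b):
    """True if boxes a and b, given as (left, top, width, height), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax + aw > bx and bx + bw > ax and ay + ah > by and by + bh > ay


def place_randomly(sizes, width, height, max_tries=1000, seed=None):
    """Rejection-sample a non-overlapping position for each (w, h) plant size."""
    rng = random.Random(seed)
    boxes = []
    for w, h in sizes:
        for _ in range(max_tries):
            box = (rng.randrange(width - w + 1), rng.randrange(height - h + 1), w, h)
            if not any(overlaps(box, other) for other in boxes):
                boxes.append(box)
                break
        else:  # no break: every attempt collided with an existing plant
            raise RuntimeError("could not place a plant without overlap")
    return boxes


def place_in_rows(sizes, width, height, n_rows):
    """Center plants in equal cells along n_rows evenly spaced rows.

    Assumes each plant fits inside its cell; larger plants may overlap.
    """
    per_row = -(-len(sizes) // n_rows)  # ceiling division
    boxes = []
    for idx, (w, h) in enumerate(sizes):
        r, c = divmod(idx, per_row)
        cx = (c + 0.5) * width / per_row
        cy = (r + 0.5) * height / n_rows
        boxes.append((int(cx - w / 2), int(cy - h / 2), w, h))
    return boxes
```

Rejection sampling is simple and adequate when plants cover a small fraction of the background. For dense layouts, a grid-based scheme such as the row placement above avoids repeated failed attempts.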
<fig id="F16" position="float">
<label>Figure 16</label>
<caption><p>Composite <bold>(A)</bold> and synthetic <bold>(B)</bold> multi-plant images where plants are placed randomly <bold>(A1, B1)</bold> or ordered into two rows <bold>(A2, B2)</bold>. The randomly placed plants may be any of the four available species; the ordered plants are all soybean.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0016.tif"/>
</fig></sec>
<sec id="s5">
<title>5. Results of plant detection using augmented datasets</title>
<p>Object detection is the machine learning task of locating objects of interest within an image (Zhao et al., <xref ref-type="bibr" rid="B34">2019</xref>). Models capable of object detection typically require training data consisting of a large number of images together with the location and label of every object in those images, usually given by bounding boxes. Such data is often produced by manual annotation, a procedure that is both time- and cost-intensive, especially as the number of objects within an image increases (Guillaumin and Ferrari, <xref ref-type="bibr" rid="B17">2012</xref>; Ayalew, <xref ref-type="bibr" rid="B1">2020</xref>). Because our method retains the bounding box of every placed plant, object detection is an excellent test case for our image transformation methods.</p>
<p>As a proof of concept that our synthetic multi-plant images can benefit the detection of real plants, we trained several YoloV5 (Redmon et al., <xref ref-type="bibr" rid="B28">2016</xref>; Bochkovskiy et al., <xref ref-type="bibr" rid="B7">2020</xref>) nano object detection models on various augmented datasets (Jocher et al., <xref ref-type="bibr" rid="B19">2022</xref>). The nano model is chosen because it has the fewest trainable parameters of all YoloV5 networks. This suits our training datasets, as they contain few classes and little variability. In all cases, the network is trained to locate canola, oat, soybean, and wheat. However, since we are not addressing a classification problem, these four species are grouped into a single class named <italic>plant</italic>. Three primary training dataset types are described below, and each involves the random placement of non-overlapping plants onto an image background. The size and location of each single-plant image embedded within the background are used as the ground truth bounding boxes for our training data.</p>
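<p>Because each plant&#x00027;s position is known when it is placed, the ground truth labels require no manual annotation. The sketch below assumes the standard YoloV5 label-file convention: one line per object, giving the class index followed by the box center, width, and height, each normalized by the image dimensions. Under that assumption, a label file for the single class <italic>plant</italic> (index 0) could be written as:</p>

```python
def to_yolo_labels(boxes, img_w, img_h, class_id=0):
    """Convert (left, top, width, height) pixel boxes to YOLO label lines.

    Each line holds: class x_center y_center width height, normalized to [0, 1].
    """
    lines = []
    for left, top, w, h in boxes:
        xc = (left + w / 2) / img_w
        yc = (top + h / 2) / img_h
        lines.append(f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}")
    return "\n".join(lines)
```

For example, a 30 &#x000D7; 40 pixel box at (10, 20) in a 100 &#x000D7; 200 image becomes the line <monospace>0 0.250000 0.200000 0.300000 0.200000</monospace>.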
<p>The first training dataset contains 80,000 blue screen images with color-corrected plants randomly placed throughout (see <xref ref-type="fig" rid="F17">Figure 17A</xref>). The dataset is split into training, validation, and testing sets with proportions of 80% (64,000 images), 10% (8,000 images), and 10% (8,000 images), respectively. Note that no background replacement or GAN is used, so the differences between the training data and real field data are significant. Hence, we expect this network to perform poorly in general. Hereafter, this dataset is referred to as the <italic>Baseline</italic> dataset.</p>
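<p>The 80%/10%/10% split can be reproduced with a shuffled partition. The text does not state how images were assigned to each subset, so the random shuffle and fixed seed below are assumptions. The code uses <monospace>round</monospace> rather than truncation so that floating-point products such as 0.1 &#x000D7; 80,000 yield exact counts.</p>

```python
import random


def split_dataset(items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and split them into training, validation, and testing sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(fractions[0] * len(items))
    n_val = round(fractions[1] * len(items))
    # The testing set receives the remainder, which matches fractions[2] here.
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For the 80,000-image datasets, this yields 64,000, 8,000, and 8,000 images, matching the proportions stated above.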
<fig id="F17" position="float">
<label>Figure 17</label>
<caption><p>Sample images from the <italic>Baseline</italic> <bold>(A)</bold>, <italic>Composite</italic> <bold>(B)</bold>, and <italic>GAN</italic> <bold>(C)</bold> datasets used to train the YoloV5 nano networks. In all cases, plants are color-corrected and randomly placed into non-overlapping positions onto an image background. The <italic>GAN</italic> images receive the additional step of plant translation.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0017.tif"/>
</fig>
<p>The <italic>Composite</italic> dataset contains 80,000 multi-plant color-corrected composite images (see <xref ref-type="fig" rid="F17">Figure 17B</xref>) with the same 80% (64,000 images), 10% (8,000 images), and 10% (8,000 images) split as above. Here we include a real soil background, but no GAN is used to translate the individual plants. Since this dataset is more similar to real data than the <italic>Baseline</italic> dataset, we expect improved performance on an evaluation set composed of real images.</p>
<p>The third dataset used to train the network consists of 80,000 multi-plant GAN images (see <xref ref-type="fig" rid="F17">Figure 17C</xref>), split identically. This training data is created through the multi-plant GAN procedure described in Section 4. Here we expect network performance to be the greatest, since the training and target datasets are the most similar. Hereafter, this dataset is named <italic>GAN</italic>.</p>
<p>An additional dataset, the <italic>Merged</italic> dataset, is the union of the <italic>Composite</italic> and <italic>GAN</italic> datasets. This training dataset therefore consists of 160,000 images, half of which use a GAN for plant translation.</p>
<p>The plant detection models trained on these datasets are evaluated on 253 additional real multi-plant images of canola and soybean. These images were excluded from the training process, and their ground truth bounding boxes were determined by hand. The evaluation metrics used are precision, recall, and mean average precision (mAP). Before defining these metrics, one must first consider the intersection over union (IoU), given by Everingham et al. (<xref ref-type="bibr" rid="B10">2010</xref>):</p>
<disp-formula id="E10"><label>(9)</label><mml:math id="M28"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext>IoU</mml:mtext><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext>Area&#x000A0;of&#x000A0;Overlap</mml:mtext></mml:mrow><mml:mrow><mml:mtext>Area&#x000A0;of&#x000A0;Union</mml:mtext></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where Area of Overlap refers to the area of overlap between the ground truth and predicted bounding boxes, and Area of Union is the area covered by the union of the two boxes. Whether a bounding box prediction is considered successful depends on an IoU threshold: any prediction yielding an IoU greater than the threshold is considered correct. With IoU determining whether a model prediction is correct, precision and recall are defined by Everingham et al. (<xref ref-type="bibr" rid="B10">2010</xref>) as</p>
<disp-formula id="E11"><label>(10)</label><mml:math id="M29"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext>precision</mml:mtext></mml:mtd><mml:mtd><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>t</mml:mi><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>p</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>f</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E12"><label>(11)</label><mml:math id="M30"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext>recall</mml:mtext></mml:mtd><mml:mtd><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>t</mml:mi><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>p</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>f</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>tp</italic> denotes the number of true-positives, <italic>fp</italic> the false-positives, and <italic>fn</italic> the false-negatives. We interpolate precision over 101 recall values in the range [0.00, 1.00] in steps of 0.01. For notational simplicity, we define the set of recall values <italic>R</italic> &#x0003D; {0.00, 0.01, ..., 1.00}. The interpolated precision is given by Everingham et al. (<xref ref-type="bibr" rid="B10">2010</xref>) as</p>
<disp-formula id="E13"><label>(12)</label><mml:math id="M31"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext>precision</mml:mtext></mml:mrow><mml:mrow><mml:mtext>inter</mml:mtext></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">max</mml:mo></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo>&#x02265;</mml:mo><mml:mi>r</mml:mi></mml:mrow></mml:munder><mml:mtext>precision</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
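<p>As a sketch of Equations (9)&#x02013;(11), the functions below compute IoU for boxes given as corner coordinates <monospace>(x1, y1, x2, y2)</monospace>. They then count true positives, false positives, and false negatives at a fixed IoU threshold. The greedy, confidence-ordered, one-to-one matching between predictions and ground truth boxes is an assumption for illustration, since the text does not describe how matches are made.</p>

```python
def iou(a, b):
    """Eq. (9): area of overlap divided by area of union for (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    overlap = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - overlap)
    return overlap / union if union > 0 else 0.0


def precision_recall(preds, gts, threshold=0.5):
    """Eqs. (10)-(11) for one image.

    preds: list of (box, confidence) pairs; gts: list of ground truth boxes.
    Predictions are visited in descending confidence, and each one claims the
    still-unmatched ground truth box with the highest IoU above the threshold.
    """
    matched = set()
    for box, _conf in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_iou = None, threshold
        for idx, gt in enumerate(gts):
            if idx in matched:
                continue
            score = iou(box, gt)
            if score > best_iou:
                best, best_iou = idx, score
        if best is not None:
            matched.add(best)
    tp = len(matched)
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall
```

Each ground truth box can be claimed only once, so a duplicate detection of an already-matched plant counts as a false positive.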
<p>Finally, mAP is defined by Everingham et al. (<xref ref-type="bibr" rid="B10">2010</xref>) as</p>
<disp-formula id="E14"><label>(13)</label><mml:math id="M32"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">mAP</mml:mtext><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>R</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>R</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mtext>precision</mml:mtext></mml:mrow><mml:mrow><mml:mtext>inter</mml:mtext></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>n</italic> denotes the number of classes and the outer sum averages the per-class average precision over all classes. Since we group our species into a single class, <italic>n</italic> &#x0003D; 1, and the outer sum is included here only for completeness. The notation mAP&#x00040;0.5 denotes mAP evaluated at an IoU threshold of 0.5, consistent with the Pascal VOC evaluation metric (Everingham et al., <xref ref-type="bibr" rid="B10">2010</xref>). In contrast, mAP&#x00040;0.5:0.95 denotes the average mAP over all IoU thresholds in the range [0.50, 0.95] in steps of 0.05, which is identical to the evaluation metric of the COCO dataset challenge (Lin et al., <xref ref-type="bibr" rid="B23">2014</xref>). Sampled visual results for each network are given in <xref ref-type="fig" rid="F18">Figure 18</xref>. Numerical results, including both the mAP&#x00040;0.5 and mAP&#x00040;0.5:0.95 metrics, are provided in <xref ref-type="table" rid="T3">Table 3</xref>.</p>
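<p>For a single class (<italic>n</italic> &#x0003D; 1), Equations (12) and (13) reduce to averaging the interpolated precision over the 101 recall levels in <italic>R</italic>, and mAP&#x00040;0.5:0.95 further averages over ten IoU thresholds. The sketch below assumes the precision/recall curve is given as parallel lists of operating points. It takes the interpolated precision to be zero at recall levels the model never reaches, a common convention that is not stated in the text. The argument <monospace>ap_at_threshold</monospace> is a hypothetical callable returning the AP at a given IoU threshold.</p>

```python
def interpolated_ap(recalls, precisions):
    """Eqs. (12)-(13) with n = 1: mean interpolated precision over R = {0.00, ..., 1.00}."""
    R = [i / 100 for i in range(101)]
    total = 0.0
    for r in R:
        # Eq. (12): best precision at any operating point with recall >= r.
        reachable = [p for r_tilde, p in zip(recalls, precisions) if r_tilde >= r]
        total += max(reachable, default=0.0)
    return total / len(R)


def map_50_95(ap_at_threshold):
    """Average AP over IoU thresholds 0.50, 0.55, ..., 0.95 (COCO style)."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(ap_at_threshold(t) for t in thresholds) / len(thresholds)
```

With this definition, a model that reaches recall 0.5 at precision 1.0 and recall 1.0 at precision 0.5 scores 76/101 &#x02248; 0.752.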
<fig id="F18" position="float">
<label>Figure 18</label>
<caption><p>Sample images labeled by YoloV5 nano object detection models trained on various augmented datasets (see labels). Note that the ground truth bounding boxes are determined by hand. The top three image rows contain only soybean plants, the bottom three contain only canola.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1200977-g0018.tif"/>
</fig>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Precision, recall, and mAP metrics when evaluating our four YoloV5 nano models on 253 real images of canola and soybean.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Training dataset</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>mAP&#x00040;0.5</bold></th>
<th valign="top" align="center"><bold>mAP&#x00040;0.5: 0.95</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Baseline</td>
<td valign="top" align="center">0.030</td>
<td valign="top" align="center">0.099</td>
<td valign="top" align="center">0.010</td>
<td valign="top" align="center">0.002</td>
</tr> <tr>
<td valign="top" align="left">Composite</td>
<td valign="top" align="center">0.597</td>
<td valign="top" align="center">0.468</td>
<td valign="top" align="center">0.454</td>
<td valign="top" align="center">0.202</td>
</tr> <tr>
<td valign="top" align="left">GAN</td>
<td valign="top" align="center">0.647</td>
<td valign="top" align="center">0.459</td>
<td valign="top" align="center">0.479</td>
<td valign="top" align="center">0.209</td>
</tr>
<tr>
<td valign="top" align="left">Merged</td>
<td valign="top" align="center">0.673</td>
<td valign="top" align="center">0.500</td>
<td valign="top" align="center">0.528</td>
<td valign="top" align="center">0.240</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The <italic>Training dataset</italic> column identifies the images used to train each evaluated network, not the set from which the evaluation images are taken.</p>
</table-wrap-foot>
</table-wrap>
<p>From both <xref ref-type="fig" rid="F18">Figure 18</xref> and <xref ref-type="table" rid="T3">Table 3</xref>, it is clear that the <italic>Baseline</italic> model shows essentially no capacity to detect plants. This is expected, as its training data differs vastly from the real images on which our models are evaluated. Both the <italic>Composite</italic>- and <italic>GAN</italic>-trained models show good ability to locate plants in the evaluation images. However, the <italic>GAN</italic>-trained model performs better according to both the mAP&#x00040;0.5 and mAP&#x00040;0.5:0.95 metrics, although its recall is slightly lower (0.459 vs. 0.468). The combination of the <italic>Composite</italic> and <italic>GAN</italic> datasets led to the best-performing network in terms of our metrics, but this may partly result from exposure to twice as many training images. In general, the YoloV5 models trained on the <italic>Composite</italic>, <italic>GAN</italic>, and <italic>Merged</italic> datasets all appear capable of plant detection on the provided images.</p></sec>
<sec sec-type="conclusions" id="s6">
<title>6. Conclusion</title>
<p>The contribution of this work is an image translation process through which one can produce artificial images in field settings using images of plants taken in an indoor lab. The method is easily extendable to new plant species and settings, provided there exists sufficient real data to train the underlying GAN. The construction of one&#x00027;s own augmented datasets enables further training of neural networks, with applications to in-field plant detection and classification. More importantly, this work is a first step in demonstrating that plants grown in growth chambers under precise and fully-controlled conditions can be used to easily generate large amounts of labeled data for developing machine learning models that operate in outdoor environments. This work has the potential to significantly improve and accelerate the model development process for machine learning applications in agriculture.</p>
<p>Future work will consist of mimicking outdoor lighting conditions via controllable LED-based growth-chamber lights, leading to more variety in the input data and, ideally, more realistic synthetic data. Additional work will focus on using the approach presented here to develop in-field plant-classification models, as well as developing outdoor datasets and machine learning models for other problem domains (such as disease detection). This will include an investigation into whether classification tasks require their own species-specific GAN for the image translation process.</p></sec>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>The datasets presented in this study can be found in online repositories. All the data for our article is available from the online portal we have created to share our data. This portal is hosted by the Digital Research Alliance of Canada, and access can be obtained by contacting the authors.</p></sec>
<sec sec-type="author-contributions" id="s8">
<title>Author contributions</title>
<p>AK: conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing&#x02014;original draft, review, and editing. PS: data curation, formal analysis, investigation, software, validation, visualization, and writing&#x02014;original draft. CH: conceptualization, formal analysis, funding acquisition, methodology, project administration, resources, supervision, writing&#x02014;original draft, review, and editing. MB: data curation, formal analysis, methodology, supervision, and writing&#x02014;review and editing. CB: formal analysis, funding acquisition, project administration, resources, supervision, and writing&#x02014;review and editing. All authors contributed to the article and approved the submitted version.</p></sec>
</body>
<back>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>This work was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant program (Nos. RGPIN-2018-04088 and RGPIN-2020-06191), Compute Canada (now Digital Research Alliance of Canada) Resources for Research Groups competition (No. 1679), Western Economic Diversification Canada (No. 15453), and the Mitacs Accelerate Grant program (No. IT14120).</p>
</sec>
<ack><p>The authors would like to thank Ezzat Ibrahim for establishing the Dr. Ezzat A. Ibrahim GPU Educational Lab at the University of Winnipeg, which provided the computing resources needed for this work.</p>
</ack>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="s11">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frai.2023.1200977/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frai.2023.1200977/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/></sec>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>At the time of writing, the indoor dataset contains over 1.2 million labeled images of 14 different crops and weeds commonly found in Manitoba, Canada, while the outdoor dataset contains 540,000 still images extracted from video footage of five different common crops of this region. The datasets can be made available upon request.</p></fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ayalew</surname> <given-names>T. W.</given-names></name></person-group> (<year>2020</year>). <source>Unsupervised domain adaptation for object counting</source> (Doctoral dissertation). <publisher-name>University of Saskatchewan</publisher-name>, <publisher-loc>Saskatoon, SK, Canada</publisher-loc>.</citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bah</surname> <given-names>M. D.</given-names></name> <name><surname>Hafiane</surname> <given-names>A.</given-names></name> <name><surname>Canals</surname> <given-names>R.</given-names></name></person-group> (<year>2018</year>). <article-title>Deep learning with unsupervised data labeling for weed detection in line crops in UAV images</article-title>. <source>Remote Sens.</source> <volume>10</volume>, <fpage>1690</fpage>. <pub-id pub-id-type="doi">10.3390/rs10111690</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barth</surname> <given-names>R.</given-names></name> <name><surname>Hemming</surname> <given-names>J.</given-names></name> <name><surname>van Henten</surname> <given-names>E. J.</given-names></name></person-group> (<year>2018</year>). <article-title>Improved part segmentation performance by optimising realism of synthetic images using cycle generative adversarial networks</article-title>. <source>arXiv preprint arXiv:1803.06301</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1803.06301</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beck</surname> <given-names>M. A.</given-names></name> <name><surname>Liu</surname> <given-names>C.-Y.</given-names></name> <name><surname>Bidinosti</surname> <given-names>C. P.</given-names></name> <name><surname>Henry</surname> <given-names>C. J.</given-names></name> <name><surname>Godee</surname> <given-names>C. M.</given-names></name> <name><surname>Ajmani</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>An embedded system for the automated generation of labeled plant images to enable machine learning applications in agriculture</article-title>. <source>PLoS ONE</source> <volume>15</volume>, <fpage>e0243923</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0243923</pub-id><pub-id pub-id-type="pmid">33332382</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beck</surname> <given-names>M. A.</given-names></name> <name><surname>Liu</surname> <given-names>C.-Y.</given-names></name> <name><surname>Bidinosti</surname> <given-names>C. P.</given-names></name> <name><surname>Henry</surname> <given-names>C. J.</given-names></name> <name><surname>Godee</surname> <given-names>C. M.</given-names></name> <name><surname>Ajmani</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>Presenting an extensive lab- and field-image dataset of crops and weeds for computer vision tasks in agriculture</article-title>. <source>arXiv preprint arXiv:2108.05789</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2108.05789</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Binch</surname> <given-names>A.</given-names></name> <name><surname>Fox</surname> <given-names>C.</given-names></name></person-group> (<year>2017</year>). <article-title>Controlled comparison of machine vision algorithms for <italic>Rumex</italic> and <italic>Urtica</italic> detection in grassland</article-title>. <source>Comput. Electron. Agric.</source> <volume>140</volume>, <fpage>123</fpage>&#x02013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2017.05.018</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bochkovskiy</surname> <given-names>A.</given-names></name> <name><surname>Wang</surname> <given-names>C.-Y.</given-names></name> <name><surname>Liao</surname> <given-names>H.-Y. M.</given-names></name></person-group> (<year>2020</year>). <article-title>YOLOv4: optimal speed and accuracy of object detection</article-title>. <source>arXiv preprint arXiv:2004.10934</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2004.10934</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bosilj</surname> <given-names>P.</given-names></name> <name><surname>Duckett</surname> <given-names>T.</given-names></name> <name><surname>Cielniak</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>Analysis of morphology-based features for classification of crop and weeds in precision agriculture</article-title>. <source>IEEE Robot. Automat. Lett.</source> <volume>3</volume>, <fpage>2950</fpage>&#x02013;<lpage>2956</lpage>. <pub-id pub-id-type="doi">10.1109/LRA.2018.2848305</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cap</surname> <given-names>Q. H.</given-names></name> <name><surname>Uga</surname> <given-names>H.</given-names></name> <name><surname>Kagiwada</surname> <given-names>S.</given-names></name> <name><surname>Iyatomi</surname> <given-names>H.</given-names></name></person-group> (<year>2022</year>). <article-title>LeafGAN: an effective data augmentation method for practical plant disease diagnosis</article-title>. <source>IEEE Trans. Automat. Sci. Eng.</source> <volume>19</volume>, <fpage>1258</fpage>&#x02013;<lpage>1267</lpage>. <pub-id pub-id-type="doi">10.1109/TASE.2020.3041499</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Everingham</surname> <given-names>M.</given-names></name> <name><surname>Van Gool</surname> <given-names>L.</given-names></name> <name><surname>Williams</surname> <given-names>C. K. I.</given-names></name> <name><surname>Winn</surname> <given-names>J. M.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2010</year>). <article-title>The PASCAL visual object classes (VOC) challenge</article-title>. <source>Int. J. Comput. Vis.</source> <volume>88</volume>, <fpage>303</fpage>&#x02013;<lpage>338</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-009-0275-4</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fahlgren</surname> <given-names>N.</given-names></name> <name><surname>Gehan</surname> <given-names>M. A.</given-names></name> <name><surname>Baxter</surname> <given-names>I.</given-names></name></person-group> (<year>2015</year>). <article-title>Lights, camera, action: high-throughput plant phenotyping is ready for a close-up</article-title>. <source>Curr. Opin. Plant Biol.</source> <volume>24</volume>, <fpage>93</fpage>&#x02013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1016/j.pbi.2015.02.006</pub-id><pub-id pub-id-type="pmid">25733069</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gehan</surname> <given-names>M. A.</given-names></name> <name><surname>Kellogg</surname> <given-names>E. A.</given-names></name></person-group> (<year>2017</year>). <article-title>High-throughput phenotyping</article-title>. <source>Am. J. Bot.</source> <volume>104</volume>, <fpage>505</fpage>&#x02013;<lpage>508</lpage>. <pub-id pub-id-type="doi">10.3732/ajb.1700044</pub-id><pub-id pub-id-type="pmid">28400413</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giuffrida</surname> <given-names>M. V.</given-names></name> <name><surname>Chen</surname> <given-names>F.</given-names></name> <name><surname>Scharr</surname> <given-names>H.</given-names></name> <name><surname>Tsaftaris</surname> <given-names>S. A.</given-names></name></person-group> (<year>2018</year>). <article-title>Citizen crowds and experts: observer variability in image-based plant phenotyping</article-title>. <source>Plant Methods</source> <volume>14</volume>, <fpage>12</fpage>. <pub-id pub-id-type="doi">10.1186/s13007-018-0278-7</pub-id><pub-id pub-id-type="pmid">29449872</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Giuffrida</surname> <given-names>M. V.</given-names></name> <name><surname>Scharr</surname> <given-names>H.</given-names></name> <name><surname>Tsaftaris</surname> <given-names>S. A.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;ARIGAN: synthetic <italic>Arabidopsis</italic> plants using generative adversarial network,&#x0201D;</article-title> in <source>2017 IEEE International Conference on Computer Vision Workshops (ICCVW)</source> (<publisher-loc>Venice</publisher-loc>), <fpage>2064</fpage>&#x02013;<lpage>2071</lpage>.</citation>
</ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Goodfellow</surname> <given-names>I.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Courville</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <source>Deep Learning</source>. <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goodfellow</surname> <given-names>I. J.</given-names></name> <name><surname>Pouget-Abadie</surname> <given-names>J.</given-names></name> <name><surname>Mirza</surname> <given-names>M.</given-names></name> <name><surname>Xu</surname> <given-names>B.</given-names></name> <name><surname>Warde-Farley</surname> <given-names>D.</given-names></name> <name><surname>Ozair</surname> <given-names>S.</given-names></name> <etal/></person-group> (<year>2014</year>). <article-title>Generative adversarial networks</article-title>. <source>arXiv preprint arXiv:1406.2661</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1406.2661</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Guillaumin</surname> <given-names>M.</given-names></name> <name><surname>Ferrari</surname> <given-names>V.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Large-scale knowledge transfer for object localization in ImageNet,&#x0201D;</article-title> in <source>2012 IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Providence, RI</publisher-loc>), <fpage>3202</fpage>&#x02013;<lpage>3209</lpage>.</citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Isola</surname> <given-names>P.</given-names></name> <name><surname>Zhu</surname> <given-names>J.-Y.</given-names></name> <name><surname>Zhou</surname> <given-names>T.</given-names></name> <name><surname>Efros</surname> <given-names>A. A.</given-names></name></person-group> (<year>2016</year>). <article-title>Image-to-image translation with conditional adversarial networks</article-title>. <source>arXiv preprint arXiv:1611.07004</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1611.07004</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jocher</surname> <given-names>G.</given-names></name> <name><surname>Chaurasia</surname> <given-names>A.</given-names></name> <name><surname>Stoken</surname> <given-names>A.</given-names></name> <name><surname>Borovec</surname> <given-names>J.</given-names></name> <name><surname>NanoCode012</surname></name> <name><surname>Kwon</surname> <given-names>Y.</given-names></name> <etal/></person-group> (<year>2022</year>). <source>ultralytics/yolov5: v6.1 - TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference</source>.</citation>
</ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kuznichov</surname> <given-names>D.</given-names></name> <name><surname>Zvirin</surname> <given-names>A.</given-names></name> <name><surname>Honen</surname> <given-names>Y.</given-names></name> <name><surname>Kimmel</surname> <given-names>R.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Data augmentation for leaf segmentation and counting tasks in rosette plants,&#x0201D;</article-title> in <source>2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</source> (<publisher-loc>Long Beach, CA</publisher-loc>), <fpage>2580</fpage>&#x02013;<lpage>2589</lpage>.</citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep learning</article-title>. <source>Nature</source> <volume>521</volume>, <fpage>436</fpage>&#x02013;<lpage>444</lpage>. <pub-id pub-id-type="doi">10.1038/nature14539</pub-id><pub-id pub-id-type="pmid">26017442</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liakos</surname> <given-names>K. G.</given-names></name> <name><surname>Busato</surname> <given-names>P.</given-names></name> <name><surname>Moshou</surname> <given-names>D.</given-names></name> <name><surname>Pearson</surname> <given-names>S.</given-names></name> <name><surname>Bochtis</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>Machine learning in agriculture: a review</article-title>. <source>Sensors</source> <volume>18</volume>, <fpage>2674</fpage>. <pub-id pub-id-type="doi">10.3390/s18082674</pub-id><pub-id pub-id-type="pmid">30110960</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>T.-Y.</given-names></name> <name><surname>Maire</surname> <given-names>M.</given-names></name> <name><surname>Belongie</surname> <given-names>S.</given-names></name> <name><surname>Bourdev</surname> <given-names>L.</given-names></name> <name><surname>Girshick</surname> <given-names>R.</given-names></name> <name><surname>Hays</surname> <given-names>J.</given-names></name> <etal/></person-group> (<year>2014</year>). <article-title>Microsoft COCO: common objects in context</article-title>. <source>arXiv preprint arXiv:1405.0312</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1405.0312</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lobet</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>Image analysis in plant sciences: publish then perish</article-title>. <source>Trends Plant Sci.</source> <volume>22</volume>, <fpage>559</fpage>&#x02013;<lpage>566</lpage>. <pub-id pub-id-type="doi">10.1016/j.tplants.2017.05.002</pub-id><pub-id pub-id-type="pmid">28571940</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>D.</given-names></name> <name><surname>Olaniyi</surname> <given-names>E.</given-names></name> <name><surname>Huang</surname> <given-names>Y.</given-names></name></person-group> (<year>2022</year>). <article-title>Generative adversarial networks (GANs) for image augmentation in agriculture: a systematic review</article-title>. <source>Comput. Electron. Agric.</source> <volume>200</volume>, <fpage>107208</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2022.107208</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Madsen</surname> <given-names>S. L.</given-names></name> <name><surname>Dyrmann</surname> <given-names>M.</given-names></name> <name><surname>Jorgensen</surname> <given-names>R. N.</given-names></name> <name><surname>Karstoft</surname> <given-names>H.</given-names></name></person-group> (<year>2019</year>). <article-title>Generating artificial images of plant seedlings using generative adversarial networks</article-title>. <source>Biosyst. Eng.</source> <volume>187</volume>, <fpage>147</fpage>&#x02013;<lpage>159</lpage>. <pub-id pub-id-type="doi">10.1016/j.biosystemseng.2019.09.005</pub-id><pub-id pub-id-type="pmid">37223315</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname> <given-names>T.</given-names></name> <name><surname>Efros</surname> <given-names>A. A.</given-names></name> <name><surname>Zhang</surname> <given-names>R.</given-names></name> <name><surname>Zhu</surname> <given-names>J.-Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Contrastive learning for unpaired image-to-image translation</article-title>. <source>arXiv preprint arXiv:2007.15651</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2007.15651</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Redmon</surname> <given-names>J.</given-names></name> <name><surname>Divvala</surname> <given-names>S.</given-names></name> <name><surname>Girshick</surname> <given-names>R.</given-names></name> <name><surname>Farhadi</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;You only look once: unified, real-time object detection,&#x0201D;</article-title> in <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Las Vegas, NV</publisher-loc>), <fpage>779</fpage>&#x02013;<lpage>788</lpage>.</citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shakoor</surname> <given-names>N.</given-names></name> <name><surname>Lee</surname> <given-names>S.</given-names></name> <name><surname>Mockler</surname> <given-names>T. C.</given-names></name></person-group> (<year>2017</year>). <article-title>High throughput phenotyping to accelerate crop breeding and monitoring of diseases in the field</article-title>. <source>Curr. Opin. Plant Biol.</source> <volume>38</volume>, <fpage>184</fpage>&#x02013;<lpage>192</lpage>. <pub-id pub-id-type="doi">10.1016/j.pbi.2017.05.006</pub-id><pub-id pub-id-type="pmid">28738313</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Singh</surname> <given-names>A.</given-names></name> <name><surname>Ganapathysubramanian</surname> <given-names>B.</given-names></name> <name><surname>Singh</surname> <given-names>A. K.</given-names></name> <name><surname>Sarkar</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>Machine learning for high-throughput stress phenotyping in plants</article-title>. <source>Trends Plant Sci.</source> <volume>21</volume>, <fpage>110</fpage>&#x02013;<lpage>124</lpage>. <pub-id pub-id-type="doi">10.1016/j.tplants.2015.10.015</pub-id><pub-id pub-id-type="pmid">26651918</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tardieu</surname> <given-names>F.</given-names></name> <name><surname>Cabrera-Bosquet</surname> <given-names>L.</given-names></name> <name><surname>Pridmore</surname> <given-names>T.</given-names></name> <name><surname>Bennett</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Plant phenomics, from sensors to knowledge</article-title>. <source>Curr. Biol.</source> <volume>27</volume>, <fpage>R770</fpage>&#x02013;<lpage>R783</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2017.05.055</pub-id><pub-id pub-id-type="pmid">28787611</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>W&#x000E4;ldchen</surname> <given-names>J.</given-names></name> <name><surname>Rzanny</surname> <given-names>M.</given-names></name> <name><surname>Seeland</surname> <given-names>M.</given-names></name> <name><surname>Mader</surname> <given-names>P.</given-names></name></person-group> (<year>2018</year>). <article-title>Automated plant species identification-trends and future directions</article-title>. <source>PLoS Comput. Biol.</source> <volume>14</volume>, <fpage>e1005993</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005993</pub-id><pub-id pub-id-type="pmid">29621236</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zeng</surname> <given-names>Q.</given-names></name> <name><surname>Ma</surname> <given-names>X.</given-names></name> <name><surname>Cheng</surname> <given-names>B.</given-names></name> <name><surname>Zhou</surname> <given-names>E.</given-names></name> <name><surname>Pang</surname> <given-names>W.</given-names></name></person-group> (<year>2020</year>). <article-title>GANs-based data augmentation for citrus disease severity detection using deep learning</article-title>. <source>IEEE Access</source> <volume>8</volume>, <fpage>172882</fpage>&#x02013;<lpage>172891</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.3025196</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>Z.-Q.</given-names></name> <name><surname>Zheng</surname> <given-names>P.</given-names></name> <name><surname>Xu</surname> <given-names>S.-T.</given-names></name> <name><surname>Wu</surname> <given-names>X.</given-names></name></person-group> (<year>2019</year>). <article-title>Object detection with deep learning: a review</article-title>. <source>IEEE Trans. Neural Netw. Learn. Syst.</source> <volume>30</volume>, <fpage>3212</fpage>&#x02013;<lpage>3232</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2018.2876865</pub-id><pub-id pub-id-type="pmid">36850584</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>F.</given-names></name> <name><surname>He</surname> <given-names>M.</given-names></name> <name><surname>Zheng</surname> <given-names>Z.</given-names></name></person-group> (<year>2020</year>). <article-title>Data augmentation using improved CDCGAN for plant vigor rating</article-title>. <source>Comput. Electron. Agric.</source> <volume>175</volume>, <fpage>105603</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2020.105603</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>J.-Y.</given-names></name> <name><surname>Park</surname> <given-names>T.</given-names></name> <name><surname>Isola</surname> <given-names>P.</given-names></name> <name><surname>Efros</surname> <given-names>A. A.</given-names></name></person-group> (<year>2017</year>). <article-title>Unpaired image-to-image translation using cycle-consistent adversarial networks</article-title>. <source>arXiv preprint arXiv:1703.10593</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1703.10593</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>Aoun</surname> <given-names>M.</given-names></name> <name><surname>Krijn</surname> <given-names>M.</given-names></name> <name><surname>Vanschoren</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Data augmentation using conditional generative adversarial networks for leaf counting in <italic>Arabidopsis</italic> plants,&#x0201D;</article-title> in <source>British Machine Vision Conference</source> (<publisher-loc>Newcastle</publisher-loc>).</citation>
</ref>
</ref-list>
</back>
</article> 

