<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurorobot.</journal-id>
<journal-title>Frontiers in Neurorobotics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurorobot.</abbrev-journal-title>
<issn pub-type="epub">1662-5218</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnbot.2022.1059497</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A novel single robot image shadow detection method based on convolutional block attention module and unsupervised learning network</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Zhang</surname> <given-names>Jun</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2033104/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Liu</surname> <given-names>Junjun</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Office of Academic Affairs, Zhengzhou University of Science and Technology</institution>, <addr-line>Zhengzhou</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>College of Information Engineering, Zhengzhou University of Science and Technology</institution>, <addr-line>Zhengzhou</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Peng Li, Dalian University of Technology, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Inam Ullah, Chungbuk National University, South Korea; Desheng Liu, Jiamusi University, China; Dianchen He, Shenyang Normal University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Junjun Liu <email>9663137&#x00040;qq.com</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>08</day>
<month>11</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>16</volume>
<elocation-id>1059497</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>10</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>10</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Zhang and Liu.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Zhang and Liu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Shadow detection plays an important role in image processing. Although many algorithms have been proposed for different environments, detecting shadows in natural scenes remains a challenging task. In this paper, we propose a convolutional block attention module (CBAM) and unsupervised domain adaptation adversarial learning network for single image shadow detection. The new method comprises three steps. First, to reduce the data deviation between domains, a hierarchical domain adaptation strategy is adopted to calibrate the feature distributions of the source domain and the target domain from low level to high level. Second, to strengthen the model's ability to detect soft shadows, a boundary adversarial branch is proposed to obtain structured shadow boundaries; a CBAM is also added to the model to reduce the correlation between different semantic information. Third, an entropy adversarial branch is combined to further suppress the high uncertainty at the boundary of the prediction results and obtain smooth, accurate shadow boundaries. Finally, we conduct extensive experiments on public datasets: the proposed method achieves the lowest RMSE (9.6) and BER (6.6) on the ISTD dataset, and the results show that it produces better edge structure than existing deep learning detection methods.</p></abstract>
<kwd-group>
<kwd>robot image shadow detection</kwd>
<kwd>hierarchical domain adaptation strategy</kwd>
<kwd>boundary adversarial branch</kwd>
<kwd>unsupervised learning</kwd>
<kwd>convolutional block attention module</kwd>
</kwd-group>
<counts>
<fig-count count="12"/>
<table-count count="4"/>
<equation-count count="18"/>
<ref-count count="28"/>
<page-count count="11"/>
<word-count count="6278"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Shadows exist in most scenes in our daily life; they are cast wherever an object blocks a light source. Shadows preserve important information about dynamic scenes and objects: for example, buildings and vegetation areas can be detected through them, and clouds can be detected through their shadows in satellite images. On the other hand, shadows are also a major source of error and uncertainty (Shoulin et al., <xref ref-type="bibr" rid="B20">2018</xref>; Sun et al., <xref ref-type="bibr" rid="B21">2019</xref>; Yuan et al., <xref ref-type="bibr" rid="B27">2020</xref>). For example, shadows may be wrongly labeled as targets in dynamic target tracking tasks. Therefore, detecting shadows in images can significantly improve the performance of many visual tasks. The shape and brightness of a shadow depend on the intensity, direction, and color of the light source, as well as the geometry and albedo of the occluder. Shadows can be divided into hard shadows and soft shadows based on their intensity. Hard shadows have relatively clear boundaries, while soft shadows are often produced when the light source is weak and have blurred boundaries. Most existing shadow detection methods are limited to hard shadow detection. Compared with video shadow detection, single image shadow detection is more challenging because of the lack of information from preceding and following frames (Sun et al., <xref ref-type="bibr" rid="B22">2018</xref>).</p>
<p>Most traditional shadow detection methods rely on the fact that the brightness of shadow pixels differs from that of non-shadow pixels (Vicente et al., <xref ref-type="bibr" rid="B23">2018</xref>). In addition, Wang et al. (<xref ref-type="bibr" rid="B25">2018</xref>) first divided images into multiple blocks using a statistical learning method and then classified these blocks with a Least Squares Support Vector Machine (LSSVM) to obtain shadow detection results. In recent years, many deep learning methods have quickly become the benchmark thanks to their accuracy and computational efficiency. For example, Khan et al. (<xref ref-type="bibr" rid="B7">2016</xref>) combined a Conditional Random Field (CRF) with a convolutional neural network (CNN) to extract local features of shadow pixels in the image. In Yago Vicente et al. (<xref ref-type="bibr" rid="B26">2016</xref>), a stacked convolutional neural network (Stacked CNN) was proposed based on a large-scale shadow detection data set; it allowed one CNN with learned semantic features to train another CNN, which refined the details of the shadow areas. Nguyen et al. (<xref ref-type="bibr" rid="B14">2017</xref>) proposed a novel shadow detection method based on a Conditional Generative Adversarial Network (CGAN) that, benefiting from a special sensitivity factor and the adversarial learning framework, obtains relatively accurate shadow masks. Based on the idea of adversarial learning, Le et al. (<xref ref-type="bibr" rid="B9">2018</xref>) trained a shadow image attenuator to generate additional challenging image data and enhance the robustness of shadow detection. Wang et al. (<xref ref-type="bibr" rid="B25">2018</xref>) proposed the Stacked Conditional Generative Adversarial Network (ST-CGAN), which uses two CGANs for the shadow detection task and the shadow removal task, respectively. Mohajerani and Saeedi (<xref ref-type="bibr" rid="B13">2018</xref>) preserved the global semantic features of shadows by changing the internal connections of a U-Net to enhance its shadow detection ability.</p>
<p>The above methods can be roughly divided into traditional machine learning methods based on hand-crafted features and feature learning methods based on deep learning (Ji et al., <xref ref-type="bibr" rid="B4">2021</xref>; Ma et al., <xref ref-type="bibr" rid="B12">2021</xref>; Shafiq and Gu, <xref ref-type="bibr" rid="B15">2022</xref>). Lacking prior information about the light source or occluder, traditional machine learning methods often lack robust hand-crafted features and cannot accurately understand shadows. Extensive experiments show that although many deep learning methods are more accurate than traditional ones, they usually perform well only on test sets from the same source. In addition, most shadow images in common data sets are strong-shadow images captured with artificial occluders (Kamnitsas et al., <xref ref-type="bibr" rid="B6">2017</xref>; Shafiq et al., <xref ref-type="bibr" rid="B17">2020b</xref>; Hatamizadeh et al., <xref ref-type="bibr" rid="B3">2022</xref>). However, the shapes and scenes of shadows are not limited to this kind: shadows on buildings, or soft shadows cast when the light source is weak, do not have clear boundaries. Deep learning methods applied to shadow images in these target domains (target data sets) often produce incomplete and jagged detection results.</p>
<p>To solve the above problems, this paper proposes a novel unsupervised domain adaptation adversarial learning network for single image shadow detection. The model is trained by supervised learning on the source data set; for the unlabeled target data set, the costly manual labeling process is avoided, yet the model is expected to perform equally well, which enhances its robustness. Specifically, during feature extraction, a multi-layer feature domain adaptation strategy minimizes the data deviation between the source domain and the target domain. Second, a boundary adversarial branch is proposed, in which a boundary generator and a boundary discriminator strengthen the boundary structure of soft shadow detection results. Finally, an entropy adversarial branch is introduced to reduce the uncertainty of the shadow boundary region in the shadow image and obtain a smooth and accurate shadow mask.</p>
<p>This paper is organized as follows. Section 2 introduces the proposed domain adaptation adversarial learning network for shadow detection in detail. Section 3 presents extensive experiments, and section 4 concludes the paper.</p></sec>
<sec id="s2">
<title>2. Proposed unsupervised learning network</title>
<p>Different &#x0201C;domains&#x0201D; are actually different data sets. Domain adaptation aims to make a model adapt to multiple domains so that it generalizes better to other data sets. Many supervised deep learning methods bring significant performance improvements for automatic shadow detection, but due to cross-domain discrepancy (Shafiq et al., <xref ref-type="bibr" rid="B18">2020a</xref>), such models cannot obtain satisfactory results on the target data set. As shown in <xref ref-type="fig" rid="F1">Figure 1</xref>, our experiments and analysis indicate that a deep network trained on the source data set ISTD usually generates relatively accurate shadow results only for test images from the same source. When applied to the target data set SBU, the boundary structure of the detection results is poor, as shown in <xref ref-type="fig" rid="F1">Figure 1B</xref>. The proposed model not only performs well on the source data set but also detects shadows well on the target data set, as shown in <xref ref-type="fig" rid="F1">Figure 1C</xref>. In contrast to these methods, when facing a new data set the proposed method no longer needs tedious manual labeling to provide shadow annotations as training data; it uses unsupervised learning so that the model can easily adapt to the new domain and obtain accurate shadow detection results.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Analysis of cross-domain discrepancy. First row: Source data set. Second row: Target data set. First column: shadow image <bold>(A)</bold>. Second column: CGAN method <bold>(B)</bold>. Third column: proposed method <bold>(C)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0001.tif"/>
</fig>
<p>The proposed shadow detection framework is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. For the shadow images in the source domain and the target domain, a separate feature extraction channel is first adopted, and the domain discriminator judges the domain label of the current feature from the low level to the high level. Then, two generative adversarial branches are constructed. The boundary adversarial branch enhances the detection of soft shadow images in the target data set (Lee et al., <xref ref-type="bibr" rid="B10">2022</xref>). The entropy adversarial branch further suppresses the uncertainty at the shadow boundary, so that a smooth and accurate shadow mask can be obtained. With the objective function and special network connections, the two tasks constrain and promote each other to achieve accurate cross-domain shadow detection.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Proposed single robot image shadow detection.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0002.tif"/>
</fig>
<sec>
<title>2.1. Hierarchical feature extraction method</title>
<p>The traditional domain adaptation model corrects the feature distribution between different domains only in the last convolution layer to realize global adaptation (Chen et al., <xref ref-type="bibr" rid="B1">2018</xref>). However, this ignores the importance of low-level features and allows some domain-sensitive local features to weaken the generalization ability of the adaptation model. Because some layers are not transferable, a single domain classifier can hardly eliminate the data deviation between the source domain and the target domain. Inspired by Shafiq et al. (<xref ref-type="bibr" rid="B19">2020c</xref>) and Zhang et al. (<xref ref-type="bibr" rid="B28">2020</xref>), shadow images from the source domain and the target domain are taken as input. During image encoding, each convolution layer in the encoder produces a corresponding feature map, and the output feature maps of multiple middle layers in the encoder are extracted. A domain classifier is constructed on each convolution layer between the source-domain and target-domain encoders to promote feature matching in the middle layers. The aim is that the two encoders still follow a similar feature extraction process on different data sets, thereby achieving domain adaptation. The objective function is shown in Equation (1):</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>o</mml:mi><mml:mo>,</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>l</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mo>,</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>l</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mo>,</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>D</italic><sub><italic>i</italic></sub> is the domain label of the <italic>i</italic> &#x02212; <italic>th</italic> image, <inline-formula><mml:math id="M2"><mml:msubsup><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mo>,</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the feature map activation value of the pixel with coordinate (<italic>o, p</italic>) at the <italic>k</italic> &#x02212; <italic>th</italic> layer of the <italic>i</italic> &#x02212; <italic>th</italic> image, and <italic>f</italic><sub><italic>k</italic></sub> is the corresponding domain classifier.</p>
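The layer-wise objective in Equation (1) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name, the array layout, and the stand-in classifier outputs are all assumptions.

```python
import numpy as np

def hierarchical_domain_loss(layer_probs, domain_labels, eps=1e-7):
    """Sum of per-layer domain-classifier cross-entropies, as in Eq. (1).

    layer_probs: list over encoder layers k; each entry has shape
        (num_images, H_k, W_k) and holds f_k(Phi_{i,k}^{o,p}) in (0, 1),
        the classifier's probability that the feature at pixel (o, p)
        of image i comes from the source domain.
    domain_labels: shape (num_images,); D_i = 1 for source images,
        0 for target images.
    """
    total = 0.0
    for probs in layer_probs:                 # outer sum over layers k
        p = np.clip(probs, eps, 1.0 - eps)    # avoid log(0)
        d = domain_labels[:, None, None]      # broadcast D_i over (o, p)
        # inner sum over images i and pixel coordinates (o, p)
        total -= np.sum(d * np.log(p) + (1.0 - d) * np.log(1.0 - p))
    return total
```

Minimizing this loss trains the per-layer domain classifiers; in the adversarial setting the two encoders are simultaneously updated to confuse them, which aligns the intermediate feature distributions of the source and target domains.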
<p>Hierarchical domain adaptation ensures that the intermediate features of the two domains have similar distributions, thus enhancing the robustness of the adaptation model. In shadow detection, eliminating the data deviation between domains improves detection accuracy on the target data set. As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, <xref ref-type="fig" rid="F3">Figure 3A</xref> is the shadow image in the target domain and <xref ref-type="fig" rid="F3">Figure 3B</xref> is the label data (ground truth). <xref ref-type="fig" rid="F3">Figure 3C</xref> shows the shadow detection results with global domain adaptation, and <xref ref-type="fig" rid="F3">Figure 3D</xref> shows the results with hierarchical domain adaptation. Comparing <xref ref-type="fig" rid="F3">Figures 3C</xref>,<xref ref-type="fig" rid="F3">D</xref>, the model generalizes better with hierarchical domain adaptation feature extraction and detects more accurately the differently colored text adjacent to the shadow in the image.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Effect of hierarchical domain adaptation on shadow detection. First column <bold>(A)</bold>, second column <bold>(B)</bold>, third column <bold>(C)</bold>, and fourth column <bold>(D)</bold> are shadow image, GT, global domain adaptation, hierarchical domain adaptation, respectively.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0003.tif"/>
</fig></sec>
<sec>
<title>2.2. A hybrid domain attention mechanism with CBAM</title>
<p>In computer vision tasks, an attention mechanism generates a weight for each pixel of the image. Ideally, the weights of foreground pixels increase while the weights of background pixels gradually decrease. Widening this weight gap separates the different semantics.</p>
<p>The Convolutional Block Attention Module (CBAM) is a reliable attention mechanism for computer vision tasks, with a simple structure and considerable practical effect. CBAM combines the spatial and channel dimensions of a CNN to generate attention for images and feature maps in each attention domain, guiding the model to distinguish semantic information more efficiently.</p>
<p>CBAM is composed of a spatial-domain attention generation module and a channel-domain generation module, which are combined by a weighted sum operation. The spatial-domain generation module can be expressed as:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E3"><label>(3)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E4"><label>(4)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mi>F</mml:mi><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mn>7</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>7</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Given the feature map <italic>F</italic> output by the CNN, global average pooling and global maximum pooling are applied to the feature map simultaneously. The two pooled results are concatenated along the channel dimension and fed into a convolution layer with a single output channel and a 7 &#x000D7; 7 kernel, which reduces the number of channels to 1 without changing the height and width of the feature map. The Sigmoid activation function then makes the output nonlinear, yielding the spatial-domain attention matrix <italic>M</italic><sub><italic>s</italic></sub>(<italic>F</italic>). <xref ref-type="fig" rid="F4">Figure 4</xref> shows the spatial domain generation module of CBAM.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>CBAM.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0004.tif"/>
</fig>
<p>The channel domain generation module can be expressed as:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>M</mml:mi><mml:mi>L</mml:mi><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E6"><label>(6)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>M</mml:mi><mml:mi>L</mml:mi><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E7"><label>(7)</label><mml:math id="M8"><mml:msub><mml:mi>M</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mi>F</mml:mi><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mi>C</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mi>C</mml:mi></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>In the channel-domain attention module, channel-wise average pooling and maximum pooling are carried out synchronously on the feature map <italic>F</italic>. The two results are fed into the same multi-layer perceptron, and the two output vectors are added element-wise. The sum is passed through the Sigmoid activation function to output the channel-domain attention matrix <italic>M</italic><sub><italic>c</italic></sub>(<italic>F</italic>).</p></sec>
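The two CBAM branches described by Equations (2)&#x02013;(7) can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the 7 &#x000D7; 7 convolution and the shared multi-layer perceptron are passed in as stand-in callables, and all names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(F, conv7x7):
    """Spatial-domain attention M_s(F) of Eqs. (2)-(4).

    F: feature map of shape (C, H, W).
    conv7x7: stand-in for the single-output-channel 7x7 convolution;
        maps a (2, H, W) array to an (H, W) response.
    """
    f_avg = F.mean(axis=0)             # Eq. (2): average-pool across channels
    f_max = F.max(axis=0)              # Eq. (3): max-pool across channels
    pooled = np.stack([f_avg, f_max])  # combine the two pooled maps
    return sigmoid(conv7x7(pooled))    # Eq. (4)

def channel_attention(F, mlp):
    """Channel-domain attention M_c(F) of Eqs. (5)-(7).

    mlp: stand-in for the shared multi-layer perceptron;
        maps a length-C vector to a length-C vector.
    """
    f_avg = mlp(F.mean(axis=(1, 2)))   # Eq. (5)
    f_max = mlp(F.max(axis=(1, 2)))    # Eq. (6)
    return sigmoid(f_avg + f_max)      # Eq. (7)
```

Each attention matrix lies elementwise in (0, 1) and is multiplied into the feature map, widening the weight gap between foreground and background pixels.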
<sec>
<title>2.3. Boundary feature analysis</title>
<p>Existing shadow detection data sets lack soft shadow images with rich scenes because of their single acquisition method (placing various occluders under a strong light source). Affected by the light source intensity, a soft shadow image has no clear shadow boundary, and many existing deep learning methods cannot obtain good detection results on such images. As shown in <xref ref-type="fig" rid="F5">Figure 5</xref>, for soft shadow images in the target data set, correcting the feature distribution alone cannot recover the boundary structure, as shown in <xref ref-type="fig" rid="F5">Figure 5B</xref>.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Analysis of boundary adversarial branch. <bold>(A)</bold> Shadow image, <bold>(B)</bold> before boundary adversarial, <bold>(C)</bold> after boundary adversarial, and <bold>(D)</bold> detection result.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0005.tif"/>
</fig>
<p>To solve this problem, a boundary adversarial branch is constructed to predict the boundary structure in the target data set. The branch is designed to generate the shadow boundary image shown in <xref ref-type="fig" rid="F5">Figure 5C</xref>. Then, following the principle of adversarial learning, a discriminator is used to further improve the quality of the generated image. With the shadow boundary initially located, the subsequent shadow detection results have more boundary structure, ultimately improving the detection of soft shadows.</p>
<p>Assume the source data set is <italic>S</italic>, with ground truth masks <italic>y</italic><sub><italic>s</italic></sub> as label data, and that the target data set <italic>T</italic> has no labeled data. First, the generator <italic>G</italic><sub><italic>b</italic></sub> fits the shadow boundary in the image and produces boundary predictions <italic>G</italic><sub><italic>b</italic></sub>(<italic>x</italic><sub><italic>s</italic></sub>) and <italic>G</italic><sub><italic>b</italic></sub>(<italic>x</italic><sub><italic>t</italic></sub>) for the source shadow image <italic>x</italic><sub><italic>s</italic></sub> and the target shadow image <italic>x</italic><sub><italic>t</italic></sub>, respectively; a visualization is shown in <xref ref-type="fig" rid="F5">Figure 5C</xref>. Second, the discriminator <italic>D</italic><sub><italic>b</italic></sub> determines whether a boundary comes from the source or the target data set. With the boundary adversarial branch, the shadow region of a soft shadow image in the target domain can be accurately identified, as shown in <xref ref-type="fig" rid="F5">Figure 5D</xref>.</p>
<p>For the source-domain and target-domain data sets with domain labels, the boundary discriminator <italic>D</italic><sub><italic>b</italic></sub> evaluates and penalizes <italic>G</italic><sub><italic>b</italic></sub>(<italic>x</italic><sub><italic>s</italic></sub>) and <italic>G</italic><sub><italic>b</italic></sub>(<italic>x</italic><sub><italic>t</italic></sub>), respectively, as shown in Equation (8):</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>L</italic><sub><italic>B</italic></sub> is the binary cross-entropy loss, defined as <italic>L</italic><sub><italic>B</italic></sub>(&#x00177;, <italic>y</italic>) &#x0003D; &#x02212;(<italic>y</italic> <italic>ln</italic> &#x00177; &#x0002B; (1 &#x02212; <italic>y</italic>)<italic>ln</italic>(1 &#x02212; &#x00177;)); <italic>N</italic> and <italic>M</italic> are the numbers of images in the source data set and the target data set, respectively.</p>
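As a concrete reading of Equation (8) and the definition of <italic>L</italic><sub><italic>B</italic></sub> above, the discriminator objective can be sketched in a few lines of NumPy; <monospace>d_src</monospace> and <monospace>d_tgt</monospace> are illustrative names (not identifiers from the paper) for the discriminator outputs on the source and target boundary maps:

```python
import numpy as np

def bce(y_hat, y, eps=1e-7):
    """Binary cross entropy L_B(y_hat, y) = -(y*ln(y_hat) + (1-y)*ln(1-y_hat))."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # avoid log(0)
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def discriminator_loss(d_src, d_tgt):
    """Eq. (8) sketch: scores on source boundary maps are pushed toward label 1
    and scores on target boundary maps toward label 0.
    d_src, d_tgt: arrays of outputs D_b(G_b(x)) in (0, 1), one per image."""
    return bce(d_src, 1.0).mean() + bce(d_tgt, 0.0).mean()
```

When the discriminator separates the two domains well (scores near 1 on source, near 0 on target), this loss approaches zero.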
<p>The loss function <italic>L</italic><sub><italic>G</italic><sub><italic>b</italic></sub></sub> of the generator is a weighted combination of the mean absolute error loss term on the source data set and the adversarial loss term on the target data set, as shown in Equation (9):</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M10"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mo stretchy="false">&#x02016;</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mi>b</mml:mi></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mi>b</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mo stretchy="false">&#x02016;</mml:mo><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mi>&#x003BB;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mfrac><mml:mn>1</mml:mn><mml:mi>M</mml:mi></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mi>L</mml:mi><mml:mi>B</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mi>b</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M11"><mml:msubsup><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the shadow boundary label image in the source data set.</p></sec>
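Equation (9) combines an L1 term with an adversarial term; a minimal NumPy sketch, where <monospace>pred_src</monospace>, <monospace>label_src</monospace>, and <monospace>d_tgt</monospace> are illustrative stand-ins for <italic>G</italic><sub><italic>b</italic></sub>(<italic>x</italic><sub><italic>s</italic></sub>), the boundary label, and the discriminator scores on the target boundary maps:

```python
import numpy as np

def generator_boundary_loss(pred_src, label_src, d_tgt, lam1=0.5, eps=1e-7):
    """Eq. (9) sketch: mean absolute error between the predicted boundary map
    G_b(x_s) and its label on the source batch, plus lam1 times the adversarial
    term asking the boundary discriminator to score target boundary maps as
    source-like (label 1); with label 1 the binary cross entropy collapses to
    -ln(d). lam1 follows the paper's lambda_1 = 0.5 setting."""
    mae = np.abs(label_src - pred_src).mean()
    adv = -np.log(np.clip(d_tgt, eps, 1.0)).mean()
    return mae + lam1 * adv
```

The loss vanishes when the source prediction matches its label exactly and the discriminator already scores the target maps as source-like.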
<sec>
<title>2.4. Entropy mask prediction</title>
<p>Even after the boundary adversarial branch, directly generating shadow masks for the target dataset with the additional mask generator produces zigzag detection boundaries (<xref ref-type="fig" rid="F5">Figure 5B</xref>). Inspired by Vu et al. (<xref ref-type="bibr" rid="B24">2019</xref>), we observe that the shadow mask predictions have high entropy (uncertainty) in the region near the shadow boundary, which leads to this zigzag boundary phenomenon.</p>
<p>To suppress these uncertain predictions, the entropy adversarial branch first generates a shadow probability map for the shadow image. The Shannon entropy is then used to transform the probability map into an entropy map. The entropy maps of the target domain and the source domain are forced to be as similar as possible, so as to reduce the performance gap between the model on the target and source data sets. Finally, the quality of the generated image is improved through adversarial learning. High entropy values should appear only around the shadow boundary; such a reasonable entropy distribution corresponds to shadow detection results with smooth boundaries.</p>
<p>The mask generator <italic>G</italic><sub><italic>m</italic></sub> generates mask predictions <italic>G</italic><sub><italic>m</italic></sub>(<italic>x</italic><sub><italic>s</italic></sub>) and <italic>G</italic><sub><italic>m</italic></sub>(<italic>x</italic><sub><italic>t</italic></sub>) for source and target images, respectively. Given the mask prediction <italic>p</italic> of an input image <italic>x</italic>, the Shannon entropy is used to compute the entropy map, as shown in Equation (10):</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M12"><mml:mrow><mml:mi>E</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>p</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
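For a binary shadow/non-shadow map, Equation (10) can be read as the per-pixel binary entropy; the two-term form below is our expansion of the single-term expression (an assumption on our part), and it peaks exactly where the text says uncertainty is highest, near <italic>p</italic> = 0.5:

```python
import numpy as np

def entropy_map(p, eps=1e-7):
    """Per-pixel Shannon entropy of a shadow probability map p in [0, 1].

    For a binary mask both outcomes contribute, giving
    -(p*log(p) + (1-p)*log(1-p)); it is maximal at p = 0.5 (typically near
    shadow boundaries) and vanishes as p approaches 0 or 1."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
```

Confident pixels (deep shadow or clear background) thus contribute almost nothing, so high entropy concentrates along boundaries.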
<p>The entropy discriminator <italic>D</italic><sub><italic>e</italic></sub> aims to align the distributions of <italic>E</italic>(<italic>x</italic><sub><italic>s</italic></sub>) and <italic>E</italic>(<italic>x</italic><sub><italic>t</italic></sub>). As in the boundary-driven adversarial learning, the entropy discriminator <italic>D</italic><sub><italic>e</italic></sub> determines whether an entropy map comes from the source domain or the target domain. Its objective function is shown in Equation (11):</p>
<disp-formula id="E11"><label>(11)</label><mml:math id="M13"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>e</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mi>L</mml:mi><mml:mi>B</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>M</mml:mi></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mi>L</mml:mi><mml:mi>B</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>The loss function <italic>L</italic><sub><italic>G</italic><sub><italic>m</italic></sub></sub> of the generator is a weighted combination of the pixel-level cross-entropy loss on the source data set and the adversarial loss term on the target data set, as shown in Equation (12):</p>
<disp-formula id="E12"><label>(12)</label><mml:math id="M14"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mo>&#x000B7;</mml:mo><mml:mi>l</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x000B7;</mml:mo><mml:mi>l</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mi>&#x003BB;</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mfrac><mml:mn>1</mml:mn><mml:mi>M</mml:mi></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mi>L</mml:mi><mml:mi>B</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M16"><mml:msubsup><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the shadow mask label image.</p></sec>
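Equation (12) can be sketched the same way; <monospace>pred_src</monospace>, <monospace>label_src</monospace>, and <monospace>e_tgt_score</monospace> are illustrative names for <italic>G</italic><sub><italic>m</italic></sub>(<italic>x</italic><sub><italic>s</italic></sub>), the mask label, and the entropy discriminator's scores on the target entropy maps:

```python
import numpy as np

def generator_mask_loss(pred_src, label_src, e_tgt_score, lam2=0.5, eps=1e-7):
    """Eq. (12) sketch: pixel-wise cross entropy of the source mask prediction
    G_m(x_s) against its label, plus lam2 times the entropy-adversarial term
    asking the discriminator score on the target entropy map to look
    source-like (label 1). lam2 follows the paper's lambda_2 = 0.5 setting."""
    p = np.clip(pred_src, eps, 1.0 - eps)  # avoid log(0)
    ce = -(label_src * np.log(p) + (1.0 - label_src) * np.log(1.0 - p)).mean()
    adv = -np.log(np.clip(e_tgt_score, eps, 1.0)).mean()
    return ce + lam2 * adv
```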
<sec>
<title>2.5. Shadow removal</title>
<p>First, the coherence block matching strategy is used to find the best-matching non-shadow block for each block in the shadow region. Then, local illumination propagation and global illumination optimization are performed for each matched pair of shadow and non-shadow blocks. Finally, the shadow boundary is processed to obtain the final result.</p>
<p><bold>A. Local illumination propagation</bold></p>
<p>The shadow area is modeled and, combined with the illumination propagation algorithm, the ratio of direct light to ambient light is calculated to obtain a shadow-free video:</p>
<disp-formula id="E14"><label>(13)</label><mml:math id="M17"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>I</italic><sub><italic>i</italic></sub> is the <italic>i</italic>-th pixel of the image in RGB space, and <italic>k</italic><sub><italic>i</italic></sub> &#x02208; [0, 1] is the degree of direct illumination at the pixel. Both <italic>L</italic><sub><italic>d</italic></sub> and <italic>L</italic><sub><italic>e</italic></sub> are three-dimensional vectors representing the intensities of direct light and ambient light. <italic>R</italic><sub><italic>i</italic></sub> is the reflectance of the pixel, also a three-dimensional vector whose dimensions correspond to the color channels of the RGB image. Equation (13) states that a pixel value results from the combined action of direct light and ambient light, multiplied by the reflectance of the pixel. According to the state of direct light, a pixel falls into one of three cases: shadow, non-shadow, or semi-shadow. When <italic>k</italic><sub><italic>i</italic></sub> &#x0003D; 0, the pixel receives no direct light and belongs to the shadow area; when <italic>k</italic><sub><italic>i</italic></sub> &#x0003D; 1, direct light acts fully on the pixel, which belongs to the non-shadow region; when <italic>k</italic><sub><italic>i</italic></sub> &#x02208; (0, 1), the pixel lies in the semi-shadow transition region.</p>
<p>Then for a pixel with the shadow removed, the relationship between its pixel value and the pixel value of the shadow pixel can be simplified as Equation (14):</p>
<disp-formula id="E15"><label>(14)</label><mml:math id="M18"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>r</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>r</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M19"><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></inline-formula> is the ratio of direct light to ambient light. <italic>I</italic><sub><italic>i</italic></sub> is the RGB value of the <italic>i</italic>-th pixel of the original image.</p>
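Equation (14) is a per-pixel rescaling; a minimal sketch (the function name is ours):

```python
import numpy as np

def relight_pixel(I, k, r):
    """Eq. (14): recover the shadow-free value I' = (r + 1) / (k*r + 1) * I,
    where k in [0, 1] is the degree of direct illumination and
    r = L_d / L_e is the ratio of direct to ambient light."""
    return (r + 1.0) / (k * r + 1.0) * I
```

With <monospace>k = 1</monospace> (fully lit) the factor is 1 and the pixel is unchanged; with <monospace>k = 0</monospace> (full shadow) the pixel is brightened by a factor of <monospace>r + 1</monospace>.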
<p><bold>B. Global shadow removal</bold></p>
<p>Although local illumination propagation removes the shadow within each shadow block, it cannot by itself yield a spatio-temporally coherent shadow-free video. After local illumination propagation, a global shadow removal step is therefore used to make up for this deficiency. To obtain spatially coherent shadow-free images <inline-formula><mml:math id="M20"><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, a weighted average method is proposed to recover the shadow-free values of pixels in overlapping areas, as in the following equation.</p>
<disp-formula id="E16"><label>(15)</label><mml:math id="M21"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>N</italic><sub><italic>s</italic></sub>(<italic>p</italic>) is the set of shadow blocks containing pixel <italic>p</italic>, and <italic>w</italic><sub><italic>i</italic></sub> &#x0003D; <italic>dist</italic>(<italic>i, j</italic>) is the similarity distance between block <italic>S</italic><sub><italic>i</italic></sub> and its corresponding bright-region block <italic>L</italic><sub><italic>j</italic></sub>. <italic>N</italic><sub><italic>L</italic></sub>(<italic>p</italic>) is the block set formed by the <italic>n</italic> bright-region blocks nearest to those in <italic>N</italic><sub><italic>s</italic></sub>(<italic>p</italic>). <inline-formula><mml:math id="M22"><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the result of local illumination propagation of pixel <italic>p</italic> through block <italic>S</italic><sub><italic>i</italic></sub> and block <italic>L</italic><sub><italic>i</italic></sub>. In summary, the value of a pixel <italic>p</italic> in an overlapping area is the weighted average over the multiple blocks containing <italic>p</italic>.</p>
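For a single pixel, Equation (15) reduces to a weighted average of the per-block estimates; a minimal sketch with illustrative names:

```python
import numpy as np

def blend_overlap(values, weights):
    """Eq. (15) sketch: a pixel covered by several shadow blocks receives the
    weighted average of the per-block shadow-free estimates V'_{s_i}(p).
    `values` holds one estimate per covering block, `weights` the matching
    block-similarity weights w_i."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (values * weights).sum() / weights.sum()
```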
<p>With the above global optimization technique, spatially smooth shadow-free results can be obtained within the shadow region. Computing the shadow-free values of overlapping pixels with the weighted average avoids, or greatly reduces, blurring artifacts in the overlapping area. Minimizing the following objective function ensures that the results are consistent in time.</p>
<disp-formula id="E17"><label>(16)</label><mml:math id="M23"><mml:mrow><mml:mi>E</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>V</mml:mi><mml:mi>f</mml:mi></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:msup><mml:mi>V</mml:mi><mml:mi>f</mml:mi></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>V</mml:mi><mml:mi>s</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>p</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:msup><mml:mi>V</mml:mi><mml:mi>f</mml:mi></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>V</mml:mi><mml:mi>s</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>p</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>v</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>V</italic><sup><italic>f</italic></sup> is the final shadow-free result, <italic>S</italic> represents the shadow area, and <italic>u</italic>(<italic>p</italic>) and <italic>v</italic>(<italic>p</italic>) are the forward and backward optical flow directions of pixel <italic>p</italic>. <inline-formula><mml:math id="M25"><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:msqrt></mml:math></inline-formula>. Minimizing the objective function with the gradient descent algorithm yields a spatio-temporally coherent shadow-free result.</p></sec>
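The objective of Equation (16) can be sketched with the robust penalty &#x003D5; written out; <monospace>warped_prev</monospace> and <monospace>warped_next</monospace> are illustrative stand-ins for the locally relit values sampled along the forward and backward optical flow:

```python
import numpy as np

def phi(x, eps=1e-4):
    """Charbonnier-style robust penalty phi(x) = sqrt(x^2 + eps) used in
    Eq. (16); it behaves like |x| for large residuals yet stays
    differentiable at zero."""
    return np.sqrt(x * x + eps)

def temporal_energy(v_f, warped_prev, warped_next, eps=1e-4):
    """Eq. (16) sketch over shadow pixels: penalize deviation of the final
    result V^f from the locally relit values warped along the forward (u)
    and backward (v) flow."""
    return (phi((v_f - warped_prev) ** 2, eps)
            + phi((v_f - warped_next) ** 2, eps)).sum()
```

Because the penalty grows roughly linearly in the squared residual, a gradient-descent step on this energy pulls each pixel toward its flow-warped neighbors without being dominated by outliers.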
<sec>
<title>2.6. Network structure and training</title>
<p>The proposed network adopts the encoder-decoder structure of U-Net (Lee et al., <xref ref-type="bibr" rid="B10">2022</xref>). The U-Net structure consists of a contraction path and an expansion path: the contraction path extracts contextual features, while the expansion path up-samples the image to obtain the generated image. The discriminator of the proposed network is consistent with Lata et al. (<xref ref-type="bibr" rid="B8">2019</xref>); it contains multiple convolution blocks, in which each convolution layer is followed by batch normalization and a LeakyReLU activation. The last layer of the discriminator is a sigmoid function, which outputs the probability that the input image is real. During training, the generator and discriminator networks are optimized with an alternating gradient-update strategy. First, the boundary and entropy discriminator networks are optimized to minimize their objective functions. Second, the generator network is optimized with the generation loss and the hierarchical domain adaptation losses. The overall loss function of the generator network is shown in Equation (17):</p>
<disp-formula id="E19"><label>(17)</label><mml:math id="M26"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
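The alternating strategy described above can be illustrated with a toy one-parameter GAN step; this is purely schematic (scalar parameters stand in for the full generator and discriminator networks) and is not the paper's training code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def alternating_step(theta_d, theta_g, x_src, lr=0.1):
    """One alternating update: the discriminator parameter theta_d is first
    stepped to separate a real sample (x_src) from a generated one (theta_g);
    the generator parameter theta_g is then stepped to fool the updated
    discriminator. Scalars are illustrative stand-ins only."""
    # Discriminator step: ascend ln D(real) + ln(1 - D(fake)).
    d_real = sigmoid(theta_d * x_src)
    d_fake = sigmoid(theta_d * theta_g)
    grad_d = (1.0 - d_real) * x_src - d_fake * theta_g
    theta_d = theta_d + lr * grad_d
    # Generator step: ascend ln D(fake), pushing the fake score toward "real".
    d_fake = sigmoid(theta_d * theta_g)
    grad_g = (1.0 - d_fake) * theta_d
    theta_g = theta_g + lr * grad_g
    return theta_d, theta_g
```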
<p>The detailed variations of the overall loss value and the accuracy at each training stage of the proposed shadow detection method are shown in <xref ref-type="fig" rid="F6">Figures 6</xref>, <xref ref-type="fig" rid="F7">7</xref>. As can be seen from <xref ref-type="fig" rid="F6">Figures 6</xref>, <xref ref-type="fig" rid="F7">7</xref>, the convergence of the proposed method is stable, which effectively reduces over-fitting. The overall accuracy exceeds 96%.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Loss value curves.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0006.tif"/>
</fig>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Overall accuracy curves.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0007.tif"/>
</fig></sec></sec>
<sec id="s3">
<title>3. Experiments and analysis</title>
<p>The experiment platform is: the Python programming language, the TensorFlow package, Ubuntu 18.04, 16 GB of memory, an Intel i7 CPU, and an NVIDIA GTX 1060 Ti GPU. In the network, the slope of LeakyReLU is set to 0.25, and the objective function is optimized by Adam. Each 286 &#x000D7; 286 pixel image in the data set is cropped into 256 &#x000D7; 256 pixel sub-images and flipped to augment the training data. &#x003BB;<sub>1</sub> &#x0003D; &#x003BB;<sub>2</sub> &#x0003D; 0.5, and the initial learning rate is 0.1. <xref ref-type="fig" rid="F8">Figure 8</xref> shows three groups of different training images from the source data set, representing simple geometric boundary shadows, text-mixed shadows, and complex-structure shadow images, respectively. Training data sets covering various scenarios are more conducive to the generalization of the network model.</p>
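The crop-and-flip augmentation described above can be sketched as follows (the function name and RNG handling are ours, not from the paper):

```python
import numpy as np

def augment(img, rng):
    """Sketch of the stated augmentation: randomly crop a 286x286 training
    image to a 256x256 sub-image and apply a random horizontal flip."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - 256 + 1))
    left = int(rng.integers(0, w - 256 + 1))
    crop = img[top:top + 256, left:left + 256]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]  # horizontal flip
    return crop
```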
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p><bold>(A&#x02013;C)</bold> Datasets for network training.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0008.tif"/>
</fig>
<p>The proposed method is compared with three recent shadow detection methods: GSCA-UNet (Jin et al., <xref ref-type="bibr" rid="B5">2020</xref>), SAS (Fan et al., <xref ref-type="bibr" rid="B2">2020</xref>), and DSAN (Li et al., <xref ref-type="bibr" rid="B11">2020</xref>). GSCA-UNet aimed to generate additional shadow images to enhance the generalization ability of the model. SAS was constructed from two CGANs, using a multi-task learning mode to perform the shadow detection and shadow removal tasks successively. DSAN preserved the semantic information of each convolution layer by changing the network connections in the encoding and decoding process, improving the accuracy of shadow detection over the traditional U-Net image generation model. All methods are tested on the ISTD dataset.</p>
<p><xref ref-type="fig" rid="F9">Figure 9</xref> shows the shadow detection results of the different methods in four different shadow scenes. Comparing <xref ref-type="fig" rid="F9">Figures 9C</xref>&#x02013;<xref ref-type="fig" rid="F9">F</xref>, it can be seen that the entropy-driven adversarial learning model also achieves a substantial performance improvement in the source domain. In complex shadow scenes, such as cross texture, text confusion, and irregular shapes, it also obtains better detection results and shows better robustness. Notably, the incomplete shadow detection results of DSAN also indirectly reflect the necessity of the boundary adversarial branch.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>Detection results with different methods on ISTD dataset. <bold>(A)</bold> Shadow image, <bold>(B)</bold> GT, <bold>(C)</bold> GSCA-Unet, <bold>(D)</bold> SAS, <bold>(E)</bold> DSAN, and <bold>(F)</bold> Proposed.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0009.tif"/>
</fig>
<p>To verify the cross-domain detection performance of the proposed method, a cross-domain comparative experiment is conducted between the proposed method and the methods of Chen et al. (<xref ref-type="bibr" rid="B1">2018</xref>), Shafiq et al. (<xref ref-type="bibr" rid="B19">2020c</xref>), and Zhang et al. (<xref ref-type="bibr" rid="B28">2020</xref>) on the SBU dataset. <xref ref-type="fig" rid="F10">Figure 10</xref> shows the detection performance of the different methods in four different shadow scenes. In the first row, thanks to the multi-layer domain-adaptive feature extraction process, the proposed method does not mistake the athletes' black shorts for shadows. Similarly, compared with the other two methods, the proposed method also achieves better accuracy on the soft shadow images in the third and fourth rows. In this paper, the boundary adversarial branch and the entropy adversarial branch are combined, so the shadow detection results have a good boundary structure, with smooth and natural shadow boundaries.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>Detection results with different methods on SBU dataset. <bold>(A)</bold> Shadow image, <bold>(B)</bold> GT, <bold>(C)</bold> GSCA-Unet, <bold>(D)</bold> SAS, <bold>(E)</bold> DSAN, and <bold>(F)</bold> proposed.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0010.tif"/>
</fig>
<p>We also conduct experiments on remote sensing images; the results are shown in <xref ref-type="fig" rid="F11">Figures 11</xref>, <xref ref-type="fig" rid="F12">12</xref>.</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Detection results with different methods on remote sensing images (Toronto). <bold>(A)</bold> Shadow image, <bold>(B)</bold> GT, <bold>(C)</bold> GSCA-Unet, <bold>(D)</bold> SAS, <bold>(E)</bold> DSAN, and <bold>(F)</bold> proposed.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0011.tif"/>
</fig>
<fig id="F12" position="float">
<label>Figure 12</label>
<caption><p>Detection results with different methods on remote sensing images (Vienna). <bold>(A)</bold> Shadow image, <bold>(B)</bold> GT, <bold>(C)</bold> GSCA-Unet, <bold>(D)</bold> SAS, <bold>(E)</bold> DSAN, and <bold>(F)</bold> proposed.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-16-1059497-g0012.tif"/>
</fig>
<p>The results obtained by the GSCA-UNet method look reasonable compared with the GT maps. However, as shown in <xref ref-type="fig" rid="F11">Figures 11C</xref>, <xref ref-type="fig" rid="F12">12C</xref>, quite a few positive (shadow) and negative (non-shadow) samples need to be labeled in advance to yield the final detection result for each input image. Another obvious weakness of the GSCA-UNet method, also visible in <xref ref-type="fig" rid="F11">Figures 11C</xref>, <xref ref-type="fig" rid="F12">12C</xref>, is that some small shadows are missed, because it is arduous to mark small shadows. The SAS method maintains the integrity of the detected shadow regions well, as illustrated in <xref ref-type="fig" rid="F11">Figures 11D</xref>, <xref ref-type="fig" rid="F12">12D</xref>. Unfortunately, the SAS method fails to handle the nonuniform shadows in the Toronto image and the dark water body in the Vienna image. The DSAN method produces satisfactory detection results. However, as can be seen in <xref ref-type="fig" rid="F11">Figure 11</xref>, it fails to precisely locate the shadow boundaries (Shafiq et al., <xref ref-type="bibr" rid="B16">2022</xref>), and part of the shadows is missed because global spatial contextual information is not considered. From the aforementioned comparisons, we conclude that the proposed method strikes a better balance between automaticity and accuracy than the other advanced methods.</p>
<p>We select three evaluation metrics, Root Mean Squared Error (RMSE), Balance Error Rate (BER), and Per-pixel Error Rate (PER), to evaluate the proposed method. The BER is defined as</p>
<disp-formula id="E20"><label>(18)</label><mml:math id="M27"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>B</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>5</mml:mn><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>TP, TN, FP</italic>, and <italic>FN</italic> denote the numbers of correctly detected shadow pixels, correctly detected non-shadow pixels, non-shadow pixels wrongly detected as shadow, and shadow pixels wrongly detected as non-shadow, respectively.</p>
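<p>These metrics can be sketched as follows for binary shadow masks. The BER term follows Equation (18) exactly; the per-class PER here is assumed to be the fraction of pixels of each class that are misclassified, and the function name is our own:</p>

```python
import numpy as np

def shadow_metrics(pred, gt):
    """RMSE, BER, and per-class PER for binary shadow masks.

    pred, gt: arrays of 0/1 values (1 = shadow), same shape.
    Assumes both classes occur in gt (otherwise a rate divides by 0).
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)

    tp = np.sum(pred & gt)      # shadow detected as shadow
    tn = np.sum(~pred & ~gt)    # non-shadow detected as non-shadow
    fp = np.sum(pred & ~gt)     # non-shadow wrongly detected as shadow
    fn = np.sum(~pred & gt)     # shadow wrongly detected as non-shadow

    rmse = np.sqrt(np.mean((pred.astype(float) - gt.astype(float)) ** 2))
    ber = 1.0 - 0.5 * (tp / (tp + fn) + tn / (tn + fp))  # Equation (18)
    per_shadow = fn / (tp + fn)       # shadow pixels missed
    per_nonshadow = fp / (tn + fp)    # non-shadow pixels mislabeled
    return rmse, ber, per_shadow, per_nonshadow
```

For all three metrics, lower values indicate better detection, which matches the direction of the comparisons in Tables 1&#x02013;3.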
<p><xref ref-type="table" rid="T1">Table 1</xref> shows the same-domain detection results on the ISTD dataset. <xref ref-type="table" rid="T2">Table 2</xref> shows the cross-domain detection results on the SBU dataset. <xref ref-type="table" rid="T3">Table 3</xref> shows the detection results on the remote sensing images. It can be seen that the proposed method outperforms the other methods on all metrics.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>The average detection results on the ISTD dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>RMSE</bold></th>
<th valign="top" align="center"><bold>BER</bold></th>
<th valign="top" align="center"><bold>Shadow (PER)</bold></th>
<th valign="top" align="center"><bold>Non-shadow (PER)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">GSCA-UNet</td>
<td valign="top" align="center">14.7</td>
<td valign="top" align="center">8.5</td>
<td valign="top" align="center">7.9</td>
<td valign="top" align="center">7.2</td>
</tr>
<tr>
<td valign="top" align="left">SAS</td>
<td valign="top" align="center">13.8</td>
<td valign="top" align="center">7.2</td>
<td valign="top" align="center">7.2</td>
<td valign="top" align="center">6.4</td>
</tr>
<tr>
<td valign="top" align="left">DSAN</td>
<td valign="top" align="center">12.3</td>
<td valign="top" align="center">6.9</td>
<td valign="top" align="center">7.1</td>
<td valign="top" align="center">5.8</td>
</tr>
<tr>
<td valign="top" align="left">Proposed</td>
<td valign="top" align="center">9.6</td>
<td valign="top" align="center">6.6</td>
<td valign="top" align="center">6.9</td>
<td valign="top" align="center">5.1</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>The average detection results on the SBU dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>RMSE</bold></th>
<th valign="top" align="center"><bold>BER</bold></th>
<th valign="top" align="center"><bold>Shadow (PER)</bold></th>
<th valign="top" align="center"><bold>Non-shadow (PER)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">GSCA-UNet</td>
<td valign="top" align="center">14.5</td>
<td valign="top" align="center">10.2</td>
<td valign="top" align="center">10.2</td>
<td valign="top" align="center">11.1</td>
</tr>
<tr>
<td valign="top" align="left">SAS</td>
<td valign="top" align="center">12.2</td>
<td valign="top" align="center">9.8</td>
<td valign="top" align="center">9.7</td>
<td valign="top" align="center">9.3</td>
</tr>
<tr>
<td valign="top" align="left">DSAN</td>
<td valign="top" align="center">11.9</td>
<td valign="top" align="center">8.7</td>
<td valign="top" align="center">8.8</td>
<td valign="top" align="center">8.6</td>
</tr>
<tr>
<td valign="top" align="left">Proposed</td>
<td valign="top" align="center">10.3</td>
<td valign="top" align="center">7.6</td>
<td valign="top" align="center">7.1</td>
<td valign="top" align="center">7.7</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>The average detection results on the remote sensing dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>RMSE</bold></th>
<th valign="top" align="center"><bold>BER</bold></th>
<th valign="top" align="center"><bold>Shadow (PER)</bold></th>
<th valign="top" align="center"><bold>Non-shadow (PER)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">GSCA-UNet</td>
<td valign="top" align="center">13.7</td>
<td valign="top" align="center">10.7</td>
<td valign="top" align="center">11.7</td>
<td valign="top" align="center">12.3</td>
</tr>
<tr>
<td valign="top" align="left">SAS</td>
<td valign="top" align="center">11.9</td>
<td valign="top" align="center">9.2</td>
<td valign="top" align="center">10.8</td>
<td valign="top" align="center">9.7</td>
</tr>
<tr>
<td valign="top" align="left">DSAN</td>
<td valign="top" align="center">10.2</td>
<td valign="top" align="center">8.4</td>
<td valign="top" align="center">8.9</td>
<td valign="top" align="center">8.8</td>
</tr>
<tr>
<td valign="top" align="left">Proposed</td>
<td valign="top" align="center">9.8</td>
<td valign="top" align="center">8.1</td>
<td valign="top" align="center">7.4</td>
<td valign="top" align="center">8.2</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="table" rid="T4">Table 4</xref> compares the average detection time, which also shows the efficiency advantage of the proposed method.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>The average detection time with different methods.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>Time</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">GSCA-UNet</td>
<td valign="top" align="center">6.7</td>
</tr>
<tr>
<td valign="top" align="left">SAS</td>
<td valign="top" align="center">3.8</td>
</tr>
<tr>
<td valign="top" align="left">DSAN</td>
<td valign="top" align="center">2.3</td>
</tr>
<tr>
<td valign="top" align="left">Proposed</td>
<td valign="top" align="center">1.2</td>
</tr>
</tbody>
</table>
</table-wrap></sec>
<sec sec-type="conclusions" id="s4">
<title>4. Conclusion</title>
<p>Because existing shadow detection methods perform well only on the source dataset, a novel shadow detection method is proposed in this paper. The method aims to obtain detection results on the target dataset that are as accurate as those on the source dataset. First, following the hierarchical domain adaptive feature extraction scheme, a domain classifier is added after each convolutional layer in the feature extraction process to reduce the data distribution differences between domains, which improves the robustness of the model. Second, a boundary adversarial branch and an entropy adversarial branch are used to obtain detection results with smooth boundaries. Compared with the most advanced shadow detection methods, the proposed method not only achieves a large improvement in the source domain but also has advantages in the target domain. In future work, we will consider further improving the performance of the model by generating diverse shadow image data.</p></sec>
<sec sec-type="data-availability" id="s5">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.</p></sec>
<sec id="s6">
<title>Author contributions</title>
<p>All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.</p></sec>
<sec sec-type="funding-information" id="s7">
<title>Funding</title>
<p>This work was supported by the Key Scientific Research Project of Higher Education in Henan Province, Teaching Science and Technology [2021] No. 383 (Project No. 22B510017): Indoor positioning research of quadrotor UAV based on multi-sensor fusion SLAM algorithm.</p></sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p></sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>Sakaridis</surname> <given-names>C.</given-names></name> <name><surname>Dai</surname> <given-names>D.</given-names></name> <name><surname>Van Gool</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <article-title>Domain adaptive faster R-CNN for object detection in the wild,</article-title> <source>2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>), <fpage>3339</fpage>&#x02013;<lpage>3348</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00352</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fan</surname> <given-names>X.</given-names></name> <name><surname>Wu</surname> <given-names>W.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Yan</surname> <given-names>Q.</given-names></name> <name><surname>Fu</surname> <given-names>G.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Shading-aware shadow detection and removal from a single image</article-title>. <source>Vis. Comput</source>. <volume>36</volume>, <fpage>2175</fpage>&#x02013;<lpage>2188</lpage>. <pub-id pub-id-type="doi">10.1007/s00371-020-01916-3</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hatamizadeh</surname> <given-names>A.</given-names></name> <name><surname>Tang</surname> <given-names>Y.</given-names></name> <name><surname>Nath</surname> <given-names>V.</given-names></name> <name><surname>Yang</surname> <given-names>D.</given-names></name> <name><surname>Myronenko</surname> <given-names>A.</given-names></name> <name><surname>Landman</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Unetr: Transformers for 3D medical image segmentation,</article-title> in <source>Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision</source> (<publisher-loc>Waikoloa, HI</publisher-loc>), <fpage>1748</fpage>&#x02013;<lpage>1758</lpage>. <pub-id pub-id-type="doi">10.1109/WACV51458.2022.00181</pub-id><pub-id pub-id-type="pmid">36223330</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ji</surname> <given-names>S.</given-names></name> <name><surname>Dai</surname> <given-names>P.</given-names></name> <name><surname>Lu</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <article-title>Simultaneous cloud detection and removal from bitemporal remote sensing images using cascade convolutional neural networks</article-title>. <source>IEEE Trans. Geosci. Remote Sens</source>. <volume>59</volume>, <fpage>732</fpage>&#x02013;<lpage>748</lpage>. <pub-id pub-id-type="doi">10.1109/TGRS.2020.2994349</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jin</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>W.</given-names></name> <name><surname>Hu</surname> <given-names>Z.</given-names></name> <name><surname>Jia</surname> <given-names>H.</given-names></name> <name><surname>Luo</surname> <given-names>X.</given-names></name> <name><surname>Shao</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>GSCA-UNet: towards automatic shadow detection in urban aerial imagery with global-spatial-context attention module</article-title>. <source>Remote Sens</source>. 12, 2864. <pub-id pub-id-type="doi">10.3390/rs12172864</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kamnitsas</surname> <given-names>K.</given-names></name> <name><surname>Baumgartner</surname> <given-names>C.</given-names></name> <name><surname>Ledig</surname> <given-names>C.</given-names></name> <name><surname>Newcombe</surname> <given-names>V.</given-names></name> <name><surname>Simpson</surname> <given-names>J.</given-names></name> <name><surname>Kane</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Unsupervised domain adaptation in brain lesion segmentation with adversarial networks,</article-title> in <source>International Conference on Information Processing in Medical Imaging. IPMI 2017. LNCS, Vol. 10265</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>597</fpage>&#x02013;<lpage>609</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-59050-9_47</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khan</surname> <given-names>S. H.</given-names></name> <name><surname>Bennamoun</surname> <given-names>M.</given-names></name> <name><surname>Sohel</surname> <given-names>F.</given-names></name> <name><surname>Togneri</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>Automatic shadow detection and removal from a single image</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>38</volume>, <fpage>431</fpage>&#x02013;<lpage>446</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2015.2462355</pub-id><pub-id pub-id-type="pmid">27046489</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lata</surname> <given-names>K.</given-names></name> <name><surname>Dave</surname> <given-names>M.</given-names></name> <name><surname>Nishanth</surname> <given-names>K. N.</given-names></name></person-group> (<year>2019</year>). <article-title>Image-to-image translation using generative adversarial network,</article-title> <source>2019 3rd International conference on Electronics, Communication and Aerospace Technology (ICECA)</source> (<publisher-loc>Coimbatore</publisher-loc>), <fpage>186</fpage>&#x02013;<lpage>189</lpage>. <pub-id pub-id-type="doi">10.1109/ICECA.2019.8822195</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Le</surname> <given-names>H.</given-names></name> <name><surname>Vicente</surname> <given-names>T. F. Y.</given-names></name> <name><surname>Nguyen</surname> <given-names>V.</given-names></name> <name><surname>Hoai</surname> <given-names>M.</given-names></name> <name><surname>Samaras</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>A&#x0002B;D Net: Training a shadow detector with adversarial shadow attenuation,</article-title> in <source>European Conference on Computer Vision-ECCV 2018. Lecture Notes in Computer Science, Vol. 11206</source>, eds <person-group person-group-type="editor"><name><surname>Ferrari</surname> <given-names>V.</given-names></name> <name><surname>Hebert</surname> <given-names>M.</given-names></name> <name><surname>Sminchisescu</surname> <given-names>C.</given-names></name> <name><surname>Weiss</surname> <given-names>Y.</given-names></name></person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>680</fpage>&#x02013;<lpage>696</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-01216-8_41</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>H. K.</given-names></name> <name><surname>Lee</surname> <given-names>J.</given-names></name> <name><surname>Kim</surname> <given-names>S. B.</given-names></name></person-group> (<year>2022</year>). <article-title>Boundary-focused generative adversarial networks for imbalanced and multimodal time series</article-title>. <source>IEEE Trans. Knowl. Data Eng.</source> <volume>34</volume>, <fpage>4102</fpage>&#x02013;<lpage>4118</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2022.3182327</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>D.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Tang</surname> <given-names>X.-S.</given-names></name> <name><surname>Kong</surname> <given-names>W.</given-names></name> <name><surname>Shi</surname> <given-names>G.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Double-stream atrous network for shadow detection</article-title>. <source>Neurocomputing</source> <volume>417</volume>, <fpage>167</fpage>&#x02013;<lpage>175</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2020.07.038</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Ng</surname> <given-names>M.</given-names></name> <name><surname>Huang</surname> <given-names>R.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Loss odyssey in medical image segmentation</article-title>. <source>Med. Image Anal</source>. <volume>71</volume>:<fpage>102035</fpage>. <pub-id pub-id-type="doi">10.1016/j.media.2021.102035</pub-id><pub-id pub-id-type="pmid">33813286</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mohajerani</surname> <given-names>S.</given-names></name> <name><surname>Saeedi</surname> <given-names>P.</given-names></name></person-group> (<year>2018</year>). <article-title>CPNet: a context preserver convolutional neural network for detecting shadows in single RGB images,</article-title> in <source>2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)</source> (<publisher-loc>Vancouver, BC</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1109/MMSP.2018.8547080</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Nguyen</surname> <given-names>V.</given-names></name> <name><surname>Vicente</surname> <given-names>T. F. Y</given-names></name> <name><surname>Zhao</surname> <given-names>M.</given-names></name> <name><surname>Hoai</surname> <given-names>M.</given-names></name> <name><surname>Samaras</surname> <given-names>D.</given-names></name></person-group> (<year>2017</year>). <article-title>Shadow detection with conditional generative adversarial networks,</article-title> in <source>2017 IEEE International Conference on Computer Vision (ICCV)</source> (<publisher-loc>Venice</publisher-loc>), <fpage>4520</fpage>&#x02013;<lpage>4528</lpage>. <pub-id pub-id-type="doi">10.1109/ICCV.2017.483</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shafiq</surname> <given-names>M.</given-names></name> <name><surname>Gu</surname> <given-names>Z.</given-names></name></person-group> (<year>2022</year>). <article-title>Deep residual learning for image recognition: a survey</article-title>. <source>Appl. Sci</source>. 12, 8972. <pub-id pub-id-type="doi">10.3390/app12188972</pub-id><pub-id pub-id-type="pmid">34181613</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shafiq</surname> <given-names>M.</given-names></name> <name><surname>Gu</surname> <given-names>Z.</given-names></name> <name><surname>Cheikhrouhou</surname> <given-names>O.</given-names></name> <name><surname>Alhakami</surname> <given-names>W.</given-names></name> <name><surname>Hamam</surname> <given-names>H.</given-names></name></person-group> (<year>2022</year>). <article-title>The rise of &#x0201C;Internet of Things&#x0201D;: review and open research issues related to detection and prevention of IoT-based security attacks</article-title>. <source>Wirel. Commun. Mob. Comput</source>. 2022, 8669348. <pub-id pub-id-type="doi">10.1155/2022/8669348</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shafiq</surname> <given-names>M.</given-names></name> <name><surname>Tian</surname> <given-names>Z.</given-names></name> <name><surname>Bashir</surname> <given-names>A. K.</given-names></name> <name><surname>Du</surname> <given-names>X.</given-names></name> <name><surname>Guizani</surname> <given-names>M.</given-names></name></person-group> (<year>2020b</year>). <article-title>IoT malicious traffic identification using wrapper-based feature selection mechanisms</article-title>. <source>Comput. Secur</source>. <volume>94</volume>:<fpage>101863</fpage>. <pub-id pub-id-type="doi">10.1016/j.cose.2020.101863</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shafiq</surname> <given-names>M.</given-names></name> <name><surname>Tian</surname> <given-names>Z.</given-names></name> <name><surname>Bashir</surname> <given-names>A. K.</given-names></name> <name><surname>Jolfaei</surname> <given-names>A.</given-names></name></person-group> (<year>2020a</year>). <article-title>Data mining and machine learning methods for sustainable smart cities traffic classification: a survey</article-title>. <source>Sustain. Cities Soc</source>. 60, 102177. <pub-id pub-id-type="doi">10.1016/j.scs.2020.102177</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shafiq</surname> <given-names>M.</given-names></name> <name><surname>Tian</surname> <given-names>Z.</given-names></name> <name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Du</surname> <given-names>X.</given-names></name> <name><surname>Guizani</surname> <given-names>M.</given-names></name></person-group> (<year>2020c</year>). <article-title>Selection of effective machine learning algorithm and Bot-IoT attacks traffic identification for internet of things in smart city</article-title>. <source>Fut. Gen. Comput. Syst</source>. <volume>107</volume>, <fpage>433</fpage>&#x02013;<lpage>442</lpage>. <pub-id pub-id-type="doi">10.1016/j.future.2020.02.017</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shoulin</surname> <given-names>Y.</given-names></name> <name><surname>Jie</surname> <given-names>L.</given-names></name> <name><surname>Hang</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <article-title>A self-supervised learning method for shadow detection in remote sensing imagery</article-title>. <source>3D Res</source>. 9, 51. <pub-id pub-id-type="doi">10.1007/s13319-018-0204-9</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>G.</given-names></name> <name><surname>Huang</surname> <given-names>H.</given-names></name> <name><surname>Weng</surname> <given-names>Q.</given-names></name> <name><surname>Zhang</surname> <given-names>A.</given-names></name> <name><surname>Jia</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Combinational shadow index for building shadow extraction in urban areas from Sentinel-2A MSI imagery</article-title>. <source>Int. J. Appl. Earth Observ. Geoinform</source>. <volume>78</volume>, <fpage>53</fpage>&#x02013;<lpage>65</lpage>. <pub-id pub-id-type="doi">10.1016/j.jag.2019.01.012</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>Q.</given-names></name> <name><surname>Zhou</surname> <given-names>X.</given-names></name> <name><surname>Wei</surname> <given-names>J.</given-names></name> <name><surname>Yang</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>A priori surface reflectance-based cloud shadow detection algorithm for Landsat 8 OLI</article-title>. <source>IEEE Geosci. Remote Sens. Lett</source>. <volume>15</volume>, <fpage>1610</fpage>&#x02013;<lpage>1614</lpage>. <pub-id pub-id-type="doi">10.1109/LGRS.2018.2847297</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vicente</surname> <given-names>T. F. Y.</given-names></name> <name><surname>Hoai</surname> <given-names>M.</given-names></name> <name><surname>Samaras</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>Leave-one-out kernel optimization for shadow detection and removal</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>40</volume>, <fpage>682</fpage>&#x02013;<lpage>695</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2017.2691703</pub-id><pub-id pub-id-type="pmid">28410096</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vu</surname> <given-names>T.</given-names></name> <name><surname>Jain</surname> <given-names>H.</given-names></name> <name><surname>Bucher</surname> <given-names>M.</given-names></name> <name><surname>Cord</surname> <given-names>M.</given-names></name> <name><surname>Perez</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>ADVENT: adversarial entropy minimization for domain adaptation in semantic segmentation,</article-title> <source>2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Long Beach, CA</publisher-loc>), <fpage>2512</fpage>&#x02013;<lpage>2521</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2019.00262</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal,</article-title> in <source>2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>), <fpage>1788</fpage>&#x02013;<lpage>1797</lpage>, <pub-id pub-id-type="doi">10.1109/CVPR.2018.00192</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yago Vicente</surname> <given-names>T. F.</given-names></name> <name><surname>Hou</surname> <given-names>L.</given-names></name> <name><surname>Yu</surname> <given-names>C.-P.</given-names></name> <name><surname>Hoai</surname> <given-names>M.</given-names></name> <name><surname>Samaras</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>Large-scale training of shadow detectors with noisily-annotated shadow examples,</article-title> in <source>European Conference on Computer Vision-ECCV 2016, ECCV 2016. Lecture Notes in Computer Science, Vol. 9910</source>, eds <person-group person-group-type="editor"><name><surname>Leibe</surname> <given-names>B.</given-names></name> <name><surname>Matas</surname> <given-names>J.</given-names></name> <name><surname>Sebe</surname> <given-names>N.</given-names></name> <name><surname>Welling</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>), <fpage>816</fpage>&#x02013;<lpage>832</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-46466-4_49</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>W.</given-names></name> <name><surname>Wan</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Single image shadow detection method based on entropy driven domain adaptive learning</article-title>. <source>J. Comput. Appl</source>. <volume>40</volume>, <fpage>2131</fpage>&#x02013;<lpage>2136</lpage>. <pub-id pub-id-type="doi">10.11772/j.issn.1001-9081.2019122068</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>P.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Hao</surname> <given-names>Y.</given-names></name> <name><surname>Zhou</surname> <given-names>Z.</given-names></name> <name><surname>Luo</surname> <given-names>B.</given-names></name> <name><surname>Wang</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>Multi-scale feature enhanced domain adaptive object detection for power transmission line inspection</article-title>. <source>IEEE Access</source>. <volume>8</volume>, <fpage>182105</fpage>&#x02013;<lpage>182116</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.3027850</pub-id></citation></ref>
</ref-list> 
</back>
</article>