<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurorobot.</journal-id>
<journal-title>Frontiers in Neurorobotics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurorobot.</abbrev-journal-title>
<issn pub-type="epub">1662-5218</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnbot.2024.1382406</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Brain-inspired semantic data augmentation for multi-style images</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Wei</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/2578398/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/conceptualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/data-curation/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Shang</surname> <given-names>Zhaowei</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<role content-type="https://credit.niso.org/contributor-roles/funding-acquisition/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
<role content-type="https://credit.niso.org/contributor-roles/supervision/"/>
<role content-type="https://credit.niso.org/contributor-roles/validation/"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Li</surname> <given-names>Chengxing</given-names></name>
<role content-type="https://credit.niso.org/contributor-roles/validation/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff><institution>College of Computer Science, Chongqing University</institution>, <addr-line>Chongqing</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Xianmin Wang, Guangzhou University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Jielu Yan, Chongqing University, China</p>
<p>Cheng Ji, Nanjing University of Science and Technology, China</p>
<p>Gang Wang, Zhejiang University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Zhaowei Shang <email>szw&#x00040;cqu.edu.cn</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>03</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>18</volume>
<elocation-id>1382406</elocation-id>
<history>
<date date-type="received">
<day>05</day>
<month>02</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>04</day>
<month>03</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2024 Wang, Shang and Li.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Wang, Shang and Li</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions>
<abstract>
<p>Data augmentation is an effective technique for automatically expanding training data in deep learning. Brain-inspired methods draw inspiration from the functionality and structure of the human brain and apply its mechanisms and principles to artificial intelligence and computer science. When the style of the training data differs greatly from that of the testing data, common data augmentation methods fail to effectively improve the generalization performance of deep models. To address this problem, we improve the method of modeling Domain Shifts with Uncertainty (DSU) and propose a new brain-inspired image data augmentation method for computer vision, which consists of two key components: <italic>using Robust statistics and controlling the Coefficient of variance for DSU</italic> (RCDSU) and <italic>Feature Data Augmentation</italic> (FeatureDA). RCDSU computes feature statistics (mean and standard deviation) with robust statistics to weaken the influence of outliers, which brings the statistics closer to their true values and improves the robustness of deep learning models. By controlling the coefficient of variance, RCDSU shifts the feature statistics while preserving semantics and enlarges the shift range. FeatureDA controls the coefficient of variance in the same way to generate semantics-preserving augmented features and to increase their coverage. RCDSU and FeatureDA perform style transfer and content transfer, respectively, in the feature space, improving the generalization ability of the model at the style level and at the content level. On the Photo, Art Painting, Cartoon, and Sketch (PACS) multi-style classification task, RCDSU plus FeatureDA achieves competitive accuracy. After Gaussian noise is added to the PACS dataset, RCDSU plus FeatureDA shows strong robustness against outliers. FeatureDA also achieves excellent results on the CIFAR-100 image classification task. RCDSU plus FeatureDA can be applied as a novel brain-inspired semantic data augmentation method with implicit robot automation, and it is suitable for datasets with large style differences between training and testing data.</p></abstract>
<kwd-group>
<kwd>data augmentation</kwd>
<kwd>deep learning</kwd>
<kwd>robust statistics</kwd>
<kwd>style transfer</kwd>
<kwd>uncertainty modeling</kwd>
<kwd>brain-inspired computer vision</kwd>
</kwd-group>
<counts>
<fig-count count="7"/>
<table-count count="7"/>
<equation-count count="22"/>
<ref-count count="62"/>
<page-count count="15"/>
<word-count count="10033"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1 Introduction</title>
<p>Data augmentation is a strategy for increasing the quantity and diversity of limited data, with the aim of extracting more useful information from it and obtaining the value that a larger dataset would provide. It is a technique with implicit robot automation that expands training data automatically. To address model overfitting when training deep networks (Krizhevsky, <xref ref-type="bibr" rid="B38">2009</xref>; Simonyan and Zisserman, <xref ref-type="bibr" rid="B51">2014</xref>; He et al., <xref ref-type="bibr" rid="B29">2016</xref>; Krizhevsky et al., <xref ref-type="bibr" rid="B37">2017</xref>; Huang et al., <xref ref-type="bibr" rid="B30">2019</xref>), data augmentation tackles the root cause of the problem, namely, insufficient training samples (Wang et al., <xref ref-type="bibr" rid="B53">2019</xref>; Liu et al., <xref ref-type="bibr" rid="B41">2023</xref>). Data augmentation is widely used in text classification (Wei and Zou, <xref ref-type="bibr" rid="B55">2019</xref>; Fang et al., <xref ref-type="bibr" rid="B24">2022</xref>; Wu et al., <xref ref-type="bibr" rid="B56">2022</xref>; Dai H. et al., <xref ref-type="bibr" rid="B15">2023</xref>), image denoising (Eckert et al., <xref ref-type="bibr" rid="B23">2020</xref>; Liu et al., <xref ref-type="bibr" rid="B40">2020</xref>; Luo et al., <xref ref-type="bibr" rid="B42">2021</xref>), video recognition (Cauli and Reforgiato Recupero, <xref ref-type="bibr" rid="B6">2022</xref>; Gorpincenko and Mackiewicz, <xref ref-type="bibr" rid="B28">2022</xref>; Kim et al., <xref ref-type="bibr" rid="B35">2022</xref>), and other tasks. In image recognition, common augmentations apply content-preserving transformations to input samples, such as rotation, horizontal mirroring, cropping, and color jittering. Although these methods are effective, they cannot perform semantic transformations such as changing the background of an object or the viewing angle.
Semantic transformations that preserve class identity can make data augmentation more powerful (Antoniou et al., <xref ref-type="bibr" rid="B2">2017</xref>; Ratner et al., <xref ref-type="bibr" rid="B48">2017</xref>; Bowles et al., <xref ref-type="bibr" rid="B5">2018</xref>). For example, by training a generative adversarial network (GAN) for each class in the training set, an unlimited number of samples can be drawn from the generator. However, this process is computationally expensive, because both training the generative models and running inference with them to obtain augmented samples are demanding tasks. In addition, the extra augmented data may lengthen training. Brain-inspired methods draw inspiration from the functionality and structure of the human brain and apply its mechanisms and principles to artificial intelligence and computer science (Zendrikov et al., <xref ref-type="bibr" rid="B59">2023</xref>).</p>
<p>When the training and testing data differ greatly in style, that is, for multi-style datasets, common data augmentation methods cannot effectively enhance the generalization performance of deep models (Li et al., <xref ref-type="bibr" rid="B39">2022</xref>). It is therefore important to study data augmentation methods for multi-style datasets. In this paper, we propose a highly efficient, semantics-preserving, brain-inspired image data augmentation method for multi-style datasets that operates in the feature space.</p>
<p>Our approach is motivated by three observations. (1) Existing data augmentation methods, such as implicit semantic data augmentation (ISDA) (Wang et al., <xref ref-type="bibr" rid="B54">2021</xref>), mostly augment data by changing the image content without changing the image style. They work well when the training and testing data differ only in content and not in style, as in the CIFAR-10 and CIFAR-100 datasets. However, when the training and testing data differ greatly in style, as in the Photo, Art Painting, Cartoon, and Sketch (PACS) dataset, these common methods perform poorly. Modeling Domain Shifts with Uncertainty (DSU) (Li et al., <xref ref-type="bibr" rid="B39">2022</xref>) changes the image style, but not the image content. From the perspective of brain inspiration, we can explore and exploit the structure and functionality of the human brain to improve data augmentation. For example, when we observe an image, we attend to both its content and its style: a dog in a painting style, a cat in a sketch style, a car in a photo style, and so on. Therefore, to increase the diversity of the augmented results, in practice we may need to perform both style transfer and content transfer when generating augmented images from original images. Previous studies have not combined style transfer with content transfer. In this paper, we combine the two by performing style transfer on the feature maps and then content transfer on the feature vectors learned by the feature extraction network. (2) Real data are often contaminated with noise. When the training data contain noise, model performance often degrades, mainly because noise introduces outliers that deviate from the overall distribution.
Outliers interfere with the model, preventing it from extracting the key features of a sample or causing it to learn wrong features. In this paper, we calculate feature statistics (mean and standard deviation) with robust statistics to weaken the influence of outliers, which brings the statistics closer to their true values and improves the robustness of deep learning models. (3) From the perspective of brain-inspired computer vision, the distribution of sample data can be regarded as a &#x0201C;spherical space,&#x0201D; which is a circle in two-dimensional space and a sphere in three-dimensional space (Jeon et al., <xref ref-type="bibr" rid="B33">2022</xref>). For convenience, we use &#x0201C;sphere&#x0201D; to refer to the &#x0201C;spherical space&#x0201D; of any dimension. The data points are distributed layer by layer from the center of the sphere outward. Because their positions differ, sample points at the center of the sphere and those in its outermost layer should be augmented with different strategies. However, existing augmentation methods ignore this spherical distribution of the sample data and treat all data points equally. In this paper, from the perspective of brain-inspired computer vision, we determine the augmentation strategy of each point according to its distance from the center point.</p>
<p>DSU calculates the variance of all feature statistics in a mini-batch and then uses this variance to generate random shifts that are added to the original feature statistics; all feature statistics in a mini-batch therefore share the same variance. However, we argue that, given the distribution of the feature statistics in a mini-batch, statistics near the center of the group and those at its edge should receive different shifts. To keep the semantics unchanged, the shifts added to the statistics at the edge of the group should be slightly smaller, and to increase the coverage after shifting, the shifts added to the statistics near the center should be slightly larger. In addition, DSU calculates the mean and variance of each feature map channel by channel, that is, over all pixel values of each channel. This direct calculation does not account for outliers, which can severely bias the statistical results. To reduce the influence of outliers, we adopt robust statistics to improve the stability of the model. Combining these two ideas, we improve DSU and obtain a brain-inspired computer vision method, using Robust statistics and controlling the Coefficient of variance for DSU (RCDSU), which calculates the feature mean and standard deviation with robust statistics and controls the coefficient of variance to preserve semantics and increase the shift range. ISDA, in turn, enhances the generalization ability of the model through implicit semantic data augmentation. It computes the covariance of all features for each class and then, for each feature, uses the covariance of the corresponding class to generate a random shift that is added to the original feature.
This method must use an online algorithm to iteratively update the covariance matrix of each class, which is computationally intensive, and the resulting covariance matrix is usually an estimate rather than the exact value. Therefore, this paper proposes a new augmentation method, Feature Data Augmentation (FeatureDA), which calculates the variance of all features in a mini-batch and then uses this variance to generate random shifts that are added to the original features. As in RCDSU, to keep the semantics unchanged, the shifts added to the features at the edge of the group should be slightly smaller, and to increase the coverage after shifting, the shifts added to the features near the center should be slightly larger. Our proposed method is simple and effective, and it improves both the generalization ability of the model and its stability against outliers. It can be integrated into existing networks without introducing additional model parameters or loss constraints. Our experiments show that RCDSU and FeatureDA improve the generalization ability of the model at the style level and at the content level, respectively.</p>
<p>In summary, there are three major contributions in our work:</p>
<list list-type="order">
<list-item><p>In RCDSU, we calculate feature statistics (mean and standard deviation) with robust statistics to weaken the influence of outliers, which brings the statistics closer to their true values and improves the robustness of deep learning models.</p></list-item>
<list-item><p>In RCDSU and FeatureDA, we control the coefficient of variance to preserve semantics and increase shift range from the perspective of brain-inspired computer vision.</p></list-item>
<list-item><p>We combine style transfer and content transfer (RCDSU &#x0002B; FeatureDA) by performing style transfer on the feature map, and then performing content transfer on the feature vector learned by the feature extraction network. We perform both style transfer and content transfer with implicit robot automation when generating augmented images from original images.</p></list-item>
</list></sec>
<sec id="s2">
<title>2 Related work</title>
<sec>
<title>2.1 Data augmentation</title>
<p>Data augmentation uses prior knowledge to generate additional, similar synthetic data from a small amount of data, thereby expanding the training dataset. It is an effective way to improve generalization ability and alleviate model overfitting. In image recognition tasks, augmentation methods such as rotation, mirroring, and random flipping are often used to enhance the geometric invariance of convolutional networks (Simonyan and Zisserman, <xref ref-type="bibr" rid="B51">2014</xref>; Srivastava et al., <xref ref-type="bibr" rid="B52">2015</xref>; He et al., <xref ref-type="bibr" rid="B29">2016</xref>; Huang et al., <xref ref-type="bibr" rid="B30">2019</xref>). Discarding some information in the training images is also an effective way to augment training data: random erasing (Zhong et al., <xref ref-type="bibr" rid="B60">2020</xref>) and cutout (DeVries and Taylor, <xref ref-type="bibr" rid="B16">2017</xref>) erase random rectangular regions of the input image. There are also studies on automatic data augmentation. AutoAugment (Cubuk et al., <xref ref-type="bibr" rid="B13">2018</xref>) uses reinforcement learning to search a large space of candidate policies for a better augmentation policy. In addition, recent studies have shown that transformations that preserve class identity can also serve as effective semantic data augmentation techniques (Jaderberg et al., <xref ref-type="bibr" rid="B32">2015</xref>; Bousmalis et al., <xref ref-type="bibr" rid="B4">2016</xref>; Antoniou et al., <xref ref-type="bibr" rid="B2">2017</xref>; Ratner et al., <xref ref-type="bibr" rid="B48">2017</xref>).</p></sec>
<sec>
<title>2.2 Uncertainty modeling</title>
<p>Some previous work on deep learning with uncertainty (Gal and Ghahramani, <xref ref-type="bibr" rid="B26">2015</xref>, <xref ref-type="bibr" rid="B27">2016</xref>; Kendall and Gal, <xref ref-type="bibr" rid="B34">2017</xref>) also assumes that the deep features or predictions of each sample follow a Gaussian distribution. In face recognition and person re-identification, probabilistic representations are used to address ambiguous faces (Shi and Jain, <xref ref-type="bibr" rid="B50">2020</xref>; Amaya and Von Arnim, <xref ref-type="bibr" rid="B1">2023</xref>) and data outliers or label noise (Yu et al., <xref ref-type="bibr" rid="B58">2020</xref>). To learn feature embeddings and their uncertainty simultaneously, data uncertainty learning estimates the uncertainty with a learnable subnetwork, and the estimated uncertainty indicates the quality of the image (Chang et al., <xref ref-type="bibr" rid="B8">2020</xref>; Shi and Jain, <xref ref-type="bibr" rid="B50">2020</xref>).</p></sec>
<sec>
<title>2.3 Robust statistics</title>
<p>Robust statistics are used to reduce the impact of outliers, that is, values that lie far from the bulk of the data. Outliers can severely bias statistical results. Robust statistics seek to provide methods that emulate popular statistical methods but are not unduly affected by outliers or other small departures from model assumptions (Maronna et al., <xref ref-type="bibr" rid="B43">2019</xref>). They can be used to detect outliers by searching for the model that fits the majority of the data (Rousseeuw and Hubert, <xref ref-type="bibr" rid="B49">2011</xref>; Feldotto et al., <xref ref-type="bibr" rid="B25">2022</xref>). Efficient robust estimators exist for a range of complex problems, including covariance estimation (Cheng et al., <xref ref-type="bibr" rid="B10">2019</xref>; Diakonikolas et al., <xref ref-type="bibr" rid="B18">2019a</xref>), sparse estimation (Balakrishnan et al., <xref ref-type="bibr" rid="B3">2017</xref>; Diakonikolas et al., <xref ref-type="bibr" rid="B22">2019c</xref>; Cheng et al., <xref ref-type="bibr" rid="B9">2022</xref>), learning graphical models (Cheng et al., <xref ref-type="bibr" rid="B11">2018</xref>; Diakonikolas et al., <xref ref-type="bibr" rid="B19">2021</xref>), linear regression (Klivans et al., <xref ref-type="bibr" rid="B36">2018</xref>; Diakonikolas et al., <xref ref-type="bibr" rid="B20">2019d</xref>; Pensia et al., <xref ref-type="bibr" rid="B45">2020</xref>), and stochastic optimization (Diakonikolas et al., <xref ref-type="bibr" rid="B21">2019b</xref>; DeWolf et al., <xref ref-type="bibr" rid="B17">2020</xref>; Prasad et al., <xref ref-type="bibr" rid="B46">2020</xref>). In RCDSU, we exploit the fact that the median is highly resistant to outliers to enhance the robustness of the model.</p></sec>
<sec>
<title>2.4 Brain-inspired computer vision</title>
<p>Brain-inspired methods draw inspiration from the functionality and structure of the human brain and apply its mechanisms and principles to artificial intelligence and computer science (Zendrikov et al., <xref ref-type="bibr" rid="B59">2023</xref>). Data augmentation is an important task in computer vision that uses prior knowledge to generate similar synthetic data and thereby expand the training dataset. When applying brain-inspired methods to data augmentation, we can explore and exploit the structure and functionality of the human brain from multiple perspectives to improve augmentation performance. One important direction is designing brain-inspired neural network architectures. We can draw on the visual processing mechanisms of the human brain and build neural network models with similar structures and connectivity patterns that mimic how visual information is processed and transmitted (Qiu et al., <xref ref-type="bibr" rid="B47">2023</xref>). We can also design hierarchical neural networks in which each module corresponds to a different visual processing stage in the human brain (Cheng et al., <xref ref-type="bibr" rid="B12">2023</xref>). For example, when we observe an image, we attend to both its content and its style: a dog in a painting style, a cat in a sketch style, a car in a photo style, and so on. Therefore, we can perform style transfer and content transfer sequentially in data augmentation. From the perspective of brain inspiration, the distribution of sample data can be regarded as a &#x0201C;spherical space,&#x0201D; which is a circle in two-dimensional space and a sphere in three-dimensional space (Jeon et al., <xref ref-type="bibr" rid="B33">2022</xref>). Therefore, the augmentation strategy for each point can be determined by its position in the data distribution.
Brain-inspired methods can also draw on how multiple regions of the human brain work together, combining and analyzing different visual aspects of the data (such as style and content) to improve the diversity and performance of data augmentation.</p></sec></sec>
<sec id="s3">
<title>3 Method</title>
<sec>
<title>3.1 Preliminaries</title>
<p>Data augmentation can be expressed by the following general formula:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>f</italic> denotes any transformation in the image space or in the feature space, <italic>x</italic> denotes the original image in the image space or the original feature in the feature space, and <inline-formula><mml:math id="M2"><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula> denotes the augmented image or feature in the corresponding space.</p>
<p>In this paper, <italic>f</italic> represents the DSU, RCDSU, or FeatureDA transformation. In the DSU and RCDSU transformations, <italic>x</italic> denotes the features encoded in the intermediate layers of the network, that is, the feature maps. In the FeatureDA transformation, <italic>x</italic> denotes the deep features learned by the feature extraction network, that is, the feature vectors.</p>
<p>DSU calculates the feature mean and standard deviation of each feature map channel by channel, that is, over all pixel values of each channel. It then calculates the variance of these feature statistics across the mini-batch and uses this variance to generate random shifts that are added to the original feature statistics. All feature statistics in a mini-batch share the same variance. For more details about DSU, refer to Li et al. (<xref ref-type="bibr" rid="B39">2022</xref>).</p></sec>
<sec>
<title>3.2 Robust statistics for DSU</title>
<p>Some channels of a feature map contain outliers. We select three such channels from a feature map and draw a box plot of all pixel values in each channel; the results are shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. If left untreated, these outliers distort the resulting mean and variance.</p>
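<p>Figure 1 does not state which whisker rule its box plots use. Assuming the common Tukey convention (whiskers extend 1.5 times the interquartile range beyond the quartiles, which is also the default of matplotlib's boxplot), the pixel values drawn as outliers in such a plot can be flagged with the minimal sketch below. The function name is hypothetical, and quartiles are computed by linear interpolation between order statistics (method="inclusive" in Python's statistics.quantiles).</p>
<preformat preformat-type="code">
```python
import statistics


def boxplot_outliers(values, whis=1.5):
    """Return the values a Tukey-style box plot would draw as outliers.

    Assumption: quartiles use linear interpolation between order
    statistics, and whiskers reach whis * IQR beyond the box.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range
    low, high = q1 - whis * iqr, q3 + whis * iqr
    # Keep, in input order, every value outside the whisker fences.
    return [v for v in values if v > high or low > v]


channel = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # toy pixel values of one channel
print(boxplot_outliers(channel))
```
</preformat>
<p>For the toy channel above, the quartiles are 3.25 and 7.75, the upper fence is 14.5, and only the value 100 is flagged.</p>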
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Box plots of all pixel values of selected channels in a feature map. Some channels of a feature map contain outliers. We select three such channels and draw a box plot of all pixel values in each channel. If left untreated, these outliers distort the resulting mean and variance.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-18-1382406-g0001.tif"/>
</fig>
<p>To alleviate the impact of these outliers on the channel-wise mean and variance, we use robust statistics. First, all pixel values of each channel are sorted in ascending order. Then, the sorted values of a channel are divided equally into <italic>S</italic> segments, each containing <italic>HW</italic>/<italic>S</italic> pixels, and the median <italic>m</italic> of each segment is found. Finally, the average of all medians in a channel is taken as the mean of the pixel values, and the variance of all medians in a channel is taken as the variance of the pixel values.</p>
<p>Let <italic>x</italic> &#x02208; &#x0211D;<sup><italic>B</italic>&#x000D7;<italic>C</italic>&#x000D7;<italic>H</italic>&#x000D7;<italic>W</italic></sup> denote the features encoded in the intermediate layers of the network. We divide all pixels in a channel into <italic>S</italic> segments and denote by <italic>m</italic> &#x02208; &#x0211D;<sup><italic>B</italic>&#x000D7;<italic>C</italic>&#x000D7;<italic>S</italic></sup> the median of each segment. The feature mean &#x003BC; &#x02208; &#x0211D;<sup><italic>B</italic>&#x000D7;<italic>C</italic></sup> and standard deviation &#x003C3; &#x02208; &#x0211D;<sup><italic>B</italic>&#x000D7;<italic>C</italic></sup> computed with robust statistics can be formulated as:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E3"><label>(3)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>b</italic> denotes the <italic>b</italic>th instance in a mini-batch, <italic>c</italic> denotes the <italic>c</italic>th channel in a feature map, and <italic>s</italic> denotes the <italic>s</italic>th segment in a channel.</p>
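<p>The segment-median procedure of Equations (2) and (3) can be illustrated with the minimal pure-Python sketch below. It is not the original implementation: the function names are hypothetical, features are nested lists rather than tensors, and <italic>HW</italic> is assumed to be divisible by <italic>S</italic>, as the segment size <italic>HW</italic>/<italic>S</italic> suggests. The variance uses the 1/<italic>S</italic> normalization of Equation (3).</p>
<preformat preformat-type="code">
```python
import math
import statistics


def robust_channel_stats(values, num_segments):
    """Robust mean and std of one channel, following Eqs. (2)-(3).

    values: flat list of the H*W pixel values of one channel.
    num_segments: S; assumed here to divide len(values) exactly.
    """
    if len(values) % num_segments != 0:
        raise ValueError("this sketch assumes H*W is divisible by S")
    ordered = sorted(values)                  # sort pixel values ascending
    seg_len = len(ordered) // num_segments    # HW / S pixels per segment
    medians = [
        statistics.median(ordered[s * seg_len:(s + 1) * seg_len])
        for s in range(num_segments)
    ]
    mu = sum(medians) / num_segments                           # Eq. (2)
    var = sum((m - mu) ** 2 for m in medians) / num_segments   # Eq. (3)
    return mu, math.sqrt(var)


def robust_stats(x, num_segments):
    """Apply robust_channel_stats to a nested B x C x H x W list.

    Returns (mu, sigma), each a B x C nested list.
    """
    mu, sigma = [], []
    for instance in x:
        row_mu, row_sigma = [], []
        for channel in instance:
            flat = [v for row in channel for v in row]   # flatten H x W
            m, s = robust_channel_stats(flat, num_segments)
            row_mu.append(m)
            row_sigma.append(s)
        mu.append(row_mu)
        sigma.append(row_sigma)
    return mu, sigma


# One instance, one 2 x 4 channel containing a single large outlier.
x = [[[[1, 2, 3, 4], [5, 6, 7, 1000]]]]
print(robust_stats(x, num_segments=2))
```
</preformat>
<p>For this channel with <italic>S</italic> = 2, the segment medians are 2.5 and 6.5, so the robust mean is 4.5 and the robust standard deviation is 2.0, whereas the ordinary mean of the eight values is 128.5.</p>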
<p>The illustration of robust statistics is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. We calculate the average of all medians in a channel as the mean of all pixel values and calculate the variance of all medians in a channel as the variance of all pixel values.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Illustration of robust statistics. The mean of all pixel values in a channel is estimated by the average of its segment medians, and the variance of the pixel values by the variance of these medians.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-18-1382406-g0002.tif"/>
</fig>
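<p>The robust statistics of <xref ref-type="disp-formula" rid="E2">Equations 2</xref>, <xref ref-type="disp-formula" rid="E3">3</xref> can be sketched in NumPy as follows. How a channel is divided into segments is not specified in this passage, so the sketch assumes, hypothetically, that the flattened spatial positions of each channel are split into <italic>S</italic> contiguous, nearly equal segments with <monospace>np.array_split</monospace>. The function name <monospace>robust_statistics</monospace> and the small constant <monospace>eps</monospace> are likewise illustrative assumptions, not part of the method description.</p>

```python
import numpy as np


def robust_statistics(x, num_segments, eps=1e-6):
    """Robust per-instance, per-channel statistics (Eqs. 2 and 3).

    x: feature map of shape (B, C, H, W).
    num_segments: S, the number of segments each channel is split into.
    Returns mu and sigma, each of shape (B, C).
    """
    B, C, H, W = x.shape
    flat = x.reshape(B, C, H * W)
    # Assumed segmentation: split the flattened spatial positions of every
    # channel into S contiguous, nearly equal-sized segments.
    segments = np.array_split(flat, num_segments, axis=-1)
    # m[b, c, s] is the median of the s-th segment of channel c of instance b.
    m = np.stack([np.median(seg, axis=-1) for seg in segments], axis=-1)
    mu = m.mean(axis=-1)                               # Eq. 2: mean of medians
    var = ((m - mu[..., None]) ** 2).mean(axis=-1)     # Eq. 3: variance of medians
    # eps keeps sigma strictly positive for the later normalization step.
    sigma = np.sqrt(var + eps)
    return mu, sigma
```

<p>As long as every segment holds at least three values, a single extreme pixel can only move its segment median within the range of the other values in that segment, which is why these estimates are less sensitive to outliers than the plain channel mean and variance.</p>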
<p>Following DSU, we compute the variance of the feature statistics over the mini-batch as follows:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D53C;</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E5"><label>(5)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D53C;</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M7"><mml:msub><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M8"><mml:msub><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> represent the shift range of the feature mean &#x003BC; and feature standard deviation &#x003C3;, respectively.</p></sec>
<sec>
<title>3.3 Control the coefficient of variance for DSU</title>
<p>Above, we computed the feature statistics for DSU with robust statistics to weaken the influence of outliers. Next, we control the coefficient of variance for DSU so that the feature statistics are shifted without changing their semantics while the shift range is increased.</p>
<p>Given the sphere-shaped distribution of the feature statistics, we want a data point closer to the outer layer of the sphere to be shifted less, so that the shift does not carry it across the boundary and change the semantics of the feature statistic. Conversely, we want a data point closer to the center of the sphere to be shifted somewhat more, which improves the coverage and diversity of the augmented feature statistics and further enhances the generalization ability of the model. To achieve this, we control the magnitude of the shift by multiplying the variance by a coefficient, and we assign this coefficient of variance to each feature statistic according to its Euclidean distance from the center vector: the larger the distance, the smaller the coefficient and hence the shift; the smaller the distance, the larger the coefficient and hence the shift.</p>
<p>Let <inline-formula><mml:math id="M9"><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M10"><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> be the feature mean and standard deviation of the <italic>i</italic>th instance in a mini-batch, respectively. We define the centers of the feature statistics, <inline-formula><mml:math id="M11"><mml:mi>c</mml:mi><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M12"><mml:mi>c</mml:mi><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, as:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>c</mml:mi><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E7"><label>(7)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>c</mml:mi><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We denote <italic>d</italic><sub>&#x003BC;<sub><italic>i</italic></sub></sub> as the Euclidean distance between &#x003BC;<sub><italic>i</italic></sub> and <italic>ct</italic><sub>&#x003BC;</sub>, and denote <italic>d</italic><sub>&#x003C3;<sub><italic>i</italic></sub></sub> as the Euclidean distance between &#x003C3;<sub><italic>i</italic></sub> and <italic>ct</italic><sub>&#x003C3;</sub>, which can be formulated as:</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M15"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>||</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>c</mml:mi><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mo>||</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E9"><label>(9)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>||</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>c</mml:mi><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mo>||</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We then sort the distances <inline-formula><mml:math id="M17"><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M18"><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> separately in descending order, which gives the sorted distance lists <italic>sorted</italic>_<italic>distance</italic><sub>&#x003BC;</sub> and <italic>sorted</italic>_<italic>distance</italic><sub>&#x003C3;</sub>.</p>
<p>Let <italic>n</italic><sub>&#x003BC;<sub><italic>i</italic></sub></sub> denote the position index of <italic>d</italic><sub>&#x003BC;<sub><italic>i</italic></sub></sub> in <italic>sorted</italic>_<italic>distance</italic><sub>&#x003BC;</sub>, and <italic>n</italic><sub>&#x003C3;<sub><italic>i</italic></sub></sub> the position index of <italic>d</italic><sub>&#x003C3;<sub><italic>i</italic></sub></sub> in <italic>sorted</italic>_<italic>distance</italic><sub>&#x003C3;</sub>; position indices range from 1 to <italic>B</italic>.</p>
<p>Then the coefficients of variance are given by:</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M19"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E11"><label>(11)</label><mml:math id="M20"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
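<p>where <italic>start</italic> and <italic>end</italic> are manually set values and <italic>B</italic> is the mini-batch size. <italic>start</italic> is the minimum of all coefficients of variance and <italic>end</italic> is the maximum. Because the distances are sorted in descending order, the instance farthest from the center (position index 1) receives <italic>start</italic>, the instance closest to the center (position index <italic>B</italic>) receives <italic>end</italic>, and the coefficients in between increase linearly with the position index.</p>
<p>We use &#x003BB;<sub>&#x003BC;<sub><italic>i</italic></sub></sub> and &#x003BB;<sub>&#x003C3;<sub><italic>i</italic></sub></sub> as the coefficients of variance that control the degree of shift, and obtain the augmented feature statistics:</p>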
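<p>The assignment of coefficients in <xref ref-type="disp-formula" rid="E6">Equations 6</xref>&#x02013;<xref ref-type="disp-formula" rid="E11">11</xref> can be sketched in NumPy for one statistic; the same function is applied once to the &#x003BC;<sub><italic>i</italic></sub> and once to the &#x003C3;<sub><italic>i</italic></sub>. The function name <monospace>variance_coefficients</monospace>, the stable tie-breaking of equal distances, and the fallback for <italic>B</italic> = 1 (where the formula divides by zero) are illustrative assumptions rather than details given in the text.</p>

```python
import numpy as np


def variance_coefficients(stats, start, end):
    """Rank-based coefficients of variance (Eqs. 6-11) for one statistic.

    stats: array of shape (B, C) holding mu_i (or sigma_i) for each instance.
    start, end: manually set minimum and maximum coefficients.
    Returns lam of shape (B,), one coefficient per instance.
    """
    B = stats.shape[0]
    center = stats.mean(axis=0)                       # Eqs. 6/7: center ct
    dist = np.linalg.norm(stats - center, axis=1)     # Eqs. 8/9: distance d_i
    # Sort distances in descending order (stable, so ties keep batch order).
    order = np.argsort(-dist, kind="stable")
    n = np.empty(B, dtype=int)
    n[order] = np.arange(1, B + 1)                    # 1-based position index n_i
    if B == 1:
        # Assumption: Eqs. 10/11 divide by B - 1, so fall back to start.
        return np.full(1, float(start))
    return start + (n - 1) * (end - start) / (B - 1)  # Eqs. 10/11
```

<p>With <italic>start</italic> = 0.5 and <italic>end</italic> = 1.5, for instance, the instance farthest from the center receives 0.5, the closest receives 1.5, and the remaining instances are spaced evenly in between according to their rank.</p>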
<disp-formula id="E12"><label>(12)</label><mml:math id="M21"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:mi>X</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E13"><label>(13)</label><mml:math id="M22"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mi>Y</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:mi>Y</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>X</italic> and <italic>Y</italic> are random vectors drawn from zero-mean multivariate normal distributions.</p>
<p>The augmented feature statistics, mean <inline-formula><mml:math id="M23"><mml:mover accent="true"><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BC;</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and standard deviation <inline-formula><mml:math id="M24"><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003C3;</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, can be randomly drawn from the corresponding distributions as:</p>
<disp-formula id="E14"><label>(14)</label><mml:math id="M25"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:msqrt><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msqrt><mml:msub><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E15"><label>(15)</label><mml:math id="M26"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:msqrt><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msqrt><mml:msub><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The final formula of RCDSU is as follows:</p>
<disp-formula id="E16"><label>(16)</label><mml:math id="M27"><mml:mtable class="eqnarray" columnalign="right"><mml:mtr><mml:mtd><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mo class="qopname">RCDSU</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mstyle displaystyle="true"><mml:munder accentunder="false"><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:msqrt><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msqrt><mml:msub><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0FE38;</mml:mo></mml:munder></mml:mstyle></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x0002B;</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mstyle displaystyle="true"><mml:munder accentunder="false"><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:msqrt><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msqrt><mml:msub><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x0FE38;</mml:mo></mml:munder></mml:mstyle></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003BC;(<italic>x</italic>) and &#x003C3;(<italic>x</italic>) are the feature statistics computed with the robust statistics formulas (<xref ref-type="disp-formula" rid="E2">Equations 2</xref>, <xref ref-type="disp-formula" rid="E3">3</xref>), &#x003C3;(<italic>x</italic>) being the square root of &#x003C3;<sup>2</sup>(<italic>x</italic>).</p>
<p>The sphere-shaped data distribution is illustrated in <xref ref-type="fig" rid="F3">Figure 3</xref>. Data points close to the center of the sphere are unlikely to cross the class boundary when shifted. For example, shifts 1 and 3 in the figure do not change the class identity, so the semantics are preserved. Data points close to the outermost layer of the sphere easily cross the class boundary when shifted, which changes their semantics. For example, shifts 4 and 5 in the figure turn a dog into a wolf: the shift is too large and alters the semantics, so it is an incorrect shift.</p>
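<p>Given the robust statistics and the per-instance coefficients, the remaining steps (<xref ref-type="disp-formula" rid="E4">Equations 4</xref>, <xref ref-type="disp-formula" rid="E5">5</xref> and <xref ref-type="disp-formula" rid="E14">14</xref>&#x02013;<xref ref-type="disp-formula" rid="E16">16</xref>) can be sketched in NumPy as below. The sketch assumes that &#x003F5;<sub>&#x003BC;</sub> and &#x003F5;<sub>&#x003C3;</sub> are drawn independently for every instance and channel, and it adds a small <monospace>eps</monospace> inside the square roots for numerical stability; the function name <monospace>rcdsu</monospace> and its argument layout are hypothetical.</p>

```python
import numpy as np


def rcdsu(x, mu, sigma, lam_mu, lam_sigma, rng, eps=1e-6):
    """Sketch of the RCDSU feature transform (Eqs. 4, 5 and 14-16).

    x: feature map of shape (B, C, H, W).
    mu, sigma: robust statistics from Eqs. 2-3, each of shape (B, C).
    lam_mu, lam_sigma: per-instance coefficients from Eqs. 10-11, shape (B,).
    rng: a numpy Generator used to draw the standard normal noise.
    """
    # Eqs. 4-5: per-channel spread of the statistics over the mini-batch
    # (np.var uses 1/B by default, matching the equations).
    sigma_mu = np.sqrt(mu.var(axis=0, keepdims=True) + eps)        # (1, C)
    sigma_sigma = np.sqrt(sigma.var(axis=0, keepdims=True) + eps)  # (1, C)
    # Eqs. 14-15: reparameterized sampling; each instance's shift is
    # scaled by the square root of its own coefficient of variance.
    eps_mu = rng.standard_normal(mu.shape)
    eps_sigma = rng.standard_normal(sigma.shape)
    mu_aug = mu + eps_mu * np.sqrt(lam_mu)[:, None] * sigma_mu
    sigma_aug = sigma + eps_sigma * np.sqrt(lam_sigma)[:, None] * sigma_sigma
    # Eq. 16: normalize with the robust statistics, then re-style
    # with the sampled (augmented) statistics.
    x_norm = (x - mu[..., None, None]) / sigma[..., None, None]
    return sigma_aug[..., None, None] * x_norm + mu_aug[..., None, None]
```

<p>Setting every coefficient to zero recovers the input, since &#x003C3;&#x00303;(<italic>x</italic>) = &#x003C3;(<italic>x</italic>) and &#x003BC;&#x00303;(<italic>x</italic>) = &#x003BC;(<italic>x</italic>) in that case; this is a convenient sanity check when wiring such a module into a network.</p>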
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Illustration of the sphere data distribution. Data points close to the center of the sphere are unlikely to cross the class boundary when shifted. For example, shifts 1 and 3 in the figure do not change the class identity, so the semantics are preserved. Data points close to the outermost layer of the sphere easily cross the class boundary when shifted, which changes their semantics. For example, shifts 4 and 5 in the figure turn a dog into a wolf: the shift is too large and alters the semantics, so it is an incorrect shift.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-18-1382406-g0003.tif"/>
</fig></sec>
<sec>
<title>3.4 Content transfer with FeatureDA</title>
<p>Above, we introduced RCDSU, which uses Robust statistics and Controls the coefficient of variance for DSU and is utilized for style transfer. Next, we introduce Feature Data Augmentation (FeatureDA), which is utilized for content transfer. FeatureDA controls the coefficient of variance in a similar way, generating augmented features whose semantics are unchanged while increasing the coverage of the augmented features.</p>
<p>Given the sphere-shaped distribution of the features, we again want a data point closer to the outer layer of the sphere to be shifted less, so that the shift does not carry it across the boundary and change the semantics of the feature, and a data point closer to the center to be shifted somewhat more, which improves the coverage and diversity of the augmented features and further improves the generalization ability of the model. As in Section 3.3, we control the magnitude of the shift by multiplying the variance by a coefficient and assign this coefficient of variance to each feature according to its Euclidean distance from the center vector: the larger the distance, the smaller the coefficient and hence the shift; the smaller the distance, the larger the coefficient and hence the shift.</p>
<p>Let <italic>a</italic>&#x02208;&#x0211D;<sup><italic>B</italic>&#x000D7;<italic>A</italic></sup> be the deep features of a mini-batch learned by a deep network, and <inline-formula><mml:math id="M28"><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> the deep feature of the <italic>i</italic>th instance. We denote by <inline-formula><mml:math id="M29"><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> the variance of all features in the mini-batch, which can be formulated as:</p>
<disp-formula id="E17"><label>(17)</label><mml:math id="M30"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>a</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D53C;</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We denote by <inline-formula><mml:math id="M31"><mml:mi>c</mml:mi><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> the center (mean) of the features in the mini-batch:</p>
<disp-formula id="E18"><label>(18)</label><mml:math id="M32"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>c</mml:mi><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Let <italic>d</italic><sub><italic>a</italic><sub><italic>i</italic></sub></sub> denote the Euclidean distance between <italic>a</italic><sub><italic>i</italic></sub> and <italic>ct</italic><sub><italic>a</italic></sub>:</p>
<disp-formula id="E19"><label>(19)</label><mml:math id="M33"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>c</mml:mi><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We then sort the <italic>B</italic> distances <inline-formula><mml:math id="M34"><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> in descending order to obtain the sorted distance list <italic>sorted</italic>_<italic>distance</italic><sub><italic>a</italic></sub>.</p>
<p>Let <italic>n</italic><sub><italic>a</italic><sub><italic>i</italic></sub></sub> denote the position index of <italic>d</italic><sub><italic>a</italic><sub><italic>i</italic></sub></sub> in <italic>sorted</italic>_<italic>distance</italic><sub><italic>a</italic></sub>; the position index ranges from 1 to <italic>B</italic>.</p>
<p>The coefficient of variance of the <italic>i</italic>th instance is then given by:</p>
<disp-formula id="E20"><label>(20)</label><mml:math id="M35"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>B</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
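<p>where <italic>start</italic> and <italic>end</italic> are manually set hyperparameters and <italic>B</italic> is the mini-batch size. <italic>start</italic> is the minimum and <italic>end</italic> the maximum of all variance coefficients. Because the distances are sorted in descending order, the instance farthest from the center (<italic>n</italic><sub><italic>a</italic><sub><italic>i</italic></sub></sub> &#x0003D; 1) receives the smallest coefficient, <italic>start</italic>, and the instance closest to the center (<italic>n</italic><sub><italic>a</italic><sub><italic>i</italic></sub></sub> &#x0003D; <italic>B</italic>) receives the largest, <italic>end</italic>.</p>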
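<p>The computations in Eqs. (17) to (20) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: it assumes the variance in Eq. (17) is taken per feature dimension with 1/<italic>B</italic> normalization, that ties in the distance ranking are broken by instance order, and that <italic>B</italic> is at least 2; the function name featureda_coefficients is hypothetical.</p>

```python
import numpy as np


def featureda_coefficients(a, start, end):
    """Per-instance variance coefficients following Eqs. (17) to (20).

    a:     (B, A) array of deep features for one mini-batch (assumes B is at least 2).
    start: coefficient assigned to the instance farthest from the center.
    end:   coefficient assigned to the instance closest to the center.
    Returns (var, lam): the per-dimension batch variance, shape (A,),
    and the per-instance coefficients, shape (B,).
    """
    B = a.shape[0]
    # Eq. (17): variance over the mini-batch (1/B normalization), per feature dimension
    var = a.var(axis=0)
    # Eq. (18): center of the features
    ct = a.mean(axis=0)
    # Eq. (19): Euclidean distance of each instance to the center
    d = np.linalg.norm(a - ct, axis=1)
    # Rank distances in descending order; n[i] is the 1-based position of d[i]
    # (ties are broken by instance order, an assumption of this sketch)
    order = np.argsort(-d, kind="stable")
    n = np.empty(B, dtype=int)
    n[order] = np.arange(1, B + 1)
    # Eq. (20): evenly spaced coefficients from start (n = 1) to end (n = B)
    lam = start + (n - 1) * (end - start) / (B - 1)
    return var, lam


a = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 6.0], [-1.0, 0.0]])
var, lam = featureda_coefficients(a, start=0.4, end=0.9)
```

<p>In this toy batch the third instance lies farthest from the center and therefore receives 0.4, while the first instance lies closest and receives 0.9; the other two receive the evenly spaced intermediate values.</p>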
<p>The coefficient &#x003BB;<sub><italic>a</italic><sub><italic>i</italic></sub></sub> controls the degree of the shift applied to <italic>a</italic><sub><italic>i</italic></sub>, and the augmented feature is obtained as:</p>
<disp-formula id="E21"><label>(21)</label><mml:math id="M36"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x000E3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:mo>=</mml:mo><mml:mo class="qopname">FeatureDA</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E22"><label>(22)</label><mml:math id="M37"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mi>Z</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:mi>Z</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
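<p>where <italic>Z</italic> is a random vector drawn from a zero-mean multivariate normal distribution whose covariance is the mini-batch variance scaled by &#x003BB;<sub><italic>a</italic><sub><italic>i</italic></sub></sub>.</p>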
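<p>The sampling step in Eqs. (21) and (22) can be sketched as below. The assumptions are our own: the variance from Eq. (17) acts as a diagonal covariance, so every feature dimension is perturbed independently, and a fresh <italic>Z</italic> is drawn for each instance at each call. The function name featureda_augment is hypothetical.</p>

```python
import numpy as np


def featureda_augment(a, var, lam, rng):
    """Eqs. (21)-(22): a_tilde_i = a_i + Z_i with Z_i ~ N(0, lam_i * Sigma_a^2).

    a:   (B, A) deep features of the mini-batch.
    var: (A,) per-dimension batch variance Sigma_a^2 (Eq. 17).
    lam: (B,) per-instance variance coefficients (Eq. 20).
    rng: a numpy.random.Generator.
    Assumption: Sigma_a^2 acts as a diagonal covariance, so every feature
    dimension is perturbed independently.
    """
    # Standard deviation of the shift for every (instance, dimension) pair
    std = np.sqrt(lam[:, None] * var[None, :])
    # Zero-mean Gaussian shift with per-instance, per-dimension scale
    z = rng.standard_normal(a.shape) * std
    return a + z


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8))
# Coefficients are simply evenly spaced here for illustration,
# not assigned by distance rank as in Eq. (20)
a_tilde = featureda_augment(a, a.var(axis=0), np.linspace(0.4, 0.9, 4), rng)
```

<p>A diagonal covariance keeps the cost linear in the feature dimension <italic>A</italic>. A full covariance would instead require sampling with a matrix factorization of size <italic>A</italic>&#x000D7;<italic>A</italic>.</p>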
<p>Equivalently, each augmented feature is distributed as <inline-formula><mml:math id="M38"><mml:msub><mml:mrow><mml:mi>&#x000E3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0007E;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>.</p></sec>
<sec>
<title>3.5 Network architecture</title>
<p>The network architecture of our method (RCDSU &#x0002B; FeatureDA) is shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. We use ResNet18 as the backbone. RCDSU and FeatureDA are plug-and-play modules that can be readily inserted into the network. In ResNet18, we insert RCDSU after the first convolutional layer, after the max-pooling layer, and after each of the four ConvBlocks. The feature extraction network outputs the deep feature <italic>a</italic><sub><italic>i</italic></sub>, from which FeatureDA produces the augmented feature &#x000E3;<sub><italic>i</italic></sub>. A fully connected classifier maps the augmented feature to the predicted value <inline-formula><mml:math id="M39"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and the cross-entropy loss between the prediction and the ground-truth label is computed. The parameters of the feature extraction network, together with the weight matrix <italic>W</italic> and the bias <italic>b</italic> of the fully connected layer, are updated with the stochastic gradient descent (SGD) algorithm. The pseudocode of the proposed method (RCDSU &#x0002B; FeatureDA) is given in <xref ref-type="table" rid="T8">Algorithm 1</xref>.</p>
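<p>The classifier-head part of this pipeline can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions. It covers only the fully connected layer, the softmax cross-entropy loss, and one SGD update of <italic>W</italic> and <italic>b</italic> on a batch of already augmented features. The RCDSU insertions and the backpropagation into the ResNet18 feature extractor, which a deep-learning framework's automatic differentiation would normally handle, are omitted. The function name is hypothetical.</p>

```python
import numpy as np


def fc_cross_entropy_sgd_step(feats, labels, W, b, lr):
    """One SGD step on the fully connected classifier only.

    feats:  (B, A) augmented features a_tilde.
    labels: (B,) integer ground-truth class labels.
    W: (A, K) float weight matrix, b: (K,) float bias; both are updated in place.
    Returns the mean cross-entropy loss computed before the update.
    """
    B = feats.shape[0]
    logits = feats @ W + b  # predicted scores p_tilde
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(B), labels]).mean()
    # Gradient of the mean softmax cross-entropy with respect to the logits
    grad = probs.copy()
    grad[np.arange(B), labels] -= 1.0
    grad /= B
    # SGD update of the classifier parameters
    W -= lr * (feats.T @ grad)
    b -= lr * grad.sum(axis=0)
    return loss


rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4))  # stand-in for augmented features
labels = rng.integers(0, 3, size=8)
W, b = np.zeros((4, 3)), np.zeros(3)
for _ in range(5):
    loss = fc_cross_entropy_sgd_step(feats, labels, W, b, lr=0.1)
```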
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>The network architecture of our method (RCDSU &#x0002B; FeatureDA). Note that these images are for visualization only; they are not fed into the network for training.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-18-1382406-g0004.tif"/>
</fig>
<table-wrap position="float" id="T8">
<label>Algorithm 1</label>
<caption><p>The algorithm of the proposed method.</p></caption>
<table frame="box" rules="all">
<tbody>
<tr><td><bold>Input</bold>: Intermediate feature <italic>x</italic>&#x02208;&#x0211D;<sup><italic>B</italic>&#x000D7;<italic>C</italic>&#x000D7;<italic>H</italic>&#x000D7;<italic>W</italic></sup></td></tr>
<tr><td>1 Compute the feature statistics &#x003BC;, &#x003C3; with robust statistics.</td></tr>
<tr><td>2 Compute <inline-formula><mml:math id="M40"><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>, <inline-formula><mml:math id="M41"><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>.</td></tr>
<tr><td>3 Compute the coefficient of variance &#x003BB;<sub>&#x003BC;</sub> and &#x003BB;<sub>&#x003C3;</sub>.</td></tr>
<tr><td>4 Style transfer with RCDSU: <inline-formula><mml:math id="M42"><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo class="qopname">RCDSU</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>.</td></tr>
<tr><td>5 Get the features <italic>a</italic>&#x02208;&#x0211D;<sup><italic>B</italic>&#x000D7;<italic>A</italic></sup> after the deep network.</td></tr>
<tr><td>6 Compute <inline-formula><mml:math id="M43"><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>.</td></tr>
<tr><td>7 Compute the coefficient of variance &#x003BB;<sub><italic>a</italic><sub><italic>i</italic></sub></sub>.</td></tr>
<tr><td>8 Content transfer with FeatureDA: &#x000E3;<sub><italic>i</italic></sub> &#x0003D; FeatureDA(<italic>a</italic><sub><italic>i</italic></sub>).</td></tr>
<tr><td><bold>Output</bold>: The augmented features &#x000E3;&#x02208;&#x0211D;<sup><italic>B</italic>&#x000D7;<italic>A</italic></sup></td></tr>
</tbody>
</table>
</table-wrap>
</sec></sec>
<sec id="s4">
<title>4 Experiments</title>
<p>In this section, we empirically validate the proposed method on several tasks. First, we perform the PACS multi-style classification task with our method (RCDSU &#x0002B; FeatureDA) and compare it with previously proposed methods such as pAdaIN (Nuriel et al., <xref ref-type="bibr" rid="B44">2021</xref>) and MixStyle (Zhou et al., <xref ref-type="bibr" rid="B62">2021</xref>). Second, we use FeatureDA alone for the CIFAR-100 image classification task and report the accuracy of several modern deep networks with and without FeatureDA. Third, to verify the robustness of our method, we add Gaussian noise to the PACS training data and compare our method (RCDSU &#x0002B; FeatureDA) with DSU (Li et al., <xref ref-type="bibr" rid="B39">2022</xref>), ISDA (Wang et al., <xref ref-type="bibr" rid="B54">2021</xref>), MixStyle (Zhou et al., <xref ref-type="bibr" rid="B62">2021</xref>), pAdaIN (Nuriel et al., <xref ref-type="bibr" rid="B44">2021</xref>), PCL (Yao et al., <xref ref-type="bibr" rid="B57">2022</xref>), SWAD (Cha et al., <xref ref-type="bibr" rid="B7">2021</xref>), and MODE (Dai R. et al., <xref ref-type="bibr" rid="B14">2023</xref>). Fourth, we perform ablation studies of the proposed method on PACS and CIFAR-100 with ResNet models.</p>
<sec>
<title>4.1 Multi-style image classification</title>
<sec>
<title>4.1.1 Setup and implementation details</title>
<p>We choose the PACS dataset, a commonly used benchmark for multi-style image classification. PACS consists of four styles, i.e., Art Painting, Cartoon, Photo, and Sketch, with 9,991 images in total across seven classes. For evaluation, a model is trained on three styles and tested on the remaining one. Following prior work, we use ResNet18 and ResNet50 as the backbones. We compare our method (RCDSU &#x0002B; FeatureDA) with previously proposed methods such as pAdaIN (Nuriel et al., <xref ref-type="bibr" rid="B44">2021</xref>) and MixStyle (Zhou et al., <xref ref-type="bibr" rid="B62">2021</xref>).</p></sec>
<sec>
<title>4.1.2 Results</title>
<p>The experimental results in <xref ref-type="table" rid="T1">Table 1</xref> show that our method improves over the baseline on both ResNet18 and ResNet50. We use Ours to denote our method (RCDSU &#x0002B; FeatureDA). In the last column of the table, our method improves the average accuracy by 1.24 percentage points over the compared methods (averaged over both backbones), and on Sketch it outperforms the other methods by more than 3 percentage points. This is likely because our method works better when the style difference between the training and test data is large, which is precisely the case for Sketch, as shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. On Cartoon, our method also performs slightly better than the previous methods. On Photo, however, our method is slightly below the baseline (94.58% vs. 96.40% with ResNet18 and 97.15% vs. 97.66% with ResNet50), which we attribute to the relatively small style differences between Art Painting, Cartoon, and Photo.</p>
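<p>To make the 1.24 figure traceable, the short Python check below reproduces it from the Average column of Table 1. It relies on one interpretation of how the average was taken, which is our own: the mean of our method's gains over each non-baseline competitor, pooled across the ResNet18 and ResNet50 rows.</p>

```python
# Average accuracy (%) from the last column of Table 1
resnet18_others = {"L2A-OT": 82.82, "pAdaIN": 82.51, "MixStyle": 82.85}
resnet50_others = {"pAdaIN": 85.36, "MixStyle": 85.22, "RSC": 85.22}
ours = {"ResNet18": 83.68, "ResNet50": 86.80}

# Gain of our method over each non-baseline competitor, per backbone
gains = [ours["ResNet18"] - acc for acc in resnet18_others.values()]
gains += [ours["ResNet50"] - acc for acc in resnet50_others.values()]

mean_gain = sum(gains) / len(gains)
print(f"Mean gain: {mean_gain:.2f} percentage points")  # Mean gain: 1.24 percentage points
```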
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Accuracy (%) of different methods on the PACS multi-style classification task. Each column gives the style held out for testing.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>Art</bold></th>
<th valign="top" align="center"><bold>Cartoon</bold></th>
<th valign="top" align="center"><bold>Photo</bold></th>
<th valign="top" align="center"><bold>Sketch</bold></th>
<th valign="top" align="center"><bold>Average (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Baseline</td>
<td valign="top" align="center">74.30</td>
<td valign="top" align="center">76.70</td>
<td valign="top" align="center">96.40</td>
<td valign="top" align="center">68.70</td>
<td valign="top" align="center">79.02</td>
</tr> <tr>
<td valign="top" align="left">L2A-OT (Zhou et al., <xref ref-type="bibr" rid="B61">2020</xref>)</td>
<td valign="top" align="center">83.30</td>
<td valign="top" align="center">78.20</td>
<td valign="top" align="center">96.20</td>
<td valign="top" align="center">73.60</td>
<td valign="top" align="center">82.82</td>
</tr> <tr>
<td valign="top" align="left">pAdaIN (Nuriel et al., <xref ref-type="bibr" rid="B44">2021</xref>)</td>
<td valign="top" align="center">81.74</td>
<td valign="top" align="center">76.91</td>
<td valign="top" align="center">96.29</td>
<td valign="top" align="center">75.13</td>
<td valign="top" align="center">82.51</td>
</tr> <tr>
<td valign="top" align="left">MixStyle (Zhou et al., <xref ref-type="bibr" rid="B62">2021</xref>)</td>
<td valign="top" align="center">82.30</td>
<td valign="top" align="center">79.00</td>
<td valign="top" align="center">96.30</td>
<td valign="top" align="center">73.80</td>
<td valign="top" align="center">82.85</td>
</tr> <tr>
<td valign="top" align="left">Ours</td>
<td valign="top" align="center">82.72</td>
<td valign="top" align="center"><bold>79.14</bold></td>
<td valign="top" align="center">94.58</td>
<td valign="top" align="center"><bold>78.28</bold></td>
<td valign="top" align="center"><bold>83.68</bold></td>
</tr> <tr>
<td valign="top" align="left">Baseline</td>
<td valign="top" align="center">86.20</td>
<td valign="top" align="center">78.70</td>
<td valign="top" align="center">97.66</td>
<td valign="top" align="center">70.63</td>
<td valign="top" align="center">83.29</td>
</tr> <tr>
<td valign="top" align="left">pAdaIN (Nuriel et al., <xref ref-type="bibr" rid="B44">2021</xref>)</td>
<td valign="top" align="center">85.82</td>
<td valign="top" align="center">81.06</td>
<td valign="top" align="center">97.17</td>
<td valign="top" align="center">77.37</td>
<td valign="top" align="center">85.36</td>
</tr> <tr>
<td valign="top" align="left">MixStyle (Zhou et al., <xref ref-type="bibr" rid="B62">2021</xref>)</td>
<td valign="top" align="center">86.80</td>
<td valign="top" align="center">79.00</td>
<td valign="top" align="center">96.60</td>
<td valign="top" align="center">78.50</td>
<td valign="top" align="center">85.22</td>
</tr> <tr>
<td valign="top" align="left">RSC (Huang et al., <xref ref-type="bibr" rid="B31">2020</xref>)</td>
<td valign="top" align="center">85.40</td>
<td valign="top" align="center">79.70</td>
<td valign="top" align="center">97.60</td>
<td valign="top" align="center">78.20</td>
<td valign="top" align="center">85.22</td>
</tr>
<tr>
<td valign="top" align="left">Ours</td>
<td valign="top" align="center">86.68</td>
<td valign="top" align="center"><bold>81.28</bold></td>
<td valign="top" align="center">97.15</td>
<td valign="top" align="center"><bold>82.12</bold></td>
<td valign="top" align="center"><bold>86.80</bold></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>Rows 1 to 5 give the results with ResNet18, and rows 6 to 10 the results with ResNet50. Bold values mark the best result in the Cartoon, Sketch, and Average columns, all of which are achieved by our method.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Illustration of the experiments on Sketch. We train on Art Painting, Cartoon, and Photo, and test on Sketch. The style difference between the training and test data is very large, which is the setting in which our method works well.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-18-1382406-g0005.tif"/>
</fig>
<p>Although the accuracy of our method is not as high as that of DSU and the most recent methods, it keeps the performance of the model at a relatively high level. Moreover, combining RCDSU with FeatureDA improves the robustness of the model, as shown in Section 4.3. Our work offers a new perspective on multi-style image data augmentation: improving the generalization performance of the model separately at the style level and at the content level.</p></sec></sec>
<sec>
<title>4.2 FeatureDA for CIFAR-100 image classification</title>
<sec>
<title>4.2.1 Setup and implementation details</title>
<p>The CIFAR-100 dataset consists of 32 &#x000D7; 32 colored natural images in 100 classes, with 50,000 images for training and 10,000 images for testing. CIFAR-100 is a single-style dataset, i.e., there are no large style differences between the training and test data. Style transfer is therefore unnecessary for data augmentation, and only content transfer is needed, so we use FeatureDA alone for the CIFAR-100 image classification task.</p></sec>
<sec>
<title>4.2.2 Results</title>
<p>We report the accuracy of several modern deep networks with and without FeatureDA on CIFAR-100 in <xref ref-type="table" rid="T2">Table 2</xref>. On this single-style dataset, FeatureDA improves the classification accuracy by 0.92 percentage points on average and is applicable to a variety of networks. These results indicate that FeatureDA is an effective data augmentation method based on content transfer that improves the generalization ability of the model at the content level.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Accuracy (%) of different models on CIFAR-100 without (Basic) and with FeatureDA. Improvement is the difference in percentage points.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Networks</bold></th>
<th valign="top" align="center" colspan="3"><bold>CIFAR-100</bold></th>
</tr>
<tr style="background-color:#919497;color:#ffffff">
<th/>
<th valign="top" align="center"><bold>Basic</bold></th>
<th valign="top" align="center"><bold>FeatureDA</bold></th>
<th valign="top" align="center"><bold>Improvement</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">ResNet-32</td>
<td valign="top" align="center">68.80</td>
<td valign="top" align="center">70.04</td>
<td valign="top" align="center">1.24</td>
</tr> <tr>
<td valign="top" align="left">ResNet-110</td>
<td valign="top" align="center">71.33</td>
<td valign="top" align="center">74.19</td>
<td valign="top" align="center">2.86</td>
</tr> <tr>
<td valign="top" align="left">SE-ResNet-110</td>
<td valign="top" align="center">72.70</td>
<td valign="top" align="center">74.04</td>
<td valign="top" align="center">1.34</td>
</tr> <tr>
<td valign="top" align="left">Wide-ResNet-16-8</td>
<td valign="top" align="center">79.76</td>
<td valign="top" align="center">79.98</td>
<td valign="top" align="center">0.22</td>
</tr> <tr>
<td valign="top" align="left">Wide-ResNet-28-10</td>
<td valign="top" align="center">81.47</td>
<td valign="top" align="center">81.91</td>
<td valign="top" align="center">0.44</td>
</tr> <tr>
<td valign="top" align="left">ResNeXt-29, 8x64d</td>
<td valign="top" align="center">81.84</td>
<td valign="top" align="center">82.44</td>
<td valign="top" align="center">0.60</td>
</tr> <tr>
<td valign="top" align="left">DenseNet-BC-100-12</td>
<td valign="top" align="center">77.39</td>
<td valign="top" align="center">77.81</td>
<td valign="top" align="center">0.42</td>
</tr> <tr>
<td valign="top" align="left">Shake-Shake (26, 2x32d)</td>
<td valign="top" align="center">79.88</td>
<td valign="top" align="center">80.46</td>
<td valign="top" align="center">0.58</td>
</tr> <tr>
<td valign="top" align="left">Shake-Shake (26, 2x112d)</td>
<td valign="top" align="center">82.58</td>
<td valign="top" align="center">83.13</td>
<td valign="top" align="center">0.55</td>
</tr> <tr>
<td valign="top" align="left">Average</td>
<td valign="top" align="center">&#x02013;</td>
<td valign="top" align="center">&#x02013;</td>
<td valign="top" align="center">0.92</td>
</tr></tbody>
</table>
</table-wrap></sec></sec>
<sec>
<title>4.3 Robustness to noise</title>
<sec>
<title>4.3.1 Setup and implementation details</title>
<p>We add Gaussian noise drawn from <inline-formula><mml:math id="M44"><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mstyle class="text"><mml:mtext>_</mml:mtext></mml:mstyle><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:msup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> to the feature map of each sample in the PACS training data and then perform the PACS multi-style classification task, with <italic>noise</italic>_<italic>std</italic> selected from {0.25, 0.5, 1, 1.5, 2}. To verify the robustness of our method, we compare it (RCDSU &#x0002B; FeatureDA) with DSU (Li et al., <xref ref-type="bibr" rid="B39">2022</xref>), ISDA (Wang et al., <xref ref-type="bibr" rid="B54">2021</xref>), MixStyle (Zhou et al., <xref ref-type="bibr" rid="B62">2021</xref>), pAdaIN (Nuriel et al., <xref ref-type="bibr" rid="B44">2021</xref>), PCL (Yao et al., <xref ref-type="bibr" rid="B57">2022</xref>), SWAD (Cha et al., <xref ref-type="bibr" rid="B7">2021</xref>), and MODE (Dai R. et al., <xref ref-type="bibr" rid="B14">2023</xref>).</p></sec>
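<p>A minimal NumPy sketch of this noise-injection protocol is shown below. It assumes that the feature maps form a (<italic>B</italic>, <italic>C</italic>, <italic>H</italic>, <italic>W</italic>) array and that noise is sampled independently for every element. The layer at which the feature map is taken is not specified here, and the name add_feature_noise is hypothetical.</p>

```python
import numpy as np

# noise_std values evaluated in Section 4.3
NOISE_STDS = [0.25, 0.5, 1, 1.5, 2]


def add_feature_noise(feature_map, noise_std, rng):
    """Return feature_map plus i.i.d. Gaussian noise drawn from N(0, noise_std**2).

    feature_map: array of shape (B, C, H, W), one feature map per training sample.
    Noise is sampled independently for every element (an assumption of this sketch).
    """
    return feature_map + rng.normal(0.0, noise_std, size=feature_map.shape)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 64, 8, 8))  # hypothetical feature maps
noisy = {s: add_feature_noise(x, s, rng) for s in NOISE_STDS}
```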
<sec>
<title>4.3.2 Results</title>
<p>The results are shown in <xref ref-type="fig" rid="F6">Figure 6</xref>. When Gaussian noise drawn from <inline-formula><mml:math id="M45"><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mstyle class="text"><mml:mtext>_</mml:mtext></mml:mstyle><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:msup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is added to the feature map of each sample in the PACS training data, our method achieves higher classification accuracy than DSU and the other methods. When <italic>noise</italic>_<italic>std</italic> is set to 2, our method outperforms the other methods by more than 15%. We attribute this to the use of robust statistics, which take outliers into account and weaken their influence.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Experiment results of adding Gaussian noise to PACS training data with different methods.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-18-1382406-g0006.tif"/>
</fig>
<p>MODE performs distribution exploration in an uncertainty subset that shares the same semantic factors as the training domains. However, it does not account for outliers, so its performance degrades dramatically under high noise. Likewise, the mean and standard deviation in DSU, MixStyle, and pAdaIN, as well as the covariance matrix in ISDA, are affected by outliers, which none of these methods handle; as a result, they do not perform well under high noise.</p>
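<p>To illustrate how a few outliers distort non-robust statistics, the following NumPy sketch contrasts the mean and standard deviation with two generic robust estimators, the median and the scaled median absolute deviation (MAD). This is an illustration only. The data are synthetic, and the median and MAD are standard robust statistics, not necessarily the robust statistics that RCDSU uses.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic channel activations: 95 clean values plus 5 hypothetical outliers
clean = rng.normal(loc=1.0, scale=0.5, size=95)
outliers = np.full(5, 40.0)
x = np.concatenate([clean, outliers])

# Non-robust statistics, of the kind used by DSU, MixStyle, and pAdaIN
mean, std = x.mean(), x.std()

# Generic robust alternatives: median and MAD scaled to match the
# standard deviation of a normal distribution
median = np.median(x)
mad_std = 1.4826 * np.median(np.abs(x - median))

print(f"mean={mean:.2f} std={std:.2f} median={median:.2f} mad_std={mad_std:.2f}")
```

<p>Five corrupted entries out of 100 pull the mean far above the clean center and inflate the standard deviation by more than an order of magnitude. The median and MAD, by contrast, stay close to the clean values of 1.0 and 0.5.</p>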
<p>When the training data is corrupted by noise, the model trained with our method still maintains good generalization ability. This indicates that, by using robust statistics, our method improves the robustness of the model and its resistance to outlier disturbances, making it more robust than DSU and the other methods. In other words, even after Gaussian noise is added to each training sample, our method still enables the deep network to learn the key features of each sample, whereas DSU and the other methods fail to learn these features well under the same perturbation.</p></sec></sec>
<sec>
<title>4.4 Ablation study</title>
<p>Next, we perform ablation studies of the proposed method on PACS and CIFAR-100 with ResNet models. We conduct the following ablation studies: (1) different starting and ending points for the variance coefficient in FeatureDA; (2) different starting and ending points for the variance coefficient in RCDSU; (3) different numbers of segments when RCDSU computes robust statistics; and (4) different combinations of RCDSU and FeatureDA.</p>
<p>We use FeatureDA (no coefficient) to denote FeatureDA without the variance-coefficient control, and RCDSU (no modules) to denote RCDSU that neither uses robust statistics nor controls the variance coefficient.</p>
<sec>
<title>4.4.1 Controlling the variance coefficient in FeatureDA</title>
<p>We vary the starting and ending points, <italic>start</italic> and <italic>end</italic>, of the variance coefficient in FeatureDA.</p>
<sec>
<title>4.4.1.1 CIFAR-100 image classification task</title>
<p>As shown in <xref ref-type="table" rid="T3">Table 3</xref>, when we use FeatureDA for the CIFAR-100 image classification task with ResNet-32, setting <italic>start</italic> and <italic>end</italic> to 0.4 and 0.9 works best.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Setting different starting and ending points when FeatureDA controls the variance coefficient on CIFAR-100 image classification task.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Networks</bold></th>
<th valign="top" align="center"><bold>Start, end</bold></th>
<th valign="top" align="center"><bold>CIFAR-100 accuracy (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" rowspan="10">ResNet-32</td>
<td valign="top" align="left">FeatureDA (no coefficient)</td>
<td valign="top" align="center">68.71</td>
</tr>
<tr>
<td valign="top" align="left">FeatureDA (start = 0.5, end = 2)</td>
<td valign="top" align="center">67.84</td>
</tr>
<tr>
<td valign="top" align="left">FeatureDA (start = 0.8, end = 1.5)</td>
<td valign="top" align="center">68.45</td>
</tr>
<tr>
<td valign="top" align="left">FeatureDA (start = 0.8, end = 1.2)</td>
<td valign="top" align="center">68.57</td>
</tr>
<tr>
<td valign="top" align="left">FeatureDA (start = 0.8, end = 1.0)</td>
<td valign="top" align="center">68.59</td>
</tr>
<tr>
<td valign="top" align="left">FeatureDA (start = 0.5, end = 1.0)</td>
<td valign="top" align="center">69.50</td>
</tr>
<tr>
<td valign="top" align="left">FeatureDA (start = 0.5, end = 0.8)</td>
<td valign="top" align="center">68.99</td>
</tr>
<tr>
<td valign="top" align="left">FeatureDA (start = 0.4, end = 0.9)</td>
<td valign="top" align="center"><bold>70.04</bold></td>
</tr>
<tr>
<td valign="top" align="left">FeatureDA (start = 0.3, end = 0.9)</td>
<td valign="top" align="center">69.17</td>
</tr>
<tr>
<td valign="top" align="left">FeatureDA (start = 0.3, end = 0.8)</td>
<td valign="top" align="center">69.51</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>Bold values indicate the best result.</p>
</table-wrap-foot>
</table-wrap>
<p>On CIFAR-100, the features within a mini-batch differ substantially, i.e., the variance is large. A shift scaled by this variance would therefore be large and could change the semantics of the feature. The coefficient multiplying the variance should thus be &#x0003C; 1 to reduce the variance. Consistently, setting <italic>start</italic> and <italic>end</italic> to 0.4 and 0.9 (70.04%) works better than setting them to 0.5 and 2 (67.84%), because the coefficients of the former are smaller. Keeping the coefficient small reduces the shift and avoids semantic changes.</p>
<p>However, the coefficient cannot be arbitrarily small: as the shift decreases, so does the diversity of the augmented features. For example, setting <italic>start</italic> and <italic>end</italic> to 0.4 and 0.9 (70.04%) works better than setting them to 0.3 and 0.8 (69.51%), because the latter yields less diverse augmented features. The coefficient should therefore be neither too large nor too small; it must balance preserving the semantics against keeping the augmented features sufficiently diverse.</p>
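<p>The trade-off between semantic preservation and diversity can be made concrete with a short derivation, which is ours rather than part of the original text. Treat the mini-batch variance as a diagonal covariance with entries &#x003C3;<sub><italic>j</italic></sub><sup>2</sup>, <italic>j</italic> &#x0003D; 1, &#x02026;, <italic>A</italic>. Then the expected squared shift grows linearly with the coefficient, and because Eq. (20) spaces the coefficients evenly, their mini-batch average is the midpoint of <italic>start</italic> and <italic>end</italic>.</p>

```latex
% Expected squared size of the shift in Eq. (22)
\mathbb{E}\,\lVert \tilde{a}_i - a_i \rVert_2^2
  = \mathbb{E}\,\lVert Z \rVert_2^2
  = \operatorname{tr}\!\left(\lambda_{a_i}\,\mathrm{diag}(\sigma_1^2, \dots, \sigma_A^2)\right)
  = \lambda_{a_i} \sum_{j=1}^{A} \sigma_j^2

% Eq. (20) spaces the B coefficients evenly between start and end
\frac{1}{B} \sum_{i=1}^{B} \lambda_{a_i} = \frac{start + end}{2}
```

<p>Under this reading, the average coefficient is 0.65 for (0.4, 0.9), 1.25 for (0.5, 2), and 0.55 for (0.3, 0.8). The best setting in Table 3 therefore roughly halves the expected squared shift relative to (0.5, 2), while keeping it slightly above that of (0.3, 0.8).</p>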
<p>Both ISDA and FeatureDA essentially add to the original feature vector a random vector drawn from a zero-mean multivariate normal distribution, so each component of the random vector fluctuates around 0. Because the random vectors of both methods fluctuate around 0, using the variance of all features in a mini-batch (FeatureDA) rather than the covariance of all features in a class (ISDA) makes relatively little difference in practice.</p></sec>
<sec>
<title>4.4.1.2 PACS multi-style classification task</title>
<p>As shown in <xref ref-type="table" rid="T4">Table 4</xref>, when we use FeatureDA to perform the PACS multi-style classification task with ResNet18, setting <italic>start</italic> and <italic>end</italic> to 2 and 2.5 works best.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Setting different starting and ending points when FeatureDA controls the variance coefficient on the PACS multi-style classification task.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Start, end</bold></th>
<th valign="top" align="center"><bold>Accuracy (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">FeatureDA (no coefficient)</td>
<td valign="top" align="center">80.5350</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 0.5, end = 2)</td>
<td valign="top" align="center">80.7000</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 0.8, end = 1.5)</td>
<td valign="top" align="center">80.7000</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 0.5, end = 1.0)</td>
<td valign="top" align="center">80.3225</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 0.4, end = 0.9)</td>
<td valign="top" align="center">80.2925</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 1, end = 2)</td>
<td valign="top" align="center">80.7400</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 1, end = 1.5)</td>
<td valign="top" align="center">80.6175</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 1.5, end = 2)</td>
<td valign="top" align="center">80.8025</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 1.5, end = 2.5)</td>
<td valign="top" align="center">80.9300</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 2, end = 2.5)</td>
<td valign="top" align="center"><bold>81.1475</bold></td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 2, end = 3)</td>
<td valign="top" align="center">81.0350</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 2.5, end = 3)</td>
<td valign="top" align="center">80.9200</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>Bold values indicate the best result.</p>
</table-wrap-foot>
</table-wrap>
<p>On PACS, the features within a mini-batch differ little, that is, the mini-batch variance is small. A small variance produces a small shift range, which reduces the diversity of the augmented features. The coefficient multiplied in front of the variance should therefore be &#x0003E; 1 so as to enlarge the variance. Setting <italic>start</italic> and <italic>end</italic> to 2 and 2.5 (81.1475%) works better than setting them to 0.5 and 2 (80.7000%), which we attribute to the larger coefficients of the former. Enlarging the coefficient increases the shift range.</p>
<p>However, the coefficient cannot be arbitrarily large: as the shift grows, the semantics of the augmented features change. Setting <italic>start</italic> and <italic>end</italic> to 2 and 2.5 works better than setting them to 2.5 and 3 (80.9200%), which we attribute to semantic changes caused by the larger coefficients of the latter. As on CIFAR-100, the coefficients should be neither too large nor too small, balancing semantic preservation against the diversity of the augmented features.</p></sec></sec>
<sec>
<title>4.4.2 Controlling the variance coefficient in RCDSU</title>
<p>We set different starting and ending points, <italic>start</italic> and <italic>end</italic>, when RCDSU controls the variance coefficient. As shown in <xref ref-type="table" rid="T5">Table 5</xref>, when we use RCDSU to perform the PACS multi-style classification task with ResNet18, setting <italic>start</italic> and <italic>end</italic> to 0.7 and 2 works best.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Setting different starting and ending points when RCDSU controls the variance coefficient on the PACS multi-style classification task.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Start, end</bold></th>
<th valign="top" align="center"><bold>Accuracy (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">RCDSU (no modules)</td>
<td valign="top" align="center">83.1125</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.5, end = 2)</td>
<td valign="top" align="center">83.4600</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.8, end = 1.5)</td>
<td valign="top" align="center">82.9900</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.5, end = 1)</td>
<td valign="top" align="center">82.8675</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 1, end = 2)</td>
<td valign="top" align="center">82.8000</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.8, end = 2)</td>
<td valign="top" align="center">83.1350</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.5, end = 1.5)</td>
<td valign="top" align="center">82.9650</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.4, end = 2)</td>
<td valign="top" align="center">83.2775</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.3, end = 2)</td>
<td valign="top" align="center">83.1050</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.6, end = 2)</td>
<td valign="top" align="center">83.4175</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.7, end = 2)</td>
<td valign="top" align="center"><bold>83.5250</bold></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>Bold values indicate the best result.</p>
</table-wrap-foot>
</table-wrap>
<p>To avoid semantic change, we set <italic>start</italic> to 0.7, which reduces the shift of data points close to the outer layer of the spherical distribution. To increase the diversity of the augmented feature statistics, we set <italic>end</italic> to 2, which increases the shift of data points close to the center of the spherical distribution.</p>
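<p>A minimal Python sketch of such a position-dependent coefficient is given below. It assumes that each point's coefficient is interpolated linearly between <italic>end</italic> (at the center of the batch) and <italic>start</italic> (at the point farthest from the center), with distance measured as the Euclidean distance from the batch mean and normalized by the largest such distance. The interpolation rule, the function names, and the use of channel-wise means as the perturbed statistics are hypothetical choices for illustration.</p>
<preformat preformat-type="code">
```python
import numpy as np


def distance_based_coefficients(points, start, end):
    # Hypothetical schedule: linear interpolation on the normalized distance from the
    # batch center. A point at the center gets `end`; the farthest point gets `start`.
    center = points.mean(axis=0)
    dist = np.linalg.norm(points - center, axis=1)
    r = dist / max(float(dist.max()), 1e-12)   # normalized radius in [0, 1]
    return end + (start - end) * r


def shift_statistics(stats, start, end, rng):
    # Perturb per-sample statistics (e.g. channel-wise means) with zero-mean Gaussian
    # noise whose variance is the batch variance scaled by each sample's coefficient.
    coef = distance_based_coefficients(stats, start, end)   # shape (batch,)
    batch_var = stats.var(axis=0)                           # shape (dim,)
    std = np.sqrt(coef[:, None] * batch_var[None, :])
    return stats + rng.standard_normal(stats.shape) * std


rng = np.random.default_rng(0)
channel_means = rng.normal(size=(32, 64))   # toy statistics: 32 samples, 64 channels
coef = distance_based_coefficients(channel_means, start=0.7, end=2.0)
print(coef.min(), coef.max())   # the farthest sample gets 0.7; central samples approach 2.0
augmented = shift_statistics(channel_means, 0.7, 2.0, rng)
```
</preformat>
<p>With <italic>start</italic> = 0.7 and <italic>end</italic> = 2, outer points are shifted less than under an unscaled variance, while points near the center are shifted more, which matches the intent described above.</p>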
<p>Setting <italic>start</italic> and <italic>end</italic> to 0.7 and 2 (83.5250%) works better than setting them to 0.5 and 1 (82.8675%), which we attribute to the larger coefficients, and hence greater diversity, of the former. Setting them to 0.7 and 2 also works better than setting them to 1 and 2 (82.8000%), which we attribute to semantic changes caused by the larger coefficients of the latter. The coefficients should therefore be neither too large nor too small; a balance is needed between preserving semantics and maintaining sufficient diversity in the augmented features.</p></sec>
<sec>
<title>4.4.3 Using robust statistics in RCDSU</title>
<p>We set the number of segments <italic>S</italic> to different values when RCDSU uses robust statistics. <italic>S</italic> is selected from {32, 64, 128, 196, 256, 512, 1,024}. As shown in <xref ref-type="fig" rid="F7">Figure 7</xref>, when we use RCDSU to perform the PACS multi-style classification task with ResNet18, setting the number of segments <italic>S</italic> to 512 works best.</p>
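<p>A minimal Python sketch of segment-based robust statistics is given below. It assumes that the values are split into <italic>S</italic> contiguous segments of nearly equal size, that the median of each segment is taken, and that the mean and variance of these medians replace the plain mean and variance. How the feature values are ordered and grouped into segments, and the function name, are assumptions made for illustration.</p>
<preformat preformat-type="code">
```python
import numpy as np


def segment_median_statistics(values, num_segments):
    # Hypothetical sketch: split the values into `num_segments` contiguous segments,
    # take the median of each segment, and return the mean and variance of the medians.
    segments = np.array_split(np.asarray(values, dtype=float), num_segments)
    medians = np.array([np.median(seg) for seg in segments])
    return medians.mean(), medians.var()


rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=4096)
x[rng.choice(x.size, size=20, replace=False)] = 500.0   # scatter a few large outliers
robust_mean, robust_var = segment_median_statistics(x, num_segments=512)
print(x.mean())        # pulled well above 1.0 by the outliers
print(robust_mean)     # stays close to 1.0
```
</preformat>
<p>With scattered outliers, the plain mean is pulled far from the true value, whereas the mean of the segment medians stays close to it, because the median of a segment is barely affected by a single extreme value in that segment.</p>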
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Setting the number of segments <italic>S</italic> for RCDSU with robust statistics.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-18-1382406-g0007.tif"/>
</fig>
<p>The number of segments should be neither too large nor too small. With 32 segments, there are too few medians for their mean and variance to approximate the true mean and variance. With 1,024 segments, the computational cost increases and, because each segment then contains fewer values, the segment medians are less effective at suppressing outliers. A balance is therefore needed between approximating the true mean and variance and avoiding the influence of outliers.</p></sec>
<sec>
<title>4.4.4 Combinations of RCDSU and FeatureDA</title>
<sec>
<title>4.4.4.1 PACS multi-style classification task</title>
<p>As shown in <xref ref-type="table" rid="T6">Table 6</xref>, when we combine RCDSU and FeatureDA to perform the PACS multi-style classification task with ResNet18, setting the number of segments <italic>S</italic> to 512 for RCDSU, <italic>start</italic> and <italic>end</italic> to 0.7 and 2 for RCDSU, and <italic>start</italic> and <italic>end</italic> to 2 and 2.5 for FeatureDA works best.</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>Different combinations of RCDSU and FeatureDA on the PACS multi-style classification task.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Different combinations</bold></th>
<th valign="top" align="center"><bold>Accuracy (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">RCDSU (start = 0.7, end = 2)</td>
<td valign="top" align="center">83.5250</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (S = 512)</td>
<td valign="top" align="center">83.3075</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 2, end = 2.5)</td>
<td valign="top" align="center">81.1475</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.7, end = 2) &#x0002B; FeatureDA (start = 2, end = 2.5)</td>
<td valign="top" align="center">83.6100</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (S = 512) &#x0002B; FeatureDA (start = 2, end = 2.5)</td>
<td valign="top" align="center">83.4100</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.7, end = 2, S = 512)</td>
<td valign="top" align="center">83.5900</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (start = 0.7, end = 2, S = 512) &#x0002B; FeatureDA (start = 2, end = 2.5)</td>
<td valign="top" align="center"><bold>83.6800</bold></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>Bold values indicate the best result.</p>
</table-wrap-foot>
</table-wrap>
<p>Considering the three modules (controlling the variance coefficient in FeatureDA, controlling the variance coefficient in RCDSU, and using robust statistics in RCDSU), each pairwise combination outperforms the single modules it contains, and the combination of all three modules outperforms every pairwise combination. This indicates that each module contributes to the effectiveness of our method.</p></sec>
<sec>
<title>4.4.4.2 CIFAR-100 image classification task</title>
<p>As shown in <xref ref-type="table" rid="T7">Table 7</xref>, when we use RCDSU alone (59.98%) or RCDSU plus FeatureDA (51.65%) to perform the CIFAR-100 image classification task with ResNet-32, accuracy is far below that of FeatureDA alone (70.04%). This indicates that RCDSU, as a style transfer module, is not suitable for the CIFAR-100 image classification task, because CIFAR-100 is a single-style dataset without large style differences between its training and testing data. RCDSU is therefore intended only for multi-style classification tasks. FeatureDA, as a content transfer module, improves the generalization ability of the model at the content level and proved effective on both the single-style and the multi-style datasets in our experiments, so it can be regarded as a data augmentation method for general classification tasks.</p>
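<p>The distinction between style-level and content-level augmentation can be illustrated with a Python sketch of DSU-style feature-statistic perturbation, which is assumed here as the underlying mechanism of RCDSU, without its robust statistics or variance-coefficient control. Each sample's channel-wise mean and standard deviation (its style) are resampled according to their variability across the batch, while the normalized spatial pattern (its content) is kept. The function name and the toy batches are hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np


def dsu_style_perturb(x, rng, eps=1e-6):
    # Hypothetical DSU-style sketch of style-level augmentation.
    # x: feature maps of shape (N, C, H, W).
    mu = x.mean(axis=(2, 3), keepdims=True)                    # per-sample channel means
    sigma = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)   # per-sample channel stds
    # Uncertainty of each statistic: its standard deviation across the batch.
    mu_unc = np.sqrt(mu.var(axis=0, keepdims=True) + eps)
    sigma_unc = np.sqrt(sigma.var(axis=0, keepdims=True) + eps)
    # Resample the statistics (style) around their observed values.
    new_mu = mu + rng.standard_normal(mu.shape) * mu_unc
    new_sigma = sigma + rng.standard_normal(sigma.shape) * sigma_unc
    # Keep the normalized spatial pattern (content) and apply the new statistics.
    return (x - mu) / sigma * new_sigma + new_mu


rng = np.random.default_rng(0)
single_style = rng.normal(size=(8, 4, 6, 6))
# Give each sample and channel its own scale and offset to mimic diverse styles.
multi_style = (single_style * rng.uniform(0.5, 3.0, size=(8, 4, 1, 1))
               + rng.normal(0.0, 2.0, size=(8, 4, 1, 1)))
for name, batch in (("single-style", single_style), ("multi-style", multi_style)):
    change = np.abs(dsu_style_perturb(batch, rng) - batch).mean()
    print(name, round(float(change), 3))   # typically larger for the multi-style batch
```
</preformat>
<p>When all samples share similar statistics, as in the single-style toy batch, the estimated uncertainty is small and the resampled statistics stay close to the originals; a batch with diverse statistics receives larger style shifts. FeatureDA, by contrast, perturbs the feature vector itself rather than its statistics.</p>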
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p>Experimental results of combining RCDSU and FeatureDA on the CIFAR-100 image classification task.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Networks</bold></th>
<th valign="top" align="center"><bold>Different combinations</bold></th>
<th valign="top" align="center"><bold>Accuracy on CIFAR-100 (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="center" rowspan="3">ResNet-32</td>
<td valign="top" align="left">RCDSU (no modules)</td>
<td valign="top" align="center">59.98</td>
</tr> <tr>
<td valign="top" align="left">FeatureDA (start = 0.4, end = 0.9)</td>
<td valign="top" align="center">70.04</td>
</tr> <tr>
<td valign="top" align="left">RCDSU (no modules) &#x0002B; FeatureDA (start = 0.4, end = 0.9)</td>
<td valign="top" align="center">51.65</td>
</tr></tbody>
</table>
</table-wrap>
</sec></sec></sec></sec>
<sec sec-type="conclusions" id="s5">
<title>5 Conclusion</title>
<p>In this paper, we proposed a brain-inspired semantic data augmentation method, consisting of RCDSU and FeatureDA, that performs style transfer and content transfer in the feature space. RCDSU used robust statistics to calculate feature statistics, improving the robustness of deep models. Based on the idea of a spherical data distribution, we controlled the variance coefficient in RCDSU and FeatureDA to preserve semantics while increasing the shift range. On the PACS multi-style classification task, RCDSU plus FeatureDA achieved competitive accuracy, and after Gaussian noise was added to the PACS dataset, it showed strong robustness against outliers. FeatureDA achieved excellent results on the CIFAR-100 image classification task. RCDSU plus FeatureDA can be applied as a novel semantic data augmentation method with implicit robot automation, suitable for multi-style datasets. Experimental results demonstrated the effectiveness of the proposed method in improving the generalization ability of the model at both the style and content levels. Because our augmentation method operates at the feature level, in future work we will design a decoder to restore features to images and generate interesting and unexpected images. In addition, our method could be applied to situations where actual scenes differ greatly from training scenes.</p></sec>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.</p></sec>
<sec sec-type="author-contributions" id="s7">
<title>Author contributions</title>
<p>WW: Conceptualization, Data curation, Writing &#x02013; original draft, Writing &#x02013; review &#x00026; editing. ZS: Funding acquisition, Software, Supervision, Validation, Visualization, Writing &#x02013; review &#x00026; editing. CL: Validation, Writing &#x02013; review &#x00026; editing.</p></sec>
</body>
<back>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.</p>
</sec>
<ack><p>The authors thank the participants for their advice and help.</p>
</ack>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer JY declared a shared affiliation with the authors to the handling editor at the time of review.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amaya</surname> <given-names>C.</given-names></name> <name><surname>Von Arnim</surname> <given-names>A.</given-names></name></person-group> (<year>2023</year>). <article-title>Neurorobotic reinforcement learning for domains with parametrical uncertainty</article-title>. <source>Front. Neurorobot</source>. <volume>17</volume>:<fpage>1239581</fpage>. <pub-id pub-id-type="doi">10.3389/fnbot.2023.1239581</pub-id><pub-id pub-id-type="pmid">37965072</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Antoniou</surname> <given-names>A.</given-names></name> <name><surname>Storkey</surname> <given-names>A.</given-names></name> <name><surname>Edwards</surname> <given-names>H.</given-names></name></person-group> (<year>2017</year>). <article-title>Data augmentation generative adversarial networks</article-title>. <source>arXiv [Preprint]</source>. arXiv:1711.04340. <pub-id pub-id-type="doi">10.48550/arXiv.1711.04340</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Balakrishnan</surname> <given-names>S.</given-names></name> <name><surname>Du</surname> <given-names>S. S.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Singh</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Computationally efficient robust sparse estimation in high dimensions,&#x0201D;</article-title> in <source>Conference on Learning Theory</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>169</fpage>&#x02013;<lpage>212</lpage>.</citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bousmalis</surname> <given-names>K.</given-names></name> <name><surname>Silberman</surname> <given-names>N.</given-names></name> <name><surname>Dohan</surname> <given-names>D.</given-names></name> <name><surname>Erhan</surname> <given-names>D.</given-names></name> <name><surname>Krishnan</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Unsupervised pixel-level domain adaptation with generative adversarial networks,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>, <fpage>3722</fpage>&#x02013;<lpage>3731</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bowles</surname> <given-names>C.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Guerrero</surname> <given-names>R.</given-names></name> <name><surname>Bentley</surname> <given-names>P.</given-names></name> <name><surname>Gunn</surname> <given-names>R.</given-names></name> <name><surname>Hammers</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>GAN augmentation: augmenting training data using generative adversarial networks</article-title>. <source>arXiv [Preprint].</source> arXiv:1810.10863. <pub-id pub-id-type="doi">10.48550/arXiv.1810.10863</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cauli</surname> <given-names>N.</given-names></name> <name><surname>Reforgiato Recupero</surname> <given-names>D.</given-names></name></person-group> (<year>2022</year>). <article-title>Survey on videos data augmentation for deep learning models</article-title>. <source>Future Internet</source> <volume>14</volume>:<fpage>93</fpage>. <pub-id pub-id-type="doi">10.3390/fi14030093</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cha</surname> <given-names>J.</given-names></name> <name><surname>Chun</surname> <given-names>S.</given-names></name> <name><surname>Lee</surname> <given-names>K.</given-names></name> <name><surname>Cho</surname> <given-names>H.-C.</given-names></name> <name><surname>Park</surname> <given-names>S.</given-names></name> <name><surname>Lee</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>&#x0201C;SWAD: domain generalization by seeking flat minima,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 34</source>, <fpage>22405</fpage>&#x02013;<lpage>22418</lpage>.</citation>
</ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>J.</given-names></name> <name><surname>Lan</surname> <given-names>Z.</given-names></name> <name><surname>Cheng</surname> <given-names>C.</given-names></name> <name><surname>Wei</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Data uncertainty learning in face recognition,&#x0201D;</article-title> in <source>2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Seattle, WA</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/CVPR42600.2020.00575</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Diakonikolas</surname> <given-names>I.</given-names></name> <name><surname>Ge</surname> <given-names>R.</given-names></name> <name><surname>Gupta</surname> <given-names>S.</given-names></name> <name><surname>Kane</surname> <given-names>D.</given-names></name> <name><surname>Soltanolkotabi</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>&#x0201C;Outlier-robust sparse estimation via non-convex optimization,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 35</source>, <fpage>7318</fpage>&#x02013;<lpage>7327</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Diakonikolas</surname> <given-names>I.</given-names></name> <name><surname>Ge</surname> <given-names>R.</given-names></name> <name><surname>Woodruff</surname> <given-names>D. P.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Faster algorithms for high-dimensional robust covariance estimation,&#x0201D;</article-title> in <source>Conference on Learning Theory</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>727</fpage>&#x02013;<lpage>757</lpage>. <pub-id pub-id-type="doi">10.1137/1.9781611975482.171</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Diakonikolas</surname> <given-names>I.</given-names></name> <name><surname>Kane</surname> <given-names>D.</given-names></name> <name><surname>Stewart</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Robust learning of fixed-structure Bayesian networks,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 31</source> (<publisher-loc>Montreal, QC</publisher-loc>).</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Z.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name></person-group> (<year>2023</year>). <article-title>Promatch: semi-supervised learning with prototype consistency</article-title>. <source>Mathematics</source> <volume>11</volume>:<fpage>3537</fpage>. <pub-id pub-id-type="doi">10.3390/math11163537</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cubuk</surname> <given-names>E. D.</given-names></name> <name><surname>Zoph</surname> <given-names>B.</given-names></name> <name><surname>Mane</surname> <given-names>D.</given-names></name> <name><surname>Vasudevan</surname> <given-names>V.</given-names></name> <name><surname>Le</surname> <given-names>Q. V.</given-names></name></person-group> (<year>2018</year>). <article-title>AutoAugment: learning augmentation policies from data</article-title>. <source>arXiv [Preprint]</source>. arXiv:1805.09501. <pub-id pub-id-type="doi">10.48550/arXiv.1805.09501</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>R.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Fang</surname> <given-names>Z.</given-names></name> <name><surname>Han</surname> <given-names>B.</given-names></name> <name><surname>Tian</surname> <given-names>X.</given-names></name></person-group> (<year>2023b</year>). <article-title>Moderately distributional exploration for domain generalization</article-title>. <source>arXiv [Preprint]</source>. arXiv:2304.13976. <pub-id pub-id-type="doi">10.48550/arXiv.2304.13976</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>H.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Liao</surname> <given-names>W.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Wu</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2023a</year>). <article-title>AugGPT: leveraging ChatGPT for text data augmentation</article-title>. <source>arXiv [Preprint].</source> arXiv:2302.13007. <pub-id pub-id-type="doi">10.48550/arXiv.2302.13007</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>DeVries</surname> <given-names>T.</given-names></name> <name><surname>Taylor</surname> <given-names>G. W.</given-names></name></person-group> (<year>2017</year>). <article-title>Improved regularization of convolutional neural networks with cutout</article-title>. <source>arXiv [Preprint]</source>. arXiv:1708.04552. <pub-id pub-id-type="doi">10.48550/arXiv.1708.04552</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>DeWolf</surname> <given-names>T.</given-names></name> <name><surname>Jaworski</surname> <given-names>P.</given-names></name> <name><surname>Eliasmith</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>Nengo and low-power AI hardware for robust, embedded neurorobotics</article-title>. <source>Front. Neurorobot</source>. <volume>14</volume>:<fpage>568359</fpage>. <pub-id pub-id-type="doi">10.3389/fnbot.2020.568359</pub-id><pub-id pub-id-type="pmid">33162886</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Diakonikolas</surname> <given-names>I.</given-names></name> <name><surname>Kamath</surname> <given-names>G.</given-names></name> <name><surname>Kane</surname> <given-names>D.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Moitra</surname> <given-names>A.</given-names></name> <name><surname>Stewart</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2019a</year>). <article-title>Robust estimators in high-dimensions without the computational intractability</article-title>. <source>SIAM J. Comput</source>. <volume>48</volume>, <fpage>742</fpage>&#x02013;<lpage>864</lpage>. <pub-id pub-id-type="doi">10.1137/17M1126680</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Diakonikolas</surname> <given-names>I.</given-names></name> <name><surname>Kane</surname> <given-names>D. M.</given-names></name> <name><surname>Stewart</surname> <given-names>A.</given-names></name> <name><surname>Sun</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Outlier-robust learning of Ising models under Dobrushin&#x00027;s condition,&#x0201D;</article-title> in <source>Conference on Learning Theory</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>1645</fpage>&#x02013;<lpage>1682</lpage>.</citation>
</ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Diakonikolas</surname> <given-names>I.</given-names></name> <name><surname>Kong</surname> <given-names>W.</given-names></name> <name><surname>Stewart</surname> <given-names>A.</given-names></name></person-group> (<year>2019d</year>). <article-title>&#x0201C;Efficient algorithms and lower bounds for robust linear regression,&#x0201D;</article-title> in <source>Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms</source> (<publisher-loc>Philadelphia, PA</publisher-loc>: <publisher-name>Society for Industrial and Applied Mathematics</publisher-name>), <fpage>2745</fpage>&#x02013;<lpage>2754</lpage>. <pub-id pub-id-type="doi">10.1137/1.9781611975482.170</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Diakonikolas</surname> <given-names>I.</given-names></name> <name><surname>Kamath</surname> <given-names>G.</given-names></name> <name><surname>Kane</surname> <given-names>D.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Steinhardt</surname> <given-names>J.</given-names></name> <name><surname>Stewart</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2019b</year>). <article-title>&#x0201C;Sever: a robust meta-algorithm for stochastic optimization,&#x0201D;</article-title> in <source>International Conference on Machine Learning</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>1596</fpage>&#x02013;<lpage>1606</lpage>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Diakonikolas</surname> <given-names>I.</given-names></name> <name><surname>Kane</surname> <given-names>D.</given-names></name> <name><surname>Karmalkar</surname> <given-names>S.</given-names></name> <name><surname>Price</surname> <given-names>E.</given-names></name> <name><surname>Stewart</surname> <given-names>A.</given-names></name></person-group> (<year>2019c</year>). <article-title>&#x0201C;Outlier-robust high-dimensional sparse estimation via iterative filtering,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 32</source>.</citation>
</ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Eckert</surname> <given-names>D.</given-names></name> <name><surname>Vesal</surname> <given-names>S.</given-names></name> <name><surname>Ritschl</surname> <given-names>L.</given-names></name> <name><surname>Kappler</surname> <given-names>S.</given-names></name> <name><surname>Maier</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Deep learning-based denoising of mammographic images using physics-driven data augmentation,&#x0201D;</article-title> in <source>Bildverarbeitung f&#x000FC;r die Medizin 2020: Algorithmen-Systeme-Anwendungen. Proceedings des Workshops vom 15. bis 17. M&#x000E4;rz 2020 in Berlin</source> (<publisher-loc>Wiesbaden</publisher-loc>: <publisher-name>Springer Fachmedien Wiesbaden</publisher-name>), <fpage>94</fpage>&#x02013;<lpage>100</lpage>.</citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fang</surname> <given-names>T.</given-names></name> <name><surname>Zhou</surname> <given-names>W.</given-names></name> <name><surname>Liu</surname> <given-names>F.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Song</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>On-the-fly denoising for data augmentation in natural language understanding</article-title>. <source>arXiv [Preprint]</source>. arXiv:2212.10558. <pub-id pub-id-type="doi">10.48550/arXiv.2212.10558</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Feldotto</surname> <given-names>B.</given-names></name> <name><surname>Soare</surname> <given-names>C.</given-names></name> <name><surname>Knoll</surname> <given-names>A.</given-names></name> <name><surname>Sriya</surname> <given-names>P.</given-names></name> <name><surname>Astill</surname> <given-names>S.</given-names></name> <name><surname>de Kamps</surname> <given-names>S.</given-names></name></person-group> (<year>2022</year>). <article-title>Evaluating muscle synergies with EMG data and physics simulation in the neurorobotics platform</article-title>. <source>Front. Neurorobot</source>. <volume>16</volume>:<fpage>856797</fpage>. <pub-id pub-id-type="doi">10.3389/fnbot.2022.856797</pub-id><pub-id pub-id-type="pmid">35903555</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gal</surname> <given-names>Y.</given-names></name> <name><surname>Ghahramani</surname> <given-names>Z.</given-names></name></person-group> (<year>2015</year>). <article-title>Bayesian convolutional neural networks with Bernoulli approximate variational inference</article-title>. <source>arXiv [Preprint]</source>. arXiv:1506.02158. <pub-id pub-id-type="doi">10.48550/arXiv.1506.02158</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gal</surname> <given-names>Y.</given-names></name> <name><surname>Ghahramani</surname> <given-names>Z.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Dropout as a Bayesian approximation: representing model uncertainty in deep learning,&#x0201D;</article-title> in <source>International Conference on Machine Learning</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>1050</fpage>&#x02013;<lpage>1059</lpage>.</citation>
</ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gorpincenko</surname> <given-names>A.</given-names></name> <name><surname>Mackiewicz</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Extending temporal data augmentation for video action recognition,&#x0201D;</article-title> in <source>International Conference on Image and Vision Computing New Zealand</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer Nature Switzerland</publisher-name>), <fpage>104</fpage>&#x02013;<lpage>118</lpage>.</citation>
</ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Deep residual learning for image recognition,&#x0201D;</article-title> in <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Las Vegas, NV</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/CVPR.2016.90</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>G.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Pleiss</surname> <given-names>G.</given-names></name> <name><surname>Maaten</surname> <given-names>L.</given-names></name> <name><surname>Weinberger</surname> <given-names>K. Q.</given-names></name></person-group> (<year>2019</year>). <article-title>Convolutional networks with dense connectivity</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>44</volume>, <fpage>8704</fpage>&#x02013;<lpage>8716</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2019.2918284</pub-id><pub-id pub-id-type="pmid">31135351</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>Z.</given-names></name> <name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Xing</surname> <given-names>E. P.</given-names></name> <name><surname>Huang</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Self-challenging improves cross-domain generalization,&#x0201D;</article-title> in <source>Computer Vision &#x02013; ECCV 2020: 16th European Conference, Proceedings, Vol. 12347</source>, eds <person-group person-group-type="editor"><name><surname>Vedaldi</surname> <given-names>A.</given-names></name> <name><surname>Bischof</surname> <given-names>H.</given-names></name> <name><surname>Brox</surname> <given-names>T.</given-names></name> <name><surname>Frahm</surname> <given-names>J.-M.</given-names></name></person-group> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>124</fpage>&#x02013;<lpage>140</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-58536-5_8</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jaderberg</surname> <given-names>M.</given-names></name> <name><surname>Simonyan</surname> <given-names>K.</given-names></name> <name><surname>Vedaldi</surname> <given-names>A.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>Reading text in the wild with convolutional neural networks</article-title>. <source>Int. J. Comput. Vis</source>. <volume>116</volume>, <fpage>1</fpage>&#x02013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-015-0823-z</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jeon</surname> <given-names>H.</given-names></name> <name><surname>Ko</surname> <given-names>H. K.</given-names></name> <name><surname>Lee</surname> <given-names>S.</given-names></name> <name><surname>Jo</surname> <given-names>J.</given-names></name> <name><surname>Seo</surname> <given-names>J.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Uniform manifold approximation with two-phase optimization,&#x0201D;</article-title> in <source>2022 IEEE Visualization and Visual Analytics (VIS)</source> (<publisher-loc>Oklahoma City</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>80</fpage>&#x02013;<lpage>84</lpage>.</citation>
</ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kendall</surname> <given-names>A.</given-names></name> <name><surname>Gal</surname> <given-names>Y.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;What uncertainties do we need in Bayesian deep learning for computer vision?&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 30</source> (<publisher-loc>Long Beach, CA</publisher-loc>).</citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>T.</given-names></name> <name><surname>Kim</surname> <given-names>J.</given-names></name> <name><surname>Shim</surname> <given-names>M.</given-names></name> <name><surname>Yun</surname> <given-names>S.</given-names></name> <name><surname>Kang</surname> <given-names>M.</given-names></name> <name><surname>Wee</surname> <given-names>D.</given-names></name> <name><surname>Lee</surname> <given-names>S.</given-names></name></person-group> (<year>2022</year>). <article-title>Exploring temporally dynamic data augmentation for video recognition</article-title>. <source>arXiv [Preprint]</source>. arXiv:2206.15015. <pub-id pub-id-type="doi">10.48550/arXiv.2206.15015</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Klivans</surname> <given-names>A.</given-names></name> <name><surname>Kothari</surname> <given-names>P. K.</given-names></name> <name><surname>Meka</surname> <given-names>R.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Efficient algorithms for outlier-robust regression,&#x0201D;</article-title> in <source>Conference on Learning Theory</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>1420</fpage>&#x02013;<lpage>1430</lpage>.</citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name></person-group> (<year>2017</year>). <article-title>ImageNet classification with deep convolutional neural networks</article-title>. <source>Commun. ACM</source> <volume>60</volume>, <fpage>84</fpage>&#x02013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1145/3065386</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name></person-group> (<year>2009</year>). <source>Learning Multiple Layers of Features from Tiny Images</source>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Dai</surname> <given-names>Y.</given-names></name> <name><surname>Ge</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Shan</surname> <given-names>Y.</given-names></name> <name><surname>Duan</surname> <given-names>L. Y.</given-names></name></person-group> (<year>2022</year>). <article-title>Uncertainty modeling for out-of-distribution generalization</article-title>. <source>arXiv [Preprint]</source>. arXiv:2202.03958. <pub-id pub-id-type="doi">10.48550/arXiv.2202.03958</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Wu</surname> <given-names>C.-H.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>Q.</given-names></name> <name><surname>Zhou</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>&#x0201C;Learning raw image denoising with Bayer pattern unification and Bayer preserving augmentation,&#x0201D;</article-title> in <source>2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</source> (<publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/CVPRW.2019.00259</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Q.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Zhao</surname> <given-names>W.</given-names></name></person-group> (<year>2023</year>). <article-title>Attentive neighborhood feature augmentation for semi-supervised learning</article-title>. <source>Intell. Autom. Soft Comput</source>. <volume>37</volume>, <fpage>1753</fpage>&#x02013;<lpage>1771</lpage>. <pub-id pub-id-type="doi">10.32604/iasc.2023.039600</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>J.</given-names></name> <name><surname>Lei</surname> <given-names>W.</given-names></name> <name><surname>Hou</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Ren</surname> <given-names>Q.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>GPR B-scan image denoising via multi-scale convolutional autoencoder with data augmentation</article-title>. <source>Electronics</source> <volume>10</volume>:<fpage>1269</fpage>. <pub-id pub-id-type="doi">10.3390/electronics10111269</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Maronna</surname> <given-names>R. A.</given-names></name> <name><surname>Martin</surname> <given-names>R. D.</given-names></name> <name><surname>Yohai</surname> <given-names>V. J.</given-names></name> <name><surname>Salibi&#x000E1;n-Barrera</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <source>Robust Statistics: Theory and Methods (with R)</source>. <publisher-loc>Hoboken, NJ</publisher-loc>: <publisher-name>John Wiley &#x00026; Sons</publisher-name>. <pub-id pub-id-type="doi">10.1002/9781119214656</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nuriel</surname> <given-names>O.</given-names></name> <name><surname>Benaim</surname> <given-names>S.</given-names></name> <name><surname>Wolf</surname> <given-names>L.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Permuted AdaIN: reducing the bias towards global statistics in image classification,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>, <fpage>9482</fpage>&#x02013;<lpage>9491</lpage>.</citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pensia</surname> <given-names>A.</given-names></name> <name><surname>Jog</surname> <given-names>V.</given-names></name> <name><surname>Loh</surname> <given-names>P.-L.</given-names></name></person-group> (<year>2020</year>). <article-title>Robust regression with covariate filtering: heavy tails and adversarial contamination</article-title>. <source>arXiv [Preprint]</source>. arXiv:2009.12976. <pub-id pub-id-type="doi">10.48550/arXiv.2009.12976</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prasad</surname> <given-names>A.</given-names></name> <name><surname>Suggala</surname> <given-names>A. S.</given-names></name> <name><surname>Balakrishnan</surname> <given-names>S.</given-names></name> <name><surname>Ravikumar</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>Robust estimation via robust gradient estimation</article-title>. <source>J. R. Stat. Soc. B: Stat. Methodol</source>. <volume>82</volume>, <fpage>601</fpage>&#x02013;<lpage>627</lpage>. <pub-id pub-id-type="doi">10.1111/rssb.12364</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qiu</surname> <given-names>Z.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Ma</surname> <given-names>H.</given-names></name> <name><surname>Hou</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2023</year>). <article-title>Instance reweighting adversarial training based on confused label</article-title>. <source>Intell. Autom. Soft Comput</source>. <volume>37</volume>, <fpage>1243</fpage>&#x02013;<lpage>1256</lpage>. <pub-id pub-id-type="doi">10.32604/iasc.2023.038241</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ratner</surname> <given-names>A. J.</given-names></name> <name><surname>Ehrenberg</surname> <given-names>H.</given-names></name> <name><surname>Hussain</surname> <given-names>Z.</given-names></name> <name><surname>Dunnmon</surname> <given-names>J.</given-names></name> <name><surname>R&#x000E9;</surname> <given-names>C.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Learning to compose domain-specific transformations for data augmentation,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 30</source> (<publisher-loc>Long Beach, CA</publisher-loc>).<pub-id pub-id-type="pmid">29375240</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rousseeuw</surname> <given-names>P. J.</given-names></name> <name><surname>Hubert</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <article-title>Robust statistics for outlier detection</article-title>. <source>Wiley Interdiscip. Rev.: Data Min. Knowl. Discov</source>. <volume>1</volume>, <fpage>73</fpage>&#x02013;<lpage>79</lpage>. <pub-id pub-id-type="doi">10.1002/widm.2</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>Y.</given-names></name> <name><surname>Jain</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Probabilistic face embeddings,&#x0201D;</article-title> in <source>2019 IEEE/CVF International Conference on Computer Vision (ICCV)</source> (<publisher-loc>Seoul</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/ICCV.2019.00700</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simonyan</surname> <given-names>K.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Very deep convolutional networks for large-scale image recognition</article-title>. <source>arXiv [Preprint]</source>. arXiv:1409.1556. <pub-id pub-id-type="doi">10.48550/arXiv.1409.1556</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Srivastava</surname> <given-names>R. K.</given-names></name> <name><surname>Greff</surname> <given-names>K.</given-names></name> <name><surname>Schmidhuber</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Training very deep networks,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 28</source>.</citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Kuang</surname> <given-names>X.</given-names></name> <name><surname>Tan</surname> <given-names>Y.-A.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>The security of machine learning in an adversarial setting: a survey</article-title>. <source>J. Parallel Distributed Comput</source>. <volume>130</volume>, <fpage>12</fpage>&#x02013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1016/j.jpdc.2019.03.003</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>G.</given-names></name> <name><surname>Song</surname> <given-names>S.</given-names></name> <name><surname>Pan</surname> <given-names>X.</given-names></name> <name><surname>Xia</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Regularizing deep networks with semantic data augmentation</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>44</volume>, <fpage>3733</fpage>&#x02013;<lpage>3748</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2021.3052951</pub-id><pub-id pub-id-type="pmid">33476265</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wei</surname> <given-names>J.</given-names></name> <name><surname>Zou</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;EDA: easy data augmentation techniques for boosting performance on text classification tasks,&#x0201D;</article-title> in <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source> (<publisher-loc>Hong Kong</publisher-loc>). <pub-id pub-id-type="doi">10.18653/v1/D19-1670</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>X.</given-names></name> <name><surname>Gao</surname> <given-names>C.</given-names></name> <name><surname>Lin</surname> <given-names>M.</given-names></name> <name><surname>Zang</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Hu</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Text smoothing: enhance various data augmentation methods on text classification tasks</article-title>. <source>arXiv [Preprint]</source>. arXiv:2202.13840. <pub-id pub-id-type="doi">10.48550/arXiv.2202.13840</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yao</surname> <given-names>X.</given-names></name> <name><surname>Bai</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Sun</surname> <given-names>Q.</given-names></name> <name><surname>Chen</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>&#x0201C;PCL: proxy-based contrastive learning for domain generalization,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>New Orleans, LA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>7097</fpage>&#x02013;<lpage>7107</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR52688.2022.00696</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>T.</given-names></name> <name><surname>Li</surname> <given-names>D.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Hospedales</surname> <given-names>T.</given-names></name> <name><surname>Xiang</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Robust person re-identification by modelling feature uncertainty,&#x0201D;</article-title> in <source>2019 IEEE/CVF International Conference on Computer Vision (ICCV)</source> (<publisher-loc>Seoul</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/ICCV.2019.00064</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zendrikov</surname> <given-names>D.</given-names></name> <name><surname>Solinas</surname> <given-names>S.</given-names></name> <name><surname>Indiveri</surname> <given-names>G.</given-names></name></person-group> (<year>2023</year>). <article-title>Brain-inspired methods for achieving robust computation in heterogeneous mixed-signal neuromorphic processing systems</article-title>. <source>Neuromorphic Comput. Eng</source>. <volume>3</volume>:<fpage>034002</fpage>.</citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhong</surname> <given-names>Z.</given-names></name> <name><surname>Zheng</surname> <given-names>L.</given-names></name> <name><surname>Kang</surname> <given-names>G.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Random erasing data augmentation</article-title>. <source>Proc. AAAI Conf. Artif. Intell</source>. <volume>34</volume>, <fpage>13001</fpage>&#x02013;<lpage>13008</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v34i07.7000</pub-id></citation>
</ref>
<ref id="B61">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>K.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Hospedales</surname> <given-names>T.</given-names></name> <name><surname>Xiang</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Learning to generate novel domains for domain generalization,&#x0201D;</article-title> in <source>Computer Vision &#x02013; ECCV 2020: 16th European Conference, Glasgow, UK, August 23&#x02013;28, 2020, Proceedings, Part XVI</source> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>561</fpage>&#x02013;<lpage>578</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-58517-4_33</pub-id></citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>K.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Qiao</surname> <given-names>Y.</given-names></name> <name><surname>Xiang</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>Domain generalization with mixstyle</article-title>. <source>arXiv [Preprint]</source>. arXiv:2104.02008. <pub-id pub-id-type="doi">10.48550/arXiv.2104.02008</pub-id></citation>
</ref>
</ref-list>
</back>
</article> 