<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Neurosci.</journal-id>
<journal-title>Frontiers in Computational Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5188</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fncom.2021.773147</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Interpretation of Frequency Channel-Based CNN on Depression Identification</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Ke</surname> <given-names>Hengjin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1291733/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Cai</surname> <given-names>Cang</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x0002A;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Fengqin</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Hu</surname> <given-names>Fang</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Tang</surname> <given-names>Jiawei</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Shi</surname> <given-names>Yuxin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Computer School, Hubei Polytechnic University</institution>, <addr-line>Huangshi</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Faculty of Artificial Intelligence Education, Central China Normal University</institution>, <addr-line>Wuhan</addr-line>, <country>China</country></aff>
<aff id="aff3"><sup>3</sup><institution>College of Physics and Electronics Science, Hubei Normal University</institution>, <addr-line>Huangshi</addr-line>, <country>China</country></aff>
<aff id="aff4"><sup>4</sup><institution>Department of Clinical Laboratory, Huangshi Central Hospital, Edong Healthcare Group (Affiliated Hospital of Hubei Polytechnic University)</institution>, <addr-line>Huangshi</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Valeri Makarov, Complutense University of Madrid, Spain</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Shijie Zhao, Northwestern Polytechnical University, China; Rajesh Kumar Tripathy, Birla Institute of Technology and Science, India</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Hengjin Ke <email>hengjin.ke&#x00040;whu.edu.cn</email></corresp>
<corresp id="c002">Cang Cai <email>ccai&#x00040;mail.ccnu.edu.cn</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>27</day>
<month>12</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>15</volume>
<elocation-id>773147</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>09</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>15</day>
<month>11</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2021 Ke, Cai, Wang, Hu, Tang and Shi.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Ke, Cai, Wang, Hu, Tang and Shi</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract><p>Online end-to-end electroencephalogram (EEG) classification with high performance can assess the brain status of patients with Major Depressive Disorder (MDD) and track its progression in a timely manner, minimizing the risk of danger and suicide. However, it remains a grand research challenge due to (1) the intensive embedded noise and the intrinsic non-stationarity determined by the evolution of brain states, and (2) the lack of effective decoupling of the complex relationship between the neural network and the brain state during episodes of brain disease. This study designs a Frequency Channel-based convolutional neural network (CNN), namely FCCNN, to identify depression accurately and quickly. It fuses the brain rhythm into the attention mechanism of the classifier so as to focus on the most important parts of the data and improve classification performance. Furthermore, to understand the complexity of the classifier, this study proposes a method for calculating information entropy based on the affinity propagation (AP) clustering partition, which measures the complexity of the classifier acting on each channel or brain region. We perform depression-evaluation experiments to distinguish healthy subjects from MDD subjects. The results show that the proposed solution identifies MDD with an accuracy of 99&#x000B1;0.08%, a sensitivity of 99.07&#x000B1;0.05%, and a specificity of 98.90&#x000B1;0.14%. Furthermore, experiments on the quantitative interpretation of FCCNN illustrate significant differences between depression patients and the healthy control group in the frontal, left temporal, and right temporal lobes.</p></abstract>
<kwd-group>
<kwd>convolutional neural network (CNN)</kwd>
<kwd>interpretation</kwd>
<kwd>depression</kwd>
<kwd>EEG classification</kwd>
<kwd>attention</kwd>
</kwd-group>
<counts>
<fig-count count="11"/>
<table-count count="3"/>
<equation-count count="7"/>
<ref-count count="24"/>
<page-count count="10"/>
<word-count count="6134"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>According to a WHO report, more than 350 million people worldwide suffer from depression. The report points out that the suicide rate among people with depression is about 4.0&#x02013;10.6%. About twenty hundred thousand people commit suicide due to depression every year. As a result, depression has become the second leading cause of death among people aged 15&#x02013;29. To this end, online end-to-end electroencephalogram (EEG) classification has gained increasing attention for its capability to monitor and evaluate the status of brain disorders remotely. That is, accurate evaluation of the brain state and timely tracking of its development can minimize the risk of danger and suicide.</p>
<p>Electroencephalogram classification has long been an important topic in neuroscience research and clinical practice. Most traditional work relies on feature extraction, which can reduce dimensionality and isolate the signals of interest (Wiatowski and B&#x000F6;lcskei, <xref ref-type="bibr" rid="B23">2018</xref>). However, in most cases the extracted features are closely tied to individual subjects, so such reductions remain feasible mainly in theory and require expensive manual processing (Myers et al., <xref ref-type="bibr" rid="B17">2016</xref>). Among the feature extraction methods, sparse non-negative matrix factorization achieved an accuracy of 87.4%, which is higher than that of non-negative matrix factorization, independent component analysis, principal component analysis, and wavelet transform (Lu and Yin, <xref ref-type="bibr" rid="B15">2015</xref>). Time-frequency analysis, the dominant method of EEG feature extraction, achieved an accuracy of 87.5% (Mumtaz et al., <xref ref-type="bibr" rid="B16">2017</xref>). Thus, traditional feature extraction methods require expensive computation, while the performance improvement falls short of expectations.</p>
<p>With the rapid development of machine learning methods, we summarize the most notable work below. Mumtaz et al. (<xref ref-type="bibr" rid="B16">2017</xref>) proposed a machine learning method that classifies features extracted from EEG signals by wavelet transform and achieves high performance. To effectively identify the heterogeneous lesions of major depression, a spectral-spatial feature extraction method was proposed, which achieved an average accuracy of 81.23% (ShihCheng et al., <xref ref-type="bibr" rid="B18">2017</xref>). A deep convolutional neural network (CNN) was developed that achieved a high Area Under Curve (AUC) of 0.917 in classifying EEG recordings (van Leeuwen et al., <xref ref-type="bibr" rid="B20">2019</xref>). Gemein et al. (<xref ref-type="bibr" rid="B4">2020</xref>) applied a temporal convolutional network to classify pathological and non-pathological recordings from the Temple University Hospital Abnormal EEG Corpus (v2.0.0) and obtained an accuracy of 86%. Recurrent neural networks (RNNs) show great potential for analyzing time-series data such as functional MRI (fMRI) and EEG data. Recently, a deep sparse RNN model (Wang et al., <xref ref-type="bibr" rid="B22">2019</xref>) was proposed to accurately recognize brain states across the whole scan session and achieved superior classification performance.</p>
<p>Recently, the attention mechanism (Vaswani et al., <xref ref-type="bibr" rid="B21">2017</xref>) has been widely used in various deep learning tasks such as Natural Language Processing (NLP), image recognition, and speech recognition. Its main idea is to focus on the local information of interest while suppressing other, less useful information. The understanding of brain disorders often relies on the intrinsic brain rhythm of neural signals (Fitzgerald and Watson, <xref ref-type="bibr" rid="B2">2018</xref>; Logan and McClung, <xref ref-type="bibr" rid="B14">2019</xref>). Therefore, combining the brain rhythm with the attention mechanism of the model can help improve classification performance, because the model can focus on the most important parts of the target by assigning different weights to the frequency fluctuations.</p>
<p>Moreover, neural networks play a vital role in Artificial Intelligence (AI), yet they are black-box function approximators with limited interpretability (Li et al., <xref ref-type="bibr" rid="B13">2019</xref>). It is therefore a considerable problem to judge and explain whether a neural network makes its predictions for the right reasons. An explainable AI system can help to (1) make suitable decisions, (2) improve the design of the model, (3) make significant discoveries, and (4) deepen the trust in AI. As a typical example, a system for classifying depression is reasonable when the neural network reaches the correct classification by identifying the key features in the brain. Conversely, if the network does not analyze the key features but instead bases its decisions on peripheral factors, or even on noise or interference, it produces a high false-positive rate and cannot meet medical requirements. For this reason, it is necessary to open the black box by measuring the complex relationship between the model and the key features of the brain with respect to channels (brain regions).</p>
<p>To this end, inspired by attention mechanisms (Vaswani et al., <xref ref-type="bibr" rid="B21">2017</xref>) and time-frequency analysis, we propose a Frequency Channel-based CNN (FCCNN) to identify depression accurately and quickly. It combines the brain rhythm with the attention mechanism of the classifier in order to focus on the features of interest. First, a frequency attention structure is constructed to discover features of interest in terms of frequency. The FCCNN then uses a lightweight CNN to predict the labels quickly. Moreover, to interpret the FCCNN, the activation maximization (Hinton et al., <xref ref-type="bibr" rid="B7">2006</xref>) is quantified by information entropy based on the affinity propagation (AP) clustering partition. The main contributions of this study are summarized below:</p>
<list list-type="order">
<list-item><p>A frequency attention structure is proposed. With this structure, classifiers can combine the brain rhythm with the attention mechanism of the model and discover features of interest in terms of frequency. It improves accuracy especially for tensors that contain complex low-frequency fluctuations.</p></list-item>
<list-item><p>The information entropy based on the AP clustering partition is calculated to measure the activation maximization of FCCNN. It learns the data distribution rather than simply assuming that the data obey a uniform distribution. The lower mean entropy values in the left temporal, right temporal, and frontal regions indicate that significant differences exist in these brain regions, which is consistent with a previous study (Mumtaz et al., <xref ref-type="bibr" rid="B16">2017</xref>).</p></list-item>
<list-item><p>A complete solution has been developed to identify Major Depressive Disorder (MDD) subjects. The performance of this solution is substantially higher than that of state-of-the-art methods.</p></list-item>
</list>
</sec>
<sec sec-type="methods" id="s2">
<title>2. Methodology</title>
<p>This section details the design and operation of the classifier (see section 2.1) and the interpretation of the classifier on depression identification (see section 2.2).</p>
<sec>
<title>2.1. Design and Operation of Classifier</title>
<p>Electroencephalogram identification in this study is a binary classification problem: deciding whether an EEG segment belongs to the Depression class (label: 1) or the Healthy class (label: 0). A multivariate series (a matrix) <italic>X</italic><sup><italic>m</italic> &#x000D7; <italic>n</italic></sup> (20 &#x000D7; 1024 in this study) is reshaped into a 3D tensor <italic>T</italic><sup><italic>m</italic> &#x000D7; <italic>a</italic> &#x000D7; <italic>b</italic></sup> (20 &#x000D7; 32 &#x000D7; 32 in this study) as the input of the FCCNN.</p>
<p><xref ref-type="fig" rid="F1">Figure 1</xref> illustrates the architecture of FCCNN. The main design strategy of the classifier is to use as few network layers as possible without reducing the classification performance. The classifier first applies an attention block to the input EEG segment (reshaped to 20 &#x000D7; 32 &#x000D7; 32). The attention block is followed by one dropout layer, two convolutional layers, one flatten layer, and three fully connected layers. The hyper-parameters of the FCCNN are tuned by our previous grouping Bayesian optimization algorithm (Ke et al., <xref ref-type="bibr" rid="B10">2020b</xref>) and are also shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. The hyper-parameters of a convolutional layer are denoted as &#x0201C;filters &#x00040; [receptive map size].&#x0201D; The activation function of all fully connected (FC) layers is &#x0201C;sigmoid,&#x0201D; and that of the convolutional layers is &#x0201C;ReLU.&#x0201D; The final &#x0201C;sigmoid&#x0201D; of FCCNN outputs the classification label of a specific segment. The main design principles are as follows:</p>
<list list-type="bullet">
<list-item><p>The attention block focuses on the most important parts of the target by assigning different weights according to the frequency fluctuations. It first extracts the frequency components of each channel with the FFT algorithm and then calculates the average power of the frequency components. The power values of all channels are normalized to (0.1, 1) and then applied to the amplitude of each channel as weights.</p></list-item>
<list-item><p>FCCNN accepts EEG segments from different channels to extract the spatial features of EEG.</p></list-item>
<list-item><p>The fully connected layers play the role of &#x0201C;classifier&#x0201D;: they classify the state of the segment by mapping the features learned by the preceding convolutional layers to the sample label space.</p></list-item>
</list>
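<p>The attention block described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function name <italic>frequency_attention</italic>, the use of NumPy, the min-max form of the normalization, and the random test segment are all assumptions made for the example.</p>
<preformat><![CDATA[
```python
import numpy as np

def frequency_attention(segment, low=0.1, high=1.0):
    """Weight each EEG channel by its normalized average spectral power.

    segment: array of shape (channels, samples), e.g. (20, 1024).
    Returns the reweighted segment reshaped to (channels, side, side)
    together with the per-channel weights.
    """
    spectrum = np.fft.rfft(segment, axis=1)           # frequency components per channel
    power = np.mean(np.abs(spectrum) ** 2, axis=1)    # average power per channel
    span = power.max() - power.min()
    if span == 0:
        weights = np.full_like(power, high)           # identical channels: equal weights
    else:
        # Assumed min-max normalization of the channel powers into [low, high].
        weights = low + (high - low) * (power - power.min()) / span
    weighted = segment * weights[:, None]             # scale each channel's amplitude
    side = int(np.sqrt(segment.shape[1]))             # 1024 samples -> 32 x 32
    return weighted.reshape(segment.shape[0], side, side), weights

# Synthetic segment (illustration only): 20 channels with increasing amplitude.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((20, 1024)) * np.linspace(0.5, 2.0, 20)[:, None]
tensor, weights = frequency_attention(eeg)
print(tensor.shape, weights.min(), weights.max())
```
]]></preformat>
<p>In this sketch the channel with the lowest average power keeps 10% of its amplitude and the channel with the highest power is left unchanged, so no channel is suppressed completely.</p>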
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Neural Architecture of FCCNN. &#x0201C;FC&#x0201D; denotes the fully connected layer, &#x0201C;AF&#x0201D; denotes the activation function. The hyper-parameter of convolutional layer is denoted as &#x0201C;filters &#x00040; [receptive map size]&#x0201D;.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0001.tif"/>
</fig>
<p>We use the momentum SGD algorithm with a learning rate of 0.01 to optimize the FCCNN via the backpropagation algorithm (Krizhevsky et al., <xref ref-type="bibr" rid="B11">2012</xref>). This study uses Nesterov momentum with a small decay factor (decay = 1e-4, momentum = 0.9, nesterov = True) to reduce the residual error (Krizhevsky et al., <xref ref-type="bibr" rid="B11">2012</xref>). The initialization strategy follows the setting of He et al. (<xref ref-type="bibr" rid="B5">2015</xref>), and training uses a batch size of 80 and 83 epochs. After training, the model reports its performance on the test set (or on new EEG segments).</p>
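<p>To make the optimizer settings concrete, the following sketch applies Nesterov-momentum SGD to a toy one-dimensional objective. It assumes that <italic>decay</italic> denotes a time-based learning-rate decay, as in older Keras-style SGD optimizers; the source does not state which framework was used, and the function name and toy objective are hypothetical.</p>
<preformat><![CDATA[
```python
def nesterov_sgd(grad, w0, lr=0.01, momentum=0.9, decay=1e-4, steps=500):
    """Minimize a scalar function with Nesterov-momentum SGD.

    Assumed update rule (time-based learning-rate decay):
        lr_t = lr / (1 + decay * t)
        v    = momentum * v - lr_t * g
        w    = w + momentum * v - lr_t * g
    """
    w, v = w0, 0.0
    for t in range(steps):
        g = grad(w)
        lr_t = lr / (1.0 + decay * t)      # learning rate shrinks slowly over time
        v = momentum * v - lr_t * g        # velocity update
        w = w + momentum * v - lr_t * g    # Nesterov look-ahead step
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimizer is w = 3.
w_final = nesterov_sgd(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```
]]></preformat>
<p>With decay = 1e-4 the learning rate changes only slightly over a few hundred steps, so the momentum term dominates the convergence behavior in this toy setting.</p>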
</sec>
<sec>
<title>2.2. Interpretation of the Classifier Based on AP Clustering</title>
<p>This subsection mainly discusses the activation maximization (see section 2.2.1) of the input layer. Feature visualization (Zeiler and Fergus, <xref ref-type="bibr" rid="B24">2014</xref>) of neurons provides a global view of the network. However, the network rarely uses neurons in isolation, and the understanding gained from visualization alone remains subjective. To verify the rationality of the model and make the interpretation more objective, the information entropy of the input based on AP clustering (see section 2.2.3) is then measured.</p>
<sec>
<title>2.2.1. Activation Maximization of the Classifier</title>
<p>Activation maximization finds the input pattern that maximizes the activation value of a given hidden-layer unit. The activation of each node in the first layer is a linear function of the input and is proportional to the filter itself. Formally,</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mo class="qopname">arg</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">max</mml:mo></mml:mrow><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>:</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>t</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mo>||</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>||</mml:mo><mml:mo>=</mml:mo><mml:mi>&#x003C1;</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mo>&#x003BB;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003B8; denotes the model parameters of FCCNN, <italic>h</italic><sub><italic>ij</italic></sub>(&#x003B8;, <italic>x</italic>) is a joint function of the input <italic>x</italic> and the model parameters &#x003B8; that gives the activation value of the <italic>i</italic>-th neuron in the <italic>j</italic>-th layer of the network, &#x003BB;(<italic>x</italic>) is the regularization term on the input, and <italic>x</italic><sup>&#x0002A;</sup> is the input that attains the maximum activation. Activation maximization is a non-convex problem in most cases because <italic>h</italic> is a general function. With gradient ascent, the problem can be solved approximately, reaching at least a local maximum. The gradient of the objective with respect to <italic>x</italic> is calculated, and <italic>x</italic> is moved along it:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mfrac><mml:mrow><mml:mo>&#x02202;</mml:mo><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mo>&#x003BB;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>When the change in <italic>x</italic> falls below a predefined threshold, the algorithm is considered to have converged. Since the input (the first layer) of the classifier is channel-based, we calculate the activation maximization of the first layer to characterize the activation pattern of the neural network. In this way, the activation maximization of the first layer yields 20 activation matrices (one per channel), matching the size of the input layer (20 &#x000D7; 32 &#x000D7; 32); each matrix represents the maximum-activation feature of one channel.</p>
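<p>The iteration can be illustrated on a toy case. The sketch below assumes a single linear unit <italic>h</italic>(<italic>x</italic>) = <italic>w</italic> &#x000B7; <italic>x</italic> with no regularization term and enforces the norm constraint of Equation (1) by rescaling after every step; the function name, step size, and weights are hypothetical and do not come from the FCCNN.</p>
<preformat><![CDATA[
```python
import math

def activation_maximization(w, rho=1.0, step=0.1, tol=1e-8, max_iter=10000):
    """Projected gradient ascent for a toy linear unit h(x) = w . x with ||x|| = rho."""
    n = len(w)
    x = [rho / math.sqrt(n)] * n                  # start on the sphere of radius rho
    for _ in range(max_iter):
        grad = w                                  # d(w . x)/dx = w for a linear unit
        y = [xi + step * gi for xi, gi in zip(x, grad)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        y = [rho * yi / norm for yi in y]         # rescale back onto ||x|| = rho
        moved = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
        x = y
        if moved < tol:                           # stop when x barely moves
            break
    return x

w = [3.0, -4.0]
x_star = activation_maximization(w, rho=2.0)
print([round(v, 4) for v in x_star])
```
]]></preformat>
<p>For a linear unit the maximizer is proportional to the weight vector itself, which matches the statement above that the first-layer activation is proportional to the filter.</p>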
</sec>
<sec>
<title>2.2.2. AP Clustering Algorithm</title>
<p>Affinity propagation (AP) clustering (Frey and Dueck, <xref ref-type="bibr" rid="B3">2007</xref>) is a clustering algorithm based on message passing between data points. The input of the AP clustering algorithm is the similarity <italic>S</italic>(<italic>i</italic>, <italic>j</italic>), <italic>i</italic>, <italic>j</italic> = 1, 2, &#x022EF;&#x02009;, <italic>N</italic>, between sample data, such as the negative squared Euclidean distance. The preference <italic>P</italic> consists of the elements on the diagonal of <italic>S</italic> and indicates how suitable each point is as a cluster center. The responsibility matrix <italic>R</italic>(<italic>i</italic>, <italic>k</italic>) and the availability matrix <italic>A</italic>(<italic>i</italic>, <italic>k</italic>) are updated alternately as follows:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable style="text-align:axis;" equalrows="false" columnlines="" columnalign="left" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mi>R</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02190;</mml:mo><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:msub><mml:mrow><mml:mo class="qopname">max</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:msup><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x000B1;</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mstyle><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>A</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>A</mml:mi><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02190;</mml:mo><mml:mo class="qopname">min</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>R</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mstyle displaystyle="true"><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:mstyle><mml:mo>&#x02209;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mstyle><mml:mo class="qopname">max</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>R</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The two updates are repeated until convergence, at which point the algorithm has identified the cluster centers.</p>
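<p>The message-passing updates of Equation (3) can be sketched in plain Python for a small one-dimensional data set. Several details below are assumptions that the text does not specify: the similarity is the negative squared distance, the diagonal of <italic>S</italic> is set to the median similarity, messages are damped by 0.5, the self-availability follows the rule given by Frey and Dueck, and a fixed number of iterations replaces a convergence test. The function name and the data are hypothetical.</p>
<preformat><![CDATA[
```python
def affinity_propagation(points, damping=0.5, iterations=200):
    """Cluster 1D points with affinity propagation.

    Returns, for every point, the index of the point chosen as its exemplar.
    """
    n = len(points)
    # Similarity: negative squared Euclidean distance (assumed).
    S = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
    off_diag = sorted(S[i][k] for i in range(n) for k in range(n) if i != k)
    preference = off_diag[len(off_diag) // 2]      # median similarity (assumed)
    for i in range(n):
        S[i][i] = preference
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # Responsibility: R(i,k) <- S(i,k) - max over k' != k of {A(i,k') + S(i,k')}
        for i in range(n):
            for k in range(n):
                best = max(A[i][kk] + S[i][kk] for kk in range(n) if kk != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        # Availability: A(i,k) <- min{0, R(k,k) + sum over i' not in {i,k} of max{0, R(i',k)}}
        # Self-availability A(k,k) is the plain sum over i' != k (Frey and Dueck).
        for i in range(n):
            for k in range(n):
                total = sum(max(0.0, R[ii][k]) for ii in range(n) if ii not in (i, k))
                new = total if i == k else min(0.0, R[k][k] + total)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point selects the exemplar k that maximizes A(i,k) + R(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

# Two well-separated groups of values (illustration only).
labels = affinity_propagation([0.0, 2.0, 4.0, 10.0, 12.0, 14.0])
print(labels)
```
]]></preformat>
<p>The nested loops keep the correspondence with Equation (3) explicit; for data of realistic size, a vectorized or library implementation would be preferable.</p>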
</sec>
<sec>
<title>2.2.3. Information Entropy Based on AP Clustering Partition</title>
<p>Information entropy describes the uncertainty and complexity of information hidden in the data:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>H</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>X</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mo class="qopname">log</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>For each data series <italic>X</italic>, we calculate the information entropy based on the AP clustering partition as follows. First, <italic>X</italic> is sorted in ascending order to accelerate the convergence of AP clustering. Second, the AP algorithm is applied to <italic>X</italic> to obtain the partitions, together with the maximum (<inline-formula><mml:math id="M5"><mml:msubsup><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>) and minimum (<inline-formula><mml:math id="M6"><mml:msubsup><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>) coordinates of each partition <italic>i</italic>. The partition center <italic>C</italic><sup><italic>i</italic></sup> and the corresponding partition radius <italic>R</italic><sup><italic>i</italic></sup> can then be calculated as follows:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mo class="qopname">max</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo>|</mml:mo><mml:msubsup><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mo class="qopname">max</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:mi>Z</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>X</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Then, we calculate the dividing point between two adjacent partitions <italic>P</italic><sup><italic>i</italic></sup> and <italic>P</italic><sup><italic>j</italic></sup> as follows:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:mo>|</mml:mo><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mi>j</mml:mi><mml:mo>|</mml:mo><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Finally, the probability of the data falling into each partition is calculated to obtain the information entropy of <italic>X</italic>. The activation matrix of each channel is obtained (see section 2.2.1) to describe the complexity between the brain state and the classifier. After each matrix is flattened into a series, the algorithm calculates its information entropy based on the AP clustering partition and projects the entropy onto a 3D scalp topographic map at the channel level. Furthermore, we visualize the average features of the brain state for each brain region according to the international 10-20 system. <xref ref-type="table" rid="T1">Table 1</xref> lists the channels assigned to each brain region.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Brain regions based on the international 10-20 electroencephalogram (EEG) system.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>ID</bold></th>
<th valign="top" align="left"><bold>Region</bold></th>
<th valign="top" align="left"><bold>Electrodes</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Frontal lobe</td>
<td valign="top" align="left">Fp1, Fp2, F3, F4</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">Left temporal</td>
<td valign="top" align="left">F7, T3, T5</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">Central</td>
<td valign="top" align="left">C3, C4, Fz, Cz, Pz</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left">Right temporal</td>
<td valign="top" align="left">F8, T4, T6</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="left">Occipital lobe</td>
<td valign="top" align="left">P3, P4, O1, O2</td>
</tr>
</tbody>
</table>
</table-wrap>
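<p>The partition statistics of Equations (5) and (6) and the resulting entropy can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: it assumes that the cluster labels have already been produced by an AP clustering run, that each probability is estimated as the fraction of points in a partition, and that the logarithm is taken to base 2. The function names are hypothetical.</p>
<preformat><![CDATA[
```python
import math
from collections import defaultdict


def partition_stats(values, labels):
    """Return (center, radius, size) per partition, following Eq. 5.

    `labels` is assumed to come from a clustering step such as AP.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    stats = []
    for members in groups.values():
        z_max, z_min = max(members), min(members)
        center = (z_max + z_min) / 2
        radius = abs(z_max - z_min) / 2
        stats.append((center, radius, len(members)))
    # Sort by center so that neighbouring entries are adjacent partitions.
    stats.sort(key=lambda item: item[0])
    return stats


def dividing_points(stats):
    """Evaluate Eq. 6 as printed for each pair of adjacent partitions."""
    return [((cj - rj) - (ci + ri)) / 2
            for (ci, ri, _), (cj, rj, _) in zip(stats, stats[1:])]


def partition_entropy(stats):
    """Shannon entropy (base 2, an assumption) of the partition sizes."""
    total = sum(size for _, _, size in stats)
    return -sum((size / total) * math.log2(size / total)
                for _, _, size in stats)
```
]]></preformat>
<p>In this sketch, a flattened activation matrix would be passed as <italic>values</italic>, and a lower returned entropy would correspond to a less complex channel.</p>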
</sec>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>3. Results</title>
<p>We conducted experiments to evaluate the performance of the proposed approach on one publicly available EEG data set of MDD (section 3.1). The experiments consisted of (1) a performance study for MDD identification (section 3.2); (2) an experiment on the interpretation of the classifier (section 3.3); and (3) an analysis of the attention block (section 3.4).</p>
<sec>
<title>3.1. Experimental Setup</title>
<sec>
<title>3.1.1. Data Description</title>
<p>All samples of 34 MDD patients and 30 healthy controls (MPHC, Mumtaz et al., <xref ref-type="bibr" rid="B16">2017</xref>) were collected at the hospital of Universiti Sains Malaysia. For the MDD group (17 men, mean age = 40.3 &#x000B1; 12.9), candidates with psychiatric symptoms, pregnant women, alcoholics, smokers, and epileptics were excluded. The healthy control group (21 men, mean age = 38.227 &#x000B1; 15.64) likewise excluded candidates with possible mental or physical illness. The EEG data were digitized at 256 samples per second and band-pass filtered from 0.1 to 70 Hz, with an additional 50 Hz notch filter to suppress power line noise. For more details, please refer to Mumtaz et al. (<xref ref-type="bibr" rid="B16">2017</xref>). Because classification based on only 64 subjects would be prone to overfitting, this study applied a time window technique to obtain enough samples. All EEG data were split into 18,442 segments (9,789 MDD and 8,653 healthy) using a time window of 1,024 samples (4 s). The whole sample space was then split into a training set and a test set; details are given in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Details of the training set and test set. HG denotes the healthy control group and MG denotes the MDD group.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Training subjects</bold></th>
<th valign="top" align="left"><bold>Training samples</bold></th>
<th valign="top" align="left"><bold>Test subjects</bold></th>
<th valign="top" align="left"><bold>Test samples</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">HG: 24; MG: 27</td>
<td valign="top" align="left">HG: 6,898; MG: 7,816</td>
<td valign="top" align="left">HG: 6; MG: 7</td>
<td valign="top" align="left">HG: 1,755; MG: 1,973</td>
</tr>
</tbody>
</table>
</table-wrap>
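<p>The time window segmentation described in this section can be sketched as follows. The sketch assumes consecutive, non-overlapping windows and drops any trailing samples that do not fill a whole window; the text does not state whether the windows overlapped, so this is an assumption, and the function name is hypothetical.</p>
<preformat><![CDATA[
```python
SAMPLING_RATE_HZ = 256   # sampling rate reported for the MPHC recordings
WINDOW_SAMPLES = 1024    # window length used in this study


def split_into_windows(signal, window=WINDOW_SAMPLES):
    """Cut a 1-D sequence into consecutive, non-overlapping windows.

    Assumption: no overlap between windows; an incomplete tail is dropped.
    """
    n_windows = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(n_windows)]


# 1,024 samples at 256 samples per second correspond to 4 s.
window_seconds = WINDOW_SAMPLES / SAMPLING_RATE_HZ
```
]]></preformat>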
</sec>
<sec>
<title>3.1.2. Baselines</title>
<p>On the same data set (MPHC EEG data), different classifiers were used to identify depression, and <xref ref-type="table" rid="T3">Table 3</xref> reports their performance indexes. Apart from MLRW (Mumtaz et al., <xref ref-type="bibr" rid="B16">2017</xref>), this study rebuilt several representative neural networks, including Resnet-16 (He et al., <xref ref-type="bibr" rid="B6">2016</xref>), GoogLeNet (Szegedy et al., <xref ref-type="bibr" rid="B19">2015</xref>), and LeNet (Lecun et al., <xref ref-type="bibr" rid="B12">1998</xref>). We also evaluated our classifier without the attention block. For all classifiers, we modified the input shape (20 &#x000D7; 32 &#x000D7; 32) and the output shape of the models, but kept the other configurations of the layers and hyper-parameters unchanged.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Comparison of different classifiers. The value in brackets represents the SD.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Approaches</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>Sensitivity</bold></th>
<th valign="top" align="center"><bold>Specificity</bold></th>
<th valign="top" align="center"><bold>Time</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>(%)</bold></th>
<th valign="top" align="center"><bold>(%)</bold></th>
<th valign="top" align="center"><bold>(%)</bold></th>
<th valign="top" align="center"><bold>(min)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">MLRW (Mumtaz et al., <xref ref-type="bibr" rid="B16">2017</xref>)</td>
<td valign="top" align="center">87.50</td>
<td valign="top" align="center">95</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">-</td>
</tr>
<tr>
<td valign="top" align="left">LeNet (Lecun et al., <xref ref-type="bibr" rid="B12">1998</xref>)</td>
<td valign="top" align="center">93.31 (6.24)</td>
<td valign="top" align="center">91.93 (4.27)</td>
<td valign="top" align="center">94.85 (1.81)</td>
<td valign="top" align="center">2.8</td>
</tr>
<tr>
<td valign="top" align="left">Resnet-16 (He et al., <xref ref-type="bibr" rid="B6">2016</xref>)</td>
<td valign="top" align="center">82.26 (7.59)</td>
<td valign="top" align="center">88.90 (2.14)</td>
<td valign="top" align="center">74.79 (3.83)</td>
<td valign="top" align="center">80</td>
</tr>
<tr>
<td valign="top" align="left">GoogLeNet (Szegedy et al., <xref ref-type="bibr" rid="B19">2015</xref>)</td>
<td valign="top" align="center">93.74 (3.65)</td>
<td valign="top" align="center">96.48 (1.23)</td>
<td valign="top" align="center">90.62 (4.62)</td>
<td valign="top" align="center">42</td>
</tr>
<tr>
<td valign="top" align="left">Ours-withoutAttention</td>
<td valign="top" align="center">96.04 (3.02)</td>
<td valign="top" align="center">97.75 (2.09)</td>
<td valign="top" align="center">94.12 (3.58)</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Ours</td>
<td valign="top" align="center">99 (0.08)</td>
<td valign="top" align="center">99.07 (0.05)</td>
<td valign="top" align="center">98.90 (0.14)</td>
<td valign="top" align="center">3.5</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>3.1.3. Training of the Classifier</title>
<p>After all samples in the training set were shuffled and split into training samples (80%) and validation samples (20%), a five-fold cross-validation strategy was adopted for hyper-parameter tuning of the classifier (a total of 20 iterations). Then, the trained model was applied to the test set, and the average performance of the classifier was reported in terms of sensitivity, specificity, and accuracy (Ke et al., <xref ref-type="bibr" rid="B8">2018</xref>).</p>
<p>Finally, we calculated the activation maximization of the input to the trained classifier to interpret the FCCNN (see section 3.3).</p>
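<p>For reference, the three performance indexes can be computed from the predicted and true labels as sketched below. The sketch uses the standard definitions and assumes that MDD is encoded as the positive class (label 1); the function name and label encoding are illustrative assumptions.</p>
<preformat><![CDATA[
```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels.

    Assumes both classes occur in y_true; otherwise a division by zero
    is raised.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy
```
]]></preformat>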
</sec>
</sec>
<sec>
<title>3.2. Performance Study on MDD Identification</title>
<p>This set of experiments evaluated the classification performance in terms of the learning curve, the receiver operating characteristic (ROC) curve, and the performance indexes of sensitivity, specificity, and accuracy.</p>
<p><xref ref-type="fig" rid="F2">Figure 2</xref> shows the learning curve of the classifier trained on the MPHC data set. Here, &#x0201C;accuracy&#x0201D; and &#x0201C;loss&#x0201D; denote the accuracy and error in the training stage, respectively; &#x0201C;val_accuracy&#x0201D; and &#x0201C;val_loss&#x0201D; indicate the accuracy and error in the validation stage, respectively. The curve was used to check the generalization ability of the classifier. During training, the accuracy on the training set and the validation set was consistent, with no obvious gap between the curves. Together with the excellent classification performance on the test set, this indicated that the classifier had a desirable generalization ability in the current case study and that neither overfitting nor underfitting occurred (Ke et al., <xref ref-type="bibr" rid="B9">2020a</xref>).</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Accuracy and loss in the training and validation processes on MPHC.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0002.tif"/>
</fig>
<p>The ROC curve is widely used in machine learning to evaluate the results of classification and detection. When the positive and negative samples are not balanced, the ROC curve (AUC value) is a more stable indicator of model quality than the Precision-Recall curve. <xref ref-type="fig" rid="F3">Figure 3</xref> illustrates the ROC curve for identifying the depression state on the MPHC data set. The high AUC (value = 1) indicated that the proposed classifier could distinguish the depression state effectively.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>ROC curve for identifying the depression state on MPHC.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0003.tif"/>
</fig>
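<p>As a brief illustration of what the AUC value measures, the sketch below computes it as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, counting ties as one half. This pairwise formulation is a standard equivalent of the area under the ROC curve; it is not necessarily how the value in Figure 3 was computed, and the function name is hypothetical.</p>
<preformat><![CDATA[
```python
def roc_auc(scores, labels, positive=1):
    """Area under the ROC curve via pairwise comparison of scores.

    Quadratic in the number of samples, so only suitable as an
    illustration on small inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```
]]></preformat>
<p>An AUC of 1 therefore means that every positive sample is scored above every negative sample.</p>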
<p><xref ref-type="table" rid="T3">Table 3</xref> showed that the classifier proposed in this study was the best in accuracy, sensitivity, and specificity. Meanwhile, the attention block improved both the performance and the stability (lower SD), and the high sensitivity and specificity illustrated that the classifier could effectively distinguish patients with depression from healthy controls.</p>
<p>Moreover, we performed a <italic>t</italic>-test for most of the approaches on sensitivity, specificity, and accuracy to evaluate their discrimination via <italic>p</italic>-values. <xref ref-type="fig" rid="F4">Figures 4</xref>&#x02013;<xref ref-type="fig" rid="F6">6</xref> illustrate the <italic>p</italic>-values for accuracy, sensitivity, and specificity, respectively. From the figures, we concluded that (1) the differences between most approaches were statistically significant for the three performance indexes (cool colors), (2) all the <italic>p</italic>-values on specificity indicated statistical significance, (3) the differences between LeNet and the other approaches on accuracy and sensitivity were less significant (hot colors), and (4) the <italic>p</italic>-values on the diagonal of each matrix are not meaningful because each approach is compared with itself.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><italic>p</italic>-value matrix on performance index of accuracy.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0004.tif"/>
</fig>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><italic>p</italic>-value matrix on performance index of sensitivity.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0005.tif"/>
</fig>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><italic>p</italic>-value matrix on performance index of specificity.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0006.tif"/>
</fig>
</sec>
<sec>
<title>3.3. Interpretation of FCCNN on Identifying MDD</title>
<p>This set of experiments aimed to explain how the FCCNN identified MDD. In a classification problem, a classifier tends to rely on the features with significant differences (less complexity). In information theory, entropy is a measure of the uncertainty of a random variable: the greater the information entropy, the greater the amount of information contained in the variable and the greater its uncertainty. In summary, classification can be viewed as a process of reducing the uncertainty (complexity) of the problem, with the aim of obtaining lower entropy.</p>
<p>Moreover, the activation maximization of the input layer of the classifier was visualized to understand how the classifier processes EEG data, because the input of the FCCNN reflects the channel-level characteristics of the EEG data. The activation maximization of the first layer yields 20 activation matrices (one per channel), matching the size of the input layer (20 &#x000D7; 32 &#x000D7; 32); each matrix represents the maximum activation feature of one channel. Then, the AP clustering partition algorithm calculated the information entropy of each activation matrix and projected it onto the scalp topographic map at the channel level. In addition, we visualized the average entropy of each brain region as partitioned in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
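<p>The region-level averaging can be sketched as follows, using the electrode grouping of Table 1. The sketch assumes that the per-channel entropies are available as a mapping from electrode name to entropy value; the function name is hypothetical, and electrodes missing from the mapping are simply skipped.</p>
<preformat><![CDATA[
```python
# Electrode grouping copied from Table 1.
REGIONS = {
    "Frontal lobe": ["Fp1", "Fp2", "F3", "F4"],
    "Left temporal": ["F7", "T3", "T5"],
    "Central": ["C3", "C4", "Fz", "Cz", "Pz"],
    "Right temporal": ["F8", "T4", "T6"],
    "Occipital lobe": ["P3", "P4", "O1", "O2"],
}


def region_mean_entropy(channel_entropy):
    """Average the channel-level entropies within each brain region.

    Regions with no available electrode are left out of the result.
    """
    means = {}
    for region, electrodes in REGIONS.items():
        present = [channel_entropy[e] for e in electrodes
                   if e in channel_entropy]
        if present:
            means[region] = sum(present) / len(present)
    return means
```
]]></preformat>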
<p><xref ref-type="fig" rid="F7">Figure 7</xref> illustrates the 3D scalp topographic maps obtained from the activation maximization of the FCCNN at the channel level (<xref ref-type="fig" rid="F7">Figure 7A</xref>) and the brain region level (<xref ref-type="fig" rid="F7">Figure 7B</xref>). In <xref ref-type="fig" rid="F7">Figure 7A</xref>, the entropies of channels Cz, P6, Fp2, F3, F4, O1, O2, and F8 were lower than those of the other channels, which indicated that the classifier distinguished depression from health mainly by extracting and correctly classifying the features hidden in these channels. In <xref ref-type="fig" rid="F7">Figure 7B</xref>, the lower mean entropy values in the left temporal, right temporal, and frontal regions indicated that significant differences existed in these brain regions. This result was consistent with the study of the data provider (Mumtaz et al., <xref ref-type="bibr" rid="B16">2017</xref>), which suggested that our model classified correctly by analyzing the key features of the depression state.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>3D scalp topographic map visualizations from FCCNN at the channel level <bold>(A)</bold> and the brain region level <bold>(B)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0007.tif"/>
</fig>
</sec>
<sec>
<title>3.4. Analysis of Attention Block</title>
<p>Two experiments analyzed the attention block. First, the average power [0, 200 Hz] of each channel was obtained by Fourier transform to evaluate the statistical difference in frequency between MG (segments: 9,789) and HG (segments: 8,653). Note that the attention used in the training stage was the average power of frequency on each channel. <xref ref-type="fig" rid="F8">Figure 8</xref> illustrates the average frequency-power representations of the two class labels (Healthy &#x00026; MDD) for a typical channel (Fz); similar results were obtained for the other channels. From the figure, we drew the following conclusions: (1) the power was concentrated in the low-frequency bands, (2) the power peaks of the HG were at 3.015 and 22.11 Hz, while that of the MG was at 7.035 Hz, and (3) the power of the MG was generally higher than that of the HG.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>The average frequency-power representations of the two class labels (Healthy &#x00026; MDD) for a typical channel (Fz).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0008.tif"/>
</fig>
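<p>The averaging of the frequency power over segments can be sketched as follows. This is only an illustration under stated assumptions: it uses a naive discrete Fourier transform for readability (in practice an FFT routine would be used), it normalizes the power by the segment length, and it assumes all segments have the same length. Neither the normalization nor the function names are taken from this study.</p>
<preformat><![CDATA[
```python
import cmath
import math


def power_spectrum(segment, fs):
    """One-sided power spectrum of a real segment via a naive DFT.

    Returns (frequencies in Hz, power values). The 1/n normalization
    is an assumption made for this sketch.
    """
    n = len(segment)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(segment))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def average_power(segments, fs):
    """Average the power spectra of equally long segments of one channel."""
    spectra = [power_spectrum(seg, fs) for seg in segments]
    freqs = spectra[0][0]
    n_seg = len(spectra)
    mean_power = [sum(p[k] for _, p in spectra) / n_seg
                  for k in range(len(freqs))]
    return freqs, mean_power
```
]]></preformat>
<p>Applying such an average separately to the MG and HG segments of one channel would give two curves of the kind compared in Figure 8.</p>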
<p><xref ref-type="fig" rid="F9">Figure 9</xref> illustrates the mean entropy values with and without the attention module. This comparison evaluated whether the classifier focused on the &#x0201C;important&#x0201D; channels of interest, especially the channels located in the left temporal, right temporal, and frontal regions (Mumtaz et al., <xref ref-type="bibr" rid="B16">2017</xref>). From <xref ref-type="fig" rid="F9">Figure 9</xref>, we drew two conclusions. First, the information entropy of almost all channels decreased, except for Pz, which meant that the attention module could greatly reduce the complexity of the classifier and improve the classification performance. The root cause might be that the classifier paid attention to and acted on the features of more channels. Second, the information entropy of channels including F8, Cz, P6, O1, and O2 was very small whether or not the classifier contained an attention module, which meant that the classifier was effective in both cases.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>The mean entropy values with and without the attention module.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0009.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>4. Discussion</title>
<p>First, this section analyzes the computational complexity of the proposed method (section 4.1). Second, the influence of data partition on the calculation of information entropy is discussed (section 4.2). Third, the influences of the neural network layers (section 4.3) and the optimizer (section 4.4) on performance are discussed in detail. Finally, the limitations and future research directions of this study are provided (section 4.5).</p>
<sec>
<title>4.1. Computational Complexity</title>
<p>Experiments were performed on the same desktop (equipped with an AMD R7 3700X CPU&#x00040;3.59GHz, an NVidia RTX 3080 10G GPU, and 16GB RAM, running 64-bit Windows 7). The classifier proposed in this study was based on a CNN sub-network and a dense sub-network. The time complexity of the CNN sub-network depended on the number of layers (<italic>L</italic>) and the corresponding number of neurons (<italic>N</italic>). Thus, the time complexity was calculated as follows:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>O</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>N</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>O</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x000B7;</mml:mo><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000B7;</mml:mo><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>n</italic><sub><italic>L</italic></sub> and <italic>n</italic><sub><italic>L</italic>&#x02212;1</sub> are, respectively, the numbers of filters (also known as &#x0201C;width&#x0201D;) in the <italic>L</italic>-th and (<italic>L</italic>&#x02212;1)-th layers, <italic>d</italic> is the overall network depth, and <italic>s</italic><sub><italic>L</italic></sub> and <italic>m</italic><sub><italic>L</italic></sub> represent the spatial sizes of the filter and the corresponding feature map, respectively.</p>
<p>For the dense sub-network, let <italic>L</italic> denote the number of layers and <italic>U</italic> denote the number of neurons in each layer; the time complexity is then <italic>O</italic>(<italic>UL</italic>). In summary, the overall complexity of the proposed approach is <italic>O</italic>(<italic>S</italic>(<italic>N, L</italic>)) &#x0002B; <italic>O</italic>(<italic>UL</italic>).</p>
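<p>As a worked illustration of Equation (7), the sketch below sums the per-layer term over a list of convolutional layers. It assumes that the width of layer 0 is the number of input channels; the layer sizes used in any call are hypothetical examples and not the configuration of the FCCNN.</p>
<preformat><![CDATA[
```python
def conv_cost(layers, in_channels):
    """Sum n_{L-1} * s_L^2 * n_L * m_L^2 over all convolutional layers.

    `layers` is a list of (n_filters, filter_size, feature_map_size)
    tuples; `in_channels` plays the role of n_0 (an assumption).
    """
    total = 0
    n_prev = in_channels
    for n_l, s_l, m_l in layers:
        total += n_prev * s_l ** 2 * n_l * m_l ** 2
        n_prev = n_l
    return total
```
]]></preformat>
<p>For example, with 20 input channels, a hypothetical first layer of 8 filters of size 3 on a 32 &#x000D7; 32 feature map contributes 20 &#x000B7; 9 &#x000B7; 8 &#x000B7; 1,024 = 1,474,560 multiply-accumulate operations.</p>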
</sec>
<sec>
<title>4.2. The Influence of Data Partition on Calculation of Information Entropy</title>
<p><xref ref-type="fig" rid="F10">Figure 10</xref> illustrates the information entropies obtained with the traditional data partition strategy and with ours. The main difference between the two strategies lies in their assumptions about the data distribution. The traditional strategy assumes that the neural data obey a uniform distribution and divides them into partitions of equal range (in <xref ref-type="fig" rid="F10">Figure 10B</xref>, the data were divided equally into six partitions). In this case, the result is close to the ground truth when enough sample points are available. With insufficient data, however, it produces a large residual, which leads to an inaccurate measurement of the uncertainty between the model and the neural data. In contrast, our strategy assumes that the neural data obey a general, adaptive distribution; that is, it learns the distribution of the data itself in a data-driven manner and makes a reasonable partition accordingly (in <xref ref-type="fig" rid="F10">Figure 10A</xref>, three partitions with different data distributions were obtained). In this case, the algorithm calculates the information entropy accurately and measures the uncertainty correctly.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>Entropy calculation between AP-based clustering partition <bold>(A)</bold> and traditional methods <bold>(B)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0010.tif"/>
</fig>
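<p>For comparison, the traditional equal-range strategy can be sketched as follows. The sketch assumes six equal-width partitions by default (as in Figure 10B), estimates each probability as the fraction of points in a partition, and uses a base-2 logarithm; the function name and the choice of base are assumptions.</p>
<preformat><![CDATA[
```python
import math


def equal_width_entropy(values, n_bins=6):
    """Entropy of data split into n_bins partitions of equal range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all points fall into a single partition
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value lands in the last partition.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)
```
]]></preformat>
<p>When the data form a few tight groups, most of the equal-width partitions stay empty, which is the situation in which a data-driven partition is expected to describe the distribution more faithfully.</p>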
</sec>
<sec>
<title>4.3. The Influence of Neural Network Layers on Performance</title>
<p>In this study, the classification performance did not improve with the number of layers of the model. For example, Resnet-16 and Capsulenet, which had more layers, did not achieve the expected performance but needed a longer training time. A likely cause was that a more complex classifier might overfit, which degraded the classification performance. Making the classifier better fit the non-linearity of different data remains a considerable challenge. Furthermore, understanding the non-linear fitting mechanism is one of the key issues in understanding the neural network black box and will be one of the key research directions in the future.</p>
</sec>
<sec>
<title>4.4. The Influence of Optimizer on Performance</title>
<p>This subsection compares different optimization methods for the classifier, including momentum SGD (used in this study), RMSprop, Adagrad, Adadelta, Adam, Adamax, and Nadam. <xref ref-type="fig" rid="F11">Figure 11</xref> illustrates that momentum SGD achieved the best performance, while three optimizers (Adagrad, Adam, and Nadam) performed poorly in this study. Adagrad adjusts the learning rate of each parameter according to the previously accumulated parameter gradients at each time step. However, its learning rate decays monotonically, so the learning ability of the model decreases rapidly. In this case, training was very likely to stall before escaping a local minimum, which resulted in poor classification performance. As an extension of Adagrad, Adadelta solves the learning-rate decay problem and improved the performance. Momentum-based methods, such as the momentum SGD used in this study, and the RMSprop optimization method could skip local optima. Adam, Adamax, and Nadam, which aim at quickly training neural networks with complex structures, failed to train a light-weight neural network such as the classifier in this study. The most likely reason was that oscillation occurred when approaching the optimization goal, which caused the performance to fall short of the requirements.</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Comparison of candidate optimization methods.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-15-773147-g0011.tif"/>
</fig>
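<p>The learning-rate decay of Adagrad and the update rule of momentum SGD discussed above can be illustrated with the following one-parameter sketch. It uses the textbook update rules with hypothetical default hyper-parameters (learning rate 0.1, momentum 0.9); these are not the settings used to train the classifier in this study.</p>
<preformat><![CDATA[
```python
import math


def adagrad_step_sizes(grads, lr=0.1, eps=1e-8):
    """Effective Adagrad step size after each gradient in `grads`.

    The squared gradients are accumulated, so the step size can only
    shrink over time.
    """
    accumulated = 0.0
    sizes = []
    for g in grads:
        accumulated += g * g
        sizes.append(lr / (math.sqrt(accumulated) + eps))
    return sizes


def momentum_sgd(grad_fn, w0, lr=0.1, momentum=0.9, steps=100):
    """Minimize a one-parameter function with classical momentum SGD."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad_fn(w)
        w += velocity
    return w
```
]]></preformat>
<p>With a constant gradient of 1, the Adagrad step size falls from 0.1 to 0.05 within four updates, whereas the momentum SGD step size is not tied to the gradient history in this way.</p>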
</sec>
<sec>
<title>4.5. Future Work</title>
<p>In the future, the model should be extended to support multi-class classification in order to identify the subtypes of the depression state. Moreover, interpreting how the complexity of the classifier evolves over time would play a vitally important role in understanding the dynamic evolution mechanism of multi-dimensional EEG data. A suitable way to extend the ability of neural networks to process ongoing EEG data would be the Long Short-Term Memory neural network and the temporal CNN (Chen et al., <xref ref-type="bibr" rid="B1">2020</xref>).</p>
<p>The use of a single dataset means that the results should not be generalized to a wider population. In future work, multiple datasets will be created and used for validation of the method.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s5">
<title>5. Conclusions</title>
<p>The proposed method achieves high classification accuracy on the public EEG data set of major depressive disorder. Depression is identified with an accuracy of 99 &#x000B1; 0.08%, a sensitivity of 99.07 &#x000B1; 0.05%, and a specificity of 98.90 &#x000B1; 0.14%, which is better than the classification performance of the existing methods compared on the same data set. In addition, the information entropy based on the AP clustering partition is used to measure the complexity of the FCCNN in depression identification. The smaller information entropy of the left temporal lobe, right temporal lobe, and frontal lobe indicates that the FCCNN in this study can correctly identify the intrinsic features of these brain regions. The consistency with the conclusion of the data provider supports the rationality of the proposed approach.</p>
</sec>
<sec sec-type="data-availability" id="s6">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>FW and HK contributed to the conception of the study and contributed reagents, materials, and analysis tools. CC and FH conceived and designed the experiments. FW, JT, and YS performed the experiments. HK analyzed the data. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>This work was supported by the grants from the key project of the scientific research program of Hubei Provincial Department of Education (D20214503), the Talent introduction project of Hubei Polytechnic University (21xjz16R, 2019A02), and Scientific research funding project for young teachers of Hubei Normal University (HS2020QN038).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec> </body>
<back>

<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Kang</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>Z.</given-names></name></person-group> (<year>2020</year>). <article-title>Probabilistic forecasting with temporal convolutional neural network</article-title>. <source>Neurocomputing</source> <volume>399</volume>, <fpage>491</fpage>&#x02013;<lpage>501</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2020.03.011</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fitzgerald</surname> <given-names>P. J.</given-names></name> <name><surname>Watson</surname> <given-names>B. O.</given-names></name></person-group> (<year>2018</year>). <article-title>Gamma oscillations as a biomarker for major depression: an emerging topic</article-title>. <source>Transl. Psychiatry</source> <volume>8</volume>:<fpage>177</fpage>. <pub-id pub-id-type="doi">10.1038/s41398-018-0239-y</pub-id><pub-id pub-id-type="pmid">30181587</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frey</surname> <given-names>B. J.</given-names></name> <name><surname>Dueck</surname> <given-names>D.</given-names></name></person-group> (<year>2007</year>). <article-title>Clustering by passing messages between data points</article-title>. <source>Science</source> <volume>315</volume>, <fpage>972</fpage>&#x02013;<lpage>976</lpage>. <pub-id pub-id-type="doi">10.1126/science.1136800</pub-id><pub-id pub-id-type="pmid">18258881</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gemein</surname> <given-names>L. A.</given-names></name> <name><surname>Schirrmeister</surname> <given-names>R. T.</given-names></name> <name><surname>Chrabaszcz</surname> <given-names>P.</given-names></name> <name><surname>Wilson</surname> <given-names>D.</given-names></name> <name><surname>Boedecker</surname> <given-names>J.</given-names></name> <name><surname>Schulze-Bonhage</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Machine-learning-based diagnostics of EEG pathology</article-title>. <source>Neuroimage</source> <volume>220</volume>:<fpage>117021</fpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2020.117021</pub-id><pub-id pub-id-type="pmid">32534126</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Delving deep into rectifiers: surpassing human-level performance on ImageNet classification,</article-title> in <source>IEEE International Conference on Computer Vision (ICCV 2015)</source> (<publisher-loc>Santiago</publisher-loc>), <fpage>1026</fpage>&#x02013;<lpage>1034</lpage>. <pub-id pub-id-type="doi">10.1109/ICCV.2015.123</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>Deep residual learning for image recognition,</article-title> in <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Las Vegas, NV</publisher-loc>), <fpage>770</fpage>&#x02013;<lpage>778</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2016.90</pub-id><pub-id pub-id-type="pmid">32166560</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hinton</surname> <given-names>G. E.</given-names></name> <name><surname>Osindero</surname> <given-names>S.</given-names></name> <name><surname>Teh</surname> <given-names>Y.-W.</given-names></name></person-group> (<year>2006</year>). <article-title>A fast learning algorithm for deep belief nets</article-title>. <source>Neural Comput</source>. <volume>18</volume>, <fpage>1527</fpage>&#x02013;<lpage>1554</lpage>. <pub-id pub-id-type="doi">10.1162/neco.2006.18.7.1527</pub-id><pub-id pub-id-type="pmid">16764513</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ke</surname> <given-names>H.</given-names></name> <name><surname>Chen</surname> <given-names>D.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Tang</surname> <given-names>Y.</given-names></name> <name><surname>Shah</surname> <given-names>T.</given-names></name> <name><surname>Ranjan</surname> <given-names>R.</given-names></name></person-group> (<year>2018</year>). <article-title>Towards brain big data classification: epileptic EEG identification with a lightweight VGGNet on global MIC</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>14722</fpage>&#x02013;<lpage>14733</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2018.2810882</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ke</surname> <given-names>H.</given-names></name> <name><surname>Chen</surname> <given-names>D.</given-names></name> <name><surname>Shah</surname> <given-names>T.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2020a</year>). <article-title>Cloud-aided online EEG classification system for brain healthcare: a case study of depression evaluation with a lightweight CNN</article-title>. <source>Softw. Pract. Exp</source>. <volume>50</volume>, <fpage>596</fpage>&#x02013;<lpage>610</lpage>. <pub-id pub-id-type="doi">10.1002/spe.2668</pub-id><pub-id pub-id-type="pmid">25855820</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ke</surname> <given-names>H.</given-names></name> <name><surname>Chen</surname> <given-names>D.</given-names></name> <name><surname>Shi</surname> <given-names>B.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2020b</year>). <article-title>Improving brain e-health services via high-performance EEG classification with grouping Bayesian optimization</article-title>. <source>IEEE Trans. Serv. Comput</source>. <volume>13</volume>, <fpage>696</fpage>&#x02013;<lpage>708</lpage>. <pub-id pub-id-type="doi">10.1109/TSC.2019.2962673</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name></person-group> (<year>2012</year>). <article-title>ImageNet classification with deep convolutional neural networks</article-title>. <source>Commun. ACM</source> <volume>60</volume>, <fpage>84</fpage>&#x02013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1145/3065386</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lecun</surname> <given-names>Y.</given-names></name> <name><surname>Bottou</surname> <given-names>L.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Haffner</surname> <given-names>P.</given-names></name></person-group> (<year>1998</year>). <article-title>Gradient-based learning applied to document recognition</article-title>. <source>Proc. IEEE</source> <volume>86</volume>, <fpage>2278</fpage>&#x02013;<lpage>2324</lpage>. <pub-id pub-id-type="doi">10.1109/5.726791</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Q.</given-names></name> <name><surname>Lin</surname> <given-names>T.</given-names></name> <name><surname>Shen</surname> <given-names>Z.</given-names></name></person-group> (<year>2019</year>). <article-title>Deep learning via dynamical systems: an approximation perspective</article-title>. <source>arXiv preprint arXiv:1912.10382</source>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Logan</surname> <given-names>R. W.</given-names></name> <name><surname>McClung</surname> <given-names>C. A.</given-names></name></person-group> (<year>2019</year>). <article-title>Rhythms of life: circadian disruption and brain disorders across the lifespan</article-title>. <source>Nat. Rev. Neurosci</source>. <volume>20</volume>, <fpage>49</fpage>&#x02013;<lpage>65</lpage>. <pub-id pub-id-type="doi">10.1038/s41583-018-0088-y</pub-id><pub-id pub-id-type="pmid">30459365</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>N.</given-names></name> <name><surname>Yin</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). <article-title>Motor imagery classification via combinatory decomposition of ERP and ERSP using sparse nonnegative matrix factorization</article-title>. <source>J. Neurosci. Methods</source> <volume>249</volume>, <fpage>41</fpage>&#x02013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1016/j.jneumeth.2015.03.031</pub-id><pub-id pub-id-type="pmid">25845481</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mumtaz</surname> <given-names>W.</given-names></name> <name><surname>Xia</surname> <given-names>L.</given-names></name> <name><surname>Mohd Yasin</surname> <given-names>M. A.</given-names></name> <name><surname>Azhar Ali</surname> <given-names>S. S.</given-names></name> <name><surname>Malik</surname> <given-names>A. S.</given-names></name></person-group> (<year>2017</year>). <article-title>A wavelet-based technique to predict treatment outcome for major depressive disorder</article-title>. <source>PLoS ONE</source> <volume>12</volume>:<fpage>e0171409</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0171409</pub-id><pub-id pub-id-type="pmid">28152063</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Myers</surname> <given-names>M. H.</given-names></name> <name><surname>Padmanabha</surname> <given-names>A.</given-names></name> <name><surname>Hossain</surname> <given-names>G.</given-names></name> <name><surname>de Jongh Curry</surname> <given-names>A. L.</given-names></name> <name><surname>Blaha</surname> <given-names>C. D.</given-names></name></person-group> (<year>2016</year>). <article-title>Seizure prediction and detection via phase and amplitude lock values</article-title>. <source>Front. Hum. Neurosci</source>. <volume>10</volume>:<fpage>80</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2016.00080</pub-id><pub-id pub-id-type="pmid">27014017</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>ShihCheng</surname> <given-names>L.</given-names></name> <name><surname>ChienTe</surname> <given-names>W.</given-names></name> <name><surname>HaoChuan</surname> <given-names>H.</given-names></name> <name><surname>WeiTeng</surname> <given-names>C.</given-names></name> <name><surname>YiHung</surname> <given-names>L.</given-names></name></person-group> (<year>2017</year>). <article-title>Major depression detection from EEG signals using kernel eigen-filter-bank common spatial patterns</article-title>. <source>Sensors</source> <volume>17</volume>:<fpage>1385</fpage>. <pub-id pub-id-type="doi">10.3390/s17061385</pub-id><pub-id pub-id-type="pmid">28613237</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Szegedy</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>W.</given-names></name> <name><surname>Jia</surname> <given-names>Y.</given-names></name> <name><surname>Sermanet</surname> <given-names>P.</given-names></name> <name><surname>Reed</surname> <given-names>S.</given-names></name> <name><surname>Anguelov</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Going deeper with convolutions,</article-title> in <source>2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Boston, MA</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2015.7298594</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Leeuwen</surname> <given-names>K.</given-names></name> <name><surname>Sun</surname> <given-names>H.</given-names></name> <name><surname>Tabaeizadeh</surname> <given-names>M.</given-names></name> <name><surname>Struck</surname> <given-names>A.</given-names></name> <name><surname>van Putten</surname> <given-names>M.</given-names></name> <name><surname>Westover</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Detecting abnormal electroencephalograms using deep convolutional networks</article-title>. <source>Clin. Neurophysiol</source>. <volume>130</volume>, <fpage>77</fpage>&#x02013;<lpage>84</lpage>. <pub-id pub-id-type="doi">10.1016/j.clinph.2018.10.012</pub-id><pub-id pub-id-type="pmid">30481649</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vaswani</surname> <given-names>A.</given-names></name> <name><surname>Shazeer</surname> <given-names>N.</given-names></name> <name><surname>Parmar</surname> <given-names>N.</given-names></name> <name><surname>Uszkoreit</surname> <given-names>J.</given-names></name> <name><surname>Jones</surname> <given-names>L.</given-names></name> <name><surname>Gomez</surname> <given-names>A. N.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Attention is all you need,</article-title> in <source>Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS&#x00027;17</source> (<publisher-loc>Red Hook, NY</publisher-loc>: <publisher-name>Curran Associates Inc.</publisher-name>), <fpage>6000</fpage>&#x02013;<lpage>6010</lpage>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Zhao</surname> <given-names>S.</given-names></name> <name><surname>Dong</surname> <given-names>Q.</given-names></name> <name><surname>Cui</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Han</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Recognizing brain states using deep sparse recurrent neural network</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>38</volume>, <fpage>1058</fpage>&#x02013;<lpage>1068</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2018.2877576</pub-id><pub-id pub-id-type="pmid">30369441</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wiatowski</surname> <given-names>T.</given-names></name> <name><surname>B&#x000F6;lcskei</surname> <given-names>H.</given-names></name></person-group> (<year>2018</year>). <article-title>A mathematical theory of deep convolutional neural networks for feature extraction</article-title>. <source>IEEE Trans. Inform. Theory</source> <volume>64</volume>, <fpage>1845</fpage>&#x02013;<lpage>1866</lpage>. <pub-id pub-id-type="doi">10.1109/TIT.2017.2776228</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zeiler</surname> <given-names>M. D.</given-names></name> <name><surname>Fergus</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Visualizing and understanding convolutional networks,</article-title> in <source>Computer Vision-ECCV 2014</source>, eds <person-group person-group-type="editor"><name><surname>Fleet</surname> <given-names>D.</given-names></name> <name><surname>Pajdla</surname> <given-names>T.</given-names></name> <name><surname>Schiele</surname> <given-names>B.</given-names></name> <name><surname>Tuytelaars</surname> <given-names>T.</given-names></name></person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>), <fpage>818</fpage>&#x02013;<lpage>833</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-10590-1_53</pub-id></citation>
</ref>
</ref-list> 
</back>
</article>