<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title>Frontiers in Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnins.2025.1512799</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Emotion recognition based on multimodal physiological electrical signals</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Zhuozheng</given-names></name>
<uri xlink:href="https://loop.frontiersin.org/people/2206420/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Wang</surname> <given-names>Yihan</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2865968/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff><institution>Faculty of Information Technology, Beijing University of Technology</institution>, <addr-line>Beijing</addr-line>, <country>China</country></aff>
<author-notes>
<fn id="fn0001" fn-type="edited-by"><p>Edited by: Jiahui Pan, South China Normal University, China</p></fn>
<fn id="fn0002" fn-type="edited-by"><p>Reviewed by: Man Fai Leung, Anglia Ruskin University, United Kingdom</p>
<p>Tanmoy Sarkar Pias, Virginia Tech, United States</p>
<p>I. Made Agus Wirawan, Universitas Pendidikan Ganesha, Indonesia</p></fn>
<corresp id="c001">&#x002A;Correspondence: Yihan Wang, <email>S202371045@emails.bjut.edu.cn</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>05</day>
<month>03</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>19</volume>
<elocation-id>1512799</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>10</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>02</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2025 Wang and Wang.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Wang and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>With the increasing severity of mental health problems, the application of emotion recognition techniques in mental health diagnosis and intervention has received widespread attention. Accurate classification of emotional states is important for individual mental health management. This study proposes a multimodal emotion recognition method based on the fusion of electroencephalography (EEG) and electrocardiography (ECG) signals, aiming at the accurate classification of emotional states along three dimensions (valence, arousal, and dominance). To this end, a composite neural network model (Att-1DCNN-GRU) is designed, which combines a one-dimensional convolutional neural network with an attention mechanism and gated recurrent units. Time-domain, frequency-domain, and nonlinear features are extracted from the EEG and ECG signals and filtered with a Random Forest approach to improve recognition accuracy and robustness. The proposed model is validated on the DREAMER dataset, and the results show that it achieves high classification accuracy on all three dimensions (valence, arousal and dominance), especially on the valence dimension, with an accuracy of 95.95%. The fusion model significantly improves recognition compared with traditional emotion recognition methods using only EEG or ECG signals. In addition, to further assess the generalisation ability of the model, it was also validated on the DEAP dataset, and the results showed that the model also performed well in terms of cross-dataset adaptation. Through a series of comparison and ablation experiments, this study demonstrates the advantages of multimodal signal fusion in emotion recognition and shows the potential of deep learning methods in processing complex physiological signals. The experimental results show that the Att-1DCNN-GRU model performs strongly in emotion recognition tasks, provides valuable technical support for affective computing and mental health management, and has broad application prospects.</p>
</abstract>
<kwd-group>
<kwd>emotion recognition</kwd>
<kwd>EEG signal</kwd>
<kwd>ECG signal</kwd>
<kwd>multimodal</kwd>
<kwd>deep learning</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="6"/>
<equation-count count="7"/>
<ref-count count="16"/>
<page-count count="11"/>
<word-count count="7098"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Translational Neuroscience</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="sec1">
<label>1</label>
<title>Introduction</title>
<p>In recent years, with the acceleration of the pace of life and the increase in social pressure, emotional problems have increasingly become an important factor affecting the physical and mental health of individuals and have had a far-reaching impact on economic and social development. Emotional state not only directly affects the mental health of individuals, but is also closely related to a variety of physiological diseases. The emotional dimensional model (VAD: Valence, Arousal, Dominance) provides a systematic framework for describing and analysing emotional states (<xref ref-type="bibr" rid="ref11">Russell, 1980</xref>), so accurately identifying and classifying these emotional dimensions is of great theoretical and practical significance.</p>
<p>Most traditional emotion recognition methods rely on facial expressions, speech and text analysis. However, these methods are often affected by the individual&#x2019;s subjective perception and by environmental factors, making it difficult to accurately reflect the individual&#x2019;s true emotional state. In contrast, physiological signals, especially electroencephalography (EEG) and electrocardiography (ECG), provide a more objective, real-time means of monitoring emotions. Over the past decade, a large number of neuropsychological studies have reported correlations between EEG signals and emotion. Two main regions of the brain are associated with emotional activity: the amygdala (located in the anterior part of the temporal lobe, near the hippocampus) and the prefrontal cortex (covering part of the frontal lobe) (<xref ref-type="bibr" rid="ref2">Alarcao and Fonseca, 2017</xref>). Moreover, with the continuous advancement of wearable device technology, it has become possible to acquire and analyse EEG and ECG signals in real time, providing new solutions for monitoring and managing emotional states. Therefore, this study proposes fusing EEG and ECG signals and combining them with deep learning to accurately classify the three dimensions of the VAD model (valence, arousal, and dominance), which has both theoretical value and practical significance.</p>
<p>In recent years, emotion recognition methods based on physiological signals have been widely studied, and many scholars have proposed different emotion recognition models. For example, <xref ref-type="bibr" rid="ref10">Picard et al. (2001)</xref> used the KNN method to classify eight emotions and achieved 81% classification accuracy. <xref ref-type="bibr" rid="ref7">Huang et al. (2012)</xref> proposed a feature extraction algorithm called asymmetric spatial pattern (ASP), which addresses the high dimensionality and high noise of EEG signals and achieves good results in emotion detection, with accuracies of 60% (valence) and 80% (arousal). <xref ref-type="bibr" rid="ref3">Atkinson and Campos (2016)</xref> combined a mutual information feature selection method and an SVM classifier to extend the emotion types and improve the accuracy of emotion classification of EEG signals; their experimental results showed an accuracy of about 73% on the standard EEG dataset. In addition, <xref ref-type="bibr" rid="ref14">Thammasan et al. (2016)</xref> investigated the application of deep belief networks (DBNs) in music emotion recognition, combining fractal dimension (FD), power spectral density (PSD) and discrete wavelet transform (DWT) features for emotion classification; their experimental results showed that the accuracy of this method in emotional arousal classification reached 88.24% and 82.59%. In terms of ECG signals, <xref ref-type="bibr" rid="ref1">Agrafioti et al. (2011)</xref> proposed an empirical mode decomposition (EMD)-based method that differentiates between emotional patterns through instantaneous frequency (Hilbert-Huang transform) and local oscillatory features, achieving a classification accuracy of 89%. <xref ref-type="bibr" rid="ref13">Sarkar and Etemad (2020)</xref>, on the other hand, proposed a self-supervised deep multi-task learning framework that learns ECG representations through a signal transformation recognition network and applies them to emotion classification, achieving more than 85% classification accuracy on multiple datasets.</p>
<p>However, despite the good results of single EEG and ECG signals in emotion recognition, existing studies still face some limitations (<xref ref-type="bibr" rid="ref12">Saganowski et al., 2022</xref>). First, a single signal often does not fully reflect emotional states; EEG responds more strongly in some emotional states, while ECG changes are more pronounced in others. Second, most of the existing methods are limited to single-modal signal analysis, neglecting the complementarity between multimodal signals. Finally, even with deep learning methods, how to effectively fuse EEG and ECG signals to improve classification accuracy and robustness is still an open problem.</p>
<p>To address the above challenges, this paper proposes an emotion recognition method based on the fusion of EEG and ECG signals, aiming to overcome the limitations in the existing methods through multimodal signal fusion and deep learning techniques. Compared with traditional emotion recognition methods, this paper innovatively combines deep learning with traditional signal processing techniques to advance the theoretical framework of emotion recognition by adaptively selecting features and fusing multimodal signals. This fusion approach enables emotion recognition not only to accurately capture subtle changes in emotions, but also to improve the robustness and adaptability of the system.</p>
<p>In recent years, many scholars have also adopted hybrid CNN and LSTM networks for EEG-based emotion recognition; such methods improve the accuracy of emotion classification by extracting spatio-temporal features and capturing long-term dependencies (<xref ref-type="bibr" rid="ref5">Chakravarthi et al., 2022</xref>). In this paper, by contrast, we combine a CNN and a GRU and introduce an attention mechanism (Att-1DCNN-GRU), which enables the model to weight the importance of different signals automatically when processing multimodal inputs, thus further improving emotion recognition. In addition, this paper assesses the applicability of the model by validating it in different experimental environments and comparing it with data from other domains, to check the consistency and broad applicability of the results. This cross-domain validation indicates that the proposed method generalises well across application scenarios of emotion recognition. Finally, the experimental results show that the method achieves higher classification accuracy and better robustness than existing single-signal or traditional fusion methods in classifying the three dimensions of the VAD model (valence, arousal, and dominance), which validates the effectiveness of the proposed method.</p>
</sec>
<sec sec-type="materials|methods" id="sec2">
<label>2</label>
<title>Materials and methods</title>
<sec id="sec3">
<label>2.1</label>
<title>DREAMER dataset</title>
<p>The DREAMER dataset (<xref ref-type="bibr" rid="ref9">Katsigiannis and Ramzan, 2017</xref>) is a multimodal physiological signal dataset specifically designed for emotion recognition research, aiming to identify and classify emotional states by analysing EEG and ECG signals. It contains EEG and ECG data recorded before and after 23 participants watched 18 movie clips, together with their scores on the three dimensions of Valence, Arousal, and Dominance.</p>
<p>The EEG data were collected by 14 electrodes covering different regions of the brain at a sampling rate of 128&#x202F;Hz, which can reveal the electrical activity patterns of the brain in different emotional states; the ECG data were collected by a 2-channel ECG sensor at a sampling rate of 256&#x202F;Hz, which provided detailed information on cardiac activity and helped to identify the physiological changes triggered by emotion. Participants rated their emotional experience using self-report after viewing each video. The rating dimensions included Valence, Arousal, and Dominance, each with a rating range of 1 to 5. These ratings provided an important reference for the training and validation of emotion recognition models, helping researchers to understand the relationship between physiological signals and subjective emotional experiences.</p>
</sec>
<sec id="sec4">
<label>2.2</label>
<title>Signal preprocessing</title>
<p>In the emotion recognition task, the preprocessing of electroencephalogram (EEG) and electrocardiogram (ECG) signals is a key step in signal analysis. Its main purpose is to eliminate noise and artifacts so as to improve signal quality and provide clearer, more reliable data for subsequent feature extraction and classification. Given the characteristics of EEG and ECG signals, this paper combines several signal processing techniques to obtain clean signals.</p>
<p>First, in order to remove power-line interference, we use a 50&#x202F;Hz notch filter. This filter targets the mains frequency (50&#x202F;Hz) and eliminates noise introduced by power equipment and the electrical grid. Because the notch filter removes only a narrow band around 50&#x202F;Hz, the lower- and higher-frequency portions of the EEG and ECG signals are left essentially unaffected.</p>
<p>Next, to further remove low-frequency drift and high-frequency noise, a fourth-order Butterworth bandpass filter with a passband of 0.5 to 45&#x202F;Hz was used. The Butterworth filter is widely used in signal processing because its magnitude response is maximally flat in the passband. Its design ensures that the main frequency components of the EEG and ECG signals are preserved, while low-frequency noise (e.g., baseline drift) and high-frequency noise (e.g., myoelectric interference and interference from electrical equipment) are effectively attenuated. Bandpass filters of this kind are not only suitable for EEG and ECG signals, but are also widely used in audio processing, telecommunication and biomedical signal analysis. The squared magnitude response of the Butterworth filter (<xref ref-type="bibr" rid="ref4">Butterworth, 1930</xref>) is shown in <xref ref-type="disp-formula" rid="EQ1">Equation 1</xref>, where &#x03A9;<sub>c</sub> is the cutoff frequency and <italic>w</italic> is the filter order.</p>
<disp-formula id="EQ1"><label>(1)</label><mml:math id="M1"><mml:msup><mml:mi>A</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mfenced open="(" close=")"><mml:mi mathvariant="normal">&#x03A9;</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:msup><mml:mfenced open="|" close="|"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mi>a</mml:mi></mml:msub><mml:mfenced open="(" close=")"><mml:mrow><mml:mi>j</mml:mi><mml:mi mathvariant="normal">&#x03A9;</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced><mml:mn>2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:msup><mml:mfenced open="(" close=")"><mml:mfrac><mml:mrow><mml:mi>j</mml:mi><mml:mi mathvariant="normal">&#x03A9;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:msub><mml:mi mathvariant="normal">&#x03A9;</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mfenced><mml:mrow><mml:mn>2</mml:mn><mml:mi>w</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:math></disp-formula>
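<p>As a minimal illustrative sketch (not the authors&#x2019; implementation), Equation 1 can be evaluated numerically to see how strongly a filter of a given order attenuates frequencies beyond the cutoff. The function names are hypothetical, and treating the 45&#x202F;Hz upper band edge as the cutoff of the low-pass prototype described by Equation 1 is an assumption made only for illustration.</p>
<preformat><![CDATA[
```python
import math


def butterworth_gain_squared(omega, omega_c, order):
    """Squared magnitude A^2(Omega) of a low-pass Butterworth prototype (Equation 1)."""
    # The factors of j in Equation 1 cancel, leaving the real ratio Omega / Omega_c.
    return 1.0 / (1.0 + (omega / omega_c) ** (2 * order))


def gain_db(omega, omega_c, order):
    """Gain in decibels, computed as 10 * log10(A^2)."""
    return 10.0 * math.log10(butterworth_gain_squared(omega, omega_c, order))


if __name__ == "__main__":
    cutoff_hz = 45.0  # assumption: upper band edge used as the prototype cutoff
    order = 4         # fourth-order filter, as stated in the text
    for freq_hz in (10.0, 45.0, 50.0, 90.0):
        print(f"{freq_hz:5.1f} Hz -> {gain_db(freq_hz, cutoff_hz, order):7.2f} dB")
```
]]></preformat>
<p>At the cutoff frequency the squared magnitude is exactly 0.5 (about -3&#x202F;dB) for any order; increasing the order only steepens the roll-off beyond the cutoff.</p>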
<p>In order to remove the artifacts introduced into the EEG signals by eye movements (electrooculography, EOG), muscle activity (electromyography, EMG), etc., we used independent component analysis (ICA), a blind source separation technique that is widely used in the denoising of EEG signals (<xref ref-type="bibr" rid="ref8">Hyv&#x00E4;rinen and Oja, 2000</xref>). The basic principle of ICA is to decompose the mixed signals, through a demixing process, into a number of statistically independent components that represent the underlying sources. By applying ICA, we can isolate artifact components such as eye movements and EMG from the EEG signals, discard them, and retain the components that reflect genuine EEG activity. In practice, ICA can effectively separate artifacts that are unrelated to brain activity, thus improving the purity of the EEG signals.</p>
<p>After signal denoising, we slice the EEG and ECG signals to increase the number of samples and improve model training. Specifically, we slice each signal into 30&#x202F;s segments to form multiple samples. Each EEG sample contains 3,840 data points (i.e., 30&#x202F;s of data at a sampling rate of 128&#x202F;Hz), and each ECG sample contains 7,680 data points (i.e., 30&#x202F;s of data at a sampling rate of 256&#x202F;Hz). The slicing operation not only increases the number of samples, but also ensures that each segment is long enough to provide sufficient time-domain information and stable features for subsequent analyses. The signal preprocessing flowchart used in this experiment is shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>. Together, these preprocessing steps remove noise and artifacts and provide clean EEG and ECG data for subsequent feature extraction, model training and classification.</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption><p>Preprocessing and denoising workflow for EEG and ECG signals.</p></caption>
<graphic xlink:href="fnins-19-1512799-g001.tif"/>
</fig>
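<p>The 30&#x202F;s slicing described in this section can be sketched as follows. This is an illustrative sketch rather than the authors&#x2019; code: the function name is hypothetical, and the use of non-overlapping windows with the incomplete final segment discarded is an assumption, since the text does not state how segment boundaries were handled.</p>
<preformat><![CDATA[
```python
def slice_into_windows(signal, sampling_rate_hz, window_s=30):
    """Split a 1-D sequence into consecutive, non-overlapping windows of window_s seconds."""
    window_len = int(sampling_rate_hz * window_s)
    n_windows = len(signal) // window_len  # assumption: drop the incomplete tail
    return [signal[i * window_len:(i + 1) * window_len] for i in range(n_windows)]


if __name__ == "__main__":
    # Placeholder recordings of 65 s each (synthetic values, not DREAMER data).
    eeg_channel = [0.0] * (128 * 65)
    ecg_channel = [0.0] * (256 * 65)
    eeg_windows = slice_into_windows(eeg_channel, 128)
    ecg_windows = slice_into_windows(ecg_channel, 256)
    print(len(eeg_windows), len(eeg_windows[0]))
    print(len(ecg_windows), len(ecg_windows[0]))
```
]]></preformat>
<p>With these settings each EEG window holds 3,840 points and each ECG window 7,680 points, matching the sample sizes given in the text.</p>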
</sec>
<sec id="sec5">
<label>2.3</label>
<title>Feature extraction and feature selection</title>
<p>In emotion recognition tasks, EEG and ECG signals contain rich physiological information that can reflect an individual&#x2019;s emotional state. In order to extract effective emotional features from these signals, we perform time-domain, frequency-domain, and nonlinear analyses of EEG and ECG signals, respectively, from which we extract a variety of features. The time-domain features of EEG signals mainly include the maximum, minimum, mean, variance, peak-to-peak, kurtosis, and skewness, which effectively reflect the fluctuation of the signals and their statistical properties. The frequency domain features are then extracted by power spectral density (PSD) analysis, which is calculated for different frequency bands (Delta, Theta, Alpha, Beta, Gamma) to capture the energy distribution of the signal at different frequencies. Nonlinear features are then extracted by Sample Entropy (SE) and Detrended Fluctuation Analysis (DFA), which can reveal the complexity and nonlinear dynamic behaviour of the signal. These features can provide strong support for emotion recognition models, especially in the case of more subtle changes in the emotional state, where nonlinear features are particularly useful.</p>
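<p>To make the nonlinear features more concrete, the following is a minimal sketch of sample entropy, defined as the negative logarithm of the ratio between the number of matching template pairs of length m&#x202F;+&#x202F;1 and of length m. It is not the authors&#x2019; implementation: the function name, the embedding dimension m&#x202F;=&#x202F;2 and the absolute tolerance r are assumptions, since the text does not report the parameters used.</p>
<preformat><![CDATA[
```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Sample entropy SampEn(m, r) = -ln(A / B) of a 1-D sequence.

    r is an absolute tolerance here; in practice it is often chosen
    as a fraction of the signal's standard deviation (an assumption).
    """
    n = len(x)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # self-matches are excluded
                # Chebyshev (maximum) distance between the two templates
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined when no matches are found
    return -math.log(a / b)


if __name__ == "__main__":
    regular = [1, 2] * 10                        # perfectly periodic sequence
    irregular = [0, 1, 0, 1, 0, 2, 0, 1, 0, 1]   # one disturbance in the pattern
    print(sample_entropy(regular, m=2, r=0.1))
    print(sample_entropy(irregular, m=2, r=0.1))
```
]]></preformat>
<p>A perfectly periodic sequence yields an entropy of zero, whereas a sequence containing an irregularity yields a positive value, which is why the measure is used as an indicator of signal complexity.</p>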
<p>For feature extraction of ECG signals, we first identified R-wave locations in the ECG using an R-wave detection algorithm and then calculated RR intervals. Based on these RR intervals, heart rate variability (HRV) features were further extracted. The time-domain features of HRV include mean RR interval, heart rate, SDNN, RMSSD, NN50, and pNN50, which reflect overall variability (SDNN), short-term variability (RMSSD), and the statistics of the differences between successive heartbeats (NN50, pNN50). In addition, we analysed the short- and long-term variability of HRV using the Poincar&#x00E9; plot features SD1 and SD2. Frequency-domain features were then calculated by power spectral analysis, including low-frequency (LF) power, high-frequency (HF) power and their ratio (LF/HF), which provide a quantitative measure of sympathetic and parasympathetic activity. All these features are finally combined into a raw feature set covering both EEG and ECG signals.</p>
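<p>The time-domain and Poincar&#x00E9; HRV features named above can be computed from a list of RR intervals as in the following sketch. It is illustrative only: the function name and the example RR values are hypothetical, and the standard relations SD1<sup>2</sup>&#x202F;=&#x202F;SDSD<sup>2</sup>/2 and SD2<sup>2</sup>&#x202F;=&#x202F;2&#x202F;SDNN<sup>2</sup>&#x202F;&#x2212;&#x202F;SDSD<sup>2</sup>/2 are assumed, as the text does not specify how SD1 and SD2 were obtained.</p>
<preformat><![CDATA[
```python
import math


def hrv_features(rr_ms):
    """Time-domain and Poincare HRV descriptors from RR intervals in milliseconds."""
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    heart_rate = 60000.0 / mean_rr  # beats per minute

    # SDNN: sample standard deviation of the RR intervals
    var_rr = sum((rr - mean_rr) ** 2 for rr in rr_ms) / (n - 1)
    sdnn = math.sqrt(var_rr)

    # Successive differences between neighbouring RR intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    k = len(diffs)
    rmssd = math.sqrt(sum(d * d for d in diffs) / k)
    nn50 = sum(1 for d in diffs if abs(d) > 50)  # differences larger than 50 ms
    pnn50 = 100.0 * nn50 / k

    # Poincare descriptors derived from the variance of successive differences
    mean_diff = sum(diffs) / k
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (k - 1)
    sd1 = math.sqrt(0.5 * var_diff)
    sd2 = math.sqrt(max(2.0 * var_rr - 0.5 * var_diff, 0.0))

    return {
        "mean_rr": mean_rr, "heart_rate": heart_rate, "sdnn": sdnn,
        "rmssd": rmssd, "nn50": nn50, "pnn50": pnn50, "sd1": sd1, "sd2": sd2,
    }


if __name__ == "__main__":
    example_rr = [800, 820, 810, 870, 800]  # hypothetical RR intervals in ms
    for name, value in hrv_features(example_rr).items():
        print(f"{name}: {value:.3f}")
```
]]></preformat>
<p>In a full pipeline the RR intervals would come from the R-wave detector applied to each 30&#x202F;s ECG segment; very short segments contain few beats, so these statistics should be interpreted with care.</p>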
<p>However, these raw features contain a lot of redundant information, which may lead to overfitting during model training and increase the computational effort. Feature selection is therefore an important step in improving the performance of emotion recognition models. In this study, the Random Forest algorithm was used for feature selection. Random Forest is a powerful ensemble learning method that can effectively reduce overfitting and improve the robustness of the model by constructing multiple decision trees and combining their results (<xref ref-type="bibr" rid="ref6">Ho, 1998</xref>). In our experiments, we trained the Random Forest model with 80 trees and identified the most discriminative features for the emotion recognition task by calculating the importance score of each feature. After feature selection, the nine most discriminative features were retained; they are listed in <xref ref-type="table" rid="tab1">Table 1</xref>.</p>
<table-wrap position="float" id="tab1">
<label>Table 1</label>
<caption><p>Importance ranking of the features retained after random forest feature selection.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Rank</th>
<th align="center" valign="top">Feature name</th>
<th align="center" valign="top">Feature importance score</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">1</td>
<td align="left" valign="middle">Mean RR interval</td>
<td align="center" valign="middle">0.121</td>
</tr>
<tr>
<td align="left" valign="middle">2</td>
<td align="left" valign="middle">Heart Rate</td>
<td align="center" valign="middle">0.115</td>
</tr>
<tr>
<td align="left" valign="middle">3</td>
<td align="left" valign="middle">Very-low-frequency power (VLF)</td>
<td align="center" valign="middle">0.098</td>
</tr>
<tr>
<td align="left" valign="middle">4</td>
<td align="left" valign="middle">SD2</td>
<td align="center" valign="middle">0.096</td>
</tr>
<tr>
<td align="left" valign="middle">5</td>
<td align="left" valign="middle">Standard Deviation (SDRR)</td>
<td align="center" valign="middle">0.089</td>
</tr>
<tr>
<td align="left" valign="middle">6</td>
<td align="left" valign="middle">&#x03B1;-wave power spectral density</td>
<td align="center" valign="middle">0.085</td>
</tr>
<tr>
<td align="left" valign="middle">7</td>
<td align="left" valign="middle">&#x03B3;-wave power spectral density</td>
<td align="center" valign="middle">0.082</td>
</tr>
<tr>
<td align="left" valign="middle">8</td>
<td align="left" valign="middle">&#x03B2;-wave power spectral density</td>
<td align="center" valign="middle">0.079</td>
</tr>
<tr>
<td align="left" valign="middle">9</td>
<td align="left" valign="middle">DFA</td>
<td align="center" valign="middle">0.077</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="table" rid="tab1">Table 1</xref> lists the importance scores of the features retained by random forest feature selection, ranked from most to least important. The selected features include the mean RR interval, heart rate, very-low-frequency power, SD2, the standard deviation of the RR intervals, power spectral density in the &#x03B1;, &#x03B2; and &#x03B3; bands, and DFA. They reflect the activity of both the heart and the brain and discriminate well between emotional states.</p>
<p>During the feature selection experiments, we also optimised the parameter settings of the Random Forest model and tried the effects of different numbers of decision trees on the effectiveness of feature selection. Specifically, we used settings of 50, 80 and 100 trees and compared the effects of these settings on model stability, computation time and accuracy. The experimental results show that the model achieves an optimal balance between feature selection stability and computational efficiency when the number of trees is 80. Fewer decision trees (e.g., 50) allowed for fast computation but were less stable and feature selection was not as effective as 80 trees, while increasing the number of trees (e.g., 100) improved stability but also significantly increased computation time. Therefore, 80 trees became the most suitable choice. <xref ref-type="table" rid="tab2">Table 2</xref> shows the experimental results for different numbers of decision trees.</p>
<table-wrap position="float" id="tab2">
<label>Table 2</label>
<caption><p>Random forest model parameter settings and experimental results.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Number of trees</th>
<th align="left" valign="top">Feature selection stability</th>
<th align="center" valign="top">Computation time</th>
<th align="center" valign="top">Number of features finally selected</th>
<th align="center" valign="top">Training set accuracy</th>
<th align="center" valign="top">Test set accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">50</td>
<td align="left" valign="middle">Unstable</td>
<td align="center" valign="middle">3.2&#x202F;s</td>
<td align="center" valign="middle">9</td>
<td align="center" valign="middle">88.3%</td>
<td align="center" valign="middle">85.6%</td>
</tr>
<tr>
<td align="left" valign="middle">80</td>
<td align="left" valign="middle">Stable</td>
<td align="center" valign="middle">4.8&#x202F;s</td>
<td align="center" valign="middle">9</td>
<td align="center" valign="middle">91.2%</td>
<td align="center" valign="middle">88.9%</td>
</tr>
<tr>
<td align="left" valign="middle">100</td>
<td align="left" valign="middle">Stable</td>
<td align="center" valign="middle">6.3&#x202F;s</td>
<td align="center" valign="middle">9</td>
<td align="center" valign="middle">91.5%</td>
<td align="center" valign="middle">89.1%</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Through random forest feature selection, we filter out the most discriminative features for the emotion recognition task from a large number of candidates, effectively reducing the feature dimensionality and improving the computational efficiency and performance of the model. The resulting feature subset (10 ECG-derived HRV features and 56 EEG features) is used to train the emotion recognition model in the subsequent classification task, leading to more accurate recognition of emotional states.</p>
<p>With this feature selection method, we not only improved the computational efficiency of the model, but also enhanced its generalization ability and interpretability. The final filtered feature set comprises 5 feature types&#x202F;&#x00D7;&#x202F;2 channels&#x202F;=&#x202F;10 features for the ECG signals and 4 feature types&#x202F;&#x00D7;&#x202F;14 channels&#x202F;=&#x202F;56 features for the EEG signals. It was saved as a new MAT file, which provides a more streamlined and efficient basis for subsequent emotion recognition tasks.</p>
</sec>
<sec id="sec6">
<label>2.4</label>
<title>Composite neural network design</title>
<sec id="sec7">
<label>2.4.1</label>
<title>Network architecture design</title>
<p>The composite neural network model proposed in this paper aims to effectively extract and process temporal features in emotion recognition tasks; its design is shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>. The model combines a one-dimensional convolutional neural network (1D CNN), a gated recurrent unit (GRU) and an attention mechanism. The inputs to the model are the time-domain, frequency-domain and nonlinear features of the EEG and ECG signals that remain after feature selection. To extract local features, the network first uses two convolutional layers, each with 256 convolutional kernels of size 3. Through these convolutional layers, the model captures short-term local patterns in the input. The second convolutional layer is followed by a MaxPooling1D layer, which reduces the feature dimension and helps to limit overfitting and improve the generalisation ability of the model.</p>
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption><p>Architecture and detailed design of the hybrid neural network model (Att-1DCNN-GRU).</p></caption>
<graphic xlink:href="fnins-19-1512799-g002.tif"/>
</fig>
<p>After the convolutional layers, the network introduces a gated recurrent unit (GRU) layer to capture the temporal dependencies in the input signal. As a recurrent neural network (RNN) variant with gating, the GRU handles sequential data well and is able to learn long-term dependencies, which is valuable in emotion recognition tasks. The GRU layer outputs a sequence of hidden states, one per time step, and the subsequent attention mechanism weights these outputs to highlight important features and improve the accuracy of emotion recognition. By assigning different weights to the time steps of the GRU output, the attention mechanism enables the model to focus on those time steps that are more critical for emotion classification, thus enhancing the recognition of emotion-related features.</p>
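<p>The weighting of GRU outputs described above can be sketched as follows. The exact form of the attention layer is not given in the text, so this example assumes a simple additive attention that scores each time step, normalises the scores with a softmax, and returns the weighted sum of the GRU outputs. The function name, the projection size of 64 and the random inputs are illustrative assumptions.</p>

```python
import numpy as np

def attention_pool(h, w, b, v):
    """Additive attention over time steps.

    h: (timesteps, units) GRU outputs; w: (units, d); b: (d,); v: (d,).
    Returns the attention weights (timesteps,) and the context vector (units,).
    """
    scores = np.tanh(h @ w + b) @ v          # one scalar score per time step
    scores = scores - scores.max()           # stabilise the exponentials
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ h                    # weighted sum of the GRU outputs
    return weights, context

rng = np.random.default_rng(0)
h = rng.normal(size=(31, 256))               # e.g. 31 time steps, 256 GRU units
w = rng.normal(size=(256, 64)) * 0.1         # assumed projection size of 64
weights, context = attention_pool(h, w, np.zeros(64), rng.normal(size=64))
print(weights.shape, context.shape)
```

<p>The weights are positive and sum to 1, so the context vector is a convex combination of the per-step outputs; time steps with larger scores contribute more to the representation passed to the fully connected layer.</p>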
<p>Next, the network fuses the features from the GRU layer and the attention mechanism through a fully connected (Dense) layer and outputs the emotion classification result through the Softmax activation function. For each of the three emotion dimensions (Valence, Arousal, and Dominance), this output layer generates a probability distribution over the three classes used to classify the emotional state.</p>
</sec>
<sec id="sec8">
<label>2.4.2</label>
<title>Selection of optimizer and loss function</title>
<p>The optimizer of a neural network updates the weight parameters so as to minimise the loss function. Choosing a suitable optimizer can speed up training and improve the accuracy of the model. In this paper, 5,000 samples were randomly selected from the DREAMER dataset to compare three popular optimizers (Adam, Adagrad and RMSprop), with the Iteration parameter adjusted as needed; the results are shown in <xref ref-type="table" rid="tab3">Table 3</xref>. The Adam optimizer performs best, with the highest classification accuracy of 0.96 on the training set and 0.94 on the test set (Iteration&#x202F;=&#x202F;50). Adam is therefore chosen as the optimizer of the model in this study, with the learning rate lr set to 0.001. The hidden layers of the hybrid network use the ReLU activation function, one of the most widely used activation functions at present (<xref ref-type="bibr" rid="ref15">Xuejing et al., 2024</xref>). Because the gradient of ReLU does not saturate for positive inputs, it mitigates the vanishing-gradient problem and typically speeds up convergence during training.</p>
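<p>For reference, a single Adam update can be written in a few lines. This is a minimal sketch of the standard update rule on a toy quadratic, not the training code used in this study; only lr&#x202F;=&#x202F;0.001 comes from the text, while beta1, beta2 and eps are assumed to take commonly used default values.</p>

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimise f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(np.abs(w).max())
```

<p>Because the step is normalised by the running gradient magnitude, the first update moves each parameter by approximately lr regardless of the gradient scale, which is one reason Adam is relatively insensitive to how the loss is scaled.</p>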
<table-wrap position="float" id="tab3">
<label>Table 3</label>
<caption><p>Classification accuracy of different optimisers for different numbers of iterations.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Optimiser</th>
<th align="center" valign="top" colspan="2">Iteration&#x202F;=&#x202F;20</th>
<th align="center" valign="top" colspan="2">Iteration&#x202F;=&#x202F;50</th>
</tr>
<tr>
<th/>
<th align="center" valign="top">Training set accuracy</th>
<th align="center" valign="top">Test set accuracy</th>
<th align="center" valign="top">Training set accuracy</th>
<th align="center" valign="top">Test set accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle"><bold>Adam</bold></td>
<td align="center" valign="middle">0.85</td>
<td align="center" valign="top">0.87</td>
<td align="center" valign="top"><bold>0.96</bold></td>
<td align="center" valign="top"><bold>0.94</bold></td>
</tr>
<tr>
<td align="left" valign="middle">Adagrad</td>
<td align="center" valign="middle">0.71</td>
<td align="center" valign="top">0.72</td>
<td align="center" valign="top">0.83</td>
<td align="center" valign="top">0.81</td>
</tr>
<tr>
<td align="left" valign="middle">RMSprop</td>
<td align="center" valign="middle">0.81</td>
<td align="center" valign="top">0.76</td>
<td align="center" valign="top">0.90</td>
<td align="center" valign="top">0.88</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The bold values in the table indicate the optimiser used in this paper and its corresponding accuracy.</p>
</table-wrap-foot>
</table-wrap>
<p>For the emotion recognition problem, the categorical_crossentropy function is chosen as the loss function for the three-class classification task. categorical_crossentropy is one of the most commonly used loss functions in multi-class classification; it computes the cross-entropy loss, which measures the difference between the predicted and true class distributions and is used to update the model parameters by backpropagation. The categorical cross-entropy function is defined in <xref ref-type="disp-formula" rid="EQ2">Equation 2</xref>.</p>
<disp-formula id="EQ2"><label>(2)</label><mml:math id="M2"><mml:mtext mathvariant="italic">loss</mml:mtext><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>m</mml:mi></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover><mml:mo stretchy="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover><mml:mo stretchy="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover></mml:mstyle><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>log</mml:mi><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="true">&#x0302;</mml:mo></mml:mover><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula>
<p>Where <inline-formula><mml:math id="M3"><mml:mi>m</mml:mi></mml:math></inline-formula> denotes the number of samples, <inline-formula><mml:math id="M4"><mml:mi>n</mml:mi></mml:math></inline-formula> denotes the number of classes, <inline-formula><mml:math id="M5"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the true probability of class <inline-formula><mml:math id="M6"><mml:mi>i</mml:mi></mml:math></inline-formula> for sample <italic>j</italic>, and <inline-formula><mml:math id="M7"><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="true">&#x0302;</mml:mo></mml:mover><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the corresponding predicted probability.</p>
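<p>As a worked example of Equation 2, the following NumPy sketch computes the categorical cross-entropy for three samples and three classes. The function name, the clipping constant eps and the example probabilities are illustrative assumptions, not values from the experiments.</p>

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy over m samples; y_true is one-hot with shape (m, n)."""
    y_pred = np.clip(y_pred, eps, 1.0)        # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred)) / y_true.shape[0])

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6]])
print(round(categorical_crossentropy(y_true, y_pred), 4))   # 0.3635
```

<p>Because the labels are one-hot, only the predicted probability of the true class contributes for each sample, so the loss here is the mean of &#x2212;log 0.8, &#x2212;log 0.7 and &#x2212;log 0.6.</p>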
</sec>
<sec id="sec9">
<label>2.4.3</label>
<title>Parameterization</title>
<p>In this paper, the grid search method (<xref ref-type="bibr" rid="ref9001">Krizhevsky et al., 2017</xref>) is used to tune the network parameters and hyperparameters and to find the combination that performs best, which is then used for model training. A random 20% sample of the DREAMER dataset is held out for testing. First, Iteration and Batchsize were fixed at 20 and 256, respectively. To tune the network parameters, the numbers of convolutional filters and GRU units were each set to a power of 2 (128 or 256) and the convolution kernel size was fixed at 3, as shown in <xref ref-type="table" rid="tab4">Table 4</xref>. After the network parameters had been determined, the hyperparameters were selected as shown in <xref ref-type="table" rid="tab5">Table 5</xref>.</p>
<table-wrap position="float" id="tab4">
<label>Table 4</label>
<caption><p>Tuning of network parameters for the Att-1DCNN-GRU model.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Model</th>
<th align="center" valign="top">Conv_1</th>
<th align="center" valign="top">Conv_2</th>
<th align="center" valign="top">Kernel</th>
<th align="center" valign="top">GRU</th>
<th align="center" valign="top">Accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">M1</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">0.939</td>
</tr>
<tr>
<td align="left" valign="middle">M2</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">0.946</td>
</tr>
<tr>
<td align="left" valign="middle">M3</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">0.944</td>
</tr>
<tr>
<td align="left" valign="middle">M4</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">0.952</td>
</tr>
<tr>
<td align="left" valign="middle">M5</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">0.942</td>
</tr>
<tr>
<td align="left" valign="middle">M6</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">0.949</td>
</tr>
<tr>
<td align="left" valign="middle">M7</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">0.941</td>
</tr>
<tr>
<td align="left" valign="middle"><bold>M8</bold></td>
<td align="center" valign="top"><bold>256</bold></td>
<td align="center" valign="top"><bold>256</bold></td>
<td align="center" valign="top"><bold>3</bold></td>
<td align="center" valign="top"><bold>256</bold></td>
<td align="center" valign="top"><bold>0.953</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The bold values shown in the table represent the parameter settings used in the model presented in this paper.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="tab5">
<label>Table 5</label>
<caption><p>Hyperparameter tuning and optimal settings for the Att-1DCNN-GRU model.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Model</th>
<th align="center" valign="top">Epoch</th>
<th align="center" valign="top">Batchsize</th>
<th align="center" valign="top">Accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">M1</td>
<td align="center" valign="top">50</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">0.871</td>
</tr>
<tr>
<td align="left" valign="middle"><bold>M2</bold></td>
<td align="center" valign="top"><bold>50</bold></td>
<td align="center" valign="top"><bold>256</bold></td>
<td align="center" valign="top"><bold>0.966</bold></td>
</tr>
<tr>
<td align="left" valign="middle">M3</td>
<td align="center" valign="top">80</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">0.911</td>
</tr>
<tr>
<td align="left" valign="middle">M4</td>
<td align="center" valign="top">80</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">0.954</td>
</tr>
<tr>
<td align="left" valign="middle">M5</td>
<td align="center" valign="top">100</td>
<td align="center" valign="top">128</td>
<td align="center" valign="top">0.913</td>
</tr>
<tr>
<td align="left" valign="middle">M6</td>
<td align="center" valign="top">100</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">0.947</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The bold values shown in the table represent the parameter settings used in the model presented in this paper.</p>
</table-wrap-foot>
</table-wrap>
<p>As shown in <xref ref-type="table" rid="tab4">Tables 4</xref>, <xref ref-type="table" rid="tab5">5</xref>, the final parameters of the model are as follows: 256 filters in each of the two convolutional layers, a convolutional kernel size of 3, 256 GRU units, 50 epochs, and a Batchsize of 256.</p>
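<p>The grid search over the network parameters can be sketched as an exhaustive loop over all combinations. In this illustration the training step is replaced by a lookup of the accuracies reported in Table 4, so the code only demonstrates the selection logic; the function and variable names are hypothetical.</p>

```python
from itertools import product

# Candidate values and reported accuracies, copied from Table 4 (M1 to M8).
FILTERS_CONV1 = (128, 256)
FILTERS_CONV2 = (128, 256)
GRU_UNITS = (128, 256)
TABLE4_ACCURACY = (0.939, 0.946, 0.944, 0.952, 0.942, 0.949, 0.941, 0.953)

def grid_search(evaluate):
    """Return (best_config, best_score) over every combination in the grid."""
    grid = list(product(FILTERS_CONV1, FILTERS_CONV2, GRU_UNITS))
    scores = [evaluate(cfg) for cfg in grid]
    best = max(range(len(grid)), key=scores.__getitem__)
    return grid[best], scores[best]

# Stand-in for "train the network and return its accuracy": look up Table 4.
lookup = dict(zip(product(FILTERS_CONV1, FILTERS_CONV2, GRU_UNITS), TABLE4_ACCURACY))
best_cfg, best_acc = grid_search(lookup.__getitem__)
print(best_cfg, best_acc)   # (256, 256, 256) 0.953
```

<p>In a real run, the evaluate argument would train the model with the given configuration and return its accuracy on held-out data; the same loop applies to the Epoch and Batchsize grid of Table 5.</p>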
</sec>
</sec>
<sec id="sec10">
<label>2.5</label>
<title>Model algorithm design</title>
<p>In this paper, the softmax function is used to perform three-class classification on the output of the model. In multi-class classification problems, the softmax activation function is usually applied in the output layer to transform the outputs of the neural network into a vector of class probabilities. The mathematical expression of Softmax is shown in <xref ref-type="disp-formula" rid="EQ3">Equation 3</xref>.</p>
<disp-formula id="EQ3"><label>(3)</label><mml:math id="M8"><mml:mi>F</mml:mi><mml:mfenced open="(" close=")"><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mfenced><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mfenced open="(" close=")"><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mfenced></mml:mrow><mml:mrow><mml:msubsup><mml:mstyle displaystyle="true"><mml:mo stretchy="true">&#x2211;</mml:mo></mml:mstyle><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:msubsup><mml:mi>exp</mml:mi><mml:mfenced open="(" close=")"><mml:msub><mml:mi>X</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mfenced></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"/><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:math></disp-formula>
<p>Where <inline-formula><mml:math id="M9"><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> is the input score for category <italic>i</italic> and <inline-formula><mml:math id="M10"><mml:mi>F</mml:mi><mml:mfenced open="(" close=")"><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mfenced></mml:math></inline-formula> is the corresponding output probability. The numerator is the exponential of the score for category <italic>i</italic>, and the denominator is the sum of the exponentials over all categories, which normalises the outputs. As can be seen from the formula, the calculated probabilities lie in the range [0,1] and sum to 1.</p>
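<p>A minimal NumPy version of the softmax function is given below as an illustration. Subtracting the maximum score before exponentiation is a standard numerical safeguard that leaves the result unchanged; the example scores are hypothetical.</p>

```python
import numpy as np

def softmax(x):
    """Softmax along the last axis, shifted by the maximum for numerical stability."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])      # hypothetical scores for the three classes
probs = softmax(logits)
print(probs.round(3), int(probs.argmax()))
```

<p>The predicted class is the index of the largest probability, which is also the index of the largest input score, since softmax preserves the ordering of its inputs.</p>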
<p>The algorithm flow of the model for emotion recognition is shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>:</p>
<list list-type="order">
<list-item><p>Extract the EEG and ECG data from the DREAMER dataset, together with the participants&#x2019; scores for VALENCE, AROUSAL, and DOMINANCE, to form the original data set. Because this paper designs a three-class model and each of these scores ranges from 1 to 5 in the original dataset, scores of 1 and 2 are mapped to class 0, a score of 3 to class 1, and scores of 4 and 5 to class 2.</p></list-item>
<list-item><p>Preprocess the EEG data and ECG data.</p></list-item>
<list-item><p>Extract the time-domain, frequency-domain, and nonlinear features of EEG and ECG and perform feature fusion, and use the method of random forest for feature selection.</p></list-item>
<list-item><p>Divide the dataset into a training set (80%) and a test set (20%), and take 10% of the training set as the validation set, which is used to evaluate the performance of the model during training.</p></list-item>
<list-item><p>Train the hybrid network model on the training set, updating the weights and biases by backpropagation while monitoring performance on the validation set, and save the trained model.</p></list-item>
<list-item><p>Use the test set to evaluate the effectiveness and accuracy of the algorithm by comparing the predicted emotion labels with the true labels.</p></list-item>
</list>
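<p>The label mapping in step 1 and the data split in step 4 can be sketched as follows. This is an illustrative example rather than the code used in this study: the function names, the random seed and the sample count of 1,000 are assumptions, and the split shown is a plain random shuffle without class stratification.</p>

```python
import numpy as np

SCORE_TO_CLASS = {1: 0, 2: 0, 3: 1, 4: 2, 5: 2}   # mapping stated in step 1

def to_classes(scores):
    """Map 1-5 self-assessment scores to the three class labels."""
    return np.array([SCORE_TO_CLASS[int(s)] for s in scores])

def split_indices(n_samples, test_frac=0.2, val_frac=0.1, seed=0):
    """Shuffle indices, hold out test_frac for testing, then val_frac of the rest for validation."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round((n_samples - n_test) * val_frac))
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

labels = to_classes([1, 2, 3, 4, 5])
train, val, test = split_indices(1000)
print(labels.tolist(), len(train), len(val), len(test))   # [0, 0, 1, 2, 2] 720 80 200
```

<p>With these fractions, 72% of the samples are used to fit the weights, 8% to monitor training, and 20% are reserved for the final evaluation.</p>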
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption><p>Overall flowchart of the Att-1DCNN-GRU algorithm for emotion recognition.</p></caption>
<graphic xlink:href="fnins-19-1512799-g003.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="results" id="sec11">
<label>3</label>
<title>Results</title>
<sec id="sec12">
<label>3.1</label>
<title>Evaluation indicators</title>
<p>The evaluation indicators selected for this paper are as follows:</p>
<list list-type="simple">
<list-item><p>(1) Accuracy, defined as the ratio of the number of correctly classified samples to the total number of samples, is calculated using the formula in <xref ref-type="disp-formula" rid="EQ4">Equation 4</xref>.</p></list-item>
</list>
<disp-formula id="EQ4"><label>(4)</label><mml:math id="M11"><mml:mtext mathvariant="italic">Accuracy</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mfenced open="|" close="|"><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo stretchy="true">|</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="true">|</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfenced><mml:mfenced open="|" close="|"><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo stretchy="true">|</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="true">|</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo stretchy="true">|</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="true">|</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy="true">|</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="true">|</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfenced></mml:mfrac></mml:math></disp-formula>
<list list-type="simple">
<list-item><p>(2) Precision, defined as the ratio of the number of correctly classified positive samples to the number of samples classified as positive, measures how reliable the positive predictions are; see <xref ref-type="disp-formula" rid="EQ5">Equation 5</xref>.</p></list-item>
</list>
<disp-formula id="EQ5"><label>(5)</label><mml:math id="M12"><mml:mtext mathvariant="italic">Precision</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mfenced open="|" close="|"><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfenced><mml:mfenced open="|" close="|"><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo stretchy="true">|</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="true">|</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfenced></mml:mfrac></mml:math></disp-formula>
<list list-type="simple">
<list-item><p>(3) Recall, defined as the ratio of the number of correctly classified positive samples to the number of actual positive samples, measures how completely the positive samples are retrieved; see <xref ref-type="disp-formula" rid="EQ6">Equation 6</xref>.</p></list-item>
</list>
<disp-formula id="EQ6"><label>(6)</label><mml:math id="M13"><mml:mtext mathvariant="italic">Recall</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mfenced open="|" close="|"><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfenced><mml:mfenced open="|" close="|"><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo stretchy="true">|</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="true">|</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfenced></mml:mfrac></mml:math></disp-formula>
<list list-type="simple">
<list-item><p>(4) F1-score, the harmonic mean of Precision and Recall; see <xref ref-type="disp-formula" rid="EQ7">Equation 7</xref>.</p></list-item>
</list>
<disp-formula id="EQ7"><label>(7)</label><mml:math id="M14"><mml:mi>F</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mtext mathvariant="italic">score</mml:mtext><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x2217;</mml:mo><mml:mfrac><mml:mrow><mml:mtext mathvariant="italic">Precision</mml:mtext><mml:mo>&#x2217;</mml:mo><mml:mtext mathvariant="italic">Recall</mml:mtext></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">Precision</mml:mtext><mml:mo>+</mml:mo><mml:mtext mathvariant="italic">Recall</mml:mtext></mml:mrow></mml:mfrac></mml:math></disp-formula>
<list list-type="simple">
<list-item><p>(5) Confusion matrix. The confusion matrix is also an effective evaluation tool that gives an intuitive view of how the samples of each true class are distributed over the predicted classes. In this paper, confusion matrices are reported both as proportions and as sample counts.</p></list-item>
</list>
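<p>For a three-class problem, the indicators above can all be derived from the confusion matrix by treating each class in turn as the positive class (one-vs-rest). The sketch below illustrates this; the counts are made up for demonstration and are not results from this study, and the function name is hypothetical.</p>

```python
import numpy as np

def per_class_metrics(cm):
    """Accuracy plus one-vs-rest precision, recall and F1 from a confusion matrix.

    cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp          # belonging to the class but predicted elsewhere
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Made-up counts for a three-class problem, for illustration only.
cm = [[45, 3, 2],
      [4, 40, 6],
      [1, 2, 47]]
accuracy, precision, recall, f1 = per_class_metrics(cm)
print(round(accuracy, 3), precision.round(3), recall.round(3), f1.round(3))
```

<p>Here the accuracy is the sum of the diagonal divided by the total number of samples, which matches the definition of correctly classified samples over all samples. The sketch assumes that every class is predicted at least once; otherwise the precision of that class is undefined.</p>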
</sec>
<sec id="sec13">
<label>3.2</label>
<title>Experimental results</title>
<p>To ensure that the division into training, validation and test sets does not introduce bias, this paper adopts random splitting and pays particular attention to the representativeness and balance of the dataset. Specifically, 80% of the samples from the DREAMER dataset were randomly selected as the training set, and the remaining 20% were used as the test set. To reduce the risk of overfitting, cross-validation was also used during training: the data were repeatedly divided into different subsets and validated, which helps keep the distributions of the training and test sets consistent and reduces the impact of data bias on model performance. In addition, the proportions of the emotion categories were kept as balanced as possible in each subset, so that the distribution of emotional states in each subset is representative of the overall data. To further validate the effectiveness and generalisation ability of the model, we plan to use more datasets for validation and testing in subsequent studies, both to strengthen the credibility of the findings and to identify potential problems and areas for improvement.</p>
<p>Panels (A), (B), and (C) of <xref ref-type="fig" rid="fig4">Figure 4</xref> show the training curves of the proposed Att-1DCNN-GRU model for the VALENCE, AROUSAL, and DOMINANCE dimensions, respectively. The green dashed line represents the accuracy on the training data, the green solid line the accuracy on the validation data, the red dashed line the loss on the training data, and the red solid line the loss on the validation data. Training on the dataset is well behaved: convergence is fast and no overfitting is observed. These results indicate that the proposed method performs emotion recognition effectively and with high classification accuracy.</p>
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption><p>Training and validation accuracy and loss curves: <bold>(A)</bold> VALENCE dimension <bold>(B)</bold> AROUSAL dimension <bold>(C)</bold> DOMINANCE dimension.</p></caption>
<graphic xlink:href="fnins-19-1512799-g004.tif"/>
</fig>
<p>Panels (A), (B), and (C) of <xref ref-type="fig" rid="fig5">Figure 5</xref> show the confusion matrices of the model on the test set for the VALENCE, AROUSAL, and DOMINANCE scores, respectively. As can be seen from <xref ref-type="fig" rid="fig5">Figure 5</xref>, the model classifies VALENCE best, with an accuracy of 95.95%, followed by AROUSAL at 94.93% and DOMINANCE at 94.91%.</p>
<fig position="float" id="fig5">
<label>Figure 5</label>
<caption><p>Confusion matrix for emotion classification on the test set: <bold>(A)</bold> VALENCE dimension <bold>(B)</bold> AROUSAL dimension <bold>(C)</bold> DOMINANCE dimension.</p></caption>
<graphic xlink:href="fnins-19-1512799-g005.tif"/>
</fig>
</sec>
<sec id="sec14">
<label>3.3</label>
<title>Comparative results and analysis of ablation experiments</title>
<p>To further evaluate the performance of the proposed Att-1DCNN-GRU model, this study conducted a set of comparison experiments on the emotion dimensions (VALENCE, AROUSAL, and DOMINANCE) of the DREAMER dataset. The comparison models include 1DCNN, GRU, 1DCNN-GRU, 1DCNN-Attention, GRU-Attention, and Att-1DCNN-GRU (the model proposed in this paper). The experimental results are shown in <xref ref-type="table" rid="tab6">Table 6</xref>: the proposed hybrid model reaches an overall accuracy of 95.26%, compared with 84.4% to 91.3% for the other architectures. The accuracy of the hybrid deep learning method is thus markedly higher than that of the single-network baselines, indicating that the combined architecture extracts more useful information from the physiological data.</p>
<table-wrap position="float" id="tab6">
<label>Table 6</label>
<caption><p>Results of ablation experiments.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Model</th>
<th align="center" valign="top">Accuracy (VALENCE)</th>
<th align="center" valign="top">Accuracy (AROUSAL)</th>
<th align="center" valign="top">Accuracy (DOMINANCE)</th>
<th align="center" valign="top">Overall Accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">1DCNN</td>
<td align="center" valign="middle">88.5%</td>
<td align="center" valign="middle">87.3%</td>
<td align="center" valign="middle">85.8%</td>
<td align="center" valign="middle">87.2%</td>
</tr>
<tr>
<td align="left" valign="middle">GRU</td>
<td align="center" valign="middle">85.4%</td>
<td align="center" valign="middle">84.6%</td>
<td align="center" valign="middle">83.2%</td>
<td align="center" valign="middle">84.4%</td>
</tr>
<tr>
<td align="left" valign="middle">1DCNN-GRU</td>
<td align="center" valign="middle">91.2%</td>
<td align="center" valign="middle">90.5%</td>
<td align="center" valign="middle">89.4%</td>
<td align="center" valign="middle">90.4%</td>
</tr>
<tr>
<td align="left" valign="middle">1DCNN-Attention</td>
<td align="center" valign="middle">92.1%</td>
<td align="center" valign="middle">91.7%</td>
<td align="center" valign="middle">90.3%</td>
<td align="center" valign="middle">91.3%</td>
</tr>
<tr>
<td align="left" valign="middle">GRU-Attention</td>
<td align="center" valign="middle">90.7%</td>
<td align="center" valign="middle">89.4%</td>
<td align="center" valign="middle">88.1%</td>
<td align="center" valign="middle">89.4%</td>
</tr>
<tr>
<td align="left" valign="middle"><bold>Att-1DCNN-GRU</bold></td>
<td align="center" valign="middle"><bold>95.95%</bold></td>
<td align="center" valign="middle"><bold>94.93%</bold></td>
<td align="center" valign="middle"><bold>94.91%</bold></td>
<td align="center" valign="middle"><bold>95.26%</bold></td>
</tr>
<tr>
<td align="left" valign="middle">EEG-only</td>
<td align="center" valign="middle">87.2%</td>
<td align="center" valign="middle">86.1%</td>
<td align="center" valign="middle">84.9%</td>
<td align="center" valign="middle">86.1%</td>
</tr>
<tr>
<td align="left" valign="middle">ECG-only</td>
<td align="center" valign="middle">83.6%</td>
<td align="center" valign="middle">81.4%</td>
<td align="center" valign="middle">80.7%</td>
<td align="center" valign="middle">81.9%</td>
</tr>
<tr>
<td align="left" valign="middle">DEAP Dataset</td>
<td align="center" valign="middle">92.5%</td>
<td align="center" valign="middle">91.3%</td>
<td align="center" valign="middle">90.4%</td>
<td align="center" valign="middle">92.5%</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The bold values shown in the table represent the models presented in this paper and their corresponding accuracy rates.</p>
</table-wrap-foot>
</table-wrap>
<p>In addition, the ablation experiments show that the improved hybrid model proposed in this paper converges quickly and achieves high accuracy and low loss with a moderate training time, which indicates that the model performs well and supports the practical application of emotion recognition research. In particular, the model converges quickly during training without signs of overfitting, which supports the effectiveness of the method in the emotion recognition task.</p>
<p>In addition to the comparisons with other models, this study conducted further experiments to explore the impact of different data sources and datasets on model performance. First, we ran separate experiments using EEG data alone and ECG data alone. With EEG data alone, the overall accuracy of the model (86.1%) is markedly lower than with the fusion of EEG and ECG data (95.26%), and the drop is seen in all three emotion dimensions. With ECG data alone, the model still achieves some success (81.9% overall), but because the ECG signal carries less information, its recognition performance is far below that of the fused model.</p>
<p>To further assess the generality of the model, this study also tested it on the DEAP dataset, a widely used emotion recognition dataset containing multimodal physiological signals including EEG. The experimental results show that the Att-1DCNN-GRU model achieves a classification accuracy of 92.5% on the DEAP dataset, which is a significant advantage over other traditional models. These results further support the generalisation ability of the model: it achieves excellent results on the DREAMER dataset and also adapts well to another emotion recognition dataset.</p>
<p>We also compared our model with the recent MS-MPHAN model (<xref ref-type="bibr" rid="ref15">Xuejing et al., 2024</xref>), published in 2024, which employs a multi-scale multi-channel hybrid attention mechanism and achieves an accuracy of 93.75% on the DEAP dataset. Although the accuracy of our model is about 1.25 percentage points lower than that of MS-MPHAN, we believe that this gap can be closed in future work by further optimising the feature extraction method, enhancing the fusion of spatio-temporal features, and adopting more advanced model architectures (e.g., self-supervised learning and graph convolutional networks).</p>
</sec>
</sec>
<sec id="sec15">
<label>4</label>
<title>Conclusion and future work</title>
<p>In this paper, we propose an emotion recognition method based on multimodal signal fusion that combines electroencephalogram (EEG) and electrocardiogram (ECG) signals. By designing an improved composite neural network model, Att-1DCNN-GRU, we achieve accurate classification of the three dimensions of emotion (valence, arousal, and dominance). The experiments show that the model performs well in the emotion recognition task, including on the DEAP dataset, and indicate that the fusion of EEG and ECG signals can effectively improve the accuracy and robustness of emotion recognition.</p>
<p>In the experimental process, the EEG and ECG signals were first rigorously pre-processed, including denoising, band-pass filtering, and other steps to ensure the purity and effectiveness of the signals. In terms of feature extraction, we adopted time-domain, frequency-domain and nonlinear methods to extract rich physiological features from EEG and ECG signals, and the most representative features were screened by the random forest method. In this way, the model is able to fully exploit the useful information in the signals and ensure a high classification performance.</p>
<p>The experimental results show that the Att-1DCNN-GRU model achieves a high level of classification accuracy in all three dimensions of emotion (VALENCE, AROUSAL, and DOMINANCE), with VALENCE having the highest classification accuracy of 95.95%. The fusion strategy of deep learning models demonstrates stronger classification ability and higher accuracy compared to traditional methods. In the comparison experiments, we also observed relatively low classification accuracy when using either EEG data or ECG data alone, further demonstrating the complementary nature of EEG and ECG signals and the advantages of multimodal fusion in emotion recognition.</p>
<p>In addition to the single-modality comparison, we validated the model on the DEAP dataset to assess its generalisation ability. The results show that the model performs stably and robustly on different datasets, providing evidence for the cross-dataset adaptability of emotion recognition techniques.</p>
<p>Although this study has achieved significant results, some limitations remain to be addressed in future research. First, despite the use of multimodal data fusion, the model is still sensitive to individual differences and the diversity of emotional states; its adaptability could be improved by introducing more personalised features and adaptive mechanisms. Second, the existing experiments focus mainly on the DEAP and DREAMER datasets. Although these datasets are representative, future research should validate the model on more publicly available datasets (e.g., AMIGOS) to test its performance in different contexts and strengthen its credibility.</p>
<p>In addition, this paper has made preliminary explorations of feature selection and model design, but the physiological significance of the various EEG and ECG features and their specific relationship with emotional states have not yet been explored in depth. Future work can examine feature selection and fusion mechanisms more closely through finer-grained feature analysis, combined with psychological and physiological theories, in order to improve the interpretability and application value of the model.</p>
<p>In terms of model optimisation, more complex deep learning structures, such as dual-channel networks and temporal&#x2013;spatial feature fusion networks, can be explored in the future to improve the model&#x2019;s handling of multimodal data. Further optimisation of the attention mechanism and the hierarchical structure design may also bring more flexibility and generalisation ability to the model.</p>
<p>Overall, this study provides a new approach to emotion recognition and makes significant progress in recognition accuracy and robustness through multimodal signal fusion and the design of a deep learning model. In the future, with the growing popularity of wearable devices and the increasing demand for mental health care, emotion recognition technology is expected to become an important auxiliary diagnostic tool that provides personalised affective computing and mental health management solutions for individuals.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="sec16">
<title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found at: <ext-link xlink:href="https://zenodo.org/records/546113" ext-link-type="uri">https://zenodo.org/records/546113</ext-link>.</p>
</sec>
<sec sec-type="author-contributions" id="sec17">
<title>Author contributions</title>
<p>ZW: Writing &#x2013; original draft, Writing &#x2013; review &#x0026; editing. YW: Writing &#x2013; original draft, Writing &#x2013; review &#x0026; editing.</p>
</sec>
<sec sec-type="funding-information" id="sec18">
<title>Funding</title>
<p>The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.</p>
</sec>
<sec sec-type="COI-statement" id="sec19">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="ai-statement" id="sec20">
<title>Generative AI statement</title>
<p>The authors declare that no Gen AI was used in the creation of this manuscript.</p>
</sec>
<sec sec-type="disclaimer" id="sec21">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Agrafioti</surname> <given-names>F.</given-names></name> <name><surname>Hatzinakos</surname> <given-names>D.</given-names></name> <name><surname>Anderson</surname> <given-names>A. K.</given-names></name></person-group> (<year>2011</year>). <article-title>ECG pattern analysis for emotion detection</article-title>. <source>IEEE Trans. Affect. Comput.</source> <volume>3</volume>, <fpage>102</fpage>&#x2013;<lpage>115</lpage>. doi: <pub-id pub-id-type="doi">10.1109/T-AFFC.2011.28</pub-id></citation></ref>
<ref id="ref2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alarcao</surname> <given-names>S. M.</given-names></name> <name><surname>Fonseca</surname> <given-names>M. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Emotions recognition using EEG signals: a survey</article-title>. <source>IEEE Trans. Affect. Comput.</source> <volume>10</volume>, <fpage>374</fpage>&#x2013;<lpage>393</lpage>. doi: <pub-id pub-id-type="doi">10.1109/TAFFC.2017.2714671</pub-id></citation></ref>
<ref id="ref3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Atkinson</surname> <given-names>J.</given-names></name> <name><surname>Campos</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>Improving BCI-based emotion recognition by combining EEG feature selection and kernel classifiers</article-title>. <source>Expert Syst. Appl.</source> <volume>47</volume>, <fpage>35</fpage>&#x2013;<lpage>41</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.eswa.2015.10.049</pub-id></citation></ref>
<ref id="ref4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Butterworth</surname> <given-names>S.</given-names></name></person-group> (<year>1930</year>). <article-title>On the theory of filter amplifiers</article-title>. <source>Wireless Engineer</source> <volume>7</volume>, <fpage>536</fpage>&#x2013;<lpage>541</lpage>.</citation></ref>
<ref id="ref5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chakravarthi</surname> <given-names>B.</given-names></name> <name><surname>Ng</surname> <given-names>S. C.</given-names></name> <name><surname>Ezilarasan</surname> <given-names>M. R.</given-names></name> <name><surname>Leung</surname> <given-names>M. F.</given-names></name></person-group> (<year>2022</year>). <article-title>EEG-based emotion recognition using hybrid CNN and LSTM classification</article-title>. <source>Front. Comput. Neurosci.</source> <volume>16</volume>:<fpage>1019776</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fncom.2022.1019776</pub-id>, PMID: <pub-id pub-id-type="pmid">36277613</pub-id></citation></ref>
<ref id="ref6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ho</surname> <given-names>T. K.</given-names></name></person-group> (<year>1998</year>). <article-title>The random subspace method for constructing decision forests</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>20</volume>, <fpage>832</fpage>&#x2013;<lpage>844</lpage>. doi: <pub-id pub-id-type="doi">10.1109/34.709601</pub-id></citation></ref>
<ref id="ref7"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>D.</given-names></name> <name><surname>Guan</surname> <given-names>C.</given-names></name> <name><surname>Ang</surname> <given-names>K. K.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Pan</surname> <given-names>Y.</given-names></name></person-group> (<year>2012</year>). <article-title>Asymmetric spatial pattern for EEG-based emotion detection</article-title>. In <conf-name>The 2012 International Joint Conference on Neural Networks (IJCNN)</conf-name> (pp. <fpage>1</fpage>&#x2013;<lpage>7</lpage>).</citation></ref>
<ref id="ref8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hyv&#x00E4;rinen</surname> <given-names>A.</given-names></name> <name><surname>Oja</surname> <given-names>E.</given-names></name></person-group> (<year>2000</year>). <article-title>Independent component analysis: algorithms and applications</article-title>. <source>Neural Netw.</source> <volume>13</volume>, <fpage>411</fpage>&#x2013;<lpage>430</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0893-6080(00)00026-5</pub-id>, PMID: <pub-id pub-id-type="pmid">10946390</pub-id></citation></ref>
<ref id="ref9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Katsigiannis</surname> <given-names>S.</given-names></name> <name><surname>Ramzan</surname> <given-names>N.</given-names></name></person-group> (<year>2017</year>). <article-title>DREAMER: a database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices</article-title>. <source>IEEE J. Biomed. Health Inform.</source> <volume>22</volume>, <fpage>98</fpage>&#x2013;<lpage>107</lpage>. doi: <pub-id pub-id-type="doi">10.1109/JBHI.2017.2688239</pub-id>, PMID: <pub-id pub-id-type="pmid">28368836</pub-id></citation></ref>
<ref id="ref9001"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>ImageNet classification with deep convolutional neural networks</article-title>. <source>Commun. ACM</source> <volume>60</volume>, <fpage>84</fpage>&#x2013;<lpage>90</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3065386</pub-id></citation></ref>
<ref id="ref10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Picard</surname> <given-names>R. W.</given-names></name> <name><surname>Vyzas</surname> <given-names>E.</given-names></name> <name><surname>Healey</surname> <given-names>J.</given-names></name></person-group> (<year>2001</year>). <article-title>Toward machine emotional intelligence: analysis of affective physiological state</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>23</volume>, <fpage>1175</fpage>&#x2013;<lpage>1191</lpage>. doi: <pub-id pub-id-type="doi">10.1109/34.954607</pub-id></citation></ref>
<ref id="ref11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Russell</surname> <given-names>J. A.</given-names></name></person-group> (<year>1980</year>). <article-title>A circumplex model of affect</article-title>. <source>J. Pers. Soc. Psychol.</source> <volume>39</volume>, <fpage>1161</fpage>&#x2013;<lpage>1178</lpage>. doi: <pub-id pub-id-type="doi">10.1037/h0077714</pub-id></citation></ref>
<ref id="ref12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saganowski</surname> <given-names>S.</given-names></name> <name><surname>Perz</surname> <given-names>B.</given-names></name> <name><surname>Polak</surname> <given-names>A. G.</given-names></name> <name><surname>Kazienko</surname> <given-names>P.</given-names></name></person-group> (<year>2022</year>). <article-title>Emotion recognition for everyday life using physiological signals from wearables: a systematic literature review</article-title>. <source>IEEE Trans. Affect. Comput.</source> <volume>14</volume>, <fpage>1876</fpage>&#x2013;<lpage>1897</lpage>. doi: <pub-id pub-id-type="doi">10.1109/TAFFC.2022.3176135</pub-id></citation></ref>
<ref id="ref13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sarkar</surname> <given-names>P.</given-names></name> <name><surname>Etemad</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>Self-supervised ECG representation learning for emotion recognition</article-title>. <source>IEEE Trans. Affect. Comput.</source> <volume>13</volume>, <fpage>1541</fpage>&#x2013;<lpage>1554</lpage>. doi: <pub-id pub-id-type="doi">10.1109/TAFFC.2020.3014842</pub-id></citation></ref>
<ref id="ref14"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Thammasan</surname> <given-names>N.</given-names></name> <name><surname>Fukui</surname> <given-names>K. I.</given-names></name> <name><surname>Numao</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>Application of deep belief networks in EEG-based dynamic music-emotion recognition</article-title>. In <conf-name>2016 International Joint Conference on Neural Networks (IJCNN)</conf-name> (pp. <fpage>881</fpage>&#x2013;<lpage>888</lpage>).</citation></ref>
<ref id="ref15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xuejing</surname> <given-names>G.</given-names></name> <name><surname>Jia</surname> <given-names>L.</given-names></name> <name><surname>Yucheng</surname> <given-names>G.</given-names></name> <name><surname>Zhaohui</surname> <given-names>Y.</given-names></name></person-group> (<year>2024</year>). <article-title>EEG emotion recognition method using multi-scale multi-channel hybrid attention mechanism</article-title>. <source>J. Comp. Eng. App.</source> <volume>60</volume>, <fpage>130</fpage>&#x2013;<lpage>138</lpage>. doi: <pub-id pub-id-type="doi">10.3778/j.issn.1002-8331.2309-0201</pub-id></citation></ref>
</ref-list>
</back>
</article>