<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title>Frontiers in Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnins.2025.1652274</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Spiking neural networks for EEG signal analysis using wavelet transform</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Yuan</surname> <given-names>Li</given-names></name>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wei</surname> <given-names>Jian</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/3114592/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Liu</surname> <given-names>Ying</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/3109823/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
<role content-type="https://credit.niso.org/contributor-roles/supervision/"/>
<role content-type="https://credit.niso.org/contributor-roles/investigation/"/>
</contrib>
</contrib-group>
<aff><institution>Academy of Military Sciences</institution>, <addr-line>Beijing</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Gaetano Di Caterina, University of Strathclyde, United Kingdom</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Anguo Zhang, Fuzhou University, China</p>
<p>Zihan Pan, Institute for Infocomm Research (A<sup>&#x0002A;</sup>STAR), Singapore</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Ying Liu <email>hello1668&#x00040;163.com</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>16</day>
<month>10</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>19</volume>
<elocation-id>1652274</elocation-id>
<history>
<date date-type="received">
<day>23</day>
<month>06</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>09</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2025 Yuan, Wei and Liu.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Yuan, Wei and Liu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<sec>
<title>Introduction</title>
<p>Brain-computer interfaces (BCIs) leverage EEG signal processing to enable human-machine communication and have broad application potential. However, existing deep learning-based BCI methods face two critical limitations that hinder their practical deployment: reliance on manual EEG feature extraction, which constrains their ability to adaptively capture complex neural patterns, and high energy consumption characteristics that make them unsuitable for resource-constrained portable BCI devices requiring edge deployment.</p>
</sec>
<sec>
<title>Methods</title>
<p>To address these limitations, this work combines wavelet transform for automatic feature extraction with spiking neural networks for energy-efficient computation. Specifically, we present a novel spiking transformer that integrates a spiking self-attention mechanism with discrete wavelet transform, termed SpikeWavformer. SpikeWavformer enables automatic EEG signal time-frequency decomposition, eliminates manual feature extraction, and provides energy-efficient classification decision-making, thereby enhancing the model&#x00027;s cross-scene generalization while meeting the constraints of portable BCI applications.</p>
</sec>
<sec>
<title>Results</title>
<p>Experimental results demonstrate the effectiveness and efficiency of SpikeWavformer in emotion recognition and auditory attention decoding tasks.</p>
</sec>
<sec>
<title>Discussion</title>
<p>These findings indicate that SpikeWavformer can address the key limitations of existing BCI methods and holds promise for practical deployment in portable, resource-constrained scenarios.</p>
</sec></abstract>
<kwd-group>
<kwd>spiking neural networks</kwd>
<kwd>EEG signal analysis</kwd>
<kwd>brain-computer interfaces</kwd>
<kwd>discrete wavelet transform</kwd>
<kwd>bio-inspired methods</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="2"/>
<equation-count count="24"/>
<ref-count count="77"/>
<page-count count="12"/>
<word-count count="7931"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Neuromorphic Engineering</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Brain-computer interfaces (BCIs) enable direct communication between the human brain and machines through electroencephalography (EEG) signal processing (<xref ref-type="bibr" rid="B71">Zhang et al., 2020</xref>). A typical BCI architecture comprises four functional modules: data acquisition, preprocessing, classification, and feedback (<xref ref-type="bibr" rid="B38">Lotte and Guan, 2010</xref>). BCI systems have demonstrated extensive real-world applicability in diverse domains including robotic manipulation (<xref ref-type="bibr" rid="B37">Liu et al., 2015</xref>), cognitive signal decoding (<xref ref-type="bibr" rid="B7">Cai et al., 2021</xref>), and neuropsychiatric interventions for emotional regulation (<xref ref-type="bibr" rid="B77">Zotev et al., 2020</xref>; <xref ref-type="bibr" rid="B65">Xing et al., 2019</xref>). Among learning-based BCI methods, deep learning has demonstrated superior performance over conventional machine learning approaches across diverse BCI tasks (<xref ref-type="bibr" rid="B5">Ang et al., 2008</xref>; <xref ref-type="bibr" rid="B62">Wang et al., 2015</xref>), including motor imagery classification (<xref ref-type="bibr" rid="B47">Schirrmeister et al., 2017</xref>; <xref ref-type="bibr" rid="B31">Kwon et al., 2019</xref>), mental workload monitoring (<xref ref-type="bibr" rid="B29">Jiao et al., 2018</xref>), auditory attention decoding (<xref ref-type="bibr" rid="B17">Faghihi et al., 2022</xref>; <xref ref-type="bibr" rid="B8">Cai et al., 2024</xref>), and emotion recognition (<xref ref-type="bibr" rid="B3">Alarcao and Fonseca, 2017</xref>; <xref ref-type="bibr" rid="B34">Li et al., 2018</xref>). 
Nevertheless, previous research has predominantly relied on manually extracted EEG features such as power spectral density (PSD) and differential entropy (DE) (<xref ref-type="bibr" rid="B29">Jiao et al., 2018</xref>; <xref ref-type="bibr" rid="B52">Song et al., 2018</xref>; <xref ref-type="bibr" rid="B72">Zhong et al., 2020</xref>), whose limitations become increasingly evident. First, these feature extraction paradigms exhibit strong dependence on domain-specific knowledge (<xref ref-type="bibr" rid="B51">Singh and Krishnan, 2023</xref>; <xref ref-type="bibr" rid="B54">Subasi, 2019</xref>), necessitating task-specific extraction pipelines tailored to distinct experimental protocols, thereby compromising model generalizability across tasks. Second, manually crafted features often fail to capture nonlinear interrelationships in EEG time-frequency characteristics and multiscale dynamic properties (<xref ref-type="bibr" rid="B51">Singh and Krishnan, 2023</xref>; <xref ref-type="bibr" rid="B56">Vallabhaneni et al., 2021</xref>), potentially leading to critical information loss.</p>
<p>The wavelet transform (WT) has emerged as a fundamental signal processing tool in EEG analysis (<xref ref-type="bibr" rid="B24">Grobbelaar et al., 2022</xref>) due to its unique time-frequency analysis capabilities. Unlike the conventional Fourier transform, which provides only global frequency-domain information, WT enables multi-scale decomposition through its inherent multi-resolution analysis. This capability permits simultaneous signal characterization at distinct resolution levels, capturing macroscopic patterns (e.g., global trends) at coarse-grained scales while resolving microscopic fluctuations (e.g., localized variations) at fine-grained scales when processing EEG signals. Furthermore, WT achieves adaptive hierarchical representation of non-stationary neural activities by dynamically adjusting the scale and translation parameters of basis functions, thereby effectively characterizing both transient features (e.g., high-frequency oscillations in event-related potentials) and long-range rhythmic patterns (e.g., sustained &#x003B1;-wave oscillations). Although recent years have witnessed preliminary applications of wavelet transform methodologies in EEG classification tasks, their predominant reliance on deep neural networks (DNNs) imposes heavy computational and resource demands, conflicting with the low-power objectives of resource-constrained portable BCI devices. Consequently, achieving optimal trade-offs between classification performance, system portability, and energy efficiency remains a critical challenge in practical BCI implementations.</p>
<p>Spiking neural networks (SNNs), recognized as third-generation neural networks, have emerged as a promising alternative in BCI research due to their biologically plausible computation paradigm (<xref ref-type="bibr" rid="B28">Izhikevich, 2003</xref>; <xref ref-type="bibr" rid="B41">Maass, 1997</xref>; <xref ref-type="bibr" rid="B42">Masquelier et al., 2008</xref>). As shown in <xref ref-type="fig" rid="F1">Figure 1</xref>, whereas deep neural networks (DNNs) use continuous activations, SNNs employ discrete spike events as the medium of neuronal communication: spiking neurons activate exclusively upon reaching threshold potentials and remain quiescent otherwise (<xref ref-type="bibr" rid="B20">Gerstner and Kistler, 2002</xref>). This event-driven mechanism (<xref ref-type="bibr" rid="B63">Wei et al., 2024</xref>) makes synaptic computation sparse and replaces multiply-accumulate (MAC) operations with accumulate-only operations, thereby achieving superior energy efficiency, which is critical for portable neurotechnological devices. Notably, SNNs have demonstrated remarkable success across multiple domains in recent years. For instance, the energy-efficient Spike Transformer architectures proposed by <xref ref-type="bibr" rid="B69">Yao et al. (2023</xref>, <xref ref-type="bibr" rid="B68">2024</xref>, <xref ref-type="bibr" rid="B70">2025)</xref> and <xref ref-type="bibr" rid="B75">Zhou et al. (2022</xref>, <xref ref-type="bibr" rid="B73">2023)</xref> have demonstrated exceptional performance in image classification (<xref ref-type="bibr" rid="B15">Deng et al., 2022</xref>; <xref ref-type="bibr" rid="B49">Shi et al., 2024</xref>), detection (<xref ref-type="bibr" rid="B40">Luo et al., 2024</xref>; <xref ref-type="bibr" rid="B60">Wang et al., 2025</xref>), and segmentation (<xref ref-type="bibr" rid="B33">Lei et al., 2025</xref>). Similarly, the SNN-based audio processing models developed by <xref ref-type="bibr" rid="B64">Wu et al. (2018)</xref>, <xref ref-type="bibr" rid="B46">Pan et al. (2020)</xref>, and <xref ref-type="bibr" rid="B59">Wang et al. (2024)</xref> have made significant advancements in signal processing and keyword recognition. These successes establish a solid foundation for the broader adoption and cross-domain application of SNNs.</p>
<fig position="float" id="F1">
<label>Figure 1</label>
<caption><p>Comparison of neuron models in deep neural networks (DNNs) and spiking neural networks (SNNs). <bold>(a)</bold> A conventional DNN neuron model processes continuous-valued inputs, where <italic>x</italic> represents input activations, <italic>w</italic> denotes synaptic weights, <italic>b</italic> is the bias term, and <italic>Y</italic> corresponds to the output activation. <bold>(b)</bold> A typical spiking neuron model processes discrete spike events, with <italic>s</italic><sub><italic>i</italic></sub> representing input spikes, <italic>w</italic> indicating synaptic weights, and <italic>Y</italic> signifying the output spike train.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-19-1652274-g0001.tif">
<alt-text>Diagram comparing two models: (a) shows a traditional neural network with inputs \(x_1\), \(x_2\), \(x_3\) passing through synapses with weights, processed by a soma and an activation function to produce output \(Y\). (b) depicts a spiking neural network with pre-spikes \(s_1\), \(s_2\), \(s_3\) processed similarly through synapses and a soma that exhibits neural dynamics, resulting in spikes as output \(Y\).</alt-text>
</graphic>
</fig>
<p>In this paper, we propose a novel BCI signal analysis framework that integrates wavelet transform with a spiking self-attention mechanism. This framework enables dynamic modeling and efficient computation of non-stationary EEG signals by combining brain-inspired spiking neural networks with the global-local feature extraction capabilities of the wavelet domain. Our approach not only overcomes the limitations of traditional manual feature extraction but also demonstrates, for the first time, the synergistic effectiveness of spiking self-attention and wavelet transform in cross-task scenarios through end-to-end training. In experimental evaluations on emotion recognition and auditory attention decoding tasks, our method achieves strong performance. The main contributions of this work are summarized as follows:</p>
<list list-type="bullet">
<list-item><p>We propose a novel spiking self-attention module integrated with discrete wavelet transform (DWT) for EEG signal processing. This module simultaneously captures global rhythmic patterns and local transient features through multi-scale wavelet decomposition. Leveraging the spatio-temporal dynamics of spiking neurons, it effectively models nonlinear feature dependencies while replacing the dense attention of the conventional Transformer with efficient sparse spike sequences.</p></list-item>
<list-item><p>We present SpikeWavformer, the first end-to-end spiking neural network framework specifically designed for multi-task BCI analysis. The framework unifies time-frequency decomposition, dynamic feature selection, and classification within a biologically plausible computational paradigm. Its cascade architecture combines reversible wavelet transforms with spiking self-attention layers, enabling adaptive optimization across diverse BCI tasks including emotion recognition and auditory decoding.</p></list-item>
<list-item><p>We conduct comprehensive evaluations on multiple public benchmark datasets to validate the effectiveness of SpikeWavformer. Experimental results demonstrate superior performance compared to existing methods, particularly in resource-constrained environments. The framework shows significant practical potential for real-world BCI applications, achieving state-of-the-art results while maintaining low computational overhead.</p></list-item>
</list>
</sec>
<sec id="s2">
<title>2 Related works</title>
<sec>
<title>2.1 SNNs for EEG signal processing tasks</title>
<p>EEG-based BCIs have demonstrated significant potential across various downstream tasks, with auditory attention decoding (AAD) and emotion recognition representing two prominent application domains. In AAD research, the challenge stems from the cocktail party effect, the neurocognitive ability to selectively focus on target speakers in multi-talker environments (<xref ref-type="bibr" rid="B10">Cherry, 1953</xref>), which contrasts with difficulties experienced by hearing-impaired populations (<xref ref-type="bibr" rid="B8">Cai et al., 2024</xref>). Neurophysiological signal analyses through ECoG (<xref ref-type="bibr" rid="B43">Mesgarani and Chang, 2012</xref>), MEG (<xref ref-type="bibr" rid="B2">Akram et al., 2016</xref>), and EEG (<xref ref-type="bibr" rid="B45">O&#x00027;Sullivan et al., 2015</xref>) have enabled AAD implementations, catalyzing developments in neuro-steered hearing aids (<xref ref-type="bibr" rid="B9">Ceolini et al., 2020</xref>). For emotion recognition, the field seeks to model higher-order cognitive functions encoded in neurophysiological signals (<xref ref-type="bibr" rid="B55">Tan et al., 2021</xref>). While emotional states manifest through various modalities, the susceptibility of physical expressions to masking effects positions non-invasive EEG as a robust solution for emotion decoding (<xref ref-type="bibr" rid="B66">Xu et al., 2024</xref>; <xref ref-type="bibr" rid="B35">Li et al., 2019</xref>).</p>
<p>SNNs have emerged as a promising computational framework for both applications, leveraging their inherent low-latency processing and energy-efficient characteristics. In AAD research, <xref ref-type="bibr" rid="B17">Faghihi et al. (2022)</xref> developed efficient left/right attention pattern decoding, while <xref ref-type="bibr" rid="B6">Cai et al. (2023)</xref> proposed BSAnet, integrating biologically plausible mechanisms with attention modeling for temporal dynamics capture. Recent advances include spiking GCNs for spatial feature extraction (<xref ref-type="bibr" rid="B8">Cai et al., 2024</xref>), demonstrating promising results in low-density electrode scenarios. In emotion recognition, pioneering SNN applications have shown methodological viability. <xref ref-type="bibr" rid="B55">Tan et al. (2021)</xref> implemented NeuroSense achieving 78.97%/67.76% (arousal/valence) accuracy on DEAP, while <xref ref-type="bibr" rid="B4">Alzhrani et al. (2021)</xref> attained 94.83% accuracy using bidirectional spiking networks on DREAMER. Recent developments include fractal SNN architectures (<xref ref-type="bibr" rid="B36">Li et al., 2023</xref>), SGLNet for spatiotemporal extraction (<xref ref-type="bibr" rid="B22">Gong et al., 2023</xref>), and EESCN achieving 94.81% accuracy on DEAP and SEED-IV (<xref ref-type="bibr" rid="B66">Xu et al., 2024</xref>). However, previous research has predominantly relied on manually extracted EEG features such as power spectral density (PSD) and differential entropy (DE) (<xref ref-type="bibr" rid="B29">Jiao et al., 2018</xref>; <xref ref-type="bibr" rid="B52">Song et al., 2018</xref>), and automatic EEG feature extraction in this domain remains largely unexplored.</p>
</sec>
<sec>
<title>2.2 Spiking self-attention mechanism</title>
<p>Traditional SNNs, despite their inherent advantages in energy efficiency and biological plausibility, still exhibit a performance gap compared to their DNN counterparts. Therefore, many recent works have integrated attention mechanisms into SNNs to enhance their performance and capabilities (<xref ref-type="bibr" rid="B67">Yao et al., 2021</xref>; <xref ref-type="bibr" rid="B76">Zhu et al., 2024</xref>; <xref ref-type="bibr" rid="B74">Zhou et al., 2024</xref>; <xref ref-type="bibr" rid="B39">Lu et al., 2025</xref>). <xref ref-type="bibr" rid="B69">Yao et al. (2023)</xref> addressed this through Spike-Driven Self-Attention (SDSA), reformulating matrix multiplications as masking operations to ensure purely binary spike signal transmission. Building on this foundation, <xref ref-type="bibr" rid="B68">Yao et al. (2024)</xref> introduced the Meta-Spikeformer architecture that extended the SDSA operator. These advances inspired subsequent research on SNN-specific attention mechanisms. <xref ref-type="bibr" rid="B61">Wang et al. (2023)</xref> proposed Spatiotemporal Self-Attention (STSA) for SNNs, maintaining asynchronous transmission while capturing spatiotemporal feature dependencies. More recently, <xref ref-type="bibr" rid="B60">Wang et al. (2025)</xref> developed Saccade Spike Self-Attention (SSSA), enabling comprehensive spatiotemporal feature processing for holistic visual scene understanding in SNN paradigms. Overall, these novel spiking self-attention mechanisms have significantly advanced SNN performance. However, there remains a lack of effective spiking self-attention designs specifically tailored for EEG signal processing.</p>
</sec>
</sec>
<sec id="s3">
<title>3 Preliminaries</title>
<sec>
<title>3.1 Leaky integrate-and-fire neuron</title>
<p>SNNs rely on spiking neurons (<xref ref-type="bibr" rid="B41">Maass, 1997</xref>) as their basic unit of information transfer. Common spiking neuron models include the Hodgkin-Huxley (<xref ref-type="bibr" rid="B1">Abbott and Kepler, 2005</xref>), Izhikevich (<xref ref-type="bibr" rid="B28">Izhikevich, 2003</xref>), and Leaky Integrate-and-Fire (LIF) (<xref ref-type="bibr" rid="B28">Izhikevich, 2003</xref>) models. In this work, we use the LIF model, a simple and effective spiking neuron model, in the proposed method. When the membrane potential reaches the firing threshold, the neuron emits a spike, and the membrane potential is then reset to the reset potential <italic>V</italic><sub><italic>reset</italic></sub>. The dynamics of the LIF neuron are described as:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>H</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>V</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003C4;</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>S</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x00398;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>H</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>V</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>H</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>S</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003C4; is the membrane time constant, and <italic>X</italic>[<italic>t</italic>] is the input current at time step <italic>t</italic>. <italic>H</italic>[<italic>t</italic>] is the membrane potential after charging; when it reaches or exceeds the firing threshold <italic>V</italic><sub><italic>th</italic></sub>, the spiking neuron emits a spike <italic>S</italic>[<italic>t</italic>]. &#x00398;(&#x000B7;) is the Heaviside step function, which equals 1 when its argument is non-negative and 0 otherwise. <italic>V</italic>[<italic>t</italic>] represents the membrane potential after the firing step: it equals <italic>H</italic>[<italic>t</italic>] if no spike is generated and <italic>V</italic><sub><italic>reset</italic></sub> otherwise.</p>
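<p>As an illustration only, the update in Equations 1 to 3 can be sketched for a single neuron in plain Python. The function name <monospace>lif_simulate</monospace> and the default values of the time constant, threshold, and reset potential are assumptions chosen for this example and are not settings reported in this work.</p>
<preformat><![CDATA[
```python
def lif_simulate(inputs, tau=2.0, v_th=1.0, v_reset=0.0):
    """Simulate one LIF neuron over discrete time steps (Eqs. 1-3).

    All default parameter values are illustrative assumptions.
    Returns the spike train S[t] and the post-reset potentials V[t].
    """
    v = v_reset  # V[t-1]: potential left over from the previous step
    spikes, potentials = [], []
    for x in inputs:
        # Eq. 1: charge, leaking toward v_reset while integrating X[t]
        h = v + (x - (v - v_reset)) / tau
        # Eq. 2: Heaviside step, fire when H[t] reaches the threshold
        s = 1 if h - v_th >= 0 else 0
        # Eq. 3: hard reset to v_reset after a spike, else keep H[t]
        v = h * (1 - s) + v_reset * s
        spikes.append(s)
        potentials.append(v)
    return spikes, potentials


# Constant input of 1.5: the neuron charges to 0.75, then 1.125 and fires.
spikes, potentials = lif_simulate([1.5, 1.5, 1.5, 1.5])
print(spikes)      # [0, 1, 0, 1]
print(potentials)  # [0.75, 0.0, 0.75, 0.0]
```
]]></preformat>
<p>With a constant supra-threshold input the sketch fires periodically, and with zero input it stays silent, which is the event-driven sparsity that the LIF model provides.</p>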
</sec>
<sec>
<title>3.2 Wavelet transform</title>
<p>Wavelet transforms (WTs) are powerful signal-processing tools that enable the localization of signals in both time and frequency domains, which is particularly useful for analyzing non-stationary signals like EEG. The discrete wavelet transform (DWT), in particular, provides an efficient method for multi-resolution analysis by decomposing signals into sub-bands corresponding to different frequency scales. This decomposition enables the extraction of local features at various scales, making it well-suited for EEG signal processing. EEG signals are nonlinear and non-stationary, posing challenges for traditional analysis methods in capturing their time-varying and multiscale nature. Wavelet transforms, and specifically DWT, offer a significant advantage in feature extraction and time-frequency characterization of EEG signals. Because each decomposition level halves the analyzed bandwidth, the resulting dyadic sub-bands can, for a suitable sampling rate and decomposition depth, be approximately aligned with the canonical EEG frequency bands: delta (0.5&#x02013;4 Hz), theta (4&#x02013;8 Hz), alpha (8&#x02013;13 Hz), beta (13&#x02013;30 Hz), and gamma (greater than 30 Hz). This decomposition allows us to extract meaningful features from the EEG data that correspond to various cognitive and emotional states.</p>
<p>For our application, we employ the Haar wavelet due to its simplicity and computational efficiency. Haar wavelets are among the earliest and simplest wavelet functions, characterized by a two-tap filter with minimal support, which results in fast computations. Compared to other common wavelets like Daubechies or Morlet, Haar wavelets are computationally less expensive, requiring only additions and binary shifts, which makes them well-suited for real-time, low-power applications such as SNN-based systems. Haar wavelets are particularly efficient in extracting local, low-frequency components (such as delta and theta waves) as well as high-frequency components (like beta and gamma waves), which are essential for distinguishing different cognitive states in EEG analysis. The efficiency and simplicity of Haar wavelets also make them ideal for handling the sparse, event-driven nature of SNNs.</p>
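<p>The following plain-Python sketch illustrates why the Haar wavelet is inexpensive. It uses an unnormalized Haar variant (pairwise averages and half-differences), which is an assumption made here so that only additions and halvings are needed; the orthonormal form scales by the reciprocal of the square root of 2 instead. The function names and the 128 Hz sampling rate in the example are hypothetical and are not taken from this work.</p>
<preformat><![CDATA[
```python
def haar_dwt_level(signal):
    """One level of an unnormalized Haar DWT on an even-length sequence.

    approx[k] = (x[2k] + x[2k+1]) / 2   (low-pass, coarse trend)
    detail[k] = (x[2k] - x[2k+1]) / 2   (high-pass, local change)
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / 2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in pairs]
    return approx, detail


def haar_idwt_level(approx, detail):
    """Exact inverse of haar_dwt_level: x[2k] = a + d, x[2k+1] = a - d."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out


def haar_dwt(signal, levels):
    """Multi-level decomposition; details are ordered finest first."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)
    return approx, details


def detail_band_hz(fs, level):
    """Nominal (low, high) frequency range of a detail sub-band."""
    return fs / 2 ** (level + 1), fs / 2 ** level


x = [4, 2, 6, 8, 1, 3, 5, 7]
approx, details = haar_dwt(x, levels=2)
print(approx)                  # [5.0, 4.0]
print(details)                 # [[1.0, -1.0, -1.0, -1.0], [-2.0, -2.0]]
print(detail_band_hz(128, 3))  # (8.0, 16.0)
```
]]></preformat>
<p>At the hypothetical 128 Hz sampling rate, the level-3 detail band nominally spans 8 to 16 Hz, which roughly overlaps the alpha band. The inverse function reconstructs the input exactly, so no information is lost by the decomposition itself.</p>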
</sec>
<sec>
<title>3.3 Spiking self-attention mechanism</title>
<p>The Transformer architecture, originally devised for natural language processing tasks (<xref ref-type="bibr" rid="B58">Vaswani et al., 2017</xref>), has subsequently permeated multiple subfields of artificial intelligence. At its core lies the self-attention mechanism, which facilitates selective information processing by focusing on relevant contextual elements. Spikformer (<xref ref-type="bibr" rid="B75">Zhou et al., 2022</xref>) pioneered the integration of self-attention into SNNs through its Spiking Self-Attention (SSA) mechanism and the Spikformer architecture. This approach employs sparse spiking representations for the query (<italic>Q</italic>), key (<italic>K</italic>), and value (<italic>V</italic>) matrices:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>Q</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>S</mml:mi></mml:mstyle><mml:mstyle mathvariant="script"><mml:mi>N</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BN</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>S</mml:mi></mml:mstyle><mml:mstyle mathvariant="script"><mml:mi>N</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BN</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E6"><label>(6)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>V</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>S</mml:mi></mml:mstyle><mml:mstyle mathvariant="script"><mml:mi>N</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BN</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>Q</italic>, <italic>K</italic>, and <italic>V</italic> are tensors in &#x0211D;<sup><italic>T</italic>&#x000D7;<italic>C</italic>&#x000D7;<italic>H</italic>&#x000D7;<italic>W</italic></sup>, BN(&#x000B7;) represents batch normalization, and <inline-formula><mml:math id="M7"><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>S</mml:mi></mml:mstyle><mml:mstyle mathvariant="script"><mml:mi>N</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x000B7;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the spiking neuron layer that maintains the attention mechanism&#x00027;s spiking nature. The similarity between the spiking <italic>Q</italic> and <italic>K</italic> matrices is computed via the dot product:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">Score</mml:mtext><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">Sim</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>Q</mml:mi><mml:mo>,</mml:mo><mml:mi>K</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>Q</mml:mi><mml:msup><mml:mrow><mml:mi>K</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The attention output is subsequently calculated as a scaled weighted sum of <italic>V</italic>, transformed through spiking neuron activation, and further processed through linear transformation and batch normalization before final spiking neuron conversion to produce the output <italic>Z</italic>:</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">Attn</mml:mtext><mml:mo>=</mml:mo><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>S</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>N</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:mtext class="textrm" mathvariant="normal">Score</mml:mtext><mml:mo>&#x000B7;</mml:mo><mml:mi>V</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:mi>Z</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>S</mml:mi></mml:mstyle><mml:mstyle mathvariant="script"><mml:mi>N</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BN</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">Linear</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">Attn</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
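<p>As an illustration of the query, key, and value projections and of Equations 7 and 8, the following NumPy sketch computes spiking self-attention for a single time step on flattened tokens. It is a minimal sketch under stated assumptions: a Heaviside threshold function stands in for the spiking neuron layer (membrane dynamics are omitted), batch normalization is left out, and the names <monospace>spike</monospace> and <monospace>ssa_step</monospace>, the threshold, and the scale value are hypothetical choices made for illustration.</p>
<preformat>
```python
import numpy as np

def spike(x, threshold=1.0):
    """Heaviside stand-in for SN: emit 1 where the input reaches the threshold."""
    return (x >= threshold).astype(np.float32)

def ssa_step(x, w_q, w_k, w_v, w_o, scale=0.125):
    """One time step of spiking self-attention on tokens x of shape (N, D)."""
    q = spike(x @ w_q)                   # binary queries (BN omitted in this sketch)
    k = spike(x @ w_k)                   # binary keys
    v = spike(x @ w_v)                   # binary values
    score = q @ k.T                      # Eq. 7: spike-coincidence counts
    attn = spike(scale * (score @ v))    # Eq. 8, left: scaled weighted sum of V
    return spike(attn @ w_o)             # Eq. 8, right: linear map, then spikes

rng = np.random.default_rng(0)
n_tokens, dim = 6, 8
x = spike(rng.random((n_tokens, dim)), threshold=0.5)    # toy input spikes
w_q, w_k, w_v, w_o = (rng.normal(size=(dim, dim)) for _ in range(4))
z = ssa_step(x, w_q, w_k, w_v, w_o)
```
</preformat>
<p>Because <italic>Q</italic> and <italic>K</italic> are binary in this sketch, every entry of the score matrix is a non-negative count of coincident spikes, which is why the weighted sum can be rescaled by <italic>s</italic> instead of being normalized with a softmax.</p>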
</sec>
</sec>
<sec id="s4">
<title>4 Methods</title>
<p>In this section, we introduce our approach for EEG-based emotion recognition and auditory attention decoding. First, we formulate the problem in Section 4.1. Then, we describe the overall data processing workflow in Section 4.2. Finally, we present the proposed Spiking Wavelet Transformer (SpikeWavformer) architecture, which integrates the wavelet transform with a spiking self-attention mechanism, in Sections 4.3 and 4.4.</p>
<sec>
<title>4.1 Problem analysis</title>
<p>Given an EEG dataset <inline-formula><mml:math id="M10"><mml:msub><mml:mrow><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>D</mml:mi></mml:mstyle></mml:mrow></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">eeg</mml:mtext></mml:mstyle></mml:mrow></mml:msub></mml:math></inline-formula>, it can be represented as:</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>D</mml:mi></mml:mstyle></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">eeg</mml:mtext></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">eeg</mml:mtext></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M12"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">eeg</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>X</mml:mi></mml:mstyle></mml:mrow></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">eeg</mml:mtext></mml:mstyle></mml:mrow></mml:msub></mml:math></inline-formula> denotes the raw EEG input signal for the <italic>i</italic>-th sample, and <inline-formula><mml:math id="M13"><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>Y</mml:mi></mml:mstyle></mml:mrow></mml:math></inline-formula> represents its corresponding label (emotion category or auditory attention state). Our objective is to learn a spiking neural network model <italic>F</italic><sub>&#x003B8;</sub> with parameters &#x003B8; to predict the class label from the EEG input. The model is optimized by minimizing the expected risk based on the cross-entropy loss <italic>L</italic><sub>CE</sub>:</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mo class="qopname">arg</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munder><mml:mrow><mml:mstyle mathvariant="double-struck"><mml:mi>E</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">eeg</mml:mtext></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0007E;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>D</mml:mi></mml:mstyle></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">eeg</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">CE</mml:mtext></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">eeg</mml:mtext></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
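<p>In practice, the expectation in Equation 10 is approximated by an average over the available training samples. The NumPy sketch below computes this empirical cross-entropy from class scores. The function name <monospace>cross_entropy</monospace> and the toy scores are illustrative assumptions, as is the treatment of the model output as real-valued logits.</p>
<preformat>
```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch; logits (N, K), integer labels (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.2, 3.0]])       # toy model outputs for two samples
labels = np.array([0, 2])                  # toy ground-truth classes
loss = cross_entropy(logits, labels)
uniform_loss = cross_entropy(np.zeros((2, 3)), labels)   # uninformative model
```
</preformat>
<p>For an uninformative model that assigns equal scores to all three classes, the loss equals log 3, which gives a simple reference value against which a trained model can be compared.</p>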
<p>In this study, we present a spiking transformer model, denoted as <italic>F</italic><sub>&#x003B8;</sub>, that learns discriminative spatio-temporal representations directly from raw EEG signals for both emotion recognition and auditory attention decoding. To achieve this, we introduce a novel Spiking Wavelet Self-Attention (SWSA) mechanism within a spiking transformer framework. While conventional Spiking Self-Attention (SSA) enables efficient event-driven computation, it is limited in its ability to capture the multi-scale frequency dynamics intrinsic to non-stationary EEG signals. The proposed SWSA addresses this limitation by integrating Haar wavelet transforms for joint time&#x02013;frequency analysis. Haar filters have minimal length and are computationally simple, which makes them highly efficient for real-time processing. Compared to other wavelet bases, such as Daubechies and Morlet, Haar&#x00027;s shorter filters and multiply-free operations align well with the event-driven, low-power nature of spiking neurons. This integration allows the model to focus on neurophysiologically relevant rhythms (e.g., alpha and beta bands) critical for emotional and attentional processes, while maintaining energy-efficient computation. Finally, a cross-entropy loss function is employed to enable effective gradient-based optimization for learning highly discriminative features across both tasks.</p>
</sec>
<sec>
<title>4.2 Workflow</title>
<p>The overall workflow of the proposed method is depicted in <xref ref-type="fig" rid="F2">Figure 2</xref>. Raw EEG signals are first preprocessed and segmented into overlapping windows via a sliding window strategy to preserve temporal continuity. To more effectively capture the spatial characteristics of EEG activity, &#x003B1;-band cortical signals are extracted and projected onto 2D topographic maps, thereby maintaining brain-region dependencies. These maps are subsequently divided into patches and tokenized into fixed-length sequences, which serve as the input to a stack of <italic>N</italic> spiking encoder blocks. Finally, the resulting features are fed into an MLP classification head to predict the corresponding emotional or attentional state. In summary, this high-level pipeline constitutes the basis of the proposed model architecture, which is elaborated in the following section.</p>
<fig position="float" id="F2">
<label>Figure 2</label>
<caption><p>Workflow of the proposed method for EEG-based tasks. First, raw EEG data are preprocessed and segmented via sliding windows. Second, the &#x003B1;-band cortical activity is visualized as 2D topological maps. Finally, the data are tokenized into fixed-length sequences with multiple spiking encoder blocks performing feature extraction and an MLP head outputting the predicted category.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-19-1652274-g0002.tif">
<alt-text>Diagram of the SpikeWavformer process for analyzing ambulatory EEG data. EEG readings are split into windows and transformed into a spectro-spatial image. This image undergoes a convolution-based projection, followed by spiking encoder blocks. The resulting data is processed through an MLP to classify the output.</alt-text>
</graphic>
</fig>
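<p>The sliding-window segmentation at the start of this workflow can be sketched as follows. The window length, step size, sampling rate, channel count, and the function name <monospace>sliding_windows</monospace> are hypothetical values chosen for illustration and are not specified in this section.</p>
<preformat>
```python
import numpy as np

def sliding_windows(eeg, win_len, step):
    """Split EEG of shape (channels, samples) into overlapping windows.

    Returns an array of shape (n_windows, channels, win_len).
    """
    n_samples = eeg.shape[1]
    starts = range(0, n_samples - win_len + 1, step)
    return np.stack([eeg[:, s:s + win_len] for s in starts])

fs = 128                                                     # assumed sampling rate (Hz)
eeg = np.random.default_rng(0).normal(size=(64, 10 * fs))    # 10 s of 64-channel toy data
windows = sliding_windows(eeg, win_len=fs, step=fs // 2)     # 1 s windows, 50% overlap
```
</preformat>
<p>With a step smaller than the window length, consecutive windows share samples, which is how the segmentation preserves temporal continuity across window boundaries.</p>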
</sec>
<sec>
<title>4.3 SpikeWavformer</title>
<p>Building on the workflow described above, we design SpikeWavformer, an end-to-end spiking transformer architecture that combines wavelet-based multiscale analysis with spiking attention to enhance EEG feature representation. As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, SpikeWavformer can be written as follows:</p>
<disp-formula id="E11"><label>(11)</label><mml:math id="M15"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">SPS</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">SWSA</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">MLP</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>Y</mml:mi><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">CH</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">GAP</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Given the EEG input <italic>X</italic>, SpikeWavformer first visualizes the spatial focus position via the topographic distribution of oscillatory cortical activity in the &#x003B1; band and converts it into a 2D image. Subsequently, the spiking patch splitting (SPS) module partitions the input into patches and progressively extracts features, optionally incorporating wavelet transformation to enhance multiscale feature representation. Then, a stack of <italic>L</italic> spiking wavelet encoder blocks, each built around the Spiking Wavelet Self-Attention (SWSA) mechanism, encodes the features. Finally, the encoded features are compressed into a fixed-dimension vector via global average pooling (GAP) and fed into a fully connected classification head (CH) to produce the classification results.</p>
<fig position="float" id="F3">
<label>Figure 3</label>
<caption><p>The overall architecture of our proposed Spiking Wavelet Transformer (SpikeWavformer) for EEG-based tasks, which consists of a spiking patch splitting module, <italic>L</italic>&#x000D7; spiking wavelet encoder blocks, and a linear classification head.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-19-1652274-g0003.tif">
<alt-text>Diagram of a neural network architecture for image classification. It includes components like Conv2D, Batch Normalization (BN), Max Pooling (MP), and Spiking Wavelet Self Attention (SWSA). The encoder block is repeated, leading to a classification head. Vanilla SSA and SWSA have elements like Linear, BN, LIF neurons, and wavelet transforms (DWT, IDWT). Operations include element-wise addition and matrix dot-product.</alt-text>
</graphic>
</fig>
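<p>The residual composition in Equation 11 can be sketched in a few lines of NumPy. This is only a structural sketch that starts from the output of the SPS module: the sub-layers are placeholder thresholded linear maps rather than the actual SWSA and MLP modules, and the function name <monospace>forward</monospace>, the helper <monospace>make_layer</monospace>, the tensor sizes, and the pooling axes are assumptions made for illustration.</p>
<preformat>
```python
import numpy as np

def forward(x, blocks, head):
    """Compose Eq. 11: residual encoder blocks, global average pooling, head.

    x has shape (T, N, D): time steps, tokens, embedding dimension.
    blocks is a list of (swsa, mlp) callables that preserve the shape of x.
    """
    for swsa, mlp in blocks:
        x = swsa(x) + x              # X'_l = SWSA(X_{l-1}) + X_{l-1}
        x = mlp(x) + x               # X_l  = MLP(X'_l) + X'_l
    pooled = x.mean(axis=(0, 1))     # GAP over time steps and tokens (assumed axes)
    return head(pooled)              # CH: fully connected classification head

rng = np.random.default_rng(0)
T, N, D, n_classes, L = 4, 16, 32, 3, 2

def make_layer():
    """Placeholder sub-layer: thresholded linear map, not the real SWSA or MLP."""
    w = rng.normal(scale=0.1, size=(D, D))
    return lambda x: (x @ w >= 0.5).astype(np.float32)

blocks = [(make_layer(), make_layer()) for _ in range(L)]
w_head = rng.normal(size=(D, n_classes))
logits = forward(rng.random((T, N, D)), blocks, lambda p: p @ w_head)
```
</preformat>
<p>The sketch makes one structural requirement explicit: each sub-layer must return a tensor of the same shape as its input, otherwise the residual additions in Equation 11 are not defined.</p>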
</sec>
<sec>
<title>4.4 Spiking wavelet encoder block</title>
<p>As an essential neurophysiological signal, EEG plays a pivotal role in research areas such as affective computing and auditory attention decoding. Nevertheless, its multi-channel structure, low signal-to-noise ratio (SNR), pronounced temporal non-stationarity, and intricate time&#x02013;frequency characteristics present substantial challenges for existing analysis techniques. Conventional CNNs are limited in capturing long-range temporal dependencies inherent in EEG data. In contrast, vanilla Transformers possess strong long-range modeling capability but incur prohibitive computational costs when processing long-sequence EEG signals. Furthermore, many existing approaches employ irreversible downsampling during multi-scale feature extraction, resulting in the loss of critical frequency-domain information. This drawback is particularly detrimental to neural decoding tasks that rely on specific frequency bands.</p>
<p>To address these issues, we propose a Spiking Wavelet Self-Attention (SWSA) mechanism for EEG signal processing. It combines the biological plausibility of SNNs with the flexible time-frequency analysis of wavelet transforms, offering an efficient, biologically inspired solution for EEG-based emotion recognition and auditory attention decoding. The input is a multi-channel EEG tensor <italic>X</italic>&#x02208;&#x0211D;<sup><italic>T</italic>&#x000D7;<italic>B</italic>&#x000D7;<italic>C</italic>&#x000D7;<italic>H</italic>&#x000D7;<italic>W</italic></sup>, where <italic>T</italic> denotes the number of time steps, <italic>B</italic> the batch size, <italic>C</italic> the number of EEG channels, and <italic>H</italic>&#x000D7;<italic>W</italic> the 2D spatial-topological arrangement. The frequency-domain features of EEG signals are crucial for neural decoding, because different frequency bands are associated with different cognitive states: &#x003B4; with deep sleep, &#x003B8; with memory encoding, &#x003B1; with relaxation, &#x003B2; with attention and cognitive activities, and &#x003B3; with perception and higher-order functions. We adopt the Haar wavelet for its minimal filter length and computational simplicity, which enable fast, low-power multiscale decomposition and align well with the event-driven, resource-constrained nature of SNN-based BCI systems. Specifically, we perform a Haar discrete wavelet transform (DWT) on the EEG features at each time step <italic>t</italic>:</p>
<disp-formula id="E12"><label>(12)</label><mml:math id="M18"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo class="qopname">DWT</mml:mo><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>here, <inline-formula><mml:math id="M19"><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> captures low-frequency components (like &#x003B4;, &#x003B8;), while high-frequency sub-bands <inline-formula><mml:math id="M20"><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M21"><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M22"><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> retain high-frequency information (&#x003B2;, &#x003B3;). Then, spatial local convolution enhances frequency-band interactions:</p>
<disp-formula id="E13"><label>(13)</label><mml:math id="M23"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">LIF</mml:mtext><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BN</mml:mtext><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">Conv</mml:mtext><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">Concat</mml:mtext><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mo 
stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>here, BN denotes batch normalization and LIF denotes a leaky integrate-and-fire spiking neuron layer. The inverse DWT (IDWT) then reconstructs the spatial-domain features:</p>
<disp-formula id="E14"><label>(14)</label><mml:math id="M24"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo class="qopname">IDWT</mml:mo><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
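<p>To make Equations 12 and 14 concrete, the sketch below implements a single-level 2D Haar DWT and its inverse in plain NumPy and reconstructs the input from unmodified sub-bands. The function names <monospace>haar_dwt2</monospace> and <monospace>haar_idwt2</monospace>, the orthonormal scaling by 1/2, the restriction to a single decomposition level, and the sub-band naming order are assumptions of this sketch; conventions differ between wavelet libraries, and the filtering step of Equation 13 is omitted.</p>
<preformat>
```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT of an array with even height and width."""
    a = x[..., 0::2, 0::2]       # top-left sample of each 2x2 block
    b = x[..., 0::2, 1::2]       # top-right
    c = x[..., 1::2, 0::2]       # bottom-left
    d = x[..., 1::2, 1::2]       # bottom-right
    ll = (a + b + c + d) / 2.0   # local average (low-low)
    lh = (a - b + c - d) / 2.0   # difference across columns
    hl = (a + b - c - d) / 2.0   # difference across rows
    hh = (a - b - c + d) / 2.0   # diagonal difference
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    out_shape = ll.shape[:-2] + (2 * ll.shape[-2], 2 * ll.shape[-1])
    x = np.empty(out_shape, dtype=ll.dtype)
    x[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

x = np.random.default_rng(0).normal(size=(4, 8, 8))   # toy (C, H, W) features
ll, lh, hl, hh = haar_dwt2(x)
x_rec = haar_idwt2(ll, lh, hl, hh)
```
</preformat>
<p>Each sub-band has half the spatial resolution of the input, and the transform uses only additions, subtractions, and one fixed scaling. Since the inverse recovers the input exactly when the sub-bands are left unchanged, any information loss in the wavelet branch would come from the intermediate filtering rather than from the decomposition itself.</p>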
<p>Our encoder, inspired by the vanilla Transformer encoder (<xref ref-type="bibr" rid="B58">Vaswani et al., 2017</xref>), first computes the block-input spikes for self-attention. Three matrices <inline-formula><mml:math id="M25"><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M26"><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M27"><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> map the tokens to vectors. Spiking neurons then convert these vectors into the spike sequences <italic>Q</italic>, <italic>K</italic>, <italic>V</italic>:</p>
<disp-formula id="E15"><label>(15)</label><mml:math id="M28"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>Q</mml:mi><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">LIF</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BN</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">LIF</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BN</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>V</mml:mi><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">LIF</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BN</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Next, we compute <italic>Q</italic>-<italic>K</italic> similarity. Following <xref ref-type="bibr" rid="B75">Zhou et al. (2022)</xref>, a scaling factor <italic>s</italic> controls matrix-multiplication magnitude without affecting attention properties:</p>
<disp-formula id="E16"><label>(16)</label><mml:math id="M30"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">attn</mml:mtext></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">LIF</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>Q</mml:mi><mml:msup><mml:mrow><mml:mi>K</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msup><mml:mi>V</mml:mi><mml:mo>&#x0002A;</mml:mo><mml:mi>s</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E17"><label>(17)</label><mml:math id="M31"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">attn</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">LIF</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BN</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">Linear</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">attn</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>To integrate wavelet and attention features effectively, we use channel-wise concatenation:</p>
<disp-formula id="E18"><label>(18)</label><mml:math id="M32"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">combined</mml:mtext></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">Concat</mml:mtext><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">attn</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">recon</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E19"><label>(19)</label><mml:math id="M33"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">SWSA</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">LIF</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">BN</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">combined</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>By integrating wavelet decomposition with spiking mechanisms, SpikeWavformer enables efficient processing of long-sequence EEG data while facilitating the analysis of cross-frequency neural dynamics, thereby providing richer feature representations for complex neural decoding tasks. Specifically, we analyze the advantages of integrating the wavelet transform into SNNs from the perspectives of convergence stability and convergence speed. First, we define the EEG signal space as <italic>X</italic> &#x0003D; {<italic>x</italic>&#x02208;&#x0211D;<sup><italic>T</italic>&#x000D7;<italic>C</italic>&#x000D7;<italic>H</italic>&#x000D7;<italic>W</italic></sup>}, where <italic>T</italic> represents the number of time steps, <italic>C</italic> denotes the number of channels, and <italic>H</italic>&#x000D7;<italic>W</italic> represents the spatial dimensions. The discrete wavelet transform (DWT) operator is defined as:</p>
<disp-formula id="E20"><label>(20)</label><mml:math id="M34"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>W</mml:mi><mml:mo>:</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>Y</mml:mi><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>Y</italic> &#x0003D; {(<italic>X</italic><sub><italic>LL</italic></sub>, <italic>X</italic><sub><italic>LH</italic></sub>, <italic>X</italic><sub><italic>HL</italic></sub>, <italic>X</italic><sub><italic>HH</italic></sub>)} represents the wavelet coefficient space.</p>
<p>The SWSA mechanism can be formalized as a composite operator:</p>
<disp-formula id="E21"><label>(21)</label><mml:math id="M35"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">SWSA</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>F</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">Attn</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>W</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02295;</mml:mo><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>W</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>W</italic> is the DWT operator, Attn is the spiking attention operator, <italic>F</italic> is the fusion operator, &#x02295; denotes concatenation and <italic>W</italic><sup>&#x02212;1</sup> is the inverse DWT (IDWT).</p>
<p><bold>Theorem 1 (Lipschitz Continuity of SWSA):</bold> The SWSA mechanism satisfies Lipschitz continuity (<xref ref-type="bibr" rid="B25">Hager, 1979</xref>; <xref ref-type="bibr" rid="B23">Gouk et al., 2021</xref>; <xref ref-type="bibr" rid="B21">Goldstein, 1977</xref>) with constant <italic>L</italic><sub><italic>SWSA</italic></sub>, ensuring stable convergence during training.</p>
<p><bold>Proof:</bold> First, we establish the Lipschitz properties of the individual components. <italic><bold>Haar Wavelet Transform Lipschitz Property:</bold></italic> For the Haar wavelet transform <italic>W</italic>, we have: ||<italic>W</italic>(<italic>x</italic><sub>1</sub>)&#x02212;<italic>W</italic>(<italic>x</italic><sub>2</sub>)||<sub>2</sub> &#x02264; <italic>L</italic><sub><italic>W</italic></sub>||<italic>x</italic><sub>1</sub>&#x02212;<italic>x</italic><sub>2</sub>||<sub>2</sub>. Since the Haar wavelet basis is orthonormal, the transform preserves the Euclidean norm, so <italic>L</italic><sub><italic>W</italic></sub> &#x0003D; 1. <italic><bold>Spiking Attention Lipschitz Property:</bold></italic> For the spiking attention mechanism with LIF neurons, let &#x003D5;(&#x003BC;) &#x0003D; &#x00398;(&#x003BC;&#x02212;<italic>V</italic><sub><italic>th</italic></sub>) be the spike generation function. The membrane potential dynamics are: <italic>V</italic>[<italic>t</italic>] &#x0003D; &#x003C4;<italic>V</italic>[<italic>t</italic>&#x02212;1]&#x0002B;<italic>X</italic>[<italic>t</italic>]&#x02212;<italic>v</italic><sub><italic>reset</italic></sub><italic>S</italic>[<italic>t</italic>&#x02212;1]. 
For bounded inputs, the LIF neuron satisfies: <inline-formula><mml:math id="M36"><mml:mo>|</mml:mo><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:mo>&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:math></inline-formula>. Therefore, <inline-formula><mml:math id="M37"><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></inline-formula>. 
<italic><bold>Combined Operator:</bold></italic> The SWSA operator combines these components: ||SWSA(<italic>x</italic><sub>1</sub>)&#x02212;SWSA(<italic>x</italic><sub>2</sub>)||<sub>2</sub> &#x02264; <italic>L</italic><sub><italic>SWSA</italic></sub>||<italic>x</italic><sub>1</sub>&#x02212;<italic>x</italic><sub>2</sub>||<sub>2</sub>, where <inline-formula><mml:math id="M38"><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mi>W</mml:mi><mml:mi>S</mml:mi><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>W</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></inline-formula> with <italic>L</italic><sub><italic>F</italic></sub> being the Lipschitz constant of the fusion operation.</p>
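<p>As a numerical illustration of the claim that <italic>L</italic><sub><italic>W</italic></sub> &#x0003D; 1, the following minimal sketch implements a single-level orthonormal 2D Haar transform and its inverse in NumPy. The function names <monospace>haar_dwt2</monospace>, <monospace>haar_idwt2</monospace>, and <monospace>coeff_norm</monospace>, the sub-band labeling, and the tensor sizes are illustrative assumptions, not the implementation used in this work.</p>
<preformat><![CDATA[
```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT over the last two axes.

    Assumes the last two dimensions (H, W) are even. Sub-band naming
    conventions vary between libraries; the labels here are one choice.
    """
    a = x[..., 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2 (the 2x2 block transform is its own inverse)."""
    h, w = ll.shape[-2], ll.shape[-1]
    x = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=ll.dtype)
    x[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def coeff_norm(coeffs):
    """Euclidean norm of the stacked sub-band coefficients."""
    return np.sqrt(sum(np.sum(c ** 2) for c in coeffs))


rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 2, 8, 8))  # (T, C, H, W), hypothetical sizes
x2 = rng.standard_normal((4, 2, 8, 8))

w1, w2 = haar_dwt2(x1), haar_dwt2(x2)
diff = [u - v for u, v in zip(w1, w2)]

# For an orthonormal transform, ||W(x1) - W(x2)||_2 equals ||x1 - x2||_2.
ratio = coeff_norm(diff) / np.linalg.norm(x1 - x2)
recon_error = np.max(np.abs(haar_idwt2(*w1) - x1))
```
]]></preformat>
<p>Under these assumptions, <monospace>ratio</monospace> should equal 1 up to floating-point error and <monospace>recon_error</monospace> should be close to zero, which matches the norm-preserving property used in the proof. The sketch checks only the wavelet component and says nothing about the spiking attention or fusion terms.</p>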
<p><bold>Corollary 1:</bold> Under the assumption that <italic>L</italic><sub><italic>SWSA</italic></sub> &#x0003C; 1, the SWSA operator is a contraction mapping, guaranteeing convergence to a unique fixed point.</p>
<p><bold>Theorem 2 (Accelerated convergence):</bold> The SWSA mechanism achieves faster convergence compared to vanilla spiking self-attention.</p>
<p><bold>Proof:</bold> Consider the optimization landscape with loss function <italic>L</italic>(&#x003B8;). The gradient update for SWSA parameters follows: &#x003B8;<sub><italic>t</italic>&#x0002B;1</sub> &#x0003D; &#x003B8;<sub><italic>t</italic></sub>&#x02212;&#x003B1;&#x02207;<sub>&#x003B8;</sub><italic>L</italic>(&#x003B8;<sub><italic>t</italic></sub>). The wavelet decomposition provides a natural regularization through frequency localization:</p>
<disp-formula id="E22"><label>(22)</label><mml:math id="M39"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mi>W</mml:mi><mml:mi>S</mml:mi><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mo>&#x003BB;</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>This <italic>L</italic><sub>1</sub> regularization on wavelet coefficients promotes sparsity. The convergence rate is bounded by:</p>
<disp-formula id="E23"><label>(23)</label><mml:math id="M40"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>L</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mi>&#x003B1;</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:mfrac><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>|</mml:mo><mml:msup><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003B1;</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where the wavelet regularization reduces the effective variance &#x003C3;<sup>2</sup>, leading to faster convergence.</p>
</sec>
</sec>
<sec id="s5">
<title>5 Experiment</title>
<p>This section presents comprehensive experiments to evaluate the effectiveness and efficiency of the proposed SpikeWavformer model. First, we detail the experimental setup, including datasets, preprocessing, and implementation specifics. Second, comparative studies are conducted on the DEAP and KUL datasets, demonstrating superior performance over existing methods in both emotion recognition and auditory attention decoding tasks. Additionally, we provide an analysis of the model&#x00027;s energy efficiency, highlighting its advantages in low-power computing environments.</p>
<sec>
<title>5.1 Experimental setup</title>
<sec>
<title>5.1.1 Datasets</title>
<p><bold>DEAP</bold>. The DEAP dataset (<xref ref-type="bibr" rid="B30">Koelstra et al., 2011</xref>), widely used in emotion recognition research, examines emotional responses to multimedia stimuli using EEG and peripheral physiological signals. It includes 32-channel EEG recordings and various peripheral physiological signals, such as skin temperature, blood volume pulse (BVP), respiratory rate, galvanic skin response (GSR), and electrooculogram (EOG). Video clips of facial expressions were also recorded for the first 22 participants. Each participant completed 40 trials, each lasting 1 min, with a 3-second baseline recorded before the start of each trial. After each trial, participants filled out a questionnaire to self-report their emotional state in terms of arousal, valence, dominance, and liking, with each dimension rated on a 9-point scale. EEG data were collected using a 32-channel device at a sampling rate of 512 Hz.</p>
<p><bold>KUL</bold>. The KUL dataset (<xref ref-type="bibr" rid="B12">Das et al., 2019</xref>) comprises EEG data collected using the BioSemi ActiveTwo device. The experimental environment was electromagnetically shielded and soundproofed to minimize potential noise interference. Data were collected from 16 subjects with normal hearing, who were instructed to attend to one of two simultaneous speakers. The speakers narrated four Dutch stories. Each subject participated in 8 trials, each lasting 6 min. Auditory stimuli, filtered with a head-related transfer function (HRTF), were presented to the subjects from either the left or the right side, in a randomized manner.</p>
</sec>
<sec>
<title>5.1.2 Implementation details</title>
<p>The EEG data from each channel were first re-referenced to the average response of all electrodes. Because the analyzed EEG signals were collected at different sampling rates, they were all band-pass filtered between 1 and 32 Hz using a 6th-order Chebyshev Type II filter and downsampled to 128 Hz. The frequency range was chosen based on previous nonlinear AAD studies. Finally, the EEG data channels were normalized to zero mean and unit variance for each trial. The study on the KUL dataset analyzed seven decision window sizes: 0.1, 0.2, 0.5, 1, 2, 5, and 10 seconds. Experiments were conducted using two NVIDIA RTX 4090 GPUs. The model was optimized using the Adam optimizer with an initial learning rate of 1 &#x000D7; 10<sup>&#x02212;4</sup> and trained for 200 epochs. For the SNN model parameters, LIF neurons were set with an initial membrane potential of 0, a spiking threshold of 0.5, and 4 simulation time steps. To facilitate effective backpropagation, a sigmoid function with parameter &#x003B1; &#x0003D; 4 was used as the surrogate gradient function, expressed as <italic>sigmoid</italic>(<italic>x</italic>) &#x0003D; 1/(1&#x0002B;<italic>exp</italic>(&#x02212;&#x003B1;<italic>x</italic>)). The remaining setup of the spiking transformer architecture follows Spikformer (<xref ref-type="bibr" rid="B75">Zhou et al., 2022</xref>).</p>
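<p>The LIF settings described above can be sketched in NumPy as follows. This is an illustrative toy, not the training code used in this work: the threshold of 0.5, the parameter &#x003B1; &#x0003D; 4, the 4 time steps, and the zero initial membrane potential come from the text, whereas the decay factor <monospace>TAU</monospace>, the reset magnitude <monospace>V_RESET</monospace>, the function names, and the constant input are assumptions made only for illustration.</p>
<preformat><![CDATA[
```python
import numpy as np

V_TH = 0.5      # spiking threshold (from the text)
ALPHA = 4.0     # surrogate sigmoid parameter (from the text)
T_STEPS = 4     # simulation time steps (from the text)
TAU = 0.5       # membrane decay factor: assumed, not stated in the text
V_RESET = V_TH  # reset magnitude: assumed, not stated in the text


def surrogate_sigmoid(x, alpha=ALPHA):
    """sigmoid(x) = 1 / (1 + exp(-alpha * x))."""
    return 1.0 / (1.0 + np.exp(-alpha * x))


def surrogate_grad(x, alpha=ALPHA):
    """Derivative of the surrogate, used in place of the Heaviside gradient."""
    s = surrogate_sigmoid(x, alpha)
    return alpha * s * (1.0 - s)


def lif_forward(inputs, tau=TAU, v_th=V_TH, v_reset=V_RESET):
    """Run V[t] = tau*V[t-1] + X[t] - v_reset*S[t-1] over the leading time axis."""
    v = np.zeros_like(inputs[0])  # initial membrane potential of 0
    s = np.zeros_like(inputs[0])  # no spike before the first step
    spikes, grads = [], []
    for x_t in inputs:
        v = tau * v + x_t - v_reset * s
        s = (v >= v_th).astype(inputs.dtype)    # forward pass: hard threshold
        spikes.append(s)
        grads.append(surrogate_grad(v - v_th))  # a backward pass would use this
    return np.stack(spikes), np.stack(grads)


x = np.full((T_STEPS, 1), 0.3)  # constant toy input, hypothetical
spikes, grads = lif_forward(x)
```
]]></preformat>
<p>The forward pass emits binary spikes through a hard threshold, while the returned surrogate derivative indicates the smooth gradient that backpropagation would substitute for the non-differentiable step. In practice a deep learning framework with automatic differentiation would handle this substitution.</p>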
</sec>
</sec>
<sec>
<title>5.2 Comparative study</title>
<p>We conduct experiments on the DEAP and KUL datasets using the proposed SpikeWavformer and compare the results with existing methods for emotion recognition and auditory attention decoding. As shown in <xref ref-type="table" rid="T1">Tables 1</xref>, <xref ref-type="table" rid="T2">2</xref>, our method achieves the highest accuracy among the compared methods in every setting except the 0.1-second decision window on KUL, where STAnet is marginally higher (80.8% vs. 80.5%). On the DEAP dataset for emotion recognition, the SpikeWavformer method reaches an Arousal accuracy of 76.51% (std: 5.48%) and a Valence accuracy of 77.10% (std: 5.68%). Existing methods like EEGNet (<xref ref-type="bibr" rid="B32">Lawhern et al., 2018</xref>) achieve 58.29% (std: 8.60%) for Arousal and 54.56% (std: 8.14%) for Valence. SCN (<xref ref-type="bibr" rid="B47">Schirrmeister et al., 2017</xref>) attains 61.19% (std: 10.28%) for Arousal and 59.42% (std: 8.30%) for Valence. DCN (<xref ref-type="bibr" rid="B47">Schirrmeister et al., 2017</xref>) gets 61.03% (std: 8.58%) for Arousal and 59.92% (std: 7.82%) for Valence. TSception (<xref ref-type="bibr" rid="B16">Ding et al., 2022</xref>) achieves 61.57% (std: 11.04%) for Arousal and 59.14% (std: 7.60%) for Valence.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Comparison of different methods on DEAP dataset.</p></caption>
<table frame="box" rules="all">
<thead>
<tr>
<th valign="top" align="left" rowspan="2"><bold>Dataset</bold></th>
<th valign="top" align="left" rowspan="2"><bold>Method</bold></th>
<th valign="top" align="center" colspan="2"><bold>Arousal</bold></th>
<th valign="top" align="center" colspan="2"><bold>Valence</bold></th>
</tr>
<tr>
<th valign="top" align="center"><bold>Acc</bold>.</th>
<th valign="top" align="center"><bold>Std</bold></th>
<th valign="top" align="center"><bold>Acc</bold>.</th>
<th valign="top" align="center"><bold>Std</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" rowspan="5">DEAP</td>
<td valign="top" align="left">EEGNet (<xref ref-type="bibr" rid="B32">Lawhern et al., 2018</xref>)</td>
<td valign="top" align="center">58.29%</td>
<td valign="top" align="center">8.60%</td>
<td valign="top" align="center">54.56%</td>
<td valign="top" align="center">8.14%</td>
</tr>
 <tr>
<td valign="top" align="left">SCN (<xref ref-type="bibr" rid="B47">Schirrmeister et al., 2017</xref>)</td>
<td valign="top" align="center">61.19%</td>
<td valign="top" align="center">10.28%</td>
<td valign="top" align="center">59.42%</td>
<td valign="top" align="center">8.30%</td>
</tr>
 <tr>
<td valign="top" align="left">DCN (<xref ref-type="bibr" rid="B47">Schirrmeister et al., 2017</xref>)</td>
<td valign="top" align="center">61.03%</td>
<td valign="top" align="center">8.58%</td>
<td valign="top" align="center">59.92%</td>
<td valign="top" align="center">7.82%</td>
</tr>
 <tr>
<td valign="top" align="left">TSception (<xref ref-type="bibr" rid="B16">Ding et al., 2022</xref>)</td>
<td valign="top" align="center">61.57%</td>
<td valign="top" align="center">11.04%</td>
<td valign="top" align="center">59.14%</td>
<td valign="top" align="center">7.60%</td>
</tr>
 <tr>
<td valign="top" align="left"><bold>SpikeWavformer</bold></td>
<td valign="top" align="center"><bold>76.51%</bold></td>
<td valign="top" align="center"><bold>5.48%</bold></td>
<td valign="top" align="center"><bold>77.10%</bold></td>
<td valign="top" align="center"><bold>5.68%</bold></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The bold text refers to the method proposed in this paper.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Decoding accuracy (%) on the KUL dataset across different decision windows.</p></caption>
<table frame="box" rules="all">
<thead>
<tr>
<th valign="top" align="left" rowspan="2"><bold>Dataset</bold></th>
<th valign="top" align="left" rowspan="2"><bold>Model</bold></th>
<th valign="top" align="center" colspan="7"><bold>Decision window (s)</bold></th>
</tr>
<tr>
<th valign="top" align="center"><bold>0.1</bold></th>
<th valign="top" align="center"><bold>0.2</bold></th>
<th valign="top" align="center"><bold>0.5</bold></th>
<th valign="top" align="center"><bold>1</bold></th>
<th valign="top" align="center"><bold>2</bold></th>
<th valign="top" align="center"><bold>5</bold></th>
<th valign="top" align="center"><bold>10</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" rowspan="4">KUL</td>
<td valign="top" align="left">Linear (CCA) (<xref ref-type="bibr" rid="B13">De Cheveign&#x000E9; et al., 2018</xref>)</td>
<td valign="top" align="center">50.9</td>
<td valign="top" align="center">53.6</td>
<td valign="top" align="center">55.7</td>
<td valign="top" align="center">60.2</td>
<td valign="top" align="center">63.5</td>
<td valign="top" align="center">69.4</td>
<td valign="top" align="center">75.9</td>
</tr>
 <tr>
<td valign="top" align="left">Non-linear (CNN) (<xref ref-type="bibr" rid="B7">Cai et al., 2021</xref>)</td>
<td valign="top" align="center">74.3</td>
<td valign="top" align="center">78.2</td>
<td valign="top" align="center">80.6</td>
<td valign="top" align="center">84.1</td>
<td valign="top" align="center">85.7</td>
<td valign="top" align="center">86.9</td>
<td valign="top" align="center">87.9</td>
</tr>
 <tr>
<td valign="top" align="left">STAnet (<xref ref-type="bibr" rid="B53">Su et al., 2022</xref>)</td>
<td valign="top" align="center">80.8</td>
<td valign="top" align="center">84.3</td>
<td valign="top" align="center">87.2</td>
<td valign="top" align="center">90.1</td>
<td valign="top" align="center">91.4</td>
<td valign="top" align="center">92.6</td>
<td valign="top" align="center">93.9</td>
</tr>
 <tr>
<td valign="top" align="left"><bold>SpikeWavformer</bold></td>
<td valign="top" align="center"><bold>80.5</bold></td>
<td valign="top" align="center"><bold>86.7</bold></td>
<td valign="top" align="center"><bold>94.2</bold></td>
<td valign="top" align="center"><bold>96.5</bold></td>
<td valign="top" align="center"><bold>97.1</bold></td>
<td valign="top" align="center"><bold>97.3</bold></td>
<td valign="top" align="center"><bold>98.6</bold></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The bold text refers to the method proposed in this paper.</p>
</table-wrap-foot>
</table-wrap>
<p>We further compared the performance of the SpikeWavformer for different decision window sizes, ranging from 0.1 to 10 seconds, with the results presented in <xref ref-type="table" rid="T2">Table 2</xref>. On the KUL dataset, the SpikeWavformer achieved an average decoding accuracy of 96.5% across all subjects for a 1-second decision window, 97.1% for a 2-second decision window, 97.3% for a 5-second decision window, and 98.6% for a 10-second decision window. Generally, larger decision windows yielded better results, corroborating findings from previous studies (<xref ref-type="bibr" rid="B14">De Taillez et al., 2020</xref>; <xref ref-type="bibr" rid="B11">Ciccarelli et al., 2019</xref>; <xref ref-type="bibr" rid="B57">Vandecappelle et al., 2021</xref>). Notably, our proposed method is capable of decoding auditory spatial attention with a very short decision window of less than 1 second. For decision windows of 0.5 seconds and 0.2 seconds, the SpikeWavformer attained high accuracy rates of 94.2% and 86.7%, respectively. Although the accuracy for the 0.1-second decision window was lower than that of the 1-second decision window, SpikeWavformer maintained a high accuracy rate of 80.5%. In all comparisons with related work (<xref ref-type="bibr" rid="B13">De Cheveign&#x000E9; et al., 2018</xref>; <xref ref-type="bibr" rid="B7">Cai et al., 2021</xref>; <xref ref-type="bibr" rid="B53">Su et al., 2022</xref>), the SpikeWavformer demonstrated competitive performance.</p>
</sec>
<sec>
<title>5.3 Energy consumption comparison</title>
<p>In this section, we validate the energy efficiency of our proposed model over its ANN counterpart. Based on the energy calculation standard in neuromorphic computing (<xref ref-type="bibr" rid="B48">Sengupta et al., 2019</xref>), we use the method proposed by <xref ref-type="bibr" rid="B59">Wang et al. (2024)</xref> to compute the energy consumption ratio between our model and the equivalent ANN model:</p>
<disp-formula id="E24"><label>(24)</label><mml:math id="M41"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">Energy</mml:mtext></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">rate</mml:mtext></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">AC</mml:mtext></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">MAC</mml:mtext></mml:mrow></mml:mfrac><mml:mo>&#x0002A;</mml:mo><mml:mtext class="textrm" mathvariant="normal">SpikingRate</mml:mtext><mml:mo>&#x0002A;</mml:mo><mml:mtext class="textrm" mathvariant="normal">TimeSteps</mml:mtext><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>In the equation, <inline-formula><mml:math id="M42"><mml:mfrac><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">AC</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">MAC</mml:mtext></mml:mstyle></mml:mrow></mml:mfrac></mml:math></inline-formula> denotes the energy consumption ratio of an accumulate (AC) operation in SNNs to a multiply-accumulate (MAC) operation in ANNs. Following <xref ref-type="bibr" rid="B26">Horowitz (2014)</xref>, the theoretical value of <inline-formula><mml:math id="M43"><mml:mfrac><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">AC</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">MAC</mml:mtext></mml:mstyle></mml:mrow></mml:mfrac></mml:math></inline-formula> is taken to be <inline-formula><mml:math id="M44"><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula>. Here, SpikingRate is the average spiking rate, and TimeSteps is the number of simulation time steps. In our model, SpikingRate is 12.3%, and TimeSteps is set to 4. Substituting these values into <xref ref-type="disp-formula" rid="E24">Equation 24</xref> gives an energy ratio of approximately 0.07; that is, our model achieves over 7&#x000D7; energy efficiency compared to its ANN counterpart.</p>
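<p>The arithmetic behind <xref ref-type="disp-formula" rid="E24">Equation 24</xref> can be reproduced with the short sketch below, which plugs in the values stated in the text (an AC/MAC ratio of 1/7, a spiking rate of 12.3%, and 4 time steps). The function name <monospace>energy_rate</monospace> and its argument names are hypothetical and chosen only for this illustration.</p>
<preformat><![CDATA[
```python
def energy_rate(ac_over_mac, spiking_rate, time_steps):
    """Energy ratio of the SNN to its ANN counterpart, as in Equation 24."""
    return ac_over_mac * spiking_rate * time_steps


# Values taken from the text: AC/MAC = 1/7, SpikingRate = 12.3%, TimeSteps = 4.
rate = energy_rate(ac_over_mac=1 / 7, spiking_rate=0.123, time_steps=4)

# Reciprocal of the ratio: how many times less energy the SNN is estimated to use.
efficiency_gain = 1 / rate
```
]]></preformat>
<p>With these inputs the ratio evaluates to roughly 0.07, so the reciprocal is comfortably above 7, in line with the statement above. The estimate depends directly on the assumed AC/MAC ratio and on the measured spiking rate.</p>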
</sec>
<sec>
<title>5.4 Interpretability</title>
<p>In this section, saliency maps (<xref ref-type="bibr" rid="B50">Simonyan et al., 2013</xref>) are employed to visualize the areas of the data that contain the most information and contribute to classification performance. The saliency map is one of the most widely used tools for illustrating which regions of the input data hold classification-relevant information. To enhance the visualization of the saliency maps, the original maps were averaged along the time dimension to capture the topology of the EEG channels. Additionally, the normalized saliency maps were averaged across different samples for each subject to produce generalized average saliency maps. The average saliency maps for the DEAP dataset and the KUL dataset are presented in <xref ref-type="fig" rid="F4">Figures 4</xref>, <xref ref-type="fig" rid="F5">5</xref>, respectively.</p>
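<p>The aggregation steps described above (averaging over time, normalizing, then averaging over samples) can be sketched as follows. This is a minimal illustration under stated assumptions: the array layout <monospace>(n_samples, n_channels, n_times)</monospace>, the use of absolute gradient values, the per-sample min-max normalization, and the function name <monospace>average_saliency</monospace> are not specified in the text, and random numbers stand in for real gradients.</p>
<preformat><![CDATA[
```python
import numpy as np


def average_saliency(saliency):
    """Aggregate per-sample saliency maps into one value per EEG channel.

    saliency: array of shape (n_samples, n_channels, n_times) holding
    input gradients for one subject (assumed layout).
    """
    # Average absolute gradients over the time dimension (abs is an assumption).
    per_channel = np.abs(saliency).mean(axis=-1)
    lo = per_channel.min(axis=1, keepdims=True)
    hi = per_channel.max(axis=1, keepdims=True)
    # Min-max normalization per sample (assumed); eps guards against hi == lo.
    normalized = (per_channel - lo) / (hi - lo + 1e-12)
    # Average the normalized maps across samples.
    return normalized.mean(axis=0)


rng = np.random.default_rng(0)
fake_saliency = rng.standard_normal((10, 32, 128))  # placeholder values only
channel_map = average_saliency(fake_saliency)
```
]]></preformat>
<p>The resulting vector has one entry per channel and could then be projected onto the electrode layout to obtain a topographic map. Computing the gradients themselves requires the trained model and is outside the scope of this sketch.</p>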
<fig position="float" id="F4">
<label>Figure 4</label>
<caption><p>Visualization of saliency maps from DEAP dataset (Sub 1&#x02013;8): <bold>(a)</bold> Arousal-dimensional saliency maps and <bold>(b)</bold> valence-dimensional saliency maps.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-19-1652274-g0004.tif">
<alt-text>Two rows of brain activity heatmaps illustrate Arousal-dimensional (top row) and Valence-dimensional (bottom row) data. Each row contains eight diagrams labeled Sub 1 to Sub 8, showing variations in red and blue patterns.</alt-text>
</graphic>
</fig>
<fig position="float" id="F5">
<label>Figure 5</label>
<caption><p>Visualization of saliency maps from KUL dataset (Sub 1&#x02013;16).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-19-1652274-g0005.tif">
<alt-text>Sixteen circular heat maps labeled Sub 1 to Sub 16, showing varying patterns of red and blue gradients. Each map represents a different subject, displaying unique distributions of color intensity, potentially indicating data variations over the subjects.</alt-text>
</graphic>
</fig>
<p><bold>DEAP</bold>. For arousal, as illustrated in <xref ref-type="fig" rid="F4">Figure 4a</xref>, the temporal and frontal regions of the brain contain a wealth of information. This indicates that these regions are more involved in processing emotions, aligning with findings from previous studies (<xref ref-type="bibr" rid="B18">Gao et al., 2021</xref>; <xref ref-type="bibr" rid="B27">Huang et al., 2012</xref>; <xref ref-type="bibr" rid="B44">Mickley Steinmetz and Kensinger, 2009</xref>). Emotional arousal is predominantly represented in the temporal and frontal lobes. The asymmetry between the frontal and temporal lobes is closely associated with emotion recognition within the arousal dimension. In terms of valence, <xref ref-type="fig" rid="F4">Figure 4b</xref> shows that the parietal and temporal lobes are also rich in information. This observation is consistent with earlier research (<xref ref-type="bibr" rid="B27">Huang et al., 2012</xref>), suggesting that the network effectively learns from these relevant regions.</p>
<p><bold>KUL</bold>. It is expected that the areas of neural activity contributing to speech processing will exhibit greater significance. As illustrated in <xref ref-type="fig" rid="F5">Figure 5</xref>, the average saliency map of the KUL dataset reveals that the frontal and temporal regions contain more substantial information. These findings align with previous research indicating that activation is prominently observed in the frontal and temporal cortices (<xref ref-type="bibr" rid="B11">Ciccarelli et al., 2019</xref>; <xref ref-type="bibr" rid="B19">Geirnaert et al., 2020</xref>; <xref ref-type="bibr" rid="B57">Vandecappelle et al., 2021</xref>).</p>
</sec>
</sec>
<sec id="s6">
<title>6 Conclusion</title>
<p>This paper presents SpikeWavformer, an end-to-end deep learning SNN model that integrates the wavelet transform with a spiking transformer architecture. The model combines the global&#x02013;local feature extraction capability of the wavelet transform with the low-power, event-driven computation of spiking neurons, enabling dynamic modeling and efficient processing of EEG signals. This integration supports effective time&#x02013;frequency decomposition, automatic feature extraction, and classification, thereby improving generalization across diverse scenarios. Experiments on two publicly available datasets demonstrate that SpikeWavformer consistently outperforms established methods. The experimental results validate its effectiveness in both emotion recognition and auditory attention decoding tasks, highlighting its potential for deployment in resource-constrained brain&#x02013;computer interface applications. Future deployment of SpikeWavformer on neuromorphic hardware platforms presents both promising opportunities and technical challenges. The energy-efficient characteristics of the approach make it particularly well-suited for implementation on neuromorphic chips, potentially enabling low-power BCI applications in portable devices. However, contemporary neuromorphic architectures are primarily optimized for convolution-based SNNs, necessitating further hardware&#x02013;software co-design efforts to fully realize the benefits of Transformer-based spiking architectures. Overall, this study advances the development of energy-efficient, high-performance brain&#x02013;computer interfaces suitable for resource-constrained practical deployment.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>The datasets used in this study are publicly available. The DEAP dataset can be found at <ext-link ext-link-type="uri" xlink:href="https://www.eecs.qmul.ac.uk/mmv/datasets/deap/">https://www.eecs.qmul.ac.uk/mmv/datasets/deap/</ext-link>. The KUL dataset can be found at <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/records/4004271">https://zenodo.org/records/4004271</ext-link>.</p>
</sec>
<sec sec-type="author-contributions" id="s8">
<title>Author contributions</title>
<p>LY: Writing &#x02013; review &#x00026; editing, Writing &#x02013; original draft, Software, Methodology. JW: Writing &#x02013; original draft, Formal analysis. YL: Writing &#x02013; review &#x00026; editing, Supervision, Investigation.</p>
</sec>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>The author(s) declare that no financial support was received for the research and/or publication of this article.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="ai-statement" id="s10">
<title>Generative AI statement</title>
<p>The author(s) declare that no Gen AI was used in the creation of this manuscript.</p>
<p>Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Abbott</surname> <given-names>L. F.</given-names></name> <name><surname>Kepler</surname> <given-names>T. B.</given-names></name></person-group> (<year>2005</year>). <article-title>&#x0201C;Model neurons: from Hodgkin-Huxley to Hopfield,&#x0201D;</article-title> in <source>Statistical Mechanics of Neural Networks: Proceedings of the XIth Sitges Conference Sitges, Barcelona, Spain, 3&#x02013;7 June 1990</source> (<publisher-loc>Springer</publisher-loc>), <fpage>5</fpage>&#x02013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1007/3540532676_37</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Akram</surname> <given-names>S.</given-names></name> <name><surname>Presacco</surname> <given-names>A.</given-names></name> <name><surname>Simon</surname> <given-names>J. Z.</given-names></name> <name><surname>Shamma</surname> <given-names>S. A.</given-names></name> <name><surname>Babadi</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title>Robust decoding of selective auditory attention from MEG in a competing-speaker environment via state-space modeling</article-title>. <source>Neuroimage</source> <volume>124</volume>, <fpage>906</fpage>&#x02013;<lpage>917</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2015.09.048</pub-id><pub-id pub-id-type="pmid">26436490</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alarcao</surname> <given-names>S. M.</given-names></name> <name><surname>Fonseca</surname> <given-names>M. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Emotions recognition using EEG signals: a survey</article-title>. <source>IEEE Trans. Affect. Comput</source>. <volume>10</volume>, <fpage>374</fpage>&#x02013;<lpage>393</lpage>. <pub-id pub-id-type="doi">10.1109/TAFFC.2017.2714671</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Alzhrani</surname> <given-names>W.</given-names></name> <name><surname>Doborjeh</surname> <given-names>M.</given-names></name> <name><surname>Doborjeh</surname> <given-names>Z.</given-names></name> <name><surname>Kasabov</surname> <given-names>N.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Emotion recognition and understanding using EEG data in a brain-inspired spiking neural network architecture,&#x0201D;</article-title> in <source>2021 International Joint Conference on Neural Networks (IJCNN)</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1109/IJCNN52387.2021.9533368</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ang</surname> <given-names>K. K.</given-names></name> <name><surname>Chin</surname> <given-names>Z. Y.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Guan</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). <article-title>&#x0201C;Filter bank common spatial pattern (FBCSP) in brain-computer interface,&#x0201D;</article-title> in <source>2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>2390</fpage>&#x02013;<lpage>2397</lpage>. <pub-id pub-id-type="doi">10.1109/IJCNN.2008.4634130</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cai</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>P.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name></person-group> (<year>2023</year>). <article-title>A bio-inspired spiking attentional neural network for attentional selection in the listening brain</article-title>. <source>IEEE Trans. Neural Netw. Learn. Syst</source>. <volume>35</volume>, <fpage>17387</fpage>&#x02013;<lpage>17397</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2023.3303308</pub-id><pub-id pub-id-type="pmid">37585329</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cai</surname> <given-names>S.</given-names></name> <name><surname>Su</surname> <given-names>E.</given-names></name> <name><surname>Xie</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>EEG-based auditory attention detection via frequency and channel neural attention</article-title>. <source>IEEE Trans. Hum.-Mach. Syst</source>. <volume>52</volume>, <fpage>256</fpage>&#x02013;<lpage>266</lpage>. <pub-id pub-id-type="doi">10.1109/THMS.2021.3125283</pub-id><pub-id pub-id-type="pmid">27534393</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cai</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>R.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name></person-group> (<year>2024</year>). <article-title>EEG-based auditory attention detection with spiking graph convolutional network</article-title>. <source>IEEE Trans. Cogn. Dev. Syst</source>. <volume>16</volume>, <fpage>1698</fpage>&#x02013;<lpage>1706</lpage>. <pub-id pub-id-type="doi">10.1109/TCDS.2024.3376433</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ceolini</surname> <given-names>E.</given-names></name> <name><surname>Hjortkj&#x000E6;r</surname> <given-names>J.</given-names></name> <name><surname>Wong</surname> <given-names>D. D.</given-names></name> <name><surname>O&#x00027;Sullivan</surname> <given-names>J.</given-names></name> <name><surname>Raghavan</surname> <given-names>V. S.</given-names></name> <name><surname>Herrero</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception</article-title>. <source>Neuroimage</source> <volume>223</volume>:<fpage>117282</fpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2020.117282</pub-id><pub-id pub-id-type="pmid">32828921</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cherry</surname> <given-names>E. C.</given-names></name></person-group> (<year>1953</year>). <article-title>Some experiments on the recognition of speech, with one and with two ears</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>25</volume>, <fpage>975</fpage>&#x02013;<lpage>979</lpage>. <pub-id pub-id-type="doi">10.1121/1.1907229</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ciccarelli</surname> <given-names>G.</given-names></name> <name><surname>Nolan</surname> <given-names>M.</given-names></name> <name><surname>Perricone</surname> <given-names>J.</given-names></name> <name><surname>Calamia</surname> <given-names>P. T.</given-names></name> <name><surname>Haro</surname> <given-names>S.</given-names></name> <name><surname>O&#x00027;Sullivan</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Comparison of two-talker attention decoding from EEG with nonlinear neural networks and linear methods</article-title>. <source>Sci. Rep</source>. <volume>9</volume>:<fpage>11538</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-019-47795-0</pub-id><pub-id pub-id-type="pmid">31395905</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Das</surname> <given-names>N.</given-names></name> <name><surname>Francart</surname> <given-names>T.</given-names></name> <name><surname>Bertrand</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <source>Auditory Attention Detection Dataset Kuleuven</source>. <publisher-loc>London</publisher-loc>: <publisher-name>Zenodo</publisher-name>.</citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Cheveign&#x000E9;</surname> <given-names>A.</given-names></name> <name><surname>Wong</surname> <given-names>D. D.</given-names></name> <name><surname>Di Liberto</surname> <given-names>G. M.</given-names></name> <name><surname>Hjortkj&#x000E6;r</surname> <given-names>J.</given-names></name> <name><surname>Slaney</surname> <given-names>M.</given-names></name> <name><surname>Lalor</surname> <given-names>E.</given-names></name></person-group> (<year>2018</year>). <article-title>Decoding the auditory brain with canonical component analysis</article-title>. <source>Neuroimage</source> <volume>172</volume>, <fpage>206</fpage>&#x02013;<lpage>216</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2018.01.033</pub-id><pub-id pub-id-type="pmid">29378317</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Taillez</surname> <given-names>T.</given-names></name> <name><surname>Kollmeier</surname> <given-names>B.</given-names></name> <name><surname>Meyer</surname> <given-names>B. T.</given-names></name></person-group> (<year>2020</year>). <article-title>Machine learning for decoding listeners&#x00027; attention from electroencephalography evoked by continuous speech</article-title>. <source>Eur. J. Neurosci</source>. <volume>51</volume>, <fpage>1234</fpage>&#x02013;<lpage>1241</lpage>. <pub-id pub-id-type="doi">10.1111/ejn.13790</pub-id><pub-id pub-id-type="pmid">29205588</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deng</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Gu</surname> <given-names>S.</given-names></name></person-group> (<year>2022</year>). <article-title>Temporal efficient training of spiking neural network via gradient re-weighting</article-title>. <source>arXiv preprint arXiv:2202.11946</source>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ding</surname> <given-names>Y.</given-names></name> <name><surname>Robinson</surname> <given-names>N.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Zeng</surname> <given-names>Q.</given-names></name> <name><surname>Guan</surname> <given-names>C.</given-names></name></person-group> (<year>2022</year>). <article-title>TSception: capturing temporal dynamics and spatial asymmetry from EEG for emotion recognition</article-title>. <source>IEEE Trans. Affect. Comput</source>. <volume>14</volume>, <fpage>2238</fpage>&#x02013;<lpage>2250</lpage>. <pub-id pub-id-type="doi">10.1109/TAFFC.2022.3169001</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Faghihi</surname> <given-names>F.</given-names></name> <name><surname>Cai</surname> <given-names>S.</given-names></name> <name><surname>Moustafa</surname> <given-names>A. A.</given-names></name></person-group> (<year>2022</year>). <article-title>A neuroscience-inspired spiking neural network for EEG-based auditory spatial attention detection</article-title>. <source>Neural Netw</source>. <volume>152</volume>, <fpage>555</fpage>&#x02013;<lpage>565</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2022.05.003</pub-id><pub-id pub-id-type="pmid">35679747</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>Y.</given-names></name> <name><surname>Cao</surname> <given-names>Z.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <article-title>A novel dynamic brain network in arousal for brain states and emotion analysis</article-title>. <source>Mathem. Biosci. Eng</source>. <volume>18</volume>, <fpage>7440</fpage>&#x02013;<lpage>7463</lpage>. <pub-id pub-id-type="doi">10.3934/mbe.2021368</pub-id><pub-id pub-id-type="pmid">34814257</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Geirnaert</surname> <given-names>S.</given-names></name> <name><surname>Francart</surname> <given-names>T.</given-names></name> <name><surname>Bertrand</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>Fast EEG-based decoding of the directional focus of auditory attention using common spatial patterns</article-title>. <source>IEEE Trans. Biomed. Eng</source>. <volume>68</volume>, <fpage>1557</fpage>&#x02013;<lpage>1568</lpage>. <pub-id pub-id-type="doi">10.1109/TBME.2020.3033446</pub-id><pub-id pub-id-type="pmid">33095706</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gerstner</surname> <given-names>W.</given-names></name> <name><surname>Kistler</surname> <given-names>W. M.</given-names></name></person-group> (<year>2002</year>). <source>Spiking Neuron Models: Single Neurons, Populations, Plasticity</source>. Cambridge: Cambridge University Press. <pub-id pub-id-type="doi">10.1017/CBO9780511815706</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldstein</surname> <given-names>A. A.</given-names></name></person-group> (<year>1977</year>). <article-title>Optimization of Lipschitz continuous functions</article-title>. <source>Mathem. Program.</source> <volume>13</volume>, <fpage>14</fpage>&#x02013;<lpage>22</lpage>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gong</surname> <given-names>P.</given-names></name> <name><surname>Wang</surname> <given-names>P.</given-names></name> <name><surname>Zhou</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name></person-group> (<year>2023</year>). <article-title>A spiking neural network with adaptive graph convolution and LSTM for EEG-based brain-computer interfaces</article-title>. <source>IEEE Trans. Neural Syst. Rehabilit. Eng</source>. <volume>31</volume>, <fpage>1440</fpage>&#x02013;<lpage>1450</lpage>. <pub-id pub-id-type="doi">10.1109/TNSRE.2023.3246989</pub-id><pub-id pub-id-type="pmid">37027669</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gouk</surname> <given-names>H.</given-names></name> <name><surname>Frank</surname> <given-names>E.</given-names></name> <name><surname>Pfahringer</surname> <given-names>B.</given-names></name> <name><surname>Cree</surname> <given-names>M. J.</given-names></name></person-group> (<year>2021</year>). <article-title>Regularisation of neural networks by enforcing Lipschitz continuity</article-title>. <source>Mach. Learn.</source> <volume>110</volume>, <fpage>393</fpage>&#x02013;<lpage>416</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-020-05929-w</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grobbelaar</surname> <given-names>M.</given-names></name> <name><surname>Phadikar</surname> <given-names>S.</given-names></name> <name><surname>Ghaderpour</surname> <given-names>E.</given-names></name> <name><surname>Struck</surname> <given-names>A. F.</given-names></name> <name><surname>Sinha</surname> <given-names>N.</given-names></name> <name><surname>Ghosh</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>A survey on denoising techniques of electroencephalogram signals using wavelet transform</article-title>. <source>Signals</source> <volume>3</volume>, <fpage>577</fpage>&#x02013;<lpage>586</lpage>. <pub-id pub-id-type="doi">10.3390/signals3030035</pub-id><pub-id pub-id-type="pmid">23314762</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hager</surname> <given-names>W. W.</given-names></name></person-group> (<year>1979</year>). <article-title>Lipschitz continuity for constrained processes</article-title>. <source>SIAM J. Control Optim.</source> <volume>17</volume>, <fpage>321</fpage>&#x02013;<lpage>338</lpage>.</citation>
</ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Horowitz</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>&#x0201C;1.1 computing&#x00027;s energy problem (and what we can do about it),&#x0201D;</article-title> in <source>2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>10</fpage>&#x02013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1109/ISSCC.2014.6757323</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>D.</given-names></name> <name><surname>Guan</surname> <given-names>C.</given-names></name> <name><surname>Ang</surname> <given-names>K. K.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Pan</surname> <given-names>Y.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Asymmetric spatial pattern for EEG-based emotion detection,&#x0201D;</article-title> in <source>The 2012 International Joint Conference on Neural Networks (IJCNN)</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1109/IJCNN.2012.6252390</pub-id><pub-id pub-id-type="pmid">27534393</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Izhikevich</surname> <given-names>E. M.</given-names></name></person-group> (<year>2003</year>). <article-title>Simple model of spiking neurons</article-title>. <source>IEEE Trans. Neural Netw</source>. <volume>14</volume>, <fpage>1569</fpage>&#x02013;<lpage>1572</lpage>. <pub-id pub-id-type="doi">10.1109/TNN.2003.820440</pub-id><pub-id pub-id-type="pmid">18244602</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiao</surname> <given-names>Z.</given-names></name> <name><surname>Gao</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>H.</given-names></name></person-group> (<year>2018</year>). <article-title>Deep convolutional neural networks for mental load classification based on EEG data</article-title>. <source>Pattern Recognit</source>. <volume>76</volume>, <fpage>582</fpage>&#x02013;<lpage>595</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2017.12.002</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koelstra</surname> <given-names>S.</given-names></name> <name><surname>Muhl</surname> <given-names>C.</given-names></name> <name><surname>Soleymani</surname> <given-names>M.</given-names></name> <name><surname>Lee</surname> <given-names>J.-S.</given-names></name> <name><surname>Yazdani</surname> <given-names>A.</given-names></name> <name><surname>Ebrahimi</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>DEAP: a database for emotion analysis; using physiological signals</article-title>. <source>IEEE Trans. Affect. Comput</source>. <volume>3</volume>, <fpage>18</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1109/T-AFFC.2011.15</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kwon</surname> <given-names>O.-Y.</given-names></name> <name><surname>Lee</surname> <given-names>M.-H.</given-names></name> <name><surname>Guan</surname> <given-names>C.</given-names></name> <name><surname>Lee</surname> <given-names>S.-W.</given-names></name></person-group> (<year>2019</year>). <article-title>Subject-independent brain-computer interfaces based on deep convolutional neural networks</article-title>. <source>IEEE Trans. Neural Netw. Learn. Syst</source>. <volume>31</volume>, <fpage>3839</fpage>&#x02013;<lpage>3852</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2019.2946869</pub-id><pub-id pub-id-type="pmid">31725394</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lawhern</surname> <given-names>V. J.</given-names></name> <name><surname>Solon</surname> <given-names>A. J.</given-names></name> <name><surname>Waytowich</surname> <given-names>N. R.</given-names></name> <name><surname>Gordon</surname> <given-names>S. M.</given-names></name> <name><surname>Hung</surname> <given-names>C. P.</given-names></name> <name><surname>Lance</surname> <given-names>B. J.</given-names></name></person-group> (<year>2018</year>). <article-title>EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces</article-title>. <source>J. Neural Eng</source>. <volume>15</volume>:<fpage>056013</fpage>. <pub-id pub-id-type="doi">10.1088/1741-2552/aace8c</pub-id><pub-id pub-id-type="pmid">29932424</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lei</surname> <given-names>Z.</given-names></name> <name><surname>Yao</surname> <given-names>M.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Luo</surname> <given-names>X.</given-names></name> <name><surname>Lu</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2025</year>). <article-title>&#x0201C;Spike2former: efficient spiking transformer for high-performance image segmentation,&#x0201D;</article-title> in <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>, 1364&#x02013;1372. <pub-id pub-id-type="doi">10.1609/aaai.v39i2.32126</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>He</surname> <given-names>H.</given-names></name></person-group> (<year>2018</year>). <article-title>Hierarchical convolutional neural networks for EEG-based emotion recognition</article-title>. <source>Cognit. Comput</source>. <volume>10</volume>, <fpage>368</fpage>&#x02013;<lpage>380</lpage>. <pub-id pub-id-type="doi">10.1007/s12559-017-9533-x</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>P.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name> <name><surname>Si</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Li</surname> <given-names>F.</given-names></name> <name><surname>Zhu</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>EEG based emotion recognition by combining functional connectivity network and local activations</article-title>. <source>IEEE Trans. Biomed. Eng</source>. <volume>66</volume>, <fpage>2869</fpage>&#x02013;<lpage>2881</lpage>. <pub-id pub-id-type="doi">10.1109/TBME.2019.2897651</pub-id><pub-id pub-id-type="pmid">30735981</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>Fang</surname> <given-names>C.</given-names></name> <name><surname>Zhu</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>C.</given-names></name> <name><surname>Song</surname> <given-names>A.</given-names></name></person-group> (<year>2023</year>). <article-title>Fractal spiking neural network scheme for EEG-based emotion recognition</article-title>. <source>IEEE J. Translat. Eng. Health Med</source>. <volume>12</volume>, <fpage>106</fpage>&#x02013;<lpage>118</lpage>. <pub-id pub-id-type="doi">10.1109/JTEHM.2023.3320132</pub-id><pub-id pub-id-type="pmid">38088998</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>R.</given-names></name> <name><surname>Wang</surname> <given-names>Y.-X.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name></person-group> (<year>2015</year>). <article-title>An FDES-based shared control method for asynchronous brain-actuated robot</article-title>. <source>IEEE Trans. Cybern</source>. <volume>46</volume>, <fpage>1452</fpage>&#x02013;<lpage>1462</lpage>. <pub-id pub-id-type="doi">10.1109/TCYB.2015.2469278</pub-id><pub-id pub-id-type="pmid">26357416</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lotte</surname> <given-names>F.</given-names></name> <name><surname>Guan</surname> <given-names>C.</given-names></name></person-group> (<year>2010</year>). <article-title>Regularizing common spatial patterns to improve BCI designs: unified theory and new algorithms</article-title>. <source>IEEE Trans. Biomed. Eng</source>. <volume>58</volume>, <fpage>355</fpage>&#x02013;<lpage>362</lpage>. <pub-id pub-id-type="doi">10.1109/TBME.2010.2082539</pub-id><pub-id pub-id-type="pmid">20889426</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>C.</given-names></name> <name><surname>Du</surname> <given-names>H.</given-names></name> <name><surname>Wei</surname> <given-names>W.</given-names></name> <name><surname>Sun</surname> <given-names>Q.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Zeng</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2025</year>). <article-title>Estsformer: efficient spatio-temporal spiking transformer</article-title>. <source>Neural Netw</source>. <volume>191</volume>:<fpage>107786</fpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2025.107786</pub-id><pub-id pub-id-type="pmid">40614455</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>X.</given-names></name> <name><surname>Yao</surname> <given-names>M.</given-names></name> <name><surname>Chou</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>B.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name></person-group> (<year>2024</year>). <article-title>&#x0201C;Integer-valued training and spike-driven inference spiking neural network for high-performance and energy-efficient object detection,&#x0201D;</article-title> in <source>European Conference on Computer Vision</source> (<publisher-loc>Springer</publisher-loc>), <fpage>253</fpage>&#x02013;<lpage>272</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-031-73411-3_15</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maass</surname> <given-names>W.</given-names></name></person-group> (<year>1997</year>). <article-title>Networks of spiking neurons: the third generation of neural network models</article-title>. <source>Neural Netw</source>. <volume>10</volume>, <fpage>1659</fpage>&#x02013;<lpage>1671</lpage>. <pub-id pub-id-type="doi">10.1016/S0893-6080(97)00011-7</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Masquelier</surname> <given-names>T.</given-names></name> <name><surname>Guyonneau</surname> <given-names>R.</given-names></name> <name><surname>Thorpe</surname> <given-names>S. J.</given-names></name></person-group> (<year>2008</year>). <article-title>Spike timing dependent plasticity finds the start of repeating patterns in continuous spike trains</article-title>. <source>PLoS ONE</source> <volume>3</volume>:<fpage>e1377</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0001377</pub-id><pub-id pub-id-type="pmid">18167538</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mesgarani</surname> <given-names>N.</given-names></name> <name><surname>Chang</surname> <given-names>E. F.</given-names></name></person-group> (<year>2012</year>). <article-title>Selective cortical representation of attended speaker in multi-talker speech perception</article-title>. <source>Nature</source> <volume>485</volume>, <fpage>233</fpage>&#x02013;<lpage>236</lpage>. <pub-id pub-id-type="doi">10.1038/nature11020</pub-id><pub-id pub-id-type="pmid">22522927</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mickley Steinmetz</surname> <given-names>K. R.</given-names></name> <name><surname>Kensinger</surname> <given-names>E. A.</given-names></name></person-group> (<year>2009</year>). <article-title>The effects of valence and arousal on the neural activity leading to subsequent memory</article-title>. <source>Psychophysiology</source> <volume>46</volume>, <fpage>1190</fpage>&#x02013;<lpage>1199</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-8986.2009.00868.x</pub-id><pub-id pub-id-type="pmid">19674398</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x00027;Sullivan</surname> <given-names>J. A.</given-names></name> <name><surname>Power</surname> <given-names>A. J.</given-names></name> <name><surname>Mesgarani</surname> <given-names>N.</given-names></name> <name><surname>Rajaram</surname> <given-names>S.</given-names></name> <name><surname>Foxe</surname> <given-names>J. J.</given-names></name> <name><surname>Shinn-Cunningham</surname> <given-names>B. G.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Attentional selection in a cocktail party environment can be decoded from single-trial EEG</article-title>. <source>Cereb. Cortex</source> <volume>25</volume>, <fpage>1697</fpage>&#x02013;<lpage>1706</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bht355</pub-id><pub-id pub-id-type="pmid">24429136</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pan</surname> <given-names>Z.</given-names></name> <name><surname>Chua</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Ambikairajah</surname> <given-names>E.</given-names></name></person-group> (<year>2020</year>). <article-title>An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks</article-title>. <source>Front. Neurosci</source>. <volume>13</volume>:<fpage>1420</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2019.01420</pub-id><pub-id pub-id-type="pmid">32038132</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schirrmeister</surname> <given-names>R. T.</given-names></name> <name><surname>Springenberg</surname> <given-names>J. T.</given-names></name> <name><surname>Fiederer</surname> <given-names>L. D. J.</given-names></name> <name><surname>Glasstetter</surname> <given-names>M.</given-names></name> <name><surname>Eggensperger</surname> <given-names>K.</given-names></name> <name><surname>Tangermann</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Deep learning with convolutional neural networks for EEG decoding and visualization</article-title>. <source>Hum. Brain Mapp</source>. <volume>38</volume>, <fpage>5391</fpage>&#x02013;<lpage>5420</lpage>. <pub-id pub-id-type="doi">10.1002/hbm.23730</pub-id><pub-id pub-id-type="pmid">28782865</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sengupta</surname> <given-names>A.</given-names></name> <name><surname>Ye</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>R.</given-names></name> <name><surname>Liu</surname> <given-names>C.</given-names></name> <name><surname>Roy</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>Going deeper in spiking neural networks: VGG and residual architectures</article-title>. <source>Front. Neurosci</source>. <volume>13</volume>:<fpage>95</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2019.00095</pub-id><pub-id pub-id-type="pmid">30899212</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>X.</given-names></name> <name><surname>Hao</surname> <given-names>Z.</given-names></name> <name><surname>Yu</surname> <given-names>Z.</given-names></name></person-group> (<year>2024</year>). <article-title>&#x0201C;Spikingresformer: bridging resnet and vision transformer in spiking neural networks,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>, <fpage>5610</fpage>&#x02013;<lpage>5619</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR52733.2024.00536</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simonyan</surname> <given-names>K.</given-names></name> <name><surname>Vedaldi</surname> <given-names>A.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Deep inside convolutional networks: visualising image classification models and saliency maps</article-title>. <source>arXiv preprint arXiv:1312.6034</source>.</citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Singh</surname> <given-names>A. K.</given-names></name> <name><surname>Krishnan</surname> <given-names>S.</given-names></name></person-group> (<year>2023</year>). <article-title>Trends in EEG signal feature extraction applications</article-title>. <source>Front. Artif. Intell</source>. <volume>5</volume>:<fpage>1072801</fpage>. <pub-id pub-id-type="doi">10.3389/frai.2022.1072801</pub-id><pub-id pub-id-type="pmid">36760718</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>T.</given-names></name> <name><surname>Zheng</surname> <given-names>W.</given-names></name> <name><surname>Song</surname> <given-names>P.</given-names></name> <name><surname>Cui</surname> <given-names>Z.</given-names></name></person-group> (<year>2018</year>). <article-title>EEG emotion recognition using dynamical graph convolutional neural networks</article-title>. <source>IEEE Trans. Affect. Comput</source>. <volume>11</volume>, <fpage>532</fpage>&#x02013;<lpage>541</lpage>. <pub-id pub-id-type="doi">10.1109/TAFFC.2018.2817622</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Su</surname> <given-names>E.</given-names></name> <name><surname>Cai</surname> <given-names>S.</given-names></name> <name><surname>Xie</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Schultz</surname> <given-names>T.</given-names></name></person-group> (<year>2022</year>). <article-title>Stanet: a spatiotemporal attention network for decoding auditory spatial attention from EEG</article-title>. <source>IEEE Trans. Biomed. Eng</source>. <volume>69</volume>, <fpage>2233</fpage>&#x02013;<lpage>2242</lpage>. <pub-id pub-id-type="doi">10.1109/TBME.2022.3140246</pub-id><pub-id pub-id-type="pmid">34982671</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Subasi</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <source>Practical Guide for Biomedical Signals Analysis Using Machine Learning Techniques: A MATLAB Based Approach</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Academic Press</publisher-name>. <pub-id pub-id-type="doi">10.1016/B978-0-12-817444-9.00002-7</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tan</surname> <given-names>C.</given-names></name> <name><surname>&#x00160;arlija</surname> <given-names>M.</given-names></name> <name><surname>Kasabov</surname> <given-names>N.</given-names></name></person-group> (<year>2021</year>). <article-title>Neurosense: short-term emotion recognition and understanding based on spiking neural network modelling of spatio-temporal EEG patterns</article-title>. <source>Neurocomputing</source> <volume>434</volume>, <fpage>137</fpage>&#x02013;<lpage>148</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2020.12.098</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vallabhaneni</surname> <given-names>R. B.</given-names></name> <name><surname>Sharma</surname> <given-names>P.</given-names></name> <name><surname>Kumar</surname> <given-names>V.</given-names></name> <name><surname>Kulshreshtha</surname> <given-names>V.</given-names></name> <name><surname>Reddy</surname> <given-names>K. J.</given-names></name> <name><surname>Kumar</surname> <given-names>S. S.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Deep learning algorithms in EEG signal decoding application: a review</article-title>. <source>IEEE Access</source> <volume>9</volume>, <fpage>125778</fpage>&#x02013;<lpage>125786</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2021.3105917</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vandecappelle</surname> <given-names>S.</given-names></name> <name><surname>Deckers</surname> <given-names>L.</given-names></name> <name><surname>Das</surname> <given-names>N.</given-names></name> <name><surname>Ansari</surname> <given-names>A. H.</given-names></name> <name><surname>Bertrand</surname> <given-names>A.</given-names></name> <name><surname>Francart</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>EEG-based detection of the locus of auditory attention with convolutional neural networks</article-title>. <source>eLife</source> <volume>10</volume>:<fpage>e56481</fpage>. <pub-id pub-id-type="doi">10.7554/eLife.56481</pub-id><pub-id pub-id-type="pmid">33929315</pub-id></citation></ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vaswani</surname> <given-names>A.</given-names></name> <name><surname>Shazeer</surname> <given-names>N.</given-names></name> <name><surname>Parmar</surname> <given-names>N.</given-names></name> <name><surname>Uszkoreit</surname> <given-names>J.</given-names></name> <name><surname>Jones</surname> <given-names>L.</given-names></name> <name><surname>Gomez</surname> <given-names>A. N.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>&#x0201C;Attention is all you need,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source>, <volume>30</volume>.</citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Shi</surname> <given-names>K.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Wei</surname> <given-names>W.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>Global-local convolution with spiking neural networks for energy-efficient keyword spotting</article-title>. <source>arXiv preprint arXiv:2406.13179</source>.</citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Belatreche</surname> <given-names>A.</given-names></name> <name><surname>Xiao</surname> <given-names>Y.</given-names></name> <name><surname>Liang</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2025</year>). <article-title>Spiking vision transformer with saccadic attention</article-title>. <source>arXiv preprint arXiv:2502.12677</source>.</citation>
</ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Shi</surname> <given-names>K.</given-names></name> <name><surname>Lu</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Qu</surname> <given-names>H.</given-names></name></person-group> (<year>2023</year>). <article-title>&#x0201C;Spatial-temporal self-attention for asynchronous spiking neural networks,&#x0201D;</article-title> in <source>IJCAI</source>, <fpage>3085</fpage>&#x02013;<lpage>3093</lpage>. <pub-id pub-id-type="doi">10.24963/ijcai.2023/344</pub-id></citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.-K.</given-names></name> <name><surname>Jung</surname> <given-names>T.-P.</given-names></name> <name><surname>Lin</surname> <given-names>C.-T.</given-names></name></person-group> (<year>2015</year>). <article-title>EEG-based attention tracking during distracted driving</article-title>. <source>IEEE Trans. Neural Syst. Rehabilit. Eng</source>. <volume>23</volume>, <fpage>1085</fpage>&#x02013;<lpage>1094</lpage>. <pub-id pub-id-type="doi">10.1109/TNSRE.2015.2415520</pub-id><pub-id pub-id-type="pmid">25850090</pub-id></citation></ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wei</surname> <given-names>W.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Belatreche</surname> <given-names>A.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>Event-driven learning for spiking neural networks</article-title>. <source>arXiv preprint arXiv:2403.00270</source>.</citation>
</ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Chua</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Tan</surname> <given-names>K. C.</given-names></name></person-group> (<year>2018</year>). <article-title>A spiking neural network framework for robust sound classification</article-title>. <source>Front. Neurosci</source>. <volume>12</volume>:<fpage>836</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2018.00836</pub-id><pub-id pub-id-type="pmid">30510500</pub-id></citation></ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xing</surname> <given-names>M.</given-names></name> <name><surname>Lee</surname> <given-names>H.</given-names></name> <name><surname>Morrissey</surname> <given-names>Z.</given-names></name> <name><surname>Chung</surname> <given-names>M. K.</given-names></name> <name><surname>Phan</surname> <given-names>K. L.</given-names></name> <name><surname>Klumpp</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Altered dynamic electroencephalography connectome phase-space features of emotion regulation in social anxiety</article-title>. <source>Neuroimage</source> <volume>186</volume>, <fpage>338</fpage>&#x02013;<lpage>349</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2018.10.073</pub-id><pub-id pub-id-type="pmid">30391563</pub-id></citation></ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>F.</given-names></name> <name><surname>Pan</surname> <given-names>D.</given-names></name> <name><surname>Zheng</surname> <given-names>H.</given-names></name> <name><surname>Ouyang</surname> <given-names>Y.</given-names></name> <name><surname>Jia</surname> <given-names>Z.</given-names></name> <name><surname>Zeng</surname> <given-names>H.</given-names></name></person-group> (<year>2024</year>). <article-title>EESCN: a novel spiking neural network method for EEG-based emotion recognition</article-title>. <source>Comput. Methods Programs Biomed</source>. <volume>243</volume>:<fpage>107927</fpage>. <pub-id pub-id-type="doi">10.1016/j.cmpb.2023.107927</pub-id><pub-id pub-id-type="pmid">38000320</pub-id></citation></ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yao</surname> <given-names>M.</given-names></name> <name><surname>Gao</surname> <given-names>H.</given-names></name> <name><surname>Zhao</surname> <given-names>G.</given-names></name> <name><surname>Wang</surname> <given-names>D.</given-names></name> <name><surname>Lin</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>&#x0201C;Temporal-wise attention spiking neural networks for event streams classification,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF International Conference on Computer Vision</source>, <fpage>10221</fpage>&#x02013;<lpage>10230</lpage>. <pub-id pub-id-type="doi">10.1109/ICCV48922.2021.01006</pub-id></citation>
</ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yao</surname> <given-names>M.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>T.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Zhou</surname> <given-names>Z.</given-names></name> <name><surname>Tian</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>Spike-driven transformer v2: meta spiking neural network architecture inspiring the design of next-generation neuromorphic chips</article-title>. <source>arXiv preprint arXiv:2404.03663</source>.</citation>
</ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yao</surname> <given-names>M.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Zhou</surname> <given-names>Z.</given-names></name> <name><surname>Yuan</surname> <given-names>L.</given-names></name> <name><surname>Tian</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2023</year>). <article-title>&#x0201C;Spike-driven transformer,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source>, <fpage>64043</fpage>&#x02013;<lpage>64058</lpage>.</citation>
</ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yao</surname> <given-names>M.</given-names></name> <name><surname>Qiu</surname> <given-names>X.</given-names></name> <name><surname>Hu</surname> <given-names>T.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Chou</surname> <given-names>Y.</given-names></name> <name><surname>Tian</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2025</year>). <article-title>Scaling spike-driven transformer with efficient spike firing approximation training</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>47</volume>, <fpage>2973</fpage>&#x02013;<lpage>2990</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2025.3530246</pub-id><pub-id pub-id-type="pmid">40031207</pub-id></citation></ref>
<ref id="B71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Shen</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Hou</surname> <given-names>K.</given-names></name> <name><surname>Hu</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Emotion recognition from multimodal physiological signals using a regularized deep fusion of kernel machine</article-title>. <source>IEEE Trans. Cybern</source>. <volume>51</volume>, <fpage>4386</fpage>&#x02013;<lpage>4399</lpage>. <pub-id pub-id-type="doi">10.1109/TCYB.2020.2987575</pub-id><pub-id pub-id-type="pmid">32413939</pub-id></citation></ref>
<ref id="B72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhong</surname> <given-names>P.</given-names></name> <name><surname>Wang</surname> <given-names>D.</given-names></name> <name><surname>Miao</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>EEG-based emotion recognition using regularized graph neural networks</article-title>. <source>IEEE Trans. Affect. Comput</source>. <volume>13</volume>, <fpage>1290</fpage>&#x02013;<lpage>1301</lpage>. <pub-id pub-id-type="doi">10.1109/TAFFC.2020.2994159</pub-id></citation>
</ref>
<ref id="B73">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>C.</given-names></name> <name><surname>Yu</surname> <given-names>L.</given-names></name> <name><surname>Zhou</surname> <given-names>Z.</given-names></name> <name><surname>Ma</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Zhou</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2023</year>). <article-title>Spikingformer: spike-driven residual learning for transformer-based spiking neural network</article-title>. <source>arXiv preprint arXiv:2304.11954</source>.</citation>
</ref>
<ref id="B74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Zhou</surname> <given-names>Z.</given-names></name> <name><surname>Yu</surname> <given-names>L.</given-names></name> <name><surname>Huang</surname> <given-names>L.</given-names></name> <name><surname>Fan</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>Qkformer: hierarchical spiking transformer using qk attention</article-title>. <source>arXiv preprint arXiv:2403.16552</source>.</citation>
</ref>
<ref id="B75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Z.</given-names></name> <name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>He</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Yan</surname> <given-names>S.</given-names></name> <name><surname>Tian</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Spikformer: when spiking neural network meets transformer</article-title>. <source>arXiv preprint arXiv:2209.15425</source>.</citation>
</ref>
<ref id="B76">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>R.-J.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Zhao</surname> <given-names>Q.</given-names></name> <name><surname>Deng</surname> <given-names>H.</given-names></name> <name><surname>Duan</surname> <given-names>Y.</given-names></name> <name><surname>Deng</surname> <given-names>L.-J.</given-names></name></person-group> (<year>2024</year>). <article-title>TCJA-SNN: temporal-channel joint attention for spiking neural networks</article-title>. <source>IEEE Trans. Neural Netw. Learn. Syst</source>. <volume>36</volume>, <fpage>5112</fpage>&#x02013;<lpage>5125</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2024.3377717</pub-id><pub-id pub-id-type="pmid">38598397</pub-id></citation></ref>
<ref id="B77">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zotev</surname> <given-names>V.</given-names></name> <name><surname>Mayeli</surname> <given-names>A.</given-names></name> <name><surname>Misaki</surname> <given-names>M.</given-names></name> <name><surname>Bodurka</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Emotion self-regulation training in major depressive disorder using simultaneous real-time fMRI and EEG neurofeedback</article-title>. <source>NeuroImage Clin</source>. <volume>27</volume>:<fpage>102331</fpage>. <pub-id pub-id-type="doi">10.1016/j.nicl.2020.102331</pub-id><pub-id pub-id-type="pmid">32623140</pub-id></citation></ref>
</ref-list>
</back>
</article>