Introduction

Front. Neurosci.

Frontiers in Neuroscience


1662-453X

Frontiers Media S.A.

10.3389/fnins.2025.1652274

Neuroscience

Original Research

Spiking neural networks for EEG signal analysis using wavelet transform

Yuan

Wei

Jian

Liu

Ying

Academy of Military Sciences, Beijing, China

Edited by: Gaetano Di Caterina, University of Strathclyde, United Kingdom

Reviewed by: Anguo Zhang, Fuzhou University, China

Zihan Pan, Institute for Infocomm Research (A*STAR), Singapore

*Correspondence: Ying Liu hello1668@163.com

Published: 16 October 2025

Article 1652274

Received: 23 June 2025; Accepted: 17 September 2025

© 2025 Yuan, Wei and Liu

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Introduction

Brain-computer interfaces (BCIs) leverage EEG signal processing to enable human-machine communication and have broad application potential. However, existing deep learning-based BCI methods face two critical limitations that hinder their practical deployment. First, they rely on manual EEG feature extraction, which constrains their ability to adaptively capture complex neural patterns. Second, their high energy consumption makes them unsuitable for resource-constrained portable BCI devices that require edge deployment.

Methods

To address these limitations, this work combines the wavelet transform for automatic feature extraction with spiking neural networks for energy-efficient computation. Specifically, we present SpikeWavformer, a novel spiking transformer that integrates a spiking self-attention mechanism with the discrete wavelet transform. SpikeWavformer automatically performs time-frequency decomposition of EEG signals, eliminating manual feature extraction, and makes classification decisions in an energy-efficient manner, thereby improving the model's cross-scene generalization while meeting the constraints of portable BCI applications.

Results

Experimental results demonstrate the effectiveness and efficiency of SpikeWavformer in emotion recognition and auditory attention decoding tasks.

Discussion

These findings indicate that SpikeWavformer can address the key limitations of existing BCI methods and holds promise for practical deployment in portable, resource-constrained scenarios.

Keywords: spiking neural networks; EEG signal analysis; brain-computer interfaces; discrete wavelet transform; bio-inspired methods

Section at acceptance: Neuromorphic Engineering

1 Introduction

Brain-computer interfaces (BCIs) enable direct communication between the human brain and machines through electroencephalography (EEG) signal processing (Zhang et al., 2020). A typical BCI architecture comprises four functional modules: data acquisition, preprocessing, classification, and feedback (Lotte and Guan, 2010). BCI systems have demonstrated extensive real-world applicability in diverse domains, including robotic manipulation (Liu et al., 2015), cognitive signal decoding (Cai et al., 2021), and neuropsychiatric interventions for emotional regulation (Zotev et al., 2020; Xing et al., 2019). Among learning-based BCI methods, deep learning has demonstrated superior performance over conventional machine learning approaches across diverse BCI tasks (Ang et al., 2008; Wang et al., 2015), including motor imagery classification (Schirrmeister et al., 2017; Kwon et al., 2019), mental workload monitoring (Jiao et al., 2018), auditory attention decoding (Faghihi et al., 2022; Cai et al., 2024), and emotion recognition (Alarcao and Fonseca, 2017; Li et al., 2018). Nevertheless, previous research has predominantly relied on manually extracted EEG features such as power spectral density (PSD) and differential entropy (DE) (Jiao et al., 2018; Song et al., 2018; Zhong et al., 2020), and the limitations of such features are becoming increasingly evident. First, these feature extraction paradigms depend strongly on domain-specific knowledge (Singh and Krishnan, 2023; Subasi, 2019) and require task-specific extraction pipelines tailored to distinct experimental protocols, which compromises model generalizability across tasks. Second, manually crafted features often fail to capture the nonlinear interrelationships among EEG time-frequency characteristics and their multiscale dynamic properties (Singh and Krishnan, 2023; Vallabhaneni et al., 2021), potentially leading to the loss of critical information.

The wavelet transform (WT) has emerged as a fundamental signal processing tool in EEG analysis (Grobbelaar et al., 2022) owing to its time-frequency analysis capabilities. Unlike the conventional Fourier transform, which provides only global frequency-domain information, the WT enables multi-scale decomposition through multi-resolution analysis. When applied to EEG signals, this permits simultaneous characterization at distinct resolution levels: macroscopic patterns (e.g., global trends) are captured at coarse-grained scales, while microscopic fluctuations (e.g., localized variations) are resolved at fine-grained scales. Furthermore, by adjusting the scale and translation parameters of its basis functions, the WT yields an adaptive hierarchical representation of non-stationary neural activity, effectively characterizing both transient features (e.g., high-frequency oscillations in event-related potentials) and long-range rhythmic patterns (e.g., sustained α-wave oscillations). Although wavelet transform methods have recently seen preliminary applications in EEG classification tasks, their predominant reliance on deep neural networks (DNNs) imposes heavy computational and resource demands, which conflicts with the low-power objectives of resource-constrained portable BCI devices. Consequently, achieving an optimal trade-off among classification performance, system portability, and energy efficiency remains a critical challenge in practical BCI implementations.

Spiking neural networks (SNNs), recognized as the third generation of neural networks, have emerged as a promising alternative in BCI research owing to their biologically plausible computation paradigm (Izhikevich, 2003; Maass, 1997; Masquelier et al., 2008). As shown in Figure 1, instead of the continuous activations used in DNNs, SNNs employ discrete spike events as the medium of neuronal communication: a spiking neuron fires only when its membrane potential reaches a threshold and remains quiescent otherwise (Gerstner and Kistler, 2002). This event-driven mechanism (Wei et al., 2024) makes synaptic computation sparse and, because binary spikes turn multiply-accumulate (MAC) operations into simple accumulations, yields superior energy efficiency, which is critical for portable neurotechnological devices. In recent years, SNNs have achieved notable success across multiple application domains. For instance, the energy-efficient spiking Transformer architectures proposed by Yao et al. (2023, 2024, 2025) and Zhou et al. (2022, 2023) have performed strongly in image classification (Deng et al., 2022; Shi et al., 2024), detection (Luo et al., 2024; Wang et al., 2025), and segmentation (Lei et al., 2025). Similarly, the SNN-based audio processing models developed by Wu et al. (2018), Pan et al. (2020), and Wang et al. (2024) have made significant advances in signal processing and keyword recognition. These successes establish a solid foundation for the broader, cross-domain application of SNNs.

Figure 1

Comparison of neuron models in deep neural networks (DNNs) and spiking neural networks (SNNs). (a) Conventional DNN neuron model, which processes continuous-valued inputs; x represents input activations, w denotes synaptic weights, b is the bias term, and Y corresponds to the output activation. (b) Typical spiking neuron model, which processes discrete spike events; s_i represents input spikes, w indicates synaptic weights, and Y signifies the output spike train.

Diagram comparing two models: (a) shows a traditional neural network with inputs \(x_1\), \(x_2\), \(x_3\) passing through synapses with weights, processed by a soma and an activation function to produce output \(Y\). (b) depicts a spiking neural network with pre-spikes \(s_1\), \(s_2\), \(s_3\) processed similarly through synapses and a soma that exhibits neural dynamics, resulting in spikes as output \(Y\).

In this paper, we propose a novel BCI signal analysis framework that integrates the wavelet transform with a spiking self-attention mechanism. By combining brain-inspired spiking neural networks with the global-local feature extraction capabilities of the wavelet domain, the framework enables dynamic modeling and efficient computation of non-stationary EEG signals. Our approach not only overcomes the limitations of traditional manual feature extraction but also demonstrates, for the first time, the synergistic effectiveness of spiking self-attention and the wavelet transform in cross-task scenarios through end-to-end training. In experimental evaluations on emotion recognition and auditory attention decoding tasks, our method outperforms the compared methods. The main contributions of this work are summarized as follows:

We propose a novel spiking self-attention module integrated with the discrete wavelet transform (DWT) for EEG signal processing. Through multi-scale wavelet decomposition, the module simultaneously captures global rhythmic patterns and local transient features. Leveraging the spatio-temporal dynamics of spiking neurons, it effectively models nonlinear feature dependencies while replacing the dense attention of the traditional Transformer with efficient, sparse spike sequences.

We present SpikeWavformer, the first end-to-end spiking neural network framework specifically designed for multi-task BCI analysis. The framework unifies time-frequency decomposition, dynamic feature selection, and classification within a biologically plausible computational paradigm. Its cascade architecture combines reversible wavelet transforms with spiking self-attention layers, enabling adaptive optimization across diverse BCI tasks, including emotion recognition and auditory attention decoding.

We conduct comprehensive evaluations on multiple public benchmark datasets to validate the effectiveness of SpikeWavformer. Experimental results demonstrate superior performance compared to existing methods, particularly in resource-constrained environments. The framework shows significant practical potential for real-world BCI applications, achieving state-of-the-art results while maintaining low computational overhead.

2 Related works

2.1 SNNs for EEG signal processing tasks

EEG-based BCIs have demonstrated significant potential across various downstream tasks, with auditory attention decoding (AAD) and emotion recognition being two prominent application domains. AAD research is rooted in the cocktail party effect, the neurocognitive ability to selectively focus on a target speaker in a multi-talker environment (Cherry, 1953), with which hearing-impaired populations experience difficulty (Cai et al., 2024). Neurophysiological signal analyses based on ECoG (Mesgarani and Chang, 2012), MEG (Akram et al., 2016), and EEG (O'Sullivan et al., 2015) have enabled AAD implementations, catalyzing the development of neuro-steered hearing aids (Ceolini et al., 2020). Emotion recognition, in turn, seeks to model higher-order cognitive functions encoded in neurophysiological signals (Tan et al., 2021). Although emotional states manifest through various modalities, outward physical expressions can be masked, which makes non-invasive EEG a robust option for emotion decoding (Xu et al., 2024; Li et al., 2019).

SNNs have emerged as a promising computational framework for both applications, leveraging their inherent low-latency processing and energy-efficient characteristics. In AAD research, Faghihi et al. (2022) developed efficient left/right attention pattern decoding, while Cai et al. (2023) proposed BSAnet, integrating biologically plausible mechanisms with attention modeling for temporal dynamics capture. Recent advances include spiking GCNs for spatial feature extraction (Cai et al., 2024), demonstrating promising results in low-density electrode scenarios. In emotion recognition, pioneering SNN applications have shown methodological viability. Tan et al. (2021) implemented NeuroSense achieving 78.97%/67.76% (arousal/valence) accuracy on DEAP, while Alzhrani et al. (2021) attained 94.83% accuracy using bidirectional spiking networks on DREAMER. Recent developments include fractal SNN architectures (Li et al., 2023), SGLNet for spatiotemporal extraction (Gong et al., 2023), and EESCN achieving 94.81% accuracy on DEAP and SEED-IV (Xu et al., 2024). However, previous research has predominantly relied on manually extracted EEG features such as power spectral density (PSD) and differential entropy (DE) (Jiao et al., 2018; Song et al., 2018), and automatic EEG feature extraction in this domain remains largely unexplored.

2.2 Spiking self-attention mechanism

Traditional SNNs, despite their inherent advantages in energy efficiency and biological plausibility, still exhibit a performance gap compared with their DNN counterparts. Therefore, many recent works have integrated attention mechanisms into SNNs to enhance their performance and capabilities (Yao et al., 2021; Zhu et al., 2024; Zhou et al., 2024; Lu et al., 2025). Yao et al. (2023) proposed Spike-Driven Self-Attention (SDSA), which reformulates matrix multiplications as masking operations so that only binary spike signals are transmitted. Building on this foundation, Yao et al. (2024) introduced the Meta-Spikeformer architecture, which extends the SDSA operator. These advances inspired subsequent research on SNN-specific attention mechanisms. Wang et al. (2023) proposed Spatiotemporal Self-Attention (STSA) for SNNs, which maintains asynchronous transmission while capturing spatiotemporal feature dependencies. More recently, Wang et al. (2025) developed Saccade Spike Self-Attention (SSSA), enabling comprehensive spatiotemporal feature processing for holistic visual scene understanding in SNN paradigms. Overall, these spiking self-attention mechanisms have significantly advanced SNN performance. However, spiking self-attention designs tailored specifically to EEG signal processing are still lacking.

3 Preliminary

3.1 Leaky integrate-and-fire neuron

SNNs rely on spiking neurons (Maass, 1997) as their basic unit of information transfer; common spiking neuron models include the Hodgkin-Huxley (Abbott and Kepler, 2005), Izhikevich (Izhikevich, 2003), and Leaky Integrate-and-Fire (LIF) (Izhikevich, 2003) models. In this work, we use the simple and effective LIF model as the spiking neuron. When the membrane potential reaches a threshold, the neuron emits a spike, and the membrane potential is then reset to the reset potential V_reset. The dynamics of the LIF neuron are described as:

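$$H[t] = V[t-1] + \frac{1}{\tau}\left(X[t] - \left(V[t-1] - V_{\mathrm{reset}}\right)\right), \tag{1}$$

$$S[t] = \Theta\left(H[t] - V_{\mathrm{th}}\right), \tag{2}$$

$$V[t] = H[t]\left(1 - S[t]\right) + V_{\mathrm{reset}}\, S[t], \tag{3}$$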

where τ is the membrane time constant and X[t] is the input current at time step t. When the membrane potential H[t] reaches or exceeds the firing threshold V_th, the neuron emits a spike S[t] = 1. Θ(·) is the Heaviside step function, which equals 1 for v ≥ 0 and 0 otherwise. V[t] is the membrane potential after the firing event: it equals H[t] if no spike is generated and V_reset otherwise.
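To trace Eqs. (1)-(3) step by step, the following minimal NumPy sketch simulates a layer of LIF neurons over discrete time steps. It is an illustration under stated assumptions, not the authors' code: τ = 2.0 is an assumed value (the paper does not report it), while V_th = 0.5 and an initial membrane potential of 0 (taken here equal to V_reset) follow the implementation details in Section 5.1.2. The function name `lif_forward` is hypothetical.

```python
import numpy as np


def lif_forward(x_seq, tau=2.0, v_th=0.5, v_reset=0.0):
    """Simulate LIF neurons over T steps following Eqs. (1)-(3).

    x_seq: input currents X[t] with shape (T, ...); returns spikes S[t] of the same shape.
    tau = 2.0 is an assumed membrane time constant.
    """
    x_seq = np.asarray(x_seq, dtype=np.float64)
    v = np.full(x_seq.shape[1:], v_reset)         # membrane potential before step 1
    spikes = np.zeros_like(x_seq)
    for t in range(x_seq.shape[0]):
        h = v + (x_seq[t] - (v - v_reset)) / tau  # Eq. (1): leaky charging
        s = (h - v_th >= 0).astype(np.float64)    # Eq. (2): Heaviside firing
        v = h * (1.0 - s) + v_reset * s           # Eq. (3): hard reset after a spike
        spikes[t] = s
    return spikes


print(lif_forward([0.8] * 4))   # [0. 1. 0. 1.]
```

With a constant input of 0.8, H[t] charges to 0.4, crosses the threshold at the next step (about 0.6), resets, and repeats, which gives the spike train [0, 1, 0, 1]. A constant input of 0.4 never fires, because H[t] converges to 0.4, which stays below V_th.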

3.2 Wavelet transform

Wavelet transforms (WTs) are powerful signal-processing tools that localize signals in both the time and frequency domains, which is particularly useful for analyzing non-stationary signals such as EEG. The discrete wavelet transform (DWT), in particular, provides an efficient method for multi-resolution analysis by decomposing signals into sub-bands corresponding to different frequency scales, enabling the extraction of local features at various scales. Because EEG signals are nonlinear and non-stationary, traditional analysis methods struggle to capture their time-varying, multiscale nature, and the DWT offers a significant advantage for feature extraction and time-frequency characterization of such signals. The DWT can decompose EEG data into sub-bands that approximate the classical rhythms, such as delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (above 30 Hz). This decomposition allows us to extract meaningful features from the EEG data that correspond to various cognitive and emotional states.
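The correspondence between DWT levels and EEG rhythms is only approximate, because each DWT level halves the analyzed band. The pure-Python sketch below computes the nominal (ideal) sub-band edges of a multi-level 1-D temporal DWT. The 128 Hz sampling rate is the post-downsampling rate reported in Section 5.1.2, and `dyadic_bands` is a hypothetical helper. Real Haar filters have wide transition bands, so actual sub-bands leak into their neighbors.

```python
def dyadic_bands(fs, levels):
    """Nominal frequency range (Hz) of each sub-band of a `levels`-level 1-D DWT.

    Detail band D_j covers [fs / 2**(j+1), fs / 2**j]; the final approximation
    A_levels covers [0, fs / 2**(levels+1)]. Real filters only approximate these edges.
    """
    bands = {}
    for j in range(1, levels + 1):
        bands[f"D{j}"] = (fs / 2 ** (j + 1), fs / 2 ** j)
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands


for name, (lo, hi) in dyadic_bands(128, 5).items():
    print(f"{name}: {lo:g}-{hi:g} Hz")
```

At 128 Hz, D3 spans 8-16 Hz, so the alpha band (8-13 Hz) shares a sub-band with low beta; D4 (4-8 Hz) matches theta; delta is split between D5 (2-4 Hz) and A5 (0-2 Hz); and D1 (32-64 Hz) lies above the 1-32 Hz pass band used in Section 5.1.2.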

For our application, we employ the Haar wavelet for its simplicity and computational efficiency. The Haar wavelet is among the earliest and simplest wavelet functions; its two-tap filter has minimal support, which results in fast computation. Compared with other common wavelets such as Daubechies or Morlet, the Haar wavelet is computationally inexpensive, requiring only additions and binary shifts in its unnormalized integer form, which makes it well suited to real-time, low-power applications such as SNN-based systems. Haar wavelets efficiently extract both local low-frequency components (such as delta and theta waves) and high-frequency components (such as beta and gamma waves), which are essential for distinguishing cognitive states in EEG analysis. Their efficiency and simplicity also make them a good match for the sparse, event-driven nature of SNNs.
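The claim that Haar filtering needs only additions and shifts can be checked with the integer-to-integer Haar variant (often called the S-transform), sketched below in pure Python. This variant is an assumption chosen for illustration; the model itself may use the orthonormal Haar filters, which scale by 1/√2 instead. The approximation is a floor average computed with a right shift, the detail is a plain difference, and the transform is exactly invertible. All function names are hypothetical.

```python
def haar_step(x):
    """One level of the integer Haar (S-)transform; len(x) must be even."""
    evens, odds = x[0::2], x[1::2]
    approx = [(a + b) >> 1 for a, b in zip(evens, odds)]   # floor of the pair average
    detail = [a - b for a, b in zip(evens, odds)]          # pair difference
    return approx, detail


def haar_step_inverse(approx, detail):
    """Exact inverse of haar_step, again using only additions and shifts."""
    out = []
    for s, d in zip(approx, detail):
        a = s + ((d + 1) >> 1)
        out.extend([a, a - d])
    return out


def haar_multilevel(x, levels):
    """Repeatedly split the approximation: a multi-resolution decomposition."""
    details = []
    for _ in range(levels):
        x, d = haar_step(x)
        details.append(d)
    return x, details


def haar_multilevel_inverse(approx, details):
    """Undo haar_multilevel, finest level last."""
    for d in reversed(details):
        approx = haar_step_inverse(approx, d)
    return approx


sig = [3, 7, 1, -4, 0, 2, 9, 9]
approx, details = haar_multilevel(sig, levels=3)
print(approx, details)   # [3] [[-4, 5, -2, 0], [7, -8], [-4]]
```

Each level halves the length of the approximation, so a length-8 signal supports three levels; the lossless inverse recovers `sig` exactly.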

3.3 Spiking self-attention mechanism

The Transformer architecture, originally devised for natural language processing (Vaswani et al., 2017), has since permeated multiple subfields of artificial intelligence. At its core lies the self-attention mechanism, which processes information selectively by focusing on relevant contextual elements. Spikformer (Zhou et al., 2022) pioneered the integration of self-attention into SNNs with its Spiking Self-Attention (SSA) mechanism, which uses sparse spike-form representations for the query (Q), key (K), and value (V) matrices:

$$Q = \mathrm{SN}(\mathrm{BN}(XW_Q)), \tag{4}$$

$$K = \mathrm{SN}(\mathrm{BN}(XW_K)), \tag{5}$$

$$V = \mathrm{SN}(\mathrm{BN}(XW_V)), \tag{6}$$

Here, Q, K, and V are tensors in $\mathbb{R}^{T\times C\times H\times W}$, BN(·) denotes batch normalization, and SN(·) denotes the spiking neuron layer that keeps the attention mechanism in spike form. The similarity between the spiking Q and K matrices is computed via the dot product:

$$\mathrm{Score} = \mathrm{Sim}(Q, K) = QK^{\top}. \tag{7}$$

The attention output is subsequently calculated as a scaled weighted sum of V, transformed through spiking neuron activation, and further processed through linear transformation and batch normalization before final spiking neuron conversion to produce the output Z:

$$\mathrm{Attn} = \mathrm{SN}(s \cdot \mathrm{Score} \cdot V), \qquad Z = \mathrm{SN}(\mathrm{BN}(\mathrm{Linear}(\mathrm{Attn}))). \tag{8}$$
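The following minimal NumPy sketch illustrates Eqs. (4)-(8) for a single time step. It assumes a Heaviside threshold (V_th = 0.5) in place of the full spiking neuron layer SN(·) and omits batch normalization and the output projection Z. The function names, token count, and scale s = 0.125 are illustrative assumptions. Because Q and K are binary, every entry of QK^⊤ is a non-negative integer count of co-active features, and no softmax is needed.

```python
import numpy as np


def spike(x, v_th=0.5):
    """Single-step Heaviside firing: a simplified stand-in for SN(.)."""
    return (x >= v_th).astype(np.int64)


def spiking_self_attention(x, w_q, w_k, w_v, s=0.125):
    """Spike-form Q, K, V (Eqs. 4-6, BN omitted), Score (Eq. 7), Attn (Eq. 8)."""
    q = spike(x @ w_q)
    k = spike(x @ w_k)
    v = spike(x @ w_v)
    score = q @ k.T                 # integer counts of co-active features
    attn = spike(s * (score @ v))   # scaled, then re-binarized; no softmax
    return q, k, v, score, attn


rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
x = rng.standard_normal((n_tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))
q, k, v, score, attn = spiking_self_attention(x, w_q, w_k, w_v)
print(score.shape, attn.shape)
```

The scale s only shifts where the re-binarized output crosses the threshold. It plays the role that the softmax normalization plays in a conventional Transformer.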

4 Methods

In this section, we introduce our approach for EEG-based emotion recognition and auditory attention decoding. First, we formulate the problem in Section 4.1. Then, we describe the overall data processing workflow in Section 4.2. Finally, we present the proposed Spiking Wavelet Transformer (SpikeWavformer) architecture, which integrates the wavelet transform with spiking self-attention, in Section 4.3, and detail its spiking wavelet encoder block in Section 4.4.

4.1 Problem analysis

An EEG dataset $D_{\mathrm{eeg}}$ can be represented as:

$$D_{\mathrm{eeg}} = \{(x_i^{\mathrm{eeg}}, y_i)\}_{i=1}^{N}, \tag{9}$$

where $x_i^{\mathrm{eeg}} \in X^{\mathrm{eeg}}$ denotes the raw EEG input signal of the i-th sample and $y_i \in Y$ represents its label (emotion category or auditory attention state). Our objective is to learn a spiking neural network model $F_\theta$ with parameters θ that predicts the class label from the EEG input. The model is optimized by minimizing the expected risk under the cross-entropy loss $\mathcal{L}_{\mathrm{CE}}$:

$$\arg\min_{\theta} \; \mathbb{E}_{(x^{\mathrm{eeg}}, y) \sim D_{\mathrm{eeg}}}\left[\mathcal{L}_{\mathrm{CE}}\left(F_\theta(x^{\mathrm{eeg}}), y\right)\right]. \tag{10}$$
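In practice, the expectation in Eq. (10) is replaced by an average over a mini-batch. The NumPy sketch below computes that empirical cross-entropy from class logits. The function name is hypothetical, and the sketch uses the standard log-sum-exp shift for numerical stability.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch; the empirical counterpart of Eq. (10).

    logits: (batch, n_classes) real-valued scores from the classification head.
    labels: (batch,) integer class indices.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    z = logits - logits.max(axis=1, keepdims=True)              # log-sum-exp shift
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


# Uninformative logits over two classes (e.g., low/high arousal) give log 2.
print(cross_entropy(np.zeros((4, 2)), [0, 1, 1, 0]))
```

Subtracting the row maximum leaves the softmax unchanged but keeps `np.exp` from overflowing for large logits.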

In this study, we present a novel spiking transformer model $F_\theta$ that learns discriminative spatio-temporal representations directly from raw EEG signals for both emotion recognition and auditory attention decoding. To this end, we introduce a novel Spiking Wavelet Self-Attention (SWSA) mechanism within a spiking transformer framework. While conventional Spiking Self-Attention (SSA) enables efficient event-driven computation, it is limited in its ability to capture the multi-scale frequency dynamics intrinsic to non-stationary EEG signals. SWSA overcomes this limitation by integrating the Haar wavelet transform for joint time-frequency analysis. The Haar wavelet has a minimal filter length and is computationally simple, which makes it highly efficient for real-time processing. Compared with other wavelet bases such as Daubechies and Morlet, its shorter filters and multiply-free operations align well with the event-driven, low-power nature of spiking neurons. This integration allows the model to focus on neurophysiologically relevant rhythms (e.g., the alpha and beta bands) that are critical for emotional and attentional processes, while maintaining energy-efficient computation. Finally, a cross-entropy loss is used for gradient-based optimization, encouraging highly discriminative features for both tasks.

4.2 Workflow

The overall workflow of the proposed method is depicted in Figure 2. Raw EEG signals are first preprocessed and segmented into overlapping windows via a sliding window strategy to preserve temporal continuity. To more effectively capture the spatial characteristics of EEG activity, α-band cortical signals are extracted and projected onto 2D topographic maps, thereby maintaining brain-region dependencies. These maps are subsequently divided into patches and tokenized into fixed-length sequences, which serve as the input to a stack of N spiking encoder blocks. Finally, the resulting features are fed into an MLP classification head to predict the corresponding emotional or attentional state. In summary, this high-level pipeline constitutes the basis of the proposed model architecture, which is elaborated in the following section.
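The sliding-window segmentation step can be sketched with NumPy's `sliding_window_view` (available since NumPy 1.20). The window length and hop below (1 s windows with 50% overlap at 128 Hz) and the 10 s trial length are illustrative assumptions, because this section does not state them. `segment_windows` is a hypothetical helper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment_windows(eeg, win, hop):
    """Cut a (channels, samples) trial into overlapping windows.

    Returns an array of shape (n_windows, channels, win), where
    n_windows = 1 + (samples - win) // hop.
    """
    views = sliding_window_view(eeg, win, axis=-1)   # (channels, samples - win + 1, win)
    return views[:, ::hop, :].transpose(1, 0, 2).copy()


fs = 128
trial = np.random.default_rng(0).standard_normal((32, 10 * fs))   # 32 channels, 10 s
windows = segment_windows(trial, win=fs, hop=fs // 2)
print(windows.shape)   # (19, 32, 128)
```

`sliding_window_view` returns a read-only strided view without copying. The final `.copy()` materializes only the windows that the hop keeps.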

Figure 2

Workflow of the proposed method for EEG-based tasks. First, raw EEG data are preprocessed and segmented via sliding windows. Second, the α-band cortical activity is visualized as 2D topographic maps. Finally, the data are tokenized into fixed-length sequences; multiple spiking encoder blocks perform feature extraction, and an MLP head outputs the predicted category.

Diagram of the SpikeWavformer process for analyzing ambulatory EEG data. EEG readings are split into windows and transformed into a spectro-spatial image. This image undergoes a convolution-based projection, followed by spiking encoder blocks. The resulting data are processed through an MLP to classify the output.

4.3 SpikeWavformer

Building on the workflow described above, we design SpikeWavformer, an end-to-end spiking transformer architecture that combines wavelet-based multiscale analysis with spiking attention to enhance EEG feature representation. As shown in Figure 3, SpikeWavformer can be written as follows:

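$$\begin{aligned} X_0 &= \mathrm{SPS}(X), \\ X'_l &= \mathrm{SWSA}(X_{l-1}) + X_{l-1}, & l &= 1, \dots, L, \\ X_l &= \mathrm{MLP}(X'_l) + X'_l, & l &= 1, \dots, L, \\ Y &= \mathrm{CH}(\mathrm{GAP}(X_L)). \end{aligned} \tag{11}$$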

Given the EEG input X, SpikeWavformer first represents the spatial focus via the topographic distribution of α-band oscillatory cortical activity and converts it into a 2D image. Subsequently, the spiking patch splitting (SPS) module partitions the input into patches and progressively extracts features, optionally incorporating the wavelet transform to enhance multiscale feature representation. Then, L spiking wavelet encoder blocks equipped with the spiking wavelet self-attention mechanism encode the features. Finally, the encoded features are compressed into a fixed-dimension vector via global average pooling (GAP) and fed into a fully connected classification head (CH) to produce the classification results.
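To make the data flow of Eq. (11) concrete, the NumPy sketch below wires up the residual structure with deliberately simplified, hypothetical stand-ins. SPS is reduced to non-overlapping patch flattening plus thresholding, and each SWSA and MLP is a random projection followed by a Heaviside spike function. The 32×32 map size, patch size 4, depth L = 2, and untrained random weights are all assumptions. The sketch traces shapes and residual connections only, not the real modules.

```python
import numpy as np

rng = np.random.default_rng(0)


def spike(x, v_th=0.5):
    """Heaviside firing used by every stand-in module."""
    return (x >= v_th).astype(np.float64)


def sps(img, patch=4):
    """Stand-in for SPS: split an (H, W) map into flattened patch tokens."""
    h, w = img.shape
    tok = img.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)
    return spike(tok.reshape(-1, patch * patch))       # (n_tokens, patch * patch)


def random_spiking_layer(dim):
    """Hypothetical stand-in for SWSA or MLP: random projection, then spikes."""
    w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    return lambda x: spike(x @ w)


def forward(img, depth=2, n_classes=2, patch=4):
    """Untrained random weights are drawn on every call; shapes only."""
    x = sps(img, patch)                                # X_0 = SPS(X)
    dim = x.shape[1]
    for _ in range(depth):                             # l = 1, ..., L
        swsa, mlp = random_spiking_layer(dim), random_spiking_layer(dim)
        x = swsa(x) + x                                # X'_l = SWSA(X_{l-1}) + X_{l-1}
        x = mlp(x) + x                                 # X_l = MLP(X'_l) + X'_l
    w_head = rng.standard_normal((dim, n_classes))
    return x.mean(axis=0) @ w_head                     # Y = CH(GAP(X_L))


logits = forward(rng.standard_normal((32, 32)))
print(logits.shape)   # (2,)
```

Adding spike tensors in the residual paths yields non-binary integer values. This is why a design choice is needed about where the next spiking layer re-binarizes its input.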

Figure 3

The overall architecture of our proposed Spiking Wavelet Transformer (SpikeWavformer) for EEG-based tasks, which consists of a spiking patch splitting module, L× spiking wavelet encoder blocks, and a linear classification head.

Diagram of a neural network architecture for image classification. It includes components like Conv2D, Batch Normalization (BN), Max Pooling (MP), and Spiking Wavelet Self Attention (SWSA). The encoder block is repeated, leading to a classification head. Vanilla SSA and SWSA have elements like Linear, BN, LIF neurons, and wavelet transforms (DWT, IDWT). Operations include element-wise addition and matrix dot-product.

4.4 Spiking wavelet encoder block

As an essential neurophysiological signal, EEG plays a pivotal role in research areas such as affective computing and auditory attention decoding. Nevertheless, its multi-channel structure, low signal-to-noise ratio (SNR), pronounced temporal non-stationarity, and intricate time–frequency characteristics present substantial challenges for existing analysis techniques. Conventional CNNs are limited in capturing long-range temporal dependencies inherent in EEG data. In contrast, vanilla Transformers possess strong long-range modeling capability but incur prohibitive computational costs when processing long-sequence EEG signals. Furthermore, many existing approaches employ irreversible downsampling during multi-scale feature extraction, resulting in the loss of critical frequency-domain information. This drawback is particularly detrimental to neural decoding tasks that rely on specific frequency bands.

To address these issues, we propose a Spiking Wavelet Self-Attention (SWSA) mechanism for EEG signal processing. It combines the biological plausibility of SNNs with the flexible time-frequency analysis of wavelet transforms, offering an efficient, biologically inspired solution for EEG-based emotion recognition and auditory attention decoding. Consider multi-channel EEG inputs $X \in \mathbb{R}^{T\times B\times C\times H\times W}$, where T denotes the number of time steps, B the batch size, C the number of EEG channels, and H×W the 2D spatial-topological arrangement. The frequency-domain features of EEG signals are crucial for neural decoding, because different frequency bands correspond to different cognitive states: δ to deep sleep, θ to memory encoding, α to relaxation, β to attention and cognitive activity, and γ to perception and higher-order functions. We adopt the Haar wavelet because its minimal filter length and computational simplicity enable fast, low-power multiscale decomposition that aligns well with the event-driven, resource-constrained nature of SNN-based BCI systems. Specifically, we use the Haar wavelet for multiscale decomposition and apply the DWT to the EEG features at each time step t:

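$$[X_{LL}(t), X_{LH}(t), X_{HL}(t), X_{HH}(t)] = \mathrm{DWT}(X(t)), \tag{12}$$

Here, $X_{LL}(t)$ captures low-frequency components (such as δ and θ), while the high-frequency sub-bands $X_{LH}(t)$, $X_{HL}(t)$, and $X_{HH}(t)$ retain high-frequency information (β and γ). Spatial local convolution then enhances the interactions among frequency bands:

$$X_{\mathrm{filt}}(t) = \mathrm{LIF}\Big(\mathrm{BN}\big(\mathrm{Conv}(\mathrm{Concat}([X_{LL}(t), X_{LH}(t), X_{HL}(t)]))\big)\Big), \tag{13}$$

where BN denotes batch normalization and LIF a spiking neuron layer. The IDWT then reconstructs the spatial-domain features:

$$X_{\mathrm{recon}}(t) = \mathrm{IDWT}(X_{\mathrm{filt}}(t)). \tag{14}$$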
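Eqs. (12) and (14) can be illustrated with a one-level orthonormal 2-D Haar DWT and its inverse on a single (H, W) map, written directly in NumPy. This is a sketch, not the authors' implementation. The convolution and LIF stage of Eq. (13) is omitted, the 32×32 map size is an assumption, and the assignment of the names LH and HL to the two mixed sub-bands varies between libraries. Because the filters are orthonormal, the transform preserves energy, which is also the property behind L_W = 1 in Theorem 1.

```python
import numpy as np


def haar_dwt2(x):
    """One-level orthonormal 2-D Haar DWT of an (H, W) array with even H and W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2              # local average (low-low)
    lh = (a - b + c - d) / 2              # left-right differences
    hl = (a + b - c - d) / 2              # top-bottom differences
    hh = (a - b - c + d) / 2              # diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild each 2x2 block exactly."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.result_type(ll, lh, hl, hh))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x


topo = np.random.default_rng(0).standard_normal((32, 32))   # one time step's map
bands = haar_dwt2(topo)
print([b.shape for b in bands])   # four (16, 16) sub-bands
```

Each sub-band has half the spatial resolution of the input, and the four together hold exactly as many values as the original map, so no information is discarded before the IDWT.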

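Inspired by the vanilla Transformer encoder (Vaswani et al., 2017), our encoder first computes self-attention on the block-input spikes. Three matrices $W_q \in \mathbb{R}^{d\times d_q}$, $W_k \in \mathbb{R}^{d\times d_k}$, and $W_v \in \mathbb{R}^{d\times d_v}$ map the tokens to query, key, and value vectors, which spiking neurons convert into the spike sequences Q, K, and V:

$$Q = \mathrm{LIF}(\mathrm{BN}(XW_q)), \quad K = \mathrm{LIF}(\mathrm{BN}(XW_k)), \quad V = \mathrm{LIF}(\mathrm{BN}(XW_v)). \tag{15}$$

Next, we compute the similarity between Q and K. Following Zhou et al. (2022), a scaling factor s controls the magnitude of the matrix product without affecting the properties of the attention:

$$X_{\mathrm{attn}} = \mathrm{LIF}(QK^{\top}V \cdot s), \tag{16}$$

$$X'_{\mathrm{attn}} = \mathrm{LIF}(\mathrm{BN}(\mathrm{Linear}(X_{\mathrm{attn}}))). \tag{17}$$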
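Because Eq. (16) contains no softmax, the product QK^⊤V is associative and can be evaluated either as (QK^⊤)V or as Q(K^⊤V). The NumPy sketch below uses random binary spike matrices (an assumption) to check that both orders agree exactly. It also counts the operations each order needs: roughly 2N²d versus 2Nd² for N tokens of dimension d, so the second order is cheaper whenever N > d. With binary operands these multiply-accumulates reduce to additions. This is a general linear-algebra observation; we do not claim that SpikeWavformer's implementation uses this ordering. Function names are hypothetical.

```python
import numpy as np


def attention_products(q, k, v):
    """Evaluate Q K^T V in both association orders (no softmax in between)."""
    left = (q @ k.T) @ v     # builds an N x N score matrix first
    right = q @ (k.T @ v)    # builds a d x d matrix first
    return left, right


def accumulate_counts(n_tokens, dim):
    """Approximate multiply-accumulate counts of the two orders for N tokens, d dims."""
    return 2 * n_tokens ** 2 * dim, 2 * n_tokens * dim ** 2


rng = np.random.default_rng(0)
n_tokens, dim = 64, 16
q, k, v = (rng.integers(0, 2, size=(n_tokens, dim)) for _ in range(3))
left, right = attention_products(q, k, v)
print(np.array_equal(left, right), accumulate_counts(n_tokens, dim))
```

Integer inputs make the equality check exact rather than approximate, since no floating-point rounding is involved.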

To integrate wavelet and attention features effectively, we use channel-wise concatenation:

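$$X_{\mathrm{combined}} = \mathrm{Concat}(X'_{\mathrm{attn}}, X_{\mathrm{recon}}(t)), \tag{18}$$

$$\mathrm{SWSA}(X) = \mathrm{LIF}(\mathrm{BN}(X_{\mathrm{combined}})). \tag{19}$$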

By integrating wavelet decomposition with spiking mechanisms, SpikeWavformer processes long-sequence EEG data efficiently while facilitating the analysis of cross-frequency neural dynamics, thereby providing richer feature representations for complex neural decoding tasks. We now analyze the advantages of integrating the wavelet transform into SNNs from the perspectives of convergence and convergence speed. First, we define the EEG signal space as $X = \{x \in \mathbb{R}^{T\times C\times H\times W}\}$, where T represents time steps, C denotes channels, and H×W represents the spatial dimensions. The discrete wavelet transform operator is defined as:

$$W: X \rightarrow Y, \tag{20}$$

where Y = {(X_LL, X_LH, X_HL, X_HH)} represents the wavelet coefficient space.

The SWSA mechanism can be formalized as a composite operator:

$$\mathrm{SWSA}(X) = F(\mathrm{Attn}(W(X))) \oplus W^{-1}(W(X)), \tag{21}$$

where W is the DWT operator, Attn is the spiking attention operator, F is the fusion operator, ⊕ denotes concatenation and W⁻¹ is the inverse DWT (IDWT).

Theorem 1 (Lipschitz continuity of SWSA): The SWSA mechanism satisfies Lipschitz continuity (Hager, 1979; Gouk et al., 2021; Goldstein, 1977) with constant $L_{\mathrm{SWSA}}$, ensuring stable convergence during training.

Proof: First, we establish the Lipschitz properties of the individual components. Haar wavelet transform: for the Haar wavelet transform W, we have $\|W(x_1) - W(x_2)\|_2 \le L_W \|x_1 - x_2\|_2$; since Haar wavelets are orthonormal, $L_W = 1$. Spiking attention: for the spiking attention mechanism with LIF neurons, let $\phi(\mu) = \Theta(\mu - V_{\mathrm{th}})$ be the spike generation function, with membrane potential dynamics $V[t] = \tau V[t-1] + X[t] - V_{\mathrm{reset}} S[t-1]$. For bounded inputs, the LIF neuron satisfies $|\phi(\mu_1) - \phi(\mu_2)| \le \frac{1}{V_{\mathrm{th}}} |\mu_1 - \mu_2|$, and therefore $L_A = 1/V_{\mathrm{th}}$. Combined operator: the SWSA operator combines these components, so that $\|\mathrm{SWSA}(x_1) - \mathrm{SWSA}(x_2)\|_2 \le L_{\mathrm{SWSA}} \|x_1 - x_2\|_2$, where $L_{\mathrm{SWSA}} = L_W \cdot L_A \cdot L_F = L_F / V_{\mathrm{th}}$ and $L_F$ is the Lipschitz constant of the fusion operation.

Corollary 1: Under the assumption that L_SWSA < 1, the SWSA operator is a contraction mapping, guaranteeing convergence to a unique fixed point.

Theorem 2 (Accelerated convergence): The SWSA mechanism converges faster than vanilla spiking self-attention.

Proof: Consider the optimization landscape with loss function $\mathcal{L}(\theta)$. The gradient update for the SWSA parameters follows $\theta_{t+1} = \theta_t - \alpha \nabla_\theta \mathcal{L}(\theta_t)$. The wavelet decomposition provides a natural regularization through frequency localization:

$$\mathcal{L}_{\mathrm{SWSA}}(\theta) = \mathcal{L}_{\mathrm{data}}(\theta) + \lambda \sum_j \|W_j\|_1. \tag{22}$$

This L₁ regularization on wavelet coefficients promotes sparsity. The convergence rate is bounded by:

$$\mathcal{L}(\theta_T) - \mathcal{L}^* \le \frac{1}{2\alpha T}\|\theta_0 - \theta^*\|^2 + \frac{\alpha L}{2}\sigma^2, \tag{23}$$

where the wavelet regularization reduces the effective variance σ², leading to faster convergence.
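The two terms of the bound in Eq. (23) behave differently: the first decays as 1/T, while the second depends only on the step size α, the constant L (read here as a smoothness constant), and the gradient-noise variance σ². The short Python sketch below evaluates Eq. (23), together with the regularized objective of Eq. (22), for hypothetical numbers chosen only to show how a smaller σ² lowers the noise floor of the bound. It illustrates the arithmetic of the bound only. The function names are hypothetical.

```python
def regularized_loss(data_loss, wavelet_coeffs, lam):
    """Eq. (22): data loss plus lam times the L1 norm of all wavelet coefficients."""
    return data_loss + lam * sum(abs(c) for group in wavelet_coeffs for c in group)


def convergence_bound(alpha, n_steps, init_dist_sq, smooth_l, sigma_sq):
    """Right-hand side of Eq. (23)."""
    optimization_term = init_dist_sq / (2 * alpha * n_steps)   # shrinks as 1/T
    noise_term = alpha * smooth_l * sigma_sq / 2               # independent of T
    return optimization_term + noise_term


print(regularized_loss(1.0, [[1.0, -2.0], [0.5]], lam=0.1))
for sigma_sq in (1.0, 0.5):
    print(sigma_sq, convergence_bound(0.1, 100, 4.0, 1.0, sigma_sq))
```

With α = 0.1, T = 100, and ||θ_0 - θ*||² = 4, the first term is 0.2 in both cases, while halving σ² from 1.0 to 0.5 halves the second term from 0.05 to 0.025.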

5 Experiment

This section presents comprehensive experiments to evaluate the effectiveness and efficiency of the proposed SpikeWavformer model. First, we detail the experimental setup, including datasets, preprocessing, and implementation specifics. Second, comparative studies are conducted on the DEAP and KUL datasets, demonstrating superior performance over existing methods in both emotion recognition and auditory attention decoding tasks. Additionally, we provide an analysis of the model's energy efficiency, highlighting its advantages in low-power computing environments.

5.1 Experimental setup

5.1.1 Datasets

DEAP. The DEAP dataset (Koelstra et al., 2011), widely used in emotion recognition research, examines emotional responses to multimedia stimuli by employing peripheral physiological data and EEG signals. It includes 32-channel EEG recordings and various physiological signals, such as skin temperature, blood volume pulse (BVP), respiratory rate, galvanic skin response (GSR), electrooculogram (EOG), and video clips of facial expressions. The facial expressions of the first 22 participants were also recorded. Each participant completed 40 trials, with each trial lasting 1 min and a 3-second baseline recorded before the start of each trial. After each trial, participants filled out a questionnaire to self-report their emotional state in terms of arousal, valence, dominance, and liking, with each dimension rated on a 9-point scale. EEG data were collected using a 32-channel device at a sampling rate of 512 Hz.

KUL. The KUL dataset (Das et al., 2019) contains EEG data recorded with a BioSemi ActiveTwo system in an electromagnetically shielded, soundproofed room to minimize noise interference. Sixteen normal-hearing subjects were instructed to attend to one of two competing speakers, who narrated four Dutch stories. Each subject completed 8 trials of 6 min each. The auditory stimuli were filtered with head-related transfer functions (HRTFs) and presented from either the left or the right side in randomized order.

5.1.2 Implementation details

The EEG data from each channel were first re-referenced to the average of all electrodes. Because the datasets were recorded at different sampling rates, all signals were band-pass filtered between 1 and 32 Hz with a 6th-order Chebyshev Type II filter and downsampled to 128 Hz; this frequency range follows previous nonlinear auditory attention decoding (AAD) studies. Finally, each channel was normalized to zero mean and unit variance within each trial. On the KUL dataset, seven decision window lengths were analyzed: 0.1, 0.2, 0.5, 1, 2, 5, and 10 s. Experiments were run on two NVIDIA RTX 4090 GPUs. The model was optimized with Adam at an initial learning rate of 1 × 10⁻⁴ for 200 epochs. The LIF neurons used an initial membrane potential of 0, a firing threshold of 0.5, and T = 4 simulation time steps. To enable backpropagation through the non-differentiable spike function, a sigmoid surrogate with α = 4 was used, based on sigmoid(x) = 1/(1 + exp(−αx)): its derivative replaces the gradient of the spike function in the backward pass. All remaining settings of the spiking transformer follow Spikformer (Zhou et al., 2022).
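The preprocessing chain above can be sketched with NumPy and SciPy as follows. This is an illustrative reconstruction, not the authors' code: the 40 dB stopband attenuation, the zero-phase (forward-backward) filtering, and the polyphase resampler are assumptions, and the function name `preprocess_eeg` is hypothetical. Note also that SciPy's `cheby2` doubles the order for band-pass designs, and the text does not say whether "6th-order" refers to N = 6 or to the total order.

```python
import numpy as np
from scipy.signal import cheby2, resample_poly, sosfiltfilt

def preprocess_eeg(eeg, fs_in, fs_out=128, band=(1.0, 32.0), order=6, rs_db=40.0):
    """Preprocess one trial of shape (channels, samples); returns (channels, samples_out)."""
    # 1) Common average reference: subtract the mean over electrodes at every sample.
    x = eeg - eeg.mean(axis=0, keepdims=True)
    # 2) Chebyshev Type II band-pass. For cheby2, `band` marks where the stopband
    #    attenuation rs_db is first reached, so the passband lies inside 1-32 Hz.
    #    rs_db = 40 dB and zero-phase filtering are assumptions.
    sos = cheby2(order, rs_db, band, btype="bandpass", fs=fs_in, output="sos")
    x = sosfiltfilt(sos, x, axis=-1)
    # 3) Rational-rate resampling to fs_out (e.g., 512 Hz -> 128 Hz is up=1, down=4).
    g = np.gcd(int(fs_in), int(fs_out))
    x = resample_poly(x, int(fs_out) // g, int(fs_in) // g, axis=-1)
    # 4) Per-channel z-scoring within the trial.
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + 1e-8)

# Example: 10 s of 4-channel noise recorded at 512 Hz (DEAP's original rate).
rng = np.random.default_rng(0)
out = preprocess_eeg(rng.standard_normal((4, 512 * 10)), fs_in=512)
print(out.shape)  # (4, 1280)
```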

5.2 Comparative study

We evaluate the proposed SpikeWavformer on the DEAP and KUL datasets and compare it with existing methods for emotion recognition and auditory attention decoding. As shown in Tables 1 and 2, SpikeWavformer achieves the highest accuracy on DEAP and on all KUL decision windows except the shortest (0.1 s), where STAnet is marginally better (80.8% vs. 80.5%). On DEAP, SpikeWavformer reaches an arousal accuracy of 76.51% (std: 5.48%) and a valence accuracy of 77.10% (std: 5.68%), about 15 and 17 percentage points above the best baseline in each dimension. By comparison, EEGNet (Lawhern et al., 2018) achieves 58.29% (std: 8.60%) for arousal and 54.56% (std: 8.14%) for valence; SCN (Schirrmeister et al., 2017) 61.19% (std: 10.28%) and 59.42% (std: 8.30%); DCN (Schirrmeister et al., 2017) 61.03% (std: 8.58%) and 59.92% (std: 7.82%); and TSception (Ding et al., 2022) 61.57% (std: 11.04%) and 59.14% (std: 7.60%).

Table 1

Comparison of different methods on DEAP dataset.

Dataset	Method	Arousal Acc.	Arousal Std	Valence Acc.	Valence Std
DEAP	EEGNet (Lawhern et al., 2018)	58.29%	8.60%	54.56%	8.14%
	SCN (Schirrmeister et al., 2017)	61.19%	10.28%	59.42%	8.30%
	DCN (Schirrmeister et al., 2017)	61.03%	8.58%	59.92%	7.82%
	TSception (Ding et al., 2022)	61.57%	11.04%	59.14%	7.60%
	SpikeWavformer	76.51%	5.48%	77.10%	5.68%

SpikeWavformer (last row) is the method proposed in this paper.

Table 2

Decoding accuracy (%) on the KUL dataset across decision window lengths.

Dataset	Model	0.1 s	0.2 s	0.5 s	1 s	2 s	5 s	10 s
KUL	Linear (CCA) (De Cheveigné et al., 2018)	50.9	53.6	55.7	60.2	63.5	69.4	75.9
	Non-linear (CNN) (Cai et al., 2021)	74.3	78.2	80.6	84.1	85.7	86.9	87.9
	STAnet (Su et al., 2022)	80.8	84.3	87.2	90.1	91.4	92.6	93.9
	SpikeWavformer	80.5	86.7	94.2	96.5	97.1	97.3	98.6

SpikeWavformer (last row) is the method proposed in this paper.

We further compared SpikeWavformer across decision window lengths from 0.1 to 10 s, with the results presented in Table 2. On the KUL dataset, SpikeWavformer achieved average decoding accuracies across all subjects of 96.5%, 97.1%, 97.3%, and 98.6% for 1-, 2-, 5-, and 10-s decision windows, respectively. In general, longer decision windows yielded higher accuracy, corroborating previous studies (De Taillez et al., 2020; Ciccarelli et al., 2019; Vandecappelle et al., 2021). Notably, the model also decodes auditory spatial attention reliably with windows shorter than 1 s: it reached 94.2% and 86.7% for the 0.5- and 0.2-s windows, and still 80.5% for the 0.1-s window. Across all comparisons with related work (De Cheveigné et al., 2018; Cai et al., 2021; Su et al., 2022), SpikeWavformer performed competitively.
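The decision windows compared above can be produced by cutting each preprocessed trial into consecutive segments. The sketch below assumes non-overlapping windows and rounds the window length to whole samples: at 128 Hz a 0.1-s window is 12.8 samples, so some rounding (or resampling) choice is unavoidable, and the paper does not state which one it uses. The names `window_length` and `segment_trial` are hypothetical.

```python
import numpy as np

DECISION_WINDOWS_S = (0.1, 0.2, 0.5, 1, 2, 5, 10)

def window_length(win_s, fs=128):
    """Samples per decision window, rounded to the nearest integer (assumption)."""
    return int(round(win_s * fs))

def segment_trial(x, win_s, fs=128):
    """Cut a (channels, samples) trial into non-overlapping windows.

    Returns an array of shape (n_windows, channels, window_samples); trailing
    samples that do not fill a whole window are dropped.
    """
    n = window_length(win_s, fs)
    n_win = x.shape[-1] // n
    windows = x[:, : n_win * n].reshape(x.shape[0], n_win, n)
    return windows.transpose(1, 0, 2)

# Example: one 6-min KUL trial at 128 Hz (46,080 samples) with 8 dummy channels.
trial = np.zeros((8, 6 * 60 * 128))
for w in DECISION_WINDOWS_S:
    print(w, window_length(w), segment_trial(trial, w).shape[0])
```

With this rounding, the 0.1-s and 0.2-s windows become 13 and 26 samples long (about 0.102 s and 0.203 s), which is worth keeping in mind when comparing very short windows across papers.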

5.3 Energy consumption comparison

In this section, we estimate the energy efficiency of the proposed model relative to its ANN counterpart. Following the energy calculation standard in neuromorphic computing (Sengupta et al., 2019), we use the method proposed by Wang et al. (2024) to compute the ratio of our model's energy consumption to that of an equivalent ANN:

$$\text{Energy rate} = \frac{AC}{MAC} \times \text{SpikingRate} \times \text{TimeSteps}. \tag{24}$$

In Equation 24, AC/MAC denotes the ratio of the energy consumed by an accumulate (AC) operation in SNNs to that of a multiply-accumulate (MAC) operation in ANNs; following Horowitz (2014), its theoretical value is taken as 17. SpikingRate is the average spiking rate, and TimeSteps is the number of simulation time steps. In our model, SpikingRate is 12.3% and TimeSteps is 4. Based on Equation 24, our model achieves more than 7× higher energy efficiency than its ANN counterpart.
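To connect SpikingRate and TimeSteps in Equation 24 to the neuron settings given in Section 5.1.2, the sketch below simulates multi-step LIF neurons (initial membrane potential 0, threshold 0.5, T = 4), measures their average spiking rate, and evaluates Equation 24. The membrane time constant τ = 2, the hard reset to 0, and the charge equation are assumptions borrowed from common LIF implementations; the energy function takes AC/MAC as an argument so that any hardware-specific value can be supplied. The sketch also includes the derivative of the sigmoid surrogate (α = 4). All function names are hypothetical.

```python
import numpy as np

def lif_forward(x_seq, v_th=0.5, v_reset=0.0, tau=2.0):
    """Multi-step LIF layer. x_seq: (T, ...) input current per time step; returns 0/1 spikes.

    The initial membrane potential is v_reset = 0 and the threshold is 0.5;
    tau = 2 and the hard reset are assumptions.
    """
    v = np.full(x_seq.shape[1:], v_reset, dtype=float)
    spikes = np.zeros(x_seq.shape, dtype=float)
    for t in range(x_seq.shape[0]):
        h = v + (x_seq[t] - (v - v_reset)) / tau  # leaky charge toward the input
        s = (h >= v_th).astype(float)             # Heaviside firing (forward pass)
        v = h * (1.0 - s) + v_reset * s           # hard reset where a spike occurred
        spikes[t] = s
    return spikes

def sigmoid_surrogate_grad(u, alpha=4.0):
    """Backward-pass stand-in for dS/du: derivative of sigmoid(alpha * u), with u = h - v_th."""
    s = 1.0 / (1.0 + np.exp(-alpha * u))
    return alpha * s * (1.0 - s)

def energy_ratio(ac_over_mac, spiking_rate, time_steps):
    """Equation 24 as written: (AC/MAC) * SpikingRate * TimeSteps."""
    return ac_over_mac * spiking_rate * time_steps

# Example: measure the average spiking rate of 1,000 neurons over T = 4 steps.
rng = np.random.default_rng(0)
spikes = lif_forward(rng.uniform(0.0, 1.0, size=(4, 1000)))
print(f"measured spiking rate: {spikes.mean():.3f}")
print(f"reported SpikingRate x TimeSteps: {0.123 * 4:.3f}")  # 0.492
```

With the reported SpikingRate of 12.3% and TimeSteps of 4, the spike-dependent factor is 0.123 × 4 = 0.492; the resulting ratio then scales linearly with the AC/MAC value supplied.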

5.4 Interpretability

In this section, saliency maps (Simonyan et al., 2013) are used to visualize which parts of the input carry the most information and contribute most to classification. Saliency maps are among the most widely used tools for showing which regions of the input data hold classification-relevant information. To make the maps easier to read, the original maps were averaged along the time dimension, yielding one value per EEG channel that can be displayed as a scalp topography. The normalized saliency maps were then averaged across samples for each subject to produce average saliency maps. The average saliency maps for the DEAP and KUL datasets are shown in Figures 4 and 5, respectively.
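The saliency pipeline described above (gradient of the class score with respect to the input, averaged over time, normalized, then averaged across samples) can be sketched as follows. This is a toy illustration: a one-hidden-layer tanh network with analytic gradients stands in for SpikeWavformer (for an SNN the gradient would be taken through the surrogate), and the per-sample min-max normalization is an assumption because the paper does not specify the normalization. The function names are hypothetical.

```python
import numpy as np

def saliency_toy_mlp(x, w1, b1, w2, b2, target):
    """Vanilla-gradient saliency |d logit_target / d x| for a toy tanh MLP.

    x: (channels, time) EEG segment, w1: (hidden, channels * time),
    b1: (hidden,), w2: (classes, hidden), b2: (classes,).
    Returns (logits, saliency) with saliency shaped like x.
    """
    h = np.tanh(w1 @ x.reshape(-1) + b1)
    logits = w2 @ h + b2
    # Chain rule: d logit_k / d x = W1^T [(1 - h^2) * W2[k]].
    grad = w1.T @ ((1.0 - h ** 2) * w2[target])
    return logits, np.abs(grad).reshape(x.shape)

def channel_topography(saliency_maps):
    """Turn per-sample saliency maps (samples, channels, time) into one value per channel."""
    per_channel = saliency_maps.mean(axis=-1)          # average over time
    lo = per_channel.min(axis=1, keepdims=True)
    hi = per_channel.max(axis=1, keepdims=True)
    normed = (per_channel - lo) / (hi - lo + 1e-12)    # min-max per sample (assumption)
    return normed.mean(axis=0)                         # average across samples

# Example with random weights: 32 channels, 128 time points, 2 classes, 10 samples.
rng = np.random.default_rng(0)
C, T, H, K = 32, 128, 16, 2
w1, b1 = rng.standard_normal((H, C * T)) * 0.01, np.zeros(H)
w2, b2 = rng.standard_normal((K, H)), np.zeros(K)
maps = []
for _ in range(10):
    x = rng.standard_normal((C, T))
    logits, _ = saliency_toy_mlp(x, w1, b1, w2, b2, target=0)
    _, sal = saliency_toy_mlp(x, w1, b1, w2, b2, target=int(np.argmax(logits)))
    maps.append(sal)
topo = channel_topography(np.stack(maps))  # shape (32,), values in [0, 1]
```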

Figure 4

Visualization of saliency maps from DEAP dataset (Sub 1–8): (a) Arousal-dimensional saliency maps and (b) valence-dimensional saliency maps.

Two rows of brain activity heatmaps illustrate Arousal-dimensional (top row) and Valence-dimensional (bottom row) data. Each row contains eight diagrams labeled Sub 1 to Sub 8, showing variations in red and blue patterns.

Figure 5

Visualization of saliency maps from the KUL dataset (Sub 1–16).

Sixteen circular heat maps labeled Sub 1 to Sub 16, showing varying patterns of red and blue gradients. Each map represents a different subject, displaying unique distributions of color intensity, potentially indicating data variations over the subjects.

DEAP. For arousal (Figure 4a), the temporal and frontal regions carry the most classification-relevant information, suggesting that these regions are strongly involved in emotion processing, in line with previous studies (Gao et al., 2021; Huang et al., 2012; Mickley Steinmetz and Kensinger, 2009). Emotional arousal is thus predominantly represented in the temporal and frontal lobes, and the asymmetry between the frontal and temporal lobes is closely associated with emotion recognition along the arousal dimension. For valence (Figure 4b), the parietal and temporal lobes are also rich in information. This observation is consistent with earlier research (Huang et al., 2012) and suggests that the network effectively learns from these relevant regions.

KUL. Brain regions involved in speech processing are expected to show higher saliency. As illustrated in Figure 5, the average saliency map for the KUL dataset shows that the frontal and temporal regions carry the most information. These findings align with previous research showing prominent activation in the frontal and temporal cortices (Ciccarelli et al., 2019; Geirnaert et al., 2020; Vandecappelle et al., 2021).

6 Conclusion

This paper presents SpikeWavformer, an end-to-end SNN model that integrates the wavelet transform with a spiking transformer architecture. The model combines the global–local feature extraction capability of the wavelet transform with the low-power, event-driven computation of spiking neurons, enabling dynamic modeling and efficient processing of EEG signals. This integration supports time–frequency decomposition, automatic feature extraction, and classification within a single model, thereby improving generalization across diverse scenarios. Experiments on two publicly available datasets show that SpikeWavformer outperforms the compared methods in almost all settings, validating its effectiveness for both emotion recognition and auditory attention decoding and highlighting its potential for resource-constrained brain–computer interface applications. Future deployment of SpikeWavformer on neuromorphic hardware presents both promising opportunities and technical challenges. The energy efficiency of the approach makes it well suited to neuromorphic chips, potentially enabling low-power BCI applications in portable devices. However, contemporary neuromorphic architectures are primarily optimized for convolution-based SNNs, so further hardware–software co-design is needed to fully realize the benefits of Transformer-based spiking architectures. Overall, this study advances the development of energy-efficient, high-performance brain–computer interfaces for practical, resource-constrained deployment.

Data availability statement

The datasets used in this study are publicly available. The dataset DEAP for this study can be found at https://www.eecs.qmul.ac.uk/mmv/datasets/deap/. The dataset KUL for this study can be found at https://zenodo.org/records/4004271.

Author contributions

LY: Writing – review & editing, Writing – original draft, Software, Methodology. JW: Writing – original draft, Formal analysis. YL: Writing – review & editing, Supervision, Investigation.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abbott, L. F., and Kepler, T. B. (2005). “Model neurons: from Hodgkin-Huxley to Hopfield,” in Statistical Mechanics of Neural Networks: Proceedings of the XIth Sitges Conference, Sitges, Barcelona, Spain, 3–7 June 1990 (Springer), 5–18. doi: 10.1007/3540532676_37

Akram, Presacco, Simon, J. Z., Shamma, S. A., and Babadi (2016). Robust decoding of selective auditory attention from MEG in a competing-speaker environment via state-space modeling. Neuroimage 124, 906–917. doi: 10.1016/j.neuroimage.2015.09.048; PMID: 26436490

Alarcao, S. M., and Fonseca, M. J. (2017). Emotions recognition using EEG signals: a survey. IEEE Trans. Affect. Comput. 10, 374–393. doi: 10.1109/TAFFC.2017.2714671

Alzhrani, Doborjeh, and Kasabov (2021). “Emotion recognition and understanding using EEG data in a brain-inspired spiking neural network architecture,” in 2021 International Joint Conference on Neural Networks (IJCNN) (IEEE), 1–9. doi: 10.1109/IJCNN52387.2021.9533368

Ang, K. K., Chin, Z. Y., Zhang, and Guan (2008). “Filter bank common spatial pattern (FBCSP) in brain-computer interface,” in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (IEEE), 2390–2397. doi: 10.1109/IJCNN.2008.4634130

Cai (2023). A bio-inspired spiking attentional neural network for attentional selection in the listening brain. IEEE Trans. Neural Netw. Learn. Syst. 35, 17387–17397. doi: 10.1109/TNNLS.2023.3303308; PMID: 37585329

Cai and Xie (2021). EEG-based auditory attention detection via frequency and channel neural attention. IEEE Trans. Hum.-Mach. Syst. 52, 256–266. doi: 10.1109/THMS.2021.3125283; PMID: 27534393

Cai and Zhang (2024). EEG-based auditory attention detection with spiking graph convolutional network. IEEE Trans. Cogn. Dev. Syst. 16, 1698–1706. doi: 10.1109/TCDS.2024.3376433

Ceolini, Hjortkjær, Wong, D. D., O'Sullivan, Raghavan, V. S., Herrero, et al. (2020). Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception. Neuroimage 223:117282. doi: 10.1016/j.neuroimage.2020.117282; PMID: 32828921

Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 25, 975–979. doi: 10.1121/1.1907229

Ciccarelli, Nolan, Perricone, Calamia, P. T., Haro, O'Sullivan, et al. (2019). Comparison of two-talker attention decoding from EEG with nonlinear neural networks and linear methods. Sci. Rep. 9:11538. doi: 10.1038/s41598-019-47795-0; PMID: 31395905

Das, Francart, and Bertrand (2019). Auditory Attention Detection Dataset KULeuven. London: Zenodo.

De Cheveigné, Wong, D. D., Di Liberto, G. M., Hjortkjær, Slaney, and Lalor (2018). Decoding the auditory brain with canonical component analysis. Neuroimage 172, 206–216. doi: 10.1016/j.neuroimage.2018.01.033; PMID: 29378317

De Taillez, Kollmeier, and Meyer, B. T. (2020). Machine learning for decoding listeners' attention from electroencephalography evoked by continuous speech. Eur. J. Neurosci. 51, 1234–1241. doi: 10.1111/ejn.13790; PMID: 29205588

Deng and Zhang (2022). Temporal efficient training of spiking neural network via gradient re-weighting. arXiv preprint arXiv:2202.11946.

Ding, Robinson, Zhang, Zeng, and Guan (2022). TSception: capturing temporal dynamics and spatial asymmetry from EEG for emotion recognition. IEEE Trans. Affect. Comput. 14, 2238–2250. doi: 10.1109/TAFFC.2022.3169001

Faghihi, Cai, and Moustafa, A. A. (2022). A neuroscience-inspired spiking neural network for EEG-based auditory spatial attention detection. Neural Netw. 152, 555–565. doi: 10.1016/j.neunet.2022.05.003; PMID: 35679747

Gao, Cao, Liu, and Zhang (2021). A novel dynamic brain network in arousal for brain states and emotion analysis. Mathem. Biosci. Eng. 18, 7440–7463. doi: 10.3934/mbe.2021368; PMID: 34814257

Geirnaert, Francart, and Bertrand (2020). Fast EEG-based decoding of the directional focus of auditory attention using common spatial patterns. IEEE Trans. Biomed. Eng. 68, 1557–1568. doi: 10.1109/TBME.2020.3033446; PMID: 33095706

Gerstner and Kistler, W. M. (2002). Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511815706

Goldstein, A. A. (1977). Optimization of Lipschitz continuous functions. Mathem. Program. 13, 14–22.

Gong, Wang, Zhou, and Zhang (2023). A spiking neural network with adaptive graph convolution and LSTM for EEG-based brain-computer interfaces. IEEE Trans. Neural Syst. Rehabilit. Eng. 31, 1440–1450. doi: 10.1109/TNSRE.2023.3246989; PMID: 37027669

Gouk, Frank, Pfahringer, and Cree, M. J. (2021). Regularisation of neural networks by enforcing Lipschitz continuity. Mach. Learn. 110, 393–416. doi: 10.1007/s10994-020-05929-w

Grobbelaar, Phadikar, Ghaderpour, Struck, A. F., Sinha, Ghosh, et al. (2022). A survey on denoising techniques of electroencephalogram signals using wavelet transform. Signals 3, 577–586. doi: 10.3390/signals3030035; PMID: 23314762

Hager, W. W. (1979). Lipschitz continuity for constrained processes. SIAM J. Control Optim. 17, 321–338.

Horowitz (2014). “1.1 Computing's energy problem (and what we can do about it),” in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) (IEEE), 10–14. doi: 10.1109/ISSCC.2014.6757323

Huang, Guan, Ang, K. K., Zhang, and Pan (2012). “Asymmetric spatial pattern for EEG-based emotion detection,” in The 2012 International Joint Conference on Neural Networks (IJCNN) (IEEE), 1–7. doi: 10.1109/IJCNN.2012.6252390; PMID: 27534393

Izhikevich, E. M. (2003). Simple model of spiking neurons. IEEE Trans. Neural Netw. 14, 1569–1572. doi: 10.1109/TNN.2003.820440; PMID: 18244602

Jiao, Gao, and Wang (2018). Deep convolutional neural networks for mental load classification based on EEG data. Pattern Recognit. 76, 582–595. doi: 10.1016/j.patcog.2017.12.002

Koelstra, Muhl, Soleymani, Lee, J.-S., Yazdani, Ebrahimi, et al. (2011). DEAP: a database for emotion analysis; using physiological signals. IEEE Trans. Affect. Comput. 3, 18–31. doi: 10.1109/T-AFFC.2011.15

Kwon, O.-Y., Lee, M.-H., Guan, and Lee, S.-W. (2019). Subject-independent brain-computer interfaces based on deep convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 31, 3839–3852. doi: 10.1109/TNNLS.2019.2946869; PMID: 31725394

Lawhern, V. J., Solon, A. J., Waytowich, N. R., Gordon, S. M., Hung, C. P., and Lance, B. J. (2018). EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. J. Neural Eng. 15:056013. doi: 10.1088/1741-2552/aace8c; PMID: 29932424

Lei, Yao, Luo, et al. (2025). “Spike2former: efficient spiking transformer for high-performance image segmentation,” in Proceedings of the AAAI Conference on Artificial Intelligence, 1364–1372. doi: 10.1609/aaai.v39i2.32126

Li and Zhang (2018). Hierarchical convolutional neural networks for EEG-based emotion recognition. Cognit. Comput. 10, 368–380. doi: 10.1007/s12559-017-9533-x

Li, Liu, Zhu, et al. (2019). EEG based emotion recognition by combining functional connectivity network and local activations. IEEE Trans. Biomed. Eng. 66, 2869–2881. doi: 10.1109/TBME.2019.2897651; PMID: 30735981

Fang, Zhu, Chen, and Song (2023). Fractal spiking neural network scheme for EEG-based emotion recognition. IEEE J. Translat. Eng. Health Med. 12, 106–118. doi: 10.1109/JTEHM.2023.3320132; PMID: 38088998

Liu, Wang, Y.-X., and Zhang (2015). An FDES-based shared control method for asynchronous brain-actuated robot. IEEE Trans. Cybern. 46, 1452–1462. doi: 10.1109/TCYB.2015.2469278; PMID: 26357416

Lotte and Guan (2010). Regularizing common spatial patterns to improve BCI designs: unified theory and new algorithms. IEEE Trans. Biomed. Eng. 58, 355–362. doi: 10.1109/TBME.2010.2082539; PMID: 20889426

Wei, Sun, Wang, Zeng, et al. (2025). Estsformer: efficient spatio-temporal spiking transformer. Neural Netw. 191:107786. doi: 10.1016/j.neunet.2025.107786; PMID: 40614455

Luo, Yao, and Chou (2024). “Integer-valued training and spike-driven inference spiking neural network for high-performance and energy-efficient object detection,” in European Conference on Computer Vision (Springer), 253–272. doi: 10.1007/978-3-031-73411-3_15

Maass (1997). Networks of spiking neurons: the third generation of neural network models. Neural Netw. 10, 1659–1671. doi: 10.1016/S0893-6080(97)00011-7

Masquelier, Guyonneau, and Thorpe, S. J. (2008). Spike timing dependent plasticity finds the start of repeating patterns in continuous spike trains. PLoS ONE 3:e1377. doi: 10.1371/journal.pone.0001377; PMID: 18167538

Mesgarani and Chang, E. F. (2012). Selective cortical representation of attended speaker in multi-talker speech perception. Nature 485, 233–236. doi: 10.1038/nature11020; PMID: 22522927

Mickley Steinmetz, K. R., and Kensinger, E. A. (2009). The effects of valence and arousal on the neural activity leading to subsequent memory. Psychophysiology 46, 1190–1199. doi: 10.1111/j.1469-8986.2009.00868.x; PMID: 19674398

O'Sullivan, J. A., Power, A. J., Mesgarani, Rajaram, Foxe, J. J., Shinn-Cunningham, B. G., et al. (2015). Attentional selection in a cocktail party environment can be decoded from single-trial EEG. Cereb. Cortex 25, 1697–1706. doi: 10.1093/cercor/bht355; PMID: 24429136

Pan, Chua, Zhang, and Ambikairajah (2020). An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks. Front. Neurosci. 13:1420. doi: 10.3389/fnins.2019.01420; PMID: 32038132

Schirrmeister, R. T., Springenberg, J. T., Fiederer, L. D. J., Glasstetter, Eggensperger, Tangermann, et al. (2017). Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 38, 5391–5420. doi: 10.1002/hbm.23730; PMID: 28782865

Sengupta, Wang, Liu, and Roy (2019). Going deeper in spiking neural networks: VGG and residual architectures. Front. Neurosci. 13:95. doi: 10.3389/fnins.2019.00095; PMID: 30899212

Shi and Hao (2024). “SpikingResformer: bridging ResNet and vision transformer in spiking neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5610–5619. doi: 10.1109/CVPR52733.2024.00536

Simonyan, Vedaldi, and Zisserman (2013). Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.

Singh, A. K., and Krishnan (2023). Trends in EEG signal feature extraction applications. Front. Artif. Intell. 5:1072801. doi: 10.3389/frai.2022.1072801; PMID: 36760718

Song, Zheng, Song, and Cui (2018). EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Trans. Affect. Comput. 11, 532–541. doi: 10.1109/TAFFC.2018.2817622

Su, Cai, Xie, and Schultz (2022). STAnet: a spatiotemporal attention network for decoding auditory spatial attention from EEG. IEEE Trans. Biomed. Eng. 69, 2233–2242. doi: 10.1109/TBME.2022.3140246; PMID: 34982671

Subasi (2019). Practical Guide for Biomedical Signals Analysis Using Machine Learning Techniques: A MATLAB Based Approach. New York: Academic Press. doi: 10.1016/B978-0-12-817444-9.00002-7

Tan, Šarlija, and Kasabov (2021). NeuroSense: short-term emotion recognition and understanding based on spiking neural network modelling of spatio-temporal EEG patterns. Neurocomputing 434, 137–148. doi: 10.1016/j.neucom.2020.12.098

Vallabhaneni, R. B., Sharma, Kumar, Kulshreshtha, Reddy, K. J., Kumar, S. S., et al. (2021). Deep learning algorithms in EEG signal decoding application: a review. IEEE Access 9, 125778–125786. doi: 10.1109/ACCESS.2021.3105917

Vandecappelle, Deckers, Das, Ansari, A. H., Bertrand, and Francart (2021). EEG-based detection of the locus of auditory attention with convolutional neural networks. eLife 10:e56481. doi: 10.7554/eLife.56481; PMID: 33929315

Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, A. N., et al. (2017). “Attention is all you need,” in Advances in Neural Information Processing Systems, 30.

Wang, Zhang, Shi, Wang, Wei, et al. (2024). Global-local convolution with spiking neural networks for energy-efficient keyword spotting. arXiv preprint arXiv:2406.13179.

Wang, Zhang, Belatreche, Xiao, Liang, et al. (2025). Spiking vision transformer with saccadic attention. arXiv preprint arXiv:2502.12677.

Wang, Shi, Liu, and Zhang (2023). “Spatial-temporal self-attention for asynchronous spiking neural networks,” in IJCAI, 3085–3093. doi: 10.24963/ijcai.2023/344

Wang, Y.-K., Jung, T.-P., and Lin, C.-T. (2015). EEG-based attention tracking during distracted driving. IEEE Trans. Neural Syst. Rehabilit. Eng. 23, 1085–1094. doi: 10.1109/TNSRE.2015.2415520; PMID: 25850090

Wei, Zhang, Belatreche, et al. (2024). Event-driven learning for spiking neural networks. arXiv preprint arXiv:2403.00270.

Wu, Chua, Zhang, and Tan, K. C. (2018). A spiking neural network framework for robust sound classification. Front. Neurosci. 12:836. doi: 10.3389/fnins.2018.00836; PMID: 30510500

Xing, Lee, Morrissey, Chung, M. K., Phan, K. L., Klumpp, et al. (2019). Altered dynamic electroencephalography connectome phase-space features of emotion regulation in social anxiety. Neuroimage 186, 338–349. doi: 10.1016/j.neuroimage.2018.10.073; PMID: 30391563

Pan, Zheng, Ouyang, Jia, and Zeng (2024). EESCN: a novel spiking neural network method for EEG-based emotion recognition. Comput. Methods Programs Biomed. 243:107927. doi: 10.1016/j.cmpb.2023.107927; PMID: 38000320

Yao, Gao, Zhao, Wang, Lin, Yang, et al. (2021). “Temporal-wise attention spiking neural networks for event streams classification,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 10221–10230. doi: 10.1109/ICCV48922.2021.01006

Yao, Zhou, Tian, et al. (2024). Spike-driven transformer v2: meta spiking neural network architecture inspiring the design of next-generation neuromorphic chips. arXiv preprint arXiv:2404.03663.

Yao, Zhou, Yuan, Tian, et al. (2023). “Spike-driven transformer,” in Advances in Neural Information Processing Systems, 64043–64058.

Yao, Qiu, Chou, Tian, et al. (2025). Scaling spike-driven transformer with efficient spike firing approximation training. IEEE Trans. Pattern Anal. Mach. Intell. 47, 2973–2990. doi: 10.1109/TPAMI.2025.3530246; PMID: 40031207

Zhang, Liu, Shen, Hou, et al. (2020). Emotion recognition from multimodal physiological signals using a regularized deep fusion of kernel machine. IEEE Trans. Cybern. 51, 4386–4399. doi: 10.1109/TCYB.2020.2987575; PMID: 32413939

Zhong, Wang, and Miao (2020). EEG-based emotion recognition using regularized graph neural networks. IEEE Trans. Affect. Comput. 13, 1290–1301. doi: 10.1109/TAFFC.2020.2994159

Zhou, Zhou, Zhang, Zhou, et al. (2023). Spikingformer: spike-driven residual learning for transformer-based spiking neural network. arXiv preprint arXiv:2304.11954.

Zhou, Zhang, Zhou, Huang, Fan, et al. (2024). QKFormer: hierarchical spiking transformer using QK attention. arXiv preprint arXiv:2403.16552.

Zhou, Zhu, Wang, Yan, Tian, et al. (2022). Spikformer: when spiking neural network meets transformer. arXiv preprint arXiv:2209.15425.

Zhu, R.-J., Zhang, Zhao, Deng, Duan, and Deng, L.-J. (2024). TCJA-SNN: temporal-channel joint attention for spiking neural networks. IEEE Trans. Neural Netw. Learn. Syst. 36, 5112–5125. doi: 10.1109/TNNLS.2024.3377717; PMID: 38598397

Zotev, Mayeli, Misaki, and Bodurka (2020). Emotion self-regulation training in major depressive disorder using simultaneous real-time fMRI and EEG neurofeedback. NeuroImage Clin. 27:102331. doi: 10.1016/j.nicl.2020.102331; PMID: 32623140