AUTHOR=Sedi Nzakuna Pierre, D'Auria Emanuele, Paciello Vincenzo, Gallo Vincenzo, Kamavuako Ernest Nlandu, Lay-Ekuakille Aimé, Kyamakya Kyandoghere
TITLE=Real-world evaluation of deep learning decoders for motor imagery EEG-based BCIs
JOURNAL=Frontiers in Systems Neuroscience
VOLUME=19
YEAR=2025
URL=https://www.frontiersin.org/journals/systems-neuroscience/articles/10.3389/fnsys.2025.1718390
DOI=10.3389/fnsys.2025.1718390
ISSN=1662-5137
ABSTRACT=
Introduction: Motor Imagery (MI) Electroencephalography (EEG)-based control in online Brain-Computer Interfaces requires decisions to be made within short temporal windows. However, most published Deep Learning (DL) EEG decoders are developed and validated offline on public datasets using longer window lengths, leaving their real-time applicability unclear.
Methods: To address this gap, we evaluate 10 representative DL decoders, including convolutional neural networks (CNNs), filter-bank CNNs, temporal convolutional networks (TCNs), and attention- and Transformer-based hybrids, under a soft real-time protocol using 2-s windows. We quantify performance using accuracy, sensitivity, precision, miss-as-neutral rate (MANR), false-alarm rate (FAR), information-transfer rate (ITR), and workload. To relate decoder behavior to physiological markers, we examine lateralization indices, mu-band power at C3 vs. C4, and topographical contrasts between MI and neutral conditions.
Results: Performance rankings shift between offline and online BCI settings, accompanied by a pronounced increase in inter-subject variability. The best online means were FBLight ConvNet at 71.7% (±2.1) and EEG-TCNet at 70.0% (±5.3), with attention- and Transformer-based designs less stable. Errors were mainly Left-Right swaps, while the Neutral class was comparatively stable. Lateralization indices and topomaps revealed subject-specific μ/β patterns consistent with class-wise precision and sensitivity.
Discussion: Compact spectro-temporal CNN backbones combined with lightweight temporal context (such as TCNs or dilated convolutions) deliver more stable performance under short time windows, whereas deeper attention and Transformer architectures are more susceptible to variation across subjects and sessions. This study establishes a reproducible benchmark and provides actionable guidance for designing and calibrating online-first EEG decoders that remain robust under real-world, short-time constraints.
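Note: the abstract lists information-transfer rate (ITR) among its metrics but does not state which ITR definition was used. The sketch below illustrates the standard Wolpaw formulation under that assumption, relating per-window accuracy, the number of classes, and the 2-s decision window; the function name, the three-class Left/Right/Neutral assumption, and the chance-level clamp are illustrative, not taken from the paper.

```python
import math

def wolpaw_itr_bits_per_min(accuracy: float, n_classes: int, window_s: float) -> float:
    """Wolpaw information-transfer rate (assumed definition, not confirmed by the paper).

    bits/selection = log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1))
    bits/min       = bits/selection * (60 / T), with T the decision window in seconds.
    """
    if n_classes < 2 or not (0.0 <= accuracy <= 1.0):
        raise ValueError("need n_classes >= 2 and 0 <= accuracy <= 1")
    p, n = accuracy, n_classes
    if p == 1.0:
        bits = math.log2(n)
    elif p <= 1.0 / n:
        bits = 0.0  # at or below chance level: conventionally clamped to zero information
    else:
        bits = (math.log2(n)
                + p * math.log2(p)
                + (1 - p) * math.log2((1 - p) / (n - 1)))
    return bits * (60.0 / window_s)

# Example: an assumed three-class task with 2-s windows at ~71.7% accuracy
# (the best reported online mean) gives roughly 13 bits/min under this definition.
print(round(wolpaw_itr_bits_per_min(0.717, 3, 2.0), 2), "bits/min")
```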