AUTHOR=Li Lilan, Zhan Xuemei, Wu TianTian, Ma Hua
TITLE=Optimized encoder-based transformers for improved local and global integration in railway image classification
JOURNAL=Frontiers in Computer Science
VOLUME=7
YEAR=2025
URL=https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2025.1658556
DOI=10.3389/fcomp.2025.1658556
ISSN=2624-9898
ABSTRACT=Railway image classification (RIC) is a key task in railway infrastructure monitoring, requiring the analysis of hyperspectral datasets with the complex spatial-spectral relationships characteristic of railway environments. However, Transformer-based methods for RIC struggle with local feature extraction and training efficiency. To address these challenges, we introduce the Pure Transformer Network (PTN), a fully Transformer-based framework for RIC. The PTN improves the integration of local and global information in railway images by combining a Patch Embedding Transformer (PET) module, built on an “unfold + attention + fold” mechanism, with a Transformer module that uses relative attention. The PET module uses attention to emulate convolution, providing adaptive receptive fields for the varying spatial patterns of railway infrastructure and avoiding the limitations of fixed convolutional kernels. We also propose a Memory Efficient Algorithm that reduces training time by 35% while preserving accuracy. Extensive experiments on four hyperspectral railway image datasets show that the PTN achieves higher accuracy than existing CNN- and Transformer-based baselines.
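
The abstract names the PET module's “unfold + attention + fold” mechanism but gives no implementation details. The following PyTorch sketch shows one plausible reading: unfold extracts a k×k neighbourhood around every pixel, self-attention mixes the positions within each neighbourhood (playing the role of a convolution whose kernel adapts to the content), and fold writes the results back onto the image grid. The class name PatchEmbeddingTransformer, the kernel size, the head count, and the overlap-averaging fold are all illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEmbeddingTransformer(nn.Module):
    """Hypothetical sketch of an "unfold + attention + fold" block.

    The abstract names the mechanism but not its details; the kernel
    size, head count, and averaging fold below are assumptions.
    """

    def __init__(self, channels: int, kernel_size: int = 3, num_heads: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        k, pad = self.kernel_size, self.kernel_size // 2
        # Unfold: one k*k neighbourhood per spatial position -> (B, C*k*k, H*W).
        patches = F.unfold(x, kernel_size=k, padding=pad)
        patches = patches.view(b, c, k * k, h * w)
        patches = patches.permute(0, 3, 2, 1).reshape(b * h * w, k * k, c)
        # Attention over the k*k positions acts like a convolution whose
        # kernel weights adapt to the content of each neighbourhood.
        attended, _ = self.attn(patches, patches, patches)
        attended = attended.reshape(b, h * w, k * k, c).permute(0, 3, 2, 1)
        attended = attended.reshape(b, c * k * k, h * w)
        # Fold: sum overlapping neighbourhoods back onto the (H, W) grid.
        out = F.fold(attended, output_size=(h, w), kernel_size=k, padding=pad)
        # Divide by the per-pixel overlap count so folding averages, not sums.
        ones = torch.ones(b, c * k * k, h * w, device=x.device, dtype=x.dtype)
        count = F.fold(ones, output_size=(h, w), kernel_size=k, padding=pad)
        return out / count

pet = PatchEmbeddingTransformer(channels=64)
y = pet(torch.randn(2, 64, 16, 16))  # output keeps the input shape: (2, 64, 16, 16)

Because the block preserves spatial shape, several such modules could sit in front of a standard relative-attention Transformer, which would match the PTN's high-level description; the actual architecture, including the relative attention variant and the Memory Efficient Algorithm, is not specified in the abstract.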