AUTHOR=Guo Hui, Yang Ziyu, Zhang Gaopan, Lv Lingling, Zhao Xiongfei TITLE=Meta analysis of the diagnostic efficacy of transformer-based multimodal fusion deep learning models in early Alzheimer's disease JOURNAL=Frontiers in Neurology VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2025.1641548 DOI=10.3389/fneur.2025.1641548 ISSN=1664-2295
ABSTRACT=
Introduction: This study systematically evaluates the diagnostic efficacy of Transformer-based multimodal fusion deep learning models in early Alzheimer's disease (AD) through a meta-analysis, providing a scientific basis for clinical application.
Methods: Following PRISMA guidelines, databases including PubMed and Web of Science were searched, and 20 eligible clinical studies (2022–2025) involving 12,897 participants were included. Study quality was assessed with the modified QUADAS-2 tool, statistical analyses were performed in Stata 16.0, effect sizes were pooled via random-effects models, and subgroup analyses, sensitivity analyses, and publication-bias tests were conducted.
Results: Transformer-based multimodal fusion models exhibited excellent overall diagnostic performance, with a pooled AUC of 0.924 (95% CI: 0.912–0.936), sensitivity of 0.887 (0.865–0.904), specificity of 0.892 (0.871–0.910), and accuracy of 0.879 (0.858–0.897), significantly outperforming traditional single-modality methods. Subgroup analyses revealed that: (1) models using three or more modalities achieved a higher AUC than two-modality models (0.935 vs. 0.908, p = 0.012); (2) intermediate (feature-level) fusion strategies (AUC = 0.931) significantly outperformed early (0.905) and late (0.912) fusion (p < 0.05 for both); (3) multicenter data improved the AUC over single-center data (0.930 vs. 0.918, p = 0.046), whereas sample-size stratification (<200 vs. ≥200 cases) showed no significant difference (p = 0.113); and (4) hybrid Transformer models (Transformer + CNN) trended toward a higher AUC than pure Transformer models (0.928 vs. 0.917, p = 0.068) but did not reach statistical significance.
Discussion: Notable studies included Khan et al.'s (2024) Dual-3DM3AD model (AUC = 0.945 for AD vs. MCI) and Gao et al.'s (2023) generative network (AUC = 0.912 under data loss), validating model robustness and feature complementarity. Sensitivity analysis confirmed stable results (AUC range: 0.920–0.928), and Egger's test (p = 0.217) and funnel-plot symmetry indicated no significant publication bias. Limitations include a high proportion of single-center data and insufficient model interpretability. Future research should focus on multicenter data integration, interpretable module development, and lightweight design to facilitate clinical translation. Transformer-based multimodal fusion models demonstrate exceptional efficacy in early AD diagnosis, with multimodal integration, feature-level fusion, and multicenter data application as key advantages. They hold promise as core tools for AD "early diagnosis and treatment" but require further optimization for cross-cohort generalization and clinical interpretability.
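The Methods section states that effect sizes were pooled via random-effects models in Stata. As an illustration only, the sketch below implements DerSimonian–Laird pooling, a standard random-effects estimator; the abstract does not specify which estimator the authors used, and the per-study AUCs and variances here are hypothetical, not the paper's data.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool effect sizes under a DerSimonian-Laird random-effects model.

    Returns the pooled estimate and a 95% confidence interval.
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)

    # Cochran's Q heterogeneity statistic and between-study variance tau^2
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # truncated at zero

    # Random-effects weights add tau^2 to each study's variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

# Hypothetical per-study AUC estimates and variances (NOT the study's data)
aucs = [0.91, 0.93, 0.94, 0.90, 0.92]
variances = [0.0004, 0.0003, 0.0005, 0.0006, 0.0004]
pooled, ci_lo, ci_hi = dersimonian_laird(aucs, variances)
print(f"pooled AUC = {pooled:.3f} (95% CI {ci_lo:.3f}-{ci_hi:.3f})")
```

When between-study heterogeneity is negligible (Q below its degrees of freedom), tau² truncates to zero and the estimate coincides with the fixed-effect result; otherwise the random-effects weights down-weight precise but discordant studies, which is why meta-analyses like this one prefer them for clinically heterogeneous cohorts.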