AUTHOR=Bayahya Areej Y., Jammal Fares, Banjar Haneen, Eassa Fathy Elbouraey, Talabay Omar, Alamri Sultan H. TITLE=Multi-model deep learning for dementia detection: addressing data and model limitations JOURNAL=Frontiers in Neuroscience VOLUME=Volume 19 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2025.1638022 DOI=10.3389/fnins.2025.1638022 ISSN=1662-453X ABSTRACT=Introduction: Deep neural network architectures have transformed medical imaging, particularly in structural MRI (sMRI) classification. However, existing state-of-the-art deep learning models face limitations in preprocessing and feature extraction when classifying dementia-related conditions. This study addresses these challenges by evaluating multiple architectures for dementia diagnosis. Methods: This study assessed eight pretrained convolutional neural networks (CNNs), a Vision Transformer (ViT), a multimodal attention model, and a capsule network (CapsNet) for classifying three classes: dementia, mild cognitive impairment (MCI), and healthy controls. The dataset, obtained from ADNI, was balanced across classes and comprised 10,000 training images per class, 3,000 validation images per class, and 850 test images per class. Classification was performed using 2D slices from sMRI scans. Performance metrics included accuracy, specificity, and sensitivity. Results: Among all evaluated models, the 3D-CNN and multimodal attention models achieved the highest performance, with accuracies of 84% and 86%, specificities of 83% and 86%, and sensitivities of 84% and 86%, respectively. The ViT and CapsNet models achieved 100% sensitivity for Alzheimer’s disease (AD) but demonstrated low precision for AD (43%) and 0% for other classes, indicating class imbalance effects.
All models showed reduced performance and bias toward certain classes. Discussion: The findings highlight the limitations of current architectures in sMRI dementia classification, including suboptimal feature extraction and class-specific biases. While certain models, such as multimodal attention and 3D-CNN, performed better overall, precision and generalization remain challenges. Future work should focus on improved data representation through advanced computer vision methods and architectural modifications to enhance diagnostic accuracy and computational efficiency.
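
The abstract reports accuracy, sensitivity, specificity, and precision for a three-class problem. As a minimal sketch of how such per-class figures are commonly derived from a confusion matrix (one-vs-rest), the Python below may help. The function names `per_class_metrics` and `accuracy`, the label strings, and the example counts are all hypothetical. The counts illustrate a classifier that assigns every test image to a single class; they are not the study's confusion matrix and are not meant to reproduce its reported percentages.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    total = sum(sum(row) for row in cm)
    results: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                      # true class k, predicted otherwise
        fp = sum(row[k] for row in cm) - tp       # predicted k, true class otherwise
        tn = total - tp - fn - fp
        results[label] = {
            # Ratios with an empty denominator are reported as 0.0 by convention here.
            "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
            "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
            "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        }
    return results


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


# Hypothetical example: 850 test images per class, every image predicted as "AD".
labels = ["AD", "MCI", "CN"]
collapsed = [
    [850, 0, 0],
    [850, 0, 0],
    [850, 0, 0],
]
metrics = per_class_metrics(collapsed, labels)
for name in labels:
    print(name, metrics[name])
print("accuracy", accuracy(collapsed))
```

In this illustrative case the favored class reaches a sensitivity of 1.0 while its specificity is 0.0 and the other two classes have zero sensitivity, which is why sensitivity for one class should be read alongside precision and the other classes' figures.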