AUTHOR=Shi Dan, Yang Meng, Dong Min, Xuan Ning, Zhu Yinsu, Lv Xiaoqiong, Xie Chao, Xia Fei, Xu Lingchun, Zhang Qinglei, Yin Na

TITLE=Development and validation of a deep learning model using MR imaging for predicting brain metastases: an accuracy-focused study

JOURNAL=Frontiers in Oncology

VOLUME=Volume 15 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1657604

DOI=10.3389/fonc.2025.1657604

ISSN=2234-943X

ABSTRACT=Background: Brain metastases (BM), which originate from extracranial malignancies, significantly threaten patient health. Accurate BM identification is crucial, but performing it manually is labor-intensive. This study developed and validated a system for BM diagnosis and assessed its performance and stability. Methods: A total of 470 patients diagnosed with BM were divided into an 80% training set (n=379) and a 20% internal test set (n=91) using systematic sampling. An additional 172 patients were retrospectively enrolled for external validation. A comprehensive preprocessing pipeline was implemented: MRI scans were resampled to 0.833 mm³ isotropic voxels, skull-stripped using SynthStrip, and intensity-normalized via Z-score normalization. We developed a 3D U-Net model with a ResNet-34 backbone for BM prediction. The model was trained on MRI scans paired with segmentation masks, using ImageNet-pretrained encoder weights and a patch-based strategy (128×128×128 voxels). Results: The model maintained perfect specificity and AUCs across gender and age groups, with no significant differences in other metrics, indicating that its exclusion of false positives was unaffected by demographics. By cancer type, internal testing showed a significant difference in AUC (p<0.001) between lung cancer (n=74) and other cancers (n=17). Differences in the other performance metrics were not statistically significant (p>0.13), although other cancers showed higher median F1/IoU/MCC. In external validation, other cancers (n=79) had significantly higher precision than lung cancer (n=93) (p<0.05). The AUC for lung cancer (0.82) was significantly lower than that for other cancers (0.89) (p<0.001), suggesting a need for sensitivity optimization; both groups maintained a specificity of 1.0000. The model's processing time was significantly shorter than manual annotation time (internal: 69 s vs 113 s; external: 66 s vs 96 s; both p<0.001), with high agreement. Conclusion: The model demonstrated strong robustness and perfect specificity across demographics. Although its performance depended on cancer type (sensitivity for lung cancer needs improvement), its high efficiency (40%-50% time reduction) and generalization provide a solid foundation for clinical translation.