AUTHOR=Zhao Suhong, Li Zhaohua, Wang Yanan, Zhao Fang, Chen Peipei, Pang Guodong TITLE=Enhancing preoperative HER2 status classification of invasive breast cancers using machine learning models based on clinicopathological and MRI features: a multicenter study JOURNAL=Frontiers in Cell and Developmental Biology VOLUME=Volume 13 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/cell-and-developmental-biology/articles/10.3389/fcell.2025.1669651 DOI=10.3389/fcell.2025.1669651 ISSN=2296-634X ABSTRACT=Rationale and Objectives: The human epidermal growth factor receptor 2 (HER2) gene status is crucial for determining treatment efficacy. This study assessed preoperative HER2 classification in breast cancer using machine learning based on clinicopathological and MRI characteristics. Materials and Methods: This retrospective study involved 1,015 patients (1,030 lesions) across two centers. Patients were divided into training, internal validation, and external validation sets. Nomograms were developed using clinicopathological and MRI features. Predictive models were constructed using decision trees (DT), support vector machines (SVM), k-nearest neighbors (k-NN), artificial neural networks (ANN), and multivariable logistic regression (LR). Model performance was evaluated using receiver operating characteristic curves, decision curve analysis, and calibration curves. Model interpretability was achieved by developing nomograms and employing SHAP (SHapley Additive exPlanations) analysis. Results: Key variables for distinguishing HER2-positive from HER2-negative cases included regional N category, estrogen receptor (ER) status, progesterone receptor (PR) status, Ki-67 status, lesion number, distribution quadrant, and accompanying signs. The SVM model achieved the highest area under the curve (AUC) of 0.86 (95% confidence interval (CI): 0.81–0.90) in the training set, while the ANN model had an AUC of 0.77 (95% CI: 0.67–0.86) in the internal validation set. 
In the external validation set, the LR model achieved the highest AUC of 0.66 (95% CI: 0.56–0.76), although the overall performance was modest. For HER2-low versus HER2-zero differentiation, Ki-67 status, lesion number, distribution quadrant, mass shape, early enhancement rate, and apparent diffusion coefficient (ADC) were significant. The SVM model attained the highest AUC of 0.87 (95% CI: 0.83–0.91) in the training set, while the LR model demonstrated superior generalizability, yielding the highest AUCs in both the internal and external validation sets (internal: 0.67, 95% CI: 0.58–0.76; external: 0.74, 95% CI: 0.65–0.83). The nomogram improved radiologists' diagnostic accuracy, particularly among junior radiologists. SHAP analysis revealed that PR status was paramount for HER2-positive classification, whereas mass shape and ADC values were dominant for identifying HER2-low status. Conclusion: Integrating machine learning with clinicopathological and MRI characteristics improves the accuracy of HER2 status classification in breast cancer and enhances diagnostic capabilities for radiologists in clinical practice.