AUTHOR=Xu Zhirong, Ye Jiayi, Zhong Huohu, Chen Jiemin, Wang Han, Zhang Xiaoqian, Lyu Guorong, Su Shanshan TITLE=Machine learning model for predicting epidermal growth factor receptor expression status in breast cancer using ultrasound radiomics JOURNAL=Frontiers in Oncology VOLUME=Volume 15 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1683164 DOI=10.3389/fonc.2025.1683164 ISSN=2234-943X ABSTRACT=Background/objectives: The epidermal growth factor receptor (EGFR) is a clinically important target, as its expression in patients with breast cancer influences both overall and disease-free survival. Current methods for assessing EGFR expression status are invasive. In this study, we therefore developed a machine learning-based approach that uses ultrasound radiomics to non-invasively predict EGFR expression status in patients with breast cancer. Methods: Radiomic features were extracted from grayscale and wavelet-transformed ultrasound images of 321 patients. The dataset was randomly split into training (n = 225) and test (n = 96) sets at a 7:3 ratio with stratified sampling to preserve the EGFR+/– ratio. Key predictors were identified using a multi-step procedure comprising reproducibility filtering (ICC > 0.75), univariate F-test filtering (p < 0.05), and L1-regularized selection via LASSO regression. Seven machine-learning models were trained. Model interpretability was assessed using SHAP (Shapley Additive Explanations). In addition to the hold-out evaluation, we performed stratified 10-fold cross-validation to reduce selection bias. Results: The random forest model achieved the best performance, with an area under the receiver operating characteristic curve of 0.86 in the training set and 0.70 in the test set. It significantly outperformed the other models (P < 0.001).
SHAP analysis of the model identified original_ngtdm_Coarseness, original_ngtdm_Strength, and wavelet.LL_glcm_ClusterProminence as the top predictors. These features reflect structural compactness and heterogeneity associated with EGFR overexpression. Conclusions: We present a reliable and interpretable tool for non-invasively assessing EGFR expression status in patients with breast cancer. The most important predictors captured tumor heterogeneity and microstructural uniformity, highlighting the biological relevance of radiomic patterns in EGFR-positive tumors. This model integrates advanced imaging analyses with machine learning, underscoring the potential of radiomics to advance precision oncology.
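
The feature-selection and modeling steps summarized in the Methods could be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic data, not the authors' code: the feature matrix, class balance, number of features, and all hyperparameters are assumptions, and the ICC reproducibility filter and SHAP analysis are omitted because they require repeated segmentations and an additional library. Passing `test_size=96` is one way to obtain the reported 225/96 split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomic feature matrix (NOT the study data).
# 321 samples as in the abstract; feature count and class balance are assumed.
X, y = make_classification(n_samples=321, n_features=100, n_informative=8,
                           weights=[0.7, 0.3], random_state=0)

# Stratified hold-out split that preserves the class ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=96, stratify=y, random_state=0)

# Standardize with training-set statistics only, to avoid leaking test data.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 1: univariate F-test filter, keeping features with p < 0.05.
_, p_values = f_classif(X_train_s, y_train)
keep_f = np.flatnonzero(p_values < 0.05)

# Step 2: L1-regularized (LASSO) selection, keeping non-zero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X_train_s[:, keep_f], y_train)
keep_lasso = keep_f[lasso.coef_ != 0]
if keep_lasso.size == 0:
    # Fallback for this sketch only: keep the F-test survivors.
    keep_lasso = keep_f

# Step 3: random forest on the selected features, scored by AUC.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train_s[:, keep_lasso], y_train)
auc_train = roc_auc_score(y_train, rf.predict_proba(X_train_s[:, keep_lasso])[:, 1])
auc_test = roc_auc_score(y_test, rf.predict_proba(X_test_s[:, keep_lasso])[:, 1])

# Stratified 10-fold cross-validation on the training set. Feature selection
# happened outside the folds here, so these scores are optimistic.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_auc = cross_val_score(rf, X_train_s[:, keep_lasso], y_train,
                         cv=cv, scoring="roc_auc")

print(f"selected features: {keep_lasso.size}")
print(f"train AUC: {auc_train:.2f}, test AUC: {auc_test:.2f}")
print(f"10-fold CV AUC: {cv_auc.mean():.2f} +/- {cv_auc.std():.2f}")
```

A gap between training and test AUC, like the 0.86 versus 0.70 reported above, is the kind of pattern this comparison is meant to expose. For a stricter estimate, the selection steps would be wrapped in a scikit-learn `Pipeline` so that they are refit inside each cross-validation fold.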