AUTHOR=Tanveer Aleena , Ali Raja Hashim , Majhi Jitendra , Mukherjee Moumita TITLE=Predicting and identifying correlates of inequalities in breast cancer screening uptake using national level data from India JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 8 - 2025 YEAR=2026 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1729796 DOI=10.3389/frai.2025.1729796 ISSN=2624-8212 ABSTRACT=Background: Despite national screening initiatives, coverage of breast cancer screening is low, and late-stage diagnosis remains a major contributor to mortality among Indian women. Accurate, precise, and actionable prediction of socioeconomic and structural inequities in screening uptake is critical for formulating equitable cancer control policies. This study aimed to apply machine learning to predict determinants of screening uptake, estimate inequalities in uptake and their concentration indices, and identify contributing factors to inequity using concentration index decomposition across economic, educational, and caste gradients. Methods: Cross-sectional National Family Health Survey (NFHS-5) 2019–2021 data, comprising 68,526 women aged 30–49 years, were used for the study. Levesque’s framework of healthcare access guided the selection of explanatory covariates across the approachability, acceptability, affordability, availability, and appropriateness dimensions. We applied three single learners, namely Logistic Regression (LR), Naïve Bayes (NB), and Decision Tree (DT), and two ensemble learners, Random Forest (RF) and XGBoost (XGB), to train on balanced weighted data. Given the risk of overfitting after applying the synthetic minority oversampling technique (SMOTE), predictive performance was validated using 10-fold cross-validation. Five evaluation metrics were compared to select the best learner for predicting screening uptake.
Inequality was measured using conventional and algorithm-based concentration indices and decomposed using algorithm-based feature importance and feature-specific inequality scores to estimate contributions to three inequality-health gradients in screening access. Findings: Screening uptake in India is remarkably low (0.9%), with clear economic, educational, and social disparities. Although Random Forest and XGBoost achieved higher predictive accuracy (96%) and explainability (AUROC = 0.99), Decision Tree showed stable generalizability (mean AUROC = 0.995) under 10-fold cross-validation. Feature importance results indicate that education, autonomy, interactions with community health workers, and provincial and spatial features explain most of the variability. Proximity, transport availability, hesitancy in unaccompanied care seeking, and financial constraints were access barriers with limited contribution to the variation in screening uptake. Concentration index estimates reflect a pro-rich (0.1, p < 0.001), pro-educated (0.182, p < 0.001), and pro-marginalized social gradient (−0.011, p < 0.05). Tree-based decomposition indicates that higher affordability and education deepen pro-rich and pro-educated inequalities, but that they can be effective policy instruments for mitigating social position-based disparities if their contributions can be increased. Access-related barriers intensified inequality across all gradients, whereas factors that enable access flattened the gradients. Conclusion: Machine learning models can improve decision making by enhancing accuracy and precision in inequity prediction for breast cancer screening uptake and by revealing the crucial gradients and access barriers that shape this uptake in India.
ML-based predictions that offer higher explainability suggest that financial protection, spatial accessibility to health centers, access to education, autonomy, higher contact with community health workers, and community-based awareness programs targeting poor, less educated, socially disadvantaged middle-aged women are likely to smooth the economic and educational disparities in screening coverage, while the social gradient requires deeper investigation.
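
The concentration indices reported in the abstract can be illustrated with a minimal sketch of the conventional fractional-rank formula, C = (2 / (n * mu)) * sum(h_i * r_i) - 1, where h_i is the outcome, mu its mean, and r_i the fractional rank on the socioeconomic variable. The function name `concentration_index` and the toy data are hypothetical, survey weights (which NFHS-5 analyses normally use) are ignored for brevity, and this is not the authors' implementation.

```python
def concentration_index(outcome, rank_var):
    """Unweighted concentration index of `outcome` ranked by `rank_var`.

    Positive values mean the outcome is concentrated among the better-off
    (pro-rich); negative values mean it is concentrated among the worse-off.
    Assumption: no survey weights, and ties in `rank_var` are broken by
    input order rather than averaged.
    """
    n = len(outcome)
    if n == 0 or n != len(rank_var):
        raise ValueError("outcome and rank_var must be non-empty and equal length")
    mean_outcome = sum(outcome) / n
    if mean_outcome == 0:
        raise ValueError("concentration index is undefined when the mean outcome is 0")

    # Sort individuals from poorest to richest on the ranking variable.
    order = sorted(range(n), key=lambda i: rank_var[i])

    weighted_sum = 0.0
    for position, i in enumerate(order, start=1):
        fractional_rank = (position - 0.5) / n  # midpoint rank in (0, 1)
        weighted_sum += outcome[i] * fractional_rank

    return 2.0 * weighted_sum / (n * mean_outcome) - 1.0


# Toy example (hypothetical data): only the richest of four women is screened.
wealth = [1, 2, 3, 4]
screened = [0, 0, 0, 1]
print(concentration_index(screened, wealth))  # 0.75, a pro-rich distribution
```

A value near zero, such as the −0.011 reported for the social gradient, indicates that uptake is spread almost evenly across the ranking variable.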