AUTHOR=Wang Szu-Yung, Ye Nian-Zu TITLE=Invisible footprints, visible insights: machine learning reveals Scope 3 emissions JOURNAL=Frontiers in Sustainability VOLUME=Volume 6 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/sustainability/articles/10.3389/frsus.2025.1649150 DOI=10.3389/frsus.2025.1649150 ISSN=2673-4524 ABSTRACT=Introduction: Scope 3 greenhouse gas emissions are critical to firms’ carbon footprints yet are often difficult to quantify because direct data are limited, which motivates predictive modeling approaches. Methods: We developed and compared four machine learning algorithms (K-nearest neighbors, random forest, AdaBoost, and XGBoost) to estimate corporate Scope 3 emissions from readily available financial and sustainability performance data. We used 10,449 firm-level records of listed companies from 2014 to 2023, covering major industries such as semiconductor, steel, textile, and building materials, and evaluated each model on a held-out test set using R², mean absolute percentage error (MAPE), and root mean squared logarithmic error (RMSLE). Results: XGBoost achieved the highest accuracy (R² = 0.85, MAPE = 15%, RMSLE = 0.20), outperforming random forest (R² = 0.80, MAPE = 20%) and AdaBoost (R² = 0.78), while K-NN had the lowest accuracy (R² = 0.60). The results demonstrate that ensemble tree-based models substantially improve Scope 3 emission prediction accuracy over simpler models. Discussion: Notably, random forest’s interpretable feature importance provided insight into key emission drivers with only a slight accuracy trade-off, highlighting the balance between predictive accuracy and model interpretability.
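
The abstract reports three held-out metrics but does not give their formulas. The sketch below shows one common way to compute them, assuming the textbook definitions: R² as one minus the ratio of residual to total sum of squares, MAPE as the mean of absolute errors relative to the true value, and RMSLE computed on log(1 + y). The function names and the toy emission values are hypothetical and are not taken from the paper.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (0.15 == 15%)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + y) on both sides."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical Scope 3 emissions for four test-set firms (illustrative units only).
y_true = [100.0, 200.0, 400.0, 800.0]
y_pred = [110.0, 180.0, 400.0, 880.0]

print(f"R2    = {r2_score(y_true, y_pred):.3f}")
print(f"MAPE  = {mape(y_true, y_pred):.1%}")
print(f"RMSLE = {rmsle(y_true, y_pred):.3f}")
```

Because emissions can span several orders of magnitude across firms, the two relative metrics (MAPE and RMSLE) weight a 10% miss on a small emitter the same as a 10% miss on a large one, whereas R² is dominated by the largest values. Note that MAPE as written is undefined when a true value is zero.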