AUTHOR=Liao Jiajia, Chen Zhijie, Jin Wanqing

TITLE=Uncovering predictors of myopia in youth: a secondary data analysis using a machine learning approach

JOURNAL=Frontiers in Medicine

VOLUME=Volume 12 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1595320

DOI=10.3389/fmed.2025.1595320

ISSN=2296-858X

ABSTRACT=Introduction: Myopia is a multifactorial condition driven by an interplay of genetic predisposition and environmental triggers. This study aims to harmonize and analyze risk predictors from two distinct datasets (one historical and clinical, the other contemporary and behavioral) to develop an integrated framework for myopia risk prediction. Methods: We analyzed two datasets: the Orinda Longitudinal Study of Myopia (OLSM), a 1995 US cohort (n≈500) with detailed ocular biometrics (e.g., spherical equivalent refraction, axial length) and lifestyle factors, and a 2022-2023 Chinese cross-sectional study (n=100,000) highlighting modern behaviors (e.g., screen time, posture). We employed multiple machine learning models, including logistic regression, Explainable Boosting Machine (EBM), and gradient boosting decision trees (GBDT) on OLSM, and deep neural networks (DNN) and XGBoost on the Chinese dataset, to identify key predictors. Model interpretability was assessed using SHapley Additive exPlanations (SHAP). We also tested three ensemble strategies (sequential, averaging, transfer learning) to merge insights across the structurally divergent datasets. Results: Both datasets confirmed parental myopia as a universal risk factor and time spent outdoors as a protective factor. In the OLSM dataset, spherical equivalent refraction and parental myopia were the top predictors, with models achieving an AUC of up to 0.92. In the Chinese dataset, the DNN model achieved 71% accuracy, identifying screen time, posture, and parental history as major risk factors. Cross-dataset integration via transfer learning proved most effective, successfully amplifying features like outdoor activity and posture while retaining core behavioral predictors like screen time. This approach bridged the clinical depth of OLSM with the granular, modern lifestyle insights from the Chinese dataset. Discussion: Our analysis confirms the multifactorial nature of myopia, blending historical biological mechanisms with contemporary behavioral drivers. The study demonstrates a scalable strategy for global myopia risk prediction by adaptively integrating diverse datasets. While not yet a turnkey clinical tool, this work lays the groundwork for future multimodal risk-prediction frameworks that can bridge era-specific biases and harness machine learning to capture the evolving profile of myopia risk.