AUTHOR=Donnipadu Rithvik Krishna, Sivolella Maxim, Carroll Cody, Wang Sophia Y. TITLE=Predicting progressive vision loss in glaucoma patients using functional principal component analysis and electronic health records JOURNAL=Frontiers in Ophthalmology VOLUME=Volume 5 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/ophthalmology/articles/10.3389/fopht.2025.1632827 DOI=10.3389/fopht.2025.1632827 ISSN=2674-0826 ABSTRACT=Background: Glaucoma is a leading cause of irreversible blindness worldwide. Predicting a patient’s future clinical trajectory would help physicians personalize management. We present a novel approach for predicting patient visual field (VF) progression by combining Functional Principal Component Analysis (FPCA) with electronic health record (EHR) data. Methods: Using diagnosis codes, we identified glaucoma patients who had ≥3 VF tests. We developed a 2-stage modeling pipeline: 1) Patients were split 80:10:10 into train, validation, and test sets and classified as fast progressors or slow progressors. 2) FPCA was used to predict mean deviation (MD) trajectories over the 10 years following the baseline year of VF exams, using the first 2 principal components. To make predictions, the model takes up to one year of baseline VF and EHR data as input, but it can flexibly make predictions from as few as a single VF test. Results: 15,764 VF tests from 2,372 patients were included; 8.92% of eyes were fast progressors. On the held-out test set, predictions of future MD trajectories over 10 years using baseline VF and EHR clinical data yielded an R² of 0.646 and an RMSE of 3.67 for fast progressors, and an R² of 0.728 and an RMSE of 3.09 for slow progressors. 
Performance was higher when predicting over the near term (fast progressors: year 1 R² 0.920, RMSE 1.83; year 5 R² 0.515, RMSE 4.26; slow progressors: year 1 R² 0.918, RMSE 1.771; year 5 R² 0.717, RMSE 3.12). Conclusion: A novel modeling approach combining FPCA with clinical data from EHR was able to model longitudinal clinical trajectories of glaucoma patients. This method is well-suited for modeling longitudinal healthcare data, handling sparse and irregular observation schedules with a varying number of inputs.
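
The idea of predicting a full MD trajectory from a few baseline measurements with the first 2 principal components can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes curves sampled on a regular yearly grid (the abstract describes sparse, irregular schedules), it estimates component scores by ordinary least squares rather than by whatever estimator the study used, it omits the EHR covariates and the fast/slow classification stage, and it runs on noise-free synthetic data. The function names `fit_fpca` and `predict_trajectory` are hypothetical.

```python
import numpy as np

def fit_fpca(curves, n_components=2):
    """Fit a dense-grid FPCA: rows are eyes, columns are time points."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Sample covariance between time points.
    cov = centered.T @ centered / (curves.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean_curve, eigvecs[:, order], eigvals[order]

def predict_trajectory(observed_vals, observed_idx, mean_curve, components):
    """Estimate scores from the observed time points, then rebuild the curve."""
    phi_obs = components[observed_idx, :]
    resid = observed_vals - mean_curve[observed_idx]
    # Least-squares scores (an assumption of this sketch).
    scores, *_ = np.linalg.lstsq(phi_obs, resid, rcond=None)
    return mean_curve + components @ scores

# Synthetic training data (assumption): linear MD decline per eye, no noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 11)                 # years 0..10
baseline = rng.normal(-5.0, 3.0, size=200)     # starting MD in dB
slope = rng.normal(-0.3, 0.3, size=200)        # dB per year
curves = baseline[:, None] + slope[:, None] * t

mean_curve, components, eigvals = fit_fpca(curves, n_components=2)

# A new eye observed only at years 0 and 1.
true_curve = -4.0 + (-0.8) * t
observed_idx = np.array([0, 1])
predicted = predict_trajectory(true_curve[observed_idx], observed_idx,
                               mean_curve, components)
```

Because the synthetic curves are exactly linear, two components span all of their variation and the prediction matches the true curve up to rounding error. With noisy, irregularly sampled clinical data, recovery would be approximate, which is consistent with the lower R² values reported at longer horizons.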