AUTHOR=Ahmed Faizan , Haider Faseeh , Arham Muhammad , Dad Allah , Bakht Kinza , Hashim Muhammad Moseeb Ali , Łajczak Paweł , Hassan Muhammad , Athar Fatima Binte , Adnan Muhammad , Usman Muhammad , Gohar Najam , Mirza Tehmasp , Ahmed Mushood , Moshiyakhov Mark , Sealove Brett , Patel Swapnil , Almendral Jesus , Bakr Mohamed , Sattar Yasar , Alenezi Fawaz TITLE=Comparative diagnostic accuracy of artificial intelligence-derived risk stratification versus conventional risk stratification methods in pulmonary hypertension patients: a systematic review and meta-analysis JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 8 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1692829 DOI=10.3389/frai.2025.1692829 ISSN=2624-8212 ABSTRACT=Background: Accurate risk stratification in pulmonary hypertension (PH) is integral to optimizing therapeutic strategies and improving patient outcomes. Recent artificial intelligence (AI) models have demonstrated notable efficacy in risk stratification of PH, achieving area under the curve (AUC) values of 0.94 and 0.81 in internal and external validation cohorts, respectively. This meta-analysis aims to demonstrate the effectiveness of AI models in the risk stratification of PH by comparing their performance to conventional risk stratification methods. Methods: A systematic search of five databases (PubMed, Embase, ScienceDirect, Scopus, and the Cochrane Library) was conducted from inception to March 2025. Statistical analysis was performed in R (version 2024.12.1 + 563) using 2 × 2 contingency data.
Sensitivity, specificity, and diagnostic odds ratio (DOR) were pooled using a bivariate random-effects model (reitsma() from the mada package), while the AUC was meta-analyzed using logit-transformed values via the metagen() function from the meta package. Results: Six studies were included in the final synthesis, comprising 14,095 patients: 4,481 in internal test datasets and 4,948 in external datasets. Compared with conventional methods, AI risk stratification models achieved a higher AUC, with a logit mean difference of 0.26 (95% CI 0.09–0.43; p = 0.31) and low heterogeneity (I² = 14.3%). Pooled sensitivity and specificity for AI models were 0.77 (95% CI 0.74–0.79) and 0.72 (95% CI 0.70–0.75), respectively. Heterogeneity (I²) for pooled sensitivity and specificity was 57.1% (p = 0.04) and 91.8% (p < 0.0001), indicating substantial variability across studies. The pooled DOR for AI models was high at 8.53 (6.59–11.04), with high heterogeneity (I² = 73.6%; p = 0.002). In leave-one-out sensitivity analysis, I² for pooled sensitivity fell to 25.9% after excluding a major outlier, but it remained high for pooled specificity and DOR. Conclusion: Artificial intelligence-based risk stratification demonstrates significantly higher diagnostic performance than conventional methods in pulmonary hypertension. The higher pooled AUC, sensitivity, specificity, and DOR highlight AI's potential to enhance predictive accuracy and guide better treatment strategies. Nonetheless, more high-quality studies are needed to validate AI models for clinical integration.
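The quantities named in the Methods can be illustrated with a short Python sketch. It is not the authors' R pipeline and does not reproduce the bivariate model fitted by reitsma() or the pooling done by metagen(); it only shows the standard definitions of sensitivity, specificity, and DOR from a 2 × 2 table, and the logit transform applied to AUC values before pooling. The function names and the example counts are hypothetical and chosen purely for illustration.

```python
import math


def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, and DOR from one 2 x 2 contingency table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dor = (tp * tn) / (fp * fn)  # diagnostic odds ratio
    return sensitivity, specificity, dor


def dor_from_rates(sensitivity, specificity):
    """DOR implied by a sensitivity/specificity pair."""
    return (sensitivity / (1 - sensitivity)) * (specificity / (1 - specificity))


def logit(p):
    """Map a proportion such as an AUC from (0, 1) to the real line."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Back-transform a logit-scale value to the (0, 1) scale."""
    return 1 / (1 + math.exp(-x))


if __name__ == "__main__":
    # Hypothetical counts, not taken from any included study.
    sens, spec, dor = diagnostic_measures(tp=77, fp=28, fn=23, tn=72)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} DOR={dor:.2f}")

    # DOR implied by the pooled sensitivity and specificity in the abstract.
    print(f"implied DOR={dor_from_rates(0.77, 0.72):.2f}")

    # AUC values quoted in the Background, on the logit scale.
    print(f"logit(0.94)={logit(0.94):.3f} logit(0.81)={logit(0.81):.3f}")
```

The DOR implied by the pooled sensitivity (0.77) and specificity (0.72) is about 8.61, close to the reported pooled DOR of 8.53. The two need not match exactly, because the bivariate model estimates the DOR jointly rather than from the rounded point estimates.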