AUTHOR=Banerjee Prasenjit, Chattopadhyay Asis Kumar TITLE=Habitable exoplanet - a statistical search for life JOURNAL=Frontiers in Astronomy and Space Sciences VOLUME=Volume 12 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/astronomy-and-space-sciences/articles/10.3389/fspas.2025.1674754 DOI=10.3389/fspas.2025.1674754 ISSN=2296-987X ABSTRACT=Introduction: The identification of habitable exoplanets is an important challenge in modern space science, requiring the combination of planetary and stellar parameters to assess conditions that support life. Methods: Using a dataset of 5867 exoplanets from the NASA Exoplanet Archive (as of April 3, 2025), we apply Random Forest and eXtreme Gradient Boosting (XGBoost) to classify planets as habitable or non-habitable based on 32 continuous parameters, including orbital semi-major axis, planetary radius, mass, density, and stellar properties. Habitability is defined through physics-based criteria rooted in the presence of liquid water, stable climates, and Earth-like characteristics using seven key parameters: planetary radius, density, orbital eccentricity, mass, stellar effective temperature, luminosity, and orbital semi-major axis. To make the classification accurate, we address multicollinearity by checking the Variance Inflation Factor (VIF) of each parameter. We select parameters with VIF < 5: planetary orbital period, semi-major axis, density, eccentricity, and inclination; and stellar effective temperature, radius, mass, metallicity, age, density, and total proper motion. Although all seven defining parameters are used for labeling, only those with low VIF (orbital semi-major axis and eccentricity, planetary density, and stellar effective temperature) are retained for modeling, supplemented by the additional low-VIF parameters.
Class imbalance is addressed using the Random Over-Sampling Examples (ROSE) technique with both over- and under-sampling to create a balanced dataset. Results: The models achieve classification accuracies of 99.99% for Random Forest and 99.93% for XGBoost on the test set, with high sensitivity and specificity. We analyze the data distributions of the key defining parameters, revealing skewed distributions typical of exoplanet populations. Parameter uncertainties are incorporated through Monte Carlo perturbations to assess prediction stability, showing minimal impact on overall accuracy but possible biases in borderline cases. We take the intersection of the habitable exoplanets identified by the seven defining parameters and verify it with the twelve low-VIF parameters, confirming consistent classification and making habitability assessments more reliable. Discussion: Our findings highlight the potential of machine learning techniques to prioritize exoplanet targets for future observations, providing a fast and understandable approach for habitability assessment.
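
The VIF < 5 screening described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say which software was used or whether parameters were removed one at a time, so the iterative drop-the-worst loop, the function names `vif_scores` and `select_low_vif`, and the synthetic data are all assumptions made for illustration. The sketch relies on the standard identity that the VIF of parameter j, 1 / (1 - R_j^2), equals the j-th diagonal entry of the inverse correlation matrix.

```python
import numpy as np

def vif_scores(X):
    """Return the VIF of each column of X (rows = samples, columns = parameters)."""
    corr = np.corrcoef(X, rowvar=False)
    # VIF_j = 1 / (1 - R_j^2) is the j-th diagonal entry of the inverse
    # correlation matrix.
    return np.diag(np.linalg.inv(corr))

def select_low_vif(X, names, threshold=5.0):
    """Assumed procedure: drop the highest-VIF column until all VIFs < threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        scores = vif_scores(X)
        worst = int(np.argmax(scores))
        if scores[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names

# Synthetic stand-in data (hypothetical, not from the NASA Exoplanet Archive):
# column "c" is almost a copy of column "a", so the two are strongly collinear.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.01 * rng.normal(size=n)
X_demo = np.column_stack([a, b, c])

kept = select_low_vif(X_demo, ["a", "b", "c"])
print(kept)
```

In this toy case one of the two collinear columns is removed and the independent column is kept. With real catalog data, missing values would need to be handled before computing the correlation matrix.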