AUTHOR=Zhang Tianze, Chai Hui, Wang Hongjun, Guo Tongcui, Zhang Liangjie, Zhang Wenqi TITLE=Interpretable machine learning model for shear wave estimation in a carbonate reservoir using LightGBM and SHAP: a case study in the Amu Darya right bank JOURNAL=Frontiers in Earth Science VOLUME=Volume 11 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/earth-science/articles/10.3389/feart.2023.1217384 DOI=10.3389/feart.2023.1217384 ISSN=2296-6463 ABSTRACT=The shear wave velocity (Vs) is significant for quantitative seismic interpretation. Although numerous studies have demonstrated the effectiveness of machine learning (ML) methods in estimating Vs from well logging parameters, real-world application is still hindered by the black-box nature of ML models. With the rapid development of interpretable ML techniques, this drawback can be overcome by various interpretation methods. This study applies the Light Gradient Boosting Machine (LightGBM) to predict the Vs of a carbonate reservoir and uses Shapley Additive Explanations (SHAP) to interpret the model. The application of ML in Vs estimation normally involves training the model on conventional well-log data that are highly correlated with Vs. To expand the model’s applicability to wells that lack essential logs such as density and neutron logs, we introduce three geologically important features, namely temperature, pressure and formation, into the model. The LightGBM model is tuned with an automatic hyperparameter optimization framework, and the result is compared with the Xu-Payne rock physics model and four machine learning models that are tuned with the same process. The results show that the LightGBM model can fit the training data and provide accurate predictions in the test well. The model outperforms the rock physics model and the other ML models in both accuracy and training time.
The SHAP analysis provides a detailed explanation of the contribution of each input variable to the model and demonstrates how feature contributions vary under different reservoir conditions. Moreover, the validity of the LightGBM model is further supported by the consistency between the information deduced from feature dependency and the geological understanding of the carbonate formation. The study demonstrates that the newly added features can effectively improve model performance, and that the importance of an input feature is not necessarily related to its correlation with Vs. With the assistance of interpretable machine learning techniques, the application of ML in predicting Vs on a larger scale can be conducted with confidence, and more features can be introduced and examined to improve model performance.
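As a rough illustration of the attribution idea behind SHAP, the sketch below computes exact Shapley values for a single sample by enumerating every feature coalition. It is not the study's model or workflow: the function `toy_vs`, its coefficients, the three feature names and the sample values are all hypothetical, chosen only to show how a prediction is split into per-feature contributions. Practical SHAP implementations for tree models use much faster algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical stand-in for a trained Vs predictor (assumption, not the paper's model).
# Inputs are illustrative scaled values: [vp, gamma_ray, temperature].
def toy_vs(z):
    vp, gamma_ray, temperature = z
    return 0.5 * vp - 2.0 * gamma_ray + 0.1 * vp * temperature


x = [4.0, 0.3, 1.0]         # sample to explain (made-up values)
baseline = [3.0, 0.5, 0.0]  # reference sample (made-up values)

phi = shapley_values(toy_vs, x, baseline)
for name, contribution in zip(["vp", "gamma_ray", "temperature"], phi):
    print(f"{name}: {contribution:+.3f}")

# Additivity: the contributions sum to prediction minus baseline prediction.
print(sum(phi), toy_vs(x) - toy_vs(baseline))
```

The contributions always sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additive property that gives SHAP its name. In this toy model the interaction term `vp * temperature` is shared between the two features involved, so the temperature contribution depends on the value of `vp` for the sample being explained.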