AUTHOR=Fang Zhengwei, Zhang Liqiang, Yan Shicui TITLE=Forecast of lacustrine shale lithofacies types in continental rift basins based on machine learning: A case study from Dongying Sag, Jiyang Depression, Bohai Bay Basin, China JOURNAL=Frontiers in Earth Science VOLUME=Volume 11 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/earth-science/articles/10.3389/feart.2023.1047981 DOI=10.3389/feart.2023.1047981 ISSN=2296-6463 ABSTRACT=Shale strata in continental rift basins are highly heterogeneous, and their lithofacies are complex. Accurate prediction of shale lithofacies types is key to shale oil exploration and development, yet no effective prediction method or artificial intelligence solution currently exists. In this work, shale lithofacies in the upper Es4 member of the Dongying Sag, Jiyang Depression, were predicted using machine learning. Based on core and thin-section analysis, 22 lithofacies types were identified in the target strata according to their characteristic components and sedimentary structures. The lithofacies of well FY1 were divided vertically, and the frequency and thickness of each lithofacies were calculated. Five commonly used well logs and five paleoenvironmental parameters were selected, and two machine learning methods, support vector machines (SVM) and extreme gradient boosting (XGBoost), were used to run lithofacies prediction experiments under six different conditions. Comparison of the prediction results shows that combining well logging data with paleoenvironmental parameter data gives the highest accuracy for both the overall lithofacies and the sweet-spot lithofacies, reaching 68% and 98%, respectively, with the SVM method and the curve shape-to-point sample extraction mode.
Using both well logging data and paleoenvironmental parameter data improves the prediction accuracy for the overall lithofacies by approximately 7% to 28% compared with using either data type alone, and the curve shape-to-point sample extraction mode improves it by approximately 7% to 8% compared with the point-to-point sample extraction mode. The prediction accuracy for the overall lithofacies is affected by the sample size of each lithofacies and by the overlap in the value ranges of the paleoenvironmental parameters. The curve shape-to-point sample extraction mode and the feature fusion of well logging and paleoenvironmental parameters are effective ways to improve the prediction accuracy of lacustrine shale lithofacies. These results provide technical guidance for shale lithofacies prediction in new shale oil wells in eastern China.
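The abstract does not define the two sample-extraction modes, so the following is only a minimal sketch under stated assumptions. It assumes that "point-to-point" means one feature per log taken at the target depth, that "curve shape-to-point" means a window of log values centred on the target depth (so the local curve shape describes that single point), and that "feature fusion" means concatenating the log features with the paleoenvironmental parameters at that depth. The function names, the window half-width, the edge clamping, and the sample values are all hypothetical; the resulting vectors would then be passed to a classifier such as SVM or XGBoost, which is not shown.

```python
"""Hypothetical sketch of point-to-point vs. curve shape-to-point sampling.

Assumption: each curve is a list of values sampled at the same depth steps.
None of these names come from the original study.
"""
from typing import List, Sequence


def point_to_point(curves: Sequence[Sequence[float]], i: int) -> List[float]:
    # One feature per curve: only the value at depth index i.
    return [curve[i] for curve in curves]


def shape_to_point(
    curves: Sequence[Sequence[float]], i: int, half_window: int = 2
) -> List[float]:
    # For each curve, take the values in [i - half_window, i + half_window]
    # so the local curve shape describes point i. Indices past either end
    # of the curve are clamped to the nearest valid sample (edge padding).
    n = len(curves[0])
    features: List[float] = []
    for curve in curves:
        for k in range(i - half_window, i + half_window + 1):
            j = min(max(k, 0), n - 1)
            features.append(curve[j])
    return features


def fuse(log_features: Sequence[float], paleo_features: Sequence[float]) -> List[float]:
    # Feature fusion by simple concatenation (an assumption, not the paper's stated scheme).
    return list(log_features) + list(paleo_features)


if __name__ == "__main__":
    # Hypothetical log curves (e.g. gamma ray and acoustic) and one
    # hypothetical paleoenvironmental proxy, sampled at four depths.
    gr = [80.0, 95.0, 110.0, 90.0]
    ac = [250.0, 260.0, 270.0, 255.0]
    paleo = [[0.4, 0.5, 0.6, 0.5]]

    depth = 2
    print(point_to_point([gr, ac], depth))
    sample = fuse(shape_to_point([gr, ac], depth, half_window=1),
                  point_to_point(paleo, depth))
    print(sample)
```

With two curves and a half-width of 1, each depth point yields six log features instead of two, which is one plausible reading of why a shape-based mode gives the classifier more context than single-point sampling.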