AUTHOR=Zuo Zheng, Zhao Maocheng, Qi Liang, Wu Bin, Zou Hongyan, Xie Weijun, Ye Qiaolin, Zhou Chi, Zhang Kai TITLE=Hyperspectral inversion model of ginkgo leaf yield prediction based on machine learning JOURNAL=Frontiers in Plant Science VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2025.1698830 DOI=10.3389/fpls.2025.1698830 ISSN=1664-462X ABSTRACT=Introduction: The yield of Ginkgo biloba leaves serves as a critical indicator for assessing their growth and health status. However, current assessment methods primarily rely on manual harvesting and weighing, which are time-consuming, labor-intensive, and costly. Methods: To address these limitations, this study designed an algorithm-based yield estimation approach. Airborne hyperspectral imaging at a research base was used in place of traditional manual measurements, and a canopy hyperspectral dataset and Region of Interest Pixel (ROP) sets were constructed. Five preprocessing methods, Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV), Savitzky-Golay (SG), First Derivative (FD), and Standard Scaling (SS), were each used to develop Partial Least Squares Regression (PLSR) models in order to identify the optimal hyperspectral data preprocessing approach. The optimal preprocessing method was subsequently combined with Particle Swarm Optimization (PSO), Successive Projections Algorithm (SPA), Principal Component Analysis (PCA), Least Absolute Shrinkage and Selection Operator (LASSO), Competitive Adaptive Reweighted Sampling (CARS), and Particle Swarm Attention Mechanism Algorithm (PSAMA) for feature band selection. 
Traditional spectral vegetation indices were screened through random forest stepwise regression and spectral index correlation analysis, yielding the Soil-Adjusted Vegetation Index (SAVI), Modified Soil-Adjusted Vegetation Index (MSAVI), Normalized Difference Red Edge Index (NDRE), and Structure Insensitive Pigment Index (SIPI) as the final indices. The selected spectral bands and vegetation indices were then used as inputs to PLSR, Random Forest (RF), K-Nearest Neighbors Regression (KNNR), Long Short-Term Memory (LSTM), Support Vector Regression (SVR), Bidirectional LSTM (BiLSTM), and BiLSTM-GridSearchCV (BiLSTM-GS) machine learning models for yield prediction. Results: Among the preprocessing models, SNV-PLSR performed best (Rp² = 0.7831, RMSEP = 0.0325). The optimal SNV-(SAVI-MSAVI-NDRE-SIPI-ROP)-(BiLSTM-GS) model, which combines PSAMA-selected feature bands with the vegetation indices and ROP, achieved the highest prediction accuracy (Rp² = 0.8795, RMSEP = 0.1021). Discussion: This airborne hyperspectral canopy-based estimation technology provides an accurate, non-destructive solution for monitoring ginkgo leaf yield in field cultivation.
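
To illustrate two of the steps named in the abstract, the following Python sketch shows SNV preprocessing and the four selected vegetation indices. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names snv and vegetation_indices, the use of a single reflectance value per band region (blue, red, red edge, near-infrared), the soil factor L = 0.5 for SAVI, and the commonly used MSAVI2 closed form for MSAVI are all assumptions; the abstract does not state the wavelengths or constants used in the study.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) by its
    own mean and sample standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def vegetation_indices(blue, red, red_edge, nir, soil_factor=0.5):
    """Compute SAVI, MSAVI, NDRE and SIPI from band reflectances.

    Assumptions (not taken from the paper): one reflectance value per band
    region, soil_factor L = 0.5, and the MSAVI2 closed form for MSAVI.
    """
    blue, red, red_edge, nir = (
        np.asarray(b, dtype=float) for b in (blue, red, red_edge, nir)
    )
    savi = (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)
    msavi = (2.0 * nir + 1.0
             - np.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))) / 2.0
    ndre = (nir - red_edge) / (nir + red_edge)
    sipi = (nir - blue) / (nir - red)
    return {"SAVI": savi, "MSAVI": msavi, "NDRE": ndre, "SIPI": sipi}


if __name__ == "__main__":
    # Two toy spectra that differ only by a multiplicative factor;
    # SNV maps them onto the same normalised spectrum.
    print(snv([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]))
    # Hypothetical reflectance values for a vegetated pixel.
    print(vegetation_indices(blue=0.05, red=0.10, red_edge=0.30, nir=0.50))
```

Because SNV normalises each spectrum independently, it removes per-spectrum offset and scaling effects such as those caused by scatter, which is one plausible reason it can outperform global scaling on canopy data.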