AUTHOR=Tang Zhipeng, Adhikari Hari, Pellikka Petri K. E., Heiskanen Janne
TITLE=Impact of Preprocessing on Tree Canopy Cover Modelling: Does Gap-Filling of Landsat Time Series Improve Modelling Accuracy?
JOURNAL=Frontiers in Remote Sensing
VOLUME=Volume 3 - 2022
YEAR=2022
URL=https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2022.936194
DOI=10.3389/frsen.2022.936194
ISSN=2673-6187
ABSTRACT=Preprocessing of Landsat images is a double-edged sword: it transforms the raw data into a useful format, but unnecessary steps can introduce unwanted values. Gap-filling is an important and well-developed preprocessing procedure in time series analysis, but its necessity and effects in many Landsat applications, e.g. tree canopy cover (TCC) modelling, are rarely examined. We address this shortcoming by quantitatively comparing TCC modelling with predictor variables derived from gap-filled Landsat time series against TCC modelling with predictor variables derived from non-gap-filled time series, and by evaluating the effects of gap-filling on TCC modelling. Using a 1-year Landsat time series from a tropical region in the Taita Hills, Kenya, and a reference TCC map on a 0-100 scale derived from airborne laser scanning data, we designed comparable random forest modelling experiments to address the following questions: (i) Does gap-filling improve TCC modelling based on time series predictor variables, namely seasonal composites (SCs), spectral-temporal metrics (STMs), and harmonic regression (HR) coefficients? (ii) What is the difference in TCC modelling between using gap-filled pixels and using valid (actual or cloud-free) pixels? Two gap-filling methods were examined: one temporal method (Steffen spline interpolation) and one hybrid method (MOPSTM). We show that gap-filled predictors derived from the Landsat time series (e.g.
the average median RMSE of Steffen-filled and MOPSTM-filled SCs is 17.09 and 16.57, respectively) delivered better performance on average than non-gap-filled predictors (average median RMSE of 17.21). The effects of gap-filling may be reduced when there are sufficient high-quality valid observations to generate a seasonal composite. The single-date experiment suggests that gap-filled data (e.g. RMSE of 16.99, 17.71, 16.24, and 17.85 for the four seasons, with 100% gap-filled pixels as training and test datasets) may perform no worse than valid data (e.g. RMSE of 15.46, 17.07, 16.31, and 18.14 for the four seasons, with 100% valid pixels as training and test datasets). We conclude that gap-filling has a positive effect on the accuracy of TCC modelling, which justifies its inclusion in image preprocessing workflows.
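
To make the idea of temporal gap-filling ahead of a seasonal composite concrete, here is a minimal sketch for a single pixel. It is not the authors' workflow: plain linear interpolation stands in for the Steffen spline, MOPSTM is not reproduced, and the function names, the observation dates, and the band values are all hypothetical. Dates are assumed to be sorted in ascending order.

```python
from statistics import median


def fill_gaps_linear(days, values):
    """Fill None entries by linear interpolation in time.

    Simplified stand-in for a temporal gap-filling method (NOT the Steffen
    spline). Gaps before the first or after the last valid observation
    take the nearest valid value. Assumes `days` is sorted ascending.
    """
    valid = [(d, v) for d, v in zip(days, values) if v is not None]
    if not valid:
        raise ValueError("no valid observations to interpolate from")
    filled = []
    for d, v in zip(days, values):
        if v is not None:
            filled.append(v)
            continue
        before = [(dv, vv) for dv, vv in valid if dv < d]
        after = [(dv, vv) for dv, vv in valid if dv > d]
        if not before:
            filled.append(after[0][1])
        elif not after:
            filled.append(before[-1][1])
        else:
            (d0, v0), (d1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (d - d0) / (d1 - d0))
    return filled


def seasonal_median(days, values, start, end):
    """Median of non-missing values with start <= day < end, else None."""
    window = [v for d, v in zip(days, values)
              if start <= d < end and v is not None]
    return median(window) if window else None


# Hypothetical single-pixel series: day of year and band value,
# with None marking cloud-masked observations.
days = [10, 26, 42, 58, 74, 90]
band = [0.60, None, 0.70, None, None, 0.40]

filled = fill_gaps_linear(days, band)
composite_valid = seasonal_median(days, band, 1, 91)     # valid pixels only
composite_filled = seasonal_median(days, filled, 1, 91)  # after gap-filling
```

In this toy series the two composites come out nearly identical, which is in line with the abstract's remark that the effect of gap-filling may shrink when enough valid observations are available within a season.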