AUTHOR=Karale Yogita, Yuan May TITLE=Spatially lagged predictors from a wider area improve PM2.5 estimation at a finer temporal interval—A case study of Dallas-Fort Worth, United States JOURNAL=Frontiers in Remote Sensing VOLUME=Volume 4 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2023.1041466 DOI=10.3389/frsen.2023.1041466 ISSN=2673-6187 ABSTRACT=Fine particulate matter, also known as PM2.5, has many adverse impacts on human health. The availability of PM2.5 data at a fine temporal interval is vital for studies examining the effects of environmental exposomes. This work investigated the potential of a Convolutional Neural Network (CNN) to estimate hourly average PM2.5 concentration using high-resolution Aerosol Optical Depth (AOD) from the MODIS MAIAC algorithm and meteorological data. Satellite-acquired AOD data are instantaneous measurements, whereas air quality stations on the ground provide a 1-hour average of PM2.5 concentration. Many previous studies have used machine learning methods to estimate 24-hour average PM2.5 from satellite AOD. The current work aimed to refine the temporal interval of PM2.5 estimates from 24-hour to 1-hour averages. Our premise was that spatial convolution enables temporal refinement in PM2.5 estimates. We trained a CNN to estimate PM2.5 corresponding to the hour of AOD acquisition in the Dallas-Fort Worth and surrounding area using 10 years of data, from 2006 to 2015. A CNN accepts images as input. For each PM2.5 station, we subset the time series of MODIS images to a window centered on that station. Hence, the input image size represented the size of the area around the PM2.5 station and was thus analogous to a spatial lag in spatial statistics. We systematically increased the input image size from 3 km by 3 km, 5 km by 5 km, …, to 19 km by 19 km and observed how increasing the spatial lag affected PM2.5 estimation. 
The study found that model performance improved with a larger spatial lag. The model with an input image size of 19 km by 19 km performed best, achieving a correlation coefficient of 0.87 and an RMSE of 2.57 µg/m3 when estimating PM2.5 at in-situ stations for the hour of satellite acquisition. Missing AOD reduced the number of images available for training, so the study employed a data augmentation technique to increase the number of training samples. Data augmentation not only helped avoid overfitting but also improved model performance compared with models trained without it.
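
The station-centered subsetting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the AOD scene is a 2-D NumPy array on a 1 km grid, so that a 3 km by 3 km input corresponds to a 3 by 3 pixel window, and that the station's row and column in that grid are already known. The names `extract_patch` and `WINDOW_SIZES` are hypothetical and introduced only for illustration.

```python
import numpy as np

# Odd window sizes in pixels: 3, 5, ..., 19 (assumes 1 pixel = 1 km).
WINDOW_SIZES = list(range(3, 20, 2))


def extract_patch(aod, row, col, size):
    """Return a size x size window of `aod` centered on (row, col).

    Returns None when the window would extend past the edge of the grid.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the station sits at the center pixel")
    half = size // 2
    r0, r1 = row - half, row + half + 1
    c0, c1 = col - half, col + half + 1
    if r0 < 0 or c0 < 0 or r1 > aod.shape[0] or c1 > aod.shape[1]:
        return None
    return aod[r0:r1, c0:c1]


# Synthetic 25 x 25 scene with a hypothetical station at the center pixel.
aod = np.arange(25 * 25, dtype=float).reshape(25, 25)
station_row, station_col = 12, 12

# One patch per window size; a larger window corresponds to a larger spatial lag.
patches = {s: extract_patch(aod, station_row, station_col, s) for s in WINDOW_SIZES}
for size, patch in patches.items():
    print(size, patch.shape)
```

Each patch keeps the station at its center pixel, so widening the window only adds surrounding context, which is the sense in which input image size acts like a spatial lag.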