AUTHOR=Danso Samuel O., Zeng Zhanhang, Muniz-Terrera Graciela, Ritchie Craig W.
  
TITLE=Developing an Explainable Machine Learning-Based Personalised Dementia Risk Prediction Model: A Transfer Learning Approach With Ensemble Learning Algorithms
  
JOURNAL=Frontiers in Big Data
  
VOLUME=Volume 4 - 2021
  
YEAR=2021
  
URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2021.613047
  
DOI=10.3389/fdata.2021.613047
  
ISSN=2624-909X
  
ABSTRACT=Alzheimer’s disease (AD) has its onset many decades before dementia develops, and work is ongoing to characterise individuals at risk of decline on the basis of early detection through biomarker and cognitive testing as well as the presence or absence of identified risk factors. Risk prediction models for AD based on various computational approaches, including machine learning, are being developed with promising results. However, these approaches have been criticised for failing to generalise because of over-reliance on a single data source, poor internal and external validation, and a lack of understanding of the prediction models, all of which limit their clinical utility. We propose a framework that employs a transfer-learning paradigm with ensemble learning algorithms to develop explainable personalised risk prediction models for dementia. Our prediction models, known as source models, are initially trained and tested using a publicly available dataset (n = 84,856, mean age = 69 years) with 14 years of follow-up samples to predict individual risk of developing dementia. The decision boundaries of the best source model are then updated using an alternative dataset from a different and much younger population (n = 210, mean age = 52) to obtain an additional prediction model, known as the target model. We further apply the SHapley Additive exPlanations (SHAP) algorithm to visualise the risk factors responsible for the prediction at both population and individual levels. The best source model achieves a geometric accuracy of 87%, specificity of 99% and sensitivity of 76%. Our target model achieves a geometric accuracy of 64%, specificity of 82% and sensitivity of 50%, with a transfer learning efficacy rate of 4%.
The strengths of our approach are the large sample size used in training the source model, the transfer and application of the resulting ‘knowledge’ to a dataset from a different population, and the ability to visualise the interaction of the risk factors that drive the prediction. This approach has direct clinical utility.
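
The abstract does not define "geometric accuracy". A minimal sketch, assuming it denotes the geometric mean of sensitivity and specificity, shows that the reported figures are consistent with that reading. The function name `geometric_accuracy` is illustrative and not taken from the paper.

```python
import math


def geometric_accuracy(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity.

    Assumption: this is the definition behind the abstract's
    "geometric accuracy"; the abstract itself does not say so.
    """
    return math.sqrt(sensitivity * specificity)


# Sensitivity and specificity as reported in the abstract.
source = geometric_accuracy(sensitivity=0.76, specificity=0.99)
target = geometric_accuracy(sensitivity=0.50, specificity=0.82)

print(f"source model: {source:.2f}")  # abstract reports 87%
print(f"target model: {target:.2f}")  # abstract reports 64%
```

Under this assumption, both values round to the percentages quoted in the abstract. The geometric mean penalises a model that scores well on one of the two rates at the expense of the other, which matters when dementia cases are a small minority of the sample.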