AUTHOR=Nim Hieu T., Furtado Milena B., Ramialison Mirana, Boyd Sarah E. TITLE=Combinatorial Ranking of Gene Sets to Predict Disease Relapse: The Retinoic Acid Pathway in Early Prostate Cancer JOURNAL=Frontiers in Oncology VOLUME=Volume 7 - 2017 YEAR=2017 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2017.00030 DOI=10.3389/fonc.2017.00030 ISSN=2234-943X ABSTRACT=Background. Quantitative high-throughput data deposited in consortia such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) present opportunities and challenges for computational analyses. Methods. We present a computational strategy to systematically rank and investigate a large number (2^10 to 2^20) of clinically testable gene sets, using combinatorial gene subset generation and disease-free survival analyses. This approach integrates protein-protein interaction networks, gene expression, DNA methylation, and copy number data, in association with disease-free survival profiles from patient clinical records. Results. As a case study, we applied this pipeline to systematically analyse the role of ALDH1A2 in prostate cancer (PCa). We have previously found this gene to have multiple roles in disease and homeostasis, and here we investigate the role of the associated ALDH1A2 gene/protein networks in prostate cancer, using our methodology in combination with PCa patient clinical profiles from the ICGC and TCGA databases. Relationships between gene signatures and relapse were analysed using Kaplan-Meier log-rank analysis and multivariable Cox regression. Z-statistics were calculated from relative expression versus the pooled mean of the diploid population. Gene/protein interaction network analyses generated 11 core genes associated with ALDH1A2; combinatorial ranking of the power set of these core genes identified 2 gene sets (out of 2^11 - 1 = 2047 combinations) with significant correlation with disease relapse (Kaplan-Meier log-rank p < 0.05).
For the more significant of these two sets, referred to as the optimal gene set (OGS), patients with OGS alterations have a median survival of 62.7 months, compared with >150 months for patients without OGS alterations (p = 0.0248, hazard ratio = 2.213, 95% confidence interval = 1.1-4.098). Two genes comprising the OGS (CYP26A1 and RDH10) are strongly associated with ALDH1A2 in the retinoic acid pathways, suggesting a major role of retinoic acid signalling in early prostate cancer progression. Our pipeline complements human expertise in the search for prognostic biomarkers in large-scale datasets.
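
The combinatorial ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it only shows how the non-empty subsets of 11 core genes can be enumerated (2^11 - 1 = 2047 of them) and filtered by a p-value threshold. Only ALDH1A2, CYP26A1, and RDH10 are named in the abstract; the `GENE_04` to `GENE_11` labels are placeholders. The helper names (`nonempty_subsets`, `rank_gene_sets`, `toy_p_value`) are hypothetical, and `toy_p_value` is a made-up stand-in for a real Kaplan-Meier log-rank test on patient data, so its output carries no biological meaning.

```python
from itertools import combinations


def nonempty_subsets(genes):
    """Yield every non-empty subset of `genes` as a tuple, smallest subsets first."""
    for size in range(1, len(genes) + 1):
        yield from combinations(genes, size)


def rank_gene_sets(genes, p_value_fn, alpha=0.05):
    """Score every non-empty subset with `p_value_fn`.

    Returns the number of subsets tested and the subsets with p < alpha,
    sorted by ascending p-value.
    """
    scored = [(p_value_fn(subset), subset) for subset in nonempty_subsets(genes)]
    significant = sorted((p, s) for p, s in scored if p < alpha)
    return len(scored), significant


# Three gene names come from the abstract; the other eight are placeholders
# (assumption), because the abstract does not list the remaining core genes.
core_genes = ["ALDH1A2", "CYP26A1", "RDH10"] + [f"GENE_{i:02d}" for i in range(4, 12)]


def toy_p_value(subset):
    """Hypothetical stand-in for a log-rank test; returns arbitrary values."""
    return 0.01 if set(subset) == {"GENE_04", "GENE_05"} else 1.0


n_tested, hits = rank_gene_sets(core_genes, toy_p_value)
print(n_tested)  # 2047 non-empty subsets of 11 genes
print(hits)      # the single subset that the toy scorer marks as significant
```

In a real analysis, `p_value_fn` would split patients into those with and without alterations in the subset and compare their disease-free survival curves. Testing thousands of subsets would also raise a multiple-comparison question that this sketch does not address.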