<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">771301</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2021.771301</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Exploring Pathway-Based Group Lasso for Cancer Survival Analysis: A Special Case of Multi-Task Learning</article-title>
<alt-title alt-title-type="left-running-head">Malenov&#xe1; et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Exploring Pathway-Based Group Lasso</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Malenov&#xe1;</surname>
<given-names>Gabriela</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Rowson</surname>
<given-names>Daniel</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1508224/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Boeva</surname>
<given-names>Valentina</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/86846/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Department of Computer Science, Institute for Machine Learning, ETH Zurich, <addr-line>Z&#xfc;rich</addr-line>, <country>Switzerland</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Swiss Institute for Bioinformatics (SIB), <addr-line>Z&#xfc;rich</addr-line>, <country>Switzerland</country>
</aff>
<aff id="aff3">
<label>
<sup>3</sup>
</label>Institut Cochin, Inserm U1016, CNRS UMR 8104, Universit&#x00E9; de Paris UMR-S1016, <addr-line>Paris</addr-line>, <country>France</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/867113/overview">Wail Ba Alawi</ext-link>, University Health Network, Canada</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/711875/overview">Khanh N. Q. Le</ext-link>, Taipei Medical University, Taiwan</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/811564/overview">Yongcui Wang</ext-link>, Northwest Institute of Plateau Biology (CAS), China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Valentina Boeva, <email>valentina.boeva@inf.ethz.ch</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>29</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>771301</elocation-id>
<history>
<date date-type="received">
<day>06</day>
<month>09</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>27</day>
<month>10</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Malenov&#xe1;, Rowson and Boeva.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Malenov&#xe1;, Rowson and Boeva</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>
<bold>Motivation:</bold> Cox proportional hazards models are widely used in the study of cancer survival. However, these models are challenged by the large number of features and small sample sizes of cancer data sets. While this issue can be partially addressed by applying regularization techniques such as lasso, the models still suffer from unsatisfactory predictive power and low stability.</p>
<p>
<bold>Methods:</bold> Here, we investigated two methods to improve survival models. Firstly, we leveraged the biological knowledge that groups of genes act together in pathways and regularized at both the group and gene level using a latent group lasso penalty term. Secondly, we designed and applied a multi-task learning penalty that allowed us to leverage the relationship between survival models for different cancers.</p>
<p>
<bold>Results:</bold> We observed modest improvements over the simple lasso model with the inclusion of a latent group lasso penalty for six of the 16 cancer types tested. The addition of a multi-task penalty, which penalized coefficients in pairs of cancers for diverging too greatly, significantly improved accuracy for a single cancer, lung squamous cell carcinoma, while having minimal effect on other cancer&#x20;types.</p>
<p>
<bold>Conclusion:</bold> While the use of pathway information and multi-tasking shows some promise, these methods do not provide a substantial improvement when compared with standard methods.</p>
</abstract>
<kwd-group>
<kwd>survival analysis</kwd>
<kwd>Cox model</kwd>
<kwd>cancer</kwd>
<kwd>lasso</kwd>
<kwd>group lasso</kwd>
<kwd>multi-task</kwd>
<kwd>signalling pathways</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Survival analysis is an important topic in cancer research as it allows predicting the time to death or tumor progression as well as providing potential insights into the drivers of the disease. To predict the prognostic score of cancer patients, numerous survival models using patients&#x2019; molecular and clinical data have been proposed. In particular, gene expression data have been widely used since changes in the regulation of genes are ubiquitous in cancer. A variety of learning methods has been applied to survival data, <italic>e.g.</italic>, the Cox proportional hazard model, deep learning or random forests; see <xref ref-type="bibr" rid="B21">Matsuo et&#x20;al. (2019)</xref> for their comparison on cervical cancer data. Going beyond just gene expression, these models have been used with many data types, such as radiography data and histopathology images, to investigate cancer survival (<xref ref-type="bibr" rid="B32">Wulczyn et&#x20;al. (2020)</xref>; <xref ref-type="bibr" rid="B16">Le et&#x20;al. (2021)</xref>).</p>
<p>In this work, we utilized a version of the Cox model (<xref ref-type="bibr" rid="B2">Cox (1972)</xref>), whose main strengths are ease of use, strong results and interpretability. While the deep learning approach has shown minor concordance improvements compared with the linear Cox model, it suffers in terms of interpretability (<xref ref-type="bibr" rid="B9">Huang et&#x20;al. (2020)</xref>), and random survival forests have consistently underperformed the linear models, although variants such as block forest do show promise for multi-omics data (<xref ref-type="bibr" rid="B21">Matsuo et&#x20;al. (2019)</xref>; <xref ref-type="bibr" rid="B8">Herrmann et&#x20;al. (2020)</xref>; <xref ref-type="bibr" rid="B10">Huang et&#x20;al. (2019</xref>, <xref ref-type="bibr" rid="B9">2020)</xref>).</p>
<p>The large number of genes and the high multicollinearity among them, coupled with low sample numbers, make overfitting a major issue. It is therefore desirable to identify a smaller set of genes determining the cancer progression and severity. For this purpose, the Cox proportional hazard model can be supplemented with a lasso regression term (<xref ref-type="bibr" rid="B28">Tibshirani (1997)</xref>). Depending on the strength of the lasso regularization, some of the gene coefficients are truncated to zero, effectively making the model sparse. However, there is no guarantee that the genes that are included in the Cox model are truly more predictive than those whose contributions are truncated. Indeed, slight variations in the sample set can lead to large variations in the included genes. One potential way to alleviate this is by grouping the&#x20;genes.</p>
<p>Genes are often activated together in synchronized processes called signaling pathways (<xref ref-type="bibr" rid="B23">Parikh et&#x20;al. (2010)</xref>). A potential solution to the multicollinearity problem is therefore to build a model that is sparse not at the gene level but at the pathway level. Of particular interest to us are pathways that are downstream of known cancer drivers. To achieve this, a version of the group lasso penalty, grouping genes by pathway, has been proposed and applied to cancer data (<xref ref-type="bibr" rid="B22">Obozinski et&#x20;al. (2011)</xref>). Group lasso regularization works by applying a ridge-like (<italic>L</italic>
<sub>2</sub>) norm to the coefficients within each group and then a lasso-like (<italic>L</italic>
<sub>1</sub>) sum across the group norms. The lasso component of the regularization therefore causes entire groups to be included in or removed from the model as a whole, while the ridge component shrinks the coefficients within any group that is included.</p>
<p>The version of the group lasso penalty that we use in this paper is the latent group lasso penalty. This penalty deals with the issue present in the na&#xef;ve group lasso implementation that if the same gene is included in two groups and model coefficients for one of those groups are set to zero, then the gene contribution will also be set to zero in the second group. Latent group lasso allows for genes that fall into multiple groups to have independent coefficients, while not biasing the model towards their inclusion (<xref ref-type="bibr" rid="B22">Obozinski et&#x20;al. (2011)</xref>).</p>
<p>Since their introduction for cancer, group lasso approaches have been used a number of times in survival analysis (<xref ref-type="bibr" rid="B13">Kim et&#x20;al. (2012)</xref>; <xref ref-type="bibr" rid="B31">Wang et&#x20;al. (2018)</xref>). For instance, group lasso was used to integrate multi-omics data at the gene level (<xref ref-type="bibr" rid="B33">Xie et&#x20;al. (2019)</xref>). However, to the best of our knowledge, the application of pathway-level latent group lasso to gene expression data for cancer survival has not been investigated for large cohorts of patients such as The Cancer Genome Atlas (TCGA).</p>
<p>Of note, in addition to group lasso, there exist other pathway-based approaches; however, they have failed to demonstrate major improvements compared with standard lasso. Zheng <italic>et&#x20;al.</italic>, using Gene Set Variation Analysis (GSVA) to reduce gene expression to pathway expression, showed no significant improvement over standard lasso (<xref ref-type="bibr" rid="B34">Zheng et&#x20;al. (2020)</xref>). Our own preliminary work using pathway-based dimension reduction via PCA and autoencoders also yielded worse results than standard lasso and the latent group lasso method (results not shown).</p>
<p>One further challenge associated with cancer survival modelling is that while across all cancers the number of samples is quite large (over 10,000 in the TCGA data set), the number of samples for any single cancer type can be as low as 36. Unfortunately, the na&#xef;ve solution to this, merely training on multiple cancers all together, does not perform well for two reasons. Firstly, while there are many similarities across cancers, there are also many differences, and thus building a single model to describe survival across all cancers is not feasible. Secondly, survival varies greatly across cancers, and therefore models trained on all cancers together often obtain good global results by discriminating samples by cancer type, essentially giving high hazard scores to low-survival cancer types and vice versa, while being very inaccurate on any individual cancer.</p>
<p>We would like to combine multiple cancers into a single model in such a way that the similarities between them can be leveraged. A number of multi-task approaches have been tested for survival analysis, including autoencoders and clustered learning. Furthermore, a kernel-based approach has been developed that incorporates pathways and multi-tasking, but it showed no consistent improvements compared with the random forest and survival SVM models (<xref ref-type="bibr" rid="B19">Li et&#x20;al. (2016)</xref>; <xref ref-type="bibr" rid="B3">Dereli et&#x20;al. (2019)</xref>; <xref ref-type="bibr" rid="B14">Kim et&#x20;al. (2020)</xref>).</p>
<p>Additionally, several extensions of the group lasso regularization have been proposed in the literature: a multivariate sparse group lasso, a version generalized to multidimensional response variables and predictors (<xref ref-type="bibr" rid="B18">Li et&#x20;al. (2015)</xref>), and the generalized elastic net (GELnet), a penalty that admits general weights on both individual and pair-wise feature levels (<xref ref-type="bibr" rid="B26">Sokolov et&#x20;al. (2016)</xref>). Neither of these group lasso generalizations, however, took into account the possibly different scaling of various cancer solutions. Moreover, the weights are set <italic>a priori</italic>, so a particular pathway cannot be decoupled during the optimization process in case it is predictive for one cancer but not for the other&#x20;one.</p>
<p>In this work, we present a method which links cancers together by means of a coupling term in the loss function which penalizes the models for having diverging coefficients (<xref ref-type="bibr" rid="B4">Evgeniou et&#x20;al. (2005)</xref>; <xref ref-type="bibr" rid="B7">G&#xf6;rnitz et&#x20;al. (2011)</xref>). The aim of this method is to allow individual cancer models to leverage the information from other cancers, while still allowing the coefficients of each cancer model to vary individually. Ideally, this will drive the inclusion of genes corresponding to pathways equally important for survival in two cancer types. In this work, this multicancer coupling term has been incorporated in addition to latent group&#x20;lasso.</p>
</sec>
<sec id="s2">
<title>2 Methods</title>
<sec id="s2-1">
<title>2.1 Data</title>
<p>In this study, we used clinical and gene expression data generated by the TCGA Research Network: <ext-link ext-link-type="uri" xlink:href="https://www.cancer.gov/tcga">https://www.cancer.gov/tcga</ext-link> (<xref ref-type="bibr" rid="B29">Tomczak et&#x20;al. (2015)</xref>). For this work, we selected 30 cancer types. From these, the 16 cancers with over 300 samples were used for the comparison of latent group lasso with na&#xef;ve lasso and all 30 were used in the multi-tasking study. For each cancer, RNA-Seq data, time since inclusion in study, and survival status were used. The TCGA RNA-Seq data set was generated following the Firehose pipeline: MapSplice followed by RSEM (<xref ref-type="bibr" rid="B17">Li and Dewey (2011)</xref>), then normalized using upper quartile fragments per kilobase per million reads (FPKM-UQ).</p>
<p>The following cancer types were selected: Adrenocortical Carcinoma (ACC), Bladder Urothelial Carcinoma (BLCA), Breast Invasive Carcinoma (BRCA), Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC), Cholangiocarcinoma (CHOL), Colorectal Adenocarcinoma (COADREAD), Diffuse Large B-Cell Lymphoma (DLBC), Esophageal Carcinoma (ESCA), Glioblastoma Multiforme (GBM), Head and Neck Squamous Cell Carcinoma (HNSC), Kidney Chromophobe (KICH), Kidney Renal Clear Cell Carcinoma (KIRC), Kidney Renal Papillary Cell Carcinoma (KIRP), Acute Myeloid Leukemia (LAML), Brain Lower Grade Glioma (LGG), Liver Hepatocellular Carcinoma (LIHC), Lung Adenocarcinoma (LUAD), Lung Squamous Cell Carcinoma (LUSC), Mesothelioma (MESO), Ovarian Serous Cystadenocarcinoma (OV), Pancreatic Adenocarcinoma (PAAD), Prostate Adenocarcinoma (PRAD), Sarcoma (SARC), Skin Cutaneous Melanoma (SKCM), Stomach Adenocarcinoma (STAD), Thyroid Carcinoma (THCA), Thymoma (THYM), Uterine Corpus Endometrial Carcinoma (UCEC), Uterine Carcinosarcoma (UCS), and Uveal Melanoma (UVM).</p>
<p>To group genes into pathways, we combined several databases of genes activated or repressed as a result of the activation of a signaling pathway (pathway downstream genes): SPEED, PROGENy, Duke University and Curie Institute-curated data sets (<xref ref-type="bibr" rid="B20">Martignetti et&#x20;al. (2016)</xref>; <xref ref-type="bibr" rid="B6">Gatza et&#x20;al. (2010)</xref>; <xref ref-type="bibr" rid="B23">Parikh et&#x20;al. (2010)</xref>; <xref ref-type="bibr" rid="B24">Rydenfelt et&#x20;al. (2020)</xref>; <xref ref-type="bibr" rid="B25">Schubert et&#x20;al. (2018)</xref>). Merging these databases resulted in a total of 69 unique sets of pathway downstream genes, which were further used in our&#x20;study.</p>
<p>Of note, we chose to use in this study only genes representing downstream targets of signaling pathways instead of other available gene sets representing pathway players, <italic>e.g.,</italic> Reactome or KEGG (<xref ref-type="bibr" rid="B5">Fabregat et&#x20;al. (2018)</xref>; <xref ref-type="bibr" rid="B12">Kanehisa and Goto (2000)</xref>), or biological processes from Gene Ontology (<xref ref-type="bibr" rid="B1">Ashburner et&#x20;al. (2000)</xref>), since, biologically, only the expression of pathway downstream genes is expected to show coordinated changes.</p>
</sec>
<sec id="s2-2">
<title>2.2 Group Lasso</title>
<p>The Cox proportional hazards model is the most common survival prediction model for cancer prognosis. We denote by <italic>m</italic> the number of covariates (genes) and by <italic>n</italic> the number of patients. Moreover, <inline-formula id="inf1">
<mml:math id="m1">
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> is the (standardized) gene expression data matrix. For each patient, <italic>Y</italic>
<sub>
<italic>i</italic>
</sub> is the time of event, <italic>i</italic>&#x20;&#x3d; 1, &#x2026; , <italic>n</italic>, and <italic>C</italic>
<sub>
<italic>i</italic>
</sub> is its type: <italic>C</italic>
<sub>
<italic>i</italic>
</sub> &#x3d; 1 stands for deceased and <italic>C</italic>
<sub>
<italic>i</italic>
</sub> &#x3d; 0 for right-censored (removed from study) patients. The negative log-partial likelihood associated with the Cox model is then defined as<disp-formula id="e1">
<mml:math id="m2">
<mml:mi>&#x2113;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munder>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>log</mml:mi>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:msup>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<label>(1)</label>
</disp-formula>where <inline-formula id="inf2">
<mml:math id="m3">
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> is the (unknown) coefficient vector describing the dependence of patients&#x2019; survival on their gene expression: a positive element indicates that higher expression of the corresponding gene is associated with a poorer prognosis.</p>
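<p>As an illustration only, the negative log-partial likelihood in <xref ref-type="disp-formula" rid="e1">(1)</xref> can be evaluated with a few lines of NumPy. The sketch below is not the implementation used in this study; the function name and argument layout are assumptions made for the example, and the direct exponentiation is not protected against overflow for large linear predictors.</p>
<preformat><![CDATA[
```python
import numpy as np

def cox_neg_log_partial_likelihood(x, y, c, beta):
    """Negative log-partial likelihood of the Cox model, following Eq. (1).

    x    : (m, n) array, one column of covariates per patient
    y    : (n,) array of event or censoring times Y_i
    c    : (n,) array of indicators C_i (1 = deceased, 0 = right-censored)
    beta : (m,) coefficient vector
    """
    eta = x.T @ beta                      # linear predictors x_i . beta, shape (n,)
    total = 0.0
    for i in np.flatnonzero(c == 1):      # sum over observed events only
        at_risk = y >= y[i]               # risk set {j : Y_j >= Y_i}
        total += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return -total
```
]]></preformat>
<p>With all coefficients equal to zero, every patient in a risk set receives the same weight, so each observed event contributes the logarithm of the size of its risk set.</p>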
<p>We are interested in <bold>
<italic>&#x3b2;</italic>
</bold> minimizing <italic>&#x2113;</italic>(<bold>
<italic>&#x3b2;</italic>
</bold>) in <xref ref-type="disp-formula" rid="e1">(1)</xref>. The minimum is, however, not well defined for <italic>m</italic>&#x20;&#x226b; <italic>n</italic>, which is often the case in the cancer survival analysis setting. Tumor databases typically include several hundred patients, each characterized by over 20,000 gene expression values. A remedy is to add a regularization term, the most popular being ridge and lasso, or their combination into an elastic net (<xref ref-type="bibr" rid="B35">Zou and Hastie (2005)</xref>). In this work, we use the standard lasso penalty term<disp-formula id="e2">
<mml:math id="m4">
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:math>
<label>(2)</label>
</disp-formula>where <italic>&#x3bb;</italic> is a non-negative constant corresponding to the strength of the regularization. Finding <bold>
<italic>&#x3b2;</italic>
</bold> that minimizes<disp-formula id="e3">
<mml:math id="m5">
<mml:mi>&#x2113;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munder>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>log</mml:mi>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:msup>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
<label>(3)</label>
</disp-formula>
produces a sparse solution in which some of the coefficients are reduced to zero. However, while such regularization usually improves survival predictions, an important limitation remains: the selected genes vary excessively across models trained on even slightly different data (<italic>e.g.</italic>, different folds in a cross-validation).</p>
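<p>To make the sparsity mechanism concrete, the following sketch evaluates the lasso penalty in <xref ref-type="disp-formula" rid="e2">(2)</xref> and applies elementwise soft-thresholding, the proximal operator of the <italic>L</italic><sub>1</sub> norm. Soft-thresholding is shown here as one common way in which lasso solvers produce exact zeros; it is an assumption of this example and not necessarily the solver used in this study, and the function names are hypothetical.</p>
<preformat><![CDATA[
```python
import numpy as np

def lasso_penalty(beta, lam):
    """P_lambda(beta) = lam * ||beta||_1, following Eq. (2)."""
    return lam * np.sum(np.abs(beta))

def soft_threshold(beta, thresh):
    """Elementwise proximal operator of thresh * ||.||_1.

    Coefficients with absolute value at or below thresh become exactly zero;
    the remaining ones are shrunk towards zero by thresh.
    """
    return np.sign(beta) * np.maximum(np.abs(beta) - thresh, 0.0)
```
]]></preformat>
<p>Coefficients that lie close to the threshold can switch between zero and non-zero under small changes in the data, which is one way to picture the instability of the selected gene set.</p>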
<p>In addition to the classic lasso setting, here we explore the group lasso model, in which genes are grouped by molecular pathway. Distinct pathways, however, often share a number of genes. In the standard group lasso setting, each gene has only a single coefficient, so a gene that is truncated in one pathway is truncated in all of them. Simply duplicating the genes that occur in two or more pathways has been shown to solve this issue and is known as latent group lasso (<xref ref-type="bibr" rid="B11">Jacob et&#x20;al. (2009)</xref>; <xref ref-type="bibr" rid="B22">Obozinski et&#x20;al. (2011)</xref>). We therefore treat the pathways as non-overlapping, while the overall gene set contains repeated elements.</p>
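<p>The duplication step can be sketched as follows. This is a minimal example under stated assumptions, not the code used in this study: the function name is hypothetical, pathways are given as lists of row indices into the expression matrix, and genes that belong to no pathway are simply dropped.</p>
<preformat><![CDATA[
```python
import numpy as np

def expand_overlapping_groups(x, pathways):
    """Give every pathway its own copy of each of its genes.

    x        : (m, n) array of gene expression (genes in rows, patients in columns)
    pathways : list of lists of row indices into x; the lists may overlap
    Returns the expanded matrix and a list of non-overlapping index groups
    that refer to rows of the expanded matrix.
    """
    rows, groups, start = [], [], 0
    for genes in pathways:
        rows.extend(genes)                                     # shared genes are copied again
        groups.append(list(range(start, start + len(genes))))  # indices after expansion
        start += len(genes)
    return x[rows, :], groups
```
]]></preformat>
<p>After fitting on the expanded matrix, the effective coefficient of a gene shared by several pathways is the sum of the coefficients of its copies.</p>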
<p>More precisely, we have a partition of the index set {1, &#x2026; , <italic>m</italic>} into non-overlapping sets (groups). Consider a group <italic>g</italic> and <inline-formula id="inf3">
<mml:math id="m6">
<mml:mi mathvariant="bold">u</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>. Then <inline-formula id="inf4">
<mml:math id="m7">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> denotes its projection to <inline-formula id="inf5">
<mml:math id="m8">
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>: <inline-formula id="inf6">
<mml:math id="m9">
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> for <italic>i</italic>&#x20;&#x2208; <italic>g</italic>, and <inline-formula id="inf7">
<mml:math id="m10">
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:math>
</inline-formula> otherwise. Here, &#x7c;<italic>g</italic>&#x7c; is the number of elements in group <italic>g</italic>. In this work, we use the latent group lasso penalty<disp-formula id="e4">
<mml:math id="m11">
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:msqrt>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
<p>The Cox group lasso regression then minimizes the following loss function:<disp-formula id="e5">
<mml:math id="m12">
<mml:mi>&#x2113;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munder>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>log</mml:mi>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:msup>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:msqrt>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
<p>Adding <italic>R</italic>
<sub>
<italic>&#x3bb;</italic>
</sub>(<bold>
<italic>&#x3b2;</italic>
</bold>) to the loss function <italic>&#x2113;</italic>(<bold>
<italic>&#x3b2;</italic>
</bold>) effectively shrinks some of the coefficient groups to 0. Hence, one obtains a sparse model where only some of the covariate groups have non-zero coefficients (<xref ref-type="fig" rid="F1">Figure&#x20;1</xref>).</p>
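<p>As an illustration only, the loss (5) can be sketched in plain Python. The function names, the list-of-lists data layout, and the representation of each group as a list of coefficient indices are assumptions made for this sketch and are not taken from the implementation used in this work.</p>
<preformat><![CDATA[
```python
import math


def cox_neg_log_partial_likelihood(X, beta, times, events):
    """Negative Cox partial log-likelihood (first term of the loss)."""
    scores = [sum(x_k * b_k for x_k, b_k in zip(x, beta)) for x in X]
    loss = 0.0
    for i, (t_i, c_i) in enumerate(zip(times, events)):
        if c_i != 1:
            continue  # censored patients only enter through risk sets
        # Risk set of patient i: all patients j with Y_j >= Y_i
        risk = sum(math.exp(scores[j])
                   for j, t_j in enumerate(times) if t_j >= t_i)
        loss -= scores[i] - math.log(risk)
    return loss


def group_lasso_penalty(beta, groups, lam):
    """lam * sum over groups of sqrt(|g|) * ||beta_g||_2."""
    total = 0.0
    for idx in groups:
        norm = math.sqrt(sum(beta[k] ** 2 for k in idx))
        total += math.sqrt(len(idx)) * norm
    return lam * total


def cox_group_lasso_loss(X, beta, times, events, groups, lam):
    """Sum of the partial-likelihood term and the group lasso penalty."""
    return (cox_neg_log_partial_likelihood(X, beta, times, events)
            + group_lasso_penalty(beta, groups, lam))
```
]]></preformat>
<p>Because the penalty uses the unsquared norm of each group, it is not differentiable when a whole group is zero, which is what allows entire groups to be removed from the model.</p>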
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Typical solutions minimizing the Cox loss function with either the lasso <bold>(left)</bold> or the group lasso regression term <bold>(right)</bold>, here computed for the UCEC gene expression data: these two models predict survival of cancer patients based on expression values for genes from downstream targets of 69 signaling pathways. The number of inputs varies between the two examples since the implementation of latent group lasso duplicates genes that appear in more than one group (<italic>i.e.</italic>, set of signaling pathway downstream genes).</p>
</caption>
<graphic xlink:href="fgene-12-771301-g001.tif"/>
</fig>
<p>Since many genes have correlated expression, the full set of genes is generally not necessary to achieve good model accuracy. Typically, the group lasso is expected to achieve an accuracy similar to that of the standard lasso; however, we hypothesize that it will provide both better interpretability and higher congruence across folds. Since our gene grouping is based on cancer-associated signaling pathways, the selected groups should be informative of cancer-driving molecular processes.</p>
</sec>
<sec id="s2-3">
<title>2.3&#x20;Multi-Task Model</title>
<p>The accuracy of survival prediction for a single cancer type can be limited by various factors, <italic>e.g.</italic>, a low number of patients, noise, or a high proportion of censored patients. The goal of the multi-task model that we introduce here is to improve that accuracy by forcing the <bold>
<italic>&#x3b2;</italic>
</bold> weights of gene contributions to survival to be shared, up to re-scaling coefficients, between different cancer types. We design a penalty that couples gene contributions pathway by pathway, assuming that gene contributions to pathway activities are constant and that gene contributions to survival, which is driven by pathway deregulation, should therefore be proportional across cancer&#x20;types.</p>
<p>Let us consider two cancers with their corresponding loss functions <inline-formula id="inf8">
<mml:math id="m13">
<mml:mi>&#x2113;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, <italic>j</italic>&#x20;&#x3d; 1, 2. To force a coupling between the coefficients <bold>
<italic>&#x3b2;</italic>
</bold>
<sup>1</sup> and <bold>
<italic>&#x3b2;</italic>
</bold>
<sup>2</sup>, we introduce a new penalty term:<disp-formula id="e6">
<mml:math id="m14">
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3bc;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>12</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>21</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mspace width="2em"/>
<mml:mtext>where</mml:mtext>
<mml:mspace width="1em"/>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="&#x2016;" close="&#x2016;">
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mfrac>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(6)</label>
</disp-formula>
</p>
<p>Here, <italic>&#x3bc;</italic> is a hyperparameter corresponding to the strength of the coupling term <italic>C</italic>
<sub>
<italic>&#x3bc;</italic>
</sub>(<bold>
<italic>&#x3b2;</italic>
</bold>
<sup>1</sup>, <bold>
<italic>&#x3b2;</italic>
</bold>
<sup>2</sup>) and <italic>I</italic> denotes the indicator function.</p>
<p>The penalty <italic>C</italic>
<sub>
<italic>&#x3bc;</italic>
</sub> has the following properties:<list list-type="simple">
<list-item>
<p>1) for each pathway <italic>g</italic> actively contributing to patients&#x2019; survival, the penalty matches <inline-formula id="inf9">
<mml:math id="m15">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula id="inf10">
<mml:math id="m16">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>,</p>
</list-item>
<list-item>
<p>2) normalization with <inline-formula id="inf11">
<mml:math id="m17">
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:mo>/</mml:mo>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:math>
</inline-formula> allows for matching in a situation when the same pathway is differentially predictive for survival in two cancers,</p>
</list-item>
<list-item>
<p>3) if a pathway is not important for patients&#x2019; survival in one of the cancers, the indicator function will remove corresponding coefficients from the matching penalty,&#x20;and</p>
</list-item>
<list-item>
<p>4) the penalty is symmetric.</p>
</list-item>
</list>
</p>
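<p>The four properties listed above can be checked on a minimal sketch of the coupling penalty (6). The sketch assumes that the indicator function equals one when the coefficient vector of a group is non-zero and zero otherwise, and that groups are given as lists of coefficient indices; both conventions, and all function names, are assumptions of this example.</p>
<preformat><![CDATA[
```python
import math


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def a_term(beta_i_g, beta_j_g):
    """Norm of beta_g^i minus beta_g^j rescaled to the norm of beta_g^i."""
    n_i, n_j = l2_norm(beta_i_g), l2_norm(beta_j_g)
    if n_i == 0.0 or n_j == 0.0:
        return 0.0  # indicator: group inactive in one cancer (assumption)
    scale = n_i / n_j
    return l2_norm([b_i - b_j * scale
                    for b_i, b_j in zip(beta_i_g, beta_j_g)])


def coupling_penalty(beta1, beta2, groups, mu):
    """mu * sqrt(sum over groups of |g| * (A_g^12 + A_g^21)^2)."""
    total = 0.0
    for idx in groups:
        b1 = [beta1[k] for k in idx]
        b2 = [beta2[k] for k in idx]
        total += len(idx) * (a_term(b1, b2) + a_term(b2, b1)) ** 2
    return mu * math.sqrt(total)
```
]]></preformat>
<p>In this sketch, two coefficient vectors that are proportional within every shared group with a positive factor incur no penalty, and swapping the two cancers leaves the value unchanged.</p>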
<p>Finally, we find <bold>
<italic>&#x3b2;</italic>
</bold>
<sup>1</sup> and <bold>
<italic>&#x3b2;</italic>
</bold>
<sup>2</sup> minimizing the following loss function to produce maximum partial likelihood estimates of the model parameters:<disp-formula id="e7">
<mml:math id="m18">
<mml:mi>&#x2113;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x2113;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3bc;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:math>
<label>(7)</label>
</disp-formula>
</p>
<p>The loss function (7) can be extended to an arbitrary number <italic>k</italic> of cancer types. Note that the number of hyperparameters then grows quadratically, since there are <italic>k</italic> terms <italic>R</italic>
<sub>
<italic>&#x3bb;</italic>
</sub>, and <italic>k</italic>(<italic>k</italic>&#x20;&#x2212; 1)/2 terms <italic>C</italic>
<sub>
<italic>&#x3bc;</italic>
</sub>.</p>
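<p>To make the growth concrete, the following short sketch counts the hyperparameters for <italic>k</italic> cancer types, assuming one regularization strength per cancer type and one coupling strength per pair of cancer types. The function name is hypothetical.</p>
<preformat><![CDATA[
```python
def count_hyperparameters(k):
    """k group lasso strengths plus one coupling strength per pair."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k + k * (k - 1) // 2


# Two cancer types give 3 hyperparameters; 30 types would give 465.
print(count_hyperparameters(2), count_hyperparameters(30))
```
]]></preformat>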
</sec>
<sec id="s2-4">
<title>2.4 Assessing Model Accuracy and Reproducibility</title>
<p>We define a hazard score <bold>x</bold>
<sub>
<italic>i</italic>
</sub> &#x22c5;<bold>
<italic>&#x3b2;</italic>
</bold> for each patient <italic>i</italic>&#x20;&#x3d; 1, &#x2026; , <italic>n</italic>. In this work, we used the <italic>concordance index</italic> (<italic>c</italic>-value) on the test data to evaluate model accuracy (<xref ref-type="bibr" rid="B27">Steck et&#x20;al. (2008)</xref>). The <italic>c</italic>-value is equal to the proportion of pairs of observations where an event occurred first for an individual with a higher hazard score predicted by the&#x20;model.</p>
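<p>A minimal sketch of the <italic>c</italic>-value computation is given below. It assumes that a pair of patients is comparable when the patient with the shorter time experienced the event, and that tied hazard scores count as one half; the tie-handling convention and the function name are assumptions of this example rather than details stated above.</p>
<preformat><![CDATA[
```python
def concordance_index(times, events, scores):
    """Share of comparable pairs ranked correctly by the hazard score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0  # earlier event has higher score
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tie convention (assumption)
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```
]]></preformat>
<p>Under these conventions, a model that assigns the same score to every patient obtains a value of 0.5, which corresponds to the random level.</p>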
<p>The interpretability of the model is conditional on how consistent the pathway selection is over different random seeds. As a measure of consistency, we compute the Tucker&#x2019;s congruence coefficient (<xref ref-type="bibr" rid="B30">Tucker (1951)</xref>), and average it over all pairs of <bold>
<italic>&#x3b2;</italic>
</bold>. To assess its significance, we carry out a paired <italic>t</italic>-test over the congruence of non-overlapping pairs of <bold>
<italic>&#x3b2;</italic>
</bold>.</p>
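<p>The averaged congruence can be sketched as follows, using the usual definition of Tucker&#x2019;s congruence coefficient as the inner product of two vectors divided by the product of their norms. Returning zero when one of the vectors is entirely zero is a convention chosen for this sketch, and the function names are hypothetical.</p>
<preformat><![CDATA[
```python
import math
from itertools import combinations


def tucker_congruence(x, y):
    """Inner product of x and y divided by the product of their norms."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if den == 0.0:
        return 0.0  # convention for an all-zero vector (assumption)
    return num / den


def mean_pairwise_congruence(betas):
    """Average congruence over all unordered pairs of coefficient vectors."""
    pairs = list(combinations(betas, 2))
    if not pairs:
        raise ValueError("need at least two coefficient vectors")
    return sum(tucker_congruence(x, y) for x, y in pairs) / len(pairs)
```
]]></preformat>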
</sec>
<sec id="s2-5">
<title>2.5 Model Optimization</title>
<p>To find <bold>
<italic>&#x3b2;</italic>
</bold> minimizing the loss functions of lasso, group lasso and multi-task group lasso models, we used the Adam optimizer implemented in the PyTorch package (<xref ref-type="bibr" rid="B15">Kingma and Ba (2014)</xref>). Moreover, in case of group lasso or multi-task group lasso, we truncated <bold>
<italic>&#x3b2;</italic>
</bold>
<sub>
<italic>g</italic>
</sub> to zero when all elements from a group <italic>g</italic> were below a threshold of 0.001 in absolute&#x20;value.</p>
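<p>The truncation step can be sketched in a few lines. The sketch assumes that groups are given as lists of indices into a flat coefficient list, as they would be after the gene duplication used by latent group lasso; the function name is hypothetical.</p>
<preformat><![CDATA[
```python
def truncate_groups(beta, groups, threshold=1e-3):
    """Zero every group whose coefficients are all below the threshold."""
    out = list(beta)
    for idx in groups:
        if all(abs(beta[k]) < threshold for k in idx):
            for k in idx:
                out[k] = 0.0
    return out


beta = [0.0005, -0.0002, 0.5, 0.0001]
print(truncate_groups(beta, [[0, 1], [2, 3]]))
```
]]></preformat>
<p>In this example the first group is set to zero, whereas the second group is kept intact because one of its coefficients exceeds the threshold.</p>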
<sec id="s2-5-1">
<title>Selection of Hyper-Parameters</title>
<p>For each cancer type, we selected the hyperparameter <italic>&#x3bb;</italic> using a 10-fold cross-validated grid search over a suitable range on&#x20;the&#x20;training set. We then performed 100 random 80&#x2013;20&#x20;training-test splits, computed <bold>
<italic>&#x3b2;</italic>
</bold> on the training sets and evaluated the <italic>c</italic>-value on the test sets. Finally, we computed the paired <italic>t</italic>-test statistics value and its associated <italic>p</italic>-value, along with a congruence coefficient for both lasso and group lasso&#x20;cases.</p>
<p>In the multi-task setting, along with <italic>&#x3bb;</italic>
<sub>1</sub> and <italic>&#x3bb;</italic>
<sub>2</sub> parameters, we select the best value of the coupling parameter <italic>&#x3bc;</italic>, which we do in a similar cross-validation loop as for the standard lasso and group lasso. With a growing number of tasks, a grid search over multiple hyperparameters becomes computationally demanding or even unfeasible. An implementation of a random search then provides a possible solution. To determine <italic>&#x3bb;</italic>
<sub>
<italic>j</italic>
</sub> in the multi-task setting, 30 values were selected randomly from a normal distribution with the mean set as the <italic>&#x3bb;</italic>
<sub>
<italic>j</italic>
</sub> previously calculated from standard group lasso and a standard deviation of 0.1<italic>&#x3bb;</italic>
<sub>
<italic>j</italic>
</sub>. Additionally, 30 values for <italic>&#x3bc;</italic> were randomly selected from a half-normal distribution around 0 with standard deviation 0.5 (chosen heuristically). By selecting the best cross-validated set of hyperparameters per task, in the Results section, we compared the performance (<italic>c</italic>-values) of the multi-task model with its single-task counterpart.</p>
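<p>A possible sketch of the random search sampling is shown below. How the 30 draws of each hyperparameter are combined is not detailed above; the sketch assumes that they are grouped into 30 candidate tuples. The function name, the dictionary layout, and the seed argument are likewise assumptions.</p>
<preformat><![CDATA[
```python
import random


def sample_candidates(lambda_single, n_samples=30, mu_sd=0.5, seed=0):
    """Draw candidate (lambda_1, ..., lambda_k, mu) tuples."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_samples):
        # Normal draw centred on each single-task lambda, sd = 0.1 * lambda
        lambdas = [rng.gauss(lam, 0.1 * lam) for lam in lambda_single]
        # Half-normal draw: absolute value of a zero-mean normal draw
        mu = abs(rng.gauss(0.0, mu_sd))
        candidates.append({"lambdas": lambdas, "mu": mu})
    return candidates
```
]]></preformat>
<p>Each candidate would then be scored in the cross-validation loop, and the best-scoring candidate retained per task.</p>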
</sec>
</sec>
<sec id="s2-6">
<title>2.6 Training and Testing a Multi-Task Model</title>
<sec id="s2-6-1">
<title>Training on Synthetic Data</title>
<p>To check the validity of our multi-task learning approach and corresponding code, we simulated the following synthetic data set: Two &#x201c;toy&#x201d; cancer gene expression and survival data sets <italic>T</italic>
<sub>1</sub> and <italic>T</italic>
<sub>2</sub> drawn from a normal sampling distribution generated from two TCGA cancers COADREAD and STAD. Both <italic>T</italic>
<sub>1</sub> and <italic>T</italic>
<sub>2</sub> comprised nearly 10,000 genes, and 300 and 200 patients, respectively. Moreover, we assumed that the patients&#x2019; survival in each toy cancer type is fully determined by two pathways, one of which is shared between the two toy cancer types. The corresponding &#x201c;true&#x201d; <bold>
<italic>&#x3b2;</italic>
</bold> coefficients were obtained as the first principal component coefficients of the genes included in the pathway over the combined COADREAD and STAD data&#x20;sets.</p>
<p>To each patient <italic>i</italic>, we randomly assigned either event <italic>C</italic>
<sub>
<italic>i</italic>
</sub> &#x3d; 1 (with probability 70%) or censorship <italic>C</italic>
<sub>
<italic>i</italic>
</sub> &#x3d; 0 (30%). The score <bold>
<italic>x</italic>
</bold>
<sub>
<italic>i</italic>
</sub> &#x22c5;<bold>
<italic>&#x3b2;</italic>
</bold> is an indicator of the patient&#x2019;s risk. In case all patients were deceased, we could use &#x2212; <bold>x</bold>
<sup>
<italic>T</italic>
</sup>
<bold>
<italic>&#x3b2;</italic>
</bold> as the time-of-event <bold>Y</bold> (since actual values do not matter in the Cox model (1), only their ordering). However, since censorship only provides a lower bound on the time of death, we randomly decreased the censored patients&#x2019; times <italic>Y</italic>
<sub>
<italic>i</italic>
</sub> as a function of the number of patients with a higher&#x20;score.</p>
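<p>The assignment of events and times can be illustrated with the sketch below. The exact rule used to decrease the censored times is not specified above, so the rule in this sketch (subtracting a uniform random amount scaled by the fraction of patients with a higher score) is purely hypothetical, as are the function name and the seed argument.</p>
<preformat><![CDATA[
```python
import random


def simulate_survival(scores, event_prob=0.7, seed=0):
    """Assign event indicators and times from hazard scores."""
    rng = random.Random(seed)
    n = len(scores)
    events = [1 if rng.random() < event_prob else 0 for _ in range(n)]
    # Only the ordering matters in the Cox model: higher score, earlier time
    times = [-s for s in scores]
    for i in range(n):
        if events[i] == 0:
            higher = sum(1 for s in scores if s > scores[i])
            # Hypothetical rule: censoring is observed before the event time
            times[i] -= rng.random() * higher / n
    return times, events
```
]]></preformat>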
<p>We trained individual latent group lasso and multi-task models. After hyperparameter selection, 100&#x20;80&#x2013;20 splits were performed to calculate significance.</p>
</sec>
<sec id="s2-6-2">
<title>Training on TCGA Data</title>
<p>We examined all possible pairs between 30 cancer types in the TCGA data set. For each pair, we selected hyperparameters using a 10-fold cross-validated random search. We then performed 30&#x20;80&#x2013;20&#x20;training-test splits, computed <bold>
<italic>&#x3b2;</italic>
</bold> on the training sets and evaluated the <italic>c</italic>-value on the test sets for both cancers. We computed the paired <italic>t</italic>-test statistics value and its associated <italic>p</italic>-value for each pair with respect to the latent group lasso without multi-tasking. Finally, the false discovery rate (FDR) correction for the number of pairs tested per cancer was applied.</p>
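<p>The statistical comparison can be sketched as follows. The FDR procedure is not named above; the sketch assumes the Benjamini&#x2013;Hochberg procedure. It computes only the paired <italic>t</italic> statistic, since obtaining the <italic>p</italic>-value requires the <italic>t</italic>-distribution, which is outside the Python standard library. Function names are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def paired_t_statistic(a, b):
    """t statistic of the paired differences a_i - b_i."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, p_values[k] * m / rank)
        adjusted[k] = running_min
    return adjusted
```
]]></preformat>
<p>For a given target cancer, the list passed to the second function would contain one <italic>p</italic>-value per partner cancer type.</p>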
</sec>
</sec>
</sec>
<sec id="s3">
<title>3 Results</title>
<sec id="s3-1">
<title>3.1 Latent Group Lasso</title>
<p>Despite the popularity of group lasso, we could not find a comparison between the standard lasso and group lasso models for cancer survival prediction on gene expression data. We therefore first evaluated and compared the accuracies of these two models on 16 cancer types from the TCGA database with at least 300 patients per set (see <xref ref-type="sec" rid="s2-1">Section 2.1</xref> for data set description). Out of the 16 cancers tested, five had a significantly higher prediction accuracy (<italic>c</italic>-value) with simple lasso, seven had a significantly higher accuracy with latent group lasso, and there was no significant difference for the remaining four cancers (<xref ref-type="fig" rid="F2">Figure&#x20;2A</xref>). Also, model reproducibility measured through the averaged congruence coefficients (see <xref ref-type="sec" rid="s2-4">Section 2.4</xref>) was better for the&#x20;group lasso model for 12 out of the 16 cancers tested&#x20;(<xref ref-type="fig" rid="F2">Figure&#x20;2B</xref>). The most frequently selected pathways across all cancer types over all random tests (<italic>i.e.,</italic> 16 &#xd7; 100 data points) are plotted in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>. We observed that the most&#x20;common pathways are the stromal up- (63%) and downtake (58%).</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>The average test prediction accuracy (<italic>c</italic>-value, <bold>(A)</bold>) and model stability (average congruence coefficients, <bold>(B)</bold>) for the standard and group lasso models, evaluated on 16 cancers from the TCGA data set. The error bars represent standard deviations. The asterisks (&#x2a;) mark significant <italic>p</italic>-values at the <italic>p</italic>&#x20;&#x3c;0.05&#x20;level.</p>
</caption>
<graphic xlink:href="fgene-12-771301-g002.tif"/>
</fig>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>The most commonly selected pathways by group lasso across 16 cancer types with more than 300 samples. The frequency is computed over all random tests, totalling 1,600 data points.</p>
</caption>
<graphic xlink:href="fgene-12-771301-g003.tif"/>
</fig>
<p>Our results showed a very modest improvement in prediction accuracy from applying latent group lasso to cancer survival; however, we hypothesized that this accuracy could be improved by adding a multi-task term to the loss function to allow sharing information across cancer&#x20;types.</p>
</sec>
<sec id="s3-2">
<title>3.2 Validating the Multi-Task Penalty on Synthetic Data</title>
<p>To explore the efficacy of the multi-task penalty (7) we designed, we first applied our approach to synthetic data sets <italic>T</italic>
<sub>1</sub> and <italic>T</italic>
<sub>2</sub> comprising 300 and 200 samples respectively (see <xref ref-type="sec" rid="s2-6">Section 2.6</xref> for the detailed data set description). Our simulation results showed that while the latent group lasso without multi-tasking generally selected the correct pathways for <italic>T</italic>
<sub>1</sub> (pathways 1, 2) and <italic>T</italic>
<sub>2</sub> (pathways 1, 3) the model also assigned non-zero coefficients to a number of the irrelevant pathways (<xref ref-type="fig" rid="F4">Figures 4A,B</xref>). However, when the multi-task penalty was added, the number of irrelevant pathways included in the model usually reduced for both data sets (<xref ref-type="fig" rid="F4">Figures 4C,D</xref>), and no correctly included pathways were lost. Furthermore, the average <italic>c</italic>-value increased significantly for <italic>T</italic>
<sub>1</sub> when the multi-task penalty was included, and did not change significantly for <italic>T</italic>
<sub>2</sub> (<xref ref-type="fig" rid="F4">Figures 4E,F</xref>). From these results, we concluded that the multi-task penalty we designed was acting as intended. We note, however, that the congruence of the models across folds decreased for both sets, and significantly so for&#x20;<italic>T</italic>
<sub>2</sub>.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Performance of the multi-task model on a synthetic data set. Box plots showing the range of model coefficients <bold>
<italic>&#x3b2;</italic>
</bold> by pathway for standard group lasso <bold>(A&#x2013;B)</bold> and multi-task group lasso <bold>(C&#x2013;D)</bold> on <italic>T</italic>
<sub>1</sub>&#x20;and <italic>T</italic>
<sub>2</sub> data sets. Out of the 69 total pathways, only pathways with at least one non-zero coefficient are shown. By design, the activities of pathways 1 and 2, and of pathways 1 and 3, were predictive of patient survival for <italic>T</italic>
<sub>1</sub> and <italic>T</italic>
<sub>2</sub>, respectively. The average congruence coefficient of <bold>
<italic>&#x3b2;</italic>
</bold> is 0.81 (<italic>T</italic>
<sub>1</sub>) and 0.96 (<italic>T</italic>
<sub>2</sub>) for single tasking, and 0.80 (<italic>T</italic>
<sub>1</sub>) and 0.91 (<italic>T</italic>
<sub>2</sub>) for multi-tasking. <bold>(E&#x2013;F)</bold> <italic>c</italic>-values and box plots of the paired difference for 100 random seeds for single and multi-tasking for <italic>T</italic>
<sub>1</sub> and <italic>T</italic>
<sub>2</sub> synthetic data. Note that <italic>T</italic>
<sub>1</sub> and <italic>T</italic>
<sub>2</sub> are sorted independently, so their random seed numberings do not correspond. The <italic>t</italic>-test <italic>p</italic>-values are 1.35&#x22c5;10<sup>&#x2013;7</sup> (<italic>T</italic>
<sub>1</sub>) and 0.49&#x20;(<italic>T</italic>
<sub>2</sub>).</p>
</caption>
<graphic xlink:href="fgene-12-771301-g004.tif"/>
</fig>
</sec>
<sec id="s3-3">
<title>3.3&#x20;Multi-Task Group Lasso Model on the TCGA Data</title>
<p>To check the efficacy of the multi-task group lasso model for survival prediction, we applied it to 30 TCGA cancer data sets (see <xref ref-type="sec" rid="s2-1">Section 2.1</xref> for more details). For each pair of cancer types, we compared the resulting model accuracies (<italic>c</italic>-values) calculated for 100 random splits for the individual group lasso and multi-task group lasso models. Although, based on the results of the model validation on synthetic data, we expected the multi-task setting to improve predictions, few significant differences were observed after multiple testing correction (<xref ref-type="fig" rid="F5">Figure&#x20;5</xref>). For one cancer type, LUSC, significant improvements were observed when the cancer was paired with a number of other cancer types. Further, while not significant after multiple testing correction, significant uncorrected differences were observed for PRAD. In particular, combining PRAD with CHOL, COADREAD, or GBM each led to an improvement in <italic>c</italic>-value of over 0.08. Finally, several other combinations showed a marginally significant improvement, <italic>e.g.</italic>, BLCA with STAD, KIRC with KIRP, or UCEC with COADREAD.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Heatmap showing the mean paired difference in <italic>c</italic>-value between single and multi-task training. Positive values correspond to the improvement of model prediction accuracy with multi-tasking. The rows correspond to cancer types for which the <italic>c</italic>-values were calculated and the columns to cancer types with which the target cancer was paired for the multi-task training. Asterisks (&#x2a;) indicate significance with the FDR corrected <italic>p</italic>-value <italic>p</italic>&#x20;&#x3c;0.05.</p>
</caption>
<graphic xlink:href="fgene-12-771301-g005.tif"/>
</fig>
<p>No significant improvements in the model stability, measured by congruence between model coefficients <bold>
<italic>&#x3b2;</italic>
</bold>, were observed with the addition of multi-tasking (<xref ref-type="fig" rid="F6">Figure&#x20;6</xref>). The mean congruence decreased for almost every cancer pair tested.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Heatmap showing the mean paired difference in congruence between single and multi-task training. Positive values correspond to the improvement of model stability with multi-tasking. The row gives the cancer for which the congruence was calculated and the columns give the cancer with which the target cancer was paired for the multi-task training. Asterisks (&#x2a;) indicate significance with the FDR corrected <italic>p</italic>-value <italic>p</italic>&#x20;&#x3c;0.05.</p>
</caption>
<graphic xlink:href="fgene-12-771301-g006.tif"/>
</fig>
<p>Of note, other multi-tasking approaches using the same data and similar validation strategies, such as VAECox (<xref ref-type="bibr" rid="B14">Kim et&#x20;al. (2020)</xref>), have reported similar results, with only limited improvements over standard lasso. VAECox observed a microaverage concordance across 10 cancers from TCGA of 0.649; using the same microaverage method for those 10 cancers, our multi-task approach gave results in the range 0.645&#x2013;0.663, depending on the paired cancer&#x20;type.</p>
</sec>
</sec>
<sec id="s4">
<title>4 Discussion</title>
<p>In this paper, we assessed the efficacy of different regularization penalties for linear models for survival prediction on cancer gene expression data. First, we compared standard lasso with latent group lasso. This analysis showed a very slight overall improvement in survival prediction accuracy when using molecular pathways as <italic>a priori</italic> known groups compared to simple lasso. In short, the prediction accuracy significantly increased for seven cancer types, significantly decreased for five, and did not differ significantly between the two methods for the remainder. This suggested that latent group lasso alone does not meaningfully improve cancer survival predictions beyond what can be achieved with na&#xef;ve lasso when using gene expression data. Despite these modest results, we observed that model stability, <italic>i.e.</italic>, congruence between model coefficients when training using different random seeds, appeared to be higher for latent group lasso regularization, suggesting potential improvements in biological interpretability.</p>
<p>Next, we tested our multi-tasking model on a synthetic data set designed so that it closely mimicked real cancer data (including strong gene collinearity). We used two toy sets drawn from a sample distribution associated with COADREAD and STAD, and then determined the patients&#x2019; hazard scores from two overlapping gene groups each. We randomly censored 30% of the patients and adjusted for their survival time uncertainty. In order to leverage similarities between cancers, we introduced a rather low number of patients, 300 and 200, respectively. Our model showed a comparably high <italic>c</italic>-value for both toy cancers separately, and a significant improvement in the accuracy of the first set after multi-tasking. Moreover, fewer irrelevant pathways were generally selected with multi-tasking compared to the univariate model, though the congruence decreased, significantly for the second data set. Therefore, we would expect similar improvements in real data sets, especially if they comprise a low number of patients.</p>
<p>However, in the multi-tasking test on experimental data, we saw relevant significant improvements in prediction accuracy measured by <italic>c</italic>-value with only one cancer type, LUSC. For this type of cancer, we witnessed extremely poor performance of single-task group lasso regression on gene expression data, generally giving a <italic>c</italic>-value of around 0.52, marginally above the random level (0.5). This value improved slightly with multi-tasking, up to 0.53. Further, we observed the largest, albeit not significant, improvement in <italic>c</italic>-value for PRAD. However, the comparatively high survival rate (10 deaths for 498 patients) causes a large variance in the <italic>c</italic>-values due to the random fold splitting. The improvement in <italic>c</italic>-value for both LUSC and PRAD occurred when they were paired with many different cancers, and the improvements were of a similar magnitude across the board. This suggested that the benefit here did not come from finding a similar cancer to leverage, but rather that any extra available information was benefitting survival models, which are inherently difficult to build from expression data. Our initial intuition that survival models for cancer types sharing similar features, such as ovarian and cervical cancers or uveal and skin melanomas, would benefit from multi-tasking was not confirmed.</p>
<p>We hypothesize that this may depend on the noise in the data and measurement uncertainties, or simply the limitation of gene expression prediction power. We cannot exclude however that different, possibly non-linear cancer survival models could benefit from multi-tasking and prior knowledge on pathway downstream genes. We are going to explore this type of approaches in our future&#x20;work.</p>
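<p>To make the <italic>c</italic>-value comparisons above more concrete, the following Python sketch computes a Harrell-style concordance index. It assumes the common pairwise definition, which may differ in detail from the variant used in this study, and the function name <monospace>concordance_index</monospace> is hypothetical.</p>
<preformat>
```python
import numpy as np


def concordance_index(time, event, risk):
    """Fraction of comparable patient pairs that the risk score orders correctly."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(len(time)):
            if time[j] > time[i]:      # patient i died while j was still followed
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # higher risk and shorter survival
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied scores count as half
    # Note: this raises ZeroDivisionError if there are no comparable pairs.
    return concordant / comparable


times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [4.0, 3.0, 2.0, 1.0])
reversed_order = concordance_index(times, events, [1.0, 2.0, 3.0, 4.0])
constant = concordance_index(times, events, [1.0, 1.0, 1.0, 1.0])
```
</preformat>
<p>Under this definition every comparable pair must be anchored by an observed death, so with only 10 deaths, as in PRAD, the estimate rests on few anchoring patients and can vary considerably depending on how those deaths fall across folds.</p>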
<p>For our linear group lasso-based approach, we also tested a number of other potential coupling penalty terms, including very simple ones such as penalizing the mean absolute difference in coefficients (<xref ref-type="table" rid="T1">Table&#x20;1</xref>). None of these approaches were as successful on our synthetic data as the one that has been presented in this work, but we include them for completeness.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Alternative coupling penalty terms that were given preliminary investigation using synthetic&#x20;data.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Coupling term</th>
<th align="center">Preliminary results</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<inline-formula id="inf12">
<mml:math id="m19">
<mml:mi>&#x3bc;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:msqrt>
<mml:mfenced open="&#x2016;" close="&#x2016;">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>
</td>
<td align="left">This term was discarded as it did not allow for different scaling for <inline-formula id="inf13">
<mml:math id="m20">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula id="inf14">
<mml:math id="m21">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> between cancer types</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf15">
<mml:math id="m22">
<mml:mi>&#x3bc;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:msqrt>
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>
</td>
<td align="left">This term allowed matching of <inline-formula id="inf16">
<mml:math id="m23">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula id="inf17">
<mml:math id="m24">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> as intended, but did not show improvement of <italic>c</italic>-value on synthetic data</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf18">
<mml:math id="m25">
<mml:mi>&#x3bc;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:msqrt>
<mml:mfenced open="&#x2016;" close="&#x2016;">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
</mml:msqrt>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
</mml:msqrt>
</mml:math>
</inline-formula>
</td>
<td align="left">This term allowed matching of <inline-formula id="inf19">
<mml:math id="m26">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula id="inf20">
<mml:math id="m27">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> as intended and showed improvement of <italic>c</italic>-value on synthetic data. However, the improvement was slightly worse than for the penalty we proposed in <xref ref-type="disp-formula" rid="e7">(7)</xref> and used in this study</td>
</tr>
</tbody>
</table>
</table-wrap>
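<p>The three alternative coupling terms in <xref ref-type="table" rid="T1">Table&#x20;1</xref> can be sketched numerically as follows. This is an illustrative Python transcription of the table formulas for two tasks, not the code used in this study: the function name <monospace>coupling_penalties</monospace>, the list-of-arrays layout for the per-group coefficients, and the small constant <monospace>EPS</monospace> guarding against division by zero are assumptions.</p>
<preformat>
```python
import numpy as np

EPS = 1e-12  # assumption: avoids division by zero for all-zero groups


def coupling_penalties(beta1, beta2, mu=1.0):
    """Return the three Table 1 coupling terms for two tasks.

    beta1, beta2: lists of 1-D arrays, one coefficient vector per group g.
    """
    diff = cosine = scaled = 0.0
    for b1, b2 in zip(beta1, beta2):
        w = np.sqrt(b1.size)  # group-size weight sqrt(|g|)
        n1 = np.linalg.norm(b1)
        n2 = np.linalg.norm(b2)
        # Row 1: Euclidean distance between the two coefficient vectors.
        diff += w * np.linalg.norm(b1 - b2)
        # Row 2: absolute value of one minus the cosine similarity.
        cosine += w * abs(1.0 - b1 @ b2 / (n1 * n2 + EPS))
        # Row 3: distance between the normalized vectors, rescaled by
        # the square roots of both norms.
        unit_gap = np.linalg.norm(b1 / (n1 + EPS) - b2 / (n2 + EPS))
        scaled += w * unit_gap * np.sqrt(n1) * np.sqrt(n2)
    return mu * diff, mu * cosine, mu * scaled


beta_a = [np.array([1.0, 2.0]), np.array([0.5, -0.5, 1.0])]
beta_b = [3.0 * b for b in beta_a]  # same directions, different scale
diff, cosine, scaled = coupling_penalties(beta_a, beta_b)
```
</preformat>
<p>For coefficient vectors that differ only by a positive scale factor, as in this example, the second and third terms are numerically zero while the first is not, which is consistent with the remark in the table that the first term does not allow for different scaling between cancer types.</p>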
<p>Although our approach could theoretically be extended to triplets of cancer types and larger groups, we do not present these results here. Indeed, several tests on cancer triplets did not show strong positive results, which was expected given the moderate performance of our new model on cancer&#x20;pairs.</p>
<p>To sum up, in this study we addressed the question of building cancer survival models on gene expression data when incorporating both information about pathway downstream genes and multi-tasking across different cancer types. For the majority of cancer types we tested, the performance of our multi-task model was generally comparable with that of the latent group lasso and classic lasso approaches. However, we would advocate for the use of the individual latent group lasso because of its improved model stability and interpretability.</p>
</sec>
</body>
<back>
<sec id="s5">
<title>Data Availability Statement</title>
<p>The code to run multi-task group lasso on The Cancer Genome Atlas (TCGA) and synthetic data sets is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/BoevaLab/Group_Lasso_and_Multitask">https://github.com/BoevaLab/Group_Lasso_and_Multitask</ext-link>.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>GM and VB devised the project. GM designed the model and the computational framework and analyzed the data. DR contributed to the design and implementation of the research, and to the analysis of the results. GM wrote the manuscript. All authors provided approval for publication of the content.</p>
</sec>
<sec sec-type="COI-statement" id="s7">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashburner</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>J.&#x20;A.</given-names>
</name>
<name>
<surname>Botstein</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Butler</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Cherry</surname>
<given-names>J.&#x20;M.</given-names>
</name>
<etal/>
</person-group> (<year>2000</year>). <article-title>Gene Ontology: Tool for the Unification of Biology</article-title>. <source>Nat. Genet.</source> <volume>25</volume>, <fpage>25</fpage>&#x2013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1038/75556</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cox</surname>
<given-names>D. R.</given-names>
</name>
</person-group> (<year>1972</year>). <article-title>Regression Models and Life-Tables</article-title>. <source>J.&#x20;R. Stat. Soc. Ser. B (Methodological)</source> <volume>34</volume>, <fpage>187</fpage>&#x2013;<lpage>202</lpage>. <pub-id pub-id-type="doi">10.1111/j.2517-6161.1972.tb00899.x</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Dereli</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>O&#x11f;uz</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>G&#xf6;nen</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>A Multitask Multiple Kernel Learning Algorithm for Survival Analysis with Application to Cancer Biology</article-title>,&#x201d; in <conf-name>International Conference on Machine Learning</conf-name> (<publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>PMLR</publisher-name>), <fpage>1576</fpage>&#x2013;<lpage>1585</lpage>. </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Evgeniou</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Micchelli</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Pontil</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Shawe-Taylor</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Learning Multiple Tasks with Kernel Methods</article-title>. <source>J.&#x20;Machine Learn. Res.</source> <volume>6</volume>. </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fabregat</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Jupe</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Matthews</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sidiropoulos</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Gillespie</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Garapati</surname>
<given-names>P.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>The Reactome Pathway Knowledgebase</article-title>. <source>Nucleic Acids Res.</source> <volume>46</volume>, <fpage>D649</fpage>&#x2013;<lpage>D655</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkx1132</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gatza</surname>
<given-names>M. L.</given-names>
</name>
<name>
<surname>Lucas</surname>
<given-names>J.&#x20;E.</given-names>
</name>
<name>
<surname>Barry</surname>
<given-names>W. T.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.&#x20;W.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Crawford</surname>
<given-names>M. D.</given-names>
</name>
<etal/>
</person-group> (<year>2010</year>). <article-title>A Pathway-Based Classification of Human Breast Cancer</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>107</volume>, <fpage>6994</fpage>&#x2013;<lpage>6999</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0912708107</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>G&#xf6;rnitz</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Widmer</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zeller</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Kahles</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sonnenburg</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>R&#xe4;tsch</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2011</year>). <source>Hierarchical Multitask Structured Output Learning for Large-Scale Sequence Segmentation</source>. <publisher-loc>Granada, Spain</publisher-loc>: <publisher-name>NIPS</publisher-name>, <fpage>2690</fpage>&#x2013;<lpage>2698</lpage>. </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Herrmann</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Probst</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hornung</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Jurinovic</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Boulesteix</surname>
<given-names>A.-L.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Large-scale Benchmark Study of Survival Prediction Methods Using Multi-Omics Data</article-title>. <source>Brief. Bioinform.</source> <volume>22</volume>, <fpage>1</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbaa167</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>T. S.</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Helm</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Deep Learning-Based Cancer Survival Prognosis from RNA-Seq Data: Approaches and Evaluations</article-title>. <source>BMC Med. Genomics</source> <volume>13</volume>, <fpage>41</fpage>. <pub-id pub-id-type="doi">10.1186/s12920-020-0686-1</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhan</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xiang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>T. S.</given-names>
</name>
<name>
<surname>Helm</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>C. Y.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>SALMON: Survival Analysis Learning with Multi-Omics Neural Networks on Breast Cancer</article-title>. <source>Front. Genet.</source> <volume>10</volume>, <fpage>1</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.3389/fgene.2019.00166</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Jacob</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Obozinski</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Vert</surname>
<given-names>J.-P.</given-names>
</name>
</person-group> (<year>2009</year>). &#x201c;<article-title>Group Lasso with Overlap and Graph Lasso</article-title>,&#x201d; in <conf-name>Proceedings of the 26th Annual International Conference on Machine Learning</conf-name>, <fpage>433</fpage>&#x2013;<lpage>440</lpage>. <pub-id pub-id-type="doi">10.1145/1553374.1553431</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Goto</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>KEGG: Kyoto Encyclopedia of Genes and Genomes</article-title>. <source>Nucleic Acids Res.</source> <volume>28</volume>, <fpage>27</fpage>&#x2013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1093/nar/28.1.27</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sohn</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Jung</surname>
<given-names>S.-H.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Analysis of Survival Data with Group Lasso</article-title>. <source>Commun. Stat. - Simulation Comput.</source> <volume>41</volume>, <fpage>1593</fpage>&#x2013;<lpage>1605</lpage>. <pub-id pub-id-type="doi">10.1080/03610918.2011.611311</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Choe</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Improved Survival Analysis by Learning Shared Genomic Information from Pan-Cancer Data</article-title>. <source>Bioinformatics</source> <volume>36</volume>, <fpage>i389</fpage>&#x2013;<lpage>i398</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btaa462</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kingma</surname>
<given-names>D. P.</given-names>
</name>
<name>
<surname>Ba</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2014</year>). <source>Adam: A Method for Stochastic Optimization</source>. <comment>arXiv preprint arXiv:1412.6980.</comment> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le</surname>
<given-names>V.-H.</given-names>
</name>
<name>
<surname>Kha</surname>
<given-names>Q.-H.</given-names>
</name>
<name>
<surname>Hung</surname>
<given-names>T. N. K.</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>N. Q. K.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Risk Score Generated from CT-Based Radiomics Signatures for Overall Survival Prediction in Non-small Cell Lung Cancer</article-title>. <source>Cancers</source> <volume>13</volume>, <fpage>3616</fpage>. <pub-id pub-id-type="doi">10.3390/cancers13143616</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Dewey</surname>
<given-names>C. N.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>RSEM: Accurate Transcript Quantification from RNA-Seq Data with or without a Reference Genome</article-title>. <source>BMC Bioinformatics</source> <volume>12</volume>, <fpage>323</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-12-323</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Nan</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Multivariate Sparse Group Lasso for the Multivariate Multiple Linear Regression with an Arbitrary Group Structure</article-title>. <source>Biom.</source> <volume>71</volume>, <fpage>354</fpage>&#x2013;<lpage>363</lpage>. <pub-id pub-id-type="doi">10.1111/biom.12292</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Reddy</surname>
<given-names>C. K.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>A Multi-Task Learning Formulation for Survival Analysis</article-title>,&#x201d; in <conf-name>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>. <fpage>1715</fpage>&#x2013;<lpage>1724</lpage>. <pub-id pub-id-type="doi">10.1145/2939672.2939857</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martignetti</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Calzone</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Bonnet</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Barillot</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Zinovyev</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>ROMA: Representation and Quantification of Module Activity from Target Expression Data</article-title>. <source>Front. Genet.</source> <volume>7</volume>, <fpage>18</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2016.00018</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Matsuo</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Purushotham</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Mandelbaum</surname>
<given-names>R. S.</given-names>
</name>
<name>
<surname>Takiuchi</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Survival Outcome Prediction in Cervical Cancer: Cox Models vs Deep-Learning Model</article-title>. <source>Am. J.&#x20;Obstet. Gynecol.</source> <volume>220</volume>, <fpage>381</fpage>&#x2013;<lpage>e14</lpage>. <pub-id pub-id-type="doi">10.1016/j.ajog.2018.12.030</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Obozinski</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Jacob</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Vert</surname>
<given-names>J.-P.</given-names>
</name>
</person-group> (<year>2011</year>). <source>Group Lasso with Overlaps: The Latent Group Lasso Approach</source>. <comment>arXiv preprint arXiv:1110.0413</comment>. </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parikh</surname>
<given-names>J.&#x20;R.</given-names>
</name>
<name>
<surname>Klinger</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Marto</surname>
<given-names>J.&#x20;A.</given-names>
</name>
<name>
<surname>Bl&#xfc;thgen</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Discovering Causal Signaling Pathways through Gene-Expression Patterns</article-title>. <source>Nucleic Acids Res.</source> <volume>38</volume>, <fpage>W109</fpage>&#x2013;<lpage>W117</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkq424</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rydenfelt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Klinger</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kl&#xfc;nemann</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bl&#xfc;thgen</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>SPEED2: Inferring Upstream Pathway Activity from Differential Gene Expression</article-title>. <source>Nucleic Acids Res.</source> <volume>48</volume>, <fpage>W307</fpage>&#x2013;<lpage>W312</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkaa236</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schubert</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Klinger</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kl&#xfc;nemann</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sieber</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Uhlitz</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Sauer</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group>&#x20;(<year>2018</year>). <article-title>Perturbation-response Genes Reveal Signaling Footprints in Cancer Gene Expression</article-title>. <source>Nat. Commun.</source> <volume>9</volume>, <fpage>20</fpage>&#x2013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1038/s41467-017-02391-6</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sokolov</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Carlin</surname>
<given-names>D. E.</given-names>
</name>
<name>
<surname>Paull</surname>
<given-names>E. O.</given-names>
</name>
<name>
<surname>Baertsch</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Stuart</surname>
<given-names>J.&#x20;M.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Pathway-based Genomics Prediction Using Generalized Elastic Net</article-title>. <source>PLoS Comput. Biol.</source> <volume>12</volume>, <fpage>e1004790</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1004790</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Steck</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Krishnapuram</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Dehing-Oberije</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Lambin</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Raykar</surname>
<given-names>V. C.</given-names>
</name>
</person-group> (<year>2008</year>). &#x201c;<article-title>On Ranking in Survival Analysis: Bounds on the Concordance index</article-title>,&#x201d; in <conf-name>Advances in neural information processing systems</conf-name> (<publisher-loc>Vancouver, Canada</publisher-loc>: <publisher-name>Citeseer</publisher-name>), <fpage>1209</fpage>&#x2013;<lpage>1216</lpage>. </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tibshirani</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>The Lasso Method for Variable Selection in the Cox Model</article-title>. <source>Statist. Med.</source> <volume>16</volume>, <fpage>385</fpage>&#x2013;<lpage>395</lpage>. <pub-id pub-id-type="doi">10.1002/(sici)1097-0258(19970228)16:4&#x3c;385:aid-sim380&#x3e;3.0.co;2-3</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tomczak</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Czerwi&#x144;ska</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Wiznerowicz</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>The Cancer Genome Atlas (TCGA): an Immeasurable Source of Knowledge</article-title>. <source>Contemp. Oncol. (Pozn)</source> <volume>19</volume>, <fpage>A68</fpage>&#x2013;<lpage>A77</lpage>. <pub-id pub-id-type="doi">10.5114/wo.2014.47136</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Tucker</surname>
<given-names>L. R.</given-names>
</name>
</person-group> (<year>1951</year>). <source>A Method for Synthesis of Factor Analysis Studies</source>. <publisher-loc>Princeton, NJ</publisher-loc>: <publisher-name>Tech. rep., Educational Testing Service</publisher-name>. </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ruiz</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Weighted General Group Lasso for Gene Selection in Cancer Classification</article-title>. <source>IEEE Trans. Cybern.</source> <volume>49</volume>, <fpage>2860</fpage>&#x2013;<lpage>2873</lpage>. <pub-id pub-id-type="doi">10.1109/TCYB.2018.2829811</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wulczyn</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Steiner</surname>
<given-names>D. F.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Sadhwani</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Flament-Auvigne</surname>
<given-names>I.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Deep Learning-Based Survival Prediction for Multiple Cancer Types Using Histopathology Images</article-title>. <source>PLOS ONE</source> <volume>15</volume>, <fpage>e0233678</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0233678</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Kong</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Group&#x20;Lasso Regularized Deep Learning for Cancer Prognosis from Multi-Omics and Clinical Features</article-title>. <source>Genes</source> <volume>10</volume>, <fpage>240</fpage>. <pub-id pub-id-type="doi">10.3390/genes10030240</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Amos</surname>
<given-names>C. I.</given-names>
</name>
<name>
<surname>Frost</surname>
<given-names>H. R.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Comparison of Pathway and Gene-Level Models for Cancer Prognosis Prediction</article-title>. <source>BMC Bioinformatics</source> <volume>21</volume>, <fpage>76</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-020-3423-z</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zou</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Hastie</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Regularization and Variable Selection via the Elastic Net</article-title>. <source>J.&#x20;R. Stat. Soc. B</source> <volume>67</volume>, <fpage>301</fpage>&#x2013;<lpage>320</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9868.2005.00503.x</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>