<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd"><article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2021.682841</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>MONTI: A Multi-Omics Non-negative Tensor Decomposition Framework for Gene-Level Integrative Analysis</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Jung</surname> <given-names>Inuk</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/419077/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Kim</surname> <given-names>Minsu</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Rhee</surname> <given-names>Sungmin</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Lim</surname> <given-names>Sangsoo</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1131698/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Kim</surname> <given-names>Sun</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/418234/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Computer Science and Engineering, Kyungpook National University</institution>, <addr-line>Daegu</addr-line>, <country>South Korea</country></aff>
<aff id="aff2"><sup>2</sup><institution>Computing and Computational Sciences Directorate, Oak Ridge National Laboratory</institution>, <addr-line>Oak Ridge, TN</addr-line>, <country>United States</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Computer Science and Engineering, Seoul National University</institution>, <addr-line>Seoul</addr-line>, <country>South Korea</country></aff>
<aff id="aff4"><sup>4</sup><institution>Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-Gu</institution>, <addr-line>Seoul</addr-line>, <country>South Korea</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Fengfeng Zhou, Jilin University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Florian Buettner, German Cancer Research Center (DKFZ), Germany; Fuhai Li, Washington University in St. Louis, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Inuk Jung <email>inukjung&#x00040;knu.ac.kr</email></corresp>
<corresp id="c002">Sun Kim <email>sunkim.bioinfo&#x00040;snu.ac.kr</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p></fn></author-notes>
<pub-date pub-type="epub">
<day>10</day>
<month>09</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>682841</elocation-id>
<history>
<date date-type="received">
<day>19</day>
<month>03</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>08</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2021 Jung, Kim, Rhee, Lim and Kim.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Jung, Kim, Rhee, Lim and Kim</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions>
<abstract><p>Multi-omics data are frequently measured to enrich the comprehension of biological mechanisms underlying certain phenotypes. However, due to the complex relations and high dimension of multi-omics data, it is difficult to associate omics features to certain biological traits of interest. For example, the clinically valuable breast cancer subtypes are well-defined at the molecular level, but are poorly classified using gene expression data. Here, we propose a multi-omics analysis method called MONTI (Multi-Omics Non-negative Tensor decomposition for Integrative analysis), whose goal is to select multi-omics features that are able to represent trait-specific characteristics. Here, we demonstrate the strength of multi-omics integrated analysis in terms of cancer subtyping. The multi-omics data are first integrated in a biologically meaningful manner to form a three-dimensional tensor, which is then decomposed using a non-negative tensor decomposition method. From the result, MONTI selects highly informative subtype-specific multi-omics features. MONTI was applied to three case studies of 597 breast cancer, 314 colon cancer, and 305 stomach cancer cohorts. For all the case studies, we found that the subtype classification accuracy significantly improved when utilizing all available multi-omics data. MONTI was able to detect subtype-specific gene sets that were shown to be strongly regulated by certain omics, from which correlation between omics types could be inferred. Furthermore, various clinical attributes of nine cancer types were analyzed using MONTI, which showed that some clinical attributes could be well explained using multi-omics data. We demonstrated that integrating multi-omics data in a gene-centric manner improves detecting cancer subtype-specific features and other clinical features, which may be used to further understand the molecular characteristics of interest. 
The software and data used in this study are available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/inukj/MONTI">https://github.com/inukj/MONTI</ext-link>.</p></abstract>
<kwd-group>
<kwd>feature selection</kwd>
<kwd>tensor decomposition</kwd>
<kwd>cancer</kwd>
<kwd>multi-omics</kwd>
<kwd>integrative analysis</kwd>
</kwd-group>
<counts>
<fig-count count="12"/>
<table-count count="1"/>
<equation-count count="3"/>
<ref-count count="54"/>
<page-count count="14"/>
<word-count count="8797"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Genes are among the most important building blocks of all organisms. Their transcription and translation are essential for maintaining fundamental cellular mechanisms. Genes are continuously and precisely regulated by a wide variety of mechanisms, including transcription factors, miRNAs, methylation, and mutations, which are often cumulatively referred to as multi-omics. When investigating a biological mechanism, each omics can only provide a single perspective. By matching multi-omics data sampled from a common subject, a multiple-perspective view can be generated for an enhanced understanding of the complex dynamics of biology in the subject. For each additionally integrated omics data type, a new relationship can be mined between a gene and the newly added omics layer, which increases the ability to represent complex relationships across multi-omics data types, as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. However, due to their heterogeneous nature, it is difficult to integrate such different omics data types within a common data structure and even more difficult to analyze them in a combined manner due to their high dimension.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The possible number of relations that a gene can have across omics layers (GE, gene expression; ME, methylation; MI, miRNA) increases exponentially with each omics data type added to the integration. Here, <italic>n</italic> indicates the number of genes within a single omics layer.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0001.tif"/>
</fig>
<p>A number of initiatives have made great efforts to collect and publicly provide large amounts of multi-omics data, such as TCGA (Weinstein et al., <xref ref-type="bibr" rid="B49">2013</xref>), GTEx (Carithers et al., <xref ref-type="bibr" rid="B9">2015</xref>), ENCODE (The ENCODE Project Consortium, <xref ref-type="bibr" rid="B45">2012</xref>), and HFGP (Li et al., <xref ref-type="bibr" rid="B28">2016</xref>). These databases provide more than 10,000 high-throughput sequencing data sets generated using various platforms and collected from cancer patients, normal human tissues and model organisms. Despite the availability of such large amounts of multi-omics data, the development of analytic methods that can encompass such large-scale heterogeneous data has only recently begun to gain interest (Hasin et al., <xref ref-type="bibr" rid="B19">2017</xref>).</p>
<p>It is well understood that more data can improve the accuracy of data mining. However, this is true only if the data are precisely understood and, more importantly, correctly integrated. Omics data are generated on different platforms, which implies unique measurement scales and data formats, as well as different emphases on molecular domains and relationships among molecular entities. Hence, normalization, pre-processing, and the evaluation of associations with genes or other entities must be carefully taken into account for each omics data set. Finally, the data must be analyzed in an integrative manner in order to mine inter-relationships across the multi-omics domains.</p>
<p>While the aforementioned initiative projects are focused on providing large-scale multi-omics data, other databases have gathered and processed these large data sets to allow statistical queries. The LinkedOmics project (Vasaikar et al., <xref ref-type="bibr" rid="B46">2017</xref>) collected multi-omics data from TCGA that includes 32 cancer types, surpassing 1 billion data points in total. Using simple correlation methods (i.e., Pearson, Spearman), a user may search for genes that are significantly correlated with the query gene. Here, the correlation is in the context of multi-omics. In addition to issues around data collection and analysis, methods for visualizing multi-omics data are important. With an increasing number of omics comes increased difficulty in visualizing the relationships between multiple omics. PaintOmics3 (Hern&#x000E1;ndez-de Diego et al., <xref ref-type="bibr" rid="B21">2018</xref>) is a web-based visualization tool that allows users to observe multi-omics relationships in a graphical manner. It supports nearly every sequencing technology platform, including proteomics and region-based omics data, such as ATAC (Buenrostro et al., <xref ref-type="bibr" rid="B6">2015</xref>) or ChIP-seq (Park, <xref ref-type="bibr" rid="B34">2009</xref>) data.</p>
<p>To date, studies have sought to analyze high-throughput multi-omics sequencing data, with the majority reporting results using a single omics type or a pair of omics types (e.g., mRNA-miRNA, mRNA-methylation). In addition, the majority of such studies focus on identifying genes showing significant correlation with a certain omics type using statistical methods, such as Pearson&#x00027;s correlation or cosine similarity. Furthermore, such approaches tend to focus on finding a matching omics relation for a single gene with each iteration of the analysis rather than analyzing all genes and omics data in a combined manner. This is mainly due to the heavy computational load and the need for multiple testing correction, which make statistical analysis difficult.</p>
<p>A number of studies have reviewed multi-omics integration methods. A recent study (Huang et al., <xref ref-type="bibr" rid="B24">2017</xref>) grouped multi-omics integration methods into four categories: (1) Matrix factorization methods, (2) Bayesian methods, (3) Network-based methods, and (4) Multiple step-analysis. In addition to those categories, the recently popular deep learning technique has been applied to predict genes that yield significant survival results in liver cancer (Chaudhary et al., <xref ref-type="bibr" rid="B12">2017</xref>). Such multi-omics integration methods can also be categorized as supervised or unsupervised, depending on whether they make use of labels that represent the phenotype of the data, such as normal vs. tumor samples. Tools such as jNMF (Zhang et al., <xref ref-type="bibr" rid="B53">2012</xref>), MOFA (Argelaguet et al., <xref ref-type="bibr" rid="B4">2018</xref>), and PARADIGM (Vaske et al., <xref ref-type="bibr" rid="B47">2010</xref>) are unsupervised methods that mine gene clusters or modules associated with a phenotype of interest. Also, a network-based multi-omics clustering method, SNF (Similarity Network Fusion) (Wang et al., <xref ref-type="bibr" rid="B48">2014</xref>), was proposed that integrates multiple omics networks by weighted similarity of cluster samples.</p>
<p>More importantly, the nature of the results greatly depends on how the multiple omics data are integrated. Two studies categorized and defined two important integration approaches, namely meta-dimensional and multi-staged integration (Ritchie et al., <xref ref-type="bibr" rid="B37">2015</xref>; Sathyanarayanan et al., <xref ref-type="bibr" rid="B38">2020</xref>). The multi-staged integration method focuses on identifying omics factors that affect gene expression levels, which is expected to reveal the causal relationship underlying a certain phenotype of interest. Hence, the omics data are integrated in a gene-centric manner, which requires that each omics data set have the same dimensions in sample and gene numbers, as shown in <xref ref-type="fig" rid="F2">Figure 2</xref> (top). Here, <italic>g</italic> and <italic>p</italic> refer to the genes and patients (or samples), indexed by <italic>i</italic> and <italic>m</italic>, respectively. Such gene-level multi-omics integration can be advantageous in assessing the flow of information from omics to genes. For example, gene-level analysis of mRNA, methylation, and miRNA omics data can discover strong relationships across the three omics layers that explain the dynamics of gene expression (Subramanian et al., <xref ref-type="bibr" rid="B42">2020</xref>). However, with a limited number of omics data types, the landscape of gene expression modulation may not be fully explained. Also, the selection of omics data needs to be based on the assumption that they influence the regulation of gene expression. On the other hand, the meta-dimensional (or multi-dimensional) integration method makes use of each omics data set as is. Thus, the number of entities in each omics matrix may differ. Both integration methods assume matched multi-omics data, that is, multi-omics data that are retrieved from the same subjects and therefore have the same number of samples. Such data are also referred to as multi-modal data. 
Such omics-level integration may capture the bigger dynamics underlying a phenotype since the entire data set is analyzed as is (Sathyanarayanan et al., <xref ref-type="bibr" rid="B38">2020</xref>). However, to analyze relationships across the omics layers, post-processing of the result is required, which can become very complex with a larger number of omics data types since the number of omics combinations increases exponentially.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Two prevalently used multi-omics integration methods. The multi-staged <bold>(top)</bold>, or gene-centric, method encodes all omics measurement values on a per-gene basis. Hence, each omics matrix is required to have the same number of genes (<italic>g</italic>) and samples (or patients <italic>p</italic>). The multi-dimensional <bold>(bottom)</bold> integration method is less restrictive in the dimensions and makes use of each omics data set as is.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0002.tif"/>
</fig>
<p>Utilizing multi-omics data, we can identify important biomarkers and also identify multi-omics features specific to a given sample or phenotype. In the context of cancer, multi-omics features specific to cancer subtypes can be identified, which can serve as valuable information for constructing highly accurate subtype classification models. This approach will eventually facilitate enhanced identification of subtype-specific genes. Delineation between cancer and normal tissues or across different cancer types has long been a popular problem (Furey et al., <xref ref-type="bibr" rid="B16">2000</xref>; Ramaswamy et al., <xref ref-type="bibr" rid="B36">2001</xref>; Sotiriou et al., <xref ref-type="bibr" rid="B41">2003</xref>), with a classification accuracy reaching 85% (Gevaert et al., <xref ref-type="bibr" rid="B17">2006</xref>). However, classifying cancer subtypes (Network et al., <xref ref-type="bibr" rid="B7">2012</xref>; Shen et al., <xref ref-type="bibr" rid="B39">2012</xref>; Paquet and Hallett, <xref ref-type="bibr" rid="B33">2015</xref>) is more difficult than distinguishing tumor and normal samples. For example, classification accuracy for predicting breast cancer subtypes is low, ranging from 56.7 to 75% (Wu et al., <xref ref-type="bibr" rid="B51">2017</xref>; Tao et al., <xref ref-type="bibr" rid="B44">2019</xref>).</p>
<p>In this study, we developed MONTI (Multi-Omics Non-negative Tensor Decomposition Integration), which learns hidden features through tensor decomposition for the integration of multi-omics data. MONTI is based on the gene-level integration method, which we find to be more helpful in understanding the results. The objective of MONTI is to extract feature genes that explain well some clinical attribute of interest in large multi-omics data. A gene list with a significant relation to clinical attributes can naturally be used for simpler downstream analysis, such as gene set enrichment or pathway analysis. Also, MONTI constrains the multi-omics data to be subject matched, where all omics data are collected from a common subject (i.e., patient). Such a design may avoid omics variance within the same group, thus amplifying the signals of hidden features.</p>
<p>In experiments with TCGA multi-omics data sets from breast, colon and stomach cancer samples, MONTI achieved significantly higher cancer subtype classification accuracy than existing multi-omics analysis methods. For the downstream analysis, genes associated with subtype-specific features were identified for biological interpretation.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>2. Materials and Methods</title>
<sec>
<title>2.1. MONTI Framework Overview</title>
<p>The MONTI workflow operates in two phases. In the first phase, the multi-omics data are integrated and decomposed using non-negative tensor decomposition. In the second phase, subtype-specific features and genes associated with them are selected using L1 regularization, and these features are then used to generate a subtype classifier using the multi-layer perceptron (MLP) neural network. The overall workflow is depicted in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>The workflow of the MONTI framework. <bold>(A)</bold> Each omics data set (gene expression, methylation, miRNA expression) is pre-processed as a two-dimensional gene-centric matrix composed of genes and samples. <bold>(B)</bold> The omics matrices are then stacked to form a three-dimensional tensor structure (genes, samples, omics), all sharing the same genes and samples. <bold>(C)</bold> Using the PARAFAC approach, the tensor is decomposed into two-dimensional gene, patient and omics components. Here, the components share the rank features. <bold>(D)</bold> The patient component is used to select subtype-specific features using subtype-specific L1 classifiers. The selected subtype-specific features are used to build a subtype classifier model using MLP (Multi-layer perceptron). Genes associated with the subtype-specific features are then selected for biological function analysis.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0003.tif"/>
</fig>
</sec>
<sec>
<title>2.2. Data Preparation and Preprocessing of Multi-Omics Data</title>
<p>Samples with matched gene expression, methylation, and miRNA expression data sets were collected for three case studies from TCGA: (1) 597 breast cancer samples, (2) 314 colon cancer samples, and (3) 305 stomach cancer samples. Only primary tumor samples with all three matching omics data sets were selected for the analysis. The pre-quantified gene and miRNA expression values from TCGA were used as provided. For the methylation data, we used the HumanMethylation450 BeadChip-based data and further selected probes located within the gene promoter regions (i.e., 2 Kb upstream of a gene&#x00027;s transcription start site). Subtype information was acquired from the original studies. The partially missing subtype information of the breast cancer case study was taken from Lim et al. (<xref ref-type="bibr" rid="B31">2018</xref>), where it was generated by the PAM50 classification method (Parker et al., <xref ref-type="bibr" rid="B35">2009</xref>). Sample case IDs and annotated cancer subtypes of the samples used in this study are in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table 1</xref>.</p>
<p>Because we aim to discover gene regulatory multi-omics features, each omics data is individually processed to form a <italic>gene-centric</italic> two-dimensional sample(patient)-gene matrix. The values in each omics matrix are computed and assigned with respect to each gene. The tensor structure requires all slices to be of the same size. Thus, while each omics matrix is independently processed, they share the same set of genes and samples.</p>
<p>The gene expression values were preprocessed according to the provided TCGA level 3 gene expression data, which were subject to <italic>log</italic><sub>2</sub> quantile normalization across samples. The miRNAs were first bundled per target gene, such that the number of bundles matched the number of genes. The geometric mean of miRNA expression per bundle was assigned to each corresponding gene. The expression values were then <italic>log</italic><sub>2</sub> quantile normalized. For methylation data, probes located in the gene promoter region, between the transcription start site and 2 Kb upstream of it, were grouped per gene. The average methylation level per gene was further quantile normalized.</p>
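<p>The per-gene aggregation described above can be sketched as follows. This is a minimal illustration rather than the MONTI implementation: the gene and miRNA names, the miRNA-to-target mapping, and all values are invented, and the normalization steps are omitted.</p>
<preformat>
```python
from statistics import geometric_mean, mean

# Hypothetical toy inputs: every name and value below is illustrative only.
# miRNA expression for one sample (must be positive for a geometric mean).
mirna_expr = {"miR-a": 4.0, "miR-b": 16.0, "miR-c": 9.0}
# Assumed miRNA-to-target-gene mapping (one bundle of miRNAs per gene).
targets = {"GENE1": ["miR-a", "miR-b"], "GENE2": ["miR-c"]}

# Methylation values of promoter probes, already grouped per gene.
promoter_probes = {"GENE1": [0.2, 0.4], "GENE2": [0.9, 0.7, 0.8]}


def mirna_per_gene(expr, bundles):
    """Assign each gene the geometric mean of its targeting miRNAs."""
    return {gene: geometric_mean([expr[m] for m in mirnas])
            for gene, mirnas in bundles.items()}


def methylation_per_gene(probes):
    """Assign each gene the average methylation of its promoter probes."""
    return {gene: mean(values) for gene, values in probes.items()}


gene_mirna = mirna_per_gene(mirna_expr, targets)
gene_meth = methylation_per_gene(promoter_probes)
```
</preformat>
<p>Both functions return one value per gene, so the resulting miRNA and methylation vectors share the gene axis of the gene expression matrix.</p>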
<p>Due to the nature of tensor decomposition, the omics values in each matrix need to be scaled within a common range. If not, an omics matrix with comparably large values, such as gene expression, would diminish the effect of other omics matrices with relatively lower values. Hence, normalized matrices are further scaled within the range of 0&#x02013;1. Finally, the omics matrices were stacked on an orthogonal axis to form a three-dimensional tensor structure.</p>
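<p>A minimal sketch of the scaling and stacking step is given below. It assumes that each omics matrix is scaled globally by its own minimum and maximum, which the text does not specify, and it stores the tensor with the omics axis first purely for convenience. The function name <monospace>min_max_scale</monospace> and all values are hypothetical.</p>
<preformat>
```python
def min_max_scale(matrix):
    """Scale all values of one omics matrix into the range 0 to 1."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against a constant matrix
    return [[(v - lo) / span for v in row] for row in matrix]


# Toy gene-by-sample matrices (2 genes x 3 samples); values are invented.
gene_expression = [[2.0, 10.0, 6.0], [4.0, 8.0, 12.0]]
methylation = [[0.1, 0.5, 0.3], [0.9, 0.7, 0.2]]
mirna = [[1.0, 3.0, 5.0], [2.0, 2.0, 4.0]]

# Stack the scaled slices along a third (omics) axis: tensor[k][i][j].
tensor = [min_max_scale(m) for m in (gene_expression, methylation, mirna)]
```
</preformat>
<p>After scaling, every slice spans the same range, so no single omics type dominates the decomposition simply because of its measurement scale.</p>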
</sec>
<sec>
<title>2.3. Tensor Decomposition</title>
<p>There are several ways to decompose a tensor. PARAFAC (Carroll and Chang, <xref ref-type="bibr" rid="B10">1970</xref>; Harshman, <xref ref-type="bibr" rid="B18">1970</xref>) (also known as CANDECOMP, canonical decomposition) and TUCKER3 (Kroonenberg, <xref ref-type="bibr" rid="B25">1983</xref>) are the most widely used methods. Both are multi- or bi-linear decomposition methods, which decompose the array into sets of scores and loadings. The decomposed scores and loadings describe the original data in a more compressed form. PARAFAC is based on factorization, whereas TUCKER3 utilizes principal component analysis. The resulting decomposition structure also differs between the two. PARAFAC decomposes a tensor into three two-dimensional components or matrices, while TUCKER3 generates three two-dimensional components along with an additional core array that is shared by the components. Due to the core array and the increased number of parameters it introduces, interpreting data with the TUCKER3 model is more complicated than with PARAFAC (Bro, <xref ref-type="bibr" rid="B5">1997</xref>). Hence, here we used the PARAFAC method to decompose the multi-omics tensor.</p>
<p>A PARAFAC model of a three-way array <italic>T</italic> with elements <italic>x</italic><sub><italic>ijk</italic></sub> is given by three loading matrices, <italic>C</italic><sub><italic>g</italic></sub>, <italic>C</italic><sub><italic>p</italic></sub>, and <italic>C</italic><sub><italic>o</italic></sub> with elements <italic>g</italic><sub><italic>if</italic></sub>, <italic>p</italic><sub><italic>jf</italic></sub>, and <italic>o</italic><sub><italic>kf</italic></sub>. Here, we refer to <italic>C</italic><sub><italic>g</italic></sub>, <italic>C</italic><sub><italic>p</italic></sub>, and <italic>C</italic><sub><italic>o</italic></sub> as the gene, patient and omics components, respectively. The tensor <italic>T</italic> is decomposed using a predefined number of ranks <italic>R</italic>, which we will refer to as features <italic>f</italic> &#x0003D; 1, &#x02026;, <italic>R</italic>.</p>
<p>Due to the non-negative constraint, the interpretation of the feature values is much easier, since they are cumulative and do not cancel each other out. Thus, a larger value implies a stronger signal of the feature. Furthermore, since omics data are mostly non-negative, the non-negative constraint can be naturally applied.</p>
<p>The trilinear model minimizes the sum of squares of the residuals, <italic>e</italic><sub><italic>ijk</italic></sub> in the model</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>which can also be written as</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02297;</mml:mo><mml:msub><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02297;</mml:mo><mml:msub><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>An illustration of the PARAFAC model using gene expression, methylation level and miRNA expression data is shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. Here, <italic>g</italic><sub><italic>n</italic></sub>(<italic>n</italic> &#x0003D; 0, &#x02026;, <italic>N</italic>) refers to the genes, <italic>o</italic><sub><italic>k</italic></sub>(<italic>k</italic> &#x0003D; 0, &#x02026;, <italic>K</italic>) indicates the type of omics and <italic>p</italic><sub><italic>m</italic></sub>(<italic>m</italic> &#x0003D; 0, &#x02026;, <italic>M</italic>) refers to patient samples, so that <italic>N</italic>, <italic>M</italic>, and <italic>K</italic> are the largest gene, sample, and omics indices, respectively. Three omics types are used in this illustration; thus, <italic>K</italic> &#x0003D; 2.</p>
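<p>To make the trilinear model of Equation (1) concrete, the following sketch rebuilds a small tensor from three given non-negative components. It only evaluates the model; it does not fit the components, and the function name <monospace>parafac_reconstruct</monospace> and all numbers are invented for illustration.</p>
<preformat>
```python
def parafac_reconstruct(Cg, Cp, Co):
    """Rebuild x_ijk = sum_f g_if * p_jf * o_kf from the three components."""
    R = len(Cg[0])
    return [[[sum(Cg[i][f] * Cp[j][f] * Co[k][f] for f in range(R))
              for k in range(len(Co))]
             for j in range(len(Cp))]
            for i in range(len(Cg))]


# Invented non-negative components with R = 2 features.
Cg = [[1.0, 0.0], [0.5, 2.0]]              # 2 genes    x 2 features
Cp = [[1.0, 1.0], [0.0, 3.0], [2.0, 0.0]]  # 3 patients x 2 features
Co = [[1.0, 0.5], [2.0, 0.0], [0.0, 1.0]]  # 3 omics    x 2 features

T_hat = parafac_reconstruct(Cg, Cp, Co)    # indexed as T_hat[i][j][k]
```
</preformat>
<p>The residual of Equation (1) is the element-wise difference between the observed tensor and this reconstruction, and fitting the model amounts to choosing the components that minimize its sum of squares.</p>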
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>An illustration of the non-negative tensor decomposition (PARAFAC) using three types of omics. The tensor <italic>T</italic> is decomposed into three components: gene, sample (patient), and omics. Each component corresponds to one axis in tensor <italic>T</italic>. Each component is a two-dimensional matrix where one axis embeds the rank features <italic>f</italic><sub><italic>i</italic></sub>(<italic>i</italic> &#x0003D; 0, &#x02026;, <italic>r</italic>) for the entities in each component (i.e., genes, samples, and omics), similar to the traditional matrix factorization method.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0004.tif"/>
</fig>
</sec>
<sec>
<title>2.4. Feature Selection</title>
<p>Subtype-associated tensor features, a subset of features selected from the tensor decomposition result, significantly improved subtype classification accuracy. To select such subtype-specific features, L1 regularization was used for each subtype and applied to the (<italic>C</italic><sub><italic>p</italic></sub>) component (i.e., patient component) with the following equation,</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>R</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>w</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle><mml:mo>+</mml:mo></mml:mrow></mml:mstyle><mml:mi>&#x003B1;</mml:mi><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>R</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>f</mml:mi></mml:msub></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mstyle><mml:mo>.</mml:mo></mml:math></disp-formula>
<p>Here, <italic>M</italic> is the number of patient samples and <italic>R</italic> is the number of features (columns) in <italic>C</italic><sub><italic>p</italic></sub>. <italic>y</italic><sub><italic>i</italic></sub> is the target subtype value of sample <italic>i</italic>. Because a separate L1 model is built for each subtype, the target value is set to 1 for samples of the corresponding subtype and to 0 for samples of all other subtypes. For example, in the breast cancer case study, four L1 models were generated, one for each of the Luminal A, Luminal B, Her2, and Basal subtypes. <italic>z</italic><sub><italic>if</italic></sub> is the value of feature <italic>f</italic> for sample <italic>i</italic> in <italic>C</italic><sub><italic>p</italic></sub>. <italic>w</italic><sub><italic>f</italic></sub>(<italic>f</italic> &#x0003D; 1, &#x02026;, <italic>R</italic>) is the weight of each feature to be inferred. The &#x003B1; value is the weight of the penalty term. Larger &#x003B1; values yield a greater penalty, which drives more feature weights to zero so that fewer features are selected. We found that L1 regularization achieved better performance than L2 regularization (<xref ref-type="fig" rid="F5">Figure 5</xref>).</p>
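<p>The per-subtype selection in Equation (3) can be illustrated with a minimal sketch. It assumes cyclic coordinate descent as the solver, which the text does not specify, and uses a toy patient component with hypothetical values. The names <monospace>soft_threshold</monospace>, <monospace>lasso_weights</monospace>, and <monospace>one_vs_rest_targets</monospace> are illustrative and are not part of MONTI.</p>
<preformat>
```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t (the proximal step for the L1 penalty)."""
    if rho > t:
        return rho - t
    if -rho > t:
        return rho + t
    return 0.0


def lasso_weights(Z, y, alpha, n_sweeps=200):
    """Minimise sum_i (y_i - sum_f z_if * w_f)^2 + alpha * sum_f |w_f|
    by cyclic coordinate descent. Z is a list of M rows with R values each."""
    M, R = len(Z), len(Z[0])
    w = [0.0] * R
    col_sq = [sum(Z[i][f] ** 2 for i in range(M)) for f in range(R)]
    for _ in range(n_sweeps):
        for f in range(R):
            if col_sq[f] == 0.0:
                continue  # an all-zero column can never receive weight
            # Correlation of column f with the residual that ignores w_f.
            rho = 0.0
            for i in range(M):
                pred_others = sum(Z[i][k] * w[k] for k in range(R) if k != f)
                rho += Z[i][f] * (y[i] - pred_others)
            # The squared loss has no 1/2 factor, hence the alpha / 2 threshold.
            w[f] = soft_threshold(rho, alpha / 2.0) / col_sq[f]
    return w


def one_vs_rest_targets(labels, subtype):
    """Target vector of Eq. (3): 1 for the subtype of interest, 0 otherwise."""
    return [1.0 if lab == subtype else 0.0 for lab in labels]


# Toy patient component: 4 patients x 2 rank features (hypothetical values).
C_p = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
labels = ["Basal", "Basal", "Her2", "Her2"]

y_basal = one_vs_rest_targets(labels, "Basal")
w_basal = lasso_weights(C_p, y_basal, alpha=0.1)
# Features with a non-zero weight are the ones selected for this subtype.
selected = [f for f in range(len(w_basal)) if w_basal[f] != 0.0]
```
</preformat>
<p>In this toy case only the first feature separates the Basal samples, so it is the only one that keeps a non-zero weight. Raising &#x003B1; far enough shrinks that weight to zero as well, which is the behavior described above.</p>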
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>The cancer subtype classification accuracy for BRCA, COAD, and STAD, measured using features selected by the L1 and L2 methods at different ranks.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0005.tif"/>
</fig>
<p>The feature selection performance of L1 and L2 was measured using the BRCA, COAD, and STAD data with varying ranks. As shown in <xref ref-type="fig" rid="F5">Figure 5</xref>, L1 showed better feature selection performance in terms of subtype classification accuracy in the three cancer types.</p>
</sec>
<sec>
<title>2.5. Selecting Feature Associated Genes</title>
<p>Based on the L1-selected features from <italic>C</italic><sub><italic>p</italic></sub>, feature genes were further selected from <italic>C</italic><sub><italic>g</italic></sub>. This procedure outputs a sparse set of genes, where each gene is a member of a single feature. The association of a gene <italic>g</italic> with a feature is decided by <italic>g</italic><sub><italic>f</italic></sub> &#x0003D; <italic>max</italic>(<italic>g</italic><sub>0,<italic>R</italic></sub>); that is, the gene is assigned to the feature index <italic>f</italic> at which its weight is maximal.</p>
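<p>The assignment rule can be sketched as follows. This is a minimal illustration under one reading of the text: the maximum is taken over all rank features of a gene, and the gene is kept only if that feature is among the L1-selected ones. The function name <monospace>assign_genes_to_features</monospace> and the toy values are hypothetical.</p>
<preformat>
```python
def assign_genes_to_features(C_g, gene_names, selected_features):
    """Map each gene to the rank feature holding its largest weight and keep
    the gene only if that feature was selected from the patient component."""
    membership = {f: [] for f in selected_features}
    for name, row in zip(gene_names, C_g):
        best = max(range(len(row)), key=lambda f: row[f])  # argmax over features
        if best in membership:
            membership[best].append(name)
    return membership


# Toy gene component: 4 genes x 3 rank features (hypothetical values).
C_g = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.3],
    [0.0, 0.4, 0.8],
    [0.6, 0.5, 0.1],
]
genes = ["geneA", "geneB", "geneC", "geneD"]
membership = assign_genes_to_features(C_g, genes, selected_features=[0, 2])
```
</preformat>
<p>Here geneB is dropped because its largest weight falls on a feature that was not selected, which is how the procedure yields a sparse gene set in which every retained gene belongs to exactly one feature.</p>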
</sec>
<sec>
<title>2.6. Cancer Subtype Classification Analysis</title>
<p>The significance of the selected feature genes was assessed by how accurately they classify the subtypes. Classification accuracy was measured using a multi-layer perceptron (MLP) classifier with 10-fold cross-validation. Here, the values of the feature genes from <italic>C</italic><sub><italic>g</italic></sub> were given as input to build the MLP classifier.</p>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>3. Results</title>
<sec>
<title>3.1. Three Case Studies</title>
<p>MONTI was applied to three cancer types: breast cancer (BRCA), colorectal cancer (COAD), and stomach cancer (STAD). The cancer types were chosen based on the number of samples that had matched multi-omics data from the same patient. There were 597, 314, and 305 samples with matched omics data for BRCA, COAD, and STAD, respectively. To avoid an overly sparse tensor, genes without any methylation probes located within their promoter and the 2 kb region upstream of the transcription start site (TSS) were discarded, which resulted in 14,513 genes with 60,707 methylation probes in total. The methylation beta values were averaged and assigned per gene. Similarly, miRNA expression values were grouped per target gene, and the arithmetic mean of the miRNA expression values in a group was assigned to its target gene. The multi-omics data were used to produce gene-centric omics matrices, which were then combined to form a three-dimensional tensor for each cancer type, i.e., genes &#x000D7; multi-omics &#x000D7; patient samples.</p>
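<p>The gene-centric aggregation described above can be sketched with plain Python lists. The identifiers and values below are hypothetical, and filling genes that have no targeting miRNA with zeros is an assumption of this sketch, not a documented MONTI behavior.</p>
<preformat>
```python
from statistics import mean


def gene_level_matrix(values, groups, genes, n_samples):
    """Average the rows of `values` that belong to each gene.

    values: dict mapping a probe or miRNA id to its per-sample values.
    groups: dict mapping a gene to the ids assigned to it.
    Genes without any assigned id get zeros (an assumption of this sketch).
    """
    matrix = []
    for gene in genes:
        ids = groups.get(gene, [])
        if ids:
            row = [mean(values[r][s] for r in ids) for s in range(n_samples)]
        else:
            row = [0.0] * n_samples
        matrix.append(row)
    return matrix


genes = ["geneA", "geneB"]
n_samples = 2

expression = [[5.0, 3.0], [1.0, 2.0]]  # already one row per gene
probes = {"p1": [0.2, 0.4], "p2": [0.6, 0.8], "p3": [0.5, 0.1]}
probe_groups = {"geneA": ["p1", "p2"], "geneB": ["p3"]}
mirnas = {"m1": [10.0, 20.0], "m2": [30.0, 40.0]}
mirna_targets = {"geneA": ["m1", "m2"]}  # geneB has no targeting miRNA

methylation = gene_level_matrix(probes, probe_groups, genes, n_samples)
mirna = gene_level_matrix(mirnas, mirna_targets, genes, n_samples)

# tensor[g][o][s]: genes x omics (GE, ME, MI) x patient samples
tensor = [[expression[g], methylation[g], mirna[g]] for g in range(len(genes))]
```
</preformat>
<p>Indexing the result as gene, omics type, then sample mirrors the genes &#x000D7; multi-omics &#x000D7; patient samples layout of the tensor.</p>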
</sec>
<sec>
<title>3.2. Subtype Classification Results</title>
<p>Before deriving cancer subtype-specific features through tensor decomposition, a rank <italic>R</italic> for decomposing the tensor had to be chosen. In addition, the penalty strength &#x003B1; had to be set for L1 regularization. Both were chosen empirically over a range of values by testing the subtype classification accuracy.</p>
<p>First, we evaluated the subtype classification accuracy using the features in <italic>C</italic><sub><italic>p</italic></sub> over different ranks. The subtype classification accuracy for BRCA, COAD, and STAD was the highest with ranks 450, 150, and 100, respectively. The &#x003B1; value for L1 regularization determines the strength of the penalty on the features: the larger &#x003B1; is, the fewer features and genes are selected. Subtype classification performance was further investigated using &#x003B1; values ranging from 0 to 0.1. To further select informative features, the features with non-zero weights were ranked by their absolute coefficient values, and the top 20% were chosen.</p>
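<p>The ranking step can be written compactly. The sketch below assumes that the 20% cut-off is rounded up to a whole number of features, which the text does not state. The function name <monospace>top_percent_features</monospace> and the weights are hypothetical.</p>
<preformat>
```python
def top_percent_features(weights, percent=20):
    """Rank features with non-zero L1 weights by absolute value and keep
    the top `percent` of them (rounded up; the rounding is an assumption)."""
    nonzero = [f for f, w in enumerate(weights) if w != 0.0]
    ranked = sorted(nonzero, key=lambda f: abs(weights[f]), reverse=True)
    if not ranked:
        return []
    keep = -(-len(ranked) * percent // 100)  # integer ceiling division
    return ranked[:keep]


# Hypothetical L1 weights for ten rank features; three were shrunk to zero.
weights = [0.0, -0.9, 0.3, 0.0, 0.05, -0.4, 0.7, 0.0, 0.2, -0.1]
chosen = top_percent_features(weights)
```
</preformat>
<p>Seven of the ten weights are non-zero, so the top 20% corresponds to the two features with the largest absolute weights.</p>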
<p>The subtype classification accuracy was the highest when &#x003B1; &#x0003D; 0.01 (<xref ref-type="fig" rid="F6">Figure 6</xref>). As a result, 26, 31, and 37 features from <italic>C</italic><sub><italic>p</italic></sub> were selected for subtype classification from the BRCA, COAD, and STAD tensors, respectively.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>The classification accuracy with varying &#x003B1; values. The classification accuracy was the highest with &#x003B1; &#x0003D; 0.01.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0006.tif"/>
</fig>
<p>The multi-omics tensors for the three cancer case studies were decomposed with the optimal ranks and &#x003B1; values chosen as explained above. We then investigated how much the feature genes (i.e., from <italic>C</italic><sub><italic>g</italic></sub>) contributed to the improvement in subtype classification accuracy.</p>
<p>Our primary interest in this study was whether the selected features would better represent the underlying biological mechanism when using multiple omics data compared to a single omics type or a smaller subset of omics data. As shown in <xref ref-type="fig" rid="F7">Figure 7A</xref>, the subtype classification accuracy was the highest when all available multi-omics data were used and combined by the tensor features. The omics types are labeled GE, ME, and MI for gene expression, methylation, and miRNA expression, respectively.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>The MONTI analysis results of BRCA, COAD, and STAD subtypes are shown. <bold>(A)</bold> The subtype classification accuracy was the highest when using all three omics data for all three cancer types. <bold>(B)</bold> The cancer subtype specific genes. Here, the genes are shared by at most two different subtypes. <bold>(C)</bold> tSNE plots that were drawn using the selected features from <italic>C</italic><sub><italic>p</italic></sub> of each cancer type.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0007.tif"/>
</fig>
<p>Here, we find that such accuracy reflects how well the subtypes can be explained by the selected features and their associated genes in a multi-omics manner.</p>
<p>The number of features and their associated genes are shown in <xref ref-type="table" rid="T1">Table 1</xref>. Since a feature can be associated with multiple subtypes, the sum of features in the <monospace>St-Features</monospace> column may be larger than the number of selected features. Here, <monospace>Features</monospace> and <monospace>Genes</monospace> refer to the total number of features and the total number of genes in each cancer case study, and <monospace>St-Features</monospace> and <monospace>St-Genes</monospace> to the number of features and the number of genes in each subtype <monospace>St</monospace>, respectively. A total of 2,385, 3,831, and 5,461 genes were found to be associated with the BRCA, COAD, and STAD subtypes, respectively. The majority of genes were exclusively assigned to a single subtype in all three cancer data sets (<xref ref-type="fig" rid="F7">Figure 7B</xref>). This is more apparent in the tSNE plot in <xref ref-type="fig" rid="F7">Figure 7C</xref>. While the number of features was the largest in BRCA, the total number of genes did not necessarily differ from that of the other cancer types.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>The number of selected features and genes in BRCA, COAD, and STAD.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Case study</bold></th>
<th valign="top" align="center"><bold>Ranks</bold></th>
<th valign="top" align="center"><bold>Features</bold></th>
<th valign="top" align="center"><bold>Genes</bold></th>
<th valign="top" align="left"><bold>Subtypes</bold></th>
<th valign="top" align="center"><bold>St-Features</bold></th>
<th valign="top" align="center"><bold>St-Genes</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">BRCA</td>
<td valign="top" align="center">120</td>
<td valign="top" align="center">26</td>
<td valign="top" align="center">2,385</td>
<td valign="top" align="left">Luminal A</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">879</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">Luminal B</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">732</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">Her2</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">1,080</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">Basal</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">665</td>
</tr>
<tr>
<td valign="top" align="left">COAD</td>
<td valign="top" align="center">120</td>
<td valign="top" align="center">31</td>
<td valign="top" align="center">3,831</td>
<td valign="top" align="left">CMS1</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">1,129</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">CMS2</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">1,403</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">CMS3</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">1,473</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">CMS4</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">704</td>
</tr>
<tr>
<td valign="top" align="left">STAD</td>
<td valign="top" align="center">120</td>
<td valign="top" align="center">37</td>
<td valign="top" align="center">5,461</td>
<td valign="top" align="left">CIN</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">1,234</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">GS</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">1,007</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">MSI</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">839</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">EBV</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">652</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The 10-fold cross-validated F1 scores of MONTI were 0.844, 0.9, and 0.91 for BRCA, COAD, and STAD, respectively. To the best of our knowledge, these are the highest classification accuracies reported in the literature so far, and, in our experiments, MONTI outperformed existing methods such as MOFA2, iCluster, and SNF. For BRCA and COAD, the classification accuracy increased significantly when at least two omics data types, including gene expression (GE), were used. The improvement in classification accuracy was dramatic for COAD, where the use of a single omics type resulted in poor performance. Interestingly, methylation proved to be more influential in STAD, where ME alone achieved high classification accuracy. The CpG island methylator phenotype (CIMP) information characterizes distinct subtypes of gastric cancer well, and specific methylation patterns and clinicopathological features are known to be associated with it (Network et al., <xref ref-type="bibr" rid="B8">2014</xref>; Tahara and Arisawa, <xref ref-type="bibr" rid="B43">2015</xref>). While the majority of feature genes were associated with a single subtype (<xref ref-type="fig" rid="F7">Figure 7B</xref>), some were associated with multiple subtypes. For example, the Venn diagram of BRCA shows that the Luminal A and Luminal B subtypes share 265 genes while Her2 and Basal share 53, which is consistent with the underlying biology. Luminal A and Luminal B are hormone-receptor positive subtypes whereas Her2 and Basal are hormone-receptor negative subtypes, a distinction that also reflects the aggressiveness of the cancer (i.e., hormone-receptor negative cancers grow faster). These characteristics are clearly visible in the tSNE plots in <xref ref-type="fig" rid="F7">Figure 7C</xref>.</p>
</sec>
<sec>
<title>3.3. Performance Evaluation</title>
<p>Although few tools are available for multi-omics analysis with the goal of classifying cancer subtypes, all such tools aim to discover genes that have a strong correlation with one or more omics. Such relational information is expected to differ between cancer subtypes, and it is used to build classifiers or to mine subtype-specific data on genes or features. We compared the BRCA, COAD, and STAD subtype classification accuracy of five methods: MONTI, SNF (Wang et al., <xref ref-type="bibr" rid="B48">2014</xref>), MOFA2 (Multi-Omics Factor Analysis) (Argelaguet et al., <xref ref-type="bibr" rid="B3">2020</xref>), iCluster (Shen et al., <xref ref-type="bibr" rid="B40">2009</xref>), and PCA.</p>
<p>Each of the three cancer data sets consists of four subtypes. In BRCA, the numbers of samples per subtype were 220, 152, 91, and 132 for Luminal A, Luminal B, Her2, and Basal, respectively. In COAD, the numbers of samples per subtype were 43, 125, 48, and 99 for CMS1, CMS2, CMS3, and CMS4, respectively. In STAD, the numbers of samples per subtype were 188, 26, 42, and 49 for CIN, EBV, GS, and MSI, respectively.</p>
<p>The genes used for analysis were chosen by two criteria. First, only protein-coding genes were selected. Second, genes for which the methylation values in the 2 kb region upstream of the TSS were missing in more than 80% of the samples were filtered out. The miRNA data were used as is, and the target gene information was acquired from miRDB (Chen and Wang, <xref ref-type="bibr" rid="B13">2020</xref>). As a result, 14,514 genes were selected based on the BRCA, COAD, and STAD data sets. Methylation probes with missing values in all samples were dropped, resulting in 62,070 probes. Similarly, miRNAs with zero expression in all samples were excluded, resulting in 1,882 miRNAs. Each omics data set was normalized as described in section 2.</p>
<p>The optimal number of ranks for MONTI was selected using the <monospace>nmfEstimateRank</monospace> function in the R <monospace>NMF</monospace> package. For each gene-level omics data set, the optimal number of ranks was investigated based on the dispersion metric, from which we chose an appropriate rank using the elbow method. As a result, 120 ranks were chosen for each of BRCA, COAD, and STAD. As an example, the dispersion plot of the BRCA omics data is shown in <xref ref-type="fig" rid="F8">Figure 8</xref>. The omics values of the feature genes were used for measuring the F1 score.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>The dispersion plot over different ranks for the BRCA omics data, used for estimating the optimal NMF rank.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0008.tif"/>
</fig>
<p>SNF (Similarity Network Fusion) integrates multi-omics data by constructing a sample similarity network for each omics data type and then fusing the networks iteratively using a message-passing method. The principle is to keep edges between samples that are consistent across the different omics networks and to remove those that are inconsistent and of low similarity. The optimal hyperparameters <italic>K</italic>, the number of neighbors in K-nearest neighbor, and <italic>T</italic>, the number of iterations for the diffusion process, were determined via a parameter grid search. The (<italic>K</italic>, <italic>T</italic>) parameters were set to (10, 30), (10, 10), and (5, 20) for the BRCA, COAD, and STAD data sets, respectively. The output of SNF is the sample clusters, which were used to measure the F1 score.</p>
<p>MOFA2 utilizes matrix decomposition with the purpose of identifying sources of heterogeneity in multi-omics data sets. It decomposes multiple two-dimensional matrices, where each matrix represents an omics data type comprised of genes and samples. The decomposition yields feature matrices, each associated with one of the input omics matrices, and an additional factor matrix, which represents the activation values of each feature per sample. Thus, if three omics data types are given as input, they will be decomposed into four matrices (i.e., three feature matrices and one factor matrix). MOFA2 allows the user to choose the number of factors or features from the decomposed factor matrix, and we utilized as many as possible for each data set. The maximum number of features that could be used was 10 for each of BRCA, COAD, and STAD. The output of MOFA2 was the Z sample factor matrix, which was used for measuring the F1 score.</p>
<p>iCluster adopts a joint latent variable model for integrative clustering of multi-omics data. iCluster aims to mine significant associations between different omics data types through likelihood-based inference using the Expectation-Maximization algorithm. iCluster provides a function for estimating optimal omics weights, which we used for clustering each data set. The output of iCluster is the sample clusters, which were used to measure the F1 score.</p>
<p>Finally, sample PCA features were extracted and used for classifying the cancer subtypes. For each cancer and omics data type, the optimal number of PCA features was selected based on the classification accuracy via a parameter grid search. For BRCA, 10, 6, and 10 PCs were selected from the gene expression, methylation, and miRNA data, respectively. Similarly, 8, 5, and 2 PCs for COAD and 20, 2, and 18 PCs for STAD were selected from the gene expression, methylation, and miRNA data, respectively. The selected PCs were stacked and given as input to a random forest classifier to measure the F1 score.</p>
<p>The average F1 score was measured via 10-fold cross-validation for each tool with the configurations described above. The train and test data were split before any normalization or feature selection in each of the BRCA, COAD, and STAD data sets. The same train and test data sets were used to measure the F1 score for each method. Furthermore, the input data were prepared in both gene-level (i.e., multi-staged) and omics-level (i.e., multi-dimension) formats to observe the difference between the two integration methods. Thus, each method except MONTI was given two types of input data and was tested for classification accuracy accordingly. The tools measured with gene-level input data are labeled as SNF_g, MOFA2_g, iCluster_g, and PCA_g.</p>
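<p>The evaluation protocol can be sketched in a few lines. The text does not state how the F1 score was averaged across the four subtypes or whether the folds were stratified, so the macro average and the round-robin stratification below are assumptions. The function names <monospace>f1_macro</monospace> and <monospace>stratified_folds</monospace> are hypothetical.</p>
<preformat>
```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def stratified_folds(labels, k):
    """Deal the sample indices of each class round-robin into k folds so that
    every fold keeps roughly the class proportions of the full data set."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    slot = 0
    for lab in sorted(by_class):
        for idx in by_class[lab]:
            folds[slot % k].append(idx)
            slot += 1
    return folds


# Toy predictions for two classes (hypothetical labels).
y_true = ["A", "A", "B", "B"]
y_pred = ["A", "B", "B", "B"]
score = f1_macro(y_true, y_pred)

# Fixing the folds once lets every method be scored on identical splits.
folds = stratified_folds(["A"] * 6 + ["B"] * 4, k=2)
```
</preformat>
<p>Computing the folds once and reusing them for every tool corresponds to the requirement that the same train and test data sets be used for each method.</p>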
<p>The comparison results are shown in <xref ref-type="fig" rid="F9">Figure 9</xref>. The F1 score was the highest in MONTI for all cancer subtypes, followed by iCluster and SNF. We observed that the gene-level input data yielded lower F1 scores in MOFA2, while the scores remained relatively similar in the SNF, iCluster, and PCA methods. The significant drop in F1 score for MOFA2_g may be due to its feature extraction method. While the omics-level input data matrix is very dense, the gene-level matrix is relatively sparse, especially for the miRNA data. Hence, the latent factors associated with the miRNA data will lose information. Furthermore, while MONTI utilizes a larger number of rank features, MOFA2 utilized 10 features, which may have reduced the dimension too much and thus lost more information.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>The F1 scores of five tools using gene-level and omics-level data sets of BRCA, COAD, and STAD subtypes.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0009.tif"/>
</fig>
</sec>
<sec>
<title>3.4. Analysis of Pan-Cancer Clinical Features</title>
<p>The relatively high classification accuracy of the cancer subtypes above implies that the subtypes may be explained, in terms of multi-omics, by the genes extracted from the features. Thus, we further investigated whether clinical attributes other than cancer subtypes, such as gender, mutation groups, or metastasis, can be explained using multi-omics data. Among the many clinical attributes, categorical attributes with fewer than five groups were used. Also, clinical attributes with high sample bias were excluded. As a result, a total of nine cancer types and 95 clinical attributes were analyzed using mRNA, methylation, and miRNA data. For example, the &#x0201C;Pathologic M&#x0201D; feature of STAD, which is the metastasis category of TNM staging, has three classes: M0, M1, and MX. If the cancer has spread to distant sites, the sample is labeled as M1, and if not, it is labeled as M0. If metastasis cannot be measured, it is labeled as MX. Thus, similar to the cancer subtype classification, we measured the classification accuracy of each of the categorical clinical attributes that were selected by the criteria described above. The details of the data set and clinical attributes are provided in <xref ref-type="supplementary-material" rid="SM2">Supplementary Table 2</xref>.</p>
<p>MONTI was executed on each cancer type and each clinical feature as described in section 2. The classification accuracy of the cancer clinical attributes is shown in <xref ref-type="fig" rid="F10">Figure 10</xref>. Here, we observed that some clinical attributes were well classified while others were poorly classified.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>The radar chart showing classification results of nine cancers and their clinical attributes.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0010.tif"/>
</fig>
<p>All cancer subtypes showed relatively high accuracy in BRCA, COAD, STAD, and PRAD (prostate adenocarcinoma), which hints that the multi-omics profile is highly correlated with cancer molecular subtypes. Also, although mutation data were not utilized, the BRAF and RAS mutation classes were well distinguished in THCA (thyroid carcinoma). From this result, we may infer that at least the mRNA, methylation, and miRNA profiles are closely associated with BRAF and RAS mutations, which was also reported in Agrawal et al. (<xref ref-type="bibr" rid="B1">2014</xref>). In the case of HNSC (head and neck squamous cell carcinoma), the gender attribute was classified with almost perfect accuracy, which was also reported in Yuan et al. (<xref ref-type="bibr" rid="B52">2016</xref>).</p>
<p>The pan-cancer analysis results show that some clinical attributes can be explained using mRNA, methylation, and miRNA data, while others need further investigation using other omics or clinical data. Collectively, we find that such results may help in selecting omics types when performing research on clinical features in a cancer cohort.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>4. Discussion</title>
<p>While not shown in this study, the subtype classification accuracy decreased when certain omics types were included, particularly mutation profile data. For the BRCA data, the accuracy dropped below 0.75 when SNP data were included in the tensor. The first shortcoming of the SNP data was its extreme sparseness (i.e., only 0.5% of genes had an SNP). We further attempted to impute the remaining missing values using the network-based stratification method for tumor mutations (Hofree et al., <xref ref-type="bibr" rid="B23">2013</xref>). Unfortunately, the accuracy decreased further, which may be due to the additional uncertainty introduced by the large number of predictions. For sparse data, integration methods that are not gene-centric, such as SNF, may be more advantageous. This result implies that no single method may be universally applicable for incorporating all types of omics data, and that omics data must be well understood and integrated in a manner specific to the characteristics of each omics type. Similar arguments have been discussed previously (Zhang et al., <xref ref-type="bibr" rid="B54">2018</xref>).</p>
<p>Clustering of the selected sample features from the <italic>C</italic><sub><italic>p</italic></sub> component of the BRCA analysis result shows that the Basal samples are well clustered together, whereas the Luminal A and Luminal B subtypes are relatively more mixed (<xref ref-type="fig" rid="F11">Figure 11A</xref>). Similarly, the clustering of selected feature genes from the <italic>C</italic><sub><italic>g</italic></sub> component shows the feature activity of genes (<xref ref-type="fig" rid="F11">Figure 11B</xref>). Here, the top color bars represent the omics type with the maximum value for each feature. The genes related to feature four were strongly related to methylation. Genes with high values in multiple features that are related to different omics types indicate that the gene has a relationship across those omics types.</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p><bold>(A)</bold> The cluster heatmap of sample features (<italic>C</italic><sub><italic>p</italic></sub>) and <bold>(B)</bold> the cluster heatmap of the feature genes (<italic>C</italic><sub><italic>g</italic></sub>) from the breast cancer result. The left color bars in <bold>(A)</bold> refer to the BRCA subtypes. The top color bars in <bold>(B)</bold> refer to the omics with the largest feature value in <italic>C</italic><sub><italic>o</italic></sub>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0011.tif"/>
</fig>
<p>Furthermore, the selected features in all three case studies captured correlation among different omics data types. As shown in <xref ref-type="fig" rid="F12">Figure 12</xref>, EXOC6 was most affected by DNA methylation in the Basal subtype of BRCA. EXOC6 has been reported to be an important responsive gene when the effects of a combination of the histone deacetylase inhibitor suberoylanilide hydroxamic acid (SAHA) and taxanes were tested for cytotoxicity using human breast cancer cell lines (Chang et al., <xref ref-type="bibr" rid="B11">2011</xref>). Also, EXOC6 was found to be one of five genes that were able to assess breast cancer risk with high accuracy (Winham et al., <xref ref-type="bibr" rid="B50">2017</xref>). While EXOC6 was observed to have distinct methylation profiles in brain tissues (Farlik et al., <xref ref-type="bibr" rid="B14">2016</xref>; Hira and Gillies, <xref ref-type="bibr" rid="B22">2016</xref>), it has not been actively investigated in breast cancer Basal subtype samples in terms of multi-omics correlation. The OLFML2B gene was found to be negatively correlated with miRNA in the CMS4 subtype of COAD. We found that the OLFML2B-targeting miRNA, miR-30b, is a well-known oncogene suppressor miRNA in colorectal cancer (Liao et al., <xref ref-type="bibr" rid="B30">2014</xref>), which may explain the omics relationship here. Finally, MAPK15 has been reported to be a regulator of radioresistance in nasopharyngeal carcinoma cells, a cancer that is tightly linked to Epstein-Barr virus (EBV) infection (Li et al., <xref ref-type="bibr" rid="B29">2018</xref>), which may relate to the EBV subtype of STAD. Collectively, we may infer that MAPK15 expression is down-regulated by methylation in the EBV subtype, which was not the case in the other STAD subtypes. Other than the selected genes, well-known multi-omics correlated genes related to certain cancer subtypes were also detected. Although the data are not shown, ESPL1, detected by MONTI, showed a significant regulatory relationship between gene expression and methylation specific to the Luminal A and Luminal B subtypes in BRCA, which was previously reported in Finetti et al. (<xref ref-type="bibr" rid="B15">2014</xref>) and Li and Li (<xref ref-type="bibr" rid="B26">2020</xref>).</p>
<fig id="F12" position="float">
<label>Figure 12</label>
<caption><p>Three genes were selected to show correlation between different types of omics data across patient samples. EXOC6 was associated with the Basal subtype of BRCA, OLFML2B was associated with CMS4 subtype of COAD and MAPK15 was associated with the EBV subtype of STAD.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-682841-g0012.tif"/>
</fig>
<p>OLFML2B was most affected by miRNA in the CMS4 subtype of COAD. MAPK15 also showed strong regulation of gene expression by methylation in the EBV subtype of STAD. Results of this kind from MONTI may suggest cancer subtype-specific gene regulation mechanisms, which can help discover subtype-specific gene markers for further biological and clinical investigations.</p>
<p>The genes were further examined to see if they captured known signals of cancer subtype-specific pathways by applying the Subsystem Activation Scoring (SAS) method (Lim et al., <xref ref-type="bibr" rid="B32">2016</xref>). SAS decomposes molecular pathways into sub-pathways (named subsystems) and measures their activation levels in terms of gene expression. We extended it to the multi-omics level to evaluate the association of each subsystem with each cancer subtype by constructing random forest classifiers using its SAS score. The detailed method and results are described in <xref ref-type="supplementary-material" rid="SM3">Supplementary Table 3</xref>. The detected pathway subsystems were highly specific to each cancer type, and the top 10 ranked pathways for the three case studies were all supported by previous studies. For example, the &#x0201C;Fanconi anemia&#x0201D; pathway was the top ranked pathway for the BRCA data; Fanconi anemia is known to be a rare chromosomal instability disorder associated with susceptibility to cancer (Alan and D&#x00027;Andrea, <xref ref-type="bibr" rid="B2">2010</xref>). The &#x0201C;HIF-1 signaling&#x0201D; pathway was top ranked in STAD with association to miRNA. A previous study (He et al., <xref ref-type="bibr" rid="B20">2017</xref>) suggests that miR-224 promotes cell growth, migration, and invasion by targeting the RASSF8 gene in STAD. Similarly, the &#x0201C;Vascular smooth muscle contraction&#x0201D; pathway, top ranked by SAS, was also reported to be induced by colorectal cancer (Li et al., <xref ref-type="bibr" rid="B27">2017</xref>).</p>
<p>The application of MONTI was demonstrated on cancer subtype multi-omics data. However, MONTI is not tailored to cancer subtype analysis but can be utilized to identify any categorical clinical features, such as gender, mutation groups, tumor grade, or age. Thus, the advantage of MONTI is that it is able to identify clinical feature associated genes in terms of multi-omics. Furthermore, the omics component <italic>C</italic><sub><italic>o</italic></sub> can be further used to investigate which omics are currently active and take part in gene expression regulation.</p>
</sec>
<sec sec-type="data-availability" id="s5">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found at: TCGA multi-omics data.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>SK and IJ designed the project and MONTI algorithm framework. IJ and SR implemented multi-omics integration. SK, IJ, SL, and MK performed the biological analysis and interpretation.</p>
</sec>
<sec sec-type="funding-information" id="s7">
<title>Funding</title>
<p>This research was supported by the Bio &#x00026; Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (2019M3E5D3073365), the Collaborative Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT) (No. NRF-2014M3C9A3063541), and the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2020M3C9A5085604).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec sec-type="supplementary-material" id="s9">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2021.682841/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2021.682841/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table_1.XLSX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table_2.XLSX" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table_3.XLSX" id="SM3" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Agrawal</surname> <given-names>N.</given-names></name> <name><surname>Akbani</surname> <given-names>R.</given-names></name> <name><surname>Aksoy</surname> <given-names>B. A.</given-names></name> <name><surname>Ally</surname> <given-names>A.</given-names></name> <name><surname>Arachchi</surname> <given-names>H.</given-names></name> <name><surname>Asa</surname> <given-names>S. L.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Integrated genomic characterization of papillary thyroid carcinoma</article-title>. <source>Cell</source> <volume>159</volume>, <fpage>676</fpage>&#x02013;<lpage>690</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2014.09.050</pub-id><pub-id pub-id-type="pmid">25417114</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alan</surname> <given-names>D.</given-names></name> <name><surname>D&#x00027;Andrea</surname> <given-names>M.</given-names></name></person-group> (<year>2010</year>). <article-title>The Fanconi anemia and breast cancer susceptibility pathways</article-title>. <source>N. Engl. J. Med</source>. <volume>362</volume>, <fpage>1909</fpage>&#x02013;<lpage>1919</lpage>. <pub-id pub-id-type="doi">10.1056/NEJMra0809889</pub-id><pub-id pub-id-type="pmid">20484397</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Argelaguet</surname> <given-names>R.</given-names></name> <name><surname>Arnol</surname> <given-names>D.</given-names></name> <name><surname>Bredikhin</surname> <given-names>D.</given-names></name> <name><surname>Deloro</surname> <given-names>Y.</given-names></name> <name><surname>Velten</surname> <given-names>B.</given-names></name> <name><surname>Marioni</surname> <given-names>J. C.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>MOFA&#x0002B;: a statistical framework for comprehensive integration of multi-modal single-cell data</article-title>. <source>Genome Biol</source>. <volume>21</volume>, <fpage>1</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1186/s13059-020-02015-1</pub-id><pub-id pub-id-type="pmid">32393329</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Argelaguet</surname> <given-names>R.</given-names></name> <name><surname>Velten</surname> <given-names>B.</given-names></name> <name><surname>Arnol</surname> <given-names>D.</given-names></name> <name><surname>Dietrich</surname> <given-names>S.</given-names></name> <name><surname>Zenz</surname> <given-names>T.</given-names></name> <name><surname>Marioni</surname> <given-names>J. C.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Multi-omics factor analysis&#x02013;a framework for unsupervised integration of multi-omics data sets</article-title>. <source>Mol. Syst. Biol</source>. <volume>14</volume>:<fpage>e8124</fpage>. <pub-id pub-id-type="doi">10.15252/msb.20178124</pub-id><pub-id pub-id-type="pmid">29925568</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bro</surname> <given-names>R.</given-names></name></person-group> (<year>1997</year>). <article-title>PARAFAC. Tutorial and applications</article-title>. <source>Chemometr. Intell. Lab. Syst</source>. <volume>38</volume>, <fpage>149</fpage>&#x02013;<lpage>171</lpage>.</citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Buenrostro</surname> <given-names>J. D.</given-names></name> <name><surname>Wu</surname> <given-names>B.</given-names></name> <name><surname>Chang</surname> <given-names>H. Y.</given-names></name> <name><surname>Greenleaf</surname> <given-names>W. J.</given-names></name></person-group> (<year>2015</year>). <article-title>ATAC-seq: a method for assaying chromatin accessibility genome-wide</article-title>. <source>Curr. Protoc. Mol. Biol</source>. <volume>109</volume>, <fpage>21</fpage>&#x02013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1002/0471142727.mb2129s109</pub-id><pub-id pub-id-type="pmid">25559105</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><collab>Cancer Genome Atlas Network</collab></person-group> (<year>2012</year>). <article-title>Comprehensive molecular portraits of human breast tumours</article-title>. <source>Nature</source> <volume>490</volume>, <fpage>61</fpage>&#x02013;<lpage>70</lpage>. <pub-id pub-id-type="doi">10.1038/nature11412</pub-id><pub-id pub-id-type="pmid">23000897</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><collab>Cancer Genome Atlas Research Network</collab></person-group> (<year>2014</year>). <article-title>Comprehensive molecular characterization of gastric adenocarcinoma</article-title>. <source>Nature</source> <volume>513</volume>, <fpage>202</fpage>&#x02013;<lpage>209</lpage>. <pub-id pub-id-type="doi">10.1038/nature13480</pub-id><pub-id pub-id-type="pmid">25079317</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carithers</surname> <given-names>L. J.</given-names></name> <name><surname>Ardlie</surname> <given-names>K.</given-names></name> <name><surname>Barcus</surname> <given-names>M.</given-names></name> <name><surname>Branton</surname> <given-names>P. A.</given-names></name> <name><surname>Britton</surname> <given-names>A.</given-names></name> <name><surname>Buia</surname> <given-names>S. A.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>A novel approach to high-quality postmortem tissue procurement: the GTEx project</article-title>. <source>Biopreserv. Biobank</source>. <volume>13</volume>, <fpage>311</fpage>&#x02013;<lpage>319</lpage>. <pub-id pub-id-type="doi">10.1089/bio.2015.0032</pub-id><pub-id pub-id-type="pmid">26484571</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carroll</surname> <given-names>J. D.</given-names></name> <name><surname>Chang</surname> <given-names>J.-J.</given-names></name></person-group> (<year>1970</year>). <article-title>Analysis of individual differences in multidimensional scaling via an n-way generalization of &#x0201C;Eckart&#x02013;Young&#x0201D; decomposition</article-title>. <source>Psychometrika</source> <volume>35</volume>, <fpage>283</fpage>&#x02013;<lpage>319</lpage>. <pub-id pub-id-type="doi">10.1007/BF02310791</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>H.</given-names></name> <name><surname>Jeung</surname> <given-names>H.-C.</given-names></name> <name><surname>Jung</surname> <given-names>J. J.</given-names></name> <name><surname>Kim</surname> <given-names>T. S.</given-names></name> <name><surname>Rha</surname> <given-names>S. Y.</given-names></name> <name><surname>Chung</surname> <given-names>H. C.</given-names></name></person-group> (<year>2011</year>). <article-title>Identification of genes associated with chemosensitivity to SAHA/taxane combination treatment in taxane-resistant breast cancer cells</article-title>. <source>Breast Cancer Res. Treatm</source>. <volume>125</volume>, <fpage>55</fpage>&#x02013;<lpage>63</lpage>. <pub-id pub-id-type="doi">10.1007/s10549-010-0825-z</pub-id><pub-id pub-id-type="pmid">20224928</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chaudhary</surname> <given-names>K.</given-names></name> <name><surname>Poirion</surname> <given-names>O. B.</given-names></name> <name><surname>Lu</surname> <given-names>L.</given-names></name> <name><surname>Garmire</surname> <given-names>L. X.</given-names></name></person-group> (<year>2017</year>). <article-title>Deep learning based multi-omics integration robustly predicts survival in liver cancer</article-title>. <source>Clin. Cancer Res</source>. <volume>24</volume>, <fpage>1248</fpage>&#x02013;<lpage>1259</lpage>. <pub-id pub-id-type="doi">10.1101/114892</pub-id><pub-id pub-id-type="pmid">28982688</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <article-title>miRDB: an online database for prediction of functional microRNA targets</article-title>. <source>Nucl. Acids Res</source>. <volume>48</volume>, <fpage>D127</fpage>&#x02013;<lpage>D131</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkz757</pub-id><pub-id pub-id-type="pmid">31504780</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Farlik</surname> <given-names>M.</given-names></name> <name><surname>Halbritter</surname> <given-names>F.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>F.</given-names></name> <name><surname>Choudry</surname> <given-names>F. A.</given-names></name> <name><surname>Ebert</surname> <given-names>P.</given-names></name> <name><surname>Klughammer</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>DNA methylation dynamics of human hematopoietic stem cell differentiation</article-title>. <source>Cell Stem Cell</source> <volume>19</volume>, <fpage>808</fpage>&#x02013;<lpage>822</lpage>. <pub-id pub-id-type="doi">10.1016/j.stem.2016.10.019</pub-id><pub-id pub-id-type="pmid">27867036</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Finetti</surname> <given-names>P.</given-names></name> <name><surname>Guille</surname> <given-names>A.</given-names></name> <name><surname>Adelaide</surname> <given-names>J.</given-names></name> <name><surname>Birnbaum</surname> <given-names>D.</given-names></name> <name><surname>Chaffanet</surname> <given-names>M.</given-names></name> <name><surname>Bertucci</surname> <given-names>F.</given-names></name></person-group> (<year>2014</year>). <article-title>ESPL1 is a candidate oncogene of luminal B breast cancers</article-title>. <source>Breast Cancer Res. Treatm</source>. <volume>147</volume>, <fpage>51</fpage>&#x02013;<lpage>59</lpage>. <pub-id pub-id-type="doi">10.1007/s10549-014-3070-z</pub-id><pub-id pub-id-type="pmid">25086634</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Furey</surname> <given-names>T. S.</given-names></name> <name><surname>Cristianini</surname> <given-names>N.</given-names></name> <name><surname>Duffy</surname> <given-names>N.</given-names></name> <name><surname>Bednarski</surname> <given-names>D. W.</given-names></name> <name><surname>Schummer</surname> <given-names>M.</given-names></name> <name><surname>Haussler</surname> <given-names>D.</given-names></name></person-group> (<year>2000</year>). <article-title>Support vector machine classification and validation of cancer tissue samples using microarray expression data</article-title>. <source>Bioinformatics</source> <volume>16</volume>, <fpage>906</fpage>&#x02013;<lpage>914</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/16.10.906</pub-id><pub-id pub-id-type="pmid">11120680</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gevaert</surname> <given-names>O.</given-names></name> <name><surname>Smet</surname> <given-names>F. D.</given-names></name> <name><surname>Timmerman</surname> <given-names>D.</given-names></name> <name><surname>Moreau</surname> <given-names>Y.</given-names></name> <name><surname>Moor</surname> <given-names>B. D.</given-names></name></person-group> (<year>2006</year>). <article-title>Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks</article-title>. <source>Bioinformatics</source> <volume>22</volume>, <fpage>e184</fpage>&#x02013;<lpage>e190</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btl230</pub-id><pub-id pub-id-type="pmid">16873470</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Harshman</surname> <given-names>R. A.</given-names></name></person-group> (<year>1970</year>). <article-title>Foundations of the PARAFAC procedure: Models and conditions for an &#x00022;explanatory&#x00022; multimodal factor analysis,</article-title> in <source>UCLA Working Papers in Phonetics</source> (<publisher-loc>Los Angeles, CA</publisher-loc>).</citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hasin</surname> <given-names>Y.</given-names></name> <name><surname>Seldin</surname> <given-names>M.</given-names></name> <name><surname>Lusis</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>Multi-omics approaches to disease</article-title>. <source>Genome Biol</source>. <volume>18</volume>:<fpage>83</fpage>. <pub-id pub-id-type="doi">10.1186/s13059-017-1215-1</pub-id><pub-id pub-id-type="pmid">28476144</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>H.</given-names></name></person-group> (<year>2017</year>). <article-title>Hypoxia-inducible microRNA-224 promotes the cell growth, migration and invasion by directly targeting RASSF8 in gastric cancer</article-title>. <source>Mol. Cancer</source> <volume>16</volume>:<fpage>35</fpage>. <pub-id pub-id-type="doi">10.1186/s12943-017-0603-1</pub-id><pub-id pub-id-type="pmid">28173803</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hern&#x000E1;ndez-de Diego</surname> <given-names>R.</given-names></name> <name><surname>Tarazona</surname> <given-names>S.</given-names></name> <name><surname>Mart&#x000ED;nez-Mira</surname> <given-names>C.</given-names></name> <name><surname>Balzano-Nogueira</surname> <given-names>L.</given-names></name> <name><surname>Furio-Tari</surname> <given-names>P.</given-names></name> <name><surname>Pappas</surname> <given-names>G. J.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>PaintOmics 3: a web resource for the pathway analysis and visualization of multi-omics data</article-title>. <source>Nucl. Acids Res</source>. <volume>46</volume>, <fpage>W503</fpage>&#x02013;<lpage>W509</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gky466</pub-id><pub-id pub-id-type="pmid">29800320</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hira</surname> <given-names>Z. M.</given-names></name> <name><surname>Gillies</surname> <given-names>D. F.</given-names></name></person-group> (<year>2016</year>). <article-title>Identifying significant features in cancer methylation data using gene pathway segmentation</article-title>. <source>Cancer Inform</source>. <volume>15</volume>, <fpage>189</fpage>&#x02013;<lpage>198</lpage>. <pub-id pub-id-type="doi">10.4137/CIN.S39859</pub-id><pub-id pub-id-type="pmid">27688706</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hofree</surname> <given-names>M.</given-names></name> <name><surname>Shen</surname> <given-names>J. P.</given-names></name> <name><surname>Carter</surname> <given-names>H.</given-names></name> <name><surname>Gross</surname> <given-names>A.</given-names></name> <name><surname>Ideker</surname> <given-names>T.</given-names></name></person-group> (<year>2013</year>). <article-title>Network-based stratification of tumor mutations</article-title>. <source>Nat. Methods</source> <volume>10</volume>:<fpage>1108</fpage>. <pub-id pub-id-type="doi">10.1038/nmeth.2651</pub-id><pub-id pub-id-type="pmid">24037242</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>S.</given-names></name> <name><surname>Chaudhary</surname> <given-names>K.</given-names></name> <name><surname>Garmire</surname> <given-names>L. X.</given-names></name></person-group> (<year>2017</year>). <article-title>More is better: recent progress in multi-omics data integration methods</article-title>. <source>Front. Genet</source>. <volume>8</volume>:<fpage>84</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2017.00084</pub-id><pub-id pub-id-type="pmid">28670325</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kroonenberg</surname> <given-names>P. M.</given-names></name></person-group> (<year>1983</year>). <source>Three-Mode Principal Component Analysis: Theory and Applications, Vol. 2</source>. <publisher-loc>Los Angeles, CA</publisher-loc>: <publisher-name>DSWO Press</publisher-name>.</citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <article-title>Comprehensive analysis of prognosis-related methylated sites in breast carcinoma</article-title>. <source>Mol. Genet. Genom. Med</source>. <volume>8</volume>:<fpage>e1161</fpage>. <pub-id pub-id-type="doi">10.1002/mgg3.1161</pub-id><pub-id pub-id-type="pmid">32037691</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>W.-W.</given-names></name> <name><surname>Wang</surname> <given-names>H.-Y.</given-names></name> <name><surname>Nie</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>Y.-B.</given-names></name> <name><surname>Han</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>B.-H.</given-names></name></person-group> (<year>2017</year>). <article-title>Human colorectal cancer cells induce vascular smooth muscle cell apoptosis in an exocrine manner</article-title>. <source>Oncotarget</source> <volume>8</volume>:<fpage>62049</fpage>. <pub-id pub-id-type="doi">10.18632/oncotarget.18893</pub-id><pub-id pub-id-type="pmid">28977925</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Oosting</surname> <given-names>M.</given-names></name> <name><surname>Smeekens</surname> <given-names>S. P.</given-names></name> <name><surname>Jaeger</surname> <given-names>M.</given-names></name> <name><surname>Aguirre-Gamboa</surname> <given-names>R.</given-names></name> <name><surname>Le</surname> <given-names>K. T.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>A functional genomics approach to understand variation in cytokine production in humans</article-title>. <source>Cell</source> <volume>167</volume>, <fpage>1099</fpage>&#x02013;<lpage>1110</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2016.10.017</pub-id><pub-id pub-id-type="pmid">27814507</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Z.</given-names></name> <name><surname>Li</surname> <given-names>N.</given-names></name> <name><surname>Shen</surname> <given-names>L.</given-names></name> <name><surname>Fu</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Quantitative proteomic analysis identifies MAPK15 as a potential regulator of radioresistance in nasopharyngeal carcinoma cells</article-title>. <source>Front. Oncol</source>. <volume>8</volume>:<fpage>548</fpage>. <pub-id pub-id-type="doi">10.3389/fonc.2018.00548</pub-id><pub-id pub-id-type="pmid">30524968</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liao</surname> <given-names>W.-T.</given-names></name> <name><surname>Ye</surname> <given-names>Y.-P.</given-names></name> <name><surname>Zhang</surname> <given-names>N.-J.</given-names></name> <name><surname>Li</surname> <given-names>T.-T.</given-names></name> <name><surname>Wang</surname> <given-names>S.-Y.</given-names></name> <name><surname>Cui</surname> <given-names>Y.-M.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>MicroRNA-30b functions as a tumour suppressor in human colorectal cancer by targeting KRAS, PIK3CD and BCL2</article-title>. <source>J. Pathol</source>. <volume>232</volume>, <fpage>415</fpage>&#x02013;<lpage>427</lpage>. <pub-id pub-id-type="doi">10.1002/path.4309</pub-id><pub-id pub-id-type="pmid">24293274</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lim</surname> <given-names>S.</given-names></name> <name><surname>Lee</surname> <given-names>S.</given-names></name> <name><surname>Jung</surname> <given-names>I.</given-names></name> <name><surname>Rhee</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Comprehensive and critical evaluation of individualized pathway activity measurement tools on pan-cancer data</article-title>. <source>Brief. Bioinform</source>. <volume>21</volume>, <fpage>36</fpage>&#x02013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bby097</pub-id><pub-id pub-id-type="pmid">30534977</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lim</surname> <given-names>S.</given-names></name> <name><surname>Park</surname> <given-names>Y.</given-names></name> <name><surname>Hur</surname> <given-names>B.</given-names></name> <name><surname>Kim</surname> <given-names>M.</given-names></name> <name><surname>Han</surname> <given-names>W.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>Protein interaction network (PIN)-based breast cancer subsystem identification and activation measurement for prognostic modeling</article-title>. <source>Methods</source> <volume>110</volume>, <fpage>81</fpage>&#x02013;<lpage>89</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymeth.2016.06.015</pub-id><pub-id pub-id-type="pmid">27329435</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paquet</surname> <given-names>E. R.</given-names></name> <name><surname>Hallett</surname> <given-names>M. T.</given-names></name></person-group> (<year>2015</year>). <article-title>Absolute assignment of breast cancer intrinsic molecular subtype</article-title>. <source>J. Natl. Cancer Instit</source>. <volume>10</volume>:<fpage>357</fpage>. <pub-id pub-id-type="doi">10.1093/jnci/dju357</pub-id><pub-id pub-id-type="pmid">25479802</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname> <given-names>P. J.</given-names></name></person-group> (<year>2009</year>). <article-title>ChIP-seq: advantages and challenges of a maturing technology</article-title>. <source>Nat. Rev. Genet</source>. <volume>10</volume>:<fpage>669</fpage>. <pub-id pub-id-type="doi">10.1038/nrg2641</pub-id><pub-id pub-id-type="pmid">19736561</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parker</surname> <given-names>J. S.</given-names></name> <name><surname>Mullins</surname> <given-names>M.</given-names></name> <name><surname>Cheang</surname> <given-names>M. C.</given-names></name> <name><surname>Leung</surname> <given-names>S.</given-names></name> <name><surname>Voduc</surname> <given-names>D.</given-names></name> <name><surname>Vickery</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Supervised risk predictor of breast cancer based on intrinsic subtypes</article-title>. <source>J. Clin. Oncol</source>. <volume>27</volume>:<fpage>1160</fpage>. <pub-id pub-id-type="doi">10.1200/JCO.2008.18.1370</pub-id><pub-id pub-id-type="pmid">19204204</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ramaswamy</surname> <given-names>S.</given-names></name> <name><surname>Tamayo</surname> <given-names>P.</given-names></name> <name><surname>Rifkin</surname> <given-names>R.</given-names></name> <name><surname>Mukherjee</surname> <given-names>S.</given-names></name> <name><surname>Yeang</surname> <given-names>C.-H.</given-names></name> <name><surname>Angelo</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2001</year>). <article-title>Multiclass cancer diagnosis using tumor gene expression signatures</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>98</volume>, <fpage>15149</fpage>&#x02013;<lpage>15154</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.211566398</pub-id><pub-id pub-id-type="pmid">11742071</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ritchie</surname> <given-names>M. D.</given-names></name> <name><surname>Holzinger</surname> <given-names>E. R.</given-names></name> <name><surname>Li</surname> <given-names>R.</given-names></name> <name><surname>Pendergrass</surname> <given-names>S. A.</given-names></name> <name><surname>Kim</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>Methods of integrating data to uncover genotype-phenotype interactions</article-title>. <source>Nat. Rev. Genet</source>. <volume>16</volume>, <fpage>85</fpage>&#x02013;<lpage>97</lpage>. <pub-id pub-id-type="doi">10.1038/nrg3868</pub-id><pub-id pub-id-type="pmid">25582081</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sathyanarayanan</surname> <given-names>A.</given-names></name> <name><surname>Gupta</surname> <given-names>R.</given-names></name> <name><surname>Thompson</surname> <given-names>E. W.</given-names></name> <name><surname>Nyholt</surname> <given-names>D. R.</given-names></name> <name><surname>Bauer</surname> <given-names>D. C.</given-names></name> <name><surname>Nagaraj</surname> <given-names>S. H.</given-names></name></person-group> (<year>2020</year>). <article-title>A comparative study of multi-omics integration tools for cancer driver gene identification and tumour subtyping</article-title>. <source>Brief. Bioinformatics</source> <volume>21</volume>, <fpage>1920</fpage>&#x02013;<lpage>1936</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbz121</pub-id><pub-id pub-id-type="pmid">31774481</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname> <given-names>R.</given-names></name> <name><surname>Mo</surname> <given-names>Q.</given-names></name> <name><surname>Schultz</surname> <given-names>N.</given-names></name> <name><surname>Seshan</surname> <given-names>V. E.</given-names></name> <name><surname>Olshen</surname> <given-names>A. B.</given-names></name> <name><surname>Huse</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Integrative subtype discovery in glioblastoma using iCluster</article-title>. <source>PLoS ONE</source> <volume>7</volume>:<fpage>e35236</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0035236</pub-id><pub-id pub-id-type="pmid">22539962</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname> <given-names>R.</given-names></name> <name><surname>Olshen</surname> <given-names>A. B.</given-names></name> <name><surname>Ladanyi</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis</article-title>. <source>Bioinformatics</source> <volume>25</volume>, <fpage>2906</fpage>&#x02013;<lpage>2912</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btp543</pub-id><pub-id pub-id-type="pmid">19759197</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sotiriou</surname> <given-names>C.</given-names></name> <name><surname>Neo</surname> <given-names>S.-Y.</given-names></name> <name><surname>McShane</surname> <given-names>L. M.</given-names></name> <name><surname>Korn</surname> <given-names>E. L.</given-names></name> <name><surname>Long</surname> <given-names>P. M.</given-names></name> <name><surname>Jazaeri</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>Breast cancer classification and prognosis based on gene expression profiles from a population-based study</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>100</volume>, <fpage>10393</fpage>&#x02013;<lpage>10398</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1732912100</pub-id><pub-id pub-id-type="pmid">12917485</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Subramanian</surname> <given-names>I.</given-names></name> <name><surname>Verma</surname> <given-names>S.</given-names></name> <name><surname>Kumar</surname> <given-names>S.</given-names></name> <name><surname>Jere</surname> <given-names>A.</given-names></name> <name><surname>Anamika</surname> <given-names>K.</given-names></name></person-group> (<year>2020</year>). <article-title>Multi-omics data integration, interpretation, and its application</article-title>. <source>Bioinform. Biol. Insights</source> <volume>14</volume>:<fpage>1177932219899051</fpage>. <pub-id pub-id-type="doi">10.1177/1177932219899051</pub-id><pub-id pub-id-type="pmid">32076369</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tahara</surname> <given-names>T.</given-names></name> <name><surname>Arisawa</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). <article-title>DNA methylation as a molecular biomarker in gastric cancer</article-title>. <source>Epigenomics</source> <volume>7</volume>, <fpage>475</fpage>&#x02013;<lpage>486</lpage>. <pub-id pub-id-type="doi">10.2217/epi.15.4</pub-id><pub-id pub-id-type="pmid">26077432</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tao</surname> <given-names>M.</given-names></name> <name><surname>Song</surname> <given-names>T.</given-names></name> <name><surname>Du</surname> <given-names>W.</given-names></name> <name><surname>Han</surname> <given-names>S.</given-names></name> <name><surname>Zuo</surname> <given-names>C.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Classifying breast cancer subtypes using multiple kernel learning based on omics data</article-title>. <source>Genes</source> <volume>10</volume>:<fpage>200</fpage>. <pub-id pub-id-type="doi">10.3390/genes10030200</pub-id><pub-id pub-id-type="pmid">30866472</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><collab>The ENCODE Project Consortium</collab></person-group> (<year>2012</year>). <article-title>An integrated encyclopedia of DNA elements in the human genome</article-title>. <source>Nature</source> <volume>489</volume>:<fpage>57</fpage>. <pub-id pub-id-type="doi">10.1038/nmeth.2238</pub-id><pub-id pub-id-type="pmid">23281567</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vasaikar</surname> <given-names>S. V.</given-names></name> <name><surname>Straub</surname> <given-names>P.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>B.</given-names></name></person-group> (<year>2017</year>). <article-title>LinkedOmics: analyzing multi-omics data within and across 32 cancer types</article-title>. <source>Nucleic Acids Res</source>. <volume>46</volume>, <fpage>D956</fpage>&#x02013;<lpage>D963</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkx1090</pub-id><pub-id pub-id-type="pmid">29136207</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vaske</surname> <given-names>C. J.</given-names></name> <name><surname>Benz</surname> <given-names>S. C.</given-names></name> <name><surname>Sanborn</surname> <given-names>J. Z.</given-names></name> <name><surname>Earl</surname> <given-names>D.</given-names></name> <name><surname>Szeto</surname> <given-names>C.</given-names></name> <name><surname>Zhu</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>i237</fpage>&#x02013;<lpage>i245</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq182</pub-id><pub-id pub-id-type="pmid">20529912</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>B.</given-names></name> <name><surname>Mezlini</surname> <given-names>A. M.</given-names></name> <name><surname>Demir</surname> <given-names>F.</given-names></name> <name><surname>Fiume</surname> <given-names>M.</given-names></name> <name><surname>Tu</surname> <given-names>Z.</given-names></name> <name><surname>Brudno</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Similarity network fusion for aggregating data types on a genomic scale</article-title>. <source>Nat. Methods</source> <volume>11</volume>:<fpage>333</fpage>. <pub-id pub-id-type="doi">10.1038/nmeth.2810</pub-id><pub-id pub-id-type="pmid">24464287</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weinstein</surname> <given-names>J. N.</given-names></name> <name><surname>Collisson</surname> <given-names>E. A.</given-names></name> <name><surname>Mills</surname> <given-names>G. B.</given-names></name> <name><surname>Shaw</surname> <given-names>K. R. M.</given-names></name> <name><surname>Ozenberger</surname> <given-names>B. A.</given-names></name> <name><surname>Ellrott</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>The Cancer Genome Atlas Pan-Cancer analysis project</article-title>. <source>Nat. Genet</source>. <volume>45</volume>:<fpage>1113</fpage>. <pub-id pub-id-type="doi">10.1038/ng.2764</pub-id><pub-id pub-id-type="pmid">25936886</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Winham</surname> <given-names>S. J.</given-names></name> <name><surname>Mehner</surname> <given-names>C.</given-names></name> <name><surname>Heinzen</surname> <given-names>E. P.</given-names></name> <name><surname>Broderick</surname> <given-names>B. T.</given-names></name> <name><surname>Stallings-Mann</surname> <given-names>M.</given-names></name> <name><surname>Nassar</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>NanoString-based breast cancer risk prediction for women with sclerosing adenosis</article-title>. <source>Breast Cancer Res. Treat</source>. <volume>166</volume>, <fpage>641</fpage>&#x02013;<lpage>650</lpage>. <pub-id pub-id-type="doi">10.1007/s10549-017-4441-z</pub-id><pub-id pub-id-type="pmid">28798985</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>T.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Jiang</surname> <given-names>R.</given-names></name> <name><surname>Lu</surname> <given-names>X.</given-names></name> <name><surname>Tian</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>A pathways-based prediction model for classifying breast cancer subtypes</article-title>. <source>Oncotarget</source> <volume>8</volume>:<fpage>58809</fpage>. <pub-id pub-id-type="doi">10.18632/oncotarget.18544</pub-id><pub-id pub-id-type="pmid">28938599</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Mao</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Comprehensive characterization of molecular differences in cancer between male and female patients</article-title>. <source>Cancer Cell</source> <volume>29</volume>, <fpage>711</fpage>&#x02013;<lpage>722</lpage>. <pub-id pub-id-type="doi">10.1016/j.ccell.2016.04.001</pub-id><pub-id pub-id-type="pmid">27165743</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>C.-C.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>Shen</surname> <given-names>H.</given-names></name> <name><surname>Laird</surname> <given-names>P. W.</given-names></name> <name><surname>Zhou</surname> <given-names>X. J.</given-names></name></person-group> (<year>2012</year>). <article-title>Discovery of multi-dimensional modules by integrative analysis of cancer genomic data</article-title>. <source>Nucl. Acids Res</source>. <volume>40</volume>, <fpage>9379</fpage>&#x02013;<lpage>9391</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gks725</pub-id><pub-id pub-id-type="pmid">22879375</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>W.</given-names></name> <name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Ideker</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>Classifying tumors by supervised network propagation</article-title>. <source>Bioinformatics</source> <volume>34</volume>, <fpage>i484</fpage>&#x02013;<lpage>i493</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty247</pub-id><pub-id pub-id-type="pmid">30726869</pub-id></citation></ref>
</ref-list>
</back>
</article>