<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">767358</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2021.767358</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies</article-title>
<alt-title alt-title-type="left-running-head">Marczyk et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Association Integration in GWAS GSA</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Marczyk</surname>
<given-names>Michal</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/437385/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Macioszek</surname>
<given-names>Agnieszka</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Tobiasz</surname>
<given-names>Joanna</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1461976/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Polanska</surname>
<given-names>Joanna</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/680024/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zyla</surname>
<given-names>Joanna</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1303248/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Department of Data Science and Engineering, Silesian University of Technology, <addr-line>Gliwice</addr-line>, <country>Poland</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Yale Cancer Center, Yale School of Medicine, <addr-line>New Haven</addr-line>, <addr-line>CT</addr-line>, <country>United&#x20;States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/114086/overview">Sorin Draghici</ext-link>, Wayne State University, United&#x20;States</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/775014/overview">Rostam Abdollahi-Arpanahi</ext-link>, University of Georgia, United&#x20;States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/930636/overview">Le Li</ext-link>, Cornell University, United&#x20;States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Joanna Polanska, <email>joanna.polanska@polsl.pl</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>12</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>767358</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>08</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>11</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Marczyk, Macioszek, Tobiasz, Polanska and Zyla.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Marczyk, Macioszek, Tobiasz, Polanska and Zyla</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which are in a region of the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to use only the most relevant SNP per gene; however, other methods of integration could be applied here. Also, the problem of nonrandom association of alleles at two or more loci is often neglected. Here, we tested the impact of different integration methods and linkage disequilibrium (LD) correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). The association of SNPs with phenotype was calculated using a modified McNemar&#x2019;s test. Results for SNPs mapped to the same gene were integrated using the Fisher and Stouffer methods and compared with the minimum <italic>p</italic>-value method. Four common measures were used to quantify the performance of all combinations of methods. GSA results on GWAS were compared with those obtained on gene expression data. Comparing all evaluation metrics across different GSA algorithms, integrations, and LD correction, we identified CERNO, and MAGENTA with Stouffer integration, as the most efficient. Applying LD correction increased prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer integration was used with LD correction, sensitivity and reproducibility were also better. In specific combinations, using either integration method was beneficial in comparison with the minimum <italic>p</italic>-value method.
The correlation between GSA results at the genomic and transcriptomic levels was highest when Stouffer integration was combined with LD correction. We thoroughly evaluated the performance of different approaches to GSA in GWAS to guide others in selecting the most effective combinations. We showed that LD correction and Stouffer integration can increase the performance of enrichment analysis, and we encourage the use of these techniques.</p>
</abstract>
<kwd-group>
<kwd>gene set analysis</kwd>
<kwd>genome-wide association study</kwd>
<kwd>statistical integration</kwd>
<kwd>single-nucleotide polymorphism</kwd>
<kwd>linkage disequilibrium correction</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>A genome-wide association study (GWAS) is a high-throughput molecular biology technique that gives insight into the relation between particular traits and single-nucleotide polymorphism (SNP) frequency or other types of genetic variation. In recent years, GWAS have revealed many genetic loci related to common diseases, e.g., type 2 diabetes (<xref ref-type="bibr" rid="B3">Billings and Florez, 2010</xref>), Alzheimer disease (<xref ref-type="bibr" rid="B24">Marioni et&#x20;al., 2018</xref>), or many types of cancer (<xref ref-type="bibr" rid="B38">Sud et&#x20;al., 2017</xref>). Despite the promising outcomes, the biological functions of many genetic variation loci remain unclear, and the genetic mechanisms of phenotypes are not systematically explained. Yet GWAS is still an important tool for understanding the biological mechanisms of different diseases (<xref ref-type="bibr" rid="B46">Wijmenga and Zhernakova, 2018</xref>). One bioinformatic technique that can extend the information gained from single genetic variations and their impact on biological systems is gene set analysis (GSA), and the importance of such a solution has been increasingly recognized (<xref ref-type="bibr" rid="B44">Wang et&#x20;al., 2007</xref>; <xref ref-type="bibr" rid="B12">Holden et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B11">Hirschhorn, 2009</xref>; <xref ref-type="bibr" rid="B50">Zhang et&#x20;al., 2010</xref>; <xref ref-type="bibr" rid="B45">Weng et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B7">de Leeuw et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B27">Mei et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B38">Sud et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B48">Yoon et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B19">Maleki et&#x20;al., 2020</xref>).</p>
<p>GSA summarizes phenotype association results from the individual gene level to the gene set level, also known as the pathway level. Using this concept, it is possible to detect the aggregated impact of multiple genes on phenotype, even when individual genes have a moderate or small effect on the investigated trait. In addition, applying GSA increases understanding of changes observed in complex biological mechanisms under various conditions. Within the last decade of gene set analysis method development, many algorithms were introduced [just to mention a few: GSEA (<xref ref-type="bibr" rid="B37">Subramanian et&#x20;al., 2005</xref>), PADOG (<xref ref-type="bibr" rid="B41">Tarca et&#x20;al., 2012</xref>), SPIA (<xref ref-type="bibr" rid="B42">Tarca et&#x20;al., 2009</xref>) or LEGO (<xref ref-type="bibr" rid="B8">Dong et&#x20;al., 2016</xref>)]; they can be classified by their generation (<xref ref-type="bibr" rid="B15">Khatri et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B52">Zyla et&#x20;al., 2017</xref>), hypothesis tested (<xref ref-type="bibr" rid="B18">Maciejewski, 2014</xref>), or application for a particular omics platform (<xref ref-type="bibr" rid="B6">Das et&#x20;al., 2020</xref>). Initially, algorithms were created to analyze gene expression data from microarray experiments, but with rapid advances in molecular biology techniques, they became widely applied to other omics, leading to a growing number of bioinformatic tools that perform multi-omics gene set analysis (<xref ref-type="bibr" rid="B4">Canzler and Hackerm&#xfc;ller, 2020</xref>; <xref ref-type="bibr" rid="B14">Kaspi and Ziemann, 2020</xref>). Applying GSA techniques to different omics data raises different problems. In the analysis of GWAS results, the key issue is how to transform the observed genetic variation to the gene level.
One of the most widely used techniques is to choose the SNP with the strongest association (minimum <italic>p</italic>-value) to represent a gene (<xref ref-type="bibr" rid="B44">Wang et&#x20;al., 2007</xref>; <xref ref-type="bibr" rid="B50">Zhang et&#x20;al., 2010</xref>). The minimum <italic>p</italic>-value approach may not be an optimal solution, as it favors long genes with many measured SNPs, where obtaining a stronger association is more likely than in shorter ones. Thus, some adjustments were introduced to correct this effect, e.g., adaptive combination of <italic>p</italic>-values (<xref ref-type="bibr" rid="B49">Yu et&#x20;al., 2009</xref>), selecting representative SNPs for each gene (<xref ref-type="bibr" rid="B45">Weng et&#x20;al., 2011</xref>), or correction of the smallest <italic>p</italic>-value for factors such as the number of SNPs per kb, gene size, and linkage disequilibrium units per kb (<xref ref-type="bibr" rid="B34">Segr&#xe8; et&#x20;al., 2010</xref>). Other aggregation techniques, like Fisher integration, the second minimum <italic>p</italic>-value, or Simes&#x2019; <italic>p</italic>-value adjusted for the number of SNPs, were also proposed (<xref ref-type="bibr" rid="B27">Mei et&#x20;al., 2016</xref>). However, the authors applied those approaches only to the oldest gene set analysis method, which is based on the hypergeometric test [over-representation analysis (ORA)]. Also, they performed only a basic evaluation, concentrating mostly on detecting target pathways for the analyzed dataset without looking at false positives. Recently, a new method of GSA in GWAS was introduced and compared with other methods in terms of type I error control and statistical power (<xref ref-type="bibr" rid="B48">Yoon et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B39">Sun et&#x20;al., 2019</xref>), but without testing different integration methods or SNP dependency correction.
Finally, there are solutions where the problem of aggregation from genome to transcriptome level was neglected, e.g., MAGMA (<xref ref-type="bibr" rid="B7">de Leeuw et&#x20;al., 2015</xref>) or GSEA-SNP (<xref ref-type="bibr" rid="B12">Holden et&#x20;al., 2008</xref>).</p>
<p>Even though GSA methods have been used for over a decade in omics data analysis, many challenges remain in this research field (<xref ref-type="bibr" rid="B19">Maleki et&#x20;al., 2020</xref>). Knowledge about the efficiency of GSA algorithms has been updated in several publications (<xref ref-type="bibr" rid="B28">Mitrea et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B40">Tarca et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B20">Maleki et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B30">Nguyen et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B51">Zyla et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B10">Geistlinger et&#x20;al., 2021</xref>; <xref ref-type="bibr" rid="B47">Xie et&#x20;al., 2021</xref>). Yet, those studies concentrated on enrichment methods dedicated to gene expression data measured with microarrays or RNA sequencing (RNASeq) technologies, and the overall performance of GSA algorithms in other omics is still not known. In this work, we focused on two major difficulties that occur when applying GSA to GWAS. The first goal of the study was to test the impact of aggregating phenotype association test results from the SNP level to the gene level, from which they are then transformed to the gene set level. For this purpose, three statistical integration techniques were tested in a variety of GSA algorithms. The second goal was to investigate the impact of linkage disequilibrium (LD) control in the process of SNP information aggregation. These two GSA extensions were tested in combination with six gene set analysis methods. Each tested combination of algorithms was evaluated in terms of sensitivity, specificity, prioritization, and reproducibility of gene set analysis. Furthermore, the relation of GSA GWAS results to those obtained on gene expression data was investigated using the same collection of patient samples.
Finally, all tested GSA algorithms, integration techniques, and LD correction were implemented in the R package <italic>intGSASNP</italic> (integrative GSA for&#x20;SNP).</p>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>Materials and Methods</title>
<sec id="s2-1">
<title>Data</title>
<p>Data from Affymetrix Genome-Wide Human SNP Array 6.0 platform served for SNP genotyping. Affymetrix SNP 6.0 microarrays include over 906,600 SNPs and over 946,000 probes for copy number variation detection (<xref ref-type="bibr" rid="B1">Affymetrix, 2021</xref>). All files are part of The Cancer Genome Atlas (TCGA) Breast Invasive Carcinoma collection (<xref ref-type="bibr" rid="B2">Berger et&#x20;al., 2018</xref>) and were downloaded in CEL format from the Genomic Data Commons (GDC) Legacy Archive. Only white female breast cancer patients were selected for the study. For all individuals, data for both primary tumors and solid normal tissues were available.</p>
<p>Subsequently, Illumina paired-end RNA sequencing data were used for the same patients and the same samples (both tumor and normal) as in the case of SNP genotyping. All data files were downloaded from GDC Data Portal as HTSeq-counts. GDC previously preprocessed raw sequencing data according to the bioinformatics pipeline available from GDC Documentation (<xref ref-type="bibr" rid="B29">NCI Genomic Data Commons, 2021</xref>).</p>
<p>Consequently, 83 white females were considered. For all of them, RNASeq and SNP microarray results were available for both tumor and normal tissue fresh frozen specimens, which summed up to 166 samples. Hence, all individuals were matched in terms of sample and experiment types. Specimens were collected at five different centers (tissue source sites) participating in TCGA Breast Invasive Carcinoma project. All patients were labeled with a breast cancer subtype as previously described (<xref ref-type="bibr" rid="B22">Marczyk et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B23">Marczyk et&#x20;al., 2020</xref>). The summary of breast cancer subtypes, source center, and patient ethnicity is presented in <xref ref-type="sec" rid="s10">Supplementary Table&#x20;S1</xref>.</p>
</sec>
<sec id="s2-2">
<title>Single-Nucleotide Polymorphism Data Analysis</title>
<p>For each genotyped SNP, genome location and relation to transcriptomic function were mapped using the ENSEMBL human genome database, v80 (May 2015; <ext-link ext-link-type="uri" xlink:href="http://may2015.archive.ensembl.org/">http://may2015.archive.ensembl.org</ext-link>; <italic>biomaRt</italic> R package) (<xref ref-type="bibr" rid="B5">Cunningham et&#x20;al., 2015</xref>). During quality control, SNPs were filtered out based on minor allele frequency (MAF; SNPs with MAF lower than 5% removed) and deviation from Hardy&#x2013;Weinberg equilibrium (HWE; <italic>p</italic>-value &#x3c;0.05). Next, only SNPs located within the range of 5&#xa0;kb upstream to 5&#xa0;kb downstream of a gene were selected. The selected boundary is much narrower than in other studies [e.g., (<xref ref-type="bibr" rid="B34">Segr&#xe8; et&#x20;al., 2010</xref>; <xref ref-type="bibr" rid="B50">Zhang et&#x20;al., 2010</xref>)], but here, we wanted to reflect the strongest association with possible changes in gene expression. These steps reduced the initial number of SNPs from 905,176 to 240,799. Finally, to compute the association between genotypes and phenotype (breast cancer <italic>vs.</italic> healthy tissue) under the genotypic genetic model (AA/AB/BB) with the paired design, the multinomial exact test (an extension of McNemar&#x2019;s test) was performed with 100,000 Monte Carlo permutations using the <italic>rcompanion</italic> R package (<xref ref-type="bibr" rid="B21">Mangiafico, 2016</xref>). This method does not allow additional covariates to be introduced into the analysis. Because all collected samples come from white females and the design of the experiment is paired, other potential sources of bias are distributed identically between healthy and cancer tissue samples. Thus, the calculated model was not adjusted for other covariates.</p>
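<p>The filtering and mapping steps above can be illustrated with a minimal Python sketch. It is not the authors&#x2019; pipeline (which relies on ENSEMBL annotation through <italic>biomaRt</italic>); the tuple layouts, the function names, and the toy coordinates are hypothetical and serve only to show the MAF and HWE thresholds and the 5&#xa0;kb flanking window.</p>
<preformat><![CDATA[
```python
# Illustrative sketch only: the tuple layouts and names below are hypothetical,
# not the data structures used in the study.

def passes_qc(maf, hwe_p, maf_min=0.05, hwe_alpha=0.05):
    """Keep a SNP when MAF is at least 5% and the HWE p-value is not below 0.05."""
    return maf >= maf_min and hwe_p >= hwe_alpha


def map_snps_to_genes(snps, genes, flank=5000):
    """Assign QC-passing SNPs to every gene whose region, extended by
    `flank` bp on both sides, contains the SNP position.

    snps  : iterable of (rsid, chrom, pos, maf, hwe_p)
    genes : iterable of (name, chrom, start, end)
    """
    kept = [s for s in snps if passes_qc(s[3], s[4])]
    mapping = {}
    for name, chrom, start, end in genes:
        lo, hi = start - flank, end + flank
        hits = [rsid for rsid, c, pos, _maf, _hwe in kept
                if c == chrom and lo <= pos <= hi]
        if hits:
            mapping[name] = hits
    return mapping


if __name__ == "__main__":
    toy_snps = [
        ("rs1", "1", 10_000, 0.20, 0.50),  # inside the gene
        ("rs2", "1", 16_000, 0.30, 0.80),  # within 5 kb of the gene end
        ("rs3", "1", 30_000, 0.25, 0.60),  # too far from the gene
        ("rs4", "1", 11_000, 0.01, 0.90),  # removed: MAF below 5%
        ("rs5", "1", 12_000, 0.40, 0.01),  # removed: HWE p-value below 0.05
    ]
    toy_genes = [("GENE_A", "1", 9_000, 12_000)]
    print(map_snps_to_genes(toy_snps, toy_genes))
```
]]></preformat>
<p>In this toy example only rs1 and rs2 are assigned to the hypothetical gene; a SNP lying in the flanking windows of two genes would be assigned to both.</p>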
<p>In most cases, to perform GSA using SNP-level data, a single value per gene is needed. Thus, association results for SNPs within the same transcript need to be integrated into one representative value. Three different techniques for test result integration were applied. Currently, the most common method in GSA GWAS is to take the minimum <italic>p</italic>-value over all SNPs <italic>i</italic> that fall within the boundaries of gene <italic>g</italic>:<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi mathvariant="italic">gene</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>
</p>
<p>The second integration technique evaluated here was Fisher&#x2019;s probability integration (<xref ref-type="bibr" rid="B9">Fisher, 1925</xref>), which sums the natural logarithms of the <italic>k</italic> SNP <italic>p</italic>-values that fall within the boundaries of the same gene <italic>g</italic>:<disp-formula id="e2">
<mml:math id="m2">
<mml:mrow>
<mml:msub>
<mml:mi>F</mml:mi>
<mml:mi mathvariant="italic">gene</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:munderover>
<mml:mi>ln</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#x223c;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3c7;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>k</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<p>Under the null hypothesis, and assuming independent SNP tests, the calculated F statistic per gene, <italic>F</italic>
<sub>
<italic>gene</italic>
</sub>, follows a &#x3c7;<sup>2</sup> distribution with 2<italic>k</italic> degrees of freedom.</p>
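<p>As an illustration of Equations 1 and 2, the following Python sketch computes the minimum <italic>p</italic>-value and the Fisher statistic with its <italic>p</italic>-value. It is a sketch under stated assumptions, not the <italic>intGSASNP</italic> implementation: the function names are hypothetical, the SNP <italic>p</italic>-values are assumed to be strictly positive, and the closed form of the &#x3c7;<sup>2</sup> upper tail for an even number of degrees of freedom is used so that only the standard library is needed.</p>
<preformat><![CDATA[
```python
import math


def min_pvalue(pvalues):
    """Equation 1: the smallest SNP p-value represents the gene."""
    return min(pvalues)


def fisher_integration(pvalues):
    """Equation 2: return (F_gene, gene p-value) for k SNP p-values in (0, 1].

    For an even number of degrees of freedom (2k), the chi-squared upper
    tail has the closed form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    Intended for moderate k; very large statistics may underflow.
    """
    k = len(pvalues)
    f_gene = -2.0 * sum(math.log(p) for p in pvalues)
    half = f_gene / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j  # running value of (x/2)^j / j!
        total += term
    return f_gene, math.exp(-half) * total


if __name__ == "__main__":
    snp_p = [0.05, 0.05]
    print(min_pvalue(snp_p))
    print(fisher_integration(snp_p))
```
]]></preformat>
<p>For a gene with a single SNP, the Fisher <italic>p</italic>-value reduces to the SNP <italic>p</italic>-value itself, which is a convenient sanity check.</p>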
<p>The last statistical integration approach used was the Stouffer method, also known as z-transformation-based integration (<xref ref-type="bibr" rid="B36">Stouffer, 1949</xref>). For the <italic>k</italic> SNPs that fall within the boundaries of the same gene <italic>g</italic>, a <italic>Z</italic>
<sub>
<italic>i</italic>
</sub> statistic is first calculated for each <italic>i</italic>-th SNP using the inverse normal cumulative distribution function (<inline-formula id="inf1">
<mml:math id="m3">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>). Then the integrated Z statistic per gene, <italic>Z</italic>
<sub>
<italic>gene</italic>
</sub>, which follows the standard normal distribution, is calculated:<disp-formula id="e3">
<mml:math id="m4">
<mml:mrow>
<mml:msub>
<mml:mi>Z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
<disp-formula id="e4">
<mml:math id="m5">
<mml:mrow>
<mml:msub>
<mml:mi>Z</mml:mi>
<mml:mi mathvariant="italic">gene</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:msubsup>
<mml:msub>
<mml:mi>Z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mi>k</mml:mi>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#x223c;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
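<p>Equations 3 and 4 can be sketched in Python as follows. The sketch follows the formulas literally, so small <italic>p</italic>-values give large negative <italic>Z</italic> scores; reading the gene <italic>p</italic>-value from the lower tail of the standard normal distribution is therefore an assumption made here, as is the hypothetical clipping parameter that keeps <italic>p</italic>-values strictly between 0 and 1, where the inverse cumulative distribution function is defined.</p>
<preformat><![CDATA[
```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def stouffer_integration(pvalues, eps=1e-15):
    """Equations 3 and 4: return (Z_gene, gene p-value).

    Z_i is the inverse normal CDF of p_i, so small p-values map to large
    negative Z_i; the gene p-value is read from the lower tail (an
    assumption of this sketch). `eps` is a hypothetical guard that keeps
    inputs strictly inside (0, 1).
    """
    clipped = [min(max(p, eps), 1.0 - eps) for p in pvalues]
    z = [_STD_NORMAL.inv_cdf(p) for p in clipped]
    z_gene = sum(z) / math.sqrt(len(z))
    return z_gene, _STD_NORMAL.cdf(z_gene)


if __name__ == "__main__":
    print(stouffer_integration([0.01, 0.02, 0.03]))
```
]]></preformat>
<p>Unlike the minimum <italic>p</italic>-value, this statistic uses every SNP in the gene, so several moderately small <italic>p</italic>-values can yield a gene <italic>p</italic>-value smaller than any of them.</p>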
<p>Next, a dependency correction for LD was applied to the integrated <italic>p</italic>-values. The commonly used approach to LD correction requires calculating the <italic>r</italic>
<sup>2</sup> or D&#x2032; score. Here, a modification of the Dunn&#x2013;Sidak correction for multiple testing was used instead. As shown in <xref ref-type="bibr" rid="B33">Saccone et&#x20;al. (2006)</xref>, approximately 50% of the SNPs within chromosomes are in high LD; thus, the Dunn&#x2013;Sidak exponent, normally the number of tests <italic>k</italic>, was modified as follows:<disp-formula id="e5">
<mml:math id="m6">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:mfrac>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>where <italic>k</italic> is the number of SNPs located within gene <italic>g</italic>. This method of introducing LD correction was proposed by <xref ref-type="bibr" rid="B32">Saccone et&#x20;al. (2007)</xref> and allows enrichment analysis to be run even on very limited genotyping data consisting of only two elements: the SNP rs number and the result of the association test. Moreover, it was shown that the method is comparable to, or slightly better than, the regression-based correction of the integrated GWAS <italic>p</italic>-value for SNPs per kb, gene size, recombination hotspots, linkage disequilibrium units per kb, or genetic distance (<xref ref-type="bibr" rid="B34">Segr&#xe8; et&#x20;al., 2010</xref>). Each integration approach, with or without dependency correction, was tested in terms of effectiveness in GSA in&#x20;GWAS.</p>
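<p>Equation 5 translates into a few lines of Python. This is a minimal sketch with a hypothetical function name; the use of <italic>log1p</italic> and <italic>expm1</italic> is a numerical-stability choice made here for very small <italic>p</italic>-values and is not taken from the original implementation.</p>
<preformat><![CDATA[
```python
import math


def ld_corrected_pvalue(p_gene, k):
    """Equation 5: 1 - (1 - p_gene) ** ((k + 1) / 2) for a gene with k SNPs.

    Computed through log1p/expm1 so that very small p-values are not
    lost to rounding (a choice of this sketch).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if p_gene >= 1.0:
        return 1.0
    exponent = (k + 1) / 2.0
    return -math.expm1(exponent * math.log1p(-p_gene))


if __name__ == "__main__":
    # A gene with one SNP is left unchanged; more SNPs inflate the p-value.
    for k in (1, 3, 9):
        print(k, ld_corrected_pvalue(0.01, k))
```
]]></preformat>
<p>For a gene with nine SNPs, the exponent is 5 rather than the 9 of the unmodified Dunn&#x2013;Sidak correction, which reflects the assumption that roughly half of the SNPs are in high LD.</p>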
</sec>
<sec id="s2-3">
<title>Enrichment Algorithms for Single-Nucleotide Polymorphism Data</title>
<p>Several GSA algorithms dedicated to GWAS are based on <italic>p</italic>-value integration to move from SNP to transcriptome level (<xref ref-type="bibr" rid="B6">Das et&#x20;al., 2020</xref>). From this group, the algorithms based on the Gene Set Enrichment Analysis (GSEA) method, mostly used in transcriptomic analysis (<xref ref-type="bibr" rid="B37">Subramanian et&#x20;al., 2005</xref>), were selected.</p>
<p>The basic concept of GSEA is to estimate the enrichment score (ES) as the maximum absolute deviation between <italic>P</italic>
<sub>
<italic>hit</italic>
</sub> (normalized metric for genes within gene set <italic>S</italic>) and <italic>P</italic>
<sub>
<italic>miss</italic>
</sub> (normalized metric for genes outside gene set <italic>S</italic>). Both quantities are accumulated over the genes <italic>g</italic><sub><italic>j</italic></sub> up to the <italic>i</italic>-th position of the ranked gene list, which gives a modified Kolmogorov&#x2013;Smirnov statistic, using the following formulas:<disp-formula id="e6">
<mml:math id="m7">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:munder>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>
<disp-formula id="e7">
<mml:math id="m8">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>
<disp-formula id="e8">
<mml:math id="m9">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x2209;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:munder>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>
</p>
<p>Here, the ranking metric <italic>r</italic> was the negative base-10 logarithm of the gene <italic>p</italic>-value [&#x2212;log10(<italic>p</italic>-value<sub>gene</sub>)]. <italic>N</italic> is the total number of genes, <italic>N</italic>
<sub>
<italic>H</italic>
</sub> is the number of genes in gene set <italic>S</italic>, and <italic>N</italic>
<sub>
<italic>R</italic>
</sub> is the sum of the ranking metrics of the genes within gene set&#x20;<italic>S</italic>.</p>
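<p>The running-sum calculation of Equations 6&#x2013;8 can be sketched in Python as below. The function name and the dictionary input are hypothetical, genes are ordered from the most to the least significant, and the sketch returns the signed deviation of largest magnitude; permutation-based significance and normalization are deliberately left out.</p>
<preformat><![CDATA[
```python
import math


def enrichment_score(gene_pvalues, gene_set):
    """Equations 6-8 with r = -log10(gene p-value).

    gene_pvalues : dict mapping gene name to a p-value in (0, 1]
    gene_set     : collection of gene names (the set S)
    Returns the signed deviation P_hit - P_miss of largest magnitude.
    """
    r = {g: -math.log10(p) for g, p in gene_pvalues.items()}
    ranked = sorted(r, key=r.get, reverse=True)  # most significant first
    members = set(gene_set).intersection(r)
    n, n_h = len(ranked), len(members)
    n_r = sum(abs(r[g]) for g in members)        # Equation 7
    if n_h == 0 or n_h == n or n_r == 0:
        raise ValueError("gene set must be a non-trivial subset with nonzero scores")
    p_hit = p_miss = es = 0.0
    for g in ranked:
        if g in members:
            p_hit += abs(r[g]) / n_r             # Equation 6
        else:
            p_miss += 1.0 / (n - n_h)            # Equation 8
        deviation = p_hit - p_miss
        if abs(deviation) > abs(es):
            es = deviation
    return es


if __name__ == "__main__":
    toy = {"a": 1e-4, "b": 1e-3, "c": 1e-2, "d": 1e-1}
    print(enrichment_score(toy, {"a", "b"}))
    print(enrichment_score(toy, {"c", "d"}))
```
]]></preformat>
<p>In the toy example, a set made of the two most significant genes reaches the largest positive deviation, whereas a set made of the two least significant genes reaches the largest negative one.</p>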
<p>The next algorithm used was GSEA-SNP (<xref ref-type="bibr" rid="B12">Holden et&#x20;al., 2008</xref>), which is a simple modification of the standard GSEA approach. In this method, instead of integrating SNP association to transcriptomic level, the <italic>p</italic>-values of all SNPs are used to calculate the ranking metric <italic>r</italic> [&#x2212;log10(<italic>p</italic>-value<sub>SNP</sub>)]. Moreover, in GSEA-SNP, an SNP label permutation test is performed to assess the significance of each gene set, while for GSEA, gene label permutation is applied. In both GSEA and GSEA-SNP, the ES metric is adjusted for variation in gene set size by dividing the observed <italic>ES</italic> by the mean of the permuted ES with the same direction, giving the normalized enrichment score (NES).</p>
<p>The third algorithm was <italic>i</italic>-GSEA4GWAS (improved GSEA for GWAS) (<xref ref-type="bibr" rid="B50">Zhang et&#x20;al., 2010</xref>). This method has two main modifications compared with the standard GSEA: 1) SNP label permutation is performed instead of gene label permutation, and the <italic>p</italic>-values from the SNP association test are then integrated. 2) The NES is replaced by the significance proportion-based enrichment score (SPES). The SPES is the product of ES and the ratio <italic>k/K</italic>, where <italic>k</italic> is the proportion of significant genes (mapped to the top 5% of SNPs) in the gene set <italic>S</italic>, and <italic>K</italic> is the proportion of significant genes (mapped to the top 5% of SNPs) among all genes in the study (<xref ref-type="bibr" rid="B50">Zhang et&#x20;al., 2010</xref>). According to the authors, SPES emphasizes the total significance coming from a high proportion of significant&#x20;genes.</p>
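<p>The SPES rescaling can be illustrated with a short sketch. The function name <italic>spes</italic> and its arguments are hypothetical, and the list of significant genes (those mapped to the top 5% of SNPs) is assumed to have been determined beforehand.</p>
<preformat><![CDATA[
```python
def spes(es, set_genes, all_genes, significant_genes):
    """Significance proportion-based enrichment score: ES * (k / K)."""
    sig = set(significant_genes)
    set_genes = set(set_genes)
    all_genes = set(all_genes)
    # k: proportion of significant genes within the gene set S.
    k = len(sig & set_genes) / len(set_genes)
    # K: proportion of significant genes among all genes in the study.
    big_k = len(sig & all_genes) / len(all_genes)
    return es * k / big_k
```
]]></preformat>
<p>A gene set in which half of the genes are significant, in a study where 30% of all genes are significant, has its ES multiplied by 0.5/0.3.</p>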
<p>The fourth algorithm was MAGENTA (Meta-Analysis Gene-set Enrichment of variaNT Associations) (<xref ref-type="bibr" rid="B34">Segr&#xe8; et&#x20;al., 2010</xref>), where gene set significance is estimated as follows: 1) <italic>p</italic>-Values are integrated from the SNP to the gene level. 2) For each gene set, the number of gene <italic>p</italic>-values within the gene set that are lower than a cutoff (the leading edge fraction) is calculated. The cutoff is the <italic>p</italic>-value at a specific percentile of all gene <italic>p</italic>-values (here set to the 75th percentile and marked as MAGENTA75). 3) The distribution of the leading edge fraction is obtained with a permutation approach: in each permutation, a mock gene set is drawn, and its leading edge fraction is collected. Finally, to assess the gene set significance, the number of permutation leading edge fractions equal to or larger than the observed one for a particular gene set is counted and divided by the number of permutations. All algorithms described above are modifications of the GSEA approach and test the <italic>competitive</italic> null hypothesis.</p>
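<p>The permutation scheme in steps 2 and 3 can be sketched as below. This is an illustration under stated assumptions rather than the MAGENTA implementation: the function name <italic>magenta_like_pvalue</italic> is hypothetical, the cutoff is passed directly as a <italic>p</italic>-value threshold, and mock gene sets are assumed to be drawn at random with the same size as the tested set.</p>
<preformat><![CDATA[
```python
import random


def magenta_like_pvalue(gene_p, gene_set, cutoff, n_perm=1000, seed=0):
    """Permutation p-value for the count of gene p-values below a cutoff.

    gene_p:   dict mapping gene identifier -> integrated gene-level p-value.
    gene_set: iterable of gene identifiers forming the tested gene set.
    cutoff:   p-value threshold defining the leading edge.
    """
    rng = random.Random(seed)
    genes = sorted(gene_p)
    members = [g for g in gene_set if g in gene_p]

    def below_cutoff(group):
        return sum(1 for g in group if gene_p[g] < cutoff)

    observed = below_cutoff(members)
    hits = 0
    for _ in range(n_perm):
        # Assumption: each mock set has the same size as the tested set.
        mock = rng.sample(genes, len(members))
        if below_cutoff(mock) >= observed:
            hits += 1
    return observed, hits / n_perm
```
]]></preformat>
<p>A gene set with no gene below the cutoff always receives a <italic>p</italic>-value of 1 under this scheme, because every mock set has a count at least as large as zero.</p>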
<p>Two other algorithms were added to this list: ORA (over-representation analysis) (<xref ref-type="bibr" rid="B43">Tavazoie et&#x20;al., 1999</xref>) and the CERNO (Coincident Extreme Ranks in Numerical Observations) algorithm (<xref ref-type="bibr" rid="B51">Zyla et&#x20;al., 2019</xref>). Both were designed for transcriptome data analysis but can be easily applied to GWAS problems. ORA is a first-generation method, which estimates gene set significance via a hypergeometric test using information about the number of differentially expressed genes (DEGs) and background genes within and outside the gene set. The CERNO method ranks genes from 1 to <italic>N</italic> (total analyzed genes), where rank 1 is given to the gene with the lowest <italic>p</italic>-value (here, the <italic>p</italic>-value from integration of SNP associations). Next, the ranks are divided by <italic>N</italic>, and Fisher probability integration is performed for all genes within the gene&#x20;set.</p>
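<p>Both tests can be written compactly. The sketch below is illustrative only: the function names <italic>ora_pvalue</italic> and <italic>cerno_pvalue</italic> are hypothetical, ORA is shown as a one-sided (upper-tail) hypergeometric test, and the CERNO statistic is taken as &#x2212;2 times the sum of the logarithms of rank/<italic>N</italic>, compared with a chi-square distribution with twice as many degrees of freedom as genes in the set.</p>
<preformat><![CDATA[
```python
import math


def ora_pvalue(n_total, n_significant, set_size, set_significant):
    """Upper-tail hypergeometric probability P(X >= set_significant)."""
    upper = min(set_size, n_significant)
    denom = math.comb(n_total, set_size)
    tail = sum(
        math.comb(n_significant, x)
        * math.comb(n_total - n_significant, set_size - x)
        for x in range(set_significant, upper + 1)
    )
    return tail / denom


def cerno_pvalue(set_ranks, n_total):
    """Fisher integration of rank/N for the genes of one gene set.

    set_ranks: ranks (1 = lowest p-value) of the genes in the set.
    Returns the statistic and its chi-square p-value (2 * m degrees of freedom).
    """
    m = len(set_ranks)
    stat = -2.0 * sum(math.log(r / n_total) for r in set_ranks)
    # Closed-form chi-square survival function for an even number of
    # degrees of freedom: exp(-x/2) * sum_{i<m} (x/2)^i / i!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, m):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total
```
]]></preformat>
<p>For example, with 10 genes of which 3 are significant, a gene set of 4 genes containing 2 significant ones has an ORA <italic>p</italic>-value of 1/3, and a single-gene set whose gene is ranked first among 100 genes has a CERNO <italic>p</italic>-value of 0.01.</p>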
</sec>
<sec id="s2-4">
<title>RNASeq Data Analysis</title>
<p>Genes that were not represented in the SNP data or that had zero counts across all samples were removed prior to analysis (15,924 genes left). The <italic>DESeq2</italic> R package (<xref ref-type="bibr" rid="B17">Love et&#x20;al., 2014</xref>) was used to find genes differentially expressed between normal and cancer samples, taking into account the paired nature of the data. Pathway enrichment analysis with the GSEA method was performed using the <italic>fgsea</italic> package in R (<xref ref-type="bibr" rid="B16">Korotkevich et&#x20;al., 2019</xref>) on the same set of pathways used in the SNP data analysis. The test statistic from the <italic>DESeq2</italic> package was used as the gene rank value <italic>r</italic> in GSEA to retain information about the directionality of expression change at the pathway&#x20;level.</p>
</sec>
<sec id="s2-5">
<title>Evaluation of Enrichment Algorithms</title>
<p>The brief evaluation pipeline is presented in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>. All described gene set enrichment algorithms were run with three different integration approaches (minimum, Fisher, and Stouffer) and with and without dependency correction for LD. Four metrics were calculated to evaluate the algorithms: sensitivity, specificity, prioritization, and reproducibility (<xref ref-type="bibr" rid="B51">Zyla et&#x20;al., 2019</xref>). Sensitivity represents detection of target gene sets for a particular phenotype. Specifically, gene set <italic>p</italic>-values are collected, and the proportion of truly alternative hypotheses (1&#xa0;&#x2212;&#xa0;<inline-formula id="inf2">
<mml:math id="m10">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>&#x3c0;</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>) is calculated with Storey&#x2019;s method (<xref ref-type="bibr" rid="B35">Storey, 2002</xref>). Prioritization is the median rank of the target pathways among all analyzed gene sets. Specificity represents the deviation of the mean false-positive rate (FPR, observed level) from 5% (expected level). Specifically, FPR is the proportion of significant gene sets (<italic>p</italic>&#xa0;&#x3c;&#xa0;0.05) among 50 permutations of the original phenotype. Reproducibility is the area under the curve (AUC) of the function describing the gene sets commonly detected across five or six data sets of the same phenotype at different cutoffs (<xref ref-type="bibr" rid="B51">Zyla et&#x20;al., 2019</xref>). All metrics used were previously applied in transcriptomic data GSA (<xref ref-type="bibr" rid="B40">Tarca et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B52">Zyla et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B51">Zyla et&#x20;al., 2019</xref>) and are among the gold standards in enrichment algorithm evaluation (<xref ref-type="bibr" rid="B10">Geistlinger et&#x20;al., 2021</xref>; <xref ref-type="bibr" rid="B47">Xie et&#x20;al., 2021</xref>).</p>
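<p>Two of these metrics can be sketched in a few lines. The functions below are hypothetical illustrations: the sensitivity estimate uses Storey&#x2019;s estimator with a fixed tuning parameter of 0.5 (an assumption; the tuning used in the study is not restated here), and the specificity deviation is assumed to be the absolute difference between the mean FPR and the nominal level.</p>
<preformat><![CDATA[
```python
def storey_sensitivity(p_values, lam=0.5):
    """Estimate 1 - pi0, the proportion of true alternative hypotheses.

    pi0 is estimated as #{p > lam} / (m * (1 - lam)) and capped at 1.
    """
    m = len(p_values)
    pi0 = sum(1 for p in p_values if p > lam) / (m * (1.0 - lam))
    return 1.0 - min(1.0, pi0)


def specificity_deviation(permuted_runs, alpha=0.05):
    """Mean false-positive rate over permutation runs and its deviation from alpha.

    permuted_runs: list of lists; each inner list holds the gene set p-values
    obtained for one permutation of the phenotype.
    """
    rates = [sum(1 for p in run if p < alpha) / len(run) for run in permuted_runs]
    mean_fpr = sum(rates) / len(rates)
    return mean_fpr, abs(mean_fpr - alpha)
```
]]></preformat>
<p>A well-calibrated algorithm should give a mean FPR close to 0.05 on permuted phenotypes, so a deviation near zero indicates good specificity.</p>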
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Pipeline of data analysis and gene set analysis (GSA) method evaluation.</p>
</caption>
<graphic xlink:href="fgene-12-767358-g001.tif"/>
</fig>
<p>To test the impact of LD dependency correction, the difference in each metric was calculated separately within each enrichment method and integration approach (e.g., sensitivity of ORA minimum integration with LD correction minus ORA minimum integration without LD correction). Next, the impact of integration was assessed. As the minimum approach is the most commonly used, its results served as the reference for the Fisher and Stouffer methods. Again, the difference between performance metrics was calculated, but for different integrations (e.g., sensitivity of ORA Fisher integration with or without LD minus ORA minimum integration with or without&#x20;LD).</p>
<p>Finally, we investigated similarities and information transfer between gene set analyses performed on SNP and RNASeq data. For this purpose, we selected SNPs located in the &#x201c;5&#x2032; UTR and upstream&#x201d; region (beginning of transcript), as well as in the &#x201c;3&#x2032; UTR and downstream&#x201d; region (end of transcript). The association test results for those SNPs were extracted and aggregated using the different integration methods with and without LD correction (the same as previously). Next, only the GSEA algorithm was run, as it has a direct equivalent in transcriptome analysis. The GSEA algorithm for RNASeq can distinguish up- and downregulated pathways; thus, the Spearman rank correlation was calculated for target pathways between GWAS (different SNP locations in the transcript) and RNASeq (up-/downregulation). For the results from the &#x201c;5&#x2032; UTR and upstream&#x201d; GWAS location and gene set downregulation in RNASeq, a positive correlation is expected, as SNPs in this region should block further transcription and translation. Opposite results are expected for &#x201c;3&#x2032; UTR and downstream&#x201d;, where only isoforms of transcript products should be observed (<xref ref-type="bibr" rid="B31">Robert and Pelletier, 2018</xref>).</p>
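<p>The Spearman rank correlation used for this comparison can be sketched as follows. The helper names <italic>average_ranks</italic> and <italic>spearman</italic> are hypothetical; ties are assumed to receive average ranks, and both inputs are assumed to contain at least two distinct values.</p>
<preformat><![CDATA[
```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```
]]></preformat>
<p>In the setting described above, <italic>x</italic> and <italic>y</italic> would hold the per-pathway values from the GWAS and RNASeq analyses of the same target pathways, listed in the same order.</p>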
<p>At each step of the evaluation process, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (<xref ref-type="bibr" rid="B13">Kanehisa et&#x20;al., 2017</xref>) was used as the gene set collection (accessed January 15, 2021). In total, 341 gene sets were analyzed. The 54 target gene sets for breast cancer were selected through a literature search, and their detailed description is presented in <xref ref-type="sec" rid="s10">Supplementary Table&#x20;S2</xref>.</p>
</sec>
<sec id="s2-6">
<title>Implementation of Gene Set Analysis Algorithms for Single-Nucleotide Polymorphism Data Analysis</title>
<p>The implementation of all evaluated algorithms is provided in the R package <italic>intGSASNP</italic> (integrative GSA for SNP), created for the purpose of this study. This package includes R functions to run selected gene set&#x20;algorithms (ORA, CERNO, MAGENTA, GSEA, <italic>i</italic>-GSEA4GWAS, and GSEA-SNP) on SNP data. All algorithms were implemented according to the descriptions in the original manuscripts. <italic>intGSASNP</italic> allows the user to adjust various function parameters depending on the experiment, such as the type of integration method (minimum, Fisher, and Stouffer), multiple testing correction method, permutation method (by Entrez gene ID or SNP), number of permutations, incorporation of LD correction, or the number of processing cores used for parallel computing. In addition, an example dataset with sample refSNP IDs, Entrez IDs, and <italic>p</italic>-values is provided. The source code and package documentation are freely available for download on GitHub (<ext-link ext-link-type="uri" xlink:href="https://github.com/ZAEDPolSl/intGSASNP">https://github.com/ZAEDPolSl/intGSASNP</ext-link>).</p>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<p>First, the results of SNP association tests were transformed to the gene level by using the minimum, Fisher, and Stouffer methods with and without LD correction. These results were then used in combination with different GSA algorithms, i.e.,&#x20;GSEA, <italic>i</italic>-GSEA4GWAS, GSEA-SNP, MAGENTA75, ORA, and CERNO, and the four evaluation metrics were calculated, i.e.,&#x20;sensitivity, specificity, prioritization, and reproducibility (<xref ref-type="fig" rid="F1">Figure&#x20;1</xref>). Based on those metrics, the impact of integration and correction for LD and the overall performance of the tested methods were examined. Detailed results are presented in <xref ref-type="sec" rid="s10">Supplementary Figures S1</xref> and <xref ref-type="sec" rid="s10">Supplementary Table S3</xref>. The total number of significant pathways is presented in <xref ref-type="sec" rid="s10">Supplementary Table&#x20;S4</xref>.</p>
<sec id="s3-1">
<title>Single-Nucleotide Integration Results</title>
<p>The number of significant genes (<italic>p</italic>&#xa0;&#x3c;&#xa0;0.05) for each method is presented in <xref ref-type="table" rid="T1">Table&#x20;1</xref>, while the overlap between approaches is presented in <xref ref-type="sec" rid="s10">Supplementary Figure S2</xref>. Application of LD correction decreased the number of significant genes and likely reduced false-positive outcomes. Over 50% of genes were common to all integration techniques (54.98% and 55.03% with and without LD correction, respectively). The Fisher and minimum approaches shared around 30% (32.23% and 29.83% with and without LD correction, respectively) of significant genes, which may lead to similar downstream GSA results. The association between the minimum and Fisher methods is characterized by large correlations (<xref ref-type="sec" rid="s10">Supplementary Table S5</xref>). This effect is expected, as the Fisher integration method is not robust to asymmetrical <italic>p</italic>-values, which results in a stronger association being assigned to genes, similar to that of the minimum method. Stouffer&#x2019;s technique showed weaker correlation with the minimum and Fisher methods. Also, with Stouffer integration, there are few unique significant genes or genes shared with only one of the other integration methods (<xref ref-type="sec" rid="s10">Supplementary Figure S2</xref>). Over 90% of genes (96.56% and 91.81% with and without LD correction, respectively) indicated by the Stouffer method were also significant for the minimum and Fisher methods.</p>
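<p>The different behavior of the three integration methods on asymmetrical <italic>p</italic>-values can be reproduced with a small sketch. The functions below use the unweighted textbook forms of each method without LD correction, which is an assumption and may differ in detail from the variants applied in this study; all names are hypothetical, and every <italic>p</italic>-value is assumed to lie strictly between 0 and 1.</p>
<preformat><![CDATA[
```python
import math
from statistics import NormalDist


def minimum_p(p_values):
    """Smallest SNP p-value assigned to the gene (no further correction)."""
    return min(p_values)


def fisher_p(p_values):
    """Fisher's method: -2 * sum(ln p) against chi-square with 2k df."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # statistic divided by 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    # Closed-form chi-square survival function for even degrees of freedom.
    return math.exp(-half) * total


def stouffer_p(p_values):
    """Stouffer's method: sum of z-scores divided by sqrt(k)."""
    nd = NormalDist()
    z = sum(-nd.inv_cdf(p) for p in p_values) / math.sqrt(len(p_values))
    return nd.cdf(-z)
```
]]></preformat>
<p>For a gene with one very strong SNP association and three weak ones, e.g., <italic>p</italic>-values of 1e-6, 0.8, 0.9, and 0.7, the minimum and Fisher results both stay below 0.05, whereas the Stouffer result does not, which is consistent with the overlap pattern described above.</p>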
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Number of significant genes after integration of single-nucleotide polymorphism (SNP) association test results to gene&#x20;level.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Integration</th>
<th colspan="2" align="center">Minimum</th>
<th colspan="2" align="center">Fisher</th>
<th colspan="2" align="center">Stouffer</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">LD correction</td>
<td align="center">No</td>
<td align="center">Yes</td>
<td align="center">No</td>
<td align="center">Yes</td>
<td align="center">No</td>
<td align="center">Yes</td>
</tr>
<tr>
<td align="left">&#x23; of genes</td>
<td align="center">12,759</td>
<td align="center">10,810</td>
<td align="center">11,401</td>
<td align="center">10,456</td>
<td align="center">7,382</td>
<td align="center">6,948</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3-2">
<title>Overall Performance of Gene Set Analysis Methods</title>
<p>Within each evaluation metric, values were first normalized so that the lowest value corresponds to the best outcome and the highest to the worst. Next, the sum of all metrics was calculated, and algorithms were ranked from the best to the worst within the study. Finally, results were clustered using the k-means approach, where the number of clusters was set by the Silhouette metric (optimal <italic>k</italic> equals 5). The best performance was obtained for the CERNO and MAGENTA75 methods with Stouffer integration regardless of LD correction (<xref ref-type="fig" rid="F2">Figure&#x20;2A</xref>, cluster 1). The worst outcomes were obtained for <italic>i</italic>-GSEA4GWAS and ORA with Fisher integration (regardless of LD correction), as well as for ORA and MAGENTA75 with minimum integration and LD correction, and for GSEA-SNP (cluster 5). The original GSEA gave moderate results in comparison with the others (cluster 2 or 3). Overall, the results for CERNO and MAGENTA75 were the best (mostly in clusters 1 and 2), while those for <italic>i</italic>-GSEA4GWAS and ORA were the worst (mostly in clusters 4 and&#x20;5).</p>
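<p>The normalization and global ranking steps can be sketched as follows. Min-max scaling is an assumption made for illustration (the exact normalization is not restated here), the function names are hypothetical, and the k-means clustering step is omitted.</p>
<preformat><![CDATA[
```python
def min_max(values, higher_is_better):
    """Scale values to [0, 1] so that 0 is the best and 1 the worst outcome."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if higher_is_better else scaled


def rank_algorithms(metrics, higher_is_better):
    """Sum normalized metrics per algorithm and order from best to worst.

    metrics:          dict algorithm -> dict metric name -> raw value.
    higher_is_better: dict metric name -> bool (direction of each metric).
    """
    names = list(metrics)
    totals = {name: 0.0 for name in names}
    for metric, flag in higher_is_better.items():
        column = min_max([metrics[n][metric] for n in names], flag)
        for name, score in zip(names, column):
            totals[name] += score
    return sorted(names, key=lambda n: totals[n]), totals
```
]]></preformat>
<p>The algorithm with the smallest total is placed first in the global ranking; the totals could then be passed to any clustering routine.</p>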
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Overall evaluation of GSA algorithms. <bold>(A)</bold> Normalized evaluation metrics together with clustering results. The red color represents poor performance, while the blue color represents good performance. The number in the brackets, next to the algorithm name, illustrates the place in global ranking including all evaluation metrics. <bold>(B)</bold> UMAP projection for results of each algorithm and all 341 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Marked ellipses represent clustering within UMAP projection.</p>
</caption>
<graphic xlink:href="fgene-12-767358-g002.tif"/>
</fig>
<p>Next, global similarities of results were investigated by using the UMAP dimensionality reduction technique (<xref ref-type="bibr" rid="B25">McInnes et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B26">McInnes and Healy, 2018</xref>) on the GSA results for all 341 KEGG pathways (<xref ref-type="fig" rid="F2">Figure&#x20;2B</xref>). Four major clusters could be distinguished on the first two UMAP dimensions (<xref ref-type="fig" rid="F2">Figure&#x20;2B</xref>). <italic>i</italic>-GSEA4GWAS gave similar results regardless of the integration technique and of the incorporation of LD correction. GSEA-SNP, as well as CERNO, MAGENTA75, and GSEA performed with minimum integration and LD correction, clustered together with <italic>i</italic>-GSEA4GWAS. The middle right cluster includes ORA with the Fisher and minimum integration methods regardless of LD correction (color coding of the UMAP projection according to the integration used is presented in <xref ref-type="sec" rid="s10">Supplementary Figure S3</xref>). Moreover, ORA with Stouffer integration gave results across all tested pathways similar to those of CERNO and MAGENTA75 (with the same integration method; bottom cluster).</p>
</sec>
<sec id="s3-3">
<title>Impact of Linkage Disequilibrium Dependency Correction</title>
<p>To investigate the impact of LD dependency correction, the difference in each evaluation metric within a particular algorithm performed with a specific integration method was calculated (e.g., sensitivity difference between ORA with minimum integration and LD correction, and ORA with minimum integration and without LD correction). Values of these differences are presented in <xref ref-type="sec" rid="s10">Supplementary Table S6</xref>. ORA, CERNO, and GSEA showed decreased sensitivity (<xref ref-type="fig" rid="F3">Figure&#x20;3A</xref>) when LD correction was applied, but specificity increased greatly (<xref ref-type="fig" rid="F3">Figure&#x20;3C</xref>). LD correction also had a positive impact on prioritization for MAGENTA75 (<xref ref-type="fig" rid="F3">Figure&#x20;3B</xref>). <italic>i</italic>-GSEA4GWAS with Stouffer and Fisher integration gave similar performance regardless of LD correction. However, for minimum integration (the default option in the original implementation of the algorithm), LD correction increased the sensitivity with only a slight drop in specificity. For the remaining algorithms, when the Stouffer or Fisher integration method was used, LD correction gave similar or better performance. When minimum integration was applied, reproducibility decreased with LD correction (<xref ref-type="fig" rid="F3">Figure&#x20;3D</xref>).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Impact of linkage disequilibrium (LD) correction on evaluation metrics. Each panel shows a different metric, i.e.,&#x20;sensitivity <bold>(A)</bold>, prioritization <bold>(B)</bold>, specificity <bold>(C)</bold>, and reproducibility <bold>(D)</bold>. Each dot represents the difference in the metric when LD adjustment is used within a particular algorithm and integration method. Dots above the solid black line represent better performance when LD correction is applied, while dots below the line represent the opposite. Colors show different integration techniques.</p>
</caption>
<graphic xlink:href="fgene-12-767358-g003.tif"/>
</fig>
</sec>
<sec id="s3-4">
<title>Impact of <italic>p</italic>-Value Integration</title>
<p>As minimum integration is the most commonly used approach, the results of this aggregation technique were compared with those of the Fisher and Stouffer methods separately. For this purpose, the difference between performance metrics was calculated (e.g., sensitivity of ORA minimum integration with LD minus ORA Fisher integration with LD). In the previous section, it was shown that LD correction has a beneficial impact in most cases and preserves biological insights, so the following description concentrates only on outcomes obtained with dependency correction. CERNO and MAGENTA75 had better results in terms of prioritization, specificity, and reproducibility when the Fisher or Stouffer method was used (<xref ref-type="fig" rid="F4">Figure&#x20;4</xref>). <italic>i</italic>-GSEA4GWAS showed similar results regardless of the integration method used, with slightly better performance for minimum integration. The ORA algorithm gave similar&#x20;performance when the minimum or Fisher method was applied, while Stouffer gave better specificity&#x20;and reproducibility but decreased sensitivity. Finally, GSEA showed better specificity when either Fisher or Stouffer was used, whereas the other evaluation metrics were similar regardless of the integration technique used (<xref ref-type="fig" rid="F4">Figure&#x20;4</xref>).</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Impact of the Fisher and Stouffer integration technique compared with the minimum approach. <bold>(A&#x2013;D)</bold> Differences between outcomes for the sensitivity, prioritization, specificity, and reproducibility, respectively. Colors represent different algorithms, and point shape represents whether correction for LD was applied. Dots above the solid, black line represent better performance for the minimum integration approach, while dots below the line represent the opposite.</p>
</caption>
<graphic xlink:href="fgene-12-767358-g004.tif"/>
</fig>
<p>For the differences obtained within each evaluation metric, the hypothesis that the mean equals zero was tested with a one-sample <italic>t</italic>-test. The Stouffer integration method gave significantly better results (<italic>p</italic>-value&#xa0;&#x3d;&#xa0;0.0163) in terms of specificity compared with minimum integration for all tested algorithms (<xref ref-type="fig" rid="F4">Figure&#x20;4C</xref>). There was no statistically significant difference in the other comparisons; however, a variety of effects can be observed for individual algorithms.</p>
</sec>
<sec id="s3-5">
<title>Comparison of Gene Set Analysis on Single-Nucleotide and Gene Expression Level</title>
<p>As the enrichment methods were initially designed for transcriptome data analysis, the similarity of target pathway outcomes between the genome and transcriptome levels was investigated (<xref ref-type="sec" rid="s10">Supplementary Figure S4</xref>). The same samples were taken for both omics, and the GSEA algorithm was applied (in RNASeq, it can distinguish up- and downregulated pathways). Moreover, for genomic data, only SNPs from the beginning and the end of transcriptomic regions were selected to capture the regulation directionality. Finally, the Spearman rank correlation coefficient was calculated between &#x2212;log10(<italic>p</italic>-value<sub>GeneSet</sub>) from RNASeq and genomic data, separately for up- and downregulated pathways.</p>
<p>For Stouffer integration, the highest correlation between downregulated pathways and &#x201c;5&#x2032; UTR and upstream&#x201d; SNPs was observed, and it increased when LD correction was applied (from 0.33 to 0.42; both medium effect size) (<xref ref-type="fig" rid="F5">Figure&#x20;5</xref>). Similar results can be observed for the minimum approach, where the correlation changed from a small effect size without LD correction to a medium effect size with LD correction (from 0.18 to 0.35). For Fisher integration, a small effect size was observed only when LD adjustment was applied (<xref ref-type="fig" rid="F5">Figure&#x20;5</xref>). When SNPs from &#x201c;3&#x2032; UTR and downstream&#x201d; (end of transcriptomic region) were analyzed, a positive correlation with upregulated pathways was expected, but none of the tested methods showed a statistically significant association. Nevertheless, for downregulated pathways, the expected negative correlation was observed for all integration techniques regardless of LD correction.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Correlation of the Gene Set Enrichment Analysis (GSEA) algorithm results performed on RNASeq and genomic data. The color of the boxes represents the value of Spearman rank correlation coefficient. The <italic>x</italic>-axis corresponds to the results from the GSEA algorithm performed on RNASeq data with distinction of pathway regulation direction. The <italic>y</italic>-axis corresponds to the results from the GSEA algorithm performed on genomic data with distinction of location of single-nucleotide polymorphisms (SNPs) taken to the aggregation process. All results are grouped by integration technique and LD correction&#x20;used.</p>
</caption>
<graphic xlink:href="fgene-12-767358-g005.tif"/>
</fig>
</sec>
<sec id="s3-6">
<title>Comparison of Gene Set Enrichment Analysis and Other Algorithms on Single-Nucleotide Polymorphism Level</title>
<p>Most of the tested algorithms were created by modification of the original GSEA method. Also, we found a correlation between GSEA results on the genomic and transcriptomic levels. Thus, we checked how the GSA results for target pathways are correlated between GSEA and the other tested enrichment algorithms in GWAS (<xref ref-type="sec" rid="s10">Supplementary Figure S5</xref>). GSEA-SNP does not use integration in the process of enrichment analysis; nevertheless, it showed a small correlation with GSEA only when the Stouffer method was applied. On the other hand, the results of <italic>i</italic>-GSEA4GWAS were negatively correlated with GSEA when the Fisher and Stouffer methods were applied. All other methods mostly showed a positive correlation with GSEA. The highest correlation was observed for the CERNO and MAGENTA75 algorithms.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>Incorporation of specific integration methods and LD correction can have a significant impact on the performance of gene set analysis in GWAS. LD correction was beneficial for <italic>i</italic>-GSEA4GWAS, especially when the default minimum integration method was used. Thus, the incorporation of a basic SNP dependency correction method, as done here, or of more complex solutions (<xref ref-type="bibr" rid="B7">de Leeuw et&#x20;al., 2015</xref>) is recommended. When Fisher and Stouffer integration were used in the CERNO, ORA, MAGENTA75, and GSEA algorithms, LD correction was always beneficial, so it should always be applied with these methods. The observed decrease in reproducibility after applying LD correction could be an effect of the decreased number of significant findings (genes with <italic>p</italic>-values lower than a threshold), which is usual after correction for multiple testing (like the LD correction here). Moreover, the reproducibility experiment was performed on much smaller subsets (<italic>n</italic>&#xa0;&#x3d;&#xa0;14 paired samples), which also decreased the power of GSA (<xref ref-type="bibr" rid="B20">Maleki et&#x20;al., 2019</xref>). In <italic>i</italic>-GSEA4GWAS, this effect was not observed owing to the correction provided by the SPES statistic.</p>
<p>Comparing <italic>p</italic>-value integration methods, Stouffer gave the best results in terms of specificity for all tested enrichment methods. Moreover, it gave better or similar results in terms of prioritization and reproducibility for CERNO, MAGENTA75, GSEA, and ORA. Stouffer integration decreases only sensitivity, which is an effect of its robustness to asymmetrical <italic>p</italic>-value distributions during the process of integration. Thus, the integrated <italic>p</italic>-value is higher, and some target pathways may not be detected. However, this mechanism prevents the <italic>p</italic>-value overestimation that was observed for some GSA methods (<xref ref-type="bibr" rid="B51">Zyla et&#x20;al., 2019</xref>). Also, Stouffer integration gave the highest correlation between GSA results on the SNP level and the transcriptome level. Thus, this is the method that we recommend&#x20;most.</p>
<p>SNPs located at the beginning of the gene region have the greatest potential to silence gene expression (<xref ref-type="bibr" rid="B31">Robert and Pelletier, 2018</xref>). The GSA outcomes compared between the genomic and transcriptomic levels confirmed this effect. Furthermore, results for 5&#x2032; UTR and upstream SNPs were negatively correlated with upregulated pathways on the gene level. GSA results for SNPs located at the end of genes were positively correlated with upregulated pathways on the gene level, as expected, but the effect was smaller.</p>
<p>All gene set analysis methods, integration approaches, and the LD correction method tested in this study are implemented in the <italic>intGSASNP</italic> R package and are freely available on GitHub. Therefore, different combinations of methods can be easily tested on any dataset by other researchers. We hope that collecting multiple methods in a single package will help to promote the application of GSA methods in SNP analysis.</p>
<p>In summary, we thoroughly analyzed different methods of gene set analysis in GWAS in terms of their performance and applicability. We showed that LD correction and Stouffer integration can increase the performance of enrichment analysis, and we encourage the introduction of these techniques into common practice. We believe that this work will guide others in selecting the most effective combinations of methods.</p>
</sec>
</body>
<back>
<sec id="s5">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found here: <ext-link ext-link-type="uri" xlink:href="https://www.cancer.gov/tcga">https://www.cancer.gov/tcga</ext-link>.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>MM, JP, and JZ conceived the concept of the study and supervised the methodology. AM and JT were responsible for the data acquisition. MM, AM, and JZ were responsible for the data analysis. JT and JZ were responsible for the visualization. AM was responsible for the implementation of the algorithms and R package creation. All authors wrote and approved the final version of the article.</p>
</sec>
<sec id="s7">
<title>Funding</title>
<p>This work was supported by the Silesian University of Technology grant for Support and Development of Research Potential (AM, JP, JZ), the Silesian University of Technology rector&#x2019;s pro-quality grant no. 02/070/RGJ21/0020 (MM), and the European Social Fund grant no. POWR.03.02.00-00-I029&#x20;(JT).</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ack>
<p>We would like to thank Professor Andrzej Polanski for his support regarding the TCGA database and the GDC Data Portal.</p>
</ack>
<sec id="s10">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2021.767358/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2021.767358/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.PDF" id="SM1" mimetype="application/PDF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book">
<collab>Affymetrix</collab> (<year>2021</year>). <source>Genome Wide Human SNP 6.0 Array</source>. <comment>Available&#x20;at:&#x20;<ext-link ext-link-type="uri" xlink:href="http://tools.thermofisher.com/content/sfs/brochures/genomewide_snp6_datasheet.pdf">http://tools.thermofisher.com/content/sfs/brochures/genomewide_snp6_datasheet.pdf</ext-link> (Accessed October 22, 2021)</comment>. </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berger</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Korkut</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kanchi</surname>
<given-names>R. S.</given-names>
</name>
<name>
<surname>Hegde</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Lenoir</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>A Comprehensive Pan-Cancer Molecular Study of Gynecologic and Breast Cancers</article-title>. <source>Cancer Cell</source> <volume>33</volume> (<issue>4</issue>), <fpage>690</fpage>&#x2013;<lpage>705.e9</lpage>. <comment>Epub 2018/04/07. PubMed PMID: 29622464; PubMed Central PMCID: PMC5959730</comment>. <pub-id pub-id-type="doi">10.1016/j.ccell.2018.03.014</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Billings</surname>
<given-names>L. K.</given-names>
</name>
<name>
<surname>Florez</surname>
<given-names>J.&#x20;C.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>The Genetics of Type 2 Diabetes: what Have We Learned from GWAS?</article-title> <source>Ann. N. Y. Acad. Sci.</source> <volume>1212</volume>, <fpage>59</fpage>&#x2013;<lpage>77</lpage>. <comment>Epub 2010/11/26. PubMed PMID: 21091714; PubMed Central PMCID: PMC3057517</comment>. <pub-id pub-id-type="doi">10.1111/j.1749-6632.2010.05838.x</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Canzler</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hackerm&#xfc;ller</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>multiGSEA: a GSEA-Based Pathway Enrichment Analysis for Multi-Omics Data</article-title>. <source>BMC Bioinformatics</source> <volume>21</volume> (<issue>1</issue>), <fpage>561</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-020-03910-x</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cunningham</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Amode</surname>
<given-names>M. R.</given-names>
</name>
<name>
<surname>Barrell</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Beal</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Billis</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Brent</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <article-title>Ensembl 2015</article-title>. <source>Nucleic Acids Res.</source> <volume>43</volume> (<issue>Database issue</issue>), <fpage>D662</fpage>&#x2013;<lpage>D669</lpage>. <comment>Epub 2014/10/30. PubMed PMID: 25352552; PubMed Central PMCID: PMC4383879</comment>. <pub-id pub-id-type="doi">10.1093/nar/gku1010</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Das</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>McClain</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Rai</surname>
<given-names>S. N.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges</article-title>. <source>Entropy</source> <volume>22</volume> (<issue>4</issue>), <fpage>427</fpage>. <comment>Epub 2020/12/09. PubMed PMID: 33286201; PubMed Central PMCID: PMC7516904</comment>. <pub-id pub-id-type="doi">10.3390/e22040427</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>de Leeuw</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Mooij</surname>
<given-names>J.&#x20;M.</given-names>
</name>
<name>
<surname>Heskes</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Posthuma</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>MAGMA: Generalized Gene-Set Analysis of GWAS Data</article-title>. <source>Plos Comput. Biol.</source> <volume>11</volume> (<issue>4</issue>), <fpage>e1004219</fpage>. <comment>Epub 2015/04/18. PubMed PMID: 25885710; PubMed Central PMCID: PMC4401657</comment>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1004219</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dong</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>LEGO: a Novel Method for Gene Set Over-representation Analysis by Incorporating Network-Based Gene Weights</article-title>. <source>Sci. Rep.</source> <volume>6</volume> (<issue>1</issue>), <fpage>18871</fpage>. <pub-id pub-id-type="doi">10.1038/srep18871</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Fisher</surname>
<given-names>R. A.</given-names>
</name>
</person-group> (<year>1925</year>). <source>Statistical Methods for Research Workers</source>. <publisher-loc>Edinburgh</publisher-loc>: <publisher-name>Oliver &#x26; Boyd</publisher-name>. </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Geistlinger</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Csaba</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Santarelli</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ramos</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schiffer</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Turaga</surname>
<given-names>N.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Toward a Gold Standard for Benchmarking Gene Set Enrichment Analysis</article-title>. <source>Brief. Bioinform.</source> <volume>22</volume> (<issue>1</issue>), <fpage>545</fpage>&#x2013;<lpage>556</lpage>. <comment>Epub 2020/02/07. PubMed PMID: 32026945; PubMed Central PMCID: PMC7820859</comment>. <pub-id pub-id-type="doi">10.1093/bib/bbz158</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hirschhorn</surname>
<given-names>J.&#x20;N.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Genomewide Association Studies - Illuminating Biologic Pathways</article-title>. <source>N. Engl. J.&#x20;Med.</source> <volume>360</volume> (<issue>17</issue>), <fpage>1699</fpage>&#x2013;<lpage>1701</lpage>. <comment>Epub 2009/04/17. PubMed PMID: 19369661</comment>. <pub-id pub-id-type="doi">10.1056/NEJMp0808934</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holden</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wojnowski</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Kulle</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>GSEA-SNP: Applying Gene Set Enrichment Analysis to SNP Data from Genome-wide Association Studies</article-title>. <source>Bioinformatics</source> <volume>24</volume> (<issue>23</issue>), <fpage>2784</fpage>&#x2013;<lpage>2785</lpage>. <comment>Epub 2008/10/16. PubMed PMID: 18854360</comment>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btn516</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Furumichi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tanabe</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Morishima</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>KEGG: New Perspectives on Genomes, Pathways, Diseases and Drugs</article-title>. <source>Nucleic Acids Res.</source> <volume>45</volume> (<issue>D1</issue>), <fpage>D353</fpage>&#x2013;<lpage>D361</lpage>. <comment>Epub 2016/12/03. PubMed PMID: 27899662; PubMed Central PMCID: PMC5210567</comment>. <pub-id pub-id-type="doi">10.1093/nar/gkw1092</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kaspi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ziemann</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Mitch: Multi-Contrast Pathway Enrichment for Multi-Omics and Single-Cell Profiling Data</article-title>. <source>BMC Genomics</source> <volume>21</volume> (<issue>1</issue>), <fpage>447</fpage>. <comment>Epub 2020/07/01. PubMed PMID: 32600408; PubMed Central PMCID: PMC7325150</comment>. <pub-id pub-id-type="doi">10.1186/s12864-020-06856-9</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khatri</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Sirota</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Butte</surname>
<given-names>A. J.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges</article-title>. <source>Plos Comput. Biol.</source> <volume>8</volume> (<issue>2</issue>), <fpage>e1002375</fpage>. <comment>Epub 2012/03/03. PubMed PMID: 22383865; PubMed Central PMCID: PMC3285573</comment>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1002375</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Korotkevich</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Sukhov</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Budin</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Shpak</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Artyomov</surname>
<given-names>M. N.</given-names>
</name>
<name>
<surname>Sergushichev</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Fast Gene Set Enrichment Analysis</source>, <fpage>060012</fpage>. <comment>bioRxiv</comment>. <pub-id pub-id-type="doi">10.1101/060012</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Love</surname>
<given-names>M. I.</given-names>
</name>
<name>
<surname>Huber</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Anders</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2</article-title>. <source>Genome Biol.</source> <volume>15</volume> (<issue>12</issue>), <fpage>550</fpage>. <pub-id pub-id-type="doi">10.1186/s13059-014-0550-8</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maciejewski</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Gene Set Analysis Methods: Statistical Models and Methodological Differences</article-title>. <source>Brief. Bioinform.</source> <volume>15</volume> (<issue>4</issue>), <fpage>504</fpage>&#x2013;<lpage>518</lpage>. <comment>Epub 2013/02/16. PubMed PMID: 23413432; PubMed Central PMCID: PMC4103537</comment>. <pub-id pub-id-type="doi">10.1093/bib/bbt002</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maleki</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Ovens</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Hogan</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Kusalik</surname>
<given-names>A. J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Gene Set Analysis: Challenges, Opportunities, and Future Research</article-title>. <source>Front. Genet.</source> <volume>11</volume>, <fpage>654</fpage>. <comment>Epub 2020/07/23. PubMed PMID: 32695141; PubMed Central PMCID: PMC7339292</comment>. <pub-id pub-id-type="doi">10.3389/fgene.2020.00654</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maleki</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Ovens</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>McQuillan</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Kusalik</surname>
<given-names>A. J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Size Matters: How Sample Size Affects the Reproducibility and Specificity of Gene Set Analysis</article-title>. <source>Hum. Genomics</source> <volume>13</volume> (<issue>Suppl. 1</issue>), <fpage>42</fpage>. <comment>Epub 2019/10/23. PubMed PMID: 31639047; PubMed Central PMCID: PMC6805317</comment>. <pub-id pub-id-type="doi">10.1186/s40246-019-0226-2</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mangiafico</surname>
<given-names>S. S.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Summary and Analysis of Extension Program Evaluation in R</article-title>. <source>Rutgers Coop. Extension</source> <volume>125</volume>, <fpage>16</fpage>&#x2013;<lpage>22</lpage>. <publisher-loc>New Brunswick, NJ, USA</publisher-loc>. </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marczyk</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Jaksik</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Polanski</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Polanska</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>GaMRed - Adaptive Filtering of High-Throughput Biological Data</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinf.</source> <volume>17</volume> (<issue>1</issue>), <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2018.2858825</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marczyk</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Patwardhan</surname>
<given-names>G. A.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Qu</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wali</surname>
<given-names>V. B.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Multi-Omics Investigation of Innate Navitoclax Resistance in Triple-Negative Breast Cancer Cells</article-title>. <source>Cancers</source> <volume>12</volume> (<issue>9</issue>), <fpage>2551</fpage>. <pub-id pub-id-type="doi">10.3390/cancers12092551</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marioni</surname>
<given-names>R. E.</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>S. E.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>McRae</surname>
<given-names>A. F.</given-names>
</name>
<name>
<surname>Hagenaars</surname>
<given-names>S. P.</given-names>
</name>
<name>
<surname>Hill</surname>
<given-names>W. D.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>GWAS on Family History of Alzheimer&#x27;s Disease</article-title>. <source>Transl. Psychiatry</source> <volume>8</volume> (<issue>1</issue>), <fpage>99</fpage>. <comment>Epub 2018/05/20. PubMed PMID: 29777097; PubMed Central PMCID: PMC5959890</comment>. <pub-id pub-id-type="doi">10.1038/s41398-018-0150-6</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McInnes</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Healy</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Saul</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Gro&#xdf;berger</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>UMAP: Uniform Manifold Approximation and Projection</article-title>. <source>J. Open Source Softw.</source> <volume>3</volume>, <fpage>861</fpage>. <pub-id pub-id-type="doi">10.21105/joss.00861</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>McInnes</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Healy</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <source>UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction</source>. <comment>arXiv preprint arXiv:1802.03426</comment>. </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mei</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Simino</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Griswold</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mosley</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>snpGeneSets: An R Package for Genome-wide Study Annotation</article-title>. <source>G3 (Bethesda)</source> <volume>6</volume> (<issue>12</issue>), <fpage>4087</fpage>&#x2013;<lpage>4095</lpage>. <comment>Epub 2016/11/04. PubMed PMID: 27807048; PubMed Central PMCID: PMC5144977</comment>. <pub-id pub-id-type="doi">10.1534/g3.116.034694</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mitrea</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Taghavi</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Bokanizad</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Hanoudi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Tagett</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Donato</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Methods and Approaches in the Topology-Based Analysis of Biological Pathways</article-title>. <source>Front. Physiol.</source> <volume>4</volume>, <fpage>278</fpage>. <comment>Epub 2013/10/18. PubMed PMID: 24133454; PubMed Central PMCID: PMC3794382</comment>. <pub-id pub-id-type="doi">10.3389/fphys.2013.00278</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>NCI Genomic Data Commons</surname>
</name>
</person-group> (<year>2021</year>). <source>Documentation Data</source>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/">https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/</ext-link> (Accessed October 22, 2021)</comment>. </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nguyen</surname>
<given-names>T.-M.</given-names>
</name>
<name>
<surname>Shafi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Draghici</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Identifying Significantly Impacted Pathways: a Comprehensive Review and Assessment</article-title>. <source>Genome Biol.</source> <volume>20</volume> (<issue>1</issue>), <fpage>203</fpage>. <comment>Epub 2019/10/11. PubMed PMID: 31597578; PubMed Central PMCID: PMC6784345</comment>. <pub-id pub-id-type="doi">10.1186/s13059-019-1790-4</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robert</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Pelletier</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Exploring the Impact of Single-Nucleotide Polymorphisms on Translation</article-title>. <source>Front. Genet.</source> <volume>9</volume>, <fpage>507</fpage>. <comment>Epub 2018/11/15. PubMed PMID: 30425729; PubMed Central PMCID: PMC6218417</comment>. <pub-id pub-id-type="doi">10.3389/fgene.2018.00507</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saccone</surname>
<given-names>S. F.</given-names>
</name>
<name>
<surname>Hinrichs</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Saccone</surname>
<given-names>N. L.</given-names>
</name>
<name>
<surname>Chase</surname>
<given-names>G. A.</given-names>
</name>
<name>
<surname>Konvicka</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>P. A. F.</given-names>
</name>
<etal/>
</person-group> (<year>2007</year>). <article-title>Cholinergic Nicotinic Receptor Genes Implicated in a Nicotine Dependence Association Study Targeting 348 Candidate Genes with 3713 SNPs</article-title>. <source>Hum. Mol. Genet.</source> <volume>16</volume> (<issue>1</issue>), <fpage>36</fpage>&#x2013;<lpage>49</lpage>. <comment>Epub 2006/12/01. PubMed PMID: 17135278; PubMed Central PMCID: PMC2270437</comment>. <pub-id pub-id-type="doi">10.1093/hmg/ddl438</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saccone</surname>
<given-names>S. F.</given-names>
</name>
<name>
<surname>Rice</surname>
<given-names>J.&#x20;P.</given-names>
</name>
<name>
<surname>Saccone</surname>
<given-names>N. L.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Power-based, Phase-Informed Selection of Single Nucleotide Polymorphisms for Disease Association Screens</article-title>. <source>Genet. Epidemiol.</source> <volume>30</volume> (<issue>6</issue>), <fpage>459</fpage>&#x2013;<lpage>470</lpage>. <comment>Epub 2006/05/11. PubMed PMID: 16685721</comment>. <pub-id pub-id-type="doi">10.1002/gepi.20159</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Segr&#xe8;</surname>
<given-names>A. V.</given-names>
</name>
<name>
<surname>Groop</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Mootha</surname>
<given-names>V. K.</given-names>
</name>
<name>
<surname>Daly</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Altshuler</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Common Inherited Variation in Mitochondrial Genes Is Not Enriched for Associations with Type 2 Diabetes or Related Glycemic Traits</article-title>. <source>Plos Genet.</source> <volume>6</volume> (<issue>8</issue>), <fpage>e1001058</fpage>. <comment>Epub 2010/08/18. PubMed PMID: 20714348; PubMed Central PMCID: PMC2920848</comment>. <pub-id pub-id-type="doi">10.1371/journal.pgen.1001058</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Storey</surname>
<given-names>J.&#x20;D.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>A Direct Approach to False Discovery Rates</article-title>. <source>J.&#x20;R. Stat. Soc. Ser. B (Statistical Methodology)</source> <volume>64</volume> (<issue>3</issue>), <fpage>479</fpage>&#x2013;<lpage>498</lpage>. <pub-id pub-id-type="doi">10.1111/1467-9868.00346</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Stouffer</surname>
<given-names>S. A.</given-names>
</name>
</person-group> (<year>1949</year>). <source>The American Soldier: Adjustment during Army Life</source>. <publisher-name>Princeton University Press</publisher-name>. </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Subramanian</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tamayo</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Mootha</surname>
<given-names>V. K.</given-names>
</name>
<name>
<surname>Mukherjee</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ebert</surname>
<given-names>B. L.</given-names>
</name>
<name>
<surname>Gillette</surname>
<given-names>M. A.</given-names>
</name>
<etal/>
</person-group> (<year>2005</year>). <article-title>Gene Set Enrichment Analysis: a Knowledge-Based Approach for Interpreting Genome-wide Expression Profiles</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>102</volume> (<issue>43</issue>), <fpage>15545</fpage>&#x2013;<lpage>15550</lpage>. <comment>Epub 2005/10/04. PubMed PMID: 16199517; PubMed Central PMCID: PMC1239896</comment>. <pub-id pub-id-type="doi">10.1073/pnas.0506580102</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sud</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kinnersley</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Houlston</surname>
<given-names>R. S.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Genome-wide Association Studies of Cancer: Current Insights and Future Perspectives</article-title>. <source>Nat. Rev. Cancer</source> <volume>17</volume> (<issue>11</issue>), <fpage>692</fpage>&#x2013;<lpage>704</lpage>. <comment>Epub 2017/10/14. PubMed PMID: 29026206</comment>. <pub-id pub-id-type="doi">10.1038/nrc.2017.82</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Hui</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Bader</surname>
<given-names>G. D.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Kraft</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Powerful Gene Set Analysis in GWAS with the Generalized Berk-Jones Statistic</article-title>. <source>Plos Genet.</source> <volume>15</volume> (<issue>3</issue>), <fpage>e1007530</fpage>. <comment>Epub 2019/03/16. PubMed PMID: 30875371; PubMed Central PMCID: PMC6436759</comment>. <pub-id pub-id-type="doi">10.1371/journal.pgen.1007530</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tarca</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Bhatti</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Romero</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>A Comparison of Gene Set Analysis Methods in Terms of Sensitivity, Prioritization and Specificity</article-title>. <source>PLoS One</source> <volume>8</volume> (<issue>11</issue>), <fpage>e79217</fpage>. <comment>Epub 2013/11/22. PubMed PMID: 24260172; PubMed Central PMCID: PMC3829842</comment>. <pub-id pub-id-type="doi">10.1371/journal.pone.0079217</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tarca</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Draghici</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Bhatti</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Romero</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Down-weighting Overlapping Genes Improves Gene Set Analysis</article-title>. <source>BMC Bioinformatics</source> <volume>13</volume> (<issue>1</issue>), <fpage>136</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-13-136</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tarca</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Draghici</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Khatri</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hassan</surname>
<given-names>S. S.</given-names>
</name>
<name>
<surname>Mittal</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.-s.</given-names>
</name>
<etal/>
</person-group> (<year>2009</year>). <article-title>A Novel Signaling Pathway Impact Analysis</article-title>. <source>Bioinformatics</source> <volume>25</volume> (<issue>1</issue>), <fpage>75</fpage>&#x2013;<lpage>82</lpage>. <comment>Epub 2008/11/05. PubMed PMID: 18990722</comment>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btn577</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tavazoie</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>J.&#x20;D.</given-names>
</name>
<name>
<surname>Campbell</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Cho</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>G. M.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>Systematic Determination of Genetic Network Architecture</article-title>. <source>Nat. Genet.</source> <volume>22</volume> (<issue>3</issue>), <fpage>281</fpage>&#x2013;<lpage>285</lpage>. <comment>Epub 1999/07/03. PubMed PMID: 10391217</comment>. <pub-id pub-id-type="doi">10.1038/10343</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bucan</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Pathway-based Approaches for Analysis of Genomewide Association Studies</article-title>. <source>Am. J.&#x20;Hum. Genet.</source> <volume>81</volume> (<issue>6</issue>), <fpage>1278</fpage>&#x2013;<lpage>1283</lpage>. <comment>Epub 2007/10/30. PubMed PMID: 17966091; PubMed Central PMCID: PMC2276352</comment>. <pub-id pub-id-type="doi">10.1086/522374</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weng</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Macciardi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Subramanian</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Guffanti</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Potkin</surname>
<given-names>S. G.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>Z.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>SNP-based Pathway Enrichment Analysis for Genome-wide Association Studies</article-title>. <source>BMC Bioinformatics</source> <volume>12</volume>, <fpage>99</fpage>. <comment>Epub /04/19. PubMed PMID: 21496265; PubMed Central PMCID: PMC3102637</comment>. <pub-id pub-id-type="doi">10.1186/1471-2105-12-99</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wijmenga</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhernakova</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>The Importance of Cohort Studies in the post-GWAS Era</article-title>. <source>Nat. Genet.</source> <volume>50</volume> (<issue>3</issue>), <fpage>322</fpage>&#x2013;<lpage>328</lpage>. <comment>Epub 2018/03/08. PubMed PMID: 29511284</comment>. <pub-id pub-id-type="doi">10.1038/s41588-018-0066-3</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Jauhari</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Mora</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Popularity and Performance of Bioinformatics Software: the Case of Gene Set Analysis</article-title>. <source>BMC Bioinformatics</source> <volume>22</volume> (<issue>1</issue>), <fpage>191</fpage>. <comment>Epub 2021/04/17. PubMed PMID: 33858350; PubMed Central PMCID: PMC8050894</comment>. <pub-id pub-id-type="doi">10.1186/s12859-021-04124-5</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yoon</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>H. C. T.</given-names>
</name>
<name>
<surname>Yoo</surname>
<given-names>Y. J.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Baik</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Efficient Pathway Enrichment and Network Analysis of GWAS Summary Data Using GSA-SNP2</article-title>. <source>Nucleic Acids Res.</source> <volume>46</volume> (<issue>10</issue>), <fpage>e60</fpage>. <comment>Epub 2018/03/22. PubMed PMID: 29562348; PubMed Central PMCID: PMC6007455</comment>. <pub-id pub-id-type="doi">10.1093/nar/gky175</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Bergen</surname>
<given-names>A. W.</given-names>
</name>
<name>
<surname>Pfeiffer</surname>
<given-names>R. M.</given-names>
</name>
<name>
<surname>Rosenberg</surname>
<given-names>P. S.</given-names>
</name>
<name>
<surname>Caporaso</surname>
<given-names>N.</given-names>
</name>
<etal/>
</person-group> (<year>2009</year>). <article-title>Pathway Analysis by Adaptive Combination of P-Values</article-title>. <source>Genet. Epidemiol.</source> <volume>33</volume> (<issue>8</issue>), <fpage>700</fpage>&#x2013;<lpage>709</lpage>. <comment>Epub 2009/04/01. PubMed PMID: 19333968; PubMed Central PMCID: PMC2790032</comment>. <pub-id pub-id-type="doi">10.1002/gepi.20422</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>i-GSEA4GWAS: a Web Server for Identification of Pathways/gene Sets Associated with Traits by Applying an Improved Gene Set Enrichment Analysis to Genome-wide Association Study</article-title>. <source>Nucleic Acids Res.</source> <volume>38</volume>, <fpage>W90</fpage>&#x2013;<lpage>W95</lpage>. <comment>Web Server issue. Epub 2010/05/04. PubMed PMID: 20435672; PubMed Central PMCID: PMC2896119</comment>. <pub-id pub-id-type="doi">10.1093/nar/gkq324</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zyla</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Marczyk</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Domaszewska</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kaufmann</surname>
<given-names>S. H. E.</given-names>
</name>
<name>
<surname>Polanska</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Weiner</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Gene Set Enrichment for Reproducible Science: Comparison of CERNO and Eight Other Algorithms</article-title>. <source>Bioinformatics</source> <volume>35</volume> (<issue>24</issue>), <fpage>5146</fpage>&#x2013;<lpage>5154</lpage>. <comment>Epub 2019/06/06. PubMed PMID: 31165139; PubMed Central PMCID: PMC6954644</comment>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz447</pub-id> </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zyla</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Marczyk</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Weiner</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Polanska</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Ranking Metrics in Gene Set Enrichment Analysis: Do They Matter?</article-title> <source>BMC Bioinformatics</source> <volume>18</volume> (<issue>1</issue>), <fpage>256</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-017-1674-0</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>