<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="brief-report" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Neurosci.</journal-id>
<journal-title>Frontiers in Computational Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5188</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fncom.2024.1388504</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Brief Research Report</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Capturing biomarkers associated with Alzheimer disease subtypes using data distribution characteristics</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Smith</surname> <given-names>Kenneth</given-names></name>
<uri xlink:href="https://loop.frontiersin.org/people/2822179/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/data-curation/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
<role content-type="https://credit.niso.org/contributor-roles/investigation/"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>Climer</surname> <given-names>Sharlee</given-names></name><xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2661227/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/conceptualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
<role content-type="https://credit.niso.org/contributor-roles/funding-acquisition/"/>
<role content-type="https://credit.niso.org/contributor-roles/investigation/"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff><institution>Department of Computer Science, University of Missouri &#x2013; St. Louis</institution>, <addr-line>St. Louis, MO</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by" id="fn0001">
<p>Edited by: Chengcui Zhang, University of Alabama at Birmingham, United States</p>
</fn>
<fn fn-type="edited-by" id="fn0002">
<p>Reviewed by: Meng Luo, Harbin Institute of Technology, China</p>
<p>Pratim Saha, University of Alabama at Birmingham, United States</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Sharlee Climer, <email>climer@umsl.edu</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>03</day>
<month>09</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>18</volume>
<elocation-id>1388504</elocation-id>
<history>
<date date-type="received">
<day>19</day>
<month>02</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>08</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2024 Smith and Climer.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Smith and Climer</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Late-onset Alzheimer disease (AD) is a highly complex disease with multiple subtypes, as demonstrated by its disparate risk factors, pathological manifestations, and clinical traits. Discovery of biomarkers to diagnose specific AD subtypes is a key step towards understanding biological mechanisms underlying this enigmatic disease, generating candidate drug targets, and selecting participants for drug trials. Popular statistical methods for evaluating candidate biomarkers, fold change (FC) and area under the receiver operating characteristic curve (AUC), were designed for homogeneous data and we demonstrate the inherent weaknesses of these approaches when used to evaluate subtypes representing less than half of the diseased cases. We introduce a unique evaluation metric that is based on the distribution of the values, rather than the magnitude of the values, to identify analytes that are associated with a subset of the diseased cases, thereby revealing potential biomarkers for subtypes. Our approach, Bimodality Coefficient Difference (BCD), computes the difference between the degrees of bimodality for the cases and controls. We demonstrate the effectiveness of our approach with large-scale synthetic data trials containing nearly perfect subtypes. In order to reveal novel AD biomarkers for heterogeneous subtypes, we applied BCD to gene expression data for 8,650 genes for 176 AD cases and 187 controls. Our results confirm the utility of BCD for identifying subtypes of heterogeneous diseases.</p>
</abstract>
<kwd-group>
<kwd>Alzheimer disease</kwd>
<kwd>precision medicine</kwd>
<kwd>subtypes</kwd>
<kwd>biomarkers</kwd>
<kwd>association studies</kwd>
<kwd>fold change</kwd>
<kwd>AUC</kwd>
<kwd>bimodality</kwd>
</kwd-group>
<contract-num rid="cn1">AARG-22-925002</contract-num>
<contract-num rid="cn2">1RF1AG053303-01</contract-num>
<contract-num rid="cn2">3RF1AG053303-01S2</contract-num>
<contract-sponsor id="cn1">Alzheimer&#x2019;s Association</contract-sponsor>
<contract-sponsor id="cn2">National Institute on Aging (NIA)<named-content content-type="fundref-id">10.13039/100000049</named-content></contract-sponsor>
<counts>
<fig-count count="6"/>
<table-count count="3"/>
<equation-count count="2"/>
<ref-count count="53"/>
<page-count count="14"/>
<word-count count="7802"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Neuroscience</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="sec1">
<label>1</label>
<title>Introduction</title>
<p>Advances in precision medicine (PM) for cancer patients are extending the healthspan for countless lives by tailoring treatments to heterogeneous cancer subtypes. PM utilizes specific biomarker information to diagnose each specific subtype of the disease and enable customized treatments, prognoses, and monitoring. Candidate biomarkers may include genetics, demographics, lifestyle, and/or physiological observations such as imaging or omics data (e.g., levels of gene expression, proteins, lipids, or metabolites). An additional benefit of PM is that it facilitates understanding of underlying biological mechanisms by teasing apart biomarkers into subtype groups. Knowledge of distinct biomarkers associated with each subtype empowers drug discovery as well as selections of participants for drug trials.</p>
<p>Heterogeneous subtypes of late-onset Alzheimer disease (AD) are exhibited by the disparate genetic and environmental risk factors and clinical outcomes observed for this enigmatic disease. Efforts are underway to enable PM for AD, including the Accelerating Medicines Partnership&#x00AE; for AD 2.0 (<xref ref-type="bibr" rid="ref1">Accelerating Medicines Partnership, 2022</xref>), which began in 2021, and the Alzheimer Precision Medicine Initiative (<xref ref-type="bibr" rid="ref14">Hampel et al., 2019</xref>), which began in 2016.</p>
<p>Imaging methods and CSF total tau (tTau) have been used to discriminate typical and atypical AD subtypes associated with brain regions. While positive amyloid PET indicates AD status in general, fluorodeoxyglucose PET (FDG-PET) and tau ligand binding suggest five subtypes: typical amnestic syndrome, logopenic variant of primary progressive aphasia, posterior cortical atrophy, corticobasal syndrome, and frontal AD (<xref ref-type="bibr" rid="ref7">Dubois et al., 2023</xref>). Furthermore, Pillai et al. observed an association between the upregulation of tTau in CSF and the atypical logopenic variant subtype (<xref ref-type="bibr" rid="ref40">Pillai et al., 2019</xref>).</p>
<p>Ferreira et al. conducted an extensive meta-analysis of neuropathology and neuroimaging studies and proposed AD subtypes based on two dimensions: typicality and severity (<xref ref-type="bibr" rid="ref11">Ferreira et al., 2020</xref>). The four subtypes are typical AD, limbic-predominant, hippocampal-sparing, and minimal atrophy. They presented covariates that are associated with the two subtypes at the extremes of the typicality dimension, limbic-predominant and hippocampal-sparing.</p>
<p>Recently, deep learning methods incorporating multiple data types, such as imaging, omics data, and clinical assessments, have introduced multimodal models. For example, Machado Reyes et al. proposed a tri-modal co-attention transformer, referred to as Tri-COAT, to classify AD cases into three progression-specific subtypes (<xref ref-type="bibr" rid="ref27">Machado Reyes et al., 2024</xref>). They integrated imaging data, genetics, and clinical records by using transformer modules to encode each type separately, then merging the three into a co-attention model to learn feature weights and relationships across the three data types.</p>
<p>Nevertheless, PM progress has been limited for AD as well as a host of additional neurological diseases. Unlike cancer, which provides a written history of mutations, diseased and healthy cells for comparisons from a given patient, and excellent animal models for experiments, pathological clues for AD lie buried deep in the human brain with only traces of evidence that leak into peripheral systems.</p>
<p>The realization of successful PM can only be attained by identifying disease subtypes and developing practical methods to diagnose and treat each subtype. A common first step is to use statistical methods to test associations of candidate biomarkers with the disease. Different statistics are used for categorical, ordinal, and numerical data types. Herein we focus on numerical data types, which include omics data (e.g., gene expression and protein levels), measurements from imaging data (such as PET amyloid load), and other observations that are quantified as numerical values. Popular statistics for this domain include fold change (FC) of levels of candidate biomarkers between diseased cases and normal controls and area under the receiver operating characteristic curve (AUC; <xref ref-type="bibr" rid="ref51">Xia et al., 2013</xref>).</p>
<p>The nascent PM for AD research field faces challenges due to multiple issues, including the need for large sample sizes to elicit power to sift out a subtype that may only represent a small fraction of the diseased cases. An overlooked, but major, challenge is that traditional statistical methods that are successful for global biomarkers can be inappropriate for subset biomarker identification. Stated bluntly, traditional methods need to be scrutinized for use in this distinct domain.</p>
<p>In order to assess current statistical metrics for advancing PM for AD, we examine the use of FC and AUC when subtype groups exist. FC is a traditional approach for identifying analytes that are differentially expressed across diseased cases and normal controls. It is equal to the quotient of the analyte expression levels between the two groups: (level in diseased cases)/(level in normal controls). If the quotient is above or below a given cutoff, the analyte is considered differentially expressed. A single value representing the expression level of the analyte is required for each group, usually the median or mean. Typically, a cutoff of &#x003E;2 is used to indicate significant up-regulation in the diseased cases group and a cutoff of &#x003C;0.5 for down-regulation. In order to more easily interpret across both up- and down-regulated analytes, the log2FC is often employed, where log2FC&#x2009;=&#x2009;abs{log<sub>2</sub>[(level in diseased cases)/(level in normal controls)]}, providing a significance threshold of log2FC&#x2009;&#x003E;&#x2009;1 for both up- and down-regulation (<xref ref-type="bibr" rid="ref35">Pacholewska, 2017</xref>). Some weaknesses of this metric have been previously noted. FC calculations are unstable when the expression levels are near the noise level of the measurement system. This can lead to false positives at low intensity levels. At the other end of the spectrum, FC is also biased against samples that have high expression levels, but small differences between two groups (<xref ref-type="bibr" rid="ref30">Mariani et al., 2003</xref>). Mariani et al. reported that high FC cutoffs are needed for low intensity genes and lower cutoffs are needed for high intensity genes. They introduced a variable FC cutoff-based approach that uses LOESS to estimate a variance based on expression intensity, thereby alleviating the bias at both high and low intensity levels (<xref ref-type="bibr" rid="ref30">Mariani et al., 2003</xref>).
Despite these improvements to the FC calculation, there is a fundamental problem with this metric: use of the mean or median in the presence of heterogeneity leads to the omission of subgroup signals, as demonstrated in this manuscript.</p>
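<p>The masking effect of a summary statistic can be illustrated with a minimal Python sketch. The function name, the toy value ranges, and the 20% subtype fraction are hypothetical choices made for illustration only; they are not taken from the study data.</p>
<preformat><![CDATA[
```python
import math
import random
import statistics


def log2_fold_change(cases, controls, summary=statistics.median):
    """Absolute log2 fold change between two groups of positive values."""
    return abs(math.log2(summary(cases) / summary(controls)))


# Hypothetical toy data: 20% of cases carry a roughly four-fold elevated
# level, while the remaining cases look like the controls.
rng = random.Random(0)
controls = [rng.uniform(0.9, 1.1) for _ in range(1000)]
cases = ([rng.uniform(3.6, 4.4) for _ in range(200)]
         + [rng.uniform(0.9, 1.1) for _ in range(800)])

fc_median = log2_fold_change(cases, controls)
fc_mean = log2_fold_change(cases, controls, summary=statistics.fmean)
# Restricting to the subtype cases recovers the signal.
fc_subtype = log2_fold_change(cases[:200], controls)

print(fc_median, fc_mean, fc_subtype)
```
]]></preformat>
<p>In this toy setting, both the median-based and the mean-based log2FC stay below the threshold of 1 even though the subtype alone exceeds it, because the summary statistic is dominated by the unaffected majority of cases.</p>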
<p>Standard 2&#x2009;&#x00D7;&#x2009;2 contingency tables are commonly used to assess predictive accuracy of biomarkers using various statistics, such as sensitivity/specificity, precision/recall, Fisher&#x2019;s Exact Test (<xref ref-type="bibr" rid="ref12">Fisher, 1935</xref>), and Youden&#x2019;s J index (<xref ref-type="bibr" rid="ref52">Youden, 1950</xref>). Note that Youden&#x2019;s J definition can be rearranged to produce a simple interpretation: J&#x2009;=&#x2009;TPR&#x2009;&#x2212;&#x2009;FPR, where TPR is the true positive rate and FPR is the false positive rate. A key benefit of utilizing Youden&#x2019;s J is that subgroups can be captured, rather than being lost in a summary statistic, as is done with FC. However, without other information, subgroups may be overlooked due to the existence of moderate case/control biomarkers with the same J value, just higher TPR and FPR, e.g., J&#x2009;=&#x2009;0.20&#x2009;&#x2212;&#x2009;0.01 vs. J&#x2009;=&#x2009;0.70&#x2009;&#x2212;&#x2009;0.51. Importantly, in order to classify real values as true or false positives, a threshold must be designated, and Youden&#x2019;s J value is highly dependent upon the given threshold.</p>
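<p>The threshold dependence of Youden&#x2019;s J can be seen in a small sketch. The function name, the convention that values above the threshold are called positive, and the toy values are assumptions made for illustration.</p>
<preformat><![CDATA[
```python
def youden_j(cases, controls, threshold):
    """Youden's J = TPR - FPR, calling values above `threshold` positive."""
    tpr = sum(v > threshold for v in cases) / len(cases)
    fpr = sum(v > threshold for v in controls) / len(controls)
    return tpr - fpr


# Hypothetical toy values: two of ten cases form a clearly elevated subgroup.
cases = [0.9, 0.8, 0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3]
controls = [0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2, 0.1, 0.2]

j_high = youden_j(cases, controls, threshold=0.5)   # TPR 0.2, FPR 0.0
j_low = youden_j(cases, controls, threshold=0.15)   # TPR 0.7, FPR 0.6

print(j_high, j_low)
```
]]></preformat>
<p>The same data yield different J values at the two thresholds, and only the higher threshold isolates the subgroup with no false positives.</p>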
<p>More generally, when testing numerical values, 2&#x2009;&#x00D7;&#x2009;2 contingency tables require the selection of a threshold to separate diagnostic classifications. A key strength of AUC is that it has no reliance upon a specified threshold. This metric originated as a tool for radar receivers, spread throughout engineering and medical domains, and has become a prevalent tool for evaluating the diagnostic ability of biomarkers (<xref ref-type="bibr" rid="ref51">Xia et al., 2013</xref>; <xref ref-type="bibr" rid="ref53">Zweig and Campbell, 1993</xref>; <xref ref-type="bibr" rid="ref46">S&#x00F8;, 2009</xref>). AUC simultaneously accounts for sensitivity and specificity across all threshold values as a plot of the TPR vs. FPR is constructed and the area under the curve is returned as the AUC value (<xref ref-type="bibr" rid="ref37">Pepe, 2000</xref>). The plot for a random classifier would tend toward a diagonal line from (0,0) to (1,1) with an AUC value of 0.5. A &#x2018;perfect&#x2019; predictor would have FPR&#x2009;=&#x2009;0 and TPR&#x2009;=&#x2009;1 for all thresholds of the biomarker and a corresponding AUC value of 1. An example of this rare event was reported by Karikari et al. for discriminating Alzheimer disease from healthy young adults using plasma tau phosphorylated at threonine 181 (pTau-181) (<xref ref-type="bibr" rid="ref20">Karikari et al., 2020</xref>).</p>
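<p>As a threshold-free illustration, the AUC of a single analyte can be computed from its well-known rank interpretation: the probability that a randomly chosen case has a higher value than a randomly chosen control, with ties counted as one half. The sketch below uses this equivalence with hypothetical toy values; it is not necessarily how AUC was computed in this study.</p>
<preformat><![CDATA[
```python
def auc(cases, controls):
    """AUC as P(case value > control value), counting ties as 0.5."""
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy values.
perfect = auc([0.8, 0.9, 0.7], [0.1, 0.2, 0.3])        # every case above every control
uninformative = auc([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])  # identical groups

print(perfect, uninformative)
```
]]></preformat>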
<p>There is not a consensus for a significance cutoff for AUC values. Previous publications have suggested an AUC between 0.7 and 0.8 as acceptable and greater than 0.8 as excellent (<xref ref-type="bibr" rid="ref5">Cucchiara, 2013</xref>; <xref ref-type="bibr" rid="ref29">Mandrekar, 2010</xref>), while the National Center on Response to Intervention&#x2019;s Technical Standard sets AUC values between 0.75 and 0.85 as &#x2018;partially convincing&#x2019; and below 0.75 as &#x2018;unconvincing&#x2019; (<xref ref-type="bibr" rid="ref3">Bowers and Zhou, 2019</xref>). On the other hand, it has been recommended that no set value should be utilized; rather AUC values should be used to compare predictors within a single domain rather than enforcing a strict cutoff value (<xref ref-type="bibr" rid="ref53">Zweig and Campbell, 1993</xref>; <xref ref-type="bibr" rid="ref17">Hanley and McNeil, 1982</xref>; <xref ref-type="bibr" rid="ref47">Swets, 1988</xref>; <xref ref-type="bibr" rid="ref2">Bowers et al., 2013</xref>).</p>
<p>In addition to evaluating biomarkers across all threshold values, AUC has several other beneficial properties. It is a simple and intuitive measure, and the corresponding ROC plot provides additional information beyond the scalar value. Also, there are no parameters to be tuned, yielding robust reproducibility.</p>
<p>There are also some well-known issues with AUC. First, small sample size can yield poor performance (<xref ref-type="bibr" rid="ref15">Hanczar et al., 2010</xref>; <xref ref-type="bibr" rid="ref8">Dudbridge, 2013</xref>). Second, AUC includes the areas under the ROC curve that represent threshold values that are not utilized in practical applications (<xref ref-type="bibr" rid="ref25">Lobo et al., 2008</xref>). A related issue is when the ROC curves of two different biomarkers cross, the relative AUC values may be misleading (<xref ref-type="bibr" rid="ref16">Hand, 2009</xref>).</p>
<p>In general, the points in the ROC curve arise <italic>solely</italic> from differences in TPR and FPR and are not scaled across threshold values, resulting in the possibility of a small span of threshold values being stretched across broad regions of the area under the curve. Consequently, a small difference in the level of the analyte would correspond to large differences in specificity and sensitivity. In clinical practice, target thresholds or threshold ranges are used to flag individuals at risk. AUC values are generally computed over clean data that have been acquired and processed using highly uniform methods, but this uniformity deteriorates when moving from bench to bedside. In general, examination of AUC values and plots may not directly provide insights for selecting a suitable diagnostic threshold that is robust across measurement error. The metric introduced in this manuscript addresses this issue.</p>
<p>The AUC metric is entirely dependent upon, and equally weighted on, the TPR and FPR. When testing across a heterogeneous group, an accurate TPR for a perfect biomarker has an upper limit equal to the proportion of the subtype. Due to its dependence upon TPR, we hypothesize that screening based on AUC may discard valuable subtype biomarkers, regardless of sample size. Using simulated tests mimicking nearly &#x2018;ideal&#x2019; biomarkers for subsets of disease cases, we demonstrate the failure of AUC to capture their significance.</p>
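<p>The ceiling described above can be made concrete under an idealized assumption that is ours rather than the text&#x2019;s: a fraction p of the cases lies entirely above every control, and the remaining cases are exchangeable with the controls. The expected AUC is then p&#x2009;+&#x2009;(1&#x2009;&#x2212;&#x2009;p)&#x2009;&#x00D7;&#x2009;0.5&#x2009;=&#x2009;0.5&#x2009;+&#x2009;p/2. The sketch below checks this numerically; the distribution parameters and helper names are hypothetical.</p>
<preformat><![CDATA[
```python
import bisect
import random


def auc(cases, controls):
    """Rank-based AUC: P(case value > control value), ties counted as 0.5."""
    ordered = sorted(controls)
    total = 0.0
    for c in cases:
        below = bisect.bisect_left(ordered, c)
        ties = bisect.bisect_right(ordered, c) - below
        total += below + 0.5 * ties
    return total / (len(cases) * len(controls))


def expected_auc_ceiling(p):
    """Expected AUC when a fraction p of cases is perfectly separated."""
    return 0.5 + p / 2.0


rng = random.Random(1)
n, p = 2000, 0.2
n_subtype = int(p * n)
controls = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical, perfectly separated subtype (mean 50) plus control-like cases.
cases = ([rng.gauss(50.0, 1.0) for _ in range(n_subtype)]
         + [rng.gauss(0.0, 1.0) for _ in range(n - n_subtype)])

observed = auc(cases, controls)
print(observed, expected_auc_ceiling(p))
```
]]></preformat>
<p>Under this assumption, a flawless marker for a 20% subtype yields an AUC near 0.6 and one for a 50% subtype an AUC near 0.75, which is roughly in line with the simulated medians reported in <xref ref-type="table" rid="tab1">Table 1</xref>.</p>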
<p>The need for a robust evaluation metric in the heterogeneous AD domain inspired us to design a tool that is based upon the <italic>distribution</italic> of values, rather than traditional statistical measurements. Consider a biomarker that is a strong indicator of a subset of diseased cases, referred to as &#x2018;associated cases&#x2019;. We assume here that the cases that are not part of this subtype exhibit biomarker levels that are similar to the normal controls. Consequently, the distribution of biomarker levels for the cases tends to skew the distribution or exhibit a bimodal profile, where one of the modes lines up with the controls&#x2019; distribution.</p>
<p>It should be noted that normal controls might show a bimodal distribution also. For example, blood sugar levels are high following a meal and low just before a meal, so controls sampled at varying times of day would be prone to exhibit a bimodal curve for this analyte.</p>
<p>Aiming to identify aberrant bimodal distributions, we propose a metric which calculates the <italic>difference</italic> between the bimodalities of the diseased cases and normal controls. The first task is to select a method to measure the degree of bimodality for an array of data values. A number of formulae for distinguishing between unimodality and bimodality have been previously proposed and evaluated (<xref ref-type="bibr" rid="ref13">Freeman and Dale, 2013</xref>). Hartigan&#x2019;s Dip Statistic (HDS) (<xref ref-type="bibr" rid="ref18">Hartigan and Hartigan, 1985</xref>) and the Bimodality Coefficient (BC) (<xref ref-type="bibr" rid="ref42">SAS Institute Inc, 1990</xref>) have both been shown to have good accuracy to detect bimodality (<xref ref-type="bibr" rid="ref13">Freeman and Dale, 2013</xref>). Note that high skewness in a unimodal distribution tends to increase BC values and can lead to false-positive bimodal predictions (<xref ref-type="bibr" rid="ref39">Pfister et al., 2013</xref>). We selected BC as we are interested in identifying either bimodality or skewness that is significantly different between cases and controls.</p>
<p>We introduce Bimodality Coefficient Difference (BCD) as the difference in the BC values for the diseased cases and normal controls. BCD can theoretically range from zero to one, but we observe in our trials that relatively low values indicate significance. Using a series of simulation trials, we demonstrate the effectiveness of this metric over FC and AUC for identifying analytes with clear subtype populations that comprise less than half of the simulated cases. We then leverage this method in an analysis of AD gene expression data and reveal known and novel genes exhibiting bimodal distributions for the AD cases. Notably, more than 95% of the genes discovered by BCD were missed by both FC and AUC. The Python software package for computing BCD is freely available at: <ext-link xlink:href="https://github.com/ClimerLab/bcd" ext-link-type="uri">https://github.com/ClimerLab/bcd</ext-link>.</p>
</sec>
<sec sec-type="methods" id="sec2">
<label>2</label>
<title>Methods</title>
<sec id="sec3">
<label>2.1</label>
<title>BCD</title>
<p>The Bimodality Coefficient, BC, was introduced by SAS in 1990 and is based on three parameters of the array of values: cardinality (<italic>n</italic>), skewness (<italic>s</italic>), and kurtosis (<italic>k</italic>). The value is computed as follows:</p>
<disp-formula id="E1">
<mml:math id="M1">
<mml:mi>B</mml:mi>
<mml:mi>C</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mi>s</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>3</mml:mn>
<mml:mo>&#x00D7;</mml:mo>
<mml:mfrac>
<mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfrac>
</mml:math>
</disp-formula>
<p>BC values range from zero to one and a uniform distribution has a value of 5/9&#x2009;&#x2248;&#x2009;0.555. Higher values indicate greater bimodality.</p>
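<p>A minimal Python sketch of the BC formula follows. It assumes that <italic>k</italic> denotes excess kurtosis, which is consistent with the value of 5/9 stated for a uniform distribution, and it uses plain moment estimators of skewness and kurtosis; the published package may use bias-corrected estimators instead. The function name is hypothetical.</p>
<preformat><![CDATA[
```python
import random


def bimodality_coefficient(values):
    """BC = (s^2 + 1) / (k + 3 (n-1)^2 / ((n-2)(n-3))).

    Assumptions: s is the moment-based skewness and k the moment-based
    excess kurtosis (no small-sample bias correction).
    """
    n = len(values)
    if n < 4:
        raise ValueError("BC needs at least four values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("BC is undefined for constant data")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    s = m3 / m2 ** 1.5
    k = m4 / m2 ** 2 - 3.0
    correction = 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (s ** 2 + 1.0) / (k + correction)


rng = random.Random(0)
uniform_bc = bimodality_coefficient([rng.random() for _ in range(20000)])
normal_bc = bimodality_coefficient([rng.gauss(0.0, 1.0) for _ in range(20000)])

print(uniform_bc, normal_bc)
```
]]></preformat>
<p>For large samples, the uniform draw should land close to 5/9 and the normal draw close to 1/3, since a normal distribution has zero skewness and zero excess kurtosis.</p>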
<p>We propose the following measure, bimodality coefficient difference (BCD), for identifying biomarkers representing subtypes in heterogeneous populations:</p>
<disp-formula id="E2">
<mml:math id="M2">
<mml:mi>B</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>D</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo stretchy="true">|</mml:mo>
<mml:mi>B</mml:mi>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi mathvariant="italic">cases</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>B</mml:mi>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi mathvariant="italic">controls</mml:mi>
</mml:msub>
<mml:mo stretchy="true">|</mml:mo>
</mml:math>
</disp-formula>
<p>The absolute value is applied because a protective factor may result in the controls having a higher BC value than the cases.</p>
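<p>The BCD definition can be sketched in a few lines of Python. As in any such sketch, the helper names are hypothetical and the BC helper assumes moment-based skewness and excess kurtosis; the mixture below borrows the N<sub>1</sub> and N<sub>2</sub> parameters from the simulation design purely as an example and omits outlier handling.</p>
<preformat><![CDATA[
```python
import random


def bimodality_coefficient(values):
    """BC from moment-based skewness and excess kurtosis (an assumption)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    s = m3 / m2 ** 1.5
    k = m4 / m2 ** 2 - 3.0
    return (s ** 2 + 1.0) / (k + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def bcd(cases, controls):
    """Absolute difference between the BC of the cases and of the controls."""
    return abs(bimodality_coefficient(cases) - bimodality_coefficient(controls))


rng = random.Random(42)
controls = [rng.gauss(0.03, 0.04) for _ in range(1000)]
# 20% of cases drawn from the subtype distribution, the rest control-like.
subtype_cases = ([rng.gauss(0.40, 0.16) for _ in range(200)]
                 + [rng.gauss(0.03, 0.04) for _ in range(800)])
null_cases = [rng.gauss(0.03, 0.04) for _ in range(1000)]

subtype_bcd = bcd(subtype_cases, controls)
null_bcd = bcd(null_cases, controls)
print(subtype_bcd, null_bcd)
```
]]></preformat>
<p>Because the absolute value is taken, swapping the two arguments gives the same result, so an analyte whose controls are the more bimodal group is scored in the same way.</p>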
</sec>
<sec id="sec4">
<label>2.2</label>
<title>Simulated data trials</title>
<p>In our simulations, samples are drawn from one of two normal distributions, N<sub>1</sub> and N<sub>2</sub>, with the following means and standard deviations: N<sub>1</sub>&#x2009;~&#x2009;(0.03, 0.04) and N<sub>2</sub>&#x2009;~&#x2009;(0.40, 0.16). These means and standard deviations were derived from analysis of highly differentially-expressed proteins from our COVID-19 study, as described in the <xref ref-type="supplementary-material" rid="SM6">Supplementary material</xref>. The size of the subtype, as a percentage of the cases, is varied over seven scenarios from 0 to 50%. In each scenario, the cases in the subtype group are sampled from N<sub>2</sub> and the remaining cases, along with all controls, are sampled from N<sub>1</sub>. A total of 1,000 cases and 1,000 controls are simulated in each trial. Each scenario is tested using 1,000 trials. Histograms for randomly selected trials are shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>.</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Sample histograms for the simulation trials. Shown are random histograms drawn from the 1,000 trials for each of the seven subset size scenarios.</p>
</caption>
<graphic xlink:href="fncom-18-1388504-g001.tif"/>
</fig>
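<p>The sampling scheme for a single trial can be sketched as follows, using the N<sub>1</sub> and N<sub>2</sub> parameters and the seven subtype fractions given in the text. The function name, the seed, and the use of Python&#x2019;s standard random number generator are assumptions; the sketch does not reproduce the authors&#x2019; code.</p>
<preformat><![CDATA[
```python
import random

N1 = (0.03, 0.04)  # mean, SD: controls and non-subtype cases (from the text)
N2 = (0.40, 0.16)  # mean, SD: subtype cases (from the text)
SUBTYPE_FRACTIONS = [0.0, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50]


def simulate_trial(subtype_fraction, n_cases=1000, n_controls=1000, rng=None):
    """Draw one trial: subtype cases from N2, everything else from N1."""
    if rng is None:
        rng = random.Random()
    n_subtype = round(subtype_fraction * n_cases)
    cases = [rng.gauss(*N2) for _ in range(n_subtype)]
    cases += [rng.gauss(*N1) for _ in range(n_cases - n_subtype)]
    controls = [rng.gauss(*N1) for _ in range(n_controls)]
    return cases, controls


rng = random.Random(2024)  # arbitrary seed for reproducibility
trials = {f: simulate_trial(f, rng=rng) for f in SUBTYPE_FRACTIONS}

for fraction, (cases, controls) in trials.items():
    print(fraction, len(cases), len(controls))
```
]]></preformat>
<p>Repeating the call 1,000 times per fraction would give the number of trials per scenario described above.</p>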
</sec>
<sec id="sec5">
<label>2.3</label>
<title>Biological data trials</title>
<p>We utilized publicly-available gene expression data from human cortex tissue generated using Sentrix HumanRef-8 Expression BeadChip (<xref ref-type="bibr" rid="ref49">Webster et al., 2009</xref>). These data are available on NCBI&#x2019;s Gene Expression Omnibus (GEO), Accession GSE15222. Standard protocols for cRNA hybridization and BeadStudio software, with Illumina&#x2019;s custom error model, were utilized in data generation, as previously described (<xref ref-type="bibr" rid="ref49">Webster et al., 2009</xref>). Data for 8,650 genes for 176 AD cases and 187 controls are provided and used for the current study.</p>
<p>Following analyses utilizing FC, AUC, and BCD, the top 5% of results were extracted for each method and used for comparisons between the methods. In order to further interrogate the results and examine data distributions, the six best genes for each method were extracted and plotted. Note that multiple testing corrections were not applied for any of the methods and the presented results need to be validated in independent data prior to further research effort.</p>
</sec>
<sec id="sec6">
<label>2.4</label>
<title>Data pre-processing</title>
<p>The AD data were pre-processed by the Myers&#x2019; lab, as described previously (<xref ref-type="bibr" rid="ref49">Webster et al., 2009</xref>). Outliers disproportionately affect BC values and there is no clear consensus on eliminating them prior to computing BC. Here we winsorized the outliers as follows. Given a lower quartile Q1, an upper quartile Q3, and the interquartile range IQR&#x2009;=&#x2009;Q3&#x2009;&#x2212;&#x2009;Q1, values higher than Q3&#x2009;+&#x2009;3&#x002A;IQR were replaced with Q3&#x2009;+&#x2009;3&#x002A;IQR and values lower than Q1&#x2009;&#x2212;&#x2009;3&#x002A;IQR were replaced with Q1&#x2009;&#x2212;&#x2009;3&#x002A;IQR.</p>
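<p>The winsorization rule can be sketched as below. The quartile estimator is an assumption (the default method of Python&#x2019;s <monospace>statistics.quantiles</monospace>); the text does not state which estimator was used, and different estimators give slightly different fences. The function name and toy data are hypothetical.</p>
<preformat><![CDATA[
```python
import statistics


def winsorize_iqr(values, k=3.0):
    """Clamp values to the interval [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile method assumed
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Hypothetical toy data with a single extreme value.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
clipped = winsorize_iqr(data)
print(clipped)
```
]]></preformat>
<p>Only the extreme value is pulled in to the upper fence; values inside the fences are returned unchanged, so the sample size is preserved.</p>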
<p>Because the simulated data can contain &#x2018;negative&#x2019; expression values, a min/max normalization was applied for FC calculations and plotting to scale and shift the values to a range of [0, 1]. Also, a logistic regression model was generated in the AUC computations.</p>
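<p>A min/max normalization of this kind can be sketched as follows. Whether cases and controls were scaled jointly is not stated in the text; the sketch assumes a pooled minimum and maximum so that the two groups remain comparable, and the function name and toy values are hypothetical.</p>
<preformat><![CDATA[
```python
def min_max_scale(cases, controls):
    """Scale both groups to [0, 1] using the pooled minimum and maximum."""
    pooled = list(cases) + list(controls)
    lo, hi = min(pooled), max(pooled)
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale constant data")
    return ([(v - lo) / span for v in cases],
            [(v - lo) / span for v in controls])


# Hypothetical toy values including negative 'expression' levels.
scaled_cases, scaled_controls = min_max_scale([-0.05, 0.10, 0.45],
                                              [-0.02, 0.01, 0.04])
print(scaled_cases, scaled_controls)
```
]]></preformat>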
</sec>
</sec>
<sec sec-type="results" id="sec7">
<label>3</label>
<title>Results</title>
<sec id="sec8">
<label>3.1</label>
<title>Simulation trials</title>
<p>We generated large-scale simulated data for a total of 7,000 pseudo analytes over a range of subtype percentages and analyzed each using FC, AUC, and BCD. The subset size of zero provides a baseline for which no association should be observed as all the data points for cases and controls are drawn from the N<sub>1</sub> distribution. The other trials test subset sizes of 5, 10, 20, 30, 40, and 50%. Results for the simulations are summarized in <xref ref-type="table" rid="tab1">Table 1</xref>. As expected, <italic>Sim_0%</italic>, with none of the data values drawn from N<sub>2</sub>, yielded values near zero for log2FC and BCD, and near 0.5 for AUC.</p>
<table-wrap position="float" id="tab1">
<label>Table 1</label>
<caption>
<p>Median values for the simulation trials, with minimum and maximum values shown in brackets.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Trial</th>
<th align="center" valign="top">log2FC</th>
<th align="center" valign="top">AUC</th>
<th align="center" valign="top">BCD (&#x003E;0.209)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">
<italic>Sim_0%</italic>
</td>
<td align="center" valign="top">0.016 [5.47E&#x2212;06, 0.080]</td>
<td align="center" valign="top">0.508 [0.491, 0.542]</td>
<td align="center" valign="top">0.016 [8.68E&#x2212;06, 0.076]</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Sim_5%</italic>
</td>
<td align="center" valign="top">0.0276 [2.25E&#x2212;06, 0.107]</td>
<td align="center" valign="top">0.525 [0.483, 0.563]</td>
<td align="center" valign="top">0.145 [0.066, 0.214]</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Sim_10%</italic>
</td>
<td align="center" valign="top">0.0544 [2.79E&#x2212;04, 0.143]</td>
<td align="center" valign="top">0.548 [0.509, 0.605]</td>
<td align="center" valign="top">0.282 [0.214, 0.350]</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Sim_20%</italic>
</td>
<td align="center" valign="top">0.124 [0.037, 0.210]</td>
<td align="center" valign="top">0.598 [0.563, 0.630]</td>
<td align="center" valign="top">0.395 [0.325, 0.458]</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Sim_30%</italic>
</td>
<td align="center" valign="top">0.206 [0.071, 0.320]</td>
<td align="center" valign="top">0.646 [0.617, 0.675]</td>
<td align="center" valign="top">0.441 [0.372, 0.514]</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Sim_40%</italic>
</td>
<td align="center" valign="top">0.337 [0.134, 0.452]</td>
<td align="center" valign="top">0.695 [0.669, 0.723]</td>
<td align="center" valign="top">0.464 [0.389, 0.525]</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Sim_50%</italic>
</td>
<td align="center" valign="top">0.608 [0.221, 0.821]</td>
<td align="center" valign="top">0.743 [0.719, 0.768]</td>
<td align="center" valign="top">0.438 [0.383, 0.504]</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The first row represents no subsets, where all of the case and control values are drawn from the N<sub>1</sub> distribution. Subsequent rows represent trials with 5, 10, 20, 30, 40, and 50%, respectively, of the case values drawn from the N<sub>2</sub> distribution to represent the disease subtype. For each scenario, 1,000 case and 1,000 control values were generated for each of 1,000 trials. Note that the BCD analysis of the AD gene expression data provides a threshold of 0.209 for <italic>p</italic>-value &#x2264;0.05.</p>
</table-wrap-foot>
</table-wrap>
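<p>The simulation design summarized in <xref ref-type="table" rid="tab1">Table 1</xref> can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study. The N<sub>1</sub> and N<sub>2</sub> parameters (means of 10 and 16 with unit standard deviation), the function names, and the seed are assumptions chosen for the example. The bimodality coefficient is computed with the commonly used formulation based on bias-corrected sample skewness and excess kurtosis, and BCD is assumed to be the bimodality coefficient of the cases minus that of the controls. The values produced are therefore not expected to reproduce Table 1.</p>
<preformat><![CDATA[
```python
import math
import random


def bimodality_coefficient(values):
    """Sample bimodality coefficient from bias-corrected skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Biased central moments of order 2, 3, and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # Bias-corrected sample skewness and sample excess kurtosis.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    excess_kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)
    return (skew ** 2 + 1) / (excess_kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def auc(cases, controls):
    """AUC via the Mann-Whitney rank-sum identity (ties are not handled)."""
    labeled = sorted([(v, 1) for v in cases] + [(v, 0) for v in controls])
    rank_sum = sum(rank for rank, (_, is_case) in enumerate(labeled, start=1) if is_case)
    n1, n0 = len(cases), len(controls)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def simulate(subtype_fraction, n=1000, seed=0):
    """One trial: returns (log2FC, AUC, BCD) for n cases and n controls."""
    rng = random.Random(seed)
    n_sub = round(subtype_fraction * n)
    # Assumed parameters: N1 = Normal(10, 1), N2 = Normal(16, 1).
    controls = [rng.gauss(10.0, 1.0) for _ in range(n)]
    cases = [rng.gauss(16.0, 1.0) for _ in range(n_sub)]
    cases += [rng.gauss(10.0, 1.0) for _ in range(n - n_sub)]
    log2fc = math.log2((sum(cases) / n) / (sum(controls) / n))
    bcd = bimodality_coefficient(cases) - bimodality_coefficient(controls)
    return log2fc, auc(cases, controls), bcd
```
]]></preformat>
<p>Calling <monospace>simulate(0.10)</monospace> returns one (log2FC, AUC, BCD) triple for a 10% subtype; repeating the call over many seeds gives a distribution of values for each scenario.</p>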
<p>Across the remaining trials with subtypes ranging from 5 to 50%, none of the log2FC values were significant, as the maximum value across all the simulations was 0.821. None of the AUC values were significant for subsets of 30% or less, as the highest value across those simulations was 0.675. The median AUC values for subset sizes of 40 and 50% were 0.695 and 0.743, respectively. As described below, the AD data provided a threshold of 0.209 for a <italic>p</italic>-value of 0.05 for BCD. Based on this proxy, BCD values were significant for all trials with subset sizes of 10% or more, as well as for a few of the trials with a subset size of 5%. Sample ROC curves for each scenario are shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>. Note the vertical rise on the left, which perfectly captures the subtype, followed by the relatively straight diagonal line across the graph.</p>
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>ROC curves for subtype groups of 0, 5, 10, 20, 30, 40, and 50% of the diseased cases. One randomly selected plot is shown for each scenario.</p>
</caption>
<graphic xlink:href="fncom-18-1388504-g002.tif"/>
</fig>
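<p>The shape of these curves suggests a simple approximation, offered here as an illustrative sketch rather than a result of the study. Assume that a fraction <italic>f</italic> of the cases (a symbol introduced here) lies entirely above every N<sub>1</sub> value, while the remaining cases are indistinguishable from the controls. A randomly chosen case then outranks a randomly chosen control with probability 1 if it belongs to the subtype and 0.5 otherwise, so that <disp-formula><tex-math>\mathrm{AUC} \approx f \cdot 1 + (1 - f) \cdot 0.5 = 0.5 + \frac{f}{2}</tex-math></disp-formula> For <italic>f</italic> = 0.10 and 0.50 this gives 0.55 and 0.75, close to the median AUC values of 0.548 and 0.743 in <xref ref-type="table" rid="tab1">Table 1</xref>.</p>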
</sec>
<sec id="sec9">
<label>3.2</label>
<title>Alzheimer disease data</title>
<p>In our first round of BCD trials, the genes with the highest values proved to be spurious. For example, gene GI_37540877-S exhibited the strongest association, with a BCD value of 0.377. This signal was erroneous, as described next.</p>
<p>Both the AD cases and normal controls have outlier values for this gene, and these outliers were winsorized, as described in the Methods section, to values of 331.65. As shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>, 24 of the AD cases and 8 of the normal controls exhibit these outlier values for this gene. We extracted the covariate data for these samples and observed that all but two in each group were brain samples drawn from region 4 (<xref ref-type="fig" rid="fig3">Figure 3</xref>). Thus, 91.7% (22 of 24) of the cases in the second mode were drawn from brain region 4, even though only 12.5% of the case samples overall were drawn from this region. Furthermore, only 4.8% of the control samples were drawn from region 4, yielding a strong imbalance of samples for this region. Consequently, diseased case samples that were drawn from region 4 form distinct subsets that create second modes for genes that are differentially expressed across the brain regions. These results demonstrate the power of BCD to identify subtypes, but do not yield information of interest regarding AD. As shown in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S1</xref>, brain region 2 is also unbalanced between cases and controls.</p>
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>Spurious results for the first round of BCD trials on the AD biological data. <bold>(A)</bold> The histograms depict the numbers of AD cases and normal controls with each expression value range for the most significant BCD gene, GI_37540877-S. The covariates for the individuals in the second mode are shown on the right. Note that 91.7 and 75.0% of the cases and controls, respectively, are samples from brain region 4. <bold>(B)</bold> Overall, brain region 4 comprises only 12.5 and 4.8% of the cases and controls, respectively. These results suggest that BCD identified differences between AD cases and normal controls due to the differences in the expression of GI_37540877-S in the various brain regions.</p>
</caption>
<graphic xlink:href="fncom-18-1388504-g003.tif"/>
</fig>
<p>In our second round of trials, we removed samples drawn from brain regions 2 and 4, yielding 137&#x2009;AD cases and 175 normal controls, and then analyzed the data using FC, AUC, and BCD. The genes with the highest 5% of values for each method are enumerated in <xref ref-type="supplementary-material" rid="SM2">Supplementary Tables S2</xref>&#x2013;<xref ref-type="supplementary-material" rid="SM4">S4</xref>. Across the 8,650 genes, FC, AUC, and BCD values of 0.637, 0.740, and 0.209, respectively, represented the cutoffs for <italic>p</italic>-values&#x2009;&#x2264;&#x2009;0.05.</p>
<p>Overall, 46.1% of the significant genes for FC and AUC were the same. In sharp contrast, only 3.7 and 4.6% of the significant BCD genes were identified by FC and AUC, respectively. Overall, 2.3% of the significant genes for each method were common across all three approaches (<xref ref-type="supplementary-material" rid="SM5">Supplementary Table S5</xref>).</p>
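<p>The overlap percentages reported above can be computed along the following lines. This is a sketch under stated assumptions: it treats the significant genes for a method as those with the highest 5% of values, it assumes that larger scores are more significant, and the helper names and toy scores are hypothetical rather than taken from the AD data.</p>
<preformat><![CDATA[
```python
def top_fraction(scores, fraction=0.05):
    """Return the IDs with the highest `fraction` of scores (at least one ID)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def percent_shared(reference, other):
    """Percentage of IDs in `reference` that also appear in `other`."""
    return 100.0 * len(reference & other) / len(reference)


# Toy scores for four hypothetical genes (not taken from the AD data).
fc_scores = {"geneA": 1.4, "geneB": 1.2, "geneC": 0.3, "geneD": 0.1}
bcd_scores = {"geneA": 0.40, "geneB": 0.05, "geneC": 0.38, "geneD": 0.01}

# A fraction of 0.5 keeps the example small; the text above uses the top 5%.
fc_top = top_fraction(fc_scores, fraction=0.5)
bcd_top = top_fraction(bcd_scores, fraction=0.5)
shared = percent_shared(bcd_top, fc_top)  # geneA is the only gene in both sets
```
]]></preformat>
<p>Because each method contributes the same number of top-ranked genes under this assumption, the shared percentage is the same whichever method is used as the reference.</p>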
<p>Lists of the top six genes for FC, AUC, and BCD are shown in <xref ref-type="table" rid="tab2">Table 2</xref>; histograms for each of these genes are shown in <xref ref-type="fig" rid="fig4">Figures 4</xref>&#x2013;<xref ref-type="fig" rid="fig6">6</xref>; and <xref ref-type="table" rid="tab3">Table 3</xref> provides descriptions of the top AD genes identified by BCD. Some of the FC and AUC plots exhibit a tendency towards bimodality or increased skew, but in general they represent differences in expression across the majority of the samples, demonstrating their value for identifying biomarkers associated with large proportions of the cases.</p>
<table-wrap position="float" id="tab2">
<label>Table 2</label>
<caption>
<p>Top six genes for each analysis of the AD gene expression data.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="top" colspan="3">Fold Change</th>
<th align="center" valign="top" colspan="3">AUC</th>
<th align="center" valign="top" colspan="3">BCD</th>
</tr>
<tr>
<th align="left" valign="top">GeneID</th>
<th align="center" valign="top">Gene</th>
<th align="center" valign="top">log2FC</th>
<th align="center" valign="top">GeneID</th>
<th align="center" valign="top">Gene</th>
<th align="center" valign="top">AUC</th>
<th align="center" valign="top">GeneID</th>
<th align="center" valign="top">Gene</th>
<th align="center" valign="top">BCD</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">GI_38201693-S</td>
<td align="center" valign="top">RGS4</td>
<td align="center" valign="top">1.448</td>
<td align="center" valign="top">GI_4585642-S</td>
<td align="center" valign="top">ZNF264</td>
<td align="center" valign="top">0.854</td>
<td align="center" valign="top">GI_4502806-S</td>
<td align="center" valign="top">CHGB</td>
<td align="center" valign="top">0.403</td>
</tr>
<tr>
<td align="left" valign="top">GI_40255112-S</td>
<td align="center" valign="top">MGC35285</td>
<td align="center" valign="top">1.428</td>
<td align="center" valign="top">GI_27734844-S</td>
<td align="center" valign="top">ZDHHC23</td>
<td align="center" valign="top">0.830</td>
<td align="center" valign="top">GI_17999536-S</td>
<td align="center" valign="top">PRPF8</td>
<td align="center" valign="top">0.378</td>
</tr>
<tr>
<td align="left" valign="top">GI_40254432-S</td>
<td align="center" valign="top">N/A</td>
<td align="center" valign="top">1.342</td>
<td align="center" valign="top">GI_24308166-S</td>
<td align="center" valign="top">DKFZp761H039</td>
<td align="center" valign="top">0.829</td>
<td align="center" valign="top">GI_37540877-S</td>
<td align="center" valign="top">GLCE</td>
<td align="center" valign="top">0.378</td>
</tr>
<tr>
<td align="left" valign="top">GI_40018630-A</td>
<td align="center" valign="top">N/A</td>
<td align="center" valign="top">1.306</td>
<td align="center" valign="top">GI_34577121-S</td>
<td align="center" valign="top">NFKB1</td>
<td align="center" valign="top">0.822</td>
<td align="center" valign="top">GI_28872783-A</td>
<td align="center" valign="top">CDK5RAP1</td>
<td align="center" valign="top">0.375</td>
</tr>
<tr>
<td align="left" valign="top">GI_29744077-S</td>
<td align="center" valign="top">LOC340542</td>
<td align="center" valign="top">1.288</td>
<td align="center" valign="top">GI_13376557-S</td>
<td align="center" valign="top">FLJ21272</td>
<td align="center" valign="top">0.820</td>
<td align="center" valign="top">GI_14589948-S</td>
<td align="center" valign="top">POLR2A</td>
<td align="center" valign="top">0.374</td>
</tr>
<tr>
<td align="left" valign="top">GI_27475984-S</td>
<td align="center" valign="top">NEUROD6</td>
<td align="center" valign="top">1.217</td>
<td align="center" valign="top">GI_23312375-A</td>
<td align="center" valign="top">PPEF1</td>
<td align="center" valign="top">0.815</td>
<td align="center" valign="top">GI_23503234-A</td>
<td align="center" valign="top">C1QDC1</td>
<td align="center" valign="top">0.372</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption>
<p>Histograms for the six top genes identified using FC. The rows correspond to the genes in <xref ref-type="table" rid="tab2">Table 2</xref> and are given in the same order.</p>
</caption>
<graphic xlink:href="fncom-18-1388504-g004.tif"/>
</fig>
<fig position="float" id="fig5">
<label>Figure 5</label>
<caption>
<p>Histograms for the six top genes identified using AUC. The rows correspond to the genes in <xref ref-type="table" rid="tab2">Table 2</xref> and are given in the same order.</p>
</caption>
<graphic xlink:href="fncom-18-1388504-g005.tif"/>
</fig>
<fig position="float" id="fig6">
<label>Figure 6</label>
<caption>
<p>Histograms for the six top genes identified using BCD. The rows correspond to the genes in <xref ref-type="table" rid="tab2">Table 2</xref> and are given in the same order.</p>
</caption>
<graphic xlink:href="fncom-18-1388504-g006.tif"/>
</fig>
<table-wrap position="float" id="tab3">
<label>Table 3</label>
<caption>
<p>Descriptions of the top six genes identified by BCD in the AD data.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">GeneID</th>
<th align="left" valign="top">Gene</th>
<th align="left" valign="top">Description</th>
<th align="left" valign="top">Location</th>
<th align="left" valign="top">Alias</th>
<th align="left" valign="top">NCBI Summary</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">GI_4502806-S</td>
<td align="left" valign="top">CHGB</td>
<td align="left" valign="top">Chromogranin B (secretogranin 1)</td>
<td align="left" valign="top">20p12.3</td>
<td align="left" valign="top">SCG1</td>
<td align="left" valign="top">This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides (provided by RefSeq, January 2009).</td>
</tr>
<tr>
<td align="left" valign="top">GI_17999536-S</td>
<td align="left" valign="top">PRPF8</td>
<td align="left" valign="top">Pre-mRNA processing factor 8</td>
<td align="left" valign="top">17p13.3</td>
<td align="left" valign="top">PRP8; RP13; HPRP8; PRPC8; SNRNP220</td>
<td align="left" valign="top">Pre-mRNA splicing occurs in 2 sequential transesterification steps. The protein encoded by this gene is a component of both U2- and U12-dependent spliceosomes, and found to be essential for the catalytic step II in pre-mRNA splicing process. It contains several WD repeats, which function in protein&#x2013;protein interactions. This protein has a sequence similarity to yeast Prp8 protein. This gene is a candidate gene for autosomal dominant retinitis pigmentosa (provided by RefSeq, July 2008).</td>
</tr>
<tr>
<td align="left" valign="top">GI_37540877-S</td>
<td align="left" valign="top">GLCE</td>
<td align="left" valign="top">Glucuronic acid epimerase</td>
<td align="left" valign="top">15q23</td>
<td align="left" valign="top">HSEPI</td>
<td align="left" valign="top">Enables calcium ion binding activity; heparosan-N-sulfate-glucuronate 5-epimerase activity; and protein homodimerization activity. Involved in heparan sulfate proteoglycan biosynthetic process. Predicted to be located in Golgi membrane. Predicted to be integral component of membrane. Predicted to be active in Golgi apparatus (provided by Alliance of Genome Resources, April 2022).</td>
</tr>
<tr>
<td align="left" valign="top">GI_28872783-A</td>
<td align="left" valign="top">CDK5RAP1</td>
<td align="left" valign="top">CDK5 regulatory subunit associated protein 1</td>
<td align="left" valign="top">20q11.21</td>
<td align="left" valign="top">C42; CGI-05; HSPC167; C20orf34</td>
<td align="left" valign="top">This gene encodes a regulator of cyclin-dependent kinase 5 activity. This protein has also been reported to modify RNA by adding a methylthio-group and may thus have a dual function as an RNA methylthiotransferase and as an inhibitor of cyclin-dependent kinase 5 activity. Alternative splicing results in multiple transcript variants that encode different isoforms (provided by RefSeq, May 2013).</td>
</tr>
<tr>
<td align="left" valign="top">GI_14589948-S</td>
<td align="left" valign="top">POLR2A</td>
<td align="left" valign="top">Polymerase (RNA) II (DNA directed) polypeptide A, 220&#x2009;kDa</td>
<td align="left" valign="top">17p13.1</td>
<td align="left" valign="top">RPB1; RPO2; POLR2; POLRA; RPBh1; RPOL2; NEDHIB; RpIILS; hsRPB1; hRPB220</td>
<td align="left" valign="top">This gene encodes the largest subunit of RNA polymerase II, the polymerase responsible for synthesizing messenger RNA in eukaryotes. The product of this gene contains a carboxy terminal domain composed of heptapeptide repeats that are essential for polymerase activity. These repeats contain serine and threonine residues that are phosphorylated in actively transcribing RNA polymerase. In addition, this subunit, in combination with several other polymerase subunits, forms the DNA binding domain of the polymerase, a groove in which the DNA template is transcribed into RNA (provided by RefSeq, July 2008).</td>
</tr>
<tr>
<td align="left" valign="top">GI_23503234-A</td>
<td align="left" valign="top">C1QDC1</td>
<td align="left" valign="top">Caprin family member 2</td>
<td align="left" valign="top">12p11</td>
<td align="left" valign="top">EEG1; EEG-1; C1QDC1; RNG140</td>
<td align="left" valign="top">The protein encoded by this gene may regulate the transport of mRNA. It may play a role in the differentiation of erythroblasts. Multiple transcript variants encoding different isoforms have been found for this gene (provided by RefSeq, February 2016).</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec sec-type="discussion" id="sec10">
<label>4</label>
<title>Discussion</title>
<p>The simulation experiments provide a comprehensive evaluation across the three methods with 1,000 repetitions of large-scale trials comprising 1,000 cases and 1,000 controls each and nearly &#x2018;ideal&#x2019; subtype biomarkers representing each subtype percentage. The results from these trials show a stark contrast among the methods.</p>
<p>In general, FC performed extremely poorly. Even when 50% of the cases were associated with the subtype biomarker, the median log2FC value was only 0.608. The maximum across all 1,000 trials was 0.821. Consequently, all of the pseudo biomarkers would be discarded, despite their nearly perfect discrimination of a subtype comprised of half of the cases. Moreover, every one of the biomarkers would be discarded for all the other scenarios.</p>
<p>AUC is trickier to evaluate because it lacks a clear significance cutoff value. The literature points to 0.7 or 0.75, and our trials on AD gene expression provided a cutoff of 0.74 for <italic>p</italic>-value&#x2009;&#x2264;&#x2009;0.05. All the pseudo biomarkers had AUC values less than 0.70 across the 1,000 trials for subsets smaller than 40%. Furthermore, the median for the 40% subset trial was 0.695. Significance emerged as the subset size grew to 50%.</p>
<p>Because BCD is a newly introduced metric, it has no established significance cutoff. The AD gene expression data provided a cutoff value of 0.209 for <italic>p</italic>-value &#x2264;0.05. Using this proxy, <italic>every</italic> one of the 1,000 trials for subsets of 10% or more would be marked as significant. Even a few of the trials with a subset size of 5% were above 0.209. These results document the power of using the distribution, rather than the magnitude, of data values to identify subtypes within a population.</p>
<p>As expected for the biological trials, the top six genes identified by FC and AUC show significant differences between the diseased cases and normal controls (<xref ref-type="fig" rid="fig4">Figures 4</xref>, <xref ref-type="fig" rid="fig5">5</xref>). While several of the top results exhibit some degree of bimodality, others tend towards differences across the majority of the samples. On the other hand, each top BCD result clearly delineates a subgroup, without requiring aberrant levels for individuals that are not in the given subgroup (<xref ref-type="fig" rid="fig6">Figure 6</xref>).</p>
<p>A particularly interesting result is that the first round of BCD trials produced spurious associations for the top six values due to the imbalance of case and control samples from brain region 4. While this imbalance, coupled with differential expression across brain regions, created clear subsets, none of these genes were included in the top six genes for FC or AUC in the first round of trials.</p>
<p>The six most significant genes identified by BCD include four genes previously associated with AD and two novel genes. Genes that have known associations with AD include <italic>CHGB</italic> (<xref ref-type="bibr" rid="ref22">Lechner et al., 2004</xref>; <xref ref-type="bibr" rid="ref41">Quinn et al., 2020</xref>; <xref ref-type="bibr" rid="ref31">Marksteiner et al., 2002</xref>; <xref ref-type="bibr" rid="ref50">Willis et al., 2008</xref>), <italic>GLCE</italic> (<xref ref-type="bibr" rid="ref38">Perez-Lopez et al., 2021</xref>; <xref ref-type="bibr" rid="ref34">Ozsan McMillan et al., 2023</xref>; <xref ref-type="bibr" rid="ref23">Liachko et al., 2019</xref>; <xref ref-type="bibr" rid="ref43">Schultheis et al., 2021</xref>), <italic>CDK5RAP1</italic> (<xref ref-type="bibr" rid="ref9">Esteras et al., 2012</xref>) and the gene it regulates, <italic>CDK5</italic> (<xref ref-type="bibr" rid="ref24">Liu et al., 2016</xref>; <xref ref-type="bibr" rid="ref26">Maccioni et al., 2001</xref>; <xref ref-type="bibr" rid="ref45">Shukla et al., 2012</xref>; <xref ref-type="bibr" rid="ref48">Tsai et al., 2004</xref>; <xref ref-type="bibr" rid="ref4">Cruz and Tsai, 2004</xref>; <xref ref-type="bibr" rid="ref32">Monaco III, 2005</xref>; <xref ref-type="bibr" rid="ref28">Maitra and Vincent, 2022</xref>; <xref ref-type="bibr" rid="ref33">Nikhil et al., 2019</xref>; <xref ref-type="bibr" rid="ref21">Lau and Ahlijanian, 2003</xref>; <xref ref-type="bibr" rid="ref36">Pei et al., 1998</xref>), and <italic>POLR2A</italic> (<xref ref-type="bibr" rid="ref6">Dickson et al., 2021</xref>). Chromogranin B (CHGB) has been observed in about 60% of the amyloid-beta plaques in AD transgenic mice and these mice performed poorly in the Morris water maze task (<xref ref-type="bibr" rid="ref50">Willis et al., 2008</xref>). This protein has been proposed as a synaptic degeneration marker for AD (<xref ref-type="bibr" rid="ref31">Marksteiner et al., 2002</xref>). 
Glucuronic acid epimerase (GLCE) modifies heparan sulfate by converting the glucuronic acid to iduronic acid. This gene is downregulated in AD (<xref ref-type="bibr" rid="ref44">Sepulveda-Diaz et al., 2015</xref>; <xref ref-type="bibr" rid="ref19">Huynh et al., 2019</xref>) and has been suggested to contribute to the aberrant behavior of heparan sulfate in AD (<xref ref-type="bibr" rid="ref38">Perez-Lopez et al., 2021</xref>; <xref ref-type="bibr" rid="ref34">Ozsan McMillan et al., 2023</xref>). CDK5 regulatory subunit-associated protein 1 (CDK5RAP1) is involved in checkpoint and arrest in the cell cycle as it inhibits CDK5, a protein with strong implications for AD progression (<xref ref-type="bibr" rid="ref24">Liu et al., 2016</xref>; <xref ref-type="bibr" rid="ref26">Maccioni et al., 2001</xref>; <xref ref-type="bibr" rid="ref45">Shukla et al., 2012</xref>; <xref ref-type="bibr" rid="ref48">Tsai et al., 2004</xref>; <xref ref-type="bibr" rid="ref4">Cruz and Tsai, 2004</xref>; <xref ref-type="bibr" rid="ref32">Monaco III, 2005</xref>; <xref ref-type="bibr" rid="ref28">Maitra and Vincent, 2022</xref>; <xref ref-type="bibr" rid="ref33">Nikhil et al., 2019</xref>; <xref ref-type="bibr" rid="ref21">Lau and Ahlijanian, 2003</xref>; <xref ref-type="bibr" rid="ref36">Pei et al., 1998</xref>). Esteras et al. observed significant upregulation of CDK5RAP1 in AD transgenic mouse brain (1.98-fold change) and PBMCs (10.69-fold change) (<xref ref-type="bibr" rid="ref9">Esteras et al., 2012</xref>). The largest subunit of RNA polymerase II, POLR2A, also known as RPB1, has recently been linked to AD by Dickson et al. Using an AD transgenic mouse model, this group demonstrated the mislocalization of this protein from the nucleus to the cytoplasm in a tau- and age-dependent manner (<xref ref-type="bibr" rid="ref6">Dickson et al., 2021</xref>).</p>
<p><italic>PRPF8</italic> currently has no obvious connection to AD; however, it was one of 10 genes found to be associated with AD and Parkinson&#x2019;s disease in another study involving a different dataset (GSE4229) and an alternative tissue, peripheral blood (<xref ref-type="bibr" rid="ref10">Faruqui et al., 2021</xref>). Finally, <italic>C1QDC1</italic> is another novel gene without any clear association. <xref ref-type="table" rid="tab3">Table 3</xref> includes descriptions of the six genes.</p>
<p>FC identified one of the six BCD-significant genes, <italic>CHGB</italic>, with a <italic>p</italic>-value of 0.0079. The other five were missed by FC and all six were missed by AUC. In general, BCD is able to tease out novel genes missed by the other methods as FC and AUC shared 46.1% of their significant genes while only 3.7 and 4.6% of the BCD genes were identified by FC and AUC, respectively (<xref ref-type="supplementary-material" rid="SM2">Supplementary Tables S2</xref>&#x2013;<xref ref-type="supplementary-material" rid="SM4">S4</xref>).</p>
<p>It should be noted that BCD is not expected to identify global biomarkers. When nearly all of the cases are associated with the biomarker, e.g., pTau-181 associations with AD, a shift in the cases median and mean, not modality, is expected. Such biomarkers are better captured using FC or AUC as medians and means are ignored by BCD.</p>
<p>It should also be noted that the lower bound on sample size for BCD is limited by the ability to distinguish the Bimodality Coefficient for the distributions. Since this statistic is derived from the skewness and kurtosis of the data, an adequate sample size for each of these statistics is requisite. Furthermore, the Bimodality Coefficient includes the sample size in its formulation.</p>
<p>BCD enjoys the same favorable properties exhibited by AUC. No specific biomarker threshold or other parameters are utilized. The metric is simple and intuitive. Furthermore, examination of the corresponding histograms provides additional information beyond the scalar value. As a bonus, individuals representing the subtype are distinguished from those who are not associated.</p>
<p>At the same time, BCD does not suffer from AUC&#x2019;s drawbacks. AUC includes regions under the curve where analyte thresholds are not of practical interest and can be misleading when comparing two ROC curves that cross. Neither of these issues is of concern for BCD, as the distributions of analyte levels, rather than TPRs and FPRs, dictate the computed values. Furthermore, a high AUC value does not always correlate with the ability to identify a robust threshold for practical use of the biomarker. In contrast, a high BCD value indicates strong bimodality, which corresponds to a natural inversion between the modes. The horizontal axis values delineate the corresponding threshold. Finally, analytes that are already known to be unimodal under normal conditions do not necessarily require any new controls data to be generated, and ranks of the BC values for the cases across the whole set of tested analytes can be used to distinguish significance.</p>
<p>AD is a complex and heterogeneous disease and identification of subtypes is needed to advance precise treatments of each subtype group. We demonstrate here that popular statistics used for assessing biomarkers, FC and AUC, generally perform suboptimally when heterogeneity exists. We also provide a new metric, BCD, which appears to hold promise in this domain.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="sec11">
<title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: <ext-link xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15222" ext-link-type="uri">https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15222</ext-link>.</p>
</sec>
<sec sec-type="ethics-statement" id="sec12">
<title>Ethics statement</title>
<p>The study involving humans was conducted in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants&#x2019; legal guardians/next of kin in accordance with the national legislation and the institutional requirements.</p>
</sec>
<sec sec-type="author-contributions" id="sec13">
<title>Author contributions</title>
<p>KS: Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing &#x2013; original draft, Writing &#x2013; review &#x0026; editing. SC: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Visualization, Writing &#x2013; original draft, Writing &#x2013; review &#x0026; editing.</p>
</sec>
<sec sec-type="funding-information" id="sec14">
<title>Funding</title>
<p>The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Alzheimer&#x2019;s Association (AARG-22-925002), National Institute on Aging (NIA) grants 1RF1AG053303-01 and 3RF1AG053303-01S2, and research grants from the University of Missouri &#x2013; St. Louis.</p>
</sec>
<ack>
<p>Special thanks to Jamie Lea for insightful conversations.</p>
</ack>
<sec sec-type="COI-statement" id="sec15">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="sec16">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="sec17">
<title>Supplementary material</title>
<p>The Supplementary material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fncom.2024.1388504/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fncom.2024.1388504/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table_1.xlsx" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table_2.xlsx" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table_3.xlsx" id="SM3" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table_4.xlsx" id="SM4" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table_5.xlsx" id="SM5" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_1.pdf" id="SM6" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1">
<citation citation-type="other"><person-group person-group-type="author">
<collab id="coll1">Accelerating Medicines Partnership</collab>
</person-group> (<year>2022</year>). Program for Alzheimer&#x2019;s Disease (AMP&#x00AE; AD 2.0). National Institute on Aging. Available at: <ext-link xlink:href="https://www.nia.nih.gov/research/amp-ad-second-iteration" ext-link-type="uri">https://www.nia.nih.gov/research/amp-ad-second-iteration</ext-link>. (Accessed: 15th January 2022).</citation>
</ref>
<ref id="ref2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bowers</surname> <given-names>A. J.</given-names></name> <name><surname>Sprott</surname> <given-names>R.</given-names></name> <name><surname>Taff</surname> <given-names>S. A.</given-names></name></person-group> (<year>2013</year>). <article-title>Do we know who will drop out?: a review of the predictors of dropping out of high school: precision, sensitivity, and specificity</article-title>. <source>High Sch. J.</source> <volume>96</volume>, <fpage>77</fpage>&#x2013;<lpage>100</lpage>. doi: <pub-id pub-id-type="doi">10.1353/hsj.2013.0000</pub-id></citation>
</ref>
<ref id="ref3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bowers</surname> <given-names>A. J.</given-names></name> <name><surname>Zhou</surname> <given-names>X.</given-names></name></person-group> (<year>2019</year>). <article-title>Receiver operating characteristic (ROC) area under the curve (AUC): a diagnostic measure for evaluating the accuracy of predictors of education outcomes</article-title>. <source>J. Educ. Stud. Placed Risk</source> <volume>24</volume>, <fpage>20</fpage>&#x2013;<lpage>46</lpage>. doi: <pub-id pub-id-type="doi">10.1080/10824669.2018.1523734</pub-id></citation>
</ref>
<ref id="ref4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cruz</surname> <given-names>J. C.</given-names></name> <name><surname>Tsai</surname> <given-names>L. H.</given-names></name></person-group> (<year>2004</year>). <article-title>Cdk5 deregulation in the pathogenesis of Alzheimer&#x2019;s disease</article-title>. <source>Trends Mol. Med.</source> <volume>10</volume>, <fpage>452</fpage>&#x2013;<lpage>458</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.molmed.2004.07.001</pub-id>, PMID: <pub-id pub-id-type="pmid">15350898</pub-id></citation>
</ref>
<ref id="ref5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cucchiara</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Applied logistic regression</article-title>. <source>Technometrics</source> <volume>34</volume>, <fpage>358</fpage>&#x2013;<lpage>359</lpage>. doi: <pub-id pub-id-type="doi">10.1080/00401706.1992.10485291</pub-id></citation>
</ref>
<ref id="ref6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dickson</surname> <given-names>J. R.</given-names></name> <name><surname>Yoon</surname> <given-names>H.</given-names></name> <name><surname>Frosch</surname> <given-names>M. P.</given-names></name> <name><surname>Hyman</surname> <given-names>B. T.</given-names></name></person-group> (<year>2021</year>). <article-title>Cytoplasmic mislocalization of RNA polymerase II subunit RPB1 in Alzheimer disease is linked to pathologic tau</article-title>. <source>J. Neuropathol. Exp. Neurol.</source> <volume>80</volume>, <fpage>530</fpage>&#x2013;<lpage>540</lpage>. doi: <pub-id pub-id-type="doi">10.1093/jnen/nlab040</pub-id>, PMID: <pub-id pub-id-type="pmid">33990839</pub-id></citation>
</ref>
<ref id="ref7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dubois</surname> <given-names>B.</given-names></name> <name><surname>von Arnim</surname> <given-names>C. A. F.</given-names></name> <name><surname>Burnie</surname> <given-names>N.</given-names></name> <name><surname>Bozeat</surname> <given-names>S.</given-names></name> <name><surname>Cummings</surname> <given-names>J.</given-names></name></person-group> (<year>2023</year>). <article-title>Biomarkers in Alzheimer&#x2019;s disease: role in early and differential diagnosis and recognition of atypical variants</article-title>. <source>Alzheimers Res. Ther.</source> <volume>15</volume>:<fpage>175</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s13195-023-01314-6</pub-id>, PMID: <pub-id pub-id-type="pmid">37833762</pub-id></citation>
</ref>
<ref id="ref8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dudbridge</surname> <given-names>F.</given-names></name></person-group> (<year>2013</year>). <article-title>Power and predictive accuracy of polygenic risk scores</article-title>. <source>PLoS Genet.</source> <volume>9</volume>:<fpage>e1003348</fpage>. doi: <pub-id pub-id-type="doi">10.1371/annotation/b91ba224-10be-409d-93f4-7423d502cba0</pub-id>, PMID: <pub-id pub-id-type="pmid">23555274</pub-id></citation>
</ref>
<ref id="ref9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Esteras</surname> <given-names>N.</given-names></name> <name><surname>Bartolom&#x00E9;</surname> <given-names>F.</given-names></name> <name><surname>Alqu&#x00E9;zar</surname> <given-names>C.</given-names></name> <name><surname>Antequera</surname> <given-names>D.</given-names></name> <name><surname>Mu&#x00F1;oz</surname> <given-names>&#x00DA;.</given-names></name> <name><surname>Carro</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Altered cell cycle-related gene expression in brain and lymphocytes from a transgenic mouse model of Alzheimer&#x2019;s disease [amyloid precursor protein/presenilin 1 (PS1)]</article-title>. <source>Eur. J. Neurosci.</source> <volume>36</volume>, <fpage>2609</fpage>&#x2013;<lpage>2618</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1460-9568.2012.08178.x</pub-id>, PMID: <pub-id pub-id-type="pmid">22702220</pub-id></citation>
</ref>
<ref id="ref10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Faruqui</surname> <given-names>N. A.</given-names></name> <name><surname>Prium</surname> <given-names>D. H.</given-names></name> <name><surname>Mowna</surname> <given-names>S. A.</given-names></name> <name><surname>Rahaman</surname> <given-names>T. I.</given-names></name> <name><surname>Dutta</surname> <given-names>A. R.</given-names></name> <name><surname>Akter</surname> <given-names>F.</given-names></name></person-group> (<year>2021</year>). <article-title>Identification of common molecular signatures shared between Alzheimer&#x2019;s and Parkinson&#x2019;s diseases and therapeutic agents exploration: an integrated genomics approach</article-title>. <source>bioRxiv</source>. doi: <pub-id pub-id-type="doi">10.1101/2020.12.31.424962</pub-id></citation>
</ref>
<ref id="ref11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ferreira</surname> <given-names>D.</given-names></name> <name><surname>Nordberg</surname> <given-names>A.</given-names></name> <name><surname>Westman</surname> <given-names>E.</given-names></name></person-group> (<year>2020</year>). <article-title>Biological subtypes of Alzheimer disease: a systematic review and meta-analysis</article-title>. <source>Neurology</source> <volume>94</volume>, <fpage>436</fpage>&#x2013;<lpage>448</lpage>. doi: <pub-id pub-id-type="doi">10.1212/WNL.0000000000009058</pub-id>, PMID: <pub-id pub-id-type="pmid">32047067</pub-id></citation>
</ref>
<ref id="ref12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fisher</surname> <given-names>R. A.</given-names></name></person-group> (<year>1935</year>). <article-title>The logic of inductive inference</article-title>. <source>J. R. Stat. Soc.</source> <volume>98</volume>:<fpage>39</fpage>. doi: <pub-id pub-id-type="doi">10.2307/2342435</pub-id></citation>
</ref>
<ref id="ref13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freeman</surname> <given-names>J. B.</given-names></name> <name><surname>Dale</surname> <given-names>R.</given-names></name></person-group> (<year>2013</year>). <article-title>Assessing bimodality to detect the presence of a dual cognitive process</article-title>. <source>Behav. Res. Methods</source> <volume>45</volume>, <fpage>83</fpage>&#x2013;<lpage>97</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13428-012-0225-x</pub-id>, PMID: <pub-id pub-id-type="pmid">22806703</pub-id></citation>
</ref>
<ref id="ref14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hampel</surname> <given-names>H.</given-names></name> <name><surname>Vergallo</surname> <given-names>A.</given-names></name> <name><surname>Perry</surname> <given-names>G.</given-names></name> <name><surname>Lista</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>The Alzheimer precision medicine initiative</article-title>. <source>J. Alzheimers Dis.</source> <volume>68</volume>, <fpage>1</fpage>&#x2013;<lpage>24</lpage>. doi: <pub-id pub-id-type="doi">10.3233/JAD-181121</pub-id>, PMID: <pub-id pub-id-type="pmid">30814352</pub-id></citation>
</ref>
<ref id="ref15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hanczar</surname> <given-names>B.</given-names></name> <name><surname>Hua</surname> <given-names>J.</given-names></name> <name><surname>Sima</surname> <given-names>C.</given-names></name> <name><surname>Weinstein</surname> <given-names>J.</given-names></name> <name><surname>Bittner</surname> <given-names>M.</given-names></name> <name><surname>Dougherty</surname> <given-names>E. R.</given-names></name></person-group> (<year>2010</year>). <article-title>Small-sample precision of ROC-related estimates</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>822</fpage>&#x2013;<lpage>830</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btq037</pub-id>, PMID: <pub-id pub-id-type="pmid">20130029</pub-id></citation>
</ref>
<ref id="ref16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hand</surname> <given-names>D. J.</given-names></name></person-group> (<year>2009</year>). <article-title>Measuring classifier performance: a coherent alternative to the area under the ROC curve</article-title>. <source>Mach. Learn.</source> <volume>77</volume>, <fpage>103</fpage>&#x2013;<lpage>123</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10994-009-5119-5</pub-id></citation>
</ref>
<ref id="ref17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hanley</surname> <given-names>J. A.</given-names></name> <name><surname>McNeil</surname> <given-names>B. J.</given-names></name></person-group> (<year>1982</year>). <article-title>The meaning and use of the area under a receiver operating characteristic (ROC) curve</article-title>. <source>Radiology</source> <volume>143</volume>, <fpage>29</fpage>&#x2013;<lpage>36</lpage>. doi: <pub-id pub-id-type="doi">10.1148/radiology.143.1.7063747</pub-id>, PMID: <pub-id pub-id-type="pmid">7063747</pub-id></citation>
</ref>
<ref id="ref18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hartigan</surname> <given-names>J. A.</given-names></name> <name><surname>Hartigan</surname> <given-names>P. M.</given-names></name></person-group> (<year>1985</year>). <article-title>The dip test of unimodality</article-title>. <source>Ann. Stat.</source> <volume>13</volume>, <fpage>74</fpage>&#x2013;<lpage>80</lpage>. doi: <pub-id pub-id-type="doi">10.1214/aos/1176346577</pub-id></citation>
</ref>
<ref id="ref19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huynh</surname> <given-names>M. B.</given-names></name> <name><surname>Ouidja</surname> <given-names>M. O.</given-names></name> <name><surname>Chantepie</surname> <given-names>S.</given-names></name> <name><surname>Carpentier</surname> <given-names>G.</given-names></name> <name><surname>Ma&#x00EF;za</surname> <given-names>A.</given-names></name> <name><surname>Zhang</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Glycosaminoglycans from Alzheimer&#x2019;s disease hippocampus have altered capacities to bind and regulate growth factors activities and to bind tau</article-title>. <source>PLoS One</source> <volume>14</volume>:<fpage>e0209573</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0209573</pub-id>, PMID: <pub-id pub-id-type="pmid">30608949</pub-id></citation>
</ref>
<ref id="ref20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Karikari</surname> <given-names>T. K.</given-names></name> <name><surname>Pascoal</surname> <given-names>T. A.</given-names></name> <name><surname>Ashton</surname> <given-names>N. J.</given-names></name> <name><surname>Janelidze</surname> <given-names>S.</given-names></name> <name><surname>Benedet</surname> <given-names>A. L.</given-names></name> <name><surname>Rodriguez</surname> <given-names>J. L.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Blood phosphorylated tau 181 as a biomarker for Alzheimer&#x2019;s disease: a diagnostic performance and prediction modelling study using data from four prospective cohorts</article-title>. <source>Lancet Neurol.</source> <volume>19</volume>, <fpage>422</fpage>&#x2013;<lpage>433</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S1474-4422(20)30071-5</pub-id>, PMID: <pub-id pub-id-type="pmid">32333900</pub-id></citation>
</ref>
<ref id="ref21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lau</surname> <given-names>L. F.</given-names></name> <name><surname>Ahlijanian</surname> <given-names>M. K.</given-names></name></person-group> (<year>2003</year>). <article-title>Role of cdk5 in the pathogenesis of Alzheimer&#x2019;s disease</article-title>. <source>Neurosignals</source> <volume>12</volume>, <fpage>209</fpage>&#x2013;<lpage>214</lpage>. doi: <pub-id pub-id-type="doi">10.1159/000074622</pub-id>, PMID: <pub-id pub-id-type="pmid">14673207</pub-id></citation>
</ref>
<ref id="ref22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lechner</surname> <given-names>T.</given-names></name> <name><surname>Adlassnig</surname> <given-names>C.</given-names></name> <name><surname>Humpel</surname> <given-names>C.</given-names></name> <name><surname>Kaufmann</surname> <given-names>W. A.</given-names></name> <name><surname>Maier</surname> <given-names>H.</given-names></name> <name><surname>Reinstadler-Kramer</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2004</year>). <article-title>Chromogranin peptides in Alzheimer&#x2019;s disease</article-title>. <source>Exp. Gerontol.</source> <volume>39</volume>, <fpage>101</fpage>&#x2013;<lpage>113</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.exger.2003.09.018</pub-id>, PMID: <pub-id pub-id-type="pmid">14724070</pub-id></citation>
</ref>
<ref id="ref23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liachko</surname> <given-names>N. F.</given-names></name> <name><surname>Saxton</surname> <given-names>A. D.</given-names></name> <name><surname>McMillan</surname> <given-names>P. J.</given-names></name> <name><surname>Strovas</surname> <given-names>T. J.</given-names></name> <name><surname>Keene</surname> <given-names>C. D.</given-names></name> <name><surname>Bird</surname> <given-names>T. D.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Genome wide analysis reveals heparan sulfate epimerase modulates TDP-43 proteinopathy</article-title>. <source>PLoS Genet.</source> <volume>15</volume>:<fpage>e1008526</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pgen.1008526</pub-id>, PMID: <pub-id pub-id-type="pmid">31834878</pub-id></citation>
</ref>
<ref id="ref24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>S. L.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Jiang</surname> <given-names>T.</given-names></name> <name><surname>Tan</surname> <given-names>L.</given-names></name> <name><surname>Xing</surname> <given-names>A.</given-names></name> <name><surname>Yu</surname> <given-names>J. T.</given-names></name></person-group> (<year>2016</year>). <article-title>The role of Cdk5 in Alzheimer&#x2019;s disease</article-title>. <source>Mol. Neurobiol.</source> <volume>53</volume>, <fpage>4328</fpage>&#x2013;<lpage>4342</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s12035-015-9369-x</pub-id>, PMID: <pub-id pub-id-type="pmid">26227906</pub-id></citation>
</ref>
<ref id="ref25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lobo</surname> <given-names>J. M.</given-names></name> <name><surname>Jim&#x00E9;nez-Valverde</surname> <given-names>A.</given-names></name> <name><surname>Real</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>AUC: a misleading measure of the performance of predictive distribution models</article-title>. <source>Glob. Ecol. Biogeogr.</source> <volume>17</volume>, <fpage>145</fpage>&#x2013;<lpage>151</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1466-8238.2007.00358.x</pub-id></citation>
</ref>
<ref id="ref26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maccioni</surname> <given-names>R. B.</given-names></name> <name><surname>Otth</surname> <given-names>C.</given-names></name> <name><surname>Concha</surname> <given-names>I. I.</given-names></name> <name><surname>Mu&#x00F1;oz</surname> <given-names>J. P.</given-names></name></person-group> (<year>2001</year>). <article-title>The protein kinase cdk5: structural aspects, roles in neurogenesis and involvement in Alzheimer&#x2019;s pathology</article-title>. <source>Eur. J. Biochem.</source> <volume>268</volume>, <fpage>1518</fpage>&#x2013;<lpage>1527</lpage>. doi: <pub-id pub-id-type="doi">10.1046/j.1432-1327.2001.02024.x</pub-id>, PMID: <pub-id pub-id-type="pmid">11248668</pub-id></citation>
</ref>
<ref id="ref27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Machado Reyes</surname> <given-names>D.</given-names></name> <name><surname>Chao</surname> <given-names>H.</given-names></name> <name><surname>Hahn</surname> <given-names>J.</given-names></name> <name><surname>Shen</surname> <given-names>L.</given-names></name> <name><surname>Yan</surname> <given-names>P.</given-names></name></person-group> (<year>2024</year>). <article-title>Identifying progression-specific Alzheimer&#x2019;s subtypes using multimodal transformer</article-title>. <source>J. Pers. Med.</source> <volume>14</volume>:<fpage>421</fpage>. doi: <pub-id pub-id-type="doi">10.3390/jpm14040421</pub-id>, PMID: <pub-id pub-id-type="pmid">38673048</pub-id></citation>
</ref>
<ref id="ref28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maitra</surname> <given-names>S.</given-names></name> <name><surname>Vincent</surname> <given-names>B.</given-names></name></person-group> (<year>2022</year>). <article-title>Cdk5-p25 as a key element linking amyloid and tau pathologies in Alzheimer&#x2019;s disease: mechanisms and possible therapeutic interventions</article-title>. <source>Life Sci.</source> <volume>308</volume>:<fpage>120986</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.lfs.2022.120986</pub-id>, PMID: <pub-id pub-id-type="pmid">36152679</pub-id></citation>
</ref>
<ref id="ref29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mandrekar</surname> <given-names>J. N.</given-names></name></person-group> (<year>2010</year>). <article-title>Receiver operating characteristic curve in diagnostic test assessment</article-title>. <source>J. Thorac. Oncol.</source> <volume>5</volume>, <fpage>1315</fpage>&#x2013;<lpage>1316</lpage>. doi: <pub-id pub-id-type="doi">10.1097/JTO.0b013e3181ec173d</pub-id>, PMID: <pub-id pub-id-type="pmid">20736804</pub-id></citation>
</ref>
<ref id="ref30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mariani</surname> <given-names>T. J.</given-names></name> <name><surname>Budhraja</surname> <given-names>V.</given-names></name> <name><surname>Mecham</surname> <given-names>B. H.</given-names></name> <name><surname>Gu</surname> <given-names>C. C.</given-names></name> <name><surname>Watson</surname> <given-names>M. A.</given-names></name> <name><surname>Sadovsky</surname> <given-names>Y.</given-names></name></person-group> (<year>2003</year>). <article-title>A variable fold change threshold determines significance for expression microarrays</article-title>. <source>FASEB J.</source> <volume>17</volume>, <fpage>321</fpage>&#x2013;<lpage>323</lpage>. doi: <pub-id pub-id-type="doi">10.1096/fj.02-0351fje</pub-id>, PMID: <pub-id pub-id-type="pmid">12475896</pub-id></citation>
</ref>
<ref id="ref31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marksteiner</surname> <given-names>J.</given-names></name> <name><surname>Kaufmann</surname> <given-names>W. A.</given-names></name> <name><surname>Gurka</surname> <given-names>P.</given-names></name> <name><surname>Humpel</surname> <given-names>C.</given-names></name></person-group> (<year>2002</year>). <article-title>Synaptic proteins in Alzheimer&#x2019;s disease</article-title>. <source>J. Mol. Neurosci.</source> <volume>18</volume>, <fpage>53</fpage>&#x2013;<lpage>63</lpage>. doi: <pub-id pub-id-type="doi">10.1385/JMN:18:1-2:53</pub-id>, PMID: <pub-id pub-id-type="pmid">11931350</pub-id></citation>
</ref>
<ref id="ref32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Monaco</surname> <given-names>E.</given-names> <suffix>III</suffix></name></person-group> (<year>2005</year>). <article-title>Recent evidence regarding a role for Cdk5 dysregulation in Alzheimer&#x2019;s disease</article-title>. <source>Curr. Alzheimer Res.</source> <volume>1</volume>, <fpage>33</fpage>&#x2013;<lpage>38</lpage>. doi: <pub-id pub-id-type="doi">10.2174/1567205043480519</pub-id>, PMID: <pub-id pub-id-type="pmid">15975083</pub-id></citation>
</ref>
<ref id="ref33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nikhil</surname> <given-names>K.</given-names></name> <name><surname>Viccaro</surname> <given-names>K.</given-names></name> <name><surname>Shah</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>Multifaceted regulation of ALDH1A1 by Cdk5 in Alzheimer&#x2019;s disease pathogenesis</article-title>. <source>Mol. Neurobiol.</source> <volume>56</volume>, <fpage>1366</fpage>&#x2013;<lpage>1390</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s12035-018-1114-9</pub-id>, PMID: <pub-id pub-id-type="pmid">29948941</pub-id></citation>
</ref>
<ref id="ref34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ozsan McMillan</surname> <given-names>I.</given-names></name> <name><surname>Li</surname> <given-names>J.-P.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name></person-group> (<year>2023</year>). <article-title>Heparan sulfate proteoglycan in Alzheimer&#x2019;s disease: aberrant expression and functions in molecular pathways related to amyloid-&#x03B2; metabolism</article-title>. <source>Am. J. Physiol. Cell Physiol.</source> <volume>324</volume>, <fpage>C893</fpage>&#x2013;<lpage>C909</lpage>. doi: <pub-id pub-id-type="doi">10.1152/ajpcell.00247.2022</pub-id>, PMID: <pub-id pub-id-type="pmid">36878848</pub-id></citation>
</ref>
<ref id="ref35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pacholewska</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x2018;Loget&#x2019; - a uniform differential expression unit to replace &#x2018;logFC&#x2019; and &#x2018;log2FC&#x2019;</article-title>. <source>Matters</source>. doi: <pub-id pub-id-type="doi">10.19185/matters.201706000011</pub-id></citation>
</ref>
<ref id="ref36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pei</surname> <given-names>J. J.</given-names></name> <name><surname>Grundke-Iqbal</surname> <given-names>I.</given-names></name> <name><surname>Iqbal</surname> <given-names>K.</given-names></name> <name><surname>Bogdanovic</surname> <given-names>N.</given-names></name> <name><surname>Winblad</surname> <given-names>B.</given-names></name> <name><surname>Cowburn</surname> <given-names>R. F.</given-names></name></person-group> (<year>1998</year>). <article-title>Accumulation of cyclin-dependent kinase 5 (cdk5) in neurons with early stages of Alzheimer&#x2019;s disease neurofibrillary degeneration</article-title>. <source>Brain Res.</source> <volume>797</volume>, <fpage>267</fpage>&#x2013;<lpage>277</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0006-8993(98)00296-0</pub-id>, PMID: <pub-id pub-id-type="pmid">9666145</pub-id></citation>
</ref>
<ref id="ref37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pepe</surname> <given-names>M. S.</given-names></name></person-group> (<year>2000</year>). <article-title>Receiver operating characteristic methodology</article-title>. <source>J. Am. Stat. Assoc.</source> <volume>95</volume>, <fpage>308</fpage>&#x2013;<lpage>311</lpage>. doi: <pub-id pub-id-type="doi">10.1080/01621459.2000.10473930</pub-id></citation>
</ref>
<ref id="ref38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Perez-Lopez</surname> <given-names>N.</given-names></name> <name><surname>Mart&#x00ED;n</surname> <given-names>C.</given-names></name> <name><surname>Garc&#x00ED;a</surname> <given-names>B.</given-names></name> <name><surname>Sol&#x00ED;s-Hern&#x00E1;ndez</surname> <given-names>M. P.</given-names></name> <name><surname>Rodr&#x00ED;guez</surname> <given-names>D.</given-names></name> <name><surname>Alcalde</surname> <given-names>I.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Alterations in the expression of the genes responsible for the synthesis of heparan sulfate in brains with Alzheimer disease</article-title>. <source>J. Neuropathol. Exp. Neurol.</source> <volume>80</volume>, <fpage>446</fpage>&#x2013;<lpage>456</lpage>. doi: <pub-id pub-id-type="doi">10.1093/jnen/nlab028</pub-id>, PMID: <pub-id pub-id-type="pmid">33779723</pub-id></citation>
</ref>
<ref id="ref39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pfister</surname> <given-names>R.</given-names></name> <name><surname>Schwarz</surname> <given-names>K. A.</given-names></name> <name><surname>Janczyk</surname> <given-names>M.</given-names></name> <name><surname>Dale</surname> <given-names>R.</given-names></name> <name><surname>Freeman</surname> <given-names>J. B.</given-names></name></person-group> (<year>2013</year>). <article-title>Good things peak in pairs: a note on the bimodality coefficient</article-title>. <source>Front. Psychol.</source> <volume>4</volume>:<fpage>700</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2013.00700</pub-id>, PMID: <pub-id pub-id-type="pmid">24109465</pub-id></citation>
</ref>
<ref id="ref40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pillai</surname> <given-names>J. A.</given-names></name> <name><surname>Bonner-Jackson</surname> <given-names>A.</given-names></name> <name><surname>Bekris</surname> <given-names>L. M.</given-names></name> <name><surname>Safar</surname> <given-names>J.</given-names></name> <name><surname>Bena</surname> <given-names>J.</given-names></name> <name><surname>Leverenz</surname> <given-names>J. B.</given-names></name></person-group> (<year>2019</year>). <article-title>Highly elevated cerebrospinal fluid total tau level reflects higher likelihood of non-amnestic subtype of Alzheimer&#x2019;s disease</article-title>. <source>J. Alzheimers Dis.</source> <volume>70</volume>, <fpage>1051</fpage>&#x2013;<lpage>1058</lpage>. doi: <pub-id pub-id-type="doi">10.3233/JAD-190519</pub-id>, PMID: <pub-id pub-id-type="pmid">31306137</pub-id></citation>
</ref>
<ref id="ref41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quinn</surname> <given-names>J. P.</given-names></name> <name><surname>Kandigian</surname> <given-names>S. E.</given-names></name> <name><surname>Trombetta</surname> <given-names>B. A.</given-names></name> <name><surname>Arnold</surname> <given-names>S. E.</given-names></name> <name><surname>Carlyle</surname> <given-names>B. C.</given-names></name></person-group> (<year>2020</year>). <article-title>Characterizing chromogranin and secretogranin proteoforms in dementia pathophysiology</article-title>. <source>Alzheimers Dement.</source> <volume>16</volume>:<fpage>e044624</fpage>. doi: <pub-id pub-id-type="doi">10.1002/alz.044624</pub-id></citation>
</ref>
<ref id="ref42">
<citation citation-type="other"><person-group person-group-type="author">
<collab id="coll2">SAS Institute Inc</collab>
</person-group> (<year>1990</year>). SAS/STAT User&#x2019;s Guide (Version 6).</citation>
</ref>
<ref id="ref43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schultheis</surname> <given-names>N.</given-names></name> <name><surname>Jiang</surname> <given-names>M.</given-names></name> <name><surname>Selleck</surname> <given-names>S. B.</given-names></name></person-group> (<year>2021</year>). <article-title>Putting the brakes on autophagy: the role of heparan sulfate modified proteins in the balance of anabolic and catabolic pathways and intracellular quality control</article-title>. <source>Matrix Biol.</source> <volume>100-101</volume>, <fpage>173</fpage>&#x2013;<lpage>181</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.matbio.2021.01.006</pub-id></citation>
</ref>
<ref id="ref44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sepulveda-Diaz</surname> <given-names>J. E.</given-names></name> <name><surname>Alavi Naini</surname> <given-names>S. M.</given-names></name> <name><surname>Huynh</surname> <given-names>M. B.</given-names></name> <name><surname>Ouidja</surname> <given-names>M. O.</given-names></name> <name><surname>Yanicostas</surname> <given-names>C.</given-names></name> <name><surname>Chantepie</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>HS3ST2 expression is critical for the abnormal phosphorylation of tau in Alzheimer&#x2019;s disease-related tau pathology</article-title>. <source>Brain</source> <volume>138</volume>, <fpage>1339</fpage>&#x2013;<lpage>1354</lpage>. doi: <pub-id pub-id-type="doi">10.1093/brain/awv056</pub-id>, PMID: <pub-id pub-id-type="pmid">25842390</pub-id></citation>
</ref>
<ref id="ref45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shukla</surname> <given-names>V.</given-names></name> <name><surname>Skuntz</surname> <given-names>S.</given-names></name> <name><surname>Pant</surname> <given-names>H. C.</given-names></name></person-group> (<year>2012</year>). <article-title>Deregulated Cdk5 activity is involved in inducing Alzheimer&#x2019;s disease</article-title>. <source>Arch. Med. Res.</source> <volume>43</volume>, <fpage>655</fpage>&#x2013;<lpage>662</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.arcmed.2012.10.015</pub-id>, PMID: <pub-id pub-id-type="pmid">23142263</pub-id></citation>
</ref>
<ref id="ref46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>S&#x00F8;reide</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>Receiver-operating characteristic curve analysis in diagnostic, prognostic and predictive biomarker research</article-title>. <source>J. Clin. Pathol.</source> <volume>62</volume>, <fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi: <pub-id pub-id-type="doi">10.1136/jcp.2008.061010</pub-id></citation>
</ref>
<ref id="ref47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Swets</surname> <given-names>J. A.</given-names></name></person-group> (<year>1988</year>). <article-title>Measuring the accuracy of diagnostic systems</article-title>. <source>Science</source> <volume>240</volume>, <fpage>1285</fpage>&#x2013;<lpage>1293</lpage>. doi: <pub-id pub-id-type="doi">10.1126/science.3287615</pub-id></citation>
</ref>
<ref id="ref48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsai</surname> <given-names>L. H.</given-names></name> <name><surname>Lee</surname> <given-names>M. S.</given-names></name> <name><surname>Cruz</surname> <given-names>J.</given-names></name></person-group> (<year>2004</year>). <article-title>Cdk5, a therapeutic target for Alzheimer&#x2019;s disease?</article-title> <source>Biochim. Biophys. Acta Proteins Proteom.</source> <volume>1697</volume>, <fpage>137</fpage>&#x2013;<lpage>142</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.bbapap.2003.11.019</pub-id>, PMID: <pub-id pub-id-type="pmid">39148280</pub-id></citation>
</ref>
<ref id="ref49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Webster</surname> <given-names>J. A.</given-names></name> <name><surname>Gibbs</surname> <given-names>J. R.</given-names></name> <name><surname>Clarke</surname> <given-names>J.</given-names></name> <name><surname>Ray</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>W.</given-names></name> <name><surname>Holmans</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Genetic control of human brain transcript expression in Alzheimer disease</article-title>. <source>Am. J. Hum. Genet.</source> <volume>84</volume>, <fpage>445</fpage>&#x2013;<lpage>458</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ajhg.2009.03.011</pub-id>, PMID: <pub-id pub-id-type="pmid">19361613</pub-id></citation>
</ref>
<ref id="ref50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Willis</surname> <given-names>M.</given-names></name> <name><surname>Prokesch</surname> <given-names>M.</given-names></name> <name><surname>Hutter-Paier</surname> <given-names>B.</given-names></name> <name><surname>Windisch</surname> <given-names>M.</given-names></name> <name><surname>Stridsberg</surname> <given-names>M.</given-names></name> <name><surname>Mahata</surname> <given-names>S. K.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>Chromogranin B and secretogranin II in transgenic mice overexpressing human APP751 with the London (V717I) and Swedish (K670M/N671L) mutations and in Alzheimer patients</article-title>. <source>J. Alzheimers Dis.</source> <volume>13</volume>, <fpage>123</fpage>&#x2013;<lpage>135</lpage>. doi: <pub-id pub-id-type="doi">10.3233/JAD-2008-13202</pub-id>, PMID: <pub-id pub-id-type="pmid">18376054</pub-id></citation>
</ref>
<ref id="ref51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xia</surname> <given-names>J.</given-names></name> <name><surname>Broadhurst</surname> <given-names>D. I.</given-names></name> <name><surname>Wilson</surname> <given-names>M.</given-names></name> <name><surname>Wishart</surname> <given-names>D. S.</given-names></name></person-group> (<year>2013</year>). <article-title>Translational biomarker discovery in clinical metabolomics: an introductory tutorial</article-title>. <source>Metabolomics</source> <volume>9</volume>, <fpage>280</fpage>&#x2013;<lpage>299</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s11306-012-0482-9</pub-id>, PMID: <pub-id pub-id-type="pmid">23543913</pub-id></citation>
</ref>
<ref id="ref52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Youden</surname> <given-names>W. J.</given-names></name></person-group> (<year>1950</year>). <article-title>Index for rating diagnostic tests</article-title>. <source>Cancer</source> <volume>3</volume>, <fpage>32</fpage>&#x2013;<lpage>35</lpage>. doi: <pub-id pub-id-type="doi">10.1002/1097-0142(1950)3:1&#x003C;32::AID-CNCR2820030106&#x003E;3.0.CO;2-3</pub-id>, PMID: <pub-id pub-id-type="pmid">15405679</pub-id></citation>
</ref>
<ref id="ref53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zweig</surname> <given-names>M. H.</given-names></name> <name><surname>Campbell</surname> <given-names>G.</given-names></name></person-group> (<year>1993</year>). <article-title>Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine</article-title>. <source>Clin. Chem.</source> <volume>39</volume>, <fpage>561</fpage>&#x2013;<lpage>577</lpage>. doi: <pub-id pub-id-type="doi">10.1093/clinchem/39.4.561</pub-id>, PMID: <pub-id pub-id-type="pmid">8472349</pub-id></citation>
</ref>
</ref-list>
</back>
</article>