<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Mol. Biosci.</journal-id>
<journal-title>Frontiers in Molecular Biosciences</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Mol. Biosci.</abbrev-journal-title>
<issn pub-type="epub">2296-889X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmolb.2015.00004</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Molecular Biosciences</subject>
<subj-group>
<subject>Original Research Article</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A data preprocessing strategy for metabolomics to reduce the mask effect in data analysis</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Yang</surname> <given-names>Jun</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/180783"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhao</surname> <given-names>Xinjie</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Lu</surname> <given-names>Xin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/205709"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Lin</surname> <given-names>Xiaohui</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Xu</surname> <given-names>Guowang</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/142672"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences</institution> <country>Dalian, China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Entomology and Nematology, University of California, Davis</institution> <country>Davis, CA, USA</country></aff>
<aff id="aff3"><sup>3</sup><institution>School of Computer Science and Technology, Dalian University of Technology</institution> <country>Dalian, China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Manuel Portero-Otin, IRBLLEIDA-UdL, Spain</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Atsushi Fukushima, RIKEN, Japan; Hunter N. B. Moseley, University of Kentucky, USA</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Guowang Xu, Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China e-mail: <email>xugw&#x00040;dicp.ac.cn</email>;</p></fn>
<fn fn-type="corresp" id="fn002"><p>Jun Yang, Department of Entomology and Nematology, University of California, One Shields Ave, Davis, CA 95616, USA e-mail: <email>junyang&#x00040;ucdavis.edu</email></p></fn>
<fn fn-type="other" id="fn003"><p>This article was submitted to Metabolomics, a section of the journal Frontiers in Molecular Biosciences.</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>02</day>
<month>02</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>2</volume>
<elocation-id>4</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>10</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>09</day>
<month>01</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2015 Yang, Zhao, Lu, Lin and Xu.</copyright-statement>
<copyright-year>2015</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p><bold>Highlights</bold>
<list list-type="bullet">
<list-item><p>Developed a data preprocessing strategy to cope with missing values and with the mask effect that the high variation of abundant metabolites causes in data analysis.</p></list-item>
<list-item><p>A new method, &#x02018;x-VAST&#x02019;, was developed to correct the enlargement of measurement deviation caused by scaling.</p></list-item>
<list-item><p>Applying this strategy, several masked low-abundance differential metabolites were rescued.</p></list-item>
</list></p>
<p>Metabolomics is a booming research field. Its success relies heavily on the discovery of differential metabolites by comparing different data sets (for example, patients vs. controls). One challenge is that differences in low-abundance metabolites between groups are often masked by the high variation of abundant metabolites. To address this challenge, a novel data preprocessing strategy consisting of three steps is proposed in this study. In step 1, a &#x02018;modified 80%&#x02019; rule was used to reduce the effect of missing values; in step 2, unit-variance and Pareto scaling methods were used to reduce the mask effect from the abundant metabolites; in step 3, to correct the adverse effect of scaling, stability information of the variables, deduced from intensity information and class information, was used to assign suitable weights to the variables. When the strategy was applied to an LC/MS-based metabolomics dataset from a study of chronic hepatitis B patients and to two simulated datasets, the mask effect was partially eliminated and several new low-abundance differential metabolites were rescued.</p>
</abstract>
<kwd-group>
<kwd>metabolomics</kwd>
<kwd>data preprocessing</kwd>
<kwd>pattern recognition</kwd>
<kwd>biomarkers</kwd>
<kwd>differential metabolites</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="3"/>
<equation-count count="7"/>
<ref-count count="38"/>
<page-count count="9"/>
<word-count count="5573"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<title>Introduction</title>
<p>Metabolomics has been successfully applied in many fields including clinical research (Brindle et al., <xref ref-type="bibr" rid="B3">2002</xref>; Yang et al., <xref ref-type="bibr" rid="B35">2004</xref>, <xref ref-type="bibr" rid="B36">2005</xref>; Abate-Shen and Shen, <xref ref-type="bibr" rid="B1">2009</xref>; Sreekumar et al., <xref ref-type="bibr" rid="B23">2009</xref>), drug discovery (Kell and Goodacre, <xref ref-type="bibr" rid="B10">2014</xref>), toxicology (Keun, <xref ref-type="bibr" rid="B11">2006</xref>; van Ravenzwaay et al., <xref ref-type="bibr" rid="B27">2014</xref>), and phytochemistry (Fiehn, <xref ref-type="bibr" rid="B7">2002</xref>; Mari et al., <xref ref-type="bibr" rid="B16">2013</xref>). With the quantitative measure of the dynamic metabolic response of living systems to pathophysiological stimuli or genetic modification (Nicholson et al., <xref ref-type="bibr" rid="B17">2002</xref>), the disease process and mechanism could be investigated in a synthesis induction way (Kell, <xref ref-type="bibr" rid="B9">2004</xref>). Among the analytical technologies used in metabolomics, NMR (Pelczer, <xref ref-type="bibr" rid="B18">2005</xref>; Wang et al., <xref ref-type="bibr" rid="B31">2005</xref>; Pinto et al., <xref ref-type="bibr" rid="B20">2014</xref>; Powers, <xref ref-type="bibr" rid="B21">2014</xref>; Wagner et al., <xref ref-type="bibr" rid="B30">2014</xref>; Worley and Powers, <xref ref-type="bibr" rid="B34">2014</xref>), chromatography and their hyphenated techniques (Keun et al., <xref ref-type="bibr" rid="B12">2003</xref>; Bijlsma et al., <xref ref-type="bibr" rid="B2">2006</xref>; Craig et al., <xref ref-type="bibr" rid="B4">2006</xref>; Dai et al., <xref ref-type="bibr" rid="B5">2014</xref>; Peterson et al., <xref ref-type="bibr" rid="B19">2014</xref>; Wachsmuth et al., <xref ref-type="bibr" rid="B29">2014</xref>; Zhao et al., <xref ref-type="bibr" rid="B38">2014</xref>) were the most popular.</p>
<p>In general, after samples are analyzed using various instruments, the collected data need to be preprocessed, including data alignment (Koh et al., <xref ref-type="bibr" rid="B13">2010</xref>), normalization (Sysi-Aho et al., <xref ref-type="bibr" rid="B24">2007</xref>) or internal standard correction, missing value correction, scaling and transformation (van den Berg et al., <xref ref-type="bibr" rid="B26">2006</xref>; Enot et al., <xref ref-type="bibr" rid="B6">2008</xref>; Veselkov et al., <xref ref-type="bibr" rid="B28">2011</xref>; Want and Masson, <xref ref-type="bibr" rid="B32">2011</xref>; Hrydziuszko and Viant, <xref ref-type="bibr" rid="B8">2012</xref>; Kohl et al., <xref ref-type="bibr" rid="B14">2012</xref>), before various chemometrics methods are applied (Trygg et al., <xref ref-type="bibr" rid="B25">2007</xref>). A general strategy of data (pre-) processing and validation for human metabolomics studies was given by Bijlsma et al. (<xref ref-type="bibr" rid="B2">2006</xref>). However, they did not describe how the data preprocessing method affects the results or which data preprocessing methods should be selected for a given study.</p>
<p>Craig et al. (<xref ref-type="bibr" rid="B4">2006</xref>) investigated the scaling and normalization effects in detail; two traditional scaling methods [mean centering and unit variance (Uv)] were compared using NMR data sets. It was concluded that mean centering (Ctr) could result in a parsimonious model, and that Uv favored systematic changes with small variance while confounding the potentially useful information embedded in peak height and peak multiplicities. In other words, Uv may diminish the mask effect of the abundant metabolites, which is a common problem in the proteomics and metabolomics fields. Unfortunately, Uv also significantly magnifies the measurement deviations, because these are often higher at low concentrations, and this confounds the results.</p>
<p>To eliminate the adverse effects of Uv mentioned above, several methods were developed. Keun et al. (<xref ref-type="bibr" rid="B12">2003</xref>) proposed a strategy for incorporating prior information into the scaling procedure called variable stability (VAST) scaling, in which each variable is assigned a weight according to its stability. Another method is orthogonal signal correction (OSC) (Wold et al., <xref ref-type="bibr" rid="B33">1998</xref>). The OSC can extract the components with the maximum variance orthogonal to Y. This orthogonal model effectively filters obscuring variation in the data set. However, how many components should be retained appropriately becomes another challenge in the OSC procedure. Van den Berg et al. compared several different centering, scaling and transformations in a GC/MS data set and concluded that &#x0201C;the choice for a pretreatment method depends on the biological question to be answered&#x0201D; (van den Berg et al., <xref ref-type="bibr" rid="B26">2006</xref>).</p>
<p>In the current study, we have developed a novel data preprocessing strategy to cope with missing values and to eliminate the mask effect that the high variation of abundant metabolites causes in data analysis. It consists of three steps: missing value correction, scaling and x-VAST. In the missing value correction step, a &#x02018;modified 80% rule&#x02019; was proposed to cope with missing values. In the scaling step, Pareto scaling (User&#x00027;s Guide to SIMCA-P, 2005) was chosen to reduce the effect of the metabolite magnitude (i.e., to eliminate the mask effect) without amplifying the measurement deviation too much. Finally, a new method called &#x02018;x-VAST&#x02019; was developed, which uses the VAST information and the class information to correct the enlargement of measurement deviation. Contour plots, which give an intuitive view, were employed to illustrate the effect of each step. To test the developed data preprocessing strategy, it was applied to the dataset from a metabolomics study of chronic hepatitis B patients, and several masked differential metabolites were rescued. In addition, two simulated datasets were used to test whether the proposed strategy could be generalized. The results indicated that the developed preprocessing strategy could improve the analysis of multivariate metabolomics datasets by removing missing values and reducing the mask effect.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and methods</title>
<sec>
<title>Plasma samples and high performance liquid chromatography-mass spectrometry (HPLC-MS) analysis</title>
<p>Thirty-seven chronic hepatitis B patients hospitalized for acute deterioration in liver function and 50 healthy individuals were enrolled in this study. The detailed sample information and HPLC-MS analysis procedure were described in another paper (Yang et al., <xref ref-type="bibr" rid="B37">2006</xref>). After peak alignment, 7347 ions were generated in the final reference peak list. The data set was an 87 &#x000D7; 7347 matrix. After preprocessing by missing value correction, scaling and x-VAST, partial least squares discriminant analysis (PLS-DA) was used to discover the differential metabolites.</p>
</sec>
<sec>
<title>Missing value correction</title>
<p>The data sets from metabolic profiling analysis usually contain many zeros. These are considered missing values, which arise as artificial cutoffs from the peak alignment. Missing values can affect the correlation between variables, which would degrade the performance of multivariate analysis.</p>
<p>To reduce the number of zeros present, Smilde et al. applied a procedure referred to as the &#x02018;80% rule&#x02019; (Smilde et al., <xref ref-type="bibr" rid="B22">2005</xref>): a variable is kept if it has a non-zero value in at least 80% of all samples. One shortcoming is that some perfect differential metabolites may be lost under the &#x02018;80% rule&#x02019; when their concentrations are below the detection limit in one specific class. In this work, the class information was used for supervision, and the &#x02018;80% rule&#x02019; was modified to &#x02018;a variable is kept if it has a non-zero value in at least 80% of the samples of any one class&#x02019;. In this paper, this new rule is called the &#x02018;modified 80% rule&#x02019;.</p>
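<p>The two rules can be illustrated with a minimal Python sketch. It is not the authors&#x00027; MATLAB implementation; the function names, the toy matrix and the class labels are hypothetical, and zeros are assumed to encode missing values as described above.</p>
<preformat>
```python
import numpy as np

def rule_80(X, threshold=0.8):
    """Classic '80% rule': keep a variable (column) if it is non-zero
    in at least `threshold` of all samples (rows)."""
    nonzero_ratio = (X != 0).mean(axis=0)
    return nonzero_ratio >= threshold

def modified_rule_80(X, labels, threshold=0.8):
    """'Modified 80% rule': keep a variable if it is non-zero in at
    least `threshold` of the samples of any one class."""
    labels = np.asarray(labels)
    keep = np.zeros(X.shape[1], dtype=bool)
    for cls in np.unique(labels):
        ratio = (X[labels == cls] != 0).mean(axis=0)
        keep |= ratio >= threshold
    return keep

# Hypothetical toy data: 10 samples (5 per class), 3 variables.
X = np.zeros((10, 3))
X[0, 0] = 1.0    # variable 0: detected in a single sample (type 1)
X[5:, 1] = 2.0   # variable 1: detected only in class "B" (type 2)
X[:, 2] = 3.0    # variable 2: detected in every sample (type 3)
labels = ["A"] * 5 + ["B"] * 5

keep_classic = rule_80(X)
keep_modified = modified_rule_80(X, labels)
```
</preformat>
<p>In this toy example the class-specific variable is discarded by the 80% rule, because it is non-zero in only half of all samples, but it is retained by the modified rule.</p>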
</sec>
<sec>
<title>Scaling methods</title>
<p>In the scaling section, Ctr, Uv, Pareto and logarithm (<italic>ln</italic>) transformation were compared for their ability to diminish the mask effect and to find the differential metabolites more efficiently. To avoid confusion, we adopt the definitions given in the SIMCA-P manual (User&#x00027;s Guide to SIMCA-P, 2005).</p>
<p><italic>Mean centering (Ctr):</italic>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mrow><mml:msub><mml:msup><mml:mi>x</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></disp-formula></p>
<p>Where <italic>x</italic>&#x02032;<sub><italic>ik</italic></sub> is the value after scaling, <italic>x</italic><sub><italic>ik</italic></sub> is the original value; <overline><italic>x</italic></overline><sub><italic>k</italic></sub> is the mean of the variable <italic>k</italic>.</p>
<p><italic>Uv:</italic>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mrow><mml:msub><mml:msup><mml:mi>x</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula></p>
<p>Where <italic>s</italic><sub><italic>k</italic></sub> is the standard deviation of the variable <italic>k</italic>.</p>
<p><italic>Pareto:</italic>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mrow><mml:msub><mml:msup><mml:mi>x</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<italic>ln transformation:</italic>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mrow><mml:msub><mml:msup><mml:mi>x</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mtext>ln&#x02009;</mml:mtext><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></disp-formula></p>
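<p>Equations (1)&#x02013;(4) can be written as the following Python sketch, given only as an illustration. The function names are hypothetical, the sample standard deviation (divisor N &#x02212; 1) is an assumption because the divisor is not stated here, and the <italic>ln</italic> transformation assumes strictly positive values.</p>
<preformat>
```python
import numpy as np

def ctr(X):
    """Eq. (1): mean centering of each variable (column)."""
    return X - X.mean(axis=0)

def uv(X):
    """Eq. (2): divide each variable by its standard deviation.
    ddof=1 (sample standard deviation) is an assumption."""
    return X / X.std(axis=0, ddof=1)

def pareto(X):
    """Eq. (3): divide each variable by the square root of its
    standard deviation."""
    return X / np.sqrt(X.std(axis=0, ddof=1))

def ln_transform(X):
    """Eq. (4): natural logarithm; assumes all values are positive."""
    return np.log(X)

# Hypothetical toy data: one abundant and one low-abundance variable.
X = np.array([[1000.0, 1.0],
              [2000.0, 2.0],
              [3000.0, 3.0]])

sd_raw = X.std(axis=0, ddof=1)             # spread before scaling
sd_uv = uv(X).std(axis=0, ddof=1)          # equal spread after Uv
sd_pareto = pareto(X).std(axis=0, ddof=1)  # square root of sd_raw
```
</preformat>
<p>After Uv scaling both variables have the same spread, whereas Pareto scaling only compresses the difference in spread to its square root, so part of the magnitude information is kept.</p>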
<p>Here, we propose a new supervised scaling method based on the VAST method, referred to as &#x02018;x-VAST&#x02019;. The VAST and supervised VAST methods (Keun et al., <xref ref-type="bibr" rid="B12">2003</xref>) are employed for comparison.</p>
<p><italic>x-VAST:</italic>
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mrow><mml:msub><mml:msup><mml:mi>x</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mtext>max</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mrow><mml:mn>1</mml:mn><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mrow><mml:mn>2</mml:mn><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mrow><mml:mn>3</mml:mn><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mn>3</mml:mn><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>&#x02026;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>&#x02026;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover 
accent='true'><mml:mi>x</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02022;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></disp-formula></p>
<p>Here, <overline><italic>x</italic></overline><sub><italic>jk</italic></sub> and <italic>s</italic><sub><italic>jk</italic></sub> are the mean and standard deviation of the variable <italic>k</italic> for the <italic>j</italic>th class, respectively, and <italic>n</italic> is the total number of classes.</p>
<p><italic>VAST:</italic>
<disp-formula id="E6"><label>(6)</label><mml:math id="M6"><mml:mrow><mml:msub><mml:msup><mml:mi>x</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mo>&#x02022;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></disp-formula>
<italic>Supervised VAST (s-VAST):</italic>
<disp-formula id="E7"><label>(7)</label><mml:math id="M7"><mml:mrow><mml:msub><mml:msup><mml:mi>x</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02022;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></disp-formula></p>
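<p>The three weighting schemes in Equations (5)&#x02013;(7) differ only in how the stability ratio (mean divided by standard deviation) is computed. The Python sketch below is a hedged illustration and not the authors&#x00027; MATLAB scripts: the function names and toy data are hypothetical, the sample standard deviation is an assumption, and every variable is assumed to have a non-zero standard deviation within each class.</p>
<preformat>
```python
import numpy as np

def class_stability(X, labels):
    """Per-class stability (class mean / class standard deviation).
    Returns an array of shape (n_classes, n_variables)."""
    labels = np.asarray(labels)
    return np.array([
        X[labels == c].mean(axis=0) / X[labels == c].std(axis=0, ddof=1)
        for c in np.unique(labels)
    ])

def vast(X):
    """Eq. (6): weight by the overall stability of each variable."""
    return (X.mean(axis=0) / X.std(axis=0, ddof=1)) * X

def s_vast(X, labels):
    """Eq. (7): weight by the mean of the per-class stabilities."""
    return class_stability(X, labels).mean(axis=0) * X

def x_vast(X, labels):
    """Eq. (5): weight by the maximum of the per-class stabilities."""
    return class_stability(X, labels).max(axis=0) * X

# Hypothetical variable: stable in class "A" (mean 10, sd 1) and
# relatively unstable at low level in class "B" (mean 2, sd 1).
X = np.array([[9.0], [10.0], [11.0], [1.0], [2.0], [3.0]])
labels = ["A", "A", "A", "B", "B", "B"]

w_vast = (vast(X) / X)[0, 0]
w_s_vast = (s_vast(X, labels) / X)[0, 0]
w_x_vast = (x_vast(X, labels) / X)[0, 0]
```
</preformat>
<p>For this toy variable the x-VAST weight equals the stability of the more stable class (10), the s-VAST weight is the average of the two class stabilities (6), and the VAST weight is smallest because the between-class difference inflates the overall standard deviation.</p>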
<p>The preprocessing methods mentioned above were all implemented in in-house scripts written in MATLAB (Mathworks, Natick, MA).</p>
</sec>
<sec>
<title>Contour plot and PLS-DA</title>
<p>The contour plot was employed to visualize the data. In the plot, the x-coordinate corresponds to the variables and the y-coordinate corresponds to the samples. The plot gives a straightforward view of the differences among the effects of the data preprocessing methods.</p>
<p>To compare the final classification results and find the differential metabolites, PLS-DA in SIMCA-P software (Umetrics, Sweden) was employed.</p>
</sec>
<sec>
<title>Validation with simulated dataset</title>
<p>To test whether the proposed method is generic, two datasets [one with 140 variables, the other with 1400 variables; both with two classes of samples (<italic>n</italic> &#x0003D; 20 in each class)] were generated for validation.</p>
<p>The smaller dataset (140 variables) includes 50 high-abundance random variables (HNM variables), 50 low-abundance random variables (LNM variables), 10 high-abundance, big-change variables with a 10-fold difference on average (HGM variables), 10 high-abundance, medium-change variables with a three-fold difference on average (HMM variables), 10 low-abundance, big-change variables with a 10-fold difference on average (LGM variables), and 10 low-abundance, medium-change variables with a three-fold difference on average (LMM variables). The bigger dataset has a similar setup but 10 times more variables. The detailed code for generating the simulated datasets is included in the Supplementary File. In brief, a normal-distribution random number function was used to generate each group of variables with different abundances and variations, as shown in the code.</p>
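<p>A dataset with this layout could be generated along the following lines. This Python sketch is not the code in the Supplementary File: the abundance levels (1000 and 10), the coefficient of variation (0.2), the random seed and all names are assumptions chosen only to reproduce the group sizes and fold changes described above.</p>
<preformat>
```python
import numpy as np

# (group name, number of variables, base abundance, fold change in class 2)
# The abundance values 1000 and 10 are assumed, not taken from the paper.
GROUPS = [
    ("HNM", 50, 1000.0, 1.0),
    ("LNM", 50, 10.0, 1.0),
    ("HGM", 10, 1000.0, 10.0),
    ("HMM", 10, 1000.0, 3.0),
    ("LGM", 10, 10.0, 10.0),
    ("LMM", 10, 10.0, 3.0),
]

def simulate(n_per_class=20, scale=1, cv=0.2, seed=0):
    """Draw normally distributed intensities for two classes.
    `scale` multiplies the number of variables in every group;
    `cv` (assumed) is the coefficient of variation of each variable."""
    rng = np.random.default_rng(seed)
    blocks, names = [], []
    for name, count, base, fold in GROUPS:
        k = count * scale
        class1 = rng.normal(base, cv * base, size=(n_per_class, k))
        class2 = rng.normal(base * fold, cv * base * fold,
                            size=(n_per_class, k))
        blocks.append(np.vstack([class1, class2]))
        names += [name] * k
    X = np.hstack(blocks)                 # samples x variables
    y = np.repeat([0, 1], n_per_class)    # class labels
    return X, y, names

X_small, y_small, names_small = simulate()        # 40 x 140
X_big, y_big, names_big = simulate(scale=10)      # 40 x 1400
```
</preformat>
<p>With these assumed settings the HGM and LGM blocks differ about 10-fold between the classes on average, while the HNM and LNM blocks differ only by random variation.</p>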
</sec>
</sec>
<sec>
<title>Results and discussion</title>
<sec>
<title>Missing value correction</title>
<p>As mentioned above, the &#x02018;80% rule&#x02019; is often followed when missing values are present in the data set. Figures <xref ref-type="fig" rid="F1">1A,B</xref> show the contour plots of the raw data and of the data corrected according to the &#x02018;80% rule&#x02019;. After correction, the number of variables was reduced dramatically: most were deleted and only 169 were retained. As shown in the following section, some useful differential metabolites were also deleted in this step. As an example, Figure <xref ref-type="fig" rid="F2">2</xref> shows the non-zero ratio of the first 15 variables of the raw data in each class of samples (control and hepatitis). From the figure, the variables can be divided into three types:</p>
<list list-type="order">
<list-item><p>Type 1: variables whose values are zero in most of the samples of each class, such as var_1, var_2, and var_3. These variables have very low concentrations that the present method can&#x00027;t measure correctly, and they should be deleted.</p></list-item>
<list-item><p>Type 2: variables whose values are zero in most of the samples of one or several classes but non-zero in most of the samples of at least one remaining class, such as var_5. These variables are perfect biomarkers, which can accurately differentiate the groups. Variables of this type should be retained instead of being deleted.</p></list-item>
<list-item><p>Type 3: variables whose values are non-zero in most of the samples of each class, such as var_8, var_11, var_12, and var_14. Variables of this type can be measured and should be retained.</p></list-item>
</list>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Two-dimensional contour plots based on (A) the raw data, (B) the data excluding missing values according to the 80% rule, and (C) the data excluding missing values according to the modified 80% rule.</bold> The horizontal coordinate corresponds to the variable number, the vertical coordinate corresponds to the sample number, and the color corresponds to the responses of the variables. For convenience, the variables in the original data were named var &#x0002B;&#x0201C;_&#x0201D;&#x0002B; number, such as var_1; the variables in <bold>Panel C</bold> (i.e., the raw data corrected by the modified 80% rule) were named VAR &#x0002B;&#x0201C;_&#x0201D;&#x0002B; number, such as VAR_1.</p></caption>
<graphic xlink:href="fmolb-02-00004-g0001.tif"/>
</fig>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Non-zero ratios in the control and hepatitis groups of the first 15 ions</bold>.</p></caption>
<graphic xlink:href="fmolb-02-00004-g0002.tif"/>
</fig>
<p>In the current study, a &#x02018;modified 80% rule&#x02019; is suggested: variables with non-zero values in at least 80% of the samples of any class should be retained. According to this rule, the type 2 variables defined above are rescued. Figure <xref ref-type="fig" rid="F1">1C</xref> gives the contour plot of the data processed according to the &#x02018;modified 80% rule&#x02019;. Compared with the 80% rule, many type 2 variables were rescued (170 out of 339 are new). As an example, VAR_165 is present according to the &#x02018;modified 80% rule&#x02019; but absent according to the &#x02018;80% rule&#x02019; (see arrow position in Figure <xref ref-type="fig" rid="F1">1C</xref>). A significant difference is found when a <italic>t</italic>-test is applied to this variable. It can be concluded that the &#x02018;modified 80% rule&#x02019; retains more differential metabolites (about twice as many).</p>
</sec>
<sec>
<title>Mask effects and various scaling methods</title>
<p>When the average responses of the 7347 ions were compared, the dynamic range (minimum to maximum ratio) of these ions was 3.22 &#x000D7; 10<sup>&#x02212;5</sup>. As a result, variables with high responses would be given larger weights, and their variations would dominate the result if no scaling method were employed. The minor peaks would be masked by the major ones or by noise, although their biological meaning may be important.</p>
<p>The mask effect could be eliminated, or at least partly reduced, if the variables were divided by their standard deviations, i.e., scaled according to Uv. Each new variable would then have identical weight because of its identical (unit) variance. The height information is discarded and only the deviation information is retained. Uv would therefore seem an ideal scaling method for eliminating the mask effect, perfectly suited to differential metabolite discovery in metabolomics, if all variables could be measured accurately and the measurement deviation could be ignored. Unfortunately, this is not always true, especially when the metabolite responses are near the detection limit. The measurement deviation accounts for the major part of the deviation information when the peaks are just above the detection limit. In other words, Uv scaling magnifies the measurement variations of the low-abundance metabolites. In this situation, the peak response still gives some information about how likely it is that the measurement deviation needs to be considered; that is, the peak information should be retained to some extent.</p>
<p>Pareto scaling and <italic>ln</italic> transformation could satisfy this requirement. Figure <xref ref-type="fig" rid="F3">3</xref> shows the contour plots of the data scaled by Pareto or transformed by <italic>ln</italic>. Compared with the raw data without scaling (Figure <xref ref-type="fig" rid="F1">1C</xref>), too little response information was retained after the <italic>ln</italic> transformation to discover the differential metabolites (Figure <xref ref-type="fig" rid="F3">3B</xref>), whereas Pareto scaling seems a good compromise between diminishing the mask effect and avoiding magnification of the measurement deviation of low-concentration metabolites (Figure <xref ref-type="fig" rid="F3">3A</xref>).</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Comparison of the different preprocessing methods. (A)</bold> preprocessed with Pareto; <bold>(B)</bold> preprocessed with <italic>ln</italic> transformation. X-axis is variable number and y-axis is sample number.</p></caption>
<graphic xlink:href="fmolb-02-00004-g0003.tif"/>
</fig>
</sec>
<sec>
<title>x-VAST</title>
<p>To solve the dilemma mentioned above, many algorithms were developed. Keun et al. (<xref ref-type="bibr" rid="B12">2003</xref>) thought the VAST will improve the analysis of any multivariate dataset where group differences were significantly obscured by other variation. Here, x-VAST was developed to amend the adverse effect mentioned above after scaling. As comparison, the VAST and s-VAST were also employed to utilize the VAST to adjust the variables&#x00027; weights. In general, the variables, which variation was mainly from measurement deviation or from the individual variation, have lower stability (smaller <overline><italic>x</italic></overline>/<italic>s</italic> value). It could be expected that the combination of the VAST and scaling methods mentioned above could diminish the mask effects with fewer side effect.</p>
<p>A comparison of the VAST scaling methods is shown in Figure <xref ref-type="fig" rid="F4">4</xref>. It can be seen that (i) the noise was eliminated and the stability of the variables was enhanced after scaling by any of the VAST methods; (ii) the variables (e.g., VAR_60 and VAR_106, red arrows) that have distinctly different values in the two classes received larger weights after x-VAST scaling, whereas the difference in these variables was not revealed by VAST or s-VAST. As confirmed by the PLS-DA results that follow, these two variables contributed prominently to the classification.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Comparison of the different VAST scaling methods. (A)</bold> VAST; <bold>(B)</bold> s-VAST; <bold>(C)</bold> x-VAST. Red arrows indicate the variables specifically enhanced by the x-VAST method. The x-axis is the variable number and the y-axis is the sample number.</p></caption>
<graphic xlink:href="fmolb-02-00004-g0004.tif"/>
</fig>
<p>It can be concluded that variables with stable values in one class but unstable values near the detection limit in the other class are assigned smaller weights by VAST and s-VAST. In fact, these variables are the most useful biomarkers and should be assigned the largest weights, as is the case in x-VAST.</p>
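<p>The contrast between the two weightings can be illustrated with a short Python sketch. It assumes, based only on the description in this article, that the VAST weight is the stability (mean divided by standard deviation) over all samples and that the x-VAST weight is the higher of the two class-wise stabilities; the function names and the sample values are hypothetical, and s-VAST is not covered.</p>
<preformat><![CDATA[
```python
import statistics


def stability(values):
    """Stability of a variable: mean divided by standard deviation."""
    return statistics.mean(values) / statistics.stdev(values)


def vast_weight(class_a, class_b):
    """VAST-style weight: stability over all samples, ignoring class labels."""
    return stability(class_a + class_b)


def x_vast_weight(class_a, class_b):
    """x-VAST-style weight (assumed form): the higher class-wise stability."""
    return max(stability(class_a), stability(class_b))


# Hypothetical variable: stable in one class, near the detection limit in the other.
stable_class = [100.0, 102.0, 98.0, 101.0, 99.0]
noisy_class = [1.0, 6.0, 2.0, 8.0, 3.0]

print("VAST weight:  ", round(vast_weight(stable_class, noisy_class), 2))
print("x-VAST weight:", round(x_vast_weight(stable_class, noisy_class), 2))
```
]]></preformat>
<p>With these assumed values the pooled stability is close to 1, whereas the class-wise maximum is about 63, so only the second weighting keeps such a variable prominent.</p>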
</sec>
<sec>
<title>PLS-DA analyses</title>
<p>PLS-DA was employed as another way to assess the data preprocessing strategy described above. The data scaled by each of the 11 scaling methods were fed to PLS-DA. The results are given in the Supplementary Material (Table <xref ref-type="supplementary-material" rid="SM1">S1</xref>, Figure <xref ref-type="supplementary-material" rid="SM1">S1</xref>). Here, only the score and loading plots for Pareto-Ctr and Pareto-x-VAST-Ctr are shown in Figure <xref ref-type="fig" rid="F5">5</xref>. After x-VAST scaling, a group of variables recognized as highly important metabolites (e.g., VAR_267, VAR_248, VAR_297, VAR_36, VAR_40) became more important, while other variables (e.g., VAR_333) became less important.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>PLS-DA results for data scaled by Pareto-Ctr and Pareto-x-VAST-Ctr. (A)</bold> Pareto-Ctr; <bold>(B)</bold> Pareto-x-VAST-Ctr. Left, score plot. &#x025A1; hepatitis, &#x025A0; control. Right, loading plot.</p></caption>
<graphic xlink:href="fmolb-02-00004-g0005.tif"/>
</fig>
<p>The new and old differential metabolites (the first 20 differential metabolites in each case) are compared in Table <xref ref-type="table" rid="T1">1</xref>. Five new differential metabolites (var_359, var_369, var_3703, var_3705, var_3866) were identified in place of five old ones (var_644, var_686, var_741, var_4178, var_6461). Of the removed differential metabolites, four (var_644, var_686, var_741, var_4178) were found to have too many missing values. The last one, var_6461, which corresponds to VAR_333 in Figure <xref ref-type="fig" rid="F1">1C</xref>, failed the <italic>t</italic>-test.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Using the developed data preprocessing strategy, several differential metabolites were rediscovered</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" colspan="4"><bold>Before preprocessing</bold></th>
<th align="center" colspan="4"><bold>After preprocessing</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><bold>var_ID</bold></td>
<td align="center"><bold>retention time (min)</bold></td>
<td align="center"><bold>m/z</bold></td>
<td align="left"><bold>Identification result</bold></td>
<td align="left"><bold>var_ID</bold></td>
<td align="center"><bold>retention time (min)</bold></td>
<td align="center"><bold>m/z</bold></td>
<td align="left"><bold>Identification result</bold></td>
</tr>
<tr>
<td align="left">var_5229</td>
<td align="center">17.22</td>
<td align="center">524.5</td>
<td align="left">LPC C18:0</td>
<td align="left">var_4177</td>
<td align="center">15.33</td>
<td align="center">496.5</td>
<td align="left">LPC C16:0</td>
</tr>
<tr>
<td align="left">var_4177</td>
<td align="center">15.33</td>
<td align="center">496.5</td>
<td align="left">LPC C16:0</td>
<td align="left">var_3850</td>
<td align="center">14.75</td>
<td align="center">520.5</td>
<td align="left">LPC C18:2</td>
</tr>
<tr>
<td align="left">var_4167</td>
<td align="center">15.25</td>
<td align="center">478.2</td>
<td align="left">LPC C16:0 Fragment</td>
<td align="left">var_5229</td>
<td align="center">17.22</td>
<td align="center">524.5</td>
<td align="left">LPC C18:0</td>
</tr>
<tr>
<td align="left">var_5226</td>
<td align="center">17.21</td>
<td align="center">506.6</td>
<td align="left">LPC C18:0 Fragment</td>
<td align="left">var_3849</td>
<td align="center">14.75</td>
<td align="center">502.5</td>
<td align="left">LPC C18:2 fragment</td>
</tr>
<tr>
<td align="left">var_3850</td>
<td align="center">14.75</td>
<td align="center">520.5</td>
<td align="left">LPC C18:2</td>
<td align="left">var_4167</td>
<td align="center">15.25</td>
<td align="center">478.2</td>
<td align="left">LPC C16:0 fragment</td>
</tr>
<tr>
<td align="left"><bold>var_686 <xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></bold></td>
<td align="center">7.78</td>
<td align="center">235.2</td>
<td align="left">UN<xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></td>
<td align="left">var_4417</td>
<td align="center">15.84</td>
<td align="center">504.4</td>
<td align="left">LPC C18:1 fragment</td>
</tr>
<tr>
<td align="left"><bold>var_644 <xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></bold></td>
<td align="center">7.55</td>
<td align="center">235.2</td>
<td align="left">UN<xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></td>
<td align="left">var_6169</td>
<td align="center">19.81</td>
<td align="center">282.4</td>
<td align="left">UN</td>
</tr>
<tr>
<td align="left">var_4422</td>
<td align="center">15.85</td>
<td align="center">522.4</td>
<td align="left">LPC C18:1</td>
<td align="left">var_5226</td>
<td align="center">17.21</td>
<td align="center">506.6</td>
<td align="left">LPC C18:0 fragment</td>
</tr>
<tr>
<td align="left">var_3849</td>
<td align="center">14.75</td>
<td align="center">502.5</td>
<td align="left">LPC C18:2 Fragment</td>
<td align="left">var_4422</td>
<td align="center">15.85</td>
<td align="center">522.4</td>
<td align="left">LPC C18:1</td>
</tr>
<tr>
<td align="left">var_2266</td>
<td align="center">12.34</td>
<td align="center">414.2</td>
<td align="left">GCDCA or GDCA Fragment</td>
<td align="left">var_6014</td>
<td align="center">19.49</td>
<td align="center">256.4</td>
<td align="left">UN</td>
</tr>
<tr>
<td align="left">var_6014</td>
<td align="center">19.49</td>
<td align="center">256.4</td>
<td align="left">UN<xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></td>
<td align="left">var_4022</td>
<td align="center">15.05</td>
<td align="center">478.4</td>
<td align="left">LPC C16:0 fragment</td>
</tr>
<tr>
<td align="left">var_6169</td>
<td align="center">19.81</td>
<td align="center">282.4</td>
<td align="left">UN<xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></td>
<td align="left"><bold>var_369 <xref ref-type="table-fn" rid="TN1b"><sup>b</sup></xref></bold></td>
<td align="center">5.87</td>
<td align="center">188.2</td>
<td align="left">Trp fragment</td>
</tr>
<tr>
<td align="left"><bold>var_6461</bold></td>
<td align="center">21.06</td>
<td align="center">284.3</td>
<td align="left">UN<xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></td>
<td align="left"><bold>var_3705 <xref ref-type="table-fn" rid="TN1b"><sup>b</sup></xref></bold></td>
<td align="center">14.52</td>
<td align="center">520.3</td>
<td align="left">LPC C18:2</td>
</tr>
<tr>
<td align="left">var_4417</td>
<td align="center">15.84</td>
<td align="center">504.4</td>
<td align="left">LPC C18:1 Fragment</td>
<td align="left">var_4021</td>
<td align="center">15.05</td>
<td align="center">184.2</td>
<td align="left">Phosphatidylcholine moiety of LPC C16:0</td>
</tr>
<tr>
<td align="left">var_4022</td>
<td align="center">15.05</td>
<td align="center">478.4</td>
<td align="left">LPC C16:0 fragment</td>
<td align="left">var_2266</td>
<td align="center">12.34</td>
<td align="center">414.2</td>
<td align="left">GCDCA or GDCA Fragment</td>
</tr>
<tr>
<td align="left">var_4024</td>
<td align="center">15.05</td>
<td align="center">496.1</td>
<td align="left">LPC C16:0</td>
<td align="left">var_4024</td>
<td align="center">15.05</td>
<td align="center">496.1</td>
<td align="left">LPC C16:0</td>
</tr>
<tr>
<td align="left">var_4021</td>
<td align="center">15.05</td>
<td align="center">184.2</td>
<td align="left">Phosphatidylcholine moiety of LPC C16:0</td>
<td align="left"><bold>var_359 <xref ref-type="table-fn" rid="TN1b"><sup>b</sup></xref></bold></td>
<td align="center">5.86</td>
<td align="center">146.1</td>
<td align="left">Trp fragment</td>
</tr>
<tr>
<td align="left">var_5104</td>
<td align="center">16.89</td>
<td align="center">524.4</td>
<td align="left">LPC C18:0</td>
<td align="left">var_5104</td>
<td align="center">16.89</td>
<td align="center">524.4</td>
<td align="left">LPC C18:0</td>
</tr>
<tr>
<td align="left"><bold>var_4178 <xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></bold></td>
<td align="center">15.34</td>
<td align="center">479.3</td>
<td align="left">Isotope of 478.4</td>
<td align="left"><bold>var_3866 <xref ref-type="table-fn" rid="TN1b"><sup>b</sup></xref></bold></td>
<td align="center">14.8</td>
<td align="center">544.3</td>
<td align="left">LPC C18:3</td>
</tr>
<tr>
<td align="left"><bold>var_741 <xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></bold></td>
<td align="center">7.99</td>
<td align="center">235.3</td>
<td align="left">UN<xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></td>
<td align="left"><bold>var_3703 <xref ref-type="table-fn" rid="TN1b"><sup>b</sup></xref></bold></td>
<td align="center">14.52</td>
<td align="center">502.3</td>
<td align="left">LPC C18:2 fragment</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>This table compares the differential metabolites defined by PLS-DA before and after applying the developed preprocessing strategy</italic>.</p>
<p><italic>Variables in bold are the markers that differ before and after the preprocessing strategy was applied</italic>.</p>
<fn id="TN1a"><label>a</label><p><italic>Differential metabolites removed after preprocessing</italic>.</p></fn>
<fn id="TN1b"><label>b</label><p><italic>Differential metabolites newly found after preprocessing</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
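<p>The turnover between the two lists in Table 1 can be checked with a simple set comparison. The Python sketch below transcribes the variable IDs from the table; the list names and the helper var_number are introduced here for illustration only.</p>
<preformat><![CDATA[
```python
# Variable IDs transcribed from the "before" and "after" columns of Table 1.
before = [
    "var_5229", "var_4177", "var_4167", "var_5226", "var_3850",
    "var_686", "var_644", "var_4422", "var_3849", "var_2266",
    "var_6014", "var_6169", "var_6461", "var_4417", "var_4022",
    "var_4024", "var_4021", "var_5104", "var_4178", "var_741",
]
after = [
    "var_4177", "var_3850", "var_5229", "var_3849", "var_4167",
    "var_4417", "var_6169", "var_5226", "var_4422", "var_6014",
    "var_4022", "var_369", "var_3705", "var_4021", "var_2266",
    "var_4024", "var_359", "var_5104", "var_3866", "var_3703",
]


def var_number(var_id):
    """Numeric part of an ID such as 'var_4177', used only for sorting."""
    return int(var_id.split("_")[1])


# Markers present in only one of the two top-20 lists.
removed = sorted(set(before) - set(after), key=var_number)
added = sorted(set(after) - set(before), key=var_number)

print("removed:", removed)
print("added:  ", added)
```
]]></preformat>
<p>Both differences contain five variables, which agrees with the five removed and five newly found differential metabolites reported above.</p>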
<p>Among the newly found differential metabolites, two (var_369 and var_359) were tryptophan fragments, as confirmed by an authentic standard run under the same conditions. Tryptophan is an essential amino acid and a constituent of proteins. It is also a substrate for two important biosynthetic pathways: the tryptophan 5-hydroxylase pathway, which generates the neurotransmitter 5-hydroxytryptamine (serotonin), and the pathway that forms kynurenine derivatives and nicotinamide adenine dinucleotides. In addition, tryptophan catabolites have been reported as prognostic biomarkers for the severity of chronic liver diseases in potential transplant recipients (Lahdou et al., <xref ref-type="bibr" rid="B15">2011</xref>).</p>
<p>The other three (var_3703, var_3705, var_3866) were identified as lysophosphatidylcholines (LPCs). LPCs regulate many biological processes, including cell proliferation, inflammation and tumor cell invasiveness. LPCs promote inflammation by inducing the expression of endothelial cell adhesion molecules and growth factors, promoting monocyte chemotaxis, and activating macrophages.</p>
</sec>
<sec>
<title>Validation of x-VAST with simulated datasets</title>
<p>To validate the proposed method, two simulated datasets were generated as described in the methods section. The datasets were fed to SIMCA-P for the subsequent multivariate data analyses. The VIP (Yang et al., <xref ref-type="bibr" rid="B37">2006</xref>) order was chosen to reflect how the variables ranked as potential markers. Tables <xref ref-type="table" rid="T2">2</xref>, <xref ref-type="table" rid="T3">3</xref> compare the markers identified by PLS-DA using the original datasets and the datasets treated with VAST and x-VAST.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p><bold>Rank of markers by PLS-DA using the small simulated dataset (140 variables) without preprocessing and after preprocessing by VAST and x-VAST</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"><bold>Variable groups</bold></th>
<th align="center"><bold>Rank 1&#x02013;10</bold></th>
<th align="center"><bold>Rank 11&#x02013;20</bold></th>
<th align="center"><bold>Rank 21&#x02013;30</bold></th>
<th align="center"><bold>Rank 31&#x02013;40</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">HGM (var101-110)</td>
<td align="center">7</td>
<td align="center">2</td>
<td align="center">1</td>
<td align="center">0</td>
</tr>
<tr>
<td align="left">HMM (var111-120)</td>
<td align="center">3</td>
<td align="center">6</td>
<td align="center">1</td>
<td align="center">0</td>
</tr>
<tr>
<td align="left">LGM (var121-130)</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">7</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">LMM (var131-140)</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">1</td>
<td align="center">4</td>
</tr>
<tr>
<td align="left" colspan="5"><bold>PREPROCESSED BY VAST</bold></td>
</tr>
<tr>
<td align="left">HGM (var101-110)</td>
<td align="center">7</td>
<td align="center">2</td>
<td align="center">1</td>
<td align="center">0</td>
</tr>
<tr>
<td align="left">HMM (var111-120)</td>
<td align="center">3</td>
<td align="center">5</td>
<td align="center">0</td>
<td align="center">0</td>
</tr>
<tr>
<td align="left">LGM (var121-130)</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">1</td>
<td align="center">4</td>
</tr>
<tr>
<td align="left">LMM (var131-140)</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">4</td>
</tr>
<tr>
<td align="left" colspan="5"><bold>PREPROCESSED BY x-VAST</bold></td>
</tr>
<tr>
<td align="left">HGM (var101-110)</td>
<td align="center">7</td>
<td align="center">2</td>
<td align="center">1</td>
<td align="center">0</td>
</tr>
<tr>
<td align="left">HMM (var111-120)</td>
<td align="center">3</td>
<td align="center">6</td>
<td align="center">1</td>
<td align="center">0</td>
</tr>
<tr>
<td align="left">LGM (var121-130)</td>
<td align="center">0</td>
<td align="center">2</td>
<td align="center">6</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">LMM (var131-140)</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">2</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p><bold>Rank of markers by PLS-DA using the big simulated dataset (1400 variables) without preprocessing and after preprocessing by VAST and x-VAST</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"><bold>Variable groups</bold></th>
<th align="center"><bold>Rank 1&#x02013;100</bold></th>
<th align="center"><bold>Rank 101&#x02013;200</bold></th>
<th align="center"><bold>Rank 201&#x02013;300</bold></th>
<th align="center"><bold>Rank 301&#x02013;400</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">HGM (var1001-1100)</td>
<td align="center">67</td>
<td align="center">30</td>
<td align="center">1</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">HMM (var1101-1200)</td>
<td align="center">22</td>
<td align="center">33</td>
<td align="center">33</td>
<td align="center">7</td>
</tr>
<tr>
<td align="left">LGM (var1201-1300)</td>
<td align="center">11</td>
<td align="center">37</td>
<td align="center">39</td>
<td align="center">9</td>
</tr>
<tr>
<td align="left">LMM (var1301-1400)</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">27</td>
<td align="center">54</td>
</tr>
<tr>
<td align="left" colspan="5"><bold>PREPROCESSED BY VAST</bold></td>
</tr>
<tr>
<td align="left">HGM (var1001-1100)</td>
<td align="center">61</td>
<td align="center">28</td>
<td align="center">1</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">HMM (var1101-1200)</td>
<td align="center">18</td>
<td align="center">33</td>
<td align="center">33</td>
<td align="center">7</td>
</tr>
<tr>
<td align="left">LGM (var1201-1300)</td>
<td align="center">9</td>
<td align="center">38</td>
<td align="center">38</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">LMM (var1301-1400)</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">25</td>
<td align="center">48</td>
</tr>
<tr>
<td align="left" colspan="5"><bold>PREPROCESSED BY x-VAST</bold></td>
</tr>
<tr>
<td align="left">HGM (var1001-1100)</td>
<td align="center">62</td>
<td align="center">33</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">HMM (var1101-1200)</td>
<td align="center">18</td>
<td align="center">33</td>
<td align="center">34</td>
<td align="center">10</td>
</tr>
<tr>
<td align="left">LGM (var1201-1300)</td>
<td align="center">7</td>
<td align="center">33</td>
<td align="center">44</td>
<td align="center">9</td>
</tr>
<tr>
<td align="left">LMM (var1301-1400)</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">18</td>
<td align="center">53</td>
</tr>
</tbody>
</table>
</table-wrap>
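<p>Tallies of the kind reported in Tables 2, 3 can be produced by counting, for each variable group, how many of its members fall into each rank bin of the VIP order. The following Python sketch shows one way to do this. The group ranges follow Table 2, but the function name and the VIP order are hypothetical and do not reproduce the SIMCA-P output.</p>
<preformat><![CDATA[
```python
def tally_by_rank_bin(ranked_vars, groups, bin_size, n_bins):
    """Count how many variables of each group fall into each rank bin.

    ranked_vars: variable numbers ordered from highest to lowest VIP.
    groups: mapping of group name to the range of its variable numbers.
    """
    counts = {name: [0] * n_bins for name in groups}
    for rank, var in enumerate(ranked_vars):
        bin_index = rank // bin_size
        if bin_index >= n_bins:
            break  # ranks beyond the last bin are not tabulated
        for name, members in groups.items():
            if var in members:
                counts[name][bin_index] += 1
    return counts


# Group ranges as listed in Table 2 (small simulated dataset).
groups = {
    "HGM": range(101, 111),
    "HMM": range(111, 121),
    "LGM": range(121, 131),
    "LMM": range(131, 141),
}

# Hypothetical VIP order of the top 20 variables, for illustration only.
ranked = [101, 102, 103, 104, 105, 106, 107, 111, 112, 113,
          108, 109, 121, 114, 115, 116, 117, 118, 119, 110]

for name, row in tally_by_rank_bin(ranked, groups, bin_size=10, n_bins=2).items():
    print(name, row)
```
]]></preformat>
<p>Each printed row corresponds to one table row restricted to the first two rank bins; using a bin size of 100 and the ranges of Table 3 would give the layout of the big dataset.</p>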
<p>The concept behind VAST and x-VAST is to raise the rank of stable (high-abundance, low-variation) variables and to lower the rank of unstable (low-abundance, high-variation) variables. Thus, HMM variables, which have high abundance and lower relative variation, move toward the beginning of the VIP list, and LMM variables, which have low abundance and higher relative variation, move toward the end. In both tables, the LMM variables did move toward lower ranks when preprocessed by VAST and x-VAST.</p>
<p>Comparing VAST and x-VAST, more markers were kept by x-VAST. For example, in Table <xref ref-type="table" rid="T2">2</xref>, more markers were identified in the HMM group. Figure <xref ref-type="fig" rid="F6">6</xref> shows an example of a newly identified biomarker (var114). The responses of var114 are clearly of low abundance in one class. VAST preprocessing did not identify this variable as a biomarker because of the larger variation across the two classes. In contrast, x-VAST preprocessing picked up this difference and identified the biomarker. The scenario of var114 is just like what we saw in the real metabolomics dataset described above.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>The response of var114 in the small simulated dataset.</bold> It clearly shows that this variable is a good marker for differentiating the two groups.</p></caption>
<graphic xlink:href="fmolb-02-00004-g0006.tif"/>
</fig>
<p>The biggest difference between VAST and x-VAST was found for variables in the LGM group, which have low abundance and a larger difference between the two classes. As Tables <xref ref-type="table" rid="T2">2</xref>, <xref ref-type="table" rid="T3">3</xref> show, VAST removed many markers because of the low stability (average/variation) of these variables, in spite of the large difference between the two classes. In contrast, x-VAST used the higher stability (average/variation) calculated within one class as the weight for each variable. As a result, more variables in this group were restored to the biomarker list.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s3">
<title>Conclusions</title>
<p>Data preprocessing is a critical step in information mining in metabolomics studies; it directly influences the discovery of differential biomarkers. In this work, the missing values and the relationship between the mask effect and scaling methods were studied. An optimal strategy including a &#x02018;modified 80% rule&#x02019;, Pareto scaling and x-VAST was suggested. When the suggested strategy was applied to a dataset on acute deterioration of liver function in chronic hepatitis B, several new differential metabolites masked by noise or other big peaks were rediscovered. Furthermore, two simulated datasets were used to test the proposed method. It was shown that some masked markers were rescued by x-VAST. In the future, we will test the strategy in a separate study to assess how useful it is in a general metabolomics study. Although we used an HPLC-MS dataset as the test dataset, the strategy could also be applied to other metabolomics research and to other omics datasets from different analytical platforms.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</sec>
</body>
<back>
<ack>
<p>The study was supported by the National Natural Science Foundation of China (No. 21375011) and the State Key Science and Technology Project for Infectious Diseases (2012ZX10002-011).</p>
</ack>
<sec sec-type="supplementary-material" id="s4">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="http://www.frontiersin.org/journal/10.3389/fmolb.2015.00004/abstract">http://www.frontiersin.org/journal/10.3389/fmolb.2015.00004/abstract</ext-link></p>
<supplementary-material xlink:href="Table1.DOCX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table2.XLS" mimetype="application/vnd.ms-excel" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table3.XLS" mimetype="application/vnd.ms-excel" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image1.PDF" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abate-Shen</surname> <given-names>C.</given-names></name> <name><surname>Shen</surname> <given-names>M. M.</given-names></name></person-group> (<year>2009</year>). <article-title>Diagnostics: the prostate-cancer metabolome</article-title>. <source>Nature</source> <volume>457</volume>, <fpage>799</fpage>&#x02013;<lpage>800</lpage>. <pub-id pub-id-type="doi">10.1038/457799a</pub-id><pub-id pub-id-type="pmid">19212391</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bijlsma</surname> <given-names>S.</given-names></name> <name><surname>Bobeldijk</surname> <given-names>I.</given-names></name> <name><surname>Verheij</surname> <given-names>E. R.</given-names></name> <name><surname>Ramaker</surname> <given-names>R.</given-names></name> <name><surname>Kochhar</surname> <given-names>S.</given-names></name> <name><surname>Macdonald</surname> <given-names>I. A.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Large-scale human metabolomics studies: a strategy for data (pre-) processing and validation</article-title>. <source>Anal. Chem</source>. <volume>78</volume>, <fpage>567</fpage>&#x02013;<lpage>574</lpage>. <pub-id pub-id-type="doi">10.1021/ac051495j</pub-id><pub-id pub-id-type="pmid">16408941</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brindle</surname> <given-names>J. T.</given-names></name> <name><surname>Antti</surname> <given-names>H.</given-names></name> <name><surname>Holmes</surname> <given-names>E.</given-names></name> <name><surname>Tranter</surname> <given-names>G.</given-names></name> <name><surname>Nicholson</surname> <given-names>J. K.</given-names></name> <name><surname>Bethell</surname> <given-names>H. W.</given-names></name> <etal/></person-group>. (<year>2002</year>). <article-title>Rapid and noninvasive diagnosis of the presence and severity of coronary heart disease using 1H-NMR-based metabonomics</article-title>. <source>Nat. Med</source>. <volume>8</volume>, <fpage>1439</fpage>&#x02013;<lpage>1444</lpage>. <pub-id pub-id-type="doi">10.1038/nm1202-802</pub-id><pub-id pub-id-type="pmid">12447357</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Craig</surname> <given-names>A.</given-names></name> <name><surname>Cloarec</surname> <given-names>O.</given-names></name> <name><surname>Holmes</surname> <given-names>E.</given-names></name> <name><surname>Nicholson</surname> <given-names>J. K.</given-names></name> <name><surname>Lindon</surname> <given-names>J. C.</given-names></name></person-group> (<year>2006</year>). <article-title>Scaling and normalization effects in NMR spectroscopic metabonomic data sets</article-title>. <source>Anal. Chem</source>. <volume>78</volume>, <fpage>2262</fpage>&#x02013;<lpage>2267</lpage>. <pub-id pub-id-type="doi">10.1021/ac0519312</pub-id><pub-id pub-id-type="pmid">16579606</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>W.</given-names></name> <name><surname>Yin</surname> <given-names>P.</given-names></name> <name><surname>Zeng</surname> <given-names>Z.</given-names></name> <name><surname>Kong</surname> <given-names>H.</given-names></name> <name><surname>Tong</surname> <given-names>H.</given-names></name> <name><surname>Xu</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Nontargeted modification-specific metabolomics study based on liquid chromatography-high-resolution mass spectrometry</article-title>. <source>Anal. Chem</source>. <volume>86</volume>, <fpage>9146</fpage>&#x02013;<lpage>9153</lpage>. <pub-id pub-id-type="doi">10.1021/ac502045j</pub-id><pub-id pub-id-type="pmid">25186149</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Enot</surname> <given-names>D. P.</given-names></name> <name><surname>Lin</surname> <given-names>W.</given-names></name> <name><surname>Beckmann</surname> <given-names>M.</given-names></name> <name><surname>Parker</surname> <given-names>D.</given-names></name> <name><surname>Overy</surname> <given-names>D. P.</given-names></name> <name><surname>Draper</surname> <given-names>J.</given-names></name></person-group> (<year>2008</year>). <article-title>Preprocessing, classification modeling and feature selection using flow injection electrospray mass spectrometry metabolite fingerprint data</article-title>. <source>Nat. Protoc</source>. <volume>3</volume>, <fpage>446</fpage>&#x02013;<lpage>470</lpage>. <pub-id pub-id-type="doi">10.1038/nprot.2007.511</pub-id><pub-id pub-id-type="pmid">18323816</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fiehn</surname> <given-names>O.</given-names></name></person-group> (<year>2002</year>). <article-title>Metabolomics&#x02013;the link between genotypes and phenotypes</article-title>. <source>Plant Mol. Biol</source>. <volume>48</volume>, <fpage>155</fpage>&#x02013;<lpage>171</lpage>. <pub-id pub-id-type="doi">10.1023/A:1013713905833</pub-id><pub-id pub-id-type="pmid">11860207</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hrydziuszko</surname> <given-names>O.</given-names></name> <name><surname>Viant</surname> <given-names>M. R.</given-names></name></person-group> (<year>2012</year>). <article-title>Missing values in mass spectrometry based metabolomics: an undervalued step in the data processing pipeline</article-title>. <source>Metabolomics</source> <volume>8</volume>, <fpage>S161</fpage>&#x02013;<lpage>S174</lpage>. <pub-id pub-id-type="doi">10.1007/s11306-011-0366-4</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kell</surname> <given-names>D. B.</given-names></name></person-group> (<year>2004</year>). <article-title>Metabolomics and systems biology: making sense of the soup</article-title>. <source>Curr. Opin. Microbiol</source>. <volume>7</volume>, <fpage>296</fpage>&#x02013;<lpage>307</lpage>. <pub-id pub-id-type="doi">10.1016/j.mib.2004.04.012</pub-id><pub-id pub-id-type="pmid">15196499</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kell</surname> <given-names>D. B.</given-names></name> <name><surname>Goodacre</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Metabolomics and systems pharmacology: why and how to model the human metabolic network for drug discovery</article-title>. <source>Drug Discov. Today</source> <volume>19</volume>, <fpage>171</fpage>&#x02013;<lpage>182</lpage>. <pub-id pub-id-type="doi">10.1016/j.drudis.2013.07.014</pub-id><pub-id pub-id-type="pmid">23892182</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Keun</surname> <given-names>H. C.</given-names></name></person-group> (<year>2006</year>). <article-title>Metabonomic modeling of drug toxicity</article-title>. <source>Pharmacol. Ther</source>. <volume>109</volume>, <fpage>92</fpage>&#x02013;<lpage>106</lpage>. <pub-id pub-id-type="doi">10.1016/j.pharmthera.2005.06.008</pub-id><pub-id pub-id-type="pmid">16051371</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Keun</surname> <given-names>H. C.</given-names></name> <name><surname>Ebbels</surname> <given-names>T. M. D.</given-names></name> <name><surname>Antti</surname> <given-names>H.</given-names></name> <name><surname>Bollard</surname> <given-names>M. E.</given-names></name> <name><surname>Beckonert</surname> <given-names>O.</given-names></name> <name><surname>Holmes</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>Improved analysis of multivariate data by variable stability scaling: application to NMR-based metabolic profiling</article-title>. <source>Anal. Chim. Acta</source> <volume>490</volume>, <fpage>265</fpage>&#x02013;<lpage>276</lpage>. <pub-id pub-id-type="doi">10.1016/S0003-2670(03)00094-1</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koh</surname> <given-names>Y.</given-names></name> <name><surname>Pasikanti</surname> <given-names>K. K.</given-names></name> <name><surname>Yap</surname> <given-names>C. W.</given-names></name> <name><surname>Chan</surname> <given-names>E. C.</given-names></name></person-group> (<year>2010</year>). <article-title>Comparative evaluation of software for retention time alignment of gas chromatography/time-of-flight mass spectrometry-based metabonomic data</article-title>. <source>J. Chromatogr. A</source> <volume>1217</volume>, <fpage>8308</fpage>&#x02013;<lpage>8316</lpage>. <pub-id pub-id-type="doi">10.1016/j.chroma.2010.10.101</pub-id><pub-id pub-id-type="pmid">21081237</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kohl</surname> <given-names>S. M.</given-names></name> <name><surname>Klein</surname> <given-names>M. S.</given-names></name> <name><surname>Hochrein</surname> <given-names>J.</given-names></name> <name><surname>Oefner</surname> <given-names>P. J.</given-names></name> <name><surname>Spang</surname> <given-names>R.</given-names></name> <name><surname>Gronwald</surname> <given-names>W.</given-names></name></person-group> (<year>2012</year>). <article-title>State-of-the art data normalization methods improve NMR-based metabolomic analysis</article-title>. <source>Metabolomics</source> <volume>8</volume>, <fpage>S146</fpage>&#x02013;<lpage>S160</lpage>. <pub-id pub-id-type="doi">10.1007/s11306-011-0350-z</pub-id><pub-id pub-id-type="pmid">22593726</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lahdou</surname> <given-names>I. H.</given-names></name> <name><surname>Oweira</surname> <given-names>M.</given-names></name> <name><surname>Sadeghi</surname> <given-names>V.</given-names></name> <name><surname>Daniel</surname> <given-names>G.</given-names></name> <name><surname>Fusch</surname> <given-names>J. C.</given-names></name> <name><surname>Schefold</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Tryptophan catabolites as prognostic biomarkers for the severity of chronic liver diseases in potential transplant recipients</article-title>. <source>Transplant Int</source>. <volume>24</volume>, <fpage>264</fpage>&#x02013;<lpage>264</lpage>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mari</surname> <given-names>A.</given-names></name> <name><surname>Lyon</surname> <given-names>D.</given-names></name> <name><surname>Fragner</surname> <given-names>L.</given-names></name> <name><surname>Montoro</surname> <given-names>P.</given-names></name> <name><surname>Piacente</surname> <given-names>S.</given-names></name> <name><surname>Wienkoop</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Phytochemical composition of <italic>Potentilla anserina</italic> L. analyzed by an integrative GC-MS and LC-MS metabolomics platform</article-title>. <source>Metabolomics</source> <volume>9</volume>, <fpage>599</fpage>&#x02013;<lpage>607</lpage>. <pub-id pub-id-type="doi">10.1007/s11306-012-0473-x</pub-id><pub-id pub-id-type="pmid">23678344</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nicholson</surname> <given-names>J. K.</given-names></name> <name><surname>Connelly</surname> <given-names>J.</given-names></name> <name><surname>Lindon</surname> <given-names>J. C.</given-names></name> <name><surname>Holmes</surname> <given-names>E.</given-names></name></person-group> (<year>2002</year>). <article-title>Metabonomics: a platform for studying drug toxicity and gene function</article-title>. <source>Nat. Rev. Drug Discov</source>. <volume>1</volume>, <fpage>153</fpage>&#x02013;<lpage>161</lpage>. <pub-id pub-id-type="doi">10.1038/nrd728</pub-id><pub-id pub-id-type="pmid">12120097</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pelczer</surname> <given-names>I.</given-names></name></person-group> (<year>2005</year>). <article-title>High-resolution NMR for metabomics</article-title>. <source>Curr. Opin. Drug Discov. Devel</source>. <volume>8</volume>, <fpage>127</fpage>&#x02013;<lpage>133</lpage>. <pub-id pub-id-type="pmid">15679180</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peterson</surname> <given-names>A. C.</given-names></name> <name><surname>Balloon</surname> <given-names>A. J.</given-names></name> <name><surname>Westphall</surname> <given-names>M. S.</given-names></name> <name><surname>Coon</surname> <given-names>J. J.</given-names></name></person-group> (<year>2014</year>). <article-title>Development of a GC/Quadrupole-Orbitrap mass spectrometer, part II: new approaches for discovery metabolomics</article-title>. <source>Anal. Chem</source>. <volume>86</volume>, <fpage>10044</fpage>&#x02013;<lpage>10051</lpage>. <pub-id pub-id-type="doi">10.1021/ac5014755</pub-id><pub-id pub-id-type="pmid">25166283</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pinto</surname> <given-names>J.</given-names></name> <name><surname>Domingues</surname> <given-names>M. R.</given-names></name> <name><surname>Galhano</surname> <given-names>E.</given-names></name> <name><surname>Pita</surname> <given-names>C.</given-names></name> <name><surname>Almeida</surname> <given-names>M. do C.</given-names></name> <name><surname>Carreira</surname> <given-names>I. M.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Human plasma stability during handling and storage: impact on NMR metabolomics</article-title>. <source>Analyst</source> <volume>139</volume>, <fpage>1168</fpage>&#x02013;<lpage>1177</lpage>. <pub-id pub-id-type="doi">10.1039/c3an02188b</pub-id><pub-id pub-id-type="pmid">24443722</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Powers</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>The current state of drug discovery and a potential role for NMR metabolomics</article-title>. <source>J. Med. Chem</source>. <volume>57</volume>, <fpage>5860</fpage>&#x02013;<lpage>5870</lpage>. <pub-id pub-id-type="doi">10.1021/jm401803b</pub-id><pub-id pub-id-type="pmid">24588729</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smilde</surname> <given-names>A. K.</given-names></name> <name><surname>van der Werf</surname> <given-names>M. J.</given-names></name> <name><surname>Bijlsma</surname> <given-names>S.</given-names></name> <name><surname>van der Werff-van der Vat</surname> <given-names>B. J.</given-names></name> <name><surname>Jellema</surname> <given-names>R. H.</given-names></name></person-group> (<year>2005</year>). <article-title>Fusion of mass spectrometry-based metabolomics data</article-title>. <source>Anal. Chem</source>. <volume>77</volume>, <fpage>6729</fpage>&#x02013;<lpage>6736</lpage>. <pub-id pub-id-type="doi">10.1021/ac051080y</pub-id><pub-id pub-id-type="pmid">16223263</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sreekumar</surname> <given-names>A.</given-names></name> <name><surname>Poisson</surname> <given-names>L. M.</given-names></name> <name><surname>Rajendiran</surname> <given-names>T. M.</given-names></name> <name><surname>Khan</surname> <given-names>A. P.</given-names></name> <name><surname>Cao</surname> <given-names>Q.</given-names></name> <name><surname>Yu</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression</article-title>. <source>Nature</source> <volume>457</volume>, <fpage>910</fpage>&#x02013;<lpage>914</lpage>. <pub-id pub-id-type="doi">10.1038/nature07762</pub-id><pub-id pub-id-type="pmid">19212411</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sysi-Aho</surname> <given-names>M.</given-names></name> <name><surname>Katajamaa</surname> <given-names>M.</given-names></name> <name><surname>Yetukuri</surname> <given-names>L.</given-names></name> <name><surname>Oresic</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <article-title>Normalization method for metabolomics data using optimal selection of multiple internal standards</article-title>. <source>BMC Bioinformatics</source> <volume>8</volume>:<fpage>93</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-8-93</pub-id><pub-id pub-id-type="pmid">17362505</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trygg</surname> <given-names>J.</given-names></name> <name><surname>Holmes</surname> <given-names>E.</given-names></name> <name><surname>Lundstedt</surname> <given-names>T.</given-names></name></person-group> (<year>2007</year>). <article-title>Chemometrics in metabonomics</article-title>. <source>J. Proteome Res</source>. <volume>6</volume>, <fpage>469</fpage>&#x02013;<lpage>479</lpage>. <pub-id pub-id-type="doi">10.1021/pr060594q</pub-id><pub-id pub-id-type="pmid">17269704</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van den Berg</surname> <given-names>R. A.</given-names></name> <name><surname>Hoefsloot</surname> <given-names>H. C. J.</given-names></name> <name><surname>Westerhuis</surname> <given-names>J. A.</given-names></name> <name><surname>Smilde</surname> <given-names>A. K.</given-names></name> <name><surname>van der Werf</surname> <given-names>M. J.</given-names></name></person-group> (<year>2006</year>). <article-title>Centering, scaling, and transformations: improving the biological information content of metabolomics data</article-title>. <source>BMC Genomics</source> <volume>7</volume>:<fpage>142</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2164-7-142</pub-id><pub-id pub-id-type="pmid">16762068</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Ravenzwaay</surname> <given-names>B.</given-names></name> <name><surname>Montoya</surname> <given-names>G. A.</given-names></name> <name><surname>Fabian</surname> <given-names>E.</given-names></name> <name><surname>Herold</surname> <given-names>M.</given-names></name> <name><surname>Krennrich</surname> <given-names>G.</given-names></name> <name><surname>Looser</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>The sensitivity of metabolomics versus classical regulatory toxicology from a NOAEL perspective</article-title>. <source>Toxicol. Lett</source>. <volume>227</volume>, <fpage>20</fpage>&#x02013;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1016/j.toxlet.2014.03.004</pub-id><pub-id pub-id-type="pmid">24657160</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Veselkov</surname> <given-names>K. A.</given-names></name> <name><surname>Vingara</surname> <given-names>L. K.</given-names></name> <name><surname>Masson</surname> <given-names>P.</given-names></name> <name><surname>Robinette</surname> <given-names>S. L.</given-names></name> <name><surname>Want</surname> <given-names>E.</given-names></name> <name><surname>Li</surname> <given-names>J. V.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Optimized preprocessing of ultra-performance liquid chromatography/mass spectrometry urinary metabolic profiles for improved information recovery</article-title>. <source>Anal. Chem</source>. <volume>83</volume>, <fpage>5864</fpage>&#x02013;<lpage>5872</lpage>. <pub-id pub-id-type="doi">10.1021/ac201065j</pub-id><pub-id pub-id-type="pmid">21526840</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wachsmuth</surname> <given-names>C. J.</given-names></name> <name><surname>Dettmer</surname> <given-names>K.</given-names></name> <name><surname>Lang</surname> <given-names>S. A.</given-names></name> <name><surname>Mycielska</surname> <given-names>M. E.</given-names></name> <name><surname>Oefner</surname> <given-names>P. J.</given-names></name></person-group> (<year>2014</year>). <article-title>Continuous water infusion enhances atmospheric pressure chemical ionization of methyl chloroformate derivatives in gas chromatography coupled to time-of-flight mass spectrometry-based metabolomics</article-title>. <source>Anal. Chem</source>. <volume>86</volume>, <fpage>9186</fpage>&#x02013;<lpage>9195</lpage>. <pub-id pub-id-type="doi">10.1021/ac502133r</pub-id><pub-id pub-id-type="pmid">25152309</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wagner</surname> <given-names>L.</given-names></name> <name><surname>Trattner</surname> <given-names>S.</given-names></name> <name><surname>Pickova</surname> <given-names>J.</given-names></name> <name><surname>Gomez-Requeni</surname> <given-names>P.</given-names></name> <name><surname>Moazzami</surname> <given-names>A. A.</given-names></name></person-group> (<year>2014</year>). <article-title><sup>1</sup>H NMR-based metabolomics studies on the effect of sesamin in Atlantic salmon (<italic>Salmo salar</italic>)</article-title>. <source>Food Chem</source>. <volume>147</volume>, <fpage>98</fpage>&#x02013;<lpage>105</lpage>. <pub-id pub-id-type="doi">10.1016/j.foodchem.2013.09.128</pub-id><pub-id pub-id-type="pmid">24206691</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Tang</surname> <given-names>H.</given-names></name> <name><surname>Holmes</surname> <given-names>E.</given-names></name> <name><surname>Lindon</surname> <given-names>J. C.</given-names></name> <name><surname>Turini</surname> <given-names>M. E.</given-names></name> <name><surname>Sprenger</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2005</year>). <article-title>Biochemical characterization of rat intestine development using high-resolution magic-angle-spinning <sup>1</sup>H NMR spectroscopy and multivariate data analysis</article-title>. <source>J. Proteome Res</source>. <volume>4</volume>, <fpage>1324</fpage>&#x02013;<lpage>1329</lpage>. <pub-id pub-id-type="doi">10.1021/pr050032r</pub-id><pub-id pub-id-type="pmid">16083283</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Want</surname> <given-names>E.</given-names></name> <name><surname>Masson</surname> <given-names>P.</given-names></name></person-group> (<year>2011</year>). <article-title>Processing and analysis of GC/LC-MS-based metabolomics data</article-title>. <source>Methods Mol. Biol</source>. <volume>708</volume>, <fpage>277</fpage>&#x02013;<lpage>298</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-61737-985-7_17</pub-id><pub-id pub-id-type="pmid">21207297</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wold</surname> <given-names>S.</given-names></name> <name><surname>Antti</surname> <given-names>H.</given-names></name> <name><surname>Lindgren</surname> <given-names>F.</given-names></name> <name><surname>Ohman</surname> <given-names>J.</given-names></name></person-group> (<year>1998</year>). <article-title>Orthogonal signal correction of near-infrared spectra</article-title>. <source>Chemometr. Intell. Lab. Syst</source>. <volume>44</volume>, <fpage>175</fpage>&#x02013;<lpage>185</lpage>. <pub-id pub-id-type="doi">10.1016/S0169-7439(98)00109-9</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Worley</surname> <given-names>B.</given-names></name> <name><surname>Powers</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>MVAPACK: a complete data handling package for NMR metabolomics</article-title>. <source>ACS Chem. Biol</source>. <volume>9</volume>, <fpage>1138</fpage>&#x02013;<lpage>1144</lpage>. <pub-id pub-id-type="doi">10.1021/cb4008937</pub-id><pub-id pub-id-type="pmid">24576144</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>G.</given-names></name> <name><surname>Zheng</surname> <given-names>Y.</given-names></name> <name><surname>Kong</surname> <given-names>H.</given-names></name> <name><surname>Pang</surname> <given-names>T.</given-names></name> <name><surname>Lv</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2004</year>). <article-title>Diagnosis of liver cancer using HPLC-based metabonomics avoiding false-positive result from hepatitis and hepatocirrhosis diseases</article-title>. <source>J. Chromatogr. B Analyt. Technol. Biomed. Life Sci</source>. <volume>813</volume>, <fpage>59</fpage>&#x02013;<lpage>65</lpage>. <pub-id pub-id-type="doi">10.1016/j.jchromb.2004.09.032</pub-id><pub-id pub-id-type="pmid">15556516</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>G.</given-names></name> <name><surname>Zheng</surname> <given-names>Y.</given-names></name> <name><surname>Kong</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2005</year>). <article-title>Strategy for metabonomics research based on high-performance liquid chromatography and liquid chromatography coupled with tandem mass spectrometry</article-title>. <source>J. Chromatogr. A</source> <volume>1084</volume>, <fpage>214</fpage>&#x02013;<lpage>221</lpage>. <pub-id pub-id-type="doi">10.1016/j.chroma.2004.10.100</pub-id><pub-id pub-id-type="pmid">16114257</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Gao</surname> <given-names>P.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>High performance liquid chromatography-mass spectrometry for metabonomics: potential biomarkers for acute deterioration of liver function in chronic hepatitis B</article-title>. <source>J. Proteome Res</source>. <volume>5</volume>, <fpage>554</fpage>&#x02013;<lpage>561</lpage>. <pub-id pub-id-type="doi">10.1021/pr050364w</pub-id><pub-id pub-id-type="pmid">16512670</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>X.</given-names></name> <name><surname>Xu</surname> <given-names>F.</given-names></name> <name><surname>Qi</surname> <given-names>B.</given-names></name> <name><surname>Hao</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Serum metabolomics study of polycystic ovary syndrome based on liquid chromatography-mass spectrometry</article-title>. <source>J. Proteome Res</source>. <volume>13</volume>, <fpage>1101</fpage>&#x02013;<lpage>1111</lpage>. <pub-id pub-id-type="doi">10.1021/pr401130w</pub-id><pub-id pub-id-type="pmid">24428203</pub-id></citation>
</ref>
</ref-list>
</back>
</article>