<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2013.00165</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Methods Article</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A method for calling copy number polymorphism using haplotypes</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Ho Jang</surname> <given-names>Gun</given-names></name>
</contrib>
<contrib contrib-type="author">
<name><surname>Christie</surname> <given-names>Jason D.</given-names></name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Feng</surname> <given-names>Rui</given-names></name>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
</contrib>
</contrib-group>
<aff><institution>Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania</institution> <country>Philadelphia, PA, USA</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Hemant K. Tiwari, University of Alabama at Birmingham, USA</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: William C. L. Stewart, Columbia University, USA; Li Zhang, University of California, San Francisco, USA</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Rui Feng, Department of Biostatistics and Epidemiology, School of Medicine, University of Pennsylvania, 423 Guardian Drive, Blockley Hall 209, Philadelphia, Pennsylvania, PA 19104, USA e-mail: <email>ruifeng&#x00040;mail.med.upenn.edu</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Statistical Genetics and Methodology, a section of the journal Frontiers in Genetics.</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>09</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="collection">
<year>2013</year>
</pub-date>
<volume>4</volume>
<elocation-id>165</elocation-id>
<history>
<date date-type="received">
<day>29</day>
<month>04</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>07</day>
<month>08</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2013 Jang, Christie and Feng.</copyright-statement>
<copyright-year>2013</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract><p>Single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) are both widespread characteristics of the human genome, but are often called separately on common genotyping platforms. To capture integrated SNP and CNV information, methods have been developed for calling allele-specific copy numbers, or so-called copy number polymorphisms (CNPs), using limited inter-marker correlation. In this paper, we propose a haplotype-based maximum likelihood method to call CNPs, which takes advantage of the valuable multi-locus linkage disequilibrium (LD) information in the population. We also developed a computationally efficient algorithm to estimate haplotype frequencies and optimize individual CNP calls iteratively, even in the presence of missing data. Through simulations, we demonstrated that our model is more sensitive and accurate in detecting various CNV regions than commonly used CNV calling methods, including PennCNV, another hidden Markov model (HMM) using CNP, a scan statistic, segCNV, and cnvHap. Our method often performs better in regions with higher LD, in longer CNV regions, and for common CNVs than in regions with lower LD, in shorter CNV regions, and for rare CNVs. We applied our method to the genotypes of 90 HapMap CEU samples and 23 patients with acute lung injury (ALI). For each ALI patient the genotyping was performed twice. The CNPs from our method show good consistency and accuracy comparable to others.</p></abstract>
<kwd-group>
<kwd>CNV</kwd>
<kwd>CNP</kwd>
<kwd>GWAS</kwd>
<kwd>haplotype</kwd>
<kwd>joint SNP and CNV calling</kwd>
<kwd>integrated SNP and CNV</kwd>
</kwd-group>
<counts>
<fig-count count="2"/>
<table-count count="7"/>
<equation-count count="28"/>
<ref-count count="42"/>
<page-count count="14"/>
<word-count count="10159"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<title>1. Introduction</title>
<p>DNA copy number variation (CNV) refers to differences in genomic DNA in which the number of gene copies varies, including segmental amplification, deletion, and loss of heterozygosity. CNVs are widespread in the human genome, covering approximately 18% of the genome (Freeman et al., <xref ref-type="bibr" rid="B13">2006</xref>; Redon et al., <xref ref-type="bibr" rid="B30">2006</xref>; McCarroll and Altshuler, <xref ref-type="bibr" rid="B25">2007</xref>; Database of Genomic Variants). Increasing evidence shows that CNVs account for a significant portion of phenotypic variation (Iafrate et al., <xref ref-type="bibr" rid="B17">2004</xref>; Sebat et al., <xref ref-type="bibr" rid="B32">2004</xref>; Tuzun et al., <xref ref-type="bibr" rid="B35">2005</xref>; Conrad et al., <xref ref-type="bibr" rid="B6">2006</xref>; McCarroll et al., <xref ref-type="bibr" rid="B26">2006</xref>; Redon et al., <xref ref-type="bibr" rid="B30">2006</xref>), yet their contribution to human diseases and conditions is far underestimated (Sebat et al., <xref ref-type="bibr" rid="B31">2007</xref>). A comprehensive study suggested that the total amount of sequence variation involving CNVs between two healthy subjects was actually higher than that for single nucleotide polymorphisms (SNPs) (Redon et al., <xref ref-type="bibr" rid="B30">2006</xref>), which was supported by the increasing number and resolution of CNV discoveries (Korbel et al., <xref ref-type="bibr" rid="B23">2007</xref>). A systematic evaluation of five widely used array-based CNV detection programs suggested that existing methods have conservative sensitivity in CNV detection.</p>
<p>Recently, high-density SNP genotyping arrays have gained substantial attention for CNV detection and analysis. Although originally designed for genome-wide SNP association studies, they contain signal intensities that can be borrowed to identify regions with deletions or duplications (Komura et al., <xref ref-type="bibr" rid="B22">2006</xref>; Peiffer et al., <xref ref-type="bibr" rid="B29">2006</xref>). Multiple software programs have been developed for these arrays, and their performance is being evaluated (Winchester et al., <xref ref-type="bibr" rid="B39">2009</xref>; Zhang et al., <xref ref-type="bibr" rid="B40">2011</xref>). With a few exceptions, the existing approaches can be roughly classified into two types: the single-locus pooled-sample approach and the single-individual cross-genome approach. The single-locus pooled-sample approaches use the distributions of signal intensities of multiple samples at a fixed locus to derive reference values and clusters for each CNV value, and then determine each individual&#x00027;s CNV by the cluster to which it belongs; examples include TriTyper (Franke et al., <xref ref-type="bibr" rid="B12">2008</xref>) and CNVtools (Barnes et al., <xref ref-type="bibr" rid="B1">2008</xref>). These methods proceed locus by locus and generally ignore inter-marker correlations. The single-individual cross-genome approaches use either a partitioning approach [such as DNAcopy (Olshen et al., <xref ref-type="bibr" rid="B27">2004</xref>), CnvPartition by Illumina, and segCNV (Shi and Li, <xref ref-type="bibr" rid="B33">2012</xref>)] or hidden Markov models (HMMs) [such as Birdseye (Korn et al., <xref ref-type="bibr" rid="B24">2008</xref>), QuantiSNP (Colella et al., <xref ref-type="bibr" rid="B5">2007</xref>), and PennCNV (Wang et al., <xref ref-type="bibr" rid="B38">2007</xref>)] to call CNV individual by individual.
The HMM considers the dependency between copy number states at two adjacent markers by assuming that the CNV underlying the observed signal intensities is a first-order Markov process (Gelfond et al., <xref ref-type="bibr" rid="B15">2009</xref>). The inter-marker correlation is implied in the homogeneous transition probability, which depends only on the inter-marker distance. A notable exception to these two lines of methods is a novel Bayesian approach that combines the signal intensity distribution and heterozygosity information to infer individual CNV (Z&#x00151;llner et al., <xref ref-type="bibr" rid="B42">2009</xref>).</p>
<p>Because both SNPs and CNVs affect the signal intensities and may affect phenotypes separately or jointly, their coexistence will affect the identification of each other and the results of association studies. Ignoring SNPs in CNV analysis fails to incorporate allele-specific gains and losses and diminishes the potential to exploit linkage disequilibrium (LD) between CNVs and nearby SNPs. How to combine SNPs and CNVs has been a challenge for geneticists. Copy numbers and SNP genotypes have commonly been annotated independently, and the available methods for SNP and CNV calling were separate until Korn et al. (<xref ref-type="bibr" rid="B24">2008</xref>) presented a sequential approach to generate copy number polymorphisms (CNPs) using results from both genotype calls and CNV calls.</p>
<p>Methods within each class of CNV calling approaches were then extended to accommodate CNPs. In the single-locus pooled-sample approach, CnvPartition assumed that the intensity and the proportion of B alleles follow distinct bivariate Gaussian distributions given different CNPs. In the single-individual cross-genome approach, Wang et al. (<xref ref-type="bibr" rid="B37">2009</xref>) allowed multiple CNP states in their HMM and incorporated the two-locus LD parameter in addition to the inter-marker distance in the transition probability, which increased the accuracy of the CNP calls. cnvHap (Coin et al., <xref ref-type="bibr" rid="B4">2010</xref>) used a new transition probability in its HMM, which relies only on the two-locus haplotype frequencies. The HMM of polyHap (Su et al., <xref ref-type="bibr" rid="B34">2010</xref>) treated a CNV region as a region of variable ploidy and considered only two-locus haplotypes.</p>
<p>In the presence of CNV, all existing algorithms using haplotypes augmented the conventional haplotype definition by treating duplications and deletions as additional &#x0201C;alleles&#x0201D;, distinct from the two existing SNP alleles at each locus. Haplotypes defined in this way are not continuous physical pieces as traditionally perceived, which can cause conceptual confusion. In addition, the number of possible combinations over a region increases exponentially with a base of 4&#x02013;6 instead of 2, greatly increasing computational complexity and potentially making the computation infeasible. A few methods have been proposed to estimate the frequencies of such &#x0201C;haplotypes&#x0201D;. MOCSphaser (Kato et al., <xref ref-type="bibr" rid="B21">2008b</xref>) infers CNV-SNP haplotypes using an expectation-maximization (EM) algorithm but only accommodates integer copy numbers in CNV regions and SNP genotypes in non-CNV regions. CNVphaser employed a hierarchical partition-ligation strategy to break down a longer region into smaller blocks and used the EM algorithm to estimate the &#x0201C;haplotype&#x0201D; frequencies just as for regular haplotypes (Kato et al., <xref ref-type="bibr" rid="B20">2008a</xref>).</p>
<p>Some intensity values are often apparent outliers that can be easily detected by routine quality control procedures. Other methods either exclude whole samples that contain some poor-quality values (Wang et al., <xref ref-type="bibr" rid="B38">2007</xref>) or re-estimate the SNP at each poor-quality locus by imputation for subsequent calls.</p>
<p>In this paper, we developed a haplotype-based maximum likelihood method to call CNP, which takes into account the valuable multi-locus LD information in the population. By imposing practical assumptions for short CNV regions, we keep the same conventional haplotypes as originally defined for SNP genotype data and make corresponding inferences. We developed a computationally efficient algorithm that determines optimal CNPs for each individual and estimates haplotype frequencies in the population simultaneously. We consider our method a natural merger of the single-locus pooled-sample and single-individual cross-genome approaches for CNP calling. Our method can also recover CNPs even with missing data. We evaluated our method through extensive simulations in terms of sensitivity, true positive rate, and length of detectable CNV regions for different haplotype structures, frequencies, and lengths of CNV regions. We also compared our method with a few available methods to assess the possible gain of using haplotypes. In addition, we checked how well our method can recover CNPs when there are missing or extreme values in the raw data. Last, we applied these methods to the duplicated genotype samples of 23 individuals with acute lung injury (ALI) to check the consistency of our method in calling both CNV and SNP. Accuracy was assessed by comparing calls from all methods to the CNV regions identified through array CGH data.</p>
</sec>
<sec sec-type="methods" id="s2">
<title>2. Methods</title>
<sec>
<title>2.1. Notations</title>
<p>We use <italic>c</italic> &#x0003D; (<italic>c</italic><sub><italic>A</italic></sub>, <italic>c</italic><sub><italic>B</italic></sub>) to denote the copy numbers of A allele and B allele, or CNP at a bi-allelic marker locus. CNPs at normal states with two copies of alleles include (1,1), (2,0), and (0,2), which code regular SNP genotypes; (0,0) is the CNP at the double deletion state, (0,1) and (1,0) are the CNPs at single deletion states, and (1,2), (2,1), (2,2), (1,3), and (3,1) or more are for the duplication states. The copy numbers in most available methods are referred to as <sc>cn</sc> &#x0003D; <italic>c</italic><sub><italic>A</italic></sub> &#x0002B; <italic>c</italic><sub><italic>B</italic></sub>, which does not contain the allele specific information and cannot infer disease risks associated with the copy number change of a specific allele. Because the duplications with four or more copies are virtually indistinguishable on genotype platforms (Wang et al., <xref ref-type="bibr" rid="B37">2009</xref>), we set the maximum copy number to 4.</p>
<p>We denote <italic>X</italic><sub><italic>A</italic></sub> and <italic>X</italic><sub><italic>B</italic></sub> as the normalized signal intensity values for allele A and allele B, respectively. <italic>X</italic><sub><italic>A</italic></sub> and <italic>X</italic><sub><italic>B</italic></sub> can be extracted from raw CEL files using a standard normalization procedure, such as the BeadStudio software for Illumina platforms and the Affymetrix Power Tools for Affymetrix platforms. We use two measures, the log R ratio (LRR) and the B allele frequency (BAF), denoted as <italic>r</italic> and <italic>b</italic>, as the observed values in our models. <italic>r</italic> is the logarithm of the ratio of the observed total intensity <italic>R</italic> &#x0003D; <italic>X</italic><sub><italic>A</italic></sub> &#x0002B; <italic>X</italic><sub><italic>B</italic></sub> to the expected intensity (Peiffer et al., <xref ref-type="bibr" rid="B29">2006</xref>), and <italic>b</italic> is the standardized proportion of samples carrying the B allele, i.e., a linear transformation of &#x003B8; &#x0003D; arctan(<italic>X</italic><sub><italic>B</italic></sub>/<italic>X</italic><sub><italic>A</italic></sub>)/(&#x003C0;/2).</p>
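<p>As a minimal sketch of the two quantities just defined, the following Python fragment computes the normalized angle and the log R ratio from a pair of intensities. The function names, the use of a base-2 logarithm, and the argument expected_r (the expected total intensity at the locus) are illustrative assumptions, not part of the original method description; the further linear transformation from the angle to the B allele frequency depends on platform-specific cluster positions and is not shown.</p>
<preformat>
```python
import math


def theta(x_a, x_b):
    """Normalized allelic angle arctan(X_B / X_A) / (pi / 2), between 0 and 1.

    atan2 is used so that x_a == 0 (pure B signal) does not divide by zero;
    for non-negative intensities it equals arctan(x_b / x_a).
    """
    return math.atan2(x_b, x_a) / (math.pi / 2)


def log_r_ratio(x_a, x_b, expected_r):
    """Log ratio of the observed total intensity R = X_A + X_B to expected_r.

    Assumption: base-2 logarithm; expected_r is a hypothetical input that
    would come from reference clusters for the locus.
    """
    return math.log2((x_a + x_b) / expected_r)
```
</preformat>
<p>With equal A and B intensities the angle is one half, and an observed total intensity equal to the expected one gives a log R ratio of zero.</p>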
<p>We assume our study includes <italic>N</italic> individuals and we start with a haplotype block of interest containing <italic>M</italic> markers. Within the haplotype block, there are a total of <italic>s</italic> possible haplotypes <italic>h</italic><sub>1</sub>, &#x02026;, <italic>h</italic><sub><italic>s</italic></sub> with population frequencies &#x003C0; &#x0003D; (&#x003C0;<sub>1</sub>, &#x02026;, &#x003C0;<sub><italic>s</italic></sub>). Throughout the paper, we use subscripts for marker locations and superscripts for individuals.</p>
</sec>
<sec>
<title>2.2. Haplotype identification given CNP</title>
<p>In this section, we illustrate how individual CNPs can help infer haplotypes under some practical assumptions. We assume either duplication or deletion occurs as a continuous piece on one chromosome and deletion/insertion regions are not immediately next to each other for each individual.</p>
<p>If a CNV region of an individual covers the whole region of interest, we can identify his haplotype(s) as follows:
<list list-type="order">
<list-item><p>Within a duplication region (<sc>cn</sc> &#x0003D; <sc>cn</sc><sub>1</sub> &#x0003D; &#x000B7;&#x000B7;&#x000B7; &#x0003D; <sc>cn</sc><sub><italic>M</italic></sub> &#x0003D; 3 or 4), the duplicated haplotype <italic>h</italic><sub>1</sub> can be written as {<italic>S</italic><sub>1</sub> &#x02026;<italic>S</italic><sub><italic>M</italic></sub> : <italic>S</italic><sub><italic>k</italic></sub> &#x0003D; <italic>A</italic> if <italic>c</italic><sub><italic>k</italic>, <italic>A</italic></sub> &#x02265; <sc>cn</sc> &#x02212; 1, or <italic>S</italic><sub><italic>k</italic></sub> &#x0003D; <italic>B</italic> otherwise, for <italic>k</italic> &#x0003D; 1, &#x02026;, <italic>M</italic>}, and the other haplotype <italic>h</italic><sub>2</sub> becomes {<italic>S</italic>&#x02032;<sub>1</sub>&#x02026; <italic>S</italic>&#x02032;<sub><italic>M</italic></sub> : <italic>S</italic>&#x02032;<sub><italic>k</italic></sub> &#x0003D; <italic>A</italic> if <italic>c</italic><sub><italic>k</italic>, <italic>A</italic></sub> &#x0003D; <sc>cn</sc> or 1, or <italic>S</italic>&#x02032;<sub><italic>k</italic></sub> &#x0003D; <italic>B</italic> otherwise}.</p></list-item>
<list-item><p>Within a single deletion region (<sc>cn</sc> &#x0003D; <sc>cn</sc><sub>1</sub> &#x0003D; &#x000B7;&#x000B7;&#x000B7; &#x0003D; <sc>cn</sc><sub><italic>M</italic></sub> &#x0003D; 1), the unique haplotype <italic>h</italic><sub>1</sub> is {<italic>S</italic><sub>1</sub>&#x02026; <italic>S</italic><sub><italic>M</italic></sub> : <italic>S</italic><sub><italic>k</italic></sub> &#x0003D; <italic>A</italic> if <italic>c</italic><sub><italic>k</italic>, <italic>A</italic></sub> &#x0003E; 0, or <italic>S</italic><sub><italic>k</italic></sub> &#x0003D; <italic>B</italic> if <italic>c</italic><sub><italic>k</italic>, <italic>B</italic></sub> &#x0003E; 0 for <italic>k</italic> &#x0003D; 1, &#x02026;, <italic>M</italic>}.</p></list-item>
</list>
</p>
<p>For example, if we know an individual&#x00027;s CNPs (3, 0)(1, 2)(2, 1)(0, 3) at four adjacent loci in a block, it is easy to see that there are three copies of allele A and 0 copy of allele B at the first locus and thus A must be on the duplicated haplotype. Similarly at loci 2, 3, and 4, B, A, and B are on the duplicated haplotype. Therefore the haplotype ABAB must be the duplicated piece and the other AABB is the normal non-duplicated haplotype. Similarly, given an individual&#x00027;s CNPs (1, 0)(0, 1)(0, 1)(1, 0), we will know that one haplotype of his was deleted and the other haplotype is ABBA. For those determined haplotypes given <bold>c</bold> &#x0003D; (<italic>c</italic><sub>1</sub>,&#x02026;, <italic>c</italic><sub><italic>M</italic></sub>), we call them &#x0201C;compatible&#x0201D; with <bold>c</bold> and denote as (<italic>h</italic><sub>1</sub>, <italic>h</italic><sub>2</sub>) &#x0007E; <bold>c</bold>. We say (ABAB, AABB) are compatible with (3, 0)(1, 2)(2, 1)(0, 3), i.e., (ABAB, AABB) &#x0007E; (3, 0)(1, 2)(2, 1)(0, 3).</p>
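<p>The two identification rules and the worked example above can be sketched in Python as follows. This is a literal transcription of the rules for a region fully covered by a single CNV; the function name and the representation of CNPs as (c_A, c_B) tuples are illustrative assumptions.</p>
<preformat>
```python
def haplotypes_from_cnp(cnps):
    """Return (h1, h2) for a region that is fully covered by one CNV.

    cnps: list of (c_A, c_B) tuples whose total copy number is constant
    and equal to 1, 3 or 4.
    Duplication: h1 is the duplicated haplotype, h2 the other haplotype.
    Single deletion: h1 is the remaining haplotype and h2 is None.
    """
    totals = {c_a + c_b for c_a, c_b in cnps}
    if len(totals) != 1:
        raise ValueError("total copy number must be constant across the region")
    cn = totals.pop()
    if cn in (3, 4):
        # Rule 1: A is on the duplicated haplotype when c_A >= cn - 1.
        h1 = "".join("A" if c_a >= cn - 1 else "B" for c_a, _ in cnps)
        # Rule 1, other haplotype: A when c_A equals cn or 1.
        h2 = "".join("A" if c_a in (cn, 1) else "B" for c_a, _ in cnps)
        return h1, h2
    if cn == 1:
        # Rule 2: the single remaining allele at each locus.
        return "".join("A" if c_a > 0 else "B" for c_a, _ in cnps), None
    raise ValueError("the rules cover cn = 1, 3 or 4 only")


duplication = haplotypes_from_cnp([(3, 0), (1, 2), (2, 1), (0, 3)])
deletion = haplotypes_from_cnp([(1, 0), (0, 1), (0, 1), (1, 0)])
```
</preformat>
<p>Here duplication evaluates to the pair (ABAB, AABB) and deletion to (ABBA, None), in agreement with the example in the text.</p>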
<p>If a CNV piece does not cover the whole region of interest, the haplotypes compatible with the CNP genotypes are determined by the CNPs within the CNV region and the regular SNP genotypes outside of the CNV region. As shown in Figure <xref ref-type="fig" rid="F1">1</xref> for a region including five loci, the haplotype sections within the 3-locus CNV regions are uniquely identified and the sections in the normal region (loci 1 and 5) can be partially inferred given population haplotype distributions. In the left figure with a deleted piece, the middle section of one haplotype is <italic>ABA</italic> and that of the other is deleted, following the rule listed previously. Combined with genotypes (1,1) at loci 1 and 5, the possible haplotype pairs covering the whole region are AABAA/B&#x02026;B, AABAB/B&#x02026;A, BABAA/A&#x02026;B, and BABAB/A&#x02026;A. Similarly, in the right figure with a duplicated piece, the possible haplotype pairs are AABAA/BBBAB, AABAB/BBBAA, BABAA/ABBAB, and BABAB/ABBAA. Thus a CNV region of an individual can help infer his/her haplotypes and thereby help estimate the population haplotype frequencies. Conversely, the haplotype information in the population can be used to better infer individual CNPs. In this example, if the population contains no AABAA, BABAA, or BABAB (i.e., &#x003C0;(AABAA) &#x0003D; &#x003C0;(BABAA) &#x0003D; &#x003C0;(BABAB) &#x0003D; 0), the haplotypes in the left figure can be uniquely determined as AABAB/B&#x02026;A, and in the right figure the duplicated haplotype would be AABAB and the other BBBAA. In most cases, AABAA, AABAB, BABAA, and BABAB have known non-zero probabilities in the population, and we can still make some inference about the haplotypes given CNP. This can be done by incorporating the haplotype distributions in the likelihood as we show next.
In our method, we allow CNV regions to vary from individual to individual, i.e., the CNV regions of different individuals can have completely different boundaries. In other &#x0201C;haplotype&#x0201D;-based CNV methods, such flexibility would lead to extremely large numbers of haplotypes, each likely to have a small probability in the population.</p>
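<p>The enumeration in the five-locus example can be sketched as below, assuming one normal-state flanking locus on each side of an already-identified CNV core. The function names, the use of dots as placeholders for a deleted section, and the frequency table passed to filter_by_frequency are hypothetical and serve only to illustrate the reasoning.</p>
<preformat>
```python
from itertools import product


def flank_phasings(genotype):
    """Ordered allele pairs (allele on h1, allele on h2) for a two-copy CNP."""
    if genotype == (2, 0):
        return [("A", "A")]
    if genotype == (0, 2):
        return [("B", "B")]
    if genotype == (1, 1):
        return [("A", "B"), ("B", "A")]
    raise ValueError("flanking loci must be in a normal two-copy state")


def compatible_pairs(left, core1, core2, right):
    """Enumerate haplotype pairs around a CNV core with one flank per side.

    core1 and core2 are the core sections of the two haplotypes; for a
    deletion, pass core2 as a string of '.' placeholders.
    """
    pairs = []
    for (l1, l2), (r1, r2) in product(flank_phasings(left), flank_phasings(right)):
        pairs.append((l1 + core1 + r1, l2 + core2 + r2))
    return pairs


def filter_by_frequency(pairs, freq):
    """Keep pairs whose first (complete) haplotype has positive frequency."""
    return [(h1, h2) for h1, h2 in pairs if freq.get(h1, 0.0) > 0]


dup_pairs = compatible_pairs((1, 1), "ABA", "BBA", (1, 1))
del_pairs = compatible_pairs((1, 1), "ABA", "...", (1, 1))
```
</preformat>
<p>In this sketch dup_pairs lists the four duplication configurations from the text in the same order, and filtering either list with a table in which only AABAB has positive frequency leaves a single pair, as in the example where AABAA, BABAA, and BABAB are absent from the population.</p>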
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>CNP and haplotype configurations within a 5-locus block.</bold> The 2-digit CNP at each locus is shown on top. Specific alleles A or B are shown in each circle at the corresponding loci aligned on a pair of chromosomes. The long lines between two loci denote deletion regions (no corresponding alleles). Both the deletion (left) and the duplication (right) span loci 2 to 4.</p></caption>
<graphic xlink:href="fgene-04-00165-g0001.tif"/>
</fig>
<p>Please note that, like SNPs, CNPs are aligned according to the reference coordinates, not to the actual physical locations. For duplications, LRR and BAF can only tell which piece of chromosome is duplicated but not where the duplicated piece is inserted.</p>
</sec>
<sec>
<title>2.3. Maximum likelihood method using haplotype information</title>
<p>Given observed log R ratio (<bold>r</bold>) and B allele frequency (<bold>b</bold>) at loci 1, &#x02026;, <italic>M</italic>, the likelihood can be written as a function of CNP and population haplotype frequencies, i.e.,</p>
<p><graphic xlink:href="fgene-04-00165-i0001.tif"/></p>
<p>where <italic>i</italic> is the individual index <italic>i</italic> &#x0003D; 1, &#x02026;, <italic>N</italic>, <bold>r</bold><sup><italic>i</italic></sup>, <bold>b</bold><sup><italic>i</italic></sup>, <bold>c</bold><sup><italic>i</italic></sup> are LRR, BAF, and CNP at all loci for individual <italic>i</italic> and <bold>C</bold> represents the CNP for all individuals, i.e., <bold>C</bold> &#x0003D; (<bold>c</bold><sup>1</sup>,&#x02026;, <bold>c</bold><sup><italic>N</italic></sup>), <italic>k</italic> is the marker index <italic>k</italic> &#x0003D; 1, &#x02026;, <italic>M</italic>, <italic>P</italic>(<italic>b</italic><sup><italic>i</italic></sup><sub><italic>k</italic></sub> | <italic>c</italic><sup><italic>i</italic></sup><sub><italic>k</italic></sub>, <graphic xlink:href="fgene-04-00165-i0002.tif"/>) and <italic>P</italic>(<italic>r</italic><sup><italic>i</italic></sup><sub><italic>k</italic></sub> | <italic>c</italic><sup><italic>i</italic></sup><sub><italic>k</italic></sub>, <graphic xlink:href="fgene-04-00165-i0002.tif"/>) are the conditional probabilities of the BAF and LRR given the CNP at locus <italic>k</italic>, <italic>P</italic>(<italic>h</italic>&#x02032;, <italic>h</italic>&#x02033;) is the probability of observing two haplotypes <italic>h</italic>&#x02032; and <italic>h</italic>&#x02033;, and <graphic xlink:href="fgene-04-00165-i0002.tif"/> denotes the set of parameters in both conditional probabilities <italic>P</italic>(<italic>b</italic><sup><italic>i</italic></sup><sub><italic>k</italic></sub> | <italic>c</italic><sup><italic>i</italic></sup><sub><italic>k</italic></sub>, <graphic xlink:href="fgene-04-00165-i0002.tif"/>) and <italic>P</italic>(<italic>r</italic><sup><italic>i</italic></sup><sub><italic>k</italic></sub> | <italic>c</italic><sup><italic>i</italic></sup><sub><italic>k</italic></sub>, <graphic xlink:href="fgene-04-00165-i0002.tif"/>). 
The last equation holds when <bold>b</bold> and <bold>r</bold> are assumed conditionally independent given <bold>c</bold>, and each <italic>b</italic><sub><italic>k</italic></sub> or <italic>r</italic><sub><italic>k</italic></sub> is conditionally independent of any other <italic>b</italic><sub><italic>l</italic></sub> or <italic>r</italic><sub><italic>l</italic></sub> given <bold>c</bold>. If we assume Hardy&#x02013;Weinberg Equilibrium (HWE), <italic>P</italic>(<italic>h</italic>&#x02032;, <italic>h</italic>&#x02033;) &#x0003D; 2&#x003C0;<sub><italic>i</italic></sub>&#x003C0;<sub><italic>j</italic></sub> if <italic>h</italic>&#x02032; &#x0003D; <italic>h</italic><sub><italic>i</italic></sub> &#x02260; <italic>h</italic>&#x02033; &#x0003D; <italic>h</italic><sub><italic>j</italic></sub> and <italic>P</italic>(<italic>h</italic>&#x02032;, <italic>h</italic>&#x02033;) &#x0003D; &#x003C0;<sup>2</sup><sub><italic>i</italic></sub> if <italic>h</italic>&#x02032; &#x0003D; <italic>h</italic>&#x02033; &#x0003D; <italic>h</italic><sub><italic>i</italic></sub>. 
As in HMM models, we can assume that <italic>P</italic>(<italic>r</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic></sub> &#x0003D; (<italic>c</italic><sub><italic>k</italic>, <italic>A</italic></sub>, <italic>c</italic><sub><italic>k</italic>, <italic>B</italic></sub>)) &#x0007E; <italic>N</italic>(&#x003BC;<sub><sc>cn</sc>(<italic>c</italic><sub><italic>k</italic></sub>)</sub>, &#x003C3;<sup>2</sup><sub><sc>cn</sc>(<italic>c</italic><sub><italic>k</italic></sub>)</sub>), <italic>P</italic>(<italic>b</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic></sub>) &#x0007E; truncated <italic>N</italic>(&#x003BC;<sub><italic>b</italic>, <sc>cn</sc>(<italic>c</italic><sub><italic>k</italic></sub>)</sub>, &#x003C3;<sup>2</sup><sub><italic>b</italic></sub>), and thus <graphic xlink:href="fgene-04-00165-i0002.tif"/> &#x0003D; {&#x003BC;<sub>0</sub>,&#x02026;, &#x003BC;<sub>4</sub>, &#x003C3;<sub>0</sub>,&#x02026;, &#x003C3;<sub>4</sub>, &#x003BC;<sub><italic>b</italic>, 0</sub>,&#x02026;, &#x003BC;<sub><italic>b</italic>, 4</sub>, &#x003C3;<sub><italic>b</italic></sub>, &#x003B7;, &#x003B3;<sub>0, 0</sub>,&#x02026;, &#x003B3;<sub>4, 4</sub>, <italic>G</italic><sub>1, 1</sub>,&#x02026;, <italic>G</italic><sub>15, 15</sub>} (details in Appendices B and C).</p>
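<p>Two ingredients of the likelihood described above can be sketched in Python: the probability of a haplotype pair under HWE and the normal log-density of the log R ratio given the total copy number. The function names and the dictionary-based parameter containers are illustrative assumptions; the truncated normal model for the B allele frequency and the remaining parameters detailed in the Appendices are not reproduced here.</p>
<preformat>
```python
import math


def hwe_pair_prob(h1, h2, freq):
    """P(h', h'') under HWE: pi_i squared if identical, else 2 * pi_i * pi_j.

    freq is a hypothetical mapping from haplotype string to its frequency.
    """
    p1, p2 = freq.get(h1, 0.0), freq.get(h2, 0.0)
    return p1 * p1 if h1 == h2 else 2.0 * p1 * p2


def lrr_log_density(r, cn, mu, sigma):
    """Log of the normal density N(mu[cn], sigma[cn] ** 2) evaluated at r.

    mu and sigma are hypothetical mappings from total copy number to the
    mean and standard deviation of the log R ratio in that state.
    """
    z = (r - mu[cn]) / sigma[cn]
    return -0.5 * z * z - math.log(sigma[cn] * math.sqrt(2.0 * math.pi))
```
</preformat>
<p>For a two-haplotype population with frequencies 0.25 and 0.75, the three unordered pair probabilities returned by hwe_pair_prob sum to one, which is a quick check that the HWE weights are applied to unordered pairs.</p>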
<p>Maximization of the likelihood Equation (1) with respect to <bold>C</bold> and <bold>&#x003C0;</bold> will yield optimal CNP estimates for each individual and haplotype frequency estimates. However, the number of combinations of CNPs across all individuals and haplotypes is huge, and searching over the whole parameter space can be computationally intensive. To reduce the computational burden, we derived an optimization algorithm that updates individuals&#x00027; CNPs and haplotype frequencies iteratively.</p>
</sec>
<sec>
<title>2.4. The CNP optimization algorithm</title>
<p>We used the following iterative algorithm to maximize the likelihood Equation (1):
<list list-type="order">
<list-item><p>At step 0, assign initial values to CNPs of all individuals <bold>C</bold><sup>(0)</sup> and population haplotype frequencies <bold>&#x003C0;</bold><sup>(0)</sup> within a haplotype block.</p></list-item>
<list-item><p>At step &#x02113;, given <bold>C</bold><sup>(&#x02113;)</sup> and <bold>&#x003C0;</bold><sup>(&#x02113;)</sup>, maximize the log-likelihood over possible <bold>c</bold><sup><italic>i</italic></sup> for each individual <italic>i</italic> to obtain optimal <bold>c</bold><sup><italic>i</italic>, (&#x02113; &#x0002B; 1)</sup>, i.e.,</p>
<p><graphic xlink:href="fgene-04-00165-i0003.tif"/></p></list-item>
<list-item><p>Update <bold>&#x003C0;</bold> through collecting the conditional probability of haplotypes given all individuals&#x00027; CNP. This is a revised expectation-maximization (EM) algorithm (Dempster et al., <xref ref-type="bibr" rid="B10">1977</xref>) of updating haplotype frequencies given individuals&#x00027; genotypes in the absence of CNV (Excoffier and Slatkin, <xref ref-type="bibr" rid="B11">1995</xref>). Given current estimate of <bold>&#x003C0;</bold><sup>(&#x02113;)</sup> and updated <bold>C</bold><sup>(&#x02113; &#x0002B; 1)</sup>, the haplotype frequency is updated according to
<disp-formula id="E1"><label>(3)</label><mml:math id="M1"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x02113;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>2</mml:mn><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02033;</mml:mo></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0223C;</mml:mo><mml:msup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>c</mml:mi></mml:mstyle><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x02113;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:mstyle><mml:mo>+</mml:mo><mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02033;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02033;</mml:mo></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02033;</mml:mo></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0223C;</mml:mo><mml:msup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>c</mml:mi></mml:mstyle><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x02113;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mi>&#x003C0;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x02113;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>2</mml:mn><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:mfrac><mml:mrow><mml:mstyle displaystyle='true'><mml:msub><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02033;</mml:mo></mml:msup><mml:mo 
stretchy='false'>)</mml:mo><mml:mo>&#x0223C;</mml:mo><mml:msup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>c</mml:mi></mml:mstyle><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x02113;</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>+</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mo>&#x0007B;</mml:mo><mml:mi>I</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle><mml:mo>+</mml:mo><mml:mi>I</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02033;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007D;</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02033;</mml:mo></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>&#x003C0;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x02113;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mstyle displaystyle='true'><mml:msub><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02033;</mml:mo></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0223C;</mml:mo><mml:msup><mml:mstyle mathvariant='bold' 
mathsize='normal'><mml:mi>c</mml:mi></mml:mstyle><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x02113;</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>+</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mi>h</mml:mi><mml:mo>&#x02033;</mml:mo></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>&#x003C0;</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x02113;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p></list-item>
<list-item><p>Repeat (2) and (3) until the inferred CNP no longer changes and the estimated parameter <inline-formula><mml:math id="M11"><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mover accent='true'><mml:mi>&#x003C0;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:mstyle></mml:math></inline-formula> converges.</p></list-item>
</list>
</p>
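<p>As an illustration of the update in Equation (3), the following Python sketch performs one EM step for the haplotype frequencies. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names <monospace>pair_prob</monospace> and <monospace>em_update</monospace>, the representation of each individual as a list of unordered haplotype pairs compatible with the current CNP call, and the Hardy-Weinberg form of the pair probability given <bold>&#x003C0;</bold> are all introduced here for the example.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def pair_prob(pair, pi):
    """P(h', h'' | pi) for an unordered pair, assuming Hardy-Weinberg proportions."""
    a, b = pair
    p = pi.get(a, 0.0) * pi.get(b, 0.0)
    # A heterozygous unordered pair can arise in two orders.
    return p if a == b else 2.0 * p


def em_update(compatible_pairs, pi):
    """One EM step. compatible_pairs[i] lists the haplotype pairs compatible
    with the current CNP call of individual i (hypothetical input format)."""
    n = len(compatible_pairs)
    counts = defaultdict(float)
    for pairs in compatible_pairs:
        weights = [pair_prob(pair, pi) for pair in pairs]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("no compatible pair has positive probability")
        for (a, b), w in zip(pairs, weights):
            posterior = w / total   # P(pair | pair compatible with c_i, pi)
            counts[a] += posterior  # contribution of I(h' = h_t)
            counts[b] += posterior  # contribution of I(h'' = h_t)
    # Divide the expected haplotype counts by the 2N chromosomes in the sample.
    return {h: c / (2.0 * n) for h, c in counts.items()}


if __name__ == "__main__":
    pi0 = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}
    data = [[("AB", "AB")], [("AB", "ab"), ("Ab", "aB")], [("ab", "ab")]]
    print(em_update(data, pi0))
```
]]></preformat>
<p>Calling <monospace>em_update</monospace> repeatedly with fixed CNP calls mimics step (3) alone; in this sketch, haplotypes that appear in no compatible pair are simply absent from the returned dictionary, which corresponds to an estimated frequency of zero.</p>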
<p>When the region is long, convergence can take many iterations, which can make the computation infeasible. To reduce the computational burden, we first call the initial CNP using an HMM and then apply our haplotype-based method. The HMM we used is similar to those of others (Colella et al., <xref ref-type="bibr" rid="B5">2007</xref>; Wang et al., <xref ref-type="bibr" rid="B38">2007</xref>, <xref ref-type="bibr" rid="B37">2009</xref>) and we refer readers to Appendix A for details.</p>
</sec>
<sec>
<title>2.5. Missing data</title>
<p>When some <italic>b</italic><sub><italic>k</italic></sub>&#x00027;s and <italic>r</italic><sub><italic>k</italic></sub>&#x00027;s are missing, the genotype or haplotype at the missing loci can be inferred using the LD information around the missing loci.</p>
<p>For a single individual, if <italic>b</italic><sub><italic>k</italic></sub> and <italic>r</italic><sub><italic>k</italic></sub> at loci <italic>k</italic> &#x02208; D are missing, the contribution of that individual to the overall likelihood is changed to</p>
<p><graphic xlink:href="fgene-04-00165-i0004.tif"/></p>
<p>The key difference between the likelihoods in Equations (1) and (4) is that Equation (1) does not have the components corresponding to possible haplotypes at the unobserved loci for this individual. Computationally, missing data slightly affect the initial parameter estimation and may also increase the computational complexity, because the number of possible CNPs and haplotypes increases for individuals with missing data.</p>
<p>If a locus is not genotyped at all, no individual will have the corresponding components, but the extended haplotypes in other reference populations can be used to infer the CNP within or outside of the CNV region.</p>
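<p>To illustrate why the number of possible haplotypes grows for individuals with missing data, the following Python sketch lists the candidate haplotypes that remain compatible with a partially observed haplotype. It is only a toy example: the function name <monospace>compatible_haplotypes</monospace>, the use of <monospace>None</monospace> to mark a missing locus, and the candidate set are assumptions made here, and the sketch does not reproduce the likelihood in Equation (4).</p>
<preformat><![CDATA[
```python
def compatible_haplotypes(observed, candidates):
    """Return the candidate haplotypes that agree with `observed` at every
    non-missing locus; None marks a missing locus."""
    matches = []
    for hap in candidates:
        if len(hap) != len(observed):
            continue  # skip candidates that span a different number of loci
        if all(o is None or o == h for o, h in zip(observed, hap)):
            matches.append(hap)
    return matches


if __name__ == "__main__":
    candidates = ["ABA", "ABB", "BBA", "AAB"]  # hypothetical reference haplotypes
    print(compatible_haplotypes(["A", "B", "A"], candidates))    # fully observed
    print(compatible_haplotypes(["A", None, None], candidates))  # two loci missing
```
]]></preformat>
<p>With all loci observed a single candidate remains, whereas with two missing loci several candidates are compatible, so more terms enter the sum for that individual.</p>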
</sec>
<sec>
<title>2.6. Potential uses of our proposed method</title>
<p>Our proposed method can serve either as an independent calling method or as a refining procedure applied to the results of other callers. Note that the initial calls from the HMM are not required; other initial values work in our method as well. The haplotype frequencies can be estimated as a byproduct in a sample of decent size, or can be borrowed from a public database such as HapMap to improve the inference in small samples.</p>
</sec>
</sec>
<sec>
<title>3. Simulations</title>
<p>To evaluate the performance of our method, we conducted a series of simulations to assess: (1) the sensitivity of detecting CNV intervals, i.e., how often we correctly detect a CNV region when one exists; (2) the true positive rate, i.e., how often our detected CNV regions are true; and (3) the length of truly and falsely detected CNV regions. The CNV length, the CNV frequency in the population, and the haplotype structure were varied to understand their influence on the CNV calls. Our method was compared with five existing methods. We also checked the CNP recovery rate of our method under a missing data scenario.</p>
<p>For each data set, we simulated the LRR and BAF for 1000 unrelated individuals in three steps. First, 2000 independent chromosomes of 1000 individuals were generated containing 55,860 SNP loci, the same number of loci as on the Affymetrix Genome-Wide Human SNP Array 6.0 platform, along chromosome 4. We selected eight haplotype blocks of various lengths and LD structures: four medium-length and two long blocks with low to medium <italic>R</italic><sup>2</sup>, and two medium-length blocks with high <italic>R</italic><sup>2</sup>, as shown in Table <xref ref-type="table" rid="T1">1</xref>.</p>
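<p>The first two criteria can be sketched in Python as follows. This is a minimal sketch under an assumption that is not stated in the text: a called region is counted as matching a true region whenever the two share at least one locus. The function names and the representation of regions as closed intervals of locus indices are likewise chosen only for this example.</p>
<preformat><![CDATA[
```python
def overlaps(a, b):
    """True if the closed intervals a = (start, end) and b share a locus."""
    return a[0] <= b[1] and b[0] <= a[1]


def sensitivity(true_regions, called_regions):
    """Fraction of true CNV regions hit by at least one called region."""
    if not true_regions:
        return float("nan")
    hit = sum(any(overlaps(t, c) for c in called_regions) for t in true_regions)
    return hit / len(true_regions)


def true_positive_rate(true_regions, called_regions):
    """Fraction of called CNV regions that overlap some true region."""
    if not called_regions:
        return float("nan")
    hit = sum(any(overlaps(c, t) for t in true_regions) for c in called_regions)
    return hit / len(called_regions)


if __name__ == "__main__":
    true_regions = [(10, 14), (30, 32)]
    called_regions = [(11, 15), (50, 50), (12, 13)]
    print(sensitivity(true_regions, called_regions))
    print(true_positive_rate(true_regions, called_regions))
```
]]></preformat>
<p>A stricter matching rule (for example, requiring a minimum fraction of overlap) would change only the <monospace>overlaps</monospace> helper.</p>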
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Summary of the selected haplotype blocks</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top"><bold>Gene</bold></th>
<th align="left" valign="top"><bold>&#x00023; of loci</bold></th>
<th align="left" valign="top"><bold>Position</bold></th>
<th align="left" valign="top"><bold>Average <italic>R</italic><sup>2</sup></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">ADD1</td>
<td align="center">6</td>
<td align="left" valign="top">2,841,681&#x02013;2,893,241</td>
<td align="left" valign="top">0.1561</td>
</tr>
<tr>
<td align="left" valign="top">CORIN-3</td>
<td align="center">6</td>
<td align="left" valign="top">47,474,045&#x02013;47,531,963</td>
<td align="left" valign="top">0.1477</td>
</tr>
<tr>
<td align="left" valign="top">NR3C2-1</td>
<td align="center">6</td>
<td align="left" valign="top">149,461,059&#x02013;149,491,985</td>
<td align="left" valign="top">0.3305</td>
</tr>
<tr>
<td align="left" valign="top">NR3C2-2</td>
<td align="center">8</td>
<td align="left" valign="top">149,493,152&#x02013;149,496,672</td>
<td align="left" valign="top">0.0852</td>
</tr>
<tr>
<td align="left" valign="top">LOC285501-1</td>
<td align="center">26</td>
<td align="left" valign="top">179,864,756&#x02013;179,949,542</td>
<td align="left" valign="top">0.4225</td>
</tr>
<tr>
<td align="left" valign="top">LOC285501-2</td>
<td align="center">11</td>
<td align="left" valign="top">180,222,608&#x02013;180,252,886</td>
<td align="left" valign="top">0.8002</td>
</tr>
<tr>
<td align="left" valign="top">RP11-404J23.1-1</td>
<td align="center">7</td>
<td align="left" valign="top">180,322,430&#x02013;180,354,002</td>
<td align="left" valign="top">0.7500</td>
</tr>
<tr>
<td align="left" valign="top">RP11-404J23.1-2</td>
<td align="center">24</td>
<td align="left" valign="top">180,373,119&#x02013;180,428,267</td>
<td align="left" valign="top">0.6274</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>All genotypes were simulated using the allele frequencies in the HapMap CEU population. In addition, two-locus LD and multi-locus LD were preserved outside and within the selected haplotype blocks, respectively. Specifically, outside of the haplotype blocks, the alleles at the first locus were generated from Bernoulli(<italic>p</italic><sub>1</sub>), where <italic>p</italic><sub>1</sub> is the population frequency of the B allele in CEU; alleles at other loci were generated using the conditional probabilities given the previous alleles as observed in CEU. Within the selected haplotype blocks, the starting SNPs were simulated as before, conditional on the allele at the previous locus only, but the remaining alleles were simulated as haplotypes based on the conditional haplotype frequencies within the same block in the CEU population.</p>
<p>Second, within the eight selected haplotype blocks, deletion and duplication regions with fixed boundaries were randomly chosen from the population, and the haplotype segment within each CNV region was deleted or inserted. At this stage, the true CNPs at all loci were generated.</p>
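<p>The generation of alleles outside the haplotype blocks can be sketched as a first-order Markov chain along the chromosome. The Python code below is an illustrative sketch only: the function name <monospace>simulate_chromosome</monospace> and the input format (one pair of conditional B-allele probabilities per locus after the first) are assumptions for this example, and the block-wise haplotype sampling is not covered.</p>
<preformat><![CDATA[
```python
import random


def simulate_chromosome(p_first, cond_b, rng):
    """Simulate B-allele indicators (1 = B, 0 = A) along one chromosome.

    p_first : frequency of the B allele at the first locus.
    cond_b  : list of (P(B | previous = A), P(B | previous = B)), one pair per
              subsequent locus (hypothetical input, e.g. estimated from CEU).
    rng     : a random.Random instance, passed in for reproducibility.
    """
    alleles = [1 if rng.random() < p_first else 0]
    for p_given_a, p_given_b in cond_b:
        p = p_given_b if alleles[-1] == 1 else p_given_a
        alleles.append(1 if rng.random() < p else 0)
    return alleles


if __name__ == "__main__":
    rng = random.Random(2013)
    cond = [(0.2, 0.9), (0.3, 0.7), (0.1, 0.8)]  # made-up probabilities
    print(simulate_chromosome(0.4, cond, rng))
```
]]></preformat>
<p>Drawing 2000 such chromosomes and pairing them at random would give the genotypes of 1000 unrelated individuals under this simplified scheme.</p>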
<p>Third, the LRR and BAF values were generated based on the conditional probability distributions discussed in section A with parameters <bold>&#x003BC;</bold> &#x0003D; (&#x02212;2, &#x02212;0.664, 0, 0.4, 1), <bold>&#x003C3;</bold> &#x0003D; (0.5714, 0.28, 0.2, 0.21, 0.3333), <bold>&#x003BC;</bold><sub><italic>b</italic></sub> &#x0003D; (0.1, 0.04, &#x02212;0.02, &#x02212;0.08, &#x02212;0.14) and &#x003C3;<sub><italic>b</italic></sub> &#x0003D; 0.1.</p>
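<p>As a rough illustration of this third step, the Python sketch below draws LRR values from the five-component parameter vectors quoted in the text. Two assumptions are made here and are not confirmed by the text: the five entries are indexed by total copy number 0 to 4, and the LRR given the copy number is normally distributed. The BAF model is omitted because its form is not described in this section.</p>
<preformat><![CDATA[
```python
import random

# Values quoted in the text; indexing them by copy number 0..4 is an
# assumption of this sketch.
MU = (-2.0, -0.664, 0.0, 0.4, 1.0)
SIGMA = (0.5714, 0.28, 0.2, 0.21, 0.3333)


def simulate_lrr(copy_numbers, rng):
    """Draw one LRR value per locus from Normal(MU[c], SIGMA[c]) (assumed form)."""
    return [rng.gauss(MU[c], SIGMA[c]) for c in copy_numbers]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical CNP path: normal, single-copy deletion, normal, duplication.
    path = [2, 2, 1, 1, 1, 2, 2, 3, 3, 2]
    print([round(x, 3) for x in simulate_lrr(path, rng)])
```
]]></preformat>
<p>Passing the random number generator explicitly keeps each simulated data set reproducible from its seed.</p>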
<p>We considered multiple scenarios with varying parameters, specified as follows:
<list list-type="order">
<list-item><p>Different lengths of the CNV regions (3&#x02013;5 loci in short blocks and 5&#x02013;20 loci in long blocks). We fixed the left boundary of each CNV region so that the LD remains the same within each block, and we can use the LD across different haplotype blocks to understand the influence of the LD between the boundary and other SNPs.</p></list-item>
<list-item><p>More and less frequent CNVs. For more frequent CNV regions, we set the population frequencies of deletion and duplication to 20 and 5%, respectively; for less frequent CNV regions, they were set to 5 and 1%.</p></list-item>
<list-item><p>Random missingness of LRR and BAF. We assumed a relatively high missing rate of 1% for all loci and all individuals.</p></list-item>
</list>
</p>
<p>For each scenario, 100 datasets were generated. Based on the observed LRR and BAF, we applied our method, hap-CNP, to call CNPs and compared them with the true CNPs. Several CNV calling procedures were compared, including the HMM in Wang et al. (<xref ref-type="bibr" rid="B37">2009</xref>) (WHMM), PennCNV for unrelated individuals (Wang et al., <xref ref-type="bibr" rid="B38">2007</xref>), the SCAN method (Jeng et al., <xref ref-type="bibr" rid="B19">2010</xref>), an integrative segmentation method, segCNV, which uses the joint distribution of LRR and BAF (Shi and Li, <xref ref-type="bibr" rid="B33">2012</xref>), and cnvHap (Coin et al., <xref ref-type="bibr" rid="B4">2010</xref>), another HMM that uses the two-locus haplotype distribution in its transition probabilities.</p>
<p>PennCNV, SCAN, and segCNV consider only the copy numbers and thus provide a good contrast with our CNP calls. Comparison with WHMM and cnvHap helps us understand how much additional information is gained by using the correlation within multi-locus haplotypes and a flexible boundary assumption.</p>
</sec>
<sec>
<title>4. Real data analysis</title>
<sec>
<title>4.1. Duplicated samples of patients with acute lung injury</title>
<p>In a genome-wide study to investigate the genetic effect on various trauma-induced clinical outcomes, cases with ALI and controls from the at-risk trauma population at the University of Pennsylvania were recruited for genotyping. To control the quality of genotyping, 23 Caucasian ALI patients were randomly chosen to have duplicate serum samples, which were separately genotyped using the Illumina HumanQuad610 BeadChip (Illumina, San Diego). Over 600,000 bin-tagging polymorphisms were included and normalized intensity data for each sample were loaded into Illumina Beadstudio 2.0. See Christie et al. (<xref ref-type="bibr" rid="B3">2012</xref>) for genotyping details.</p>
<p>To check the feasibility and reliability of our proposed method, we applied it to the 23 pairs of samples and compared our CNP calls with those of PennCNV and SCAN and with the normal genotypes called using Illumina&#x00027;s clustering algorithm without considering CNV. The summary statistics of the raw signal data were checked prior to analysis. Samples with extreme values, which generally suggest low quality, were removed from the analysis. We used a sliding window of five SNPs across the whole chromosome to determine haplotype blocks. In each window, haplotype frequencies were estimated from the HapMap genotype data. Because the HapMap genotypes were generated mainly on the Affymetrix platform, they were sometimes reported on the strand complementary to that of our genotypes. We unified the strand orientation of the two sets of genotypes and estimated the corresponding haplotype frequencies using the EM algorithm implemented in the <monospace>haplo.stats</monospace> R package.</p>
<p>We checked the concordance of the CNV genotype calls between our method and the others, and the recovery of CNP calls among the regular genotype calls.</p>
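<p>The windowing and strand unification described above can be sketched in Python as follows. The sketch rests on assumptions not stated in the text: the window is taken to advance one SNP at a time, and genotypes are taken to be strings of nucleotide letters. The function names are hypothetical, and the haplotype frequency estimation itself (performed with <monospace>haplo.stats</monospace> in R) is not reproduced.</p>
<preformat><![CDATA[
```python
def sliding_windows(snp_ids, size=5, step=1):
    """Return consecutive windows of `size` SNPs, advancing by `step` SNPs."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    return [snp_ids[i:i + size] for i in range(0, len(snp_ids) - size + 1, step)]


COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(genotype):
    """Report a genotype such as 'AG' on the complementary strand ('TC')."""
    return "".join(COMPLEMENT[base] for base in genotype)


if __name__ == "__main__":
    snps = ["rs1", "rs2", "rs3", "rs4", "rs5", "rs6", "rs7"]  # made-up identifiers
    for window in sliding_windows(snps):
        print(window)
    print(flip_strand("AG"))
```
]]></preformat>
<p>Note that A/T and C/G SNPs remain ambiguous under strand flipping, so allele frequencies or annotation are typically needed to resolve them.</p>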
</sec>
<sec>
<title>4.2. HapMap CEU samples</title>
<p>We also applied PennCNV, SCAN, WHMM, segCNV, and our method to 90 HapMap CEU samples genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. To assess the accuracy of the various calling methods, we checked the overlap between the CNV calls from these methods and the CNV regions validated through array-CGH (Conrad et al., <xref ref-type="bibr" rid="B7">2010</xref>).</p>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>5. Results</title>
<sec>
<title>5.1. Simulation results</title>
<p>Table <xref ref-type="table" rid="T2">2</xref> summarizes the results from our method for the eight haplotype blocks. The sensitivity ranged from 69.1 to 99.5% for the more frequent case and from 69.5 to 99.7% for the less frequent case, meaning that we can detect most of the CNV regions in all cases. The true positive rate was larger than 97% in general, meaning that the vast majority of our detected regions are true CNV regions.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p><bold>Summary of CNP regions and genotypes called from hap-CNP</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top"><bold>CNV type</bold></th>
<th align="left" valign="top"><bold>Haplotype block</bold></th>
<th align="left" valign="top"><bold>True CNV length</bold></th>
<th align="left" valign="top"><bold>Sensitivity (s.d.)</bold></th>
<th align="left" valign="top"><bold>True positive rate (s.d.)</bold></th>
<th align="left" valign="top"><bold>Within truly detected regions</bold></th>
<th align="left" valign="top"><bold>Within falsely detected regions</bold></th>
</tr>
<tr>
<th/>
<th/>
<th/>
<th/>
<th/>
<th align="left" valign="top"><bold>Length of CNV regions</bold></th>
<th align="left" valign="top"><bold>Length of CNV regions</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="3">More frequent</td>
<td align="left" valign="top">CORIN-3</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.714(0.026)</td>
<td align="left" valign="top">0.999(0.002)</td>
<td align="left" valign="top">3.064(0.662)</td>
<td align="left" valign="top">1.750(0.886)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.810(0.022)</td>
<td align="left" valign="top">0.999(0.002)</td>
<td align="left" valign="top">3.946(0.683)</td>
<td align="left" valign="top">2.500(1.509)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.839(0.025)</td>
<td align="left" valign="top">0.999(0.002)</td>
<td align="left" valign="top">4.794(0.711)</td>
<td align="left" valign="top">2.571(1.272)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">ADD1</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.715(0.028)</td>
<td align="left" valign="top">1.000(0.001)</td>
<td align="left" valign="top">3.027(0.636)</td>
<td align="left" valign="top">1.000(0.000)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.817(0.026)</td>
<td align="left" valign="top">0.999(0.002)</td>
<td align="left" valign="top">3.900(0.660)</td>
<td align="left" valign="top">2.250(1.488)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.865(0.020)</td>
<td align="left" valign="top">1.000(0.001)</td>
<td align="left" valign="top">4.782(0.673)</td>
<td align="left" valign="top">1.000(NA)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">NR3C2-1</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.725(0.028)</td>
<td align="left" valign="top">0.973(0.015)</td>
<td align="left" valign="top">3.187(0.791)</td>
<td align="left" valign="top">1.368(0.797)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.824(0.025)</td>
<td align="left" valign="top">0.972(0.011)</td>
<td align="left" valign="top">3.961(0.663)</td>
<td align="left" valign="top">1.384(0.772)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.880(0.019)</td>
<td align="left" valign="top">0.972(0.013)</td>
<td align="left" valign="top">4.783(0.681)</td>
<td align="left" valign="top">1.467(0.979)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">NR3C2-2</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.713(0.027)</td>
<td align="left" valign="top">0.993(0.006)</td>
<td align="left" valign="top">3.148(0.675)</td>
<td align="left" valign="top">1.847(1.201)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.824(0.029)</td>
<td align="left" valign="top">0.984(0.010)</td>
<td align="left" valign="top">4.034(0.707)</td>
<td align="left" valign="top">1.633(1.105)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.878(0.020)</td>
<td align="left" valign="top">0.941(0.017)</td>
<td align="left" valign="top">5.014(0.806)</td>
<td align="left" valign="top">1.385(0.920)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">LOC285501-2</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.703(0.030)</td>
<td align="left" valign="top">1.000(0.001)</td>
<td align="left" valign="top">3.067(0.790)</td>
<td align="left" valign="top">2.000(0.000)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.767(0.029)</td>
<td align="left" valign="top">0.999(0.002)</td>
<td align="left" valign="top">4.230(1.149)</td>
<td align="left" valign="top">1.833(0.983)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.854(0.023)</td>
<td align="left" valign="top">0.999(0.002)</td>
<td align="left" valign="top">5.015(0.905)</td>
<td align="left" valign="top">2.800(1.398)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">RP11-404J23.1-1</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.691(0.027)</td>
<td align="left" valign="top">0.999(0.002)</td>
<td align="left" valign="top">3.072(0.707)</td>
<td align="left" valign="top">2.100(1.449)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.792(0.027)</td>
<td align="left" valign="top">0.999(0.002)</td>
<td align="left" valign="top">3.978(0.761)</td>
<td align="left" valign="top">1.750(1.165)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.836(0.029)</td>
<td align="left" valign="top">0.999(0.001)</td>
<td align="left" valign="top">4.895(0.770)</td>
<td align="left" valign="top">2.800(0.837)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">LOC285501-1</td>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.866(0.022)</td>
<td align="left" valign="top">0.998(0.003)</td>
<td align="left" valign="top">5.041(0.915)</td>
<td align="left" valign="top">2.647(1.730)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">20</td>
<td align="left" valign="top">0.995(0.005)</td>
<td align="left" valign="top">0.998(0.002)</td>
<td align="left" valign="top">19.875(1.794)</td>
<td align="left" valign="top">2.158(1.344)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">RP11-404J23.1-2</td>
<td align="left" valign="top">15</td>
<td align="left" valign="top">0.988(0.007)</td>
<td align="left" valign="top">0.999(0.002)</td>
<td align="left" valign="top">14.908(1.628)</td>
<td align="left" valign="top">2.769(1.536)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">20</td>
<td align="left" valign="top">0.994(0.005)</td>
<td align="left" valign="top">0.999(0.002)</td>
<td align="left" valign="top">19.863(1.783)</td>
<td align="left" valign="top">2.125(1.586)</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="3">Less frequent</td>
<td align="left" valign="top">CORIN-3</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.720(0.059)</td>
<td align="left" valign="top">0.996(0.009)</td>
<td align="left" valign="top">3.102(0.698)</td>
<td align="left" valign="top">1.333(0.500)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.826(0.044)</td>
<td align="left" valign="top">0.997(0.007)</td>
<td align="left" valign="top">3.965(0.724)</td>
<td align="left" valign="top">2.429(1.397)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.852(0.049)</td>
<td align="left" valign="top">0.997(0.007)</td>
<td align="left" valign="top">4.796(0.759)</td>
<td align="left" valign="top">1.667(0.516)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">ADD1</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.726(0.053)</td>
<td align="left" valign="top">0.997(0.008)</td>
<td align="left" valign="top">3.055(0.698)</td>
<td align="left" valign="top">1.000(0.000)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.829(0.043)</td>
<td align="left" valign="top">0.998(0.006)</td>
<td align="left" valign="top">3.980(0.706)</td>
<td align="left" valign="top">1.833(1.169)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.892(0.040)</td>
<td align="left" valign="top">0.998(0.005)</td>
<td align="left" valign="top">4.806(0.674)</td>
<td align="left" valign="top">1.800(0.837)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">NR3C2-1</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.736(0.052)</td>
<td align="left" valign="top">0.957(0.033)</td>
<td align="left" valign="top">3.074(0.749)</td>
<td align="left" valign="top">1.371(0.711)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.840(0.047)</td>
<td align="left" valign="top">0.952(0.032)</td>
<td align="left" valign="top">3.954(0.697)</td>
<td align="left" valign="top">1.370(0.699)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.881(0.047)</td>
<td align="left" valign="top">0.961(0.022)</td>
<td align="left" valign="top">4.811(0.699)</td>
<td align="left" valign="top">1.413(0.925)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">NR3C2-2</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.736(0.059)</td>
<td align="left" valign="top">0.987(0.018)</td>
<td align="left" valign="top">3.179(0.715)</td>
<td align="left" valign="top">1.893(1.474)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.835(0.054)</td>
<td align="left" valign="top">0.976(0.018)</td>
<td align="left" valign="top">4.085(0.763)</td>
<td align="left" valign="top">1.525(1.058)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.882(0.041)</td>
<td align="left" valign="top">0.912(0.040)</td>
<td align="left" valign="top">5.033(0.836)</td>
<td align="left" valign="top">1.469(1.044)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">LOC285501-2</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.695(0.064)</td>
<td align="left" valign="top">0.996(0.010)</td>
<td align="left" valign="top">3.063(0.841)</td>
<td align="left" valign="top">2.000(1.225)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.776(0.050)</td>
<td align="left" valign="top">0.995(0.012)</td>
<td align="left" valign="top">4.248(1.174)</td>
<td align="left" valign="top">2.182(1.168)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.877(0.043)</td>
<td align="left" valign="top">0.998(0.006)</td>
<td align="left" valign="top">5.090(1.002)</td>
<td align="left" valign="top">1.600(0.548)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">RP11-404J23.1-1</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.702(0.065)</td>
<td align="left" valign="top">0.996(0.009)</td>
<td align="left" valign="top">3.084(0.764)</td>
<td align="left" valign="top">1.857(1.464)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.810(0.060)</td>
<td align="left" valign="top">0.998(0.006)</td>
<td align="left" valign="top">4.028(0.841)</td>
<td align="left" valign="top">3.000(0.707)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.864(0.039)</td>
<td align="left" valign="top">0.997(0.008)</td>
<td align="left" valign="top">4.933(0.835)</td>
<td align="left" valign="top">2.333(1.225)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">LOC285501-1</td>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.898(0.040)</td>
<td align="left" valign="top">0.992(0.012)</td>
<td align="left" valign="top">5.154(1.004)</td>
<td align="left" valign="top">2.476(1.436)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">20</td>
<td align="left" valign="top">0.997(0.007)</td>
<td align="left" valign="top">0.992(0.013)</td>
<td align="left" valign="top">20.002(1.689)</td>
<td align="left" valign="top">2.385(1.329)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">RP11-404J23.1-2</td>
<td align="left" valign="top">15</td>
<td align="left" valign="top">0.989(0.012)</td>
<td align="left" valign="top">0.995(0.008)</td>
<td align="left" valign="top">15.009(1.731)</td>
<td align="left" valign="top">2.846(1.405)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">20</td>
<td align="left" valign="top">0.996(0.008)</td>
<td align="left" valign="top">0.994(0.010)</td>
<td align="left" valign="top">19.951(1.692)</td>
<td align="left" valign="top">2.667(1.589)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As the length of the true CNV interval increased within the same haplotype block, the sensitivity increased and the true positive rate remained similar. With comparable LD, the longer CNV regions across different blocks tended to have higher sensitivity and true positive rates. This is because haplotypes are more likely to be distinguished with longer CNV regions and, in return, the more accurate haplotype information can lead to better CNV detection.</p>
<p>In addition, we found the majority of the false negatives resided on the boundary rather than within the true region, which was also consistently observed in the results from other methods.</p>
<p>The sensitivity remained stable in both common and rare CNV intervals, while the true positive rate was slightly higher for common CNV regions than that for rare CNV regions. Though we expected better performance in common CNV than rare CNV, the sample size we generated was large enough to give reliable haplotype inference. So the frequency of CNV doesn&#x00027;t play much role in studies with a decent sample size (<italic>n</italic> &#x0007E; 1000). For smaller sample size (<italic>n</italic> &#x0003C; 200), common CNVs could lead to much better inference in Haplotypes than rare CNVs and the difference can be larger.</p>
<p>There was no clear trend of sensitivity change as LD gets stronger from CORIN-3, ADD1, NR3C2-1 to LOC285501-2 (overall average <italic>R</italic><sup>2</sup> increases from 0.15 to 0.81). Even the LD is assumed to help the inference of CNP, the excess high LD may not necessarily lead to much accuracy gain and the little gain may be covered by the boundary LD and length of the CNV regions.</p>
<p>In truly detected CNV regions, the average length of the regions was close to the truth, whereas in falsely detected CNV regions the average length was around 1, meaning that most of the falsely detected CNVs were singletons. Conversely, almost all detected singletons were false CNVs, and longer detected regions were more likely to be true CNV regions. The average lengths of detected CNV intervals were similar for common and rare cases, while the variability was larger for the rare case because of the small number of CNV intervals.</p>
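<p>The summary measures discussed above can be illustrated on per-locus calls. The following is only a minimal sketch under stated assumptions, not the evaluation code used in this study: sensitivity is taken to be the fraction of true CNV loci that are called, the true positive rate is taken to be the fraction of called loci that are truly CNV, a detected region is a maximal run of consecutive called loci, and a region counts as truly detected if it overlaps at least one true CNV locus. The function names and the toy data are hypothetical.</p>
<preformat><![CDATA[
```python
from itertools import groupby


def called_regions(calls):
    """Return (start, length) for each maximal run of consecutive called loci."""
    regions, pos = [], 0
    for is_cnv, run in groupby(calls):
        n = len(list(run))
        if is_cnv:
            regions.append((pos, n))
        pos += n
    return regions


def summarize(truth, calls):
    """Per-locus sensitivity and true positive rate, plus detected region lengths.

    Assumed definitions (see the lead-in): a region is 'truly detected'
    if it overlaps at least one true CNV locus, otherwise 'falsely detected'.
    """
    true_pos = sum(1 for t, c in zip(truth, calls) if t and c)
    sensitivity = true_pos / sum(truth)
    true_positive_rate = true_pos / sum(calls)
    true_len, false_len = [], []
    for start, n in called_regions(calls):
        overlaps_truth = any(truth[start:start + n])
        (true_len if overlaps_truth else false_len).append(n)
    return sensitivity, true_positive_rate, true_len, false_len


# Hypothetical toy data: 12 loci with a true CNV of length 4 at loci 3..6.
truth = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# The call misses the boundary locus 3 and adds a singleton false call at locus 10.
calls = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0]

sens, tpr, true_len, false_len = summarize(truth, calls)
```
]]></preformat>
<p>In this toy example the single false negative lies on the boundary of the true region and the only falsely detected region is a singleton, which mirrors the two patterns described above.</p>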
<p>As a comparison, the results from the other methods, including PennCNV, SCAN, WHMM, cnvHap, and segCNV, are summarized in Table <xref ref-type="table" rid="T3">3</xref>. In general, PennCNV underestimated CNV regions, most often when the true CNV regions were short. WHMM, SCAN, and segCNV were more sensitive than PennCNV but slightly less sensitive than our method. Interestingly, although PennCNV detected fewer CNV loci and regions, its detected CNV regions were often longer than the truth. WHMM yielded many more CNV regions than the truth, which resulted in high sensitivity with a small true positive rate. SCAN performed similarly to our method for less frequent cases but had lower sensitivity than ours for more frequent cases. SCAN showed slightly longer falsely detected CNV regions than our method in both cases. segCNV, as a partitioning method, performed similarly to SCAN. cnvHap, the only other haplotype-based method in the comparison, had higher true positive rates than the other methods in general. For short CNV regions, it had better sensitivity than the others, but this advantage quickly diminished as CNV regions became longer. On the one hand, cnvHap&#x00027;s results support the advantage of using haplotype information; on the other hand, its use of two-locus haplotypes may not be ideal for a long block, a limitation that appears to be improved in our method. It is counter-intuitive that cnvHap performed worse for longer CNVs. This is partially because, in the customized normalization step of cnvHap, the results are sensitive to the choice of blocks, even when true values are used.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p><bold>Summary of CNV region and genotype calls from all methods</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top"><bold>Haplotype block</bold></th>
<th align="left" valign="top"><bold>Method</bold></th>
<th align="left" valign="top"><bold>True CNV length</bold></th>
<th align="left" valign="top"><bold>Sensitivity (s.d.)</bold></th>
<th align="left" valign="top"><bold>True positive rate (s.d.)</bold></th>
<th align="left" valign="top"><bold>Within truly detected regions</bold></th>
<th align="left" valign="top"><bold>Within falsely detected regions</bold></th>
</tr>
<tr>
<th/>
<th/>
<th/>
<th/>
<th/>
<th align="left" valign="top"><bold>Length of CNV regions</bold></th>
<th align="left" valign="top"><bold>Length of CNV regions</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="3">NR3C2-1 More frequent</td>
<td align="left" valign="top">hap-CNP</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.725(0.029)</td>
<td align="left" valign="top">0.973(0.015)</td>
<td align="left" valign="top">3.187(0.791)</td>
<td align="left" valign="top">1.368(0.797)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.824(0.025)</td>
<td align="left" valign="top">0.972(0.011)</td>
<td align="left" valign="top">3.961(0.663)</td>
<td align="left" valign="top">1.384(0.772)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.880(0.019)</td>
<td align="left" valign="top">0.972(0.013)</td>
<td align="left" valign="top">4.783(0.681)</td>
<td align="left" valign="top">1.467(0.979)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">PennCNV</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.167(0.024)</td>
<td align="left" valign="top">0.868(0.052)</td>
<td align="left" valign="top">4.252(1.017)</td>
<td align="left" valign="top">3.642(1.737)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.534(0.035)</td>
<td align="left" valign="top">0.912(0.023)</td>
<td align="left" valign="top">4.395(0.611)</td>
<td align="left" valign="top">3.162(1.749)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.755(0.024)</td>
<td align="left" valign="top">0.919(0.023)</td>
<td align="left" valign="top">4.952(0.447)</td>
<td align="left" valign="top">2.970(1.728)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">SCAN</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.556(0.033)</td>
<td align="left" valign="top">0.989(0.008)</td>
<td align="left" valign="top">3.265(1.036)</td>
<td align="left" valign="top">2.065(1.576)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.723(0.030)</td>
<td align="left" valign="top">0.986(0.009)</td>
<td align="left" valign="top">4.062(0.847)</td>
<td align="left" valign="top">1.851(1.422)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.821(0.024)</td>
<td align="left" valign="top">0.984(0.009)</td>
<td align="left" valign="top">4.785(0.712)</td>
<td align="left" valign="top">1.910(1.379)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">WHMM</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.637(0.027)</td>
<td align="left" valign="top">0.968(0.016)</td>
<td align="left" valign="top">1.846(0.839)</td>
<td align="left" valign="top">2.069(1.317)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.747(0.026)</td>
<td align="left" valign="top">0.956(0.013)</td>
<td align="left" valign="top">2.497(1.132)</td>
<td align="left" valign="top">2.059(1.361)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.806(0.025)</td>
<td align="left" valign="top">0.947(0.015)</td>
<td align="left" valign="top">3.317(1.375)</td>
<td align="left" valign="top">2.209(1.419)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">cnvHap</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.888(0.114)</td>
<td align="left" valign="top">1.000(0.001)</td>
<td align="left" valign="top">3.010(0.114)</td>
<td align="left" valign="top">1.000(NA)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.572(0.065)</td>
<td align="left" valign="top">1.000(0.002)</td>
<td align="left" valign="top">4.067(0.250)</td>
<td align="left" valign="top">5.000(NA)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.217(0.079)</td>
<td align="left" valign="top">1.000(0.000)</td>
<td align="left" valign="top">4.992(0.091)</td>
<td align="left" valign="top">NA(NA)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">segCNV</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.275(0.024)</td>
<td align="left" valign="top">0.984(0.017)</td>
<td align="left" valign="top">3.239(0.633)</td>
<td align="left" valign="top">2.546(1.792)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.603(0.029)</td>
<td align="left" valign="top">0.989(0.009)</td>
<td align="left" valign="top">3.360(0.697)</td>
<td align="left" valign="top">2.152(1.564)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.820(0.022)</td>
<td align="left" valign="top">0.987(0.006)</td>
<td align="left" valign="top">4.064(0.724)</td>
<td align="left" valign="top">2.500(1.789)</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="3">NR3C2-1 Less frequent</td>
<td align="left" valign="top">hap-CNP</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.736(0.052)</td>
<td align="left" valign="top">0.957(0.033)</td>
<td align="left" valign="top">3.074(0.749)</td>
<td align="left" valign="top">1.371(0.711)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.840(0.047)</td>
<td align="left" valign="top">0.952(0.032)</td>
<td align="left" valign="top">3.954(0.697)</td>
<td align="left" valign="top">1.370(0.699)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.881(0.047)</td>
<td align="left" valign="top">0.961(0.022)</td>
<td align="left" valign="top">4.811(0.699)</td>
<td align="left" valign="top">1.413(0.925)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">PennCNV</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.123(0.043)</td>
<td align="left" valign="top">0.560(0.145)</td>
<td align="left" valign="top">3.841(0.962)</td>
<td align="left" valign="top">3.556(1.699)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.499(0.068)</td>
<td align="left" valign="top">0.806(0.078)</td>
<td align="left" valign="top">4.231(0.515)</td>
<td align="left" valign="top">3.525(1.814)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.735(0.058)</td>
<td align="left" valign="top">0.834(0.053)</td>
<td align="left" valign="top">4.956(0.406)</td>
<td align="left" valign="top">3.216(1.760)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">SCAN</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.576(0.062)</td>
<td align="left" valign="top">0.982(0.021)</td>
<td align="left" valign="top">2.972(0.788)</td>
<td align="left" valign="top">2.121(1.763)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.751(0.047)</td>
<td align="left" valign="top">0.980(0.020)</td>
<td align="left" valign="top">3.905(0.765)</td>
<td align="left" valign="top">1.826(1.465)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.838(0.055)</td>
<td align="left" valign="top">0.983(0.016)</td>
<td align="left" valign="top">4.749(0.739)</td>
<td align="left" valign="top">2.023(1.640)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">WHMM</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.294(0.064)</td>
<td align="left" valign="top">0.971(0.038)</td>
<td align="left" valign="top">1.615(0.727)</td>
<td align="left" valign="top">1.692(0.970)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.434(0.062)</td>
<td align="left" valign="top">0.970(0.034)</td>
<td align="left" valign="top">2.075(0.989)</td>
<td align="left" valign="top">1.452(0.832)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.535(0.061)</td>
<td align="left" valign="top">0.970(0.028)</td>
<td align="left" valign="top">2.506(1.236)</td>
<td align="left" valign="top">1.694(1.103)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">cnvHap</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.782(0.064)</td>
<td align="left" valign="top">1.000(0.000)</td>
<td align="left" valign="top">3.224(0.432)</td>
<td align="left" valign="top">NA(NA)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.824(0.057)</td>
<td align="left" valign="top">1.000(0.000)</td>
<td align="left" valign="top">4.474(0.500)</td>
<td align="left" valign="top">NA(NA)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.811(0.075)</td>
<td align="left" valign="top">0.998(0.005)</td>
<td align="left" valign="top">4.826(0.379)</td>
<td align="left" valign="top">5.000(0.000)</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">segCNV</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.348(0.060)</td>
<td align="left" valign="top">0.987(0.021)</td>
<td align="left" valign="top">3.118(0.482)</td>
<td align="left" valign="top">3.500(1.871)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.672(0.066)</td>
<td align="left" valign="top">0.990(0.015)</td>
<td align="left" valign="top">3.283(0.540)</td>
<td align="left" valign="top">2.750(1.669)</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.853(0.047)</td>
<td align="left" valign="top">0.990(0.015)</td>
<td align="left" valign="top">4.043(0.695)</td>
<td align="left" valign="top">4.100(2.183)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Tables <xref ref-type="table" rid="T2">2</xref> and <xref ref-type="table" rid="T3">3</xref> demonstrate that our method provides the most sensitive and accurate results among the methods considered.</p>
<p>When we randomly selected 1% of the loci to have missing <italic>b</italic> and <italic>r</italic> values, the CNP at the missing loci was estimated using neighboring markers as described in section 2.5. As shown in Table <xref ref-type="table" rid="T4">4</xref>, the rate of correct recovery is generally higher for longer CNV regions. Missing loci in the rare (less frequent) cases are more likely to be correctly recovered than those in the more frequent cases, because in the less frequent cases there are more loci with normal copy number that can be used for accurate haplotype frequency estimation.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p><bold>Missing recovery using haplotypes</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top"><bold>CNV type</bold></th>
<th align="left" valign="top"><bold>Haplotype block</bold></th>
<th align="left" valign="top"><bold>True CNV length</bold></th>
<th align="left" valign="top"><bold>No. of missing loci</bold></th>
<th align="left" valign="top"><bold>No. of recovered loci</bold></th>
<th align="left" valign="top"><bold>No. of correctly recovered loci</bold></th>
<th align="left" valign="top"><bold>Correctness rate</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="4">More frequent</td>
<td align="left" valign="top">CORIN-3</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">158</td>
<td align="left" valign="top">158</td>
<td align="left" valign="top">79</td>
<td align="left" valign="top">0.500</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">196</td>
<td align="left" valign="top">196</td>
<td align="left" valign="top">155</td>
<td align="left" valign="top">0.791</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">235</td>
<td align="left" valign="top">235</td>
<td align="left" valign="top">181</td>
<td align="left" valign="top">0.770</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">6</td>
<td align="left" valign="top">261</td>
<td align="left" valign="top">261</td>
<td align="left" valign="top">211</td>
<td align="left" valign="top">0.808</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">ADD1</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">151</td>
<td align="left" valign="top">151</td>
<td align="left" valign="top">99</td>
<td align="left" valign="top">0.656</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">175</td>
<td align="left" valign="top">175</td>
<td align="left" valign="top">114</td>
<td align="left" valign="top">0.651</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">263</td>
<td align="left" valign="top">263</td>
<td align="left" valign="top">195</td>
<td align="left" valign="top">0.741</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">NR3C2-1</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">129</td>
<td align="left" valign="top">129</td>
<td align="left" valign="top">65</td>
<td align="left" valign="top">0.504</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">204</td>
<td align="left" valign="top">204</td>
<td align="left" valign="top">134</td>
<td align="left" valign="top">0.657</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">252</td>
<td align="left" valign="top">252</td>
<td align="left" valign="top">187</td>
<td align="left" valign="top">0.742</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">NR3C2-2</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">140</td>
<td align="left" valign="top">140</td>
<td align="left" valign="top">81</td>
<td align="left" valign="top">0.579</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">198</td>
<td align="left" valign="top">198</td>
<td align="left" valign="top">135</td>
<td align="left" valign="top">0.682</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">248</td>
<td align="left" valign="top">248</td>
<td align="left" valign="top">188</td>
<td align="left" valign="top">0.758</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="4">Less frequent</td>
<td align="left" valign="top">CORIN-3</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">34</td>
<td align="left" valign="top">34</td>
<td align="left" valign="top">20</td>
<td align="left" valign="top">0.588</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">44</td>
<td align="left" valign="top">44</td>
<td align="left" valign="top">39</td>
<td align="left" valign="top">0.886</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">56</td>
<td align="left" valign="top">56</td>
<td align="left" valign="top">47</td>
<td align="left" valign="top">0.839</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">6</td>
<td align="left" valign="top">71</td>
<td align="left" valign="top">71</td>
<td align="left" valign="top">58</td>
<td align="left" valign="top">0.817</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">ADD1</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">29</td>
<td align="left" valign="top">29</td>
<td align="left" valign="top">16</td>
<td align="left" valign="top">0.552</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">41</td>
<td align="left" valign="top">41</td>
<td align="left" valign="top">28</td>
<td align="left" valign="top">0.683</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">65</td>
<td align="left" valign="top">65</td>
<td align="left" valign="top">47</td>
<td align="left" valign="top">0.723</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">NR3C2-1</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">32</td>
<td align="left" valign="top">32</td>
<td align="left" valign="top">21</td>
<td align="left" valign="top">0.656</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">43</td>
<td align="left" valign="top">43</td>
<td align="left" valign="top">27</td>
<td align="left" valign="top">0.628</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">55</td>
<td align="left" valign="top">55</td>
<td align="left" valign="top">44</td>
<td align="left" valign="top">0.800</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">NR3C2-2</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">43</td>
<td align="left" valign="top">43</td>
<td align="left" valign="top">25</td>
<td align="left" valign="top">0.581</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">4</td>
<td align="left" valign="top">58</td>
<td align="left" valign="top">58</td>
<td align="left" valign="top">35</td>
<td align="left" valign="top">0.603</td>
</tr>
<tr>
<td/>
<td/>
<td align="left" valign="top">5</td>
<td align="left" valign="top">63</td>
<td align="left" valign="top">63</td>
<td align="left" valign="top">52</td>
<td align="left" valign="top">0.825</td>
</tr>
</tbody>
</table>
</table-wrap>
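<p>The two trends read from Table <xref ref-type="table" rid="T4">4</xref> can be checked by pooling its counts. The sketch below copies the numbers of missing and correctly recovered loci from the table and computes a pooled correctness rate per CNV type and per true CNV length. The pooling across haplotype blocks is an illustrative choice made here and is not a summary reported in the study; the variable and function names are hypothetical.</p>
<preformat><![CDATA[
```python
# Rows copied from Table 4:
# (haplotype block, true CNV length, no. of missing loci, no. correctly recovered).
MORE_FREQUENT = [
    ("CORIN-3", 3, 158, 79), ("CORIN-3", 4, 196, 155),
    ("CORIN-3", 5, 235, 181), ("CORIN-3", 6, 261, 211),
    ("ADD1", 3, 151, 99), ("ADD1", 4, 175, 114), ("ADD1", 5, 263, 195),
    ("NR3C2-1", 3, 129, 65), ("NR3C2-1", 4, 204, 134), ("NR3C2-1", 5, 252, 187),
    ("NR3C2-2", 3, 140, 81), ("NR3C2-2", 4, 198, 135), ("NR3C2-2", 5, 248, 188),
]
LESS_FREQUENT = [
    ("CORIN-3", 3, 34, 20), ("CORIN-3", 4, 44, 39),
    ("CORIN-3", 5, 56, 47), ("CORIN-3", 6, 71, 58),
    ("ADD1", 3, 29, 16), ("ADD1", 4, 41, 28), ("ADD1", 5, 65, 47),
    ("NR3C2-1", 3, 32, 21), ("NR3C2-1", 4, 43, 27), ("NR3C2-1", 5, 55, 44),
    ("NR3C2-2", 3, 43, 25), ("NR3C2-2", 4, 58, 35), ("NR3C2-2", 5, 63, 52),
]


def pooled_rate(rows):
    """Correctly recovered loci divided by missing loci, pooled over all rows."""
    return sum(r[3] for r in rows) / sum(r[2] for r in rows)


def rate_by_length(rows):
    """Pooled correctness rate for each true CNV length (pooled across blocks)."""
    totals = {}
    for _block, length, missing, correct in rows:
        m, c = totals.get(length, (0, 0))
        totals[length] = (m + missing, c + correct)
    return {length: c / m for length, (m, c) in sorted(totals.items())}


pooled_more = pooled_rate(MORE_FREQUENT)
pooled_less = pooled_rate(LESS_FREQUENT)
by_length_more = rate_by_length(MORE_FREQUENT)
by_length_less = rate_by_length(LESS_FREQUENT)

print(f"more frequent: {pooled_more:.3f}, less frequent: {pooled_less:.3f}")
print("by length (more frequent):", {k: round(v, 3) for k, v in by_length_more.items()})
print("by length (less frequent):", {k: round(v, 3) for k, v in by_length_less.items()})
```
]]></preformat>
<p>Under this pooling, the correctness rate is somewhat higher for the less frequent cases and increases with the true CNV length in both settings, in line with the description of Table <xref ref-type="table" rid="T4">4</xref>.</p>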
<p>For haplotype inference, we compared the haplotype frequency estimates from our method with those from HaploView using all individuals with normal copy numbers. Within the NR3C2 block, the sums of squared errors of the estimated frequencies vs. the true frequencies had means of 0.0007 for our method and 0.0011 for HaploView. This indicates that haplotypes are inferred more accurately when CNP is used, a byproduct of our method. We found that our method gives similar results regardless of the length of the CNV interval, but the estimates can be more accurate in common CNV regions than in less frequent CNV regions (data not shown). This is not surprising because, with common CNV regions, more individuals carry CNVs and hence the more informative haplotypes.</p>
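<p>For concreteness, the sum of squared errors between an estimated and a true set of haplotype frequencies can be computed as in the following sketch. The haplotype labels and frequencies are invented for illustration and are not taken from the NR3C2 block; a haplotype that is absent from one set is treated as having frequency zero there, which is an assumption of this sketch.</p>
<preformat><![CDATA[
```python
def sum_squared_errors(estimated, true):
    """Sum of squared differences between two haplotype frequency tables.

    Both arguments map a haplotype label to its frequency; a haplotype
    missing from one table is assumed to have frequency 0 in that table.
    """
    haplotypes = set(estimated) | set(true)
    return sum(
        (estimated.get(h, 0.0) - true.get(h, 0.0)) ** 2 for h in haplotypes
    )


# Hypothetical frequencies for a three-SNP block (illustration only).
true_freq = {"AAB": 0.50, "ABB": 0.30, "BBA": 0.20}
est_freq = {"AAB": 0.48, "ABB": 0.31, "BBA": 0.19, "BAA": 0.02}

sse = sum_squared_errors(est_freq, true_freq)
print(f"SSE = {sse:.4f}")
```
]]></preformat>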
<p>Based on 500,000 simulations, the average computing times for one chromosome were 10.5 s for PennCNV, 77.7 s for WHMM, 1.0 s for SCAN, 3.2 s for segCNV, and 81.5 s for our hap-CNP. cnvHap requires customized input for each haplotype block, so we ran it only over our tested blocks, which took 137.7 s per simulation. As expected, SCAN is extremely fast; PennCNV is efficient, using the forward-backward algorithm for HMMs, and was developed to call CNVs only, without allelic specification. WHMM and our hap-CNP are comparable, even though we allow copy numbers to range from 0 to 4, which gives more states than the range of 1&#x02013;3 in WHMM. cnvHap requires more computing time and would be challenging to run over a whole chromosome, mostly because the number of &#x0201C;haplotypes&#x0201D; increases considerably with all possible CNVs.</p>
</sec>
<sec>
<title>5.2. CNPs on chromosome 1 of duplicated samples</title>
<p>Without loss of generality, we report the results of CNP calls on chromosome 1. Among the 46 samples, two had either an extremely large variance or an extremely large median absolute deviation and were therefore removed from further analysis. Among all subjects, 1317 loci had missing LRR or BAF values, of which 22 were recovered by our algorithm.</p>
<p>Table <xref ref-type="table" rid="T5">5</xref> summarizes the number of copy number calls on chromosome 1 from one set of samples against the calls from the other set of samples, using our method, PennCNV, and SCAN.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p><bold>Concordance of copy numbers between duplicated samples</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th align="center" valign="top" colspan="5"><bold>hap-CNP</bold></th>
<th align="center" valign="top" colspan="5"><bold>SCAN</bold></th>
<th align="center" valign="top" colspan="5"><bold>PennCNV</bold></th>
</tr>
<tr>
<th/>
<th align="left" valign="top"><bold>0</bold></th>
<th align="left" valign="top"><bold>1</bold></th>
<th align="left" valign="top"><bold>2</bold></th>
<th align="left" valign="top"><bold>3</bold></th>
<th align="left" valign="top"><bold>NC</bold><xref ref-type="table-fn" rid="TN1"><sup>&#x0002A;</sup></xref></th>
<th align="left" valign="top"><bold>0</bold></th>
<th align="left" valign="top"><bold>1</bold></th>
<th align="left" valign="top"><bold>2</bold></th>
<th align="left" valign="top"><bold>3</bold></th>
<th align="left" valign="top"><bold>NC</bold><xref ref-type="table-fn" rid="TN1"><sup>&#x0002A;</sup></xref></th>
<th align="left" valign="top"><bold>0</bold></th>
<th align="left" valign="top"><bold>1</bold></th>
<th align="left" valign="top"><bold>2</bold></th>
<th align="left" valign="top"><bold>3</bold></th>
<th align="left" valign="top"><bold>NC</bold><xref ref-type="table-fn" rid="TN1"><sup>&#x0002A;</sup></xref></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">0</td>
<td align="right">95</td>
<td align="right">10</td>
<td align="right">7</td>
<td align="right">0</td>
<td align="right">2</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">3</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
</tr>
<tr>
<td align="left" valign="top">1</td>
<td align="right">10</td>
<td align="right">649</td>
<td align="right">1756</td>
<td align="right">6</td>
<td align="right">7</td>
<td align="right">0</td>
<td align="right">868</td>
<td align="right">1718</td>
<td align="right">7</td>
<td align="right">5</td>
<td align="right">0</td>
<td align="right">142</td>
<td align="right">78</td>
<td align="right">0</td>
<td align="right">0</td>
</tr>
<tr>
<td align="left" valign="top">2</td>
<td align="right">23</td>
<td align="right">8096</td>
<td align="right">973,114</td>
<td align="right">1195</td>
<td align="right">1050</td>
<td align="right">0</td>
<td align="right">9734</td>
<td align="right">962,981</td>
<td align="right">4093</td>
<td align="right">1054</td>
<td align="right">0</td>
<td align="right">217</td>
<td align="right">987,122</td>
<td align="right">123</td>
<td align="right">970</td>
</tr>
<tr>
<td align="left" valign="top">3</td>
<td align="right">1</td>
<td align="right">149</td>
<td align="right">2533</td>
<td align="right">379</td>
<td align="right">2</td>
<td align="right">0</td>
<td align="right">106</td>
<td align="right">7922</td>
<td align="right">575</td>
<td align="right">7</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">154</td>
<td align="right">265</td>
<td align="right">1</td>
</tr>
<tr>
<td align="left" valign="top">NC<xref ref-type="table-fn" rid="TN1"><sup>&#x0002A;</sup></xref></td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">133</td>
<td align="right">0</td>
<td align="right">53</td>
<td align="right">0</td>
<td align="right">1</td>
<td align="right">143</td>
<td align="right">1</td>
<td align="right">53</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">146</td>
<td align="right">0</td>
<td align="right">47</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN1"><label>&#x0002A;</label><p><italic>No call</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>The copy number concordance rates were 98.6, 97.5, and 99.8% for our method, SCAN, and PennCNV, respectively. PennCNV showed a higher concordance rate owing to its conservative detection of CNVs, whereas our method and SCAN detected more CNV loci. The copy number discordance rate was 1.00% for our method, mainly caused by three individuals whose <italic>r</italic> values were further away from those of the majority of individuals.</p>
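<p>A concordance rate can be read off a cross-tabulation such as Table <xref ref-type="table" rid="T5">5</xref> as the sum of the diagonal cells divided by a total. The sketch below copies the counts from the table and evaluates two possible denominators, because the text does not state how no-calls (NC) are handled: all pairs, or only pairs in which both duplicates received a call. Both conventions are assumptions of this sketch, so its output is expected to be close to the reported rates without necessarily matching them to the last digit.</p>
<preformat><![CDATA[
```python
# Counts copied from Table 5; rows and columns are ordered 0, 1, 2, 3, NC.
TABLES = {
    "hap-CNP": [
        [95, 10, 7, 0, 2],
        [10, 649, 1756, 6, 7],
        [23, 8096, 973114, 1195, 1050],
        [1, 149, 2533, 379, 2],
        [0, 0, 133, 0, 53],
    ],
    "SCAN": [
        [0, 0, 0, 0, 0],
        [0, 868, 1718, 7, 5],
        [0, 9734, 962981, 4093, 1054],
        [0, 106, 7922, 575, 7],
        [0, 1, 143, 1, 53],
    ],
    "PennCNV": [
        [3, 0, 0, 0, 0],
        [0, 142, 78, 0, 0],
        [0, 217, 987122, 123, 970],
        [0, 0, 154, 265, 1],
        [0, 0, 146, 0, 47],
    ],
}


def concordance(table, include_no_calls):
    """Fraction of duplicate pairs with the same copy number call.

    The last row and column are the no-call category. With
    include_no_calls=False the denominator is restricted to pairs in
    which both duplicates were called (an assumed convention).
    """
    k = len(table) - 1  # index of the NC row/column
    agree = sum(table[i][i] for i in range(k))
    if include_no_calls:
        total = sum(sum(row) for row in table)
    else:
        total = sum(table[i][j] for i in range(k) for j in range(k))
    return agree / total


rates = {
    method: (concordance(t, False), concordance(t, True))
    for method, t in TABLES.items()
}
for method, (called_only, all_pairs) in rates.items():
    print(f"{method}: called pairs {called_only:.4f}, all pairs {all_pairs:.4f}")
```
]]></preformat>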
<p>Table <xref ref-type="table" rid="T6">6</xref> summarizes the number of normal SNP genotype calls from one set of samples compared with the other set of samples. The concordance rate of regular genotypes (<sc>cn</sc> &#x0003D; 2) was 99.95% for our method and 99.99% for BeadStudio. However, our method had a no-call rate of 0.13%, much smaller than the 4.35% from BeadStudio. BeadStudio produces a large number of no-calls to keep its concordance rate almost perfect, whereas our method extracts more information from BeadStudio&#x00027;s no-calls: among the 43,018 no-call loci from BeadStudio, 38,962 were genotyped concordantly by our method. Hence, there is a trade-off between the no-call rate and the concordance rate.</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p><bold>Concordance of normal genotypes between duplicated samples</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th align="center" valign="top" colspan="4"><bold>hap-CNP</bold></th>
<th align="center" valign="top" colspan="4"><bold>BeadStudio</bold></th>
</tr>
<tr>
<th/>
<th/>
<th align="left" valign="top"><bold>AA</bold></th>
<th align="left" valign="top"><bold>AB</bold></th>
<th align="left" valign="top"><bold>BB</bold></th>
<th align="left" valign="top"><bold>NC<xref ref-type="table-fn" rid="TN2"><sup>&#x0002A;</sup></xref></bold></th>
<th/>
<th align="left" valign="top"><bold>AA</bold></th>
<th align="left" valign="top"><bold>AB</bold></th>
<th align="left" valign="top"><bold>BB</bold></th>
<th align="left" valign="top"><bold>NC<xref ref-type="table-fn" rid="TN2"><sup>&#x0002A;</sup></xref></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">AA</td>
<td/>
<td align="right" valign="top">315,664</td>
<td align="right" valign="top">89</td>
<td align="right" valign="top">7</td>
<td align="right" valign="top">394</td>
<td/>
<td align="right" valign="top">300,050</td>
<td align="right" valign="top">12</td>
<td align="right" valign="top">7</td>
<td align="right" valign="top">698</td>
</tr>
<tr>
<td align="left" valign="top">AB</td>
<td/>
<td align="right" valign="top">162</td>
<td align="right" valign="top">301,801</td>
<td align="right" valign="top">123</td>
<td align="right" valign="top">297</td>
<td/>
<td align="right" valign="top">11</td>
<td align="right" valign="top">299,085</td>
<td align="right" valign="top">36</td>
<td align="right" valign="top">579</td>
</tr>
<tr>
<td align="left" valign="top">BB</td>
<td/>
<td align="right" valign="top">1</td>
<td align="right" valign="top">127</td>
<td align="right" valign="top">358,748</td>
<td align="right" valign="top">354</td>
<td/>
<td align="right" valign="top">6</td>
<td align="right" valign="top">24</td>
<td align="right" valign="top">347,019</td>
<td align="right" valign="top">565</td>
</tr>
<tr>
<td align="left" valign="top">NC<xref ref-type="table-fn" rid="TN2"><sup>&#x0002A;</sup></xref></td>
<td/>
<td align="right" valign="top">37</td>
<td align="right" valign="top">36</td>
<td align="right" valign="top">51</td>
<td align="right" valign="top">53</td>
<td/>
<td align="right" valign="top">94</td>
<td align="right" valign="top">167</td>
<td align="right" valign="top">272</td>
<td align="right" valign="top">40,643</td>
</tr>
<tr>
<td align="left" valign="top">Total</td>
<td/>
<td align="right" valign="top">315,864</td>
<td align="right" valign="top">302,057</td>
<td align="right" valign="top">358,748</td>
<td align="right" valign="top">1098</td>
<td/>
<td align="right" valign="top">300,161</td>
<td align="right" valign="top">299,298</td>
<td align="right" valign="top">347,334</td>
<td align="right" valign="top">42,485</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN2"><label>&#x0002A;</label><p><italic>No call</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
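<p>The genotype concordance and no-call rates quoted for Table <xref ref-type="table" rid="T6">6</xref> can be approximately reproduced from its cell counts. The sketch below assumes that concordance is computed among pairs in which both duplicates received a genotype call, and that the no-call rate is the fraction of pairs in which at least one duplicate was not called. These definitions are inferred from the counts rather than stated in the text, and the names are hypothetical.</p>
<preformat><![CDATA[
```python
# Counts copied from the body of Table 6; rows and columns are AA, AB, BB, NC.
HAP_CNP = [
    [315664, 89, 7, 394],
    [162, 301801, 123, 297],
    [1, 127, 358748, 354],
    [37, 36, 51, 53],
]
BEADSTUDIO = [
    [300050, 12, 7, 698],
    [11, 299085, 36, 579],
    [6, 24, 347019, 565],
    [94, 167, 272, 40643],
]


def genotype_summary(table):
    """Return (concordance among called pairs, no-call rate, no-call count).

    Assumed definitions: a pair is 'called' when neither duplicate is NC;
    the no-call count is every pair with at least one NC.
    """
    k = len(table) - 1  # index of the NC row/column
    total = sum(sum(row) for row in table)
    both_called = sum(table[i][j] for i in range(k) for j in range(k))
    agree = sum(table[i][i] for i in range(k))
    no_calls = total - both_called
    return agree / both_called, no_calls / total, no_calls


hap_conc, hap_nc_rate, hap_nc = genotype_summary(HAP_CNP)
bs_conc, bs_nc_rate, bs_nc = genotype_summary(BEADSTUDIO)

print(f"hap-CNP:    concordance {hap_conc:.4%}, no-call rate {hap_nc_rate:.3%}")
print(f"BeadStudio: concordance {bs_conc:.4%}, no-call rate {bs_nc_rate:.3%}")
```
]]></preformat>
<p>With these definitions, the BeadStudio no-call count equals the 43,018 loci mentioned in the text, and the concordance rates agree with the reported 99.95 and 99.99% after rounding.</p>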
</sec>
<sec>
<title>5.3. CNPs on chromosome 1 of CEU samples</title>
<p>Table <xref ref-type="table" rid="T7">7</xref> summarizes the CNV regions on chromosome 1 of the CEU samples detected by PennCNV, SCAN, segCNV, WHMM, and our hap-CNP, together with their overlap with the CNV regions validated through array CGH and with each other. Of the total of 26.5 Mb of CNV regions detected by hap-CNP, more than 80% lie in the validated regions. PennCNV and SCAN detected much less, but most of their detected regions are covered by the validation set. Overall, 70, 57, 67, and 78% of the CNV regions we detected were unique in the sense that they were not found by PennCNV, SCAN, segCNV, and WHMM, respectively. Although the majority of the detected CNV regions were among the validated regions, the percentage of all validated regions covered by each approach was tiny (0.13, 0.06, 0.10, 0.09, and 5.16%, respectively), suggesting that the genotyping platform may have limited sensitivity for CNV detection compared with aCGH. When we checked individual calls, most of the long CNV regions were called by all five algorithms, but no single algorithm was consistently optimal. Example regions from three samples are shown in Figure <xref ref-type="fig" rid="F2">2</xref>: long deletion and duplication regions were detected by all five methods in <bold>(A)</bold> and <bold>(B)</bold>, but a small deletion CNV region was detected only by our method.</p>
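<p>The percentages of unique regions quoted above follow from the hap-CNP row of Table <xref ref-type="table" rid="T7">7</xref>. The short sketch below assumes that the unique fraction relative to another method is simply 100% minus the pairwise overlap percentage, and that the validated portion in Mb is the total length multiplied by the aCGH overlap percentage. Both are readings of the table made here for illustration.</p>
<preformat><![CDATA[
```python
# hap-CNP row of Table 7: total CNV length (Mb) and overlap percentages.
HAP_CNP_TOTAL_MB = 26.53
HAP_CNP_ACGH_OVERLAP = 82.6
HAP_CNP_OVERLAP = {"PennCNV": 30.0, "SCAN": 43.3, "segCNV": 32.7, "WHMM": 21.7}

# Assumed reading: regions not overlapping a method's calls are unique to hap-CNP.
unique_pct = {
    method: round(100 - overlap) for method, overlap in HAP_CNP_OVERLAP.items()
}

# Length of hap-CNP regions falling in aCGH-validated regions, in Mb.
validated_mb = HAP_CNP_TOTAL_MB * HAP_CNP_ACGH_OVERLAP / 100

print(unique_pct)
print(f"{validated_mb:.2f} Mb of {HAP_CNP_TOTAL_MB} Mb overlap aCGH regions")
```
]]></preformat>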
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p><bold>Number of CNV calls on chromosome 1 of CEU samples</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th align="center" valign="top"><bold>Total CNV (Mb)</bold></th>
<th align="center" valign="top" colspan="6"><bold>Overlap with</bold></th>
</tr>
<tr>
<th/>
<th/>
<th align="left" valign="top"><bold>aCGH (%)</bold></th>
<th align="left" valign="top"><bold>hap-CNP (%)</bold></th>
<th align="left" valign="top"><bold>PennCNV (%)</bold></th>
<th align="left" valign="top"><bold>SCAN (%)</bold></th>
<th align="left" valign="top"><bold>segCNV (%)</bold></th>
<th align="left" valign="top"><bold>WHMM (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">hap-CNP</td>
<td align="char" char=".">26.53</td>
<td align="char" char=".">82.6</td>
<td align="left" valign="top">&#x02013;</td>
<td align="char" char=".">30.0</td>
<td align="char" char=".">43.3</td>
<td align="char" char=".">32.7</td>
<td align="char" char=".">21.7</td>
</tr>
<tr>
<td align="left" valign="top">PennCNV</td>
<td align="char" char=".">8.39</td>
<td align="char" char=".">90.2</td>
<td align="char" char=".">87.1</td>
<td align="left" valign="top">&#x02013;</td>
<td align="char" char=".">82.9</td>
<td align="char" char=".">71.2</td>
<td align="char" char=".">27.8</td>
</tr>
<tr>
<td align="left" valign="top">SCAN</td>
<td align="char" char=".">18.37</td>
<td align="char" char=".">87.5</td>
<td align="char" char=".">61.8</td>
<td align="char" char=".">38.8</td>
<td align="left" valign="top">&#x02013;</td>
<td align="char" char=".">48.5</td>
<td align="char" char=".">20.0</td>
</tr>
<tr>
<td align="left" valign="top">segCNV</td>
<td align="char" char=".">17.37</td>
<td align="char" char=".">82.7</td>
<td align="char" char=".">49.4</td>
<td align="char" char=".">37.0</td>
<td align="char" char=".">51.5</td>
<td align="left" valign="top">&#x02013;</td>
<td align="char" char=".">16.8</td>
</tr>
<tr>
<td align="left" valign="top">WHMM</td>
<td align="char" char=".">1097.15</td>
<td align="char" char=".">72.7</td>
<td align="char" char=".">0.4</td>
<td align="char" char=".">0.2</td>
<td align="char" char=".">0.3</td>
<td align="char" char=".">0.3</td>
<td align="left" valign="top">&#x02013;</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Detected CNV intervals.</bold> Three regions from three different HapMap samples were taken to show the detected regions. <bold>(A,B)</bold> Show deleted and duplicated regions detected by all five methods. <bold>(C)</bold> Shows a deleted region detected only by our method. All regions are located on chromosome 1.</p></caption>
<graphic xlink:href="fgene-04-00165-g0002.tif"/>
</fig>
<p>Computation times were on a similar scale to those in the simulations. For the 90 CEU individuals, the average computing times on chromosome 1 were 1.0, 16.9, 3.2, 48.2, and 28.8 s for SCAN, PennCNV, segCNV, WHMM, and our method, respectively.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>6. Discussion</title>
<p>SNPs and CNVs may affect phenotypes separately or jointly, and the accuracy of their calls can affect the results of association studies. Ignoring CNVs during SNP genotyping may lead to failure to capture the true underlying sequence at many sites and can create the appearance of violations of Mendelian inheritance or Hardy&#x02013;Weinberg equilibrium where in fact none exist. Using only CNVs while ignoring allelic information in association studies may fail to capture allele-specific gains and losses and diminish the potential to exploit LD between CNVs and nearby SNPs (ongoing study). Association analysis using copy number alone, without differentiating alleles, can dilute the effect size and reduce power, as shown in both simulations and real studies of insulin resistance and schizophrenia (Hu et al., under review; Irvin et al., <xref ref-type="bibr" rid="B18">2011</xref>).</p>
<p>We did not separate LOH from the CNV calls, but LOH can be identified as a special class of our calls, i.e., regions with <italic>c</italic><sub><italic>A</italic></sub> &#x000B7; <italic>c</italic><sub><italic>B</italic></sub> &#x0003D; 0 and <italic>c</italic><sub><italic>A</italic></sub> &#x0002B; <italic>c</italic><sub><italic>B</italic></sub> &#x0003D; 2. Owing to the limitations of genotyping platforms, our method cannot detect interchromosomal duplications or dispersed segmental duplications, which can be discovered using genomic sequence data.</p>
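<p>The LOH check described above can be sketched as a simple post-processing step on allele-specific copy number calls. The Python example below is a minimal illustration and not part of the hap-CNP implementation: the function names and the <monospace>min_run</monospace> threshold are hypothetical. A minimum run length is assumed because any single homozygous locus with normal copy number also satisfies the two conditions, so only a stretch of consecutive loci is suggestive of LOH.</p>
<preformat>
```python
from itertools import groupby


def is_loh_locus(c_a, c_b):
    """True when one allele is absent and the total copy number is 2."""
    return c_a * c_b == 0 and c_a + c_b == 2


def loh_regions(calls, min_run=2):
    """Return (start, end) index ranges, end exclusive, of consecutive loci
    that satisfy the LOH condition and span at least `min_run` loci.

    `calls` is a list of (c_A, c_B) allele-specific copy numbers ordered
    along the chromosome; `min_run` is a hypothetical threshold.
    """
    regions = []
    position = 0
    for flag, group in groupby(calls, key=lambda c: is_loh_locus(*c)):
        run_length = len(list(group))
        if flag and run_length >= min_run:
            regions.append((position, position + run_length))
        position += run_length
    return regions


# Toy calls (made up): loci 1 to 3 form a candidate LOH stretch, locus 5 is a
# hemizygous deletion (total copy number 1), locus 6 is an isolated homozygote.
calls = [(1, 1), (2, 0), (0, 2), (2, 0), (1, 1), (1, 0), (0, 2)]
print(loh_regions(calls, min_run=2))
```
</preformat>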
<p>Our assumption for haplotype-based CNV inference means that, within a region, a deleted or duplicated segment cannot end at locus <italic>T</italic> on one chromosome and then start again immediately at locus <italic>T</italic> &#x0002B; 1 on the other. If this does occur in reality, an additional parameter for the probability of such an event can be incorporated into the likelihood, at the cost of a much longer computation time.</p>
<p>In duplication regions, our method also relies on the assumption that a nearby haplotype is duplicated. In reality, exceptions could occur, which may affect the performance of our method in uncertain ways. Users should therefore be cautious about inference in duplication regions when this assumption is in doubt. Further investigation of how likely such exceptions are and what bias they introduce is warranted.</p>
<p>The genotypes at loci with missing LRR/BAF values can be inferred using the neighboring haplotype information. However, depending on whether such loci lie at the boundary of or within a CNV region, their copy numbers may not be recovered accurately. The same approach can also be used to impute CNP genotypes for individuals who were genotyped on a platform different from the others.</p>
<p>We used the estimates from an HMM as the initial values for our proposed haplotype-based method to expedite the computation. We also checked the robustness of our method using other initial estimates, such as clusters based on arbitrary cut-off values for LRR and BAF, and found that the performance of our method was consistent. For long regions, the initial calls from the HMM were generally reliable and left little room for improvement. For short regions, however, where our model assumptions are more likely to be met, our method yielded more reliable and accurate calls. These findings are consistent with those reported in Wang et al. (<xref ref-type="bibr" rid="B37">2009</xref>).</p>
<p>Whether real chromosomes can be partitioned into unrelated haplotype blocks is still an open question, but early studies (Daly et al., <xref ref-type="bibr" rid="B8">2001</xref>; Patil et al., <xref ref-type="bibr" rid="B28">2001</xref>; Dawson et al., <xref ref-type="bibr" rid="B9">2002</xref>; Gabriel et al., <xref ref-type="bibr" rid="B14">2002</xref>; Zhang et al., <xref ref-type="bibr" rid="B41">2003</xref>) have shown the rationale for and feasibility of representing chromosomes as separate blocks. We therefore adopted known haplotype blocks in our simulation. As a limitation of the algorithm, the generated data can only have local LD patterns similar to those in the HapMap CEU population.</p>
<p>In the analysis of the HapMap data, we used a sliding-window approach to avoid having to select haplotype blocks. We tested a few similar window sizes, which made little difference to the results. However, longer blocks can cause problems in haplotype estimation and slow down the algorithm, even though they worked well in our simulations. In addition, since longer CNVs are more likely to be detected and leave less room for improvement, using long sliding windows may not be efficient in a whole-genome scan.</p>
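<p>For readers unfamiliar with sliding windows, the Python sketch below shows one way to partition the SNP indices of a chromosome into overlapping windows. It is only an illustration: the function name, the window size, and the step are hypothetical, and whether and by how much the windows in our analysis overlap is not specified here.</p>
<preformat>
```python
def sliding_windows(n_snps, window, step):
    """Yield (start, end) SNP index ranges, end exclusive, that together
    cover indices 0 .. n_snps - 1. Requires 1 &#x2264; step &#x2264; window so that
    consecutive windows leave no gap; the last window may be shorter."""
    if not (step >= 1 and window >= step):
        raise ValueError("need window >= step >= 1")
    if n_snps == 0:
        return
    start = 0
    while True:
        end = min(start + window, n_snps)
        yield (start, end)
        if end == n_snps:
            break
        start += step


# Hypothetical example: 10 SNPs, windows of 4 SNPs moved 2 SNPs at a time.
for start, end in sliding_windows(10, window=4, step=2):
    print(start, end)
```
</preformat>
<p>Under this scheme a shorter window keeps each haplotype estimation problem small, in line with the observation above that long windows slow down the algorithm.</p>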
<p>The mutual benefits of haplotype and CNP inference can also be exploited for other types of data, such as next-generation sequencing data, as in our ongoing work.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ack>
<p>Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM088566.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barnes</surname> <given-names>C.</given-names></name> <name><surname>Plagnol</surname> <given-names>V.</given-names></name> <name><surname>Fitzgerald</surname> <given-names>T.</given-names></name> <name><surname>Redon</surname> <given-names>R.</given-names></name> <name><surname>Marchini</surname> <given-names>J.</given-names></name> <name><surname>Clayton</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>A robust statistical method for case-control association testing with copy number variation</article-title>. <source>Nat. Genet</source>. <volume>40</volume>, <fpage>1245</fpage>&#x02013;<lpage>1252</lpage>. <pub-id pub-id-type="doi">10.1038/ng.206</pub-id><pub-id pub-id-type="pmid">18776912</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baum</surname> <given-names>L. E.</given-names></name> <name><surname>Petrie</surname> <given-names>T.</given-names></name> <name><surname>Soules</surname> <given-names>G.</given-names></name> <name><surname>Weiss</surname> <given-names>N.</given-names></name></person-group> (<year>1970</year>). <article-title>A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains</article-title>. <source>Ann. Math. Statist</source>. <volume>41</volume>, <fpage>164</fpage>&#x02013;<lpage>171</lpage>. <pub-id pub-id-type="doi">10.1214/aoms/1177697196</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Christie</surname> <given-names>J. D.</given-names></name> <name><surname>Wurfel</surname> <given-names>M. M.</given-names></name> <name><surname>Feng</surname> <given-names>R.</given-names></name> <name><surname>O&#x00027;Keefe</surname> <given-names>G. E.</given-names></name> <name><surname>Bradfield</surname> <given-names>J.</given-names></name> <name><surname>Ware</surname> <given-names>L. B.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Genome wide association identifies PPFIA1 as a candidate gene for acute lung injury risk following major trauma</article-title>. <source>PLoS ONE</source> <volume>7</volume>:<fpage>e28268</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0028268</pub-id><pub-id pub-id-type="pmid">22295056</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coin</surname> <given-names>L. J. M.</given-names></name> <name><surname>Asher</surname> <given-names>J. E.</given-names></name> <name><surname>Walters</surname> <given-names>R. G.</given-names></name> <name><surname>Moustafa</surname> <given-names>J. S. E.-S.</given-names></name> <name><surname>de Smith</surname> <given-names>A. J.</given-names></name> <name><surname>Sladek</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>cnvHap: an integrative population and haplotype-based multiplatform model of SNPs and CNVs</article-title>. <source>Nat. Methods</source> <volume>7</volume>, <fpage>541</fpage>&#x02013;<lpage>546</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.1466</pub-id><pub-id pub-id-type="pmid">20512141</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Colella</surname> <given-names>S.</given-names></name> <name><surname>Yau</surname> <given-names>C.</given-names></name> <name><surname>Taylor</surname> <given-names>J. M.</given-names></name> <name><surname>Mirza</surname> <given-names>G.</given-names></name> <name><surname>Butler</surname> <given-names>H.</given-names></name> <name><surname>Clouston</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>QuantiSNP: an objective Bayes Hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data</article-title>. <source>Nucleic Acids Res</source>. <volume>35</volume>, <fpage>2013</fpage>&#x02013;<lpage>2025</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkm076</pub-id><pub-id pub-id-type="pmid">17341461</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Conrad</surname> <given-names>D. F.</given-names></name> <name><surname>Andrews</surname> <given-names>T. D.</given-names></name> <name><surname>Carter</surname> <given-names>N. P.</given-names></name> <name><surname>Hurles</surname> <given-names>M. E.</given-names></name> <name><surname>Pritchard</surname> <given-names>J. K.</given-names></name></person-group> (<year>2006</year>). <article-title>A high-resolution survey of deletion polymorphism in the human genome</article-title>. <source>Nat. Genet</source>. <volume>38</volume>, <fpage>75</fpage>&#x02013;<lpage>81</lpage>. <pub-id pub-id-type="doi">10.1038/ng1697</pub-id><pub-id pub-id-type="pmid">16327808</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Conrad</surname> <given-names>D. F.</given-names></name> <name><surname>Pinto</surname> <given-names>D.</given-names></name> <name><surname>Redon</surname> <given-names>R.</given-names></name> <name><surname>Feuk</surname> <given-names>L.</given-names></name> <name><surname>Gokcumen</surname> <given-names>O.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Origins and functional impact of copy number variation in the human genome</article-title>. <source>Nature</source> <volume>464</volume>, <fpage>704</fpage>&#x02013;<lpage>712</lpage>. <pub-id pub-id-type="doi">10.1038/nature08516</pub-id><pub-id pub-id-type="pmid">19812545</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daly</surname> <given-names>M. J.</given-names></name> <name><surname>Rioux</surname> <given-names>J. D.</given-names></name> <name><surname>Schaffner</surname> <given-names>S. F.</given-names></name> <name><surname>Hudson</surname> <given-names>T. J.</given-names></name> <name><surname>Lander</surname> <given-names>E. S.</given-names></name></person-group> (<year>2001</year>). <article-title>High-resolution haplotype structure in the human genome</article-title>. <source>Nat. Genet</source>. <volume>29</volume>, <fpage>229</fpage>&#x02013;<lpage>232</lpage>. <pub-id pub-id-type="doi">10.1038/ng1001-229</pub-id><pub-id pub-id-type="pmid">11586305</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dawson</surname> <given-names>E.</given-names></name> <name><surname>Abecasis</surname> <given-names>G. R.</given-names></name> <name><surname>Bumpstead</surname> <given-names>S.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Hunt</surname> <given-names>S.</given-names></name> <name><surname>Beare</surname> <given-names>D. M.</given-names></name> <etal/></person-group>. (<year>2002</year>). <article-title>A first-generation linkage disequilibrium map of chromosome 22</article-title>. <source>Nature</source> <volume>418</volume>, <fpage>544</fpage>&#x02013;<lpage>548</lpage>. <pub-id pub-id-type="doi">10.1038/nature00864</pub-id><pub-id pub-id-type="pmid">12110843</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dempster</surname> <given-names>A.</given-names></name> <name><surname>Laird</surname> <given-names>N.</given-names></name> <name><surname>Rubin</surname> <given-names>D.</given-names></name></person-group> (<year>1977</year>). <article-title>Maximum likelihood from incomplete data via the EM algorithm</article-title>. <source>J. R. Stat. Soc. Ser. B (Methodological)</source> <volume>39</volume>, <fpage>1</fpage>&#x02013;<lpage>38</lpage>.</citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Excoffier</surname> <given-names>L.</given-names></name> <name><surname>Slatkin</surname> <given-names>M.</given-names></name></person-group> (<year>1995</year>). <article-title>Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population</article-title>. <source>Mol. Biol. Evol</source>. <volume>12</volume>, <fpage>921</fpage>&#x02013;<lpage>927</lpage>. <pub-id pub-id-type="pmid">7476138</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Franke</surname> <given-names>L.</given-names></name> <name><surname>de Kovel</surname> <given-names>C.</given-names></name> <name><surname>Aulchenko</surname> <given-names>Y.</given-names></name> <name><surname>Trynka</surname> <given-names>G.</given-names></name> <name><surname>Zhernakova</surname> <given-names>A.</given-names></name> <name><surname>Hunt</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>Detection, imputation, and association analysis of small deletions and null alleles on oligonucleotide arrays</article-title>. <source>Am. J. Hum. Genet</source>. <volume>82</volume>, <fpage>1316</fpage>&#x02013;<lpage>1333</lpage>. <pub-id pub-id-type="doi">10.1016/j.ajhg.2008.05.008</pub-id><pub-id pub-id-type="pmid">18519066</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freeman</surname> <given-names>J. L.</given-names></name> <name><surname>Perry</surname> <given-names>G. H.</given-names></name> <name><surname>Feuk</surname> <given-names>L.</given-names></name> <name><surname>Redon</surname> <given-names>R.</given-names></name> <name><surname>McCarroll</surname> <given-names>S. A.</given-names></name> <name><surname>Altshuler</surname> <given-names>D. M.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Copy number variation: new insights in genome diversity</article-title>. <source>Genome Res</source>. <volume>16</volume>, <fpage>949</fpage>&#x02013;<lpage>961</lpage>. <pub-id pub-id-type="doi">10.1101/gr.3677206</pub-id><pub-id pub-id-type="pmid">16809666</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gabriel</surname> <given-names>S. B.</given-names></name> <name><surname>Schaffner</surname> <given-names>S. F.</given-names></name> <name><surname>Nguyen</surname> <given-names>H.</given-names></name> <name><surname>Moore</surname> <given-names>J. M.</given-names></name> <name><surname>Roy</surname> <given-names>J.</given-names></name> <name><surname>Blumenstiel</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2002</year>). <article-title>The structure of haplotype blocks in the human genome</article-title>. <source>Science</source> <volume>296</volume>, <fpage>2225</fpage>&#x02013;<lpage>2229</lpage>. <pub-id pub-id-type="doi">10.1126/science.1069424</pub-id><pub-id pub-id-type="pmid">12029063</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gelfond</surname> <given-names>J.</given-names></name> <name><surname>Gupta</surname> <given-names>M.</given-names></name> <name><surname>Ibrahim</surname> <given-names>J.</given-names></name></person-group> (<year>2009</year>). <article-title>A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data</article-title>. <source>Biometrics</source> <volume>65</volume>, <fpage>1087</fpage>&#x02013;<lpage>1095</lpage>. <pub-id pub-id-type="doi">10.1111/j.1541-0420.2008.01180.x</pub-id><pub-id pub-id-type="pmid">19210737</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hunter</surname> <given-names>D. R.</given-names></name> <name><surname>Lange</surname> <given-names>K.</given-names></name></person-group> (<year>2004</year>). <article-title>A tutorial on MM algorithms</article-title>. <source>Am. Stat</source>. <volume>58</volume>, <fpage>30</fpage>&#x02013;<lpage>37</lpage>. <pub-id pub-id-type="doi">10.1198/0003130042836</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iafrate</surname> <given-names>A. J.</given-names></name> <name><surname>Feuk</surname> <given-names>L.</given-names></name> <name><surname>Rivera</surname> <given-names>M. N.</given-names></name> <name><surname>Listewnik</surname> <given-names>M. L.</given-names></name> <name><surname>Donahoe</surname> <given-names>P. K.</given-names></name> <name><surname>Qi</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2004</year>). <article-title>Detection of large-scale variation in the human genome</article-title>. <source>Nat. Genet</source>. <volume>36</volume>, <fpage>949</fpage>&#x02013;<lpage>951</lpage>. <pub-id pub-id-type="doi">10.1038/ng1416</pub-id><pub-id pub-id-type="pmid">15286789</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Irvin</surname> <given-names>M. R.</given-names></name> <name><surname>Wineinger</surname> <given-names>N. E.</given-names></name> <name><surname>Rice</surname> <given-names>T. K.</given-names></name> <name><surname>Pajewski</surname> <given-names>N. M.</given-names></name> <name><surname>Kabagambe</surname> <given-names>E. K.</given-names></name> <name><surname>Gu</surname> <given-names>C. C.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Genome-wide detection of allele specific copy number variation associated with insulin resistance in African Americans from the HyperGEN study</article-title>. <source>PLoS ONE</source> <volume>6</volume>:<fpage>e24052</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0024052</pub-id><pub-id pub-id-type="pmid">21901158</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jeng</surname> <given-names>X. J.</given-names></name> <name><surname>Cai</surname> <given-names>T. T.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name></person-group> (<year>2010</year>). <article-title>Optimal sparse segment identification with application in copy number variation analysis</article-title>. <source>J. Am. Statist. Assoc</source>. <volume>105</volume>, <fpage>1156</fpage>&#x02013;<lpage>1166</lpage>. <pub-id pub-id-type="doi">10.1198/jasa.2010.tm10083</pub-id><pub-id pub-id-type="pmid">23543902</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kato</surname> <given-names>M.</given-names></name> <name><surname>Nakamura</surname> <given-names>Y.</given-names></name> <name><surname>Tsunoda</surname> <given-names>T.</given-names></name></person-group> (<year>2008a</year>). <article-title>An algorithm for inferring complex haplotypes in a region of copy-number variation</article-title>. <source>Am. J. Hum. Genet</source>. <volume>83</volume>, <fpage>157</fpage>&#x02013;<lpage>169</lpage>. <pub-id pub-id-type="doi">10.1016/j.ajhg.2008.06.021</pub-id><pub-id pub-id-type="pmid">18639202</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kato</surname> <given-names>M.</given-names></name> <name><surname>Nakamura</surname> <given-names>Y.</given-names></name> <name><surname>Tsunoda</surname> <given-names>T.</given-names></name></person-group> (<year>2008b</year>). <article-title>MOCSphaser: a haplotype inference tool from a mixture of copy number variation and single nucleotide polymorphism data</article-title>. <source>Bioinformatics</source> <volume>24</volume>, <fpage>1645</fpage>&#x02013;<lpage>1646</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btn242</pub-id><pub-id pub-id-type="pmid">18492685</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Komura</surname> <given-names>D.</given-names></name> <name><surname>Shen</surname> <given-names>F.</given-names></name> <name><surname>Ishikawa</surname> <given-names>S.</given-names></name> <name><surname>Fitch</surname> <given-names>K. R.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays</article-title>. <source>Genome Res</source>. <volume>16</volume>, <fpage>1575</fpage>&#x02013;<lpage>1584</lpage>. <pub-id pub-id-type="doi">10.1101/gr.5629106</pub-id><pub-id pub-id-type="pmid">17122084</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Korbel</surname> <given-names>J.</given-names></name> <name><surname>Urban</surname> <given-names>A.</given-names></name> <name><surname>Grubert</surname> <given-names>F.</given-names></name> <name><surname>Du</surname> <given-names>J.</given-names></name> <name><surname>Royce</surname> <given-names>T.</given-names></name> <name><surname>Starr</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>104</volume>, <fpage>10110</fpage>&#x02013;<lpage>10115</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0703834104</pub-id><pub-id pub-id-type="pmid">17551006</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Korn</surname> <given-names>J.</given-names></name> <name><surname>Kuruvilla</surname> <given-names>F.</given-names></name> <name><surname>McCarroll</surname> <given-names>S. A.</given-names></name> <name><surname>Wysoker</surname> <given-names>A.</given-names></name> <name><surname>Nemesh</surname> <given-names>J.</given-names></name> <name><surname>Cawley</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs</article-title>. <source>Nat. Genet</source>. <volume>40</volume>, <fpage>1253</fpage>&#x02013;<lpage>1260</lpage>. <pub-id pub-id-type="doi">10.1038/ng.237</pub-id><pub-id pub-id-type="pmid">18776909</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McCarroll</surname> <given-names>S. A.</given-names></name> <name><surname>Altshuler</surname> <given-names>D.</given-names></name></person-group> (<year>2007</year>). <article-title>Copy-number variation and association studies of human disease</article-title>. <source>Nat. Genet</source>. <volume>39</volume>, <fpage>S37</fpage>&#x02013;<lpage>S42</lpage>. <pub-id pub-id-type="doi">10.1038/ng2080</pub-id><pub-id pub-id-type="pmid">17597780</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McCarroll</surname> <given-names>S. A.</given-names></name> <name><surname>Hadnott</surname> <given-names>T. N.</given-names></name> <name><surname>Perry</surname> <given-names>G. H.</given-names></name> <name><surname>Sabeti</surname> <given-names>P. C.</given-names></name> <name><surname>Zody</surname> <given-names>M. C.</given-names></name> <name><surname>Barrett</surname> <given-names>J. C.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Common deletion polymorphisms in the human genome</article-title>. <source>Nat. Genet</source>. <volume>38</volume>, <fpage>86</fpage>&#x02013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1038/ng1696</pub-id><pub-id pub-id-type="pmid">16468122</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Olshen</surname> <given-names>A. B.</given-names></name> <name><surname>Venkatraman</surname> <given-names>E. S.</given-names></name> <name><surname>Lucito</surname> <given-names>R.</given-names></name> <name><surname>Wigler</surname> <given-names>M.</given-names></name></person-group> (<year>2004</year>). <article-title>Circular binary segmentation for the analysis of array-based DNA copy number data</article-title>. <source>Biostatistics</source> <volume>5</volume>, <fpage>557</fpage>&#x02013;<lpage>572</lpage>. <pub-id pub-id-type="doi">10.1093/biostatistics/kxh008</pub-id><pub-id pub-id-type="pmid">15475419</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patil</surname> <given-names>N.</given-names></name> <name><surname>Berno</surname> <given-names>A. J.</given-names></name> <name><surname>Hinds</surname> <given-names>D. A.</given-names></name> <name><surname>Barrett</surname> <given-names>W. A.</given-names></name> <name><surname>Doshi</surname> <given-names>J. M.</given-names></name> <name><surname>Hacker</surname> <given-names>C. R.</given-names></name> <etal/></person-group>. (<year>2001</year>). <article-title>Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21</article-title>. <source>Science</source> <volume>294</volume>, <fpage>1719</fpage>&#x02013;<lpage>1723</lpage>. <pub-id pub-id-type="doi">10.1126/science.1065573</pub-id><pub-id pub-id-type="pmid">11721056</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peiffer</surname> <given-names>D. A.</given-names></name> <name><surname>Le</surname> <given-names>J. M.</given-names></name> <name><surname>Steemers</surname> <given-names>F. J.</given-names></name> <name><surname>Chang</surname> <given-names>W.</given-names></name> <name><surname>Jenniges</surname> <given-names>T.</given-names></name> <name><surname>Garcia</surname> <given-names>F.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping</article-title>. <source>Genome Res</source>. <volume>16</volume>, <fpage>1136</fpage>&#x02013;<lpage>1148</lpage>. <pub-id pub-id-type="doi">10.1101/gr.5402306</pub-id><pub-id pub-id-type="pmid">16899659</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Redon</surname> <given-names>R.</given-names></name> <name><surname>Ishikawa</surname> <given-names>S.</given-names></name> <name><surname>Fitch</surname> <given-names>K. R.</given-names></name> <name><surname>Feuk</surname> <given-names>L.</given-names></name> <name><surname>Perry</surname> <given-names>G. H.</given-names></name> <name><surname>Andrews</surname> <given-names>T. D.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Global variation in copy number in the human genome</article-title>. <source>Nature</source> <volume>444</volume>, <fpage>444</fpage>&#x02013;<lpage>454</lpage>. <pub-id pub-id-type="doi">10.1038/nature05329</pub-id><pub-id pub-id-type="pmid">17122850</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sebat</surname> <given-names>J.</given-names></name> <name><surname>Lakshmi</surname> <given-names>B.</given-names></name> <name><surname>Malhotra</surname> <given-names>D.</given-names></name> <name><surname>Troge</surname> <given-names>J.</given-names></name> <name><surname>Lese-Martin</surname> <given-names>C.</given-names></name> <name><surname>Walsh</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>Strong association of <italic>de novo</italic> copy number mutations with autism</article-title>. <source>Science</source> <volume>316</volume>, <fpage>445</fpage>&#x02013;<lpage>449</lpage>. <pub-id pub-id-type="doi">10.1126/science.1138659</pub-id><pub-id pub-id-type="pmid">17363630</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sebat</surname> <given-names>J.</given-names></name> <name><surname>Lakshmi</surname> <given-names>B.</given-names></name> <name><surname>Troge</surname> <given-names>J.</given-names></name> <name><surname>Alexander</surname> <given-names>J.</given-names></name> <name><surname>Young</surname> <given-names>J.</given-names></name> <name><surname>Lundin</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2004</year>). <article-title>Large-scale copy number polymorphism in the human genome</article-title>. <source>Science</source> <volume>305</volume>, <fpage>525</fpage>&#x02013;<lpage>528</lpage>. <pub-id pub-id-type="doi">10.1126/science.1098918</pub-id><pub-id pub-id-type="pmid">15273396</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>P.</given-names></name></person-group> (<year>2012</year>). <article-title>An integrative segmentation method for detecting germline copy number variations in SNP arrays</article-title>. <source>Genet. Epidemiol</source>. <volume>36</volume>, <fpage>373</fpage>&#x02013;<lpage>383</lpage>. <pub-id pub-id-type="doi">10.1002/gepi.21631</pub-id><pub-id pub-id-type="pmid">22539397</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Su</surname> <given-names>S.</given-names></name> <name><surname>Asher</surname> <given-names>J.</given-names></name> <name><surname>Jarvelin</surname> <given-names>M.</given-names></name> <name><surname>Froguel</surname> <given-names>P.</given-names></name> <name><surname>Blakemore</surname> <given-names>A.</given-names></name> <name><surname>Balding</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Inferring combined CNV/SNP haplotypes from genotype data</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>1437</fpage>&#x02013;<lpage>1445</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq157</pub-id><pub-id pub-id-type="pmid">20406911</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tuzun</surname> <given-names>E.</given-names></name> <name><surname>Sharp</surname> <given-names>A. J.</given-names></name> <name><surname>Bailey</surname> <given-names>J. A.</given-names></name> <name><surname>Kaul</surname> <given-names>R.</given-names></name> <name><surname>Morrison</surname> <given-names>V. A.</given-names></name> <name><surname>Pertz</surname> <given-names>L. M.</given-names></name> <etal/></person-group>. (<year>2005</year>). <article-title>Fine-scale structural variation of the human genome</article-title>. <source>Nat. Genet</source>. <volume>37</volume>, <fpage>727</fpage>&#x02013;<lpage>732</lpage>. <pub-id pub-id-type="doi">10.1038/ng1562</pub-id><pub-id pub-id-type="pmid">15895083</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Viterbi</surname> <given-names>A. J.</given-names></name></person-group> (<year>1967</year>). <article-title>Error bounds for convolutional codes and an asymptotically optimum decoding algorithm</article-title>. <source>IEEE Trans. Inf. Theory</source> <volume>13</volume>, <fpage>260</fpage>&#x02013;<lpage>269</lpage>. <pub-id pub-id-type="doi">10.1109/TIT.1967.1054010</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Veldink</surname> <given-names>J. H.</given-names></name> <name><surname>Blauw</surname> <given-names>H.</given-names></name> <name><surname>van den Berg</surname> <given-names>L. H.</given-names></name> <name><surname>Ophoff</surname> <given-names>R. A.</given-names></name> <name><surname>Sabatti</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>Markov models for inferring copy number variations from genotype data on Illumina platforms</article-title>. <source>Hum. Hered</source>. <volume>68</volume>, <fpage>1</fpage>&#x02013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.1159/000210445</pub-id><pub-id pub-id-type="pmid">19339782</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>K.</given-names></name> <name><surname>Li</surname> <given-names>M.</given-names></name> <name><surname>Hadley</surname> <given-names>D.</given-names></name> <name><surname>Liu</surname> <given-names>R.</given-names></name> <name><surname>Glessner</surname> <given-names>J.</given-names></name> <name><surname>Grant</surname> <given-names>S. F.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data</article-title>. <source>Genome Res</source>. <volume>17</volume>, <fpage>1665</fpage>&#x02013;<lpage>1674</lpage>. <pub-id pub-id-type="doi">10.1101/gr.6861907</pub-id><pub-id pub-id-type="pmid">17921354</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Winchester</surname> <given-names>L.</given-names></name> <name><surname>Yau</surname> <given-names>C.</given-names></name> <name><surname>Ragoussis</surname> <given-names>J.</given-names></name></person-group> (<year>2009</year>). <article-title>Comparing CNV detection methods for SNP arrays</article-title>. <source>Brief. Funct. Genomic Proteomic</source> <volume>8</volume>, <fpage>353</fpage>&#x02013;<lpage>366</lpage>. <pub-id pub-id-type="doi">10.1093/bfgp/elp017</pub-id><pub-id pub-id-type="pmid">19737800</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Qian</surname> <given-names>Y.</given-names></name> <name><surname>Akula</surname> <given-names>N.</given-names></name> <name><surname>Alliey-Rodriguez</surname> <given-names>N.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Study</surname> <given-names>B. G.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Accuracy of CNV detection from GWAS data</article-title>. <source>PLoS ONE</source> <volume>6</volume>:<fpage>e14511</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0014511</pub-id><pub-id pub-id-type="pmid">21249187</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>K.</given-names></name> <name><surname>Sun</surname> <given-names>F.</given-names></name> <name><surname>Waterman</surname> <given-names>M. S.</given-names></name> <name><surname>Chen</surname> <given-names>T.</given-names></name></person-group> (<year>2003</year>). <article-title>Haplotype block partition with limited resources and applications to human chromosome 21 haplotype data</article-title>. <source>Am. J. Hum. Genet.</source> <volume>73</volume>, <fpage>63</fpage>&#x02013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1086/376437</pub-id><pub-id pub-id-type="pmid">12802783</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Z&#x00151;llner</surname> <given-names>S.</given-names></name> <name><surname>Su</surname> <given-names>G.</given-names></name> <name><surname>Stewart</surname> <given-names>W. C.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>McInnis</surname> <given-names>M. G.</given-names></name> <name><surname>Burmeister</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>Bayesian EM algorithm for scoring polymorphic deletions from SNP data and application to a common CNV on 8q24</article-title>. <source>Genet. Epidemiol</source>. <volume>33</volume>, <fpage>357</fpage>&#x02013;<lpage>368</lpage>. <pub-id pub-id-type="doi">10.1002/gepi.20391</pub-id><pub-id pub-id-type="pmid">19085946</pub-id></citation>
</ref>
</ref-list>
<app-group>
<app id="A1">
<title>Appendix</title>
<sec>
<title>A. Conditional probability for <italic>R</italic> and <italic>B</italic></title>
<p>Given each CNP, we will assume the conditional probability <italic>P</italic>(<italic>r</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic></sub> &#x0003D; (<italic>c</italic><sub><italic>k</italic>, <italic>A</italic></sub>, <italic>c</italic><sub><italic>k</italic>, <italic>B</italic></sub>)) to be Gaussian with mean &#x003BC;<sub><sc>cn</sc><sub><italic>k</italic></sub></sub> and variance &#x003C3;<sup>2</sup><sub><sc>cn</sc><sub><italic>k</italic></sub></sub> where <sc>cn</sc><sub><italic>k</italic></sub> is the copy number at the locus <italic>k</italic> given by cn<sub><italic>k</italic></sub> &#x0003D; <sc>cn</sc>(<italic>c</italic><sub><italic>k</italic></sub>) &#x0003D; <italic>c</italic><sub><italic>k</italic>, <italic>A</italic></sub> &#x0002B; <italic>c</italic><sub><italic>k</italic>, <italic>B</italic></sub>, i.e.,
<disp-formula id="E2"><mml:math id="M2"><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>B</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>~</mml:mo><mml:mi>N</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:msub><mml:mrow><mml:mtext>CN</mml:mtext></mml:mrow><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:msub><mml:mrow><mml:mtext>CN</mml:mtext></mml:mrow><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Because &#x003B8; is truncated and linearly transformed in the calculation of <italic>b</italic>, we model the conditional distribution of <italic>b</italic> as a mixture of two point masses at 0 and 1 (denoted &#x003B4;<sub>0</sub> and &#x003B4;<sub>1</sub>, respectively) and a truncated normal distribution. The truncation matters only for copy number states in which one component is 0; otherwise <italic>b</italic> follows a normal distribution. For example, when <italic>c</italic><sub><italic>k</italic>, <italic>B</italic></sub> &#x0003D; 0 we write <italic>b</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic></sub> &#x0007E; <italic>N</italic>(&#x003BC;<sub><italic>b</italic>, <sc>cn</sc><sub><italic>k</italic></sub></sub>, &#x003C3;<sup>2</sup><sub><italic>b</italic></sub>) in the sense that <italic>P</italic>(<italic>b</italic><sub><italic>k</italic></sub> &#x0003D; 0 | <italic>c</italic><sub><italic>k</italic></sub>) &#x0003D; <italic>P</italic>(<italic>N</italic>(&#x003BC;<sub><italic>b</italic>, <sc>cn</sc><sub><italic>k</italic></sub></sub>, &#x003C3;<sup>2</sup><sub><italic>b</italic></sub>) &#x02264; 0), <italic>P</italic>(<italic>b</italic><sub><italic>k</italic></sub> &#x0003D; 1 | <italic>c</italic><sub><italic>k</italic></sub>) &#x0003D; <italic>P</italic>(<italic>N</italic>(&#x003BC;<sub><italic>b</italic>, <sc>cn</sc><sub><italic>k</italic></sub></sub>, &#x003C3;<sup>2</sup><sub><italic>b</italic></sub>) &#x02265; 1), and <italic>P</italic>(<italic>b</italic><sub><italic>k</italic></sub> &#x02264; <italic>b</italic> | <italic>c</italic><sub><italic>k</italic></sub>) &#x0003D; <italic>P</italic>(<italic>N</italic>(&#x003BC;<sub><italic>b</italic>, <sc>cn</sc><sub><italic>k</italic></sub></sub>, &#x003C3;<sup>2</sup><sub><italic>b</italic></sub>) &#x02264; <italic>b</italic>) for <italic>b</italic> &#x02208; (0, 1).
When <italic>c</italic><sub><italic>k</italic>, <italic>A</italic></sub> &#x0003D; 0 and <italic>c</italic><sub><italic>k</italic>, <italic>B</italic></sub> &#x0003D; <italic>c</italic>, the distribution of <italic>b</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic></sub> is the same as that of 1 &#x02212; <italic>b</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic></sub> &#x0003D; (<italic>c</italic>, 0) if <italic>c</italic> &#x0003E; 0, and <italic>b</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic></sub> &#x0007E; <italic>N</italic>(1/2, &#x003C3;<sup>2</sup><sub><italic>b</italic></sub>) if <italic>c</italic> &#x0003D; 0. If both <italic>c</italic><sub><italic>k</italic>, <italic>A</italic></sub> &#x0003E; 0 and <italic>c</italic><sub><italic>k</italic>, <italic>B</italic></sub> &#x0003E; 0, then <italic>b</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic></sub> &#x0007E; <italic>N</italic>(<italic>c</italic><sub><italic>k</italic>, <italic>B</italic></sub>/<sc>cn</sc><sub><italic>k</italic></sub>, &#x003C3;<sup>2</sup><sub><italic>b</italic></sub>).</p>
<p>These conditional probabilities serve as the emission probabilities of the HMM described in Appendix B, so their parameters can be estimated within the HMM framework.</p>
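<p>As an illustration only, the emission model of this appendix can be sketched in Python. All function and argument names are hypothetical, and the container mu_b (the mean of <italic>b</italic> for a state (<italic>c</italic>, 0), indexed by <italic>c</italic>) is an assumed representation rather than the implementation used in this work. The sketch also assumes that point masses apply only when exactly one of the two allele counts is 0.</p>
<preformat><![CDATA[
```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """P(N(mu, sigma^2) <= x), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def emission_r(r, cn, mu, sigma):
    """P(r_k | c_k): Gaussian indexed by the total copy number cn = c_A + c_B."""
    return normal_pdf(r, mu[cn], sigma[cn])

def baf_mean(c_a, c_b, mu_b):
    """Mean of the underlying normal for b_k given (c_A, c_B).

    mu_b[c] is the assumed mean for the state (c, 0) with c > 0.
    """
    if c_a == 0 and c_b == 0:
        return 0.5
    if c_b == 0:
        return mu_b[c_a]
    if c_a == 0:
        return 1.0 - mu_b[c_b]  # mirror image of the (c, 0) case
    return c_b / (c_a + c_b)

def emission_b(b, c_a, c_b, mu_b, sigma_b):
    """P(b_k | c_k): point masses at 0 and 1 when exactly one allele count is 0
    (an assumption of this sketch), normal density otherwise."""
    m = baf_mean(c_a, c_b, mu_b)
    truncated = (c_a == 0) != (c_b == 0)
    if truncated and b <= 0.0:
        return normal_cdf(0.0, m, sigma_b)        # mass of delta_0
    if truncated and b >= 1.0:
        return 1.0 - normal_cdf(1.0, m, sigma_b)  # mass of delta_1
    return normal_pdf(b, m, sigma_b)
```
]]></preformat>
<p>Note that the returned value mixes probability masses (at 0 and 1) with densities (inside the unit interval), which is how a mixture of this kind typically enters an HMM likelihood.</p>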
</sec>
<sec>
<title>B. Hidden Markov model</title>
<p>A first-order HMM on the locus-specific CNP is used to infer the initial <bold>c</bold><sup><italic>i</italic></sup> for each individual. In this model, the hidden CNP at each locus depends only on the CNP at the immediately preceding locus. Given the observed data <bold>r</bold><sup><italic>i</italic></sup>, <bold>b</bold><sup><italic>i</italic></sup>, the most likely sequence of hidden CNPs across the marker loci is taken as the initial value, i.e., the sequence <bold>c</bold> that maximizes the summand <italic>P</italic>(<bold>r</bold>, <bold>b</bold> | <bold>c</bold>)<italic>P</italic>(<bold>c</bold>) of the likelihood given in Equation (5). This maximization is carried out with the Viterbi algorithm (Viterbi, <xref ref-type="bibr" rid="B36">1967</xref>).</p>
<p>Assuming further that the values of <italic>r</italic> and <italic>b</italic> are conditionally independent given the hidden copy number state, the likelihood of the observed data becomes
<disp-formula id="E3"><label>(5)</label><mml:math id="M3"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mtext>&#x0200B;&#x0200B;&#x0200B;&#x0200B;&#x0200B;&#x0200B;&#x0200B;&#x0200B;</mml:mtext><mml:mi>P</mml:mi><mml:mtext>&#x0200B;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>r</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>b</mml:mi></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>c</mml:mi></mml:mstyle></mml:munder><mml:mrow><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>r</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>b</mml:mi></mml:mstyle><mml:mo>&#x0007C;</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>c</mml:mi></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>c</mml:mi></mml:mstyle><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>c</mml:mi></mml:mstyle></mml:munder><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mstyle 
displaystyle='true'><mml:munderover><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mi>P</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:mstyle><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
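<p>A minimal sketch of the Viterbi recursion for a factorization of this form is given below, assuming that the initial, transition, and emission terms have already been evaluated in log space. The function name and the list-based layout of the inputs are hypothetical and chosen only for illustration.</p>
<preformat><![CDATA[
```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Most likely state path of a first-order HMM.

    log_init[s]        : log P(c_1 = s)
    log_trans[k][s][t] : log P(c_{k+1} = t | c_k = s), may vary with k
    log_emit[k][s]     : log P(r_k | c_k = s) + log P(b_k | c_k = s)
    """
    n_loci, n_states = len(log_emit), len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []
    for k in range(1, n_loci):
        new_score, pointers = [], []
        for t in range(n_states):
            # Best predecessor of state t at locus k.
            best = max(range(n_states),
                       key=lambda s: score[s] + log_trans[k - 1][s][t])
            pointers.append(best)
            new_score.append(score[best] + log_trans[k - 1][best][t] + log_emit[k][t])
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```
]]></preformat>
<p>Working with logarithms avoids numerical underflow when the number of loci <italic>M</italic> is large.</p>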
<p>Note that the HMM (Equation 5) differs from the haplotype model (Equation 1). As Equation (5) shows, the HMM calculation has two main components: the emission probabilities and the transition probabilities. The emission probabilities <italic>P</italic>(<italic>r</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic></sub>) and <italic>P</italic>(<italic>b</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic></sub>) are the same as in Appendix A, and the transition probability is given by <italic>P</italic>(<italic>c</italic><sub><italic>k</italic></sub> | <italic>c</italic><sub><italic>k</italic> &#x02212; 1</sub>) &#x0003D; <italic>t</italic>(<sc>cn</sc><sub><italic>k</italic> &#x02212; 1</sub>, <sc>cn</sc><sub><italic>k</italic></sub>, <italic>d</italic><sub><italic>k</italic></sub>) <italic>G</italic>(<italic>c</italic><sub><italic>k</italic> &#x02212; 1</sub>, <italic>c</italic><sub><italic>k</italic></sub>), where <italic>t</italic>(<italic>i</italic>, <italic>j</italic>, <italic>d</italic>) is a distance-based transition matrix among copy numbers, equal to 1 &#x02212; (1 &#x02212; &#x003B3;<sub><italic>ij</italic></sub>)(1 &#x02212; <italic>e</italic><sup>&#x02212;&#x003B7;<italic>d</italic></sup>) if <italic>i</italic> &#x0003D; <italic>j</italic> and to &#x003B3;<sub><italic>ij</italic></sub> (1 &#x02212; <italic>e</italic><sup>&#x02212;&#x003B7;<italic>d</italic></sup>) otherwise, and <italic>G</italic>(&#x000B7;,&#x000B7;) is a transition matrix among the states having the same copy number. The parameter estimation uses the fact that the coefficient matrix &#x00393; &#x0003D; (&#x003B3;<sub><italic>ij</italic></sub>) is a transition probability matrix, that is, each element is non-negative and each row sums to 1.</p>
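<p>The distance-based factor <italic>t</italic>(<italic>i</italic>, <italic>j</italic>, <italic>d</italic>) can be sketched as follows. The function name and the nested-list representation of &#x00393; are hypothetical, and the within-copy-number factor <italic>G</italic> is omitted because its form is not specified here.</p>
<preformat><![CDATA[
```python
import math

def copy_number_transition(gamma, eta, d):
    """Matrix t(i, j, d) among copy numbers for two loci separated by distance d.

    gamma : row-stochastic coefficient matrix (gamma_ij), as a list of lists
    eta   : decay rate per unit distance
    """
    n = len(gamma)
    move = 1.0 - math.exp(-eta * d)  # 1 - e^{-eta d}
    t = [[gamma[i][j] * move for j in range(n)] for i in range(n)]
    for i in range(n):
        t[i][i] = 1.0 - (1.0 - gamma[i][i]) * move
    return t
```
]]></preformat>
<p>Because each row of &#x00393; sums to 1, each row of the resulting matrix also sums to 1. The matrix reduces to the identity at <italic>d</italic> &#x0003D; 0 and approaches &#x00393; as <italic>d</italic> grows.</p>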
</sec>
<sec>
<title>C. Parameter estimation</title>
<p>The initial model parameters for the HMM were determined from our preliminary results based on the HapMap data. In theory, the maximum likelihood estimators (MLE) of the parameters, <graphic xlink:href="fgene-04-00165-i0002.tif"/> &#x0003D; (<bold>&#x003BC;</bold>, <bold>&#x003C3;</bold>, <bold>&#x003BC;</bold><sub><italic>b</italic></sub>, <bold>&#x003C3;</bold><sub><italic>b</italic></sub>, <bold>&#x003B7;</bold>, <bold>&#x003B3;</bold>), can be obtained by maximizing the following likelihood function</p>
<p><graphic xlink:href="fgene-04-00165-i0005.tif"/></p>
<p>In practice, maximizing Equation (6) over multiple parameters is computationally intensive, because a huge number of summations is involved when <italic>M</italic> is large. As in a typical HMM, we used the Baum&#x02013;Welch algorithm (Baum et al., <xref ref-type="bibr" rid="B2">1970</xref>) to make the computation feasible. The details are given in the following sections.</p>
<sec>
<title>C.1. The Baum&#x02013;Welch algorithm</title>
<p>At each iteration &#x02113; with current parameter estimate <graphic xlink:href="fgene-04-00165-i0006.tif"/>, calculate the conditional expectation of the log likelihood, given by</p>
<p><graphic xlink:href="fgene-04-00165-i0007.tif"/></p>
<p>where <graphic xlink:href="fgene-04-00165-i0008.tif"/> is the set of all possible CNP states.</p>
<p>The two required conditional probabilities, &#x003C1;<sub><italic>k</italic></sub>(<italic>s</italic>) &#x0003D; <italic>P</italic>(<italic>c</italic><sub><italic>k</italic></sub> &#x0003D; <italic>s</italic> | <bold>b</bold>, <bold>r</bold>, <graphic xlink:href="fgene-04-00165-i0006.tif"/>) and &#x003BE;<sub><italic>k</italic></sub>(<italic>s</italic>, <italic>t</italic>) &#x0003D; <italic>P</italic>(<italic>c</italic><sub><italic>k</italic> &#x02212; 1</sub> &#x0003D; <italic>s</italic>, <italic>c</italic><sub><italic>k</italic></sub> &#x0003D; <italic>t</italic> | <bold>b</bold>, <bold>r</bold>, <graphic xlink:href="fgene-04-00165-i0006.tif"/>), are computed using the forward and backward algorithms of general HMM theory.</p>
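<p>A minimal sketch of a scaled forward and backward pass that yields these two posterior quantities is given below. It is a generic textbook formulation and not necessarily the implementation used here. The function name and input layout are hypothetical, and the indices are 0-based, so that xi[k] couples loci <italic>k</italic> and <italic>k</italic> &#x0002B; 1.</p>
<preformat><![CDATA[
```python
def forward_backward(init, trans, emit):
    """Posterior marginals rho[k][s] and pairwise posteriors xi[k][s][t].

    init[s]        : P(c_1 = s)
    trans[k][s][t] : P(c_{k+1} = t | c_k = s)
    emit[k][s]     : P(r_k | c_k = s) * P(b_k | c_k = s)
    """
    n_loci, n_states = len(emit), len(init)
    states = range(n_states)

    # Forward pass, rescaled at every locus to avoid underflow.
    alpha, scale = [], []
    prev = [init[s] * emit[0][s] for s in states]
    for k in range(n_loci):
        if k > 0:
            prev = [emit[k][t] * sum(alpha[k - 1][s] * trans[k - 1][s][t] for s in states)
                    for t in states]
        norm = sum(prev)
        scale.append(norm)
        alpha.append([p / norm for p in prev])

    # Backward pass, using the same scaling constants.
    beta = [[1.0] * n_states for _ in range(n_loci)]
    for k in range(n_loci - 2, -1, -1):
        beta[k] = [sum(trans[k][s][t] * emit[k + 1][t] * beta[k + 1][t] for t in states)
                   / scale[k + 1] for s in states]

    rho = [[alpha[k][s] * beta[k][s] for s in states] for k in range(n_loci)]
    xi = [[[alpha[k][s] * trans[k][s][t] * emit[k + 1][t] * beta[k + 1][t] / scale[k + 1]
            for t in states] for s in states] for k in range(n_loci - 1)]
    return rho, xi
```
]]></preformat>
<p>With this scaling, each rho[k] sums to 1 over the states, and summing xi[k] over its second state index recovers rho[k].</p>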
<p>In the M-step, the parameters are updated by maximizing the conditional expectation (Equation 7), which comprises three terms that can be maximized separately: the first term involves only the parameters of the transition matrix, while the second and third terms involve the parameters of the emission probabilities of the B allele frequency and the log R ratio, respectively. The updating formulas, together with the details of their derivation, are given in the following sections.</p>
</sec>
<sec>
<title>C.2. Parameters in the emission probability of LRR</title>
<p>Without any constraint on &#x003BC; and &#x003C3;, the maximizers <inline-formula><mml:math id="M28"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M12"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are given by
<disp-formula id="E4"><mml:math id="M4"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msub><mml:mover accent='true'><mml:mi>&#x003C1;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle><mml:mo>/</mml:mo><mml:mstyle displaystyle='true'><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C1;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;and</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mover accent='true'><mml:mi>&#x003C3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle><mml:msub><mml:mover accent='true'><mml:mi>&#x003C1;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>/</mml:mo><mml:mstyle 
displaystyle='true'><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C1;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
for <italic>j</italic> &#x0003D; 0, 1, 2, 3, and 4, where <graphic xlink:href="fgene-04-00165-i0009.tif"/>.</p>
<p>Under the assumed constraint &#x003BC;<sub>1</sub> &#x02264; &#x003BC;<sub>2</sub> &#x0003D; 0 &#x02264; &#x003BC;<sub>3</sub>, the updates become
<disp-formula id="E5"><mml:math id="M5"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>min</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>3</mml:mn><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>max</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>3</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;and</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mover 
accent='true'><mml:mi>&#x003C3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
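<p>The constrained update can be sketched as follows. The function name and data layout are hypothetical. The sketch assumes that the unconstrained variance is the posterior-weighted variance about the unconstrained mean, which is the reading under which adding the squared shift of the mean gives the weighted mean squared deviation about the new mean. It also assumes that every copy number state has positive total posterior weight.</p>
<preformat><![CDATA[
```python
def update_lrr(r, rho):
    """One M-step for the LRR means and variances under mu_1 <= mu_2 = 0 <= mu_3.

    r[k]      : log R ratio at locus k
    rho[k][j] : posterior probability of copy number j at locus k
    Returns (mu, var), each a list indexed by copy number j.
    """
    n_states = len(rho[0])
    mu, var = [], []
    for j in range(n_states):
        w = sum(p[j] for p in rho)  # total posterior weight, assumed > 0
        mu_tilde = sum(x * p[j] for x, p in zip(r, rho)) / w
        # Assumption: weighted variance about the unconstrained mean.
        var_tilde = sum((x - mu_tilde) ** 2 * p[j] for x, p in zip(r, rho)) / w
        if j == 1:
            m = min(0.0, mu_tilde)
        elif j == 2:
            m = 0.0
        elif j == 3:
            m = max(0.0, mu_tilde)
        else:
            m = mu_tilde  # copy numbers 0 and 4 are left unconstrained
        mu.append(m)
        var.append(var_tilde + (m - mu_tilde) ** 2)
    return mu, var
```
]]></preformat>
<p>When a constraint is inactive the shift term vanishes, and the update coincides with the unconstrained maximizer.</p>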
<p>If a signal-to-noise constraint is assumed, that is, &#x003BC;<sub>1</sub> &#x0002B; <italic>v</italic>&#x003C3;<sub>1</sub> &#x02264; &#x003BC;<sub>2</sub> &#x02212; <italic>v</italic>&#x003C3;<sub>2</sub> and &#x003BC;<sub>2</sub> &#x0002B; <italic>v</italic>&#x003C3;<sub>2</sub> &#x02264; &#x003BC;<sub>3</sub> &#x02212; <italic>v</italic>&#x003C3;<sub>3</sub>, then no closed form is available. If <inline-formula><mml:math id="M13"><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:math></inline-formula> and <inline-formula><mml:math id="M14"><mml:mover accent='true'><mml:mi>&#x003C3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:math></inline-formula> satisfy the condition <inline-formula><mml:math id="M15"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:mi>v</mml:mi><mml:msub><mml:mover accent='true'><mml:mi>&#x003C3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x0003C;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>2</mml:mn></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>v</mml:mi><mml:msub><mml:mover accent='true'><mml:mi>&#x003C3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, then the new estimators are <inline-formula><mml:math id="M16"><mml:mrow><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M17"><mml:mrow><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003C3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. Otherwise, the new &#x003BC;<sub>1</sub>, &#x003BC;<sub>2</sub>, &#x003C3;<sub>1</sub>, &#x003C3;<sub>2</sub> lie on the boundary &#x003BC;<sub>1</sub> &#x0002B; <italic>v</italic>&#x003C3;<sub>1</sub> &#x0003D; &#x003BC;<sub>2</sub> &#x02212; <italic>v</italic>&#x003C3;<sub>2</sub>. The maximizer under this condition can be found by iterative conditional maximization. For example, if &#x003C3;<sub>1</sub> and &#x003C3;<sub>2</sub> are held fixed, the objective function to be minimized becomes
<disp-formula id="E6"><mml:math id="M6"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mtext>const</mml:mtext><mml:mo>+</mml:mo><mml:msubsup><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>2</mml:mn><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>2</mml:mn></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
where <inline-formula><mml:math id="M18"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:msubsup><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003C1;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle></mml:mrow></mml:math></inline-formula>. Hence, &#x003BC;<sub>1</sub> and &#x003BC;<sub>2</sub> are updated by minimizing this argument function, which is a quadratic function and therefore has a closed-form solution. One possible conditional maximization sequence is to update (&#x003BC;<sub>1</sub>, &#x003BC;<sub>2</sub>), (&#x003BC;<sub>2</sub>, &#x003C3;<sub>2</sub>), (&#x003BC;<sub>1</sub>, &#x003BC;<sub>2</sub>), and (&#x003BC;<sub>1</sub>, &#x003C3;<sub>1</sub>) in turn, holding the other parameters fixed. The unconditional maximizers are then obtained by repeating this procedure until convergence is achieved.</p>
<p>Furthermore, in practice we hold &#x003BC;<sub>0</sub>, &#x003BC;<sub>4</sub>, &#x003C3;<sub>0</sub>, and &#x003C3;<sub>4</sub> fixed because the deletion and duplication states are sparse. To obtain stable estimates of the standard deviation parameters, we assume that &#x003C3;<sub><italic>j</italic></sub> for <italic>j</italic> &#x0003D; 1, 2, 3 is bounded below by &#x003C3;<sub>lower</sub> and above by &#x003C3;<sub>upper</sub>, which are half and double the standard deviation of <bold>r</bold>, respectively.</p>
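<p>The closed-form update and the bounds on the standard deviations can be illustrated with a minimal sketch. It assumes that the right-hand side of the quadratic above arises from the substitution &#x003BC;<sub>2</sub> &#x0003D; &#x003BC;<sub>1</sub> &#x0002B; <italic>v</italic>(&#x003C3;<sub>1</sub> &#x0002B; &#x003C3;<sub>2</sub>), so that the minimizer is the ratio of the linear coefficient to the quadratic coefficient. The function names update_means and clamp_sigma and their argument names are hypothetical and are not part of the original implementation.</p>
<preformat><![CDATA[
```python
def update_means(w1, w2, mu1_t, mu2_t, s1, s2, v):
    """Closed-form minimizer of the quadratic in mu_1.

    Assumption (not stated in this section): mu_2 = mu_1 + v * (s1 + s2).
    mu1_t and mu2_t are the tilde (unconstrained) means; s1, s2 are sigma_1, sigma_2.
    """
    # Coefficient of mu_1^2 in the quadratic.
    quad = w1 / s1**2 + w2 / s2**2
    # Half of the (negated) coefficient of mu_1 in the quadratic.
    lin = w1 * mu1_t / s1**2 + w2 * (mu2_t - v * (s1 + s2)) / s2**2
    mu1 = lin / quad
    mu2 = mu1 + v * (s1 + s2)  # assumed substitution
    return mu1, mu2


def clamp_sigma(sigma, sd_r):
    """Keep sigma_j within [0.5 * sd(r), 2 * sd(r)], as described in the text."""
    lower, upper = 0.5 * sd_r, 2.0 * sd_r
    return min(max(sigma, lower), upper)
```
]]></preformat>
<p>With equal weights and unit standard deviations, for example, the update averages &#x003BC;&#x00303;<sub>1</sub> and the shifted value of &#x003BC;&#x00303;<sub>2</sub>.</p>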
</sec>
<sec>
<title>C.3. Parameters in the emission probability of BAF</title>
<p>Because BAF takes values in the bounded interval [0, 1], estimating the related parameters is more involved. For instance, the argument function for the estimation of &#x003BC;<sub><italic>b</italic>, <italic>j</italic></sub> is given by
<disp-formula id="E7"><label>(8)</label><mml:math id="M7"><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>&#x0007B;</mml:mo><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mi>I</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>I</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007D;</mml:mo><mml:mi>log</mml:mi><mml:mi>&#x003A6;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mstyle><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:mi>I</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo>&#x0003C;</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x0003C;</mml:mo><mml:mn>1</mml:mn><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:msubsup><mml:mi>&#x003C3;</mml:mi><mml:mi>b</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>&#x0007B;</mml:mo><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>&#x0007D;</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
where &#x003A6; is the cumulative distribution function of the standard normal distribution. Since the maximizer of Equation (8) has no analytic solution, a first-order Taylor expansion around the current value is used to obtain a local maximizer.</p>
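<p>As an aid to reading Equation (8), the following sketch evaluates the argument function for a candidate value of &#x003BC;<sub><italic>b</italic>, <italic>j</italic></sub>. It covers only the evaluation, not the Taylor-expansion update itself. The function and argument names are hypothetical; rho_j0 and rho_0j are assumed to hold the weights &#x003C1;<sub><italic>k</italic></sub>(<italic>j</italic>, 0) and &#x003C1;<sub><italic>k</italic></sub>(0, <italic>j</italic>) for each marker <italic>k</italic>.</p>
<preformat><![CDATA[
```python
import math


def log_std_normal_cdf(x):
    """log Phi(x) for the standard normal, via the complementary error function."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))


def baf_objective(mu, sigma_b, b, rho_j0, rho_0j):
    """Evaluate the argument function of Equation (8) at mu = mu_{b,j}.

    b       : BAF values b_k in [0, 1]
    rho_j0  : weights rho_k(j, 0), one per marker (assumed layout)
    rho_0j  : weights rho_k(0, j), one per marker (assumed layout)
    """
    log_phi = log_std_normal_cdf(-mu / sigma_b)
    total = 0.0
    for bk, r_j0, r_0j in zip(b, rho_j0, rho_0j):
        if bk == 0:
            # Boundary mass at b_k = 0.
            total += r_j0 * log_phi
        elif bk == 1:
            # Boundary mass at b_k = 1.
            total += r_0j * log_phi
        else:
            # Interior observations contribute the Gaussian quadratic terms.
            sq = r_j0 * (bk - mu) ** 2 + r_0j * (bk - 1.0 + mu) ** 2
            total -= sq / (2.0 * sigma_b ** 2)
    return total
```
]]></preformat>
<p>An iterative update would evaluate this function, or its derivatives, repeatedly at the current value of &#x003BC;<sub><italic>b</italic>, <italic>j</italic></sub>.</p>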
</sec>
<sec>
<title>C.4. Parameters in the transition matrix</title>
<p>The maximizer of <italic>G</italic> is obtained by</p>
<p><graphic xlink:href="fgene-04-00165-i0010.tif"/></p>
<p>However, this maximizer converges very slowly. A column-wise or block-wise estimation, which assumes that all elements in the same column or block are equal, is more reliable than the element-wise estimation given by Equation (9).</p>
<p>For the parameters &#x003B7; and &#x003B3;<sub><italic>ij</italic></sub> in the transition matrix, we use an iterative minorization-maximization (MM) algorithm [see Hunter and Lange (<xref ref-type="bibr" rid="B16">2004</xref>) and references therein]. Using Young&#x00027;s inequality, a minorized argument function for &#x003B3; is given by
<disp-formula id="E8"><label>(10)</label><mml:math id="M8"><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>log</mml:mi><mml:msub><mml:mi>&#x003B3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mn>4</mml:mn></mml:munderover><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003B6;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>log</mml:mi><mml:msub><mml:mi>&#x003B3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
which has a closed-form maximizer, where <inline-formula><mml:math id="M19"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003B6;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo stretchy='false'>[</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003B3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mover accent='true'><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msup><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>]</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> for the current values <inline-formula><mml:math id="M20"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003B3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M21"><mml:mover accent='true'><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:math></inline-formula> of &#x003B3;<sub><italic>ij</italic></sub> and &#x003B7;. Similarly, the argument function for &#x003B7; becomes
<disp-formula id="E9"><label>(11)</label><mml:math id="M9"><mml:mrow><mml:mi>v</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B7;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mi>log</mml:mi></mml:mrow></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B7;</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mstyle><mml:mrow><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007B;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003B6;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo 
stretchy='false'>)</mml:mo><mml:mi>log</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B7;</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003B6;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x0007D;</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The maximizers &#x003B3;<sup><italic>m</italic></sup> and &#x003B7;<sup><italic>m</italic></sup> in the MM algorithm can then be obtained as follows.</p>
<p><bold>Step 0:</bold> Set initial parameters <inline-formula><mml:math id="M22"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003B3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mi>&#x003B3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mi>&#x02113;</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M23"><mml:mrow><mml:mover accent='true'><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msup><mml:mi>&#x003B7;</mml:mi><mml:mi>&#x02113;</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>.</p>
<p><bold>Step 1:</bold> Update <inline-formula><mml:math id="M24"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003B3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> by maximizing Equation (10), that is,
<disp-formula id="E10"><mml:math id="M10"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msubsup><mml:mi>&#x003B3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>/</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:msup><mml:mi>j</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup></mml:munder><mml:mrow><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mi>j</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003B6;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>&#x003BA;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mstyle><mml:mtext>&#x000A0;&#x000A0;for&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:menclose 
notation='updiagonalstrike'><mml:mo>=</mml:mo></mml:menclose><mml:mi>j</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mi>&#x003B3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:menclose notation='updiagonalstrike'><mml:mo>=</mml:mo></mml:menclose><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:msubsup><mml:mi>&#x003B3;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mi>m</mml:mi></mml:msubsup></mml:mrow></mml:mstyle><mml:mo>.</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;for&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;and&#x000A0;</mml:mtext><mml:mn>4.</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p><bold>Step 2:</bold> Find new <inline-formula><mml:math id="M25"><mml:mover accent='true'><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:math></inline-formula> by maximizing the argument function (Equation 11) using the Newton&#x02013;Raphson algorithm.</p>
<p><bold>Step 3:</bold> Repeat Steps 1 and 2 until <inline-formula><mml:math id="M26"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003B3;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M27"><mml:mover accent='true'><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:math></inline-formula> have converged.</p>
<p>The converged parameters are the maximizers of the second part of Equation (7).</p>
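<p>The closed-form update in Step 1 can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: kappa is assumed to be a list, over <italic>k</italic> &#x0003D; 2, &#x02026;, <italic>M</italic>, of square nested lists holding &#x003BA;<sub><italic>k</italic></sub>(<italic>i</italic>, <italic>j</italic>), d is the matching list of <italic>d</italic><sub><italic>k</italic></sub>, and the function names are hypothetical. Step 2 (the Newton&#x02013;Raphson update of &#x003B7;) is not shown.</p>
<preformat><![CDATA[
```python
import math


def zeta(gamma_ii, eta, d_k):
    """zeta_{k,i} = [1 + gamma_ii * (exp(eta * d_k) - 1)]^(-1) at the current values."""
    return 1.0 / (1.0 + gamma_ii * math.expm1(eta * d_k))


def update_gamma(kappa, d, gamma_cur, eta_cur):
    """One Step 1 update of the matrix gamma, given current gamma and eta.

    kappa     : list over k of S x S nested lists kappa_k(i, j) (assumed layout)
    d         : list of d_k, same length as kappa
    gamma_cur : current S x S matrix gamma-tilde
    eta_cur   : current value eta-tilde
    """
    n = len(gamma_cur)
    gamma_new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Denominator: sum_k [ sum_j' kappa_k(i, j') - zeta_{k,i} * kappa_k(i, i) ].
        denom = 0.0
        for kap, d_k in zip(kappa, d):
            z = zeta(gamma_cur[i][i], eta_cur, d_k)
            denom += sum(kap[i]) - z * kap[i][i]
        # Off-diagonal entries: sum_k kappa_k(i, j) / denominator.
        for j in range(n):
            if j != i:
                gamma_new[i][j] = sum(kap[i][j] for kap in kappa) / denom
        # The diagonal entry makes each row sum to one.
        gamma_new[i][i] = 1.0 - sum(gamma_new[i][j] for j in range(n) if j != i)
    return gamma_new
```
]]></preformat>
<p>In a full iteration, this update would alternate with the update of &#x003B7; in Step 2 until both sets of parameters stop changing.</p>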
</sec>
</sec>
</app>
</app-group>
</back>
</article>