<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2021.731355</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Technology and Code</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>HandyCNV: Standardized Summary, Annotation, Comparison, and Visualization of Copy Number Variant, Copy Number Variation Region, and Runs of Homozygosity</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Zhou</surname> <given-names>Jinghang</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1367231/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Liyuan</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Lopdell</surname> <given-names>Thomas J.</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Garrick</surname> <given-names>Dorian J.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/37713/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Shi</surname> <given-names>Yuangang</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x002A;</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Agriculture, Ningxia University</institution>, <addr-line>Yinchuan</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>AL Rae Centre for Genetics and Breeding, Massey University</institution>, <addr-line>Hamilton</addr-line>, <country>New Zealand</country></aff>
<aff id="aff3"><sup>3</sup><institution>Research and Development, Livestock Improvement Corporation</institution>, <addr-line>Hamilton</addr-line>, <country>New Zealand</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Guangchuang Yu, Southern Medical University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Xiaofeng Huang, Cornell University, United States; Max Robinson, Institute for Systems Biology, United States</p></fn>
<corresp id="c001">&#x002A;Correspondence: Dorian J. Garrick, <email>D.Garrick@massey.ac.nz</email></corresp>
<corresp id="c002">Yuangang Shi, <email>shyga818@126.com</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>17</day>
<month>09</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>731355</elocation-id>
<history>
<date date-type="received">
<day>26</day>
<month>06</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>25</day>
<month>08</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 Zhou, Liu, Lopdell, Garrick and Shi.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Zhou, Liu, Lopdell, Garrick and Shi</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Detection of CNVs (copy number variants) and ROH (runs of homozygosity) from SNP (single nucleotide polymorphism) genotyping data is often required in genomic studies. The post-analysis of CNV and ROH generally involves many steps, potentially across multiple computing platforms, which requires researchers to be familiar with many different tools. To address this problem and improve research efficiency, we present an R package that integrates the summarization, annotation, map conversion, comparison and visualization functions involved in studies of CNV and ROH. This one-stop post-analysis system is standardized, comprehensive, reproducible, time-saving, and user-friendly for researchers working with humans and most diploid livestock species.</p>
</abstract>
<kwd-group>
<kwd>copy number variant</kwd>
<kwd>run of homozygosity</kwd>
<kwd>haplotype</kwd>
<kwd>SNP</kwd>
<kwd>CNVR</kwd>
</kwd-group>
<contract-sponsor id="cn001">China Scholarship Council<named-content content-type="fundref-id">10.13039/501100004543</named-content></contract-sponsor>
<contract-sponsor id="cn002">China Agricultural Research System<named-content content-type="fundref-id">10.13039/501100012453</named-content></contract-sponsor>
<counts>
<fig-count count="7"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="19"/>
<page-count count="10"/>
<word-count count="5436"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="S1">
<title>Introduction</title>
<p>Genome-wide data have been accumulated for large numbers of individuals of various species as the cost of single nucleotide polymorphism (SNP) genotyping continues to decrease. In addition to using these data for GWAS (genome-wide association studies) or GS (genomic selection), useful information about copy number variants (CNVs) and runs of homozygosity (ROH) can be inferred from these genotypes, and a range of software products [such as PennCNV (<xref ref-type="bibr" rid="B18">Wang et al., 2007</xref>), CNVPartition (<xref ref-type="bibr" rid="B7">Illumina, 2021</xref>), SNP and Variation Suite (<xref ref-type="bibr" rid="B1">Bozeman and Golden Helix, 2020</xref>)] have been developed to detect CNVs and ROH from SNP data. However, few tools integrate summaries of these results with annotation, comparison, and visualization. As a result, extracting useful information from CNV and ROH data sets is time-consuming, especially when it requires processing multiple results from different models and software. To obtain more comprehensive results, researchers often implement their own pipelines that switch back and forth between different tools, an approach that is prone to introducing bugs and thereby producing spurious results.</p>
<p>There are several common &#x201C;pitfalls&#x201D; we have observed when conducting CNV analyses using SNP genotyping data. The most frequent is annotating candidate genes in a CNVR (copy number variation region) without considering the frequency of the CNVs: this can give undue weight to rare CNVs that affect only one or two samples. A second issue arises when CNVs are compared between different studies only at the population level, and not at the individual sample level. Comparison at the population level can reflect the ubiquitous nature of CNVs, whereas comparison at the individual level also provides information about the robustness of CNV detection algorithms. A third issue arises when comparing CNVRs that have been detected using different reference genomes, which requires converting the coordinates of the regions between the two genomes. These conversions require careful consideration, as the order of SNPs on chromosomes might differ between two reference assemblies, such that the lengths or even the chromosomal order of CNVs can change, which might lead to meaningless comparisons between CNVRs. A fourth common problem is reporting an incorrect number of overlapping CNVRs when presenting comparison results in a Venn diagram. The number of overlapping regions depends on which result set is counted, because a single long interval generated by one approach might overlap multiple shorter intervals detected by another; in such cases, representing the results in a Venn diagram requires special annotation.</p>
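<p>The Venn-diagram pitfall can be illustrated with a minimal Python sketch. This is not HandyCNV code, and the interval sets are invented for illustration. Assuming intervals are given as (chromosome, start, end) with inclusive coordinates, counting overlaps from each side of the comparison gives different numbers when one long interval spans several short ones, so both counts need to be reported rather than a single intersection value.</p>

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base pair."""
    return a[0] == b[0] and a[2] >= b[1] and b[2] >= a[1]


def count_overlapping(query: List[Interval], reference: List[Interval]) -> int:
    """Number of intervals in `query` overlapping at least one interval in `reference`."""
    return sum(any(overlaps(q, r) for r in reference) for q in query)


# One long interval from approach A spans three short intervals from approach B.
set_a = [("1", 100, 500)]
set_b = [("1", 120, 200), ("1", 250, 300), ("1", 400, 450), ("2", 10, 20)]

print(count_overlapping(set_a, set_b))  # 1 interval of A overlaps B
print(count_overlapping(set_b, set_a))  # 3 intervals of B overlap A
```

<p>A single number placed in the intersection of the Venn diagram would hide this asymmetry.</p>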
<p>There are also some steps that may easily be forgotten when performing ROH analysis on SNP genotyping data. For example, the SNP density distribution may not have been carefully examined prior to inference of ROH. SNP density can vary along chromosomes and between SNP chips, and ROH detection methods are highly sensitive to characteristics such as SNP density, window size, tolerance of occasional heterozygous genotypes in the run, and the presence of missing values in the detection window. Knowing the SNP density can therefore help in selecting better parameters for ROH detection. Moreover, when reporting candidate genes from the functional annotation of genes located in ROH regions, researchers may not examine the frequencies of haplotypes within these genes of interest. This step can provide valuable information about the high-frequency genotypes of these genes, which is useful for designing further validation experiments and provides a valuable reference for others comparing the same genes across different populations genotyped on the same SNP chips.</p>
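<p>As a sketch of the kind of SNP density summary worth checking before ROH detection, the following Python function reports, per chromosome, the number of SNPs, the mean and maximum gap between adjacent SNPs, and SNPs per Mb. The (chromosome, SNP name, position) input layout and the output field names are assumptions for illustration and do not reflect the output of <italic>plot_snp_density</italic>.</p>

```python
from collections import defaultdict
from statistics import mean


def snp_density(snp_map):
    """Per-chromosome SNP density from (chromosome, snp_name, position_bp) records."""
    positions = defaultdict(list)
    for chrom, _name, pos in snp_map:
        positions[chrom].append(pos)
    summary = {}
    for chrom, pos_list in positions.items():
        pos_list.sort()
        gaps = [b - a for a, b in zip(pos_list, pos_list[1:])]  # adjacent-SNP spacing
        span_mb = (pos_list[-1] - pos_list[0]) / 1e6
        summary[chrom] = {
            "n_snp": len(pos_list),
            "mean_gap_kb": mean(gaps) / 1e3 if gaps else float("nan"),
            "max_gap_kb": max(gaps) / 1e3 if gaps else float("nan"),
            "snp_per_mb": len(pos_list) / span_mb if span_mb else float("nan"),
        }
    return summary


# Hypothetical map records: (chromosome, SNP name, position in bp)
snp_map = [
    ("1", "snp_a", 1_000_000), ("1", "snp_b", 1_050_000),
    ("1", "snp_c", 1_100_000), ("1", "snp_d", 2_000_000),
    ("2", "snp_e", 500_000), ("2", "snp_f", 520_000),
]
for chrom, stats in snp_density(snp_map).items():
    print(chrom, stats)
```

<p>Large maximum gaps flag regions where a window defined by a fixed number of SNPs spans a much longer physical distance, which can affect ROH calls in those regions.</p>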
<p>There are several common requirements when studying CNV and ROH patterns in a new species or population. These include: preparing summary tables, making summary figures, generating CNVRs and plotting CNVR distribution maps with gene annotations, comparing CNVs and CNVRs between studies, converting genome coordinates and map files from one reference to another, finding high-frequency abnormal genomic regions, creating consensus gene lists, producing custom visualizations of results, and identifying haplotypes in regions of interest. We therefore built this open-source tool to provide a standardized, reproducible, time-saving and widely available one-stop post-analysis system that makes research simpler, more practical and more efficient, while avoiding common &#x201C;pitfalls&#x201D; that can affect the accuracy and interpretability of these studies.</p>
</sec>
<sec id="S2">
<title>Method</title>
<sec id="S2.SS1">
<title>Brief Introduction of Main Functions</title>
<p>The functions provided by this package can be categorized into five sections: Conversion; Summary; Annotation; Comparison; and Visualization. The most useful features are: integrating summarized results, generating lists of CNVRs, annotating the results with known gene positions, plotting CNVR distribution maps, and producing customized visualizations of CNVs and ROHs with genes and other related information on one plot (<xref ref-type="fig" rid="F1">Figure 1</xref>). The package supports a range of customizations, including the colors and sizes of high-resolution figures and the choice of output folder (to avoid conflicts between the results of different runs). Where applicable, output files are compatible with other software such as PennCNV (<xref ref-type="bibr" rid="B18">Wang et al., 2007</xref>), Plink (<xref ref-type="bibr" rid="B3">Chang et al., 2015</xref>), or DAVID annotation tools (<xref ref-type="bibr" rid="B8">Jiao et al., 2012</xref>).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Example plots illustrating the main functions and output from the HandyCNV package.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-731355-g001.tif"/>
</fig>
<p>The conversion section handles the conversion of genomic positions between two reference genomes and provides two functions. <italic>convert_map</italic> compares SNP map files for two different reference genomes, matching SNPs by name, and produces SNP maps in a format suitable for use by <italic>convert_coord</italic>. The function also reports the density of SNPs by chromosome. <italic>convert_coord</italic> converts the physical positions of genomic intervals based on a given SNP map file. Currently, the function is limited to inputs generated by <italic>convert_map</italic>, and can only convert coordinates for intervals detected on the same type of SNP chip. Converting coordinates may change the total length of the intervals, as the positions and order of SNPs on a chromosome can differ between reference genomes; the function therefore produces a table that summarizes how many intervals were converted successfully and reports the differences in length between the converted and original intervals.</p>
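<p>The idea of converting interval coordinates through SNPs matched by name can be sketched as follows. This is a hypothetical Python illustration, not the implementation of <italic>convert_coord</italic>: it assumes each SNP map is a dictionary from SNP name to (chromosome, position), lifts an interval by looking up the new positions of all SNPs it contains, and treats the conversion as failed if any SNP is missing from the new map or the SNPs move to more than one chromosome.</p>

```python
def lift_interval(interval, old_map, new_map):
    """Convert (chrom, start, end) to a new assembly via the SNPs it contains.

    old_map / new_map: dict of SNP name -> (chromosome, position).
    Returns the converted (chrom, start, end), or None if conversion fails.
    """
    chrom, start, end = interval
    inside = [name for name, (c, pos) in old_map.items()
              if c == chrom and end >= pos >= start]
    if not inside or any(name not in new_map for name in inside):
        return None  # no SNPs in the interval, or some SNPs absent from the new map
    new_chroms = {new_map[name][0] for name in inside}
    if len(new_chroms) != 1:
        return None  # SNPs now map to different chromosomes
    new_pos = [new_map[name][1] for name in inside]
    return (new_chroms.pop(), min(new_pos), max(new_pos))


# Hypothetical maps: s2 and s3 swap order in the new assembly.
old_map = {"s1": ("1", 100), "s2": ("1", 200), "s3": ("1", 300), "s4": ("2", 50)}
new_map = {"s1": ("1", 1100), "s2": ("1", 1350), "s3": ("1", 1250), "s4": ("3", 70)}

lifted = lift_interval(("1", 100, 300), old_map, new_map)
old_len = 300 - 100 + 1
new_len = lifted[2] - lifted[1] + 1
print(lifted, new_len - old_len)  # ('1', 1100, 1350) 50
```

<p>Because two SNPs change order, the converted interval is 50 bp longer than the original, which is the kind of length change a conversion summary should report.</p>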
<p>The summary section contains a group of functions to summarize CNV results, generate CNVRs, and make CNVR distribution maps from CNV results. There is also a collection of functions to summarize ROH results, report frequencies of ROH regions, calculate inbreeding coefficients by ROH length group, and generate haplotypes in ROH regions of interest.</p>
<p>The functions used for reporting CNV results include <italic>cnv_clean</italic>, <italic>cnv_summary_plot</italic>, and <italic>call_cnvr</italic>. <italic>cnv_clean</italic> takes a CNV list from PennCNV and CNVPartition and reformats it into a standard format for use in the functions listed below. <italic>cnv_summary_plot</italic> generates a range of summary plots, aggregating CNV results by length group, CNV type, chromosome, and individual. <italic>call_cnvr</italic> generates CNV regions as the union of sets of CNVs that overlap by at least one base pair (<xref ref-type="bibr" rid="B14">Redon et al., 2006</xref>). This function outputs three tables: (a) the list of CNVRs, including the number of CNVs and the number of samples in each CNVR, which reflect the frequency of each CNVR; (b) a brief summary table showing numbers of CNVRs by length and type (Deletion, Duplication, and Mixed, where Mixed indicates that both duplications and deletions are found within the CNVR); and (c) the total length and number of CNVRs on each chromosome.</p>
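<p>The CNVR definition used by <italic>call_cnvr</italic> (the union of CNVs that overlap by at least 1 bp) can be illustrated with the following Python sketch. It is not the package's code; it assumes each CNV is a (sample, chromosome, start, end, copy number) tuple with inclusive coordinates, and that copy numbers above 2 are duplications and those below 2 are deletions.</p>

```python
from itertools import groupby


def merge_cnvs_to_cnvrs(cnvs):
    """Merge CNVs that overlap by at least 1 bp into CNV regions (CNVRs).

    cnvs: iterable of (sample, chromosome, start, end, copy_number), inclusive.
    Returns one dict per CNVR with boundaries, counts and type.
    """
    regions = []
    ordered = sorted(cnvs, key=lambda c: (c[1], c[2]))  # by chromosome, then start
    for chrom, group in groupby(ordered, key=lambda c: c[1]):
        current = None
        for sample, _chrom, start, end, cn in group:
            kind = "Duplication" if cn > 2 else "Deletion"
            if current is not None and current["end"] >= start:
                # shares at least one base with the open region: extend it
                current["end"] = max(current["end"], end)
                current["n_cnv"] += 1
                current["samples"].add(sample)
                current["types"].add(kind)
            else:
                current = {"chrom": chrom, "start": start, "end": end,
                           "n_cnv": 1, "samples": {sample}, "types": {kind}}
                regions.append(current)
    for region in regions:
        types = region.pop("types")
        region["n_sample"] = len(region.pop("samples"))
        region["type"] = next(iter(types)) if len(types) == 1 else "Mixed"
    return regions


cnvs = [
    ("A", "1", 100, 200, 1),
    ("B", "1", 150, 300, 0),
    ("C", "1", 290, 400, 3),
    ("A", "1", 1000, 1100, 3),
    ("B", "2", 50, 80, 1),
]
for region in merge_cnvs_to_cnvrs(cnvs):
    print(region)
```

<p>The first three CNVs chain into a single Mixed CNVR carried by three samples, which is why the numbers of CNVs and samples are reported alongside each CNVR's boundaries.</p>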
<p><italic>roh_window</italic> reports: a table of high-frequency ROH regions on the autosomes that pass the common frequency threshold; a table of inbreeding coefficients for each individual by ROH length group; a brief summary of the total numbers and lengths of ROHs in each length group; and a plot of high-frequency ROH regions by chromosome. The inbreeding coefficients are calculated as <italic>F</italic><sub><italic>roh</italic></sub> = (&#x2211;<italic>L</italic><sub><italic>roh</italic></sub>)/(&#x2211;<italic>L</italic><sub><italic>auto</italic></sub>) (<xref ref-type="bibr" rid="B11">McQuillan et al., 2008</xref>), where &#x2211;<italic>L</italic><sub><italic>roh</italic></sub> is the total length of an individual&#x2019;s ROHs, and &#x2211;<italic>L</italic><sub><italic>auto</italic></sub> is the total length of the autosomes. Other functions in this group include <italic>prep_phased</italic>, <italic>closer_snp</italic>, and <italic>get_haplotype</italic>; see the package vignette for more information (<xref ref-type="bibr" rid="B9">Jinghang et al., 2021</xref>).</p>
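<p>A worked sketch of <italic>F</italic><sub><italic>roh</italic></sub> = (&#x2211;<italic>L</italic><sub><italic>roh</italic></sub>)/(&#x2211;<italic>L</italic><sub><italic>auto</italic></sub>) in Python is shown below. The autosome length and ROH lengths are made-up values, and computing one coefficient per minimum ROH length (1, 2, and 8 Mb) is an assumed grouping rather than the length groups used by <italic>roh_window</italic>.</p>

```python
def f_roh(roh_lengths_bp, autosome_length_bp, min_length_bp=0):
    """F_ROH = (sum of ROH lengths) / (total autosome length).

    Only ROHs at least `min_length_bp` long are counted, giving one
    coefficient per minimum-length class.
    """
    total = sum(length for length in roh_lengths_bp if length >= min_length_bp)
    return total / autosome_length_bp


AUTOSOME_BP = 2_500_000_000  # hypothetical total autosome length in bp
rohs = {"horse_01": [1_500_000, 3_000_000, 10_500_000], "horse_02": [1_200_000]}

for ind, lengths in rohs.items():
    by_class = {f"F_roh_over_{t // 1_000_000}Mb": f_roh(lengths, AUTOSOME_BP, t)
                for t in (1_000_000, 2_000_000, 8_000_000)}
    print(ind, by_class)
```

<p>For the first individual, 15 Mb of ROH over a 2,500 Mb autosomal genome gives <italic>F</italic><sub><italic>roh</italic></sub> = 0.006; counting only ROHs of at least 2 Mb gives 0.0054.</p>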
<p>The annotation section facilitates downloading and formatting reference gene lists, and annotating genes in genomic intervals. <italic>get_refgene</italic> automatically downloads a reference gene list for human, cow, sheep, pig, horse, chicken or dog from the UCSC (<xref ref-type="bibr" rid="B12">Navarro Gonzalez et al., 2021</xref>) website, invokes <italic>clean_ucsc</italic> and <italic>clean_ensgene</italic> to format it, removes duplicated genes, and outputs the list in a standard format. <italic>call_gene</italic> reports how many genes are located in the given genomic intervals. The frequency of each gene is calculated as the number of samples that have that gene annotated in their CNVs.</p>
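<p>The gene frequency described above (the number of samples whose CNVs overlap a gene) can be sketched in Python as follows. The tuple layouts, gene names, and percentage field are illustrative assumptions and not the output format of <italic>call_gene</italic>; the key point is that a sample carrying several CNVs in the same gene is counted once.</p>

```python
def gene_frequency(cnvs, genes, sample_size):
    """For each gene, count distinct samples whose CNVs overlap it by at least 1 bp.

    cnvs:  list of (sample, chromosome, start, end)
    genes: list of (gene_name, chromosome, start, end)
    """
    result = {}
    for name, g_chrom, g_start, g_end in genes:
        carriers = {sample for sample, chrom, start, end in cnvs
                    if chrom == g_chrom and end >= g_start and g_end >= start}
        if carriers:  # genes without any overlapping CNV are omitted
            result[name] = {"n_sample": len(carriers),
                            "percent": 100 * len(carriers) / sample_size}
    return result


# Hypothetical CNVs and genes
cnvs = [
    ("S1", "14", 105_100_000, 105_300_000),
    ("S1", "14", 105_255_000, 105_258_000),  # second CNV in the same sample
    ("S2", "14", 105_200_000, 105_250_000),
    ("S3", "1", 1_000, 2_000),
]
genes = [("GENE_A", "14", 105_240_000, 105_260_000), ("GENE_B", "1", 5_000, 6_000)]
print(gene_frequency(cnvs, genes, sample_size=4))
# {'GENE_A': {'n_sample': 2, 'percent': 50.0}}
```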
<p>The comparison section consists of functions for comparing sets of CNVs (<italic>compare_cnv</italic>), CNVRs (<italic>compare_cnvr</italic>), gene frequency lists (<italic>compare_gene</italic>), and other intervals (<italic>compare_interval</italic>). These functions were implemented using the <italic>foverlaps</italic> function in the <italic>data.table</italic> R package (<xref ref-type="bibr" rid="B6">Dowle et al., 2019</xref>). <italic>compare_gene</italic> can produce consensus gene lists, given lists of genes present in CNVRs in multiple studies. The remaining functions report numbers, lengths, and proportions of overlapping intervals (CNVs, CNVRs, etc.) on a population and individual basis.</p>
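<p>The difference between population-level and individual-level comparison can be sketched in Python as below. HandyCNV implements its comparisons with the <italic>foverlaps</italic> function in <italic>data.table</italic>; this hypothetical version instead uses a brute-force search that is quadratic in the number of CNVs and is only meant to show what the two proportions measure. Each CNV is assumed to be a (sample, chromosome, start, end) tuple with inclusive coordinates.</p>

```python
from collections import defaultdict


def _overlap(a, b):
    """(chrom, start, end) intervals sharing at least one base pair."""
    return a[0] == b[0] and a[2] >= b[1] and b[2] >= a[1]


def compare_calls(set_a, set_b):
    """Proportion of CNVs in set_a confirmed by set_b.

    Population level: overlaps any CNV in set_b, from any sample.
    Individual level: overlaps a CNV in set_b from the same sample.
    """
    b_by_sample = defaultdict(list)
    for sample, chrom, start, end in set_b:
        b_by_sample[sample].append((chrom, start, end))
    b_all = [iv for ivs in b_by_sample.values() for iv in ivs]
    pop = ind = 0
    for sample, chrom, start, end in set_a:
        iv = (chrom, start, end)
        if any(_overlap(iv, other) for other in b_all):
            pop += 1
        if any(_overlap(iv, other) for other in b_by_sample.get(sample, [])):
            ind += 1
    n = len(set_a)
    return {"population_overlap": pop / n, "individual_overlap": ind / n}


# Hypothetical calls from two detection methods
set_a = [("S1", "1", 100, 200), ("S2", "1", 150, 250), ("S3", "2", 10, 20)]
set_b = [("S1", "1", 180, 300), ("S4", "2", 15, 30)]
print(compare_calls(set_a, set_b))
```

<p>Every CNV in the first set overlaps some CNV in the second set (population level 1.0), but only one of the three is confirmed in the same sample (individual level about 0.33).</p>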
<p>Finally, twelve functions in HandyCNV are included in the visualization section; five of these produce plots as part of their output and have been mentioned previously: <italic>cnv_summary_plot</italic>, <italic>roh_window</italic>, <italic>compare_cnv</italic>, <italic>compare_cnvr</italic>, and <italic>convert_map</italic>. The remaining visualization functions focus on customizing and integrating the plotting of all information related to CNVs, ROHs, and high-frequency CNVRs: these are <italic>cnvr_plot</italic>, <italic>plot_gene</italic>, <italic>cnv_visual</italic>, <italic>roh_visual</italic>, <italic>plot_cnvr_panorama</italic>, <italic>plot_snp_density</italic>, and <italic>plot_cnvr_source</italic>. These functions are described in the package vignette (<xref ref-type="bibr" rid="B9">Jinghang et al., 2021</xref>).</p>
</sec>
<sec id="S2.SS2">
<title>Pipelines for the Post Analysis of CNVs and ROHs</title>
<sec id="S2.SS2.SSS1">
<title>Post-analysis of CNVs and CNVRs</title>
<p>The recommended pipeline contains 14 basic steps, which can be selected depending on the purpose of the study (<xref ref-type="fig" rid="F2">Figure 2</xref>), although usage is not limited to these steps, and users are free to explore their data by customizing the functions. By running through this pipeline, users can produce a wide range of results, such as summary tables and plots of CNV results, the CNVR list with its brief summary and distribution plot, the frequency of CNVs and CNVRs within annotated genes, and comparisons between CNVs, CNVRs, and annotated genes.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Pipeline for the post-analysis of CNV results using HandyCNV.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-731355-g002.tif"/>
</fig>
</sec>
<sec id="S2.SS2.SSS2">
<title>Post-analysis of ROHs</title>
<p>The pipeline for the post-analysis of ROHs contains eight basic steps (<xref ref-type="fig" rid="F3">Figure 3</xref>). The main results produced by this pipeline are the list of high-frequency ROH regions, ROH-based inbreeding coefficients, a list of genes located in the ROH regions, and the frequencies of haplotypes within genes or regions of interest.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Pipeline for the post-analysis of ROH in HandyCNV.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-731355-g003.tif"/>
</fig>
</sec>
</sec>
</sec>
<sec id="S3">
<title>Application Examples of CNV and ROH</title>
<p>We now provide two example runs of the pipeline using two previously published data sets: the first is a CNV list produced for a human population in Brazil (<xref ref-type="bibr" rid="B5">de Godoy et al., 2020</xref>), and the second is genotype data for an inbred breed of horses (<xref ref-type="bibr" rid="B17">Velie et al., 2016</xref>). The purpose of these examples is to show how to use the functions in this package; further interpretation of the results is therefore not included.</p>
<sec id="S3.SS1">
<title>Example 1. The Post-analysis of CNVs in a Human Dataset</title>
<p>The CNV results in this example were taken from a study published in 2020 that comprised 268 microarray samples from a human population in Brazil (<xref ref-type="bibr" rid="B5">de Godoy et al., 2020</xref>). In this example, we show how to prepare the standard CNV list, then produce a brief summary, generate CNVRs, annotate genes, and visualize CNVs. <xref ref-type="fig" rid="F4">Figure 4</xref> presents the code used in example 1; the R script can be found in <xref ref-type="supplementary-material" rid="DS1">Supplementary File 1</xref>.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Analytical steps of example 1.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-731355-g004.tif"/>
</fig>
<p>To replicate this example, we first need to download the dataset &#x201C;Table S1 &#x2013; Detailed information about all CNVs analyzed in our sample&#x201D; (<xref ref-type="bibr" rid="B5">de Godoy et al., 2020</xref>) and save the sheet &#x201C;All array platforms&#x2019; CNVs&#x201D; as a <italic>.csv</italic> file. We then use <italic>read.csv</italic> to load the CNV list and select the columns required by <italic>cnv_clean</italic> (see <xref ref-type="fig" rid="F5">Figure 5C</xref>).</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>The main outputs of example 1. Panel <bold>(A)</bold> is the CNV summary plot; panel <bold>(B)</bold> is the CNVR distribution map; panel <bold>(C)</bold> is the CNV input list; panel <bold>(D)</bold> is the brief summary table of CNVs; panel <bold>(E)</bold> is a plot of CNVs on Chromosome 14; panel <bold>(F)</bold> is the CNVR list; panel <bold>(G)</bold> is the brief summary table of CNVRs; panel <bold>(H)</bold> is an example plot of a high frequency CNVR; panel <bold>(I)</bold> is a plot of CNVs on Chr14:105-110 Mb; panel <bold>(J)</bold> is the gene frequency list; and panel <bold>(K)</bold> is the list of samples that carry CNVs in the LINC00221 gene.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-731355-g005.tif"/>
</fig>
<p>A formatted clean CNV list is returned as an object named &#x201C;clean_cnv&#x201D; in the working environment, and a brief summary table of CNVs (see <xref ref-type="fig" rid="F5">Figure 5D</xref>) is written out after executing <italic>cnv_clean</italic>.</p>
<p>We then take a quick look at the CNV distribution by passing the &#x201C;clean_cnv&#x201D; list as input and customizing parameters in <italic>cnv_visual</italic>. In this example, we first set &#x201C;chr_id = 14&#x201D; to visualize the distribution of CNVs on chromosome 14 (see <xref ref-type="fig" rid="F5">Figure 5E</xref>), then zoom into the region with higher-frequency CNVs (see <xref ref-type="fig" rid="F5">Figure 5I</xref>) by setting &#x201C;start_position = 105&#x201D; and &#x201C;end_position = 110.&#x201D; Other chromosomes or regions can be visualized, and the colors of copy numbers changed, by adjusting the relevant arguments.</p>
<p>The CNV summary plot (see <xref ref-type="fig" rid="F5">Figure 5A</xref>) is produced by <italic>cnv_summary_plot</italic> using &#x201C;clean_cnv&#x201D; as input. The CNVR list (see <xref ref-type="fig" rid="F5">Figure 5F</xref>) is generated by <italic>call_cnvr</italic> from the &#x201C;clean_cnv&#x201D; object, and the function also saves a brief summary table of CNVRs (see <xref ref-type="fig" rid="F5">Figure 5G</xref>) in the working directory. The CNVR distribution map (see <xref ref-type="fig" rid="F5">Figure 5B</xref>) is generated by <italic>cnvr_plot</italic> from the CNVR list.</p>
<p>For the gene annotation steps, the reference gene list can be downloaded and formatted by specifying the genome version argument in <italic>get_refgene</italic>. The gene annotation lists for CNVs or CNVRs are then generated by running <italic>call_gene</italic>. Three inputs need to be assigned in this function: the clean CNV file (&#x201C;clean_cnv&#x201D;), the CNVR list (&#x201C;cnvr&#x201D;), and the reference gene list (&#x201C;human_hg19&#x201D;); the gene frequency list (see <xref ref-type="fig" rid="F5">Figure 5J</xref>) is returned as an object in the R environment. All high-frequency CNVRs can be plotted with their gene annotations at the same time (see one example plot in <xref ref-type="fig" rid="F5">Figure 5H</xref>) through <italic>cnvr_plot</italic>, by supplying &#x201C;cnvr,&#x201D; &#x201C;clean_cnv&#x201D; and the reference gene list (&#x201C;human_hg19&#x201D;) and setting the &#x201C;sample_size&#x201D; and &#x201C;common_cnv_threshold&#x201D; arguments.</p>
<p>Finally, we can extract the IDs of samples carrying CNVs that contain genes of interest (see <xref ref-type="fig" rid="F5">Figure 5K</xref>) using <italic>get_samples</italic>, by loading the CNV annotation list generated by <italic>call_gene</italic> and assigning the gene name to the &#x201C;gene_name&#x201D; argument.</p>
<p>Because this example contains only one CNV result based on one reference genome, the functions in the comparison and conversion sections are not applicable here. Users interested in these functions can consult the package vignette in the GitHub repository (<xref ref-type="bibr" rid="B9">Jinghang et al., 2021</xref>).</p>
</sec>
<sec id="S3.SS2">
<title>Example 2. The Post-analysis of ROH Using Horse Genotype Samples</title>
<p>The genotype data used to detect ROH in this example are from the work of <xref ref-type="bibr" rid="B17">Velie et al. (2016)</xref> and contain 285 horse samples. This example shows how to use the functions in HandyCNV to analyze ROHs. It includes ROH detection by Plink 1.9 (<xref ref-type="bibr" rid="B3">Chang et al., 2015</xref>) and genotype phasing by Beagle 5.1 (<xref ref-type="bibr" rid="B2">Browning et al., 2018</xref>). <xref ref-type="fig" rid="F6">Figure 6</xref> presents the code used in example 2; the R script can be found in <xref ref-type="supplementary-material" rid="DS1">Supplementary File 2</xref>.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>Analytical steps of example 2.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-731355-g006.tif"/>
</fig>
<p>To run this example, we first need to prepare the genotype data. The genotype files are read using the <italic>fread</italic> function (<xref ref-type="bibr" rid="B6">Dowle et al., 2019</xref>). Because the original ped file does not match the format required by Plink 1.9, we insert a column of sequential family IDs, plus placeholder columns of zeroes for the father, mother, and sex code, using the <italic>data.frame</italic> and <italic>cbind</italic> functions (<xref ref-type="bibr" rid="B13">R Core Team, 2020</xref>). Before calling ROHs, the map file is loaded as the input file in <italic>plot_snp_density</italic> to obtain a brief summary and visualization of SNP density (<xref ref-type="fig" rid="F7">Figure 7A</xref>). The <italic>jpeg</italic> and <italic>dev.off</italic> functions (<xref ref-type="bibr" rid="B13">R Core Team, 2020</xref>) are used to save the plot.</p>
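<p>The ped reformatting step can be sketched in Python as follows (the example itself does this in R with <italic>data.frame</italic> and <italic>cbind</italic>). The sketch assumes each original row holds the individual ID followed by the phenotype and allele columns, so only the family ID, father, mother, and sex columns are missing; in the PLINK ped format, 0 denotes a parent not in the data set and an unknown sex.</p>

```python
def to_plink_ped(rows):
    """Insert PLINK's family ID, father, mother and sex columns into each row.

    Assumes each input row is [individual_id, phenotype, allele1, allele2, ...]
    (a hypothetical layout for the original file).
    """
    out = []
    for fid, row in enumerate(rows, start=1):
        iid, rest = row[0], row[1:]
        # sequential family ID, individual ID, father = 0, mother = 0, sex = 0 (unknown)
        out.append([str(fid), iid, "0", "0", "0", *rest])
    return out


raw = [["H001", "-9", "A", "G", "C", "C"], ["H002", "-9", "G", "G", "C", "T"]]
for line in to_plink_ped(raw):
    print(" ".join(line))
# 1 H001 0 0 0 -9 A G C C
# 2 H002 0 0 0 -9 G G C T
```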
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p>The main outputs of example 2. Panel <bold>(A)</bold> is the SNP density distribution plot; panel <bold>(B)</bold> is a brief summary of ROH by length group; panel <bold>(C)</bold> is a plot of ROH on Chromosome 22; panel <bold>(D)</bold> is the high frequency ROH regions list; panel <bold>(E)</bold> is a plot of ROHs on Chr1:139.6-141.6 Mb; panel <bold>(F)</bold> is the gene annotation list of ROH regions; panel <bold>(G)</bold> is the ROH frequency distribution plot; panel <bold>(H)</bold> is a plot of ROHs that overlap the <italic>GABPB1</italic> gene; panel <bold>(I)</bold> is the frequency of haplotypes in the <italic>GABPB1</italic> gene; panel <bold>(J)</bold> is the frequency of haploids in the <italic>GABPB1</italic> gene; and panel <bold>(K)</bold> is the list of ROH-based inbreeding coefficients.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-731355-g007.tif"/>
</fig>
<p>We then invoke Plink 1.9 (<xref ref-type="bibr" rid="B3">Chang et al., 2015</xref>) via the <italic>shell</italic> function (<xref ref-type="bibr" rid="B13">R Core Team, 2020</xref>) from RStudio (<xref ref-type="bibr" rid="B16">Team, 2021</xref>) to generate binary genotype files and call ROHs. On Windows operating systems, ensure that the plink.exe file is either in the current directory or accessible via the PATH system variable. To run Plink 1.9 on other operating systems, please refer to the Plink website (<xref ref-type="bibr" rid="B3">Chang et al., 2015</xref>).</p>
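<p>For readers who prefer to drive Plink from Python rather than R&#x2019;s <italic>shell</italic>, a minimal sketch using <italic>subprocess</italic> is shown below. The file prefix is hypothetical, &#x201C;--horse&#x201D; selects Plink&#x2019;s horse chromosome set, and &#x201C;--homozyg&#x201D; is run with Plink&#x2019;s default parameters, which are not necessarily those used in Supplementary File 2.</p>

```python
import shutil
import subprocess


def plink_roh_commands(prefix, plink="plink"):
    """PLINK 1.9 command lines: text ped/map to binary files, then ROH calling."""
    return [
        # prefix.ped / prefix.map -> prefix.bed/.bim/.fam, horse chromosome set
        [plink, "--file", prefix, "--horse", "--make-bed", "--out", prefix],
        # default --homozyg settings; with no --out, results go to plink.hom
        [plink, "--bfile", prefix, "--horse", "--homozyg"],
    ]


def run_commands(commands):
    """Run each command in turn, stopping at the first failure."""
    if shutil.which(commands[0][0]) is None:
        raise FileNotFoundError(f"{commands[0][0]} was not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


cmds = plink_roh_commands("horse_geno")  # hypothetical file prefix
for cmd in cmds:
    print(" ".join(cmd))
# run_commands(cmds)  # uncomment once PLINK 1.9 is installed
```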
<p>Once we have ROH results, we can run <italic>roh_window</italic>, which takes a &#x201C;plink.hom&#x201D; file as input to report a brief summary of ROH by length group (see <xref ref-type="fig" rid="F7">Figure 7B</xref>), high-frequency ROH regions (see <xref ref-type="fig" rid="F7">Figure 7D</xref>), and an ROH frequency distribution plot (see <xref ref-type="fig" rid="F7">Figure 7G</xref>), and to calculate the ROH-based inbreeding coefficients (<xref ref-type="fig" rid="F7">Figure 7K</xref>).</p>
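<p>The notion of high-frequency ROH regions can be illustrated with the following simplified Python sketch, which scores each SNP by the proportion of individuals whose ROH covers it and reports runs of consecutive SNPs at or above a frequency threshold. The per-SNP approach, input layout, and threshold are assumptions for illustration; <italic>roh_window</italic> may define windows and thresholds differently.</p>

```python
def roh_islands(roh, snp_positions, n_individuals, threshold):
    """Runs of consecutive SNPs whose ROH frequency is at least `threshold`.

    roh: list of (individual, start, end) on one chromosome, inclusive.
    snp_positions: sorted SNP positions on the same chromosome.
    Returns a list of (first_snp, last_snp, max_frequency).
    """
    islands, run = [], []
    for pos in snp_positions:
        carriers = {ind for ind, start, end in roh if end >= pos >= start}
        freq = len(carriers) / n_individuals
        if freq >= threshold:
            run.append((pos, freq))
        elif run:
            islands.append((run[0][0], run[-1][0], max(f for _, f in run)))
            run = []
    if run:  # close a run that reaches the last SNP
        islands.append((run[0][0], run[-1][0], max(f for _, f in run)))
    return islands


snps = [100, 200, 300, 400, 500]
roh = [("H1", 100, 350), ("H2", 150, 450), ("H3", 250, 500)]
print(roh_islands(roh, snps, n_individuals=4, threshold=0.5))  # [(200, 400, 0.75)]
```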
<p>In this example, we present visualizations of ROH on the whole of chromosome 22 (see <xref ref-type="fig" rid="F7">Figure 7C</xref>) and on the 22.81&#x2013;23.22 Mb region of chromosome 22 (see <xref ref-type="fig" rid="F7">Figure 7E</xref>) via <italic>roh_visual</italic>, which takes the &#x201C;plink.hom&#x201D; data set as input. The &#x201C;chr_id&#x201D; and &#x201C;target_region&#x201D; arguments are available to customize the visualization, alongside additional arguments to customize the colors of ROHs.</p>
<p>The horse reference gene list (&#x201C;equaCab2&#x201D;) was downloaded from the UCSC website (<xref ref-type="bibr" rid="B12">Navarro Gonzalez et al., 2021</xref>) using <italic>get_refgene</italic>. The genes located in the high-frequency ROH regions (see <xref ref-type="fig" rid="F7">Figure 7F</xref>) were annotated via <italic>call_gene</italic>, which requires the reference gene list (&#x201C;equaCab2&#x201D;) and the high-frequency ROH regions file generated by <italic>roh_window</italic>. With the reference gene list available, we can visualize an ROH region together with its genes (see <xref ref-type="fig" rid="F7">Figure 7E</xref>) via <italic>roh_visual</italic> by assigning the clean ROH file (&#x201C;clean_roh = clean_roh&#x201D;), the target ROH region [&#x201C;target_region = c(1, 139.6, 141.6)&#x201D;] and the reference gene list (&#x201C;refgene = equaCab2&#x201D;). We can also visualize ROHs around a gene of interest: here, for the <italic>GABPB1</italic> gene, we first extract the physical position of this gene from the reference gene list (&#x201C;equaCab2&#x201D;) using the &#x201C;<italic>filter</italic>&#x201D; and &#x201C;<italic>select</italic>&#x201D; functions (<xref ref-type="bibr" rid="B19">Wickham et al., 2019</xref>), then use <italic>roh_visual</italic> with the ROH file (&#x201C;plink.hom&#x201D;) as input, assigning the gene position to the &#x201C;target_region&#x201D; argument (see <xref ref-type="fig" rid="F7">Figure 7H</xref>). A loop (<xref ref-type="bibr" rid="B13">R Core Team, 2020</xref>) over <italic>roh_visual</italic> can be written to plot all high-frequency ROH regions that contain annotated genes.</p>
<p>Obtaining the haplotypes of genes requires phased genotype files. Here, we take chromosome 1 as an example to show how to use Plink 1.9 (<xref ref-type="bibr" rid="B3">Chang et al., 2015</xref>) and Beagle 5.1 (<xref ref-type="bibr" rid="B2">Browning et al., 2018</xref>) to phase the genotypes. From RStudio (<xref ref-type="bibr" rid="B16">Team, 2021</xref>), the <italic>shell</italic> function (<xref ref-type="bibr" rid="B13">R Core Team, 2020</xref>) first invokes plink (<xref ref-type="bibr" rid="B3">Chang et al., 2015</xref>) to generate a VCF-format genotype file. It then invokes beagle (<xref ref-type="bibr" rid="B2">Browning et al., 2018</xref>) to phase the genotypes. On Windows operating systems, ensure that the plink and java executables are either in the current directory or accessible via the PATH system variable. Likewise, adjust the path to the Beagle JAR file as required for your operating system. For instructions on installing and running Beagle 5.1, refer to its manual (<xref ref-type="bibr" rid="B2">Browning et al., 2018</xref>).</p>
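<p>The two external calls can be spelled out explicitly. The Python sketch below stands in for the R <italic>shell</italic> calls described above rather than reproducing the authors&#x2019; script. It assembles the PLINK 1.9 and Beagle 5.1 command lines for one chromosome and can run them with <italic>subprocess</italic>. The options it uses are documented options of those tools: --bfile, --horse, --chr, --recode vcf, and --out for PLINK, and gt= and out= for Beagle. The input prefix &#x201C;horse&#x201D; and the &#x201C;beagle.jar&#x201D; path are hypothetical placeholders.</p>
<preformat preformat-type="code">
```python
import subprocess


def phasing_commands(bfile, chrom, beagle_jar, out_prefix):
    """Build PLINK 1.9 and Beagle 5.1 argument lists for one chromosome."""
    vcf_prefix = f"{out_prefix}_chr{chrom}"
    # PLINK writes {vcf_prefix}.vcf; --horse selects the horse chromosome set.
    plink_cmd = ["plink", "--bfile", bfile, "--horse", "--chr", str(chrom),
                 "--recode", "vcf", "--out", vcf_prefix]
    # Beagle reads the unphased VCF and writes {vcf_prefix}_phased.vcf.gz.
    beagle_cmd = ["java", "-jar", beagle_jar,
                  f"gt={vcf_prefix}.vcf", f"out={vcf_prefix}_phased"]
    return plink_cmd, beagle_cmd


def run_phasing(bfile, chrom, beagle_jar, out_prefix):
    """Run both steps in order; plink and java must be on the PATH."""
    for cmd in phasing_commands(bfile, chrom, beagle_jar, out_prefix):
        subprocess.run(cmd, check=True)  # raise if either tool fails
    return f"{out_prefix}_chr{chrom}_phased.vcf.gz"


if __name__ == "__main__":
    # Only print the commands; call run_phasing(...) to execute them.
    for cmd in phasing_commands("horse", 1, "beagle.jar", "horse"):
        print(" ".join(cmd))
```
</preformat>
<p>Passing each command as a list of arguments, rather than as a single string, avoids shell-quoting problems on both Windows and Unix-like systems.</p>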
<p>Finally, we take <italic>GABPB1</italic> as an example to show how to obtain haplotypes. First, we use <italic>prep_phased</italic> to load the phased genotype file generated by Beagle (phased_geno = &#x201C;orse_chr1_phased.vcf.gz&#x201D;). The &#x201C;convert_letter&#x201D; argument is set to &#x201C;TRUE&#x201D; to convert the genotypes into the standard format used by HandyCNV (returned as &#x201C;geno_chr1&#x201D;). Second, we use <italic>closer_snp</italic> to extract the gene&#x2019;s position from the SNP map file (returned as &#x201C;GABPB1_pos&#x201D;). This function takes the SNP map file through the &#x201C;phased_input&#x201D; argument, and the gene&#x2019;s physical position obtained from the reference gene list is assigned to the &#x201C;chr,&#x201D; &#x201C;start,&#x201D; and &#x201C;end&#x201D; arguments. Finally, we use <italic>get_haplotype</italic> to obtain the haplotype information for the <italic>GABPB1</italic> gene (see <xref ref-type="fig" rid="F7">Figures 7I,J</xref>). The formatted phased genotype list (&#x201C;geno_chr1&#x201D;) is assigned to the &#x201C;geno&#x201D; argument and the gene&#x2019;s position (&#x201C;GABPB1_pos&#x201D;) to the &#x201C;pos&#x201D; argument.</p>
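<p>The idea behind the haplotype step can be sketched independently of HandyCNV. The Python example below assumes phased calls in the VCF &#x201C;a|b&#x201D; notation written by Beagle. It keeps the SNPs inside a gene&#x2019;s coordinates, joins each sample&#x2019;s first and second alleles into two haplotype strings, and counts how often each haplotype occurs. The input layout, the function name <italic>gene_haplotypes</italic>, and the toy genotypes are hypothetical, and this is not the <italic>get_haplotype</italic> implementation.</p>
<preformat preformat-type="code">
```python
from collections import Counter


def gene_haplotypes(snps, chrom, start, end):
    """Count haplotypes over the SNPs that lie within [start, end].

    snps: list of (chrom, pos, {sample: "a|b"}), every SNP carrying the
    same samples. Returns a Counter of haplotype strings such as "01".
    """
    # Keep SNPs inside the gene and order them by physical position.
    inside = sorted(
        (s for s in snps if s[0] == chrom and s[1] >= start and end >= s[1]),
        key=lambda s: s[1],
    )
    counts = Counter()
    if not inside:
        return counts
    for sample in inside[0][2]:
        alleles = [s[2][sample].split("|") for s in inside]
        # The allele left of "|" goes to haplotype 1, the right one to haplotype 2.
        counts["".join(a[0] for a in alleles)] += 1
        counts["".join(a[1] for a in alleles)] += 1
    return counts


# Toy phased genotypes (invented): two samples, four SNPs.
SNPS = [
    ("1", 1000, {"H1": "0|1", "H2": "0|0"}),
    ("1", 1500, {"H1": "1|0", "H2": "1|1"}),
    ("1", 3000, {"H1": "1|1", "H2": "0|1"}),  # outside the gene window
    ("2", 1200, {"H1": "1|1", "H2": "1|1"}),  # different chromosome
]

print(gene_haplotypes(SNPS, "1", 900, 2000))
```
</preformat>
<p>For the toy gene window, three of the four haplotypes read &#x201C;01&#x201D; and one reads &#x201C;10&#x201D;.</p>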
</sec>
</sec>
<sec sec-type="discussion" id="S4">
<title>Discussion</title>
<p>Here we present HandyCNV, a freely available, open-source R package that provides a comprehensive set of functions to summarize and visualize CNV and runs of homozygosity (ROH) results detected from SNP genotyping data.</p>
<p>Many software packages have been developed to detect CNV and ROH from SNP chip data [such as PennCNV (<xref ref-type="bibr" rid="B18">Wang et al., 2007</xref>), CNVPartition (<xref ref-type="bibr" rid="B7">Illumina, 2021</xref>), SNP and Variation Suite (<xref ref-type="bibr" rid="B1">Bozeman and Golden Helix, 2020</xref>), and Plink (<xref ref-type="bibr" rid="B3">Chang et al., 2015</xref>)]. Several well-designed tools are also available for CNV-based association analysis [such as CNVRuler (<xref ref-type="bibr" rid="B10">Kim et al., 2012</xref>), CNVRanger (<xref ref-type="bibr" rid="B4">da Silva et al., 2019</xref>), and CNVassoc (<xref ref-type="bibr" rid="B15">Subirana et al., 2011</xref>)]. However, although these tools include some basic data summary and visualization functions, they offer no features for customized visualization of CNV or ROH results, nor for reporting haplotype information for target genomic regions. In contrast, the HandyCNV package focuses on detailed summarization and customized visualization of CNV and ROH results. It supports tasks such as converting SNP maps, identifying CNVRs from lists of CNVs, annotating genomes, comparing and visualizing CNVs, CNVRs, and ROH, reporting summary results, and processing haplotypes of genomic regions of interest. Integrating these tasks into a single package makes the post-analysis of CNV and ROH standardizable, reproducible, and time-saving. It helps researchers produce comprehensive tables and figures and easily identify the samples that carry the genomic regions or genes of greatest interest for further experimental validation.</p>
<p>The package has some limitations. For example, the <italic>plot_cnvr_panorama</italic> function needs to read genotype data to plot BAF and LRR information, which can require large amounts of storage. We tested it on a 150K SNP chip with 2,100 samples on a desktop Windows system, where it performed well. However, it may not be suitable for higher-density chips or very large data sets. The <italic>get_haplotype</italic> function is also limited, as it currently accepts only phased genotypes produced by Beagle 5.1 (<xref ref-type="bibr" rid="B2">Browning et al., 2018</xref>) with physical positions. In addition, the functions in the conversion section require users to provide both the target and the default map files.</p>
</sec>
<sec id="S5">
<title>Software Information</title>
<p>The current release of HandyCNV is version 1.1.6, which can be installed in the R environment with the following code: <monospace>remotes::install_github(repo = "JH-Zhou/HandyCNV@v.1.1.6")</monospace>. The current development version is available from the GitHub repository (<ext-link ext-link-type="uri" xlink:href="https://github.com/JH-Zhou/HandyCNV">github.com/JH-Zhou/HandyCNV</ext-link>).</p>
</sec>
<sec sec-type="data-availability" id="S6">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. The human CNV lists used in Example 1 can be found in &#x201C;Table S1 &#x2013; Detailed information about all CNVs analyzed&#x201D; in the <xref ref-type="supplementary-material" rid="DS1">Supplementary Material section</xref> of Vict&#x00F3;ria Cabral Silveira Monteiro de Godoy&#x2019;s study (doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1590/1678-4685-GMB-2019-0218">10.1590/1678-4685-GMB-2019-0218</ext-link>). The genotype data used in Example 2 are from Brandon D. Velie&#x2019;s study and are publicly available via Figshare (doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.3145759">10.6084/m9.figshare.3145759</ext-link>).</p>
</sec>
<sec id="S7">
<title>Ethics Statement</title>
<p>Ethical review and approval were not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements. Ethical review and approval were not required for the animal study because no animal sampling, experiments, or phenotype measurements were performed in this study. The genotype data used in this analysis are from previous studies.</p>
</sec>
<sec id="S8">
<title>Author Contributions</title>
<p>JZ conceived the analysis, compiled the package, and wrote the manuscript. LL contributed to code writing and testing, and reviewed the manuscript. TL contributed to package testing, proofreading of the manuscript, and vignette. DG and YS provided instruction for analysis, reviewed the manuscript, manual, and vignette. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>TL is employed by Livestock Improvement Corporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="S9">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="S10">
<title>Funding</title>
<p>JZ was funded by the China Scholarship Council. YS was supported by the China Agricultural Research System of MOF and MARA.</p>
</sec>
<ack>
<p>We thank the two reviewers for their valuable comments, which have improved the scalability of the functions and the structural integrity of this paper. We also thank bioRxiv for accepting an earlier version of this manuscript as a preprint, and the GitHub platform for hosting the open-source code, which helped to promote our study to more users at an early stage. This package depends on several independently developed R packages, such as the Tidyverse family (<xref ref-type="bibr" rid="B19">Wickham et al., 2019</xref>) and data.table (<xref ref-type="bibr" rid="B6">Dowle et al., 2019</xref>). We appreciate all contributors to the open-source R language.</p>
</ack>
<sec id="S11" sec-type="supplementary material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2021.731355/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2021.731355/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.zip" id="DS1" mimetype="application/zip" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bozeman</surname> <given-names>M. T.</given-names></name> <name><surname>Golden Helix</surname> <given-names>I.</given-names></name></person-group> (<year>2020</year>). <source><italic>SNP &#x0026; Variation Suite TM (Version 8.x).</italic></source></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Browning</surname> <given-names>B. L.</given-names></name> <name><surname>Zhou</surname> <given-names>Y.</given-names></name> <name><surname>Browning</surname> <given-names>S. R.</given-names></name></person-group> (<year>2018</year>). <article-title>A one-penny imputed genome from next-generation reference panels.</article-title> <source><italic>Am. J. Hum. Genet.</italic></source> <volume>103</volume> <fpage>338</fpage>&#x2013;<lpage>348</lpage>. <pub-id pub-id-type="doi">10.1016/j.ajhg.2018.07.015</pub-id> <pub-id pub-id-type="pmid">30100085</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>C. C.</given-names></name> <name><surname>Chow</surname> <given-names>C. C.</given-names></name> <name><surname>Tellier</surname> <given-names>L. C.</given-names></name> <name><surname>Vattikuti</surname> <given-names>S.</given-names></name> <name><surname>Purcell</surname> <given-names>S. M.</given-names></name> <name><surname>Lee</surname> <given-names>J. J.</given-names></name></person-group> (<year>2015</year>). <article-title>Second-generation PLINK: rising to the challenge of larger and richer datasets.</article-title> <source><italic>Gigascience</italic></source> <volume>4</volume>:<issue>7</issue>. <pub-id pub-id-type="doi">10.1186/s13742-015-0047-8</pub-id> <pub-id pub-id-type="pmid">25722852</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>da Silva</surname> <given-names>V.</given-names></name> <name><surname>Ramos</surname> <given-names>M.</given-names></name> <name><surname>Groenen</surname> <given-names>M.</given-names></name> <name><surname>Crooijmans</surname> <given-names>R.</given-names></name> <name><surname>Johansson</surname> <given-names>A.</given-names></name> <name><surname>Regitano</surname> <given-names>L.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>CNVRanger: association analysis of CNVs with gene expression and quantitative phenotypes.</article-title> <source><italic>Bioinformatics</italic></source> <volume>36</volume> <fpage>972</fpage>&#x2013;<lpage>973</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz632</pub-id> <pub-id pub-id-type="pmid">31392308</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Godoy</surname> <given-names>V. C. S. M.</given-names></name> <name><surname>Bellucco</surname> <given-names>F. T.</given-names></name> <name><surname>Colovati</surname> <given-names>M.</given-names></name> <name><surname>de Oliveira</surname> <given-names>H. R.</given-names> <suffix>Jr.</suffix></name> <name><surname>Moys&#x00E9;s-Oliveira</surname> <given-names>M.</given-names></name> <name><surname>Melaragno</surname> <given-names>M. I.</given-names></name></person-group> (<year>2020</year>). <article-title>Copy number variation (CNV) identification, interpretation, and database from Brazilian patients.</article-title> <source><italic>Genet. Mol. Biol.</italic></source> <volume>43</volume>:<issue>218</issue>. <pub-id pub-id-type="doi">10.1590/1678-4685-gmb-2019-0218</pub-id> <pub-id pub-id-type="pmid">33306777</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dowle</surname> <given-names>M.</given-names></name> <name><surname>Srinivasan</surname> <given-names>A.</given-names></name> <name><surname>Gorecki</surname> <given-names>J.</given-names></name> <name><surname>Chirico</surname> <given-names>M.</given-names></name> <name><surname>Stetsenko</surname> <given-names>P.</given-names></name> <name><surname>Short</surname> <given-names>T.</given-names></name><etal/></person-group> (<year>2019</year>). <source><italic>Package &#x2018;data.table&#x2019;: Extension of &#x2018;data.frame&#x2019;. CRAN Repository Version 1.14.0.</italic></source></citation></ref>
<ref id="B7"><citation citation-type="journal"><collab>Illumina</collab> (<year>2021</year>). <source><italic>GenomeStudio.</italic></source> <ext-link ext-link-type="uri" xlink:href="https://www.illumina.com/techniques/microarrays/array-data-analysis-experimental-design/genomestudio.html">https://www.illumina.com/techniques/microarrays/array-data-analysis-experimental-design/genomestudio.html</ext-link> <comment>(accessed June 10, 2021)</comment>.</citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiao</surname> <given-names>X.</given-names></name> <name><surname>Sherman</surname> <given-names>B. T.</given-names></name> <name><surname>Huang da</surname> <given-names>W.</given-names></name> <name><surname>Stephens</surname> <given-names>R.</given-names></name> <name><surname>Baseler</surname> <given-names>M. W.</given-names></name> <name><surname>Lane</surname> <given-names>H. C.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>DAVID-WS: a stateful web service to facilitate gene/protein list analysis.</article-title> <source><italic>Bioinformatics</italic></source> <volume>28</volume> <fpage>1805</fpage>&#x2013;<lpage>1806</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts251</pub-id> <pub-id pub-id-type="pmid">22543366</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jinghang</surname> <given-names>Z.</given-names></name> <name><surname>Liyuan</surname> <given-names>L.</given-names></name> <name><surname>Thomas</surname> <given-names>L.</given-names></name> <name><surname>Dorian</surname> <given-names>G.</given-names></name> <name><surname>Yuangang</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <source><italic>Vignettes and Manual of HandyCNV.</italic></source> <ext-link ext-link-type="uri" xlink:href="https://jh-zhou.github.io/HandyCNV/">https://jh-zhou.github.io/HandyCNV/</ext-link> <comment>(accessed September 1, 2021)</comment>.</citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>J.-H.</given-names></name> <name><surname>Hu</surname> <given-names>H. J.</given-names></name> <name><surname>Yim</surname> <given-names>S. H.</given-names></name> <name><surname>Bae</surname> <given-names>J. S.</given-names></name> <name><surname>Kim</surname> <given-names>S. Y.</given-names></name> <name><surname>Chung</surname> <given-names>Y. J.</given-names></name></person-group> (<year>2012</year>). <article-title>CNVRuler: a copy number variation-based case&#x2013;control association analysis tool.</article-title> <source><italic>Bioinformatics</italic></source> <volume>28</volume> <fpage>1790</fpage>&#x2013;<lpage>1792</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts239</pub-id> <pub-id pub-id-type="pmid">22539667</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McQuillan</surname> <given-names>R.</given-names></name> <name><surname>Leutenegger</surname> <given-names>A. L.</given-names></name> <name><surname>Abdel-Rahman</surname> <given-names>R.</given-names></name> <name><surname>Franklin</surname> <given-names>C. S.</given-names></name> <name><surname>Pericic</surname> <given-names>M.</given-names></name> <name><surname>Barac-Lauc</surname> <given-names>L.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>Runs of homozygosity in European populations.</article-title> <source><italic>Am. J. Hum. Genet.</italic></source> <volume>83</volume> <fpage>359</fpage>&#x2013;<lpage>372</lpage>. <pub-id pub-id-type="doi">10.1016/j.ajhg.2008.08.007</pub-id> <pub-id pub-id-type="pmid">18760389</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Navarro Gonzalez</surname> <given-names>J.</given-names></name> <name><surname>Zweig</surname> <given-names>A. S.</given-names></name> <name><surname>Speir</surname> <given-names>M. L.</given-names></name> <name><surname>Schmelter</surname> <given-names>D.</given-names></name> <name><surname>Rosenbloom</surname> <given-names>K. R.</given-names></name> <name><surname>Raney</surname> <given-names>B. J.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>The UCSC genome browser database: 2021 update.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>49</volume> <fpage>D1046</fpage>&#x2013;<lpage>D1057</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkaa1070</pub-id> <pub-id pub-id-type="pmid">33221922</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><collab>R Core Team</collab> (<year>2020</year>). <source>R: A Language and Environment for Statistical Computing</source>. <publisher-loc>Vienna, Austria</publisher-loc>: <publisher-name>R Foundation for Statistical Computing</publisher-name>.</citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Redon</surname> <given-names>R.</given-names></name> <name><surname>Ishikawa</surname> <given-names>S.</given-names></name> <name><surname>Fitch</surname> <given-names>K. R.</given-names></name> <name><surname>Feuk</surname> <given-names>L.</given-names></name> <name><surname>Perry</surname> <given-names>G. H.</given-names></name> <name><surname>Andrews</surname> <given-names>T. D.</given-names></name><etal/></person-group> (<year>2006</year>). <article-title>Global variation in copy number in the human genome.</article-title> <source><italic>Nature</italic></source> <volume>444</volume> <fpage>444</fpage>&#x2013;<lpage>454</lpage>. <pub-id pub-id-type="doi">10.1038/nature05329</pub-id> <pub-id pub-id-type="pmid">17122850</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Subirana</surname> <given-names>I.</given-names></name> <name><surname>Diaz-Uriarte</surname> <given-names>R.</given-names></name> <name><surname>Lucas</surname> <given-names>G.</given-names></name> <name><surname>Gonzalez</surname> <given-names>J. R.</given-names></name></person-group> (<year>2011</year>). <article-title>CNVassoc: association analysis of CNV data using R.</article-title> <source><italic>BMC Med. Genomics</italic></source> <volume>4</volume>:<issue>47</issue>. <pub-id pub-id-type="doi">10.1186/1755-8794-4-47</pub-id> <pub-id pub-id-type="pmid">21609482</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Team</surname> <given-names>Rs</given-names></name></person-group> (<year>2021</year>). <source><italic>RStudio: Integrated Development Environment for R.</italic></source> <publisher-loc>Boston, MA</publisher-loc>: <publisher-name>RStudio</publisher-name>.</citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Velie</surname> <given-names>B. D.</given-names></name> <name><surname>Shrestha</surname> <given-names>M.</given-names></name> <name><surname>Fran&#x00E7;ois</surname> <given-names>L.</given-names></name> <name><surname>Schurink</surname> <given-names>A.</given-names></name> <name><surname>Tesfayonas</surname> <given-names>Y. G.</given-names></name> <name><surname>Stinckens</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Using an inbred horse breed in a high density genome-wide scan for genetic risk factors of insect bite hypersensitivity (IBH).</article-title> <source><italic>PLoS One</italic></source> <volume>11</volume>:<issue>e0152966</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0152966</pub-id> <pub-id pub-id-type="pmid">27070818</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>K.</given-names></name> <name><surname>Li</surname> <given-names>M.</given-names></name> <name><surname>Hadley</surname> <given-names>D.</given-names></name> <name><surname>Liu</surname> <given-names>R.</given-names></name> <name><surname>Glessner</surname> <given-names>J.</given-names></name> <name><surname>Grant</surname> <given-names>S. F.</given-names></name><etal/></person-group> (<year>2007</year>). <article-title>PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data.</article-title> <source><italic>Genome Res.</italic></source> <volume>17</volume> <fpage>1665</fpage>&#x2013;<lpage>1674</lpage>. <pub-id pub-id-type="doi">10.1101/gr.6861907</pub-id> <pub-id pub-id-type="pmid">17921354</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wickham</surname> <given-names>H.</given-names></name> <name><surname>Averick</surname> <given-names>M.</given-names></name> <name><surname>Bryan</surname> <given-names>J.</given-names></name> <name><surname>Chang</surname> <given-names>W.</given-names></name> <name><surname>McGowan</surname> <given-names>L.</given-names></name> <name><surname>Fran&#x00E7;ois</surname> <given-names>R.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Welcome to the tidyverse.</article-title> <source><italic>J. Open Source Softw.</italic></source> <volume>4</volume>:<issue>1686</issue>. <pub-id pub-id-type="doi">10.21105/joss.01686</pub-id></citation></ref>
</ref-list>
</back>
</article>
