<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Plant Sci.</journal-id>
<journal-title>Frontiers in Plant Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Plant Sci.</abbrev-journal-title>
<issn pub-type="epub">1664-462X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpls.2021.766548</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Plant Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Chromosome-Level Genome Assembly and HazelOmics Database Construction Provides Insights Into Unsaturated Fatty Acid Synthesis and Cold Resistance in Hazelnut (<italic>Corylus heterophylla</italic>)</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Jianfeng</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/176299/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wei</surname> <given-names>Heng</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1416516/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhang</surname> <given-names>Xingzheng</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/853753/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>He</surname> <given-names>Hongli</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/380516/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Cheng</surname> <given-names>Yunqing</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/404807/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Daoming</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1573105/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Jilin Provincial Key Laboratory of Plant Resource Science and Green Production, Jilin Normal University</institution>, <addr-line>Siping</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Liaoning Economic Forest Research Institute</institution>, <addr-line>Dalian</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Merc&#x00E8; Rovira, Institute of Agrifood Research and Technology (IRTA), Spain</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Stuart James Lucas, Sabanci University Nanotechnology Research and Application Center (SUNUM), Turkey; Anita Solar, University of Ljubljana, Slovenia</p></fn>
<corresp id="c001">&#x002A;Correspondence: Yunqing Cheng, <email>chengyunqing1977@163.com</email></corresp>
<fn fn-type="equal" id="fn002"><p><sup>&#x2020;</sup>These authors have contributed equally to this work and share first authorship</p></fn>
<fn fn-type="other" id="fn004"><p>This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>12</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>766548</elocation-id>
<history>
<date date-type="received">
<day>29</day>
<month>08</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>18</day>
<month>11</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 Liu, Wei, Zhang, He, Cheng and Wang.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Liu, Wei, Zhang, He, Cheng and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p><italic>Corylus heterophylla</italic> (2n = 22) is the most widely distributed, unique, and economically important nut species in China. Chromosome-level genomes of <italic>C. avellana</italic>, <italic>C. heterophylla</italic>, and <italic>C. mandshurica</italic> were published in 2021, but a satisfactory hazelnut genome database is still lacking. Northeast China is the main distribution and cultivation area of <italic>C. heterophylla</italic>, and the mechanism underlying the adaptation of <italic>C. heterophylla</italic> to the extremely low temperatures in this area remains unclear. Using single-molecule real-time sequencing and a chromosome conformation capture (Hi-C)-assisted genome assembly strategy, we obtained a high-quality chromosome-scale genome sequence of <italic>C. heterophylla</italic>, with a total length of 343 Mb and scaffold N50 of 32.88 Mb. A total of 94.72% of the test genes from the assembled genome could be aligned to the Embryophyta_odb9 database. In total, 22,319 protein-coding genes were predicted, and 21,056 (94.34%) were annotated in the assembled genome. A HazelOmics online database (HOD) containing the assembled genome, gene-coding sequences, protein sequences, and various types of annotation information was constructed. This database has a user-friendly and straightforward interface. In total, 439 contracted genes and 3,810 expanded genes were identified through genome evolution analysis, and 17 expanded genes were significantly enriched in the unsaturated fatty acid biosynthesis pathway (ko01040). Transcriptome analysis showed that <italic>FAD</italic> (Cor0058010.1), <italic>SAD</italic> (Cor0141290.1), and <italic>KAT</italic> (Cor0122500.1), all with high expression abundance, were upregulated at the ovule maturity stage. We infer that the expansion of these genes may promote high unsaturated fatty acid content in the kernels and improve the adaptability of <italic>C. heterophylla</italic> to the cold climate of Northeast China. The reference genome and database will be beneficial for future molecular breeding and gene function studies in this nut species, as well as for evolutionary research on species of the order Fagales.</p>
</abstract>
<kwd-group>
<kwd><italic>Corylus heterophylla</italic></kwd>
<kwd>genome assembly</kwd>
<kwd>PacBio sequencing</kwd>
<kwd>Hi-C</kwd>
<kwd>unsaturated fatty acid</kwd>
<kwd>cold resistance</kwd>
</kwd-group>
<contract-num rid="cn001">31770723</contract-num>
<contract-num rid="cn001">32171840</contract-num>
<contract-sponsor id="cn001">National Natural Science Foundation of China <named-content content-type="fundref-id">10.13039/501100001809</named-content></contract-sponsor>
<counts>
<fig-count count="7"/>
<table-count count="5"/>
<equation-count count="0"/>
<ref-count count="69"/>
<page-count count="15"/>
<word-count count="10224"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>Introduction</title>
<p>Hazelnut (<italic>Corylus</italic> spp.) belongs to the subfamily Coryloideae and is the most widely distributed and economically important genus in the Betulaceae family (<xref ref-type="bibr" rid="B27">Helmstetter et al., 2019</xref>). The nut is rich in nutrients and fatty acids and is widely used in the food industry, where it is processed into oil, paste, and roasted kernels (<xref ref-type="bibr" rid="B42">Madhaven, 2000</xref>; <xref ref-type="bibr" rid="B1">Amaral et al., 2006</xref>). In addition, crushed kernels are used in the production of cakes, ice cream, and chocolate to improve the flavor, and these products are popular with consumers. There are approximately 16 species of hazelnut in the world, ten of which are native to China, comprising eight wild species and two cultivated species (<xref ref-type="bibr" rid="B17">Dong et al., 2010</xref>). Among these, <italic>Corylus heterophylla</italic> is the most widely distributed and economically important nut species in China. Currently, <italic>C. heterophylla</italic> forests in China cover more than 1.0 million hectares, and this species is the main source of hazelnut in the Chinese market even though the yield of hybrid hazelnut (<italic>C. heterophylla</italic> &#x00D7; <italic>C. avellana</italic>) has increased rapidly in recent years (<xref ref-type="bibr" rid="B15">Cheng et al., 2018b</xref>,<xref ref-type="bibr" rid="B13">2019</xref>; <xref ref-type="bibr" rid="B39">Liu et al., 2020</xref>). Northeast China is the main distribution and cultivation area of hazelnut. Although most hazelnut products in the international market are derived from the European hazelnut (<italic>C. avellana</italic>), the extremely low winter temperatures in Northeast China make the region unsuitable for cultivating this species. Although the area of horticultural cultivation of <italic>C. heterophylla</italic> is increasing rapidly, its commercial varieties remain scarce. 
As <italic>C. heterophylla</italic> is a unique species of hazelnut in China, reasonable utilization of its germplasm is crucial for the development of the hazelnut industry. Therefore, genome analysis of <italic>C. heterophylla</italic> would be important to provide new insights into the key adaptations that contribute to the breeding and culture of <italic>C. heterophylla</italic>.</p>
<p>In 2013, to facilitate breeding and genetic studies on hazelnut, the genome sequence of <italic>C. avellana</italic> &#x2018;Jefferson&#x2019; was assembled and released online on the European Hazelnut Genomic Resource Portal (EHG<sup><xref ref-type="fn" rid="footnote1">1</xref></sup>) (<xref ref-type="bibr" rid="B55">Sathuvalli et al., 2011</xref>; <xref ref-type="bibr" rid="B52">Rowley et al., 2018</xref>), but the structural and functional annotations of its genes are not available. EHG plays an important role in genetic studies on hazelnut by giving the scientific community access to the genome sequence of the variety &#x2018;Jefferson&#x2019;. However, <italic>C. avellana</italic> is not an important cultivated species in China, although some cultivars have been introduced and cultivated there. EHG also needs further improvement because it lacks the structural and functional gene annotations that are vital for bioinformatic analysis of the hazelnut genome. During the past few years, genome re-sequencing data and transcriptome data of hazelnut, including <italic>C. heterophylla</italic> &#x00D7; <italic>C. avellana</italic> and <italic>C. heterophylla</italic>, have accumulated at an increasing rate (<xref ref-type="bibr" rid="B51">Rowley et al., 2012</xref>; <xref ref-type="bibr" rid="B12">Chen et al., 2014</xref>; <xref ref-type="bibr" rid="B15">Cheng et al., 2018b</xref>,<xref ref-type="bibr" rid="B13">2019</xref>; <xref ref-type="bibr" rid="B39">Liu et al., 2020</xref>). In 2021, high-quality chromosome-level reference genomes for <italic>C. heterophylla</italic> (<xref ref-type="bibr" rid="B69">Zhao et al., 2021</xref>) and <italic>C. mandshurica</italic> (<xref ref-type="bibr" rid="B38">Li et al., 2021</xref>), assembled from a combination of Illumina short reads, Nanopore long reads, and chromosome conformation capture (Hi-C) sequencing reads, were published, thus providing a valuable resource for hazelnut breeding. 
However, a satisfactory hazelnut genome database is still lacking, which hinders hazelnut breeding research. Considering the economic importance of <italic>C. heterophylla</italic> in the main production areas of Northeast China and the shortage of its cultivars, it is crucial to construct a hazelnut genome database to archive and provide access to genome sequences of this species. Here, we present a complete reference genome sequence of <italic>C. heterophylla</italic> and the HazelOmics online database built on it, with the aim of providing a high-quality reference genome and evolutionary data for this important nut species.</p>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="S2.SS1">
<title>Plant Sample Collection and Genome Sequencing</title>
<p><italic>Corylus heterophylla</italic> was cultivated in Yitong County of Siping City (43.34&#x00B0;N, 125.30&#x00B0;E), Jilin Province, China. Fresh young leaves of <italic>C. heterophylla</italic> were collected and subjected to genomic DNA extraction, PacBio genome sequencing library construction, and single molecule real-time (SMRT) sequencing. The template library was constructed using the SMRTbell Template Prep Kit 1.0 (product code 100-259-100) as follows. Leaf cells were lysed, and genomic DNA was sheared with a Covaris g-Tube, followed by exonuclease VII digestion to remove single-stranded overhangs at the 3&#x2032;-end. Next, the SMRTbell Damage Repair Kit was used to repair single-strand breaks, abasic sites, and oxidative damage on the DNA strands. The DNA ends were repaired to blunt ends, and the DNA fragments were ligated to SMRTbell (dumbbell-shaped) adapters. Thereafter, exonuclease digestion was performed to remove fragments lacking SMRTbell adapters at both ends. Finally, AMPure <sup>&#x00AE;</sup> PB Beads were used for size selection and purification to obtain an SMRTbell library with a fragment size of 20 kb. The library was sequenced using the long-read PacBio Sequel II platform (Pacific Biosciences Inc., Menlo Park, CA, United States), and data from one SMRT cell were generated. Illumina short-read (second-generation) survey sequencing was performed to provide sequence information for error correction of the genome assembled from the SMRT data. DNA was isolated using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, United States) according to the manufacturer&#x2019;s recommendations. DNA purity was evaluated with a Nanodrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, United States) and a Qubit 2.0 Fluorometer (Thermo Fisher Scientific Inc., Waltham, MA, United States), and DNA integrity was evaluated by electrophoresis. The qualified DNA sample was fragmented randomly using a g-tube. 
Fragments ranging from 300 to 350 bp in length were recovered by electrophoresis, and the sequencing library was prepared by end repair, A-tailing, sequencing adaptor ligation, purification, and PCR amplification, followed by sequencing on the Illumina HiSeq X Ten platform (San Diego, CA, United States) in 150 bp paired-end (PE150) mode. All sequencing procedures were performed by Wuhan GOOAL Gene Technology Co., Ltd.</p>
</sec>
<sec id="S2.SS2">
<title>Hi-C Sequencing</title>
<p>The Hi-C library for Illumina sequencing was prepared with the NEBNext <sup>&#x00AE;</sup> Ultra&#x2122; II DNA library Prep Kit for Illumina (NEB) according to the manufacturer&#x2019;s instructions. First, genomic DNA was treated with paraformaldehyde to fix the DNA conformation, and the cross-linked DNA was digested with restriction enzymes to produce sticky ends. At the same time, biotin was introduced to label the oligonucleotide ends. Thereafter, DNA ligase was used to join the DNA fragments. DNA cross-linking was reversed by protease digestion, after which the DNA was purified and randomly sheared into 300&#x2013;500 bp fragments using a Covaris E220 Evolution Sonicator (Woburn, MA, United States). Finally, the labeled DNA was captured with avidin magnetic beads and used to construct a Hi-C sequencing library with the NEBNext <sup>&#x00AE;</sup> Ultra&#x2122; II DNA library Prep Kit, followed by sequencing on the Illumina HiSeq X Ten platform in 150 bp paired-end (PE150) mode.</p>
</sec>
<sec id="S2.SS3">
<title>Genome Assembly</title>
<p>Canu software (<xref ref-type="bibr" rid="B33">Koren et al., 2017</xref>) was used to assemble the acquired raw reads. The assembly comprised three steps: error correction, trimming, and assembly, and each step followed the same processing protocol. Reads were loaded into the gkpStore read database, and k-mers were counted to evaluate overlaps between sequences. Overlaps were loaded into the overlap database OvlStore to complete error correction, trimming, or assembly. Details and parameter information of all software used are listed in <xref ref-type="supplementary-material" rid="DS1">Supplementary Table 1</xref>. Single molecule real-time sequencing has a high error rate, which makes the raw reads very noisy. During correction, highly reliable bases are obtained by comparing the reads with one another. Consensus sequences were calculated from the overlapping reads and used to replace the original reads with high error rates. During read trimming, overlaps were used to determine which read regions were of high quality and which low-quality regions needed to be trimmed; only the sequence blocks with the highest quality were retained. Next, the raw SMRT sequencing data were mapped to the assembled genome using pbmm2 software, and the corrected assembly was generated after polishing with the arrow method. Thereafter, the reads obtained from the Illumina genome survey were mapped to the long-read (third-generation) assembly for further polishing, using BWA (<xref ref-type="bibr" rid="B35">Li and Durbin, 2009</xref>) for sequence alignment and Pilon (<xref ref-type="bibr" rid="B64">Walker et al., 2014</xref>) for error correction. According to the depth distribution of reads and sequence similarity, redundant heterozygous contigs were identified and removed using Purge Haplotigs software (<xref ref-type="bibr" rid="B50">Roach et al., 2018</xref>).</p>
</sec>
<sec id="S2.SS4">
<title>Chromosome-Scale Assembly With Hi-C Data</title>
<p>The main types of reads produced by Hi-C sequencing comprise valid di-tags, contiguous sequences, circularized fragments, dangling ends, internal fragments, PCR duplicates, and wrong-sized reads. We retained only the valid di-tags and filtered out all other read types (<xref ref-type="bibr" rid="B18">Dudchenko et al., 2017</xref>). Chromosome-level genome assembly was carried out by dividing, anchoring, ordering, orienting, and merging the contigs or scaffolds using HiC-Pro (<xref ref-type="bibr" rid="B56">Servant et al., 2015</xref>; <xref ref-type="bibr" rid="B18">Dudchenko et al., 2017</xref>). A genome-wide interaction map was constructed using JuiceBox software (<xref ref-type="bibr" rid="B19">Durand et al., 2016</xref>). We encountered contig ordering and orientation errors in the process of 3D-DNA assembly. Following the principle that the shorter the linear distance between two loci, the stronger their interaction, we corrected these errors manually by visual inspection in JuiceBox (<xref ref-type="bibr" rid="B19">Durand et al., 2016</xref>). The completeness of conserved genes in the assembled genome was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) tests (<xref ref-type="bibr" rid="B58">Simao et al., 2015</xref>), whose results reflect the completeness and quality of the tested genome. In total, 1,440 single-copy orthologous genes were chosen to be aligned to the Embryophyta_odb9 database (<xref ref-type="bibr" rid="B58">Simao et al., 2015</xref>).</p>
</sec>
<sec id="S2.SS5">
<title>Genome Annotation</title>
<p>Genome annotation mainly includes the recognition of repetitive sequences, prediction of non-coding RNA, prediction of gene structure, and functional annotation. As an important part of the plant genome, repeat sequences mainly include tandem repeats and interspersed repeats (DNA transposons and retrotransposons). Tandem Repeats Finder software (<xref ref-type="bibr" rid="B6">Benson, 1999</xref>) was used to predict tandem repeats in the investigated genome. RepeatMasker and RepeatProteinMask, based on the Repbase TE library, were used to annotate DNA transposons and retrotransposons (<xref ref-type="bibr" rid="B61">Tarailo-Graovac and Chen, 2009</xref>). Afterward, the <italic>de novo</italic> prediction software RepeatModeler (<xref ref-type="bibr" rid="B4">Bao et al., 2015</xref>) and LTR_FINDER (<xref ref-type="bibr" rid="B5">Barker et al., 2010</xref>) were used to identify and annotate interspersed repeats in the hazelnut genome.</p>
<p>Gene annotation of hazelnut includes structural and functional annotation. Several methods were used to predict the structure of the coding genes, namely homology prediction, <italic>de novo</italic> prediction (software: Augustus, GENSCAN, and GlimmerHMM), and cDNA/EST prediction (<xref ref-type="bibr" rid="B8">Burge and Karlin, 1997</xref>; <xref ref-type="bibr" rid="B43">Majoros et al., 2004</xref>; <xref ref-type="bibr" rid="B44">Mario et al., 2006</xref>). Furthermore, RNA-seq data were mapped to the genome by HISAT2 (<xref ref-type="bibr" rid="B32">Kim et al., 2015</xref>), and transcripts were generated by StringTie (<xref ref-type="bibr" rid="B49">Pertea et al., 2015</xref>). After that, Transdecoder (<xref ref-type="bibr" rid="B25">Haas et al., 2013</xref>) was used to predict open reading frames (ORFs) in these transcripts. The gene sets predicted by the various methods were integrated into a non-redundant, more complete, and reliable gene set using MAKER software (<xref ref-type="bibr" rid="B9">Cantarel et al., 2008</xref>). Finally, functional annotation of the proteins in the investigated gene set was carried out by aligning their protein sequences to various protein databases, including SwissProt (<xref ref-type="bibr" rid="B2">Bairoch and Apweiler, 2000</xref>), TrEMBL (<xref ref-type="bibr" rid="B2">Bairoch and Apweiler, 2000</xref>), Kyoto Encyclopedia of Genes and Genomes (KEGG) (<xref ref-type="bibr" rid="B31">Kanehisa et al., 2003</xref>), InterPro (<xref ref-type="bibr" rid="B68">Zdobnov and Rolf, 2001</xref>), and Gene Ontology (GO) (<xref ref-type="bibr" rid="B3">Balakrishnan et al., 2013</xref>). 
For non-coding RNA annotation, the tRNAscan-SE program (<xref ref-type="bibr" rid="B40">Lowe and Chan, 2016</xref>) was used to identify tRNA, BLASTN alignment was used to identify rRNA, and INFERNAL software (<xref ref-type="bibr" rid="B45">Nawrocki et al., 2009</xref>) with the Rfam database (<xref ref-type="bibr" rid="B23">Griffiths-Jones et al., 2005</xref>) was used to predict miRNA and snRNA sequences in the genome.</p>
</sec>
<sec id="S2.SS6">
<title>Gene Family and Phylogenetic Analyses</title>
<p>The longest transcript of each gene in each species was selected with an in-house script as the representative sequence of the gene, and the corresponding coding sequence (CDS) and protein sequence were extracted. The version and other details of all downloaded sequences are listed in <xref ref-type="supplementary-material" rid="DS1">Supplementary Table 2</xref>. The homologous low-copy genes (copy number &#x2264; 4) of these species were identified by OrthoFinder software (<xref ref-type="bibr" rid="B20">Emms and Kelly, 2015</xref>). OrthoMCL software and Markov chain clustering (MCL) (<xref ref-type="bibr" rid="B37">Li et al., 2003</xref>) were used to evaluate gene family membership based on the calculated gene similarities. The protein sequences of these low-copy homologous genes were aligned by MUSCLE software, and the phylogenetic tree was constructed by RAxML software from the multiple sequence alignment using the GTRGAMMA method (<xref ref-type="bibr" rid="B60">Stamatakis, 2014</xref>). Next, based on the phylogenetic tree, r8s (<xref ref-type="bibr" rid="B54">Sanderson, 2003</xref>) and MCMCTREE of the PAML package (<xref ref-type="bibr" rid="B67">Yang, 2007</xref>) were used to estimate divergence time. The divergence times of <italic>Oryza sativa</italic>&#x2013;<italic>Arabidopsis thaliana</italic> [115&#x2013;308 million years ago (Mya)], <italic>Betula pendula</italic>&#x2013;<italic>C. avellana</italic> (22&#x2013;74 Mya), and <italic>Populus trichocarpa</italic>&#x2013;<italic>P. euphratica</italic> (10.9 Mya) acquired from TimeTree<sup><xref ref-type="fn" rid="footnote2">2</xref></sup> were used as the calibration times. Gene families that underwent expansion or contraction events were identified by CAFE software (<xref ref-type="bibr" rid="B26">Han et al., 2013</xref>). 
The identified genes were subjected to GO term and KEGG enrichment analyses, with a p-value threshold of 0.05 for significant enrichment in both analyses (<xref ref-type="bibr" rid="B31">Kanehisa et al., 2003</xref>; <xref ref-type="bibr" rid="B3">Balakrishnan et al., 2013</xref>).</p>
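<p>The selection of one representative transcript per gene can be illustrated with a minimal Python sketch. The in-house script itself is not described in the text, so the function name, the input layout, and the toy identifiers below are assumptions made purely for illustration.</p>
<preformat><![CDATA[
```python
def longest_transcripts(records):
    """Pick the longest transcript for each gene.

    records: iterable of (gene_id, transcript_id, sequence) tuples.
    This layout is hypothetical; the original in-house script is not shown.
    When two transcripts tie in length, the one seen first is kept.
    """
    best = {}
    for gene_id, transcript_id, sequence in records:
        current = best.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (transcript_id, sequence)
    return best


# Toy input with made-up identifiers and sequences.
toy = [
    ("geneA", "geneA.1", "ATGAAATAA"),
    ("geneA", "geneA.2", "ATGAAACCCGGGTAA"),
    ("geneB", "geneB.1", "ATGTAA"),
]
representatives = longest_transcripts(toy)
for gene_id, (transcript_id, sequence) in sorted(representatives.items()):
    print(gene_id, transcript_id, len(sequence))
```
]]></preformat>
<p>Keeping a single sequence per gene avoids counting alternative isoforms as separate family members in the downstream orthology clustering.</p>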
</sec>
<sec id="S2.SS7">
<title>Construction of HazelOmics Online Database</title>
<p>The HazelOmics online database (HOD) was constructed based on the assembled <italic>C. heterophylla</italic> reference genome. The establishment and maintenance of HOD were entrusted to GOOAL GENE Technology Ltd. (Wuhan, China) and the Information Network Center of the Jilin Normal University. The website interface was developed with the Vue.js framework. Three widely used open-source components, Spring Boot (application framework), JDK 8, and MySQL (database management system), were employed for database server development to facilitate user access and operation. In addition, the genome data stored in HOD can be visualized by JBrowse and its plugins (<xref ref-type="bibr" rid="B7">Buels et al., 2016</xref>). A sequence query option based on the BLAST tool was also added to the website. The Primer3 tool (<xref ref-type="bibr" rid="B53">Rozen and Skaletsky, 2000</xref>) was provided for primer design. All available sequences (genomes, protein-coding sequences, and protein sequences), along with the corresponding functional annotation information, can be downloaded from HOD.</p>
</sec>
<sec id="S2.SS8">
<title>RNA-seq Analysis and KEGG Pathway Enrichment of Differentially Expressed Genes</title>
<p>RNA-seq analysis of 12 ovule samples of hazelnut at four developmental stages was performed following the protocol described previously (<xref ref-type="bibr" rid="B39">Liu et al., 2020</xref>). HISAT2 (<xref ref-type="bibr" rid="B32">Kim et al., 2015</xref>) was used to align the RNA-seq reads to our reference genome, generating a SAM file. Samtools (<xref ref-type="bibr" rid="B36">Li et al., 2009</xref>) software was used to convert the SAM file into BAM format. After sorting, Qualimap software (<xref ref-type="bibr" rid="B46">Okonechnikov et al., 2016</xref>) was used to summarize the sequence alignment results. The transcripts were assembled from the BAM files by StringTie software (<xref ref-type="bibr" rid="B49">Pertea et al., 2015</xref>), and the reconstructed results of all samples were merged to obtain the structural annotation file of the optimized transcripts. Gene expression was quantified from the BAM files using StringTie (<xref ref-type="bibr" rid="B49">Pertea et al., 2015</xref>) and expressed as FPKM (fragments per kilobase of transcript per million mapped fragments) values. | log<sub>2</sub>FC | &#x2265; 1 and false discovery rate &#x003C; 0.05 were set as the threshold values for the identification of differentially expressed genes. KEGG enrichment analysis was performed using KOBAS software (<xref ref-type="bibr" rid="B65">Xie et al., 2011</xref>).</p>
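<p>As a rough illustration of the quantification and filtering criteria above, the following Python sketch computes an FPKM value from its standard definition and applies the | log<sub>2</sub>FC | &#x2265; 1 and false discovery rate &#x003C; 0.05 thresholds. It is not the StringTie implementation: the function names, the pseudocount, and the toy numbers are assumptions, and the false discovery rate is taken as an input because the text does not name the tool used to compute it.</p>
<preformat><![CDATA[
```python
import math


def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def is_deg(fpkm_a, fpkm_b, fdr, pseudocount=1.0,
           min_abs_log2fc=1.0, max_fdr=0.05):
    """Apply the |log2FC| >= 1 and FDR < 0.05 thresholds from the text.

    The pseudocount is an assumption added here so that genes with zero
    expression in one condition do not cause a division by zero.
    """
    log2fc = math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))
    return abs(log2fc) >= min_abs_log2fc and fdr < max_fdr


# Toy numbers (made up): 500 fragments on a 2 kb gene, 20 million mapped fragments.
example_fpkm = fpkm(500, 2000, 20_000_000)
print(example_fpkm)

# Toy comparisons between two stages.
print(is_deg(3.0, 15.0, fdr=0.01))   # large change, significant
print(is_deg(3.0, 4.0, fdr=0.01))    # small change
print(is_deg(3.0, 15.0, fdr=0.20))   # large change, not significant
```
]]></preformat>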
</sec>
</sec>
<sec id="S3" sec-type="results">
<title>Results</title>
<sec id="S3.SS1">
<title>Single Molecule Real-Time Sequencing and <italic>de novo</italic> Genome Assembly</title>
<p>A total of 7,319,564 reads were sequenced, among which 6,803,436 reads were longer than 2.0 kb, accounting for 92.94% of all sequenced reads. The sequencing generated 144.01 Gb of PacBio data from the SMRT sequencing platform, achieving &#x223C;415 &#x00D7; coverage of the <italic>C. heterophylla</italic> genome (<xref ref-type="table" rid="T1">Table 1</xref>). On average, the reads were 19,675 bp in length, with N50 of 30,570 bp. These results indicate that SMRT sequencing reliably produced long reads (<xref ref-type="bibr" rid="B63">Vaser et al., 2017</xref>; <xref ref-type="bibr" rid="B62">Vasanthan and Yasubumi, 2019</xref>). The reads were assembled by Canu (<xref ref-type="bibr" rid="B33">Koren et al., 2017</xref>). Next, the raw SMRT sequencing data and the second-generation survey sequencing reads were mapped to the assembled genome for error correction, followed by redundant sequence removal using Purge Haplotigs software (<xref ref-type="bibr" rid="B50">Roach et al., 2018</xref>). The genome assembly produced 386 contigs with N50 of 2,025,119 bp and GC content of 36.01%, covering 346,578,452 bp. Among the contigs, 384 were longer than 2 kb (<xref ref-type="table" rid="T1">Table 1</xref>). To confirm that the obtained assembly belongs to the target species, the genomic sequence was divided into 1,000 bp fragments, which were aligned to the NCBI nucleotide database (NT Library) using the BLAST tool. The results showed that 10.20% of the fragments belonged to the genus <italic>Corylus</italic>, and the accuracy of the sequencing and assembly data was preliminarily confirmed (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 3</xref>). In addition, the completeness of conserved genes was assessed using BUSCO (<xref ref-type="bibr" rid="B58">Simao et al., 2015</xref>). 
Of the chosen 1,440 single-copy orthologous genes, 1,364 (94.72%) were aligned to the Embryophyta_odb9 database (<xref ref-type="bibr" rid="B58">Simao et al., 2015</xref>), of which 1,338 (92.92%) were considered to be complete (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 4</xref>). The SMRT sequencing data were mapped to the assembled genome using pbmm2; the results showed that 92.46% of the SMRT data could be mapped to the assembled genome with coverage of 99.72%, suggesting high quality of the genome assembly (<xref ref-type="table" rid="T2">Table 2</xref>). In summary, the contig-level assembly was of sufficient quality to be scaffolded to the chromosome level in downstream analysis.</p>
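<p>The summary statistics in this section can be reproduced with a few lines of Python. The sketch below defines N50 in the usual way and recomputes the sequencing depth and BUSCO percentages from the figures reported above. The toy contig lengths are invented, and using the assembled contig length as the genome size for the depth estimate is an assumption, since the text does not state which genome size was used.</p>
<preformat><![CDATA[
```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (made up) to show the calculation.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
print(toy_n50)

# Figures reported in the text and in Table 1.
total_bases = 144.01e9          # PacBio data, in bases
assembly_length = 346_578_452   # contig-level assembly, in bp

# Assumption: the assembled length stands in for the genome size.
depth = total_bases / assembly_length
print(round(depth, 1))

# BUSCO percentages out of 1,440 single-copy orthologs.
busco_aligned = round(100 * 1364 / 1440, 2)
busco_complete = round(100 * 1338 / 1440, 2)
print(busco_aligned, busco_complete)
```
]]></preformat>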
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Statistical results of <italic>C. heterophylla</italic> genome sequencing and assembly.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">Total bases (Gb)</td>
<td valign="top" align="center">Total length (bp)</td>
<td valign="top" align="center">Total number</td>
<td valign="top" align="center">Total number (&#x2265; 2 kb)</td>
<td valign="top" align="center">Max length (bp)</td>
<td valign="top" align="center">Mean length (bp)</td>
<td valign="top" align="center">N50 (bp)</td>
<td valign="top" align="center">N90 (bp)</td>
<td valign="top" align="center">GC content (%)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Library</td>
<td valign="top" align="center">144.01</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">7,319,564</td>
<td valign="top" align="center">6,803,436</td>
<td valign="top" align="center">238,716</td>
<td valign="top" align="center">19,675</td>
<td valign="top" align="center">30,570</td>
<td valign="top" align="center">11,171</td>
<td valign="top" align="center">37.9</td>
</tr>
<tr>
<td valign="top" align="left">Assembly</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">346,578,452</td>
<td valign="top" align="center">386</td>
<td valign="top" align="center">384</td>
<td valign="top" align="center">7,707,050</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">2,025,119</td>
<td valign="top" align="center">424,534</td>
<td valign="top" align="center">36.01</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Sequence consistency assessment by mapping reads to the <italic>C. heterophylla</italic> reference genome.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Sequencing platform</td>
<td valign="top" align="center">Mapping rate (%)</td>
<td valign="top" align="center">Paired mapping rate (%)</td>
<td valign="top" align="center">Coverage (%)</td>
<td valign="top" align="center">Coverage at least 4 &#x00D7; (%)</td>
<td valign="top" align="center">Coverage at least 10 &#x00D7; (%)</td>
<td valign="top" align="center">Coverage at least 20 &#x00D7; (%)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">PacBio</td>
<td valign="top" align="center">92.46</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">99.72</td>
<td valign="top" align="center">99.54</td>
<td valign="top" align="center">99.32</td>
<td valign="top" align="center">99.13</td>
</tr>
<tr>
<td valign="top" align="left">Hi-C</td>
<td valign="top" align="center">91.04</td>
<td valign="top" align="center">84.89</td>
<td valign="top" align="center">98.97</td>
<td valign="top" align="center">98.67</td>
<td valign="top" align="center">98.27</td>
<td valign="top" align="center">97.66</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="S3.SS2">
<title>Hi-C Sequencing and Assisted Genome Assembly</title>
<p>In total, 392,233,610 raw reads were sequenced, covering a length of 58,835,041,500 bp. After data filtration, 383,274,912 clean reads covering 56,648,423,236 clean bases were obtained, with an average read length of 150 bp and Q20 of 96.24% (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 5</xref>). The Hi-C sequencing data were mapped to the assembled genome using BWA (<xref ref-type="bibr" rid="B35">Li and Durbin, 2009</xref>), and the results showed that 91.04% of the Hi-C data could be mapped to the assembled genome, with coverage of 98.97% (<xref ref-type="table" rid="T2">Table 2</xref>). Alignment of the Hi-C sequencing data showed that 84,085,237 reads belonged to paired-end alignments (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 6</xref>). After redundancy removal, valid pairs accounted for 73.72% of the total Hi-C sequencing data (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 7</xref>). Variants were called using Samtools, Picard, and GATK (<xref ref-type="bibr" rid="B36">Li et al., 2009</xref>; <xref ref-type="bibr" rid="B16">do Valle et al., 2016</xref>), and the homozygous and heterozygous rates of SNPs and indels in the reference genome were calculated. The homozygous rates of SNPs and indels were as low as 0.011% and 0.037%, respectively, indicating that the genome assembly was highly accurate; the heterozygous rates of SNPs and indels were 1.118% and 0.216%, respectively, indicating that genome heterozygosity was low (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 8</xref>). During assembly and error correction, the original 386 contigs were split and ordered according to the Hi-C interaction map, yielding 11 chromosomes and 64 scaffolds, with a total length of 0.35 Gb, contig N50 of 2.02 Mb, and scaffold N50 of 32.88 Mb. 
The rate of chromosome anchoring was 98.95% (<xref ref-type="table" rid="T3">Table 3</xref>). Genome integrity was evaluated using long terminal repeats (LTRs) (<xref ref-type="bibr" rid="B47">Ou et al., 2018</xref>). The LTR assembly index (LAI), a standard for assessing assembly continuity, was 14.20 for this genome, which is within the range of &#x201C;reference&#x201D; quality based on the LAI classification (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 9</xref>) (<xref ref-type="bibr" rid="B47">Ou et al., 2018</xref>; <xref ref-type="bibr" rid="B66">Xie et al., 2020</xref>). Subsequently, chromosome and genome-wide interaction maps (<xref ref-type="fig" rid="F1">Figure 1</xref>) were constructed, and they indicated that the Hi-C-assisted assembly was of high quality. Finally, based on the reference genome generated by SMRT sequencing and assembly (<xref ref-type="bibr" rid="B30">Jonas et al., 2017</xref>), Hi-C sequencing (<xref ref-type="bibr" rid="B56">Servant et al., 2015</xref>) and Hi-C-assisted assembly were used to correct errors in the genome, and the final sequence and orientation of the <italic>C. heterophylla</italic> genome were determined. The final chromosome-level assembly was 342,961,297 bp long, with contig N50 of 2,025,119 bp, scaffold N50 of 32,881,252 bp, and a chromosome anchoring rate of 98.95% (<xref ref-type="table" rid="T3">Table 3</xref>).</p>
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Statistics for Hi-C-assisted genome assembly.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">Sequence length (bp)</td>
<td valign="top" align="center">Sequence number</td>
<td valign="top" align="center">Contig N50 (bp)</td>
<td valign="top" align="center">Scaffold N50 (bp)</td>
<td valign="top" align="center" colspan="4">Chromosome anchoring rate (%)<hr/></td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center"/>
<td valign="top" align="center"/>
<td valign="top" align="center"/>
<td valign="top" align="center"/>
<td valign="top" align="center">Contig number</td>
<td valign="top" align="center">Contig length (bp)</td>
<td valign="top" align="center">Contig number (&#x003E; 100 kb)</td>
<td valign="top" align="center">Contig length (&#x003E; 100 kb)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Draft genome</td>
<td valign="top" align="center">346,578,452</td>
<td valign="top" align="center">386</td>
<td valign="top" align="center">2,025,119</td>
<td valign="top" align="center">2,025,119</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center"/></tr>
<tr>
<td valign="top" align="left">Genome assembly (+ Hi-C)</td>
<td valign="top" align="center">346,614,552</td>
<td valign="top" align="center">75</td>
<td valign="top" align="center">2,017,784</td>
<td valign="top" align="center">32,881,252</td>
<td valign="top" align="center">83.49</td>
<td valign="top" align="center">98.95</td>
<td valign="top" align="center">96.83</td>
<td valign="top" align="center">99.38</td>
</tr>
<tr>
<td valign="top" align="left">Chromosome assembly (+ Hi-C)</td>
<td valign="top" align="center">342,961,297</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">2,025,119</td>
<td valign="top" align="center">32,881,252</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center"/>
</tr>
<tr>
<td valign="top" align="left">Unanchored sequences (+ Hi-C)</td>
<td valign="top" align="center">3,653,255</td>
<td valign="top" align="center">64</td>
<td valign="top" align="center">114,029</td>
<td valign="top" align="center">190,000</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center"/></tr>
</tbody>
</table>
</table-wrap>
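<p>The chromosome anchoring rate by length in <xref ref-type="table" rid="T3">Table 3</xref> corresponds to the ratio of chromosome-anchored bases to the total length of the Hi-C-assisted assembly. The short Python sketch below checks this using only the values reported in the table; the variable and function names are illustrative assumptions rather than part of the original workflow.</p>
<preformat>
```python
# Values reported in Table 3 (bp).
total_assembly_bp = 346_614_552  # Genome assembly (+ Hi-C)
anchored_bp = 342_961_297        # Chromosome assembly (+ Hi-C)
unanchored_bp = 3_653_255        # Unanchored sequences (+ Hi-C)


def anchoring_rate(anchored: int, total: int) -> float:
    """Percentage of assembled bases placed on chromosomes."""
    return 100.0 * anchored / total


# The anchored and unanchored lengths should add up to the full assembly.
assert anchored_bp + unanchored_bp == total_assembly_bp

rate = anchoring_rate(anchored_bp, total_assembly_bp)
print(f"anchoring rate by length: {rate:.2f}%")
```
</preformat>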
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Hi-C heatmap of the <italic>C. heterophylla</italic> genome representing genome-wide all-by-all interactions. The map shows, at high resolution, the individual chromosomes that were scaffolded and assembled independently. The color bar depicts the frequency of Hi-C interaction links from white (low) to red (high). The coordinates on the x- and y-axes can be used to determine the relative position (number of bins) on the genome.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-12-766548-g001.tif"/>
</fig>
</sec>
<sec id="S3.SS3">
<title>Genome Annotation</title>
<p>In total, we identified 200.99 Mb of non-redundant repetitive elements based on <italic>de novo</italic> and homology-based methods, accounting for 57.99% of the assembled genome length (<xref ref-type="table" rid="T4">Table 4</xref>). Among these, long terminal repeat retrotransposons (LTR-RTs), DNA transposons, and long interspersed elements (LINEs) had lengths of 110.26, 68.48, and 30.42 Mb, accounting for 31.81%, 19.76%, and 8.78% of the total genome length, respectively. Thus, LTR-RTs dominated the repetitive sequences in the investigated genome. Based on <italic>de novo</italic>, homology-based, and RNA-seq/EST methods, we predicted a total of 22,319 protein-coding genes in the investigated genome, with an average length of 6,646 bp. The average CDS length was 1,201 bp. On average, each gene had 5.61 exons that were 330 bp long. The average intron length was 1,040 bp (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 10</xref>). Among the predicted protein-coding genes, 14,763 (66.15%) were supported by evidence from <italic>de novo</italic> prediction, homology-based prediction, and RNA-seq data (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 11</xref>). In total, 21,056 (94.34%) predicted genes had matching functional annotations in at least one protein database (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 12</xref>). In total, 92.7% of the conserved single-copy orthologs used by BUSCO could be mapped to the Embryophyta_odb9 database (<xref ref-type="bibr" rid="B58">Simao et al., 2015</xref>), of which 93.1% were complete (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 13</xref>). In addition, we annotated non-coding RNAs across the 346,578,452-bp assembly; these accounted for 0.13% of the total genome. The predicted non-coding RNAs included 453 tRNA genes, 1,020 rRNA genes, 183 miRNA genes, and 595 snRNA genes (<xref ref-type="table" rid="T5">Table 5</xref>).</p>
<table-wrap position="float" id="T4">
<label>TABLE 4</label>
<caption><p>Categories of repeat sequences in the <italic>C. heterophylla</italic> genome.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Categories</td>
<td valign="top" align="center" colspan="2">RepBase TEs<hr/></td>
<td valign="top" align="center" colspan="2">TE Proteins<hr/></td>
<td valign="top" align="center" colspan="2"><italic>De novo</italic><hr/></td>
<td valign="top" align="center" colspan="2">Combined TEs<hr/></td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">Length (bp)</td>
<td valign="top" align="center">Percentage in genome (%)</td>
<td valign="top" align="center">Length (bp)</td>
<td valign="top" align="center">Percentage in genome (%)</td>
<td valign="top" align="center">Length (bp)</td>
<td valign="top" align="center">Percentage in genome (%)</td>
<td valign="top" align="center">Length (bp)</td>
<td valign="top" align="center">Percentage in genome (%)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">DNA</td>
<td valign="top" align="center">9,799,717</td>
<td valign="top" align="center">2.83</td>
<td valign="top" align="center">3,919,867</td>
<td valign="top" align="center">1.13</td>
<td valign="top" align="center">62,545,620</td>
<td valign="top" align="center">18.05</td>
<td valign="top" align="center">68,475,106</td>
<td valign="top" align="center">19.76</td>
</tr>
<tr>
<td valign="top" align="left">LINE</td>
<td valign="top" align="center">7,164,463</td>
<td valign="top" align="center">2.07</td>
<td valign="top" align="center">8,921,980</td>
<td valign="top" align="center">2.57</td>
<td valign="top" align="center">27,339,957</td>
<td valign="top" align="center">7.89</td>
<td valign="top" align="center">30,417,364</td>
<td valign="top" align="center">8.78</td>
</tr>
<tr>
<td valign="top" align="left">SINE</td>
<td valign="top" align="center">24,749</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">27,221</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">51,884</td>
<td valign="top" align="center">0.01</td>
</tr>
<tr>
<td valign="top" align="left">LTR-RT</td>
<td valign="top" align="center">23,038,666</td>
<td valign="top" align="center">6.65</td>
<td valign="top" align="center">16,976,559</td>
<td valign="top" align="center">4.9</td>
<td valign="top" align="center">106,305,286</td>
<td valign="top" align="center">30.67</td>
<td valign="top" align="center">110,258,874</td>
<td valign="top" align="center">31.81</td>
</tr>
<tr>
<td valign="top" align="left">Satellite</td>
<td valign="top" align="center">229,854</td>
<td valign="top" align="center">0.07</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">550,649</td>
<td valign="top" align="center">0.16</td>
<td valign="top" align="center">696,377</td>
<td valign="top" align="center">0.2</td>
</tr>
<tr>
<td valign="top" align="left">Simple_repeat</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">2,587,539</td>
<td valign="top" align="center">0.75</td>
<td valign="top" align="center">2,587,539</td>
<td valign="top" align="center">0.75</td>
</tr>
<tr>
<td valign="top" align="left">Other</td>
<td valign="top" align="center">1,456</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">1,456</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">Unknown</td>
<td valign="top" align="center">150,029</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">10,428</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">23,266,035</td>
<td valign="top" align="center">6.71</td>
<td valign="top" align="center">23,410,828</td>
<td valign="top" align="center">6.75</td>
</tr>
<tr>
<td valign="top" align="left">Total</td>
<td valign="top" align="center">39,297,124</td>
<td valign="top" align="center">11.34</td>
<td valign="top" align="center">29,823,067</td>
<td valign="top" align="center">8.6</td>
<td valign="top" align="center">191,137,480</td>
<td valign="top" align="center">55.15</td>
<td valign="top" align="center">200,988,864</td>
<td valign="top" align="center">57.99</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T5">
<label>TABLE 5</label>
<caption><p>Annotation statistics for non-coding RNAs in the <italic>C. heterophylla</italic> genome.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left" colspan="2">Type</td>
<td valign="top" align="center">Copy</td>
<td valign="top" align="center">Average length (bp)</td>
<td valign="top" align="center">Total length (bp)</td>
<td valign="top" align="center">Percentage (%)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="2">miRNA</td>
<td valign="top" align="center">183</td>
<td valign="top" align="center">125</td>
<td valign="top" align="center">22,798</td>
<td valign="top" align="center">0.006578</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">tRNA</td>
<td valign="top" align="center">453</td>
<td valign="top" align="center">75</td>
<td valign="top" align="center">34,003</td>
<td valign="top" align="center">0.009811</td>
</tr>
<tr>
<td valign="top" align="left">rRNA</td>
<td valign="top" align="center">rRNA</td>
<td valign="top" align="center">1,020</td>
<td valign="top" align="center">127</td>
<td valign="top" align="center">129,161</td>
<td valign="top" align="center">0.037267</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">18S</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">1,228</td>
<td valign="top" align="center">8,594</td>
<td valign="top" align="center">0.00248</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">28S</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">187</td>
<td valign="top" align="center">748</td>
<td valign="top" align="center">0.000216</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">5.8S</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">155</td>
<td valign="top" align="center">1,242</td>
<td valign="top" align="center">0.000358</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">5S</td>
<td valign="top" align="center">1,001</td>
<td valign="top" align="center">118</td>
<td valign="top" align="center">118,577</td>
<td valign="top" align="center">0.034214</td>
</tr>
<tr>
<td valign="top" align="left">snRNA</td>
<td valign="top" align="center">snRNA</td>
<td valign="top" align="center">595</td>
<td valign="top" align="center">114</td>
<td valign="top" align="center">67,860</td>
<td valign="top" align="center">0.01958</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">C/D-box</td>
<td valign="top" align="center">408</td>
<td valign="top" align="center">105</td>
<td valign="top" align="center">42,759</td>
<td valign="top" align="center">0.012337</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">H/ACA-box</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">125</td>
<td valign="top" align="center">7,233</td>
<td valign="top" align="center">0.002087</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">splicing</td>
<td valign="top" align="center">129</td>
<td valign="top" align="center">139</td>
<td valign="top" align="center">17,868</td>
<td valign="top" align="center">0.005156</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">scaRNA</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="S3.SS4">
<title>Phylogenetic Relationship Analysis</title>
<p>Because genome information is available for only a limited number of species in the order Fagales, the annotated genes in the investigated genome were clustered into gene families together with those of 15 other species with available genome information, comprising two model plant species (<italic>O. sativa</italic> and <italic>A. thaliana</italic>), five species from the order Fagales (<italic>C. avellana</italic>, <italic>C. mandshurica</italic>, <italic>B. pendula</italic>, <italic>Castanea mollissima</italic>, and <italic>Juglans regia</italic>), three species from the order Euphorbiales (<italic>Ricinus communis</italic>, <italic>Manihot esculenta</italic>, and <italic>Hevea brasiliensis</italic>), four species from the order Salicales (<italic>Salix brachista</italic>, <italic>Populus alba</italic>, <italic>P. trichocarpa</italic>, and <italic>P. euphratica</italic>), and one species from the order Rosales (<italic>Prunus persica</italic>) (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 2</xref>). OrthoMCL gene family clustering analysis revealed that 19,683 <italic>C. heterophylla</italic> genes (88.19%) clustered into 14,421 gene families, 113 of which were specific to <italic>C. heterophylla</italic> (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 14</xref>). In addition, <italic>C. heterophylla</italic>, <italic>A. thaliana</italic>, <italic>O. sativa</italic>, <italic>C. avellana</italic>, and <italic>C. mandshurica</italic> shared a core set of 8,024 gene families (<xref ref-type="fig" rid="F2">Figure 2</xref>). Furthermore, after OrthoMCL clustering, 695 single-copy gene families were selected from the 16 analyzed species for subsequent analysis. Phylogenetic analysis revealed that <italic>C. heterophylla</italic> was a sister group to <italic>C. avellana</italic>, and the estimated divergence time between them was 11.1 (9.8&#x2013;13.9) million years ago (<xref ref-type="fig" rid="F3">Figure 3A</xref>). Our phylogenetic tree suggested that species from the same order are closely related, and the relationships among the 16 investigated species were consistent with their taxonomic positions and the results of previous phylogenetic analyses (<xref ref-type="bibr" rid="B22">Flora and Sciences, 1979</xref>; <xref ref-type="bibr" rid="B14">Cheng et al., 2018a</xref>).</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Comparative genomic analysis of <italic>C. heterophylla</italic> and other species. Venn diagram representing the distribution of shared gene families among <italic>C. heterophylla</italic> and four other species: <italic>A. thaliana</italic>, <italic>O. sativa</italic>, <italic>C. avellana</italic>, and <italic>C. mandshurica</italic>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-12-766548-g002.tif"/>
</fig>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Phylogenetic and evolutionary analysis of the <italic>C. heterophylla</italic> genome. Among the species used, members belonging to Rosales, Fagales, Euphorbiales, and Salicales are indicated in pink, orange, purple, and green, respectively. <bold>(A)</bold> The phylogenetic tree was constructed based on a concatenated alignment of 695 single-copy orthologous gene sets. The estimated divergence times (million years ago, MYA) are indicated at each node. The reference points used for calibration are marked with red dots. <bold>(B)</bold> Expansions and contractions of gene families are indicated in green and red, respectively. The pie charts show the proportions of conserved (blue), expanded (green), and contracted (red) gene families among all gene families.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-12-766548-g003.tif"/>
</fig>
</sec>
<sec id="S3.SS5">
<title>The Expansion and Contraction of Gene Families</title>
<p>The expansion and contraction of gene families play critical roles in driving the phenotypic diversification of plants. We identified 352 expanded and 1,197 contracted gene families in <italic>C. heterophylla</italic> relative to <italic>C. avellana</italic> (<xref ref-type="fig" rid="F3">Figure 3B</xref>). Gene Ontology classification of the genes in expanded and contracted families suggested that the most abundant genes were related to cellular, metabolic, and localization processes; within the cellular anatomical entity category, they were mainly assigned to membrane and intracellular organelles, and their main molecular functions were catalytic activity and binding (<xref ref-type="fig" rid="F4">Figure 4A</xref>). Gene Ontology enrichment analysis indicated that multiple GO terms were significantly enriched, including inorganic anion and sulfate transmembrane transport, ATP binding, protein phosphorylation, and catalytic activity (<xref ref-type="fig" rid="F4">Figure 4B</xref>). KEGG enrichment analysis revealed multiple significantly enriched KEGG pathways, including fatty acid elongation and plant&#x2013;pathogen interaction (<xref ref-type="fig" rid="F4">Figures 4C,D</xref>). These results may indicate significant differences in fatty acid biosynthesis and environmental adaptation between <italic>C. heterophylla</italic> and <italic>C. avellana</italic>.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Gene Ontology (GO) and KEGG enrichment of genes belonging to expanded and contracted gene families in the <italic>C. heterophylla</italic> genome. <bold>(A,B)</bold> GO enrichment of genes; <bold>(C,D)</bold> KEGG enrichment of genes.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-12-766548-g004.tif"/>
</fig>
</sec>
<sec id="S3.SS6">
<title>Construction of the HazelOmics Database</title>
<p>To facilitate future genetic studies and molecular breeding of hazelnut, the HazelOmics Database (HOD<sup><xref ref-type="fn" rid="footnote3">3</xref></sup>) was constructed based on the genome sequencing and assembly of the Chinese cultivar <italic>C. heterophylla</italic> &#x2018;Jizhen 6.&#x2019; The database includes the assembled genome sequences, predicted genes, CDS, and protein sequences, along with their functional annotations. The server was developed using several tools and software programs, including Spring Boot, JDK8, and MySQL, and provides a user-friendly and straightforward interface. Locations where hazels were planted and collected are marked on the map on the website homepage. HOD offers four major homepage entries for users to choose from: JBrowse (<xref ref-type="bibr" rid="B7">Buels et al., 2016</xref>), BLAST, Primer design, and Download. Detailed species and genome data are presented in the Genome subsection. The widely used genome browser JBrowse (<xref ref-type="bibr" rid="B7">Buels et al., 2016</xref>) is employed for displaying genome sequences, gene positions, and gene structures. To provide a gene retrieval function in HOD, we used the embedded BLAST sequence server tool. All known sequences of hazelnut stored in HOD, including genome, CDS, and protein sequences, are available for alignment using BLAST. Users can retrieve and download the genome, CDS, and protein sequences along with their corresponding annotation files in GFF3 format. An entry with the embedded Primer 3 tool is also offered on the website homepage for primer design.</p>
<p>For gene and genome region searches, two search entries are provided on the homepage: &#x201C;Search by gene&#x201D; and &#x201C;Search by region.&#x201D; The webpage also provides an interface for querying by gene ID, name, or function in the &#x201C;gene search option&#x201D; (<xref ref-type="fig" rid="F5">Figures 5A,B</xref>). For sequence alignment and homology queries, BLAST is embedded in HOD, and the entry for sequence searches is also available on the website homepage (<xref ref-type="fig" rid="F6">Figure 6A</xref>). The parameters for filtering low-homology sequences from the returned BLAST hits can be set manually by the user. Users can paste a query sequence (such as a genome, transcript, CDS, or protein sequence) into the textbox or upload a sequence file for comparison with the sequences stored in HOD; query results are sorted by BLAST score or E-value on the results page and can be downloaded in FASTA, XML, or TSV format. All hits are also linked to a graphic output with detailed information, including the sequence alignment sketch map, BLAST E-value, and identities between query and hit sequences. In HOD, a &#x201C;Primer design&#x201D; option is also embedded in the main menu on the homepage, which can conveniently be used for subsequent molecular research (<xref ref-type="fig" rid="F6">Figure 6B</xref>). The usage and fundamental principles of this option are similar to those of mainstream primer design software, such as Primer Premier 5. Several parameters of the predicted primers for the target sequence, including primer size (nt), GC content (%), and primer Tm (&#x00B0;C), can be defined by the user. 
All theoretically usable primer pairs are listed on the results page, ordered by overall primer quality based on several parameters, including primer GC content, Tm, overall (&#x201C;any&#x201D;) and 3&#x2032; self-complementarity, and hairpin formation.</p>
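<p>To make the user-definable primer parameters concrete, the following Python sketch computes primer size, GC content, and an approximate Tm for a candidate primer and checks them against user-set ranges. It is a simplified illustration, not the method used by HOD or Primer 3: Tm is estimated here with the Wallace rule (2 &#x00B0;C per A/T and 4 &#x00B0;C per G/C), a rough approximation for short oligonucleotides, whereas Primer 3 relies on thermodynamic models. The function names, default ranges, and example primer sequence are hypothetical.</p>
<preformat>
```python
def gc_content(primer: str) -> float:
    """GC content of a primer as a percentage."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Approximate Tm in degrees C using the Wallace rule:
    2 per A/T plus 4 per G/C (rough estimate for short oligos)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def in_range(value: float, bounds: tuple) -> bool:
    """True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return value >= low and high >= value


def passes_filters(primer: str,
                   size_nt=(18, 25),       # hypothetical default ranges
                   gc_percent=(40.0, 60.0),
                   tm_celsius=(50, 65)) -> bool:
    """Check primer size, GC content, and Tm against user-defined ranges."""
    return (in_range(len(primer), size_nt)
            and in_range(gc_content(primer), gc_percent)
            and in_range(wallace_tm(primer), tm_celsius))


# Hypothetical 20-nt primer used only for illustration.
primer = "ATGGCTAGCTAGGCTAACGT"
print(len(primer), gc_content(primer), wallace_tm(primer), passes_filters(primer))
```
</preformat>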
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Gene search and JBrowse functions. <bold>(A)</bold> The database provides an interface for querying by gene ID, name, or function in the gene search option. <bold>(B)</bold> Users can also visualize the location of genes in the genome with the help of JBrowse.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-12-766548-g005.tif"/>
</fig>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>BLAST and primer design functions. <bold>(A)</bold> Users can search for <italic>C. heterophylla</italic> homologous genes using the BLAST function. <bold>(B)</bold> Users can also design primers using the embedded &#x201C;Primer 3&#x201D; tool. Several parameters of the predicted primers for the target sequence, including primer size (nt), GC content (%), and primer Tm (&#x00B0;C), can be defined by the user.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-12-766548-g006.tif"/>
</fig>
</sec>
<sec id="S3.SS7">
<title>Expanding Genes Related to Unsaturated Fatty Acids Biosynthesis in the Hazelnut Genome</title>
<p>Hazelnut kernels are rich in fatty acids. Analysis of gene family expansion and contraction in <italic>C. heterophylla</italic> and its related species revealed that the expanded and contracted <italic>C. heterophylla</italic> genes were significantly enriched in the KEGG pathway of fatty acid biosynthesis, suggesting that changes in fatty acid gene families may have contributed to special adaptive mechanisms in <italic>C. heterophylla</italic>. To further understand the peculiarity of fatty acid synthesis in <italic>C. heterophylla</italic>, we analyzed the genome evolution of <italic>C. heterophylla</italic> together with 14 other important oil plants: <italic>Arachis duranensis</italic>, <italic>A. thaliana</italic>, <italic>Brassica napus</italic>, <italic>C. avellana</italic>, <italic>Camellia sinensis</italic>, <italic>Glycine max</italic>, <italic>Juglans regia</italic>, <italic>Ostryopsis davidiana</italic>, <italic>Olea europaea</italic>, <italic>Ostryopsis intermedia</italic>, <italic>Ostryopsis nobilis</italic>, <italic>O. sativa</italic>, <italic>Sesamum indicum</italic>, and <italic>Zea mays</italic> (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 2</xref>). In total, 1,277 genes belonging to 1,272 unique families were found in <italic>C. heterophylla</italic> (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 15</xref>), which may be related to the specificity of <italic>C. heterophylla</italic>. A random birth-and-death model was used to simulate the expansion and contraction events of gene families in each lineage of the evolutionary tree, and 439 contracted genes and 3,810 expanded genes were identified. These genes were further subjected to KEGG enrichment analysis (<xref ref-type="supplementary-material" rid="DS1">Supplementary Tables 16</xref>, <xref ref-type="supplementary-material" rid="DS1">17</xref>). 
We focused on how fatty acid synthesis in <italic>C. heterophylla</italic> differs from that in other oil plants. A total of 30 genes related to the synthesis of unsaturated fatty acids were identified in the genome of <italic>C. heterophylla</italic>, and 17 expanded genes were significantly enriched in the biosynthesis of unsaturated fatty acids pathway (ko01040). Therefore, compared with the 14 other oil plants mentioned above, <italic>C. heterophylla</italic> appears to be distinctive in unsaturated fatty acid synthesis, which is consistent with the extremely high unsaturated fatty acid level in the kernels of <italic>C. heterophylla</italic> (<xref ref-type="bibr" rid="B59">Song et al., 2008</xref>).</p>
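<p>To illustrate what significant enrichment means for these counts, the following minimal Python sketch applies a one-sided hypergeometric test to the numbers quoted above (17 of the 30 ko01040 genes among the 3,810 expanded genes). The background of 22,319 genes is the number of protein-coding genes predicted for this assembly and is used here only as an assumption; the enrichment software, background set, and multiple-testing correction actually applied are not specified in this section, and the function name <italic>hypergeom_upper_tail</italic> is hypothetical.</p>
<preformat>
```python
from math import comb


def hypergeom_upper_tail(total, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement
    from `total` genes, of which `annotated` belong to the pathway."""
    denominator = comb(total, drawn)
    top = min(annotated, drawn)
    numerator = sum(
        comb(annotated, i) * comb(total - annotated, drawn - i)
        for i in range(hits, top + 1)
    )
    return numerator / denominator


# Counts quoted in the text. The background size is an assumption and is
# not necessarily the background used in the published analysis.
BACKGROUND_GENES = 22319     # predicted protein-coding genes (assumed background)
PATHWAY_GENES = 30           # genes related to unsaturated fatty acid synthesis
EXPANDED_GENES = 3810        # genes in expanded families
EXPANDED_IN_PATHWAY = 17     # expanded genes assigned to ko01040

p_value = hypergeom_upper_tail(
    BACKGROUND_GENES, PATHWAY_GENES, EXPANDED_GENES, EXPANDED_IN_PATHWAY
)
expected_hits = PATHWAY_GENES * EXPANDED_GENES / BACKGROUND_GENES
print(f"expected by chance: {expected_hits:.2f}; one-sided p = {p_value:.3g}")
```
</preformat>
<p>Under these assumptions, roughly five pathway genes would be expected among the expanded genes by chance, so observing 17 corresponds to a very small one-sided probability.</p>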
<p>We previously obtained transcriptome sequencing data of hazelnut at four successive ovule developmental stages: ovule formation (stage Ov1), early ovule growth (stage Ov2), rapid ovule growth (stage Ov3), and ovule maturity (stage Ov4). We reanalyzed these data using our assembled genome as the reference, along with the HazelOmics Database. When the transcriptome data were aligned to our assembled genome, 81.48&#x2013;84.79% of reads could be aligned, indicating that our genome assembly is of good quality and can serve as a reference genome for downstream analyses (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 18</xref>). On the homepage of the HOD database, the IDs of the 17 expanded genes were used to search for gene annotations; these genes encode fatty acid desaturase (FAD), steroid 5-alpha-reductase DET (DET), 3-ketoacyl-CoA thiolase (KAT), beta-ketoacyl acyl carrier protein reductase (BKR), and stearoyl-acyl carrier protein desaturase (SAD). A phylogenetic tree was then constructed by aligning homologous CDS sequences of these five key enzymes from <italic>C. heterophylla</italic> and the 14 other oil plants. The 284 sequences clustered into five clades, four of which (BKR, DET, FAD, and KAT) were relatively conserved, whereas the fifth (SAD) was less conserved and was further divided into subclusters (<xref ref-type="fig" rid="F7">Figure 7A</xref>). The 17 expanded genes of interest were evenly distributed among the five clades, consistent with the functional annotation in HOD. Among these, genes encoding FAD, KAT, and SAD were expressed at relatively high levels during the ovule maturity stage, which is consistent with the high fatty acid levels at this stage and indicates that they may play an important role in regulating the biosynthesis of unsaturated fatty acids (<xref ref-type="fig" rid="F7">Figure 7B</xref>).</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p>Identification and expression analyses of expanded genes in the biosynthesis of unsaturated fatty acids pathway (ko01040). <bold>(A)</bold> Phylogenetic analysis of the CDS sequences of the 17 expanded genes in the ko01040 pathway and their homologs from 15 oil plant species: <italic>A. duranensis</italic>, <italic>A. thaliana</italic>, <italic>B. napus</italic>, <italic>C. avellana</italic>, <italic>C. sinensis</italic>, <italic>G. max</italic>, <italic>J. regia</italic>, <italic>O. davidiana</italic>, <italic>O. europaea</italic>, <italic>O. intermedia</italic>, <italic>O. nobilis</italic>, <italic>O. sativa</italic>, <italic>S. indicum</italic>, <italic>Z. mays</italic>, and <italic>C. heterophylla</italic>. The 17 expanded genes of <italic>C. heterophylla</italic> are marked with red stars. <bold>(B)</bold> Gene expression analysis of the 17 expanded genes in the ko01040 pathway at four successive ovule developmental stages. <italic>C. het</italic>, <italic>C. heterophylla</italic>; BKR, beta-ketoacyl acyl carrier protein reductase; DET, steroid 5-alpha-reductase DET; FAD, fatty acid desaturase; KAT, 3-ketoacyl-CoA thiolase; SAD, stearoyl-acyl carrier protein desaturase.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-12-766548-g007.tif"/>
</fig>
</sec>
</sec>
<sec id="S4" sec-type="discussion">
<title>Discussion</title>
<p>Consistency, integrity, and accuracy are important parameters for evaluating the quality of a genome assembly. SMRT sequencing and assembly, followed by error correction using the Hi-C assisted genome assembly strategy, can effectively improve genome assembly quality (<xref ref-type="bibr" rid="B30">Jonas et al., 2017</xref>). Using this strategy, our sequencing and sequence assembly generated a reference genome of hazelnut with a contig N50 of 2.03 Mb. The Hi-C data were then used to cluster 386 contigs into 11 chromosomes with a scaffold N50 of 32.88 Mb, which is consistent with the karyotype of <italic>C. heterophylla</italic> (2n = 22) (<xref ref-type="bibr" rid="B24">Guo et al., 2009</xref>). <italic>B. pendula</italic> and <italic>C. avellana</italic> are the only Betulaceae species with available genome information (<xref ref-type="bibr" rid="B48">Pekkinen et al., 2005</xref>; <xref ref-type="bibr" rid="B55">Sathuvalli et al., 2011</xref>; <xref ref-type="bibr" rid="B10">Chen et al., 2019</xref>). Thus, the genome of <italic>C. avellana</italic> plays an important role in interpreting transcriptome data and gene function analysis of hazelnut species. The assembly of the <italic>C. avellana</italic> &#x2018;Jefferson&#x2019; genome was based only on Illumina sequencing because of the technical limitations at the time, and it had a scaffold N50 of 21.5 kb (<xref ref-type="bibr" rid="B55">Sathuvalli et al., 2011</xref>). Recently, the chromosome-scale genome of <italic>C. avellana</italic> &#x2018;Tombul&#x2019; was assembled, with a total length of 370 Mb and a scaffold N50 of 36.65 Mb, using Illumina and PacBio sequencing (<xref ref-type="bibr" rid="B41">Lucas et al., 2021</xref>); furthermore, the chromosome-scale genome of <italic>C. mandshurica</italic> was assembled, with a total length of 368 Mb and a scaffold N50 of 14.85 Mb, using Illumina and Nanopore sequencing.
Our scaffold N50 was approximately 1,500 times longer than that of &#x2018;Jefferson,&#x2019; close to that of <italic>C. avellana</italic> &#x2018;Tombul,&#x2019; and much higher than that of <italic>C. mandshurica</italic> (<xref ref-type="bibr" rid="B38">Li et al., 2021</xref>). The completeness and high quality of the <italic>C. heterophylla</italic> genome assembled in the present study were further verified by BUSCO alignment (<xref ref-type="bibr" rid="B58">Simao et al., 2015</xref>) and Hi-C data mapping (<xref ref-type="bibr" rid="B56">Servant et al., 2015</xref>). A total of 94.72% of the test genes could be mapped to the Embryophyta_odb9 database (<xref ref-type="bibr" rid="B58">Simao et al., 2015</xref>), of which 92.92% were complete, and the LAI was 14.20, indicating a high integrity of the assembled genome. The accuracy of the assembled genome was verified by the homozygous SNP rate of 0.011% and the homozygous indel rate of 0.037%. Thus, the high-quality reference genome of <italic>C. heterophylla</italic> obtained here should benefit further molecular breeding and gene function studies of <italic>Corylus</italic> species.</p>
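<p>For readers less familiar with the continuity statistics compared above, the following minimal Python sketch shows how an N50 or N90 value can be computed from a list of sequence lengths, and how the fold difference between two reported scaffold N50 values follows. The function name <italic>nx</italic> and the toy lengths are hypothetical and are not data from this study; only the two scaffold N50 values are taken from the text.</p>
<preformat>
```python
def nx(lengths, fraction=0.5):
    """Return the length L such that sequences of length >= L together
    cover at least `fraction` of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be a non-empty list of positive numbers")


# Hypothetical toy assembly (lengths in kb), not data from this study.
toy_scaffolds = [90, 70, 50, 40, 30, 15, 5]
print("toy N50:", nx(toy_scaffolds, 0.5), "kb")
print("toy N90:", nx(toy_scaffolds, 0.9), "kb")

# Scaffold N50 values quoted in the text, both expressed in kb.
THIS_STUDY_N50_KB = 32.88 * 1000   # 32.88 Mb
JEFFERSON_N50_KB = 21.5            # 21.5 kb
fold = THIS_STUDY_N50_KB / JEFFERSON_N50_KB
print("fold difference:", round(fold))
```
</preformat>
<p>With the values quoted in the text, the ratio is about 1,529, in line with the approximately 1,500-fold difference stated above.</p>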
<p>A new <italic>C. heterophylla</italic> genome based on Nanopore long reads was assembled in 2021, with a genome size of 371 Mb and a contig N50 of 2.07 Mb, and 27,591 protein-coding genes were predicted (<xref ref-type="bibr" rid="B69">Zhao et al., 2021</xref>). We obtained a 346-Mb genome and predicted 22,319 protein-coding genes. Both the genome size and the number of predicted protein-coding genes in the current study were smaller than those reported previously (<xref ref-type="bibr" rid="B69">Zhao et al., 2021</xref>). <xref ref-type="bibr" rid="B69">Zhao et al. (2021)</xref> collected their samples in Yanqing, Beijing, China (40.54&#x00B0;N, 116.06&#x00B0;E), whereas our samples were collected in Yitong, Jilin, China (43.34&#x00B0;N, 125.30&#x00B0;E). The distance between the two locations is more than 800 km, and, as expected, the genetic backgrounds of these wild plants differ. These differences in genetic background may partly explain the differences in genome size and number of predicted protein-coding genes. Given that hazelnut is a highly heterozygous species and that the accuracy and completeness of our assembly were confirmed, we speculate that some heterozygous sequences were retained redundantly in the assembly of <xref ref-type="bibr" rid="B69">Zhao et al. (2021)</xref>, which would lead to redundant genes. The contig N50 was similar for the two assemblies (both &#x223C;2 Mb), but the contig N90 in the present study was 424 kb, markedly greater than the 125 kb reported by <xref ref-type="bibr" rid="B69">Zhao et al. (2021)</xref>. These data suggest that our assembly has notably better continuity and is less prone to the prediction of partial genes. To further investigate genomic conservation and variation, a genome-wide collinearity comparison was performed between our assembly and the previous <italic>C. heterophylla</italic> genome (<xref ref-type="bibr" rid="B69">Zhao et al., 2021</xref>).
The two <italic>C. heterophylla</italic> genomes showed extensive collinearity, with some differences (<xref ref-type="supplementary-material" rid="DS1">Supplementary Figure 1</xref> and <xref ref-type="supplementary-material" rid="DS1">Supplementary Table 19</xref>). Regions of conserved synteny in our assembly and in the previous <italic>C. heterophylla</italic> genome contained 19,784 and 21,305 protein-coding genes, corresponding to 88.64% and 77.22% of the genes predicted in the respective assemblies (<xref ref-type="supplementary-material" rid="DS1">Supplementary Figure 1</xref>). However, the syntenic comparison also revealed some structural variations (e.g., inversions, translocations, and duplications), which were mainly located on chromosomes 1, 2, 4, 5, and 7 (<xref ref-type="supplementary-material" rid="DS1">Supplementary Figure 1</xref>). The structural variations and detected mutations, including SNPs, indels, and CNVs (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 19</xref>), may reflect evolutionary divergence between our <italic>C. heterophylla</italic> accession and that of the previous study (<xref ref-type="bibr" rid="B69">Zhao et al., 2021</xref>), which is consistent with the distance of more than 800 km between the two sampling locations.</p>
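<p>The stated distance between the two sampling locations can be checked from the coordinates given above. The following minimal Python sketch uses the haversine formula under a spherical-Earth assumption with a mean radius of 6,371 km; the function name <italic>haversine_km</italic> is hypothetical, and the result is a great-circle approximation rather than a surveyed distance.</p>
<preformat>
```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Sampling coordinates quoted in the text (latitude N, longitude E).
YANQING = (40.54, 116.06)
YITONG = (43.34, 125.30)

distance_km = haversine_km(*YANQING, *YITONG)
print(f"Yanqing to Yitong: {distance_km:.0f} km")
```
</preformat>
<p>Under this approximation, the two sites are separated by somewhat more than 800 km, in agreement with the figure given in the text.</p>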
<p>The <italic>C. heterophylla</italic> phylogenetic tree was constructed using the assembled genome based on SMRT and Hi-C sequencing, and we found that among all related species with available genome information, <italic>C. heterophylla</italic> is a sister group to <italic>C. avellana</italic>, which was consistent with their taxonomic positions and the results of previous phylogenetic analyses (<xref ref-type="bibr" rid="B22">Flora and Sciences, 1979</xref>; <xref ref-type="bibr" rid="B14">Cheng et al., 2018a</xref>). It was estimated that the divergence time between <italic>C. heterophylla</italic> and <italic>C. avellana</italic> was 11.1 (9.8&#x2013;13.9) million years ago. On the basis of these results, the expanded and contracted gene families of <italic>C. heterophylla</italic> were further identified. Hazelnuts have a high fatty acid content, and the results of our KEGG enrichment analysis showed significant differences in KEGG pathways of fatty acid elongation and plant&#x2013;pathogen interaction between <italic>C. heterophylla</italic> and <italic>C. avellana</italic>, suggesting significant differences in the nutrient content of their nuts and in their plant&#x2013;pathogen interactions. Collectively, these results suggested that fatty acid content and plant&#x2013;pathogen interactions may be responsible for gene expansion and contraction in <italic>C. heterophylla</italic>.</p>
<p>A database is a comprehensive collection of related data organized for convenient access. The European hazelnut (<italic>C. avellana</italic>) Genomic Resource Portal (EHG)<sup><xref ref-type="fn" rid="footnote4">4</xref></sup>, which is based on Illumina sequencing of <italic>C. avellana</italic>, only provides links for data download (<xref ref-type="bibr" rid="B55">Sathuvalli et al., 2011</xref>; <xref ref-type="bibr" rid="B52">Rowley et al., 2018</xref>); it is therefore not a database in the strict sense, because it lacks basic genomic analysis and data mining functions. Based on SMRT sequencing and Hi-C assisted assembly, we established a database of <italic>C. heterophylla</italic>, which is a unique species of <italic>Corylus</italic> from China (<xref ref-type="bibr" rid="B13">Cheng et al., 2019</xref>; <xref ref-type="bibr" rid="B39">Liu et al., 2020</xref>), with a large distribution area and high biodiversity. Moreover, <italic>C. heterophylla</italic> is currently the main source of hazelnut products in China. The establishment of our database, with its large volume of data, will be highly beneficial for promoting the molecular breeding of hazelnut. To our knowledge, it is currently the only available genome database for <italic>Corylus</italic>. Its functional modules are simple and clear, and data comparison and mining are convenient and practical. Moreover, additional functional modules can be added to expand the database in the future, and the database is convenient for use in studies related to hazelnut.</p>
<p>Fatty acids, which account for 64.48&#x2013;71.92% of hazelnut kernels (<xref ref-type="bibr" rid="B21">Erdogan and Aygun, 2005</xref>), are the most abundant nutrients in hazelnut kernels. FAD, SAD, and KAT are important enzymes involved in the biosynthesis of unsaturated fatty acids in higher plants. FAD catalyzes the formation of double bonds at specific positions of the fatty acid chain and determines the composition and proportion of unsaturated fatty acids (<xref ref-type="bibr" rid="B28">Huang et al., 2018</xref>). In <italic>Oryza sativa</italic>, <italic>OsFAD2</italic> is involved in fatty acid desaturation and maintenance of the membrane lipid balance in cells, possibly improving the tolerance of rice to low-temperature stress (<xref ref-type="bibr" rid="B57">Shi et al., 2012</xref>). SAD is located in the plastid stroma and catalyzes the desaturation of stearoyl-ACP to oleoyl-ACP. SAD determines the ratio of saturated to unsaturated fatty acids and is involved in cold acclimation in plants (<xref ref-type="bibr" rid="B34">Li et al., 2015</xref>). KAT catalyzes the final, thiolytic step of fatty acid &#x03B2;-oxidation and is involved in ABA signaling in <italic>Arabidopsis</italic>; furthermore, it is expected to participate in the regulation of plant adaptation to adverse conditions, such as drought and cold stresses (<xref ref-type="bibr" rid="B29">Jiang et al., 2011</xref>). Collectively, these genes participating in the biosynthesis of unsaturated fatty acids play an important role in cold resistance in plants. <italic>C. heterophylla</italic> is an endemic species of <italic>Corylus</italic> in China. The proportion of unsaturated fatty acids in the <italic>C. heterophylla</italic> kernel is 94&#x2013;97%, which is higher than that of <italic>C. avellana</italic> (92&#x2013;93%) (<xref ref-type="bibr" rid="B59">Song et al., 2008</xref>) and much higher than that of most oil plants.
<italic>C. heterophylla</italic> is well known for its cold resistance and tolerates extremely low winter temperatures of &#x2212;48&#x00B0;C in Northeast China (<xref ref-type="bibr" rid="B11">Chen et al., 2012</xref>). In total, 17 expanded genes were found to be significantly enriched in the pathway of unsaturated fatty acid synthesis. Transcriptome analysis at four stages of ovule development showed that the expanded genes of <italic>FAD</italic> (Cor0058010.1), <italic>SAD</italic> (Cor0141290.1), and <italic>KAT</italic> (Cor0122500.1) were highly upregulated at the ovule maturity stage, when fatty acids were most abundant. We infer that the expansion of <italic>FAD</italic>, <italic>SAD</italic>, and <italic>KAT</italic> may promote high unsaturated fatty acid content in kernels and improve the adaptability of <italic>C. heterophylla</italic> to the cold climate of Northeast China, which may explain why <italic>C. heterophylla</italic> became the dominant <italic>Corylus</italic> species in the area. The candidate genes for regulating the biosynthesis of unsaturated fatty acids in <italic>C. heterophylla</italic> proposed in this study may also provide a scientific basis for the breeding of hazelnut. In conclusion, our research enhances the understanding of unsaturated fatty acid biosynthesis in hazelnut, and the reference genome and database constructed in this study provide an important platform for future studies on hazelnut and its related species.</p>
</sec>
<sec id="S5" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are publicly available. This data can be found here: Raw sequencing data (PacBio, Illumina, and Hi-C data) for <italic>de novo</italic> whole-genome assembly have been deposited in the NCBI Sequence Read Archive PRJNA664441 (<ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA664441">https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA664441</ext-link>). The assembled genome has been deposited in DDBJ/ENA/GenBank under accession number <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="JADFUG000000000">JADFUG000000000</ext-link>. The version described in this article is version <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="JADFUG000000000">JADFUG000000000</ext-link>.</p>
</sec>
<sec id="S6">
<title>Author Contributions</title>
<p>JL and YC contributed to study conception and design, collection and/or assembly of data, and data analysis and interpretation. JL and XZ contributed to writing the manuscript. HW, XZ, HH, and DW prepared samples. All authors have read and approved the manuscript.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="pudiscl1" sec-type="disclaimer">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec id="S7" sec-type="funding-information">
<title>Funding</title>
<p>This study was supported by grants from the National Natural Science Foundation of China (Nos. 31770723; 32171840) and the Outstanding Talents Team Project of Department of Science and Technology of Jilin Province (No. 20210509033RQ). The funding bodies had no role in the design of the study, collection, analysis, and interpretation of data, or in writing the manuscript.</p>
</sec>
<ack>
<p>We would like to thank the reviewers for their helpful comments and proposals on the manuscript.</p>
</ack>
<sec id="S9" sec-type="supplementary-material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fpls.2021.766548/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fpls.2021.766548/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.DOCX" id="DS1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amaral</surname> <given-names>J. S.</given-names></name> <name><surname>Casal</surname> <given-names>S.</given-names></name> <name><surname>Seabra</surname> <given-names>R. M.</given-names></name> <name><surname>Oliveira</surname> <given-names>B. P. P.</given-names></name></person-group> (<year>2006</year>). <article-title>Effects of roasting on hazelnut lipids.</article-title> <source><italic>J. Agric. Food Chem.</italic></source> <volume>54</volume> <fpage>1315</fpage>&#x2013;<lpage>1321</lpage>. <pub-id pub-id-type="doi">10.1021/jf052287v</pub-id> <pub-id pub-id-type="pmid">16478254</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bairoch</surname> <given-names>A.</given-names></name> <name><surname>Apweiler</surname> <given-names>R.</given-names></name></person-group> (<year>2000</year>). <article-title>The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>28</volume> <fpage>45</fpage>&#x2013;<lpage>48</lpage>. <pub-id pub-id-type="doi">10.1093/nar/28.1.45</pub-id> <pub-id pub-id-type="pmid">10592178</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Balakrishnan</surname> <given-names>R.</given-names></name> <name><surname>Harris</surname> <given-names>M. A.</given-names></name> <name><surname>Huntley</surname> <given-names>R.</given-names></name> <name><surname>Van</surname> <given-names>A. K.</given-names></name> <name><surname>Cherry</surname> <given-names>J. M.</given-names></name></person-group> (<year>2013</year>). <article-title>A guide to best practices for Gene Ontology (GO) manual annotation.</article-title> <source><italic>Database</italic></source> <volume>2013</volume>:<issue>bat054</issue>. <pub-id pub-id-type="doi">10.1093/database/bat054</pub-id> <pub-id pub-id-type="pmid">23842463</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bao</surname> <given-names>W. D.</given-names></name> <name><surname>Kojima</surname> <given-names>K. K.</given-names></name> <name><surname>Kohany</surname> <given-names>O.</given-names></name></person-group> (<year>2015</year>). <article-title>Repbase update, a database of repetitive elements in eukaryotic genomes.</article-title> <source><italic>Mobile DNA</italic></source> <volume>6</volume>:<issue>11</issue>. <pub-id pub-id-type="doi">10.1186/s13100-015-0041-9</pub-id> <pub-id pub-id-type="pmid">26045719</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barker</surname> <given-names>M. S.</given-names></name> <name><surname>Vogel</surname> <given-names>H.</given-names></name> <name><surname>Schranz</surname> <given-names>M. E.</given-names></name></person-group> (<year>2010</year>). <article-title>Paleopolyploidy in the Brassicales: analyses of the <italic>Cleome</italic> transcriptome elucidate the history of genome duplications in <italic>Arabidopsis</italic> and other Brassicales.</article-title> <source><italic>Genome Biol. Evol.</italic></source> <volume>1</volume> <fpage>391</fpage>&#x2013;<lpage>399</lpage>. <pub-id pub-id-type="doi">10.1093/gbe/evp040</pub-id> <pub-id pub-id-type="pmid">20333207</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benson</surname> <given-names>G.</given-names></name></person-group> (<year>1999</year>). <article-title>Tandem repeats finder: a program to analyze DNA sequences.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>27</volume> <fpage>573</fpage>&#x2013;<lpage>580</lpage>. <pub-id pub-id-type="doi">10.1093/nar/27.2.573</pub-id> <pub-id pub-id-type="pmid">9862982</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Buels</surname> <given-names>R.</given-names></name> <name><surname>Yao</surname> <given-names>E.</given-names></name> <name><surname>Diesh</surname> <given-names>C. M.</given-names></name> <name><surname>Hayes</surname> <given-names>R. D.</given-names></name> <name><surname>Munoz-Torres</surname> <given-names>M.</given-names></name> <name><surname>Helt</surname> <given-names>G.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>JBrowse: a dynamic web platform for genome visualization and analysis.</article-title> <source><italic>Genome Biol.</italic></source> <volume>17</volume>:<issue>66</issue>. <pub-id pub-id-type="doi">10.1186/s13059-016-0924-1</pub-id> <pub-id pub-id-type="pmid">27072794</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burge</surname> <given-names>C.</given-names></name> <name><surname>Karlin</surname> <given-names>S.</given-names></name></person-group> (<year>1997</year>). <article-title>Prediction of complete gene structures in human genomic DNA.</article-title> <source><italic>J. Mol. Biol.</italic></source> <volume>268</volume> <fpage>78</fpage>&#x2013;<lpage>94</lpage>. <pub-id pub-id-type="doi">10.1006/jmbi.1997.0951</pub-id> <pub-id pub-id-type="pmid">9149143</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cantarel</surname> <given-names>B. L.</given-names></name> <name><surname>Korf</surname> <given-names>I.</given-names></name> <name><surname>Robb</surname> <given-names>S. M.</given-names></name> <name><surname>Parra</surname> <given-names>G.</given-names></name> <name><surname>Ross</surname> <given-names>E.</given-names></name> <name><surname>Moore</surname> <given-names>B.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes.</article-title> <source><italic>Genome Res.</italic></source> <volume>18</volume> <fpage>188</fpage>&#x2013;<lpage>196</lpage>. <pub-id pub-id-type="doi">10.1101/gr.6743907</pub-id> <pub-id pub-id-type="pmid">18025269</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>S.</given-names></name> <name><surname>Lin</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Li</surname> <given-names>Q.</given-names></name> <name><surname>Chen</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Genome-wide analysis of NAC gene family in <italic>Betula pendula</italic>.</article-title> <source><italic>Forests</italic></source> <volume>10</volume>:<issue>741</issue>. <pub-id pub-id-type="doi">10.3390/f10090741</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>G.</given-names></name> <name><surname>Liang</surname> <given-names>L.</given-names></name> <name><surname>Ma</surname> <given-names>Q.</given-names></name></person-group> (<year>2012</year>). <article-title>Cloning and temporal-spatial expression of a CBF homolog associated with cold acclimation from <italic>Corylus heterophylla</italic>.</article-title> <source><italic>Sci. Silvae Sin.</italic></source> <volume>48</volume> <fpage>167</fpage>&#x2013;<lpage>172</lpage>.</citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>Q.</given-names></name> <name><surname>Guo</surname> <given-names>W.</given-names></name> <name><surname>Zhao</surname> <given-names>T.</given-names></name> <name><surname>Ma</surname> <given-names>Q.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Transcriptome sequencing and identification of cold tolerance genes in hardy <italic>Corylus</italic> species (<italic>C. heterophylla</italic> Fisch) floral buds.</article-title> <source><italic>PLoS One</italic></source> <volume>9</volume>:<issue>e108604</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0108604</pub-id> <pub-id pub-id-type="pmid">25268521</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Mou</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>iTRAQ protein profiling reveals candidate proteins regulating ovary and ovule differentiation in pistillate inflorescences after pollination in hazel.</article-title> <source><italic>Tree Genet. Genomes</italic></source> <volume>15</volume>:<issue>21</issue>. <pub-id pub-id-type="doi">10.1007/s11295-019-1328-7</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Zhao</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name></person-group> (<year>2018a</year>). <article-title>Analysis of SSR markers information and primer selection from transcriptome sequence of hybrid hazelnut <italic>Corylus heterophylla</italic> &#x00D7; <italic>C. avellana</italic>.</article-title> <source><italic>Acta Hortic. Sin.</italic></source> <volume>45</volume> <fpage>139</fpage>&#x2013;<lpage>148</lpage>. <pub-id pub-id-type="doi">10.16420/j.issn.0513-353x.2017-0281</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>C.</given-names></name> <name><surname>Ai</surname> <given-names>P.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name></person-group> (<year>2018b</year>). <article-title>Identification of genes regulating ovary differentiation after pollination in hazel by comparative transcriptome analysis.</article-title> <source><italic>BMC Plant Biol.</italic></source> <volume>18</volume>:<issue>84</issue>. <pub-id pub-id-type="doi">10.1186/s12870-018-1296-3</pub-id> <pub-id pub-id-type="pmid">29739322</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>do Valle</surname> <given-names>&#x00CD;. F.</given-names></name> <name><surname>Giampieri</surname> <given-names>E.</given-names></name> <name><surname>Simonetti</surname> <given-names>G.</given-names></name> <name><surname>Padella</surname> <given-names>A.</given-names></name> <name><surname>Manfrini</surname> <given-names>M.</given-names></name> <name><surname>Ferrari</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Optimized pipeline of MuTect and GATK tools to improve the detection of somatic single nucleotide polymorphisms in whole-exome sequencing data.</article-title> <source><italic>BMC Bioinform.</italic></source> <volume>17</volume> <fpage>27</fpage>&#x2013;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1186/s12859-016-1190-7</pub-id> <pub-id pub-id-type="pmid">28185561</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>W.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>Tan</surname> <given-names>Y.</given-names></name> <name><surname>Ji</surname> <given-names>Y.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name></person-group> (<year>2010</year>). <article-title>Study on the process of female flower bud differentiation in Ping&#x2019;ou hybrid hazelnut.</article-title> <source><italic>J. Fruit Sci.</italic></source> <volume>27</volume> <fpage>812</fpage>&#x2013;<lpage>814</lpage>. <pub-id pub-id-type="doi">10.13925/j.cnki.gsxb.2010.05.028</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dudchenko</surname> <given-names>O.</given-names></name> <name><surname>Batra</surname> <given-names>S. S.</given-names></name> <name><surname>Omer</surname> <given-names>A. D.</given-names></name> <name><surname>Nyquist</surname> <given-names>S. K.</given-names></name> <name><surname>Hoeger</surname> <given-names>M.</given-names></name> <name><surname>Durand</surname> <given-names>N. C.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title><italic>De novo</italic> assembly of the <italic>Aedes aegypti</italic> genome using Hi-C yields chromosome-length scaffolds.</article-title> <source><italic>Science</italic></source> <volume>356</volume> <fpage>92</fpage>&#x2013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1126/science.aal3327</pub-id> <pub-id pub-id-type="pmid">28336562</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Durand</surname> <given-names>N. C.</given-names></name> <name><surname>Robinson</surname> <given-names>J. T.</given-names></name> <name><surname>Shamim</surname> <given-names>M. S.</given-names></name> <name><surname>Machol</surname> <given-names>I.</given-names></name> <name><surname>Mesirov</surname> <given-names>J. P.</given-names></name> <name><surname>Lander</surname> <given-names>E. S.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom.</article-title> <source><italic>Cell Syst.</italic></source> <volume>3</volume> <fpage>99</fpage>&#x2013;<lpage>101</lpage>. <pub-id pub-id-type="doi">10.1016/j.cels.2015.07.012</pub-id> <pub-id pub-id-type="pmid">27467250</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Emms</surname> <given-names>D. M.</given-names></name> <name><surname>Kelly</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). <article-title>OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy.</article-title> <source><italic>Genome Biol.</italic></source> <volume>16</volume>:<issue>157</issue>. <pub-id pub-id-type="doi">10.1186/s13059-015-0721-2</pub-id> <pub-id pub-id-type="pmid">26243257</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Erdogan</surname> <given-names>V.</given-names></name> <name><surname>Aygun</surname> <given-names>A.</given-names></name></person-group> (<year>2005</year>). <article-title>Fatty acid composition and physical properties of Turkish tree hazel nuts.</article-title> <source><italic>Chem. Nat. Compd.</italic></source> <volume>41</volume> <fpage>378</fpage>&#x2013;<lpage>381</lpage>. <pub-id pub-id-type="doi">10.1007/s10600-005-0156-1</pub-id></citation></ref>
<ref id="B22"><citation citation-type="book"><person-group person-group-type="author"><collab>Flora of China Editorial Committee, Chinese Academy of Sciences</collab></person-group> (<year>1979</year>). <source><italic>Flora of China</italic></source>, <volume>Vol. 21</volume>. <publisher-loc>Beijing</publisher-loc>: <publisher-name>Science Press</publisher-name>.</citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Griffiths-Jones</surname> <given-names>S.</given-names></name> <name><surname>Moxon</surname> <given-names>S.</given-names></name> <name><surname>Marshall</surname> <given-names>M.</given-names></name> <name><surname>Khanna</surname> <given-names>A.</given-names></name> <name><surname>Bateman</surname> <given-names>A.</given-names></name></person-group> (<year>2005</year>). <article-title>Rfam: annotating non-coding RNAs in complete genomes.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>33</volume> <fpage>D121</fpage>&#x2013;<lpage>D124</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gki081</pub-id> <pub-id pub-id-type="pmid">15608160</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>Y.</given-names></name> <name><surname>Xing</surname> <given-names>S.</given-names></name> <name><surname>Ma</surname> <given-names>Y.</given-names></name> <name><surname>Tang</surname> <given-names>H.</given-names></name> <name><surname>Han</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>Analysis of karyotype on fifteen hazelnut germplasms.</article-title> <source><italic>Acta Hortic. Sin.</italic></source> <volume>36</volume> <fpage>27</fpage>&#x2013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.16420/j.issn.0513-353x.2009.01.009</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haas</surname> <given-names>B. J.</given-names></name> <name><surname>Papanicolaou</surname> <given-names>A.</given-names></name> <name><surname>Yassour</surname> <given-names>M.</given-names></name> <name><surname>Grabherr</surname> <given-names>M.</given-names></name> <name><surname>Blood</surname> <given-names>P. D.</given-names></name> <name><surname>Bowden</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title><italic>De novo</italic> transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.</article-title> <source><italic>Nat. Protoc.</italic></source> <volume>8</volume> <fpage>1494</fpage>&#x2013;<lpage>1512</lpage>. <pub-id pub-id-type="doi">10.1038/nprot.2013.084</pub-id> <pub-id pub-id-type="pmid">23845962</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>M. V.</given-names></name> <name><surname>Thomas</surname> <given-names>G. W. C.</given-names></name> <name><surname>Lugo-Martinez</surname> <given-names>J.</given-names></name> <name><surname>Hahn</surname> <given-names>M. W.</given-names></name></person-group> (<year>2013</year>). <article-title>Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3.</article-title> <source><italic>Mol. Biol. Evol.</italic></source> <volume>30</volume> <fpage>1987</fpage>&#x2013;<lpage>1997</lpage>. <pub-id pub-id-type="doi">10.1093/molbev/mst100</pub-id> <pub-id pub-id-type="pmid">23709260</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Helmstetter</surname> <given-names>A. J.</given-names></name> <name><surname>Buggs</surname> <given-names>R. J. A.</given-names></name> <name><surname>Lucas</surname> <given-names>S. J.</given-names></name></person-group> (<year>2019</year>). <article-title>Repeated long-distance dispersal and convergent evolution in hazel.</article-title> <source><italic>Sci. Rep.</italic></source> <volume>9</volume>:<issue>16016</issue>. <pub-id pub-id-type="doi">10.1038/s41598-019-52403-2</pub-id> <pub-id pub-id-type="pmid">31690762</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>C.</given-names></name> <name><surname>Huang</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Huang</surname> <given-names>R.</given-names></name> <name><surname>Luan</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>SAD and FAD genes regulate the ratio of unsaturated fatty acid components in <italic>Carya cathayensis</italic>.</article-title> <source><italic>Acta Hortic. Sin.</italic></source> <volume>45</volume> <fpage>250</fpage>&#x2013;<lpage>260</lpage>. <pub-id pub-id-type="doi">10.16420/j.issn.0513-353x.2017-0378</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>T.</given-names></name> <name><surname>Zhang</surname> <given-names>X. F.</given-names></name> <name><surname>Wang</surname> <given-names>X. F.</given-names></name> <name><surname>Zhang</surname> <given-names>D. P.</given-names></name></person-group> (<year>2011</year>). <article-title><italic>Arabidopsis</italic> 3-ketoacyl-CoA thiolase-2 (KAT2), an enzyme of fatty acid &#x03B2;-oxidation, is involved in ABA signal transduction.</article-title> <source><italic>Plant Cell Physiol.</italic></source> <volume>52</volume> <fpage>528</fpage>&#x2013;<lpage>538</lpage>. <pub-id pub-id-type="doi">10.1093/pcp/pcr008</pub-id> <pub-id pub-id-type="pmid">21257607</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Korlach</surname> <given-names>J.</given-names></name> <name><surname>Gedman</surname> <given-names>G.</given-names></name> <name><surname>Kingan</surname> <given-names>S. B.</given-names></name> <name><surname>Chin</surname> <given-names>C.-S.</given-names></name> <name><surname>Howard</surname> <given-names>J. T.</given-names></name> <name><surname>Audet</surname> <given-names>J.-N.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title><italic>De novo</italic> PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads.</article-title> <source><italic>Gigascience</italic></source> <volume>6</volume> <fpage>1</fpage>&#x2013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1093/gigascience/gix085</pub-id> <pub-id pub-id-type="pmid">29020750</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kanehisa</surname> <given-names>M.</given-names></name> <name><surname>Goto</surname> <given-names>S.</given-names></name> <name><surname>Kawashima</surname> <given-names>S.</given-names></name> <name><surname>Okuno</surname> <given-names>Y.</given-names></name> <name><surname>Hattori</surname> <given-names>M.</given-names></name></person-group> (<year>2003</year>). <article-title>The KEGG resource for deciphering the genome.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>32</volume> <fpage>D277</fpage>&#x2013;<lpage>D280</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkh063</pub-id> <pub-id pub-id-type="pmid">14681412</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>D.</given-names></name> <name><surname>Langmead</surname> <given-names>B.</given-names></name> <name><surname>Salzberg</surname> <given-names>S. L.</given-names></name></person-group> (<year>2015</year>). <article-title>HISAT: a fast spliced aligner with low memory requirements.</article-title> <source><italic>Nat. Methods</italic></source> <volume>12</volume> <fpage>357</fpage>&#x2013;<lpage>360</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.3317</pub-id> <pub-id pub-id-type="pmid">25751142</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koren</surname> <given-names>S.</given-names></name> <name><surname>Walenz</surname> <given-names>B. P.</given-names></name> <name><surname>Berlin</surname> <given-names>K.</given-names></name> <name><surname>Miller</surname> <given-names>J. R.</given-names></name> <name><surname>Bergman</surname> <given-names>N. H.</given-names></name> <name><surname>Phillippy</surname> <given-names>A. M.</given-names></name></person-group> (<year>2017</year>). <article-title>Canu: scalable and accurate long-read assembly <italic>via</italic> adaptive k-mer weighting and repeat separation.</article-title> <source><italic>Genome Res.</italic></source> <volume>27</volume> <fpage>722</fpage>&#x2013;<lpage>736</lpage>. <pub-id pub-id-type="doi">10.1101/gr.215087.116</pub-id> <pub-id pub-id-type="pmid">28298431</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>F.</given-names></name> <name><surname>Bian</surname> <given-names>C. S.</given-names></name> <name><surname>Xu</surname> <given-names>J. F.</given-names></name> <name><surname>Pang</surname> <given-names>W. F.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Duan</surname> <given-names>S. G.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Cloning and functional characterization of SAD genes in potato.</article-title> <source><italic>PLoS One</italic></source> <volume>10</volume>:<issue>e0122036</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0122036</pub-id> <pub-id pub-id-type="pmid">25825911</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Durbin</surname> <given-names>R.</given-names></name></person-group> (<year>2009</year>). <article-title>Fast and accurate short read alignment with Burrows-Wheeler transform.</article-title> <source><italic>Bioinformatics</italic></source> <volume>25</volume> <fpage>1754</fpage>&#x2013;<lpage>1760</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btp324</pub-id> <pub-id pub-id-type="pmid">19451168</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Handsaker</surname> <given-names>B.</given-names></name> <name><surname>Wysoker</surname> <given-names>A.</given-names></name> <name><surname>Fennell</surname> <given-names>T.</given-names></name> <name><surname>Ruan</surname> <given-names>J.</given-names></name> <name><surname>Homer</surname> <given-names>N.</given-names></name><etal/></person-group> (<year>2009</year>). <article-title>The sequence alignment/map format and SAMtools.</article-title> <source><italic>Bioinformatics</italic></source> <volume>25</volume> <fpage>2078</fpage>&#x2013;<lpage>2079</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btp352</pub-id> <pub-id pub-id-type="pmid">19505943</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>L.</given-names></name> <name><surname>Stoeckert</surname> <given-names>C. J.</given-names></name> <name><surname>Roos</surname> <given-names>D. S.</given-names></name></person-group> (<year>2003</year>). <article-title>OrthoMCL: identification of ortholog groups for eukaryotic genomes.</article-title> <source><italic>Genome Res.</italic></source> <volume>13</volume> <fpage>2178</fpage>&#x2013;<lpage>2189</lpage>. <pub-id pub-id-type="doi">10.1101/gr.1224503</pub-id> <pub-id pub-id-type="pmid">12952885</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Sun</surname> <given-names>P.</given-names></name> <name><surname>Lu</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Du</surname> <given-names>X.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>The <italic>Corylus mandshurica</italic> genome provides insights into the evolution of Betulaceae genomes and hazelnut breeding.</article-title> <source><italic>Hortic. Res.</italic></source> <volume>8</volume>:<issue>54</issue>. <pub-id pub-id-type="doi">10.1038/s41438-021-00495-1</pub-id> <pub-id pub-id-type="pmid">33642584</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Luo</surname> <given-names>Q.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>Q.</given-names></name> <name><surname>Cheng</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Identification of vital candidate microRNA/mRNA pairs regulating ovule development using high-throughput sequencing in hazel.</article-title> <source><italic>BMC Dev. Biol.</italic></source> <volume>20</volume>:<issue>13</issue>. <pub-id pub-id-type="doi">10.1186/s12861-020-00219-z</pub-id> <pub-id pub-id-type="pmid">32605594</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lowe</surname> <given-names>T. M.</given-names></name> <name><surname>Chan</surname> <given-names>P. P.</given-names></name></person-group> (<year>2016</year>). <article-title>tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>44</volume> <fpage>W54</fpage>&#x2013;<lpage>W57</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkw413</pub-id> <pub-id pub-id-type="pmid">27174935</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lucas</surname> <given-names>S. J.</given-names></name> <name><surname>Kahraman</surname> <given-names>K.</given-names></name> <name><surname>Av&#x015F;ar</surname> <given-names>B.</given-names></name> <name><surname>Buggs</surname> <given-names>R. J. A.</given-names></name> <name><surname>Bilge</surname> <given-names>I.</given-names></name></person-group> (<year>2021</year>). <article-title>A chromosome-scale genome assembly of European hazel (<italic>Corylus avellana</italic> L.) reveals targets for crop improvement.</article-title> <source><italic>Plant J.</italic></source> <volume>105</volume> <fpage>1413</fpage>&#x2013;<lpage>1430</lpage>. <pub-id pub-id-type="doi">10.1111/tpj.15099</pub-id> <pub-id pub-id-type="pmid">33249676</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Madhaven</surname> <given-names>N.</given-names></name></person-group> (<year>2000</year>). <article-title>Final report on the safety assessment of <italic>Corylus avellana</italic> (Hazel) seed oil, <italic>Corylus americana</italic> (Hazel) seed oil, <italic>Corylus avellana</italic> (Hazel) seed extract, <italic>Corylus americana</italic> (Hazel) seed extract, <italic>Corylus avellana</italic> (Hazel) leaf extract, <italic>Corylus americana</italic> (Hazel) leaf extract, and <italic>Corylus rostrata</italic> (Hazel) leaf extract.</article-title> <source><italic>Int. J. Toxicol.</italic></source> <volume>20</volume> <fpage>15</fpage>&#x2013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1080/109158101750300928</pub-id> <pub-id pub-id-type="pmid">11358108</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Majoros</surname> <given-names>W. H.</given-names></name> <name><surname>Pertea</surname> <given-names>M.</given-names></name> <name><surname>Salzberg</surname> <given-names>S. L.</given-names></name></person-group> (<year>2004</year>). <article-title>TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders.</article-title> <source><italic>Bioinformatics</italic></source> <volume>20</volume> <fpage>2878</fpage>&#x2013;<lpage>2879</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bth315</pub-id> <pub-id pub-id-type="pmid">15145805</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stanke</surname> <given-names>M.</given-names></name> <name><surname>Keller</surname> <given-names>O.</given-names></name> <name><surname>Gunduz</surname> <given-names>I.</given-names></name> <name><surname>Hayes</surname> <given-names>A.</given-names></name> <name><surname>Waack</surname> <given-names>S.</given-names></name> <name><surname>Morgenstern</surname> <given-names>B.</given-names></name></person-group> (<year>2006</year>). <article-title>AUGUSTUS: ab initio prediction of alternative transcripts.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>34</volume> <fpage>W435</fpage>&#x2013;<lpage>W439</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkl200</pub-id> <pub-id pub-id-type="pmid">16845043</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nawrocki</surname> <given-names>E. P.</given-names></name> <name><surname>Kolbe</surname> <given-names>D. L.</given-names></name> <name><surname>Eddy</surname> <given-names>S. R.</given-names></name></person-group> (<year>2009</year>). <article-title>Infernal 1.0: inference of RNA alignments.</article-title> <source><italic>Bioinformatics</italic></source> <volume>25</volume> <fpage>1335</fpage>&#x2013;<lpage>1337</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btp157</pub-id> <pub-id pub-id-type="pmid">19307242</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okonechnikov</surname> <given-names>K.</given-names></name> <name><surname>Conesa</surname> <given-names>A.</given-names></name> <name><surname>Garc&#x00ED;a-Alcalde</surname> <given-names>F.</given-names></name></person-group> (<year>2016</year>). <article-title>Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data.</article-title> <source><italic>Bioinformatics</italic></source> <volume>32</volume> <fpage>292</fpage>&#x2013;<lpage>294</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv566</pub-id> <pub-id pub-id-type="pmid">26428292</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ou</surname> <given-names>S.</given-names></name> <name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Jiang</surname> <given-names>N.</given-names></name></person-group> (<year>2018</year>). <article-title>Assessing genome assembly quality using the LTR Assembly Index (LAI).</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>46</volume>:<issue>e126</issue>. <pub-id pub-id-type="doi">10.1093/nar/gky730</pub-id> <pub-id pub-id-type="pmid">30107434</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pekkinen</surname> <given-names>M.</given-names></name> <name><surname>Varvio</surname> <given-names>S.</given-names></name> <name><surname>Kulju</surname> <given-names>K. M.</given-names></name> <name><surname>K&#x00E4;rkk&#x00E4;inen</surname> <given-names>H.</given-names></name> <name><surname>Smolander</surname> <given-names>S.</given-names></name> <name><surname>Viher&#x00E4;-Aarnio</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2005</year>). <article-title>Linkage map of birch, <italic>Betula pendula</italic> Roth, based on microsatellites and amplified fragment length polymorphisms.</article-title> <source><italic>Genome</italic></source> <volume>48</volume> <fpage>619</fpage>&#x2013;<lpage>625</lpage>. <pub-id pub-id-type="doi">10.1139/g05-031</pub-id> <pub-id pub-id-type="pmid">16094429</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pertea</surname> <given-names>M.</given-names></name> <name><surname>Pertea</surname> <given-names>G. M.</given-names></name> <name><surname>Antonescu</surname> <given-names>C. M.</given-names></name> <name><surname>Chang</surname> <given-names>T.-C.</given-names></name> <name><surname>Mendell</surname> <given-names>J. T.</given-names></name> <name><surname>Salzberg</surname> <given-names>S. L.</given-names></name></person-group> (<year>2015</year>). <article-title>StringTie enables improved reconstruction of a transcriptome from RNA-seq reads.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>33</volume> <fpage>290</fpage>&#x2013;<lpage>295</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.3122</pub-id> <pub-id pub-id-type="pmid">25690850</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roach</surname> <given-names>M. J.</given-names></name> <name><surname>Schmidt</surname> <given-names>S. A.</given-names></name> <name><surname>Borneman</surname> <given-names>A. R.</given-names></name></person-group> (<year>2018</year>). <article-title>Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies.</article-title> <source><italic>BMC Bioinform.</italic></source> <volume>19</volume>:<issue>460</issue>. <pub-id pub-id-type="doi">10.1186/s12859-018-2485-7</pub-id> <pub-id pub-id-type="pmid">30497373</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rowley</surname> <given-names>E. R.</given-names></name> <name><surname>Fox</surname> <given-names>S. E.</given-names></name> <name><surname>Bryant</surname> <given-names>D. W.</given-names></name> <name><surname>Sullivan</surname> <given-names>C. M.</given-names></name> <name><surname>Priest</surname> <given-names>H. D.</given-names></name> <name><surname>Givan</surname> <given-names>S. A.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>Assembly and characterization of the European Hazelnut &#x2018;Jefferson&#x2019; transcriptome.</article-title> <source><italic>Crop Sci.</italic></source> <volume>52</volume> <fpage>2679</fpage>&#x2013;<lpage>2686</lpage>. <pub-id pub-id-type="doi">10.2135/cropsci2012.02.0065</pub-id> <pub-id pub-id-type="pmid">34798789</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rowley</surname> <given-names>E.</given-names></name> <name><surname>VanBuren</surname> <given-names>R.</given-names></name> <name><surname>Bryant</surname> <given-names>D. W.</given-names></name> <name><surname>Priest</surname> <given-names>H. D.</given-names></name> <name><surname>Mehlenbacher</surname> <given-names>S. A.</given-names></name> <name><surname>Mockler</surname> <given-names>T. C.</given-names></name></person-group> (<year>2018</year>). <article-title>A draft genome and high-density genetic map of European hazelnut (<italic>Corylus avellana</italic> L.).</article-title> <source><italic>bioRxiv</italic></source>. <fpage>469015</fpage>. <pub-id pub-id-type="doi">10.1101/469015</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rozen</surname> <given-names>S.</given-names></name> <name><surname>Skaletsky</surname> <given-names>H.</given-names></name></person-group> (<year>2000</year>). <article-title>Primer3 on the WWW for general users and for biologist programmers.</article-title> <source><italic>Methods Mol. Biol.</italic></source> <volume>132</volume> <fpage>365</fpage>&#x2013;<lpage>386</lpage>. <pub-id pub-id-type="doi">10.1385/1-59259-192-2:365</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanderson</surname> <given-names>M. J.</given-names></name></person-group> (<year>2003</year>). <article-title>r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock.</article-title> <source><italic>Bioinformatics</italic></source> <volume>19</volume> <fpage>301</fpage>&#x2013;<lpage>302</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/19.2.301</pub-id> <pub-id pub-id-type="pmid">12538260</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sathuvalli</surname> <given-names>V. R.</given-names></name> <name><surname>Mehlenbacher</surname> <given-names>S. A.</given-names></name> <name><surname>Belzile</surname> <given-names>F.</given-names></name></person-group> (<year>2011</year>). <article-title>A bacterial artificial chromosome library for &#x2018;Jefferson&#x2019; hazelnut and identification of clones associated with eastern filbert blight resistance and pollen-stigma incompatibility.</article-title> <source><italic>Genome</italic></source> <volume>54</volume> <fpage>862</fpage>&#x2013;<lpage>867</lpage>. <pub-id pub-id-type="doi">10.1139/g11-048</pub-id> <pub-id pub-id-type="pmid">21936690</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Servant</surname> <given-names>N.</given-names></name> <name><surname>Varoquaux</surname> <given-names>N.</given-names></name> <name><surname>Lajoie</surname> <given-names>B. R.</given-names></name> <name><surname>Viara</surname> <given-names>E.</given-names></name> <name><surname>Chen</surname> <given-names>C.</given-names></name> <name><surname>Vert</surname> <given-names>J.-P.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>HiC-Pro: an optimized and flexible pipeline for Hi-C data processing.</article-title> <source><italic>Genome Biol.</italic></source> <volume>16</volume>:<issue>259</issue>. <pub-id pub-id-type="doi">10.1186/s13059-015-0831-x</pub-id> <pub-id pub-id-type="pmid">26619908</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>J.</given-names></name> <name><surname>Cao</surname> <given-names>Y.</given-names></name> <name><surname>Fan</surname> <given-names>X.</given-names></name> <name><surname>Min</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Feng</surname> <given-names>M.</given-names></name></person-group> (<year>2012</year>). <article-title>A rice microsomal delta-12 fatty acid desaturase can enhance resistance to cold stress in yeast and <italic>Oryza sativa</italic>.</article-title> <source><italic>Mol. Breed.</italic></source> <volume>29</volume> <fpage>743</fpage>&#x2013;<lpage>757</lpage>. <pub-id pub-id-type="doi">10.1007/s11032-011-9587-5</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sim&#x00E3;o</surname> <given-names>F. A.</given-names></name> <name><surname>Waterhouse</surname> <given-names>R. M.</given-names></name> <name><surname>Ioannidis</surname> <given-names>P.</given-names></name> <name><surname>Kriventseva</surname> <given-names>E. V.</given-names></name> <name><surname>Zdobnov</surname> <given-names>E. M.</given-names></name></person-group> (<year>2015</year>). <article-title>BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.</article-title> <source><italic>Bioinformatics</italic></source> <volume>31</volume> <fpage>3210</fpage>&#x2013;<lpage>3212</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv351</pub-id> <pub-id pub-id-type="pmid">26059717</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>X.</given-names></name> <name><surname>Xing</surname> <given-names>S.</given-names></name> <name><surname>Dong</surname> <given-names>L.</given-names></name></person-group> (<year>2008</year>). <article-title>Fat content and fatty acid composition of <italic>Corylus heterophylla</italic> F. and corresponding comprehensive evaluation.</article-title> <source><italic>J. Chinese Cereals Oils Assoc.</italic></source> <volume>23</volume> <fpage>189</fpage>&#x2013;<lpage>193</lpage>.</citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stamatakis</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.</article-title> <source><italic>Bioinformatics</italic></source> <volume>30</volume> <fpage>1312</fpage>&#x2013;<lpage>1313</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu033</pub-id> <pub-id pub-id-type="pmid">24451623</pub-id></citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tarailo-Graovac</surname> <given-names>M.</given-names></name> <name><surname>Chen</surname> <given-names>N.</given-names></name></person-group> (<year>2009</year>). <article-title>Using RepeatMasker to identify repetitive elements in genomic sequences.</article-title> <source><italic>Curr. Protoc. Bioinform.</italic></source> <volume>25</volume> <fpage>4.10.1</fpage>&#x2013;<lpage>4.10.14</lpage>. <pub-id pub-id-type="doi">10.1002/0471250953.bi0410s25</pub-id> <pub-id pub-id-type="pmid">19274634</pub-id></citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vasanthan</surname> <given-names>J.</given-names></name> <name><surname>Yasubumi</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data.</article-title> <source><italic>Brief. Bioinform.</italic></source> <volume>20</volume> <fpage>866</fpage>&#x2013;<lpage>876</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbx147</pub-id> <pub-id pub-id-type="pmid">29112696</pub-id></citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vaser</surname> <given-names>R.</given-names></name> <name><surname>Sovic</surname> <given-names>I.</given-names></name> <name><surname>Nagarajan</surname> <given-names>N.</given-names></name> <name><surname>Sikic</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Fast and accurate de novo genome assembly from long uncorrected reads.</article-title> <source><italic>Genome Res.</italic></source> <volume>27</volume> <fpage>737</fpage>&#x2013;<lpage>746</lpage>. <pub-id pub-id-type="doi">10.1101/gr.214270.116</pub-id> <pub-id pub-id-type="pmid">28100585</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walker</surname> <given-names>B. J.</given-names></name> <name><surname>Abeel</surname> <given-names>T.</given-names></name> <name><surname>Shea</surname> <given-names>T.</given-names></name> <name><surname>Priest</surname> <given-names>M.</given-names></name> <name><surname>Abouelliel</surname> <given-names>A.</given-names></name> <name><surname>Sakthikumar</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement.</article-title> <source><italic>PLoS One</italic></source> <volume>9</volume>:<issue>e112963</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0112963</pub-id> <pub-id pub-id-type="pmid">25409509</pub-id></citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>C.</given-names></name> <name><surname>Mao</surname> <given-names>X.</given-names></name> <name><surname>Huang</surname> <given-names>J.</given-names></name> <name><surname>Ding</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Dong</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>39</volume> <fpage>W316</fpage>&#x2013;<lpage>W322</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkr483</pub-id> <pub-id pub-id-type="pmid">21715386</pub-id></citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>J.</given-names></name> <name><surname>Zhao</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>R.</given-names></name> <name><surname>Jiang</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>A chromosome-scale reference genome of <italic>Aquilegia oxysepala</italic> var. <italic>kansuensis</italic>.</article-title> <source><italic>Hortic. Res.</italic></source> <volume>7</volume>:<issue>113</issue>. <pub-id pub-id-type="doi">10.1038/s41438-020-0328-y</pub-id></citation></ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>Z.</given-names></name></person-group> (<year>2007</year>). <article-title>PAML 4: phylogenetic analysis by maximum likelihood.</article-title> <source><italic>Mol. Biol. Evol.</italic></source> <volume>24</volume> <fpage>1586</fpage>&#x2013;<lpage>1591</lpage>. <pub-id pub-id-type="doi">10.1093/molbev/msm088</pub-id> <pub-id pub-id-type="pmid">17483113</pub-id></citation></ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zdobnov</surname> <given-names>E. M.</given-names></name> <name><surname>Rolf</surname> <given-names>A.</given-names></name></person-group> (<year>2001</year>). <article-title>InterProScan&#x2013;an integration platform for the signature-recognition methods in InterPro.</article-title> <source><italic>Bioinformatics</italic></source> <volume>17</volume> <fpage>847</fpage>&#x2013;<lpage>848</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/17.9.847</pub-id> <pub-id pub-id-type="pmid">11590104</pub-id></citation></ref>
<ref id="B69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>T.</given-names></name> <name><surname>Ma</surname> <given-names>W.</given-names></name> <name><surname>Yang</surname> <given-names>Z.</given-names></name> <name><surname>Liang</surname> <given-names>L.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>G.</given-names></name></person-group> (<year>2021</year>). <article-title>A chromosome-level reference genome of the hazelnut, <italic>Corylus heterophylla</italic> Fisch.</article-title> <source><italic>Gigascience</italic></source> <volume>10</volume>:<issue>giab027</issue>. <pub-id pub-id-type="doi">10.1093/gigascience/giab027</pub-id> <pub-id pub-id-type="pmid">33871007</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="footnote1"><label>1</label><p><ext-link ext-link-type="uri" xlink:href="https://hazelnut.data.mocklerlab.org/">https://hazelnut.data.mocklerlab.org/</ext-link></p></fn>
<fn id="footnote2"><label>2</label><p><ext-link ext-link-type="uri" xlink:href="http://www.timetree.org/">http://www.timetree.org/</ext-link></p></fn>
<fn id="footnote3"><label>3</label><p><ext-link ext-link-type="uri" xlink:href="http://122.9.151.76/">http://122.9.151.76/</ext-link></p></fn>
<fn id="footnote4"><label>4</label><p><ext-link ext-link-type="uri" xlink:href="https://hazelnut.data.mocklerlab.org/">https://hazelnut.data.mocklerlab.org/</ext-link></p></fn>
</fn-group>
</back>
</article>