<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2014.00102</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research Article</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>CARF and WYL domains: ligand-binding regulators of prokaryotic defense systems</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Makarova</surname> <given-names>Kira S.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/35271"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Anantharaman</surname> <given-names>Vivek</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Grishin</surname> <given-names>Nick V.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Koonin</surname> <given-names>Eugene V.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/29594"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Aravind</surname> <given-names>L.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/54234"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health</institution> <country>Bethesda, MD, USA</country></aff>
<aff id="aff2"><sup>2</sup><institution>Departments of Biophysics and Biochemistry, Howard Hughes Medical Institute, University of Texas Southwestern Medical Center</institution> <country>Dallas, TX, USA</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Thiago Motta Venancio, Universidade Estadual do Norte Fluminense, Brazil</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Rafael Dias Mesquita, Universidade Federal do Rio de Janeiro, Brazil; Vinicius Maracaja-Coutinho, Universidad Mayor, Chile; Malcolm F. White, University of St Andrews, UK</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Eugene V. Koonin and L. Aravind, National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Building 38A, 8600 Rockville Pike Bethesda, MD 20894, USA e-mail: <email>koonin&#x00040;ncbi.nlm.nih.gov</email>; <email>aravind&#x00040;mail.nih.gov</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics.</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>30</day>
<month>04</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="collection">
<year>2014</year>
</pub-date>
<volume>5</volume>
<elocation-id>102</elocation-id>
<history>
<date date-type="received">
<day>18</day>
<month>02</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>04</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2014 Makarova, Anantharaman, Grishin, Koonin and Aravind.</copyright-statement>
<copyright-year>2014</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract><p>CRISPR-Cas adaptive immunity systems of bacteria and archaea insert fragments of virus or plasmid DNA as spacer sequences into CRISPR repeat loci. Processed transcripts encompassing these spacers guide the cleavage of the cognate foreign DNA or RNA. Most CRISPR-Cas loci, in addition to recognized <italic>cas</italic> genes, also include genes that are not directly implicated in spacer acquisition, CRISPR transcript processing or interference. Here we comprehensively analyze sequences, structures and genomic neighborhoods of one of the most widespread groups of such genes that encode proteins containing a predicted nucleotide-binding domain with a Rossmann-like fold, which we denote CARF (CRISPR-associated Rossmann fold). Several CARF protein structures have been determined but functional characterization of these proteins is lacking. The CARF domain is most frequently combined with a C-terminal winged helix-turn-helix DNA-binding domain and &#x0201C;effector&#x0201D; domains most of which are predicted to possess DNase or RNase activity. Divergent CARF domains are also found in RtcR proteins, sigma-54 dependent regulators of the <italic>rtc</italic> RNA repair operon. CARF genes frequently co-occur with those coding for proteins containing the WYL domain with the Sm-like SH3 &#x003B2;-barrel fold, which is also predicted to bind ligands. CRISPR-Cas and possibly other defense systems are predicted to be transcriptionally regulated by multiple ligand-binding proteins containing WYL and CARF domains which sense modified nucleotides and nucleotide derivatives generated during virus infection. We hypothesize that CARF domains also transmit the signal from the bound ligand to the fused effector domains which attack either alien or self nucleic acids, resulting, respectively, in immunity complementing the CRISPR-Cas action or in dormancy/programmed cell death.</p></abstract>
<kwd-group>
<kwd>CRISPR</kwd>
<kwd>Rossmann fold</kwd>
<kwd>beta barrel</kwd>
<kwd>DNA-binding proteins</kwd>
<kwd>phage defense</kwd>
</kwd-group>
<counts>
<fig-count count="3"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="62"/>
<page-count count="9"/>
<word-count count="7711"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<title>Introduction</title>
<p>In prokaryotes, CRISPR-Cas systems (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated genes) code for RNA-dependent self&#x02013;non-self recognition mechanisms, which are partially analogous to eukaryotic RNA interference (RNAi) systems, and serve as an adaptive immunity system against invasive nucleic acids. The CRISPR-Cas system incorporates fragments of virus or plasmid DNA into the CRISPR repeat cassettes and employs the processed transcripts of these spacers as guide RNAs to cleave the cognate foreign DNA or RNA. Recently, the type-II CRISPR systems have been used as biotechnological reagents for targeted mutagenesis, genome editing or gene inactivation in eukaryotes (Jinek et al., <xref ref-type="bibr" rid="B28">2013</xref>; Mali et al., <xref ref-type="bibr" rid="B42">2013</xref>; Niu et al., <xref ref-type="bibr" rid="B45">2014</xref>). Many CRISPR-Cas systems are associated with genes that appear not to be directly implicated in spacer acquisition, CRISPR transcript processing or the restriction of the invasive nucleic acids known as interference (Makarova et al., <xref ref-type="bibr" rid="B37">2011a</xref>,<xref ref-type="bibr" rid="B38">b</xref>; Wiedenheft et al., <xref ref-type="bibr" rid="B58">2012</xref>; Koonin and Makarova, <xref ref-type="bibr" rid="B33">2013</xref>). The most common among such genes (the csm6/csx1-like genes) encode experimentally uncharacterized or poorly characterized proteins that belong to COG1517 (Makarova et al., <xref ref-type="bibr" rid="B39">2006</xref>, <xref ref-type="bibr" rid="B38">2011b</xref>). Structures of four proteins from this family have been experimentally determined and it has been shown that they all share a distinct Rossmann-fold-like domain that we here denote CARF (CRISPR-Cas Associated Rossmann Fold).
In addition, most of the CARF domain proteins contain a winged HTH (wHTH) DNA-binding domain immediately C-terminal of CARF (Lintner et al., <xref ref-type="bibr" rid="B35">2010</xref>; Kim et al., <xref ref-type="bibr" rid="B31">2013</xref>). It has been hypothesized that these proteins are CRISPR-Cas system-specific, allosterically controlled transcriptional regulators, with the Rossmann-like domain binding an unknown nucleotide (Lintner et al., <xref ref-type="bibr" rid="B35">2010</xref>). Recently, involvement of the Csx1 protein in the interference associated with type III-B CRISPR-Cas systems in <italic>Sulfolobus islandicus</italic> has been demonstrated (Deng et al., <xref ref-type="bibr" rid="B14">2013</xref>). Furthermore, deletion of the <italic>csm6</italic> gene results in disruption of CRISPR-based immunity in <italic>Staphylococcus epidermidis</italic> (Hatoum-Aslan et al., <xref ref-type="bibr" rid="B22">2013</xref>).</p>
<p>Despite the progress in the structure analysis and the availability of first experimental clues, the specific biochemical roles of the CARF proteins in the CRISPR-Cas systems and beyond remain largely obscure. Many CARF-domain proteins possess additional C-terminal domains that include both DNases, in particular those of the Restriction Endonuclease (REase) fold (Makarova et al., <xref ref-type="bibr" rid="B39">2006</xref>), and RNases, such as members of the RelE (Koonin and Makarova, <xref ref-type="bibr" rid="B33">2013</xref>) and HEPN families (Anantharaman et al., <xref ref-type="bibr" rid="B4">2013</xref>). This observation led to a hypothesis that these proteins can be involved in immunity mechanisms that complement the activity of the core CRISPR-Cas systems by targeting self or invasive nucleic acids (Makarova et al., <xref ref-type="bibr" rid="B36">2012</xref>, <xref ref-type="bibr" rid="B40">2013</xref>; Anantharaman et al., <xref ref-type="bibr" rid="B4">2013</xref>). Action against self nucleic acids could augment the immunity of a population of prokaryotic cells in two ways: first, by inducing dormancy and thus &#x0201C;buying time&#x0201D; for the immune system to spring into action, or second, by inducing programmed cell death of the host when CRISPR-Cas fails to stop virus propagation (Makarova et al., <xref ref-type="bibr" rid="B36">2012</xref>, <xref ref-type="bibr" rid="B40">2013</xref>; Koonin and Makarova, <xref ref-type="bibr" rid="B33">2013</xref>). Here we present an in-depth comparative genomic and phylogenetic analysis of the CARF (COG1517) superfamily in an attempt to shed more light on the function and evolution of these proteins.</p>
</sec>
<sec sec-type="results" id="s2">
<title>Results</title>
<sec>
<title>Sequence analysis and identification of new members of the CARF superfamily</title>
<p>We used several approaches to identify CARF superfamily proteins. First, a CDD search was employed to identify all proteins in 2262 complete genomes (as of February 2013) that could be assigned to previously identified CARF families [namely COG1517, PF09455, PF09670, PF09659, PF09651, PF09623, PF09002, Csa3 (Lintner et al., <xref ref-type="bibr" rid="B35">2010</xref>; Makarova et al., <xref ref-type="bibr" rid="B38">2011b</xref>)]. Representatives of each family were used as queries for PSI-BLAST using the search strategy described in the Materials and Methods section (Altschul et al., <xref ref-type="bibr" rid="B1">1997</xref>). Putative new members were validated using HHpred search (Soding et al., <xref ref-type="bibr" rid="B52">2005</xref>). The same methods were used to identify other domains fused to CARF domains (Supplementary File <xref ref-type="supplementary-material" rid="SM1">1</xref>). For further analysis incomplete protein sequences were discarded. The final data set included 1441 proteins (Supplementary File <xref ref-type="supplementary-material" rid="SM1">1</xref>). This set was further clustered to generate a non-redundant subset (635 proteins) using BLASTCLUST (Wheeler and Bhagwat, <xref ref-type="bibr" rid="B56">2007</xref>) with a length coverage cutoff of 0.8 and a score coverage threshold (bit score divided by alignment length) of 0.8. For this representative subset of 635 CARF domain-containing proteins, analysis of domain architecture and gene neighborhoods was performed as described under Materials and Methods. Because the extensive sequence divergence of the CARF domains results in saturation of substitutions and prevents building a high quality alignment for phylogenetic analysis, the relationships between families were determined approximately, on the basis of their similarity in HHpred searches (Figure <xref ref-type="fig" rid="F1">1</xref> and Supplementary File <xref ref-type="supplementary-material" rid="SM2">2</xref>).</p>
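<p>The two clustering cutoffs described above can be illustrated with a minimal Python sketch. It is not the BLASTCLUST implementation: the function names <monospace>passes_cutoffs</monospace> and <monospace>single_linkage</monospace>, the tuple layout of a pairwise hit and the toy protein identifiers are hypothetical, and the sketch assumes that the length coverage must hold for both sequences of a pair and that clusters are formed by single linkage.</p>
<preformat>
```python
def passes_cutoffs(bit_score, aln_len, len_a, len_b,
                   min_length_cov=0.8, min_score_density=0.8):
    """Return True if a pairwise hit satisfies both cutoffs."""
    if aln_len == 0:
        return False
    # Assumption: the alignment must cover 80% of BOTH sequences.
    length_cov = min(aln_len / len_a, aln_len / len_b)
    # Score coverage as defined in the text: bit score / alignment length.
    score_density = bit_score / aln_len
    return length_cov >= min_length_cov and score_density >= min_score_density


def single_linkage(ids, hits):
    """Group ids into clusters; any passing hit links two clusters."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent pointers to the cluster root (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, bit_score, aln_len, len_a, len_b in hits:
        if passes_cutoffs(bit_score, aln_len, len_a, len_b):
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical hits: (id_a, id_b, bit_score, aln_len, len_a, len_b)
toy_hits = [
    ("p1", "p2", 180.0, 200, 210, 220),  # passes both cutoffs
    ("p2", "p3", 90.0, 200, 210, 400),   # fails coverage and score density
]
print(single_linkage(["p1", "p2", "p3"], toy_hits))
# prints [['p1', 'p2'], ['p3']]
```
</preformat>
<p>Under these assumptions, one representative per cluster would then be retained to form a non-redundant subset.</p>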
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Comparative genomic analysis of CARF domain-containing proteins. (A)</bold> Scheme of the relationships between major CARF families, their domain architectures and association with CRISPR-Cas system types. The dendrogram shows the relationship between CARF domain-containing families. The clustering is based on sequence and structure similarity analysis as described under Materials and Methods; unresolved relationships are shown as a multifurcation. The pfam ID or other recognized family description is provided for each of the seven major groups. A typical member of a family (either locus tag of a representative protein or a pdb identifier) is shown for each terminal node; subfamilies that have not been described previously are underlined. The typical domain architecture is shown for each family. The domain name is shown above the corresponding shape the first time it appears. Brackets indicate that in several proteins in the respective family the domain is missing. In the first column on the right hand side, the number of proteins in the respective family is indicated, and the number of proteins encoded in the vicinity of <italic>cas</italic> genes is shown in parentheses. In the second, third and fourth columns, the numbers of genes of each family that are specifically associated with CRISPR-Cas systems of types III-A, III-B, and I are shown (the numbers representing a substantial fraction of the family are highlighted in red). <bold>(B)</bold> Domain organization of several minor CARF domain-containing families. Designations are as in Figure <xref ref-type="fig" rid="F1">1A</xref>. <bold>(C)</bold> Protein families associated with genes encoding CARF domains. The histogram shows how many times each family was identified in the vicinity of CARF domain-containing genes; the scale is shown above the histogram. Only the most frequently co-occurring families outside the set of recognized <italic>cas</italic> genes are shown.
The numbers on the right hand side reflect the results of a reverse analysis when neighborhoods of the genes from each family were analyzed for the presence of <italic>cas</italic> genes. The total number of genes and the number of genes in the vicinity of known <italic>cas</italic> genes (in parentheses) are indicated. <bold>(D)</bold> Association of CARF domains with (predicted) toxin domains in the three types of CRISPR-Cas systems. The histogram shows the co-occurrence of CARF proteins with toxin domains separately for the three CRISPR-Cas system types; the type III systems are additionally partitioned into those that co-occur with type I or type II in the same genome and those that represent the sole instance of CRISPR-Cas in the respective genomes.</p></caption>
<graphic xlink:href="fgene-05-00102-g0001.tif"/>
</fig>
<p>Figure <xref ref-type="fig" rid="F1">1</xref> shows the relationships between the CARF families, their domain organization and association (if any) with different types of CRISPR-Cas systems. The results of this analysis suggest that the CARF superfamily could be classified into at least 12 distinct major families with 10 or more representatives each and several minor families (Figures <xref ref-type="fig" rid="F1">1A,B</xref>, Supplementary File <xref ref-type="supplementary-material" rid="SM1">1</xref>). In addition to the aforementioned CARF domain families, HHpred search using pfam09659 as the query identified significant sequence similarity between the CARF domain and an uncharacterized N-terminal domain of RtcR (Supplementary File <xref ref-type="supplementary-material" rid="SM2">2</xref>), which is the regulator of the <italic>Rtc</italic> RNA repair system that consists of the 3&#x02032;-terminal phosphate cyclase RtcA and the RNA ligase RtcB (Genschik et al., <xref ref-type="bibr" rid="B19">1998</xref>; Chakravarty et al., <xref ref-type="bibr" rid="B12">2012</xref>). Although this domain occurs in distinct protein architectural and genomic contexts (see below), it shares distinct sequence motifs with the CARF domains to the exclusion of other Rossmann fold domains. Hence we consider the predicted nucleotide-binding domain of RtcR a divergent member of the CARF superfamily.</p>
</sec>
<sec>
<title>Structural features of CARF domain proteins</title>
<p>The availability of five crystal structures of CARF domain proteins along with the above sequence analysis provides for a more detailed understanding of the conserved structural features of the superfamily and their functional implications. The core of the CARF domain is a six-stranded Rossmann-like fold with the core strand-5 and strand-6 forming a &#x003B2;-hairpin (Figure <xref ref-type="fig" rid="F2">2</xref>). The main regions of sequence conservation are associated with strand-1 and strand-4 of the core domain: the end of strand-1 is often characterized by a polar residue, typically with an alcoholic side chain (S/T), whereas immediately downstream of strand-4 is a highly conserved basic residue (K/R) often associated with a [DN]X[ST]XXX[RK] signature. The position of these characteristic motifs is typical of the location of substrate-binding sites across a diverse range of Rossmann-like domains (Anantharaman and Aravind, <xref ref-type="bibr" rid="B2">2006</xref>; Burroughs et al., <xref ref-type="bibr" rid="B8">2006</xref>, <xref ref-type="bibr" rid="B11">2009</xref>) with the implication that the ligand-binding capability is conserved throughout the CARF superfamily. Consistent with this prediction, probing the active site with a probe of 2 or more solvent radii shows the presence of a conserved pocket that is formed largely by the residues from the aforementioned motifs associated with strand-1 and strand-4 (Figure <xref ref-type="fig" rid="F2">2</xref>, Supplementary File <xref ref-type="supplementary-material" rid="SM3">3</xref>). The conservation of K/R after strand-4 and its location in the pocket is consistent with the proposal of a nucleotide or nucleotide-derived molecule being the primary ligand of the CARF domains (Lintner et al., <xref ref-type="bibr" rid="B35">2010</xref>). However, the RV2818 and RtcR families mostly lack the positively charged residue downstream of strand-4, suggesting that they might bind distinct ligands.</p>
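<p>For readers who wish to scan their own sequences for the [DN]X[ST]XXX[RK] signature, the following minimal Python sketch expresses it as a regular expression in which X is taken to be any residue. The names <monospace>SIGNATURE</monospace> and <monospace>find_signature</monospace> and the toy sequence are hypothetical; a match on sequence alone does not establish that the segment lies downstream of strand-4, so hits would still need to be checked against an alignment or a structure.</p>
<preformat>
```python
import re

# Signature from the text: [DN]X[ST]XXX[RK], where X is any residue.
# The lookahead allows overlapping occurrences to be reported as well.
SIGNATURE = re.compile(r"(?=([DN][A-Z][ST][A-Z]{3}[RK]))")


def find_signature(sequence):
    """Return (0-based start, matched segment) for every occurrence."""
    return [(m.start(), m.group(1))
            for m in SIGNATURE.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from any CARF protein.
toy = "MADASGGGKLL"
print(find_signature(toy))
# prints [(2, 'DASGGGK')]
```
</preformat>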
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Structure of the VC1899 CARF domain</bold>. This version of the CARF domain contains no elaborations or inserts observed in certain other CARF domains. The predicted active site pocket was identified using a probe of 2 or more solvent radii (gray mesh); the predicted ligand-interacting residues of the pocket are also shown.</p></caption>
<graphic xlink:href="fgene-05-00102-g0002.tif"/>
</fig>
<p>Examination of the structures also shows that the core fold of the CARF domain is prone to considerable divergence due to several distinct inserts (Figure <xref ref-type="fig" rid="F3">3</xref>). For example, in the group that consists of the SSO1393, sll7062, ST0035, and MA0186 families, there is an &#x003B1;-helical bundle inserted immediately after strand-1. Likewise, in the PF1127 family, a &#x003B2;-hairpin is inserted after strand-1 and multiple additional inserts are present after strand-2, strand-3 and in the &#x003B2;-hairpin formed by strand-5 and strand-6 (Figure <xref ref-type="fig" rid="F3">3</xref>). Based on the sequence alignments, we also detected smaller but comparable inserts after strand-1 in most members of the Aq_376 group and several members of the DET1451 group. These inserts typically are packed around the active site and form a &#x0201C;cap&#x0201D; that appears to shelter and augment the conserved ligand-binding site. The repeated emergence of inserts in similar locations in different families suggests that they might be determinants of ligand diversity across the CARF superfamily.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Comparison of the structures of multiple CARF proteins</bold>. The CARF domains of all proteins were aligned and then separated for clarity. The different spatial orientations of the C-terminal domains are shown with respect to the CARF domain. The linker between the CARF domain and the C-terminal domains is colored green, the wHTH or the equivalent domain is rendered in white, and the C-terminal effector domain is colored purple. Inserts within the CARF domain are colored gray and are shown in &#x0201C;wire&#x0201D; representation. A domain of uncertain origin in PF1127 is colored gray and is shown as ribbon.</p></caption>
<graphic xlink:href="fgene-05-00102-g0003.tif"/>
</fig>
<p>Another striking feature revealed by the comparison of the available structures is the diversity of spatial positions of the C-terminal wHTH and effector domains (Figure <xref ref-type="fig" rid="F3">3</xref>) vis-&#x000E0;-vis the CARF domain. This diversity of spatial positions is in sharp contrast to the strong positional polarity that is typical of prokaryotic one-component transcription factors with respect to their upstream ligand-binding domains (Aravind et al., <xref ref-type="bibr" rid="B6">2010</xref>). Instead, it appears likely that the spatial organization of the C-terminal domains reflects optimization for transmitting the signal generated by the bound ligand to different C-terminal effector domains. This observation is compatible with the proposal that in most members of the CARF superfamily ligand binding is not directly linked to transcription but rather affects other DNA-associated activities (see discussion below).</p>
</sec>
<sec>
<title>Domain architectures of CARF superfamily proteins</title>
<p>The majority of the families contain a wHTH domain downstream of the CARF domain (Figures <xref ref-type="fig" rid="F1">1A,B</xref>, Supplementary File <xref ref-type="supplementary-material" rid="SM2">2</xref>). In the PF09659 and PF09670 related families, we were unable to identify an HTH domain; instead, proteins in both these families contain a distinct, conserved alpha-helical region (6H domain) (Supplementary File <xref ref-type="supplementary-material" rid="SM3">3</xref>). In the largest family (PF09455), the wHTH domain cannot be identified by sequence similarity searches (Kim et al., <xref ref-type="bibr" rid="B31">2013</xref>) but an &#x003B1;-helical domain of uncertain provenance, potentially derived from a wHTH, is present at the C-terminus and harbors a partly disordered insertion that contains a highly modified remnant of the HEPN domain. In addition to the previously described fusions to DNases and RNases, several new domain architectures were identified in this analysis, namely (1) fusion of two CARF domains, (2) a membrane-associated CARF, (3) fusion with an HD phosphoesterase domain, and (4) fusion to a TIM barrel adenosine deaminase (Ada) domain, the enzyme that catalyzes deamination of adenosine to inosine in the purine salvage pathway (Nygaard, <xref ref-type="bibr" rid="B46">1977</xref>; Holm and Sander, <xref ref-type="bibr" rid="B25">1997</xref>). Notably, fusion of the CARF domain with nuclease domains of the same family might have occurred independently on several occasions. In particular, we detected at least four distinct CARF families associated with the HEPN domain and two families associated with the PIN domain (Figures <xref ref-type="fig" rid="F1">1A,B</xref>). Overall, most of the C-terminal catalytic domains of the CARF superfamily proteins are predicted to be nucleases or other enzymes targeting nucleic acids (Makarova et al., <xref ref-type="bibr" rid="B40">2013</xref>).</p>
<p>A small family consists of large multidomain proteins in which a Zn ribbon, a serine/threonine/tyrosine protein kinase and a distinctive AAA&#x0002B; ATPase domain with an arginine finger within the P-loop motif are fused upstream of the CARF and wHTH domains (Figure <xref ref-type="fig" rid="F1">1B</xref>). In this case, the CARF domain might function as part of a signal transduction pathway mediated by the kinase. The RtcR proteins, in addition to the divergent CARF domain, contain an NtrC-like AAA&#x0002B; ATPase and HTH domains. Furthermore, BLAST search initiated with the CARF-like domain of RtcR detects high similarity with a family of proteins that, similar to RtcR, contain NtrC-like AAA&#x0002B; ATPase and HTH domains but are not linked to the <italic>Rtc</italic> system. Instead, these proteins are often associated with restriction-modification (R-M) systems (Supplementary File <xref ref-type="supplementary-material" rid="SM4">4</xref>). One of the close homologs of these proteins, PspF, which contains AAA&#x0002B; ATPase and HTH domains only, has been shown to be involved in sigma-54 dependent activation of the membrane-associated phage shock protein (PSP) system in response to phage infection and other stress factors (Model et al., <xref ref-type="bibr" rid="B44">1997</xref>; Joly et al., <xref ref-type="bibr" rid="B29">2009</xref>, <xref ref-type="bibr" rid="B30">2010</xref>). Thus, these systems are likely to function as sigma-54 dependent activators of their respective downstream genes, with the NtrC-like AAA&#x0002B; domain binding the sigma factor. In these proteins, CARF domains might sense ligands generated during or after phage infection, such as RNA with 2&#x02032;&#x02013;3&#x02032; cyclic phosphate ends or a phage-specific nucleotide, to regulate either RNA repair or DNA restriction.
Thus, the central functional theme for the majority of CARF superfamily domains, whether associated with CRISPR-Cas systems or not, seems to be antivirus defense and stress response.</p>
</sec>
<sec>
<title>The WYL domain and Cas protein families are enriched in gene neighborhoods of the CARF superfamily</title>
<p>To further characterize potential functional partners of the CARF proteins, we analyzed their genomic context by examining both known and new protein families in the respective genomic neighborhoods. All gene products from these neighborhoods were collected, clustered using BLASTCLUST and analyzed using PSI-BLAST to further expand the respective families. The most common families associated with the CARF-domain proteins are shown in Figure <xref ref-type="fig" rid="F1">1C</xref>.</p>
<p>The WYL (named for three conserved amino acids found in a subset of domains of this superfamily) domain proteins are most abundant. Recently, it has been shown that a WYL domain protein (sll7009) is a negative regulator of the I-D CRISPR-Cas system in <italic>Synechocystis</italic> sp. (Hein et al., <xref ref-type="bibr" rid="B23">2013</xref>). Further analysis of the WYL domain showed that the domain boundaries, as currently defined in the Pfam database (PF13280), are inaccurate because they encompass both a copy of the WYL domain (Supplementary File <xref ref-type="supplementary-material" rid="SM5">5</xref>) and an additional C-terminal extension which is found primarily in the subset of WYL proteins with wHTH domains. HHpred searches revealed similarity of the refined WYL domain with the SH3 &#x003B2;-barrel fold related to Sm domains (Supplementary File <xref ref-type="supplementary-material" rid="SM2">2</xref>). Additionally, these searches showed that the uncharacterized Pfam DUF2693 family and the YolD family encoded in SOS DNA repair-associated operons (Permina et al., <xref ref-type="bibr" rid="B49">2002</xref>; Aravind et al., <xref ref-type="bibr" rid="B5">2013</xref>) are also members of the WYL domain superfamily (Supplementary File <xref ref-type="supplementary-material" rid="SM2">2</xref>). Although the WYL domain was originally named for the 3 eponymous amino acids, examination of the refined and expanded alignment generated in the course of this work showed that these residues are not strongly conserved throughout the family. Rather, the conservation pattern includes four basic residues and a position often occupied by a cysteine (Supplementary File <xref ref-type="supplementary-material" rid="SM5">5</xref>), which are predicted to line a ligand-binding groove typical of the Sm-like SH3 &#x003B2;-barrels (Gutierrez et al., <xref ref-type="bibr" rid="B21">2007</xref>).
Given that WYL domains often occur in two copies in the same polypeptide or are encoded alongside other genes encoding multi-WYL proteins, it is conceivable that they form toroidal multimeric assemblies similar to other Sm-like SH3 &#x003B2;-barrels with a central ligand-binding channel (Schumacher et al., <xref ref-type="bibr" rid="B51">2002</xref>).</p>
<p>In terms of domain architectures, WYL domains are most frequently associated with different predicted DNA-binding N-terminal wHTH domains. However, similar to the CARF domains, WYL domains also show fusions to several enzymatic domains (Supplementary File <xref ref-type="supplementary-material" rid="SM6">6</xref>). In some of the type I CRISPR-Cas systems, a WYL domain is fused to the Cas3 protein, which consists of an HD phosphoesterase domain and a Superfamily-II helicase module. Additionally, WYL domains combine with 3&#x02032;&#x02192;5&#x02032; exoRNase, Mrr-like REase, HNH endonuclease, Superfamily-I helicase, AbiGII-like nucleotidyltransferase (DUF1814), BRCT, and TerB domains (Anantharaman et al., <xref ref-type="bibr" rid="B3">2012</xref>). These fusions, the relationship between the WYL domain and the Sm-like domains, and the sequence conservation pattern of the WYL domain together suggest that this is another ligand-sensing domain that could bind negatively charged ligands, such as nucleotides or nucleic acid fragments, to regulate CRISPR-Cas and other defense systems such as the abortive infection AbiG system (O&#x00027;Connor et al., <xref ref-type="bibr" rid="B47">1996</xref>; Makarova et al., <xref ref-type="bibr" rid="B40">2013</xref>).</p>
<p>Several <italic>cas</italic> genes are enriched in the gene neighborhoods of the CARF superfamily (Figure <xref ref-type="fig" rid="F1">1C</xref>, Supplementary File <xref ref-type="supplementary-material" rid="SM7">7</xref>). One of these, csx19, is always associated with CRISPR-Cas systems, and is predicted to represent a diverged version of the RAMP domain (RRM-like fold) that is found in many Cas proteins (Makarova et al., <xref ref-type="bibr" rid="B37">2011a</xref>). Thus, colocalization of the csx19 genes with the genes encoding CARF domain proteins might simply reflect their shared association with the CRISPR-Cas systems rather than a direct functional link. In addition, <italic>cas</italic> genes of another, less common family, Csx15, are fused to the genes coding for CARF domain proteins on several occasions (Figure <xref ref-type="fig" rid="F1">1B</xref>). The Csx15 proteins show no significant similarity to any known domains, and their functions remain obscure. However, several highly conserved residues, namely two histidines, a glutamate, and an arginine, are reminiscent of active site residues of metal-independent RNases (Zhang et al., <xref ref-type="bibr" rid="B62">2012</xref>) and could potentially be involved in catalysis (Supplementary File <xref ref-type="supplementary-material" rid="SM7">7</xref>). This, together with the CARF domain fusions (Figure <xref ref-type="fig" rid="F1">1B</xref>), suggests that Csx15 might be a novel nuclease.</p>
</sec>
<sec>
<title>Strong link between CARF-containing proteins and CRISPR-Cas systems</title>
<p>The association of CARF domain-containing proteins with CRISPR-Cas systems, especially those of type III, has been noted previously (Makarova et al., <xref ref-type="bibr" rid="B37">2011a</xref>,<xref ref-type="bibr" rid="B38">b</xref>; Anantharaman et al., <xref ref-type="bibr" rid="B4">2013</xref>; Koonin and Makarova, <xref ref-type="bibr" rid="B33">2013</xref>). Here we sought to identify specific associations with CRISPR-Cas systems for each major family of CARF-domain proteins separately. The assessment was based on the proximity of the respective genes to CRISPR-Cas loci. Most of the 12 major CARF families are indeed typically found in the vicinity of other <italic>cas</italic> genes (Figure <xref ref-type="fig" rid="F1">1A</xref>, Supplementary File <xref ref-type="supplementary-material" rid="SM8">8</xref>), with the exception of DET1451, MA0186, and the divergent RtcR-like family. Those families of CARF-domain proteins that are associated with CRISPR-Cas systems are most often contained within type III CRISPR-Cas systems, and some show a specific preference for type III-A or III-B. All these CARF domain protein families possess a third domain, a nuclease, which is predicted to function as a toxin that targets non-self or self-nucleic acids (Koonin and Makarova, <xref ref-type="bibr" rid="B33">2013</xref>). The only CARF family (Csa3) that displays clear affinity to type I systems, and subtype I-A in particular, lacks a C-terminal catalytic effector domain. However, these associations notwithstanding, there are genes in each CARF family that are not linked to CRISPR-Cas and thus might not be functionally involved in the CRISPR-Cas-mediated defense. 
Some of the CARF genes of the VC1899 family (PF9002) that are not linked to CRISPR-Cas (e.g., Daci_4198 from <italic>Delftia acidovorans</italic>) are embedded within a novel Type-VII secretion system gene cluster that is predicted to function as a DNA-transfer agent and additionally encompasses multiple Ter genes that have been implicated in phage restriction (Anantharaman et al., <xref ref-type="bibr" rid="B3">2012</xref>).</p>
</sec>
<sec>
<title>CARF domain proteins containing a C-terminal effector domain belong to type III CRISPR-Cas systems</title>
<p>CARF domain-containing proteins are present in 145 genomes (among the representative set of 659 complete archaeal and bacterial genomes) of which only 9 genomes possess neither <italic>cas1</italic> nor <italic>cas10</italic> (the signature protein families of CRISPR-Cas systems), suggesting a strong link of these proteins to CRISPR-Cas (Supplementary File <xref ref-type="supplementary-material" rid="SM9">9</xref>). Type III CRISPR-Cas systems often co-occur with type I systems, so it was of interest to clarify whether a specific link existed between CARF domains and type III systems and whether or not this linkage depended on the presence of a C-terminal catalytic effector domain in the CARF-domain proteins. To address this question, we compared the co-occurrence of at least one CARF-domain protein containing a (predicted) effector domain with type I, type II, and type III CRISPR-Cas systems (Supplementary File <xref ref-type="supplementary-material" rid="SM9">9</xref>). The data presented in Figure <xref ref-type="fig" rid="F1">1D</xref> clearly demonstrate a strong, specific link between CARF proteins containing a C-terminal catalytic effector domain and type III systems. This association suggests that CARF-domain proteins with this type of architecture play important roles in the majority of type III systems.</p>
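<p>The co-occurrence comparison described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each genome has been reduced to a set of presence labels; the label names, the function name, and the toy input are hypothetical and are not taken from the supplementary data or from the procedure actually used in this study.</p>
<preformat>
```python
from collections import Counter


def cooccurrence(genomes, feature, system_types=("I", "II", "III")):
    """Count genomes that carry `feature` together with each CRISPR-Cas type.

    `genomes` maps a genome name to a set of labels; a label such as
    "type_III" marks the presence of that system type (hypothetical encoding).
    """
    counts = Counter({t: 0 for t in system_types})
    for labels in genomes.values():
        if feature not in labels:
            continue  # only genomes carrying the feature are tallied
        for t in system_types:
            if "type_" + t in labels:
                counts[t] += 1
    return counts


# Toy input for illustration only, not data from this study.
toy = {
    "genome_A": {"CARF_effector", "type_III"},
    "genome_B": {"CARF_effector", "type_I", "type_III"},
    "genome_C": {"type_I"},
}
print(dict(cooccurrence(toy, "CARF_effector")))
```
</preformat>
<p>A genome that carries several system types is counted once for each type, so the tallies can be compared across types but do not sum to the number of genomes.</p>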
</sec>
</sec>
<sec sec-type="discussion" id="s3">
<title>Discussion</title>
<p>Multiple lines of evidence from structural analysis and contextual information from domain architectures and gene neighborhoods suggest that the CARF domains are dedicated ligand-sensors that function primarily in the context of defense against invasive nucleic acids in prokaryotes. Moreover, in the majority of cases (Figures <xref ref-type="fig" rid="F1">1A,B</xref>) CARF-domains are fused to C-terminal catalytic effector domains, most often nucleases. Thus, it can be predicted that the primary function of CARF-domain proteins is coupling of the sensory stimulus from a ligand to an output in the form of the catalytic activity of the C-terminal effector domains.</p>
<p>The domain architectures of the CARF proteins show certain parallels to those containing the WYL domain: both domains combine with predicted DNA-binding wHTH domains and/or catalytic effector domains. This similarity of domain architectures implies analogous general functions for the CARF and WYL domains, which involve sensing soluble ligands in the context of host-virus conflicts. However, unlike the CARF domain, which commonly combines with C-terminal enzymatic effector domains when encoded within CRISPR-Cas loci, the WYL domains appear to be primarily coupled with wHTH domains in the same contexts. Thus, in CRISPR-Cas systems, the WYL domains are predicted to primarily couple ligand-sensing to transcriptional regulation and less often to direct regulation of effectors that target alien nucleic acids. Some families of CARF proteins, such as Csa3 and NE0113, that lack C-terminal effector domains, and the divergent RtcR-like family domains that are linked to the NtrC-like AAA&#x0002B; domains are predicted, respectively, to regulate transcription directly or via sigma-54. Taken together, the observations presented here raise two key questions: what are the ligands recognized by the CARF domains and what are the targets of their associated effector domains?</p>
<p>With respect to the nature of the CARF domain ligands, recent comparative genomic analysis (Iyer et al., <xref ref-type="bibr" rid="B26">2013</xref>), together with biochemical data (Miller and Warren, <xref ref-type="bibr" rid="B43">1984</xref>; Wiatr and Witmer, <xref ref-type="bibr" rid="B57">1984</xref>; Witmer and Wiatr, <xref ref-type="bibr" rid="B60">1985</xref>; Gommers-Ampt and Borst, <xref ref-type="bibr" rid="B20">1995</xref>), indicate that prokaryotic viruses produce a wide variety of modified nucleotides both <italic>in situ</italic> and as free NTPs as part of their restriction-avoidance and epigenetic regulatory strategies. Many prokaryotic viruses also encode NAD-utilizing enzymes that modify host proteins, in particular RNA polymerase, with ADP-ribosyl moieties (Wilkens et al., <xref ref-type="bibr" rid="B59">1997</xref>; de Souza and Aravind, <xref ref-type="bibr" rid="B15">2012</xref>). Moreover, cyclic 2&#x02032;&#x02013;3&#x02032; phosphates and their derivatives produced as a result of cleavage of viral mRNA or host tRNA by host RNases during viral infection could also serve as potential ligands (Tanaka et al., <xref ref-type="bibr" rid="B55">2011</xref>). Furthermore, comparative genomic analysis of the counter-phage Ter system has revealed the presence of a cluster of genes that are predicted to encode enzymes involved in the synthesis of a nucleotide-derived metabolite (Anantharaman et al., <xref ref-type="bibr" rid="B3">2012</xref>). Complementary to this plethora of (predicted) ligands, bacteria have evolved several dedicated domains to recognize modified nucleotides in DNA as a part of their bacteriophage restriction strategies (Iyer et al., <xref ref-type="bibr" rid="B26">2013</xref>). Given the prediction that most CARF domains bind negatively charged ligands, such as nucleotides and their derivatives, we hypothesize that at least some of the aforementioned virus-induced metabolites are ligands of the CARF domains. 
Multiple ligand recognition steps might be critical for the tight regulation of defense systems, such as CRISPR-Cas, whose unchecked activity could have deleterious consequences for the cell (Stern et al., <xref ref-type="bibr" rid="B54">2010</xref>; Makarova et al., <xref ref-type="bibr" rid="B36">2012</xref>, <xref ref-type="bibr" rid="B40">2013</xref>; Dy et al., <xref ref-type="bibr" rid="B16">2013</xref>; Jiang et al., <xref ref-type="bibr" rid="B27">2013</xref>; Koonin and Makarova, <xref ref-type="bibr" rid="B33">2013</xref>; Sorek et al., <xref ref-type="bibr" rid="B53">2013</xref>). Transcription factors containing WYL and CARF domains could act as regulators that tightly control the expression of defense systems unless a specific ligand is present either to relieve the transcriptional block or activate transcription. This is consistent with the recent results showing that a WYL domain protein (sll7009) is a negative regulator of the I-D CRISPR-Cas system in <italic>Synechocystis</italic> sp. (Hein et al., <xref ref-type="bibr" rid="B23">2013</xref>).</p>
<p>We failed to detect CARF or WYL domains in eukaryotes despite extensive sequence searches. The apparent absence of these domains correlates with the conspicuous absence of R-M or CRISPR-Cas systems in eukaryotes. Conceivably, the disruption of operonic organization of co-regulated genes that was apparently associated with eukaryogenesis exacerbated the deleterious effects of these defense systems, leading to their elimination along with the dedicated regulators (Burroughs et al., <xref ref-type="bibr" rid="B9">2013a</xref>; Koonin, <xref ref-type="bibr" rid="B32">2014</xref>). Furthermore, the loss of CARF and WYL-domain proteins, which are predicted sensors of nucleotide derivatives, in eukaryotes is consistent with the limited use of modified nucleotides by eukaryotic viruses (Iyer et al., <xref ref-type="bibr" rid="B26">2013</xref>).</p>
<p>As for the targets of the C-terminal effector domains of CARF proteins, several hints are offered by the parallels with classical toxin-antitoxin systems and polymorphic toxin systems in which domains of the same families have been identified. In these systems, the RNase domains, such as HEPN, RelE, and PIN, primarily attack host tRNAs or mRNAs and induce dormancy or programmed cell death by inhibiting protein synthesis (Yamaguchi and Inouye, <xref ref-type="bibr" rid="B61">2011</xref>; Zhang et al., <xref ref-type="bibr" rid="B62">2012</xref>; Anantharaman et al., <xref ref-type="bibr" rid="B4">2013</xref>; Makarova et al., <xref ref-type="bibr" rid="B40">2013</xref>). Coupling between such a toxin-like function and interference provided by Cascade-like complexes is most likely ancestral among the type III CRISPR-Cas systems, in parallel with the association of Cas1 protein, a universal component of CRISPR-Cas systems, with toxin-like nucleases Cas2 or Cas4 (Makarova et al., <xref ref-type="bibr" rid="B36">2012</xref>, <xref ref-type="bibr" rid="B40">2013</xref>; Koonin and Makarova, <xref ref-type="bibr" rid="B33">2013</xref>). The fusion of a wHTH domain with many CARF domains suggests that the respective proteins specifically bind DNA. Indeed, REase domains, which are present in several CARF proteins, typically target alien DNA, whereas self DNA is targeted only under exceptional circumstances. 
The REases achieve this selectivity by either targeting DNA with specific modified nucleotides, such as hydroxymethylcytosine (e.g., Mrr, McrA, and McrB systems) (Bickle and Kruger, <xref ref-type="bibr" rid="B7">1993</xref>; Burroughs et al., <xref ref-type="bibr" rid="B10">2013b</xref>), or by targeting unmodified DNA in contrast to the host DNA that is methylated by cognate methylases (Roberts et al., <xref ref-type="bibr" rid="B50">2007</xref>), and probably also by using RNA or DNA guides supplied by Argonaute (PIWI) family proteins (Makarova et al., <xref ref-type="bibr" rid="B41">2009</xref>; Burroughs et al., <xref ref-type="bibr" rid="B9">2013a</xref>,<xref ref-type="bibr" rid="B10">b</xref>; Olovnikov et al., <xref ref-type="bibr" rid="B48">2013</xref>).</p>
<p>Thus, we propose that CARF proteins containing C-terminal REase domains function in parallel with the Cascade-like complexes, resulting in a double-pronged assault on the invading DNA. In contrast, several bacterial HEPN proteins, such as LsoA and RNase LS, are RNases that target ribosome-associated mRNAs of infecting bacteriophages, and similar predictions have been made for many other HEPN proteins (Anantharaman et al., <xref ref-type="bibr" rid="B4">2013</xref>). Thus, some of the CARF proteins that contain the HEPN domain and other (predicted) RNases might act directly on viral RNA to augment the attack on viral DNA or RNA by the type III CRISPR-Cas systems.</p>
<p>The present analysis of the CARF superfamily is expected to provide a new handle on unresolved questions on the regulation and function of CRISPR-Cas systems. Furthermore, these findings could offer leads for biotechnological applications involving ligand-induced action on nucleic acid targets.</p>
</sec>
<sec sec-type="materials and methods" id="s4">
<title>Materials and methods</title>
<p>The Refseq database (February 2013 release) was used to search for CARF domain-containing proteins and to analyze their genomic context in 2262 completely sequenced prokaryotic genomes. The set of 659 representative genomes was selected for quantitative analysis of the co-occurrence of CARF domain-containing proteins and CRISPR-Cas systems as follows: for each genus, the species with the largest genome was selected, except for the genera <italic>Bacillus</italic> and <italic>Escherichia</italic>, for which the model organisms <italic>Bacillus subtilis</italic> 168 and <italic>Escherichia coli</italic> K12 <italic>substr.</italic> MG1655, respectively, were selected.</p>
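<p>The selection rule for representative genomes can be sketched as follows. This is a minimal illustration rather than the code used in this study: the input format (tuples of genus, strain name, and genome size), the function name, and the toy records are assumptions, and ties in genome size are resolved arbitrarily.</p>
<preformat>
```python
from collections import defaultdict

# Model organisms that represent their genera regardless of genome size,
# as stated in the text above.
MODEL_STRAINS = {
    "Bacillus": "Bacillus subtilis 168",
    "Escherichia": "Escherichia coli K12 substr. MG1655",
}


def pick_representatives(genomes):
    """Pick one genome per genus.

    `genomes` is an iterable of (genus, strain_name, genome_size_bp) tuples
    (hypothetical input format). Returns a dict mapping genus to strain name.
    """
    by_genus = defaultdict(list)
    for genus, name, size in genomes:
        by_genus[genus].append((name, size))

    representatives = {}
    for genus, members in by_genus.items():
        names = [name for name, _ in members]
        model = MODEL_STRAINS.get(genus)
        if model in names:
            # Exception: the designated model organism is chosen.
            representatives[genus] = model
        else:
            # Default rule: the largest genome in the genus is chosen.
            representatives[genus] = max(members, key=lambda m: m[1])[0]
    return representatives


# Toy records with made-up sizes, for illustration only.
toy = [
    ("Bacillus", "Bacillus subtilis 168", 100),
    ("Bacillus", "Bacillus sp. example", 200),
    ("Examplea", "Examplea strain 1", 50),
    ("Examplea", "Examplea strain 2", 80),
]
print(pick_representatives(toy))
```
</preformat>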
<p>Iterative profile searches with the PSI-BLAST (Altschul et al., <xref ref-type="bibr" rid="B1">1997</xref>) program with a cut-off <italic>e</italic>-value of 0.01, composition-based statistics and low-complexity filtering turned off were used to retrieve homologous sequences from the Refseq database. In each iteration, all detected sequences were examined for conserved motifs to detect either potential homologs below the cut-off to be included in the profile or potential false positives to be excluded. For borderline cases, additional profile-profile searches were carried out using the HHpred program with default parameters to evaluate the veracity of those matches (Soding et al., <xref ref-type="bibr" rid="B52">2005</xref>). The HHpred program was also used to detect remote homologous families with query sequences selected for each CARF family. Similarity-based clustering was performed using the BLASTCLUST program (<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html">ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html</ext-link>) to cluster sequences at different thresholds. Multiple sequence alignments were built using the MUSCLE (Edgar, <xref ref-type="bibr" rid="B17">2004</xref>) program, followed by manual adjustments on the basis of PSI-BLAST and HHpred alignments, secondary structure prediction and structural alignments (if applicable). Protein secondary structure was predicted using the JPred program (Cuff et al., <xref ref-type="bibr" rid="B13">1998</xref>). Transmembrane segments were predicted using the TMHMM version 2 program (Krogh et al., <xref ref-type="bibr" rid="B34">2001</xref>). For each of these programs, unless specifically mentioned, default parameters were used. For each CARF or WYL gene, the gene neighborhood was comprehensively analyzed using an in-house Perl script. 
The script uses either the PTT file (downloadable from the NCBI ftp site) or the Genbank file in the case of whole genome shotgun sequences to extract the neighbors of a given query gene. Usually we used a cutoff of 5&#x02013;10 genes on either side of the query for initial screening. The protein sequences of all neighbors were clustered using the BLASTCLUST program (<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html">ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html</ext-link>) to identify related sequences in gene neighborhoods. Each cluster of homologous proteins was then assigned an annotation based on the domain architecture or conserved shared domain. The Pfam database was used as a guide to make preliminary domain identifications followed by detailed analysis (Finn et al., <xref ref-type="bibr" rid="B18">2014</xref>). This allowed an initial annotation of gene neighborhoods and their grouping based on conservation of neighborhood associations. This was followed by detailed manual analysis of exemplars of each class of neighborhoods. Known <italic>cas</italic> genes were assigned using respective Pfam profiles (Finn et al., <xref ref-type="bibr" rid="B18">2014</xref>) and manual annotation. A complete list of Genbank gene identifiers for CARF proteins investigated in this study is provided in the Supplementary File <xref ref-type="supplementary-material" rid="SM1">1</xref>. Structure similarity searches were conducted using the DALIlite program (Holm and Rosenstrom, <xref ref-type="bibr" rid="B24">2010</xref>). The detection of pockets in the structure was performed using the PyMOL Molecular Graphics System, Version 1.5.0.4 Schr&#x000F6;dinger, LLC (<ext-link ext-link-type="uri" xlink:href="http://www.pymol.org/">http://www.pymol.org/</ext-link>) with the Surface&#x02192;Cavities and Pockets only option. 
The predicted ligand-binding residues were inferred from the alignment provided in Supplementary File <xref ref-type="supplementary-material" rid="SM3">3</xref>.</p>
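<p>The neighborhood extraction step can be illustrated with a short sketch. The in-house Perl script itself is not reproduced here; the Python function below is a hypothetical stand-in that assumes the genes of one replicon are already available as a list in genomic order (for example, as parsed from a PTT or Genbank file), and, as a simplification, it does not wrap around the ends of circular replicons.</p>
<preformat>
```python
def neighborhood(ordered_genes, query, flank=5):
    """Return the query gene with up to `flank` genes on either side.

    `ordered_genes` is a list of gene identifiers in genomic order
    (hypothetical input). The text above reports flank values of 5 to 10
    genes for initial screening. The window is truncated at the ends of
    the list rather than wrapped around.
    """
    i = ordered_genes.index(query)  # raises ValueError if query is absent
    start = max(0, i - flank)
    return ordered_genes[start:i + flank + 1]


# Toy replicon with 20 placeholder gene identifiers.
genes = ["gene%02d" % n for n in range(20)]
print(neighborhood(genes, "gene10", flank=5))
print(neighborhood(genes, "gene02", flank=5))
```
</preformat>
<p>The protein sequences of the genes returned for each query would then be pooled and clustered, as described above, so that recurring neighbors can be annotated by cluster.</p>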
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</sec>
</body>
<back>
<ack>
<p>Research by Kira S. Makarova, Vivek Anantharaman, Eugene V. Koonin and L. Aravind is supported by the funds of the Intramural Research Program of the National Library of Medicine, NIH (US Department of Health and Human Services). Nick V. Grishin was supported in part by NIH grant GM094575 and the Welch Foundation grant I-1505.</p>
</ack>
<sec sec-type="supplementary material" id="s5">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="http://www.frontiersin.org/journal/10.3389/fgene.2014.00102/abstract">http://www.frontiersin.org/journal/10.3389/fgene.2014.00102/abstract</ext-link></p>
<supplementary-material xlink:href="DataSheet1.XLSX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet2.XLSX" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet3.ZIP" id="SM3" mimetype="application/zip" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet4.XLSX" id="SM4" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet5.PDF" id="SM5" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet6.ZIP" id="SM6" mimetype="application/zip" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet7.ZIP" id="SM7" mimetype="application/zip" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet8.XLSX" id="SM8" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet9.XLSX" id="SM9" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet10.ZIP" mimetype="application/zip" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname> <given-names>S. F.</given-names></name> <name><surname>Madden</surname> <given-names>T. L.</given-names></name> <name><surname>Schaffer</surname> <given-names>A. A.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Miller</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>1997</year>). <article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</article-title>. <source>Nucleic Acids Res</source>. <volume>25</volume>, <fpage>3389</fpage>&#x02013;<lpage>3402</lpage>. <pub-id pub-id-type="doi">10.1093/nar/25.17.3389</pub-id><pub-id pub-id-type="pmid">9254694</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anantharaman</surname> <given-names>V.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2006</year>). <article-title>The NYN domains: novel predicted RNAses with a PIN domain-like fold</article-title>. <source>RNA Biol</source>. <volume>3</volume>, <fpage>18</fpage>&#x02013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.4161/rna.3.1.2548</pub-id><pub-id pub-id-type="pmid">17114934</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anantharaman</surname> <given-names>V.</given-names></name> <name><surname>Iyer</surname> <given-names>L. M.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2012</year>). <article-title>Ter-dependent stress response systems: novel pathways related to metal sensing, production of a nucleoside-like metabolite, and DNA-processing</article-title>. <source>Mol. Biosyst</source>. <volume>8</volume>, <fpage>3142</fpage>&#x02013;<lpage>3165</lpage>. <pub-id pub-id-type="doi">10.1039/c2mb25239b</pub-id><pub-id pub-id-type="pmid">23044854</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anantharaman</surname> <given-names>V.</given-names></name> <name><surname>Makarova</surname> <given-names>K. S.</given-names></name> <name><surname>Burroughs</surname> <given-names>A. M.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2013</year>). <article-title>Comprehensive analysis of the HEPN superfamily: identification of novel roles in intra-genomic conflicts, defense, pathogenesis and RNA processing</article-title>. <source>Biol. Direct</source> <volume>8</volume>, <fpage>15</fpage>. <pub-id pub-id-type="doi">10.1186/1745-6150-8-15</pub-id><pub-id pub-id-type="pmid">23768067</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aravind</surname> <given-names>L.</given-names></name> <name><surname>Anand</surname> <given-names>S.</given-names></name> <name><surname>Iyer</surname> <given-names>L. M.</given-names></name></person-group> (<year>2013</year>). <article-title>Novel autoproteolytic and DNA-damage sensing components in the bacterial SOS response and oxidized methylcytosine-induced eukaryotic DNA demethylation systems</article-title>. <source>Biol. Direct</source> <volume>8</volume>, <fpage>20</fpage>. <pub-id pub-id-type="doi">10.1186/1745-6150-8-20</pub-id><pub-id pub-id-type="pmid">23945014</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Aravind</surname> <given-names>L.</given-names></name> <name><surname>Iyer</surname> <given-names>L. M.</given-names></name> <name><surname>Anantharaman</surname> <given-names>V.</given-names></name></person-group> (<year>2010</year>). <article-title>Natural history of sensor domains in bacterial signaling systems</article-title> in <source>Sensory Mechanisms in Bacteria: Molecular Aspects of Signal Recognition</source>, eds <person-group person-group-type="editor"><name><surname>Spiro</surname> <given-names>S.</given-names></name> <name><surname>Dixon</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>Norfolk</publisher-loc>: <publisher-name>Caister Academic Press</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>38</lpage>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bickle</surname> <given-names>T. A.</given-names></name> <name><surname>Kruger</surname> <given-names>D. H.</given-names></name></person-group> (<year>1993</year>). <article-title>Biology of DNA restriction</article-title>. <source>Microbiol. Rev</source>. <volume>57</volume>, <fpage>434</fpage>&#x02013;<lpage>450</lpage>. <pub-id pub-id-type="pmid">8336674</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burroughs</surname> <given-names>A. M.</given-names></name> <name><surname>Allen</surname> <given-names>K. N.</given-names></name> <name><surname>Dunaway-Mariano</surname> <given-names>D.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2006</year>). <article-title>Evolutionary genomics of the HAD superfamily: understanding the structural adaptations and catalytic diversity in a superfamily of phosphoesterases and allied enzymes</article-title>. <source>J. Mol. Biol</source>. <volume>361</volume>, <fpage>1003</fpage>&#x02013;<lpage>1034</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmb.2006.06.049</pub-id><pub-id pub-id-type="pmid">16889794</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burroughs</surname> <given-names>A. M.</given-names></name> <name><surname>Ando</surname> <given-names>Y.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2013a</year>). <article-title>New perspectives on the diversification of the RNA interference system: insights from comparative genomics and small RNA sequencing</article-title>. <source>Wiley Interdiscip. Rev. RNA</source> <volume>5</volume>, <fpage>141</fpage>&#x02013;<lpage>182</lpage>. <pub-id pub-id-type="doi">10.1002/wrna.1210</pub-id><pub-id pub-id-type="pmid">24311560</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burroughs</surname> <given-names>A. M.</given-names></name> <name><surname>Iyer</surname> <given-names>L. M.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2009</year>). <article-title>Natural history of the E1-like superfamily: implication for adenylation, sulfur transfer, and ubiquitin conjugation</article-title>. <source>Proteins</source> <volume>75</volume>, <fpage>895</fpage>&#x02013;<lpage>910</lpage>. <pub-id pub-id-type="doi">10.1002/prot.22298</pub-id><pub-id pub-id-type="pmid">19089947</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burroughs</surname> <given-names>A. M.</given-names></name> <name><surname>Iyer</surname> <given-names>L. M.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2013b</year>). <article-title>Two novel PIWI families: roles in inter-genomic conflicts in bacteria and Mediator-dependent modulation of transcription in eukaryotes</article-title>. <source>Biol. Direct</source> <volume>8</volume>:<fpage>13</fpage>. <pub-id pub-id-type="doi">10.1186/1745-6150-8-13</pub-id><pub-id pub-id-type="pmid">23758928</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chakravarty</surname> <given-names>A. K.</given-names></name> <name><surname>Subbotin</surname> <given-names>R.</given-names></name> <name><surname>Chait</surname> <given-names>B. T.</given-names></name> <name><surname>Shuman</surname> <given-names>S.</given-names></name></person-group> (<year>2012</year>). <article-title>RNA ligase RtcB splices 3&#x02032;-phosphate and 5&#x02032;-OH ends via covalent RtcB-(histidinyl)-GMP and polynucleotide-(3&#x02032;)pp(5&#x02032;)G intermediates</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>109</volume>, <fpage>6072</fpage>&#x02013;<lpage>6077</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1201207109</pub-id><pub-id pub-id-type="pmid">22474365</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cuff</surname> <given-names>J. A.</given-names></name> <name><surname>Clamp</surname> <given-names>M. E.</given-names></name> <name><surname>Siddiqui</surname> <given-names>A. S.</given-names></name> <name><surname>Finlay</surname> <given-names>M.</given-names></name> <name><surname>Barton</surname> <given-names>G. J.</given-names></name></person-group> (<year>1998</year>). <article-title>JPred: a consensus secondary structure prediction server</article-title>. <source>Bioinformatics</source> <volume>14</volume>, <fpage>892</fpage>&#x02013;<lpage>893</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/14.10.892</pub-id><pub-id pub-id-type="pmid">9927721</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deng</surname> <given-names>L.</given-names></name> <name><surname>Garrett</surname> <given-names>R. A.</given-names></name> <name><surname>Shah</surname> <given-names>S. A.</given-names></name> <name><surname>Peng</surname> <given-names>X.</given-names></name> <name><surname>She</surname> <given-names>Q.</given-names></name></person-group> (<year>2013</year>). <article-title>A novel interference mechanism by a type IIIB CRISPR-Cmr module in Sulfolobus</article-title>. <source>Mol. Microbiol</source>. <volume>87</volume>, <fpage>1088</fpage>&#x02013;<lpage>1099</lpage>. <pub-id pub-id-type="doi">10.1111/mmi.12152</pub-id><pub-id pub-id-type="pmid">23320564</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Souza</surname> <given-names>R. F.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2012</year>). <article-title>Identification of novel components of NAD-utilizing metabolic pathways and prediction of their biochemical functions</article-title>. <source>Mol. Biosyst</source>. <volume>8</volume>, <fpage>1661</fpage>&#x02013;<lpage>1677</lpage>. <pub-id pub-id-type="doi">10.1039/c2mb05487f</pub-id><pub-id pub-id-type="pmid">22399070</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dy</surname> <given-names>R. L.</given-names></name> <name><surname>Pitman</surname> <given-names>A. R.</given-names></name> <name><surname>Fineran</surname> <given-names>P. C.</given-names></name></person-group> (<year>2013</year>). <article-title>Chromosomal targeting by CRISPR-Cas systems can contribute to genome plasticity in bacteria</article-title>. <source>Mob. Genet. Elements</source> <volume>3</volume>, <fpage>e26831</fpage>. <pub-id pub-id-type="doi">10.4161/mge.26831</pub-id><pub-id pub-id-type="pmid">24251073</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Edgar</surname> <given-names>R. C.</given-names></name></person-group> (<year>2004</year>). <article-title>MUSCLE: multiple sequence alignment with high accuracy and high throughput</article-title>. <source>Nucleic Acids Res</source>. <volume>32</volume>, <fpage>1792</fpage>&#x02013;<lpage>1797</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkh340</pub-id><pub-id pub-id-type="pmid">15034147</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Finn</surname> <given-names>R. D.</given-names></name> <name><surname>Bateman</surname> <given-names>A.</given-names></name> <name><surname>Clements</surname> <given-names>J.</given-names></name> <name><surname>Coggill</surname> <given-names>P.</given-names></name> <name><surname>Eberhardt</surname> <given-names>R. Y.</given-names></name> <name><surname>Eddy</surname> <given-names>S. R.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Pfam: the protein families database</article-title>. <source>Nucleic Acids Res</source>. <volume>42</volume>, <fpage>D222</fpage>&#x02013;<lpage>D230</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkt1223</pub-id><pub-id pub-id-type="pmid">24288371</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Genschik</surname> <given-names>P.</given-names></name> <name><surname>Drabikowski</surname> <given-names>K.</given-names></name> <name><surname>Filipowicz</surname> <given-names>W.</given-names></name></person-group> (<year>1998</year>). <article-title>Characterization of the Escherichia coli RNA 3&#x02032;-terminal phosphate cyclase and its sigma54-regulated operon</article-title>. <source>J. Biol. Chem</source>. <volume>273</volume>, <fpage>25516</fpage>&#x02013;<lpage>25526</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.273.39.25516</pub-id><pub-id pub-id-type="pmid">9738023</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gommers-Ampt</surname> <given-names>J. H.</given-names></name> <name><surname>Borst</surname> <given-names>P.</given-names></name></person-group> (<year>1995</year>). <article-title>Hypermodified bases in DNA</article-title>. <source>FASEB J</source>. <volume>9</volume>, <fpage>1034</fpage>&#x02013;<lpage>1042</lpage>. <pub-id pub-id-type="pmid">7649402</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gutierrez</surname> <given-names>P.</given-names></name> <name><surname>Kozlov</surname> <given-names>G.</given-names></name> <name><surname>Gabrielli</surname> <given-names>L.</given-names></name> <name><surname>Elias</surname> <given-names>D.</given-names></name> <name><surname>Osborne</surname> <given-names>M. J.</given-names></name> <name><surname>Gallouzi</surname> <given-names>I. E.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>Solution structure of YaeO, a Rho-specific inhibitor of transcription termination</article-title>. <source>J. Biol. Chem</source>. <volume>282</volume>, <fpage>23348</fpage>&#x02013;<lpage>23353</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.M702010200</pub-id><pub-id pub-id-type="pmid">17565995</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hatoum-Aslan</surname> <given-names>A.</given-names></name> <name><surname>Maniv</surname> <given-names>I.</given-names></name> <name><surname>Samai</surname> <given-names>P.</given-names></name> <name><surname>Marraffini</surname> <given-names>L. A.</given-names></name></person-group> (<year>2013</year>). <article-title>Genetic characterization of anti-plasmid immunity by a Type III-A CRISPR-Cas system</article-title>. <source>J. Bacteriol</source>. <volume>196</volume>, <fpage>310</fpage>&#x02013;<lpage>317</lpage>. <pub-id pub-id-type="doi">10.1128/JB.01130-13</pub-id><pub-id pub-id-type="pmid">24187086</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hein</surname> <given-names>S.</given-names></name> <name><surname>Scholz</surname> <given-names>I.</given-names></name> <name><surname>Voss</surname> <given-names>B.</given-names></name> <name><surname>Hess</surname> <given-names>W. R.</given-names></name></person-group> (<year>2013</year>). <article-title>Adaptation and modification of three CRISPR loci in two closely related cyanobacteria</article-title>. <source>RNA Biol</source>. <volume>10</volume>, <fpage>852</fpage>&#x02013;<lpage>864</lpage>. <pub-id pub-id-type="doi">10.4161/rna.24160</pub-id><pub-id pub-id-type="pmid">23535141</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holm</surname> <given-names>L.</given-names></name> <name><surname>Rosenstrom</surname> <given-names>P.</given-names></name></person-group> (<year>2010</year>). <article-title>Dali server: conservation mapping in 3D</article-title>. <source>Nucleic Acids Res</source>. <volume>38</volume>, <fpage>W545</fpage>&#x02013;<lpage>W549</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkq366</pub-id><pub-id pub-id-type="pmid">20457744</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holm</surname> <given-names>L.</given-names></name> <name><surname>Sander</surname> <given-names>C.</given-names></name></person-group> (<year>1997</year>). <article-title>An evolutionary treasure: unification of a broad set of amidohydrolases related to urease</article-title>. <source>Proteins</source> <volume>28</volume>, <fpage>72</fpage>&#x02013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1002/(SICI)1097-0134(199705)28:1&#x0003C;72::AID-PROT7&#x0003E;3.0.CO;2-L</pub-id><pub-id pub-id-type="pmid">9144792</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iyer</surname> <given-names>L. M.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Burroughs</surname> <given-names>A. M.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2013</year>). <article-title>Computational identification of novel biochemical systems involved in oxidation, glycosylation and other complex modifications of bases in DNA</article-title>. <source>Nucleic Acids Res</source>. <volume>41</volume>, <fpage>7635</fpage>&#x02013;<lpage>7655</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkt573</pub-id><pub-id pub-id-type="pmid">23814188</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>W.</given-names></name> <name><surname>Maniv</surname> <given-names>I.</given-names></name> <name><surname>Arain</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Levin</surname> <given-names>B. R.</given-names></name> <name><surname>Marraffini</surname> <given-names>L. A.</given-names></name></person-group> (<year>2013</year>). <article-title>Dealing with the evolutionary downside of CRISPR immunity: bacteria and beneficial plasmids</article-title>. <source>PLoS Genet</source>. <volume>9</volume>:<fpage>e1003844</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pgen.1003844</pub-id><pub-id pub-id-type="pmid">24086164</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jinek</surname> <given-names>M.</given-names></name> <name><surname>East</surname> <given-names>A.</given-names></name> <name><surname>Cheng</surname> <given-names>A.</given-names></name> <name><surname>Lin</surname> <given-names>S.</given-names></name> <name><surname>Ma</surname> <given-names>E.</given-names></name> <name><surname>Doudna</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>RNA-programmed genome editing in human cells</article-title>. <source>Elife</source> <volume>2</volume>:<fpage>e00471</fpage>. <pub-id pub-id-type="doi">10.7554/eLife.00471</pub-id><pub-id pub-id-type="pmid">23386978</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Joly</surname> <given-names>N.</given-names></name> <name><surname>Burrows</surname> <given-names>P. C.</given-names></name> <name><surname>Engl</surname> <given-names>C.</given-names></name> <name><surname>Jovanovic</surname> <given-names>G.</given-names></name> <name><surname>Buck</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>A lower-order oligomer form of phage shock protein A (PspA) stably associates with the hexameric AAA(&#x0002B;) transcription activator protein PspF for negative regulation</article-title>. <source>J. Mol. Biol</source>. <volume>394</volume>, <fpage>764</fpage>&#x02013;<lpage>775</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmb.2009.09.055</pub-id><pub-id pub-id-type="pmid">19804784</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Joly</surname> <given-names>N.</given-names></name> <name><surname>Engl</surname> <given-names>C.</given-names></name> <name><surname>Jovanovic</surname> <given-names>G.</given-names></name> <name><surname>Huvet</surname> <given-names>M.</given-names></name> <name><surname>Toni</surname> <given-names>T.</given-names></name> <name><surname>Sheng</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Managing membrane stress: the phage shock protein (Psp) response, from molecular mechanisms to physiology</article-title>. <source>FEMS Microbiol. Rev</source>. <volume>34</volume>, <fpage>797</fpage>&#x02013;<lpage>827</lpage>. <pub-id pub-id-type="doi">10.1111/j.1574-6976.2010.00240.x</pub-id><pub-id pub-id-type="pmid">20636484</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>Y. K.</given-names></name> <name><surname>Kim</surname> <given-names>Y. G.</given-names></name> <name><surname>Oh</surname> <given-names>B. H.</given-names></name></person-group> (<year>2013</year>). <article-title>Crystal structure and nucleic acid-binding activity of the CRISPR-associated protein Csx1 of Pyrococcus furiosus</article-title>. <source>Proteins</source> <volume>81</volume>, <fpage>261</fpage>&#x02013;<lpage>270</lpage>. <pub-id pub-id-type="doi">10.1002/prot.24183</pub-id><pub-id pub-id-type="pmid">22987782</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2014</year>). <article-title>The double-edged sword of Lamarck: comment on diversity, evolution, and therapeutic applications of small RNAs in prokaryotic and eukaryotic immune systems by Edwin L. Cooper and Nicola Overstreet</article-title>. <source>Phys. Life Rev</source>. <volume>11</volume>, <fpage>141</fpage>&#x02013;<lpage>143</lpage>. <pub-id pub-id-type="doi">10.1016/j.plrev.2013.12.002</pub-id><pub-id pub-id-type="pmid">24365235</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koonin</surname> <given-names>E. V.</given-names></name> <name><surname>Makarova</surname> <given-names>K. S.</given-names></name></person-group> (<year>2013</year>). <article-title>CRISPR-Cas: evolution of an RNA-based adaptive immunity system in prokaryotes</article-title>. <source>RNA Biol</source>. <volume>10</volume>, <fpage>679</fpage>&#x02013;<lpage>686</lpage>. <pub-id pub-id-type="doi">10.4161/rna.24022</pub-id><pub-id pub-id-type="pmid">23439366</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krogh</surname> <given-names>A.</given-names></name> <name><surname>Larsson</surname> <given-names>B.</given-names></name> <name><surname>Von Heijne</surname> <given-names>G.</given-names></name> <name><surname>Sonnhammer</surname> <given-names>E. L.</given-names></name></person-group> (<year>2001</year>). <article-title>Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes</article-title>. <source>J. Mol. Biol</source>. <volume>305</volume>, <fpage>567</fpage>&#x02013;<lpage>580</lpage>. <pub-id pub-id-type="doi">10.1006/jmbi.2000.4315</pub-id><pub-id pub-id-type="pmid">11152613</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lintner</surname> <given-names>N. G.</given-names></name> <name><surname>Frankel</surname> <given-names>K. A.</given-names></name> <name><surname>Tsutakawa</surname> <given-names>S. E.</given-names></name> <name><surname>Alsbury</surname> <given-names>D. L.</given-names></name> <name><surname>Copie</surname> <given-names>V.</given-names></name> <name><surname>Young</surname> <given-names>M. J.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>The structure of the CRISPR-associated protein Csa3 provides insight into the regulation of the CRISPR/Cas system</article-title>. <source>J. Mol. Biol</source>. <volume>405</volume>, <fpage>939</fpage>&#x02013;<lpage>955</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmb.2010.11.019</pub-id><pub-id pub-id-type="pmid">21093452</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Makarova</surname> <given-names>K. S.</given-names></name> <name><surname>Anantharaman</surname> <given-names>V.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2012</year>). <article-title>Live virus-free or die: coupling of antivirus immunity and programmed suicide or dormancy in prokaryotes</article-title>. <source>Biol. Direct</source> <volume>7</volume>, <fpage>40</fpage>. <pub-id pub-id-type="doi">10.1186/1745-6150-7-40</pub-id><pub-id pub-id-type="pmid">23151069</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Makarova</surname> <given-names>K. S.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name> <name><surname>Wolf</surname> <given-names>Y. I.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2011a</year>). <article-title>Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems</article-title>. <source>Biol. Direct</source> <volume>6</volume>, <fpage>38</fpage>. <pub-id pub-id-type="doi">10.1186/1745-6150-6-38</pub-id><pub-id pub-id-type="pmid">21756346</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Makarova</surname> <given-names>K. S.</given-names></name> <name><surname>Grishin</surname> <given-names>N. V.</given-names></name> <name><surname>Shabalina</surname> <given-names>S. A.</given-names></name> <name><surname>Wolf</surname> <given-names>Y. I.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2006</year>). <article-title>A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action</article-title>. <source>Biol. Direct</source> <volume>1</volume>:<fpage>7</fpage>. <pub-id pub-id-type="doi">10.1186/1745-6150-1-7</pub-id><pub-id pub-id-type="pmid">16545108</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Makarova</surname> <given-names>K. S.</given-names></name> <name><surname>Haft</surname> <given-names>D. H.</given-names></name> <name><surname>Barrangou</surname> <given-names>R.</given-names></name> <name><surname>Brouns</surname> <given-names>S. J.</given-names></name> <name><surname>Charpentier</surname> <given-names>E.</given-names></name> <name><surname>Horvath</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2011b</year>). <article-title>Evolution and classification of the CRISPR-Cas systems</article-title>. <source>Nat. Rev. Microbiol</source>. <volume>9</volume>, <fpage>467</fpage>&#x02013;<lpage>477</lpage>. <pub-id pub-id-type="doi">10.1038/nrmicro2577</pub-id><pub-id pub-id-type="pmid">21552286</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Makarova</surname> <given-names>K. S.</given-names></name> <name><surname>Wolf</surname> <given-names>Y. I.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2013</year>). <article-title>Comparative genomics of defense systems in archaea and bacteria</article-title>. <source>Nucleic Acids Res</source>. <volume>41</volume>, <fpage>4360</fpage>&#x02013;<lpage>4377</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkt157</pub-id><pub-id pub-id-type="pmid">23470997</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Makarova</surname> <given-names>K. S.</given-names></name> <name><surname>Wolf</surname> <given-names>Y. I.</given-names></name> <name><surname>Van der Oost</surname> <given-names>J.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2009</year>). <article-title>Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements</article-title>. <source>Biol. Direct</source> <volume>4</volume>, <fpage>29</fpage>. <pub-id pub-id-type="doi">10.1186/1745-6150-4-29</pub-id><pub-id pub-id-type="pmid">19706170</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mali</surname> <given-names>P.</given-names></name> <name><surname>Yang</surname> <given-names>L.</given-names></name> <name><surname>Esvelt</surname> <given-names>K. M.</given-names></name> <name><surname>Aach</surname> <given-names>J.</given-names></name> <name><surname>Guell</surname> <given-names>M.</given-names></name> <name><surname>DiCarlo</surname> <given-names>J. E.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>RNA-guided human genome engineering via Cas9</article-title>. <source>Science</source> <volume>339</volume>, <fpage>823</fpage>&#x02013;<lpage>826</lpage>. <pub-id pub-id-type="doi">10.1126/science.1232033</pub-id><pub-id pub-id-type="pmid">23287722</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miller</surname> <given-names>P. B.</given-names></name> <name><surname>Warren</surname> <given-names>R. A.</given-names></name></person-group> (<year>1984</year>). <article-title>DNA synthesis in Pseudomonas acidovorans infected with mutants of bacteriophage phi W-14 defective in the synthesis of alpha-putrescinylthymine</article-title>. <source>J. Virol</source>. <volume>52</volume>, <fpage>1036</fpage>&#x02013;<lpage>1038</lpage>. <pub-id pub-id-type="pmid">6492260</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Model</surname> <given-names>P.</given-names></name> <name><surname>Jovanovic</surname> <given-names>G.</given-names></name> <name><surname>Dworkin</surname> <given-names>J.</given-names></name></person-group> (<year>1997</year>). <article-title>The Escherichia coli phage-shock-protein (psp) operon</article-title>. <source>Mol. Microbiol</source>. <volume>24</volume>, <fpage>255</fpage>&#x02013;<lpage>261</lpage>. <pub-id pub-id-type="doi">10.1046/j.1365-2958.1997.3481712.x</pub-id><pub-id pub-id-type="pmid">9159513</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Niu</surname> <given-names>Y.</given-names></name> <name><surname>Shen</surname> <given-names>B.</given-names></name> <name><surname>Cui</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Generation of gene-modified cynomolgus monkey via Cas9/RNA-mediated gene targeting in one-cell embryos</article-title>. <source>Cell</source> <volume>156</volume>, <fpage>836</fpage>&#x02013;<lpage>843</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2014.01.027</pub-id><pub-id pub-id-type="pmid">24486104</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nygaard</surname> <given-names>P.</given-names></name></person-group> (<year>1977</year>). <article-title>Functioning of purine salvage pathways in Escherichia coli K-12</article-title>. <source>Adv. Exp. Med. Biol</source>. <volume>76A</volume>, <fpage>186</fpage>&#x02013;<lpage>195</lpage>. <pub-id pub-id-type="pmid">193369</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x00027;Connor</surname> <given-names>L.</given-names></name> <name><surname>Coffey</surname> <given-names>A.</given-names></name> <name><surname>Daly</surname> <given-names>C.</given-names></name> <name><surname>Fitzgerald</surname> <given-names>G. F.</given-names></name></person-group> (<year>1996</year>). <article-title>AbiG, a genotypically novel abortive infection mechanism encoded by plasmid pCI750 of Lactococcus lactis subsp. cremoris UC653</article-title>. <source>Appl. Environ. Microbiol</source>. <volume>62</volume>, <fpage>3075</fpage>&#x02013;<lpage>3082</lpage>. <pub-id pub-id-type="pmid">8795193</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Olovnikov</surname> <given-names>I.</given-names></name> <name><surname>Chan</surname> <given-names>K.</given-names></name> <name><surname>Sachidanandam</surname> <given-names>R.</given-names></name> <name><surname>Newman</surname> <given-names>D. K.</given-names></name> <name><surname>Aravin</surname> <given-names>A. A.</given-names></name></person-group> (<year>2013</year>). <article-title>Bacterial argonaute samples the transcriptome to identify foreign DNA</article-title>. <source>Mol. Cell</source> <volume>51</volume>, <fpage>594</fpage>&#x02013;<lpage>605</lpage>. <pub-id pub-id-type="doi">10.1016/j.molcel.2013.08.014</pub-id><pub-id pub-id-type="pmid">24034694</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Permina</surname> <given-names>E. A.</given-names></name> <name><surname>Mironov</surname> <given-names>A. A.</given-names></name> <name><surname>Gelfand</surname> <given-names>M. S.</given-names></name></person-group> (<year>2002</year>). <article-title>Damage-repair error-prone polymerases of eubacteria: association with mobile genome elements</article-title>. <source>Gene</source> <volume>293</volume>, <fpage>133</fpage>&#x02013;<lpage>140</lpage>. <pub-id pub-id-type="doi">10.1016/S0378-1119(02)00701-1</pub-id><pub-id pub-id-type="pmid">12137951</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roberts</surname> <given-names>R. J.</given-names></name> <name><surname>Vincze</surname> <given-names>T.</given-names></name> <name><surname>Posfai</surname> <given-names>J.</given-names></name> <name><surname>Macelis</surname> <given-names>D.</given-names></name></person-group> (<year>2007</year>). <article-title>REBASE&#x02013;enzymes and genes for DNA restriction and modification</article-title>. <source>Nucleic Acids Res</source>. <volume>35</volume>, <fpage>D269</fpage>&#x02013;<lpage>D270</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkl891</pub-id><pub-id pub-id-type="pmid">17202163</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schumacher</surname> <given-names>M. A.</given-names></name> <name><surname>Pearson</surname> <given-names>R. F.</given-names></name> <name><surname>Moller</surname> <given-names>T.</given-names></name> <name><surname>Valentin-Hansen</surname> <given-names>P.</given-names></name> <name><surname>Brennan</surname> <given-names>R. G.</given-names></name></person-group> (<year>2002</year>). <article-title>Structures of the pleiotropic translational regulator Hfq and an Hfq-RNA complex: a bacterial Sm-like protein</article-title>. <source>EMBO J</source>. <volume>21</volume>, <fpage>3546</fpage>&#x02013;<lpage>3556</lpage>. <pub-id pub-id-type="doi">10.1093/emboj/cdf322</pub-id><pub-id pub-id-type="pmid">12093755</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soding</surname> <given-names>J.</given-names></name> <name><surname>Biegert</surname> <given-names>A.</given-names></name> <name><surname>Lupas</surname> <given-names>A. N.</given-names></name></person-group> (<year>2005</year>). <article-title>The HHpred interactive server for protein homology detection and structure prediction</article-title>. <source>Nucleic Acids Res</source>. <volume>33</volume>, <fpage>W244</fpage>&#x02013;<lpage>W248</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gki408</pub-id><pub-id pub-id-type="pmid">15980461</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sorek</surname> <given-names>R.</given-names></name> <name><surname>Lawrence</surname> <given-names>C. M.</given-names></name> <name><surname>Wiedenheft</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>CRISPR-mediated adaptive immune systems in bacteria and archaea</article-title>. <source>Annu. Rev. Biochem</source>. <volume>82</volume>, <fpage>237</fpage>&#x02013;<lpage>266</lpage>. <pub-id pub-id-type="doi">10.1146/annurev-biochem-072911-172315</pub-id><pub-id pub-id-type="pmid">23495939</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stern</surname> <given-names>A.</given-names></name> <name><surname>Keren</surname> <given-names>L.</given-names></name> <name><surname>Wurtzel</surname> <given-names>O.</given-names></name> <name><surname>Amitai</surname> <given-names>G.</given-names></name> <name><surname>Sorek</surname> <given-names>R.</given-names></name></person-group> (<year>2010</year>). <article-title>Self-targeting by CRISPR: gene regulation or autoimmunity?</article-title> <source>Trends Genet</source>. <volume>26</volume>, <fpage>335</fpage>&#x02013;<lpage>340</lpage>. <pub-id pub-id-type="doi">10.1016/j.tig.2010.05.008</pub-id><pub-id pub-id-type="pmid">20598393</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tanaka</surname> <given-names>N.</given-names></name> <name><surname>Chakravarty</surname> <given-names>A. K.</given-names></name> <name><surname>Maughan</surname> <given-names>B.</given-names></name> <name><surname>Shuman</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Novel mechanism of RNA repair by RtcB via sequential 2&#x02032;,3&#x02032;-cyclic phosphodiesterase and 3&#x02032;-phosphate/5&#x02032;-hydroxyl ligation reactions</article-title>. <source>J. Biol. Chem</source>. <volume>286</volume>, <fpage>43134</fpage>&#x02013;<lpage>43143</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.M111.302133</pub-id><pub-id pub-id-type="pmid">22045815</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wheeler</surname> <given-names>D.</given-names></name> <name><surname>Bhagwat</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <article-title>BLAST QuickStart: example-driven web-based BLAST tutorial</article-title>. <source>Methods Mol. Biol</source>. <volume>395</volume>, <fpage>149</fpage>&#x02013;<lpage>176</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-59745-514-5_9</pub-id><pub-id pub-id-type="pmid">17993672</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wiatr</surname> <given-names>C. L.</given-names></name> <name><surname>Witmer</surname> <given-names>H. J.</given-names></name></person-group> (<year>1984</year>). <article-title>Selective protection of 5&#x02032;&#x02026; GGCC&#x02026; 3&#x02032; and 5&#x02032;&#x02026; GCNGC&#x02026; 3&#x02032; sequences by the hypermodified oxopyrimidine in Bacillus subtilis bacteriophage SP10 DNA</article-title>. <source>J. Virol</source>. <volume>52</volume>, <fpage>47</fpage>&#x02013;<lpage>54</lpage>. <pub-id pub-id-type="pmid">6090709</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wiedenheft</surname> <given-names>B.</given-names></name> <name><surname>Sternberg</surname> <given-names>S. H.</given-names></name> <name><surname>Doudna</surname> <given-names>J. A.</given-names></name></person-group> (<year>2012</year>). <article-title>RNA-guided genetic silencing systems in bacteria and archaea</article-title>. <source>Nature</source> <volume>482</volume>, <fpage>331</fpage>&#x02013;<lpage>338</lpage>. <pub-id pub-id-type="doi">10.1038/nature10886</pub-id><pub-id pub-id-type="pmid">22337052</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wilkens</surname> <given-names>K.</given-names></name> <name><surname>Tiemann</surname> <given-names>B.</given-names></name> <name><surname>Bazan</surname> <given-names>F.</given-names></name> <name><surname>Ruger</surname> <given-names>W.</given-names></name></person-group> (<year>1997</year>). <article-title>ADP-ribosylation and early transcription regulation by bacteriophage T4</article-title>. <source>Adv. Exp. Med. Biol</source>. <volume>419</volume>, <fpage>71</fpage>&#x02013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-4419-8632-0_8</pub-id><pub-id pub-id-type="pmid">9193638</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Witmer</surname> <given-names>H.</given-names></name> <name><surname>Wiatr</surname> <given-names>C.</given-names></name></person-group> (<year>1985</year>). <article-title>Polymer-level synthesis of oxopyrimidine deoxynucleotides by Bacillus subtilis phage SP10: characterization of modification-defective mutants</article-title>. <source>J. Virol</source>. <volume>53</volume>, <fpage>522</fpage>&#x02013;<lpage>527</lpage>. <pub-id pub-id-type="pmid">3918174</pub-id></citation>
</ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yamaguchi</surname> <given-names>Y.</given-names></name> <name><surname>Inouye</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <article-title>Regulation of growth and death in Escherichia coli by toxin-antitoxin systems</article-title>. <source>Nat. Rev. Microbiol</source>. <volume>9</volume>, <fpage>779</fpage>&#x02013;<lpage>790</lpage>. <pub-id pub-id-type="doi">10.1038/nrmicro2651</pub-id><pub-id pub-id-type="pmid">21927020</pub-id></citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>de Souza</surname> <given-names>R. F.</given-names></name> <name><surname>Anantharaman</surname> <given-names>V.</given-names></name> <name><surname>Iyer</surname> <given-names>L. M.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2012</year>). <article-title>Polymorphic toxin systems: comprehensive characterization of trafficking modes, processing, mechanisms of action, immunity and ecology using comparative genomics</article-title>. <source>Biol. Direct</source> <volume>7</volume>, <fpage>18</fpage>. <pub-id pub-id-type="doi">10.1186/1745-6150-7-18</pub-id></citation>
</ref>
</ref-list>
</back>
</article>
