<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="brief-report">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neuroinform.</journal-id>
<journal-title>Frontiers in Neuroinformatics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neuroinform.</abbrev-journal-title>
<issn pub-type="epub">1662-5196</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fninf.2017.00012</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Perspective</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Computer-Aided Experiment Planning toward Causal Discovery in Neuroscience</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Matiasz</surname> <given-names>Nicholas J.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/243134/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wood</surname> <given-names>Justin</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Wei</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Silva</surname> <given-names>Alcino J.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Hsu</surname> <given-names>William</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/243525/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Medical Imaging Informatics Group, Department of Radiological Sciences, University of California, Los Angeles</institution> <country>Los Angeles, CA, USA</country></aff>
<aff id="aff2"><sup>2</sup><institution>Silva Laboratory, Departments of Neurobiology, Psychiatry, and Psychology, Integrative Center for Learning and Memory, Brain Research Institute, University of California, Los Angeles</institution> <country>Los Angeles, CA, USA</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Computer Science, Scalable Analytics Institute, University of California, Los Angeles</institution> <country>Los Angeles, CA, USA</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Gully A. Burns, University of Southern California, USA</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Mihail Bota, University of Southern California, USA; Yann Le Franc, e-Science Data Factory, France</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: William Hsu <email>whsu&#x00040;mednet.ucla.edu</email></p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>02</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>11</volume>
<elocation-id>12</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>05</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>01</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017 Matiasz, Wood, Wang, Silva and Hsu.</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>Matiasz, Wood, Wang, Silva and Hsu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Computers help neuroscientists to analyze experimental results by automating the application of statistics; however, computer-aided experiment planning is far less common, due to a lack of similar quantitative formalisms for systematically assessing evidence and uncertainty. While ontologies and other Semantic Web resources help neuroscientists to assimilate required domain knowledge, experiment planning requires not only ontological but also epistemological (e.g., methodological) information regarding how knowledge was obtained. Here, we outline how epistemological principles and graphical representations of causality can be used to formalize experiment planning toward causal discovery. We outline two complementary approaches to experiment planning: one that quantifies evidence per the principles of convergence and consistency, and another that quantifies uncertainty using logical representations of constraints on causal structure. These approaches operationalize experiment planning as the search for an experiment that either maximizes evidence or minimizes uncertainty. Despite work in laboratory automation, humans must still plan experiments and will likely continue to do so for some time. There is thus a great need for experiment-planning frameworks that are not only amenable to machine computation but also useful as aids in human reasoning.</p>
</abstract>
<kwd-group>
<kwd>epistemology</kwd>
<kwd>experiment planning</kwd>
<kwd>research map</kwd>
<kwd>causal graph</kwd>
<kwd>uncertainty quantification</kwd>
<kwd>information gain</kwd>
</kwd-group>
<contract-sponsor id="cn001">National Institutes of Health<named-content content-type="fundref-id">10.13039/100000002</named-content></contract-sponsor>
<contract-sponsor id="cn002">National Cancer Institute<named-content content-type="fundref-id">10.13039/100000054</named-content></contract-sponsor>
<contract-sponsor id="cn003">National Center for Advancing Translational Sciences<named-content content-type="fundref-id">10.13039/100006108</named-content></contract-sponsor>
<counts>
<fig-count count="2"/>
<table-count count="0"/>
<equation-count count="2"/>
<ref-count count="34"/>
<page-count count="8"/>
<word-count count="6433"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Much of the work in neuroscience involves planning experiments to identify causal mechanisms; however, neuroscientists do not use computers to plan future experiments as effectively as they use them to analyze past experiments. When neuroscientists perform experiments, analyze data, and report findings, they do much to ensure that their work is objective: they follow precise lab protocols so that their experiments are reproducible; they employ rigorous statistical methods to show that their findings are significant; and they submit their manuscripts for peer review to build consensus in their fields. In contrast, experiment planning is usually less formal. To plan experiments, neuroscientists find and read relevant literature, synthesize available evidence, and design experiments that would be most instructive, given what is known. Unfortunately, neuroscientists lack tools for systematically navigating and integrating a set of findings, and for objectively and exhaustively considering all causal explanations and experimental designs. Instead, when neuroscientists search for relevant information, they routinely rely on serendipity and their incomplete memory of publications. Similarly, when they synthesize evidence, neuroscientists often use unspecified methods, based mostly on implicit strategies that are accumulated through years of training. Although it applies to much of biology, this problem is particularly worrisome in neuroscience as researchers in this field often integrate information across multiple diverse disciplines, including molecular, cellular, systems, behavioral, and cognitive neuroscience. This methodological diversity and complexity could conceivably confound neuroscientists&#x00027; search for the best experiments to perform next. The subjectivity of the experiment-planning process thus stands in stark contrast to the objectivity of the processes by which neuroscientists perform and analyze experiments.</p>
<p>This paper presents our perspective on computer-aided experiment planning and the role of graphical representations in formalizing causal discovery. After briefly describing ontologies and their role in experiment planning, Section 2 proposes that computer-aided experiment planning requires not only ontological but also epistemological information. After defining these two kinds of information, we outline how the latter can be used to formalize experiment planning. Section 3 discusses graphical representations of causality and their utility as formalisms for guiding experiment planning. First, essential features of causal graphs are introduced, including the concept of a <italic>Markov equivalence class</italic>. Next, we outline the components of a <italic>research map</italic>, a graphical representation of causality with epistemic elements relevant to experiment planning. Section 4 outlines how these two graphical representations of causality could be used to operationalize experiment planning toward causal discovery. Lastly, Section 5 speculates as to why a &#x0201C;statistics of experiment planning&#x0201D; has not been developed, and offers perspectives on the importance of a formal calculus of evidence for the future of scientific investigation.</p>
</sec>
<sec id="s2">
<title>2. Computer-aided experiment planning</title>
<p>Currently, ontologies and Semantic Web technologies help neuroscientists plan experiments by summarizing domain knowledge from the vast literature into forms more readily assimilated by individual researchers (Smith et al., <xref ref-type="bibr" rid="B29">2007</xref>; Rubin et al., <xref ref-type="bibr" rid="B25">2008</xref>; Fung and Bodenreider, <xref ref-type="bibr" rid="B16">2012</xref>; Chen et al., <xref ref-type="bibr" rid="B6">2013</xref>; Dumontier et al., <xref ref-type="bibr" rid="B13">2013</xref>). Such resources classify phenomena hierarchically and describe relations that exist between them. Ontologies&#x00027; usage can be divided broadly into three categories: knowledge management, data integration, and decision support (Bodenreider, <xref ref-type="bibr" rid="B5">2008</xref>). Example applications include aiding drug discovery (V&#x000E1;zquez-Naya et al., <xref ref-type="bibr" rid="B33">2010</xref>), identifying patient cohorts (Fern&#x000E1;ndez-Breis et al., <xref ref-type="bibr" rid="B14">2013</xref>), and facilitating manual literature curation (Krallinger et al., <xref ref-type="bibr" rid="B20">2012</xref>). Widely used ontologies include the Gene Ontology (GO) (Ashburner et al., <xref ref-type="bibr" rid="B1">2000</xref>), the Unified Medical Language System (UMLS) (Bodenreider, <xref ref-type="bibr" rid="B4">2004</xref>), and the Systematized Nomenclature of Medicine&#x02014;Clinical Terms (SNOMED CT) (Donnelly, <xref ref-type="bibr" rid="B12">2006</xref>). While obviously useful, such resources often lack elements that we propose are critical for both identifying meaningful gaps in knowledge and planning experiments to mitigate them.</p>
<p>If experiment planning is to be formalized, it seems its operationalization must involve not only ontological principles but also epistemological ones. While ontological information tells us <italic>what</italic> exists, including objects, properties, and their relations, epistemological information entails descriptions of <italic>how</italic> we obtain this information. An epistemic statement can qualify an ontological assertion by describing both its truth value (e.g., the confidence attributed to knowledge) and its basis (e.g., the evidence that supports it) (de Waard and Maat, <xref ref-type="bibr" rid="B10">2012</xref>). Epistemological methods for experiment planning would thus allow for the ranking of potential experiments, analogous to a cost function that directs the optimization of a model. The Ontology for Biomedical Investigations (Bandrowski et al., <xref ref-type="bibr" rid="B2">2016</xref>) and the Evidence Ontology (Chibucos et al., <xref ref-type="bibr" rid="B7">2014</xref>) are two recent knowledge bases that include epistemic elements (below, we outline quantitative approaches to such concepts). Knowledge-Engineering from Experimental Design (KEfED) is a formalism that captures both ontological and epistemological information by representing not only experimental findings but also semantic elements of the experiments themselves (Russ et al., <xref ref-type="bibr" rid="B26">2011</xref>; Tallis et al., <xref ref-type="bibr" rid="B31">2011</xref>). KEfED is based on the &#x0201C;Cycle of Scientific Investigation&#x0201D; (CoSI), a model in which (i) experiments induce observational and then interpretational assertions, and (ii) domain knowledge motivates hypotheses and then experimental designs (Russ et al., <xref ref-type="bibr" rid="B26">2011</xref>). 
This paper addresses what we see to be a large asymmetry in this process: Scientists have robust statistical methods for validating observational assertions on the basis of experiments; however, scientists lack similar quantitative formalisms for justifying hypotheses on the basis of domain knowledge.</p>
<p>One way to operationalize experiment planning is thus to quantify the uncertainty of an existing model (or set of models); the goal of experiment planning is then to identify the experiment (or set of experiments) that would minimize uncertainty. A second complementary approach is to quantify experimental evidence; the goal of experiment planning is then to identify the experiment that maximizes evidence. (For interesting discussions of uncertainty and evidence, see Vieland, <xref ref-type="bibr" rid="B34">2006</xref>; de Waard and Schneider, <xref ref-type="bibr" rid="B11">2012</xref>, respectively.) Below, we outline these two strategies, which are meant to inform and complement the mostly implicit, creative processes currently used to plan experiments.</p>
</sec>
<sec id="s3">
<title>3. Graphical representations of causality</title>
<p>As a representation for ontological information (i.e., entities and their relations), graphical causal models (Spirtes et al., <xref ref-type="bibr" rid="B30">2000</xref>; Koller and Friedman, <xref ref-type="bibr" rid="B19">2009</xref>; Pearl, <xref ref-type="bibr" rid="B23">2009</xref>) can be used as a tool for experiment planning (Pearl, <xref ref-type="bibr" rid="B22">1995</xref>). Graphical models are a sensible formalism for guiding causal discovery: graphs concisely encode probabilistic relations between variables (Friedman, <xref ref-type="bibr" rid="B15">2004</xref>); they are accessible to domain experts because they encode plain causal statements (as opposed to only statistical or probabilistic ones) (Pearl, <xref ref-type="bibr" rid="B22">1995</xref>, <xref ref-type="bibr" rid="B23">2009</xref>); and principled methods exist for assembling fragments of graphical models into a single model (Friedman, <xref ref-type="bibr" rid="B15">2004</xref>; Cohen, <xref ref-type="bibr" rid="B8">2015</xref>), a strategy that resembles the way researchers integrate facts from various sources. After reviewing key aspects of causal graphs below, we briefly introduce the concept of a research map, another graphical representation of causality that, in addition to ontological information, includes epistemological (specifically, methodological) information regarding the evidence behind causal assertions.</p>
<sec>
<title>3.1. Causal graphs</title>
<p>A causal model can encode the causal structure of its variables with a <italic>causal graph</italic>. A causal graph is a directed graph with a set of variables (nodes) and a set of directed edges among the variables. A directed edge between two variables in the graph conveys that the variable at the tail of the edge has a direct causal effect on the variable at the head (Spirtes et al., <xref ref-type="bibr" rid="B30">2000</xref>; Pearl, <xref ref-type="bibr" rid="B23">2009</xref>).</p>
<p>Via its structure (i.e., its connectivity), a causal graph encodes probabilistic dependence and independence relations. The graphical criterion known as <italic>d-separation</italic> (Pearl, <xref ref-type="bibr" rid="B23">2009</xref>) can be used to read such relations off a causal graph; d-separation thus translates the edges of a graph into probabilistic statements. There is a key connection between d-separation and probabilistic independence relations: for a directed acyclic graph (DAG) under the causal Markov and causal faithfulness assumptions (Spirtes et al., <xref ref-type="bibr" rid="B30">2000</xref>), two variables are d-separated given a conditioning set if and only if they are conditionally independent given that set in the probability distribution associated with the DAG (Pearl, <xref ref-type="bibr" rid="B23">2009</xref>).</p>
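<p>To make the reading of (in)dependence relations off a graph concrete, the following Python sketch tests d-separation in a small DAG. It is an illustration only, not code from the works cited here. It assumes that a graph is given as a list of (tail, head) pairs, and it uses the standard moralized-ancestral-graph test for d-separation; the function names <monospace>ancestors</monospace> and <monospace>d_separated</monospace> are hypothetical.</p>
<preformat>
```python
from itertools import combinations


def ancestors(edges, nodes):
    """Return the given nodes together with all of their ancestors."""
    found = set(nodes)
    frontier = list(nodes)
    while frontier:
        child = frontier.pop()
        for tail, head in edges:
            if head == child and tail not in found:
                found.add(tail)
                frontier.append(tail)
    return found


def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated by the set `given` in the DAG `edges`."""
    given = set(given)
    # Step 1: keep only x, y, the conditioning set, and their ancestors.
    keep = ancestors(edges, {x, y} | given)
    sub = [(t, h) for t, h in edges if t in keep and h in keep]
    # Step 2: moralize, i.e., link parents that share a child, then drop directions.
    links = {frozenset(edge) for edge in sub}
    for node in keep:
        parents = [t for t, h in sub if h == node]
        for a, b in combinations(parents, 2):
            links.add(frozenset((a, b)))
    # Step 3: delete the conditioning nodes and test whether y is reachable from x.
    reached, frontier = {x}, [x]
    while frontier:
        current = frontier.pop()
        for link in links:
            if current in link:
                (other,) = link - {current}
                if other not in given and other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return y not in reached
```
</preformat>
<p>Under these assumptions, for the chain <italic>X</italic> &#x02192; <italic>Y</italic> &#x02192; <italic>Z</italic> the call <monospace>d_separated(chain, "X", "Z", {"Y"})</monospace> reports that <italic>X</italic> and <italic>Z</italic> are separated by <italic>Y</italic>, whereas with an empty conditioning set they are not. For the collider <italic>X</italic> &#x02192; <italic>Y</italic> &#x02190; <italic>Z</italic> the pattern is reversed.</p>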
<sec>
<title>3.1.1. Markov equivalence classes</title>
<p>Per the rules of d-separation, even if two or more causal graphs have different structures, they can encode the same (in)dependencies. A set of causal graphs that all imply the same (in)dependencies is called a <italic>Markov equivalence class</italic> (Spirtes et al., <xref ref-type="bibr" rid="B30">2000</xref>), or simply an equivalence class. The right-hand side of Figure <xref ref-type="fig" rid="F2">2</xref> gives an example of an equivalence class consisting of three distinct graphs: <italic>X</italic> &#x02192; <italic>Y</italic> &#x02192; <italic>Z</italic>; <italic>X</italic> &#x02190; <italic>Y</italic> &#x02192; <italic>Z</italic>; and <italic>X</italic> &#x02190; <italic>Y</italic> &#x02190; <italic>Z</italic>. Although the graphs disagree on the orientation of the edges, they all imply the same (in)dependence relations: <italic>X</italic> <inline-graphic xlink:href="fninf-11-00012-i0001.tif"/> <italic>Y</italic>; <italic>Y</italic> <inline-graphic xlink:href="fninf-11-00012-i0001.tif"/> <italic>Z</italic>; <italic>X</italic> <inline-graphic xlink:href="fninf-11-00012-i0001.tif"/> <italic>Z</italic>; and <italic>X</italic> &#x02AEB; <italic>Z</italic> &#x0007C; <italic>Y</italic>. Thus, these graphs are observationally Markov equivalent; that is, they are indistinguishable given only the observed (in)dependence relations.</p>
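<p>The three-graph class above can be recovered by brute force. The Python sketch below is illustrative and rests on an assumption not stated in this article: it uses the standard graphical characterization that two DAGs are Markov equivalent exactly when they share the same adjacencies (skeleton) and the same unshielded colliders. All function names are hypothetical, and exhaustive enumeration is practical only for very small systems.</p>
<preformat>
```python
from itertools import combinations, product


def is_acyclic(nodes, edges):
    """Repeatedly strip nodes with no incoming edge; a cycle leaves nodes behind."""
    remaining = set(nodes)
    while remaining:
        sources = {n for n in remaining
                   if not any(h == n and t in remaining for t, h in edges)}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags(nodes):
    """Yield every DAG on `nodes` as a frozenset of (tail, head) pairs."""
    pairs = list(combinations(nodes, 2))
    for choice in product(("none", "forward", "backward"), repeat=len(pairs)):
        edges = set()
        for (a, b), orientation in zip(pairs, choice):
            if orientation == "forward":
                edges.add((a, b))
            elif orientation == "backward":
                edges.add((b, a))
        if is_acyclic(nodes, edges):
            yield frozenset(edges)


def signature(edges):
    """Skeleton plus unshielded colliders: equal signatures, equal class."""
    skeleton = frozenset(frozenset(edge) for edge in edges)
    colliders = set()
    for (a, head_a), (b, head_b) in combinations(sorted(edges), 2):
        if head_a == head_b and frozenset((a, b)) not in skeleton:
            colliders.add((frozenset((a, b)), head_a))
    return skeleton, frozenset(colliders)


def equivalence_class(nodes, dag):
    """All DAGs on `nodes` that are Markov equivalent to `dag`."""
    target = signature(dag)
    return [g for g in all_dags(nodes) if signature(g) == target]
```
</preformat>
<p>Called with the nodes <italic>X</italic>, <italic>Y</italic>, <italic>Z</italic> and the chain <italic>X</italic> &#x02192; <italic>Y</italic> &#x02192; <italic>Z</italic>, <monospace>equivalence_class</monospace> returns the three graphs listed above; the collider <italic>X</italic> &#x02192; <italic>Y</italic> &#x02190; <italic>Z</italic> forms a class of its own.</p>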
<p>An equivalence class can be extremely large, because the number of possible causal graphs is super-exponential in the number of variables in the model. For a system with only six variables, there are over three million possible acyclic causal graphs (Robinson, <xref ref-type="bibr" rid="B24">1973</xref>); if we allow for feedback (cyclicity), there are 2<sup>30</sup> possible graphs. Causal discovery algorithms (that is, methods to identify the causal structure of a system) often cannot fully specify a single causal graph that accounts for the data; instead, they identify an equivalence class of graphs that satisfy the given (in)dependence relations. With only observational data, the graphs in an equivalence class will share the same adjacencies and vary in their edges&#x00027; orientations. Interventional data, where the experimenter manipulates one of the variables, can eliminate specific causal structures from consideration.</p>
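<p>The graph counts quoted above can be checked with a few lines of Python. This sketch assumes the standard inclusion-exclusion recurrence for the number of labeled DAGs, commonly attributed to Robinson, and counts cyclic graphs as directed graphs without self-loops; the function names are hypothetical.</p>
<preformat>
```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (inclusion-exclusion over source sets)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


def count_cyclic_graphs(n):
    """Directed graphs without self-loops: each of the n(n-1) ordered pairs is on or off."""
    return 2 ** (n * (n - 1))
```
</preformat>
<p>For six variables, <monospace>count_dags(6)</monospace> gives 3,781,503 acyclic graphs and <monospace>count_cyclic_graphs(6)</monospace> gives 2<sup>30</sup>, in line with the figures in the text.</p>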
</sec>
</sec>
<sec>
<title>3.2. Research maps</title>
<p>Although experimental data can be used to derive causal graphs, the source data from publications are often not available. Given the abundance of research articles, a lack of source data could be addressed in part by methods to extract causal information from literature. One such method is to annotate literature using a <italic>research map</italic>, a graphical representation with epistemic components relevant to experiment planning. Like a causal graph, a research map is a graphical representation of causal assertions, but it also includes methodological information pertaining to the evidence for these assertions. Such evidence is assessed using integration principles that operationalize experimental strategies for testing causal relations; these same principles can be used prospectively and explicitly to plan experiments (Landreth and Silva, <xref ref-type="bibr" rid="B21">2013</xref>; Silva et al., <xref ref-type="bibr" rid="B27">2014</xref>; Silva and M&#x000FC;ller, <xref ref-type="bibr" rid="B28">2015</xref>). See Figure <xref ref-type="fig" rid="F1">1</xref> for an example of a research map, which was created using ResearchMaps<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref>, a web application that implements this framework<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>An example of a research map that depicts the causal information in Costa et al. (<xref ref-type="bibr" rid="B9">2002</xref>)</bold>. All three types of causal relations are shown; for example, an excitatory edge from K-ras to LTP, an inhibitory edge from NF1 to GABA inhibition, and a no-connection edge from N-ras to hippocampal learning. The symbol on the edge from NF1 to hippocampal learning (&#x02193;) indicates that at least one negative intervention experiment was performed to test the relation between these two phenomena. The edges in gray (from GABA inhibition to LTP, and from LTP to hippocampal learning) are hypothetical edges: putative causal assertions for which the research article does not present empirical evidence. Hypothetical edges are useful for incorporating assumptions or background knowledge about a causal system; they give the research map additional structure to facilitate interpretation of the empirical results.</p></caption>
<graphic xlink:href="fninf-11-00012-g0001.tif"/>
</fig>
<p>Although, for usability, ResearchMaps implements a simple approach to identify nodes, the research map representation is agnostic to the specific ontology used, and can in principle identify nodes with any ontology. Research maps should therefore be generally applicable: rather than committing to a particular ontology, they emphasize the epistemological information used to gauge experimental evidence.</p>
<p>In a research map, each node represents a biological phenomenon (e.g., a protein, behavior, etc.), and each directed edge (from an <italic>agent</italic> node to a <italic>target</italic> node) represents one of three possible types of causal relations: (i) an excitatory edge (sharp arrowhead) indicates that the agent promotes its target; (ii) an inhibitory edge (flat arrowhead) indicates that the agent inhibits its target; and (iii) a no-connection edge (dotted line; circular arrowhead) indicates that the agent has no measurable effect on its target<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>. Because they represent phenomena and their relations, a research map&#x00027;s nodes and edges are thus ontological components.</p>
<p>To complement this ontological information, annotations on the edge of a research map give epistemological information regarding the type and amount of evidence for the edge&#x00027;s relation. The <italic>type</italic> of evidence for an edge is conveyed via symbols, one for each of four possible types of experiments that can give evidence for the relation. In a (i) positive intervention experiment (&#x02191;) and a (ii) negative intervention experiment (&#x02193;), either the quantity or probability of the agent is actively increased or decreased, respectively; in a (iii) positive non-intervention experiment (&#x02205;<sup>&#x02191;</sup>) and a (iv) negative non-intervention experiment (&#x02205;<sup>&#x02193;</sup>), an increase or decrease, respectively, in either the quantity or probability of the agent is observed, without intervention. The <italic>amount</italic> of evidence represented by an edge is conveyed by a score that, for convenience, ranges from zero to one. The score is calculated using integration principles (Silva et al., <xref ref-type="bibr" rid="B27">2014</xref>) whose semantics reflect two common modes of reasoning in neuroscience: consistency and convergence. A detailed description of this score&#x00027;s calculation is beyond the scope of this article; we instead outline the epistemological concepts behind the score as they relate to experiment planning.</p>
<p>The integration principle of <italic>consistency</italic> states that evidence for a particular causal relation is stronger when an experiment is repeated and produces the same result. The consistency (i.e., the reproducibility) of a finding is important because any one experiment is always prone to errors and artifacts. Neuroscientists repeat experiments to mitigate this issue. The integration principle of <italic>convergence</italic> states that evidence for a particular causal relation is stronger when different types of experiments (e.g., positive and negative interventions and non-interventions) produce evidence for the same type of causal relation. The convergence of a finding is important because any one type of experiment, even when repeated multiple times with consistent results, can be biased and thus give a misleading perspective on the system under consideration. By performing multiple types of experiments, neuroscientists mitigate the risk of experimental artifacts.</p>
</sec>
</sec>
<sec id="s4">
<title>4. Planning experiments with graphical representations of causality</title>
<p>Below we outline two complementary approaches to experiment planning using graphical representations of causality. In the first approach, a constraint-based algorithm is used to find the equivalence class that satisfies a set of causal-structure constraints (Hyttinen et al., <xref ref-type="bibr" rid="B17">2014</xref>). Characterization of the equivalence class&#x00027;s uncertainty (underdetermination) then provides a quantitative framework for experiment planning, where the goal is to minimize uncertainty. In the second approach, the integration principles of research maps are extended to multiple edges to provide additional guidelines for experiment planning, where the goal is to maximize evidence.</p>
<sec>
<title>4.1. Minimizing uncertainty in an equivalence class</title>
<p>Because the space of causal models is so large, a researcher is unlikely to be able to consider all of the causal graphs whose structures accommodate a set of results. What seems more likely is that domain knowledge and past experience will lead the researcher to prefer specific causal structures over others. Although this informed subjectivity can be practically useful, it could also bias researchers toward familiar causal structures. The method we outline below uses a computer to search the model space exhaustively; therefore, all graphs that accommodate either data or domain knowledge from the literature remain viable candidates. This approach has become feasible due to a recent advance in causal discovery (Hyttinen et al., <xref ref-type="bibr" rid="B17">2014</xref>).</p>
<p>For this constraint-based approach to causal discovery, research maps can be used as an intuitive and accessible representation for neuroscientists to articulate causal-structure constraints in a familiar language (Silva et al., <xref ref-type="bibr" rid="B27">2014</xref>). The edges in the resulting research maps can then be translated into constraints on causal structure, which are expressed probabilistically. For example, if an edge in a research map represents a positive intervention in an agent, <italic>A</italic>, and a resulting change in a target, <italic>T</italic>, this edge is translated into the causal-structure constraint <italic>A</italic> <inline-graphic xlink:href="fninf-11-00012-i0001.tif"/> <italic>T</italic> &#x0007C; &#x02205; &#x0007C;&#x0007C; <italic>A</italic>, which states that <italic>A</italic> is not independent of <italic>T</italic> when not conditioning on any variables, and when intervening on <italic>A</italic>.</p>
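<p>The translation from a research map edge to a causal-structure constraint can be sketched as a small data structure. The Python below is a hypothetical illustration, not the ResearchMaps implementation. It assumes that intervention experiments place the agent in the intervention set, that non-intervention experiments leave the intervention set empty, and (an assumption of this sketch) that a no-connection edge is read as an independence while excitatory and inhibitory edges are read as dependences. The class and function names are invented for the example.</p>
<preformat>
```python
from dataclasses import dataclass

INTERVENTIONS = {"positive intervention", "negative intervention"}
NON_INTERVENTIONS = {"positive non-intervention", "negative non-intervention"}


@dataclass(frozen=True)
class Constraint:
    """An (in)dependence between a and b, given a conditioning set and an intervention set."""
    a: str
    b: str
    independent: bool
    conditioning: frozenset = frozenset()
    intervention: frozenset = frozenset()

    def __str__(self):
        relation = "indep" if self.independent else "dep"
        cond = "{" + ", ".join(sorted(self.conditioning)) + "}"
        intv = "{" + ", ".join(sorted(self.intervention)) + "}"
        return f"{self.a} {relation} {self.b} | {cond} || {intv}"


def edge_to_constraint(agent, target, relation, experiment):
    """Translate one research map edge into a constraint (illustrative mapping)."""
    if experiment not in INTERVENTIONS | NON_INTERVENTIONS:
        raise ValueError(f"unknown experiment type: {experiment}")
    # Assumption of this sketch: only a no-connection edge asserts independence.
    independent = relation == "no-connection"
    # Intervention experiments manipulate the agent; non-interventions observe it.
    intervention = frozenset({agent}) if experiment in INTERVENTIONS else frozenset()
    return Constraint(agent, target, independent, frozenset(), intervention)
```
</preformat>
<p>With this mapping, an excitatory edge from <italic>A</italic> to <italic>T</italic> supported by a positive intervention prints as <monospace>A dep T | {} || {A}</monospace>, which mirrors the constraint described in the text.</p>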
<p>To accommodate cases of conflicting constraints, each constraint is assigned a weight, which represents a level of confidence. One option for weights is to use the scores of research map edges from which the constraints were derived. Epistemic information regarding the methodological diversity of how those constraints were derived would then inform the search over causal graphs. Assigning weights to constraints allows the causal discovery problem to be formulated as a constrained optimization: a Boolean maximum-satisfiability solver (Biere et al., <xref ref-type="bibr" rid="B3">2009</xref>) searches for the causal graph that minimizes the sum of weights of unsatisfied constraints (Hyttinen et al., <xref ref-type="bibr" rid="B17">2014</xref>). Having found the graph that is optimal in this sense, a forward inference method (Hyttinen et al., <xref ref-type="bibr" rid="B18">2013</xref>) can be used to obtain the equivalence class of graphs that encode the same (in)dependence relations. A system diagram for this method is shown in Figure <xref ref-type="fig" rid="F2">2</xref>.</p>
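<p>The objective of this search can be illustrated on a three-variable system. The Python sketch below deliberately swaps the maximum-satisfiability solver for exhaustive enumeration of acyclic graphs, which is feasible only for very small systems; it is not the algorithm of Hyttinen et al. It checks each constraint by d-separation and models an intervention by deleting the edges into the intervened variables. The constraint weights and all names are hypothetical.</p>
<preformat>
```python
from itertools import combinations, product

NODES = ("X", "Y", "Z")


def acyclic(edges):
    remaining = set(NODES)
    while remaining:
        sources = {n for n in remaining
                   if not any(h == n and t in remaining for t, h in edges)}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags():
    pairs = list(combinations(NODES, 2))
    for choice in product((0, 1, 2), repeat=len(pairs)):
        edges = set()
        for (a, b), c in zip(pairs, choice):
            if c == 1:
                edges.add((a, b))
            elif c == 2:
                edges.add((b, a))
        if acyclic(edges):
            yield frozenset(edges)


def d_separated(edges, x, y, given):
    keep, frontier = {x, y} | set(given), [x, y, *given]
    while frontier:  # collect ancestors of x, y, and the conditioning set
        child = frontier.pop()
        for t, h in edges:
            if h == child and t not in keep:
                keep.add(t)
                frontier.append(t)
    sub = [(t, h) for t, h in edges if t in keep and h in keep]
    links = {frozenset(e) for e in sub}
    for (a, head_a), (b, head_b) in combinations(sub, 2):
        if head_a == head_b:
            links.add(frozenset((a, b)))  # marry parents of a common child
    reached, frontier = {x}, [x]
    while frontier:  # undirected search that avoids conditioning nodes
        cur = frontier.pop()
        for link in links:
            if cur in link:
                (other,) = link - {cur}
                if other not in given and other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return y not in reached


def penalty(dag, constraints):
    """Sum of the weights of the constraints that `dag` fails to satisfy."""
    total = 0.0
    for a, b, independent, given, intervened, weight in constraints:
        surgered = [(t, h) for t, h in dag if h not in intervened]
        if d_separated(surgered, a, b, given) != independent:
            total += weight
    return total


def best_graphs(constraints):
    scored = [(penalty(g, constraints), g) for g in all_dags()]
    low = min(score for score, _ in scored)
    return low, [g for score, g in scored if score == low]


# Hypothetical constraints: (a, b, independent, conditioning, intervention, weight)
EXAMPLE = [
    ("X", "Y", False, frozenset(), frozenset(), 0.8),
    ("Y", "Z", False, frozenset(), frozenset(), 0.8),
    ("X", "Z", False, frozenset(), frozenset(), 0.6),
    ("X", "Z", True, frozenset({"Y"}), frozenset(), 0.6),
    ("X", "Y", True, frozenset(), frozenset(), 0.2),  # conflicts with the first
]
```
</preformat>
<p>On these illustrative constraints, <monospace>best_graphs(EXAMPLE)</monospace> leaves only the low-weight conflicting constraint unsatisfied and returns the three chain graphs over <italic>X</italic>, <italic>Y</italic>, and <italic>Z</italic>. Adding a dependence between <italic>X</italic> and <italic>Y</italic> under an intervention on <italic>X</italic> narrows the result to <italic>X</italic> &#x02192; <italic>Y</italic> &#x02192; <italic>Z</italic>.</p>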
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>A system diagram for planning experiments with causal graphs</bold>. In this approach to experiment planning, research articles are annotated to produce a research map. Each edge in the research map is then translated into a causal-structure constraint of the form <italic>A</italic> &#x02AEB; <italic>B</italic> &#x0007C; C &#x0007C;&#x0007C; J, where C is a conditioning set and J is the intervention set. Both C and J can be the empty set (&#x02205;), as is the case for the non-intervention experiments depicted above (indicated by &#x02205;<sup>&#x02191;</sup> and &#x02205;<sup>&#x02193;</sup>). To handle conflicting constraints, each causal-structure constraint is assigned a weight. A maximum-satisfiability solver then finds the causal graph that satisfies these constraints, while minimizing the sum of weights of (conflicting) unsatisfied constraints. With this one optimal graph, a forward inference method is used to identify the complete equivalence class of causal graphs that all imply the same (in)dependence relations. This equivalence class is then used as the basis for experiment planning. (Note that in the research map, the two experiments involving <italic>X</italic> and <italic>Z</italic> are shown as separate edges for clarity).</p></caption>
<graphic xlink:href="fninf-11-00012-g0002.tif"/>
</fig>
<p>The uncertainty (i.e., underdetermination) of the resulting equivalence class can then be characterized, and experiment planning can be formalized as the search for experiments that would most effectively reduce this uncertainty. This experiment-selection criterion requires a metric that can quantify an experiment&#x00027;s reduction of uncertainty. We thus need a metric to characterize the uncertainty of an equivalence class, so that the metrics for different equivalence classes (i.e., before and after a particular experiment) can be compared. Below we outline a few approaches to defining such metrics.</p>
<p>A na&#x000EF;ve approach would be to quantify the uncertainty of an equivalence class by simply counting the number of graphs it contains. By this metric, an equivalence class with <italic>n</italic> graphs would carry half the uncertainty of an equivalence class with 2<italic>n</italic> graphs. While perhaps useful, this metric fails to account for network connectivity (e.g., the existence and orientation of edges).</p>
<p>A more nuanced approach is the following &#x0201C;degrees-of-freedom&#x0201D; strategy. Consider that in any causal graph, each pair of variables will have one of four possible edge relations: (i) a &#x0201C;left-to-right&#x0201D; orientation (e.g., <italic>X</italic> &#x02192; <italic>Y</italic>), (ii) a &#x0201C;right-to-left&#x0201D; orientation (e.g., <italic>X</italic> &#x02190; <italic>Y</italic>), (iii) neither orientation (e.g., <italic>XY</italic>),<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref> or, when we allow for feedback, (iv) both orientations (e.g., <italic>X</italic> &#x021C6; <italic>Y</italic>). Once a particular edge relation is instantiated for a pair of variables (e.g., <italic>X</italic> &#x02192; <italic>Y</italic>), there are three other possible edge relations, three &#x0201C;degrees of freedom,&#x0201D; that the pair can take. For a system with <italic>n</italic> variables and thus <inline-formula><mml:math id="M5"><mml:mo>(</mml:mo><mml:mfrac linethickness="0"><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:math></inline-formula> pairs of variables, the number of degrees of freedom for a causal graph is then <inline-formula><mml:math id="M6"><mml:mn>3</mml:mn><mml:mo>(</mml:mo><mml:mfrac linethickness="0"><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:math></inline-formula>. For each pair of variables in an equivalence class, we can count the number of instantiations that remain undetermined. For example, in the equivalence class of Figure <xref ref-type="fig" rid="F2">2</xref>, the graphs all agree that there is no edge between <italic>X</italic> and <italic>Z</italic> and that edges exist between the node pairs (<italic>X, Y</italic>) and (<italic>Y, Z</italic>); however, they specify different orientations for these latter two edges. 
By this metric, this equivalence class has two degrees of freedom. It may be useful to express this metric as a percentage: again, for the equivalence class in Figure <xref ref-type="fig" rid="F2">2</xref>, <inline-formula><mml:math id="M7"><mml:mn>2</mml:mn><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>3</mml:mn><mml:mo>(</mml:mo><mml:mfrac linethickness="0"><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02248;</mml:mo><mml:mn>22</mml:mn><mml:mo>.</mml:mo><mml:mn>2</mml:mn><mml:mi>%</mml:mi></mml:math></inline-formula> of the degrees of freedom remain.</p>
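<p>The counting procedure above can be sketched in Python. This is a minimal illustration rather than a reference implementation. It assumes that the equivalence class of Figure <xref ref-type="fig" rid="F2">2</xref> consists of the three graphs <italic>X</italic> &#x02192; <italic>Y</italic> &#x02192; <italic>Z</italic>, <italic>X</italic> &#x02190; <italic>Y</italic> &#x02190; <italic>Z</italic>, and <italic>X</italic> &#x02190; <italic>Y</italic> &#x02192; <italic>Z</italic> (a reconstruction consistent with the description of this class in the text), and it interprets the degrees of freedom of a pair as the number of distinct edge relations that the class exhibits for that pair, minus one. The names <monospace>FIGURE_2_CLASS</monospace>, <monospace>edge_relation</monospace>, <monospace>degrees_of_freedom</monospace>, and <monospace>remaining_fraction</monospace> are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations

VARIABLES = ["X", "Y", "Z"]

# Assumed reconstruction of the Figure 2 equivalence class.
# Each graph is a set of directed edges written as (parent, child).
FIGURE_2_CLASS = [
    {("X", "Y"), ("Y", "Z")},  # X -> Y -> Z
    {("Y", "X"), ("Z", "Y")},  # X <- Y <- Z
    {("Y", "X"), ("Y", "Z")},  # X <- Y -> Z
]


def edge_relation(graph, a, b):
    """Classify the pair (a, b) as 'forward', 'backward', 'both', or 'none'."""
    forward = (a, b) in graph
    backward = (b, a) in graph
    if forward and backward:
        return "both"
    if forward:
        return "forward"
    if backward:
        return "backward"
    return "none"


def degrees_of_freedom(graphs, variables):
    """Sum, over all pairs, the number of observed relations minus one."""
    total = 0
    for a, b in combinations(variables, 2):
        relations = {edge_relation(g, a, b) for g in graphs}
        total += len(relations) - 1
    return total


def remaining_fraction(graphs, variables):
    """Express the degrees of freedom relative to the maximum of 3 per pair."""
    n_pairs = len(variables) * (len(variables) - 1) // 2
    return degrees_of_freedom(graphs, variables) / (3 * n_pairs)


dof = degrees_of_freedom(FIGURE_2_CLASS, VARIABLES)
frac = remaining_fraction(FIGURE_2_CLASS, VARIABLES)
print(dof, f"{100 * frac:.1f}%")  # 2 22.2%
```
]]></preformat>
<p>Under these assumptions the sketch reproduces the two degrees of freedom and the 22.2% figure given in the text. A class containing a single graph yields zero for every pair.</p>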
<p>An even more nuanced approach is based on the concept of <italic>edge entropy</italic>. Tong and Koller (<xref ref-type="bibr" rid="B32">2001</xref>) consider graphs with three edge relations: <italic>X</italic> &#x02192; <italic>Y</italic>, <italic>X</italic> &#x02190; <italic>Y</italic>, and <italic>XY</italic>. Given a distribution <italic>P</italic> over these relations, they quantify the uncertainty regarding the relation of an edge using the edge entropy expression
<disp-formula id="E1"><label>(1)</label><mml:math id="M8"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>H</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x000A0;log&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x02190;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x000A0;log&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x02190;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:mo>&#x000A0;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x000A0;log&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:mo>&#x000A0;</mml:mo><mml:mi>Y</mml:mi><mml:mo 
stretchy='false'>).</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
This equation can be extended naturally to accommodate the fourth edge relation (i.e., <italic>X</italic> &#x021C6; <italic>Y</italic>) that was considered for the degrees-of-freedom metric:
<disp-formula id="E2"><label>(2)</label><mml:math id="M9"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>H</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x000A0;log&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x02190;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x000A0;log&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x02190;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:mo>&#x000A0;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x000A0;log&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:mo>&#x000A0;</mml:mo><mml:mi>Y</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x021C6;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x000A0;log&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x021C6;</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
Instead of actual probabilities for this expression, we can use the empirical distribution exhibited by a given equivalence class, and include only terms with nonzero values for <italic>P</italic>(&#x000B7;). For example, in the equivalence class of Figure <xref ref-type="fig" rid="F2">2</xref>, one-third of the graphs orient the edge between <italic>X</italic> and <italic>Y</italic> with <italic>X</italic> as the causal parent, and two-thirds orient it with <italic>Y</italic> as the causal parent. With base-2 logarithms, the entropy of this edge is then <italic>H</italic>(<italic>X, Y</italic>) &#x0003D; &#x02212;(1/3) log (1/3) &#x02212; (2/3) log (2/3) &#x02248; 0.918. For the edge between <italic>Y</italic> and <italic>Z</italic>, two-thirds of the graphs show the causal parent as <italic>Y</italic>, and one-third show the causal parent as <italic>Z</italic>. The entropy of this edge is likewise <italic>H</italic>(<italic>Y, Z</italic>) &#x0003D; &#x02212;(2/3) log (2/3) &#x02212; (1/3) log (1/3) &#x02248; 0.918. Appropriately, once an edge&#x00027;s existence and orientation are determined, the entropy drops to zero. Such is the case for the edge relation between <italic>X</italic> and <italic>Z</italic> in the equivalence class of Figure <xref ref-type="fig" rid="F2">2</xref>: every graph agrees on the absence of this edge. The entropy of an entire equivalence class can then be defined as the sum (or average) of the entropies for every pair of variables in the system.</p>
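<p>The empirical edge entropy can be computed with a few lines of Python. The following is a sketch under stated assumptions: the three graphs listed in <monospace>FIGURE_2_CLASS</monospace> are a reconstruction of the Figure <xref ref-type="fig" rid="F2">2</xref> class from the proportions quoted above, logarithms are taken to base 2 (the base that reproduces the value 0.918), and the function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from itertools import combinations
from math import log2

VARIABLES = ["X", "Y", "Z"]

# Assumed reconstruction of the Figure 2 equivalence class.
# Each graph is a set of directed edges written as (parent, child).
FIGURE_2_CLASS = [
    {("X", "Y"), ("Y", "Z")},  # X -> Y -> Z
    {("Y", "X"), ("Z", "Y")},  # X <- Y <- Z
    {("Y", "X"), ("Y", "Z")},  # X <- Y -> Z
]


def edge_relation(graph, a, b):
    """Classify the pair (a, b) as 'forward', 'backward', 'both', or 'none'."""
    forward = (a, b) in graph
    backward = (b, a) in graph
    if forward and backward:
        return "both"
    if forward:
        return "forward"
    if backward:
        return "backward"
    return "none"


def edge_entropy(graphs, a, b):
    """Entropy (in bits) of the empirical distribution over edge relations.

    Only relations that occur in the class are counted, so every
    probability is nonzero. The sum of p * log2(1 / p) equals
    the negated sum of p * log2(p) used in Equation (2).
    """
    counts = Counter(edge_relation(g, a, b) for g in graphs)
    n = len(graphs)
    return sum((c / n) * log2(n / c) for c in counts.values())


def class_entropy(graphs, variables):
    """Sum of the edge entropies over every pair of variables."""
    return sum(edge_entropy(graphs, a, b) for a, b in combinations(variables, 2))


print(round(edge_entropy(FIGURE_2_CLASS, "X", "Y"), 3))  # 0.918
print(round(edge_entropy(FIGURE_2_CLASS, "Y", "Z"), 3))  # 0.918
print(edge_entropy(FIGURE_2_CLASS, "X", "Z"))            # 0.0
```
]]></preformat>
<p>Dividing the result of <monospace>class_entropy</monospace> by the number of pairs gives the average rather than the sum.</p>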
<p>We can use such metrics to ask: Which experiment, if performed, would most effectively minimize the uncertainty of the equivalence class? The ideal experiment is then the one that minimizes these metrics. For example, when applied to the equivalence class of Figure <xref ref-type="fig" rid="F2">2</xref>, these metrics would prioritize experiments to test the relations between the pairs (<italic>X, Y</italic>) and (<italic>Y, Z</italic>): compared to the pair (<italic>X, Z</italic>), the other pairs have more degrees of freedom, and thus higher entropies. The possible outcomes of an experiment could be expressed in terms of the causal-structure constraints that could result, and these potential constraints could be used to determine the potential equivalence classes that could result from the experiment. The uncertainty metrics for both the current and prospective equivalence classes could be compared, yielding a method for quantifying the <italic>information gain</italic> of an experiment.</p>
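<p>One way to make this comparison concrete is sketched below. The sketch rests on several assumptions that go beyond the text: the equivalence class of Figure <xref ref-type="fig" rid="F2">2</xref> is reconstructed as three graphs, an experiment is modeled as a yes/no predicate on graphs (here, whether a given directed edge is present), each outcome simply retains the graphs consistent with it, and outcomes are weighted by the fraction of graphs that predict them. All names, including <monospace>information_gain</monospace>, are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from itertools import combinations
from math import log2

VARIABLES = ["X", "Y", "Z"]

# Assumed reconstruction of the Figure 2 equivalence class.
FIGURE_2_CLASS = [
    {("X", "Y"), ("Y", "Z")},  # X -> Y -> Z
    {("Y", "X"), ("Z", "Y")},  # X <- Y <- Z
    {("Y", "X"), ("Y", "Z")},  # X <- Y -> Z
]


def edge_relation(graph, a, b):
    """Classify the pair (a, b) as 'forward', 'backward', 'both', or 'none'."""
    forward = (a, b) in graph
    backward = (b, a) in graph
    if forward and backward:
        return "both"
    if forward:
        return "forward"
    if backward:
        return "backward"
    return "none"


def class_entropy(graphs, variables):
    """Sum of base-2 edge entropies over all pairs of variables."""
    total = 0.0
    n = len(graphs)
    for a, b in combinations(variables, 2):
        counts = Counter(edge_relation(g, a, b) for g in graphs)
        total += sum((c / n) * log2(n / c) for c in counts.values())
    return total


def information_gain(graphs, variables, outcome):
    """Return (gain_if_true, gain_if_false, expected_gain).

    `outcome` is a predicate on a graph. A gain of None marks an
    outcome that no graph in the current class predicts.
    """
    before = class_entropy(graphs, variables)
    consistent = [g for g in graphs if outcome(g)]
    inconsistent = [g for g in graphs if not outcome(g)]
    gains, expected = [], 0.0
    for subset in (consistent, inconsistent):
        if not subset:
            gains.append(None)
            continue
        gain = before - class_entropy(subset, variables)
        gains.append(gain)
        # Assumption: weight each outcome by the share of graphs predicting it.
        expected += (len(subset) / len(graphs)) * gain
    return gains[0], gains[1], expected


# Hypothetical experiments: one settles X -> Y, the other settles X -> Z.
gain_xy = information_gain(FIGURE_2_CLASS, VARIABLES, lambda g: ("X", "Y") in g)
gain_xz = information_gain(FIGURE_2_CLASS, VARIABLES, lambda g: ("X", "Z") in g)
print(gain_xy[2] > gain_xz[2])  # True
```
]]></preformat>
<p>In this toy setting, the experiment on (<italic>X, Y</italic>) has a positive expected gain, whereas the experiment on (<italic>X, Z</italic>) has none, because every graph already agrees on that pair. A realistic implementation would derive the prospective equivalence classes from causal-structure constraints rather than from a simple filter.</p>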
</sec>
<sec>
<title>4.2. Maximizing evidence in a research map</title>
<p>Another approach to experiment planning is to rank experiments by the value of the evidence they could potentially yield. Given that convergence and consistency are used to gauge evidence in research maps, these principles can also be used to determine which experiments could most effectively strengthen or weaken the evidence for a particular edge. For example, if the evidence for an edge is based solely on a positive intervention experiment, then the principle of convergence would suggest that negative interventions and non-intervention experiments could be used to strengthen the evidence for that edge. Additionally, the principle of consistency would suggest that repetitions of any one of these experiments could strengthen the evidence. This reasoning represents a straightforward approach commonly used by neuroscientists to plan experiments. Beyond just single edges, these integration rules can be extended to entire research maps. To facilitate the presentation of these principles, we limit our discussion to research maps that contain only three nodes, representing part of a signal pathway or any other biological cascade.</p>
<p>It is important to remember that experiments are usually carried out with reference to a specific hypothesis that is commonly suggested by findings and theories. In research maps, hypotheses are represented by hypothetical edges. Unlike edges representing empirical experiments, hypothetical edges have no score or experiment symbols (see Figure <xref ref-type="fig" rid="F1">1</xref>). Hypothetical edges can thus organize and structure empirical edges based on actual experiments. Although the causal relations represented by hypothetical edges cannot always be directly tested&#x02014;perhaps we lack the required tools&#x02014;they nevertheless inform the choice among feasible experiments by contextualizing empirical results within specific theories, interpretations, etc.</p>
<p>With a given research map, we can use a number of principles, including the <italic>pioneering</italic> rule, to develop its evidence. This <italic>pioneering</italic> rule states that when a research map&#x00027;s edges imply the existence of an edge that spans other edges, testing this edge can significantly inform the model. For example, if we have a research map with empirical edges <italic>X</italic> &#x02192; <italic>Y</italic> &#x02192; <italic>Z</italic>, then designing an experiment to test the connection <italic>X</italic> &#x02192; <italic>Z</italic> will likely be instructive as to whether <italic>X</italic> contributes to <italic>Z</italic>. Finding that manipulations of <italic>X</italic> reliably affect <italic>Z</italic>, for example, will provide further evidence for the existence of a pathway from <italic>X</italic> to <italic>Z</italic>.</p>
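<p>The pioneering rule can be illustrated with a short graph search. This sketch assumes that a research map is reduced to a set of directed (parent, child) pairs and that a candidate edge is any pair connected by a path of two or more edges but lacking a direct edge of its own. The function name <monospace>pioneering_candidates</monospace> is hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def pioneering_candidates(edges):
    """Return untested (ancestor, descendant) pairs implied by longer paths."""
    edges = set(edges)
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)

    candidates = set()
    for start in children:
        # Depth-first search for every node reachable from `start`.
        seen, stack = set(), list(children[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children.get(node, ()))
        # Direct children already have an edge, so only nodes reached
        # through intermediate nodes remain as candidates.
        candidates |= {
            (start, node)
            for node in seen
            if node != start and (start, node) not in edges
        }
    return candidates


research_map = {("X", "Y"), ("Y", "Z")}
print(pioneering_candidates(research_map))  # {('X', 'Z')}
```
]]></preformat>
<p>For the map <italic>X</italic> &#x02192; <italic>Y</italic> &#x02192; <italic>Z</italic>, the only candidate is the spanning edge <italic>X</italic> &#x02192; <italic>Z</italic>, as in the example above.</p>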
<p>Having considered all of the pairwise edges in a research map, we then refer to what we call the <italic>weakest-link</italic> rule. This rule simply states that edges with the lowest score (i.e., the least evidence) should receive the most attention when designing experiments to assess a given research map. Using the example above, if the <italic>X</italic> &#x02192; <italic>Y</italic> edge has a score of 0.250 while the <italic>Y</italic> &#x02192; <italic>Z</italic> edge has a score of 0.125, the weakest-link rule states that we should further test the <italic>Y</italic> &#x02192; <italic>Z</italic> edge first. Note that once a particular edge has been selected for additional experiments, the single-edge integration rules of convergence and consistency (see Section 3.2) provide guidelines for selecting the optimal type of experiment to perform.</p>
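<p>The weakest-link rule amounts to selecting the edge with the minimum score. A minimal sketch, assuming that edge scores are stored in a dictionary keyed by (parent, child) pairs; the name <monospace>weakest_link</monospace> is hypothetical, and the scores are those of the example above.</p>
<preformat preformat-type="code"><![CDATA[
```python
def weakest_link(edge_scores):
    """Return the edge with the lowest score (the least evidence).

    If several edges tie, the first one in insertion order is returned.
    """
    return min(edge_scores, key=edge_scores.get)


scores = {("X", "Y"): 0.250, ("Y", "Z"): 0.125}
print(weakest_link(scores))  # ('Y', 'Z')
```
]]></preformat>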
<p>There are cases when the above rules cannot identify a single optimal experiment: there may be two or more experiment types (e.g., positive and negative interventions) that could (potentially) provide equally consistent and convergent evidence, given the experiments that have already been performed. In such cases, we refer to what we call the rule of <italic>multi-edge convergence</italic>. This rule states that when given a choice between (potentially) equally convergent experiment types, we should select the type that is least represented among the experiments recorded for the entire research map. The rationale for this rule is that increasing the methodological diversity of a set of findings will lower the chances of systematic artifacts. For example, the prevalence of negative interventions depicted in Figure <xref ref-type="fig" rid="F1">1</xref> would motivate the use of positive interventions, as well as non-interventions, to study this system.</p>
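<p>The rule of multi-edge convergence can be sketched as a tie-breaker over experiment types. The sketch assumes that every experiment recorded in the research map is labeled with one type string; the labels, the example counts, and the name <monospace>least_represented</monospace> are hypothetical and are not taken from Figure <xref ref-type="fig" rid="F1">1</xref>.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def least_represented(candidates, recorded_types):
    """Pick the candidate type that occurs least often in the research map.

    Types that were never recorded count as zero. Ties are broken
    by the order of `candidates`.
    """
    counts = Counter(recorded_types)
    return min(candidates, key=lambda kind: counts[kind])


# Hypothetical map dominated by negative interventions.
recorded = ["negative", "negative", "negative", "positive"]
print(least_represented(["positive", "negative"], recorded))  # positive
```
]]></preformat>
<p>Adding a type that has never been recorded to the candidate list (e.g., <monospace>"non-intervention"</monospace> here) would make it the preferred choice, in line with the rationale of increasing methodological diversity.</p>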
<p>These rules&#x02014;(single-edge) consistency and convergence, the pioneering rule, the weakest-link rule, and multi-edge convergence&#x02014;provide guidelines for experiment planning when working with research maps. These rules attempt to make explicit and quantitative the epistemological strategies commonly used by neuroscientists. In articulating and further extending these rules to larger networks, we are attempting to expand the research maps framework so that it is useful not only for representing results but also for planning experiments.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s5">
<title>5. Discussion</title>
<p>In this paper, we outline ways to formalize experiment planning in the context of causal discovery. These methods are not designed to replace inspiration or creativity in science; for example, they cannot determine the topics that scientists should pursue. These methods could instead help scientists to quantify and communicate the rationale for selecting particular experiments when testing a hypothesis. Just as statistical methods convey the significance of a finding, we propose that formalisms are needed to make the experiment-planning process more objective&#x02014;e.g., by quantitatively assessing the amount of information (i.e., the <italic>information gain</italic>) that could be gleaned from a particular set of experiments.</p>
<p>In the last few decades, the causal modeling community has developed robust formalisms and algorithms for representing and identifying causal relations. Despite these advances, these methods remain surprisingly underused by neuroscientists seeking to identify causal mechanisms. Part of this trend is likely due to a lack of communication between researchers in these fields: many neuroscientists simply lack fluency in these methods, and thus do not use them. However, even for neuroscientists who wish to leverage the robust methods that causal models afford, there are significant challenges when applying these methods, given the many practical constraints imposed.</p>
<p>The first experiment-planning approach we propose is our attempt to render these methods usable by practicing neuroscientists, such that literature, in addition to data, can be used to derive causal graphs. If such methods are adopted, experiment planning will be made more objective, systematic, and communicable to the research community: potential experiments could be selected on the basis of their ability to reduce the space of possible causal graphs. The second approach proposed is our attempt to express epistemological principles (already used in neuroscience) in a quantitative framework to guide experiment planning. Together, these approaches form the basis of a mathematical framework that could be used in the scientific method alongside statistics: quantitative formalisms would then be used not only to validate scientific findings but also to justify the experiments themselves.</p>
</sec>
<sec id="s6">
<title>Author contributions</title>
<p>NM, AS, JW, and WH contributed to the development of the two experiment-planning approaches presented. NM and AS wrote the manuscript. WW provided valuable advice for the project, and helped to edit the manuscript.</p>
</sec>
<sec id="s7">
<title>Funding</title>
<p>This work was supported by the Leslie Chair in Pioneering Brain Research to AS, an NIH T32 (5T32EB016640-02) to NM, and an NIH-NCI T32 (T32CA201160) to JW. This project also received support from the NIH/NCATS UCLA CTSI Grant Number (UL1TR000124).</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer MB and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.</p>
</sec>
</sec>
</body>
<back>
<ack><p>We would like to thank Frederick Eberhardt for reading early versions of this manuscript and providing helpful feedback.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ashburner</surname> <given-names>M.</given-names></name> <name><surname>Ball</surname> <given-names>C. A.</given-names></name> <name><surname>Blake</surname> <given-names>J. A.</given-names></name> <name><surname>Botstein</surname> <given-names>D.</given-names></name> <name><surname>Butler</surname> <given-names>H.</given-names></name> <name><surname>Cherry</surname> <given-names>J. M.</given-names></name> <etal/></person-group>. (<year>2000</year>). <article-title>Gene ontology: tool for the unification of biology</article-title>. <source>Nat. Genet.</source> <volume>25</volume>, <fpage>25</fpage>&#x02013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1038/75556</pub-id><pub-id pub-id-type="pmid">10802651</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bandrowski</surname> <given-names>A.</given-names></name> <name><surname>Brinkman</surname> <given-names>R.</given-names></name> <name><surname>Brochhausen</surname> <given-names>M.</given-names></name> <name><surname>Brush</surname> <given-names>M. H.</given-names></name> <name><surname>Bug</surname> <given-names>B.</given-names></name> <name><surname>Chibucos</surname> <given-names>M. C.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>The ontology for biomedical investigations</article-title>. <source>PLoS ONE</source> <volume>11</volume>:<fpage>e0154556</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0154556</pub-id><pub-id pub-id-type="pmid">27128319</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Biere</surname> <given-names>A.</given-names></name> <name><surname>Heule</surname> <given-names>M.</given-names></name> <name><surname>van Maaren</surname> <given-names>H.</given-names></name></person-group> (<year>2009</year>). <source>Handbook of Satisfiability</source>, <volume>Vol. 185</volume>. <publisher-loc>Amsterdam</publisher-loc>: <publisher-name>IOS Press</publisher-name>.</citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bodenreider</surname> <given-names>O.</given-names></name></person-group> (<year>2004</year>). <article-title>The unified medical language system (umls): integrating biomedical terminology</article-title>. <source>Nucleic Acids Res.</source> <volume>32</volume>(<supplement>Suppl. 1</supplement>), <fpage>D267</fpage>&#x02013;<lpage>D270</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkh061</pub-id><pub-id pub-id-type="pmid">14681409</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bodenreider</surname> <given-names>O.</given-names></name></person-group> (<year>2008</year>). <article-title>Biomedical ontologies in action: role in knowledge management, data integration and decision support</article-title>. <source>Yearb. Med. Inform.</source> <fpage>67</fpage>&#x02013;<lpage>79</lpage>. <pub-id pub-id-type="pmid">18660879</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>Yu</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>J. Y.</given-names></name></person-group> (<year>2013</year>). <article-title>Semantic web meets integrative biology: a survey</article-title>. <source>Brief. Bioinformatics</source> <volume>14</volume>, <fpage>109</fpage>&#x02013;<lpage>125</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbs014</pub-id><pub-id pub-id-type="pmid">22492191</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chibucos</surname> <given-names>M. C.</given-names></name> <name><surname>Mungall</surname> <given-names>C. J.</given-names></name> <name><surname>Balakrishnan</surname> <given-names>R.</given-names></name> <name><surname>Christie</surname> <given-names>K. R.</given-names></name> <name><surname>Huntley</surname> <given-names>R. P.</given-names></name> <name><surname>White</surname> <given-names>O.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Standardized description of scientific evidence using the evidence ontology (eco)</article-title>. <source>Database</source> <volume>2014</volume>:<fpage>bau075</fpage>. <pub-id pub-id-type="doi">10.1093/database/bau075</pub-id><pub-id pub-id-type="pmid">25052702</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohen</surname> <given-names>P. R.</given-names></name></person-group> (<year>2015</year>). <article-title>Darpa&#x00027;s big mechanism program</article-title>. <source>Phys. Biol.</source> <volume>12</volume>:<fpage>045008</fpage>. <pub-id pub-id-type="doi">10.1088/1478-3975/12/4/045008</pub-id><pub-id pub-id-type="pmid">26178259</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Costa</surname> <given-names>R. M.</given-names></name> <name><surname>Federov</surname> <given-names>N. B.</given-names></name> <name><surname>Kogan</surname> <given-names>J. H.</given-names></name> <name><surname>Murphy</surname> <given-names>G. G.</given-names></name> <name><surname>Stern</surname> <given-names>J.</given-names></name> <name><surname>Ohno</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2002</year>). <article-title>Mechanism for the learning deficits in a mouse model of neurofibromatosis type 1</article-title>. <source>Nature</source> <volume>415</volume>, <fpage>526</fpage>&#x02013;<lpage>530</lpage>. <pub-id pub-id-type="doi">10.1038/nature711</pub-id><pub-id pub-id-type="pmid">11793011</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>de Waard</surname> <given-names>A.</given-names></name> <name><surname>Maat</surname> <given-names>H. P.</given-names></name></person-group> (<year>2012</year>). <article-title>Epistemic modality and knowledge attribution in scientific discourse: a taxonomy of types and overview of features</article-title>, in <source>Proceedings of the Workshop on Detecting Structure in Scholarly Discourse, Association for Computational Linguistics</source> (<publisher-loc>Jeju</publisher-loc>), <fpage>47</fpage>&#x02013;<lpage>55</lpage>.</citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Waard</surname> <given-names>A.</given-names></name> <name><surname>Schneider</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>Formalising uncertainty: an ontology of reasoning, certainty and attribution (orca)</article-title>, in <source>Proceedings of the Joint 2012 International Conference on Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine</source>, <volume>Vol. 930</volume>, <fpage>10</fpage>&#x02013;<lpage>17</lpage>. Available online at: CEUR-WS.org</citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Donnelly</surname> <given-names>K.</given-names></name></person-group> (<year>2006</year>). <article-title>Snomed-ct: the advanced terminology and coding system for ehealth</article-title>. <source>Stud. Health Technol. Inform.</source> <volume>121</volume>:<fpage>279</fpage>. <pub-id pub-id-type="pmid">17095826</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dumontier</surname> <given-names>M.</given-names></name> <name><surname>Chepelev</surname> <given-names>L. L.</given-names></name> <name><surname>Hoehndorf</surname> <given-names>R.</given-names></name></person-group> (<year>2013</year>). <article-title>Semantic systems biology: formal knowledge representation in systems biology for model construction, retrieval, validation and discovery</article-title>, in <source>Systems Biology</source>, eds <person-group person-group-type="editor"><name><surname>Prokop</surname> <given-names>A.</given-names></name> <name><surname>Csuk&#x000E1;s</surname> <given-names>B.</given-names></name></person-group> (<publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>355</fpage>&#x02013;<lpage>373</lpage>.</citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fern&#x000E1;ndez-Breis</surname> <given-names>J. T.</given-names></name> <name><surname>Maldonado</surname> <given-names>J. A.</given-names></name> <name><surname>Marcos</surname> <given-names>M.</given-names></name> <name><surname>Legaz-Garc&#x000ED;a</surname> <given-names>M.</given-names></name> <name><surname>Moner</surname> <given-names>D.</given-names></name> <name><surname>Torres-Sospedra</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Leveraging electronic healthcare record standards and semantic web technologies for the identification of patient cohorts</article-title>. <source>J. Am. Med. Inform. Assoc.</source> <volume>20</volume>, <fpage>e288</fpage>&#x02013;<lpage>e296</lpage>. <pub-id pub-id-type="doi">10.1136/amiajnl-2013-001923</pub-id><pub-id pub-id-type="pmid">23934950</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friedman</surname> <given-names>N.</given-names></name></person-group> (<year>2004</year>). <article-title>Inferring cellular networks using probabilistic graphical models</article-title>. <source>Science</source> <volume>303</volume>, <fpage>799</fpage>&#x02013;<lpage>805</lpage>. <pub-id pub-id-type="doi">10.1126/science.1094068</pub-id><pub-id pub-id-type="pmid">14764868</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fung</surname> <given-names>K. W.</given-names></name> <name><surname>Bodenreider</surname> <given-names>O.</given-names></name></person-group> (<year>2012</year>). <article-title>Knowledge representation and ontologies</article-title>, in <source>Clinical Research Informatics</source>, eds <person-group person-group-type="editor"><name><surname>Richesson</surname> <given-names>R. L.</given-names></name> <name><surname>Andrews</surname> <given-names>J. E.</given-names></name></person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>255</fpage>&#x02013;<lpage>275</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-84882-448-5_14</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hyttinen</surname> <given-names>A.</given-names></name> <name><surname>Eberhardt</surname> <given-names>F.</given-names></name> <name><surname>J&#x000E4;rvisalo</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>Constraint-based causal discovery: conflict resolution with answer set programming</article-title>, in <source>Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI 2014)</source>, eds <person-group person-group-type="editor"><name><surname>Zhang</surname> <given-names>N. L.</given-names></name> <name><surname>Tian</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>Quebec City, QC</publisher-loc>), <fpage>340</fpage>&#x02013;<lpage>349</lpage>.</citation></ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hyttinen</surname> <given-names>A.</given-names></name> <name><surname>Hoyer</surname> <given-names>P. O.</given-names></name> <name><surname>Eberhardt</surname> <given-names>F.</given-names></name> <name><surname>J&#x000E4;rvisalo</surname> <given-names>M.</given-names></name></person-group> (<year>2013</year>). <article-title>Discovering cyclic causal models with latent variables: a general sat-based procedure</article-title>, in <source>Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013)</source>, eds <person-group person-group-type="editor"><name><surname>Nicholson</surname> <given-names>A.</given-names></name> <name><surname>Smyth</surname> <given-names>P.</given-names></name></person-group> (<publisher-loc>Bellevue, WA</publisher-loc>), <fpage>301</fpage>&#x02013;<lpage>310</lpage>.</citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Koller</surname> <given-names>D.</given-names></name> <name><surname>Friedman</surname> <given-names>N.</given-names></name></person-group> (<year>2009</year>). <source>Probabilistic Graphical Models: Principles and Techniques</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krallinger</surname> <given-names>M.</given-names></name> <name><surname>Leitner</surname> <given-names>F.</given-names></name> <name><surname>Vazquez</surname> <given-names>M.</given-names></name> <name><surname>Salgado</surname> <given-names>D.</given-names></name> <name><surname>Marcelle</surname> <given-names>C.</given-names></name> <name><surname>Tyers</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>How to link ontologies and protein&#x02013;protein interactions to literature: text-mining approaches and the BioCreative experience</article-title>. <source>Database</source> <volume>2012</volume>:<fpage>bas017</fpage>. <pub-id pub-id-type="doi">10.1093/database/bas017</pub-id><pub-id pub-id-type="pmid">22438567</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Landreth</surname> <given-names>A.</given-names></name> <name><surname>Silva</surname> <given-names>A. J.</given-names></name></person-group> (<year>2013</year>). <article-title>The need for research maps to navigate published work and inform experiment planning</article-title>. <source>Neuron</source> <volume>79</volume>, <fpage>411</fpage>&#x02013;<lpage>415</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2013.07.024</pub-id><pub-id pub-id-type="pmid">23931992</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pearl</surname> <given-names>J.</given-names></name></person-group> (<year>1995</year>). <article-title>Causal diagrams for empirical research</article-title>. <source>Biometrika</source> <volume>82</volume>, <fpage>669</fpage>&#x02013;<lpage>688</lpage>. <pub-id pub-id-type="doi">10.1093/biomet/82.4.669</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Pearl</surname> <given-names>J.</given-names></name></person-group> (<year>2009</year>). <source>Causality, 2nd Edn</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation></ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Robinson</surname> <given-names>R. W.</given-names></name></person-group> (<year>1973</year>). <article-title>Counting labeled acyclic digraphs</article-title>, in <source>New Directions in the Theory of Graphs</source>, ed <person-group person-group-type="editor"><name><surname>Harary</surname> <given-names>F.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Academic Press</publisher-name>), <fpage>239</fpage>&#x02013;<lpage>273</lpage>.</citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rubin</surname> <given-names>D. L.</given-names></name> <name><surname>Shah</surname> <given-names>N. H.</given-names></name> <name><surname>Noy</surname> <given-names>N. F.</given-names></name></person-group> (<year>2008</year>). <article-title>Biomedical ontologies: a functional perspective</article-title>. <source>Brief. Bioinform.</source> <volume>9</volume>, <fpage>75</fpage>&#x02013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbm059</pub-id><pub-id pub-id-type="pmid">18077472</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Russ</surname> <given-names>T. A.</given-names></name> <name><surname>Ramakrishnan</surname> <given-names>C.</given-names></name> <name><surname>Hovy</surname> <given-names>E. H.</given-names></name> <name><surname>Bota</surname> <given-names>M.</given-names></name> <name><surname>Burns</surname> <given-names>G. A.</given-names></name></person-group> (<year>2011</year>). <article-title>Knowledge engineering tools for reasoning with scientific observations and interpretations: a neural connectivity use case</article-title>. <source>BMC Bioinformatics</source> <volume>12</volume>:<fpage>351</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-12-351</pub-id><pub-id pub-id-type="pmid">21859449</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Silva</surname> <given-names>A. J.</given-names></name> <name><surname>Landreth</surname> <given-names>A.</given-names></name> <name><surname>Bickle</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <source>Engineering the Next Revolution in Neuroscience: The New Science of Experiment Planning</source>. <publisher-loc>Oxford</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Silva</surname> <given-names>A. J.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name></person-group> (<year>2015</year>). <article-title>The need for novel informatics tools for integrating and planning research in molecular and cellular cognition</article-title>. <source>Learn. Mem.</source> <volume>22</volume>, <fpage>494</fpage>&#x02013;<lpage>498</lpage>. <pub-id pub-id-type="doi">10.1101/lm.029355.112</pub-id><pub-id pub-id-type="pmid">26286658</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>B.</given-names></name> <name><surname>Ashburner</surname> <given-names>M.</given-names></name> <name><surname>Rosse</surname> <given-names>C.</given-names></name> <name><surname>Bard</surname> <given-names>J.</given-names></name> <name><surname>Bug</surname> <given-names>W.</given-names></name> <name><surname>Ceusters</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration</article-title>. <source>Nat. Biotechnol.</source> <volume>25</volume>, <fpage>1251</fpage>&#x02013;<lpage>1255</lpage>. <pub-id pub-id-type="doi">10.1038/nbt1346</pub-id><pub-id pub-id-type="pmid">17989687</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Spirtes</surname> <given-names>P.</given-names></name> <name><surname>Glymour</surname> <given-names>C.</given-names></name> <name><surname>Scheines</surname> <given-names>R.</given-names></name></person-group> (<year>2000</year>). <source>Causation, Prediction, and Search, 2nd Edn</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tallis</surname> <given-names>M.</given-names></name> <name><surname>Thompson</surname> <given-names>R.</given-names></name> <name><surname>Russ</surname> <given-names>T. A.</given-names></name> <name><surname>Burns</surname> <given-names>G. A.</given-names></name></person-group> (<year>2011</year>). <article-title>Knowledge synthesis with maps of neural connectivity</article-title>. <source>Front. Neuroinform.</source> <volume>5</volume>:<fpage>24</fpage>. <pub-id pub-id-type="doi">10.3389/fninf.2011.00024</pub-id><pub-id pub-id-type="pmid">22053155</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tong</surname> <given-names>S.</given-names></name> <name><surname>Koller</surname> <given-names>D.</given-names></name></person-group> (<year>2001</year>). <article-title>Active learning for structure in Bayesian networks</article-title>, in <source>Seventeenth International Joint Conference on Artificial Intelligence (IJCAI)</source> (<publisher-loc>Seattle, WA</publisher-loc>), <fpage>863</fpage>&#x02013;<lpage>869</lpage>.</citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>V&#x000E1;zquez-Naya</surname> <given-names>J. M.</given-names></name> <name><surname>Mart&#x000ED;nez-Romero</surname> <given-names>M.</given-names></name> <name><surname>Porto-Pazos</surname> <given-names>A. B.</given-names></name> <name><surname>Novoa</surname> <given-names>F.</given-names></name> <name><surname>Valladares-Ayerbes</surname> <given-names>M.</given-names></name> <name><surname>Pereira</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Ontologies of drug discovery and design for neurology, cardiology and oncology</article-title>. <source>Curr. Pharm. Design</source> <volume>16</volume>, <fpage>2724</fpage>&#x02013;<lpage>2736</lpage>. <pub-id pub-id-type="doi">10.2174/138161210792389199</pub-id><pub-id pub-id-type="pmid">20642429</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vieland</surname> <given-names>V. J.</given-names></name></person-group> (<year>2006</year>). <article-title>Thermometers: something for statistical geneticists to think about</article-title>. <source>Hum. Hered.</source> <volume>61</volume>, <fpage>144</fpage>&#x02013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1159/000093775</pub-id><pub-id pub-id-type="pmid">16770079</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup><ext-link ext-link-type="uri" xlink:href="http://researchmaps.org/">http://researchmaps.org/</ext-link>.</p></fn>
<fn id="fn0002"><p><sup>2</sup>ResearchMaps allows a user to create a research map for an article in three steps: (i) each experiment in the article, its type, and the identities of its agent&#x02013;target pair are recorded; (ii) notes are made on the methods used to observe (or manipulate) the agent and to measure the change (or lack thereof) in the target; (iii) the result measured in the target is recorded. Users can enter this information either for empirical edges (with supporting experimental evidence in the article) or for hypothetical edges (with support from expert opinion or elsewhere in the literature). While annotating well-established experimental methods may be straightforward, annotating work on the cutting edge of a field may be more open to interpretation, much like the findings themselves.</p></fn>
<fn id="fn0003"><p><sup>3</sup>The semantics of an edge in a research map differ from the semantics of an edge in a causal graph (see Section 3). One key distinction is that an edge in a research map does not necessarily imply a direct causal connection: the relation that is represented may instead be <italic>ancestral</italic>.</p></fn>
<fn id="fn0004"><p><sup>4</sup>The blank space between the two variables is intentional; it is meant to call attention to the fact that the corresponding nodes in the graph lack any type of edge between them.</p></fn>
</fn-group>
</back>
</article>