<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neuroinform.</journal-id>
<journal-title>Frontiers in Neuroinformatics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neuroinform.</abbrev-journal-title>
<issn pub-type="epub">1662-5196</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fninf.2013.00044</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research Article</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Stevens</surname> <given-names>Jean-Luc R.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Elver</surname> <given-names>Marco</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Bednar</surname> <given-names>James A.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Informatics, Institute for Adaptive and Neural Computation, University of Edinburgh</institution> <country>Edinburgh, UK</country></aff>
<aff id="aff2"><sup>2</sup><institution>School of Informatics, Institute for Computing Systems Architecture, University of Edinburgh</institution> <country>Edinburgh, UK</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Andrew P. Davison, CNRS, France</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Padraig Gleeson, University College London, UK; Thomas G. Close, Okinawa Institute of Science and Technology Graduate University, Japan</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Jean-Luc R. Stevens, School of Informatics, Institute for Adaptive and Neural Computation, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK e-mail: <email>jlstevens&#x00040;inf.ed.ac.uk</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to the journal Frontiers in Neuroinformatics.</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>30</day>
<month>12</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="collection">
<year>2013</year>
</pub-date>
<volume>7</volume>
<elocation-id>44</elocation-id>
<history>
<date date-type="received">
<day>04</day>
<month>11</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>12</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2013 Stevens, Elver and Bednar.</copyright-statement>
<copyright-year>2013</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract><p>Lancet is a new, simulator-independent Python utility for succinctly specifying, launching, and collating results from large batches of interrelated computationally demanding program runs. This paper demonstrates how to combine Lancet with IPython Notebook to provide a flexible, lightweight, and agile workflow for fully reproducible scientific research. This informal and pragmatic approach uses IPython Notebook to capture the steps in a scientific computation as it is gradually automated and made ready for publication, without mandating the use of any separate application that can constrain scientific exploration and innovation. The resulting notebook concisely records each step involved in even very complex computational processes that led to a particular figure or numerical result, allowing the complete chain of events to be replicated automatically. Lancet was originally designed to help solve problems in computational neuroscience, such as analyzing the sensitivity of a complex simulation to various parameters, or collecting the results from multiple runs with different random starting points. However, because it is never possible to know in advance what tools might be required in future tasks, Lancet has been designed to be completely general, supporting any type of program as long as it can be launched as a process and can return output in the form of files. For instance, Lancet is also heavily used by one of the authors in a separate research group for launching batches of microprocessor simulations. This general design will allow Lancet to continue supporting a given research project even as the underlying approaches and tools change.</p></abstract>
<kwd-group>
<kwd>IPython</kwd>
<kwd>pandas</kwd>
<kwd>reproducibility</kwd>
<kwd>workflow</kwd>
<kwd>simulation</kwd>
<kwd>batch computation</kwd>
<kwd>provenance</kwd>
<kwd>big data</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="14"/>
<page-count count="11"/>
<word-count count="8615"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<title>1. Introduction</title>
<p>Computational neuroscience is a rapidly developing scientific field that relies on a large ecosystem of software tools that is continually evolving as high-performance computing infrastructure is updated. Every computational neuroscientist must therefore keep up with new developments in neuroscience, software engineering, and computer hardware while advancing novel computational theories of the nervous system. The drive to explore different scientific hypotheses rapidly has made Python the language of choice for many researchers due to its flexibility and wide range of libraries already provided. Despite this fast pace of change, it is crucial that results remain reproducible once they are obtained, if computational neuroscientists are to have long-term confidence in the integrity of their work.</p>
<p>The formidable challenges associated with developing replicable scientific publications in a rapidly advancing field are well recognized by the computational neuroscience community. The difficulties include problems replicating results between simulators (Crook et al., <xref ref-type="bibr" rid="B3">2013</xref>) and insufficiently constrained model parameters in publications (Nordlie et al., <xref ref-type="bibr" rid="B12">2009</xref>), along with an important debate about the distinction between replicability and reproducibility (Drummond, <xref ref-type="bibr" rid="B6">2009</xref>; Freire et al., <xref ref-type="bibr" rid="B7">2011</xref>). Fundamentally, neuroscience is concerned with the study of dynamic, history-dependent biological systems of exceedingly high dimensionality. Although computational models abstract away most of the complexity of nervous systems by necessity, it is still a formidable challenge to communicate this type of work to other scientists while also capturing the key properties of the biological system under study. These broad issues must be addressed by the community as a whole, and cannot be solved by any one piece of software.</p>
<p>Our approach to improving reproducibility is to offer a small number of useful utilities whose first aim is to improve a researcher&#x00027;s scientific productivity. If such tools are properly designed and useful enough to become a core part of a researcher&#x00027;s regular workflow, we hope that they will allow reproducible science to emerge naturally as researchers seek to increase productivity. This approach is in sharp contrast to more heavyweight automated scientific workflow systems (Curcin and Ghanem, <xref ref-type="bibr" rid="B4">2008</xref>; Freire et al., <xref ref-type="bibr" rid="B8">2014</xref>) that can be effective for mature research areas but would be constraining for this young and ever-changing field.</p>
<p>We developed the Lancet package as a small set of flexible, lightweight components that allow a researcher to generate and analyze large data sets more efficiently. These components are designed to help improve research efficiency by allowing the user to capture the essence of a scientific task with very little code and by catching errors early on, before expensive computational processes begin. By distilling a problem into a small number of short, declarative specifications, the researcher can focus on important scientific details, spending less time worrying about issues of implementation. Every component in Lancet is written to satisfy an immediate need; the end goal of generating automated, reproducible results should then be satisfied as a natural outcome of a clean and efficient solution to a problem.</p>
<p>By design, Lancet is a general utility, allowing it to work with any external tool or simulator. This ensures that as tools change or as researchers switch between software and platforms, the code written with Lancet remains unchanged. This generality is strictly enforced by the requirements of one of the authors, who is successfully applying Lancet outside the domain of computational neuroscience, i.e., to run simulations of varying microprocessor architectures. Lancet is pure, platform-independent Python with minimal dependencies, and supports both Python 2 and Python 3. Together, these properties should help ensure that code written using this utility will remain viable for the foreseeable future.</p>
<p>The goal of this new package is to allow reproducible, agile workflows to develop organically when used together with other tools, namely a suitable version control system and IPython Notebook. Since version 0.12 of IPython (P&#x000E9;rez and Granger, <xref ref-type="bibr" rid="B13">2007</xref>), a notebook feature has been provided which allows code, data, and figures to be interactively explored while maintaining a complete record of the source code. Lancet is designed to integrate well with IPython and the pandas library (pandas.pydata.org), without having either of these two projects as a core dependency.</p>
<p>The next section introduces the components of Lancet, starting with a very small toy example of a workflow that begins with an initial specification and ends in a simple analysis. Section 3 provides an overview of the three main types of components offered in Lancet. At every stage, we show how these components make research tasks easier to complete by making the intentions of exploratory and publication-specific code clearer and more succinct. With the basic design established, Section 4 presents the full reproducible workflow, showing how Lancet can help turn reproducible science into practical reality when used together with IPython Notebook and other popular tools such as Git and the pandas data analysis library. To demonstrate that this workflow is both practical and relevant to a real research project, we then briefly describe how it was used to generate all the results in Stevens et al. (<xref ref-type="bibr" rid="B14">2013</xref>), recently published in the Journal of Neuroscience.</p>
</sec>
<sec>
<title>2. Basic Lancet example</title>
<p>Python is a flexible, interpreted language that comes with many modules that extend the functionality of the base language. Closely related modules are collected into packages, some of which are included together with Python in the standard library and others that are available as third party libraries. The new Lancet package is designed to work together with the many excellent Python packages already available for scientific computing, to help capture and simplify a researcher&#x00027;s workflow. Lancet integrates particularly well with the interactive IPython notebook environment, which improves on Python&#x00027;s facilities for exploratory research and works across multiple platforms (Linux, MacOS, Windows). More information about Lancet, including installation instructions, may be found on Lancet&#x00027;s website (<ext-link ext-link-type="uri" xlink:href="http://ioam.github.io/lancet">http://ioam.github.io/lancet</ext-link>).</p>
<p>To introduce Lancet, we will first look at a minimal, toy example of a Python-based workflow with Lancet, listed in Figure <xref ref-type="fig" rid="F1">1</xref>. This example uses the simple <monospace>factor</monospace> command (included in GNU coreutils) to find the prime numbers that lie within a specific range of integers. Although brief, this example demonstrates how to use an initial specification of a parameter space to obtain results collated across 16 independent jobs. Section 4 will show how this approach fits into an agile, exploratory workflow. Meanwhile, even this simple example illustrates some of the key component types that are commonly applicable to many research tasks:
<list list-type="bullet">
<list-item><p><italic>What you aim to achieve</italic>. It is common to define a parameter space to be explored by some simulator or analysis tool. In Figure <xref ref-type="fig" rid="F1">1</xref> this is the list of integers to factorize, highlighted in red. This level of specification expresses the scientific goal and is normally both <italic>tool-independent</italic> and <italic>platform-independent</italic>. Given a parameter space, it is conceivable that the desired results may be achieved using alternative software tools executed on different platforms. When exploring a parameter space, the key information is specified by the set of parameters explored and not by the details of the software used.</p></list-item>
<list-item><p><italic>How you intend to achieve your goal</italic>. This refers to the target software that runs a model or performs an analysis. In Figure <xref ref-type="fig" rid="F1">1</xref> this is the <monospace>factor</monospace> command which factorizes integers, as highlighted in green. This type of specification is often <italic>platform-independent</italic> but <italic>tool-dependent</italic>, encapsulating how a specific piece of software is to be invoked with tool-dependent arguments, independent of the computational platform on which the software is run.</p></list-item>
<list-item><p><italic>Where you want to execute the task</italic>. If the software can run on multiple different platforms, there may be alternative ways to execute the tool. Executing a task in a particular environment is normally <italic>platform-dependent</italic> but <italic>tool-independent</italic>. In Figure <xref ref-type="fig" rid="F1">1</xref> the <monospace>factor</monospace> command is executed locally using the <monospace>Launcher</monospace> class supplied by Lancet, highlighted in blue. By switching to the <monospace>QLauncher</monospace> class, the exact same task could be executed in parallel on a Grid Engine cluster without changing the rest of the code.</p></list-item>
</list></p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>A simple, end-to-end workflow using Lancet to factorize a range of integers, highlighted using the three colors used in the bullet points at the start of Section 2.</bold> This simple example factorizes a list of integers with the factor command, with no other dependencies. The five prime numbers found are an example of a prime quintuplet, the closest admissible constellation of five consecutive prime numbers.</p></caption>
<graphic xlink:href="fninf-07-00044-g0001.tif"/>
</fig>
<p>Of course, it is difficult to appreciate the advantages of using Lancet if one simply wants to factor 16 small integers in Python. These advantages would be much more apparent if a multidimensional parameter space were to be explored with a complex neural simulator, as described below. Even so, non-Lancet Python code for launching these simple factor runs is likely to be longer, more error-prone, and harder to read. Iteration over the input parameter space and output files (highlighted in red) would probably be expressed as multiple <monospace>for</monospace> loops, losing the flat structure of the example. Specification of the simulator (highlighted in green) and the code needed to execute it (highlighted in blue) would be interleaved, and complex calls to the <monospace>subprocess</monospace> module would be required to execute jobs. Switching from local execution to Grid Engine would no longer be trivial.</p>
<p>This example demonstrates how Lancet can help free the researcher from such implementation details. Substantial code would also be needed to reproduce the way Lancet keeps your output files consistently organized (within timestamped folders by default) with a common directory structure, whether working locally or on a cluster. After executing the listing in Figure <xref ref-type="fig" rid="F1">1</xref>, a <monospace>.info</monospace> file will be generated together with the output, recording which Python version was used, the operating system on which the jobs were run, and the version of Lancet, alongside other useful metadata. Other information supplied by the user, such as the task description, versions of libraries and executables used, and other comments may be easily passed down to the <monospace>metadata</monospace> field of the <monospace>.info</monospace> file for storage. Lancet also offers a simple function that helps record version control information and improves reproducibility by maintaining an explicit log of all the parameters used. As shown later in Figure <xref ref-type="fig" rid="F5">5</xref>, all of this can be expressed clearly, succinctly, and declaratively, even for realistically complex sets of simulations.</p>
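<p>To make the contrast with hand-written code concrete, the following is a minimal sketch of how the same factorization jobs might be prepared without Lancet. It is not taken from Figure <xref ref-type="fig" rid="F1">1</xref>: the integer range (16 consecutive integers starting at 100), the output directory name, the output file naming scheme, and the helper names <monospace>build_jobs</monospace> and <monospace>run_jobs</monospace> are all assumptions made for illustration.</p>
<preformat preformat-type="code">
```python
import os
import subprocess


def build_jobs(integers, output_dir):
    """Return one (command, output_path) pair per integer."""
    jobs = []
    for n in integers:
        # Tool-dependent detail: how the external program is invoked.
        command = ["factor", str(n)]
        # Bookkeeping that must be kept consistent by hand.
        output_path = os.path.join(output_dir, "factor_%d.out" % n)
        jobs.append((command, output_path))
    return jobs


def run_jobs(jobs):
    """Run every job locally, in sequence (platform-dependent detail)."""
    for command, output_path in jobs:
        with open(output_path, "w") as handle:
            subprocess.check_call(command, stdout=handle)


# Assumed parameter space: 16 consecutive integers starting at 100.
jobs = build_jobs(range(100, 116), "output")
```
</preformat>
<p>Calling <monospace>run_jobs(jobs)</monospace> would execute the jobs, provided that the <monospace>factor</monospace> command is installed and the output directory already exists. Even in this small sketch, the parameter space, the tool invocation, and the execution strategy are spread across two functions, and moving to a cluster would require rewriting <monospace>run_jobs</monospace>.</p>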
</sec>
<sec>
<title>3. Using Lancet to rapidly specify a task</title>
<p>The example in Figure <xref ref-type="fig" rid="F1">1</xref> briefly introduced the three core class hierarchies in Lancet. This section examines each of the three types in greater detail; the next section then considers how Lancet can assist the natural development of an agile, reproducible workflow with IPython Notebook. A list of all the components available to the user, split into the three class families, is shown in Table <xref ref-type="table" rid="T1">1</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Lancet components available for specifying jobs</bold>.</p></caption>
<graphic xlink:href="fninf-07-00044-i0001.tif"/>
<table-wrap-foot>
<p><italic>All Arguments are subclasses of Args and specify static sets of parameters, except for SimpleGradientDescent which is an example of dynamic parameter optimization. ShellCommand is generic and included with Lancet whereas the Command classes marked by an asterisk are included with the Topographica simulator; other tools may offer their own custom Command classes. The Launcher class runs jobs locally, but other options are easy to implement, such as the QLauncher class for use on clusters.</italic></p>
</table-wrap-foot>
</table-wrap>
<p>First, <monospace>Arguments</monospace> declaratively specify the parameter space to be covered by a set of runs (see e.g., the <monospace>Range</monospace> object at the top of Figure <xref ref-type="fig" rid="F1">1</xref>, highlighted in red), or specify filenames and data of interest on the filesystem. The latter object type allows data on disk to be collated for analysis in Python, or for launching the next stage of a pipeline workflow.</p>
<p>Next, a <monospace>Command</monospace> class handles the interface to an external tool, allowing the rest of Lancet to remain simulator-independent. The example shown in Figure <xref ref-type="fig" rid="F1">1</xref> uses a <monospace>ShellCommand</monospace>, which is supplied with Lancet for basic support of command-line programs. For supporting complex tools and simulators, <monospace>Command</monospace> can be subclassed while reimplementing only a constructor and a call method. As a workflow develops over time, it is likely that a user will want to make a custom <monospace>Command</monospace> to allow full control over important tools being used, but the other components of Lancet will not normally need to be extended for most users.</p>
<p>Finally, a <monospace>Launcher</monospace> pulls together the <monospace>Arguments</monospace> and <monospace>Command</monospace> objects to launch the specified jobs on a particular platform. Currently, jobs can be run either locally with the <monospace>Launcher</monospace>, or with Grid Engine using the <monospace>QLauncher</monospace>. As the <monospace>Launcher</monospace> object accepts the other two core component types as arguments and is a fully declarative object (as are all Lancet components), a <monospace>Launcher</monospace> object fully specifies the intended parameter space, the command to execute, and the platform to execute it on.</p>
<sec>
<title>3.1. Succinctly specifying a parameter space with Lancet</title>
<p>Figure <xref ref-type="fig" rid="F2">2</xref> demonstrates some of the fundamental properties of all <monospace>Arguments</monospace> objects. These objects express parameter spaces that will result in many sets of parameter values to be passed to an external analysis tool or simulator, e.g., as command-line arguments. These are simple, compositional objects designed to express declarations of intent, independently of the other two types of Lancet component.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Arguments express parameter spaces succinctly and declaratively. (A)</bold> Example illustrating the most basic, most explicit use of the Args class to specify three sets of sequential arguments. <bold>(B)</bold> A more succinct and less error-prone way of specifying the same arguments. <bold>(C)</bold> An example expressing a parameter space for use with a hypothetical neural simulator. This parameter space covers a range of excitation and inhibition strengths, while toggling a homeostatic mechanism. <bold>(D)</bold> The concatenation operator allows arguments specified by Arguments objects to be sequenced, allowing special cases to be incrementally appended to a parameter space.</p></caption>
<graphic xlink:href="fninf-07-00044-g0002.tif"/>
</fig>
<p>Part A of Figure <xref ref-type="fig" rid="F2">2</xref> shows the most basic and explicit example of an <monospace>Arguments</monospace> definition, using an <monospace>Args</monospace> object to specify a static set of arguments. The list of dictionaries format is a verbose and completely flexible specification. However, this style of definition is neither succinct nor declarative, and therefore is not recommended unless absolutely necessary. Nonetheless, this constructor illustrates two key points: argument values are always paired with the corresponding argument name, and Lancet <monospace>Args</monospace> objects have a similar structure to the <monospace>DataFrame</monospace> objects used by the pandas data analysis library. As <monospace>DataFrames</monospace> accept an identical data format in the constructor, Lancet <monospace>Args</monospace> objects allow easy conversion to <monospace>DataFrames</monospace> via the <monospace>dframe</monospace> property (if the pandas library is available). This easy transition to the highly flexible pandas <monospace>DataFrames</monospace> data structure is a key part of enabling the agile workflow described in the next section. These objects are easy to create and automatically display themselves as HTML tables in the IPython Notebook environment.</p>
<p>Part B of Figure <xref ref-type="fig" rid="F2">2</xref> expresses an identical parameter space using a more readable, less error-prone approach that clearly conveys the intended structure of the parameter space. In the explicit format shown in part A, the first argument <monospace>&#x00027;arg1&#x00027;</monospace> remains constant with a value of 1.0 whereas the argument <monospace>&#x00027;arg2&#x00027;</monospace> ranges over the numbers 1.0, 2.0 and 3.0. As a result, this parameter space is conveniently described as the Cartesian product of a constant argument for <monospace>&#x00027;arg1&#x00027;</monospace> and a <monospace>Range</monospace> object that defines a range of values for <monospace>&#x00027;arg2&#x00027;</monospace>.</p>
<p>The Cartesian product (also called the &#x0201C;cross product&#x0201D;) of different arguments is a natural way to specify parameter spaces, supported by Lancet <monospace>Arguments</monospace> via the multiplication operator. In imperative code, these appear as nested <monospace>for</monospace> loops where each parameter is iterated by one of the loops. The Cartesian product of <monospace>Args(arg1&#x0003D;1)</monospace> and the <monospace>Range</monospace> object is therefore a succinct way of declaring a parameter space with one argument kept constant as the second argument spans a range of values. Note that the <monospace>Args</monospace> object accepts arbitrary keyword arguments, allowing any constant values for named parameters to be easily declared.</p>
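<p>The equivalence between the explicit list-of-dictionaries format and the Cartesian product can be illustrated in plain Python. The sketch below does not use Lancet itself; the helper <monospace>cartesian_product</monospace> is a hypothetical stand-in, built on the standard <monospace>itertools.product</monospace> function, for what the multiplication operator expresses declaratively.</p>
<preformat preformat-type="code">
```python
import itertools

# Explicit form: one dictionary per argument set, each value paired
# with its argument name (the format described for part A).
explicit = [
    {"arg1": 1.0, "arg2": 1.0},
    {"arg1": 1.0, "arg2": 2.0},
    {"arg1": 1.0, "arg2": 3.0},
]


def cartesian_product(**named_values):
    """Combine named lists of values into every possible argument set."""
    names = list(named_values)
    combos = itertools.product(*(named_values[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Compositional form: a constant for arg1 crossed with a range for arg2.
composed = cartesian_product(arg1=[1.0], arg2=[1.0, 2.0, 3.0])
```
</preformat>
<p>Both forms yield the same three argument sets, but the second states the structure of the space directly instead of leaving the reader to infer it from repeated values.</p>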
<p>Part C of Figure <xref ref-type="fig" rid="F2">2</xref> shows a generic example of what a parameter space might look like in a simple, hypothetical neural simulation. A range of excitatory and inhibitory strengths is covered and a homeostatic mechanism is toggled on and off using the <monospace>List</monospace> declaration. Although simple, this object expresses 200 different argument sets (each leading to an independent simulation), as shown by the <monospace>summary</monospace> method.</p>
<p>Finally, in part D of Figure <xref ref-type="fig" rid="F2">2</xref>, the second compositional operator for <monospace>Arguments</monospace> objects is shown. The addition operator can concatenate (or sequence) <monospace>Arguments</monospace> objects together. The result is an object that first covers the parameter space of the first <monospace>Arguments</monospace> object before spanning the parameter space of the second <monospace>Arguments</monospace> object. This is a useful way to segment a parameter space in a piece-wise manner, allowing special cases to be easily added or the behavior at singularities to be investigated.</p>
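<p>The way the two operators combine can also be sketched in plain Python, again without using Lancet. The specific values below are assumptions: the text states only that the space contains 200 argument sets, so a split of 10 excitation strengths, 10 inhibition strengths, and 2 homeostasis settings is chosen here purely for illustration, as are the helper name <monospace>cross</monospace> and the appended special case.</p>
<preformat preformat-type="code">
```python
import itertools


def cross(**named_values):
    """Cartesian product of named lists of values, as a list of dicts."""
    names = list(named_values)
    combos = itertools.product(*(named_values[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical strengths: 0.1, 0.2, ..., 1.0 for both parameters.
strengths = [round(0.1 * i, 1) for i in range(1, 11)]

# Cartesian product: 10 x 10 x 2 independent simulations.
main_space = cross(excitation=strengths,
                   inhibition=strengths,
                   homeostasis=[True, False])

# Concatenation: a special case explored after the main space.
special_case = cross(excitation=[0.0], inhibition=[0.0], homeostasis=[False])
full_space = main_space + special_case
```
</preformat>
<p>Here list concatenation plays the role of the addition operator: the resulting sequence first covers the main space and then the appended special case.</p>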
<p>Using the Cartesian product and concatenation operations on the three basic <monospace>Arguments</monospace> objects, <monospace>Args</monospace>, <monospace>List</monospace>, and <monospace>Range</monospace>, many common parameter spaces can be expressed in a readily understood, compositional format. <monospace>Arguments</monospace> composed out of these basic objects have the property that the parameter space explored is known ahead of time, before jobs are executed. Although this is typical for many research tasks, Lancet also allows parameter spaces to be explored in an online fashion, where results returned by the jobs determine what portion of the parameter space is to be explored at the next step. Online parameter space exploration algorithms can be implemented in Lancet by subclassing <monospace>DynamicArguments</monospace>.</p>
<p>Figure <xref ref-type="fig" rid="F3">3</xref> illustrates how Lancet can be used to dynamically explore a simple parameter space using the <monospace>SimpleGradientDescent</monospace> component. This instance of <monospace>DynamicArguments</monospace> is designed to demonstrate how a simple gradient descent algorithm operating on a single, scalar argument can operate in Lancet. In Figure <xref ref-type="fig" rid="F3">3</xref>, <monospace>ShellCommand</monospace> is used to run a short script that evaluates the function <italic>f</italic> (<italic>x</italic>) &#x0003D; (<italic>x</italic> &#x02212; 3)<sup>2</sup> on the input argument <italic>x</italic> when executed. <monospace>SimpleGradientDescent</monospace> then explores the local parameter space from the starting point <italic>x</italic> &#x0003D; 0 in steps of magnitude <monospace>stepsize</monospace>. Driven by the output of the script, <monospace>SimpleGradientDescent</monospace> descends the local gradient in <italic>x</italic> until it terminates at the local minimum, <italic>x</italic> &#x0003D; 3. In practice, well-established optimization procedures are likely to be more useful than this example class, such as those available in <monospace>scipy.optimize</monospace>, when trying to optimize parameter spaces that are not solvable analytically. Thus <monospace>SimpleGradientDescent</monospace> should be considered as one example of the types of <monospace>DynamicArguments</monospace> that can be implemented for advanced parameter space exploration procedures such as hill climbing or genetic algorithms.</p>
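<p>The kind of search described here can be sketched as a fixed-step local descent in plain Python. This is not the implementation of <monospace>SimpleGradientDescent</monospace>: the function <monospace>simple_descent</monospace>, the step size of 0.5, and the iteration limit are assumptions, and the objective is evaluated in-process rather than by an external script. Rather than computing a gradient, the sketch compares the objective at the current point and at its two neighbors.</p>
<preformat preformat-type="code">
```python
def f(x):
    """Objective from the text: f(x) = (x - 3)^2."""
    return (x - 3) ** 2


def simple_descent(objective, start, stepsize, max_steps=1000):
    """Move in fixed steps toward lower values until no neighbor improves."""
    x = start
    for _ in range(max_steps):
        # Listing x first means that ties keep the current point.
        candidates = [x, x - stepsize, x + stepsize]
        best = min(candidates, key=objective)
        if best == x:
            break  # neither neighbor is lower: a local minimum on this grid
        x = best
    return x


# Assumed settings: start at x = 0 with a step size of 0.5.
minimum = simple_descent(f, start=0.0, stepsize=0.5)
```
</preformat>
<p>Each iteration corresponds to one round of job evaluations: the results of the current round determine which point is evaluated next, which is the defining property of an online exploration of the parameter space.</p>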
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Lancet allows dynamic exploration of parameter spaces using components of type DynamicArguments.</bold> In this example, the minimum of <italic>f</italic> (<italic>x</italic>) &#x0003D; (<italic>x</italic> &#x02212; 3)<sup>2</sup> is found using SimpleGradientDescent, starting from <italic>x</italic> &#x0003D; 0 and terminating at the minimum where <italic>x</italic> &#x0003D; 3.</p></caption>
<graphic xlink:href="fninf-07-00044-g0003.tif"/>
</fig>
<p>In summary, the <monospace>Arguments</monospace> objects are declarative, composable objects that can vary from simple declarations of constant argument values to complex optimization procedures. In addition to the <monospace>Arguments</monospace> objects presented so far, Lancet offers <monospace>FilePattern Arguments</monospace> for matching filenames. The filenames found may then be used as arguments for a simulator, or used to specify a list of files for loading into the Python environment. There are also other more specialized <monospace>Arguments</monospace> objects such as <monospace>Log</monospace>, which allows previously explored parameter spaces to be loaded from the <monospace>.log</monospace> files saved by Lancet when running external tools.</p>
</sec>
<sec>
<title>3.2. Specifying how Lancet supports your external tools</title>
<p>There are many different simulators and analysis tools used in computational neuroscience, each constantly being developed and updated. Some popular neural simulators include Brian (Goodman and Brette, <xref ref-type="bibr" rid="B10">2008</xref>), Neuron (Hines and Carnevale, <xref ref-type="bibr" rid="B11">1997</xref>), and NEST (Gewaltig and Diesmann, <xref ref-type="bibr" rid="B9">2007</xref>), each of which uses different custom command-line interfaces. The most general approach to support such a wide range of tools is to treat them as external executables run on the command line. If a command-line specification is impractical or not supported by a particular tool, it is straightforward to write a <monospace>Command</monospace> that instead writes the specification for a run to a file to be read by the external program.</p>
<p>Even if you have the option of working exclusively with Python, such as for the Brian simulator, there can be clear advantages to writing your Python scripts as independent tools that can be invoked on the command line. Firstly, doing so ensures that independent runs are genuinely separate, sandboxing execution into separate processes to guarantee that independent jobs will not interact in unexpected ways. This requirement for process independence is explicit when running jobs on a cluster (for instance, when using Grid Engine). It is therefore useful to define a command-line interface to your Python scripts (perhaps using the <monospace>argparse</monospace> module) if you want code that can be executed both locally and in parallel on a cluster. Finally, defining a clear command-line interface can help document your code and allows useful standalone utilities to be pulled out of your code base.</p>
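<p>As a minimal sketch of such a command-line interface, the following uses the standard <monospace>argparse</monospace> module. The option names, their defaults, and the idea of returning the parsed settings as a dictionary are all hypothetical choices made for illustration; a real script would run its simulation where indicated.</p>
<preformat preformat-type="code">
```python
import argparse


def build_parser():
    """Describe the command-line interface of a hypothetical simulation script."""
    parser = argparse.ArgumentParser(description="Run one simulation.")
    parser.add_argument("--excitation", type=float, default=1.0,
                        help="excitatory strength")
    parser.add_argument("--inhibition", type=float, default=1.0,
                        help="inhibitory strength")
    parser.add_argument("--homeostasis", action="store_true",
                        help="enable the homeostatic mechanism")
    parser.add_argument("--output", default="result.txt",
                        help="file to which results are written")
    return parser


def main(argv):
    """Parse the given argument list and return the settings as a dict."""
    args = build_parser().parse_args(argv)
    # A real script would run the simulation here and write args.output.
    return vars(args)


# Example invocation with an explicit argument list.
settings = main(["--excitation", "0.5", "--inhibition", "1.5", "--homeostasis"])
```
</preformat>
<p>When used as a standalone script, <monospace>main</monospace> would be called with <monospace>sys.argv[1:]</monospace>, so that each run is a separate process whose parameters are fully specified by its command line.</p>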
<p>When invoking tools with a standard command-line interface, Lancet supplies <monospace>ShellCommand</monospace>, which can help avoid writing explicit interfacing code in many situations. For instance, <monospace>ShellCommand</monospace> is used to invoke the <monospace>factor</monospace> command in Figure <xref ref-type="fig" rid="F1">1</xref>. <monospace>ShellCommand</monospace> is a type of <monospace>Command</monospace> that defines how Lancet can invoke an external tool via the command line. <monospace>ShellCommand</monospace> only supports communication via command-line arguments, but other <monospace>Command</monospace> classes may, for example, generate specification files appropriate to the chosen tool.</p>
<p>For interfacing with complex external software, users will often need to write a new <monospace>Command</monospace> subclass to extend Lancet&#x00027;s functionality for the new tool. Writing such a class is straightforward, as the subclass only needs to implement a constructor and a <monospace>__call__</monospace> method. The <monospace>__call__</monospace> method is supplied with arguments generated by an <monospace>Arguments</monospace> object in dictionary format (along with optional runtime information) and the <monospace>Command</monospace> must then return a list of strings suitable for Python&#x00027;s <monospace>subprocess.Popen</monospace> class. If the tool needs to load arguments from file, the <monospace>Command</monospace> may also save part of the parameter list specification to disk in an appropriate format before the command is executed. As described in the Discussion section, a special <monospace>Command</monospace> type could also be used to group small, lightweight jobs to avoid startup overhead.</p>
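<p>The pattern described above can be sketched without reference to Lancet&#x00027;s actual base class. The standalone class below is an illustration only: it does not import or subclass Lancet&#x00027;s <monospace>Command</monospace>, and the class name <monospace>SpecFileCommand</monospace>, the <monospace>tid</monospace> argument, the tool name <monospace>mytool</monospace>, and the <monospace>--spec</monospace> flag are all assumptions made for this example. It shows a constructor plus a <monospace>__call__</monospace> method that saves the argument dictionary to a file and returns a list of strings suitable for <monospace>subprocess.Popen</monospace>.</p>
<preformat>
```python
import json
import os
import tempfile


class SpecFileCommand:
    """Stand-in for a Command subclass (does not import or subclass Lancet).

    Calling the object with one dictionary of arguments writes them to a
    JSON file and returns the argument list for subprocess.Popen.
    """

    def __init__(self, executable, spec_dir):
        self.executable = executable   # path or name of the external tool
        self.spec_dir = spec_dir       # where specification files are written

    def __call__(self, spec, tid=0):
        # Save the argument dictionary to disk for the tool to read.
        spec_path = os.path.join(self.spec_dir, "job_%d.json" % tid)
        with open(spec_path, "w") as handle:
            json.dump(spec, handle, sort_keys=True)
        # '--spec' is a hypothetical flag understood by the hypothetical tool.
        return [self.executable, "--spec", spec_path]


command = SpecFileCommand("mytool", tempfile.mkdtemp())
popen_args = command({"rate": 10.0, "duration": 1.0}, tid=3)
```
</preformat>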
<p>Such interfacing code is designed to be simple, allowing the user to easily support new tools as required. These new components can then be supplied in a &#x0201C;Lancet extension&#x0201D; which may be bundled with the external software. For instance, the Topographica project (Bednar, <xref ref-type="bibr" rid="B2">2009</xref>) offers a sophisticated <monospace>Command</monospace> subclass in a file named <monospace>lancext.py</monospace>. This component can invoke the simulator with a particular model file and defines Python analysis and measurement code for execution across a specified list of simulation times. Note that in this particular use case, although the <monospace>Command</monospace> passes the model file path to the command line, all parameters are specified on the command line rather than in the model file.</p>
<p>The <monospace>lancext.py</monospace> code is sufficiently flexible to support day-to-day exploratory work using the simulator, and was used throughout the development of the results in Stevens et al. (<xref ref-type="bibr" rid="B14">2013</xref>). The <monospace>Command</monospace> used is called <monospace>RunBatchCommand</monospace>, and is highlighted in green in Figure <xref ref-type="fig" rid="F5">5</xref>. The overall approach is general enough to be applicable to any simulator or tool, ranging from simple programs like <monospace>factor</monospace>, to complex neural simulators like Topographica, or even for running complex software outside the scope of computational neuroscience, such as time-consuming microprocessor simulations.</p>
</sec>
<sec>
<title>3.3. Specifying your chosen computational platform</title>
<p>The parameter space and the chosen tool are defined independently and do not interact until a platform is chosen by selecting a <monospace>Launcher</monospace> object. The purpose of a <monospace>Launcher</monospace> is to take an <monospace>Arguments</monospace> object declaring a parameter space and feed the instantiated arguments to the <monospace>Command</monospace>, which then passes the appropriate command specification back to the <monospace>Launcher</monospace>, which executes the tool on the appropriate platform. As all the components needed to launch jobs and generate data form the arguments of the <monospace>Launcher</monospace>, the printed representation (also known as the <monospace>repr</monospace>) of the <monospace>Launcher</monospace> captures a complete specification of how the output files are created.</p>
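<p>The division of labor between these three components can be illustrated with a small standalone sketch that does not use Lancet itself. Here a list of dictionaries stands in for the parameter space, the hypothetical function <monospace>python_command</monospace> stands in for a <monospace>Command</monospace>, and the hypothetical function <monospace>launch_locally</monospace> stands in for a local <monospace>Launcher</monospace>. The external tool is simply the Python interpreter printing a product.</p>
<preformat>
```python
import subprocess
import sys


def python_command(spec):
    """Hypothetical command: build a Popen-style list for one argument set."""
    code = "print({a} * {b})".format(**spec)
    return [sys.executable, "-c", code]


def launch_locally(argument_sets, command):
    """Run one subprocess per argument set and collect its standard output."""
    outputs = []
    for spec in argument_sets:
        completed = subprocess.run(command(spec), stdout=subprocess.PIPE,
                                   universal_newlines=True, check=True)
        outputs.append(completed.stdout.strip())
    return outputs


# A small parameter space: every combination of a and b.
parameter_space = [{"a": a, "b": b} for a in (1, 2) for b in (3, 4)]
results = launch_locally(parameter_space, python_command)
```
</preformat>
<p>In this sketch the parameter space and the command know nothing about each other until they are handed to the launching function, which mirrors the separation described above.</p>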
<p>As Lancet itself only uses cross-platform portions of the Python library, code that uses Lancet can work across operating systems (Linux, MacOS, Windows). One reason to subclass <monospace>Command</monospace> to support a given tool is to ensure appropriate command-line invocations are generated across different operating systems. Simple tools with a consistent format of command-line invocation can instead be safely launched with <monospace>ShellCommand</monospace>, on any operating system.</p>
<p>Lancet currently provides a basic <monospace>Launcher</monospace> class for running jobs locally, and a subclass <monospace>QLauncher</monospace> that launches jobs with Grid Engine. Although the jobs are launched in very different ways, both classes ensure that the output is organized consistently. This approach ensures that the rest of the researcher&#x00027;s code can be used as-is across all the available platforms. For instance, code that needs to locate output files can use the same approach regardless of whether the files were generated locally or on a cluster. This is an essential feature for an agile workflow: as your requirements grow, it is important to have the option to painlessly transition from readily accessible local computational resources to a high-throughput cluster that can run your jobs in parallel, and then back again for debugging.</p>
<p>Lancet&#x00027;s <monospace>QLauncher</monospace> component wraps the Grid Engine <monospace>qsub</monospace> command and has been extensively tested on an open-source variant of the original Grid Engine system (Son of Grid Engine, version 8.0.0e). <monospace>QLauncher</monospace> assumes only the basic options applicable across the various versions of Grid Engine (Sun/Oracle/Univa Grid Engine) and should be usable on any machine where a Grid Engine <monospace>qsub</monospace> command is available. More information about Grid Engine and the Son of Grid Engine project may be found at <ext-link ext-link-type="uri" xlink:href="http://arc.liv.ac.uk/SGE/">http://arc.liv.ac.uk/SGE/</ext-link>.</p>
<p>In addition to making the process of switching between platforms easy, <monospace>Launchers</monospace> help save important information alongside the output data that helps ensure reproducibility and assists in later analysis. The <monospace>.info</monospace> file contains metadata which records important details requested from the version control system, the active Python and Lancet versions, operating system information and the complete representation of the source <monospace>Launcher</monospace>. The <monospace>.log</monospace> file contains an explicit list of all parameters used, allowing output to be quickly associated with the parameters used to generate it. This feature provides scientific provenance information for data analysis, which is crucial because the files output by a tool do not necessarily include the scientifically relevant parameters that were used to generate that data.</p>
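<p>To make the role of these two files concrete, the following standalone sketch writes a JSON metadata file and a one-line-per-job parameter log, then reads the log back. The file names, the key names and the log layout used here are assumptions made for illustration and are not a description of the exact files that Lancet writes.</p>
<preformat>
```python
import json
import os
import platform
import sys
import tempfile


def write_provenance(directory, launcher_repr, argument_sets):
    """Save an .info file (JSON metadata) and a .log file (one job per line).

    The key names and the log layout are illustrative assumptions.
    """
    info = {
        "launcher": launcher_repr,
        "python_version": sys.version,
        "platform": platform.platform(),
    }
    info_path = os.path.join(directory, "batch.info")
    with open(info_path, "w") as handle:
        json.dump(info, handle, indent=2, sort_keys=True)

    log_path = os.path.join(directory, "batch.log")
    with open(log_path, "w") as handle:
        for job_id, spec in enumerate(argument_sets):
            handle.write("%d %s\n" % (job_id, json.dumps(spec, sort_keys=True)))
    return info_path, log_path


def read_log(log_path):
    """Recover a mapping from job id to the parameters used for that job."""
    jobs = {}
    with open(log_path) as handle:
        for line in handle:
            job_id, spec = line.split(" ", 1)
            jobs[int(job_id)] = json.loads(spec)
    return jobs


info_path, log_path = write_provenance(
    tempfile.mkdtemp(), "ExampleLauncher()", [{"rate": 5}, {"rate": 10}])
logged = read_log(log_path)
```
</preformat>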
</sec>
</sec>
<sec>
<title>4. A realistic, agile, and evolvable workflow</title>
<p>Having introduced the general facilities offered by Lancet, we now examine how it can enable an agile and reproducible workflow using IPython Notebook. The use of external Python packages as appropriate is encouraged, and in particular the pandas library has proven very useful for analyzing data. To keep track of the code in the various Python scripts and IPython notebooks that appear as the workflow develops, it is also encouraged to keep a log of development by means of frequent code commits. Lancet works well together with distributed version control systems like Git and Mercurial, or with management and tracking tools tailored towards scientific use, such as Sumatra (Davison, <xref ref-type="bibr" rid="B5">2012</xref>).</p>
<p>Note that our proposed workflow using Lancet does not aim to be prescriptive or impose requirements on the user. It is our view that the researcher must primarily choose the tools that allow the most productive research possible. Our goal is therefore to make Lancet general and useful, allowing each researcher to organically develop their own workflow according to their own particular needs. By incorporating more Lancet components into your workflow over time, the code can become more succinct while increasing the overall level of automation and reproducibility. A schematic of how the workflow evolves over time is shown in Figure <xref ref-type="fig" rid="F4">4</xref> and the stages of a typical research project using Lancet and IPython Notebook are now described:
<list list-type="order">
<list-item><p>An excellent way to start exploratory research is by creating a new IPython notebook. This offers an unconstrained environment where new ideas can be rapidly coded, tested and discarded as necessary. Using text and Markdown cells, notes can be interleaved with code to keep track of new ideas that relate either to scientific material or to coding. In this exploratory phase, the notebook is likely to be fairly disorganized and rapidly changing with many unrelated code snippets, outdated textual notes, HTML links, and other content (such as images) referencing external resources and documentation. Even so, this early stage can be captured by committing the notebook to version control, preserving any progress made even though the user has not yet used any specific tool for reproducibility beyond the standard notebook.</p></list-item>
<list-item><p>Once a simulator or analysis tool has been chosen, small parameter spaces can be defined using the <monospace>Arguments</monospace> objects to be executed locally. If there is no <monospace>Command</monospace> available for the chosen tool, it is likely that <monospace>ShellCommand</monospace> will be sufficient to begin with. Otherwise, only a few lines of code are needed to subclass <monospace>Command</monospace> and satisfy the immediate requirements. At this stage, the output can be explored in an <italic>ad hoc</italic> manner, e.g., by inspecting files with a file manager or image viewer, as illustrated by the first column of Figure <xref ref-type="fig" rid="F4">4</xref>.</p></list-item>
<list-item><p>Lancet will store the <monospace>repr</monospace> (Python&#x00027;s term for an object&#x00027;s representation string) of the <monospace>Launchers</monospace> used along with the data in the <monospace>.info</monospace> files, maintaining a declarative record of how all the data was generated over time. As the project grows, it becomes crucial that version control is used to track notebook and code contents. A helper utility <monospace>vcs_metadata</monospace> is offered by Lancet that allows Git, Mercurial, or SVN version control information to be automatically stored in the <monospace>.info</monospace> files.</p></list-item>
<list-item><p>As the IPython Notebook is a very flexible environment for plotting and exploration, it quickly becomes worth writing small sections of Python code to automate away any <italic>ad hoc</italic> data inspection steps. It is also easy to load your data into the IPython notebook and rapidly generate plots with matplotlib. In particular, parameters associated with the loaded data can be brought into the notebook session by specifying a <monospace>.log</monospace> file to a <monospace>Log Arguments</monospace> object. This <monospace>Log</monospace> object may be used to re-run previously explored parameters, but also offers a convenient way to inspect and browse parameters previously logged by Lancet. By calling the <monospace>dframe</monospace> method of a <monospace>Log</monospace> object, a pandas <monospace>DataFrame</monospace> is generated that will present the logged parameters as an HTML table, offering a simple alternative to the web interface functionality offered by tools such as Sumatra. This stage is illustrated by the middle column of Figure <xref ref-type="fig" rid="F4">4</xref>.</p></list-item>
<list-item><p>Although small parameter spaces and local runs are often suitable initially when rapidly testing and debugging code, it is rare that this will prove sufficient for the whole project. As the code gets longer and more stable, it should be split out into Python modules to keep the notebook short and readable. As the code matures, parameter spaces tend to grow and simulation runs get longer and slower to obtain higher quality data sets. As the computational requirements increase, running simulations locally may become prohibitively slow, making it worth switching to a cluster if available. Lancet is designed to make such a transition painless: after switching <monospace>Launcher</monospace> for <monospace>QLauncher</monospace> and supplying a few basic settings appropriate to the cluster environment, the same code will immediately run in parallel on the cluster.</p></list-item>
<list-item><p>If a new <monospace>Command</monospace> class was implemented to support the external tool, this class may have matured to the stage where it is sufficiently general and flexible to become a reusable component, in which case it should also migrate to a separate file. By sharing this code with other Lancet users, the need to implement <monospace>Commands</monospace> will be alleviated in future as more and more tools are supported.</p></list-item>
<list-item><p>This particular stage of a research project may be quite prolonged, ending only when a particularly worthwhile avenue of research has been found. As the emphasis moves from exploration to publication, a particular subset of the code written is likely to become relevant. This code can be cleaned up and factored out into a Python module to keep the notebook manageable and to express the intentions of the developing paper clearly. Key plot types that are likely to become part of published figures may also be moved into a separate module.</p></list-item>
<list-item><p>In the final stages of developing a paper for submission, it can become cumbersome to generate complex, publication-quality figures using matplotlib alone. For this reason, a different approach was used to generate the final figures in Stevens et al. (<xref ref-type="bibr" rid="B14">2013</xref>): a small utility was written that allows SVG templates to be quickly authored in the Inkscape graphics editor. This utility can then embed vector assets dynamically generated by matplotlib to create the final, publication-quality figure. At this stage, the notebook should embody a completely automated and reproducible workflow for published work, as illustrated by the final column of Figure <xref ref-type="fig" rid="F4">4</xref> and demonstrated for Stevens et al. (<xref ref-type="bibr" rid="B14">2013</xref>).</p></list-item>
</list></p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Lancet captures a full declarative specification of the parameters, tools and platform employed, each time data is generated at every stage of the workflow.</bold> Early in the project, the output files may be rapidly explored in an <italic>ad hoc</italic> way that does not need to be automated or reproducible, as illustrated in the first column. As the research project matures, more of the analysis and plotting procedure may be pulled back into IPython Notebook where it can be automated (middle column). Finally, as the research nears publication, SVG templates may be used to ease the automatic generation of publication figures, as shown in the last column.</p></caption>
<graphic xlink:href="fninf-07-00044-g0004.tif"/>
</fig>
<p>The key characteristic of this proposed workflow is that although the final outcome is an IPython notebook that captures and automates all the steps needed to generate a published result, there is no stage where the researcher needs any motive other than a desire to increase productivity. Writing a new <monospace>Command</monospace> to interface with a new external tool (if such a class is not already available) may at first appear more trouble than writing a simple, <italic>ad hoc</italic> script such as a shell script, a Python script using <monospace>subprocess</monospace>, or a script in some other language such as Perl. But the key difference is that the initial <monospace>Command</monospace> is normally trivial, using a few lines of code to return a fixed list of strings to the command line.</p>
<p>Unlike <italic>ad hoc</italic> scripts that can rapidly become unmanageable, a new <monospace>Command</monospace> class remains maintainable as it becomes more general and useful, remaining viable across multiple research projects. Implementing such an object allows the same, clean declarative representation to be seamlessly used with either local simulations or when working on a cluster. A workflow that relies on scripting solutions to individual problems as they appear is likely to become unreadable over time, and is unlikely to be reused between projects. To illustrate, the <monospace>RunBatchCommand</monospace> and associated classes implemented for the Topographica simulator now offer significantly more functionality for batch simulations than was initially available with the simulator. Although the latest Topographica Lancet extension is still under 500 lines of code and documentation, it has helped make regular research work with this simulator much easier than before.</p>
<p>So far, the declarative, reproducible nature of Lancet objects has only been demonstrated with very simple examples. Figure <xref ref-type="fig" rid="F5">5</xref> shows the full specification for a batch of Topographica simulations used in Stevens et al. (<xref ref-type="bibr" rid="B14">2013</xref>) in the form of a launcher <monospace>repr</monospace>. Evaluating this <monospace>repr</monospace> creates a new launcher object that can be run to regenerate the same data, without needing the notebook that originally launched it. The printed representation of the <monospace>Launcher</monospace> object shown in Figure <xref ref-type="fig" rid="F5">5</xref> contains a real example of how the <monospace>RunBatchCommand</monospace> component is used in practice.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>A real example of recreating a launcher from the complete, declarative specification saved to the .info file.</bold> The repr (the string representation) of the launcher is shown above, matching the corresponding string saved in the .info file. This example fully specifies 21 Topographica simulations used to generate Figures 5 and 6 from Stevens et al. (<xref ref-type="bibr" rid="B14">2013</xref>). Using a version control system also allows the state of the executed code (simulator, analysis, measurement code etc) to be restored based on the information stored in the .info file.</p></caption>
<graphic xlink:href="fninf-07-00044-g0005.tif"/>
</fig>
<p>In this example, the <monospace>.info</monospace> file in one of the output directories is loaded using the <monospace>json</monospace> library and the contents of the <monospace>launcher</monospace> key are evaluated. As the <monospace>repr</monospace> of a <monospace>Launcher</monospace> is always saved to the <monospace>.info</monospace> file and this <monospace>repr</monospace> is a complete, declarative object that is a valid Python expression, running <monospace>eval(info[&#x00027;launcher&#x00027;])</monospace> creates a new <monospace>Launcher</monospace> with identical behavior to the original. This object is easily inspected and captures the full set of parameters, including the path to the simulator executable, the executed Topographica model file, and a list of analysis functions to be executed repeatedly over the course of each simulation run.</p>
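<p>The principle that makes this round trip possible, a <monospace>repr</monospace> that is itself a valid Python expression, can be shown with a toy class. The class <monospace>ToyLauncher</monospace> below is hypothetical and far simpler than a real Lancet launcher; it only sketches how an object can be rebuilt from the string stored under a <monospace>launcher</monospace> key in JSON text.</p>
<preformat>
```python
import json


class ToyLauncher:
    """Hypothetical stand-in for a launcher whose repr is valid Python."""

    def __init__(self, batch_name, executable, argument_sets):
        self.batch_name = batch_name
        self.executable = executable
        self.argument_sets = argument_sets

    def __repr__(self):
        return "ToyLauncher(batch_name=%r, executable=%r, argument_sets=%r)" % (
            self.batch_name, self.executable, self.argument_sets)

    def __eq__(self, other):
        # Two launchers are treated as equal when their specifications match.
        return repr(self) == repr(other)


original = ToyLauncher("demo", "factor", [{"N": 10}, {"N": 11}])

# Round trip through JSON text, as if the repr had been saved to an .info file.
info_text = json.dumps({"launcher": repr(original)})
info = json.loads(info_text)

# eval should only ever be applied to files from a trusted source.
recreated = eval(info["launcher"])
```
</preformat>
<p>Since <monospace>eval</monospace> executes arbitrary Python, this approach is appropriate only for metadata files whose origin is trusted, such as those generated by the researcher&#x00027;s own runs.</p>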
<p>Calling this object without supplying any arguments in a cluster environment would relaunch the 21 Topographica simulations necessary to regenerate Figure 5 from Stevens et al. (<xref ref-type="bibr" rid="B14">2013</xref>). This code will reproduce identical results, as long as the Topographica simulator is working correctly. If the results change due to differences in the simulator code, the recorded version control information allows all the code to be restored to the same state as when the data was originally generated. Note that the code listing in Figure <xref ref-type="fig" rid="F5">5</xref> is only one of the launchers needed to reproduce all the Figures in Stevens et al. (<xref ref-type="bibr" rid="B14">2013</xref>). In total, 842 simulation jobs were specified with Lancet to generate all the figures of the paper. Each job (simulation and analysis) takes over an hour to complete, so the full set of jobs takes several days to complete when running on a cluster, but the entire specification is still compact and human-readable.</p>
</sec>
<sec sec-type="discussion" id="s2">
<title>5. Discussion</title>
<p>This paper has demonstrated a lightweight, flexible, and pragmatic approach to achieving scientific reproducibility without constraining innovation. There are many other approaches also available, ranging from just writing a complete Python script to automate all your tasks, to using a heavyweight workflow-automation system. These more ambitious workflow engines are in regular use by large commercial organizations and research groups in some fields (Freire et al., <xref ref-type="bibr" rid="B7">2011</xref>), but are not currently common in computational neuroscience. Such workflow engines are typically designed to manage complex workflows with long pipelines, involving many different people. In contrast, the workflow presented here is designed to be minimalistic, suitable for small groups of researchers who wish to keep their research work flexible and do not want to embrace more complex and prescriptive workflow tools.</p>
<p>Our aim is to show that for a general class of exploratory research in Python, using IPython Notebook and Lancet together allows for an agile workflow that naturally and gradually becomes more reproducible and automated over time. The final result of this process is a set of IPython notebooks that fully reproduce published scientific results, without constraining the user at any stage of the process. Lancet deliberately does not prescribe any fixed way of doing research, and every component offered to the user should be evaluated on the basis of how well it improves immediate research efficiency.</p>
<p>As a historical note, each of the components of Lancet was originally developed to satisfy the needs of a real research project spanning multiple years, not simply to try to achieve reproducibility after the fact. In this project, many hundreds of simulations were executed locally using Lancet, and tens of thousands of jobs were launched on a cluster. But unlike the custom, <italic>ad hoc</italic> scripts that would normally be the result of such a project, Lancet was designed from the start to work just as well for completely different scientific domains, to ensure that the concepts and tools would be general and meaningful long into the future.</p>
<p>As a general tool, Lancet does not become any less relevant to research in computational neuroscience. To the contrary, having a general approach ensures that the essence of a workflow is valid over time as the underlying simulator tools come and go. The flexible and compositional nature of Lancet objects is suited to fast, exploratory research of interest to the computational neuroscience community using Python. Even though Lancet is newly available, it has already formed the basis for a complete scientific publication, made publicly available as an IPython notebook that automatically reproduces all the scientific results of the paper. This notebook allows all the code and results to be presented in a clear, automated way, and may be viewed and downloaded from the <monospace>models/stevens.jn13</monospace> subdirectory of Topographica&#x00027;s GitHub repository.</p>
<p>For a tool that aims to be general, it is unsurprising that some functionality overlaps with other projects, given the many excellent third party libraries available for Python. For instance, there are several projects that offer sophisticated interfaces with Grid Engine, such as <monospace>pythongrid</monospace> and <monospace>drmaa-python</monospace>. IPython itself includes the <monospace>IPython.parallel</monospace> package which can help accelerate the pace of interactive work on a cluster. Some of the goals of Lancet&#x00027;s <monospace>Arguments</monospace> objects are shared by the <monospace>parameters</monospace> module of the NeuroTools package, which also allows parameter spaces to be defined. What distinguishes Lancet from these other libraries is that it offers all the tools needed to span an entire agile workflow with a collection of independent, declarative objects that work together.</p>
<p>Various workflow tools already exist with the computational neuroscientist in mind. VisTrails (Freire et al., <xref ref-type="bibr" rid="B8">2014</xref>) is a scientific workflow and provenance system that integrates well with Python projects, taking a GUI-centric approach. The Mozaik framework (Antol&#x000ED;k and Davison, <xref ref-type="bibr" rid="B1">2013</xref>) is designed to encapsulate the workflows relevant to researchers who use spiking neural models. In contrast to these projects, Lancet is lightweight, with almost no dependencies, and is not tied to any particular set of simulator tools or workflows. Researchers exclusively using the appropriate spiking simulators may find Mozaik to be more specialized for their needs than Lancet, while Lancet is suitable for those who desire a more interactive workflow or need to use a broader class of tools or tools that are expected to change over time.</p>
<p>Projects like Sumatra (Davison, <xref ref-type="bibr" rid="B5">2012</xref>) take a far more general approach for achieving reproducibility, tailoring functionality offered by version control systems to the needs of the scientist. In this way, Sumatra offers functionality that is orthogonal to Lancet, allowing both tools to be used successfully together. Lancet&#x00027;s approach aims for the middle of the spectrum between Sumatra and Mozaik, capturing declarative specifications within Python code that assists with automation and reproducibility without losing generality. Lancet is BSD-licensed and supports Python 3, and helps the researcher exploit well-established tools such as IPython Notebook and pandas in a way that makes day-to-day research easier and ultimately makes results more reproducible.</p>
<p>Lancet is also extremely extensible. The interface between Lancet objects has been deliberately kept simple, to allow new components to be added whenever required. The <monospace>Command</monospace> class allows Lancet to work with new external tools, invoking the tool appropriately for each set of arguments specified. In some situations, individual jobs may run quickly relative to the time for setup and initialization, making it inefficient for Lancet to span the parameter space directly. In such cases, Lancet can instruct the tool to cover the parameter range itself, with Lancet only specifying starting and stopping points (e.g., <monospace>Args(start &#x0003D; 0, end &#x0003D; 5)</monospace>). If necessary, the <monospace>Command</monospace> object could then use these values to build a range specification in a format the tool can use.</p>
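<p>As a small illustration of this last step, the function below converts a start and end value into a single range argument. The flag name <monospace>--range</monospace> and the <monospace>start:end</monospace> syntax are hypothetical; a real tool would dictate its own format.</p>
<preformat>
```python
def range_flag(spec, name="--range"):
    """Turn start and end values into one range argument for a tool.

    The flag name and the 'start:end' syntax are hypothetical.
    """
    return [name, "%d:%d" % (spec["start"], spec["end"])]


# One invocation covers the whole range instead of one job per value.
flags = range_flag({"start": 0, "end": 5})
```
</preformat>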
<p>The process of executing jobs may also be customized to satisfy specific needs. For instance, there are currently two types of <monospace>Launcher</monospace>, one for running jobs locally and one for running jobs on Grid Engine. Other types of <monospace>Launcher</monospace> may be written to extend Lancet to new platforms. For instance, it should be very straightforward to write a <monospace>Launcher</monospace> that launches jobs over SSH, or one that allocates computational resources on demand with Amazon EC2. This new <monospace>Launcher</monospace> would then fit seamlessly into the other components offered by Lancet.</p>
<p>The <monospace>Arguments</monospace> objects are also designed to be extensible. Although the basic objects offered are already suitable for many research requirements, new <monospace>Arguments</monospace> objects can be written if desired. By building a new <monospace>DynamicArguments</monospace> component, Lancet can be used for more complex, online parameter space exploration, utilizing optimization techniques such as hill climbing or genetic algorithms. Currently, <monospace>SimpleGradientDescent</monospace> is the only such object supplied with Lancet, designed to demonstrate how more practical algorithms may be quickly implemented. It is hoped that the ability to employ optimization algorithms as necessary will extend the utility of Lancet and that by making use of mature, third party libraries, users will easily be able to rapidly implement the optimization procedures necessary to solve their problems.</p>
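<p>To indicate the kind of procedure such a component might drive, the sketch below implements plain one-dimensional hill climbing as a standalone function. It is neither Lancet&#x00027;s <monospace>SimpleGradientDescent</monospace> nor an implementation of the <monospace>DynamicArguments</monospace> interface; the function names and the toy objective are assumptions made for illustration. In a real setting each objective evaluation would correspond to one launched job.</p>
<preformat>
```python
def hill_climb(objective, start, step=1.0, min_step=1e-3, max_iter=1000):
    """Maximize a one-dimensional objective by simple hill climbing.

    Each iteration evaluates both neighbours of the current point and
    halves the step size when neither neighbour is an improvement.
    """
    current, best = start, objective(start)
    for _ in range(max_iter):
        if min_step > step:
            break
        candidates = [current - step, current + step]
        scores = [objective(x) for x in candidates]
        top = max(scores)
        if top > best:
            current, best = candidates[scores.index(top)], top
        else:
            step /= 2.0
    return current, best


def toy_objective(x):
    """Toy objective with a single maximum at x = 3."""
    return -(x - 3.0) ** 2


optimum, value = hill_climb(toy_objective, start=0.0)
```
</preformat>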
<p>Of course, it is important to remember that Lancet is just one small part of a toolset for achieving reproducibility. More-basic tools like Python, pandas, and matplotlib are crucial for making it practical to automate scientific tasks, which is a prerequisite for being able to capture the process for later playback. Distributed version control systems like Git and Mercurial make it easy to capture the state of anything that can be expressed in text. IPython Notebook and matplotlib make it feasible to explore and analyze results in a text-based way that can be captured by the VCS. Lancet simply helps tie these together with launching runs and collating the results, to fill in the missing pieces that allow the entire process to become reproducible in practice. In that way, it addresses the fundamental barrier to reproducibility, which is the large and extra investment of time and effort that would be needed to automate and preserve tasks once the research has been published.</p>
<p>Essentially, what Lancet offers are the missing utilities that make it easy to capture all the required steps within a single IPython notebook, from initial exploration to published results. Using Lancet you can quickly specify and launch jobs, keep output files consistently organized, switch from local execution to working on a cluster, record metadata and other key information together with your data, and load simulation output back into the notebook for analysis and plotting. By keeping everything under version control, the entire scientific process can then be captured, providing a flexible and agile yet reproducible research workflow.</p>
<p>The IPython notebooks that fully and automatically reproduce Stevens et al. (<xref ref-type="bibr" rid="B14">2013</xref>) are publicly available from the GitHub repository of the Topographica project (<ext-link ext-link-type="uri" xlink:href="http://www.topographica.org">www.topographica.org</ext-link>) in the <monospace>models/stevens.jn13</monospace> directory (<ext-link ext-link-type="uri" xlink:href="http://github.com/ioam/topographica/tree/master/models/stevens.jn13">https://github.com/ioam/topographica/tree/master/models/stevens.jn13</ext-link>). Lancet itself is freely available under a BSD license and may be downloaded from <ext-link ext-link-type="uri" xlink:href="http://ioam.github.io/lancet/">http://ioam.github.io/lancet/</ext-link>. Other examples of using Lancet are available at these Web sites.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ack>
<p>The work has made use of resources provided by the Edinburgh Compute and Data Facility (ECDF; <ext-link ext-link-type="uri" xlink:href="http://www.ecdf.ed.ac.uk">www.ecdf.ed.ac.uk</ext-link>). Thanks to Philipp R&#x000FC;diger for his helpful comments and suggestions.</p>
</ack>
<sec>
<title>Funding</title>
<p>This work was supported in part by grants EP/F500385/1 and BB/F529254/1 to the University of Edinburgh Doctoral Training Centre in Neuroinformatics and Computational Neuroscience (<ext-link ext-link-type="uri" xlink:href="http://www.anc.ed.ac.uk/dtc">www.anc.ed.ac.uk/dtc</ext-link>) from the UK EPSRC, BBSRC, and MRC research councils.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Antol&#x000ED;k</surname> <given-names>J.</given-names></name> <name><surname>Davison</surname> <given-names>A. P.</given-names></name></person-group> (<year>2013</year>). <article-title>Integrated workflows for spiking neuronal network simulations</article-title>. <source>Front. Neuroinform</source>. <volume>7</volume>:<issue>34</issue>. <pub-id pub-id-type="doi">10.3389/fninf.2013.00034</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bednar</surname> <given-names>J. A.</given-names></name></person-group> (<year>2009</year>). <article-title>Topographica: building and analyzing map-level simulations from Python, C/C&#x0002B;&#x0002B;, MATLAB, NEST, or NEURON components</article-title>. <source>Front. Neuroinform</source>. <volume>3</volume>:<fpage>8</fpage>. <pub-id pub-id-type="doi">10.3389/neuro.11.008.2009</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Crook</surname> <given-names>S. M.</given-names></name> <name><surname>Davison</surname> <given-names>A. P.</given-names></name> <name><surname>Plesser</surname> <given-names>H. E.</given-names></name></person-group> (<year>2013</year>). <article-title>Learning from the past: approaches for reproducibility in computational neuroscience</article-title>, in <source>20 Years of Computational Neuroscience</source>, ed <person-group person-group-type="editor"><name><surname>Bower</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>73</fpage>&#x02013;<lpage>102</lpage>.</citation>
</ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Curcin</surname> <given-names>V.</given-names></name> <name><surname>Ghanem</surname> <given-names>M.</given-names></name></person-group> (<year>2008</year>). <article-title>Scientific workflow systems - can one size fit all?</article-title> in <source>Cairo International Biomedical Engineering Conference (CIBEC)</source> (<publisher-loc>Cairo</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>9</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davison</surname> <given-names>A.</given-names></name></person-group> (<year>2012</year>). <article-title>Automated capture of experiment context for easier reproducibility in computational research</article-title>. <source>Comput. Sci. Eng</source>. <volume>14</volume>, <fpage>48</fpage>&#x02013;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.1109/MCSE.2012.41</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Drummond</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>Replicability is not reproducibility: nor is it good science</article-title>, in <source>Proceedings of the Evaluation Methods for Machine Learning Workshop at the 26th International Conference on Machine Learning</source> (<publisher-loc>Montreal</publisher-loc>: <publisher-name>SITE, University of Ottawa</publisher-name>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.site.uottawa.ca/ICML09WS">http://www.site.uottawa.ca/ICML09WS</ext-link>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freire</surname> <given-names>J.</given-names></name> <name><surname>Bonnet</surname> <given-names>P.</given-names></name> <name><surname>Shasha</surname> <given-names>D.</given-names></name></person-group> (<year>2011</year>). <article-title>Exploring the coming repositories of reproducible experiments: challenges and opportunities</article-title>. <source>Proc. VLDB Endow</source>. <volume>4</volume>, <fpage>1494</fpage>&#x02013;<lpage>1497</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.vldb.org/pvldb/vol4/p1494-freire.pdf">http://www.vldb.org/pvldb/vol4/p1494-freire.pdf</ext-link></citation>
</ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Freire</surname> <given-names>J.</given-names></name> <name><surname>Koop</surname> <given-names>D.</given-names></name> <name><surname>Chirigati</surname> <given-names>F.</given-names></name> <name><surname>Silva</surname> <given-names>C.</given-names></name></person-group> (<year>2014</year>). <article-title>Reproducibility using VisTrails</article-title>, in <source>Implementing Reproducible Computational Research</source>, eds <person-group person-group-type="editor"><name><surname>Stodden</surname> <given-names>V.</given-names></name> <name><surname>Leisch</surname> <given-names>F.</given-names></name> <name><surname>Peng</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>Boca Raton, FL</publisher-loc>: <publisher-name>Chapman &#x00026; Hall/CRC</publisher-name>), (in press). Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.crcpress.com/product/isbn/9781466561595">http://www.crcpress.com/product/isbn/9781466561595</ext-link></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gewaltig</surname> <given-names>M.-O.</given-names></name> <name><surname>Diesmann</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <article-title>NEST (NEural Simulation Tool)</article-title>. <source>Scholarpedia</source> <volume>2</volume>:<fpage>1430</fpage>. <pub-id pub-id-type="doi">10.4249/scholarpedia.1430</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goodman</surname> <given-names>D. F. M.</given-names></name> <name><surname>Brette</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>Brian: a simulator for spiking neural networks in Python</article-title>. <source>Front. Neuroinform</source>. <volume>2</volume>:<fpage>5</fpage>. <pub-id pub-id-type="doi">10.3389/neuro.11.005.2008</pub-id><pub-id pub-id-type="pmid">19115011</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hines</surname> <given-names>M. L.</given-names></name> <name><surname>Carnevale</surname> <given-names>N. T.</given-names></name></person-group> (<year>1997</year>). <article-title>The NEURON simulation environment</article-title>. <source>Neural Comput</source>. <volume>9</volume>, <fpage>1179</fpage>&#x02013;<lpage>1209</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1997.9.6.1179</pub-id><pub-id pub-id-type="pmid">9248061</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nordlie</surname> <given-names>E.</given-names></name> <name><surname>Gewaltig</surname> <given-names>M.-O.</given-names></name> <name><surname>Plesser</surname> <given-names>H. E.</given-names></name></person-group> (<year>2009</year>). <article-title>Towards reproducible descriptions of neuronal network models</article-title>. <source>PLoS Comput. Biol</source>. <volume>5</volume>:<fpage>e1000456</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1000456</pub-id><pub-id pub-id-type="pmid">19662159</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>P&#x00026;#x00027;erez</surname> <given-names>F.</given-names></name> <name><surname>Granger</surname> <given-names>B. E.</given-names></name></person-group> (<year>2007</year>). <article-title>IPython: a system for interactive scientific computing</article-title>. <source>Comput. Sci. Eng</source>. <volume>9</volume>, <fpage>21</fpage>&#x02013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1109/MCSE.2007.53</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevens</surname> <given-names>J.-L. R.</given-names></name> <name><surname>Law</surname> <given-names>J. S.</given-names></name> <name><surname>Antol&#x000ED;k</surname> <given-names>J.</given-names></name> <name><surname>Bednar</surname> <given-names>J. A.</given-names></name></person-group> (<year>2013</year>). <article-title>Mechanisms for stable, robust, and adaptive development of orientation maps in the primary visual cortex</article-title>. <source>J. Neurosci</source>. <volume>33</volume>, <fpage>15747</fpage>&#x02013;<lpage>15766</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.1037-13.2013</pub-id><pub-id pub-id-type="pmid">24089483</pub-id></citation>
</ref>
</ref-list>
</back>
</article>
