<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title>Frontiers in Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnins.2013.00160</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research Article</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Reward-based learning under hardware constraints&#x02014;using a RISC processor embedded in a neuromorphic substrate</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Friedmann</surname> <given-names>Simon</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Fr&#x000E9;maux</surname> <given-names>Nicolas</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Schemmel</surname> <given-names>Johannes</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Gerstner</surname> <given-names>Wulfram</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Meier</surname> <given-names>Karlheinz</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Kirchhoff Institute for Physics, Ruprecht-Karls-University Heidelberg</institution> <country>Heidelberg, Germany</country></aff>
<aff id="aff2"><sup>2</sup><institution>School of Computer and Communication Sciences and Brain-Mind Institute, Ecole Polytechnique F&#x000E9;d&#x000E9;rale de Lausanne</institution> <country>Lausanne, Switzerland</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Elisabetta Chicca, University of Bielefeld, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Jennifer Hasler, Georgia Institute of Technology, USA; Piotr Dudek, University of Manchester, UK; Stefan Mihalas, Allen Institute for Brain Science, USA</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Simon Friedmann, Kirchhoff Institute for Physics, Ruprecht-Karls-University Heidelberg, Im Neuenheimer Feld 227, 69120 Heidelberg, Germany e-mail: <email>simon.friedmann&#x00040;kip.uni-heidelberg.de</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>20</day>
<month>09</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="collection">
<year>2013</year>
</pub-date>
<volume>7</volume>
<elocation-id>160</elocation-id>
<history>
<date date-type="received">
<day>26</day>
<month>03</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>08</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2013 Friedmann, Fr&#x000E9;maux, Schemmel, Gerstner and Meier.</copyright-statement>
<copyright-year>2013</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract><p>In this study, we propose and analyze in simulations a new, highly flexible method of implementing synaptic plasticity in a wafer-scale, accelerated neuromorphic hardware system. The study focuses on globally modulated STDP as a special use case of this method. Flexibility is achieved by embedding a general-purpose processor dedicated to plasticity into the wafer. To evaluate the suitability of the proposed system, we use a reward-modulated STDP rule in a spike train learning task. A single layer of neurons is trained to fire at specific points in time with only the reward as feedback. This model is simulated to measure its performance, i.e., the increase in received reward after learning. Using this performance as a baseline, we then simulate the model with various constraints imposed by the proposed implementation and compare the performance. The simulated constraints include discretized synaptic weights, a restricted interface between analog synapses and the embedded processor, and mismatch of analog circuits. We find that probabilistic updates can increase the performance of low-resolution weights, that a simple interface between analog synapses and processor is sufficient for learning, and that performance is insensitive to mismatch. Further, we consider communication latency between the wafer and the conventional control computer system that simulates the environment. This latency increases the delay with which the reward is sent to the embedded processor. Because the analog synapses operate continuously in time, this delay can cause the updates to deviate from those in the undelayed case. We find that for highly accelerated systems latency has to be kept to a minimum. This study demonstrates the suitability of the proposed implementation to emulate the selected reward-modulated STDP learning rule. It is therefore an ideal candidate for implementation in an upgraded version of the wafer-scale system developed within the BrainScaleS project.</p></abstract>
<kwd-group>
<kwd>neuromorphic hardware</kwd>
<kwd>wafer-scale integration</kwd>
<kwd>large-scale spiking neural networks</kwd>
<kwd>spike-timing dependent plasticity</kwd>
<kwd>reinforcement learning</kwd>
<kwd>hardware constraints analysis</kwd>
</kwd-group>
<counts>
<fig-count count="12"/>
<table-count count="3"/>
<equation-count count="38"/>
<ref-count count="44"/>
<page-count count="17"/>
<word-count count="13655"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<title>1. Introduction</title>
<p>In reinforcement learning, an agent learns to achieve a goal through interaction with an environment (Sutton and Barto, <xref ref-type="bibr" rid="B39">1998</xref>). The environment provides a single scalar number, the reward, as feedback to the actions performed by the learning agent. The agent tries to maximize the reward it receives over time by changing its behavior. In contrast to supervised learning, where an instructor supplies the correct actions to take, here the agent has to learn the correct strategy itself through trial-and-error. Typically this is done by introducing randomness in the selection of actions and taking into account the resulting reward. Recently, several studies have suggested extending classical spike-timing dependent plasticity (STDP, Caporale and Dan, <xref ref-type="bibr" rid="B3">2008</xref>; Morrison et al., <xref ref-type="bibr" rid="B22">2008</xref>) into reward-modulated STDP to implement reinforcement learning in the context of spiking neural networks (Farries and Fairhall, <xref ref-type="bibr" rid="B8">2007</xref>; Florian, <xref ref-type="bibr" rid="B10">2007</xref>; Izhikevich, <xref ref-type="bibr" rid="B17">2007</xref>; Legenstein et al., <xref ref-type="bibr" rid="B18">2008</xref>; Fr&#x000E9;maux et al., <xref ref-type="bibr" rid="B11">2010</xref>; Potjans et al., <xref ref-type="bibr" rid="B26">2011</xref>). One of the key issues in reinforcement learning is solving the so-called temporal credit assignment problem: reward arrives some time after the action that caused it. So how does the agent know how to change its behavior? It needs to retain some information about recent actions in order to assign proper credit for the rewards it receives. To do this, reward modulated STDP generates an eligibility trace for every synapse that depends on pre- and postsynaptic firing. This trace, modulated by the reward, determines the change of synaptic weight, thereby solving the credit assignment problem.</p>
<p>Spike-based implementations not only offer an approach to biological models of learning, they are also suitable for implementation in neuromorphic hardware devices. Existing systems offer a number of interesting characteristics, such as low power consumption (e.g., Wijekoon and Dudek, <xref ref-type="bibr" rid="B43">2008</xref>; Livi and Indiveri, <xref ref-type="bibr" rid="B19">2009</xref>; Seo et al., <xref ref-type="bibr" rid="B35">2011</xref>), faster than real-time dynamics (Wijekoon and Dudek, <xref ref-type="bibr" rid="B43">2008</xref>; Schemmel et al., <xref ref-type="bibr" rid="B30">2010</xref>), and scalability to large networks (Schemmel et al., <xref ref-type="bibr" rid="B30">2010</xref>; Furber et al., <xref ref-type="bibr" rid="B13">2012</xref>). They are typically built with two goals in mind: to serve as a new kind of brain-inspired information processing device and to provide a scalable platform for the experimental exploration of networks. Several studies so far have focused on the implementation of variants of unsupervised STDP in neuromorphic hardware (Indiveri et al., <xref ref-type="bibr" rid="B16">2006</xref>; Schemmel et al., <xref ref-type="bibr" rid="B33">2006</xref>; Ramakrishnan et al., <xref ref-type="bibr" rid="B28">2011</xref>; Seo et al., <xref ref-type="bibr" rid="B35">2011</xref>; Davies et al., <xref ref-type="bibr" rid="B4">2012</xref>). The synapse circuit presented by Wijekoon and Dudek (<xref ref-type="bibr" rid="B44">2011</xref>) implements the model proposed by Izhikevich (<xref ref-type="bibr" rid="B17">2007</xref>) with the goal of enabling reward-modulated STDP.</p>
<p>In this study we analyze the implementability of a reward modulated STDP model derived from Fr&#x000E9;maux et al. (<xref ref-type="bibr" rid="B11">2010</xref>) as one example of a flexible hardware learning system. To that end, we propose an extended version of the BrainScaleS wafer-scale system (Fieres et al., <xref ref-type="bibr" rid="B9">2008</xref>; Schemmel et al., <xref ref-type="bibr" rid="B32">2008</xref>, <xref ref-type="bibr" rid="B30">2010</xref>) to serve as a conceptual basis for this analysis. This system is designed as a faster than real-time and flexible emulation platform for large neural networks. The use of specialized analog circuits promises a higher power-efficiency than conventional digital simulations on supercomputers (Mead, <xref ref-type="bibr" rid="B20">1990</xref>). The acceleration in time compared to biology also makes the system interesting for reinforcement learning, which typically suffers from slow convergence (Sutton and Barto, <xref ref-type="bibr" rid="B39">1998</xref>). Starting from an existing system with limited modifications leads to a more realistic design prototype compared to starting from scratch.</p>
<p>A key objective for the proposed neuromorphic system is to be a valuable tool for neuroscience. Therefore, the design must not be targeted at a single network architecture, task or learning rule, but instead stay as flexible as is reasonably possible. On the other hand, implementing large-scale neural networks with an accelerated time-scale raises technical challenges, and trade-offs have to be made between flexibility and performance. The proposed extension represents a plasticity mechanism reflecting this design philosophy: specialized analog circuits in every synapse are combined with a general-purpose embedded plasticity-processor (EPP). This way, the benefits of analog and processor-based computing can be combined: analog circuits are used for compact, power-efficient and fast local processing, and digital processors allow for programmable plasticity rules. Integrating the processors into the same application-specific integrated circuits (ASIC) on the wafer as the neuromorphic substrate allows for scalability to wafer-size networks and beyond.</p>
<p>In the following, we will consider only the aforementioned rule studied in Fr&#x000E9;maux et al. (<xref ref-type="bibr" rid="B11">2010</xref>) and analyze effects caused by the adaptation to the hardware system in simulations. We want to answer the question whether the hybrid approach combining processor and analog circuits is a suitable platform for this particular learning rule. Among the hardware-induced constraints are non-continuous weights, drift of analog circuits and communication latency between hardware substrate and the controlling computer system. We want to test and compare the performance of the unconstrained and the constrained plasticity rules in order to find guidelines for the hardware implementation, for example the required weight resolution or maximum noise levels. Section 2 describes the extended hardware system and the plasticity model. Section 3 presents results from simulations showing performance under hardware constraints. Section 4 provides a discussion of our results.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>2. Materials and methods</title>
<sec>
<title>2.1. Using an embedded processor for plasticity</title>
<p>The key concept of our hardware implementation of synaptic plasticity is to use a programmable general-purpose processor in combination with fixed-function analog hardware. Software running on the processor can use observables and controls to interface with the neuromorphic substrate. Thereby, it is possible to flexibly switch between synaptic learning rules or use different ones in parallel for different synapses. The alternative to this concept would be to use fixed-function hardware instead of a general-purpose processor. This would allow a more efficient implementation of one specific rule, at the cost of system versatility. In the following, we give background information on a complete neuromorphic system following the concept of processor-enabled plasticity. From the system described, we derive hardware constraints that are used in the simulations reported in section 3.</p>
<sec>
<title>2.1.1. System overview</title>
<p>Figure <xref ref-type="fig" rid="F1">1</xref> gives a schematic overview of the complete hardware system. The experimenter controls the system through a control cluster of off-the-shelf computers. The network is provided in a description abstracted from the details of the system using the PyNN modeling language (Davison et al., <xref ref-type="bibr" rid="B5">2008</xref>). An automated mapping process translates the description into the detailed configuration that is written to the wafer-modules (Wendt et al., <xref ref-type="bibr" rid="B42">2008</xref>; Ehrlich et al., <xref ref-type="bibr" rid="B7">2010</xref>). These modules are interconnected by a high-speed network to communicate spike-events (Scholze et al., <xref ref-type="bibr" rid="B34">2011</xref>). External stimulation can be applied to the network from the control cluster, using the high-speed links that are also used for configuration. The wafer itself is subdivided into building blocks that contain the neuromorphic substrate, i.e., synapses, neurons, parameter storage and networking resources for spike transmission.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Overview of the system.</bold> The user controls the system through a cluster of conventional computers by sending configuration and spike data to a number of modules that each carry a wafer. These wafer modules are interconnected with a high-speed network to exchange spike events. The wafer contains identical building blocks, of which one is shown in an expanded view. The proposed extension to the BrainScaleS wafer-scale system in the form of the embedded plasticity processor is marked in red. Input/output access from the processor to other components of the building block is indicated with triangles.</p></caption>
<graphic xlink:href="fnins-07-00160-g0001.tif"/>
</fig>
<p>Our proposed extension adds an EPP to every building block on the wafer, together with its own memory for instructions and data. It will be equipped with three interfaces to the fixed-function hardware: read and write access to the synapses, rate counters and event generation for the network, and access to the control bus of the building block. The latter is also used by external control accesses. Thus, a plasticity program running on the embedded processor will be able to do everything that could be done from an off-wafer control computer, as long as it only requires information local to the block. No direct communication channel between processors is envisioned, but software on the control computer could be used for data exchange.</p>
</sec>
<sec>
<title>2.1.2. Implementing plasticity</title>
<p>Our proposed design represents a hybrid system, in which the digital EPP interacts closely with analog components. Every synapse contains an analog accumulation circuit, similar to the version used in an earlier design (Schemmel et al., <xref ref-type="bibr" rid="B31">2007</xref>). For each pre-post and post-pre spike-pair, the time difference &#x00394;<italic>t</italic> is measured and weighted exponentially using the amplitude <italic>A</italic><sub>&#x000B1;</sub> and time constant &#x003C4;<sub>&#x000B1;</sub>:
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mo>&#x000B1;</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mo>&#x000B1;</mml:mo></mml:msub><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x00394;</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x0007C;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mo>&#x000B1;</mml:mo></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
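<p>As an illustration of Equation (1), the following Python sketch sums the exponentially weighted contributions of spike pairs into two running totals, mimicking the accumulation performed in each synapse. It is a minimal sketch under stated assumptions, not a description of the circuit: the function name <monospace>accumulate</monospace>, the parameter values, the all-to-all pairing of spikes, and the sign convention (&#x00394;<italic>t</italic> is taken as post time minus pre time, with pre-before-post pairs counted in the "plus" total) are chosen for this example only, and saturation or leakage of the analog storage is ignored.</p>
<preformat><![CDATA[
```python
import math


def accumulate(pre_times, post_times,
               A_plus=1.0, A_minus=1.0,
               tau_plus=20.0, tau_minus=20.0):
    """Sum A * exp(-|dt| / tau) over spike pairs, as in Equation (1).

    Assumptions for illustration only: all pre/post pairs are counted,
    dt = t_post - t_pre, and pairs with dt > 0 go to the "plus" total.
    Times and time constants share one arbitrary unit.
    """
    a_plus = 0.0
    a_minus = 0.0
    for t_pre in pre_times:
        for t_post in post_times:
            dt = t_post - t_pre
            if dt > 0:    # pre before post
                a_plus += A_plus * math.exp(-abs(dt) / tau_plus)
            elif dt < 0:  # post before pre
                a_minus += A_minus * math.exp(-abs(dt) / tau_minus)
            # dt == 0 contributes to neither total in this sketch
    return a_plus, a_minus


if __name__ == "__main__":
    print(accumulate([0.0, 50.0], [10.0, 45.0]))
```
]]></preformat>
<p>A single pre-post pair separated by exactly one time constant contributes <italic>A</italic><sub>&#x0002B;</sub>/e to the "plus" total and nothing to the "minus" total.</p>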
<p>These values are added to two local capacitors <italic>a</italic><sub>&#x0002B;</sub> and <italic>a</italic><sub>&#x02212;</sub>, respectively. In the extended version the EPP will select synapses for readout and use an analog evaluation unit to produce a series of bits <italic>b</italic><sub><italic>i</italic></sub> out of <italic>a</italic><sub>&#x0002B;</sub> and <italic>a</italic><sub>&#x02212;</sub>. The evaluation function can perform different readout operations controlled by configuration bits <italic>e</italic><sup><italic>i</italic></sup><sub><italic>cc</italic></sub>, <italic>e</italic><sup><italic>i</italic></sup><sub><italic>ca</italic></sub>, <italic>e</italic><sup><italic>i</italic></sup><sub><italic>ac</italic></sub> and <italic>e</italic><sup><italic>i</italic></sup><sub><italic>aa</italic></sub> and analog parameters <italic>a</italic><sub><italic>tl</italic></sub> and <italic>a</italic><sub><italic>th</italic></sub>:
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mn>1</mml:mn></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup><mml:msub><mml:mi>a</mml:mi><mml:mo>+</mml:mo></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>a</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup><mml:msub><mml:mi>a</mml:mi><mml:mo>&#x02212;</mml:mo></mml:msub></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>a</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>&#x0003E;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup><mml:msub><mml:mi>a</mml:mi><mml:mo>+</mml:mo></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mi>a</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup><mml:msub><mml:mi>a</mml:mi><mml:mo>&#x02212;</mml:mo></mml:msub></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><m
ml:mi>a</mml:mi><mml:mi>a</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mn>0</mml:mn></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>otherwise</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
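<p>The comparison in Equation (2) can be written out directly. The Python sketch below evaluates one bit for a given choice of configuration bits and analog parameters. The function name <monospace>evaluate</monospace> and all numeric values are hypothetical, and the analog quantities are treated as ideal floating-point numbers, which the hardware would not provide.</p>
<preformat><![CDATA[
```python
def evaluate(a_plus, a_minus, e_cc, e_ca, e_ac, e_aa, a_tl=0.0, a_th=0.0):
    """Return the bit b_i of Equation (2) for one readout configuration.

    e_cc, e_ca, e_ac, e_aa are configuration bits (0 or 1); a_tl and a_th
    are the analog parameters. Idealized sketch: no noise or mismatch.
    """
    left = (a_tl + e_ac * a_plus + e_ca * a_minus) / (1 + e_ac + e_ca)
    right = (a_th + e_cc * a_plus + e_aa * a_minus) / (1 + e_cc + e_aa)
    return 1 if left > right else 0


if __name__ == "__main__":
    # Compare the two accumulators: with e_ac = e_aa = 1 and
    # a_tl = a_th = 0 the condition reduces to a_plus > a_minus.
    print(evaluate(0.8, 0.3, e_cc=0, e_ca=0, e_ac=1, e_aa=1))
    # Threshold a_plus alone: with only e_ac = 1 and a_tl = 0 the
    # condition reduces to a_plus / 2 > a_th.
    print(evaluate(1.0, 0.0, e_cc=0, e_ca=0, e_ac=1, e_aa=0, a_th=0.4))
```
]]></preformat>
<p>The two calls illustrate how different sets of configuration bits turn the same evaluation unit into either a comparison of the two accumulated values or a threshold test on one of them.</p>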
<p>Using <italic>b</italic><sub>0</sub> &#x02026; <italic>b</italic><sub><italic>N</italic> &#x02212; 1</sub>, the current weight of the synapse <italic>w</italic> and possibly further global parameters <italic>P</italic><sub>0</sub> &#x02026; <italic>P</italic><sub><italic>M</italic> &#x02212; 1</sub> as input, the weight update &#x00394; is then calculated in software by the EPP:
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:mo>=</mml:mo><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>P</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula></p>
<p>Then, the new weight <italic>w</italic>&#x02032; &#x0003D; <italic>w</italic> &#x0002B; &#x00394; is written to weight storage by the plasticity program. Using two evaluations <italic>b</italic><sub>0</sub>, <italic>b</italic><sub>1</sub> with different sets of configuration bits, a simple example for <italic>F</italic> would be:
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mrow><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>b</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>A</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>0</mml:mn></mml:msub><mml:msub><mml:mi>b</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>A</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub><mml:msub><mml:mi>b</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:math></disp-formula></p>
<p>with arbitrary constants <inline-formula><mml:math id="M26"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>A</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M27"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>A</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
<p>Synapses in the system are organized in an array of synapse-units, where each synapse has a 4 bit weight memory implemented with static random-access memory (SRAM) cells. Adjacent units can be combined to increase the resolution to 8 bit, at the cost of reducing the total number of implementable synapses.</p>
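<p>The example update of Equation (4) and the write-back <italic>w</italic>&#x02032; &#x0003D; <italic>w</italic> &#x0002B; &#x00394; can be sketched as follows. The function names, the integer constants, and in particular the saturation of the result at the limits of the weight memory are assumptions of this sketch; the text does not specify how out-of-range results are handled.</p>
<preformat><![CDATA[
```python
def weight_update(b0, b1, A0=1, A1=-1):
    """Equation (4): Delta = A0 * b0 + A1 * b1 for two evaluation bits.

    A0 and A1 are arbitrary constants; the values here are examples.
    """
    return A0 * b0 + A1 * b1


def apply_update(w, delta, bits=4):
    """Compute w' = w + delta for an unsigned weight of the given width.

    Assumption: results outside [0, 2**bits - 1] saturate at the limits.
    """
    w_max = (1 << bits) - 1
    return max(0, min(w_max, w + delta))


if __name__ == "__main__":
    w = 7
    w = apply_update(w, weight_update(b0=1, b1=0))  # potentiate by one step
    print(w)
    # Same update on a combined 8 bit weight
    print(apply_update(200, weight_update(b0=0, b1=1), bits=8))
```
]]></preformat>
<p>With <monospace>bits=4</monospace> the weight takes one of 16 values, and with <monospace>bits=8</monospace> one of 256, which corresponds to the two resolutions mentioned above.</p>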
</sec>
<sec>
<title>2.1.3. Embedded micro-processor</title>
<p>Plasticity algorithms will be implemented by software programs executed on the EPP. A wide range of micro-processors is in use today for applications ranging from supercomputers to smartphones and embedded controllers for traffic lights. They use different computer architectures reflecting the specific requirements and constraints of their application.</p>
<p>Three characteristics of a processor are important here: first, the instruction set architecture (ISA), which defines the coding and semantics of instructions and registers; second, whether instructions are executed out of order; and third, whether the design is super-scalar, i.e., whether instructions can execute in parallel. The instruction set architecture used here is a subset of the PowerISA 2.06 specification for 32 bit (PowerISA, <xref ref-type="bibr" rid="B27">2010</xref>). The main reason to use an existing ISA is the availability of compilers and tools. Code for the EPP can be written in the C programming language and compiled with the GNU Compiler Collection (Stallman, <xref ref-type="bibr" rid="B38">2012</xref>).</p>
<p>The micro-architecture of the EPP is shown in Figure <xref ref-type="fig" rid="F2">2</xref>. The frontend fetches instructions and issues them in program order to the functional units. Due to different latencies, instructions can retire out of program order to the write back stage. For example, a slow memory access may be overtaken by a quick add instruction issued after it. Program and data are stored in a 12 kiB memory. A direct-mapped cache (<italic>ICache</italic>) is used for instruction access and to avoid the von-Neumann bottleneck (Backus, <xref ref-type="bibr" rid="B1">1978</xref>). Branches can be predicted with a fully associative branch predictor using 2 bit saturating counters to track branch outcome (Strategy 7 in Smith, <xref ref-type="bibr" rid="B36">1998</xref>). The functional units include load/store for memory access, a branch facility for control transfers, and fixed-point arithmetic and logical instructions including a barrel shifter, multiply and divide. The <sc>SYNAPSE</sc> special-function unit implements application-specific instructions and registers. It allows for accelerated weight computation and synapse access.</p>
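<p>To illustrate the prediction scheme named above, the following Python sketch models a small fully associative table of 2 bit saturating counters. It is a behavioral illustration only. The class name, the number of entries, the initial counter state and the replacement policy (evicting the least recently updated entry) are assumptions and are not taken from the EPP design.</p>
<preformat><![CDATA[
```python
class BranchPredictor:
    """Fully associative table of 2 bit saturating counters (sketch).

    Counter states 0..3; states 2 and 3 predict "taken".
    Hypothetical parameters: table size, initial state 1 ("weakly not
    taken"), and eviction of the least recently updated entry.
    """

    def __init__(self, entries=4):
        self.entries = entries
        self.table = {}  # branch address -> counter state, oldest first

    def predict(self, addr):
        return self.table.get(addr, 1) >= 2

    def update(self, addr, taken):
        # Remove the entry (if present) so re-insertion marks it as newest.
        state = self.table.pop(addr, 1)
        state = min(3, state + 1) if taken else max(0, state - 1)
        if len(self.table) >= self.entries:
            oldest = next(iter(self.table))
            self.table.pop(oldest)
        self.table[addr] = state


if __name__ == "__main__":
    bp = BranchPredictor()
    loop_branch = 0x40
    for outcome in (True, True, False, True):
        print(bp.predict(loop_branch), outcome)
        bp.update(loop_branch, outcome)
```
]]></preformat>
<p>Because two consecutive mispredictions are needed to flip a strongly biased counter, a single loop exit does not change the prediction for the next execution of the loop.</p>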
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Micro-architecture of the embedded plasticity processor.</bold> The design is separated into frontend and backend. The frontend takes four clock cycles to decode instructions and issue them in-order to the applicable functional unit. The functional units take a minimum of two cycles. Writing the result back to the register file takes another cycle. Input/output operations are performed through a bus interface served by the load/store unit and a specialized interface to the synapse array.</p></caption>
<graphic xlink:href="fnins-07-00160-g0002.tif"/>
</fig>
<p>An important goal for our proposed design is to maintain small area requirements to allow integration into the existing BrainScaleS wafer-scale system. To this end, we chose in-order issue of instructions to avoid additional control logic associated with tracking of instructions and reordering. However, out-of-order completion can be achieved with relatively small area overhead using a result shift-register (Smith and Pleszkun, <xref ref-type="bibr" rid="B37">1985</xref>) and was therefore included to improve performance.</p>
</sec>
</sec>
<sec>
<title>2.2. Model for reinforcement learning</title>
<p>To demonstrate reinforcement learning using the proposed system architecture, we chose a plasticity rule and a learning task described in Fr&#x000E9;maux et al. (<xref ref-type="bibr" rid="B11">2010</xref>). The R-STDP rule (Florian, <xref ref-type="bibr" rid="B10">2007</xref>; Izhikevich, <xref ref-type="bibr" rid="B17">2007</xref>) is a three-factor synaptic plasticity learning rule that modulates classical two-factor STDP with a reward-based success signal <italic>S</italic>. At the end of each trial of the learning task, a reward <italic>R</italic> is calculated according to the performance of the network and is used to modify the weights according to the learning rule.</p>
<sec>
<title>2.2.1. Network model</title>
<p>The network we simulate consists of two layers, connected with plastic synapses using the reward-modulated learning rule. The input layer consists of units repeating a given set of spike trains. The output layer consists of spiking neurons that are excited by the fixed activity from the input layer.</p>
<p>The original network in Fr&#x000E9;maux et al. (<xref ref-type="bibr" rid="B11">2010</xref>) uses the simplified Spike Response Model (SRM<sub>0</sub>, Gerstner and Kistler, <xref ref-type="bibr" rid="B14">2002</xref>) for the output neurons. It is an intrinsically stochastic neuron that emits spikes based on the exponentially weighted distance to the threshold. In hardware, the most commonly used neuron type is the deterministic leaky integrate-and-fire (LIF) model. The proposed system would use the hardware neuron reported in Millner et al. (<xref ref-type="bibr" rid="B21">2010</xref>), which can be operated as an Adaptive Exponential Integrate-and-Fire (AdEx, Brette and Gerstner, <xref ref-type="bibr" rid="B2">2005</xref>) or as a conventional LIF model. Since a certain amount of randomness in the firing behavior is required for reinforcement learning, we add background noise stimulation in the form of Poisson processes.</p>
<p>A tabular description of the network model can be found in Table <xref ref-type="table" rid="T1">1</xref>. <italic>N</italic><sub><italic>U</italic></sub> input units project onto <italic>N</italic><sub><italic>T</italic></sub> neurons that are additionally stimulated by <italic>N</italic><sub><italic>B</italic></sub> random background sources. All neurons are connected to all inputs, but each has individual random stimulation from equally sized and disjoint subsets of the random background. In every trial the same input spike pattern is presented, but the background noise realization is different.</p>
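<p>The background connectivity described above can be sketched in a few lines of Python: the background sources are split into equally sized, disjoint subsets, one per target neuron, and a fresh Poisson realization is drawn in every trial. The function names, population sizes, rate and duration are hypothetical values chosen for illustration. The sketch uses only the standard library and is not the simulation code of this study.</p>
<preformat><![CDATA[
```python
import random


def background_subsets(n_background, n_targets):
    """Split source indices 0..n_background-1 into disjoint, equal subsets."""
    if n_background % n_targets != 0:
        raise ValueError("n_background must be a multiple of n_targets")
    size = n_background // n_targets
    return [list(range(k * size, (k + 1) * size)) for k in range(n_targets)]


def poisson_train(rate, duration, rng):
    """Draw spike times of a homogeneous Poisson process on [0, duration).

    rate is in events per unit time; inter-spike intervals are exponential.
    """
    spikes = []
    t = rng.expovariate(rate)
    while t < duration:
        spikes.append(t)
        t += rng.expovariate(rate)
    return spikes


if __name__ == "__main__":
    subsets = background_subsets(n_background=12, n_targets=3)
    for trial in range(2):
        rng = random.Random(trial)  # new noise realization per trial
        noise = {src: poisson_train(10.0, 1.0, rng)
                 for subset in subsets for src in subset}
        print(trial, [len(noise[src]) for src in subsets[0]])
```
]]></preformat>
<p>Seeding the generator with the trial index is merely a convenient way to obtain reproducible but trial-specific noise in this sketch.</p>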
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Description of the network model used for the learning task after Nordlie et al. (<xref ref-type="bibr" rid="B23">2009</xref>)</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" colspan="3"><bold>A: MODEL SUMMARY</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Populations</td>
<td align="left" colspan="2">Three: input <italic>U</italic>, random background <italic>B</italic>, target <italic>T</italic></td>
</tr>
<tr>
<td align="left">Connectivity</td>
<td align="left" colspan="2">Feed-forward</td>
</tr>
<tr>
<td align="left">Neuron model</td>
<td align="left" colspan="2">Leaky integrate-and-fire, fixed voltage threshold, fixed absolute refractory period (voltage clamp)</td>
</tr>
<tr>
<td align="left">Synapse model</td>
<td align="left" colspan="2">Exponentially shaped post-synaptic conductances</td>
</tr>
<tr>
<td align="left">Plasticity</td>
<td align="left" colspan="2">Three-factor STDP</td>
</tr>
<tr>
<td align="left">Input</td>
<td align="left" colspan="2">Fixed-length spike-trains with uniformly distributed firing times</td>
</tr>
<tr>
<td align="left" colspan="3"><bold>B: POPULATIONS</bold></td>
</tr>
<tr>
<td align="left">Name</td>
<td align="left">Elements</td>
<td align="left">Population size</td>
</tr>
<tr>
<td align="left"><italic>U</italic></td>
<td align="left">Stimulus generator</td>
<td align="left"><italic>N</italic><sub><italic>U</italic></sub></td>
</tr>
<tr>
<td align="left"><italic>B</italic></td>
<td align="left">Poisson generator</td>
<td align="left"><italic>N</italic><sub><italic>B</italic></sub></td>
</tr>
<tr>
<td align="left"><italic>T</italic></td>
<td align="left">LIF neurons</td>
<td align="left"><italic>N</italic><sub><italic>T</italic></sub></td>
</tr>
<tr>
<td align="left" colspan="3"><bold>C: CONNECTIVITY</bold></td>
</tr>
<tr>
<td align="left">Source</td>
<td align="left">Target</td>
<td align="left">Pattern</td>
</tr>
<tr>
<td align="left"><italic>U</italic></td>
<td align="left"><italic>T</italic></td>
<td align="left">All-to-all, initial weights <italic>w</italic><sub><italic>S</italic></sub></td>
</tr>
<tr>
<td align="left"><italic>B</italic></td>
<td align="left"><italic>T</italic></td>
<td align="left">Non-overlapping 250 &#x02192; 1, weight <italic>w</italic><sub><italic>B</italic></sub></td>
</tr>
<tr>
<td align="left" colspan="3"><bold>D: NEURON AND SYNAPSE MODEL</bold></td>
</tr>
<tr>
<td align="left">Name</td>
<td align="left" colspan="2">LIF neuron</td>
<td/>
</tr>
<tr>
<td align="left">Type</td>
<td align="left" colspan="2">Leaky integrate-and-fire, exponential-shaped synaptic conductances</td>
</tr>
<tr>
<td align="left">Sub-threshold dynamics</td>
<td align="left" colspan="2"><inline-formula><mml:math id="M28"><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mi>L</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi>L</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>V</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>g</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi>e</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>V</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x000A0;</mml:mtext><mml:mi>t</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mrow><mml:mtext>ref</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mi>V</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mtext>reset</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>else</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left" rowspan="4">Spiking</td>
<td align="left" colspan="2"><italic>g</italic>(<italic>t</italic>) &#x0003D; <italic>w</italic> exp(&#x02212;<italic>t</italic>/&#x003C4;<sub>syn</sub>)</td>
</tr>
<tr>
<td align="left" colspan="2">if <italic>V</italic>(<italic>t</italic>&#x02212;) &#x0003C; <italic>V</italic><sub>th</sub> &#x02227; <italic>V</italic>(<italic>t</italic>&#x0002B;) &#x02265; <italic>V</italic><sub>th</sub></td>
</tr>
<tr>
<td align="left" colspan="2">&#x000A0;&#x000A0;&#x000A0;1. set <italic>t</italic><sup>&#x0002A;</sup> &#x0003D; <italic>t</italic></td>
</tr>
<tr>
<td align="left" colspan="2">&#x000A0;&#x000A0;&#x000A0;2. emit spike with time-stamp <italic>t</italic><sup>&#x0002A;</sup></td>
</tr>
<tr>
<td align="left" colspan="3"><bold>E: PLASTICITY</bold></td>
</tr>
<tr>
<td align="left">Name</td>
<td align="left" colspan="2">Three-factor STDP</td>
<td/>
</tr>
<tr>
<td align="left">Spike pairing scheme</td>
<td align="left" colspan="2">Reduced symmetric nearest-neighbor (Morrison et al., <xref ref-type="bibr" rid="B22">2008</xref>)</td>
</tr>
<tr>
<td align="left">Weight dynamics</td>
<td align="left" colspan="2">&#x00394; &#x0003D; <italic>Sa</italic>(<italic>t</italic>)</td>
</tr>
<tr>
<td/>
<td align="left" colspan="2"><inline-formula><mml:math id="M29"><mml:mrow><mml:mi>a</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:msub><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mi>i</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0003C;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>t</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:msub><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mo>&#x000B1;</mml:mo></mml:msub><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x00394;</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x0007C;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mo>&#x000B1;</mml:mo></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mi>e</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr>
<td/>
<td align="left" colspan="2"><italic>w</italic> &#x02208; [<italic>w</italic><sub>min</sub>, <italic>w</italic><sub>max</sub>]</td>
</tr>
<tr>
<td align="left" colspan="3"><bold>F: INPUT</bold></td>
</tr>
<tr>
<td align="left">Type</td>
<td align="left">Target</td>
<td align="left">Description</td>
</tr>
<tr>
<td align="left">Stimulus generator</td>
<td align="left"><italic>U</italic></td>
<td align="left"><italic>N</italic><sub>stim</sub> spikes at random firing times distributed uniformly within the trial duration.</td>
</tr>
<tr>
<td align="left">Poisson generators</td>
<td align="left"><italic>B</italic></td>
<td align="left">Independent Poisson spike-trains with rate &#x003BD;<sub><italic>B</italic></sub></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>See Table <xref ref-type="table" rid="T2">2</xref> for numerical values of the parameters</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>For each input <italic>i</italic> &#x0003D; 0 &#x02026; <italic>N</italic><sub><italic>U</italic></sub> &#x02212; 1, the input pattern consists of randomly drawn spike times <italic>S</italic><sub><italic>ij</italic></sub> &#x02208; <graphic xlink:href="fnins-07-00160-i0001.tif"/> (0, <italic>t</italic><sub>trial</sub>) with <italic>j</italic> &#x0003D; 0 &#x02026; <italic>N</italic><sub>stim</sub> &#x02212; 1, where <graphic xlink:href="fnins-07-00160-i0001.tif"/> (0, <italic>t</italic><sub>trial</sub>) is the uniform distribution on the interval [0, <italic>t</italic><sub>trial</sub>]. All simulations use the same input spike times <italic>S</italic><sub><italic>ij</italic></sub> that are generated once to ensure comparability.</p>
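<p>The construction of the fixed input pattern can be sketched in a few lines of Python. This is only an illustration: the function name, the fixed seed and the value used for <italic>N</italic><sub>stim</sub> are assumptions made for the example and are not taken from the simulations reported here.</p>
<preformat><![CDATA[
```python
import random

def make_input_pattern(n_inputs, n_stim, t_trial, seed=0):
    """Draw n_stim spike times per input uniformly from [0, t_trial]."""
    # A fixed seed mimics "generated once": every call returns the same pattern.
    rng = random.Random(seed)
    return [sorted(rng.uniform(0.0, t_trial) for _ in range(n_stim))
            for _ in range(n_inputs)]

# N_U = 250 and t_trial = 1 s follow Table 2; n_stim = 5 is a placeholder value.
pattern = make_input_pattern(n_inputs=250, n_stim=5, t_trial=1.0)
```
]]></preformat>
<p>Reusing the same seed in every run is one simple way to keep the input spike times identical across simulations.</p>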
<p>Weights for the random background have a uniform value <italic>w</italic><sub><italic>B</italic></sub>, so that every background spike causes the neuron to fire. Weights for input synapses are initialized to <italic>w</italic><sub><italic>S</italic></sub>, chosen so that single input spikes do not cause firing. See Table <xref ref-type="table" rid="T2">2</xref> for the numerical values.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p><bold>Numerical values for parameters</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th align="left"><bold>Parameter</bold></th>
<th align="left"><bold>Value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td/>
<td align="left"><italic>N</italic><sub><italic>U</italic></sub></td>
<td align="left">250</td>
</tr>
<tr>
<td/>
<td align="left"><italic>N</italic><sub><italic>B</italic></sub></td>
<td align="left"><italic>N</italic><sub><italic>T</italic></sub> &#x000B7; 250</td>
</tr>
<tr>
<td/>
<td align="left"><italic>N</italic><sub><italic>T</italic></sub></td>
<td align="left">5</td>
</tr>
<tr>
<td/>
<td align="left"><italic>C</italic><sub>m</sub></td>
<td align="left">500 pF</td>
</tr>
<tr>
<td/>
<td align="left"><italic>g</italic><sub><italic>L</italic></sub></td>
<td align="left">10 nS</td>
</tr>
<tr>
<td/>
<td align="left"><italic>E</italic><sub><italic>L</italic></sub></td>
<td align="left">&#x02212;70 mV</td>
</tr>
<tr>
<td/>
<td align="left"><italic>E</italic><sub><italic>e</italic></sub></td>
<td align="left">0 mV</td>
</tr>
<tr>
<td/>
<td align="left">&#x003C4;<sub>ref</sub></td>
<td align="left">10 ms</td>
</tr>
<tr>
<td/>
<td align="left"><italic>V</italic><sub>reset</sub></td>
<td align="left">&#x02212;60 mV</td>
</tr>
<tr>
<td/>
<td align="left"><italic>V</italic><sub>th</sub></td>
<td align="left">&#x02212;50 mV</td>
</tr>
<tr>
<td/>
<td align="left"><italic>A</italic><sub>&#x000B1;</sub></td>
<td align="left">&#x000B1;32 pS</td>
</tr>
<tr>
<td/>
<td align="left">&#x003C4;<sub>&#x000B1;</sub></td>
<td align="left">20 ms</td>
</tr>
<tr>
<td/>
<td align="left">&#x003C4;<sub><italic>e</italic></sub></td>
<td align="left">0.1 &#x02026; 1000 s</td>
</tr>
<tr>
<td/>
<td align="left"><italic>w</italic><sub>min</sub></td>
<td align="left">0 nS</td>
</tr>
<tr>
<td/>
<td align="left"><italic>w</italic><sub>max</sub></td>
<td align="left">0.5 nS</td>
</tr>
<tr>
<td/>
<td align="left"><italic>w</italic><sub><italic>B</italic></sub></td>
<td align="left">20.0 nS</td>
</tr>
<tr>
<td/>
<td align="left"><italic>w</italic><sub><italic>S</italic></sub></td>
<td align="left">0.21 nS</td>
</tr>
<tr>
<td/>
<td align="left"><inline-formula><mml:math id="M30"><mml:mover accent='true'><mml:mi>W</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula></td>
<td align="left">0.45 nS</td>
</tr>
<tr>
<td/>
<td align="left">&#x003BD;<sub><italic>B</italic></sub></td>
<td align="left">0.008 Hz</td>
</tr>
<tr>
<td/>
<td align="left"><italic>t</italic><sub>trial</sub></td>
<td align="left">1 s</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>For parameter definitions see Table <xref ref-type="table" rid="T1">1</xref> and text</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>2.2.2. Synaptic plasticity model</title>
<p>In the reward-modulated STDP learning rule, the outcome of standard STDP drives changes &#x00394;<italic>e</italic><sub><italic>k</italic></sub> of a so-called eligibility trace:
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:msub><mml:mi>e</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x003B7;</mml:mi><mml:msub><mml:mi>A</mml:mi><mml:mo>&#x000B1;</mml:mo></mml:msub><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mo>&#x000B1;</mml:mo></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
with learning rate &#x003B7;, time-difference between pre- and post-synaptic spike &#x00394;<italic>t</italic><sub><italic>k</italic></sub> for the <italic>k</italic>-th pair, STDP time constant &#x003C4;<sub>&#x0002B;</sub> for pre-before-post pairings, &#x003C4;<sub>&#x02212;</sub> for post-before-pre pairings, and, in the same fashion, amplitude parameters <italic>A</italic><sub>&#x000B1;</sub>. The &#x00394;<italic>e</italic><sub><italic>k</italic></sub> are accumulated on a per-synapse eligibility trace <italic>e</italic>. This trace decays exponentially according to:
<disp-formula id="E6"><label>(6)</label><mml:math id="M6"><mml:mrow><mml:mi>e</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mtable><mml:mtr><mml:mtd><mml:mi>k</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x0003C;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>t</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:munder><mml:mi>&#x00394;</mml:mi></mml:mstyle><mml:msub><mml:mi>e</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mi>e</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
with time-constant &#x003C4;<sub><italic>e</italic></sub> of the decay and <italic>t</italic><sub><italic>k</italic></sub> being the time of the post-synaptic spike for pre-before-post pairings and of the pre-synaptic spike otherwise.</p>
<p>To calculate the weight update, a success signal <italic>S</italic> is used as modulating third factor. It represents the difference between reward received <italic>R</italic> and a running average of reward <inline-formula><mml:math id="M31"><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula>
<disp-formula id="E7"><label>(7)</label><mml:math id="M7"><mml:mrow><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mi>R</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The reward is given at the end of each trial according to the learning task as defined in the next section. The running average is calculated as <inline-formula><mml:math id="M32"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:math></inline-formula> for the <italic>n</italic>-th trial. The weight update is then given by
<disp-formula id="E8"><label>(8)</label><mml:math id="M8"><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mtext>trial</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
with the trial duration <italic>t</italic><sub>trial</sub>.</p>
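<p>As a minimal sketch of Equations (5&#x02013;8), the following Python functions accumulate the eligibility trace from a list of spike pairings and turn it into a weight update. The function names, the sign convention for the pairing time difference (post minus pre) and the representation of pairings as tuples are assumptions made for illustration. The running-average helper assumes the usual low-pass form, in which the previous average is moved toward the latest reward by one fifth of their difference.</p>
<preformat><![CDATA[
```python
import math

def stdp_increment(dt, eta, a_plus, a_minus, tau_plus, tau_minus):
    """Eq. (5): eligibility change for one pairing; dt = t_post - t_pre (assumed)."""
    if dt >= 0:  # pre-before-post pairing
        return eta * a_plus * math.exp(-abs(dt) / tau_plus)
    return eta * a_minus * math.exp(-abs(dt) / tau_minus)

def eligibility(t, pairs, tau_e, **stdp):
    """Eq. (6): sum of exponentially decayed increments.

    pairs is a list of (t_k, dt_k) tuples; only pairings with t_k < t count.
    """
    return sum(stdp_increment(dt, **stdp) * math.exp(-(t - t_k) / tau_e)
               for t_k, dt in pairs if t_k < t)

def weight_update(reward, reward_avg, e_trial_end):
    """Eqs. (7) and (8): success signal times the trace at the end of the trial."""
    return (reward - reward_avg) * e_trial_end

def update_running_average(reward_avg, reward, window=5.0):
    """Assumed low-pass form: move the average 1/window toward the new reward."""
    return reward_avg + (reward - reward_avg) / window
```
]]></preformat>
<p>In this sketch a trial would call <monospace>eligibility</monospace> once at <italic>t</italic><sub>trial</sub>, apply <monospace>weight_update</monospace>, and only then refresh the running average for the next trial.</p>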
<p>In Fr&#x000E9;maux et al. (<xref ref-type="bibr" rid="B11">2010</xref>) different time constants for pre-before-post (&#x003C4;<sub>&#x0002B;</sub> &#x0003D; 20 ms) and post-before-pre (&#x003C4;<sub>&#x02212;</sub> &#x0003D; 40 ms) pairings are used. The amplitudes <italic>A</italic><sub>&#x0002B;</sub> and <italic>A</italic><sub>&#x02212;</sub> are chosen so that both parts are balanced, i.e., <italic>A</italic><sub>&#x0002B;</sub> &#x003C4;<sub>&#x0002B;</sub> &#x0003D; &#x02212;<italic>A</italic><sub>&#x02212;</sub>&#x003C4;<sub>&#x02212;</sub>. Synapses of the BrainScaleS wafer-scale system are designed for time constants of 20 ms. We do not want to assume that this can be increased by a factor of two, and therefore reduce &#x003C4;<sub>&#x02212;</sub> to the same value as &#x003C4;<sub>&#x0002B;</sub>. Consequently, we also use amplitudes of equal magnitude to keep the STDP window <italic>W</italic> balanced. The plasticity rule described in this section represents the theoretical ideal model for our comparison, which we refer to as the baseline model. Section 2.2.4 describes how this is mapped to hardware and the resulting constraints.</p>
</sec>
<sec>
<title>2.2.3. Learning task</title>
<p>In reinforcement learning, the reward is determined by the learning task considered. In our case, the goal of the network is to reproduce a given target spike train. Hence, reward should be given in proportion to the similarity of the actual and target outputs, as measured by some metric. Here, we use a normalized version of the metric <italic>D</italic><sup>spike</sup>[<italic>q</italic>] by Victor and Purpura (<xref ref-type="bibr" rid="B40">1996</xref>). <italic>D</italic><sup>spike</sup>[<italic>q</italic>] represents the minimal cost of transforming the output of a trial into the target pattern by adding, deleting and shifting spikes. Adding and deleting have unit cost, while shifting by &#x00394;<italic>t</italic> has a cost of <italic>q</italic>&#x00394;<italic>t</italic>. For &#x00394;<italic>t</italic> &#x0003E; 2/<italic>q</italic>, deleting the spike and adding a new one at the correct time is cheaper than shifting it. Therefore, the parameter <italic>q</italic> controls the precision of the comparison. The cost parameter is set to 1/<italic>q</italic> &#x0003D; 20 ms for our simulations.</p>
<p>Thus in a trial where neuron <italic>j</italic> fires with a spike train <italic>X</italic><sub>out, j</sub> and the target was <italic>X</italic><sub>target</sub>, the contribution of neuron <italic>j</italic> to the reward is
<disp-formula id="E9"><label>(9)</label><mml:math id="M9"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:mtext>spike</mml:mtext></mml:mrow></mml:msup><mml:mo stretchy='false'>[</mml:mo><mml:mi>q</mml:mi><mml:mo stretchy='false'>]</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mtext>out</mml:mtext><mml:mo>,</mml:mo><mml:mtext>&#x000A0;j</mml:mtext></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mtext>target</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mtext>out</mml:mtext><mml:mo>,</mml:mo><mml:mtext>&#x000A0;j</mml:mtext></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mtext>target</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
where <italic>N</italic><sub>out, j</sub> and <italic>N</italic><sub>target</sub> are the numbers of spikes in <italic>X</italic><sub>out, j</sub> and <italic>X</italic><sub>target</sub>, respectively. Because <italic>D</italic><sup>spike</sup>[<italic>q</italic>] is bounded by the interval [0, <italic>N</italic><sub>out, j</sub> &#x0002B; <italic>N</italic><sub>target</sub>], <italic>R</italic><sub><italic>j</italic></sub> is limited to [0, 1]. The total reward <italic>R</italic> used for the weight update is the average of <italic>R</italic><sub><italic>j</italic></sub> over all <italic>N</italic><sub><italic>T</italic></sub> neurons.</p>
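<p>The metric and the per-neuron reward of Equation (9) can be illustrated with the standard dynamic-programming formulation of the Victor and Purpura distance. The sketch below is not the code used for the simulations; the function names and the convention that two empty spike trains yield a reward of 1 are assumptions. Spike times are in seconds, so 1/<italic>q</italic> &#x0003D; 20 ms corresponds to <italic>q</italic> &#x0003D; 50 s<sup>&#x02212;1</sup>.</p>
<preformat><![CDATA[
```python
def victor_purpura(a, b, q):
    """Minimal cost D_spike[q] of transforming sorted spike train a into b."""
    n, m = len(a), len(b)
    # cost[i][j]: distance between the first i spikes of a and the first j of b
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = float(i)  # delete i spikes
    for j in range(1, m + 1):
        cost[0][j] = float(j)  # add j spikes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            shift = q * abs(a[i - 1] - b[j - 1])
            cost[i][j] = min(
                cost[i - 1][j] + 1.0,        # delete a spike from a
                cost[i][j - 1] + 1.0,        # add a spike of b
                cost[i - 1][j - 1] + shift,  # shift a spike onto its partner
            )
    return cost[n][m]

def reward(out, target, q):
    """Eq. (9): normalized similarity in [0, 1] for one output neuron."""
    total = len(out) + len(target)
    if total == 0:
        return 1.0  # assumption: two empty trains count as a perfect match
    return 1.0 - victor_purpura(out, target, q) / total
```
]]></preformat>
<p>With <italic>q</italic> &#x0003D; 50 s<sup>&#x02212;1</sup>, a spike displaced by more than 40 ms is handled by deletion and insertion at a cost of 2, which matches the 2/<italic>q</italic> limit stated above.</p>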
<p>The target spike train is generated by simulating the neural network with a set of reference weights <italic>W</italic><sub><italic>ij</italic></sub> for inputs <italic>i</italic> &#x0003D; 0 &#x02026; <italic>N</italic><sub><italic>U</italic></sub> &#x02212; 1 and neurons <italic>j</italic> &#x0003D; 0 &#x02026; <italic>N</italic><sub><italic>T</italic></sub> &#x02212; 1. All simulations use the same set of reference weights to ensure fair comparison:
<disp-formula id="E10"><label>(10)</label><mml:math id="M10"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mover accent='true'><mml:mi>W</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>sin</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>i</mml:mi><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>U</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x000A0;</mml:mtext><mml:mn>0</mml:mn><mml:mo>&#x02264;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>U</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mn>0</mml:mn></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>U</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:mfrac><mml:mo>&#x0003C;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x0003C;</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>U</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
with <inline-formula><mml:math id="M33"><mml:mrow><mml:mover accent='true'><mml:mi>W</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0.45</mml:mn><mml:mtext>&#x000A0;nS</mml:mtext></mml:mrow></mml:math></inline-formula>. An example of an output spike pattern produced by the network is shown in Figure <xref ref-type="fig" rid="F3">3</xref>. A new target spike train is generated at the beginning of every simulation run. Its firing times can be different even for identical weights and stimulation, because of the random background stimulation.</p>
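<p>Equation (10) translates directly into code. The following sketch builds the reference weight profile; the function name and the nested-list layout indexed as [neuron][input] are assumptions for illustration, and weights are expressed in nS.</p>
<preformat><![CDATA[
```python
import math

def reference_weights(n_inputs, n_neurons, w_hat):
    """Eq. (10): half-sine profile over the first half of the inputs, zero elsewhere."""
    row = [w_hat * math.sin(i * math.pi / n_inputs) if i <= n_inputs / 2 else 0.0
           for i in range(n_inputs)]
    # The right-hand side of Eq. (10) does not depend on j,
    # so every neuron receives the same profile.
    return [list(row) for _ in range(n_neurons)]

# N_U = 250, N_T = 5 and W_hat = 0.45 nS follow Table 2.
weights = reference_weights(n_inputs=250, n_neurons=5, w_hat=0.45)
```
]]></preformat>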
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Raster-plot of output spike-events for all five neurons at intervals of 2000 trials.</bold> Red bars indicate the target firing times.</p></caption>
<graphic xlink:href="fnins-07-00160-g0003.tif"/>
</fig>
</sec>
<sec>
<title>2.2.4. Simulated hardware constraints</title>
<p>The baseline plasticity model described in Equations (5&#x02013;8) cannot be reproduced exactly by the proposed system. This results in two distinct classes of effects: trade-offs introduced on purpose to reduce costs, for example in area, and non-ideal behavior of the hardware system.</p>
<p>In the first category, we analyze the effect of discretized weights and of limited access to analog variables by software running on the EPP. In the second category, we study leakage in analog circuits and timing effects caused by finite processor speed and communication latencies.</p>
<p><bold><italic>2.2.4.1. Discrete weights.</italic></bold> In the hardware system, synaptic weights are discretized since they are stored as digital values in the synapse circuit. The number of bits per synapse is a critical design decision when building a neuromorphic hardware system. Having fewer bits saves wafer area, so that more synapses can be implemented. More bits, on the other hand, allow for a higher dynamic range of the synaptic efficacies. The weight resolution also defines the minimum step size that can be taken by a learning rule. To analyze the sensitivity of learning performance to weight resolution, we modify the baseline model to use discrete weights with different numbers of bits. On a learning rule update, we calculate the new weight precisely (64-bit floating point) and round it to the nearest representable discrete weight value. The tie-breaking rule is round-to-even.</p>
<p>In the case of non-continuous weights with <italic>r</italic> bits, all updates with
<disp-formula id="E11"><label>(11)</label><mml:math id="M11"><mml:mrow><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x00394;</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mo>&#x0003C;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mfrac><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mtext>max</mml:mtext></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mtext>min</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msup><mml:mn>2</mml:mn><mml:mi>r</mml:mi></mml:msup><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
are discarded by rounding. Here <italic>w</italic><sub>min</sub> and <italic>w</italic><sub>max</sub> are the minimum and maximum weight values that can be represented and &#x00394; is the true weight update (see Equation 8). Fewer bits per synapse means that more updates are discarded, causing the effective learning rule to increasingly deviate from the baseline learning rule.</p>
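<p>The deterministic rounding scheme can be sketched as follows. The function names and the clipping to the representable range are assumptions made for the example; weights are expressed in nS with <italic>w</italic><sub>max</sub> &#x0003D; 0.5 nS as in Table 2. Python's built-in <monospace>round</monospace> breaks ties toward the even integer, which matches the tie-breaking rule stated above.</p>
<preformat><![CDATA[
```python
def discretize(w, r, w_min=0.0, w_max=0.5):
    """Round w to the nearest of 2**r evenly spaced levels in [w_min, w_max]."""
    step = (w_max - w_min) / (2 ** r - 1)
    w = min(max(w, w_min), w_max)      # assumption: clip to the representable range
    level = round((w - w_min) / step)  # round() breaks ties to the even level
    return w_min + level * step

def apply_update(w, delta, r, w_min=0.0, w_max=0.5):
    """Compute the new weight in double precision, then discretize it."""
    return discretize(w + delta, r, w_min, w_max)
```
]]></preformat>
<p>For a weight that already sits on a discrete level, any update smaller in magnitude than half a step is rounded back to the old value, which is the loss described by Equation (11).</p>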
<p>A workaround to this problem is to perform discretized updates &#x00394;<sub><italic>d</italic></sub> probabilistically, depending on the exact weight update &#x00394; as given by Equation (8). In this way, some of the updates that would otherwise be lost can be preserved. Using the correct update probabilities results in the average weight change being identical to that of the baseline model, i.e., without discretization.</p>
<p>To see this, we note that &#x00394;<sub><italic>d</italic></sub> can only assume values that are multiples of the discretization step &#x003B4;<sub><italic>r</italic></sub> &#x0003D; (<italic>w</italic><sub>max</sub> &#x02212; <italic>w</italic><sub>min</sub>)/(2<sup><italic>r</italic></sup> &#x02212; 1), assuming <italic>w</italic><sub>min</sub> &#x0003D; 0. If the baseline weight change &#x00394; lies between the (<italic>k</italic> &#x02212; 1)-th and the <italic>k</italic>-th step, the discrete update &#x00394;<sub><italic>d</italic></sub> is chosen as <italic>k</italic>&#x003B4;<sub><italic>r</italic></sub> with probability <italic>p</italic> &#x0003D; Pr (&#x00394;<sub><italic>d</italic></sub> &#x0003D; <italic>k</italic>&#x003B4;<sub><italic>r</italic></sub> | &#x00394;) and as (<italic>k</italic> &#x02212; 1)&#x003B4;<sub><italic>r</italic></sub> with probability 1 &#x02212; <italic>p</italic>. Such a scheme leads to the average update &#x02329;&#x00394;<sub><italic>d</italic></sub>&#x0232A; for a given &#x00394; being
<disp-formula id="E12"><label>(12)</label><mml:math id="M12"><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x00394;</mml:mi><mml:mi>d</mml:mi></mml:msub></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>k</mml:mi><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mi>p</mml:mi><mml:mo>+</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="E13"><label>(13)</label><mml:math id="M13"><mml:mrow><mml:mo>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;=&#x000A0;</mml:mo><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mi>p</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>By picking <italic>p</italic> as
<disp-formula id="E14"><label>(14)</label><mml:math id="M14"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
it holds that &#x02329;&#x00394;<sub><italic>d</italic></sub>&#x0232A; &#x0003D; &#x00394;.</p>
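<p>The probabilistic update of Equations (12&#x02013;14) can be checked numerically with a short sketch. The function name, the seed and the sample count are assumptions for illustration; the step size corresponds to <italic>r</italic> &#x0003D; 4 bits and <italic>w</italic><sub>max</sub> &#x0003D; 0.5 nS.</p>
<preformat><![CDATA[
```python
import math
import random

def stochastic_update(delta, step, rng):
    """Return a multiple of step whose expectation equals delta (Eqs. 12-14)."""
    lower = math.floor(delta / step)  # corresponds to k - 1
    p = delta / step - lower          # Eq. (14): (delta - (k - 1) * step) / step
    k = lower + 1 if rng.random() < p else lower
    return k * step

rng = random.Random(1)
step = 0.5 / (2 ** 4 - 1)  # delta_r in nS for r = 4 bits
delta = 0.3 * step         # below step / 2, so plain rounding would discard it
samples = [stochastic_update(delta, step, rng) for _ in range(100000)]
mean_update = sum(samples) / len(samples)  # close to delta on average
```
]]></preformat>
<p>Each individual sample is either 0 or one full step, yet the sample mean approaches the baseline update, at the price of additional noise in the weight dynamics.</p>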
<p><bold><italic>2.2.4.2. Baseline model with added noise.</italic></bold> When performing weight updates probabilistically, randomization introduces additional noise to the weight dynamics. This noise is not present in the baseline model with continuous weights. Therefore, adding an equivalent amount of random noise to the baseline simulation allows for a more accurate assessment of weight discretization with probabilistic updates.</p>
<p>With every update, probabilistic rounding introduces an error <italic>z</italic> &#x0003D; &#x00394;<sub><italic>d</italic></sub> &#x02212; &#x00394;. For simplification, we introduce &#x003F5; &#x02208; [0, &#x003B4;<sub><italic>r</italic></sub>) and substitute &#x00394; &#x0003D; (<italic>k</italic> &#x02212; 1)&#x003B4;<sub><italic>r</italic></sub> &#x0002B; &#x003F5; in Equation (14) to get <italic>p</italic> &#x0003D; &#x003F5;/&#x003B4;<sub><italic>r</italic></sub>. Then, <italic>z</italic> is distributed according to
<disp-formula id="E15"><label>(15)</label><mml:math id="M15"><mml:mrow><mml:mi>Pr</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>&#x02223;</mml:mo><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mi>p</mml:mi></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x000A0;</mml:mtext><mml:mi>z</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003F5;</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x000A0;</mml:mtext><mml:mi>z</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003F5;</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mn>0</mml:mn></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>otherwise.</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula></p>
<p>We are now interested in the unconditional probability distribution Pr(<italic>z</italic>), so that noise of the same shape can be added to the baseline simulation with continuous weights. It is given by
<disp-formula id="E16"><label>(16)</label><mml:math id="M16"><mml:mrow><mml:mi>Pr</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x0222B;</mml:mo><mml:mn>0</mml:mn><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:msubsup><mml:mrow><mml:mi>Pr</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>z</mml:mi><mml:mo>&#x02223;</mml:mo><mml:mi>&#x003F5;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>Pr</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003F5;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>d</mml:mtext><mml:mi>&#x003F5;</mml:mi></mml:mrow></mml:mrow></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Assuming &#x003F5; to be uniformly distributed in its allowed interval gives Pr(&#x003F5;) &#x0003D; &#x003B4;<sup>&#x02212;1</sup><sub><italic>r</italic></sub>. Using the Dirac delta function &#x003B4; to write down Pr(<italic>z</italic> | &#x003F5;) with <italic>p</italic> &#x0003D; &#x003F5;/&#x003B4;<sub><italic>r</italic></sub> (Equation 14) gives:
<disp-formula id="E17"><label>(17)</label><mml:math id="M17"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>Pr</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x0222B;</mml:mo><mml:mn>0</mml:mn><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:msubsup><mml:mrow><mml:mfrac><mml:mi>&#x003F5;</mml:mi><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mi>&#x003F5;</mml:mi><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>d</mml:mtext><mml:mi>&#x003F5;</mml:mi></mml:mrow></mml:mrow></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd 
columnalign='left'><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>for&#x000A0;</mml:mtext><mml:mn>0</mml:mn><mml:mo>&#x0003C;</mml:mo><mml:mi>z</mml:mi><mml:mo>&#x0003C;</mml:mo><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:msubsup><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>for&#x000A0;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>&#x0003C;</mml:mo><mml:mi>z</mml:mi><mml:mo>&#x02264;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Equation (17) describes a triangular shaped probability density for the noise introduced by probabilistic updates. As is to be expected, the noise is bounded by &#x000B1;&#x003B4;<sub><italic>r</italic></sub>.</p>
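<p>The derivation above can be checked numerically. The following is a minimal sketch, not the simulation code used in this study: it samples the rounding error of Equation (15) with a uniformly distributed remainder and compares the histogram with the closed form of Equation (17). The function names, the step size <monospace>delta_r</monospace> and the sample count are illustrative assumptions.</p>
<preformat>
```python
import numpy as np

def rounding_error_samples(delta_r, n, rng):
    """Draw n samples of the error z = Delta_d - Delta of probabilistic rounding."""
    # Remainder eps of the update within one weight step, assumed uniform on [0, delta_r)
    eps = rng.uniform(0.0, delta_r, size=n)
    # Round up with probability p = eps / delta_r, otherwise round down
    round_up = rng.random(n) > 1.0 - eps / delta_r
    # Error is delta_r - eps when rounding up and -eps when rounding down (Equation 15)
    return np.where(round_up, delta_r - eps, -eps)

def triangular_density(z, delta_r):
    """Closed-form density of Equation (17); zero outside (-delta_r, delta_r)."""
    return np.maximum(delta_r - np.abs(z), 0.0) / delta_r**2

rng = np.random.default_rng(0)
delta_r = 1.0 / 15.0  # hypothetical step size: 4 bit weights on an assumed range [0, 1]
z = rounding_error_samples(delta_r, 200_000, rng)

# Compare the empirical density with Equation (17) at the bin centers
hist, edges = np.histogram(z, bins=40, range=(-delta_r, delta_r), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_dev = np.max(np.abs(hist - triangular_density(centers, delta_r)))
```
</preformat>
<p>For the baseline run with added noise, equivalent samples could also be drawn directly, for example with NumPy's <monospace>rng.triangular(-delta_r, 0.0, delta_r)</monospace>.</p>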
<p><bold><italic>2.2.4.3. Thresholded readout.</italic></bold> The eligibility trace is implemented using the analog accumulation in the synapse unit. For every spike pair, Equation (1) is evaluated and the corresponding eligibility trace change is added as charge on the local storage capacitors <italic>a</italic><sub>&#x0002B;</sub> and <italic>a</italic><sub>&#x02212;</sub>, respectively. These values are not directly accessible to the EPP. Instead, using the evaluation unit described in section 2.1.2 with threshold &#x00398; &#x0003D; <italic>a</italic><sub>th</sub> &#x02212; <italic>a</italic><sub>tl</sub>, accumulation trace <italic>a</italic> &#x0003D; <italic>a</italic><sub>&#x0002B;</sub> &#x02212; <italic>a</italic><sub>&#x02212;</sub>, configuration bits <italic>e</italic><sup>&#x0002B;</sup><sub><italic>ac</italic></sub> &#x0003D; 1, <italic>e</italic><sup>&#x0002B;</sup><sub><italic>aa</italic></sub> &#x0003D; 1, <italic>e</italic><sup>&#x0002B;</sup><sub><italic>ca</italic></sub> &#x0003D; 0, <italic>e</italic><sup>&#x0002B;</sup><sub><italic>cc</italic></sub> &#x0003D; 0 for the evaluation of <italic>b</italic><sub>&#x0002B;</sub> and <italic>e</italic><sup>&#x02212;</sup><sub><italic>ac</italic></sub> &#x0003D; 0, <italic>e</italic><sup>&#x02212;</sup><sub><italic>aa</italic></sub> &#x0003D; 0, <italic>e</italic><sup>&#x02212;</sup><sub><italic>ca</italic></sub> &#x0003D; 1, <italic>e</italic><sup>&#x02212;</sup><sub><italic>cc</italic></sub> &#x0003D; 1 for <italic>b</italic><sub>&#x02212;</sub>, the readout computes
<disp-formula id="E18"><label>(18)</label><mml:math id="M18"><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mo>&#x000B1;</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mn>1</mml:mn></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x000A0;</mml:mtext><mml:mo>&#x000B1;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mo>+</mml:mo></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mo>&#x02212;</mml:mo></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x0003E;</mml:mo><mml:mo>&#x00398;</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mn>0</mml:mn></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>otherwise</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow><mml:mtext>&#x0200B;</mml:mtext><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The weight update with threshold readout &#x00394;<sub><italic>t</italic></sub> is then performed using an update constant <italic>A</italic>
<disp-formula id="E19"><label>(19)</label><mml:math id="M19"><mml:mrow><mml:msub><mml:mi>&#x00394;</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mi>A</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mo>+</mml:mo></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mo>&#x02212;</mml:mo></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The parameters &#x00398; and <italic>A</italic> should be chosen so as to minimize the deviation introduced by calculating weights according to Equation (19) instead of Equation (8). Ideally, one would like to satisfy &#x02329;&#x00394;<sub><italic>t</italic></sub>&#x0232A; &#x0003D; &#x00394;. However, a detailed analysis of the simulations (not shown) revealed that the eligibility trace distributions differed strongly between synapses and between stages of learning. In that context, choosing parameters &#x00398; and <italic>A</italic> that minimize the difference between the baseline change &#x00394; and the average effective change &#x02329;&#x00394;<sub><italic>t</italic></sub>&#x0232A; for a particular synapse would not in general have the same effect for other synapses. Instead, we resort to a heuristic method, described below, to fix a global threshold and update constant, and assess its effectiveness in simulations.</p>
<p>For the simulations presented here, a precursor run over 100 trials without learning was used to measure the final absolute eligibility value &#x02329;|<italic>a</italic>|&#x0232A; averaged over all readout operations. The threshold &#x00398; was then set to &#x00398;<sup>&#x0002A;</sup> &#x0003D; &#x02329;|<italic>a</italic>|&#x0232A; for the actual learning simulation. In this way, the average (across synapses) final eligibility value encountered during weight updates is close to the threshold. This represents a trade-off between exceeding the threshold only rarely, but then causing large (possibly disruptive) weight changes, and exceeding the threshold often, but only applying small changes.</p>
<p>With <italic>N</italic><sub><italic>p</italic></sub>(&#x00398;) being the number of readout operations that exceed the threshold, i.e., <italic>b</italic><sub>&#x0002B;</sub> or <italic>b</italic><sub>&#x02212;</sub> are non-zero, and the total number of readout operations <italic>N</italic>, the update constant <italic>A</italic> is set to
<disp-formula id="E20"><label>(20)</label><mml:math id="M20"><mml:mrow><mml:msup><mml:mi>A</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mi>N</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>&#x00398;</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac><mml:msup><mml:mi>&#x00398;</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Thereby, the mean absolute eligibility value used with the readout <italic>N</italic><sub><italic>p</italic></sub>(&#x00398;<sup>&#x0002A;</sup>)<italic>A</italic><sup>&#x0002A;</sup>/<italic>N</italic> is effectively the same as &#x02329;|<italic>a</italic>|&#x0232A; in the baseline model.</p>
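<p>The calibration heuristic and the thresholded update of Equations (18)&#x02013;(20) can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the simulations: the function names are hypothetical, and the eligibility values of the precursor run are replaced by made-up Gaussian samples.</p>
<preformat>
```python
import numpy as np

def calibrate_readout(a_precursor):
    """Return (theta_star, a_star) from eligibility values of a precursor run."""
    a_abs = np.abs(np.asarray(a_precursor, dtype=float))
    theta_star = a_abs.mean()                         # Theta* = mean of |a|
    n_total = a_abs.size                              # N
    n_exceed = np.count_nonzero(a_abs > theta_star)   # N_p(Theta*)
    a_star = n_total / n_exceed * theta_star          # Equation (20)
    return theta_star, a_star

def threshold_update(a, success, theta, update_const):
    """Weight change of Equation (19) using the binary readout of Equation (18)."""
    b_plus = (a > theta).astype(float)
    b_minus = (-a > theta).astype(float)
    return success * update_const * (b_plus - b_minus)

rng = np.random.default_rng(1)
a_pre = rng.normal(0.0, 0.2, size=5000)   # made-up stand-in for measured traces
theta_star, a_star = calibrate_readout(a_pre)
delta_t = threshold_update(a_pre, 1.0, theta_star, a_star)

# By construction, the mean absolute update equals theta_star
mean_abs_update = np.mean(np.abs(delta_t))
```
</preformat>
<p>With a success signal of 1, the mean absolute update in this sketch equals the measured mean absolute eligibility value, which is the property the choice of <italic>A</italic><sup>&#x0002A;</sup> is meant to ensure.</p>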
<p><bold><italic>2.2.4.4. Analog drift.</italic></bold> The local accumulation units in the hardware synapses do not have a mechanism for controlled decay of the eligibility trace. An ideal implementation of the circuit would stay unchanged over time, after a spike-pair has caused an update. In reality there are leakage currents causing the accumulation traces <italic>a</italic><sub>&#x0002B;</sub> and <italic>a</italic><sub>&#x02212;</sub> and their difference <italic>a</italic> to drift. Leakage is caused by a number of processes that depend on transistor geometry, manufacturing process, temperature and internal voltages (Roy et al., <xref ref-type="bibr" rid="B29">2003</xref>). It is therefore difficult to predict either time-scale, shape or variability of this effect. We try to get an estimate on the sensitivity of the model to uncontrolled temporal drift, by simulating learning with a drift function &#x003D5;<sub><italic>i</italic></sub>(<italic>t</italic>; <italic>a</italic><sub>0</sub>). Here <italic>t</italic> is the duration of the drift and <italic>a</italic><sub>0</sub> is the starting value for <italic>t</italic> &#x0003D; 0. The index <italic>i</italic> is over all synapses and both trace polarities. This function describes the development of <italic>a</italic><sub>&#x0002B;</sub> (<italic>t</italic>) and <italic>a</italic><sub>&#x02212;</sub>(<italic>t</italic>) between spike-pair induced updates. The accumulation value is given as the difference <italic>a</italic>(<italic>t</italic>) &#x0003D; <italic>a</italic><sub>&#x0002B;</sub> (<italic>t</italic>) &#x02212; <italic>a</italic><sub>&#x02212;</sub> (<italic>t</italic>). We define an exponential drift function
<disp-formula id="E21"><label>(21)</label><mml:math id="M21"><mml:mrow><mml:msub><mml:mi>&#x003D5;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>a</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003BB;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>for&#x000A0;</mml:mtext><mml:msub><mml:mi>&#x003BB;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x0003E;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mtext>max</mml:mtext></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mtext>max</mml:mtext></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:msub><mml:mi>&#x003BB;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>for&#x000A0;</mml:mtext><mml:msub><mml:mi>&#x003BB;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x0003C;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow></mml:mtd><mml:mtd 
columnalign='left'><mml:mrow><mml:mtext>else</mml:mtext><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
where <italic>a</italic><sub>max</sub> is the maximum value that <italic>a</italic><sub>&#x0002B;</sub> and <italic>a</italic><sub>&#x02212;</sub> can assume and &#x003BB;<sub><italic>i</italic></sub> &#x0003D; 1/&#x003C4;<sub><italic>e,i</italic></sub> is the inverse time constant. Positive &#x003BB;<sub><italic>i</italic></sub> leads to exponential decay, as used so far. Negative &#x003BB;<sub><italic>i</italic></sub> causes a drift away from zero, toward the limit <italic>a</italic><sub>max</sub>. For every synapse and for positive and negative traces, &#x003C4;<sub><italic>e,i</italic></sub> is drawn from a Gaussian distribution with mean &#x003C4;<sub><italic>e</italic></sub> and standard deviation <italic>m</italic><sub><italic>e</italic></sub>&#x003C4;<sub><italic>e</italic></sub> using the mismatch factor <italic>m</italic><sub><italic>e</italic></sub>. In the limit of large <italic>t</italic>, this allows for four final states of <italic>a</italic>(<italic>t</italic>): decay to zero, drift to <italic>a</italic><sub>max</sub> or &#x02212;<italic>a</italic><sub>max</sub>, and remaining constant at <italic>a</italic><sub>0</sub> for &#x003BB;<sub><italic>i</italic></sub> &#x0003D; 0.</p>
<p>It is important to note that we do not intend to precisely model the leakage behavior of the analog circuit. Instead, we use a simple model capturing the essence of drifting analog values to get an estimate for the sensitivity to this effect.</p>
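<p>A minimal sketch of the drift function of Equation (21), with per-trace time constants drawn as described above, could look as follows. All numerical values (<monospace>tau_e</monospace>, <monospace>m_e</monospace>, <monospace>a_max</monospace>, the starting value and the drift duration) are hypothetical and chosen only for illustration.</p>
<preformat>
```python
import numpy as np

def drift(t, a0, lam, a_max):
    """Equation (21): value of an accumulation trace after drifting for time t."""
    a0 = np.asarray(a0, dtype=float)
    lam = np.asarray(lam, dtype=float)
    # exp(-|lam| t) equals exp(-lam t) for lam > 0 and exp(lam t) for negative lam,
    # so one factor serves both branches without overflow in the unused one
    factor = np.exp(-np.abs(lam) * t)
    decay = a0 * factor                        # lam > 0: decay toward zero
    saturate = a_max - (a_max - a0) * factor   # negative lam: drift toward a_max
    return np.where(lam > 0, decay, np.where(0 > lam, saturate, a0))

rng = np.random.default_rng(2)
tau_e, m_e, a_max = 0.5, 0.5, 1.0   # hypothetical values, not taken from the paper
# One time constant per synapse and polarity (row 0: a_plus, row 1: a_minus)
tau_i = rng.normal(tau_e, m_e * tau_e, size=(2, 8))
lam_i = 1.0 / tau_i
a_start = np.full((2, 8), 0.3)
a_plus, a_minus = drift(0.2, a_start, lam_i, a_max)
a = a_plus - a_minus                # accumulation value seen by the readout
```
</preformat>
<p>A large mismatch factor makes negative time constants more likely, so some traces in this sketch drift toward <monospace>a_max</monospace> instead of decaying.</p>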
<p><bold><italic>2.2.4.5. Delayed reward.</italic></bold> The hardware system is a physical model of the emulated network. Therefore, emulated time progresses continuously during network operation with the acceleration factor &#x003B1; relative to wall-clock time. During all communication and computation, network operation continues. The amount of reward for each trial is calculated by the control cluster, after the nominal trial duration has ended and output spike events have been transmitted to the cluster. The success signal is then determined and sent back to the embedded processor. Then, the plasticity program will sequentially execute the weight update for all synapses taking a certain amount of time per synapse. This time is consumed by the synapse array access and the weight computation.</p>
<p>These two effects are modeled by adding a constant delay <italic>D</italic><sub><italic>R</italic></sub> after the trial has finished and an update rate &#x003BD;<sub><italic>s</italic></sub> giving the number of updated synapses per second. The weight update for synapse <italic>i</italic> occurs at <inline-formula><mml:math id="M34"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mtext>trial</mml:mtext></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mi>R</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mfrac><mml:mi>i</mml:mi><mml:mrow><mml:msub><mml:mi>&#x003BD;</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:math></inline-formula>. The order in which synapses are updated is determined by their position in the synapse array and is therefore a result of the automated mapping process. For this study, we assume weight updates to be fast enough compared to the reward delay <italic>D</italic><sub><italic>R</italic></sub> and therefore use <italic>t</italic><sub><italic>i</italic></sub> &#x0003D; <italic>t</italic><sub>trial</sub> &#x0002B; <italic>D</italic><sub><italic>R</italic></sub>.</p>
<p>The delay causes a deviation from the ideal model because the accumulation capacitors <italic>a</italic><sub>&#x0002B;</sub>, <italic>a</italic><sub>&#x02212;</sub> used to store the eligibility trace continue to decay. The eligibility value used for the weight update is then reduced by a factor
<disp-formula id="E22"><label>(22)</label><mml:math id="M22"><mml:mrow><mml:mi>&#x003B2;</mml:mi><mml:mo>=</mml:mo><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mi>e</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>This can prevent a weight update that would have been made in the non-delayed case by reducing <italic>a</italic> below the readout threshold &#x00398;. We assume that the delay <italic>D</italic><sub><italic>R</italic></sub> is known or can be estimated and lower the threshold to &#x003B2;&#x00398;.</p>
<p>In theory, this would allow correcting for an arbitrary delay, since the exponential decay never reaches zero. In hardware this is not the case, because the eligibility readout is subject to noise. Therefore, after a certain delay, traces will be indiscernible from noise. To account for this, we simulate Gaussian-distributed noise &#x003B4;<italic>a</italic> on the readout with standard deviation &#x003C3;<sub><italic>a</italic></sub> and mean 0. The value used for comparison to the threshold is then given by <italic>a</italic>&#x02032; &#x0003D; <italic>a</italic> &#x0002B; &#x003B4;<italic>a</italic>. If a minimum signal-to-noise ratio <italic>z</italic><sup>&#x0002A;</sup> is required for correct learning, a limit <italic>D</italic><sub>max</sub> for the delay can be calculated using the signal-to-noise ratio <italic>z</italic>(<italic>t</italic>) &#x0003D; <italic>a</italic>(<italic>t</italic>)/&#x003C3;<sub><italic>a</italic></sub>
<disp-formula id="E23"><label>(23)</label><mml:math id="M23"><mml:mrow><mml:mi>z</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>a</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mtext>trial</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mi>a</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mtext>trial</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mi>e</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>With <italic>z</italic>(<italic>D</italic><sub>max</sub> &#x0002B; <italic>t</italic><sub>trial</sub>) &#x0003D; <italic>z</italic><sup>&#x0002A;</sup> and <italic>a</italic>(<italic>t</italic><sub>trial</sub>) &#x0003D; <italic>a</italic><sub>max</sub>, the maximally tolerable delay in the presence of noise is given by
<disp-formula id="E24"><label>(24)</label><mml:math id="M24"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mtext>max</mml:mtext></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003C4;</mml:mi><mml:mi>e</mml:mi></mml:msub><mml:mi>ln</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mi>a</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mtext>max</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
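<p>As a worked example of Equations (22) and (24), the following sketch computes the attenuation factor and the maximally tolerable delay. The parameter values are hypothetical and serve only to show that the signal-to-noise ratio at <italic>D</italic><sub>max</sub> equals the required value.</p>
<preformat>
```python
import math

def decay_factor(d_r, tau_e):
    """Equation (22): attenuation of the eligibility trace after delay d_r."""
    return math.exp(-d_r / tau_e)

def max_delay(z_star, sigma_a, a_max, tau_e):
    """Equation (24): delay after which a trace starting at a_max reaches SNR z_star."""
    return -tau_e * math.log(z_star * sigma_a / a_max)

tau_e, a_max = 0.5, 1.0        # hypothetical time constant and trace maximum
sigma_a, z_star = 0.05, 2.0    # hypothetical readout noise and required SNR

d_max = max_delay(z_star, sigma_a, a_max, tau_e)
# Signal-to-noise ratio of Equation (23) evaluated at t = t_trial + d_max
snr_at_d_max = a_max / sigma_a * decay_factor(d_max, tau_e)
# Threshold lowered by the attenuation factor for a given delay, here d_max
theta = 0.2                    # hypothetical readout threshold
theta_lowered = decay_factor(d_max, tau_e) * theta
```
</preformat>
<p>With these values the argument of the logarithm is 0.1, so <monospace>d_max</monospace> is about 2.3 time constants.</p>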
</sec>
<sec>
<title>2.2.5. Measuring performance</title>
<p>Simulations consist of 10,000 trials in 20 parallel runs with different random seeds. At the beginning of every run, 100 trials are simulated without learning: during this time the running average <inline-formula><mml:math id="M35"><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> can settle to a stable approximation of the reward. The average over <italic>R</italic> during these trials is used as the initial reward level <italic>R</italic><sub>before</sub> of this run. During the last 1000 trials of the simulation, it is assumed that learning has reached a stable state: the final reward level <italic>R</italic><sub>after</sub> is the average of <italic>R</italic> over these trials.</p>
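<p>The two reward levels can be computed from the per-trial rewards as in the sketch below. The array layout (runs by trials), the function name and the synthetic reward data are assumptions made for illustration; they are not taken from the simulation code of this study.</p>
<preformat>
```python
import numpy as np

def reward_levels(rewards, n_settle=100, n_final=1000):
    """Return (r_before, r_after) per run from rewards of shape (runs, trials)."""
    rewards = np.asarray(rewards, dtype=float)
    r_before = rewards[:, :n_settle].mean(axis=1)   # trials without learning
    r_after = rewards[:, -n_final:].mean(axis=1)    # last trials of the run
    return r_before, r_after

rng = np.random.default_rng(4)
# Synthetic stand-in: 20 runs of 10,000 trials whose mean reward rises over time
trials = np.arange(10_000)
mean_reward = 0.1 + 0.4 * (1.0 - np.exp(-trials / 1000.0))
rewards = mean_reward + rng.normal(0.0, 0.05, size=(20, 10_000))
r_before, r_after = reward_levels(rewards)
```
</preformat>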
<p>The model is simulated using the Brian simulator (Goodman and Brette, <xref ref-type="bibr" rid="B15">2008</xref>). Weight updates are calculated with custom Python code using the NumPy package (Numpy, <xref ref-type="bibr" rid="B24">2012</xref>).</p>
</sec>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>3. Results</title>
<p>In the previous section, we analyzed a synaptic learning rule (Florian, <xref ref-type="bibr" rid="B10">2007</xref>; Izhikevich, <xref ref-type="bibr" rid="B17">2007</xref>; Fr&#x000E9;maux et al., <xref ref-type="bibr" rid="B11">2010</xref>) and the adjustments required to implement it on a hardware system. The goal of this section is to quantify the sensitivity of the model to constraints of the system, for example discretized weights or imperfections of analog circuits, and to identify those that are critical. Starting from the baseline configuration without hardware effects, we add constraints and measure their effect on the learning performance.</p>
<sec>
<title>3.1. Baseline</title>
<p>The baseline model implements the learning rule described in section 2.2 and Table <xref ref-type="table" rid="T1">1</xref> without hardware effects, and serves as comparison for simulations including such effects. The eligibility trace <italic>e</italic> of the theoretical model is identified with the local accumulation <italic>a</italic> in hardware synapses. Thereby, changes to the weight are deferred until the success signal <italic>S</italic> is given from the attached control cluster, after the produced spike train has been evaluated. New weights are assumed to be calculated using a software program running on the EPP.</p>
<p>The raster plot in Figure <xref ref-type="fig" rid="F3">3</xref> shows the output spike train at several points in time during a learning simulation. In the beginning at trial 0, spikes are generated randomly by the background stimulation. Later on, the network learns to produce spikes at the targeted points of time indicated with red vertical bars. In the last trial, neurons fire close to most of the target times. The evolution of the reward obtained in each trial averaged over 20 runs is shown in Figure <xref ref-type="fig" rid="F5">5A</xref>. Variance in the last 1000 trials is due to the random background stimulation and to the exploratory behavior it generates in the learning rule. Most of the performance improvement is achieved within the first 2000 trials, the final level of reward being <italic>R</italic><sup>base</sup><sub>after</sub> &#x0003D; 0.54 &#x000B1; 0.05.</p>
<p>This is the result using one particular set of reference weights <italic>W</italic><sub><italic>ij</italic></sub> and stimulation pattern <italic>S</italic><sub><italic>ij</italic></sub> that were defined in section 2.2.1. To test how well this result generalizes to other weights and stimulation patterns we perform two additional experiments: first of all, we randomize the reference weights, so that in 20 simulation runs the network learns with a different set of reference weights in each run. These weights are drawn randomly from a uniform distribution, so that the <italic>k</italic>-th run uses reference weights <italic>W</italic><sup><italic>k</italic></sup><sub><italic>ij</italic></sub> &#x02208; <graphic xlink:href="fnins-07-00160-i0001.tif"/> (<italic>w</italic><sub>min</sub>, <italic>w</italic><sub>max</sub>) to generate its target spike train. This gives a final level of reward of <italic>R</italic><sup><italic>w</italic></sup><sub>after</sub> &#x0003D; 0.59 &#x000B1; 0.08 averaged over the 20 runs with different reference weights.</p>
<p>In the second experiment we again use the <italic>W</italic><sub><italic>ij</italic></sub> reference weights for all 20 simulations. The stimulation pattern is randomized by drawing new spike times for each run from a uniform distribution, so that the <italic>k</italic>-th run uses spike times <italic>S</italic><sup><italic>k</italic></sup><sub><italic>ij</italic></sub> &#x02208; <graphic xlink:href="fnins-07-00160-i0001.tif"/> (0, <italic>t</italic><sub>trial</sub>) for all trials. This gives a performance <italic>R</italic><sup><italic>s</italic></sup><sub>after</sub> &#x0003D; 0.53 &#x000B1; 0.08 averaged over the 20 different sets of stimulation patterns.</p>
<p>The final reward levels for the baseline simulation, randomized reference weights and randomized stimulation pattern are shown in Figure <xref ref-type="fig" rid="F4">4</xref>. The data show that the special case of reference weights <italic>W</italic><sub><italic>ij</italic></sub> and stimulation spike times <italic>S</italic><sub><italic>ij</italic></sub> used from here on is within the performance range of randomly selected reference weights and input spike timings. The variances of <italic>R</italic><sup><italic>w</italic></sup><sub>after</sub> and <italic>R</italic><sup><italic>s</italic></sup><sub>after</sub> also show that there is considerable variation in the unconstrained theoretical model. To reduce variation in our results, so that changes caused by hardware effects are more visible, we use <italic>W</italic><sub><italic>ij</italic></sub> and <italic>S</italic><sub><italic>ij</italic></sub> from here on.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Final level of reward for: baseline simulation, randomized reference weights, and randomized stimulation pattern.</bold> The final performance level of the baseline simulation <italic>R</italic><sup>base</sup><sub>after</sub> using reference weights <italic>W</italic><sub><italic>ij</italic></sub> and stimulation pattern <italic>S</italic><sub><italic>ij</italic></sub> is comparable to the final level of reward averaged over randomly chosen reference weights <italic>R</italic><sup><italic>w</italic></sup><sub>after</sub> and stimulation patterns <italic>R</italic><sup><italic>s</italic></sup><sub>after</sub>.</p></caption>
<graphic xlink:href="fnins-07-00160-g0004.tif"/>
</fig>
</sec>
<sec>
<title>3.2. Discretized weights</title>
<p>In designing the neuromorphic hardware system, one is faced with a trade-off between implementing more synapses with lower bit resolution and fewer synapses with higher resolution. Therefore, we would like to know how many bits are required for each synaptic weight to achieve good performance in the learning task. We perform a three-way comparison between the baseline model, a deterministic algorithm that simply rounds calculated weights to allowed representations and a probabilistic variant as outlined in section 2.2.4. With deterministic weight updates, no update satisfying Equation (11) causes a weight change. With fewer bits more updates are lost and learning performance is expected to suffer. This is what can be seen in Figure <xref ref-type="fig" rid="F5">5</xref>. The simulations shown there compare the performance of the baseline model to that of a constrained model with discretized weights of decreasing resolution. Figure <xref ref-type="fig" rid="F5">5A</xref> also shows the full reward trace of a single run picked arbitrarily. The plot exhibits a number of sharp drops in reward that last for less than 15 trials, before returning to the previous performance level. The final level of performance is not affected by these glitches. For the 8 bit case, performance is as good as using continuous weights (Figure <xref ref-type="fig" rid="F5">5B</xref>). Figure <xref ref-type="fig" rid="F5">5C</xref> shows a slightly reduced performance for 6 bit. Using only 4 bit with deterministic updates causes performance to degrade: it does not reach the same final level of reward (Figure <xref ref-type="fig" rid="F5">5D</xref>, black trace). See Table <xref ref-type="table" rid="T3">3</xref> for the final performance values <italic>R</italic><sub>after</sub>.
Using probabilistic updates improves the performance for 4 bit to <italic>R</italic><sup>4p</sup><sub>after</sub> &#x0003D; 0.46 &#x000B1; 0.03, which is (85 &#x000B1; 10)% of the baseline level <italic>R</italic><sup>base</sup><sub>after</sub> (Figure <xref ref-type="fig" rid="F5">5D</xref> green trace).</p>
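<p>The difference between the two update schemes can be illustrated with a small sketch. It assumes that deterministic updates round to the nearest representable weight and that weights lie in the range [0, 1]; the function names, the example weight and the update size are hypothetical. An update smaller than half a weight step is always lost under deterministic rounding, whereas probabilistic rounding preserves it on average.</p>
<preformat>
```python
import numpy as np

def round_deterministic(w, delta_r):
    """Round to the nearest representable weight (multiples of delta_r)."""
    return np.round(w / delta_r) * delta_r

def round_probabilistic(w, delta_r, rng):
    """Round up with probability eps / delta_r, where eps is the remainder."""
    lower = np.floor(w / delta_r) * delta_r
    eps = w - lower
    round_up = rng.random(np.shape(w)) > 1.0 - eps / delta_r
    return lower + round_up * delta_r

rng = np.random.default_rng(3)
bits = 4
delta_r = 1.0 / (2**bits - 1)    # weight step for an assumed weight range [0, 1]
w_old = 5 * delta_r              # a representable weight
update = 0.2 * delta_r           # hypothetical update, smaller than half a step

# Apply the same update many times to compare the average effective change
targets = np.full(100_000, w_old + update)
det_change = round_deterministic(targets, delta_r) - w_old
prob_change = round_probabilistic(targets, delta_r, rng) - w_old
```
</preformat>
<p>In this sketch every deterministic change is zero, while the mean of the probabilistic changes is close to the intended update of 0.2 weight steps.</p>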
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>Reward traces showing the running average <inline-formula><mml:math id="M36"><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> (only every 50th point plotted) for different weight resolutions averaged over 20 runs. (A)</bold> Baseline performance with continuous weights. Additionally, the light gray trace shows the reward <italic>R</italic> for every trial of a single simulation. <bold>(B)</bold> Performance with 8 bit resolution. The lower plot shows the difference to the baseline model in <bold>(A)</bold>. The shaded area shows the difference for every point in the trace instead of only for every 50th. <bold>(C)</bold> Performance with 6 bit resolution. <bold>(D)</bold> Performance with 4 bit resolution. The black trace shows the result for deterministic updates, the green trace the result for probabilistic updates.</p></caption>
<graphic xlink:href="fnins-07-00160-g0005.tif"/>
</fig>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p><bold>Comparison of simulations with different hardware constraints</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"><bold>No</bold></th>
<th align="left"><bold>Description</bold></th>
<th align="left"><bold><italic>R</italic><sub>after</sub></bold></th>
<th align="left"><bold><italic>D</italic><sub>KS</sub></bold></th>
<th align="left"><bold>Reference</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="left">Baseline<sup>&#x0002A;</sup></td>
<td align="left">0.54 &#x000B1; 0.05</td>
<td align="left">&#x02013;</td>
<td align="left">&#x02013;</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">Baseline with noise<sup>&#x02020;</sup></td>
<td align="left">0.45 &#x000B1; 0.03</td>
<td align="left">&#x02013;</td>
<td align="left">&#x02013;</td>
</tr>
<tr>
<td align="left" colspan="5">DISCRETIZED WEIGHTS</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">8 bit</td>
<td align="left">0.53 &#x000B1; 0.03</td>
<td align="left">0.008<sup>&#x0002A;</sup></td>
<td align="left">(1)</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">6 bit</td>
<td align="left">0.52 &#x000B1; 0.03</td>
<td align="left">0.039<sup>&#x0002A;</sup></td>
<td align="left">(1)</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">4 bit, deterministic</td>
<td align="left">0.37 &#x000B1; 0.03</td>
<td align="left">0.098<sup>&#x0002A;</sup></td>
<td align="left">(1)</td>
</tr>
<tr>
<td align="left">6</td>
<td align="left">4 bit, probabilistic</td>
<td align="left">0.46 &#x000B1; 0.03</td>
<td align="left">0.053<sup>&#x02020;</sup></td>
<td align="left">(2)</td>
</tr>
<tr>
<td align="left" colspan="5">THRESHOLD READOUT</td>
</tr>
<tr>
<td align="left">7</td>
<td align="left">8 bit</td>
<td align="left">0.59 &#x000B1; 0.03</td>
<td align="left">0.140<sup>&#x0002A;</sup></td>
<td align="left">(1)</td>
</tr>
<tr>
<td align="left">8</td>
<td align="left">6 bit</td>
<td align="left">0.59 &#x000B1; 0.05</td>
<td align="left">0.120<sup>&#x0002A;</sup></td>
<td align="left">(1)</td>
</tr>
<tr>
<td align="left">9</td>
<td align="left">4 bit, deterministic</td>
<td align="left">0.27 &#x000B1; 0.04</td>
<td align="left">0.154<sup>&#x0002A;</sup></td>
<td align="left">(1)</td>
</tr>
<tr>
<td align="left">10</td>
<td align="left">4 bit, probabilistic</td>
<td align="left">0.48 &#x000B1; 0.05</td>
<td align="left">0.043<sup>&#x02020;</sup></td>
<td align="left">(2)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The table lists the final performance R<sub>after</sub> and the Kolmogorov&#x02013;Smirnov (KS) measure D<sub>KS</sub> comparing the final weight distribution to the one of the reference simulation indicated by its row number in the last column. For this comparison, the continuous reference distribution is rounded to the respective resolution</italic>.</p>
<p><italic>The p-value for the KS test is p &#x0003D; 0.35 for 8 bit and p &#x0003C; 0.01 in all other cases</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>So in the task studied here, there is no gain in building synapses using more than 8 bit. Because weight updates are controlled by a programmable processor, it is possible to switch between deterministic and probabilistic updating even after the system has been manufactured. In this context, a trade-off can be made between number of synapses and reachable performance by using either probabilistic 4 bit or deterministic 8 bit synapses.</p>
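<p>The two update modes can be illustrated with a minimal Python sketch. It assumes a uniform grid of 2<sup>bits</sup> levels spanning a weight range of 0.5 nS and uses plain stochastic rounding for the probabilistic variant. The exact scheme of section 2.2.4 may differ, and all function and constant names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random

W_MAX = 0.5  # nS; assumed total weight range, for illustration only


def step(bits):
    """Grid spacing of a weight stored with the given number of bits."""
    return W_MAX / (2 ** bits - 1)


def clip_level(k, bits):
    """Keep the integer grid level inside the representable range."""
    return min(max(k, 0), 2 ** bits - 1)


def update_deterministic(w, delta, bits):
    """Apply delta, then round to the nearest representable weight."""
    d = step(bits)
    k = round((w + delta) / d)
    return clip_level(k, bits) * d


def update_probabilistic(w, delta, bits, rng=random):
    """Apply delta, then round up with probability equal to the
    fractional position between the two neighbouring grid levels
    (stochastic rounding; an assumed stand-in for section 2.2.4)."""
    d = step(bits)
    x = (w + delta) / d
    k = math.floor(x)
    if rng.random() < x - k:
        k += 1
    return clip_level(k, bits) * d
```
]]></preformat>
<p>In this sketch, an update of 0.2 grid steps applied to a weight that sits on the grid is always lost in the deterministic mode. The probabilistic mode instead applies a full step in about 20 % of the cases, so that for weights away from the boundaries the expected change equals the requested one, at the price of additional noise.</p>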
<sec>
<title>3.2.1. Baseline with added noise</title>
<p>As discussed in section 2.2.4, probabilistic updates introduce additional noise on the weights. The baseline simulation with added noise uses updates &#x00394;&#x02032; &#x0003D; &#x00394; &#x0002B; <italic>z</italic> with <italic>z</italic> drawn from the distribution given in Equation (17) using <italic>r</italic> &#x0003D; 4.</p>
<p>Figure <xref ref-type="fig" rid="F6">6A</xref> shows reward traces for the baseline simulation with and without added noise. With noise, learning is initially faster but fails to reach the same level as without noise. The final level of performance with noise is <italic>R</italic><sup>noise</sup><sub>after</sub> &#x0003D; 0.45 &#x000B1; 0.03, compared with <italic>R</italic><sup>base</sup><sub>after</sub> &#x0003D; 0.54 &#x000B1; 0.05 without noise. Figure <xref ref-type="fig" rid="F6">6B</xref> compares the baseline with added noise to the case with 4 bit weights and probabilistic updates. Both variants reach the same final level of reward (<italic>R</italic><sup>noise</sup><sub>after</sub> &#x0003D; 0.45 &#x000B1; 0.03 and <italic>R</italic><sup>4p</sup><sub>after</sub> &#x0003D; 0.46 &#x000B1; 0.03), but with continuous weights this level is reached faster. In conclusion, Figure <xref ref-type="fig" rid="F6">6</xref> shows that the achievable performance for 4 bit resolution with probabilistic updates is limited by the added noise and not by the restriction to discrete weight values.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>(A)</bold> Comparison of the baseline simulation with and without added noise on the weight updates. The lower plot shows the difference between both traces in the upper plot. <bold>(B)</bold> Comparison between 4 bit discretized weights with probabilistic updates and baseline with added noise of equivalent magnitude. Again, the lower plot shows the difference between both traces in the upper box.</p></caption>
<graphic xlink:href="fnins-07-00160-g0006.tif"/>
</fig>
</sec>
<sec>
<title>3.2.2. Effect on weights</title>
<p>Besides comparing the received reward, it is also informative to compare the distribution of synaptic weights after learning for the different weight resolutions. Figure <xref ref-type="fig" rid="F7">7</xref> shows histograms of weights for different resolutions and deterministic and probabilistic updating. The weights of the baseline simulation are given in Figure <xref ref-type="fig" rid="F7">7A</xref> and with added noise for <italic>r</italic> &#x0003D; 4 in Figure <xref ref-type="fig" rid="F7">7E</xref>. For discretized weights with deterministic updates, the distribution from Figure <xref ref-type="fig" rid="F7">7A</xref> binned to the respective resolution is also shown in green (Figures <xref ref-type="fig" rid="F7">7B&#x02013;D</xref>). For Figure <xref ref-type="fig" rid="F7">7F</xref>, the green bars show the weights from the baseline simulation with added noise binned to 4 bit.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p><bold>Histograms of synaptic weights after learning.</bold> The weights from all 20 repetitions for each resolution and update mode are shown. <bold>(A)</bold> Continuous weights. <bold>(B)</bold> 8 bit weights in black. Continuous weights are discretized to this resolution and shown in green. <bold>(C)</bold> 6 bit weights in black, again with equally binned continuous weights in green. <bold>(D)</bold> 4 bit weights with deterministic updates in black and the continuous result in green. <bold>(E)</bold> Final weights for the baseline simulation with artificially added noise, of which the reward trace is shown in Figure <xref ref-type="fig" rid="F6">6</xref>. <bold>(F)</bold> Final weight histogram for 4 bit resolution with probabilistic updates in black. Now the green bars give the distribution of weights from the baseline simulation with added noise.</p></caption>
<graphic xlink:href="fnins-07-00160-g0007.tif"/>
</fig>
<p>The baseline histograms (Figures <xref ref-type="fig" rid="F7">7A,E</xref>) are bimodal with peaks at the maximum and minimum allowed weights. This is also the result one would obtain for an unsupervised additive STDP rule (Morrison et al., <xref ref-type="bibr" rid="B22">2008</xref>). With discretized weights and deterministic updates, the bimodality is maintained. For 6 and 4 bit, an increasing deviation from the rounded baseline histogram is apparent. Here, more weights lie in the central region, so that the counts are lower than baseline toward the minimum and maximum weights. For 4 bit with deterministic updates (Figure <xref ref-type="fig" rid="F7">7D</xref>), a local maximum at 0.2 nS can be observed. This corresponds to the initial weight <italic>w</italic><sub><italic>S</italic></sub> &#x0003D; 0.21 nS and indicates that many synapses have not been updated at all or only with small increments.</p>
<p>The results of a Kolmogorov&#x02013;Smirnov (KS) test between the baseline distribution shown in Figure <xref ref-type="fig" rid="F7">7A</xref> and the respective result obtained with discrete weights are shown in Table <xref ref-type="table" rid="T3">3</xref>. The baseline distribution was rounded to the weight resolution of the respective simulation for the test. The data show increasing deviation with smaller weight resolution. According to the obtained <italic>p</italic>-values, the 8 bit distribution cannot be distinguished from the discretized baseline case (<italic>p</italic> &#x0003D; 0.35), whereas the distributions for lower resolutions differ significantly from it (<italic>p</italic> &#x0003C; 0.01). Note that each discretized distribution also differs from the continuous baseline distribution simply because it is discrete.</p>
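<p>As a rough illustration of this comparison, the following Python sketch rounds a continuous reference sample to a uniform weight grid and computes a two-sample KS statistic as the largest gap between the two empirical cumulative distributions. The use of the two-sample form, the uniform grid over 0.5 nS, and all names are assumptions made for this example and are not taken from the simulation code.</p>
<preformat preformat-type="code"><![CDATA[
```python
import bisect


def round_to_grid(values, bits, w_max=0.5):
    """Round continuous weights (in nS) to a uniform grid with 2**bits levels.
    The grid layout and the 0.5 nS range are assumptions."""
    d = w_max / (2 ** bits - 1)
    return [round(v / d) * d for v in values]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum absolute difference between
    the empirical cumulative distribution functions of both samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d_max = 0.0
    # The empirical CDFs only change at observed values, so it is
    # sufficient to evaluate the gap at every value in either sample.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max
```
]]></preformat>
<p>A call such as <monospace>ks_statistic(discrete_weights, round_to_grid(baseline_weights, 4))</monospace> would then give a deviation measure in the spirit of <italic>D</italic><sub>KS</sub>, where both argument names are placeholders for the pooled final weights of the respective simulations.</p>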
<p>The root-mean-square error of the weights as compared to the baseline simulation is given by
<disp-formula id="E25"><label>(25)</label><mml:math id="M25"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>U</mml:mi></mml:msub><mml:msub><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>U</mml:mi></mml:msub><mml:msub><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow></mml:munderover><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:msubsup><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mtext>base</mml:mtext></mml:mrow></mml:msubsup></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:mrow></mml:msqrt><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Here, &#x02329;<italic>w</italic><sup>base</sup><sub><italic>i</italic></sub>&#x0232A; is the <italic>i</italic>-th weight averaged over 20 repetitions of the baseline simulation. Averaged over the individual runs of the baseline simulation itself, this gives &#x02329;<italic>E</italic><sup>base</sup><sub><italic>w</italic></sub>&#x0232A; &#x0003D; (0.10 &#x000B1; 0.06) nS. For 8, 6, and 4 bit, this increases to &#x02329;<italic>E</italic><sup>8</sup><sub><italic>w</italic></sub>&#x0232A; &#x0003D; (0.11 &#x000B1; 0.06) nS, &#x02329;<italic>E</italic><sup>6</sup><sub><italic>w</italic></sub>&#x0232A; &#x0003D; (0.12 &#x000B1; 0.06) nS, and &#x02329;<italic>E</italic><sup>4</sup><sub><italic>w</italic></sub>&#x0232A; &#x0003D; (0.17 &#x000B1; 0.07) nS. Compared to the total weight range of only 0.5 nS, these are large deviations. Since even the baseline simulation shows a root-mean-square error of 20% of this range, it can be concluded that learning does not produce a single fixed set of weights. This is due either to redundancy in the weights or to irrelevant synapses.</p>
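<p>The error measure of Equation (25) can be sketched in a few lines of Python. The sketch assumes that the final weights of every run are available as a flat list with one entry per synapse and that the mean is taken over all entries; the function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def mean_weights(runs):
    """Average every synaptic weight over repeated runs.
    `runs` is a list of equally long weight lists, one per run."""
    n_runs = len(runs)
    return [sum(ws) / n_runs for ws in zip(*runs)]


def rms_error(weights, reference):
    """Root-mean-square deviation of one run's weights from the
    reference weights, in the units of the weights (here nS)."""
    n = len(weights)
    squared = sum((w - r) ** 2 for w, r in zip(weights, reference))
    return math.sqrt(squared / n)


# Toy example with two runs of two synapses each (values in nS).
baseline_runs = [[0.0, 0.5], [0.5, 0.5]]
reference = mean_weights(baseline_runs)          # [0.25, 0.5]
errors = [rms_error(run, reference) for run in baseline_runs]
```
]]></preformat>
<p>Averaging the entries of <monospace>errors</monospace> over the runs corresponds to the run-averaged error values quoted above. In the toy example, the first synapse ends at opposite bounds in the two runs, which alone produces an error of about 0.18 nS per run.</p>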
<p>When noise on weight updates is added to the simulation, the distribution of final weights changes (Figure <xref ref-type="fig" rid="F7">7E</xref>). Here, the histogram is still bimodal with peaks at the weight boundaries, but in between the distribution is flat. The weight noise modifies weights by up to &#x003B4;<italic>r</italic> &#x02248; 0.03 nS in each update (see Equation 17). This acts as a diffusion process smoothing the weight distribution. For 4 bit weights with probabilistic updates (Figure <xref ref-type="fig" rid="F7">7F</xref>), the histogram is also flattened compared to the variant with deterministic updates (Figure <xref ref-type="fig" rid="F7">7D</xref>). The result is qualitatively in good agreement with the rounded weights from the baseline simulation with added noise. The root-mean-square error using weights from the baseline simulation with added noise as reference is &#x02329;<italic>E</italic><sup>4p</sup><sub><italic>w</italic></sub>&#x0232A; &#x0003D; (0.15 &#x000B1; 0.03) nS. The KS test reveals a smaller deviation from the baseline simulation with noise compared to the 4 bit case with deterministic updates (No. 6 compared to No. 5 in Table <xref ref-type="table" rid="T3">3</xref>). However, the test also shows that the weight distributions are not identical (<italic>p</italic> &#x0003C; 0.01).</p>
</sec>
</sec>
<sec>
<title>3.3. Thresholded readout</title>
<p>The hybrid approach of combining processor-based digital computing with analog special-function units necessitates an interface between the two. At this interface some form of analog-to-digital conversion (ADC) has to take place. The simplest form of ADC is comparison to a threshold. We next ask whether such a simple interface is sufficient for good performance on the learning task. Figure <xref ref-type="fig" rid="F8">8</xref> shows performance for different weight resolutions compared to baseline using the thresholded readout. In contrast to the simulations shown in Figure <xref ref-type="fig" rid="F5">5</xref>, updates are now calculated according to Equation (19) instead of Equation (8). In particular, Equation (19) does not directly use the eligibility trace <italic>e</italic>(<italic>t</italic><sub>trial</sub>), but the evaluation bits <italic>b</italic><sub>&#x0002B;</sub>, <italic>b</italic><sub>&#x02212;</sub> determined by the readout mechanism (Equation 18). Performance in the case of continuous, 8 and 6 bit synapses (6 bit with threshold readout mechanism not shown) qualitatively shows the same picture with and without threshold readout (compare Figures <xref ref-type="fig" rid="F5">5</xref>, <xref ref-type="fig" rid="F8">8</xref>): resolutions of 8 and 6 bit reach good performance, while 4 bit with deterministic updates is degraded. The precise values of the final reward <italic>R</italic><sub>after</sub> given in Table <xref ref-type="table" rid="T3">3</xref> indicate a small improvement of 0.06 &#x000B1; 0.04 in reward by the threshold mechanism for 8 and 6 bit. When comparing traces for weights of the same resolution in Figures <xref ref-type="fig" rid="F5">5</xref>, <xref ref-type="fig" rid="F8">8</xref>, those with threshold readout (Figure <xref ref-type="fig" rid="F8">8</xref>) show less variability between trials.
For example, the trace of the single run in Figure <xref ref-type="fig" rid="F5">5A</xref> exhibits more noise than the one in Figure <xref ref-type="fig" rid="F8">8A</xref>. The variability can be quantified by the standard deviation &#x003C3;<sub><italic>S</italic></sub> of the success signal <italic>S</italic> (see Equation 7). For a resolution of 8 bit, &#x003C3;<sub><italic>S</italic></sub> &#x0003D; 4.0 &#x000D7; 10<sup>&#x02212;5</sup> is reduced to &#x003C3;<sub><italic>S</italic></sub> &#x0003D; 1.2 &#x000D7; 10<sup>&#x02212;5</sup> when using the threshold readout. This is caused by the smoothing effect of the readout threshold, which effectively replaces extreme values of the eligibility trace <italic>e</italic>(<italic>t</italic><sub>trial</sub>) with the update constant <italic>A</italic> &#x0003D; <italic>A</italic><sup>&#x0002A;</sup>. The update constant <italic>A</italic><sup>&#x0002A;</sup> is determined heuristically according to Equation (20).</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p><bold>Performance with threshold readout.</bold> As in Figure <xref ref-type="fig" rid="F5">5</xref> the running average of the reward <inline-formula><mml:math id="M37"><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> is plotted averaged over 20 runs. The lower plots show the difference to the baseline trace in Figure <xref ref-type="fig" rid="F5">5A</xref>. <bold>(A)</bold> Performance traces for continuous and 8 bit weights. In gray reward <italic>R</italic> for every trial in a single run with continuous weights is shown. <bold>(B)</bold> Performance traces for 4 bit resolution with deterministic and probabilistic updates.</p></caption>
<graphic xlink:href="fnins-07-00160-g0008.tif"/>
</fig>
<p>When using probabilistic updates (Figure <xref ref-type="fig" rid="F8">8B</xref>, green trace), the performance level of the baseline simulation with added noise of equivalent magnitude on the weights is also slightly surpassed (see Nos. 2 and 10 in Table <xref ref-type="table" rid="T3">3</xref>). With deterministic updates and 4 bit synapses, performance is further reduced by 0.10 &#x000B1; 0.05 using the threshold readout (black traces in Figures <xref ref-type="fig" rid="F5">5D</xref>, <xref ref-type="fig" rid="F8">8B</xref>).</p>
<p>Hence, the simple readout method, which uses only a threshold comparison, does not reduce performance for 8 and 6 bit weights or for 4 bit weights with probabilistic updates. Therefore, the qualitative result from the previous section still holds: with deterministic updates, 6 bit is enough to achieve the performance level of the baseline simulation. If updates are performed in a probabilistic manner, 4 bit is sufficient to reach the performance of the baseline simulation with added noise.</p>
<sec>
<title>3.3.1. Effect on weights</title>
<p>Comparing the histograms of synaptic weights after learning gives a similar picture to the results of section 3.2: With deterministic updates, the histograms have maxima at the upper and lower weight limit as is shown in Figures <xref ref-type="fig" rid="F9">9A,B</xref>. The 4 bit case (Figure <xref ref-type="fig" rid="F9">9B</xref>) again shows a local maximum around the initial weight value <italic>w</italic><sub><italic>S</italic></sub> &#x0003D; 0.21 nS. In comparison to Figure <xref ref-type="fig" rid="F7">7D</xref> this maximum is broader. With probabilistic updates the histogram is nearly flat (Figure <xref ref-type="fig" rid="F9">9C</xref>). The average root-mean-square error to the mean baseline weights can be compared to the values given in section 3.2: For 8 bit resolution it is &#x02329;<italic>E</italic><sup>8t</sup><sub><italic>w</italic></sub>&#x0232A; &#x0003D; (0.19 &#x000B1; 0.06) nS, which is larger than &#x02329;<italic>E</italic><sup>8</sup><sub><italic>w</italic></sub>&#x0232A;. For 4 bit the error &#x02329;<italic>E</italic><sup>4t</sup><sub><italic>w</italic></sub>&#x0232A; &#x0003D; (0.18 &#x000B1; 0.11) nS is comparable to &#x02329;<italic>E</italic><sup>4</sup><sub><italic>w</italic></sub>&#x0232A;. With probabilistic updates the result &#x02329;<italic>E</italic><sup>4tp</sup><sub><italic>w</italic></sub>&#x0232A; &#x0003D; (0.15 &#x000B1; 0.01) nS is the same as &#x02329;<italic>E</italic><sup>4p</sup><sub><italic>w</italic></sub>&#x0232A; without the threshold readout.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p><bold>Histogram of synaptic weights after learning with threshold readout. (A)</bold> The histogram is plotted in black for 8 bit weights. The green histogram shows the result for continuous weights rounded to this resolution. <bold>(B)</bold> As in <bold>(A)</bold>, but for 4 bit weights with deterministic updating. <bold>(C)</bold> Final weights for a resolution of 4 bit with probabilistic updating in black. Now, the green histogram shows the final weights of the baseline simulation with added noise rounded to this resolution.</p></caption>
<graphic xlink:href="fnins-07-00160-g0009.tif"/>
</fig>
<p>The KS test shows larger deviations of the weight distribution for all simulations with deterministic updates compared to having only discrete weights (Table <xref ref-type="table" rid="T3">3</xref>). For 4 bit with probabilistic updates the deviation is decreased (Nos. 10 and 6 in Table <xref ref-type="table" rid="T3">3</xref>).</p>
</sec>
</sec>
<sec>
<title>3.4. Analog drift</title>
<p>In the hardware system, the eligibility trace is implemented as an analog variable inside the synapse circuit. It is therefore subject to drift caused by leakage currents. In Equation (21), we have proposed to model this using a drift function. Additionally, this behavior varies between synapses due to imperfections introduced by the manufacturing process. This is accounted for by randomly drawing parameters for the drift function from a Gaussian distribution.</p>
<p>To assess the impact of this drift on the performance in the learning task, we performed a sweep over a number of average time constants and degrees of mismatch between synapses. The results of the simulation, using continuous weights and the thresholded eligibility readout described above, are shown in Figure <xref ref-type="fig" rid="F10">10</xref>. The gray value indicates the difference between <italic>R</italic><sub>after</sub> and the baseline value <italic>R</italic><sup>base</sup><sub>after</sub> (section ??) in units of the standard deviation of the baseline simulation (darker color is better). All values fall within one standard deviation of the baseline case, which means that performance is only weakly sensitive to changes of time constant and mismatch of the eligibility trace. The best performance is achieved for &#x003C4;<sub><italic>e</italic></sub> &#x0003D; 0.5 s and no mismatch (<italic>R</italic><sub>after</sub> &#x0003D; 0.59 &#x000B1; 0.02). In section 3.3, the black trace in Figure <xref ref-type="fig" rid="F8">8A</xref> shows the reward trace for the same parameters. The simulation there reached the same performance. For very large time constants, i.e., &#x003C4;<sub><italic>e</italic></sub> &#x0003D; &#x000B1; 1000 s, drift is negligible compared to the trial duration <italic>t</italic><sub>trial</sub> &#x0003D; 1 s. This leads to minor deviations in the leftmost (&#x02329;<italic>R</italic><sub>after</sub>&#x0232A; &#x0003D; 0.55 &#x000B1; 0.02) and rightmost (&#x02329;<italic>R</italic><sub>after</sub>&#x0232A; &#x0003D; 0.55 &#x000B1; 0.01) columns of Figure <xref ref-type="fig" rid="F10">10</xref>. This is above the baseline level, but below the one reached in simulations with threshold readout and 8 bit resolution.
The worst performance (<italic>R</italic><sub>after</sub> &#x0003D; 0.45 &#x000B1; 0.04) is obtained for small time constants &#x003C4;<sub><italic>e</italic></sub> &#x0003D; 0.5 s with large mismatch factor <italic>m</italic><sub><italic>e</italic></sub> &#x0003D; 1, because for &#x003C4;<sub><italic>e</italic></sub> less than or equal to the trial duration, the effect of drift is more pronounced.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p><bold>Difference of final reward to the baseline simulation <italic>R</italic><sub>after</sub> &#x02013; <italic>R</italic><sup>base</sup><sub>after</sub> in units of the baseline standard deviation.</bold> The varied parameters are the average time constant and the amount of mismatch between synapses.</p></caption>
<graphic xlink:href="fnins-07-00160-g0010.tif"/>
</fig>
<p>In this test, the model has proven robust to large deviations from the temporal behavior of the eligibility trace in the baseline model. Drift toward the positive and negative extrema of the eligibility trace, which is the opposite of the desired decaying behavior, does not affect performance. Neither does variation of up to 150 % of the time constant. This shows the model to be a well-suited candidate for implementation in neuromorphic hardware, where large variations and distortions are often encountered.</p>
</sec>
<sec>
<title>3.5. Delayed reward</title>
<p>In the proposed system, the simulation of the neural network is carried out by analog hardware elements, while the simulation of the environment is left to a conventional computer system. In this context, latencies due to technical reasons, e.g., communication with the environment or computation by the EPP, can cause temporal delays with respect to ideal calculations. Additionally, the analog readout of the accumulation traces <italic>a</italic><sub>&#x0002B;</sub>, <italic>a</italic><sub>&#x02212;</sub> is affected by noise.</p>
<p>To better understand the impact of these effects on learning performance, a sweep over readout noise and reward latency values was performed, the results of which are shown in Figure <xref ref-type="fig" rid="F11">11</xref>. The simulation did not include mismatched drift, but used a fixed time constant of 500 ms with continuous weights. The gray value represents the improvement in reward by learning <italic>R</italic><sub>after</sub> &#x02212; <italic>R</italic><sub>before</sub>. The data show that, depending on the amount of noise, learning is impaired by the delay. The red bars indicate the predicted maximally tolerable delay assuming that a signal-to-noise ratio of one is required (Equation 24). The simulation fits the prediction well. A noise level of &#x003C3;<sub><italic>a</italic></sub> &#x0003D; 500 pS corresponds to 50 % of the maximum of the eligibility trace <italic>a</italic><sub>max</sub>.</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p><bold>Improvement in reward <italic>R</italic><sub>after</sub> &#x02013; <italic>R</italic><sub>before</sub> by learning for a range of delays and accumulator readout noise levels.</bold> Red bars indicate the predicted maximally tolerable delay (Equation 24). Data is averaged over 15 simulation runs.</p></caption>
<graphic xlink:href="fnins-07-00160-g0011.tif"/>
</fig>
<p>The simulation results confirm that noise on the local accumulation circuit limits the tolerable delay. Because of the accelerated time base of the system, communication delays can easily reach seconds of emulated time. With an acceleration factor of &#x003B1; &#x0003D; 10<sup>5</sup>, 1 s of emulated time is equivalent to 10 &#x003BC;s. So with 1 % of noise (&#x003C3;<sub><italic>a</italic></sub> &#x0003D; 10 pS), the round-trip time to the environment must be less than 20 &#x003BC;s for a &#x003C4;<sub><italic>e</italic></sub> &#x0003D; 500 ms time constant. Equation (24) can be used to find working combinations of the parameters round-trip time, analog noise and time constant.</p>
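<p>The conversion between emulated and wall-clock time used in this estimate is simple arithmetic, sketched below in Python. The function names are illustrative, and the sketch covers only the time conversion, not the delay bound of Equation (24).</p>
<preformat preformat-type="code"><![CDATA[
```python
def to_wall_clock(emulated_seconds, acceleration):
    """Wall-clock duration (s) of a span of emulated time."""
    return emulated_seconds / acceleration


def to_emulated(wall_clock_seconds, acceleration):
    """Emulated duration (s) that passes during a wall-clock span."""
    return wall_clock_seconds * acceleration


ALPHA = 1e5  # acceleration factor quoted in the text

# One second of emulated time takes 10 microseconds of wall-clock time.
one_emulated_second = to_wall_clock(1.0, ALPHA)

# A 20 microsecond round trip costs 2 s of emulated time, i.e., four
# eligibility time constants of 500 ms.
round_trip_emulated = to_emulated(20e-6, ALPHA)
time_constants = round_trip_emulated / 0.5
```
]]></preformat>
<p>Seen this way, a wall-clock latency that is negligible for a conventional computer already spans several eligibility time constants on the accelerated substrate.</p>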
</sec>
<sec>
<title>3.6. Toward hardware implementation</title>
<p>The previous sections have presented results for the performance of the learning rule under various constraints caused by a hardware implementation. We now want to present simulation results and area estimates for the hardware implementation itself. So far, the EPP has been produced as an isolated general purpose processor in a 65 nm process technology. A version integrated into the BrainScaleS wafer-scale system was tested in simulation.</p>
<p>The EPP core produced in the 65 nm technology covers an area of 0.14 mm<sup>2</sup> excluding SRAM blocks for 32 kiB of main memory. It was tested using the CoreMark benchmark (EEMBC, <xref ref-type="bibr" rid="B6">2012</xref>) achieving a normalized score of 0.75 <inline-formula><mml:math id="M38"><mml:mrow><mml:mfrac><mml:mrow><mml:mtext>Iterations</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:mtext>MHz</mml:mtext></mml:mrow></mml:mfrac></mml:mrow></mml:math></inline-formula>. At 500 MHz and 1.2 V supply voltage it consumes (48.0 &#x000B1; 0.1) mW of power executing the CoreMark benchmark.</p>
<p>The BrainScaleS wafer-scale system is built in a larger 180 nm process technology. A version with integrated EPP was prepared to estimate area requirements and to simulate the system. The design was synthesized and standard cell placement was carried out. This gave an area estimate for the EPP core of 0.895 mm<sup>2</sup>, excluding the 12 kiB of main memory. All plasticity-related logic in the digital part makes up 6.2 % of the total design area. In simulation we tested a weight updating program suitable for the reward-modulated STDP rule discussed in this study. It requires 5.1 kiB of main memory and achieves a best-case update rate of 9552 synapses/s for 4 bit weight resolution. Due to the lack of hardware support for probabilistic updates and for weight resolutions higher than 4 bit in the <sc>SYNAPSE</sc> special-function unit, performance is reduced in these cases. For probabilistic updates it is 802 synapses/s and for 8 bit weights 573 synapses/s. Note that update rates are given in the biological time domain using an acceleration factor of 10<sup>4</sup>.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>4. Discussion</title>
<p>In this study we have proposed a hybrid architecture for plasticity, combining local analog computing with global, program-based processing. We have then simulated a reward-modulated spike-timing-dependent plasticity learning rule studied by Fr&#x000E9;maux et al. (<xref ref-type="bibr" rid="B11">2010</xref>) to analyze its implementability. Starting from a baseline case with no hardware effects, the level of hardware detail of the simulations was increased, with a focus on the negative effects introduced by an implementation using the proposed system. Note that we did not try to precisely model the hardware device, as would be done, for example, in a transistor-level simulation. Instead, our goal was to find the effects to which the model is sensitive in order to guide future design decisions.</p>
<p>Overall, we did not find major obstacles for the proposed implementation, but we showed that some design choices are critical to the proper functioning of the learning rule. In the following, we will discuss guidelines concerning weight resolution, implementation of the eligibility trace and the importance of low-latency communication. After that, we will consider scalability and flexibility of the approach and compare the design with other hardware systems and discuss the limitations of this study.</p>
<sec>
<title>4.1. Weight resolution</title>
<p>For neuromorphic hardware systems using digitally represented weights, a key question is how many bits to use per synapse, as this determines the amount of wafer area the circuit requires. For networks with highly connected neurons, small synapses are important for scalability. This drives implementations to a reduction of the number of bits used for the weight compared to software simulators, which typically use a quasi-continuous 32 or 64 bit floating-point representation. On the other hand, on-line synaptic plasticity learning rules, for example STDP, require incremental changes to the weights. Discretization confines these changes to a grid with a resolution determined by the number of bits.</p>
<p>For the synaptic plasticity model and the learning task considered, we found that this indeed limits learning performance when using deterministic updates and 4 bit weights. Two solutions to this problem were tested: using higher resolutions and making updates probabilistically. In the former case, a performance comparable to the continuous case is reached with 6 bit. With probabilistic updates, the performance of 4 bit synapses could be improved to nearly the same level. The comparison to the baseline simulation with added noise of equivalent magnitude showed performance to be limited by the introduced noise and not the discretization of weight values. Therefore, it is not necessary to build high resolution hardware synapses comparable to software simulators, but even a modest number of bits gives good performance.</p>
<p>Seo et al. (<xref ref-type="bibr" rid="B35">2011</xref>) arrive at a similar result. They built a completely digital system in two versions: one with 1 bit synapses and probabilistic updates, and one with 4 bit synapses and deterministic updates. Learning performance in a benchmark task is better in the latter version, but it incurs additional costs in area and power consumption.</p>
<p>Pfeil et al. (<xref ref-type="bibr" rid="B25">2012</xref>) also studied the question of weight resolution for the BrainScaleS wafer-scale system using a synchrony detection task. Comparable to our findings, they report that 8 bit weights perform as well as floating-point weights. Weights with 4 bit resolution were sufficient for solving the task, but did not reach the same performance.</p>
</sec>
<sec>
<title>4.2. Implementation of the eligibility trace</title>
<p>In neural models of reinforcement learning, the eligibility trace serves an important purpose: it links neural activity to reward. Reward typically arrives with a delay relative to the neural activity, that is, the spikes, that caused the rewarded actions. But only when reward arrives does the agent know how to change the weights. The hybrid concept of local analog accumulation and global processor-based weight computation fits this model very well, so we can identify the local circuit in the synapse with the eligibility trace. However, there are two differences. First, the processor does not have direct access to the accumulated value, but can only perform a simple comparison operation (Equation 2). Second, there is no controlled exponential decay of the accumulator. The analysis in sections 3.3 and 3.4 shows no degradation in learning performance from either effect. On the other hand, the lack of a controlled and possibly configurable decay constrains the fidelity with which learning rules can be implemented. It is not clear how other learning tasks would be affected by this lack.</p>
</sec>
<sec>
<title>4.3. Impact of real-world timings</title>
<p>In the presence of delayed reward, three parameters govern whether learning is possible: (1) the communication round-trip time to the environment and back, (2) the amount of noise on the eligibility trace, and (3) the decay time constant of the eligibility trace. Equation (24) can be used to determine working combinations of these parameters. Reducing the speed-up factor would make communication latency less of a problem, but it would require longer-lasting analog storage to achieve the same time constant in emulated time. Small long-term analog memory is difficult to build due to leakage effects. Therefore, the triangle of parameters needs to be carefully balanced. A different approach to dealing with communication latency would be to execute the environment on the EPP itself. This would require giving the processor direct access to spike times.</p>
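<p>The trade-off between speed-up factor, latency, and analog storage time can be made concrete with a small Python sketch. It is not Equation (24); it only assumes an idealized, noise-free exponential decay of the trace, and all numerical values (latency, speed-up factors, time constant) are hypothetical.</p>
<preformat><![CDATA[
```python
import math

def remaining_trace_fraction(latency_wallclock_s, speedup, tau_e_emulated_s):
    """Fraction of an idealized exponentially decaying trace that is left
    when the reward arrives after the given wall-clock latency."""
    latency_emulated_s = latency_wallclock_s * speedup
    return math.exp(-latency_emulated_s / tau_e_emulated_s)

def required_storage_time_s(tau_e_emulated_s, speedup):
    """Wall-clock time constant the analog storage must provide to realise
    the emulated time constant at the given speed-up factor."""
    return tau_e_emulated_s / speedup

# Hypothetical values: 50 microseconds round trip, tau_e = 0.5 s emulated.
for speedup in (1e4, 1e3):
    left = remaining_trace_fraction(50e-6, speedup, 0.5)
    storage = required_storage_time_s(0.5, speedup)
    print(speedup, round(left, 3), storage)
```
]]></preformat>
<p>Under these assumptions, a lower speed-up factor leaves more of the trace intact when reward arrives, but the analog circuit has to hold its value proportionally longer in wall-clock time.</p>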
</sec>
<sec>
<title>4.4. Scalability and flexibility</title>
<p>It is important to note that the synaptic weight and eligibility trace are stored locally in the synapse circuit and therefore do not consume processor main memory. Consequently, for the tested learning rule the required memory does not increase with the number of synapses. The rule itself can be implemented using 5.1 kiB of memory for code and data, which is well below the provided 12 kiB. The time to update all of the synapses scales linearly with their number. In the proposed hardware system, one EPP processes up to 230 k synapses. At the best-case updating rate of 9552 synapses/s for the reward-modulated STDP rule, a full pass over 230 k synapses takes about 24 s, which implies delays on the order of tens of seconds if all synapses are used. Therefore, the same considerations apply as to the problem of delayed reward discussed in section 4.3. For the task tested here, simulations indicate no degradation of learning performance for update rates down to 500 synapses/s (data not shown). However, the task only uses a small subset of 1250 synapses.</p>
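<p>The update-time figures quoted above follow from simple arithmetic, reproduced here as a short Python sketch. The numbers are taken from the text; the helper names and the notion of a delay budget are illustrative assumptions.</p>
<preformat><![CDATA[
```python
def full_pass_seconds(n_synapses, updates_per_second):
    """Time for one pass over all synapses at a constant update rate."""
    return n_synapses / updates_per_second

def max_synapses(updates_per_second, delay_budget_s):
    """Largest synapse count whose full pass fits into a delay budget
    (the budget itself is a hypothetical, task-dependent quantity)."""
    return int(updates_per_second * delay_budget_s)

print(round(full_pass_seconds(230_000, 9552), 1))  # 24.1 (all synapses)
print(round(full_pass_seconds(1250, 9552), 2))     # 0.13 (task subset)
print(full_pass_seconds(1250, 500))                # 2.5 (slowest tested rate)
```
]]></preformat>
<p>If, hypothetically, a task tolerated the 2.5 s pass time of the slowest tested rate in general, the call max_synapses(9552, 2.5) would give the corresponding synapse count at the best-case rate; whether such an extrapolation holds is task dependent.</p>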
<p>In general and depending on the task, the updating rate can limit the number of usable plastic synapses per processor to a number below 230 k. This limitation can be addressed with three strategies: (1) randomizing the order of updates, so that over time all synapses are updated with a short delay; (2) reducing the acceleration factor by recalibration, as long as the resulting neuronal time constants are still within the achievable range of the circuit; and (3) distributing plastic synapses over the wafer, so that fewer are used per processor, thereby trading efficiency against fidelity of the emulation. The last approach is especially suitable if not all synapses in the model require plasticity.</p>
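<p>The effect of randomizing the update order can be sketched in Python as follows. With a fixed order, the synapse visited last always waits for a full pass; with a fresh random permutation per pass, its delay varies and is sometimes short. The synapse count, update rate, and function names are hypothetical and chosen only for illustration.</p>
<preformat><![CDATA[
```python
import random

def update_delays(order, updates_per_second):
    """Map each synapse index to the delay (in seconds) before it is
    updated when the processor walks through `order` at a constant rate."""
    return {syn: pos / updates_per_second for pos, syn in enumerate(order)}

def shuffled_order(n_synapses, rng):
    """Return a fresh random permutation of the synapse indices."""
    order = list(range(n_synapses))
    rng.shuffle(order)
    return order

rng = random.Random(0)          # fixed seed, only for reproducibility
n, rate = 1000, 500.0           # hypothetical synapse count and update rate
last = n - 1

fixed_delay = update_delays(range(n), rate)[last]
random_delays = [update_delays(shuffled_order(n, rng), rate)[last]
                 for _ in range(20)]

print(fixed_delay)              # identical in every pass
print(min(random_delays))       # shortest delay seen over 20 random passes
```
]]></preformat>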
<p>Since the EPP is a general-purpose processor, arbitrary C code can be used to define learning rules. These rules are restricted by three constraints: (1) The program has to fit into 12 kiB of memory. (2) The updating rate establishes a soft limit on the number of plastic synapses per processor. (3) The program can only observe the network activity through the local accumulation circuits. The last point in particular excludes changing the shape of the STDP curve (Equation 5), since it is a fixed property of the local synapse circuit.</p>
<p>Although we only discuss one particular learning rule in detail in this study, a main strength of the system is its ability to implement a wide set of rules. Going beyond STDP-based rules, two examples would be gradient descent methods and evolutionary algorithms. In both cases, as for the STDP rule studied here, the environment provides a reward signal that guides the change of weights performed locally by the EPP. For these two examples, the local accumulation circuit is not used at all. Instead, for gradient descent, or ascent in the case of reward, the gradient with respect to a randomly selected subset of weights is determined by evaluating the performance of the network multiple times, and the weights are then changed in the direction of the gradient. For evolutionary algorithms, the weights belonging to an individual would be distributed over the wafer, so that every processor has access to a subset of the weights of all individuals. After the reward for each individual is supplied by the environment, the processors can perform combination and mutation on their local subsets in parallel. Typically, gradient descent and evolutionary algorithms require many evaluations of network performance and are therefore computationally expensive on conventional computers. In the proposed hardware system, the high acceleration factor, the implementation of the network dynamics as a physical model, and the parallel weight update promise fast learning with these rules and good scalability with the number of synapses.</p>
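<p>The gradient ascent variant described above can be sketched in Python. This is a minimal illustration under stated assumptions, not code for the EPP: the gradient is estimated by finite differences on a random subset of weights, the reward function is a toy stand-in for evaluating the network in the environment, and weights are continuous rather than discretized. All names and parameter values are hypothetical.</p>
<preformat><![CDATA[
```python
import random

def gradient_ascent_step(weights, evaluate_reward, subset_size, eps, lr, rng):
    """Estimate the reward gradient for a random subset of weights by
    finite differences and move those weights along the estimate."""
    base = evaluate_reward(weights)
    subset = rng.sample(range(len(weights)), subset_size)
    grad = {}
    for i in subset:
        probe = list(weights)
        probe[i] += eps                       # perturb one weight
        grad[i] = (evaluate_reward(probe) - base) / eps
    new_weights = list(weights)
    for i, g in grad.items():
        new_weights[i] += lr * g              # ascent: increase reward
    return new_weights

# Toy stand-in for the environment: reward is highest at the target weights.
TARGET = [1.0, 2.0, 3.0, 4.0]

def toy_reward(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

rng = random.Random(1)                        # fixed seed for reproducibility
weights = [0.0, 0.0, 0.0, 0.0]
for _ in range(200):
    weights = gradient_ascent_step(weights, toy_reward, 2, 1e-3, 0.1, rng)
print(toy_reward(weights))                    # close to zero after learning
```
]]></preformat>
<p>Each step needs one baseline evaluation plus one evaluation per probed weight, which is why many network evaluations are required and why a fast physical emulation is attractive for this class of rules.</p>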
</sec>
<sec>
<title>4.5. Comparison to other STDP implementations</title>
<p>Plasticity implementations found in the literature typically focus on variants of unsupervised STDP and use fixed-function hardware. For example, in Indiveri et al. (<xref ref-type="bibr" rid="B16">2006</xref>) STDP works on bi-stable synapses and is implemented using fully analog circuits. In Ramakrishnan et al. (<xref ref-type="bibr" rid="B28">2011</xref>) analog floating-gate memory is used for weight storage that can be subjected to plasticity. In contrast, Seo et al. (<xref ref-type="bibr" rid="B35">2011</xref>) describe a fully digital implementation using counters and linear-feedback shift registers for probabilistic STDP with single-bit synapses. While these systems allow for flexibility, for example in the shape of the timing dependence, there are three main restrictions compared to the processor-based implementation presented here: (1) Flexibility is restricted to parameterization of a more or less generic circuit. (2) Weight changes are triggered by spike events and depend on the timing of spike pairs. (3) The synapse has no state in addition to the weight. Points (2) and (3) imply that weights have to be changed immediately in reaction to pre- or postsynaptic spikes. This rules out the ability to implement an eligibility trace to solve the distal reward problem of reinforcement learning (Izhikevich, <xref ref-type="bibr" rid="B17">2007</xref>).</p>
<p>The analog synapse circuit in Wijekoon and Dudek (<xref ref-type="bibr" rid="B44">2011</xref>) does include a local eligibility trace and the ability to modulate the weight update by an external reward signal. The plasticity of the synapses can be configured to operate under modulation or as unsupervised STDP. Their approach represents a specialized implementation of reward modulation that emphasizes power and area efficiency. In contrast, our approach aims for flexibility, so that very different learning rules can be implemented on the same hardware substrate, thereby sacrificing some of the efficiency. Examples given previously for non-STDP type learning rules are gradient descent and evolutionary algorithms.</p>
<p>However, there are systems that also use a general-purpose processor for plasticity. For example, Vogelstein et al. (<xref ref-type="bibr" rid="B41">2003</xref>) present an implementation of STDP in an address-event representation (AER) routing system. They use three individual chips: a custom integrate-and-fire neuron array, an SRAM-based look-up table for synaptic connections, and a micro-controller for plasticity. For STDP, the micro-controller processes every spike and maintains queues of pre- and post-synaptic events. This necessitates multiple off-chip memory accesses for every event and at regular time steps. Contrary to our approach, their system has access to the detailed timing of spikes and can therefore additionally implement rules including short-term effects, as in Froemke et al. (<xref ref-type="bibr" rid="B12">2010</xref>). However, in terms of scalability, our proposed system is superior due to the integration of processor, event routing, and neuronal dynamics onto the same wafer. This reduces power consumption by eliminating communication across chip boundaries. Also, due to the hybrid architecture of analog accumulation and digital weight computation, the workload for the processor is reduced. This is an important aspect when aiming for a high speed-up factor.</p>
<p>The system reported in Davies et al. (<xref ref-type="bibr" rid="B4">2012</xref>) is a specialized multi-processor platform for neural simulations. In implementing STDP, a key constraint for them is limited access to weights stored in external memory. They solve this problem by predicting firing times based on the membrane potential. This simultaneously illustrates the strength and weakness of this architecture. Since the system is completely digital, they have unconstrained access to state variables, such as the membrane potential. With analog neurons, this always requires some form of analog-to-digital conversion. On the other hand, weights are stored external to the processor and have to be transferred between chips. In our system, the close integration of weight memory and processor on the same substrate, together with the optimized input/output instructions of the <sc>SYNAPSE</sc> special-function unit, makes weight access more efficient.</p>
<p>In conclusion, the hybrid processor-based architecture proposed in this study represents a novel plasticity implementation for hardware. To our knowledge, it introduces two novel concepts: first, the integration of a general-purpose processor for plasticity onto the neuromorphic substrate, and second, the close interaction with specialized analog computational units using an extension of the instruction set. In combination, this allows for reward-based spike-timing-dependent synaptic plasticity in reinforcement learning tasks.</p>
</sec>
<sec>
<title>4.6. Limitations</title>
<p>The goal of this study was to analyze the implementability of a reinforcement learning task on a proposed novel hardware system. The technical implementability of the system itself was not the subject of this study. We assumed a sufficiently fast processor for the delay analysis (section 2.2.4). It should be part of the design process of a future implementation to test performance against our simulations. The updating speed could limit the number of plastic synapses per processor depending on the decay time constant &#x003C4;<sub><italic>e</italic></sub>. We also did not model the analog part of the system in detail, but restricted simulations to a generic drift function. Measurements in the existing BrainScaleS wafer-scale system could be used to characterize the drifting behavior. However, considering that we did not see degraded performance over a large range of time constants and fixed-pattern variation, it does not seem likely that performance would be worse in a more accurate model.</p>
<p>With regard to the model tested here, we restricted the study to one specific task, spike train learning, which is a generic learning task for spiking neurons: many tasks can be formulated as a relaxed version of spike train learning. We showed that the performance of the model is not negatively affected by hardware constraints. It remains an open question whether there are other tasks that give good performance in software simulations, but fail when hardware constraints are included. We restricted the study to epochal learning with a defined trial duration that ends with the application of the reward. In a next step, this approach should be extended to continuous-time learning scenarios. In this case, processor update speed and the size of the decay time constant could play a more important role.</p>
</sec>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</sec>
</body>
<back>
<ack>
<p>The research leading to these results has received funding from the European Union 7th Framework Program under grant agreement nos. 243914 (Brain-i-Nets) and 269921 (BrainScaleS). We would like to thank Thomas Pfeil for helpful discussions.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Backus</surname> <given-names>J.</given-names></name></person-group> (<year>1978</year>). <article-title>Can programming be liberated from the von Neumann style? A functional style and its algebra of programs</article-title>. <source>Commun. ACM</source> <volume>21</volume>, <fpage>613</fpage>&#x02013;<lpage>641</lpage>. <pub-id pub-id-type="doi">10.1145/359576.359579</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brette</surname> <given-names>R.</given-names></name> <name><surname>Gerstner</surname> <given-names>W.</given-names></name></person-group> (<year>2005</year>). <article-title>Adaptive exponential integrate-and-fire model as an effective description of neuronal activity</article-title>. <source>J. Neurophysiol</source>. <volume>94</volume>, <fpage>3637</fpage>&#x02013;<lpage>3642</lpage>. <pub-id pub-id-type="doi">10.1152/jn.00686.2005</pub-id><pub-id pub-id-type="pmid">16014787</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Caporale</surname> <given-names>N.</given-names></name> <name><surname>Dan</surname> <given-names>Y.</given-names></name></person-group> (<year>2008</year>). <article-title>Spike timing-dependent plasticity: a Hebbian learning rule</article-title>. <source>Annu. Rev. Neurosci</source>. <pub-id pub-id-type="doi">10.1146/annurev.neuro.31.060407.125639</pub-id><pub-id pub-id-type="pmid">18275283</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Davies</surname> <given-names>S.</given-names></name> <name><surname>Galluppi</surname> <given-names>F.</given-names></name> <name><surname>Rast</surname> <given-names>A. D.</given-names></name> <name><surname>Furber</surname> <given-names>S. B.</given-names></name></person-group> (<year>2012</year>). <article-title>A forecast-based STDP rule suitable for neuromorphic implementation</article-title>. <source>Neural Netw</source>. <volume>32</volume>, <fpage>3</fpage>&#x02013;<lpage>14</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.sciencedirect.com/science/article/pii/S0893608012000470">http://www.sciencedirect.com/science/article/pii/S0893608012000470</ext-link> (Selected Papers from IJCNN 2011). <pub-id pub-id-type="doi">10.1016/j.neunet.2012.02.018</pub-id><pub-id pub-id-type="pmid">22386500</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davison</surname> <given-names>A. P.</given-names></name> <name><surname>Br&#x000FC;derle</surname> <given-names>D.</given-names></name> <name><surname>Eppler</surname> <given-names>J.</given-names></name> <name><surname>Kremkow</surname> <given-names>J.</given-names></name> <name><surname>Muller</surname> <given-names>E.</given-names></name> <name><surname>Pecevski</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>PyNN: a common interface for neuronal network simulators</article-title>. <source>Front. Neuroinform</source>. <volume>2</volume>:<issue>11</issue>. <pub-id pub-id-type="doi">10.3389/neuro.11.011.2008</pub-id><pub-id pub-id-type="pmid">19194529</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="web"><person-group person-group-type="author"><collab>Embedded Microprocessor Benchmark Consortium EEMBC.</collab></person-group> (<year>2012</year>). <article-title>Coremark benchmark</article-title>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.coremark.org">http://www.coremark.org</ext-link></citation>
</ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ehrlich</surname> <given-names>M.</given-names></name> <name><surname>Wendt</surname> <given-names>K.</given-names></name> <name><surname>Z&#x000FC;hl</surname> <given-names>L.</given-names></name> <name><surname>Sch&#x000FC;ffny</surname> <given-names>R.</given-names></name> <name><surname>Br&#x000FC;derle</surname> <given-names>D.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>A software framework for mapping neural networks to a wafer-scale neuromorphic hardware system</article-title>, in <source>Proceedings of the Artificial Neural Networks and Intelligent Information Processing Conference (ANNIIP) 2010</source>, (<publisher-loc>Funchal</publisher-loc>), <fpage>43</fpage>&#x02013;<lpage>52</lpage>.</citation>
</ref>
<ref id="B8">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Farries</surname> <given-names>M. A.</given-names></name> <name><surname>Fairhall</surname> <given-names>A. L.</given-names></name></person-group> (<year>2007</year>). <article-title>Reinforcement learning with modulated spike timing&#x02013;dependent synaptic plasticity</article-title>. <source>J. Neurophysiol</source>. <volume>98</volume>, <fpage>3648</fpage>&#x02013;<lpage>3665</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://jn.physiology.org/content/98/6/3648.abstract">http://jn.physiology.org/content/98/6/3648.abstract</ext-link>. <pub-id pub-id-type="doi">10.1152/jn.00364.2007</pub-id><pub-id pub-id-type="pmid">17928565</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fieres</surname> <given-names>J.</given-names></name> <name><surname>Schemmel</surname> <given-names>J.</given-names></name> <name><surname>Meier</surname> <given-names>K.</given-names></name></person-group> (<year>2008</year>). <article-title>Realizing biological spiking network models in a configurable wafer-scale hardware system</article-title>, in <source>Proceedings of the 2008 International Joint Conference on Neural Networks (IJCNN)</source>, (<publisher-loc>Hong Kong</publisher-loc>).</citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Florian</surname> <given-names>R. V.</given-names></name></person-group> (<year>2007</year>). <article-title>Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity</article-title>. <source>Neural Comput</source>. <volume>19</volume>, <fpage>1468</fpage>&#x02013;<lpage>1502</lpage>. <pub-id pub-id-type="doi">10.1162/neco.2007.19.6.1468</pub-id><pub-id pub-id-type="pmid">17444757</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fr&#x000E9;maux</surname> <given-names>N.</given-names></name> <name><surname>Sprekeler</surname> <given-names>H.</given-names></name> <name><surname>Gerstner</surname> <given-names>W.</given-names></name></person-group> (<year>2010</year>). <article-title>Functional requirements for reward-modulated spike-timing-dependent plasticity</article-title>. <source>J. Neurosci</source>. <volume>30</volume>, <fpage>13326</fpage>&#x02013;<lpage>13337</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.6249-09.2010</pub-id><pub-id pub-id-type="pmid">20926659</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Froemke</surname> <given-names>R. C.</given-names></name> <name><surname>Debanne</surname> <given-names>D.</given-names></name> <name><surname>Bi</surname> <given-names>G.-Q.</given-names></name></person-group> (<year>2010</year>). <article-title>Temporal modulation of spike-timing-dependent plasticity</article-title>. <source>Front. Synap. Neurosci</source>. <volume>2</volume>:<issue>19</issue>. <pub-id pub-id-type="doi">10.3389/fnsyn.2010.00019</pub-id><pub-id pub-id-type="pmid">21423505</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Furber</surname> <given-names>S. B.</given-names></name> <name><surname>Lester</surname> <given-names>D. R.</given-names></name> <name><surname>Plana</surname> <given-names>L. A.</given-names></name> <name><surname>Garside</surname> <given-names>J. D.</given-names></name> <name><surname>Painkras</surname> <given-names>E.</given-names></name> <name><surname>Temple</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Overview of the SpiNNaker system architecture</article-title>. <source>IEEE Trans. Comput</source>. <volume>99</volume>:<fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/TC.2012.142</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gerstner</surname> <given-names>W.</given-names></name> <name><surname>Kistler</surname> <given-names>W.</given-names></name></person-group> (<year>2002</year>). <source>Spiking Neuron Models: Single Neurons, Populations, Plasticity</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goodman</surname> <given-names>D.</given-names></name> <name><surname>Brette</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>Brian: a simulator for spiking neural networks in Python</article-title>. <source>Front. Neuroinform</source>. <volume>2</volume>:<issue>5</issue>. <pub-id pub-id-type="doi">10.3389/neuro.11.005.2008</pub-id><pub-id pub-id-type="pmid">19115011</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Indiveri</surname> <given-names>G.</given-names></name> <name><surname>Chicca</surname> <given-names>E.</given-names></name> <name><surname>Douglas</surname> <given-names>R.</given-names></name></person-group> (<year>2006</year>). <article-title>A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity</article-title>. <source>IEEE Trans. Neural Netw</source>. <volume>17</volume>, <fpage>211</fpage>&#x02013;<lpage>221</lpage>. <pub-id pub-id-type="doi">10.1109/TNN.2005.860850</pub-id><pub-id pub-id-type="pmid">16526488</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Izhikevich</surname> <given-names>E. M.</given-names></name></person-group> (<year>2007</year>). <article-title>Solving the distal reward problem through linkage of STDP and dopamine signaling</article-title>. <source>Cereb. Cortex</source> <volume>17</volume>, <fpage>2443</fpage>&#x02013;<lpage>2452</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://cercor.oxfordjournals.org/content/17/10/2443.abstract">http://cercor.oxfordjournals.org/content/17/10/2443.abstract</ext-link> <pub-id pub-id-type="doi">10.1093/cercor/bhl152</pub-id><pub-id pub-id-type="pmid">17220510</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Legenstein</surname> <given-names>R.</given-names></name> <name><surname>Pecevski</surname> <given-names>D.</given-names></name> <name><surname>Maass</surname> <given-names>W.</given-names></name></person-group> (<year>2008</year>). <article-title>A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback</article-title>. <source>PLoS Comput. Biol</source>. <volume>4</volume>:<fpage>e1000180</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1000180</pub-id><pub-id pub-id-type="pmid">18846203</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Livi</surname> <given-names>P.</given-names></name> <name><surname>Indiveri</surname> <given-names>G.</given-names></name></person-group> (<year>2009</year>). <article-title>A current-mode conductance-based silicon neuron for address-event neuromorphic systems</article-title>, in <source>IEEE International Symposium on Circuits and Systems, ISCAS 2009</source>, (<publisher-loc>Taipei</publisher-loc>), <fpage>2898</fpage>&#x02013;<lpage>2901</lpage>. <pub-id pub-id-type="doi">10.1109/ISCAS.2009.5118408</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mead</surname> <given-names>C. A.</given-names></name></person-group> (<year>1990</year>). <article-title>Neuromorphic electronic systems</article-title>. <source>Proc. IEEE</source> <volume>78</volume>, <fpage>1629</fpage>&#x02013;<lpage>1636</lpage>. <pub-id pub-id-type="doi">10.1109/5.58356</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Millner</surname> <given-names>S.</given-names></name> <name><surname>Gr&#x000FC;bl</surname> <given-names>A.</given-names></name> <name><surname>Meier</surname> <given-names>K.</given-names></name> <name><surname>Schemmel</surname> <given-names>J.</given-names></name> <name><surname>Schwartz</surname> <given-names>M.-O.</given-names></name></person-group> (<year>2010</year>). <article-title>A VLSI implementation of the adaptive exponential integrate-and-fire neuron model</article-title>, in <source>Advances in Neural Information Processing Systems 23</source>, eds <person-group person-group-type="editor"><name><surname>Lafferty</surname> <given-names>J.</given-names></name> <name><surname>Williams</surname> <given-names>C. K. I.</given-names></name> <name><surname>Shawe-Taylor</surname> <given-names>J.</given-names></name> <name><surname>Zemel</surname> <given-names>R. S.</given-names></name> <name><surname>Culotta</surname> <given-names>A.</given-names></name></person-group> (<publisher-loc>La Jolla, CA</publisher-loc> : <publisher-name>Neural Information Processing Systems</publisher-name>), <fpage>1642</fpage>&#x02013;<lpage>1650</lpage>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morrison</surname> <given-names>A.</given-names></name> <name><surname>Diesmann</surname> <given-names>M.</given-names></name> <name><surname>Gerstner</surname> <given-names>W.</given-names></name></person-group> (<year>2008</year>). <article-title>Phenomenological models of synaptic plasticity based on spike timing</article-title>. <source>Biol. Cybern</source>. <volume>98</volume>, <fpage>459</fpage>&#x02013;<lpage>478</lpage>. <pub-id pub-id-type="doi">10.1007/s00422-008-0233-1</pub-id><pub-id pub-id-type="pmid">18491160</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nordlie</surname> <given-names>E.</given-names></name> <name><surname>Gewaltig</surname> <given-names>M.-O.</given-names></name> <name><surname>Plesser</surname> <given-names>H. E.</given-names></name></person-group> (<year>2009</year>). <article-title>Towards reproducible descriptions of neuronal network models</article-title>. <source>PLoS Comput. Biol</source>. <volume>5</volume>:<fpage>e1000456</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1000456</pub-id><pub-id pub-id-type="pmid">19662159</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="web"><person-group person-group-type="author"><collab>NumPy.</collab></person-group> (<year>2012</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://numpy.scipy.org">http://numpy.scipy.org</ext-link></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pfeil</surname> <given-names>T.</given-names></name> <name><surname>Potjans</surname> <given-names>T. C.</given-names></name> <name><surname>Schrader</surname> <given-names>S.</given-names></name> <name><surname>Potjans</surname> <given-names>W.</given-names></name> <name><surname>Schemmel</surname> <given-names>J.</given-names></name> <name><surname>Diesmann</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Is a 4-bit synaptic weight resolution enough? &#x02013; constraints on enabling spike-timing dependent plasticity in neuromorphic hardware</article-title>. <source>Front. Neurosci</source>. <volume>6</volume>:<issue>90</issue>. <pub-id pub-id-type="doi">10.3389/fnins.2012.00090</pub-id><pub-id pub-id-type="pmid">22822388</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Potjans</surname> <given-names>W.</given-names></name> <name><surname>Diesmann</surname> <given-names>M.</given-names></name> <name><surname>Morrison</surname> <given-names>A.</given-names></name></person-group> (<year>2011</year>). <article-title>An imperfect dopaminergic error signal can drive temporal-difference learning</article-title>. <source>PLoS Comput. Biol</source>. <volume>7</volume>:<fpage>e1001133</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1001133</pub-id><pub-id pub-id-type="pmid">21589888</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="web"><person-group person-group-type="author"><collab>PowerISA.</collab></person-group> (<year>2010</year>). <article-title>PowerISA version 2.06 revision b. Technical report</article-title>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.power.org/resources/reading/">http://www.power.org/resources/reading/</ext-link></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ramakrishnan</surname> <given-names>S.</given-names></name> <name><surname>Hasler</surname> <given-names>P. E.</given-names></name> <name><surname>Gordon</surname> <given-names>C.</given-names></name></person-group> (<year>2011</year>). <article-title>Floating gate synapses with spike-time-dependent plasticity</article-title>. <source>IEEE Trans. Biomed. Circ. Syst</source>. <volume>5</volume>, <fpage>244</fpage>&#x02013;<lpage>252</lpage>. <pub-id pub-id-type="doi">10.1109/TBCAS.2011.2109000</pub-id><pub-id pub-id-type="pmid">23851475</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roy</surname> <given-names>K.</given-names></name> <name><surname>Mukhopadhyay</surname> <given-names>S.</given-names></name> <name><surname>Mahmoodi-Meimand</surname> <given-names>H.</given-names></name></person-group> (<year>2003</year>). <article-title>Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits</article-title>. <source>Proc. IEEE</source> <volume>91</volume>, <fpage>305</fpage>&#x02013;<lpage>327</lpage>. <pub-id pub-id-type="doi">10.1109/JPROC.2002.808156</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Schemmel</surname> <given-names>J.</given-names></name> <name><surname>Br&#x000FC;derle</surname> <given-names>D.</given-names></name> <name><surname>Gr&#x000FC;bl</surname> <given-names>A.</given-names></name> <name><surname>Hock</surname> <given-names>M.</given-names></name> <name><surname>Meier</surname> <given-names>K.</given-names></name> <name><surname>Millner</surname> <given-names>S.</given-names></name></person-group> (<year>2010</year>). <article-title>A wafer-scale neuromorphic hardware system for large-scale neural modeling</article-title>, in <source>Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS)</source>, (<publisher-loc>Paris</publisher-loc>), <fpage>1947</fpage>&#x02013;<lpage>1950</lpage>. <pub-id pub-id-type="doi">10.1109/ISCAS.2010.5536970</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Schemmel</surname> <given-names>J.</given-names></name> <name><surname>Br&#x000FC;derle</surname> <given-names>D.</given-names></name> <name><surname>Meier</surname> <given-names>K.</given-names></name> <name><surname>Ostendorf</surname> <given-names>B.</given-names></name></person-group> (<year>2007</year>). <article-title>Modeling synaptic plasticity within networks of highly accelerated I&#x00026;F neurons</article-title>, in <source>Proceedings of the 2007 IEEE International Symposium on Circuits and Systems (ISCAS)</source> (<publisher-loc>New Orleans, LA</publisher-loc>: <publisher-name>IEEE Press</publisher-name>), <fpage>3367</fpage>&#x02013;<lpage>3370</lpage>. <pub-id pub-id-type="doi">10.1109/ISCAS.2007.378289</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Schemmel</surname> <given-names>J.</given-names></name> <name><surname>Fieres</surname> <given-names>J.</given-names></name> <name><surname>Meier</surname> <given-names>K.</given-names></name></person-group> (<year>2008</year>). <article-title>Wafer-scale integration of analog neural networks</article-title>, in <source>Proceedings of the 2008 International Joint Conference on Neural Networks (IJCNN)</source>, (<publisher-loc>Hong Kong</publisher-loc>).</citation>
</ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Schemmel</surname> <given-names>J.</given-names></name> <name><surname>Gr&#x000FC;bl</surname> <given-names>A.</given-names></name> <name><surname>Meier</surname> <given-names>K.</given-names></name> <name><surname>Muller</surname> <given-names>E.</given-names></name></person-group> (<year>2006</year>). <article-title>Implementing synaptic plasticity in a VLSI spiking neural network model</article-title>, in <source>Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN)</source> (<publisher-loc>Vancouver, BC</publisher-loc>: <publisher-name>IEEE Press</publisher-name>). <pub-id pub-id-type="doi">10.1109/IJCNN.2006.246651</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scholze</surname> <given-names>S.</given-names></name> <name><surname>Schiefer</surname> <given-names>S.</given-names></name> <name><surname>Partzsch</surname> <given-names>J.</given-names></name> <name><surname>Hartmann</surname> <given-names>S.</given-names></name> <name><surname>Mayr</surname> <given-names>C. G.</given-names></name> <name><surname>H&#x000F6;ppner</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>VLSI implementation of a 2.8GEvent/s packet based AER interface with routing and event sorting functionality</article-title>. <source>Front. Neurosci</source>. <volume>5</volume>:<issue>117</issue>. <pub-id pub-id-type="doi">10.3389/fnins.2011.00117</pub-id><pub-id pub-id-type="pmid">22016720</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Seo</surname> <given-names>J.</given-names></name> <name><surname>Brezzo</surname> <given-names>B.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Parker</surname> <given-names>B. D.</given-names></name> <name><surname>Esser</surname> <given-names>S. K.</given-names></name> <name><surname>Montoye</surname> <given-names>R. K.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>A 45nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons</article-title>, in <source>Custom Integrated Circuits Conference (CICC), 2011 IEEE</source>, <fpage>1</fpage>&#x02013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1109/CICC.2011.6055293</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>J. E.</given-names></name></person-group> (<year>1998</year>). <article-title>A study of branch prediction strategies</article-title>, in <source>25 years of the International Symposia on Computer Architecture (Selected Papers) ISCA &#x00027;98</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>202</fpage>&#x02013;<lpage>215</lpage>. <pub-id pub-id-type="doi">10.1145/285930.285980</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>J. E.</given-names></name> <name><surname>Pleszkun</surname> <given-names>A. R.</given-names></name></person-group> (<year>1985</year>). <source>Implementation of Precise Interrupts in Pipelined Processors</source>. <volume>Vol. 13</volume>. <publisher-loc>Los Angeles, CA</publisher-loc>: <publisher-name>IEEE Computer Society Press</publisher-name>.</citation>
</ref>
<ref id="B38">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Stallman</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <source>Using the GNU Compiler Collection</source>. For gcc version 4.5.4 edition <publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Free Software Foundation</publisher-name>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://gcc.gnu.org">http://gcc.gnu.org</ext-link></citation>
</ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sutton</surname> <given-names>R. S.</given-names></name> <name><surname>Barto</surname> <given-names>A. G.</given-names></name></person-group> (<year>1998</year>). <source>Reinforcement Learning: An Introduction</source>. <volume>Vol. 1</volume>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Victor</surname> <given-names>J. D.</given-names></name> <name><surname>Purpura</surname> <given-names>K. P.</given-names></name></person-group> (<year>1996</year>). <article-title>Nature and precision of temporal coding in visual cortex: a metric-space analysis</article-title>. <source>J. Neurophysiol</source>. <volume>76</volume>, <fpage>1310</fpage>&#x02013;<lpage>1326</lpage>. <pub-id pub-id-type="pmid">8871238</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vogelstein</surname> <given-names>R. J.</given-names></name> <name><surname>Tenore</surname> <given-names>F.</given-names></name> <name><surname>Philipp</surname> <given-names>R.</given-names></name> <name><surname>Adlerstein</surname> <given-names>M. S.</given-names></name> <name><surname>Goldberg</surname> <given-names>D. H.</given-names></name> <name><surname>Cauwenberghs</surname> <given-names>G.</given-names></name></person-group> (<year>2003</year>). <article-title>Spike timing-dependent plasticity in the address domain</article-title>, in <source>Advances in Neural Information Processing Systems 15</source>, eds <person-group person-group-type="editor"><name><surname>Thrun</surname> <given-names>S.</given-names></name> <name><surname>Becker</surname> <given-names>S.</given-names></name> <name><surname>Obermayer</surname> <given-names>K.</given-names></name></person-group> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>1147</fpage>&#x02013;<lpage>1154</lpage>.</citation>
</ref>
<ref id="B42">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wendt</surname> <given-names>K.</given-names></name> <name><surname>Ehrlich</surname> <given-names>M.</given-names></name> <name><surname>Sch&#x000FC;ffny</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>A graph theoretical approach for a multistep mapping software for the FACETS project</article-title>, in <source>CEA&#x00027;08: Proceedings of the 2nd WSEAS International Conference on Computer Engineering and Applications</source> (<publisher-loc>Wisconsin</publisher-loc>: <publisher-name>World Scientific and Engineering Academy and Society</publisher-name>), <fpage>189</fpage>&#x02013;<lpage>194</lpage>. ISBN 978-960-6766-33-6</citation>
</ref>
<ref id="B43">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Wijekoon</surname> <given-names>J. H. B.</given-names></name> <name><surname>Dudek</surname> <given-names>P.</given-names></name></person-group> (<year>2008</year>). <article-title>Compact silicon neuron circuit with spiking and bursting behaviour</article-title>. <source>Neural Netw</source>. <volume>21</volume>, <fpage>524</fpage>&#x02013;<lpage>534</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.sciencedirect.com/science/article/B6T08-4RFSCV3-5/2/c005fcc0c2482bf724210a079932484e">http://www.sciencedirect.com/science/article/B6T08-4RFSCV3-5/2/c005fcc0c2482bf724210a079932484e</ext-link> (<publisher-name>Advances in Neural Networks Research</publisher-name>: IJCNN &#x00027;07, 2007 International Joint Conference on Neural Networks IJCNN &#x00027;07). <pub-id pub-id-type="doi">10.1016/j.neunet.2007.12.037</pub-id><pub-id pub-id-type="pmid">18262751</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wijekoon</surname> <given-names>J. H. B.</given-names></name> <name><surname>Dudek</surname> <given-names>P.</given-names></name></person-group> (<year>2011</year>). <article-title>Analogue CMOS circuit implementation of a dopamine modulated synapse</article-title>, in <source>2011 IEEE International Symposium on Circuits and Systems (ISCAS)</source> (<publisher-name>IEEE</publisher-name>), <fpage>877</fpage>&#x02013;<lpage>880</lpage>. <pub-id pub-id-type="doi">10.1109/ISCAS.2011.5937706</pub-id></citation>
</ref>
</ref-list>
</back>
</article>