<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Nanotechnol.</journal-id>
<journal-title>Frontiers in Nanotechnology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Nanotechnol.</abbrev-journal-title>
<issn pub-type="epub">2673-3013</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1128667</article-id>
<article-id pub-id-type="doi">10.3389/fnano.2023.1128667</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Nanotechnology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Energy-efficient and noise-tolerant neuromorphic computing based on memristors and domino logic</article-title>
<alt-title alt-title-type="left-running-head">Hendy and Merkel</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fnano.2023.1128667">10.3389/fnano.2023.1128667</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Hendy</surname>
<given-names>Hagar</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1935570/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Merkel</surname>
<given-names>Cory</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1425554/overview"/>
</contrib>
</contrib-group>
<aff>
<institution>Brain Lab</institution>, <institution>Department of Computer Engineering</institution>, <institution>Rochester Institute of Technology</institution>, <addr-line>Rochester</addr-line>, <addr-line>NY</addr-line>, <country>United States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1133172/overview">Ying-Chen Chen</ext-link>, Northern Arizona University, United States</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/953842/overview">Jiyong Woo</ext-link>, Kyungpook National University, Republic of Korea</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1152840/overview">Xumeng Zhang</ext-link>, Fudan University, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Cory Merkel, <email>cemeec@rit.edu</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Computational Nanotechnology, a section of the journal Frontiers in Nanotechnology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>28</day>
<month>02</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>5</volume>
<elocation-id>1128667</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>12</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>02</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Hendy and Merkel.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Hendy and Merkel</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>The growing scale and complexity of artificial intelligence (AI) models have prompted several new research efforts in the area of neuromorphic computing. A key aim of neuromorphic computing is to enable advanced AI algorithms to run on energy-constrained hardware. In this work, we propose a novel energy-efficient neuromorphic architecture based on memristors and domino logic. The design uses the delay of memristor RC circuits to represent synaptic computations and a simple binary neuron activation function. Synchronization schemes are proposed for communicating information between neural network layers, and a simple linear power model is developed to estimate the design&#x2019;s energy efficiency for a particular network size. Results indicate that the proposed architecture can achieve 1.26&#x00A0;fJ per classification per synapse and maintains high accuracy on image classification even in the presence of large noise.</p>
</abstract>
<kwd-group>
<kwd>neuromorphic</kwd>
<kwd>memristor</kwd>
<kwd>neural network</kwd>
<kwd>domino logic</kwd>
<kwd>artificial intelligence</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>During the last decade, significant advancements have been made in the accuracy of neural network models for many artificial intelligence (AI) tasks such as object classification (<xref ref-type="bibr" rid="B21">Rueckauer et al., 2017</xref>), voice recognition (<xref ref-type="bibr" rid="B4">Dahl et al., 2011</xref>), machine translation (<xref ref-type="bibr" rid="B24">Seide et al., 2011</xref>), and more. Three main factors are responsible for this progress: 1) vast amounts of data that are available to train large neural network models, 2) continuous growth of processing power (i.e., better/faster graphics processing units (GPUs), memory, etc.), and 3) development and innovation in neural network architectures and training algorithms. However, these developments have come at the cost of significant resource requirements (compute resources, energy, etc.) for both training and inference (<xref ref-type="bibr" rid="B10">Hendy and Merkel, 2022</xref>), which has stymied AI solutions on edge devices. Edge devices such as mobile phones, implantable medical devices, wireless sensors, and others have stringent size, weight, and power (SWaP) constraints, which call for new approaches like neuromorphic computing to implement intelligent processing on these platforms.</p>
<p>Custom neuromorphic hardware platforms are gaining popularity in this area, owing to their ability to efficiently perform complex tasks that are analogous to the physical processes underlying biological nervous systems (<xref ref-type="bibr" rid="B6">Douglas et al., 1995</xref>). A key feature of these systems is that they overcome the limitations caused by the von Neumann bottleneck by collocating computation and memory (<xref ref-type="bibr" rid="B19">Nandakumar et al., 2018</xref>). While modern digital complementary-metal-oxide-semiconductor (CMOS) technology is used to replicate the behavior of the neurons, the absence of a device that can efficiently perform synaptic operations stunted progress for several years. However, recent advancements in nanoscale materials and the realization of devices such as memristors have opened possibilities for developing compact memory device arrays that are potentially transformative for the design of ultra energy-efficient neuromorphic systems.</p>
<p>Previous work has studied several aspects of memristor-based neuromorphic systems, including device properties, reliability, crossbar implementation, on-chip training, quantization, and much more (<xref ref-type="bibr" rid="B23">Schuman et al., 2017</xref>; <xref ref-type="bibr" rid="B26">Sung et al., 2018</xref>). One of the most power-efficient design approaches is to combine memristor synapses with an integrate-and-fire (IF) neuron design. The energy efficiency of the IF neuron comes from (i) an all-or-nothing representation of information and (ii) little-to-no short-circuit current between the neuron&#x2019;s input and the synapses driving it (since they are just driving the membrane capacitor). In this work, we explore a similar idea applied to networks of binary neurons inspired by domino logic. Domino logic, a type of dynamic logic, separates a circuit&#x2019;s operation into pre-charge and evaluation phases to avoid short-circuit current and reduce power consumption. Here, we propose a domino logic-style neuron that uses memristor-based RC delays for evaluation and offers good power efficiency. The specific novel contributions of our work are:<list list-type="simple">
<list-item>
<p>&#x2022; Design of a memristor-based domino logic circuit that encodes information using delay</p>
</list-item>
<list-item>
<p>&#x2022; Combination of multiple domino logic circuits with an arbiter to create binary neurons</p>
</list-item>
<list-item>
<p>&#x2022; Integration of dynamic pipelining techniques with domino logic-based binary neurons</p>
</list-item>
<list-item>
<p>&#x2022; Analysis and comparison of the proposed design for a handwritten digit classification task</p>
</list-item>
</list>
</p>
<p>The rest of this paper is organized as follows: <xref ref-type="sec" rid="s2">Section 2</xref> provides background and related work on memristor-based neuromorphic computing, as well as quantized neural networks, including neural networks with binary neurons. <xref ref-type="sec" rid="s3">Section 3</xref> details the design approach employed in this work, from basic building blocks to the multilayer neural network synchronization strategy. In <xref ref-type="sec" rid="s4">Section 4</xref>, we outline the strategy used for analyzing the effects of metastability and noise, particularly in the arbiter circuits, on the network-level performance. <xref ref-type="sec" rid="s5">Section 5</xref> provides results and comparisons of our design for handwritten digit classification. Finally, <xref ref-type="sec" rid="s6">Section 6</xref> concludes this work.</p>
</sec>
<sec id="s2">
<title>2 Background and related work</title>
<sec id="s2-1">
<title>2.1 Memristor-based neuromorphic computing</title>
<p>Memristor is an umbrella term for describing a broad class of memory technologies that follow a state-dependent Ohm&#x2019;s law (<xref ref-type="bibr" rid="B3">Chua, 2014</xref>). Physical realizations of memristors come in several forms, including resistive random access memory (ReRAM), spin transfer torque RAM, phase change memory, ferroelectric RAM, and others (<xref ref-type="bibr" rid="B2">Chen, 2016</xref>). In essence, these devices store a non-volatile conductance (or resistance), which can be modified by providing a large write voltage and can be read using a smaller read voltage. The conductance is bounded between two extreme values, <italic>G</italic>
<sub>min</sub> and <italic>G</italic>
<sub>max</sub>. Memristors are particularly attractive for neuromorphic computing because they exhibit behavioral similarity to biological synapses, combining storage, adaptation, and physical connectivity in one device. Moreover, combining multiple memristors into high-density crossbars enables the efficient computation of vector-matrix multiplication (VMM), where the (voltage) input vector to the crossbar columns is multiplied by the matrix of memristor conductances to produce the (current) output vector.</p>
<p>Huge numbers of VMM operations are performed in neural networks during training and inference. When neural network weights are implemented as memristor conductances in hardware, there is no need for off-chip weight storage and the associated data movement, as in most digital designs (<xref ref-type="bibr" rid="B13">Jouppi et al., 2017</xref>; <xref ref-type="bibr" rid="B5">Davies et al., 2018</xref>). This yields high energy efficiency, which is an important factor for AI on edge devices (<xref ref-type="bibr" rid="B15">Lee and Wong, 2016</xref>). Information in VMM-based computation can be represented by currents (current mode), voltages (voltage mode), or a combination of the two; each approach has its own set of strengths and weaknesses (<xref ref-type="bibr" rid="B18">Merkel and Kudithipudi, 2017</xref>). Voltage-mode VMM circuits are the most common approach, in which the inputs are represented as voltages and the outputs are represented as currents. The current-mode VMM (<xref ref-type="bibr" rid="B17">Merkel, 2019</xref>) has some advantages such as low supply voltage, current-mode design techniques, etc., but current distribution can be challenging (<xref ref-type="bibr" rid="B16">Marinella et al., 2018</xref>; <xref ref-type="bibr" rid="B25">Sinangil et al., 2020</xref>). Charge-based VMM is another approach, which performs the dot product operation by using voltage inputs to charge binary-weighted capacitors and performing summation through charge redistribution among the capacitors <italic>via</italic> switched-capacitor circuit principles (<xref ref-type="bibr" rid="B15">Lee and Wong, 2016</xref>). The main advantage of this approach is that there is no static power in the circuit and there is no limitation on technology node downscaling. However, multiple clock cycles are needed to perform the multiplication operation. The time-based VMM approach is another way to implement addition in the analog domain by using a chain of buffers. The delay of each buffer in each stage can be modified according to the weighted input summation for that stage (<xref ref-type="bibr" rid="B7">Everson et al., 2018</xref>). In time-domain computing, the values are encoded as discrete arrival times of signal edges (<xref ref-type="bibr" rid="B8">Freye et al., 2022</xref>). Another implementation of time-based VMM is discussed in (<xref ref-type="bibr" rid="B1">Bavandpour et al., 2019</xref>; <xref ref-type="bibr" rid="B22">Sahay et al., 2020</xref>), in which memristor-dependent currents are summed and integrated on a capacitor and then the charge is converted back to the time-domain representation.</p>
</sec>
<sec id="s2-2">
<title>2.2 Quantized neural networks</title>
<p>Quantization methods for deep learning are becoming popular for accelerating training, reducing model size, and mapping neural networks to specialized hardware. The simplest quantization methods use rounding to reduce activation and weight precision after training. This usually results in large drops in accuracy between the full-precision and quantized models. Other methods quantize weights, activations, and sometimes gradients during training, resulting in better performance (<xref ref-type="bibr" rid="B11">Hubara et al., 2017</xref>). In this work, we only quantize weights and activations. The core idea is to use quantized values during forward propagation and full-precision gradient estimates during backward propagation. For activations, we use a simple threshold model on the forward pass:<disp-formula id="e1">
<mml:math id="m1">
<mml:mi>x</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mi>sign</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(1)</label>
</disp-formula>where sign (&#x22c5;) is 1 if the argument is non-negative and &#x2212;1 otherwise. Since the sign function has a gradient that is zero almost everywhere, using it directly would stall the backpropagation algorithm and nothing would be learned. To fix this, we approximate the gradient as<disp-formula id="e2">
<mml:math id="m2">
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2248;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<label>(2)</label>
</disp-formula>where <italic>k</italic> was empirically chosen as 2. In other words, on the backward pass, the gradient is calculated as if the activation had been a logistic sigmoid function. Of course, we note that the threshold activation function is indeed a logistic sigmoid with a <italic>k</italic> value of &#x2b;<italic>&#x221e;</italic>.</p>
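<p>The forward and backward rules in (Eqs. <xref ref-type="disp-formula" rid="e1">1</xref>, <xref ref-type="disp-formula" rid="e2">2</xref>) can be sketched in a few lines of Python. This is only an illustrative sketch and not the training code used in this work; the function names <italic>binary_activation</italic> and <italic>surrogate_gradient</italic> are hypothetical, and the gradient is implemented exactly as the expression is written above.</p>
<preformat>
```python
import math

K = 2.0  # sigmoid steepness; the text reports that k = 2 was chosen empirically


def binary_activation(s):
    """Forward pass, Eq. (1): x = 0.5 * sign(s) + 0.5, with sign(0) = +1."""
    sign = 1.0 if s >= 0 else -1.0
    return 0.5 * sign + 0.5


def surrogate_gradient(s, k=K):
    """Backward pass, Eq. (2): sigma(k*s) * (1 - sigma(k*s)), as written in the text."""
    sig = 1.0 / (1.0 + math.exp(-k * s))
    return sig * (1.0 - sig)
```
</preformat>
<p>Under these assumptions, the forward pass only ever produces 0 or 1, while the backward pass sees a smooth bump centered at <italic>s</italic> &#x3d; 0, so weights feeding neurons near their threshold receive the largest updates.</p>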
<p>For weights, we use the following quantization technique:<disp-formula id="e3">
<mml:math id="m3">
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">r</mml:mi>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi mathvariant="normal">u</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mi mathvariant="normal">d</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">c</mml:mi>
<mml:mi mathvariant="normal">l</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">p</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1,1</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
<label>(3)</label>
</disp-formula>where <italic>Q</italic> is the desired number of bits for the weight, round (&#x22c5;) rounds to the nearest integer and clip (<italic>w</italic>, <italic>a</italic>, <italic>b</italic>) &#x3d; max (<italic>a</italic>, min (<italic>b</italic>, <italic>w</italic>)), where <italic>a</italic> &#x2264; <italic>b</italic>. For backpropagation, we estimate the gradient as <italic>&#x2202;J</italic>/<italic>&#x2202;w</italic> &#x2248; <italic>&#x2202;J</italic>/<italic>&#x2202;w</italic>
<sub>
<italic>q</italic>
</sub>.
</p>
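<p>As a rough illustration of (Eq. <xref ref-type="disp-formula" rid="e3">3</xref>), the following Python sketch quantizes a single weight. The function name <italic>quantize_weight</italic> is hypothetical, and the tie-breaking rule for rounding (round half up) is an assumption, since the text only states that round (&#x22c5;) rounds to the nearest integer.</p>
<preformat>
```python
import math


def clip(w, a, b):
    """clip(w, a, b) = max(a, min(b, w)), as defined in the text."""
    return max(a, min(b, w))


def quantize_weight(w, q_bits):
    """Eq. (3): map w onto 2**q_bits evenly spaced levels in [-1, 1]."""
    levels = 2 ** q_bits - 1
    unit = (clip(w, -1.0, 1.0) + 1.0) / 2.0   # rescale the clipped weight to [0, 1]
    index = math.floor(levels * unit + 0.5)    # round half up (tie rule is an assumption)
    return 2.0 * index / levels - 1.0
```
</preformat>
<p>For example, with <italic>Q</italic> &#x3d; 1 every weight maps to either &#x2212;1 or &#x2b;1, and with <italic>Q</italic> &#x3d; 2 the available levels are &#x2212;1, &#x2212;1/3, &#x2b;1/3, and &#x2b;1. During backpropagation the gradient with respect to the quantized weight would simply be passed through to the full-precision weight, as stated above.</p>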
</sec>
</sec>
<sec id="s3">
<title>3 Design approach</title>
<sec id="s3-1">
<title>3.1 Overview</title>
<p>The core of the design is a domino logic-style neuron based on memristor RC delays. In essence, memristors are used as a configurable RC delay. The delay of the memristor RC circuit represents the synaptic weight computations, and a simple binary neuron activation function is implemented by an inverter, as shown in <xref ref-type="fig" rid="F1">Figure 1A</xref>. The operation of the synaptic weight matrix is divided into two phases: pre-charge and evaluation. When the clock signal <italic>&#x3d5;</italic> is low, the dynamic node <italic>v</italic>
<sub>
<italic>d</italic>
</sub> (input to the inverter) is pre-charged to <italic>V</italic>
<sub>
<italic>dd</italic>
</sub> through a PMOS transistor. When the clock is high, evaluation starts and the dynamic node discharges at a rate depending on the pull-down network (<italic>RC</italic> time constant). Once the node reaches the threshold value of the inverter, the neuron&#x2019;s output will go high. During the evaluation phase, the voltage on the dynamic node of neuron <italic>i</italic> in layer <italic>l</italic> evolves as<disp-formula id="e4">
<mml:math id="m4">
<mml:msubsup>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:munderover accentunder="false" accent="true">
<mml:mrow>
<mml:mo>&#x222b;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>&#x3be;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mi mathvariant="normal">d</mml:mi>
<mml:mi>&#x3be;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(4)</label>
</disp-formula>where <inline-formula id="inf1">
<mml:math id="m5">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is the equivalent pull-down conductance. A memristor only contributes to the pull-down conductance when its selector transistor is on. Assuming that memristor conductance values are constant during the evaluation phase and input voltages are binary values, i.e., <inline-formula id="inf2">
<mml:math id="m6">
<mml:msubsup>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>, then <inline-formula id="inf3">
<mml:math id="m7">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is a piecewise constant function written as:<disp-formula id="e5">
<mml:math id="m8">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover accentunder="false" accent="true">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munderover>
</mml:mstyle>
<mml:msubsup>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
<label>(5)</label>
</disp-formula>where <italic>N</italic>
<sub>
<italic>l</italic>&#x2212;1</sub> is the number of neurons in the previous layer, plus 1 to account for the bias input.</p>
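<p>To make (Eqs. <xref ref-type="disp-formula" rid="e4">4</xref>, <xref ref-type="disp-formula" rid="e5">5</xref>) concrete, the sketch below evaluates them for the simplified case in which the binary inputs stay fixed during the evaluation phase, so that the pull-down conductance is a single constant. The function names and any numerical values used with them are hypothetical and are not taken from the circuit simulations in this work.</p>
<preformat>
```python
import math


def pull_down_conductance(inputs_on, conductances):
    """Eq. (5) for fixed binary inputs: sum the conductances whose selector is on."""
    return sum(g for on, g in zip(inputs_on, conductances) if on)


def dynamic_node_voltage(t, g_total, c_d, v_dd):
    """Eq. (4) with a constant pull-down conductance: V_dd * exp(-G * t / C_d)."""
    return v_dd * math.exp(-g_total * t / c_d)


def time_to_threshold(g_total, c_d, v_dd, theta):
    """Solve Eq. (4) for the time at which the node reaches the threshold theta."""
    if g_total == 0:
        return math.inf  # no active pull-down path (leakage is ignored here)
    return -math.log(theta / v_dd) * c_d / g_total
```
</preformat>
<p>In this simplified setting, a larger total conductance gives an earlier threshold crossing, which is how the synaptic weights are translated into delay.</p>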
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>
<bold>(A)</bold> Domino logic-style neuron circuit schematic. <bold>(B)</bold> Two domino circuits (ex and in) enable both positive and negative weights. <bold>(C)</bold> Information is encoded as the time difference between the excitatory and inhibitory rising edges.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g001.tif"/>
</fig>
<p>We use a simple parasitic model for the capacitance in (Eq. <xref ref-type="disp-formula" rid="e4">4</xref>), where each 1T1R synapse contributes one unit of capacitance <italic>C</italic> to the dynamic node from the NMOS drain. Assuming the PMOS transistor has minimal sizing and the inverter has a 2:1 PMOS:NMOS size ratio, the total capacitance is estimated as<disp-formula id="e6">
<mml:math id="m9">
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>4</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mi>C</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>4</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:math>
<label>(6)</label>
</disp-formula>where <italic>A</italic>
<sub>
<italic>min</italic>
</sub> is the minimum transistor area, <italic>k</italic>
<sub>
<italic>ox</italic>
</sub> is the relative permittivity of SiO<sub>2</sub>, <italic>&#x3f5;</italic>
<sub>0</sub> is the permittivity of free space, and <italic>t</italic>
<sub>
<italic>ox</italic>
</sub> is the transistor gate oxide thickness. Bounds on the time that it takes to discharge the dynamic node to the inverter threshold will be important to set an appropriate clock frequency. From (Eqs. <xref ref-type="disp-formula" rid="e4">4</xref>&#x2013;<xref ref-type="disp-formula" rid="e6">6</xref>), the minimum discharge time will be<disp-formula id="e7">
<mml:math id="m10">
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mfrac>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>4</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(7)</label>
</disp-formula>where <italic>&#x3b8;</italic> is the threshold voltage of the inverter.</p>
<p>Notice that this expression is approximately independent of <italic>N</italic>
<sub>
<italic>l</italic>&#x2212;1</sub> as <italic>N</italic>
<sub>
<italic>l</italic>&#x2212;1</sub> becomes large, meaning that these neurons are self-normalizing. That is, the maximum excitability of a neuron from the activity of one presynaptic neuron is constant regardless of fan-in. In contrast to the minimum discharge time, the maximum time will be infinite if we ignore leakage through selector transistors. However, we need to ensure that the dynamic node does not take longer to discharge than the evaluation period. Otherwise, information will be lost. Therefore, we set a bound on the bias, which corresponds to a 1T1R&#xa0;cell that has a constant gate voltage of <italic>V</italic>
<sub>
<italic>dd</italic>
</sub>:<disp-formula id="e8">
<mml:math id="m11">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:math>
<label>(8)</label>
</disp-formula>where <italic>T</italic> is the clock period. This ensures that <inline-formula id="inf4">
<mml:math id="m12">
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:math>
</inline-formula>.</p>
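<p>The quantities in (Eqs. <xref ref-type="disp-formula" rid="e6">6</xref>&#x2013;<xref ref-type="disp-formula" rid="e8">8</xref>) can be evaluated with the short Python sketch below. The function names are hypothetical, and any device parameters passed to them are placeholders that a reader would have to supply for a given technology; none of them are values reported in this work.</p>
<preformat>
```python
import math

EPS_0 = 8.8541878128e-12  # permittivity of free space in F/m


def unit_capacitance(a_min, k_ox, t_ox):
    """Unit capacitance C in Eq. (6): A_min * k_ox * eps_0 / t_ox."""
    return a_min * k_ox * EPS_0 / t_ox


def beta(n_prev, c_unit, v_dd, theta):
    """beta = -ln(theta / V_dd) * (4 + N_{l-1}) * C, from Eqs. (6) and (7)."""
    return -math.log(theta / v_dd) * (4 + n_prev) * c_unit


def min_bias_conductance(n_prev, c_unit, v_dd, theta, clock_period):
    """Eq. (8): smallest bias conductance that reaches the threshold within T/2."""
    return beta(n_prev, c_unit, v_dd, theta) / (clock_period / 2.0)
```
</preformat>
<p>With a bias conductance equal to this bound and all other inputs off, the discharge time <italic>&#x3b2;</italic>/<italic>G</italic> equals <italic>T</italic>/2, which matches the statement above that the longest discharge is limited to half of the clock period.</p>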
<p>
<xref ref-type="fig" rid="F2">Figure 2</xref> shows how the leakage current could affect the neuron behavior with increasing numbers of synapses. Here, we assume all of the synapse transistors are off, and each of the conductances is set to the maximum value, which will maximize the leakage current. The plot shows the final voltage on the dynamic node at the end of a 50&#x00A0;ns evaluation period. As the number of synaptic inputs increases, both the capacitance on the dynamic node and the total leakage current increase. This has the effect of maintaining a relatively large voltage on the dynamic node, even with a large synaptic fan-in. In fact, as the number of synapses becomes large, the increased capacitance dominates, causing the effect of leakage to decrease. In all cases tested, the leakage current was not enough to discharge the dynamic node to the inverter&#x2019;s threshold.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Effect of leakage current on the dynamic node versus the number of synaptic inputs.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g002.tif"/>
</fig>
<p>To represent positive and negative weights, two domino-logic-style neurons are used: an inhibitory neuron to represent negative weight components and an excitatory neuron to represent positive weight components, as shown in <xref ref-type="fig" rid="F1">Figure 1B</xref>. The time difference between the two nodes (inhibitory and excitatory) reaching the threshold of the inverter is given by:<disp-formula id="e9">
<mml:math id="m13">
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(9)</label>
</disp-formula>This time difference encodes the input to the neuron&#x2019;s binary activation function. <xref ref-type="fig" rid="F1">Figure 1C</xref> shows an example where the excitatory domino circuit discharges faster than the inhibitory, leading to a positive value of &#x394;<italic>t</italic>.</p>
<p>In this paper, we are interested in binary neurons, so it will be sufficient to know if <inline-formula id="inf5">
<mml:math id="m14">
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is negative or positive, which can be calculated using an arbiter circuit, the details of which are discussed later. First, though, it is important to point out how (Eq. <xref ref-type="disp-formula" rid="e9">9</xref>) corresponds to pre-synaptic neuron outputs, weights, and the post-synaptic neuron inputs. The neuron input is <inline-formula id="inf6">
<mml:math id="m15">
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>, and we define the unitless version of it as <inline-formula id="inf7">
<mml:math id="m16">
<mml:msubsup>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1,1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. The vector of pre-synaptic neuron outputs is <inline-formula id="inf8">
<mml:math id="m17">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>, and the unitless version is <inline-formula id="inf9">
<mml:math id="m18">
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. Finally, the weights are related to the conductances <inline-formula id="inf10">
<mml:math id="m19">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula id="inf11">
<mml:math id="m20">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>. There are infinitely many ways to map a weight <inline-formula id="inf12">
<mml:math id="m21">
<mml:msubsup>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> to two conductances <inline-formula id="inf13">
<mml:math id="m22">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula id="inf14">
<mml:math id="m23">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>. In this work, we use a power-optimized scheme, where the minimum conductance value is used for the excitatory component when the weight is negative and for the inhibitory component when the weight is positive. A linear function then maps the weight magnitude to the opposite component:<disp-formula id="e10">
<mml:math id="m24">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">a</mml:mi>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(10)</label>
</disp-formula>
<disp-formula id="e11">
<mml:math id="m25">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(11)</label>
</disp-formula>
</p>
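<p>As an illustration, the weight mapping in (Eq. <xref ref-type="disp-formula" rid="e10">10</xref>) and (Eq. <xref ref-type="disp-formula" rid="e11">11</xref>) can be sketched in a few lines of Python with NumPy. This is a minimal sketch, not the code used in this work: the function name <monospace>map_weights</monospace> is hypothetical, the weights are assumed to lie in [&#x2212;1, 1], and the default conductance limits are the 1 &#xd7; 10<sup>&#x2212;6</sup>&#xa0;S and 1 &#xd7; 10<sup>&#x2212;5</sup>&#xa0;S values reported for the simulations in this work.</p>
<preformat>
```python
import numpy as np

# Conductance range used in this work, in siemens.
G_MIN = 1e-6
G_MAX = 1e-5


def map_weights(w, g_min=G_MIN, g_max=G_MAX):
    """Map unitless weights (assumed in [-1, 1]) to conductance pairs.

    Returns (g_ex, g_in). The name and signature are illustrative only.
    """
    w = np.asarray(w, dtype=float)
    span = g_max - g_min
    # Eq. 10: the excitatory conductance grows with positive weights
    # and stays at g_min for negative weights.
    g_ex = g_min + span * np.maximum(0.0, w)
    # Eq. 11: the inhibitory conductance grows with negative weights
    # and stays at g_min for positive weights.
    g_in = g_min - span * np.minimum(0.0, w)
    return g_ex, g_in


g_ex, g_in = map_weights([-1.0, 0.0, 0.5, 1.0])
print("excitatory conductances (S):", g_ex)
print("inhibitory conductances (S):", g_in)
```
</preformat>
<p>For any weight, at least one of the two devices sits at the minimum conductance, which is what makes the scheme power-optimized.</p>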
<p>Now, (Eq. <xref ref-type="disp-formula" rid="e9">9</xref>) can be rewritten completely in terms of unitless values as:<disp-formula id="e12">
<mml:math id="m26">
<mml:msubsup>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(12)</label>
</disp-formula>where <inline-formula id="inf15">
<mml:math id="m27">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> and <inline-formula id="inf16">
<mml:math id="m28">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>, and <italic>&#x3b2;</italic>&#x2032; &#x3d; 2<italic>&#x3b2;</italic>/(<italic>TG</italic>
<sub>
<italic>max</italic>
</sub>). Contrast this with the usual dot product input of a neural network:<disp-formula id="e13">
<mml:math id="m29">
<mml:msubsup>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
<label>(13)</label>
</disp-formula>
</p>
<p>In other words, the neuron input of our design has a non-linear relationship with pre-synaptic neuron outputs and the weights, whereas typical formulations of neural networks have a linear relationship. This non-linearity stems from the natural non-linearity of RC circuits, and removing it would require an additional evaluation phase, such as the method proposed in <xref ref-type="bibr" rid="B1">Bavandpour et al. (2019)</xref>. Instead, we opt to keep the hardware as simple as possible, and we note that the non-linearity could be accounted for in two ways. On one hand, the behavior of (Eq. <xref ref-type="disp-formula" rid="e12">12</xref>) could be modeled directly in TensorFlow. While this would be the simplest solution, we have found that it leads to several simulation challenges. For example, it is easy for (Eq. <xref ref-type="disp-formula" rid="e12">12</xref>) to have undefined values when inputs or weights become small. In addition, we observed unstable learning and, in some cases, an inability to converge when working directly with (Eq. <xref ref-type="disp-formula" rid="e12">12</xref>).</p>
<p>Interestingly, though, because our neurons are binary, only the sign of <inline-formula id="inf17">
<mml:math id="m30">
<mml:msubsup>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is important, and we can use (Eq. <xref ref-type="disp-formula" rid="e13">13</xref>) for training. <xref ref-type="fig" rid="F3">Figure 3</xref> compares the values in (Eq. <xref ref-type="disp-formula" rid="e9">9</xref>), (Eq. <xref ref-type="disp-formula" rid="e12">12</xref>), and (Eq. <xref ref-type="disp-formula" rid="e13">13</xref>). Here, we performed 1000 Monte Carlo simulations with uniformly-distributed weights between &#x2212;1 and 1, and Bernoulli-distributed inputs with probability value 0.5. From the plot, one can observe that the normal dot product operation in (Eq. <xref ref-type="disp-formula" rid="e13">13</xref>), which we refer to as &#x201c;software&#x201d;, maps non-linearly to the excitatory-inhibitory time difference as well as to the normalized version of the time difference, referred to as &#x201c;hardware&#x201d;. The ranges of the hardware values are inversely proportional to the ranges of the software values, which is expected due to the inverse relationships in (Eq. <xref ref-type="disp-formula" rid="e9">9</xref>) and (Eq. <xref ref-type="disp-formula" rid="e12">12</xref>). Critically, all of the data points are in the lower-left and upper-right quadrants, meaning that the signs of the software and hardware data are always the same. In other words, the non-linear behavior of the neuron&#x2019;s input will not affect its output. The next section discusses how this sign is captured using an arbiter to yield a binary activation function.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Time difference between excitatory and inhibitory domino circuits (Eq. <xref ref-type="disp-formula" rid="e9">9</xref>) and unitless neuron input (Eq. <xref ref-type="disp-formula" rid="e12">12</xref>) vs. a normal dot product input (Eq. <xref ref-type="disp-formula" rid="e13">13</xref>).</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g003.tif"/>
</fig>
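<p>The sign agreement between the dot product and the reciprocal form can also be checked numerically. The following Python sketch is illustrative only and rests on several assumptions that are not taken from this work: the function names are hypothetical, <italic>&#x3b2;</italic>&#x2032; is set to 1, the fan-in is 16, the first input is held at 1 to play the role of the always-on bias cell, and the difference is taken as the inhibitory term minus the excitatory term so that a faster excitatory discharge gives a positive value.</p>
<preformat>
```python
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-5  # conductance range used in this work (S)


def normalized_weights(w):
    """Eqs. 10 and 11 divided by G_MAX: unitless (w_ex, w_in)."""
    span = G_MAX - G_MIN
    w_ex = (G_MIN + span * np.maximum(0.0, w)) / G_MAX
    w_in = (G_MIN - span * np.minimum(0.0, w)) / G_MAX
    return w_ex, w_in


def hardware_input(x, w, beta_prime=1.0):
    """Reciprocal-form neuron input in the style of Eq. 12.

    Assumed sign convention: inhibitory term minus excitatory term.
    """
    w_ex, w_in = normalized_weights(w)
    inv_in = 1.0 / np.sum(x * w_in, axis=-1)
    inv_ex = 1.0 / np.sum(x * w_ex, axis=-1)
    return beta_prime * (inv_in - inv_ex)


def sign_agreement(n_trials=1000, fan_in=16, seed=0):
    """Fraction of random trials where both forms have the same sign."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, size=(n_trials, fan_in))
    x = rng.integers(0, 2, size=(n_trials, fan_in)).astype(float)
    x[:, 0] = 1.0  # always-on bias input keeps both denominators positive
    software = np.sum(x * w, axis=-1)  # dot product of Eq. 13
    hardware = hardware_input(x, w)
    return np.mean(np.sign(software) == np.sign(hardware))


print("fraction of trials with matching sign:", sign_agreement())
```
</preformat>
<p>Under these assumptions, the difference between the two normalized sums is a positive multiple of the dot product, so the two signs should agree whenever at least one input is active.</p>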
</sec>
<sec id="s3-2">
<title>3.2 Arbiter design and placement</title>
<p>We explored two possible designs for implementing the arbiter-based activation function that converts the time difference between the excitatory and inhibitory domino circuits into a binary value:<disp-formula id="e14">
<mml:math id="m31">
<mml:msubsup>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="array">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd columnalign="right">
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd columnalign="right">
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(14)</label>
</disp-formula>Initially, we designed the arbiter as shown in <xref ref-type="fig" rid="F4">Figure 4A</xref>. Here, NOR gates are used to disable the inhibitory (excitatory) circuit from discharging once the excitatory (inhibitory) circuit crosses the inverter threshold. An advantage of this approach is that only one of the domino circuits will fully discharge. For example, if the excitatory domino circuit discharges more quickly, it will cause the inhibitory circuit to go back into the pre-charge phase before it reaches the inverter threshold. This will reduce dynamic power consumed during the pre-charge phase. However, we observed that this approach has poor stability (seen in the waveform in <xref ref-type="fig" rid="F4">Figure 4A</xref>), especially for small &#x394;<italic>t</italic>, and often leads to oscillations as well as incorrect outputs.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>
<bold>(A)</bold> Domino logic style neuron with NOR gate as an arbiter, showing oscillatory behavior at the output node due to unstable feedback. <bold>(B)</bold> Simplification of domino logic style neuron with NAND gate-based arbiter, which eliminates oscillatory behavior.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g004.tif"/>
</fig>
<p>The second design, shown in <xref ref-type="fig" rid="F4">Figure 4B</xref>, uses cross-coupled NAND gates and includes a metastability filter to ensure that the output does not remain long in an invalid logic state. By contrast, the NOR-based design can exhibit that behavior and enter metastability. In addition, a NOR gate has two large PMOS transistors in series, which means more capacitance and more delay compared to the NAND gate. The main advantage of the second design is that the memristor domino circuits are not included in the feedback path, so the circuit&#x2019;s stability is not dependent on the memristor states. This is the design that we used for the rest of this paper.</p>
</sec>
<sec id="s3-3">
<title>3.3 Synchronization strategy</title>
<p>To implement large multi-layer neural networks, information from one layer needs to be transferred to the next layer. Synchronizing the transfer of information between layers is critical, and here we explore three techniques with various tradeoffs. In this section, we use the XOR problem as a case study, where our multi-layer perceptron (MLP) neural network consists of 2 inputs, 2 hidden neurons, and 1 output. The output should be &#x2018;0&#x2019; when both inputs are the same and &#x2018;1&#x2019; when the inputs are different.</p>
<sec id="s3-3-1">
<title>3.3.1 Method 1: Multiple clocks with different duty cycles</title>
<p>The simplest synchronization strategy employs one clock per layer as shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. Notice that the neuron and synapse designs discussed earlier are easily integrated into a crossbar-like circuit for efficient implementation. Here, three clocks are used, corresponding to the input (<monospace>Clk</monospace>), hidden layer (<monospace>Clk-1</monospace>), and output layer (<monospace>Clk-2</monospace>). The inputs are <monospace>Vxex1-1</monospace> and <monospace>Vxex2-1</monospace> and the final output is <monospace>output</monospace>. For a given input, first all clocks are &#x2018;0&#x2019; to pre-charge all domino circuits. Then, <monospace>Clk</monospace> becomes &#x2018;1&#x2019; for input evaluation, enabling the input to be forwarded to the hidden layer. Then, the clocks of each layer transition to &#x2018;1&#x2019; one at a time and remain at &#x2018;1&#x2019; until all layers have been evaluated. This technique has no sequencing overhead. However, the disadvantage of this technique is that each layer has to wait for all of the previous layers to finish before it performs any evaluation. Generally, the cycle time is the sum of the logic delay and the sequencing overhead. The logic delay depends on the discharge rate of the pull-down network, which mainly depends on the memristor state during the evaluation phase.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>The MLP schematic and simulation for the XOR problem using multiple overlapping clocks with different duty cycles.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g005.tif"/>
</fig>
</sec>
<sec id="s3-3-2">
<title>3.3.2 Method 2: Flip-flop pipelining</title>
<p>The throughput of the design can be improved using conventional pipelining with flip-flops, shown in <xref ref-type="fig" rid="F6">Figure 6</xref>. Here, a single clock <monospace>Clk</monospace> controls D flip-flops between layers, which hold evaluation results from the previous layer until the subsequent layer completes its evaluation. This is the classical clocking strategy and has been widely used due to its robustness. In this method, the entire network can be pipelined across layers and each neuron can perform evaluations on every clock cycle. The disadvantage of this approach is the sequencing overhead (time and area) of the flip-flops.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>The MLP schematic and simulation for the XOR problem using conventional pipelining based on D flip-flops.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g006.tif"/>
</fig>
</sec>
<sec id="s3-3-3">
<title>3.3.3 Method 3: Dynamic pipelining</title>
<p>If the clocks are overlapped, the flip-flops can be eliminated. This third approach, called skew-tolerant domino (<xref ref-type="bibr" rid="B9">Harris and Horowitz, 1997</xref>), is shown in <xref ref-type="fig" rid="F7">Figure 7</xref>. The idea is that the overlapping clocks ensure that a neuron in the subsequent layer has enough time to evaluate before the previous-layer neurons start their pre-charge phase. Here, three overlapped clocks, <monospace>Clk-0</monospace>, <monospace>Clk-1</monospace>, and <monospace>Clk-2</monospace>, are used for the input, hidden, and output layers, respectively. These clocks can be generated using simple delay circuits based on, e.g., inverter chains. When the previous-layer neurons start their pre-charge phase, the dynamic gates will be pre-charged to <italic>V</italic>
<sub>
<italic>dd</italic>
</sub> and therefore the static gates will be discharged to ground. This means that the input to the subsequent layer falls low, seemingly violating the monotonicity rule. The monotonicity rule states that inputs to dynamic gates must make only low-to-high transitions while the gates are in the evaluation phase. However, the domino logic in the subsequent layer will remain at whatever value it evaluated based on the results of the first layer when its inputs fall low because both the pull-down transistors and the pre-charge transistor will be off (<xref ref-type="bibr" rid="B9">Harris and Horowitz, 1997</xref>). Therefore, the neurons will keep their value even when the previous layer pre-charges. Hence, there is no need for a latch or a flip-flop. This method has improved throughput over the multiple duty cycle method and does not have the flip-flop area overhead associated with conventional pipelining. The rest of our simulations are based on this technique.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>The MLP schematic and simulation for the XOR problem using skew-tolerant dynamic pipelining.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g007.tif"/>
</fig>
</sec>
</sec>
</sec>
<sec id="s4">
<title>4 Noise modeling</title>
<p>A key component of our design is the arbiter-based binary activation function. Arbiters are essential parts of many digital and mixed-signal designs such as memories and microprocessors (<xref ref-type="bibr" rid="B14">Kim and Dutton, 1990</xref>). However, these circuits could cause system failure due to metastability issues. In the proposed circuit, the neuron&#x2019;s output will behave stochastically for small differences in the arrival times of the edges of the excitatory and inhibitory signals at the arbiter&#x2019;s inputs. In particular, when the value of &#x394;<italic>t</italic> is close to the arbiter&#x2019;s aperture time (measured to be approximately 2 ps), the arbiter may enter a metastable state. Combined with noise, the metastable state will stochastically resolve to either &#x2018;0&#x2019; or &#x2018;1.&#x2019; <xref ref-type="fig" rid="F8">Figure 8A</xref> shows the distribution of &#x394;<italic>t</italic> for hidden layer neurons in an MLP network that we trained to classify handwritten digits from the MNIST dataset. The hidden layer is relatively small, with only 100 neurons, so the network does not give good accuracy. However, the point here is to show the distribution, which is approximately normal. This is expected, since trained neural networks will tend to have normally-distributed weights, so the dot product of pre-synaptic neuron outputs with post-synaptic neuron weight vectors will also be normally distributed. In this case, the mean is around 100 ps, but for other datasets and network sizes, the mean may be centered closer to 0 or at a negative value. In fact, as the size of the layer increases, it is expected that the mean will be closer to 0. This means that arbiter inputs may often be within the aperture window, potentially leading to stochastic behavior. <xref ref-type="fig" rid="F8">Figure 8B</xref> shows a Monte Carlo simulation of the arbiter output for a small &#x394;<italic>t</italic> in the presence of noise. 
Here, &#x394;<italic>t</italic> is positive and should result in an output of &#x2018;1&#x2019;, but in many samples, the arbiter output is &#x2018;0&#x2019;. <xref ref-type="fig" rid="F9">Figure 9</xref> shows a box plot of the mismatched neuron outputs between an ideal software simulation, where noise is not considered, and a hardware simulation with different levels of noise for the MNIST dataset. In this case, to keep the simulation tractable, the network only has 10 hidden neurons, and already there are some mismatched neuron outputs.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>
<bold>(A)</bold> Distribution of &#x394;<italic>t</italic> across 100 hidden neurons for 60,000 handwritten digit inputs. <bold>(B)</bold> Arbiter output for small &#x394;<italic>t</italic> value in the presence of noise.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g008.tif"/>
</fig>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>The mismatched bits with different levels of noise.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g009.tif"/>
</fig>
<p>To capture this behavior, we developed a simple stochastic model of the arbiter by simulating its behavior with different levels of transient noise (low, moderate, and high). The input to the arbiter is the time difference &#x394;<italic>t</italic> between the excitatory and inhibitory signals, which is swept from &#x2212;10 ps to 10 ps. The simulation was repeated 100 times for each value of &#x394;<italic>t</italic>, and the neuron&#x2019;s output probability was then calculated. The results are shown in <xref ref-type="fig" rid="F10">Figure 10A</xref>. The data are fit to a sigmoid function:<disp-formula id="e15">
<mml:math id="m32">
<mml:mfrac>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(15)</label>
</disp-formula>where <italic>a</italic> and <italic>b</italic> are the fitting parameters of the sigmoid function. For low noise: <italic>a</italic> &#x3d; 99.93, <italic>b</italic> &#x3d; 7.394. For moderate noise: <italic>a</italic> &#x3d; 99.59, <italic>b</italic> &#x3d; 2.681. For high noise: <italic>a</italic> &#x3d; 98.77, <italic>b</italic> &#x3d; 1.119. Note that other sources of noise such as process variations will also contribute to changes in the neuron probability for small differences in &#x394;<italic>t</italic>. For example, consider <xref ref-type="fig" rid="F10">Figure 10B</xref>, which shows the simulation of the arbiter under different CMOS process corners. The overall effect of the different corners is to modify the slope of the sigmoid curve. Qualitatively similar behavior also results from variations in voltage and temperature.</p>
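<p>A stochastic arbiter of this kind could be emulated in software roughly as follows. This Python sketch is not the model used in this work. It assumes that the sigmoid argument <italic>x</italic> in (Eq. <xref ref-type="disp-formula" rid="e15">15</xref>) is &#x394;<italic>t</italic> in picoseconds and that <italic>a</italic> is a percentage; the names <monospace>fire_probability</monospace> and <monospace>sample_arbiter</monospace> are hypothetical.</p>
<preformat>
```python
import numpy as np

# Fitted (a, b) pairs reported for the three transient-noise levels.
FIT = {
    "low": (99.93, 7.394),
    "moderate": (99.59, 2.681),
    "high": (98.77, 1.119),
}


def fire_probability(dt_ps, noise="moderate"):
    """Probability that the arbiter outputs '1' for a given time difference.

    Assumes dt_ps is in picoseconds and that a is a percentage.
    """
    a, b = FIT[noise]
    dt_ps = np.asarray(dt_ps, dtype=float)
    return (a / 100.0) / (1.0 + np.exp(-b * dt_ps))


def sample_arbiter(dt_ps, noise="moderate", rng=None):
    """Draw binary arbiter outputs (0 or 1) from the fitted sigmoid."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.binomial(1, fire_probability(dt_ps, noise))


dt = np.array([-2.0, 0.0, 2.0])
print("P(out = 1):", fire_probability(dt, "moderate"))
print("samples:   ", sample_arbiter(dt, "moderate", np.random.default_rng(0)))
```
</preformat>
<p>A smaller <italic>b</italic> flattens the curve, so a given &#x394;<italic>t</italic> is more likely to resolve to the wrong value at higher noise levels.</p>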
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>
<bold>(A)</bold> The probability of a neuron&#x2019;s output being equal to &#x2018;1&#x2019; as a function of input time difference between excitatory and inhibitory domino circuits. <bold>(B)</bold> The effect of process variations on the neuron&#x2019;s output probability for the high noise case.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g010.tif"/>
</fig>
</sec>
<sec id="s5">
<title>5 Results and analysis</title>
<sec id="s5-1">
<title>5.1 Simulation approach</title>
<p>Simulation of large neural networks in circuit simulators such as SPICE is prohibitively slow. We use a combination of SPICE (Synopsys HSPICE) and TensorFlow to keep simulations fast while still capturing key hardware behavior. Our simulation strategy is shown in <xref ref-type="fig" rid="F11">Figure 11</xref>. The key aspects of the circuit behavior discussed in the previous sections have been captured using HSPICE simulations with a predictive technology 130&#xa0;nm bulk CMOS transistor model (<ext-link ext-link-type="uri" xlink:href="https://ptm.asu.edu/">https://ptm.asu.edu/</ext-link>) and memristor parameters based on the device proposed in (<xref ref-type="bibr" rid="B20">Prezioso et al., 2015</xref>), which has OFF and ON conductance values of <italic>G</italic>
<sub>min</sub> &#x3d; 5 &#xd7; 10<sup>&#x2212;7</sup>&#xa0;S and <italic>G</italic>
<sub>max</sub> &#x3d; 5 &#xd7; 10<sup>&#x2212;5</sup>&#xa0;S and programming voltage magnitudes near 1&#xa0;V. In this work, only a subset of the device conductance range is used: <italic>G</italic>
<sub>min</sub> &#x3d; 1 &#xd7; 10<sup>&#x2212;6</sup>&#xa0;S and <italic>G</italic>
<sub>max</sub> &#x3d; 1 &#xd7; 10<sup>&#x2212;5</sup>&#xa0;S. Because the device behaves approximately linearly at voltages below the programming threshold, we model it as a resistor. TensorFlow is used to train the neural network and obtain the weights, which are converted into conductances using the weight mapping in Eqs. <xref ref-type="disp-formula" rid="e10">10</xref> and <xref ref-type="disp-formula" rid="e11">11</xref>. Finally, a second TensorFlow model with the same network topology but more accurate hardware behavior (e.g., stochastic neurons and conductance-based weights) is used to estimate the hardware performance on the dataset.</p>
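<p>To illustrate the weight-to-conductance step of this flow, the sketch below maps a signed weight onto an excitatory/inhibitory conductance pair within the range used in this work and quantizes it to a given number of bits. It is only a generic linear mapping written for illustration: it is an assumption and may differ from the mapping defined in Eqs. 10 and 11, and the names <monospace>weight_to_pair</monospace>, <monospace>pair_to_weight</monospace>, and <monospace>w_max</monospace> are hypothetical.</p>
<preformat>
```python
G_MIN = 1e-6  # siemens, lower bound of the range used in this work
G_MAX = 1e-5  # siemens, upper bound of the range used in this work


def quantize(g, bits):
    """Snap a conductance to one of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    step = (G_MAX - G_MIN) / levels
    return G_MIN + round((g - G_MIN) / step) * step


def weight_to_pair(w, w_max, bits=4):
    """Map a signed weight to (G_excitatory, G_inhibitory).

    Assumed scheme (not taken from the paper): the magnitude is encoded
    linearly on one device while the other device stays at G_MIN.
    """
    mag = min(abs(w) / w_max, 1.0)
    g = quantize(G_MIN + mag * (G_MAX - G_MIN), bits)
    return (g, G_MIN) if w >= 0 else (G_MIN, g)


def pair_to_weight(g_exc, g_inh, w_max):
    """Inverse of the assumed mapping, up to quantization error."""
    return w_max * (g_exc - g_inh) / (G_MAX - G_MIN)


if __name__ == "__main__":
    for w in (-1.0, -0.3, 0.0, 0.3, 1.0):
        g_exc, g_inh = weight_to_pair(w, w_max=1.0, bits=3)
        print(w, g_exc, g_inh, round(pair_to_weight(g_exc, g_inh, 1.0), 3))
```
</preformat>
<p>Under this assumed scheme, the effective weight seen by the hardware-aware model is the quantized value returned by the inverse mapping rather than the trained floating-point weight.</p>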
<fig id="F11" position="float">
<label>FIGURE 11</label>
<caption>
<p>Simulation strategy used in this work, which combines SPICE-level simulation with TensorFlow neural network simulation.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g011.tif"/>
</fig>
</sec>
<sec id="s5-2">
<title>5.2 Performance on the MNIST dataset</title>
<p>We tested the proposed design approach using an MLP with 1000 hidden neurons on the MNIST handwritten digit dataset (<ext-link ext-link-type="uri" xlink:href="http://yann.lecun.com/exdb/mnist/">http://yann.lecun.com/exdb/mnist/</ext-link>). The results are shown in <xref ref-type="fig" rid="F12">Figure 12A</xref>. Without noise, the software and hardware simulations produce almost identical results, with accuracies in the high 90% range. These accuracies hold approximately constant across weight precisions from 3 to 10 bits. Note that the 1- and 2-bit results were much lower and are not included in the plot. As noise effects are added to the simulation, the accuracy generally decreases. Low and moderate noise results are almost identical, while high noise gives a clear drop in accuracy. However, even with high noise, the degradation is less than 2%. We have also explored the effects of conductance variations on the proposed hardware design (<xref ref-type="fig" rid="F12">Figure 12B</xref>). These variations may arise from imprecise programming or drift of conductance values over time. Small conductance variations (e.g., 10%) have negligible effects on the test accuracy, while larger variations (20% and more) can start to have considerable effects. Note that the effects of conductance variations are more pronounced at higher weight precision, which may motivate employing lower-precision devices or using fewer of each device&#x2019;s conductance levels.</p>
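<p>The conductance-variation experiment can be sketched as follows. The text does not state how the variations are distributed, so this example assumes a multiplicative Gaussian perturbation whose relative standard deviation equals the quoted percentage; the helper names are hypothetical. The second function compares the perturbation size with the spacing between quantized conductance levels, which is one plausible reading of why higher weight precision is more sensitive.</p>
<preformat>
```python
import random

G_MIN = 1e-6  # siemens, range used in this work
G_MAX = 1e-5  # siemens


def perturb(conductances, rel_sigma, rng):
    """Apply an assumed multiplicative Gaussian variation to each device."""
    return [g * (1.0 + rng.gauss(0.0, rel_sigma)) for g in conductances]


def spacing_ratio(g, rel_sigma, bits):
    """Ratio of the variation magnitude at g to the level spacing."""
    spacing = (G_MAX - G_MIN) / (2 ** bits - 1)
    return rel_sigma * g / spacing


if __name__ == "__main__":
    rng = random.Random(0)
    print(perturb([2e-6, 5e-6, 9e-6], 0.10, rng))
    for bits in (3, 6, 10):
        print(bits, round(spacing_ratio(5e-6, 0.10, bits), 2))
```
</preformat>
<p>With a fixed relative variation, the ratio grows roughly as 2<sup><italic>n</italic></sup> for <italic>n</italic>-bit weights, so adjacent levels overlap sooner at high precision.</p>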
<fig id="F12" position="float">
<label>FIGURE 12</label>
<caption>
<p>
<bold>(A)</bold> Test accuracy of the software and proposed hardware implementations of an MLP with 1000 hidden neurons on the MNIST dataset under varying levels of noise. <bold>(B)</bold> Test accuracy of the hardware under varying levels of conductance variations.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g012.tif"/>
</fig>
</sec>
<sec id="s5-3">
<title>5.3 Power analysis and comparison with other works</title>
<p>The power consumption of the proposed design was modeled by assuming that most of the power is consumed when a neuron pre-charges. The justification is that, especially for neurons with high fan-in, the switching capacitance of the neuron&#x2019;s dynamic node will be much larger than the capacitance at other nodes in the circuit. Therefore, the power can be formulated as<disp-formula id="e16">
<mml:math id="m33">
<mml:mi>P</mml:mi>
<mml:mo>&#x2248;</mml:mo>
<mml:mn>3</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b7;</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mstyle displaystyle="true">
<mml:munderover accentunder="false" accent="true">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:munderover>
</mml:mstyle>
<mml:mi>&#x3b1;</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msup>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mi>f</mml:mi>
</mml:math>
<label>(16)</label>
</disp-formula>where <italic>&#x3b7;</italic> is a fitting parameter that comes from the extra power associated with the inverter, arbiter, etc., <italic>&#x3b1;</italic> is the switching activity factor, <italic>L</italic> is the number of layers, and <italic>C</italic>
<sup>
<italic>l</italic>
</sup> is the total switching capacitance of layer <italic>l</italic>. For the dynamic pipelining synchronization scheme, <italic>&#x3b1;</italic> &#x3d; 1, since each layer will pre-charge every clock cycle. In addition, the value of <italic>C</italic>
<sup>
<italic>l</italic>
</sup> is 3<italic>C</italic> times twice the number of synapses in the layer (to account for both excitatory and inhibitory synapses). The factor of 3 comes from each synapse&#x2019;s source, drain, and memristor capacitance. We have empirically found <italic>&#x3b7;</italic> &#x2248; 0.19. In <xref ref-type="fig" rid="F13">Figure 13</xref>, we show the power consumption for 100 randomly sized 3-layer networks vs. the number of synapses and neurons in the network. For each network, both the inputs and weights were generated randomly, and each network used a clock frequency of 10&#xa0;MHz. From these data, we estimate the energy efficiency of our design to be approximately 1.26&#xa0;fJ per classification per synapse. A comparison with similar works that designed MLPs for MNIST classification is shown in <xref ref-type="table" rid="T1">Table 1</xref>. Our work has slightly better energy efficiency than that reported in <xref ref-type="bibr" rid="B27">Yakopcic et al. (2015)</xref> (1.26&#xa0;fJ vs. 1.30&#xa0;fJ per synapse) while giving higher accuracy (97% vs. 92%). All three designs use a 2T2R synapse, which is needed to represent positive and negative weights. For our work, the number of transistors per neuron is 20. For <xref ref-type="bibr" rid="B27">Yakopcic et al. (2015)</xref>, we estimate the number of transistors per neuron to be 4, and the number of transistors per neuron in (<xref ref-type="bibr" rid="B12">Jiang et al., 2018</xref>) is 30 (15 for excitatory and 15 for inhibitory).</p>
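<p>Eq. 16 can be evaluated directly once the per-layer capacitances are known. The sketch below takes the values of <italic>C</italic><sup><italic>l</italic></sup> as inputs rather than deriving them from the synapse count, and uses <italic>&#x3b7;</italic> &#x3d; 0.19, <italic>&#x3b1;</italic> &#x3d; 1, and the 10&#xa0;MHz clock from the text. The capacitance and supply-voltage numbers in the usage example are placeholders chosen for illustration and are not values reported in this work.</p>
<preformat>
```python
def layer_power(layer_caps, vdd, freq, eta=0.19, alpha=1.0):
    """Evaluate Eq. 16: P ~ 3 (1 + eta) * sum_l alpha * C_l * Vdd^2 * f.

    layer_caps holds one total switching capacitance (farads) per layer
    l = 2..L; vdd is in volts and freq in hertz.
    """
    dynamic = sum(alpha * c * vdd ** 2 * freq for c in layer_caps)
    return 3.0 * (1.0 + eta) * dynamic


def energy_per_cycle(power, freq):
    """Energy drawn in one clock period (joules)."""
    return power / freq


if __name__ == "__main__":
    # Placeholder capacitances and supply voltage (hypothetical values).
    caps = [2.0e-12, 0.5e-12]
    p = layer_power(caps, vdd=1.2, freq=10e6)
    print("power (W):", p)
    print("energy per cycle (J):", energy_per_cycle(p, 10e6))
```
</preformat>
<p>Since <italic>&#x3b1;</italic> &#x3d; 1 for the dynamic pipelining scheme, the model is linear in both the clock frequency and the total switched capacitance.</p>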
<fig id="F13" position="float">
<label>FIGURE 13</label>
<caption>
<p>Power consumption vs. total number of synapses for the proposed design.</p>
</caption>
<graphic xlink:href="fnano-05-1128667-g013.tif"/>
</fig>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Comparison of memristor-based neuromorphic designs on MNIST classification.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">References</th>
<th align="center">Tech. Node</th>
<th align="center">Accuracy (%)</th>
<th align="center">Power (mW)</th>
<th align="center">Latency (ns)</th>
<th align="center">Energy/% accuracy</th>
<th align="center">Energy/Synapse (fJ)</th>
<th align="center">Transistors/Synapse</th>
<th align="center">Transistors/Neuron</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">
<xref ref-type="bibr" rid="B12">Jiang et al. (2018)</xref>
</td>
<td align="center">130&#xa0;nm</td>
<td align="center">86</td>
<td align="center">53</td>
<td align="center">80</td>
<td align="center">4.93 &#xd7; 10<sup>&#x2212;11</sup>&#xa0;J/%</td>
<td align="center">77.0</td>
<td align="center">2T2R</td>
<td align="center">30</td>
</tr>
<tr>
<td align="center">
<xref ref-type="bibr" rid="B27">Yakopcic et al. (2015)</xref>
</td>
<td align="center">45&#xa0;nm</td>
<td align="center">92</td>
<td align="center">1.79</td>
<td align="center">40</td>
<td align="center">7.78 &#xd7; 10<sup>&#x2212;13</sup>&#xa0;J/%</td>
<td align="center">1.30</td>
<td align="center">2T2R</td>
<td align="center">4</td>
</tr>
<tr>
<td align="center">This work</td>
<td align="center">130&#xa0;nm</td>
<td align="center">97</td>
<td align="center">10</td>
<td align="center">100</td>
<td align="center">1.03 &#xd7; 10<sup>&#x2212;11</sup>&#xa0;J/%</td>
<td align="center">1.26</td>
<td align="center">2T2R</td>
<td align="center">20</td>
</tr>
</tbody>
</table>
</table-wrap>
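<p>The derived columns of Table 1 can be checked with a few lines of arithmetic. The sketch below takes the energy per classification to be power times latency and divides it by the accuracy to obtain the energy per % accuracy. For the per-synapse figure of this work, it assumes a 784-1000-10 MLP (784 input pixels and 10 output classes for MNIST, with the 1000 hidden neurons stated above); that topology is an assumption and is not stated explicitly in the text.</p>
<preformat>
```python
# (accuracy in %, power in mW, latency in ns), copied from Table 1.
ROWS = {
    "Jiang et al. (2018)": (86.0, 53.0, 80.0),
    "Yakopcic et al. (2015)": (92.0, 1.79, 40.0),
    "This work": (97.0, 10.0, 100.0),
}


def energy_per_classification(power_mw, latency_ns):
    """Energy in joules, taken as power times latency."""
    return power_mw * 1e-3 * latency_ns * 1e-9


def energy_per_percent(accuracy, power_mw, latency_ns):
    """Energy per classification divided by accuracy (J per %)."""
    return energy_per_classification(power_mw, latency_ns) / accuracy


if __name__ == "__main__":
    for name, (acc, p_mw, t_ns) in ROWS.items():
        print(f"{name}: {energy_per_percent(acc, p_mw, t_ns):.2e} J/%")
    # Assumed 784-1000-10 topology for this work.
    synapses = 784 * 1000 + 1000 * 10
    e_syn = energy_per_classification(10.0, 100.0) / synapses
    print(f"This work: {e_syn * 1e15:.2f} fJ per synapse")
```
</preformat>
<p>Under these assumptions, the script reproduces the tabulated 4.93 &#xd7; 10<sup>&#x2212;11</sup>, 7.78 &#xd7; 10<sup>&#x2212;13</sup>, and 1.03 &#xd7; 10<sup>&#x2212;11</sup>&#xa0;J/% values, as well as the 1.26&#xa0;fJ per synapse reported for this work.</p>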
</sec>
</sec>
<sec id="s6">
<title>6 Conclusion and future work</title>
<p>This paper presented a novel architecture for memristor-based neuromorphic computing using domino logic. The key behavioral elements of the hardware, including noise-induced stochasticity, were captured in behavioral simulations using TensorFlow, and the design was analyzed on an MNIST classification task. Results indicate that the proposed design has slightly better energy efficiency (1.26&#xa0;fJ per classification per synapse) than the compared approaches while providing higher accuracy (97%). Possible avenues for future work include the design of on-chip training circuitry and further energy reduction using additional low-power design techniques.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s8">
<title>Author contributions</title>
<p>HH helped design all circuits, carried out all simulations, analyzed the data, and wrote the manuscript. CM conceived the domino logic design idea, helped design all circuits, wrote the manuscript, and supervised the study.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>Funding for this work was provided by the Rochester Institute of Technology Computer Engineering Department.</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bavandpour</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sahay</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Mahmoodi</surname>
<given-names>M. R.</given-names>
</name>
<name>
<surname>Strukov</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Efficient mixed-signal neurocomputing via successive integration and rescaling</article-title>. <source>IEEE Trans. Very Large Scale Integration Syst.</source> <volume>28</volume>, <fpage>823</fpage>&#x2013;<lpage>827</lpage>. <pub-id pub-id-type="doi">10.1109/tvlsi.2019.2946516</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>A review of emerging non-volatile memory (NVM) technologies and applications</article-title>. <source>Solid-State Electron.</source> <volume>125</volume>, <fpage>25</fpage>&#x2013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1016/j.sse.2016.07.006</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chua</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>If it&#x2019;s pinched it&#x2019;s a memristor</article-title>. <source>Semicond. Sci. Technol.</source> <volume>29</volume>, <fpage>104001</fpage>. <pub-id pub-id-type="doi">10.1088/0268-1242/29/10/104001</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dahl</surname>
<given-names>G. E.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Acero</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition</article-title>. <source>IEEE Trans. Audio, Speech, Lang. Process.</source> <volume>20</volume>, <fpage>30</fpage>&#x2013;<lpage>42</lpage>. <pub-id pub-id-type="doi">10.1109/tasl.2011.2134090</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Davies</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Srinivasa</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>T.-H.</given-names>
</name>
<name>
<surname>Chinya</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Choday</surname>
<given-names>S. H.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Loihi: A neuromorphic manycore processor with on-chip learning</article-title>. <source>IEEE Micro</source> <volume>38</volume>, <fpage>82</fpage>&#x2013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1109/mm.2018.112130359</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Douglas</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Mahowald</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mead</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Neuromorphic analogue VLSI</article-title>. <source>Annu. Rev. Neurosci.</source> <volume>18</volume>, <fpage>255</fpage>&#x2013;<lpage>281</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.ne.18.030195.001351</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Everson</surname>
<given-names>L. R.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pande</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>C. H.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>A 104.8 TOPS/W one-shot time-based neuromorphic chip employing dynamic threshold error correction in 65nm</article-title>,&#x201d; in <conf-name>2018 IEEE Asian Solid-State Circuits Conference (A-SSCC)</conf-name> (<publisher-loc>Tainan, Taiwan</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>273</fpage>&#x2013;<lpage>276</lpage>. <pub-id pub-id-type="doi">10.1109/ASSCC.2018.8579302</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Freye</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lou</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bengel</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Menzel</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wiefels</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Gemmeke</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Memristive devices for time domain compute-in-memory</article-title>. <source>IEEE J. Explor. Solid-State Comput. Devices Circuits</source> <volume>8</volume>, <fpage>119</fpage>&#x2013;<lpage>127</lpage>. <pub-id pub-id-type="doi">10.1109/jxcdc.2022.3217098</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harris</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Horowitz</surname>
<given-names>M. A.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Skew-tolerant domino circuits</article-title>. <source>IEEE J. Solid-State Circuits</source> <volume>32</volume>, <fpage>1702</fpage>&#x2013;<lpage>1711</lpage>. <pub-id pub-id-type="doi">10.1109/4.641690</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hendy</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Merkel</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Review of spike-based neuromorphic computing for brain-inspired vision: Biology, algorithms, and hardware</article-title>. <source>J. Electron. Imaging</source> <volume>31</volume>, <fpage>010901</fpage>. <pub-id pub-id-type="doi">10.1117/1.jei.31.1.010901</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hubara</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Courbariaux</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Soudry</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>El-Yaniv</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Quantized neural networks: Training neural networks with low precision weights and activations</article-title>. <source>J. Mach. Learn. Res.</source> <volume>18</volume>, <fpage>6869</fpage>&#x2013;<lpage>6898</lpage>.</citation>
</ref>
<ref id="B12">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yamada</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Kwok</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Q.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). &#x201c;<article-title>Pulse-width modulation based dot-product engine for neuromorphic computing system using memristor crossbar array</article-title>,&#x201d; in <conf-name>2018 IEEE International Symposium on Circuits and Systems (ISCAS)</conf-name> (<publisher-loc>Florence, Italy</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1109/ISCAS.2018.8351276</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Jouppi</surname>
<given-names>N. P.</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Patil</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Patterson</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Agrawal</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Bajwa</surname>
<given-names>R.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). &#x201c;<article-title>In-datacenter performance analysis of a tensor processing unit</article-title>,&#x201d; in <conf-name>Proceedings of the 44th annual international symposium on computer architecture</conf-name> (<publisher-loc>Toronto, ON, Canada</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1145/3079856.3080246</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>L.-S.</given-names>
</name>
<name>
<surname>Dutton</surname>
<given-names>R. W.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>Metastability of CMOS latch/flip-flop</article-title>. <source>IEEE J. Solid-State Circuits</source> <volume>25</volume>, <fpage>942</fpage>&#x2013;<lpage>951</lpage>. <pub-id pub-id-type="doi">10.1109/4.58286</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>E. H.</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>S. S.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Analysis and design of a passive switched-capacitor matrix multiplier for approximate computing</article-title>. <source>IEEE J. Solid-State Circuits</source> <volume>52</volume>, <fpage>261</fpage>&#x2013;<lpage>271</lpage>. <pub-id pub-id-type="doi">10.1109/jssc.2016.2599536</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marinella</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Agarwal</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hsia</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Richter</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Jacobs-Gedrim</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Niroula</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator</article-title>. <source>IEEE J. Emerg. Sel. Top. Circuits Syst.</source> <volume>8</volume>, <fpage>86</fpage>&#x2013;<lpage>101</lpage>. <pub-id pub-id-type="doi">10.1109/jetcas.2018.2796379</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Merkel</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Current-mode memristor crossbars for neuromorphic computing</article-title>,&#x201d; in <conf-name>Proceedings of the 7th Annual Neuro-inspired Computational Elements Workshop</conf-name>, <fpage>1</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1145/3320288.3320298</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Merkel</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Kudithipudi</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Neuromemristive systems: A circuit design perspective</article-title>,&#x201d; in <source>Advances in neuromorphic hardware exploiting emerging nanoscale devices</source>. <source>Cognitive systems monographs</source>. Editor <person-group person-group-type="editor">
<name>
<surname>Suri</surname>
<given-names>M.</given-names>
</name>
</person-group> (<publisher-loc>New Delhi</publisher-loc>: <publisher-name>Springer</publisher-name>), <volume>Vol. 31</volume>, <fpage>45</fpage>&#x2013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1007/978-81-322-3703-7_3</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nandakumar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kulkarni</surname>
<given-names>S. R.</given-names>
</name>
<name>
<surname>Babu</surname>
<given-names>A. V.</given-names>
</name>
<name>
<surname>Rajendran</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Building brain-inspired computing systems: Examining the role of nanoscale devices</article-title>. <source>IEEE Nanotechnol. Mag.</source> <volume>12</volume>, <fpage>19</fpage>&#x2013;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1109/mnano.2018.2845078</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Prezioso</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Merrikh-Bayat</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Hoskins</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Adam</surname>
<given-names>G. C.</given-names>
</name>
<name>
<surname>Likharev</surname>
<given-names>K. K.</given-names>
</name>
<name>
<surname>Strukov</surname>
<given-names>D. B.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Training and operation of an integrated neuromorphic network based on metal-oxide memristors</article-title>. <source>Nature</source> <volume>521</volume>, <fpage>61</fpage>&#x2013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1038/nature14441</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rueckauer</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Lungu</surname>
<given-names>I.-A.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Pfeiffer</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>S.-C.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Conversion of continuous-valued deep networks to efficient event-driven networks for image classification</article-title>. <source>Front. Neurosci.</source> <volume>11</volume>, <fpage>682</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2017.00682</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sahay</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Bavandpour</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mahmoodi</surname>
<given-names>M. R.</given-names>
</name>
<name>
<surname>Strukov</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Energy-efficient moderate precision time-domain mixed-signal vector-by-matrix multiplier exploiting 1T-1R arrays</article-title>. <source>IEEE J. Explor. Solid-State Comput. Devices Circuits</source> <volume>6</volume>, <fpage>18</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1109/jxcdc.2020.2981048</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Schuman</surname>
<given-names>C. D.</given-names>
</name>
<name>
<surname>Potok</surname>
<given-names>T. E.</given-names>
</name>
<name>
<surname>Patton</surname>
<given-names>R. M.</given-names>
</name>
<name>
<surname>Birdwell</surname>
<given-names>J. D.</given-names>
</name>
<name>
<surname>Dean</surname>
<given-names>M. E.</given-names>
</name>
<name>
<surname>Rose</surname>
<given-names>G. S.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <source>A survey of neuromorphic computing and neural networks in hardware</source>. <comment>
<italic>arXiv preprint arXiv:1705.06963</italic>
</comment>.</citation>
</ref>
<ref id="B24">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Seide</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2011</year>). &#x201c;<article-title>Conversational speech transcription using context-dependent deep neural networks</article-title>,&#x201d; in <conf-name>Twelfth annual conference of the international speech communication association</conf-name>. <pub-id pub-id-type="doi">10.5555/3042573.3042574</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sinangil</surname>
<given-names>M. E.</given-names>
</name>
<name>
<surname>Erbagci</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Naous</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Akarvardar</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Khwa</surname>
<given-names>W.-S.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>A 7-nm compute-in-memory SRAM macro supporting multi-bit input, weight and output and achieving 351 TOPS/W and 372.4 GOPS</article-title>. <source>IEEE J. Solid-State Circuits</source> <volume>56</volume>, <fpage>188</fpage>&#x2013;<lpage>198</lpage>. <pub-id pub-id-type="doi">10.1109/jssc.2020.3031290</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sung</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Hwang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yoo</surname>
<given-names>I. K.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Perspective: A review on memristive hardware for neuromorphic computation</article-title>. <source>J. Appl. Phys.</source> <volume>124</volume>, <fpage>151903</fpage>. <pub-id pub-id-type="doi">10.1063/1.5037835</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Yakopcic</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Hasan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Taha</surname>
<given-names>T. M.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Memristor based neuromorphic circuit for <italic>ex-situ</italic> training of multi-layer neural network algorithms</article-title>,&#x201d; in <conf-name>2015 International Joint Conference on Neural Networks (IJCNN)</conf-name> (<publisher-loc>Killarney, Ireland</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1109/IJCNN.2015.7280813</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>