<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Neurosci.</journal-id>
<journal-title>Frontiers in Computational Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5188</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fncom.2023.1114651</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Energy-based analog neural network framework</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Watfa</surname> <given-names>Mohamed</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2082056/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Garcia-Ortiz</surname> <given-names>Alberto</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2217686/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Sassatelli</surname> <given-names>Gilles</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/5281/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>LIRMM, University of Montpellier, CNRS</institution>, <addr-line>Montpellier</addr-line>, <country>France</country></aff>
<aff id="aff2"><sup>2</sup><institution>ITEM, University of Bremen</institution>, <addr-line>Bremen</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Kechen Zhang, Johns Hopkins University, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Akhilesh Jaiswal, University of Southern California, United States; Cory Merkel, Rochester Institute of Technology, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Alberto Garcia-Ortiz <email>agarcia&#x00040;item.uni-bremen.de</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>03</day>
<month>03</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>17</volume>
<elocation-id>1114651</elocation-id>
<history>
<date date-type="received">
<day>02</day>
<month>12</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>14</day>
<month>02</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Watfa, Garcia-Ortiz and Sassatelli.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Watfa, Garcia-Ortiz and Sassatelli</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Over the past decade, a body of work has emerged that shows the disruptive potential of neuromorphic systems across a broad range of studies, often combining novel machine learning models and nanotechnologies. Still, the scope of investigations often remains limited to simple problems, since the process of building, training, and evaluating mixed-signal neural models is slow and laborious. In this paper, we introduce an open-source framework, called EBANA, that provides a unified, modularized, and extensible infrastructure, similar to conventional machine learning pipelines, for building and validating analog neural networks (ANNs). It uses Python as its interface language, with a syntax similar to Keras, while hiding the complexity of the underlying analog simulations. It already includes the most common building blocks and is modular and extensible enough to easily incorporate new concepts as well as new electrical and technological models. These features make EBANA suitable for researchers and practitioners who want to experiment with different design topologies and explore the various trade-offs that exist in the design space. We illustrate the framework&#x00027;s capabilities with the increasingly popular Energy-Based Models (EBMs), used in conjunction with the local Equilibrium Propagation (EP) training algorithm. Our experiments cover three datasets with up to 60,000 entries and explore network topologies generating circuits in excess of 1,000 electrical nodes, which can be extensively benchmarked with ease and in reasonable time thanks to EBANA&#x00027;s native parallelization capability.</p>
</abstract>
<kwd-group>
<kwd>neural networks</kwd>
<kwd>energy-based models</kwd>
<kwd>equilibrium propagation</kwd>
<kwd>framework</kwd>
<kwd>analog</kwd>
<kwd>mixed-signal</kwd>
<kwd>SPICE</kwd>
</kwd-group>
<counts>
<fig-count count="10"/>
<table-count count="2"/>
<equation-count count="9"/>
<ref-count count="31"/>
<page-count count="15"/>
<word-count count="9890"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>The past decade has seen a remarkable series of advances in deep learning (DL) approaches based on artificial neural networks (ANN). In the drive toward better accuracy, the complexity and resource utilization of state-of-the-art (SOTA) models have been increasing at such an astounding rate that training and deploying these models often requires computational and energy resources beyond the reach of most resource-constrained edge environments (Bianco et al., <xref ref-type="bibr" rid="B3">2018</xref>). As a result, most of the training and processing has been done in cloud data centers. However, energy cost, scalability, latency, and data privacy pose serious challenges to existing cloud computing, and edge computing has emerged as an attractive alternative (Wang et al., <xref ref-type="bibr" rid="B28">2020</xref>).</p>
<p>The high computational and power demands of DL are driven by two key factors. The first is the efficiency of the DL algorithms themselves: current SOTA models require billions of multiply-and-accumulate operations (MACs). For example, VGGNet (Simonyan and Zisserman, <xref ref-type="bibr" rid="B26">2015</xref>), a model that enabled significant accuracy improvements in the ImageNet challenge, requires 138M parameters and 15.5G MACs. These numbers are even higher for current SOTA models (Sevilla et al., <xref ref-type="bibr" rid="B24">2022</xref>).</p>
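<p>The MAC and parameter figures quoted for VGGNet can be reproduced with a short tally. The sketch below is illustrative only. It assumes the standard 16-layer configuration (VGG-16), which has thirteen 3 &#x000D7; 3 convolutions with &#x0201C;same&#x0201D; padding, five 2 &#x000D7; 2 max-pooling stages, and three fully connected layers, applied to a 224 &#x000D7; 224 RGB image. The helper name <monospace>vgg16_macs_and_params</monospace> is hypothetical. Pooling and activation costs are ignored, and one MAC is counted per weight use.</p>
<preformat preformat-type="code">
```python
# Hypothetical helper: tally MACs and parameters of the standard VGG-16
# configuration (13 conv layers + 3 fully connected layers, 224x224 input).
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_macs_and_params(in_size=224, in_ch=3, num_classes=1000):
    size, ch = in_size, in_ch
    macs = params = 0
    for layer in VGG16_CONV:
        if layer == "M":          # 2x2 max-pooling halves the spatial size
            size //= 2
            continue
        # 3x3 'same' convolution: every output pixel of every output channel
        # needs 9 * ch multiply-accumulates
        macs += size * size * layer * ch * 9
        params += layer * ch * 9 + layer     # weights + biases
        ch = layer
    # Fully connected layers: one MAC per weight
    fc_dims = [size * size * ch, 4096, 4096, num_classes]
    for n_in, n_out in zip(fc_dims[:-1], fc_dims[1:]):
        macs += n_in * n_out
        params += n_in * n_out + n_out
    return macs, params


macs, params = vgg16_macs_and_params()
print(f"{macs / 1e9:.2f}G MACs, {params / 1e6:.1f}M parameters")
```
</preformat>
<p>Under these assumptions, the tally gives about 15.47G MACs and 138.4M parameters, in line with the figures above. Almost all MACs come from the convolutions, whereas almost all parameters sit in the fully connected layers.</p>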
<p>The second factor is tied to the hardware architecture on which DL workloads are executed. Machine learning and other data-intensive workloads are fundamentally limited by computing systems based on the von Neumann architecture, which separates memory and processing units and therefore spends a large share of its energy on memory accesses and data movement. For instance, to support its 724M MACs, AlexNet requires nearly 3 billion DRAM accesses, and fetching data from off-chip DRAM costs 200 &#x000D7; more energy than fetching it from the register file (Sze et al., <xref ref-type="bibr" rid="B27">2017</xref>).</p>
<p>With energy efficiency being a primary concern, successfully bringing intelligence to the edge hinges on innovative circuits and hardware that take into account both the computation and the communication that are required. Consequently, recent hardware architectures for DL show an evolution toward &#x0201C;in/near-memory&#x0201D; computing, with the goal of reducing data movement as much as possible. One category of such architectures, the so-called Processing-In-Memory (PIM), eliminates the need to move data to the processing units by performing the computations inside the memory. This approach is commonly implemented by exploiting the analog characteristics of emerging non-volatile memories (NVM) such as ReRAM crossbars, though it is also possible to leverage mature CMOS-based technologies (Kim et al., <xref ref-type="bibr" rid="B13">2017</xref>). Furthermore, as ANN inference is inherently resilient to noise, this opens the opportunity to embrace analog computing, which can be much more efficient than digital computing, especially in the low SNR (signal-to-noise ratio) regime (Murmann et al., <xref ref-type="bibr" rid="B21">2015</xref>). This work targets this class of analog neural networks.</p>
<p>Due to the highly demanding device and circuit requirements for accurate neural network training (Gokmen and Vlasov, <xref ref-type="bibr" rid="B7">2016</xref>), most mixed-signal implementations are inference-only. While the optimal implementation of the memory devices is an ongoing challenge, there is an opportunity to simplify the circuit requirements by considering learning algorithms that are well matched to the underlying hardware. One such algorithm is Equilibrium Propagation (EP), which leverages the fact that the equilibrium point of a circuit corresponds to the minimum of an abstract energy function (Scellier and Bengio, <xref ref-type="bibr" rid="B23">2017</xref>), defined in Section 2. By allowing signals to flow bidirectionally, EP forgoes the need for a dedicated circuit during the backward phase of training. It also keeps the overhead of the supporting peripheral circuitry to a minimum, as no analog-to-digital converters are needed between layers.</p>
<p>Given the growing volume of machine learning workloads, it is of paramount importance to have a framework that can perform a comprehensive comparison across different accelerator designs and identify those best suited to a particular ML task. Thanks to ML frameworks such as Google&#x00027;s TensorFlow and Keras, creating and training models is far less daunting than it was in the past. While training an analog neural network with EP is possible in principle in TensorFlow, there are three major difficulties:</p>
<list list-type="bullet">
<list-item><p>First, the current-voltage (I-V) characteristic of each circuit element has to be completely defined. This also calls for the implementation of a non-linear equation solver.</p></list-item>
<list-item><p>Second, the network layers have to be designed in such a way that they can influence each other in both directions. Without the loading effect, the model will fail to learn.</p></list-item>
<list-item><p>Finally, implementing procedures that involve iterative updates, such as solving differential equations, within automatic differentiation libraries like TensorFlow would require storing all the intermediate iterates created at each time step of the solution, which consumes a great deal of memory. As will be explained later, when implemented on analog circuits, the EP method requires the network state at only two time steps (the free and nudged equilibrium states).</p></list-item>
</list>
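<p>The sketch below makes the first difficulty concrete. It computes the DC operating point of a single node, a resistor feeding a diode, by applying Newton-Raphson to Kirchhoff&#x00027;s current law. The circuit, the element values, and the simple step-limiting rule are illustrative assumptions and are not part of EBANA, which delegates this task to the SPICE simulator. A network with thousands of such nonlinear nodes needs the same kind of solver, in matrix form.</p>
<preformat preformat-type="code">
```python
import math

# Hypothetical element models: a linear resistor and a Shockley diode.
# Circuit: source VS -- resistor R -- node v -- diode -- ground.
VS, R = 5.0, 1e3                  # source voltage (V), resistance (ohm)
IS, N, VT = 1e-14, 1.0, 0.02585   # saturation current (A), ideality, thermal voltage (V)


def diode_current(v):
    # I-V characteristic of the diode
    return IS * (math.exp(v / (N * VT)) - 1.0)


def diode_conductance(v):
    # dI/dV of the diode, required by Newton's method
    return IS / (N * VT) * math.exp(v / (N * VT))


def solve_node_voltage(v=0.0, tol=1e-12, max_iter=100, max_step=0.1):
    """Newton-Raphson on the KCL residual f(v) = (v - VS)/R + I_d(v)."""
    for _ in range(max_iter):
        f = (v - VS) / R + diode_current(v)
        df = 1.0 / R + diode_conductance(v)
        dv = -f / df
        # Limit the step so the exponential does not overshoot
        # (similar in spirit to SPICE voltage limiting)
        dv = max(-max_step, min(max_step, dv))
        v += dv
        if math.isclose(dv, 0.0, abs_tol=tol):
            return v
    raise RuntimeError("Newton iteration did not converge")


v = solve_node_voltage()
print(f"node voltage = {v:.4f} V, current = {(VS - v) / R * 1e3:.3f} mA")
```
</preformat>
<p>Even this single node needs an explicit I-V model for each element, its derivative, and a safeguard against exponential overshoot. This illustrates why reusing an established circuit simulator is attractive.</p>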
<p>Based on the above motivations, this work introduces an exploratory framework called EBANA (Energy-Based ANAlog neural networks), built in the spirit of Keras<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> with two goals in mind: ease-of-use and flexibility. By hiding the complexity inherent to machine learning and analog electronics behind a simple and intuitive API, the framework facilitates experimentation with different network topologies and the exploration of the various trade-offs that exist in the design space.</p>
<p>In the relatively few studies that strive to understand the inner workings of the EP algorithm on analog hardware, we observe several cases where our framework could prove immediately beneficial. In Kiraz et al. (<xref ref-type="bibr" rid="B14">2022</xref>), the authors studied the impact of the learning rate and of the scaling factor of the feedback current on the convergence of the algorithm. Their experiments were carried out on a simple two-input-one-output circuit, and, therefore, it is not clear whether their results generalize to more complex circuits. The size of the network in our framework is limited only by the underlying SPICE simulator, thus facilitating much more comprehensive studies. In Foroushani et al. (<xref ref-type="bibr" rid="B5">2020</xref>), the authors built a circuit based on the continuous Hopfield model to learn the XOR function. The modularity and graph-based data structure of our framework can easily accommodate new analog blocks and topologies, making it easy to study their effect on accuracy, power estimation, etc., as the network size grows.</p>
<p>Although research on EBM-based ANN accelerators is still in its early stages, a substantial amount of work has been done on non-EBM-based accelerators. Most of these accelerators are designed for inference only, as on-chip training has proven to be challenging (Krestinskaya et al., <xref ref-type="bibr" rid="B15">2018</xref>). To achieve speed and energy savings, these accelerators embed the computations inside memory elements such as emerging non-volatile memories (Li et al., <xref ref-type="bibr" rid="B18">2015</xref>; Hu et al., <xref ref-type="bibr" rid="B9">2016</xref>; Shafiee et al., <xref ref-type="bibr" rid="B25">2016</xref>; Gokmen et al., <xref ref-type="bibr" rid="B6">2018</xref>), floating-gate transistors (Agarwal et al., <xref ref-type="bibr" rid="B1">2019</xref>; Park et al., <xref ref-type="bibr" rid="B22">2019</xref>), or volatile capacitive memories (Boser et al., <xref ref-type="bibr" rid="B4">1991</xref>; Bankman et al., <xref ref-type="bibr" rid="B2">2019</xref>). For a more comprehensive overview of ANN architectures, the reader is referred to Xiao et al. (<xref ref-type="bibr" rid="B30">2020</xref>).</p>
<p>This paper is organized as follows. In Section 2, we give a brief introduction to energy-based learning and explain why it is a natural fit for analog systems. In Section 3, we provide an overview of the internals of our API and illustrate with an example how quickly and easily models can be created. In Section 4, we validate our framework by training an analog circuit on a non-trivial ML task, evaluate its performance, and show how the framework can be extended. Finally, we present our conclusions and discuss future work.</p>
<p>In this work, we expand upon our previous introduction of the EBANA framework in Watfa et al. (<xref ref-type="bibr" rid="B29">2022</xref>) by elaborating on the relationship between energy-based models and electrical circuits. Specifically, we demonstrate how the energy function can be shaped and modified by the learning process and examine the impact of various parameters on the learning capacity of the analog circuit. Additionally, we discuss the potential for interfacing an analog neural network based on the EP algorithm with one based on the backpropagation algorithm in a mixed-mode design.</p>
</sec>
<sec id="s2">
<title>2. Energy-based learning</title>
<p>The main goal of deep learning, and of statistical modeling in general, is to capture the dependencies between variables. Energy-Based Models (EBMs) encode these dependencies in the form of an energy function <italic>E</italic> that assigns low energies to correct configurations and high energies to incorrect ones. Unlike probabilistic models, which must be properly normalized, EBMs have no such requirement (LeCun et al., <xref ref-type="bibr" rid="B17">2006</xref>) and, as such, can be applied to a wider set of problems.</p>
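<p>As a toy illustration of this difference, the sketch below uses a hypothetical energy function over a two-bit input and a one-bit output, with low energy when the output equals the XOR of the inputs. Inference only needs the minimum-energy output. Turning energies into probabilities through the Gibbs distribution, by contrast, requires the partition function: a sum over every candidate output, whose cost grows exponentially with the number of output units.</p>
<preformat preformat-type="code">
```python
import math


def energy(x, y):
    """Hypothetical toy energy: 0 for the correct XOR output, 1 otherwise."""
    return 0.0 if y == (x[0] ^ x[1]) else 1.0


def infer(x, candidates=(0, 1)):
    # EBM inference only needs the minimum-energy configuration;
    # no normalization is involved.
    return min(candidates, key=lambda y: energy(x, y))


def gibbs_prob(x, y, candidates=(0, 1)):
    # A probabilistic reading needs the partition function Z(x), i.e. a sum
    # over every candidate output, which grows exponentially with output size.
    z = sum(math.exp(-energy(x, c)) for c in candidates)
    return math.exp(-energy(x, y)) / z


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, infer(x), round(gibbs_prob(x, infer(x)), 3))
```
</preformat>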
<p>Two aspects must be considered when training EBMs. The first is finding an energy function that is rich enough to model the dependency between the input and output; this is usually tied to the architecture of the network. The second is shaping the energy function so that the desired input-output combinations have lower energy than all other (undesired) combinations. In the following sections, we consider one such training method, explain how it works, and discuss how it can be used to train analog neural networks.</p>
<sec>
<title>2.1. An alternative to backpropagation</title>
<p>The success of deep neural networks can be attributed to the backpropagation (BP) algorithm, which exploits the chain rule of derivatives to compute parameter updates during learning. In spite of its success, BP poses a few difficulties for hardware implementation. The need for different circuits in the two phases of training is one of the core issues that the EP learning framework sets out to address (Scellier and Bengio, <xref ref-type="bibr" rid="B23">2017</xref>). EP involves only local computations and leverages the dynamics of energy-based physical systems. It has been used to train Spiking Neural Networks (Martin et al., <xref ref-type="bibr" rid="B20">2021</xref>) and in the bidirectional learning of Recurrent Neural Networks (Laborieux et al., <xref ref-type="bibr" rid="B16">2020</xref>).</p>
<p>The EP algorithm is a contrastive learning method in which the gradient of the loss function is estimated from the difference between two equilibrium states of the network, reached in two different phases. In the <italic>free</italic> phase, the input is presented to the network, which is allowed to settle into a <italic>free</italic> equilibrium state, thereby minimizing its energy. Once equilibrium is reached, the inference result is available at the output neurons. In the second, <italic>nudging</italic> phase, an error signal is introduced at the output neurons, and the network settles into a <italic>weakly-clamped</italic> equilibrium state, which is closer to the desired state than the <italic>free</italic> equilibrium state. The parameters of the network are then updated based on these two equilibrium states. The idea is depicted in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Equilibrium propagation algorithm: <bold>(A)</bold> In the <italic>free</italic> phase, the input is presented to the network and the network settles into an equilibrium state. <bold>(B)</bold> In the <italic>nudging</italic> phase, an error signal (depicted in red) is introduced at the output, forcing the network to settle into a nearby equilibrium state with a slightly lower energy than the <italic>free</italic> equilibrium state. The parameters are updated based on these two states.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-17-1114651-g0001.tif"/>
</fig>
</sec>
<sec>
<title>2.2. Constructing the energy function</title>
<p>Supervised learning in a neural network is driven by the optimization of an error function of the output. A common objective is the minimization of the mean squared error (MSE) or of the cross-entropy between the network&#x00027;s output and the target output. In energy-based models, however, the optimization objective is not a function of the output alone, but a scalar energy function of the entire network state.</p>
<p>The energy <italic>E</italic> can be inspired by physics or hand-crafted based on the network architecture. An early example of EBMs is the Hopfield network. Its stochastic counterpart, the Boltzmann machine, and in particular the Restricted Boltzmann Machine (RBM) (Hinton, <xref ref-type="bibr" rid="B8">2012</xref>), are also EBMs. In a Hopfield network, the energy function is constructed by observing that a neuron flips only when its state is opposite in sign to its local field. The energy is defined as the negative sum of the weighted pairwise interactions between the neurons (plus bias terms), a quantity that is bounded for a given set of network parameters. As the neurons flip, the overall energy of the system decreases until a configuration is reached that corresponds to a minimum of the energy function. The energy function of the RBM is presented below.</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>E</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>v</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>h</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>b</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mstyle mathvariant="bold"><mml:mi>v</mml:mi></mml:mstyle><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>c</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mstyle mathvariant="bold"><mml:mi>h</mml:mi></mml:mstyle><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>v</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mstyle mathvariant="bold"><mml:mi>W</mml:mi></mml:mstyle><mml:mstyle mathvariant="bold"><mml:mi>h</mml:mi></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic><bold>v</bold></italic> and <italic><bold>h</bold></italic> are the states of the visible and hidden units, and <italic><bold>&#x003B8;</bold></italic> &#x0003D; (<italic><bold>b</bold></italic>, <italic><bold>c</bold></italic>, <italic><bold>W</bold></italic>) are the real-valued parameters of the model: <italic><bold>b</bold></italic> and <italic><bold>c</bold></italic> are the bias vectors, and <italic><bold>W</bold></italic> is the weight matrix. The parameters represent the preference of the model for particular values of <italic><bold>v</bold></italic> or <italic><bold>h</bold></italic>.</p>
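<p>For a small RBM, Eq. (1) can be evaluated directly and the lowest-energy configuration found by exhaustive search. The sketch below assumes binary units and arbitrary, hypothetical parameter values for two visible units and one hidden unit. Exhaustive search is only feasible at toy sizes, since the number of configurations doubles with every added unit.</p>
<preformat preformat-type="code">
```python
import itertools


def rbm_energy(v, h, b, c, W):
    """E(v, h) = -b^T v - c^T h - v^T W h, as in Eq. (1)."""
    visible = sum(bi * vi for bi, vi in zip(b, v))
    hidden = sum(cj * hj for cj, hj in zip(c, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -visible - hidden - interaction


# Hypothetical parameters: 2 visible units, 1 hidden unit
b = [0.5, -0.5]
c = [0.0]
W = [[1.0], [-1.0]]

# Enumerate every binary (v, h) configuration and keep the lowest-energy one
configs = [(v, h) for v in itertools.product([0, 1], repeat=2)
           for h in itertools.product([0, 1], repeat=1)]
best = min(configs, key=lambda vh: rbm_energy(vh[0], vh[1], b, c, W))
print(best, rbm_energy(*best, b, c, W))
```
</preformat>
<p>With these parameters, the minimum-energy configuration is <italic><bold>v</bold></italic> &#x0003D; (1, 0), <italic><bold>h</bold></italic> &#x0003D; (1), with <italic>E</italic> &#x0003D; &#x02212;1.5.</p>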
<p>Despite being an energy-based model, the RBM is trained using maximum-likelihood estimation (MLE) (Hinton, <xref ref-type="bibr" rid="B8">2012</xref>), a standard method for training probabilistic models. The basic idea is to find the network parameters that maximize the likelihood of the dataset. This is a slow and computationally expensive process, especially when the dimensionality of the data is high, as it requires sampling from the joint distribution of <italic><bold>v</bold></italic> and <italic><bold>h</bold></italic>. The EP algorithm avoids this by adding to the energy function a cost function that nudges the system toward a state with a lower cost.</p>
<p>In the EP algorithm, the state <italic><bold>s</bold></italic> of the system is governed by the network energy function</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>F</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>y</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>s</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>s</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:mi>C</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>y</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>s</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic><bold>&#x003B8;</bold></italic> &#x0003D; (<italic><bold>W</bold></italic>, <italic><bold>b</bold></italic>) are the network parameters, <italic><bold>x</bold></italic> is the input to the network, <italic><bold>y</bold></italic> is the target output, and <italic><bold>s</bold></italic> &#x0003D; {<italic><bold>h</bold></italic>, <italic><bold>&#x00177;</bold></italic>} is the collection of neuron states, consisting of the states of the hidden and output neurons, respectively.</p>
<p>The total energy function <italic>F</italic> is composed of two parts. The first is the internal energy <italic>E</italic>, which measures the interaction of the neurons in the absence of any external force. The second is the external energy, or cost function <italic>C</italic>, modulated by the influence parameter &#x003B2;. The states are gradually updated over time to minimize the total energy. Adding the cost function to the energy function is one of the main features that distinguishes EP from other EBM-based algorithms.</p>
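<p>The role of &#x003B2; can be seen on a toy network with one input <italic>x</italic>, one hidden unit <italic>h</italic>, and one output unit <italic>&#x00177;</italic>. The sketch below assumes a hypothetical quadratic internal energy and the squared-error cost, and it updates the states along the negative gradient of <italic>F</italic> until they settle. It is a numerical illustration of Eq. (2), not the circuit dynamics simulated by EBANA.</p>
<preformat preformat-type="code">
```python
# Toy network (hypothetical): input X drives hidden unit h, which is coupled
# symmetrically to output unit y_hat.
#   E = 0.5*h**2 + 0.5*y_hat**2 - W1*X*h - W2*h*y_hat   internal energy
#   C = 0.5*(y_hat - Y)**2                              cost
#   F = E + beta*C                                      total energy, Eq. (2)
W1, W2 = 1.0, 0.5
X, Y = 1.0, 1.0


def grad_F(h, y_hat, beta):
    dF_dh = h - W1 * X - W2 * y_hat
    dF_dyhat = y_hat - W2 * h + beta * (y_hat - Y)
    return dF_dh, dF_dyhat


def relax(beta, lr=0.05, steps=2000):
    """Update the states along -dF/ds until they settle at the minimum of F."""
    h = y_hat = 0.0
    for _ in range(steps):
        dh, dy = grad_F(h, y_hat, beta)
        h, y_hat = h - lr * dh, y_hat - lr * dy
    return h, y_hat


for beta in (0.0, 0.1, 1.0, 10.0):
    h, y_hat = relax(beta)
    print(f"beta={beta:5.1f}  h={h:.4f}  y_hat={y_hat:.4f}")
```
</preformat>
<p>With &#x003B2; &#x0003D; 0, the output settles at its free value (about 0.667 with these parameters). As &#x003B2; grows, the output is pulled toward the target Y &#x0003D; 1. Because the coupling between <italic>h</italic> and <italic>&#x00177;</italic> is symmetric, the hidden state also shifts, from about 1.333 to about 1.488 at &#x003B2; &#x0003D; 10. This is how a nudge at the output reaches the hidden layer.</p>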
</sec>
<sec>
<title>2.3. Equilibrium propagation algorithm</title>
<p>Given a training example (<italic><bold>x</bold></italic><sup>(<italic>i</italic>)</sup>, <italic><bold>y</bold></italic><sup>(<italic>i</italic>)</sup>) and <italic><bold>&#x003B8;</bold></italic> in the absence of an external potential (&#x003B2; &#x0003D; 0), the system reaches a state <inline-formula><mml:math id="M3"><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>s</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> that minimizes the internal energy <italic>E</italic>(<italic><bold>&#x003B8;</bold></italic>, <italic><bold>x</bold></italic>, <italic><bold>s</bold></italic>). The cost function <italic>C</italic>(<italic><bold>&#x003B8;</bold></italic>, <italic><bold>x</bold></italic>, <italic><bold>y</bold></italic>, <italic><bold>s</bold></italic>) evaluates the quality of <italic><bold>s</bold></italic><sup>0</sup> in mapping <italic><bold>x</bold></italic><sup>(<italic>i</italic>)</sup> to <italic><bold>y</bold></italic><sup>(<italic>i</italic>)</sup>. 
If <italic><bold>s</bold></italic><sup>0</sup> is not adequate, a force proportional to <inline-formula><mml:math id="M4"><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow></mml:mfrac></mml:math></inline-formula> is applied to drive the output units toward their targets, moving the system to a nearby state <inline-formula><mml:math id="M5"><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>s</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>y</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> that has a lower prediction error. In RBMs, the output units are clamped to the desired values during the second phase of training. In the EP algorithm, by contrast, the output units are only nudged toward the desired values, hence the term <italic>weakly-clamped</italic>. The perturbation at the outputs propagates across the hidden layers, causing the network to relax into a nearby state <italic><bold>s</bold></italic><sup>&#x003B2;</sup>, which is better than <italic><bold>s</bold></italic><sup>0</sup> in terms of prediction error. The subsequent parameter update corresponds to &#x0201C;pushing down&#x0201D; the energy of <italic><bold>s</bold></italic><sup>&#x003B2;</sup> and &#x0201C;pulling up&#x0201D; the energy of <italic><bold>s</bold></italic><sup>0</sup>. 
A demonstration of this is presented in the next section.</p>
<p>The EP training algorithm is presented in <xref ref-type="table" rid="T3">Algorithm 1</xref>. Equation (3) shows how the network parameters are updated after the two phases. It approximates the gradient of the loss function with respect to the parameters by a finite difference in &#x003B2; (hence the <inline-formula><mml:math id="M6"><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula> term). For brevity, the reader is directed to the source material (Scellier and Bengio, <xref ref-type="bibr" rid="B23">2017</xref>) for a detailed derivation of the equation.</p>
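<p>The two phases and the update of Eq. (3) can be sketched on a deliberately simple model. The code below assumes a single output state <italic>s</italic>, a hypothetical quadratic internal energy <italic>E</italic> &#x0003D; <italic>s</italic><sup>2</sup>/2 &#x02212; <italic>s</italic> <italic><bold>w</bold></italic> &#x000B7; <italic><bold>x</bold></italic>, and the squared-error cost <italic>C</italic> &#x0003D; (<italic>s</italic> &#x02212; <italic>y</italic>)<sup>2</sup>/2. Each phase relaxes <italic>s</italic> by gradient descent on <italic>F</italic>, and the parameters are updated only from the two equilibrium states. It is a minimal sketch for intuition, not the EBANA implementation.</p>
<preformat preformat-type="code">
```python
# Minimal EP sketch on a toy quadratic model (hypothetical, not EBANA code):
#   E(w, x, s) = 0.5 * s**2 - s * dot(w, x)   internal energy
#   C(y, s)    = 0.5 * (s - y)**2             cost
#   F          = E + beta * C                 total energy, as in Eq. (2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def settle(w, x, y, beta, s=0.0, lr=0.1, steps=500):
    """Relax the state s by gradient descent on F until it reaches equilibrium."""
    for _ in range(steps):
        dF_ds = (s - dot(w, x)) + beta * (s - y)
        s -= lr * dF_ds
    return s


def dF_dw(x, s):
    # dF/dw = dE/dw = -s * x (the cost term does not depend on w)
    return [-s * xi for xi in x]


def ep_update(w, x, y, beta=0.1, eta=0.5):
    s0 = settle(w, x, y, beta=0.0)           # free phase
    sb = settle(w, x, y, beta=beta, s=s0)    # nudging phase, starting from s0
    g0, gb = dF_dw(x, s0), dF_dw(x, sb)
    # Eq. (3): delta_w proportional to -(1/beta) * (dF/dw at s^beta - dF/dw at s^0)
    return [wi - eta / beta * (gbi - g0i) for wi, gbi, g0i in zip(w, gb, g0)]


# Learn y = 2*x1 - x2 from three samples
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = [0.0, 0.0]
for _ in range(200):
    for x, y in data:
        w = ep_update(w, x, y)
print([round(wi, 3) for wi in w])
```
</preformat>
<p>For this model, the two equilibria satisfy <italic>s</italic><sup>&#x003B2;</sup> &#x02212; <italic>s</italic><sup>0</sup> &#x0003D; &#x003B2;(<italic>y</italic> &#x02212; <italic><bold>w</bold></italic> &#x000B7; <italic><bold>x</bold></italic>)/(1 &#x0002B; &#x003B2;). The EP update therefore equals the gradient-descent step on the squared error scaled by 1/(1 &#x0002B; &#x003B2;), and it coincides with that step as &#x003B2; &#x02192; 0. The weights converge to [2, &#x02212;1].</p>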
<table-wrap position="float" id="T3">
<label>Algorithm 1</label>
<caption><p>Equilibrium propagation.</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left">1. Fix the inputs and allow the system to settle into the state <italic><bold>s</bold></italic><sup>0</sup> that corresponds to a local minimum of <italic>E</italic>(<italic><bold>&#x003B8;</bold></italic>, <italic><bold>x</bold></italic>, <italic><bold>s</bold></italic>), i.e., of <italic>F</italic>(<italic><bold>&#x003B8;</bold></italic>, <italic><bold>x</bold></italic>, <italic><bold>y</bold></italic>, 0, <italic><bold>s</bold></italic>). Collect <inline-formula><mml:math id="M7"><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>y</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>s</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. This is the <bold>free phase</bold>.</td>
</tr>
<tr>
<td valign="top" align="left">2. With the input still fixed, nudge the output units toward their target values. Allow the system to settle into a new, nearby fixed point <italic><bold>s</bold></italic><sup>&#x003B2;</sup> that corresponds to a slightly smaller prediction error. Collect <inline-formula><mml:math id="M8"><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>y</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>s</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. This is the <bold>nudging phase</bold>.</td>
</tr>
<tr>
<td valign="top" align="left">3. Update the parameter <italic><bold>&#x003B8;</bold></italic> according to</td>
</tr>
<tr>
<td valign="top" align="left"><disp-formula id="E3"><label>(3)</label><mml:math id="M9"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mi>&#x00394;</mml:mi><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>&#x0221D;</mml:mo><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>y</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>s</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>x</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>y</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>s</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
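<p>The three steps can be illustrated with a minimal Python sketch. This is not the EBANA implementation. It assumes a hypothetical one-node system with energy E = g1 (s - x)^2/2 + g2 (s - b)^2/2, cost C = (s - y)^2/2 and F = E + &#x003B2;C. It also uses plain gradient descent on the state as a stand-in for the physical settling of the system. The function names <monospace>relax</monospace> and <monospace>ep_gradient</monospace> are illustrative only.</p>

```python
# Minimal EP sketch (illustrative assumption, not EBANA code): a single state s
# tied to an input x through conductance g1 and to a bias b through g2, so
#   E = g1 (s - x)^2 / 2 + g2 (s - b)^2 / 2,  C = (s - y)^2 / 2,  F = E + beta C.


def relax(grad_fn, s=0.0, lr=0.1, steps=2000):
    """Let the state settle by gradient descent (a stand-in for the physics)."""
    for _ in range(steps):
        s -= lr * grad_fn(s)
    return s


def ep_gradient(g1, g2, x, b, y, beta=1e-3):
    """EP estimate of (dC/dg1, dC/dg2) at the free equilibrium."""

    def dF_ds(s, nudge):
        return g1 * (s - x) + g2 * (s - b) + nudge * (s - y)

    def dF_dtheta(s):
        # Only E depends on (g1, g2): dF/dg = (voltage across g)^2 / 2
        return (0.5 * (s - x) ** 2, 0.5 * (s - b) ** 2)

    s_free = relax(lambda s: dF_ds(s, 0.0))               # step 1: free phase
    s_nudged = relax(lambda s: dF_ds(s, beta), s=s_free)  # step 2: nudging phase
    free, nudged = dF_dtheta(s_free), dF_dtheta(s_nudged)
    # Step 3: the bracket of Eq. (3) divided by beta; the parameter update is
    # delta_theta = -learning_rate * (this estimate)
    return tuple((n - f) / beta for n, f in zip(nudged, free))
```

<p>Consider g1 = 2, g2 = 1, x = 1, b = -0.5 and y = 0.2. The free state is s0 = 0.5, and the analytic gradient is ((s0 - y)(x - s0)/(g1 + g2), (s0 - y)(b - s0)/(g1 + g2)) = (0.05, -0.1). The EP estimate agrees with it to within about 1e-4, and the gap shrinks as &#x003B2; decreases.</p>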
<p>Compared to backpropagation, two differences make this approach especially attractive for hardware implementation. First, propagating the errors toward the input does not require a dedicated computational circuit, whereas backpropagation does. Second, the learning rule is local, owing to the sum-separability property of the energy function in physical systems. We return to this point in the next section.</p>
<p>The EP algorithm can be implemented on digital hardware using a discrete-time implementation of the state dynamics (Ji and Gross, <xref ref-type="bibr" rid="B10">2020</xref>). However, this is slow, because each phase requires a long numerical optimization to reach convergence, which is essentially a simulation of the physical system. Since EP is inherently a continuous-time optimization method, this motivates the exploration of analog implementations.</p>
<p>Several works have proposed analog implementations of EP in the context of Hopfield networks (Foroushani et al., <xref ref-type="bibr" rid="B5">2020</xref>; Zoppo et al., <xref ref-type="bibr" rid="B31">2020</xref>). A recent study showed that a class of analog neural networks, non-linear resistive networks, are EBMs. These networks possess an energy function whose stationary point is the steady-state solution of the analog circuit (Kendall et al., <xref ref-type="bibr" rid="B12">2020</xref>). This result provides theoretical grounds for end-to-end hardware that performs both inference and training on the same circuit, and it is the inspiration for our framework.</p>
</sec>
<sec>
<title>2.4. Example: A simple regression model</title>
<p>In this section, we elaborate on the learning process of an EBM by demonstrating the construction of the energy function and how the training process shapes the energy surface. To visualize the actual surface rather than its projection to a lower dimension, we construct a contrived example of a simple regression model that can learn the dataset shown in <xref ref-type="fig" rid="F2">Figure 2A</xref>. This shape was chosen for two reasons: (1) It can be implemented in real circuit components, such as a diode. (2) The pseudo-power of the circuit can be easily calculated.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Example of a learning task in an EBM: <bold>(A)</bold> Dataset that the model learns. Each dot represents a sample from the dataset. The dashed line represents the regression line. <bold>(B)</bold> Circuit model that learns the regression line. &#x003D5; is the non-linear function that relates the voltage across the element to the current through it. The red arrow shows the direction of the current through the non-linear element.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-17-1114651-g0002.tif"/>
</fig>
<p><xref ref-type="fig" rid="F2">Figure 2B</xref> shows the schematic of the model. The input is provided through <italic>V</italic><sub>in</sub> and the output is taken from node <italic>X</italic>. The second input is held at a fixed voltage <italic>V</italic><sub>bias</sub>. Connection to the output node is made through series conductances <italic>G</italic><sub>1</sub> and <italic>G</italic><sub>2</sub>. A non-linear element is attached to node <italic>X</italic>. It has two regions of operation: it behaves as an open-circuit when <italic>V</italic><sub>out</sub>&#x0003C;<italic>V</italic><sub>TH</sub> and as a voltage source, <italic>V</italic><sub>TH</sub>, in series with a resistance <italic>r</italic><sub><italic>on</italic></sub> when <italic>V</italic><sub>out</sub>&#x0003E;<italic>V</italic><sub>TH</sub>.</p>
<p>The &#x0201C;energy function&#x0201D; of non-linear resistive networks is a quantity called the total pseudo-power of the circuit (Johnson, <xref ref-type="bibr" rid="B11">2010</xref>), and its existence can be derived directly from Kirchhoff&#x00027;s laws. Moreover, this energy function has the sum-separability property: the total pseudo-power of the circuit is the sum of the pseudo-powers of its individual elements. It can be shown that the pseudo-power of a two-terminal element with terminals <italic>i</italic> and <italic>j</italic>, characterized by a well-defined and continuous current-voltage characteristic <italic>I</italic><sub><italic>ij</italic></sub> &#x0003D; &#x003D5;<sub><italic>ij</italic></sub>(&#x00394;<italic>V</italic><sub><italic>ij</italic></sub>) is given by</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x0222B;</mml:mo></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>d</mml:mtext><mml:mi>v</mml:mi><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The quantity <italic>p</italic><sub><italic>ij</italic></sub>(&#x00394;<italic>V</italic><sub><italic>ij</italic></sub>) has the physical dimensions of power, being a product of a voltage and a current.</p>
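<p>Equation (4) can be checked numerically. The following sketch evaluates the pseudo-power of a linear resistor and of an idealized thresholded element like the one in Figure 2B. It assumes illustrative element values and a simple trapezoidal rule; the helper names are hypothetical.</p>

```python
# Sketch: pseudo-power of Eq. (4), p(dV) = integral of phi(v) dv from 0 to dV,
# evaluated with the trapezoidal rule. Element values below are assumptions.


def pseudo_power(phi, delta_v, n=10_000):
    """Trapezoidal approximation of the integral of phi over [0, delta_v]."""
    h = delta_v / n  # h is negative when delta_v is negative; the rule still holds
    total = 0.5 * (phi(0.0) + phi(delta_v))
    for i in range(1, n):
        total += phi(i * h)
    return total * h


G = 2e-3                  # example conductance in siemens
V_TH, R_ON = 0.6, 100.0   # example threshold (V) and on-resistance (ohm)


def resistor(v):
    """Linear resistor: I = G * V."""
    return G * v


def threshold_element(v):
    """Open circuit below V_TH; source V_TH in series with R_ON above it."""
    return max(0.0, (v - V_TH) / R_ON)
```

<p>For the linear resistor, the integral reduces to <italic>G</italic>&#x00394;<italic>V</italic><sup>2</sup>/2; for example, <monospace>pseudo_power(resistor, 1.5)</monospace> is approximately 2.25 &#x000D7; 10<sup>-3</sup> W. For the thresholded element it reduces to (max(&#x00394;<italic>V</italic>, <italic>V</italic><sub>TH</sub>) - <italic>V</italic><sub>TH</sub>)<sup>2</sup>/(2<italic>r</italic><sub>on</sub>), which gives 8 &#x000D7; 10<sup>-4</sup> W for &#x00394;<italic>V</italic> = 1 V with these values.</p>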
<p>With the above definition, and the sum-separability property of the energy function, the total pseudo-power of the circuit shown in <xref ref-type="fig" rid="F2">Figure 2B</xref> can now be calculated.</p>
<disp-formula id="E5"><mml:math id="M11"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x0222B;</mml:mo></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">out</mml:mtext></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">in</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:mi>v</mml:mi><mml:mtext>d</mml:mtext><mml:mi>v</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x0222B;</mml:mo></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">out</mml:mtext></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">bias</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:mi>v</mml:mi><mml:mtext>d</mml:mtext><mml:mi>v</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x0002B;</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x0222B;</mml:mo></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">out</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:mo class="qopname">max</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:mi>v</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">TH</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">on</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>d</mml:mtext><mml:mi>v</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E6"><label>(5)</label><mml:math id="M12"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">out</mml:mtext></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">in</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">out</mml:mtext></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">bias</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo class="qopname">max</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">out</mml:mtext></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">TH</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">TH</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">on</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Given <italic><bold>&#x003B8;</bold></italic> &#x0003D; (<italic>G</italic><sub>1</sub>, <italic>G</italic><sub>2</sub>) and <italic>V</italic><sub>in</sub>, the energy function associates with each state <italic>s</italic> &#x0003D; {<italic>V</italic><sub>out</sub>} a real number <italic>E</italic>(<italic><bold>&#x003B8;</bold></italic>, <italic>V</italic><sub><italic>in</italic></sub>, <italic>s</italic>). For a given input, the effective state <inline-formula><mml:math id="M13"><mml:msup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022C6;</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B8;</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">in</mml:mtext></mml:mstyle></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is the state <italic>s</italic> that minimizes the energy function; i.e., <italic>s</italic><sup>&#x022C6;</sup> such that <inline-formula><mml:math id="M14"><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B8;</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022C6;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula>. In a non-linear resistive network with two-terminal components, this equilibrium state is exactly the steady state of the circuit imposed by Kirchhoff&#x00027;s laws.</p>
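<p>The following sketch illustrates that the minimizer of the energy is the steady state of the circuit. It minimizes the total pseudo-power of the Figure 2B circuit over <italic>V</italic><sub>out</sub> and checks the result against Kirchhoff's current law at node <italic>X</italic>. The component values are hypothetical. The minimization uses a golden-section search, chosen only because <italic>E</italic> is convex in <italic>V</italic><sub>out</sub>; it is not how a SPICE simulator solves the circuit.</p>

```python
import math

# Hypothetical component values for the circuit of Figure 2B (illustrative only)
G1, G2 = 1e-3, 2e-3                 # conductances in siemens
V_BIAS, V_TH, R_ON = 1.0, 0.6, 100.0


def energy(v_out, v_in):
    """Total pseudo-power of the circuit as a function of the state V_out."""
    return (G1 * (v_out - v_in) ** 2 / 2
            + G2 * (v_out - V_BIAS) ** 2 / 2
            + (max(v_out, V_TH) - V_TH) ** 2 / (2 * R_ON))


def kcl_residual(v_out, v_in):
    """Net current leaving node X; zero at the circuit's steady state."""
    return (G1 * (v_out - v_in) + G2 * (v_out - V_BIAS)
            + max(0.0, (v_out - V_TH) / R_ON))


def settle(v_in, lo=-10.0, hi=10.0, iters=200):
    """Golden-section search for the V_out that minimizes the (convex) energy."""
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if energy(d, v_in) > energy(c, v_in):
            hi = d   # minimum lies in [lo, d]
        else:
            lo = c   # minimum lies in [c, hi]
    return (lo + hi) / 2
```

<p>With these values, <monospace>settle(-1.0)</monospace> returns approximately 1/3 V, where the non-linear element is off. <monospace>settle(0.0)</monospace> returns approximately 0.615 V, the value Kirchhoff's current law gives with the element conducting. In both cases <monospace>kcl_residual</monospace> at the returned voltage is numerically zero.</p>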
<p><xref ref-type="fig" rid="F3">Figure 3</xref> shows three snapshots of the energy surface during the course of training. In the leftmost plot, initializing the network with random conductances defines an energy surface that associates low energy with states (depicted with red dots on the <italic>xy</italic>-plane) different from the desired ones (depicted with blue dots). The goal of training is to adjust the conductance values to generate an energy surface that associates low energy with the desired states. In some cases, this may not be possible if the energy function is not expressive enough. For instance, there is no set of conductance values that can mold the energy surface to produce equilibrium points defined along a parabola for the circuit in <xref ref-type="fig" rid="F2">Figure 2B</xref>. However, as shown in the rightmost plot, it is possible to obtain a set of conductance values that shape the energy function to produce the regression line in <xref ref-type="fig" rid="F2">Figure 2A</xref>.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Evolution of the energy surface during training for the task in <xref ref-type="fig" rid="F2">Figure 2</xref>. The goal of training is to mold the energy surface so that its minima are associated with points on the blue curve. <bold>(A)</bold> At the beginning of training, the minima of the energy surface are associated with points on the red curve, which depends on the random initialization of the model parameters. <bold>(B)</bold> After 5 epochs, the energy surface has changed shape, creating minima closer to the desired points. <bold>(C)</bold> After 10 epochs, the minima lie on the desired points.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-17-1114651-g0003.tif"/>
</fig>
</sec>
</sec>
<sec id="s3">
<title>3. Exploratory framework</title>
<p>Our framework, EBANA, provides a comprehensive solution for designing and training neural networks in the analog domain. The architecture consists of two main parts: one for defining the network model and the other for training it in the analog domain. A high-level view is shown in <xref ref-type="fig" rid="F4">Figure 4A</xref>.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>High-level organization of EBANA: <bold>(A)</bold> Topologies such as the one shown in <xref ref-type="fig" rid="F5">Figure 5</xref> can be composed with a few lines of Python code. Together with the library files (SPICE models, subcircuits, etc.), the EBANA framework dispatches the generated netlist in batches to the SPICE simulator. EBANA then executes the EP algorithm on the simulation results and keeps track of the evolution of important quantities such as voltages and currents. These can later be studied to reveal trends that help in fine-tuning the learning process and in evaluating the power consumption of the system. <bold>(B)</bold> The implementation of the <monospace>fit</monospace> method closely follows that of a traditional ML training loop. The difference is that a SPICE simulator is used, and the network parameters are updated from the circuit states at only two time steps: the end of the free phase and the end of the nudging phase.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-17-1114651-g0004.tif"/>
</fig>
<p>The interface to EBANA is Python, leveraging its rich ecosystem of libraries for data processing and data analysis. With the exception of circuit simulation, all operations, including netlist generation, gradient computation, and weight updates, are performed in Python.</p>
<p>We employ a SPICE simulator for realistic simulation of the circuit dynamics, with PySpice<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> serving as the bridge between Python and the simulator. PySpice supports two of the most widely used open-source SPICE simulators, Ngspice<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref> and Xyce.<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref> Ngspice is readily available on almost all popular operating systems and is the default simulator for EBANA. Xyce supports large-scale parallel computing platforms, which makes it attractive for complex deep learning problems. The simulator is selected by setting a single global variable. Note that the vanilla build of Ngspice has a subcircuit node limit of 1,000, whereas Xyce has no such limit but must be compiled from source.</p>
<sec>
<title>3.1. Network structure</title>
<p>The process of designing and training a model in our framework starts with defining the model. A typical structure of an analog neural network that can be trained with the EP framework is shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. It consists of an input layer, several hidden layers, and an output layer. It looks similar to a regular neural network that can be trained by the backpropagation algorithm except for two major differences. First, the layers can influence each other bidirectionally; i.e., the information is not processed step-wise from inputs to outputs but in a global way. Second, the output nodes are linked to current sources which serve to inject loss gradient signals during training.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Analog neural network in the EP framework. IL, Input Layer; DL, Dense Layer; NL, Non-linearity Layer; AL, Amplification Layer; CL, Current Layer.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-17-1114651-g0005.tif"/>
</fig>
</sec>
<sec>
<title>3.2. Creating a model</title>
<p>Layers, which correspond to subcircuits in analog circuits, form the core data structure of our framework. They are expressed as Python classes whose constructors create and initialize the pin connections and whose <monospace>call</monospace> methods build the netlist. Model creation is heavily inspired by Keras&#x00027;s functional API because of its flexibility in composing layers in a non-linear fashion. In this manner, the user can construct models with multiple inputs and outputs, and can share, combine, or disable layers. An example is given in <xref ref-type="fig" rid="F6">Figure 6</xref>, which follows the structure shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. In the following subsections, we describe only those layers that have a unique interpretation in the analog domain.</p>
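<p>A minimal sketch of this functional-style composition follows. All class and function names are hypothetical and do not reproduce EBANA's API. Constructors create the node names (pins), and calling a layer on another records the connection. A <monospace>build</monospace> method, playing the role of the <monospace>call</monospace> method described above, emits SPICE netlist lines. Only voltage sources and resistors are generated.</p>

```python
import itertools

_ids = itertools.count()  # unique suffixes for subcircuit names


class Layer:
    """Hypothetical base class: a subcircuit with one output node per unit."""

    def __init__(self, units):
        self.name = f"{type(self).__name__.lower()}{next(_ids)}"
        # The constructor creates the pins (node names) of the subcircuit
        self.nodes = [f"{self.name}_{i}" for i in range(units)]
        self.inputs = []

    def __call__(self, previous):
        # Functional-API style: calling a layer on another connects them
        self.inputs.append(previous)
        return self

    def build(self):
        """Return this layer's SPICE netlist lines (none for the base class)."""
        return []


class Input(Layer):
    def __init__(self, voltages):
        super().__init__(len(voltages))
        self.voltages = list(voltages)

    def build(self):
        # One DC voltage source per input, referenced to ground (node 0)
        return [f"V{n} {n} 0 DC {v}" for n, v in zip(self.nodes, self.voltages)]


class Dense(Layer):
    def __init__(self, units, conductance=1e-3):
        super().__init__(units)
        self.conductance = conductance

    def build(self):
        # Fully connected: one resistor between every input node and output node
        r = 1.0 / self.conductance
        return [f"R_{a}_{b} {a} {b} {r:g}"
                for prev in self.inputs
                for a, b in itertools.product(prev.nodes, self.nodes)]


def netlist(output, title="model"):
    """Walk the layer graph from the output and emit a complete netlist."""
    lines, seen = [f"* {title}"], set()

    def visit(layer):
        if id(layer) in seen:
            return
        seen.add(id(layer))
        for prev in layer.inputs:
            visit(prev)
        lines.extend(layer.build())

    visit(output)
    lines.append(".end")
    return "\n".join(lines)


x = Input([0.5, -0.5])
h = Dense(3)(x)
y = Dense(1)(h)
print(netlist(y))
```

<p>For the two-input, three-hidden-unit, one-output model built at the end of the block, the netlist contains two voltage sources and 2 &#x000D7; 3 + 3 &#x000D7; 1 = 9 resistors.</p>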
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Example of a model in the EBANA framework (Iris model).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-17-1114651-g0006.tif"/>
</fig>
<sec>
<title>3.2.1. Input layer</title>
<p>The input layer defines the number of inputs to the circuit, which are typically represented by voltage sources. Generally, it is defined by the dataset. In the analog domain, however, the input layer is defined slightly differently:</p>
<list list-type="bullet">
<list-item><p>First, since the weights are implemented by resistors and resistances cannot be negative, a second set of inputs, with polarity opposite to the voltages defined in the dataset, is added to the input layer. This accounts for negative weights and effectively doubles the number of inputs and storage elements. This idea (a voltage layer identical to another one but of opposite polarity) is depicted with two green rectangles in <xref ref-type="fig" rid="F5">Figure 5</xref>. Note that this can be avoided by setting the reference voltage to a value other than 0, so that all voltages below the reference are treated as negative. However, this requires shifting all other voltage nodes in the circuit by the new reference value.</p></list-item>
<list-item><p>Second, in typical software-based frameworks, the bias, when used, is implicitly set to 1. However, since circuits can work with a wide range of voltages, setting the bias voltage to values other than 1 is necessary. Hence, we provide the option to independently set the bias voltage in each layer. Note that it is also possible to learn the bias voltages.</p></list-item>
</list>
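<p>The effect of the opposite-polarity inputs and the per-layer bias voltage can be sketched as follows. The helpers are hypothetical, and the sketch assumes a single node held at 0 V. A pair of non-negative conductances fed by +x and -x contributes (g+ - g-)x of current, so the pair acts as a signed effective weight.</p>

```python
def expand_inputs(x, bias_voltage=1.0):
    """Hypothetical helper: append an opposite-polarity copy of every input
    and one bias source (the bias voltage need not be 1)."""
    return list(x) + [-v for v in x] + [bias_voltage]


def node_current(voltages, conductances, node_voltage=0.0):
    """Current flowing into a node held at node_voltage through each conductance."""
    return sum(g * (v - node_voltage) for v, g in zip(voltages, conductances))


# One input x = 0.3 V; the pair (1 mS, 3 mS) acts as a signed weight of -2 mS
x = [0.3]
inputs = expand_inputs(x, bias_voltage=0.5)   # [0.3, -0.3, 0.5]
g = [1e-3, 3e-3, 1e-3]                        # all conductances non-negative
print(node_current(inputs, g))                # (1e-3 - 3e-3) * 0.3 + 1e-3 * 0.5
```

<p>Here the pair realizes an effective weight of -2 mS on x, contributing -0.6 mA. The 0.5 V bias source contributes +0.5 mA, giving a net -0.1 mA into the node.</p>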
</sec>
<sec>
<title>3.2.2. Weight layers</title>
<p>Two kinds of weight layers are defined in the framework: the <monospace>Dense</monospace> layer and the <monospace>LocallyConnected2D</monospace> layer. The <monospace>Dense</monospace> class is the implementation of the fully-connected layer, which means that each neuron of the layer is connected to every neuron of its preceding layer. This connectivity pattern can be easily implemented in crossbar arrays by simply connecting each row of the crossbar array to all columns of the previous layer&#x00027;s crossbar array.</p>
<p>The implementation of fully-connected layers is straightforward, but implementing convolutional layers in the analog domain is challenging, because each filter is connected to only a local region of the previous layer. This connectivity pattern requires more complex wiring in crossbar arrays. It can still be achieved by shifting the inputs and temporarily storing them in buffers, but the dot product then becomes a non-constant-time operation (Boser et al., <xref ref-type="bibr" rid="B4">1991</xref>). To overcome this, we have implemented a variant of the convolutional layer, the <monospace>LocallyConnected2D</monospace> layer. In this layer, the dot product is computed between a section of the input matrix and a filter, with a different filter for each subregion of the input. Since no weights are shared between subregions, the shifting and buffering that weight sharing requires are avoided.</p>
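<p>The difference between weight sharing and local connectivity can be sketched in a few lines of plain Python. The sketch assumes a single channel, stride 1 and no padding; all names are illustrative, not EBANA's. Each output position has its own k &#x000D7; k kernel, which in a crossbar corresponds to its own set of conductances.</p>

```python
def locally_connected_2d(image, filters, k):
    """Single-channel, stride-1, 'valid' locally connected layer.

    filters[i][j] is a separate k x k kernel used only at output position
    (i, j); a convolution would instead reuse one kernel everywhere."""
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for i in range(out_rows):
        for j in range(out_cols):
            kernel = filters[i][j]
            # Dot product between one k x k patch and its own kernel
            out[i][j] = sum(kernel[a][b] * image[i + a][j + b]
                            for a in range(k) for b in range(k))
    return out


def weight_count(rows, cols, k, locally_connected=True):
    """Weights (one conductance each) needed for a single output channel."""
    positions = (rows - k + 1) * (cols - k + 1)
    return positions * k * k if locally_connected else k * k
```

<p>For a 4 &#x000D7; 4 input and 3 &#x000D7; 3 kernels there are four output positions. The locally connected layer therefore needs 36 conductances, compared with the 9 shared weights of a convolution. When all kernels are identical, it produces the same result as a (cross-correlation-style) convolution.</p>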
<p>Weight initialization poses a further problem for analog neural networks. Neural networks are very sensitive to their initial weights, so an appropriate initialization strategy is critical for stable training, and much research has gone into finding optimal strategies (Li et al., <xref ref-type="bibr" rid="B19">2020</xref>). However, since conductances cannot be negative, these methods cannot be applied directly. Hence, although we provide a default initialization range, some experimentation is advised.</p>
</sec>
<sec>
<title>3.2.3. Non-linearity</title>
<p>The non-linearity layer is implemented with a diode in series with a voltage source. We provide two kinds of diodes: a regular diode and a MOS diode. They have the following options:</p>
<list list-type="bullet">
<list-item><p><bold>Diode orientation</bold> (<monospace>direction</monospace>): This specifies the orientation of the anode and cathode of the diode with respect to the voltage source.</p></list-item>
<list-item><p><bold>Bias voltage</bold> (<monospace>bias</monospace>): A non-zero bias value shifts the voltage at which the diode begins to conduct, and therefore alters the shape of the non-linearity.</p></list-item>
<list-item><p><bold>SPICE model</bold> (<monospace>model</monospace>): A text description, passed to the SPICE simulator, that defines the behavior of the diode.</p></list-item>
</list>
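<p>The role of the <monospace>direction</monospace> and <monospace>bias</monospace> options can be sketched with the Shockley diode equation. Here the equation is only an assumed stand-in for whatever SPICE model is supplied through <monospace>model</monospace>. The parameter values (saturation current 1e-14 A, emission coefficient 1, thermal voltage of about 25.85 mV) are generic defaults, not EBANA's, and the function name is hypothetical.</p>

```python
import math


def diode_branch_current(v, bias=0.0, direction=1, i_s=1e-14, n=1.0, v_t=0.02585):
    """Current drawn from a node at voltage v by a diode in series with a DC
    source of value `bias` (Shockley model, an assumed stand-in for a SPICE
    diode model card).

    direction=+1: anode toward the node, so it conducts when v exceeds bias.
    direction=-1: reversed orientation, so it conducts when v is below bias
    (the returned current is then negative, i.e. it flows into the node).
    """
    v_diode = direction * (v - bias)
    return direction * i_s * math.expm1(v_diode / (n * v_t))
```

<p>Changing <monospace>bias</monospace> translates the characteristic along the voltage axis; for example, <monospace>diode_branch_current(0.9, bias=0.3)</monospace> equals <monospace>diode_branch_current(0.6)</monospace> up to rounding. Reversing <monospace>direction</monospace> mirrors the characteristic through the origin.</p>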
</sec>
<sec>
<title>3.2.4. Amplification layer</title>
<p>The dense layer in libraries such as TensorFlow outputs the weighted sum of its inputs, whereas a resistive crossbar array outputs their weighted mean. This reduces the dynamic range of the signal, so amplifiers are needed to restore it as the signal propagates from the input layer to the output layer.</p>
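<p>The weighted-mean behavior follows directly from Kirchhoff's current law. The sketch below compares the voltage of an unloaded crossbar node with the weighted sum that a software dense layer would produce. It uses illustrative values, and the reference conductance <monospace>g_ref</monospace> used to map conductances to software weights is an assumption.</p>

```python
def crossbar_node_voltage(voltages, conductances):
    """Voltage of an unloaded node tied to several sources through conductances.

    KCL gives sum_i g_i (v_i - v_node) = 0, so v_node is the
    conductance-weighted mean of the inputs, not their weighted sum."""
    total_g = sum(conductances)
    return sum(g * v for g, v in zip(conductances, voltages)) / total_g


def weighted_sum(voltages, conductances, g_ref):
    """What a software dense layer would compute with weights w_i = g_i / g_ref."""
    return sum(g / g_ref * v for g, v in zip(conductances, voltages))


v = [1.0, -1.0, 0.5]
g = [1e-3, 2e-3, 1e-3]
mean = crossbar_node_voltage(v, g)   # always between min(v) and max(v)
gain = sum(g) / 1e-3                 # gain that would restore the weighted sum
```

<p>Here the node settles at -0.125 V, whereas the weighted sum with weights (1, 2, 1) is -0.5 V. A gain of sum(g)/g_ref = 4 would restore it.</p>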
<p>The amplification layer is implemented with ideal behavioral sources. It scales voltages in the forward direction by a factor of <italic>A</italic> and currents in the reverse direction by a factor of <inline-formula><mml:math id="M15"><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula>. Without the reverse current, the circuit reduces to a signal-flow model in which the outputs no longer affect the inputs, and the algorithm fails. The <inline-formula><mml:math id="M16"><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula> factor ensures that the gain of the amplifier does not affect the magnitude of the reverse current. That is, a load connected to the output node of the amplifier has the same effect at the input node as the same load connected directly to the input node.</p>
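<p>The role of the 1/<italic>A</italic> factor can be checked with a small model of the ideal amplifier described above. The function name is hypothetical, and the model assumes a resistive load to ground at the amplifier output.</p>

```python
def amplifier_ports(v_in, gain, load_g):
    """Ideal bidirectional amplifier with a resistive load to ground at its output.

    Forward: v_out = gain * v_in.  Reverse: the current drawn from the input
    node is the output current scaled by 1 / gain.
    Returns (v_out, i_in)."""
    v_out = gain * v_in
    i_out = load_g * v_out   # current drawn by the load at the output
    i_in = i_out / gain      # reverse path scaled by 1/A
    return v_out, i_in


# The input sees the same current as if the load were attached there directly
for A in (1.0, 5.0, 20.0):
    v_out, i_in = amplifier_ports(0.2, A, 1e-3)
    print(A, v_out, i_in)    # i_in equals 1e-3 * 0.2 (up to rounding) for every A
```

<p>For any gain, the current drawn from the input node is the load conductance times <monospace>v_in</monospace>. This is exactly what the same load would draw if it were connected to the input directly.</p>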
</sec>
<sec>
<title>3.2.5. Current source layer</title>
<p>This layer adds a current source at each output node to inject current into the network during the nudging phase. It is implemented with ideal current sources, which are set to 0 during the free phase.</p>
</sec>
</sec>
<sec>
<title>3.3. Training</title>
<p>The training process implemented by the <monospace>fit</monospace> method is illustrated in <xref ref-type="fig" rid="F4">Figure 4B</xref>.</p>
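<p>A loop of this shape can be sketched as follows. This is a hypothetical illustration, not EBANA's <monospace>fit</monospace> method. The closed-form <monospace>simulate</monospace> function stands in for the SPICE run of a one-node circuit without the non-linear element. The EP update of Eq. (3) is applied per sample, with clipping to keep the conductances positive.</p>

```python
import random


def simulate(g1, g2, v_in, v_bias, beta=0.0, target=0.0):
    """Stand-in for the SPICE run: steady state of a hypothetical one-node
    circuit (g1 to v_in, g2 to v_bias, no diode) with a nudging current
    beta * (target - v) injected at the output node."""
    return (g1 * v_in + g2 * v_bias + beta * target) / (g1 + g2 + beta)


def fit(data, g1=1.0, g2=1.0, v_bias=1.0, beta=1e-2, lr=0.5, epochs=50,
        g_min=1e-3, seed=0):
    """Hypothetical EP training loop shaped like a conventional fit method."""
    rng = random.Random(seed)
    samples = list(data)
    history = []
    for _ in range(epochs):
        rng.shuffle(samples)
        for v_in, target in samples:
            s0 = simulate(g1, g2, v_in, v_bias)                # free phase
            sb = simulate(g1, g2, v_in, v_bias, beta, target)  # nudging phase
            # dE/dg = (voltage across g)^2 / 2, compared at the two equilibria
            grad1 = ((sb - v_in) ** 2 - (s0 - v_in) ** 2) / (2 * beta)
            grad2 = ((sb - v_bias) ** 2 - (s0 - v_bias) ** 2) / (2 * beta)
            # Gradient step, clipped so that conductances stay positive
            g1 = max(g_min, g1 - lr * grad1)
            g2 = max(g_min, g2 - lr * grad2)
        loss = sum((simulate(g1, g2, x, v_bias) - y) ** 2 / 2
                   for x, y in samples) / len(samples)
        history.append(loss)
    return g1, g2, history
```

<p>Consider data generated by such a circuit with g1/(g1 + g2) = 0.75, for example y = 0.75x + 0.25 with <monospace>v_bias</monospace> = 1 V. On such data the recorded loss decreases over the epochs, and the conductance ratio approaches 0.75. Only the ratio is identifiable, since scaling both conductances leaves the output unchanged.</p>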
<sec>
<title>3.3.1. Weight gradient calculation</title>
<p>The currents injected during the nudging phase are derived from the gradient of the chosen loss function. For instance, for the mean squared error (MSE), the loss is given by <inline-formula><mml:math id="M17"><mml:mi>C</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, where <italic>k</italic> is the index of an output node, &#x00176;<sub><italic>k</italic></sub> is the output of that node, and <italic>Y</italic><sub><italic>k</italic></sub> is the target value. Other loss functions, such as the cross-entropy loss, are also available.</p>
<p>The current injected into output node <italic>k</italic> is the derivative of the loss with respect to that node, scaled by -&#x003B2;: i.e., <inline-formula><mml:math id="M18"><mml:mo>-</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:msub><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></inline-formula>. As in gradient descent, the negative sign makes the injected current push the output in the direction that decreases the loss.</p>
<p>To accommodate the constraint of non-negative weights, the number of output nodes is doubled. That is, the output &#x00176;<sub><italic>k</italic></sub> is represented as the difference between two nodes: <inline-formula><mml:math id="M19"><mml:msub><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula>. The currents <inline-formula><mml:math id="M20"><mml:msubsup><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M21"><mml:msubsup><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> injected into nodes <inline-formula><mml:math id="M22"><mml:msubsup><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M23"><mml:msubsup><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula>, respectively, are:</p>
<disp-formula id="E7"><label>(6)</label><mml:math id="M24"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:msubsup><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:msubsup><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x00176;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
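<p>The nudging currents of Equation (6) can be computed with a few lines of NumPy. This is a minimal sketch under an assumption: the loss is taken to be the squared error <italic>C</italic> = 0.5 &#x000B7; &#x02211;<sub><italic>k</italic></sub> (&#x00176;<sub><italic>k</italic></sub> &#x02212; <italic>Y</italic><sub><italic>k</italic></sub>)<sup>2</sup>, which is the loss consistent with Equation (6). The function and argument names are hypothetical and are not part of EBANA. Because &#x00176;<sub><italic>k</italic></sub> depends on the difference of the two nodes, the two currents are always equal in magnitude and opposite in sign.</p>

```python
import numpy as np


def nudging_currents(y_plus, y_minus, target, beta):
    """Currents for the paired output nodes during the nudging phase (Equation 6).

    Assumes C = 0.5 * sum_k (Y_hat_k - Y_k)**2 with Y_hat_k = y_plus[k] - y_minus[k].
    """
    y_plus, y_minus, target = (np.asarray(a, dtype=float) for a in (y_plus, y_minus, target))
    error = (y_plus - y_minus) - target   # dC/dY_hat_k
    i_plus = -beta * error                # beta * (Y_k + Y_hat_k^- - Y_hat_k^+)
    i_minus = beta * error                # beta * (Y_hat_k^+ - Y_hat_k^- - Y_k)
    return i_plus, i_minus
```

<p>For example, <monospace>nudging_currents([0.3, 0.1], [0.1, 0.2], [1.0, 0.0], beta=0.5)</monospace> pushes the first pair toward a larger difference (positive current into the + node) because its output is below the target.</p>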
</sec>
<sec>
<title>3.3.2. Weight update</title>
<p>During the free phase, the current sources at the output nodes are set to 0. The inputs are applied and the circuit is allowed to settle. We then collect the node voltages <inline-formula><mml:math id="M25"><mml:msup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and calculate the voltage drop <inline-formula><mml:math id="M26"><mml:mi>&#x00394;</mml:mi><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> across each conductance.</p>
<p>In the nudging phase, the current given by Equation (6) is injected into each output node. After the circuit settles, we collect the node voltages <inline-formula><mml:math id="M27"><mml:msup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and calculate the voltage drop <inline-formula><mml:math id="M28"><mml:mi>&#x00394;</mml:mi><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> across each conductance once again. We then update each conductance according to the equation below (Kendall et al., <xref ref-type="bibr" rid="B12">2020</xref>).</p>
<disp-formula id="E8"><label>(7)</label><mml:math id="M29"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003B1; is the learning rate.</p>
<p>The weight update rule defined by Equation (7) is one of the options available in the <monospace>optimizer</monospace> class, under the name <monospace>SGD</monospace> (stochastic gradient descent). Other weight update mechanisms, such as <monospace>SGDMomentum</monospace> (stochastic gradient descent with momentum) and <monospace>ADAM</monospace>, are also available.</p>
<p>The momentum method can speed up training in nearly flat regions of the solution space by adding to the conductance update a history of the gradients encountered in previous updates. The <monospace>ADAM</monospace> update rule takes this idea one step further by adapting a separate learning rate for each conductance from running averages of its gradient and squared gradient, thereby damping the updates of conductances with large gradients and boosting those with small gradients.</p>
<p>During the early stages of training, when the conductances change rapidly, the update term <inline-formula><mml:math id="M30"><mml:mfrac><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:msubsup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> can exceed <italic>G</italic><sub><italic>ij</italic></sub>, driving some conductances negative. Negative conductances are unphysical and cause numerical instability in the simulation. To address this issue, all conductance values that fall below a certain threshold are clipped to that threshold.</p>
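<p>For concreteness, the sketch below writes out generic, textbook forms of momentum and Adam acting on conductances. It is not EBANA's optimizer implementation: the class names, the helper <monospace>ep_gradient</monospace> (which treats the bracketed term of Equation (7) divided by &#x003B2; as a gradient estimate, so that plain SGD with learning rate &#x003B1; recovers Equation (7)), and the default hyperparameters (the values commonly used for Adam) are assumptions for illustration. Adam's decay rates <monospace>beta1</monospace> and <monospace>beta2</monospace> are unrelated to the nudging factor &#x003B2;.</p>

```python
import numpy as np


def ep_gradient(dv_free, dv_nudged, beta):
    """Bracketed term of Equation (7) divided by the nudging factor beta.

    With this estimate, Equation (7) reads: new G = G - alpha * ep_gradient(...).
    """
    return (np.asarray(dv_nudged) ** 2 - np.asarray(dv_free) ** 2) / beta


class MomentumSketch:
    """SGD with momentum on conductances (generic form, not EBANA's class)."""

    def __init__(self, lr, momentum=0.9):
        self.lr, self.momentum, self.velocity = lr, momentum, None

    def step(self, g, grad):
        if self.velocity is None:
            self.velocity = np.zeros_like(g, dtype=float)
        # velocity keeps an exponentially decaying history of past updates
        self.velocity = self.momentum * self.velocity - self.lr * grad
        return g + self.velocity


class AdamSketch:
    """Adam on conductances: one adaptive step size per conductance."""

    def __init__(self, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, g, grad):
        if self.m is None:
            self.m = np.zeros_like(g, dtype=float)
            self.v = np.zeros_like(g, dtype=float)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad       # running mean of gradients
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2  # running mean of squares
        m_hat = self.m / (1 - self.beta1 ** self.t)                   # bias corrections
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # dividing by sqrt(v_hat) damps large gradients and boosts small ones
        return g - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

<p>On the first Adam step the bias-corrected ratio reduces to roughly the sign of the gradient, so every conductance moves by about the learning rate regardless of its gradient's magnitude; this is the equalizing effect described above.</p>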
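<p>Equation (7) together with the clipping step can be sketched as a vectorized NumPy update, one array entry per conductance. The threshold <monospace>g_min</monospace> is a hypothetical value chosen for illustration; the text does not specify the threshold used, and the function name is not part of EBANA.</p>

```python
import numpy as np


def ep_conductance_update(g, dv_free, dv_nudged, alpha, beta, g_min=1e-9):
    """Apply Equation (7) to every conductance, then clip at g_min (hypothetical value)."""
    g = np.asarray(g, dtype=float)
    dv_free = np.asarray(dv_free, dtype=float)
    dv_nudged = np.asarray(dv_nudged, dtype=float)
    step = (alpha / beta) * (dv_nudged ** 2 - dv_free ** 2)
    # a large positive step would make a conductance negative; clip it instead
    return np.maximum(g - step, g_min)
```

<p>With <monospace>g = [1e-6, 1e-6]</monospace>, free drops of 0.1 V, nudged drops of 0.05 V and 0.5 V, &#x003B1; = 10<sup>&#x02212;5</sup> and &#x003B2; = 0.1, the first conductance grows to 1.75 &#x000B7; 10<sup>&#x02212;6</sup> S while the second would become negative and is clipped to <monospace>g_min</monospace>.</p>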
</sec>
</sec>
<sec>
<title>3.4. Parallelism</title>
<p>Training with EP requires performing a free phase and a nudging phase, after which the conductances are updated. Both phases are simulated sequentially in SPICE and form the critical path of the pipeline. While SPICE simulations will always be time-consuming, the overall simulation time can be reduced by running many simulations in parallel. This is possible because all samples in a mini-batch are independent and can therefore be simulated independently. As a result, the simulation time per mini-batch could, in theory, be reduced to the time it takes to simulate a single sample.</p>
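<p>Because the samples are independent, a mini-batch can be dispatched to a pool of workers and the per-sample results averaged afterwards. The sketch below uses a thread pool from Python's <monospace>concurrent.futures</monospace>, which is adequate when each worker mostly waits on an external simulator process; this is an assumption for illustration and not necessarily how EBANA schedules its SPICE runs. The callable <monospace>simulate_sample</monospace> is hypothetical.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(samples, simulate_sample, max_workers=4):
    """Simulate the samples of one mini-batch concurrently and average the results.

    simulate_sample is a hypothetical callable that runs the free and nudging
    phases for one sample (e.g. by launching an external SPICE process) and
    returns a sequence of per-conductance gradient estimates.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_sample = list(pool.map(simulate_sample, samples))  # input order is preserved
    n = len(per_sample)
    # average each conductance's estimate over the batch
    return [sum(values) / n for values in zip(*per_sample)]
```

<p>For instance, with a dummy simulator, <monospace>run_batch([1.0, 2.0, 3.0], lambda x: [x, 2 * x])</monospace> returns <monospace>[2.0, 4.0]</monospace>.</p>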
</sec>
</sec>
<sec id="s4">
<title>4. Evaluation</title>
<p>In this section, we evaluate our framework focusing on three aspects: correctness, extensibility, and performance.</p>
<sec>
<title>4.1. Illustrative example: Learning the iris dataset</title>
<p>As a first step in the evaluation, we built a model that could learn the Iris dataset.<xref ref-type="fn" rid="fn0005"><sup>5</sup></xref> This is a well-known problem of moderate complexity, containing 150 samples with 4 input variables and 1 output variable that takes 3 distinct values.</p>
<p>Two preprocessing steps are needed before the data is ready for training. First, the input variables have to be normalized. Second, we associate with each unique output value a 3-bit one-hot encoded value. Hence, after the preprocessing step, the dataset has 4 inputs and 3 outputs.</p>
<p>We constructed a model with 1 input layer, 1 hidden layer, and 1 output layer, as shown in <xref ref-type="fig" rid="F6">Figure 6</xref>. The input layer has 9 nodes: 4 for the regular inputs, 4 for the inverted inputs, and 1 for the bias. In the preprocessing step, the data was scaled to real values in the range [&#x02212;0.5 V, 0.5 V] so that it is compatible with modern CMOS process voltages.</p>
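<p>The preprocessing and input-layer construction can be sketched as follows. Several details are assumptions, since the text states only that the inputs are normalized and scaled to [&#x02212;0.5 V, 0.5 V]: we assume per-feature min-max scaling, we model the inverted inputs as sign-flipped voltages, and the bias voltage <monospace>v_bias</monospace> is a hypothetical value. Min-max scaling requires that no feature be constant, which holds for Iris.</p>

```python
import numpy as np


def preprocess_iris(features, labels, n_classes=3, v_bias=0.5):
    """Build input-layer voltages and one-hot targets for the Iris example.

    Assumptions (not specified in the text): per-feature min-max scaling,
    inverted inputs as sign-flipped voltages, and the bias value v_bias.
    """
    x = np.asarray(features, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    x = (x - lo) / (hi - lo) - 0.5                   # each feature now spans [-0.5, 0.5] V
    bias = np.full((x.shape[0], 1), v_bias)          # one bias input per sample
    inputs = np.hstack([x, -x, bias])                # 4 regular + 4 inverted + 1 bias = 9
    targets = np.eye(n_classes)[np.asarray(labels)]  # 3-bit one-hot encoding
    return inputs, targets
```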
<p>The hidden layer was implemented with 10 nodes and the output layer with 6 nodes. The weights were initialized from samples drawn randomly from the range <inline-formula><mml:math id="M31"><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:msup><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>7</mml:mn></mml:mrow></mml:msup><mml:mtext>S</mml:mtext><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:mn>8</mml:mn><mml:mo>&#x000B7;</mml:mo><mml:mn>1</mml:mn><mml:msup><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>8</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo class="qopname">in</mml:mo></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo class="qopname">out</mml:mo></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac><mml:mtext>S</mml:mtext></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, where <italic>n</italic><sub>in</sub> is the size of the inputs and <italic>n</italic><sub>out</sub> is the number of nodes. The learning rate of both layers was set to 4&#x000B7;10<sup>&#x02212;4</sup>.</p>
<p>The dataset was split into two parts: 105 samples for training and 45 samples for evaluating the model on unseen data during training. The optimizer was set to <monospace>ADAM</monospace> and the model was trained for 400 iterations, achieving an accuracy of 100% on the test dataset. The loss and accuracy as a function of the number of training epochs are shown in <xref ref-type="fig" rid="F8">Figure 8A1</xref>. This validates the correctness of our framework.</p>
</sec>
<sec>
<title>4.2. Effect of model parameters on model performance</title>
<p>The non-linearity of activation functions used in deep learning models is crucial for the learning process. Without it, the model reduces to a linear composition of layers. In terms of the energy surface, this means that all the equilibrium points lie along a straight line, preventing the model from capturing anything but linear responses. Adding non-linearities creates a much richer energy surface that greatly enhances the model&#x00027;s capability to learn.</p>
<p>Our model incorporates non-linearity through the non-linear current-voltage (I-V) characteristics of a diode, as shown by the blue curve in <xref ref-type="fig" rid="F7">Figure 7</xref>. This curve resembles the ReLU function commonly used in deep learning, but with two key differences:</p>
<list list-type="simple">
<list-item><p>(1) The plot here represents the current-voltage transfer function, not the voltage-voltage transfer function. When the voltages applied in the circuit are below the knee of the diode&#x00027;s I-V curve, the diode draws minimal current, resulting in a nearly linear circuit. Non-linearity only arises when the operating point is above the knee of the curve and the diode begins to draw current, which is the opposite of the behavior of the ReLU function.</p></list-item>
<list-item><p>(2) Because of loading effects in the analog domain, the voltage at the output node of the diode is a non-linear function of the entire circuit, not just the layers preceding it. This has two implications: (a) To know the voltage at the output node of the diode, the entire circuit has to be solved. (b) The shape of the non-linearity (or the voltage at which the diode saturates) is affected by the circuit parameters.</p></list-item>
</list>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Illustration of the operating point of the circuit as <italic>R</italic><sub>Thev</sub> is changed, for both an ideal diode and a real diode. For a real diode, the smaller the resistances in the dense layer, the higher the voltage at which the diodes saturate, i.e., the weaker the non-linearity over the same voltage range. Although amplifiers with lower gains are then needed, the smaller resistances let a larger current flow through the circuit, which increases the power consumption.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-17-1114651-g0007.tif"/>
</fig>
<p>We can gain some insight into the non-linear behavior of the diode by modeling the circuit around it with a Thevenin voltage (<italic>V</italic><sub>Thev</sub>) and a Thevenin resistance (<italic>R</italic><sub>Thev</sub>). In this case, the operating point (Q-point) of the circuit is the intersection of the I-V characteristic of the diode with the load line, given by the equation <inline-formula><mml:math id="M32"><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">Thev</mml:mtext></mml:mstyle></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">Thev</mml:mtext></mml:mstyle></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></inline-formula>, where <italic>I</italic><sub><italic>D</italic></sub> is the current through the diode and <italic>V</italic><sub><italic>D</italic></sub> is the voltage across it. For an ideal diode, <italic>V</italic><sub><italic>D</italic></sub> &#x0003D; const, so the diode saturates at the same voltage irrespective of the value of <italic>R</italic><sub>Thev</sub>, much as a fixed activation function would. Real diodes, however, have a series resistance that adds to <italic>R</italic><sub>Thev</sub>, so the voltage at which they saturate depends on <italic>R</italic><sub>Thev</sub>. This relation affects the usable input dynamic range, the gain needed for the amplifiers, and the overall power consumption of the circuit. Here, we investigate how these factors interact and affect the model performance.</p>
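<p>The load-line construction can be reproduced numerically. The sketch below finds the Q-point by bisection for a Shockley diode with a series resistance; the diode parameters (saturation current, ideality factor, thermal voltage, 10 &#x003A9; series resistance) and the 1 V Thevenin voltage are illustrative assumptions, not values from our experiments. In this model both the exponential junction characteristic and the series resistance contribute to the shift of the Q-point. The upper bisection bound is the junction voltage at which the diode alone would carry the full short-circuit current <italic>V</italic><sub>Thev</sub>/<italic>R</italic>, which also keeps the exponential from overflowing.</p>

```python
import math


def diode_q_point(v_thev, r_thev, i_s=1e-14, n=1.0, v_t=0.02585, r_s=10.0):
    """Intersect a diode I-V curve with the load line I = (V_Thev - V_D) / R_Thev.

    Diode model (an assumption): Shockley junction in series with resistance r_s.
    Returns (V_D, I_D), where V_D includes the drop across r_s.
    """
    if not v_thev > 0:
        raise ValueError("this sketch assumes a forward-biasing V_Thev")
    r_total = r_thev + r_s

    def excess_current(v_j):
        # junction current minus the current the load line can deliver at v_j
        return i_s * math.expm1(v_j / (n * v_t)) - (v_thev - v_j) / r_total

    lo = 0.0                                              # excess_current(lo) is negative
    hi = n * v_t * math.log1p(v_thev / (r_total * i_s))   # excess_current(hi) is positive
    for _ in range(100):                                  # bisection on the junction voltage
        mid = 0.5 * (lo + hi)
        if excess_current(mid) > 0:
            hi = mid
        else:
            lo = mid
    v_j = 0.5 * (lo + hi)
    i_d = (v_thev - v_j) / r_total
    return v_j + i_d * r_s, i_d


for r in (1e7, 1e5, 1e3):   # R_Thev shrinking, as when conductances are scaled up
    v_d, i_d = diode_q_point(1.0, r)
    print(f"R_Thev = {r:.0e} ohm: V_D = {v_d:.3f} V, I_D = {i_d:.2e} A, P = {1.0 * i_d:.2e} W")
```

<p>As <italic>R</italic><sub>Thev</sub> decreases, the diode voltage at the Q-point rises and the source delivers more current, and hence more power, which is the trade-off discussed next.</p>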
<p><xref ref-type="fig" rid="F7">Figure 7</xref> shows how the Q-point of the circuit changes as <italic>R</italic><sub>Thev</sub> is changed. For a fixed <italic>V</italic><sub>Thev</sub>&#x0003E;<italic>V</italic><sub>TH</sub>, increasing the resistance reduces the voltage at which the diode saturates. This reduces the dynamic range of the signal, forcing the use of amplifiers with higher gains. While the actual behavior of the circuit is more complex, this insight guides our search of the parameter space for better initial points. To test this hypothesis, we designed an experiment similar to the one in the previous section, but with the conductances multiplied by 10<sup>4</sup>. The distributions of the conductances (or resistances) in the first and second experiments are shown in <xref ref-type="fig" rid="F8">Figures 8C1</xref>, <xref ref-type="fig" rid="F8">C2</xref>, respectively. The values of beta (&#x003B2;) and the learning rates (&#x003B1;) have to be scaled by roughly the same factor. We obtained an accuracy of 93% after 200 epochs (<xref ref-type="fig" rid="F8">Figure 8A2</xref>), compared to 100% in the first experiment (<xref ref-type="fig" rid="F8">Figure 8A1</xref>). The loss in accuracy can be explained by the weaker non-linearity in the second experiment, which results from the smaller resistances in the dense layer. To support this claim, bias voltages were applied so that the diodes saturate 0.05 V earlier. With this modification, the accuracy improved to 100% (<xref ref-type="fig" rid="F8">Figure 8A3</xref>). The distributions of the voltages at the input of the amplifier for the three cases are shown in <xref ref-type="fig" rid="F8">Figures 8B1</xref>&#x02013;<xref ref-type="fig" rid="F8">B3</xref>. The weaker non-linearity in the second experiment resulted in a voltage distribution with a higher density around 0 V, whereas in the other two the density is highest around the saturation voltages. Finally, even though the adjustment made to the second experiment improved the accuracy, the power consumption of the circuit is roughly 10<sup>4</sup> times higher than in the first experiment (<xref ref-type="fig" rid="F8">Figure 8D</xref>).</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>A plot of the loss/accuracy for experiments 1, 2, and 3 is shown in <bold>(A1&#x02013;A3)</bold>. <bold>(B1&#x02013;B3)</bold> Show the distributions of the voltages at the amplifier inputs. <bold>(C1, C2)</bold> Show the resistance distribution of the first dense layer at the end of training. <bold>(D)</bold> Compares the power consumption of the circuit during training for experiments 1 and 3.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-17-1114651-g0008.tif"/>
</fig>
</sec>
<sec>
<title>4.3. Mixed-signal application</title>
<p>In this section, we explore integrating a digital component into our analog model, in a so-called mixed-signal design. The idea is depicted in <xref ref-type="fig" rid="F9">Figure 9</xref>. The input, for example a high-dimensional image, is fed to the digital block, which preprocesses it and embeds it in a lower-dimensional space before passing it to the analog block. The motivation is as follows: a convolutional layer reuses the same input data and a relatively small number of weights over many sequential operations, whereas a fully connected layer typically involves a much larger number of weights with no input data reuse. Because convolutional layers tend to be compute-bound while fully connected layers are bound by memory bandwidth, it is advantageous to implement convolutional layers in digital and fully connected layers in analog, where the weights are stored as conductances and need not be fetched from memory.</p>
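<p>The weight-reuse argument can be made concrete by counting multiply-accumulate (MAC) operations per weight. The sketch assumes the first Conv2D layer of <xref ref-type="table" rid="T1">Table 1</xref> applied to a 28 &#x000D7; 28 grayscale Fashion-MNIST image with stride 1 and no padding (the padding is not stated in the table), and a fully connected layer from a 150-dimensional embedding to 100 hidden units, matching the sizes reported for the analog block in this section. Bias terms are ignored and the function names are illustrative.</p>

```python
def conv2d_cost(height, width, c_in, c_out, kernel):
    """MACs and weights of a stride-1 Conv2D layer without padding (bias ignored)."""
    h_out, w_out = height - kernel + 1, width - kernel + 1
    weights = kernel * kernel * c_in * c_out
    macs = h_out * w_out * weights   # each weight is reused at every output position
    return macs, weights


def dense_cost(n_in, n_out):
    """MACs and weights of a fully connected layer (bias ignored)."""
    weights = n_in * n_out
    return weights, weights          # each weight is used exactly once per sample


conv_macs, conv_weights = conv2d_cost(28, 28, 1, 8, 5)   # first Conv2D of Table 1
dense_macs, dense_weights = dense_cost(150, 100)          # 150-d embedding -> 100 hidden nodes
print(conv_macs / conv_weights)    # 576.0 MACs per weight
print(dense_macs / dense_weights)  # 1.0 MAC per weight
```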
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>High level view of integrating a digital neural network with an analog neural network based on the EP algorithm. The digital block receives a high-dimensional input, downscales it, and feeds the result to the analog block, which does the bulk of the computation. The analog block &#x0201C;backpropagates&#x0201D; error signals to the digital block such that the parameters of both blocks are adjusted in the direction that reduces the energy of the system.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-17-1114651-g0009.tif"/>
</fig>
<p>The proposed mixed-signal implementation was evaluated on the Fashion-MNIST dataset. Fashion-MNIST is a popular machine learning benchmark task that improves on MNIST by introducing a harder problem, increasing the diversity of testing sets, and more accurately representing a modern computer vision task.</p>
<p>In our approach, Keras was used to play the role of the digital block while EBANA acted as the analog block. The process of training the mixed-signal system is as follows:</p>
<list list-type="order">
<list-item><p>We built a model with the parameters shown in <xref ref-type="table" rid="T1">Table 1</xref> and trained it for 20 epochs. We achieved an accuracy of 90% on the test dataset.</p></list-item>
<list-item><p>Using the trained model, we passed the entire Fashion-MNIST dataset through the layers of the model and collected the output of the Flatten layer. This step embeds each input vector from a dimension of 784 into a dimension of 150.</p></list-item>
<list-item><p>We then trained an analog model similar to the one in <xref ref-type="fig" rid="F6">Figure 6</xref> but with 100 nodes in the hidden layer. We stopped training after 1 epoch, at which point the model achieved an accuracy of 85% on the test dataset using the cross-entropy loss.</p></list-item>
<list-item><p>Using the trained analog model, we then trained the inputs (i.e., performed gradient descent on the inputs while keeping the analog model fixed) until we achieved an accuracy of 100%. This new set of inputs represents what the analog block expects from the digital block if the accuracy is to be improved.</p></list-item>
<list-item><p>Back in Keras, a new model was trained using the original dataset but with the objective of producing the trained inputs from the previous step. We then repeated steps 2 and 3. The accuracy improved by 3%.</p></list-item>
</list>
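<p>Step 4, gradient descent on the inputs with the model held fixed, can be illustrated with a differentiable stand-in. In the sketch below, a frozen softmax-linear classifier replaces the trained analog block; this substitution is an assumption made only so that the input gradient has a closed form, and how the corresponding gradient is obtained from the analog circuit is not shown here. The function and parameter names are hypothetical.</p>

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_inputs(x, targets, weights, bias, lr=0.5, max_steps=1000):
    """Gradient descent on the inputs of a frozen softmax-linear stand-in model.

    Only x changes; weights and bias stay fixed, as the trained analog block's
    conductances would. Stops once every sample is classified correctly.
    """
    x = np.array(x, dtype=float)           # work on a copy of the inputs
    labels = targets.argmax(axis=1)
    for _ in range(max_steps):
        probs = softmax(x @ weights + bias)
        if np.array_equal(probs.argmax(axis=1), labels):
            break
        # gradient of the summed cross-entropy with respect to each input vector
        x -= lr * (probs - targets) @ weights.T
    return x
```

<p>The returned array plays the role of the "trained inputs" that the digital block is subsequently trained to produce in step 5.</p>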
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Keras model for Fashion-MNIST dataset.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Layer</bold></th>
<th valign="top" align="center"><bold>Parameters</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Conv2D</td>
<td valign="top" align="center">Filters = 8, kernel_size = 5, activation = &#x0201C;relu&#x0201D;</td>
</tr> <tr>
<td valign="top" align="left">MaxPooling2D</td>
<td valign="top" align="center">Pool_size = 2</td>
</tr> <tr>
<td valign="top" align="left">Conv2D</td>
<td valign="top" align="center">Filters = 8, kernel_size = 5, activation = &#x0201C;relu&#x0201D;</td>
</tr> <tr>
<td valign="top" align="left">MaxPooling2D</td>
<td valign="top" align="center">Pool_size = 2</td>
</tr> <tr>
<td valign="top" align="left">Flatten</td>
<td valign="top" align="center">&#x02013;</td>
</tr> <tr>
<td valign="top" align="left">Dropout</td>
<td valign="top" align="center"><italic>p</italic> = 0.15</td>
</tr> <tr>
<td valign="top" align="left">BatchNormalization</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Dense</td>
<td valign="top" align="center">Units = 512, activation = &#x0201C;relu&#x0201D;</td>
</tr> <tr>
<td valign="top" align="left">Dense</td>
<td valign="top" align="center">Units = 10, activation = &#x0201C;softmax&#x0201D;</td>
</tr></tbody>
</table>
</table-wrap>
<p>The result of this experiment shows that we can &#x0201C;backpropagate&#x0201D; through the analog layer, opening the possibility of a full-fledged mixed-signal implementation where the analog block benefits from the preprocessing opportunities available in the digital domain.</p>
</sec>
<sec>
<title>4.4. Extensibility</title>
<p>Beyond the design of fully functional ANNs, the EBANA framework provides sufficient encapsulation and extensibility to incorporate new models and to extend its functionality beyond Energy-Based Models. This includes adding new layers, defining new loss functions, changing the training loop, and more. The only constraint on new components is that they must be built from linear and non-linear dipoles (two-terminal elements) to ensure stability, as stated by Johnson (<xref ref-type="bibr" rid="B11">2010</xref>).</p>
<p>To demonstrate the extensibility of our framework, consider the example shown in <xref ref-type="fig" rid="F10">Figure 10</xref>. By subclassing the <monospace>SubCircuit</monospace> class, a new kind of non-linearity can be defined with just a few lines of code using MOSFET transistors and voltage sources.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>Example of defining a new kind of layer.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fncom-17-1114651-g0010.tif"/>
</fig>
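<p>EBANA's actual <monospace>SubCircuit</monospace> interface is the one shown in <xref ref-type="fig" rid="F10">Figure 10</xref>. Purely to illustrate the subclassing pattern, the self-contained sketch below uses a hypothetical base class, <monospace>ToySubCircuit</monospace>, whose subclasses only list their SPICE element lines. It builds one plausible MOSFET-based non-linearity: a diode-connected NMOS transistor (drain tied to gate) per unit, with its source held at a bias voltage by a voltage source. The class <monospace>ToyMOSDiode</monospace>, the node names, the model name <monospace>nmos_model</monospace>, and the device sizes are illustrative assumptions, and the MOSFET model card would have to be supplied separately.</p>

```python
class ToySubCircuit:
    """Hypothetical base class: a subclass only lists its SPICE element lines."""

    name = "toy"

    def __init__(self, units):
        self.units = units

    def elements(self):
        raise NotImplementedError("subclasses return a list of SPICE element lines")

    def netlist(self):
        ports = " ".join(f"in{i}" for i in range(self.units))
        return "\n".join([f".subckt {self.name} {ports}", *self.elements(), f".ends {self.name}"])


class ToyMOSDiode(ToySubCircuit):
    """One diode-connected NMOS per unit, its source held at a bias voltage."""

    name = "mosdiode"

    def __init__(self, units, v_bias=0.0, model="nmos_model", width="1u", length="180n"):
        super().__init__(units)
        self.v_bias, self.model = v_bias, model
        self.width, self.length = width, length

    def elements(self):
        lines = []
        for i in range(self.units):
            # M-element order is drain gate source bulk model: drain and gate share in{i}
            lines.append(f"M{i} in{i} in{i} b{i} 0 {self.model} W={self.width} L={self.length}")
            # DC voltage source fixing the potential of the transistor's source node
            lines.append(f"V{i} b{i} 0 DC {self.v_bias}")
        return lines


print(ToyMOSDiode(2, v_bias=0.1).netlist())
```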
<p>Our library is modular, so components can easily be plugged in or swapped out. For instance, to investigate the circuit behavior with this new kind of non-linearity, all we have to do is replace the <monospace>DiodeLayer</monospace> in <xref ref-type="fig" rid="F6">Figure 6</xref> with the <monospace>MOSDiode</monospace> layer in <xref ref-type="fig" rid="F10">Figure 10</xref> and rerun the simulation. Moreover, while the circuit in <xref ref-type="fig" rid="F5">Figure 5</xref> is set up for training, it can easily be converted to one that measures the compatibility of an input-output pair by swapping the current layer for a voltage layer that represents the output.</p>
</sec>
<sec>
<title>4.5. Performance</title>
<p>To evaluate the performance of the simulator, we conducted two experiments. The first was conducted on the Iris model to measure the speed-up gained through parallelism. We fixed the number of samples in the mini-batch and ran the simulation for the same number of epochs on one, two, and then four threads. The speed nearly doubled when the thread count increased from 1 to 2, but doubling the thread count again yielded only a further 1.5x speed-up. Because the resulting circuit is relatively simple and the batch size is small, the overhead of starting new processes for every batch is a non-trivial fraction of the overall simulation time. We expect this overhead to matter much less in experiments with reasonably large datasets.</p>
<p>For the second experiment, we measured the simulation performance as a function of problem complexity, using three datasets: xor, iris, and wine.<xref ref-type="fn" rid="fn0006"><sup>6</sup></xref> To estimate the complexity of the circuit, we counted the number of nodes only in models that achieved an accuracy greater than 95% on the test dataset, so that the node count reflects a model with enough capacity for the task rather than an under-sized one (model size determines where a model sits on the bias-variance trade-off).</p>
<p>The circuits were simulated, and the average simulation time in seconds is recorded in <xref ref-type="table" rid="T2">Table 2</xref>. As a measure of the intrinsic speed of the simulator, the table also lists a derived quantity <italic>K</italic>, calculated according to Equation (8) from the simulation time <italic>T</italic>, the number of allocated threads <italic>P</italic>, the number of nodes in the generated circuit <italic>N</italic>, the number of epochs <italic>E</italic>, and the size of the training dataset <italic>D</italic>; <italic>K</italic> is thus the thread-time spent per node, per training sample, per epoch. <xref ref-type="table" rid="T2">Table 2</xref> shows that <italic>K</italic> is about the same for the two examples whose simulation time is not dominated by the overhead of starting the SPICE simulator. We expect this to hold for larger datasets as well.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>DC simulation time as a function of circuit size and training dataset size.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Datasets</bold></th>
<th valign="top" align="center"><bold>I/O units</bold></th>
<th valign="top" align="center"><bold>Circuit nodes (<italic>N</italic>)</bold></th>
<th valign="top" align="center"><bold>Dataset size (<italic>D</italic>)</bold></th>
<th valign="top" align="center"><bold>Epochs (<italic>E</italic>)</bold></th>
<th valign="top" align="center"><bold>Time (<italic>T</italic>)</bold></th>
<th valign="top" align="center"><bold>Threads (<italic>P</italic>)</bold></th>
<th valign="top" align="center"><bold>K (&#x000D7;10<sup>&#x02212;4</sup> s)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Xor</td>
<td valign="top" align="center">5/2</td>
<td valign="top" align="center">16</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">85</td>
<td valign="top" align="center">14 s</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">25.74</td>
</tr> <tr>
<td valign="top" align="left">Iris</td>
<td valign="top" align="center">9/6</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">105</td>
<td valign="top" align="center">155</td>
<td valign="top" align="center">182 s</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">7.99</td>
</tr> <tr>
<td valign="top" align="left">Wine</td>
<td valign="top" align="center">25/4</td>
<td valign="top" align="center">111</td>
<td valign="top" align="center">5,000</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">217 s</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">7.92</td>
</tr></tbody>
</table>
</table-wrap>
<p>Training for all the experiments was carried out on a laptop with an Intel i7-6700HQ CPU and 32 GB of RAM.</p>
<disp-formula id="E9"><label>(8)</label><mml:math id="M33"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:mi>E</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
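<p>The formula for <italic>K</italic> can be evaluated directly from the columns of <xref ref-type="table" rid="T2">Table 2</xref>. <italic>K</italic> has units of seconds: it is the thread-time spent per circuit node, per training sample, per epoch, so lower values indicate a faster simulator. The function name is illustrative.</p>

```python
def k_metric(t_seconds, threads, nodes, epochs, dataset_size):
    """K = T * P / (N * E * D): thread-seconds per node, per epoch, per training sample."""
    return t_seconds * threads / (nodes * epochs * dataset_size)


table_2 = {  # (T in s, P, N, E, D) as listed in Table 2
    "Xor": (14, 1, 16, 85, 4),
    "Iris": (182, 4, 56, 155, 105),
    "Wine": (217, 4, 111, 2, 5000),
}
for dataset, row in table_2.items():
    print(f"{dataset}: K = {k_metric(*row) * 1e4:.2f} x 10^-4 s")
```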
</sec>
</sec>
<sec sec-type="conclusions" id="s5">
<title>5. Conclusion</title>
<p>In this paper, we presented an open-source unified, modularized, and extensible framework called EBANA, that can be used to easily build, train, and validate analog neural networks. By using Python as the interface language, with a syntax similar to Keras, we&#x00027;re able to hide the complexity of the underlying analog simulations and offer researchers in neuroscience and machine learning a conceptual and practical framework to experiment with and explore the various tradeoffs that exist in the design space.</p>
<p>EBANA not only includes the building blocks required for the design of EBMs (i.e., IL, DL, NL, and CL layers); it is also sufficiently modular and extensible to easily incorporate new concepts as well as new electrical and technological models. For example, adding a new non-linear layer requires fewer than 15 lines of code. Learning concepts beyond EBMs can also be implemented easily, as illustrated by the co-training of an EBM with a conventional CNN trained with the backpropagation algorithm. Finally, EBANA uses a graph-based data structure that allows networks to be composed with a great deal of flexibility. Together, these features allow a broad range of supervised machine learning tasks to be implemented in EBANA, not only those with linear topologies.</p>
<p>While EBANA is already fully functional and can reduce the effort required to analyze new analog neural networks by orders of magnitude, more features and functionalities will be added in future iterations, including a suite of hardware blocks in nanometric technologies for a proper evaluation of the energy consumption of the system. At the moment, the framework supports only the open-source simulators Ngspice and Xyce, which impose some artificial limitations: the default distribution of Ngspice limits the size of subcircuits to 1,000 nodes. Xyce does not have this limitation, but it is not always as readily available. We plan to add support for commercially available simulators such as Spectre and HSPICE. We also plan to improve the training speed by optimizing the training loop and by avoiding the generation of a new netlist for every simulation. This could yield large speedups, both in Python (where the netlist is generated) and in the SPICE simulator, which builds a conductance matrix every time it is presented with a new netlist. Finally, we plan to add methods for distributed training over multiple machines.</p>
</sec>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found at: Fashion-MNIST dataset: <ext-link ext-link-type="uri" xlink:href="https://github.com/zalandoresearch/fashion-mnist">https://github.com/zalandoresearch/fashion-mnist</ext-link>, Iris dataset: <ext-link ext-link-type="uri" xlink:href="https://archive.ics.uci.edu/ml/datasets/iris">https://archive.ics.uci.edu/ml/datasets/iris</ext-link>, Wine dataset: <ext-link ext-link-type="uri" xlink:href="https://archive.ics.uci.edu/ml/datasets/wine">https://archive.ics.uci.edu/ml/datasets/wine</ext-link>.</p>
</sec>
<sec sec-type="author-contributions" id="s7">
<title>Author contributions</title>
<p>All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.</p>
</sec>
</body>
<back>
<ack>
<p>The content of this manuscript has been presented in part at the 2022 IEEE International System-on-Chip Conference (SOCC) in Northern Ireland (Watfa et al., <xref ref-type="bibr" rid="B29">2022</xref>).</p>
</ack>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="fn0001"><p><sup>1</sup><ext-link ext-link-type="uri" xlink:href="https://keras.io/guides/functional_api/">https://keras.io/guides/functional_api/</ext-link></p></fn>
<fn id="fn0002"><p><sup>2</sup><ext-link ext-link-type="uri" xlink:href="https://pypi.org/project/PySpice/">https://pypi.org/project/PySpice/</ext-link></p></fn>
<fn id="fn0003"><p><sup>3</sup><ext-link ext-link-type="uri" xlink:href="http://ngspice.sourceforge.net/">http://ngspice.sourceforge.net/</ext-link></p></fn>
<fn id="fn0004"><p><sup>4</sup><ext-link ext-link-type="uri" xlink:href="https://xyce.sandia.gov/">https://xyce.sandia.gov/</ext-link></p></fn>
<fn id="fn0005"><p><sup>5</sup><ext-link ext-link-type="uri" xlink:href="https://archive.ics.uci.edu/ml/datasets/iris">https://archive.ics.uci.edu/ml/datasets/iris</ext-link></p></fn>
<fn id="fn0006"><p><sup>6</sup><ext-link ext-link-type="uri" xlink:href="https://archive.ics.uci.edu/ml/datasets/wine">https://archive.ics.uci.edu/ml/datasets/wine</ext-link></p></fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Agarwal</surname> <given-names>S.</given-names></name> <name><surname>Garland</surname> <given-names>D.</given-names></name> <name><surname>Niroula</surname> <given-names>J.</given-names></name> <name><surname>Jacobs-Gedrim</surname> <given-names>R. B.</given-names></name> <name><surname>Hsia</surname> <given-names>A.</given-names></name> <name><surname>Van Heukelom</surname> <given-names>M. S.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Using floating-gate memory to train ideal accuracy neural networks</article-title>. <source>IEEE J. Explor. Solid State Comput. Devices Circuits</source> <volume>5</volume>, <fpage>52</fpage>&#x02013;<lpage>57</lpage>. <pub-id pub-id-type="doi">10.1109/JXCDC.2019.2902409</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bankman</surname> <given-names>D.</given-names></name> <name><surname>Yang</surname> <given-names>L.</given-names></name> <name><surname>Moons</surname> <given-names>B.</given-names></name> <name><surname>Verhelst</surname> <given-names>M.</given-names></name> <name><surname>Murmann</surname> <given-names>B.</given-names></name></person-group> (<year>2019</year>). <article-title>An always-on 3.8 &#x003BC;J/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28-nm CMOS</article-title>. <source>IEEE J. Solid State Circuits</source> <volume>54</volume>, <fpage>158</fpage>&#x02013;<lpage>172</lpage>. <pub-id pub-id-type="doi">10.1109/JSSC.2018.2869150</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bianco</surname> <given-names>S.</given-names></name> <name><surname>Cadene</surname> <given-names>R.</given-names></name> <name><surname>Celona</surname> <given-names>L.</given-names></name> <name><surname>Napoletano</surname> <given-names>P.</given-names></name></person-group> (<year>2018</year>). <article-title>Benchmark analysis of representative deep neural network architectures</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>64270</fpage>&#x02013;<lpage>64277</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2018.2877890</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boser</surname> <given-names>B.</given-names></name> <name><surname>Sackinger</surname> <given-names>E.</given-names></name> <name><surname>Bromley</surname> <given-names>J.</given-names></name> <name><surname>Le Cun</surname> <given-names>Y.</given-names></name> <name><surname>Jackel</surname> <given-names>L.</given-names></name></person-group> (<year>1991</year>). <article-title>An analog neural network processor with programmable topology</article-title>. <source>IEEE J. Solid State Circuits</source> <volume>26</volume>, <fpage>2017</fpage>&#x02013;<lpage>2025</lpage>. <pub-id pub-id-type="doi">10.1109/4.104196</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Foroushani</surname> <given-names>A. N.</given-names></name> <name><surname>Assaf</surname> <given-names>H.</given-names></name> <name><surname>Noshahr</surname> <given-names>F. H.</given-names></name> <name><surname>Savaria</surname> <given-names>Y.</given-names></name> <name><surname>Sawan</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>Analog circuits to accelerate the relaxation process in the equilibrium propagation algorithm,</article-title> in <source>2020 IEEE International Symposium on Circuits and Systems (ISCAS)</source> (<publisher-loc>Seville</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1109/ISCAS45731.2020.9181250</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gokmen</surname> <given-names>T.</given-names></name> <name><surname>Rasch</surname> <given-names>M. J.</given-names></name> <name><surname>Haensch</surname> <given-names>W.</given-names></name></person-group> (<year>2018</year>). <article-title>Training LSTM networks with resistive cross-point devices</article-title>. <source>Front. Neurosci</source>. <volume>12</volume>, <fpage>745</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2018.00745</pub-id><pub-id pub-id-type="pmid">30405334</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gokmen</surname> <given-names>T.</given-names></name> <name><surname>Vlasov</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <article-title>Acceleration of deep neural network training with resistive cross-point devices: design considerations</article-title>. <source>Front. Neurosci</source>. <volume>10</volume>, <fpage>333</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2016.00333</pub-id><pub-id pub-id-type="pmid">27493624</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hinton</surname> <given-names>G. E.</given-names></name></person-group> (<year>2012</year>). <source>A Practical Guide to Training Restricted Boltzmann Machines</source>. <publisher-loc>Berlin; Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>, <fpage>599</fpage>&#x02013;<lpage>619</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-35289-8_32</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>M.</given-names></name> <name><surname>Strachan</surname> <given-names>J. P.</given-names></name> <name><surname>Li</surname> <given-names>Z.</given-names></name> <name><surname>Grafals</surname> <given-names>E. M.</given-names></name> <name><surname>Davila</surname> <given-names>N.</given-names></name> <name><surname>Graves</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Dot-product engine for neuromorphic computing: programming 1T1M crossbar to accelerate matrix-vector multiplication,</article-title> in <source>Proceedings of the 53rd Annual Design Automation Conference</source> (<publisher-loc>Austin, TX</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1145/2897937.2898010</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ji</surname> <given-names>Z.</given-names></name> <name><surname>Gross</surname> <given-names>W.</given-names></name></person-group> (<year>2020</year>). <article-title>Towards efficient on-chip learning using equilibrium propagation,</article-title> in <source>2020 IEEE International Symposium on Circuits and Systems (ISCAS)</source> (<publisher-loc>Seville</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1109/ISCAS45731.2020.9180548</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>W.</given-names></name></person-group> (<year>2010</year>). <source>Nonlinear Electrical Networks</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://sites.math.washington.edu/reu/papers/2017/willjohnson/directed-networks.pdf">https://sites.math.washington.edu/reu/papers/2017/willjohnson/directed-networks.pdf</ext-link> (accessed January 31, 2023).</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kendall</surname> <given-names>J.</given-names></name> <name><surname>Pantone</surname> <given-names>R.</given-names></name> <name><surname>Manickavasagam</surname> <given-names>K.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Scellier</surname> <given-names>B.</given-names></name></person-group> (<year>2020</year>). <article-title>Training end-to-end analog neural networks with equilibrium propagation</article-title>. <source>arXiv Preprint</source> arXiv:2006.01981. <pub-id pub-id-type="doi">10.48550/ARXIV.2006.01981</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Gokmen</surname> <given-names>T.</given-names></name> <name><surname>Lee</surname> <given-names>H.-M.</given-names></name> <name><surname>Haensch</surname> <given-names>W. E.</given-names></name></person-group> (<year>2017</year>). <article-title>Analog CMOS-based resistive processing unit for deep neural network training,</article-title> in <source>2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS)</source> (<publisher-name>IEEE</publisher-name>), <fpage>422</fpage>&#x02013;<lpage>425</lpage>. <pub-id pub-id-type="doi">10.1109/MWSCAS.2017.8052950</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiraz</surname> <given-names>F. Z.</given-names></name> <name><surname>Pham</surname> <given-names>D.-K. G.</given-names></name> <name><surname>Desgreys</surname> <given-names>P.</given-names></name></person-group> (<year>2022</year>). <article-title>Impacts of feedback current value and learning rate on equilibrium propagation performance,</article-title> in <source>2022 20th IEEE Interregional NEWCAS Conference (NEWCAS)</source>, <fpage>519</fpage>&#x02013;<lpage>523</lpage>. <pub-id pub-id-type="doi">10.1109/NEWCAS52662.2022.9842178</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Krestinskaya</surname> <given-names>O.</given-names></name> <name><surname>Salama</surname> <given-names>K. N.</given-names></name> <name><surname>James</surname> <given-names>A. P.</given-names></name></person-group> (<year>2018</year>). <article-title>Analog backpropagation learning circuits for memristive crossbar neural networks,</article-title> in <source>2018 IEEE International Symposium on Circuits and Systems (ISCAS)</source> (<publisher-loc>Florence</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1109/ISCAS.2018.8351344</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laborieux</surname> <given-names>A.</given-names></name> <name><surname>Ernoult</surname> <given-names>M.</given-names></name> <name><surname>Scellier</surname> <given-names>B.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Grollier</surname> <given-names>J.</given-names></name> <name><surname>Querlioz</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>Scaling equilibrium propagation to deep ConvNets by drastically reducing its gradient estimator bias</article-title>. <source>arXiv Preprint</source> arXiv:2006.03824 [cs]. <pub-id pub-id-type="doi">10.3389/fnins.2021.633674</pub-id><pub-id pub-id-type="pmid">33679315</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Chopra</surname> <given-names>S.</given-names></name> <name><surname>Hadsell</surname> <given-names>R.</given-names></name> <name><surname>Ranzato</surname> <given-names>M.</given-names></name> <name><surname>Huang</surname> <given-names>F. J.</given-names></name></person-group> (<year>2006</year>). <article-title>A tutorial on energy-based learning,</article-title> in <source>Predicting Structured Data</source>, eds <person-group person-group-type="editor"><name><surname>Bakir</surname> <given-names>G.</given-names></name> <name><surname>Hofmann</surname> <given-names>T.</given-names></name> <name><surname>Sch&#x000F6;lkopf</surname> <given-names>B.</given-names></name> <name><surname>Smola</surname> <given-names>A.</given-names></name> <name><surname>Taskar</surname> <given-names>B.</given-names></name></person-group> (<publisher-name>MIT Press</publisher-name>).</citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>B.</given-names></name> <name><surname>Gu</surname> <given-names>P.</given-names></name> <name><surname>Shan</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>H.</given-names></name></person-group> (<year>2015</year>). <article-title>RRAM-based analog approximate computing</article-title>. <source>IEEE Trans. Comput. Aid. Design Integr. Circuits Syst</source>. <volume>34</volume>, <fpage>1905</fpage>&#x02013;<lpage>1917</lpage>. <pub-id pub-id-type="doi">10.1109/TCAD.2015.2445741</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Kr&#x0010D;ek</surname> <given-names>M.</given-names></name> <name><surname>Perin</surname> <given-names>G.</given-names></name></person-group> (<year>2020</year>). <article-title>A comparison of weight initializers in deep learning-based side-channel analysis,</article-title> in <source>Applied Cryptography and Network Security Workshops, Vol. 12418</source>, eds <person-group person-group-type="editor"><name><surname>Zhou</surname> <given-names>J.</given-names></name> <name><surname>Conti</surname> <given-names>M.</given-names></name> <name><surname>Ahmed</surname> <given-names>C. M.</given-names></name> <name><surname>Au</surname> <given-names>M. H.</given-names></name> <name><surname>Batina</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>Z.</given-names></name> <name><surname>Lin</surname> <given-names>J.</given-names></name> <name><surname>Losiouk</surname> <given-names>E.</given-names></name> <name><surname>Luo</surname> <given-names>B.</given-names></name> <name><surname>Majumdar</surname> <given-names>S.</given-names></name> <name><surname>Meng</surname> <given-names>W.</given-names></name> <name><surname>Ochoa</surname> <given-names>M.</given-names></name> <name><surname>Picek</surname> <given-names>S.</given-names></name> <name><surname>Portokalidis</surname> <given-names>G.</given-names></name> <name><surname>Wang</surname></name> <name><surname>Zhang</surname> <given-names>K.</given-names></name></person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>), <fpage>126</fpage>&#x02013;<lpage>143</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-61638-0_8</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martin</surname> <given-names>E.</given-names></name> <name><surname>Ernoult</surname> <given-names>M.</given-names></name> <name><surname>Laydevant</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Querlioz</surname> <given-names>D.</given-names></name> <name><surname>Petrisor</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>EqSpike: spike-driven equilibrium propagation for neuromorphic implementations</article-title>. <source>iScience</source> <volume>24</volume>, <fpage>102222</fpage>. <pub-id pub-id-type="doi">10.1016/j.isci.2021.102222</pub-id><pub-id pub-id-type="pmid">33748709</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Murmann</surname> <given-names>B.</given-names></name> <name><surname>Bankman</surname> <given-names>D.</given-names></name> <name><surname>Chai</surname> <given-names>E.</given-names></name> <name><surname>Miyashita</surname> <given-names>D.</given-names></name> <name><surname>Yang</surname> <given-names>L.</given-names></name></person-group> (<year>2015</year>). <article-title>Mixed-signal circuits for embedded machine-learning applications,</article-title> in <source>2015 49th Asilomar Conference on Signals, Systems and Computers</source> (<publisher-loc>Pacific Grove, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1341</fpage>&#x02013;<lpage>1345</lpage>. <pub-id pub-id-type="doi">10.1109/ACSSC.2015.7421361</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname> <given-names>Y. J.</given-names></name> <name><surname>Kwon</surname> <given-names>H. T.</given-names></name> <name><surname>Kim</surname> <given-names>B.</given-names></name> <name><surname>Lee</surname> <given-names>W. J.</given-names></name> <name><surname>Wee</surname> <given-names>D. H.</given-names></name> <name><surname>Choi</surname> <given-names>H.-S.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>3-D stacked synapse array based on charge-trap flash memory for implementation of deep neural networks</article-title>. <source>IEEE Trans. Electron Devices</source> <volume>66</volume>, <fpage>420</fpage>&#x02013;<lpage>427</lpage>. <pub-id pub-id-type="doi">10.1109/TED.2018.2881972</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scellier</surname> <given-names>B.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name></person-group> (<year>2017</year>). <article-title>Equilibrium propagation: bridging the gap between energy-based models and backpropagation</article-title>. <source>Front. Comput. Neurosci</source>. <volume>11</volume>:<fpage>24</fpage>. <pub-id pub-id-type="doi">10.3389/fncom.2017.00024</pub-id><pub-id pub-id-type="pmid">28522969</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sevilla</surname> <given-names>J.</given-names></name> <name><surname>Heim</surname> <given-names>L.</given-names></name> <name><surname>Ho</surname> <given-names>A.</given-names></name> <name><surname>Besiroglu</surname> <given-names>T.</given-names></name> <name><surname>Hobbhahn</surname> <given-names>M.</given-names></name> <name><surname>Villalobos</surname> <given-names>P.</given-names></name></person-group> (<year>2022</year>). <article-title>Compute trends across three eras of machine learning,</article-title> in <source>2022 International Joint Conference on Neural Networks (IJCNN)</source>, <fpage>1</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1109/IJCNN55064.2022.9891914</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shafiee</surname> <given-names>A.</given-names></name> <name><surname>Nag</surname> <given-names>A.</given-names></name> <name><surname>Muralimanohar</surname> <given-names>N.</given-names></name> <name><surname>Balasubramonian</surname> <given-names>R.</given-names></name> <name><surname>Strachan</surname> <given-names>J. P.</given-names></name> <name><surname>Hu</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>ISAAC: a convolutional neural network accelerator with <italic>in-situ</italic> analog arithmetic in crossbars,</article-title> in <source>2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)</source>, <fpage>14</fpage>&#x02013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1109/ISCA.2016.12</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simonyan</surname> <given-names>K.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>Very deep convolutional networks for large-scale image recognition</article-title>. <source>arXiv Preprint</source> arXiv:1409.1556. <pub-id pub-id-type="doi">10.48550/ARXIV.1409.1556</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sze</surname> <given-names>V.</given-names></name> <name><surname>Chen</surname> <given-names>Y.-H.</given-names></name> <name><surname>Yang</surname> <given-names>T.-J.</given-names></name> <name><surname>Emer</surname> <given-names>J. S.</given-names></name></person-group> (<year>2017</year>). <article-title>Efficient processing of deep neural networks: a tutorial and survey</article-title>. <source>Proc. IEEE</source> <volume>105</volume>, <fpage>2295</fpage>&#x02013;<lpage>2329</lpage>. <pub-id pub-id-type="doi">10.1109/JPROC.2017.2761740</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Han</surname> <given-names>Y.</given-names></name> <name><surname>Leung</surname> <given-names>V. C. M.</given-names></name> <name><surname>Niyato</surname> <given-names>D.</given-names></name> <name><surname>Yan</surname> <given-names>X.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <article-title>Convergence of edge computing and deep learning: a comprehensive survey</article-title>. <source>IEEE Commun. Surv. Tutor</source>. <volume>22</volume>, <fpage>869</fpage>&#x02013;<lpage>904</lpage>. <pub-id pub-id-type="doi">10.1109/COMST.2020.2970550</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Watfa</surname> <given-names>M.</given-names></name> <name><surname>Garcia-Ortiz</surname> <given-names>A.</given-names></name> <name><surname>Sassatelli</surname> <given-names>G.</given-names></name></person-group> (<year>2022</year>). <article-title>Energy-based analog neural network framework,</article-title> in <source>2022 IEEE 35th International System-on-Chip Conference (SOCC)</source>, <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/SOCC56010.2022.9908086</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xiao</surname> <given-names>T. P.</given-names></name> <name><surname>Bennett</surname> <given-names>C. H.</given-names></name> <name><surname>Feinberg</surname> <given-names>B.</given-names></name> <name><surname>Agarwal</surname> <given-names>S.</given-names></name> <name><surname>Marinella</surname> <given-names>M. J.</given-names></name></person-group> (<year>2020</year>). <article-title>Analog architectures for neural network acceleration based on non-volatile memory</article-title>. <source>Appl. Phys. Rev</source>. <volume>7</volume>, <fpage>031301</fpage>. <pub-id pub-id-type="doi">10.1063/1.5143815</pub-id><pub-id pub-id-type="pmid">31553955</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zoppo</surname> <given-names>G.</given-names></name> <name><surname>Marrone</surname> <given-names>F.</given-names></name> <name><surname>Corinto</surname> <given-names>F.</given-names></name></person-group> (<year>2020</year>). <article-title>Equilibrium propagation for memristor-based recurrent neural networks</article-title>. <source>Front. Neurosci</source>. <volume>14</volume>:<fpage>240</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2020.00240</pub-id><pub-id pub-id-type="pmid">32265641</pub-id></citation></ref>
</ref-list>
</back>
</article>