<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title>Frontiers in Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnins.2022.850932</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Heterogeneous Ensemble-Based Spike-Driven Few-Shot Online Learning</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Yang</surname> <given-names>Shuangming</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/821971/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Linares-Barranco</surname> <given-names>Bernabe</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Chen</surname> <given-names>Badong</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/594500/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Electrical and Information Engineering, Tianjin University</institution>, <addr-line>Tianjin</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Microelectronics Institute of Seville</institution>, <addr-line>Seville</addr-line>, <country>Spain</country></aff>
<aff id="aff3"><sup>3</sup><institution>Institute of Artificial Intelligence and Robotics, Xi&#x2019;an Jiaotong University</institution>, <addr-line>Xi&#x2019;an</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Guoqi Li, Tsinghua University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Priyadarshini Panda, Yale University, United States; Lei Deng, Tsinghua University, China</p></fn>
<corresp id="c001">&#x002A;Correspondence: Shuangming Yang, <email>yangshuangming@tju.edu.cn</email></corresp>
<corresp id="c002">Badong Chen, <email>chenbd@mail.xjtu.edu.cn</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>05</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>16</volume>
<elocation-id>850932</elocation-id>
<history>
<date date-type="received">
<day>08</day>
<month>01</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>03</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2022 Yang, Linares-Barranco and Chen.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Yang, Linares-Barranco and Chen</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Spiking neural networks (SNNs) are regarded as a promising candidate to deal with the major challenges of current machine learning techniques, including the high energy consumption induced by deep neural networks. However, there is still a large gap between the few-shot learning performance of SNNs and that of artificial neural networks. Importantly, existing spike-based few-shot learning models do not target robust learning based on spatiotemporal dynamics and advanced machine learning theory. In this paper, we propose a novel spike-based framework grounded in entropy theory, namely, heterogeneous ensemble-based spike-driven few-shot online learning (HESFOL). The proposed HESFOL model uses entropy theory to establish a gradient-based few-shot learning scheme in a recurrent SNN architecture. We examine the performance of the HESFOL model on few-shot classification tasks using spiking patterns and the Omniglot data set, as well as on a few-shot motor control task using an end-effector. Experimental results show that the proposed HESFOL scheme can effectively improve the accuracy and robustness of spike-driven few-shot learning. More importantly, the proposed HESFOL model emphasizes the application of modern entropy-based machine learning methods in state-of-the-art spike-driven learning algorithms. Therefore, our study provides new perspectives for further integration of advanced entropy theory in machine learning to improve the learning performance of SNNs, which could be of great merit to applied developments with spike-based neuromorphic systems.</p>
</abstract>
<kwd-group>
<kwd>spiking neural network</kwd>
<kwd>few-shot learning</kwd>
<kwd>entropy-based learning</kwd>
<kwd>spike-driven learning</kwd>
<kwd>brain-inspired intelligence</kwd>
</kwd-group>
<contract-num rid="cn001">62006170</contract-num>
<contract-num rid="cn001">62088102</contract-num>
<contract-num rid="cn001">U21A20485</contract-num>
<contract-num rid="cn002">2020M680885</contract-num>
<contract-num rid="cn002">2021T140510</contract-num>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content></contract-sponsor>
<contract-sponsor id="cn002">China Postdoctoral Science Foundation<named-content content-type="fundref-id">10.13039/501100002858</named-content></contract-sponsor>
<counts>
<fig-count count="12"/>
<table-count count="3"/>
<equation-count count="32"/>
<ref-count count="47"/>
<page-count count="15"/>
<word-count count="11420"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>Introduction</title>
<p>The human brain has the advantages of imagination, lifelong learning, and learning based on interaction with the environment. In particular, the human brain can learn a new concept from a small number of examples and has a strong generalization capability, which outperforms current machine intelligence (<xref ref-type="bibr" rid="B11">Goelet et al., 1986</xref>). For example, when given a reference example, the brain can easily generalize to new examples or create a new one. It is therefore necessary and meaningful to develop a novel brain-inspired framework, based on the processing and learning mechanisms of the brain, to break the current bottleneck of machine intelligence.</p>
<p>Spiking neural networks (SNNs) are the third generation of artificial neural networks (ANNs) and are based on the underlying mechanisms of the biological brain (<xref ref-type="bibr" rid="B8">Falez et al., 2019</xref>; <xref ref-type="bibr" rid="B24">Paredes-Vall&#x00E9;s et al., 2019</xref>). They offer rich spatiotemporal dynamics, a large diversity of neural encoding mechanisms, and low-power event-based computation (<xref ref-type="bibr" rid="B41">Yang et al., 2021a</xref>,<xref ref-type="bibr" rid="B42">b</xref>). SNNs are critical and meaningful for artificial general intelligence (AGI), and are essential for high-efficiency edge computing devices with low power consumption and real-time processing capability (<xref ref-type="bibr" rid="B25">Pei et al., 2019</xref>).</p>
<p>In recent years, along with the development of computing devices, deep learning with a large amount of labeled data has achieved significant success in the fields of computer vision and natural language processing (<xref ref-type="bibr" rid="B32">Strack, 2019</xref>; <xref ref-type="bibr" rid="B47">Zou et al., 2019</xref>; <xref ref-type="bibr" rid="B34">Tolkach et al., 2020</xref>). In certain fields, the capability of deep learning has surpassed that of humans. For example, the classification accuracy of ResNet is significantly higher than that of humans on the ImageNet data set, and AlphaGo performs better than the human champion at the game of Go (<xref ref-type="bibr" rid="B30">Singh et al., 2017</xref>; <xref ref-type="bibr" rid="B19">Lu et al., 2018</xref>). However, current machine learning algorithms depend heavily on a large amount of labeled data. In some practical applications, data labeling is expensive. For example, experienced doctors must spend a large amount of time to label images in detail. Therefore, it is vital to investigate few-shot learning methods, which achieve higher generalization capability from a small amount of labeled data. With machine learning models such as the support vector machine (SVM) or convolutional neural networks (CNNs), it is difficult to realize few-shot learning because the lack of sufficient training data causes overfitting. SNN-based few-shot learning offers a novel perspective on few-shot learning tasks and is a promising approach to this kind of problem.</p>
<p>The learning capability of current SNN models still lacks robust adaptation to environments with non-Gaussian noise, which severely limits the application of spike-driven models to real-world problems. Correntropy is a non-linear local similarity measure in kernel space, which is closely related to the cross-information potential (CIP) in information-theoretic learning (ITL) (<xref ref-type="bibr" rid="B2">Chen et al., 2018</xref>). The main advantages of correntropy are twofold. First, its local property provides an effective mechanism to weaken the influence of outliers and non-Gaussian noise. Second, it induces a novel metric in the sample space: if the samples are close to each other, the measure behaves like the L2 norm; as the samples move apart, it behaves like the L1 norm; and when the samples are far away from each other, it finally approaches the L0 norm. Due to its robustness to outliers and non-Gaussian noise, correntropy theory has been widely applied in various fields, including signal processing and machine learning (<xref ref-type="bibr" rid="B6">Du et al., 2018</xref>; Luo et al., 2018; <xref ref-type="bibr" rid="B3">Chen et al., 2019a</xref>).</p>
<p>In recent years, some novel entropy-based learning principles have been proposed for robust learning, such as the maximum mixture correntropy criterion (MMCC) (<xref ref-type="bibr" rid="B37">Wang et al., 2021</xref>). Previous studies have revealed that MMCC is a better choice than current optimization criteria, including the minimum mean square error (MMSE) criterion (<xref ref-type="bibr" rid="B4">Chen et al., 2019b</xref>). The MMSE criterion depends on the assumption that the data are noise-free or obey a Gaussian distribution. Once this assumption is not satisfied, for example when the data are disturbed by heavy-tailed noise, the performance of current machine learning algorithms may be severely reduced. Therefore, this work adopts the MMCC as the optimization criterion to rederive a novel spike-driven few-shot online learning (SFOL) model, resulting in a heterogeneous ensemble-based SFOL (HESFOL). The proposed model can perform robust few-shot online learning for sequential data. The paper is organized as follows: Section &#x201C;Background&#x201D; describes the preliminaries of this study, including few-shot learning based on meta-learning and entropy-based learning theory. The proposed HESFOL model is introduced and explained in Section &#x201C;Materials and Methods.&#x201D; Section &#x201C;Results&#x201D; presents the experimental results. Finally, the discussion and conclusions are presented in Sections &#x201C;Discussion&#x201D; and &#x201C;Conclusion,&#x201D; respectively.</p>
</sec>
<sec id="S2">
<title>Background</title>
<p>This study draws on two broad areas of research: few-shot learning based on meta-learning, and entropy-based methods for machine learning. In this section, the related work in these two fields is summarized.</p>
<sec id="S2.SS1">
<title>Few-Shot Learning Model Based on a Meta-Learning Framework</title>
<p>Few-shot learning based on meta-learning mainly relies on the idea of learning to learn. For example, meta-learning with memory-augmented neural networks addresses the problem of how to quickly encode the vital information of new tasks by introducing an additional memory module (<xref ref-type="bibr" rid="B29">Santoro et al., 2016</xref>; <xref ref-type="bibr" rid="B38">Wang Y. et al., 2020</xref>). Model-agnostic meta-learning (MAML) aims to learn a good initialization for the model, so that it can achieve good classification performance with only one or several gradient updates when facing a new task. Specifically, MAML introduces a second-order gradient to find the most sensitive direction of the gradient change for fast learning of the new task. <xref ref-type="bibr" rid="B10">Gidaris et al. (2019)</xref> simultaneously identified the training categories and the new categories, and presented a dynamic network that generates the corresponding classification weights for the new categories through an attention-based weight generator (meta-learner). <xref ref-type="bibr" rid="B33">Sun et al. (2019)</xref> presented a meta-transfer learning method, which pre-trains a feature extractor on an auxiliary data set and then fine-tunes a learner on a small amount of training data from the new tasks. Although a series of previous works have addressed the few-shot learning problem using a meta-learner, there is no effective SNN-based work that realizes few-shot learning by combining brain mechanisms with machine learning theory, such as entropy learning theory.</p>
</sec>
<sec id="S2.SS2">
<title>Information-Theoretic Learning</title>
<p>The information-theoretic learning approach has been widely applied to improve the performance of machine learning algorithms in recent years. <xref ref-type="bibr" rid="B43">Zadeh and Schmid (2020)</xref> presented an alternative loss derived from a negative log-likelihood loss that results in much better calibrated prediction rules. <xref ref-type="bibr" rid="B44">Zhang et al. (2020)</xref> proposed learning saliency prediction from a single noisy labeling based on entropy theory. To optimize the performance of current learning algorithms, researchers have focused on correntropy-based methods. Zheng et al. (2020) presented a mixture correntropy-based kernel extreme learning machine (MC-KELM) to improve the robustness of KELM, which adopts the recently proposed MMCC as the optimization criterion instead of the MMSE criterion. <xref ref-type="bibr" rid="B12">Heravi and Hodtani (2018)</xref> presented a group of novel robust information-theoretic backpropagation (BP) methods, such as correntropy-based conjugate gradient BP (CCG-BP). <xref ref-type="bibr" rid="B40">Xing et al. (2019)</xref> presented a novel correntropy-based multiview subspace clustering (CMVSC) method to efficiently learn the structure of the representation matrix from each view and make use of the extra information embedded in multiple views. Ensemble algorithms can also be used to improve the robustness of learning tasks, such as clustering. Bootstrap AGGregatING (Bagging) algorithms were proposed to improve classification by combining the classifications of randomly generated data sets (<xref ref-type="bibr" rid="B9">Fischer and Buhmann, 2003</xref>). Bagging is a successful example of an independent ensemble method, in which the models are trained independently and their outputs are then combined for the final verdict.
Although there are a number of studies on correntropy-based machine learning, an efficient and effective way to adopt entropy theory in spike-based machine learning is still lacking. Therefore, this study aims to present an optimized entropy-based spike-driven few-shot learning model with ensemble loss functions for robust few-shot learning.</p>
</sec>
</sec>
<sec id="S3" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="S3.SS1">
<title>Proposed Ensemble Loss</title>
<p>In this study, a novel objective function is proposed as a weighted combination of single losses and is integrated into the spike-driven few-shot learning model. First, a mathematical description of the proposed loss function is given to clarify its role. Let <italic>&#x0177;</italic> represent the estimated label of a true label <italic>y</italic>. A loss function <italic>L</italic>(<italic>y</italic>, <italic>&#x0177;</italic>) is a positive function that indicates the difference between <italic>&#x0177;</italic> and <italic>y</italic>. Several types of loss functions are combined with trainable weights. Let <inline-formula><mml:math id="INEQ1"><mml:msubsup><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:msubsup></mml:math></inline-formula> represent <italic>K</italic> single loss functions. The aim is to find the best weights {&#x03BB;<sub>1</sub>, &#x03BB;<sub>2</sub>, &#x2026;, &#x03BB;<sub><italic>K</italic></sub>} to combine the <italic>K</italic> base loss functions into the best application-oriented loss function. A further constraint is added to prevent all the weights from approaching 0. The proposed ensemble loss function is expressed as</p>
<disp-formula id="S3.E1"><label>(1)</label><mml:math id="M1"><mml:mrow><mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:msub><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>The optimization with <italic>N</italic> training samples can be expressed as</p>
<disp-formula id="S3.E2"><label>(2)</label><mml:math id="M2"><mml:mrow><mml:mtable displaystyle="true" rowspacing="0pt"><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:munder><mml:mtext>minimize</mml:mtext><mml:mrow><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mstyle displaystyle="false"><mml:msubsup><mml:mo largeop="true" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup></mml:mstyle><mml:mrow><mml:mstyle displaystyle="false"><mml:msubsup><mml:mo largeop="true" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:msubsup></mml:mstyle><mml:mrow><mml:msubsup><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:msub><mml:mi>L</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:mrow><mml:mrow><mml:mstyle displaystyle="false"><mml:msubsup><mml:mo largeop="true" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:msubsup></mml:mstyle><mml:msubsup><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Then, the constraint is incorporated as a regularization term according to the concept of Augmented Lagrangian. The modified objective function based on Augmented Lagrangian is described as</p>
<disp-formula id="S3.E3"><label>(3)</label><mml:math id="M3"><mml:mtable><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:mrow><mml:munder><mml:mtext>minimize</mml:mtext><mml:mrow><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:mrow><mml:msubsup><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:msub><mml:mi>L</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03B7;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:msubsup><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03B7;</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:msubsup><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The first and second terms of the objective function drive the values of <inline-formula><mml:math id="INEQ2"><mml:msubsup><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math></inline-formula> toward 0, whereas the third term pushes the weights to satisfy <inline-formula><mml:math id="INEQ3"><mml:mrow><mml:mrow><mml:msubsup><mml:mo largeop="true" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:msubsup><mml:msubsup><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>. The overall training process is described in <xref ref-type="other" rid="Box1">Algorithm 1</xref>.</p>
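<p>As an illustration of Eq. (3), the following minimal Python sketch evaluates the regularized ensemble objective for one mini-batch. It is not the implementation used in this study: the function name ensemble_objective, the array layout (one row per sample, one column per base loss), and the use of NumPy are assumptions made here for illustration, and the inner sum is taken over the <italic>K</italic> base losses, as in Algorithm 1.</p>
<preformat>
```python
import numpy as np

def ensemble_objective(base_losses, lam, eta1, eta2):
    """Evaluate an Eq. (3)-style objective for one mini-batch (illustrative).

    base_losses: shape (N, K); entry [i, j] holds L_j(y_i, y_hat_i)
    lam:         shape (K,); trainable loss weights lambda_j
    eta1, eta2:  coefficients of the linear and quadratic constraint terms
    """
    base_losses = np.asarray(base_losses, dtype=float)
    lam = np.asarray(lam, dtype=float)
    weights = lam ** 2                          # lambda_j^2 is never negative
    data_term = np.sum(base_losses @ weights)   # sum_i sum_j lambda_j^2 L_j
    gap = np.sum(weights) - 1.0                 # violation of sum_j lambda_j^2 = 1
    return data_term + eta1 * gap + eta2 * gap ** 2

# Hypothetical values: two samples, two base losses
losses = [[1.0, 2.0], [3.0, 4.0]]
value = ensemble_objective(losses, lam=[0.6, 0.8], eta1=0.5, eta2=2.0)
```
</preformat>
<p>In this hypothetical call, the squared weights 0.36 and 0.64 sum to 1, so both constraint terms vanish and the objective reduces to the weighted sum of the base losses.</p>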
<boxed-text id="Box1" position="float">
<title>Algorithm 1: Pseudo-code of the whole training process for the proposed method.</title>
<table-wrap>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left"><bold>Input:</bold></td>
</tr>
<tr>
<td valign="top" align="left">The training set <italic>T</italic>, parameters <italic>&#x03BB;</italic><sub><italic>i</italic></sub> (weights associated with each loss function), &#x03B7;<sub>1</sub>, &#x03B7;<sub>2</sub> (Lagrangian weights), &#x03C3; (correntropy kernel bandwidth), and <italic>m</italic> [maximum number of iterations (epochs)]</td>
</tr>
<tr>
<td valign="top" align="left">Base loss functions {<italic>L</italic><sub><italic>j</italic></sub>(<italic>X</italic><sub><italic>i</italic></sub>, <italic>y</italic><sub><italic>i</italic></sub>)}<sub><italic>j</italic>=1</sub><sup>4</sup>, <italic>K</italic> = 4 (MMCC, cross-entropy, MMSE based on firing rate, MMSE based on membrane potential)</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Output:</bold></td>
</tr>
<tr>
<td valign="top" align="left">Parameters <italic>W</italic>, &#x03BB;<sub>1</sub>, &#x03BB;<sub>2</sub>, &#x03BB;<sub>3</sub>, &#x03BB;<sub>4</sub></td>
</tr>
<tr>
<td valign="top" align="left">1: Initialize the ensemble loss function using {<italic>L</italic><sub><italic>j</italic></sub>(<italic>X</italic><sub><italic>i</italic></sub>, <italic>y</italic><sub><italic>i</italic></sub>)}<sub><italic>j</italic>=1</sub><sup>4</sup> and random &#x03BB;<sub>1</sub>, &#x03BB;<sub>2</sub>, &#x03BB;<sub>3</sub>, &#x03BB;<sub>4</sub></td>
</tr>
<tr>
<td valign="top" align="left">2: Initialize parameters <italic>W</italic>~<italic>N</italic>(0,&#x03A3;) and <italic>t</italic> = 0</td>
</tr>
<tr>
<td valign="top" align="left">3: <bold>while</bold> not converged <bold>do</bold></td>
</tr>
<tr>
<td valign="top" align="left">4: Select a mini-batch of training samples {<italic>X</italic><sub><italic>i</italic></sub>, <italic>y</italic><sub><italic>i</italic></sub>}<sub><italic>i</italic>=1</sub><sup><italic>N</italic></sup> from the training set <italic>T</italic>.</td>
</tr>
<tr>
<td valign="top" align="left">5: Perform a forward pass, and calculate the loss and regularization terms:</td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="INEQ4"><mml:mstyle displaystyle="true"><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover></mml:mstyle></mml:math></inline-formula><inline-formula><mml:math id="INEQ5"><mml:mstyle displaystyle="true"><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:munderover></mml:mstyle></mml:math></inline-formula><inline-formula><mml:math id="INEQ6"><mml:msubsup><mml:mi>&#x03BB;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math></inline-formula><italic>L</italic><sub><italic>j</italic></sub>(<italic>y</italic><sub><italic>i</italic></sub>, <italic>&#x0177;<sub><italic>i</italic></sub></italic>) + &#x03B7;<sub>1</sub>(<inline-formula><mml:math id="INEQ7"><mml:mstyle displaystyle="true"><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:munderover></mml:mstyle></mml:math></inline-formula><inline-formula><mml:math id="INEQ8"><mml:msubsup><mml:mi>&#x03BB;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math></inline-formula>&#x2212;1) + &#x03B7;<sub>2</sub>(<inline-formula><mml:math id="INEQ9"><mml:mstyle displaystyle="true"><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:munderover></mml:mstyle></mml:math></inline-formula><inline-formula><mml:math id="INEQ10"><mml:msubsup><mml:mi>&#x03BB;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math></inline-formula>&#x2212;1)<sup>2</sup></td>
</tr>
<tr>
<td valign="top" align="left">6: Perform a backward propagation by the BPTT algorithm</td>
</tr>
<tr>
<td valign="top" align="left">7: Update <italic>W</italic>, &#x03BB;<sub>1</sub>, &#x03BB;<sub>2</sub>, &#x03BB;<sub>3</sub>, &#x03BB;<sub>4</sub> by the gradient descent algorithm.</td>
</tr>
<tr>
<td valign="top" align="left">8: <italic>t</italic>&#x2190;<italic>t</italic>+1</td>
</tr>
<tr>
<td valign="top" align="left"><bold>return</bold> {<italic>W</italic>(<italic>t</italic>), &#x03BB;<sub>1</sub>(<italic>t</italic>), &#x03BB;<sub>2</sub>(<italic>t</italic>), &#x03BB;<sub>3</sub>(<italic>t</italic>), &#x03BB;<sub>4</sub>(<italic>t</italic>)}</td>
</tr>
</tbody>
</table>
</table-wrap>
</boxed-text>
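<p>The steps of Algorithm 1 can be sketched on a toy problem as follows. To keep the example self-contained, several assumptions depart from the proposed method and are labeled in the comments: a linear readout stands in for the recurrent SNN, analytic gradients stand in for BPTT, only two base losses are used (a squared error and a correntropy-induced loss), the whole data set is used as a single batch, the data term is averaged rather than summed over the samples, and a fixed number of iterations replaces the convergence test. All function names, hyperparameter values, and data are hypothetical.</p>
<preformat>
```python
import numpy as np

def base_losses(err, sigma):
    """Per-sample base losses, shape (N, 2), and their derivatives w.r.t. err."""
    g = np.exp(-err ** 2 / (2.0 * sigma ** 2))
    losses = np.stack([err ** 2, 1.0 - g], axis=1)   # squared error, correntropy loss
    dlosses = np.stack([2.0 * err, err * g / sigma ** 2], axis=1)
    return losses, dlosses

def train(X, y, eta1=0.1, eta2=1.0, sigma=1.0, lr=0.01, epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lam = rng.uniform(0.5, 1.0, size=2)   # step 1: random loss weights
    w = rng.normal(0.0, 0.1, size=d)      # step 2: W drawn from a zero-mean Gaussian
    for _ in range(epochs):               # step 3: fixed budget (assumption)
        err = X @ w - y                   # steps 4-5: full batch, toy linear forward pass
        losses, dlosses = base_losses(err, sigma)
        weights = lam ** 2
        gap = weights.sum() - 1.0         # constraint violation
        # step 6: analytic gradients (assumption; stands in for BPTT)
        grad_w = X.T @ (dlosses @ weights) / n
        grad_lam = 2.0 * lam * (losses.mean(axis=0) + eta1 + 2.0 * eta2 * gap)
        # step 7: plain gradient descent on W and the loss weights
        w = w - lr * grad_w
        lam = lam - lr * grad_lam
    return w, lam

# Hypothetical noise-free regression data
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_fit, lam_fit = train(X, y)
```
</preformat>
<p>Calling train with epochs=0 returns the initial parameters, which makes it easy to compare the distance to w_true before and after training in this toy setting.</p>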
</sec>
<sec id="S3.SS2">
<title>Maximum Mixture Correntropy Criterion</title>
<p>Correntropy, which has been widely used in fields such as machine learning and signal processing, is defined as</p>
<disp-formula id="S3.E4"><label>(4)</label><mml:math id="M4"><mml:mrow><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo largeop="true" symmetric="true">&#x222B;</mml:mo><mml:mrow><mml:mo largeop="true" symmetric="true">&#x222B;</mml:mo><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>X</mml:mi><mml:mi>Y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo mathvariant="italic" rspace="0pt">d</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="italic" rspace="0pt">d</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>X</italic> and <italic>Y</italic> are random variables, and <italic>E</italic>[.] is the expectation operator. The function <italic>k</italic><sub>&#x03C3;</sub> (.,.) is the kernel function with kernel width &#x03C3;, and <italic>f</italic><sub><italic>XY</italic></sub>(.,.) is the joint probability density function (PDF). In practice, the joint PDF is usually unknown, so the correntropy is estimated from a finite number <italic>N</italic> of available samples as</p>
<disp-formula id="S3.E5"><label>(5)</label><mml:math id="M5"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>V</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>The Gaussian radial basis function is usually selected as the kernel function of correntropy, which gives</p>
<disp-formula id="S3.E6"><label>(6)</label><mml:math id="M6"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>V</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:msup><mml:mrow><mml:mo fence="true">||</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo fence="true">||</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mn>2</mml:mn><mml:msup><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>As a local similarity measure, the correntropy can effectively suppress the influence of outliers and non-Gaussian distributions. It reaches its maximum value only when <italic>X</italic> = <italic>Y</italic>, and maximizing it is known as the maximum correntropy criterion (MCC). MCC can therefore be used as an optimization criterion and as a robust loss function.</p>
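<p>The sample estimator in Equation (6) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the function names and the toy data are hypothetical, and the unnormalized Gaussian kernel of Equation (6) is assumed.</p>
<preformat><![CDATA[
```python
import math


def gaussian_kernel(error, sigma):
    """Unnormalized Gaussian kernel exp(-error^2 / (2 * sigma^2)), as in Eq. (6)."""
    return math.exp(-(error ** 2) / (2.0 * sigma ** 2))


def correntropy(xs, ys, sigma):
    """Sample estimator of correntropy: mean kernel value over paired samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum(gaussian_kernel(x - y, sigma) for x, y in zip(xs, ys)) / len(xs)


def mean_squared_error(xs, ys):
    """Mean squared error over the same pairs, for comparison only."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the last estimate in `with_outlier` is far from its target.
targets = [0.0, 1.0, 2.0, 3.0]
clean = [0.1, 0.9, 2.1, 2.9]
with_outlier = [0.1, 0.9, 2.1, 103.0]

for name, estimates in (("clean", clean), ("outlier", with_outlier)):
    print(name,
          correntropy(targets, estimates, sigma=1.0),
          mean_squared_error(targets, estimates))
```
]]></preformat>
<p>Because every kernel value lies between 0 and 1, a single outlier can lower the estimate by at most 1/<italic>N</italic>, whereas its contribution to the mean squared error grows without bound.</p>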
<p>This study uses a mixture correntropy, which combines several kernels of different sizes and can be described as</p>
<disp-formula id="S3.E7"><label>(7)</label><mml:math id="M7"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>S</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="INEQ12"><mml:msubsup><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mo>.</mml:mo><mml:mo>,</mml:mo><mml:mo>.</mml:mo><mml:mo>)</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>S</mml:mi></mml:msubsup></mml:math></inline-formula> are <italic>S</italic> different Gaussian kernels based on each kernel size &#x03C3;<italic><sub><italic>s</italic></sub></italic>. <inline-formula><mml:math id="INEQ13"><mml:msubsup><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>S</mml:mi></mml:msubsup></mml:math></inline-formula> are <italic>S</italic> mixture parameters satisfying 0 &#x2264; &#x03BB;<italic><sub><italic>s</italic></sub></italic> &#x2264; 1 and <inline-formula><mml:math id="INEQ14"><mml:mrow><mml:mrow><mml:msubsup><mml:mo largeop="true" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>S</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>. In this paper, <italic>S</italic> is selected to be 2. Thus, the sample estimator of mixture correntropy can be expressed as</p>
<disp-formula id="S3.E8"><label>(8)</label><mml:math id="M8"><mml:mrow><mml:mtable displaystyle="true" rowspacing="0pt"><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>V</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mrow><mml:mstyle displaystyle="false"><mml:msubsup><mml:mo largeop="true" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup></mml:mstyle><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mi/><mml:mo>=</mml:mo><mml:mrow><mml:mstyle 
displaystyle="false"><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mrow><mml:mstyle displaystyle="false"><mml:msubsup><mml:mo largeop="true" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup></mml:mstyle><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac><mml:msup><mml:mrow><mml:mo fence="true" maxsize="142%" minsize="142%">||</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo fence="true" maxsize="142%" minsize="142%">||</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mn>2</mml:mn><mml:msubsup><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac><mml:msup><mml:mrow><mml:mo fence="true" maxsize="142%" minsize="142%">||</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo fence="true" maxsize="142%" minsize="142%">||</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mn>2</mml:mn><mml:msubsup><mml:mi 
mathvariant="normal">&#x03C3;</mml:mi><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
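<p>Equation (8) with <italic>S</italic> = 2 can be sketched as follows. This is an illustrative sketch only: the function and argument names are hypothetical, and the loss form 1 &#x2212; V is an assumption made here so that minimizing the loss maximizes the mixture correntropy; the source does not prescribe it.</p>
<preformat><![CDATA[
```python
import math


def mixture_correntropy(xs, ys, lam, sigma1, sigma2):
    """Sample estimator of Eq. (8): two Gaussian kernels mixed by lam."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    total = 0.0
    for x, y in zip(xs, ys):
        squared_error = (x - y) ** 2
        k1 = math.exp(-squared_error / (2.0 * sigma1 ** 2))  # kernel of size sigma1
        k2 = math.exp(-squared_error / (2.0 * sigma2 ** 2))  # kernel of size sigma2
        total += lam * k1 + (1.0 - lam) * k2
    return total / len(xs)


def mmcc_loss(xs, ys, lam, sigma1, sigma2):
    """Assumed loss form: 1 - mixture correntropy (zero when xs equals ys)."""
    return 1.0 - mixture_correntropy(xs, ys, lam, sigma1, sigma2)


# Hypothetical desired signals and estimates.
desired = [0.0, 1.0, 2.0]
estimated = [0.2, 0.7, 2.5]
print(mmcc_loss(desired, estimated, lam=0.5, sigma1=0.5, sigma2=2.0))
```
]]></preformat>
<p>Setting <italic>lam</italic> to 1 or 0 in this sketch recovers the single-kernel estimator of Equation (6) with kernel size &#x03C3;<sub>1</sub> or &#x03C3;<sub>2</sub>, respectively.</p>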
<p>An unknown parameter can be estimated by maximizing the mixture correntropy between the desired signals and the estimated values. More details on the mixture MCC (MMCC) can be found in Zheng et al. (2020). The influence functions of MCC and the minimum mean square error (MMSE) criterion are shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. In this figure, the <italic>x</italic>-axis variable <italic>e</italic> represents the estimation error between the actual output and its corresponding estimate. The influence function &#x03A8;(<italic>e</italic>) of MCC is calculated as follows:</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Influence functions based on the minimum mean square error (MMSE) or maximum correntropy criterion (MCC).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g001.tif"/>
</fig>
<disp-formula id="S3.E9"><label>(9)</label><mml:math id="M9"><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03A8;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>e</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>e</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:mi>e</mml:mi></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mfrac><mml:mi>e</mml:mi><mml:msup><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:msup><mml:mi>e</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mn>2</mml:mn><mml:msup><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>G</italic><sub>&#x03C3;</sub> (&#x22C5;) represents the Gaussian kernel and &#x03C3; is the kernel size. The influence function of MMSE grows linearly with the amplitude of the estimation error, whereas that of MCC is bounded and decays toward zero for large errors. Since large errors are typically induced by outliers, MCC is well suited to robust learning.</p>
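<p>The contrast between the two influence functions can be checked numerically. The sketch below evaluates Equation (9) directly; the MMSE influence function is taken to be the derivative of the cost <italic>e</italic><sup>2</sup>/2, which is an assumption about scaling made here for illustration, and the function names are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def influence_mcc(error, sigma):
    """Influence function of MCC from Eq. (9): derivative of the Gaussian kernel."""
    return -(error / sigma ** 2) * math.exp(-(error ** 2) / (2.0 * sigma ** 2))


def influence_mmse(error):
    """Influence function of the squared-error cost e^2 / 2 (assumed scaling)."""
    return error


# The MMSE value keeps growing with the error; the MCC magnitude peaks at
# |error| = sigma and then shrinks toward zero.
for e in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(e, influence_mmse(e), influence_mcc(e, sigma=1.0))
```
]]></preformat>
<p>In this sketch the magnitude of the MCC influence function is largest at |<italic>e</italic>| = &#x03C3;, so the kernel size sets the error scale beyond which samples are progressively discounted.</p>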
</sec>
<sec id="S3.SS3">
<title>Cross-Entropy Loss Function</title>
<p>The cross-entropy loss function, also known as log loss, is one of the most commonly used loss functions for back propagation. It increases as the predicted probability deviates from the actual label, and can be expressed as follows:</p>
<disp-formula id="S3.E10"><label>(10)</label><mml:math id="M10"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>In this study, a label <italic>l<sup>n</sup></italic> is assigned to each image; it takes a value of 1 only for images that belong to the same class as the image in the test phase, and a value of 0 otherwise. The cross-entropy term can then be written as</p>
<disp-formula id="S3.E11"><label>(11)</label><mml:math id="M11"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi>C</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>5</mml:mn></mml:munderover><mml:mo>-</mml:mo><mml:mrow><mml:msup><mml:mi>l</mml:mi><mml:mi>n</mml:mi></mml:msup><mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mn>20</mml:mn><mml:mo>+</mml:mo><mml:mrow><mml:mn>20</mml:mn><mml:mo>&#x22C5;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msup><mml:mi>l</mml:mi><mml:mi>n</mml:mi></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mn>20</mml:mn><mml:mo>+</mml:mo><mml:mrow><mml:mn>20</mml:mn><mml:mo>&#x22C5;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where the output of the SNN model is counted only after all the images have been fully presented.</p>
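<p>A minimal sketch of the cross-entropy term in Equation (11) is given below. It assumes that &#x03C3; denotes the logistic sigmoid, which the text does not state explicitly, and it takes the five readout values as a plain list without modeling how they are read from the network. All names are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def cross_entropy_term(labels, outputs):
    """Eq. (11) with sigma taken as the logistic sigmoid (an assumption).

    Uses the identities -log(sigmoid(y)) = softplus(-y) and
    -log(1 - sigmoid(y)) = softplus(y) to avoid taking log(0).
    """
    if len(labels) != len(outputs):
        raise ValueError("labels and outputs must have the same length")
    return sum(l * softplus(-y) + (1 - l) * softplus(y)
               for l, y in zip(labels, outputs))


# Hypothetical example: five candidate images, the first one matches the query.
labels = [1, 0, 0, 0, 0]
uninformative = [0.0, 0.0, 0.0, 0.0, 0.0]
confident = [4.0, -4.0, -4.0, -4.0, -4.0]
print(cross_entropy_term(labels, uninformative))
print(cross_entropy_term(labels, confident))
```
]]></preformat>
<p>The softplus form is used here only for numerical stability; it is algebraically the same expression as Equation (11) under the sigmoid assumption.</p>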
</sec>
<sec id="S3.SS4">
<title>Regularization by Minimum Mean Square Error</title>
<p>To obtain a sparse firing regime, additional terms are added to regularize the spiking activity. Two types of regularization are employed: firing rate regularization and voltage range regularization. First, to keep the average firing rate <italic>f</italic><sub><italic>j</italic></sub> of every neuron <italic>j</italic> close to a predefined target firing rate <italic>f</italic><sub><italic>target</italic></sub>, the following term is added:</p>
<disp-formula id="S3.E12"><label>(12)</label><mml:math id="M12"><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:mrow><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mi>j</mml:mi></mml:munder><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mtext>target</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>f</italic><sub><italic>j</italic></sub> is computed as the average spike count, which is expressed as</p>
<disp-formula id="S3.E13"><label>(13)</label><mml:math id="M13"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>c</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mi>T</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>c</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:munderover><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msubsup><mml:mi>z</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="INEQ16"><mml:msubsup><mml:mi>z</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> denotes the spikes of neuron <italic>j</italic> at time step <italic>t</italic> in the batch indexed by <italic>n</italic>, and <italic>T</italic> represents the total duration of a particular task. In addition, the factor &#x03BB;<sub><italic>f</italic></sub> is a hyperparameter that scales the importance of firing rate regularization.</p>
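<p>Equations (12) and (13) can be sketched as follows. The nested-list layout <italic>spikes[n][t][j]</italic> and all names are hypothetical choices for illustration, and the firing rate is expressed in spikes per time step.</p>
<preformat><![CDATA[
```python
def average_firing_rates(spikes):
    """Eq. (13): mean spike count per neuron over batch index n and time step t.

    spikes[n][t][j] is 1 if neuron j fired at step t for batch index n, else 0.
    """
    n_batch = len(spikes)
    n_steps = len(spikes[0])
    n_neurons = len(spikes[0][0])
    counts = [0.0] * n_neurons
    for sample in spikes:
        for step in sample:
            for j, z in enumerate(step):
                counts[j] += z
    return [c / (n_batch * n_steps) for c in counts]


def rate_regularization(spikes, f_target, lambda_f):
    """Eq. (12): lambda_f times the summed squared deviation from the target rate."""
    rates = average_firing_rates(spikes)
    return lambda_f * sum((f - f_target) ** 2 for f in rates)


# Hypothetical activity: 2 batch entries, 4 time steps, 2 neurons.
# Neuron 0 fires at every step, neuron 1 never fires.
spikes = [[[1, 0] for _ in range(4)] for _ in range(2)]
print(average_firing_rates(spikes))
print(rate_regularization(spikes, f_target=0.5, lambda_f=2.0))
```
]]></preformat>
<p>Both an over-active and a silent neuron are penalized equally in this example, since the term depends only on the squared distance from the target rate.</p>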
<p>In addition, to encourage the membrane potential to remain within a particular range, membrane potential values outside that range are penalized by the term</p>
<disp-formula id="S3.E14"><label>(14)</label><mml:math id="M14"><mml:mtable><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>V</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi>R</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>v</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>A</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:msub><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mi>v</mml:mi></mml:msub><mml:mrow><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:mfrac><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mi>j</mml:mi></mml:munder><mml:mrow><mml:mo maxsize="260%" 
minsize="260%">(</mml:mo><mml:mi>max</mml:mi><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msubsup><mml:mi>v</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi>A</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:mo>+</mml:mo><mml:mi>max</mml:mi><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mo>-</mml:mo><mml:msubsup><mml:mi>v</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo maxsize="260%" minsize="260%">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where an index <italic>n</italic> is used to indicate each batch. The variables <italic>v<sub><italic>j</italic></sub><italic><sup>t</sup></italic></italic> and <italic>a<sub><italic>j</italic></sub><italic><sup>t</sup></italic></italic> represent the membrane potential and the adaptive firing threshold, respectively. The resultant threshold voltage is <italic>A</italic><sub><italic>j</italic></sub>(<italic>t</italic>). The factor &#x03BB;<italic><sub><italic>v</italic></sub></italic> represents a hyperparameter that scales the importance of the resulting membrane potential regularization.</p>
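<p>The voltage range regularization of Equation (14) can be sketched as follows. Each max(&#x22C5;) term is read here as being squared after the maximum is taken, which is an interpretation of the notation. The nested-list layout <italic>v[n][t][j]</italic> and all names are hypothetical.</p>
<preformat><![CDATA[
```python
def voltage_regularization(v, A, v_th, lambda_v):
    """Eq. (14): penalize potentials above A or below -v_th.

    v[n][t][j] is the membrane potential and A[n][t][j] the threshold voltage
    of neuron j at time step t for batch index n. Each max term is squared
    after taking the maximum (an interpretation of the notation).
    """
    n_batch = len(v)
    n_steps = len(v[0])
    total = 0.0
    for v_n, A_n in zip(v, A):
        for v_t, A_t in zip(v_n, A_n):
            for v_j, A_j in zip(v_t, A_t):
                above = max(0.0, v_j - A_j)    # excess over the threshold
                below = max(0.0, -v_j - v_th)  # excess below -v_th
                total += above ** 2 + below ** 2
    return lambda_v * total / (n_batch * n_steps)


# Hypothetical potentials for 1 batch entry, 1 time step, 3 neurons:
# inside the range, above the threshold, and below -v_th.
v = [[[0.5, 2.0, -3.0]]]
A = [[[1.0, 1.0, 1.0]]]
print(voltage_regularization(v, A, v_th=1.0, lambda_v=1.0))
```
]]></preformat>
<p>In this example the first neuron contributes nothing, the second contributes (2 &#x2212; 1)<sup>2</sup> = 1, and the third contributes (3 &#x2212; 1)<sup>2</sup> = 4.</p>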
</sec>
<sec id="S3.SS5">
<title>Network Architecture of the Proposed Heterogeneous Ensemble-Based Spike-Driven Few-Shot Online Learning Model</title>
<p>In this study, the proposed HESFOL model contains an SFOL model with spiking neurons along with the ensemble loss function for back propagation. The proposed learning method, in which the model is extended with the ensemble loss function for the few-shot learning problem, is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, where the ensemble loss function is represented by the dashed box. The ensemble loss is combined according to Equations (1)&#x2013;(3) and contains the MMCC, the cross-entropy loss function, and the two types of MMSE regularization. Assume that in a multi-class data set <italic>X</italic>, <italic>x</italic><sub><italic>i</italic></sub>&#x2208;<italic>R<sup>K</sup></italic> represents the <italic>K</italic>-dimensional input and <italic>y</italic>&#x2208;{0,1}<italic><sup>C</sup></italic> represents the one-hot encoding of the label. In the backward step, the gradients of the proposed loss function flow back through the network and its weights. The weights are updated in the direction opposite to the gradient so that the loss value decreases.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Schematic block diagram of the proposed heterogeneous ensemble-based spike-driven few-shot online learning (HESFOL) model.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g002.tif"/>
</fig>
</sec>
<sec id="S3.SS6">
<title>Two-Compartment Spiking Neuron Model With Adaptation Mechanism</title>
<p>This study uses a two-compartment spiking neuron model for robust learning. Previous research has demonstrated that spike-driven learning with dendritic processing can accelerate convergence and reduce the number of spikes (<xref ref-type="bibr" rid="B41">Yang et al., 2021a</xref>). Therefore, a spiking neuron model with a dendritic compartment is proposed in this study. The soma compartment has two variables, which are the membrane potential <italic>v<sub><italic>j</italic></sub><italic><sup>t</sup></italic></italic> and the adaptive firing threshold <italic>a<sub><italic>j</italic></sub><italic><sup>t</sup></italic></italic>. The resulting threshold voltage <italic>A</italic><sub><italic>j</italic></sub>(<italic>t</italic>) increases with each output spike and decays to the baseline threshold <italic>v</italic><sub><italic>th</italic></sub> with an adaptation time constant &#x03C4;<sub><italic>a</italic></sub>. Specifically, the soma compartment can be formulated as</p>
<disp-formula id="S3.E15"><label>(15)</label><mml:math id="M15"><mml:mrow><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="S3.E16"><label>(16)</label><mml:math id="M16"><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:msub><mml:mi>a</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="S3.E17"><label>(17)</label><mml:math id="M17"><mml:mrow><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03BC;</mml:mi><mml:msub><mml:mi>a</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where &#x03BC; = <italic>e</italic><sup>&#x2212;&#x0394;<italic>t</italic>/&#x03C4;<sub><italic>a</italic></sub></sup>. The factor &#x03BB; scales the impact of threshold adaptation. The discretized form of the spiking soma and dendrite models can be formulated as</p>
<disp-formula id="S3.E18"><label>(18)</label><mml:math id="M18"><mml:mtable displaystyle="true" rowspacing="0pt"><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C4;</mml:mi><mml:mstyle displaystyle="false"><mml:mfrac><mml:mrow><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac><mml:msub><mml:mi>g</mml:mi><mml:mi>b</mml:mi></mml:msub><mml:msub><mml:mi>g</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mfrac></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msubsup><mml:mi>V</mml:mi><mml:mi>i</mml:mi><mml:mi>b</mml:mi></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mo lspace="91.4pt">+</mml:mo><mml:mrow><mml:mstyle displaystyle="false"><mml:msub><mml:mo largeop="true" 
symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2260;</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mstyle><mml:mrow><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mi>D</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="S3.E19"><label>(19)</label><mml:math id="M19"><mml:mrow><mml:mrow><mml:msubsup><mml:mi>V</mml:mi><mml:mi>i</mml:mi><mml:mi>b</mml:mi></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:munderover><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mi>s</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>g</italic><sub><italic>l</italic></sub> and <italic>g</italic><sub><italic>b</italic></sub> represent the leak conductance and the basal dendrite conductance, respectively, &#x0394;<italic>T</italic> represents the integration step, and <italic>N</italic> here indexes the discrete time step. <italic>W</italic><sub><italic>ji</italic></sub><italic><sup>rec</sup></italic> represents the synaptic weight from the neuron <italic>i</italic> to the neuron <italic>j</italic> in the recurrent architecture, and <italic>D</italic> represents the transmission delay of recurrent spikes. The parameter &#x03C4; = <italic>C</italic><sub><italic>m</italic></sub>/<italic>g</italic><sub><italic>l</italic></sub> represents a time constant, where <italic>C</italic><sub><italic>m</italic></sub> represents the membrane capacitance. The variable <italic>z</italic><sub><italic>i</italic></sub> represents the output spikes of the <italic>i</italic>th spiking neuron. The variables <italic>V</italic><sub><italic>i</italic></sub> and <italic>V<sub><italic>i</italic></sub><italic><sup>b</sup></italic></italic> represent the membrane potentials of the soma and the basal dendrite of the <italic>i</italic>th neuron, respectively. The term <italic>W</italic><sub><italic>ij</italic></sub> represents the synaptic weights in the input layer, and the constant <italic>b</italic><sub><italic>i</italic></sub> is defined as a bias term. The variable <italic>s<sup>input</sup></italic> is calculated based on the following equation:</p>
<disp-formula id="S3.E20"><label>(20)</label><mml:math id="M20"><mml:mrow><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mi>k</mml:mi></mml:munder><mml:mrow><mml:mi mathvariant="normal">&#x03BA;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:msubsup><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>t</italic><sub><italic>jk</italic></sub><italic><sup>input</sup></italic> represents the <italic>k</italic>th spike time of the input neuron <italic>j</italic>, and the response kernel &#x03BA; is expressed as follows:</p>
<disp-formula id="S3.E21"><label>(21)</label><mml:math id="M21"><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03BA;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mi mathvariant="normal">/</mml:mi><mml:msub><mml:mi mathvariant="normal">&#x03C4;</mml:mi><mml:mi>L</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msup><mml:mo>-</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mi mathvariant="normal">/</mml:mi><mml:msub><mml:mi mathvariant="normal">&#x03C4;</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi mathvariant="normal">/</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03C4;</mml:mi><mml:mi>L</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">&#x03C4;</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>&#x03C4;<sub><italic>L</italic></sub></italic> and <italic>&#x03C4;<sub><italic>s</italic></sub></italic> represent the long and short time constants, respectively, and &#x0398; represents the Heaviside step function.</p>
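<p>As an illustration of Equations (20) and (21), the following Python sketch evaluates the double-exponential kernel and the filtered input of a single input neuron. The function names and the default values of the two time constants (10 and 2.5 ms) are assumptions chosen for the example, not values reported in this work.</p>
<preformat>
```python
import numpy as np


def kappa(t, tau_l=10.0, tau_s=2.5):
    """Double-exponential response kernel of Equation (21); times in ms."""
    t = np.asarray(t, dtype=float)
    # Heaviside step: the kernel is zero for negative time arguments.
    causal = np.where(t >= 0.0, 1.0, 0.0)
    # Clip negative times to zero so the exponentials cannot overflow.
    tc = np.maximum(t, 0.0)
    return (np.exp(-tc / tau_l) - np.exp(-tc / tau_s)) * causal / (tau_l - tau_s)


def s_input(t, spike_times, tau_l=10.0, tau_s=2.5):
    """Filtered input of Equation (20): sum of kernels over one neuron's spikes."""
    spike_times = np.asarray(spike_times, dtype=float)
    return float(np.sum(kappa(t - spike_times, tau_l, tau_s)))
```
</preformat>
<p>For example, <monospace>s_input(5.0, [0.0, 3.0])</monospace> sums the kernel contributions of two hypothetical input spikes at 0 and 3 ms, evaluated at 5 ms.</p>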
</sec>
<sec id="S3.SS7">
<title>Spike-Driven Online Learning Model</title>
<p>In the proposed HESFOL model, a regular leaky integrate-and-fire (LIF) neuron model is used, whose state is the membrane potential <italic>v</italic><sub><italic>j</italic></sub>(<italic>t</italic>) at time <italic>t</italic>. The membrane potential integrates the input current and decays to a resting potential according to its membrane time constant <italic>&#x03C4;<sub><italic>m</italic></sub></italic>. Each time <italic>v</italic><sub><italic>j</italic></sub>(<italic>t</italic>) reaches the threshold, the neuron generates a spike, <italic>z</italic><sub><italic>j</italic></sub>(<italic>t</italic>) = 1. The spiking neuron model can be expressed as</p>
<disp-formula id="S3.E22"><label>(22)</label><mml:math id="M22"><mml:mrow><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="S3.E23"><label>(23)</label><mml:math id="M23"><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:msub><mml:mi>a</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>A</italic><sub><italic>j</italic></sub>(<italic>t</italic>) is the firing threshold, given by the baseline threshold <italic>v</italic><sub><italic>th</italic></sub> plus the adaptation variable <italic>a</italic><sub><italic>j</italic></sub>(<italic>t</italic>) scaled by &#x03BB;. <italic>W<sub><italic>ji</italic></sub><italic><sup>rec</sup></italic></italic> represents the synaptic weight from neuron <italic>i</italic> to neuron <italic>j</italic>, and <italic>W<sub><italic>ji</italic></sub><italic><sup>in</sup></italic></italic> represents the weight of input component <italic>x</italic><sub><italic>i</italic></sub>(<italic>t</italic>) for neuron <italic>j</italic>. The factor &#x03B1; describes the decay speed of the membrane potential, and <italic>H</italic> and <italic>d</italic> represent the Heaviside step function and the transmission delay of recurrent spikes, respectively. During a refractory period <italic>t</italic><sub><italic>refrac</italic></sub> after each spike, <italic>z</italic><sub><italic>j</italic></sub>(<italic>t</italic>) is held at 0. The outputs of the proposed HESFOL model are constructed as a weighted sum of low-pass filtered spikes, which is defined as</p>
<disp-formula id="S3.E24"><label>(24)</label><mml:math id="M24"><mml:mrow><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="normal">&#x03BD;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mo>&#x2264;</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo><mml:mi>j</mml:mi></mml:munder><mml:mrow><mml:mi mathvariant="normal">&#x03BD;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:msubsup><mml:mi>b</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>W</italic><sub><italic>kj</italic></sub><italic><sup>out</sup></italic> and <italic>b</italic><sub><italic>k</italic></sub><italic><sup>out</sup></italic> are the readout weights and biases, respectively, &#x03BD; = <italic>e</italic><sup>&#x2212;&#x0394;<italic>t</italic>/&#x03C4;<sub><italic>out</italic></sub></sup> is the decay factor of the readout filter, and &#x03C4;<sub><italic>out</italic></sub> is the readout time constant.</p>
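<p>The threshold test of Equations (22) and (23) and the readout of Equation (24) can be sketched in Python as follows. The sketch assumes that the filter weight in Equation (24) is the power &#x03BD;<sup><italic>t</italic> &#x2212; <italic>t</italic>&#x2032;</sup>, which allows a recursive update. The step size of 1 ms, the value of &#x03BB;, and all function names are illustrative assumptions, while <italic>v</italic><sub><italic>th</italic></sub> and &#x03C4;<sub><italic>out</italic></sub> follow Table 1.</p>
<preformat>
```python
import numpy as np


def spike(v, a, v_th=1.0, lam=0.5):
    """Equations (22)-(23): spike when v reaches the adaptive threshold A."""
    A = lam * a + v_th                 # Equation (23); lam is a placeholder value
    return (v >= A).astype(float)      # Heaviside step H(v - A)


def readout(z, W_out, b_out, dt=1.0, tau_out=10.0):
    """Equation (24), assuming the filter weight is nu ** (t - t').

    z has shape (T, n_neurons); W_out has shape (n_outputs, n_neurons).
    """
    nu = np.exp(-dt / tau_out)
    f = np.zeros(W_out.shape[0])       # running low-pass filtered weighted spikes
    y = np.zeros((z.shape[0], W_out.shape[0]))
    for t in range(z.shape[0]):
        f = nu * f + W_out @ z[t]      # recursive form of the sum over t' up to t
        y[t] = (1.0 - nu) * f + b_out
    return y
```
</preformat>
<p>With the factor (1 &#x2212; &#x03BD;), a neuron that fires at every step drives the readout toward its output weight plus the bias, which keeps the readout scale independent of &#x03C4;<sub><italic>out</italic></sub>.</p>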
<p>In the proposed HESFOL model, an eligibility trace is associated with each synapse, which is the key concept of the <italic>e</italic>-prop algorithm. The eligibility trace <italic>e</italic><sub><italic>ji</italic></sub>(<italic>t</italic>) represents the influence of the weight <italic>W</italic><sub><italic>ji</italic></sub> on the spiking activity of neuron <italic>j</italic> at time <italic>t</italic>, taking into account only those dependencies that do not involve neurons other than <italic>i</italic> and <italic>j</italic>. Eligibility traces exist separately for input and recurrent synapses. The variable <italic>h</italic><sub><italic>j</italic></sub>(<italic>t</italic>) represents the hidden variables of neuron <italic>j</italic> at time <italic>t</italic>. The dynamics of the eligibility trace are then defined as follows:</p>
<disp-formula id="S3.E25"><label>(25)</label><mml:math id="M25"><mml:mrow><mml:mrow><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi mathvariant="normal">&#x03B5;</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="S3.E26"><label>(26)</label><mml:math id="M26"><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03B5;</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi mathvariant="normal">&#x03B5;</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>The eligibility vector <italic>&#x03B5;<sub>ji</sub></italic>(<italic>t</italic>) is propagated forward in time along with the computation of the proposed HESFOL model. The term <inline-formula><mml:math id="INEQ19"><mml:mfrac><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></inline-formula> cannot be calculated directly because the relationship between <italic>z</italic><sub><italic>j</italic></sub>(<italic>t</italic>) and <italic>h</italic><sub><italic>j</italic></sub>(<italic>t</italic>) contains the non-differentiable Heaviside function of Equation (22). Therefore, this derivative is replaced with a pseudo-derivative, which is described as</p>
<disp-formula id="S3.E27"><label>(27)</label><mml:math id="M27"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03A8;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mn>0.3</mml:mn><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mi>max</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mfrac><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>For the LIF neuron model, the vector of hidden variables reduces to <italic>h</italic><sub><italic>j</italic></sub>(<italic>t</italic>) = <italic>v</italic><sub><italic>j</italic></sub>(<italic>t</italic>), and the eligibility trace under the LIF dynamics can be formulated as</p>
<disp-formula id="S3.E28"><label>(28)</label><mml:math id="M28"><mml:mrow><mml:mrow><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03A8;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>z</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="INEQ20"><mml:mrow><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>z</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mo largeop="true" symmetric="true">&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mo>&#x2264;</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x03B1;</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msup><mml:msubsup><mml:mi>z</mml:mi><mml:mi>i</mml:mi><mml:msup><mml:mi>t</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:msubsup></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula> is the low-pass filtered presynaptic spiking activity of neuron <italic>i</italic>. For the adaptive LIF (ALIF) neuron model, the vector of hidden variables additionally contains the threshold adaptation variable, <italic>h</italic><sub><italic>j</italic></sub>(<italic>t</italic>) = [<italic>v</italic><sub><italic>j</italic></sub>(<italic>t</italic>), <italic>a</italic><sub><italic>j</italic></sub>(<italic>t</italic>)], and the eligibility trace <italic>e</italic><sub><italic>ji</italic></sub>(<italic>t</italic>) is defined as</p>
<disp-formula id="S3.E29"><label>(29)</label><mml:math id="M29"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03A8;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>z</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B2;</mml:mi><mml:msub><mml:mi mathvariant="normal">&#x03B5;</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="S3.E30"><label>(30)</label><mml:math id="M30"><mml:mrow><mml:mtable displaystyle="true" rowspacing="0pt"><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03B5;</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03C1;</mml:mi><mml:mo>-</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B2;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi mathvariant="normal">&#x03A8;</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03B5;</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mo lspace="38.6pt">+</mml:mo><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03A8;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>z</mml:mi><mml:mo 
stretchy="false">&#x00AF;</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
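<p>Equations (29) and (30) can be evaluated online for a single synapse, as in the following Python sketch. It assumes that all quantities before the first time step are zero and that the decay factors &#x03B1; and &#x03C1; are supplied by the caller. The function name and its argument layout are hypothetical.</p>
<preformat>
```python
import numpy as np


def alif_eligibility(z_pre, psi, alpha, rho, beta, d=1):
    """Eligibility trace of one ALIF synapse, Equations (29)-(30).

    z_pre: presynaptic spikes z_i(t); psi: postsynaptic pseudo-derivative.
    Both are sequences of length T; d is the delay in time steps.
    """
    T = len(z_pre)
    z_bar = np.zeros(T)   # low-pass filtered presynaptic spikes
    eps_a = np.zeros(T)   # adaptation component of the eligibility vector
    e = np.zeros(T)       # eligibility trace e_ji(t)

    def at(x, k):
        # Assumption: values before the start of the simulation are zero.
        return x[k] if k >= 0 else 0.0

    for t in range(T):
        z_bar[t] = alpha * at(z_bar, t - 1) + z_pre[t]
        # Equation (30)
        eps_a[t] = ((rho - beta * at(psi, t - 1)) * at(eps_a, t - 1)
                    + at(psi, t - 1) * at(z_bar, t - d - 1))
        # Equation (29)
        e[t] = psi[t] * (at(z_bar, t - d) - beta * eps_a[t])
    return e, eps_a, z_bar
```
</preformat>
<p>Because each update uses only values from the current and previous time steps, the trace can be maintained locally at the synapse without storing the spike history.</p>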
<p>To realize the plasticity of the proposed HESFOL model, the derivative of the Heaviside function <inline-formula><mml:math id="INEQ21"><mml:mfrac><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></inline-formula> is replaced with a pseudo derivative in the backward pass, which is formulated as</p>
<disp-formula id="S3.E31"><label>(31)</label><mml:math id="M31"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03A8;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mn>0.3</mml:mn><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mi>max</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mfrac><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>In addition, the derivative of the Heaviside function with the adaptive threshold, <inline-formula><mml:math id="INEQ22"><mml:mfrac><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>&#x2202;</mml:mo><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></inline-formula>, is replaced by</p>
<disp-formula id="S3.E32"><label>(32)</label><mml:math id="M32"><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">&#x03A8;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mn>0.3</mml:mn><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mi>max</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mfrac><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>The actual update to the initial synaptic weights <italic>W</italic><sub><italic>init</italic></sub> of the proposed HESFOL model is performed with the Adam optimizer using a learning rate &#x03B7;<sub><italic>rate</italic></sub>.</p>
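<p>Equations (31) and (32) share the same triangular shape and differ only in the threshold that is compared with the membrane potential. A minimal Python sketch, with a hypothetical function name and with <italic>v</italic><sub><italic>th</italic></sub> = 1.0 taken from Table 1, is given below.</p>
<preformat>
```python
import numpy as np


def pseudo_derivative(v, threshold, v_th=1.0, gamma=0.3):
    """Triangular pseudo-derivative of Equations (31) and (32).

    Pass threshold=v_th for LIF neurons (Equation 31) or the adaptive
    threshold A_j(t) for ALIF neurons (Equation 32).
    """
    v = np.asarray(v, dtype=float)
    return gamma * np.maximum(0.0, 1.0 - np.abs((threshold - v) / v_th))
```
</preformat>
<p>The function peaks at 0.3 when the membrane potential equals the threshold and vanishes once the potential is more than <italic>v</italic><sub><italic>th</italic></sub> away from it, so only neurons close to firing contribute to the weight update.</p>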
</sec>
</sec>
<sec id="S4" sec-type="results">
<title>Results</title>
<sec id="S4.SS1">
<title>Details of the Heterogeneous Ensemble-Based Spike-Driven Few-Shot Online Learning Architecture</title>
<p>As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, the overall architecture of the proposed HESFOL model contains two parts: the SFOL model and the ensemble loss. The SFOL model is inspired by a neural mechanism of the human brain, namely the interaction between the hippocampus and the prefrontal cortex (PFC). Accordingly, the SFOL model has two modules: the hippocampus-inspired SNN (HSNN) and the PFC-inspired SNN (PSNN). The external inputs are summed and integrated into the membrane potentials of the neurons in the HSNN and PSNN modules. The HSNN readout is composed of the weighted low-pass filtered spike trains of the neurons in the HSNN module. Suppose there exists an infinitely large family <italic>F</italic> of possibly relevant learning tasks <italic>C</italic>. The HSNN module learns a particular task <italic>C</italic> from <italic>F</italic> based on the learning signals provided by the PSNN module. Each time HSNN receives a new task <italic>C</italic> from the family <italic>F</italic>, its synaptic weights are updated. The learning performance of HSNN on the task <italic>C</italic> is evaluated with the loss function. After the first phase of learning, the parameters between the HSNN and PSNN modules are fixed, and new tasks <italic>C</italic> from the family <italic>F</italic> are selected to evaluate the HSNN learning performance. The encoding module of the SFOL model follows the processing mechanism of the visual pathway, using a visual-pathway-inspired neural network (VNN) based on a 2D ConvNet. The images are fed into the VNN as pixel arrays for input encoding. The 2D ConvNet consists of three layers based on the non-spiking McCulloch&#x2013;Pitts neuron model. HSNN contains 180 two-compartment LIF (TLIF) neurons and 260 conventional LIF neurons. The learning signals can only be transmitted from PSNN to HSNN in the first phase. 
To realize the outer loop optimization, the ensemble loss, which combines the MMCC, MMSE, and cross-entropy loss functions, is employed in the BPTT algorithm. The values of the hyperparameters used in the HESFOL model are listed in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>An overview of the proposed HESFOL framework. A 2D ConvNet serves as the visual-pathway-inspired neural network (VNN). Two subnetworks are realized: the hippocampus-inspired SNN (HSNN) and the PFC-inspired SNN (PSNN). The learning signals are transmitted from PSNN to HSNN.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g003.tif"/>
</fig>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Hyperparameter list used in the heterogeneous ensemble-based spike-driven few-shot online learning (HESFOL) architecture.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Parameters</td>
<td valign="top" align="center">Description</td>
<td valign="top" align="center">Values</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">&#x03C4;<sub><italic>m</italic></sub></td>
<td valign="top" align="left">Membrane time constant</td>
<td valign="top" align="center">15 ms</td>
</tr>
<tr>
<td valign="top" align="left">&#x03C4;<sub><italic>out</italic></sub></td>
<td valign="top" align="left">Time constant of readout neurons</td>
<td valign="top" align="center">10 ms</td>
</tr>
<tr>
<td valign="top" align="left"><italic>d</italic></td>
<td valign="top" align="left">Synaptic transmission delay</td>
<td valign="top" align="center">1 ms</td>
</tr>
<tr>
<td valign="top" align="left"><italic>t</italic><sub><italic>refrac</italic></sub></td>
<td valign="top" align="left">Refractory period duration</td>
<td valign="top" align="center">5 ms</td>
</tr>
<tr>
<td valign="top" align="left"><italic>f</italic><sub><italic>target</italic></sub></td>
<td valign="top" align="left">Target firing rate</td>
<td valign="top" align="center">20 Hz</td>
</tr>
<tr>
<td valign="top" align="left">&#x03B7;<italic><sub><italic>out</italic></sub></italic></td>
<td valign="top" align="left">Learning rate of outer loop</td>
<td valign="top" align="center">2 &#x00D7; 10<sup>&#x2013;3</sup></td>
</tr>
<tr>
<td valign="top" align="left">&#x03BB;<italic><sub><italic>f</italic></sub></italic></td>
<td valign="top" align="left">Spike rate regularization</td>
<td valign="top" align="center">1.0</td>
</tr>
<tr>
<td valign="top" align="left"><italic>v</italic><sub><italic>th</italic></sub></td>
<td valign="top" align="left">Threshold</td>
<td valign="top" align="center">1.0</td>
</tr>
<tr>
<td valign="top" align="left">&#x03BB;<italic><sub><italic>v</italic></sub></italic></td>
<td valign="top" align="left">Voltage regularization</td>
<td valign="top" align="center">10<sup>&#x2013;2</sup></td>
</tr>
<tr>
<td valign="top" align="left"><italic>t</italic><sub><italic>img</italic></sub></td>
<td valign="top" align="left">Number of time steps per image</td>
<td valign="top" align="center">20 ms</td>
</tr>
<tr>
<td valign="top" align="left"><italic>&#x03C4;<sub><italic>a</italic></sub></italic></td>
<td valign="top" align="left">Adaptation time constant</td>
<td valign="top" align="center">200 ms</td>
</tr>
<tr>
<td valign="top" align="left">&#x03B7;</td>
<td valign="top" align="left">Learning rate</td>
<td valign="top" align="center">1.915 &#x00D7; 10<sup>&#x2013;3</sup></td>
</tr>
<tr>
<td valign="top" align="left"><italic>N</italic><sub><italic>HSNN</italic></sub></td>
<td valign="top" align="left">Network size of HSNN</td>
<td valign="top" align="center">447</td>
</tr>
<tr>
<td valign="top" align="left"><italic>q</italic><sub><italic>ada</italic></sub></td>
<td valign="top" align="left">Fraction of neurons using adaptation</td>
<td valign="top" align="center">40.5%</td>
</tr>
<tr>
<td valign="top" align="left">&#x03B2;</td>
<td valign="top" align="left">Impact of threshold adaptation</td>
<td valign="top" align="center">0.4902</td>
</tr>
<tr>
<td valign="top" align="left"><italic>N</italic><sub><italic>batch</italic></sub></td>
<td valign="top" align="left">Batch size for outer loop optimization</td>
<td valign="top" align="center">285</td>
</tr>
<tr>
<td valign="top" align="left"><italic>N</italic><sub><italic>PSNN</italic></sub></td>
<td valign="top" align="left">Network size of the PSNN</td>
<td valign="top" align="center">239</td>
</tr>
<tr>
<td valign="top" align="left"><italic>&#x03C4;<sub><italic>LS</italic></sub></italic></td>
<td valign="top" align="left">Time constant of the learning-signal readouts</td>
<td valign="top" align="center">10 ms</td>
</tr>
<tr>
<td valign="top" align="left"><italic>f</italic><sub><italic>tarPSNN</italic></sub></td>
<td valign="top" align="left">Target firing rate for PSNN</td>
<td valign="top" align="center">20 Hz</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="S4.SS2">
<title>Few-Shot Learning Performance on Spike Patterns With Non-Gaussian Noise</title>
<p>In the first task, spike patterns with non-Gaussian noise are used to test the few-shot learning capability of the proposed HESFOL model. A spatiotemporal spike pattern classification task is considered, where each pattern is generated with a firing frequency ranging from 2 to 50 Hz. The spike patterns describe the spatiotemporal dynamics of a neural population, in which the firing frequency and the precise spike timing carry rich information about external inputs from the environment. The spike patterns of each category are instantiated by adding non-Gaussian noise, namely Poisson noise and spike deletion noise, to the corresponding template. We first generate 1,000 spike pattern templates based on certain spiking neurons. Then, we generate 25 spike patterns for each template by randomly sampling the neural firing rate from a uniform distribution. This yields a few-shot learning data set of spike patterns with 1,000 classes and 25 samples per class.</p>
<p>Two types of non-Gaussian noise are considered in few-shot learning in the spiking patterns classification task. In the first type, new noisy spatiotemporal pattern samples are generated by adding Poisson noise to the templates with the standard deviation (SD) of &#x03C3;<sub><italic>noise</italic></sub>. In the second type, random deletion noise is added to the templates to generate new noisy spiking pattern samples, where each spike is randomly deleted according to a probability of <italic>P</italic><sub><italic>del</italic></sub>. As shown in <xref ref-type="fig" rid="F4">Figure 4</xref>, our proposed HESFOL model achieves remarkable performance in various noisy situations, highlighting the advantages of our heterogeneous ensemble-based approach. Among all the presented learning loss functions, the loss function with MMCC, MMSE, and cross-entropy loss is the best to realize the highest robustness to tolerate noise.</p>
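<p>As an illustration, the generation of noisy samples from a template can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation used in this study: spike patterns are represented as binary arrays (neurons by 1-ms time bins), the Poisson noise is read here as additional background spikes inserted at a fixed rate, and the function names, the array sizes, and the parameters <monospace>noise_rate_hz</monospace> and <monospace>p_del</monospace> are hypothetical.</p>
<preformat><![CDATA[
```python
import numpy as np


def make_template(n_neurons, n_steps, rng, f_min=2.0, f_max=50.0, dt=1e-3):
    """Draw one firing rate per neuron in [f_min, f_max] Hz and sample a binary raster."""
    rates = rng.uniform(f_min, f_max, size=(n_neurons, 1))
    # Bernoulli approximation of a Poisson process: P(spike in a bin) = rate * dt.
    return (rng.random((n_neurons, n_steps)) < rates * dt).astype(np.uint8)


def add_background_spikes(template, noise_rate_hz, rng, dt=1e-3):
    """Insert extra spikes at a fixed background rate (assumed reading of Poisson noise)."""
    extra = (rng.random(template.shape) < noise_rate_hz * dt).astype(np.uint8)
    return np.maximum(template, extra)


def delete_spikes(template, p_del, rng):
    """Remove each existing spike independently with probability p_del."""
    keep = (rng.random(template.shape) >= p_del).astype(np.uint8)
    return template * keep


rng = np.random.default_rng(0)
template = make_template(n_neurons=100, n_steps=500, rng=rng)
noisy_add = add_background_spikes(template, noise_rate_hz=5.0, rng=rng)
noisy_del = delete_spikes(template, p_del=0.3, rng=rng)
```
]]></preformat>
<p>In this sketch, the additive noise can only add spikes to a template and the deletion noise can only remove them, so the two noise types perturb a pattern in opposite directions.</p>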
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Comparison of the few-shot learning performance with non-Gaussian noise between the HESFOL model and the other models. <bold>(A)</bold> Few-shot learning accuracy with Poisson noise. <bold>(B)</bold> Few-shot learning accuracy with random deletion noise.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g004.tif"/>
</fig>
</sec>
<sec id="S4.SS3">
<title>Few-Shot Learning Performance With Non-Gaussian Noise</title>
<p>In this study, we test our HESFOL model using the Omniglot data set. The Omniglot data set contains a total of 1,623 classes and 32,460 images, with 20 images per class. The data set is split into 964 training classes and 659 test classes. Each test consists of two phases: a sequence of images is presented in which exactly one image in phase #2 belongs to the same class as the one shown in phase #1. The 2D CNN with 15,488 neurons is organized into three layers, which contain 16, 32, and 64 filters, respectively. The kernel size of the convolutional filters is 3 &#x00D7; 3. Average pooling layers and batch normalization layers are also used to improve optimization in the HESFOL model. Salt-and-pepper noise, which is a kind of non-Gaussian noise, is added to the Omniglot images by randomly flipping 15% of the images. <xref ref-type="fig" rid="F5">Figure 5</xref> shows images from the Omniglot data set contaminated by the non-Gaussian salt-and-pepper noise. The evolution of the ensemble loss value over iterations is shown in <xref ref-type="fig" rid="F6">Figure 6</xref>. The loss value of the proposed HESFOL model quickly decreases to a stationary level of about 0.2 within 1,000 iterations.</p>
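<p>For illustration, salt-and-pepper corruption of a binary character image can be sketched as follows. This is a minimal sketch and not the preprocessing code of this study: it assumes that the noise inverts individual pixels independently with a given probability, and the function name, the parameter <monospace>flip_prob</monospace>, and the toy image are hypothetical.</p>
<preformat><![CDATA[
```python
import numpy as np


def salt_and_pepper(image, flip_prob, rng):
    """Invert each pixel of a binary (0/1) image independently with probability flip_prob."""
    flips = rng.random(image.shape) < flip_prob
    return np.where(flips, 1 - image, image)


rng = np.random.default_rng(0)
clean = np.zeros((28, 28), dtype=np.uint8)
clean[10:18, 12:16] = 1  # toy stroke standing in for an Omniglot character
noisy = salt_and_pepper(clean, flip_prob=0.15, rng=rng)
```
]]></preformat>
<p>Because every pixel is either kept or inverted, the corrupted image remains binary, and the resulting noise is impulsive and not Gaussian.</p>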
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Images with non-Gaussian salt-and-pepper noise in the Omniglot data set using signal-noise rate of 1, 0.9, 0.7, and 0.5.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g005.tif"/>
</fig>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>Evolution of the ensemble loss value over training iterations.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g006.tif"/>
</fig>
<p>The values 0 and 1, which are included in the input signal, encode phases #1 and #2, respectively. Images from the Omniglot data set are presented to the VNN as arrays of 28 &#x00D7; 28 grayscale pixels. A single output is used in phase #2 to determine whether the presented image belongs to the same class as that in phase #1. The HESFOL model employs spike-based learning, and PSNN receives both the spiking activities from HSNN and the input information with the phase ID. The learning signals are transmitted from PSNN to HSNN only in the first phase. <xref ref-type="fig" rid="F7">Figure 7</xref> shows the spiking activities of the proposed HESFOL model during the few-shot learning task on the Omniglot data set. It shows that the HSNN and PSNN subsystems exhibit sparse spiking activities during the few-shot learning task. The ensemble loss, which contains MMCC, the cross-entropy loss function, and two types of MMSE, can successfully solve the few-shot learning problem on images with non-Gaussian noise.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p>One sample trial for the few-shot learning on the Omniglot data set using the HESFOL model. <bold>(A)</bold> Output of the readout neuron. <bold>(B)</bold> Spiking activities of neurons in the HSNN module. <bold>(C)</bold> Spiking activities of neurons in the PSNN module. <bold>(D)</bold> Learning signals of PSNN for HSNN neurons.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g007.tif"/>
</fig>
</sec>
<sec id="S4.SS4">
<title>Few-Shot Learning Performance on Manipulator Control</title>
<p>We further demonstrate the few-shot learning capability for manipulator control. The manipulator uses the end-effector of a two-joint arm in a generic motor control task to trace a target trajectory in Euclidean coordinates (<italic>x</italic>, <italic>y</italic>), as shown in <xref ref-type="fig" rid="F8">Figure 8</xref>. In the motor control task, the proposed HESFOL model learns to reproduce a particular randomly generated target movement with the actual movement of the arm end-effector. The learning task is divided into two trials: a training trial and a testing trial. In the training trial, PSNN receives the target movement in Euclidean coordinates and outputs the learning signals for the HSNN module. After the training trial, the weight update is applied to HSNN. In the testing trial, HSNN is tested on reproducing the previously given target movement of the arm end-effector without receiving the target trajectory. The input of HSNN is the same across all trials and is given by a clock-like input signal. The output of HSNN is the motor commands for the angular velocities of the joints <inline-formula><mml:math id="INEQ24"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="normal">&#x03A6;</mml:mi><mml:mo>.</mml:mo></mml:mover><mml:mi>t</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="normal">&#x03D5;</mml:mi><mml:mo>.</mml:mo></mml:mover><mml:mn>1</mml:mn><mml:mi>t</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="normal">&#x03D5;</mml:mi><mml:mo>.</mml:mo></mml:mover><mml:mn>2</mml:mn><mml:mi>t</mml:mi></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>. <xref ref-type="fig" rid="F9">Figure 9</xref> shows the trajectory generated by HSNN as solid lines during both the training and testing trials. HSNN can regenerate the target movement based on biologically realistic sparse spiking activities after PSNN sends learning signals to HSNN during the training trial. <xref ref-type="fig" rid="F9">Figure 9</xref> also shows the learning signals and the spiking activities of the proposed HESFOL model. The mean square error between the target and actual movement in the testing trial is shown in <xref ref-type="fig" rid="F10">Figure 10</xref>. The result reveals that the HESFOL model with the ensemble loss performs better than the model with only one or a smaller number of loss function types. This suggests that the proposed HESFOL model offers a new perspective on efficient motor control and learning and on the underlying neural mechanisms of the human brain.</p>
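<p>For illustration, the mapping from joint angular velocities to the end-effector position, and the mean square error used to score a trial, can be sketched as follows. This is a minimal sketch and not the simulation code of this study: it assumes a planar two-joint arm in which the second joint angle is measured relative to the first link, and the link lengths, initial angles, time step, and toy motor command are hypothetical values.</p>
<preformat><![CDATA[
```python
import numpy as np


def end_effector_path(phi_dot, phi0=(0.0, np.pi / 2), l1=0.5, l2=0.5, dt=1e-3):
    """Integrate joint angular velocities (T x 2) and map joint angles to (x, y)."""
    # Euler integration of the angular velocities gives the joint angles over time.
    phi = np.asarray(phi0) + np.cumsum(np.asarray(phi_dot) * dt, axis=0)
    # Forward kinematics of a planar two-link arm.
    x = l1 * np.cos(phi[:, 0]) + l2 * np.cos(phi[:, 0] + phi[:, 1])
    y = l1 * np.sin(phi[:, 0]) + l2 * np.sin(phi[:, 0] + phi[:, 1])
    return np.stack([x, y], axis=1)


def mean_square_error(actual, target):
    """Mean squared Euclidean distance between two (T x 2) trajectories."""
    return float(np.mean(np.sum((actual - target) ** 2, axis=1)))


t = np.arange(500) * 1e-3  # one 500-ms trial sampled at 1 ms
target_cmd = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
target_xy = end_effector_path(target_cmd)
zero_xy = end_effector_path(np.zeros_like(target_cmd))  # arm that never moves
error_before = mean_square_error(zero_xy, target_xy)
```
]]></preformat>
<p>Here a motionless arm stands in for an untrained controller; a controller that reproduces the target commands exactly would yield an error of zero.</p>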
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption><p>Few-shot motor control of the end-effector of a two-joint robotic arm.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g008.tif"/>
</fig>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption><p>Few-shot motor control performance of the proposed HESFOL model. It shows the one-shot learning of a new end-effector movement in 500 ms. It reveals control performance and spiking activities before and after training. <bold>(A1)</bold> Position in the <italic>x</italic>-direction based on HESFOL control before training and the target position in the <italic>x</italic>-direction. <bold>(A2)</bold> Position in the <italic>x</italic>-direction based on HESFOL control after training and the target position in the <italic>x</italic>-direction. <bold>(B1)</bold> Position in the <italic>y</italic>-direction based on HESFOL control before training and the target position in the <italic>y</italic>-direction. <bold>(B2)</bold> Position in the <italic>y</italic>-direction based on HESFOL control after training and the target position in the <italic>y</italic>-direction. <bold>(C1)</bold> Motor command in the form of joint angular velocity and target angular velocity in the <italic>x</italic>-direction before training. <bold>(C2)</bold> Motor command in the form of joint angular velocity and target angular velocity in the <italic>x</italic>-direction after training. <bold>(D1)</bold> Motor command in the form of joint angular velocity and target angular velocity in the <italic>y</italic>-direction before training. <bold>(D2)</bold> Motor command in the form of joint angular velocity and target angular velocity in the <italic>y</italic>-direction after training. <bold>(E1)</bold> Spiking activities of HSNN before training. <bold>(E2)</bold> Spiking activities of HSNN after training. <bold>(F1)</bold> Spiking activities of PSNN. <bold>(F2)</bold> Learning signals generated by PSNN for HSNN.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g009.tif"/>
</fig>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption><p>Control performance based on the mean square error of original and HESFOL models.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g010.tif"/>
</fig>
</sec>
<sec id="S4.SS5">
<title>Effects of the Ensemble Parameters on Learning Performance</title>
<p>In this study, we further explore how each of the base loss functions contributes to the ensemble loss function of the proposed HESFOL model, as listed in <xref ref-type="table" rid="T2">Table 2</xref>. We test the effects of the ensemble parameters on the few-shot learning performance on different types of data sets, including spike patterns and the Omniglot data set. Overall, the cross-entropy loss has the largest weight for both data sets, which means that the cross-entropy loss contributes the most to the ensemble loss function of the proposed HESFOL model.</p>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Test accuracies (%) of different ensemble parameter settings in the Omniglot data set.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Groups</td>
<td valign="top" align="left">Loss</td>
<td valign="top" align="center">Values</td>
<td valign="top" align="center">Omniglot accuracy</td>
<td valign="top" align="center">Groups</td>
<td valign="top" align="left">Loss</td>
<td valign="top" align="left">Values</td>
<td valign="top" align="center">Omniglot accuracy</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Group 1</td>
<td valign="top" align="left">MMCC</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">90.6%</td>
<td valign="top" align="center">Group 5</td>
<td valign="top" align="left">MMCC</td>
<td valign="top" align="left">0.1</td>
<td valign="top" align="center">90.6%</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Cross</td>
<td valign="top" align="center">0.9</td>
<td/>
<td/>
<td valign="top" align="left">Cross</td>
<td valign="top" align="left">0.9</td>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">Rate</td>
<td valign="top" align="center">0.5</td>
<td/>
<td/>
<td valign="top" align="left">Rate</td>
<td valign="top" align="left">0.5</td>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">Vol</td>
<td valign="top" align="center">0.5</td>
<td/>
<td/>
<td valign="top" align="left">Vol</td>
<td valign="top" align="left">0.5</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Group 2</td>
<td valign="top" align="left">MMCC</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">90.6%</td>
<td valign="top" align="center"><bold>Group 6</bold></td>
<td valign="top" align="left">MMCC</td>
<td valign="top" align="left"><bold>0.2</bold></td>
<td valign="top" align="center"><bold>93.1%</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Cross</td>
<td valign="top" align="center">1.3</td>
<td/>
<td/>
<td valign="top" align="left"><bold>Cross</bold></td>
<td valign="top" align="left"><bold>0.8</bold></td>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">Rate</td>
<td valign="top" align="center">0.3</td>
<td/>
<td/>
<td valign="top" align="left"><bold>Rate</bold></td>
<td valign="top" align="left"><bold>0.5</bold></td>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">Vol</td>
<td valign="top" align="center">0.3</td>
<td/>
<td/>
<td valign="top" align="left"><bold>Vol</bold></td>
<td valign="top" align="left"><bold>0.5</bold></td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Group 3</td>
<td valign="top" align="left">MMCC</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">92.2%</td>
<td valign="top" align="center">Group 7</td>
<td valign="top" align="left">MMCC</td>
<td valign="top" align="left">0.2</td>
<td valign="top" align="center">90.6%</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Cross</td>
<td valign="top" align="center">1.0</td>
<td/>
<td/>
<td valign="top" align="left">Cross</td>
<td valign="top" align="left">1.3</td>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">Rate</td>
<td valign="top" align="center">0.45</td>
<td/>
<td/>
<td valign="top" align="left">Rate</td>
<td valign="top" align="left">0.25</td>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">Vol</td>
<td valign="top" align="center">0.45</td>
<td/>
<td/>
<td valign="top" align="left">Vol</td>
<td valign="top" align="left">0.25</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Group 4</td>
<td valign="top" align="left">MMCC</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">91.4%</td>
<td valign="top" align="center">Group 8</td>
<td valign="top" align="left">MMCC</td>
<td valign="top" align="left">0.2</td>
<td valign="top" align="center">89.8%</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Cross</td>
<td valign="top" align="center">0.7</td>
<td/>
<td/>
<td valign="top" align="left">Cross</td>
<td valign="top" align="left">0.6</td>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">Rate</td>
<td valign="top" align="center">0.6</td>
<td/>
<td/>
<td valign="top" align="left">Rate</td>
<td valign="top" align="left">0.6</td>
<td/>
</tr>
<tr>
<td/>
<td valign="top" align="left">Vol</td>
<td valign="top" align="center">0.6</td>
<td/>
<td/>
<td valign="top" align="left">Vol</td>
<td valign="top" align="left">0.6</td>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p><italic>The bolded values are the optimal configuration.</italic></p></fn>
</table-wrap-foot>
</table-wrap>
<p>For the correntropy loss function, a weight of 0.1 tends to be suitable in a very noisy environment, especially in the presence of outliers. The proposed SNN architecture realizes the few-shot learning tasks by backpropagating the gradient of the loss, and it is therefore likely to suffer from the vanishing gradient problem. Thus, a loss function that emphasizes the error can outperform the MMCC loss function. For this reason, the weight of the cross-entropy loss function is larger than the others in the ensemble loss function of the proposed HESFOL model.</p>
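<p>As an illustration of how the ensemble parameters act, the ensemble loss can be sketched as a weighted sum of the base loss values. This is a minimal sketch that assumes a simple weighted-sum combination; the weights are those of Group 6 in <xref ref-type="table" rid="T2">Table 2</xref>, whereas the function name, the dictionary keys, and the example base loss values are hypothetical.</p>
<preformat><![CDATA[
```python
# Ensemble weights of Group 6 in Table 2 (MMCC, cross-entropy, Rate, Vol).
GROUP_6 = {"mmcc": 0.2, "cross": 0.8, "rate": 0.5, "vol": 0.5}


def ensemble_loss(base_losses, weights):
    """Combine base loss values as a weighted sum (assumed form of the ensemble)."""
    missing = set(weights) - set(base_losses)
    if missing:
        raise KeyError(f"missing base losses: {sorted(missing)}")
    return sum(weights[name] * base_losses[name] for name in weights)


# Hypothetical base loss values for one training batch.
example_losses = {"mmcc": 0.5, "cross": 1.0, "rate": 0.2, "vol": 0.1}
total = ensemble_loss(example_losses, GROUP_6)
```
]]></preformat>
<p>With these example values, the cross-entropy term contributes 0.8 of the total of 1.05, which illustrates how a larger weight lets one base loss dominate the ensemble.</p>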
</sec>
<sec id="S4.SS6">
<title>Comparison With the Other Models on Few-Shot Learning Performance</title>
<p>To evaluate the few-shot learning performance more directly, we compare the HESFOL model with other models, including ANNs and SNNs. <xref ref-type="bibr" rid="B13">Jiang et al. (2021)</xref> proposed a novel SNN model with a long short-term memory (LSTM) unit for few-shot learning, called the multi-timescale optimization (MTSO) model. As the proposed HESFOL model does not use augmentation to achieve its best accuracy, a fair comparison is conducted with the other models without augmentation and fine tuning. The MTSO model without augmentation achieves 95.8% accuracy. In terms of ANN models, the MANN presented by <xref ref-type="bibr" rid="B29">Santoro et al. (2016)</xref> achieved 82.8% accuracy on the Omniglot data set. The learning accuracy of the CNN presented by <xref ref-type="bibr" rid="B13">Jiang et al. (2021)</xref> reached only 92.1%, while the spiking CNN with L1 regularization for sparsity obtained 92.8% learning accuracy on Omniglot. The Siamese Net achieves 96.7% accuracy with augmentation (<xref ref-type="bibr" rid="B18">Koch et al., 2015</xref>). The proposed HESFOL model achieved 93.1% accuracy on the Omniglot data set with non-Gaussian noise, which is a comparable performance on the few-shot learning task. Although its learning accuracy is slightly lower than that of the Siamese Net, the HESFOL model uses a spike-based paradigm, which gives it the advantages of low power consumption and high biological plausibility. In addition, the accuracy of the HESFOL model is 2.7 percentage points lower than that of the MTSO model, but the HESFOL model is evaluated on data with non-Gaussian noise, whereas the MTSO model is evaluated on the clean data set. This demonstrates that the proposed HESFOL model can achieve highly robust few-shot learning without losing much accuracy. As the proposed HESFOL model uses a simple spike-based few-shot learning framework, evaluation on more complicated data sets is not the aim of this study and is left for future work. It should be noted that the main aim is to present a robust spike-based few-shot learning framework based on the ITL theory.</p>
</sec>
<sec id="S4.SS7">
<title>Effects of the Critical Parameters of the Heterogeneous Ensemble-Based Spike-Driven Few-Shot Online Learning Model on Learning Performance</title>
<p>In addition, we further explore the effects of the critical parameters of the proposed HESFOL model on the few-shot learning performance. Three critical parameters are selected: the timing constant of the membrane &#x03C4;<sub><italic>m</italic></sub>, the timing constant of the readout neurons &#x03C4;<sub><italic>out</italic></sub>, and the membrane potential threshold <italic>v</italic><sub><italic>th</italic></sub>. We select the Omniglot data set to test the learning performance of the HESFOL model. <xref ref-type="fig" rid="F11">Figure 11</xref> shows how the learning accuracy varies with these parameters. <xref ref-type="fig" rid="F11">Figure 11A</xref> reveals that the highest learning accuracy is obtained when &#x03C4;<sub><italic>m</italic></sub> = 15 and &#x03C4;<sub><italic>out</italic></sub> = 10. In addition, <xref ref-type="fig" rid="F11">Figure 11B</xref> shows that &#x03C4;<sub><italic>out</italic></sub> = 10 and <italic>v<sub><italic>th</italic></sub></italic> = 1.0 result in the highest learning accuracy. These results suggest the preferred parameter values for the neural dynamics when classification tasks are used to test the few-shot learning performance. As the proposed HESFOL model realizes the few-shot learning capability based on the meta-learning scheme, this also implies that the SNN model with this set of parameter values has the highest LSTM performance for storing <italic>a priori</italic> experience for the current learning task.</p>
<fig id="F11" position="float">
<label>FIGURE 11</label>
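<caption><p><bold>(A)</bold> The effects of the timing constant of the membrane &#x03C4;<sub><italic>m</italic></sub> and the timing constant of the readout neurons &#x03C4;<sub><italic>out</italic></sub> on learning accuracy. <bold>(B)</bold> The effects of the membrane potential threshold <italic>v</italic><sub><italic>th</italic></sub> and the timing constant of the readout neurons &#x03C4;<sub><italic>out</italic></sub> on learning accuracy.</p></caption>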
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g011.tif"/>
</fig>
<fig id="F12" position="float">
<label>FIGURE 12</label>
<caption><p>The loss function curve of MMCC along with the errors.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-850932-g012.tif"/>
</fig>
</sec>
</sec>
<sec id="S5" sec-type="discussion">
<title>Discussion</title>
<sec id="S5.SS1">
<title>Theoretical Analysis</title>
<p>A major component of a learning model is the loss function, which determines the influence of each sample on the model training. The loss function assigns each sample a value that reflects the participation level of that sample in the learning problem. For example, if the loss function assigns an outlier sample a large value, this outlier may have a negative impact on the model parameters. The 0&#x2013;1 loss function, by contrast, penalizes every incorrectly classified sample with the same value of 1, which can be regarded as a form of robustness. A robust learning machine requires that outliers do not influence the system performance too much. The ultimate goal of a learning approach is the capability to classify unseen data. Therefore, the classifier should be robust to data disturbance. A more difficult situation arises in a noisy environment, where outliers corrupt the training or testing data. An efficient approach to dealing with a noisy environment is to use a robust loss function. If there exists a constant <italic>k</italic> such that samples with <italic>e<sub><italic>i</italic></sub></italic> &#x003E; <italic>k</italic> are not assigned a large value by the loss function, where <italic>e</italic><sub><italic>i</italic></sub> represents the error of the <italic>i</italic>th sample, this loss function can be regarded as robust. Although some classifiers can classify the training data with high performance, they cannot predict unknown data well. Therefore, although the training error is low, the generalization error is high. This failure is due to the overfitting problem, which means that the classifier fits the training data and loses the generalization capability. A better solution for generalization is to use a loss function that yields a more general classifier.</p>
<p>If an error value is expected to be minimized, the loss function will generate a more generalized classifier with an enhanced margin. If a classifier has an enhanced margin, it generalizes better and its performance on unseen data improves. An enhanced classifier can be realized when the correct samples close to the classification line are penalized, in which case the loss function can be regarded as margin enhancing. As each loss function has its own advantages and disadvantages, there is no comprehensive loss function that works well in all situations. Therefore, this research proposes the use of an ensemble of losses in the SNN model. As correntropy is a bounded function, it is less sensitive to outliers. The kernel size limits the influence of each independent sample on the total result, which can reduce the effects of non-Gaussian noise in the environment on learning performance. <xref ref-type="fig" rid="F12">Figure 12</xref> further presents the loss function of MMCC. It shows that MMCC is a measure of the local similarity of samples and presents a unique mixed-norm feature, which is summarized as follows:</p>
<list list-type="simple">
<list-item>
<label>1.</label>
<p>MMCC shows the characteristics of the &#x2112;2 norm when the error is close to 0;</p>
</list-item>
<list-item>
<label>2.</label>
<p>The MMCC loss function shows the characteristics of the &#x2112;1 norm when the error increases from 0;</p>
</list-item>
<list-item>
<label>3.</label>
<p>The MMCC loss function demonstrates the characteristics of the &#x2112;0 norm when the error is particularly large.</p>
</list-item>
</list>
<p>Therefore, MMCC is sensitive to elements of the sample with high local similarity, but not to pairs of elements that differ greatly. Due to these characteristics, MMCC can effectively reduce the impact of non-Gaussian noise on learning tasks, leading to more robust spike-driven few-shot learning performance.</p>
<p>In addition, spiked dendrites in the HESFOL model also enhance the robustness of few-shot learning, as has been shown in a previous study (<xref ref-type="bibr" rid="B41">Yang et al., 2021a</xref>). This is because the non-linear computation of spiked dendrites can suppress the disturbance caused by noise in the input and in the transmission pathway, thus improving the learning performance. Moreover, as spiked dendrites can solve the credit assignment problem and distinguish the information flow in feedforward and recurrent pathways, the learning performance, including robustness, can be further enhanced.</p>
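<p>The bounded, mixed-norm behavior summarized above can be checked numerically. The following is a minimal sketch and not the loss implementation of this study: it assumes a loss of the form one minus a mixture of two Gaussian kernels of the error, and the function name, the kernel sizes <monospace>sigma1</monospace> and <monospace>sigma2</monospace>, and the mixing coefficient <monospace>alpha</monospace> are hypothetical choices.</p>
<preformat><![CDATA[
```python
import numpy as np


def mixture_correntropy_loss(error, sigma1=0.5, sigma2=2.0, alpha=0.5):
    """Loss 1 - [alpha * G_sigma1(e) + (1 - alpha) * G_sigma2(e)] with Gaussian kernels G."""
    error = np.asarray(error, dtype=float)
    k1 = np.exp(-error**2 / (2.0 * sigma1**2))
    k2 = np.exp(-error**2 / (2.0 * sigma2**2))
    return 1.0 - (alpha * k1 + (1.0 - alpha) * k2)


errors = np.array([0.0, 0.01, 1.0, 10.0, 100.0])
losses = mixture_correntropy_loss(errors)
squared = errors**2  # unbounded squared-error loss, for comparison

# Second-order approximation that holds for small errors.
curvature = 0.5 / (2.0 * 0.5**2) + 0.5 / (2.0 * 2.0**2)
small_error_approx = curvature * 0.01**2
```
]]></preformat>
<p>For small errors the loss grows quadratically, as the squared error does, whereas for large errors it saturates at 1, so a single outlier cannot dominate the total loss.</p>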
</sec>
<sec id="S5.SS2">
<title>Power Efficiency Based on the Heterogeneous Ensemble-Based Spike-Driven Few-Shot Online Learning Model</title>
<p>Previous research has revealed that the lowest energy consumption of a synaptic operation is about 20 pJ in state-of-the-art neuromorphic systems (<xref ref-type="bibr" rid="B22">Merolla et al., 2014</xref>; <xref ref-type="bibr" rid="B26">Qiao et al., 2015</xref>). The proposed HESFOL model requires around 60 spikes in HSNN and around 70 spikes in PSNN on the classification task using the Omniglot data set. Therefore, a single classification using the proposed HESFOL model costs about 2.6 nJ (130 spikes at 20 pJ each) in such a neuromorphic system, which outperforms current work based on digital neuromorphic hardware (&#x2248;2 &#x03BC;J) (<xref ref-type="bibr" rid="B7">Esser et al., 2016</xref>) and is potentially 50,000 times more power efficient than current graphics processing unit (GPU) platforms (<xref ref-type="bibr" rid="B27">Rodrigues et al., 2018</xref>). Our previous work has shown that the classification task using an improved DEP-based SNN model requires about 1,011 SynOps to obtain the highest classification accuracy (<xref ref-type="bibr" rid="B41">Yang et al., 2021a</xref>). Therefore, the proposed HESFOL model reduces the total number of induced spikes, and hence the power consumption, by 87.14% in comparison with the state-of-the-art SNN model. The low power consumption of the proposed HESFOL model can be attributed to three factors. Firstly, the ensemble entropy theory is used, which accelerates learning toward the maximum learning accuracy and thus reduces the power consumed during learning. Secondly, a few-shot learning procedure is used in the classification task, which shortens the overall learning process and potentially reduces power consumption. Thirdly, spiked dendrites are used in the spike-driven learning task, which can further cut down the required spikes due to their non-linear information processing capability. Therefore, the proposed HESFOL model can not only improve the learning accuracy and robustness of SNN models, but also further reduce the power consumption of neuromorphic hardware.</p>
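<p>The arithmetic behind these estimates can be reproduced with a few lines. This is a minimal sketch that assumes each spike corresponds to one synaptic operation at 20 pJ; the variable names are hypothetical, and the spike and SynOps counts are the approximate figures quoted above.</p>
<preformat><![CDATA[
```python
ENERGY_PER_SYNOP_PJ = 20.0  # energy of one synaptic operation, in pJ
SPIKES_HSNN = 60            # approximate spikes in HSNN per classification
SPIKES_PSNN = 70            # approximate spikes in PSNN per classification
BASELINE_SYNOPS = 1011      # SynOps of the DEP-based SNN used for comparison

total_spikes = SPIKES_HSNN + SPIKES_PSNN
energy_pj = total_spikes * ENERGY_PER_SYNOP_PJ
energy_nj = energy_pj / 1000.0
reduction_pct = 100.0 * (1.0 - total_spikes / BASELINE_SYNOPS)

print(f"{total_spikes} spikes, {energy_nj:.1f} nJ, {reduction_pct:.2f}% fewer operations")
```
]]></preformat>
<p>Under this assumption, 130 spikes correspond to 2,600 pJ (2.6 nJ) per classification and to a reduction of 87.14% relative to 1,011 SynOps.</p>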
</sec>
<sec id="S5.SS3">
<title>Comparison With Spiking Neural Networks of Liquid State Machines and Future Work</title>
<p>Previously, <xref ref-type="bibr" rid="B28">Roy et al. (2019)</xref> presented a good overview of recent SNN training techniques in the context of reservoirs or liquid state machines (LSMs), whose architectures are similar to the proposed HESFOL framework. LSMs use unstructured, randomly connected recurrent networks paired with a simple linear readout. As shown in <xref ref-type="table" rid="T3">Table 3</xref>, such frameworks with spiking dynamics have shown a surprising degree of success for a variety of sequential recognition tasks (<xref ref-type="bibr" rid="B23">Panda and Roy, 2017</xref>; <xref ref-type="bibr" rid="B31">Soures and Kudithipudi, 2019</xref>; <xref ref-type="bibr" rid="B39">Wijesinghe et al., 2019</xref>). <xref ref-type="bibr" rid="B31">Soures and Kudithipudi (2019)</xref> presented a deep LSM with an STDP learning rule for video activity recognition. <xref ref-type="bibr" rid="B39">Wijesinghe et al. (2019)</xref> presented an ensemble approach for LSMs to enhance class discrimination, leading to better accuracy in speech and image recognition tasks compared to a single large liquid. Wang et al. (2020) proposed a novel LSM model for sitting posture recognition. <xref ref-type="bibr" rid="B20">Luo S. et al. (2018)</xref> presented two different methods to improve LSMs for real-time pattern classification from the perspectives of spatial integration and temporal integration. <xref ref-type="bibr" rid="B1">Al Zoubi et al. (2018)</xref> introduced the LSM as a model for automatic feature extraction and prediction from raw electroencephalography (EEG) with a potential extension to a wider range of applications. Although these works presented different strategies for sequential recognition tasks, none of them has successfully solved the few-shot learning problem. This study is the first to propose a unified framework for the simultaneous realization of robust image classification and few-shot learning performance, which is superior to representative LSM models based on the recurrent architecture.</p>
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Comparison with the representative liquid state machine (LSM) models with the recurrent architecture.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Research</td>
<td valign="top" align="center">Application</td>
<td valign="top" align="center">Robustness</td>
<td valign="top" align="center">Few-shot learning</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B31">Soures and Kudithipudi, 2019</xref></td>
<td valign="top" align="left">Video activity recognition</td>
<td valign="top" align="center">No</td>
<td valign="top" align="center">No</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B39">Wijesinghe et al., 2019</xref></td>
<td valign="top" align="left">Image/speech recognition</td>
<td valign="top" align="center">No</td>
<td valign="top" align="center">No</td>
</tr>
<tr>
<td valign="top" align="left">Wang et al., 2020</td>
<td valign="top" align="left">Sitting posture recognition</td>
<td valign="top" align="center">No</td>
<td valign="top" align="center">No</td>
</tr>
<tr>
<td valign="top" align="left">Luo et al., 2018</td>
<td valign="top" align="left">Pattern classification</td>
<td valign="top" align="center">No</td>
<td valign="top" align="center">No</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B1">Al Zoubi et al., 2018</xref></td>
<td valign="top" align="left">Emotion recognition</td>
<td valign="top" align="center">No</td>
<td valign="top" align="center">No</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B23">Panda and Roy, 2017</xref></td>
<td valign="top" align="left">Visual recognition</td>
<td valign="top" align="center">Yes</td>
<td valign="top" align="center">No</td>
</tr>
<tr>
<td valign="top" align="left">HESFOL</td>
<td valign="top" align="left">Image classification</td>
<td valign="top" align="center">Yes</td>
<td valign="top" align="center">Yes</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For deep SNN training, ANN&#x2013;SNN conversion requires less GPU computing than supervised training with surrogate gradients, and among existing methodologies it has yielded the best performance on large-scale networks and data sets. For example, <xref ref-type="bibr" rid="B5">Ding et al. (2021)</xref> proposed a rate-norm layer to replace the ReLU activation function in source ANN training, enabling direct conversion from a trained ANN to an SNN. <xref ref-type="bibr" rid="B45">Zheng H. et al. (2020)</xref> proposed a threshold-dependent batch normalization (tdBN) method based on the emerging spatiotemporal BP, enabling direct training of a very deep SNN and efficient implementation of its inference on neuromorphic hardware. These works have realized pattern recognition on more complicated data sets than those used in this research and have achieved high performance on tasks such as classification on the dynamic vision sensor (DVS)-CIFAR10 data set. However, none of these studies has addressed the few-shot learning problem, and none of them focuses on learning robustness. In contrast, the proposed HESFOL model presents a robust few-shot learning framework based on the ITL approach, which is meaningful for combining machine learning approaches with brain-inspired SNN paradigms. Future work will apply the ANN&#x2013;SNN conversion technique to few-shot learning algorithms based on ANN models and combine it with the ITL method, which plays a major part in the robust few-shot learning performance of the HESFOL model.</p>
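<p>The rate-coding idea that underlies ANN&#x2013;SNN conversion can be illustrated with a minimal Python sketch. It is neither the rate-norm layer of Ding et al. (2021) nor any method used in this study. It assumes a non-leaky integrate-and-fire neuron with reset by subtraction that is driven by a constant input, and the names <italic>if_firing_rate</italic>, <italic>steps</italic>, and <italic>threshold</italic> are hypothetical. Under these assumptions, the firing rate over a finite window approximates a ReLU output clipped at the threshold.</p>
<preformat>
```python
def relu(x):
    """ReLU activation of the source ANN unit."""
    return max(0.0, x)


def if_firing_rate(x, steps=200, threshold=1.0):
    """Firing rate of a non-leaky integrate-and-fire neuron under constant input x.

    Assumed model (illustrative only): the membrane potential accumulates x at
    every time step, and a spike subtracts the threshold (soft reset).
    """
    v = 0.0
    spikes = 0
    for _ in range(steps):
        v += x
        if v >= threshold:
            v -= threshold  # reset by subtraction keeps the residual charge
            spikes += 1
    return spikes / steps


inputs = [-0.5, 0.0, 0.3, 0.75]
rates = [if_firing_rate(x) for x in inputs]
for x, r in zip(inputs, rates):
    print(f"input={x:+.2f}  relu={relu(x):.3f}  firing rate={r:.3f}")
```
</preformat>
<p>In this sketch, inputs above the threshold saturate at one spike per time step, which is one reason conversion schemes typically normalize activations or thresholds before mapping an ANN to an SNN.</p>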
<p>One critical issue is to develop efficient training algorithms that allow SNN models to handle complicated data sets in more realistic applications. Shallow SNNs can be trained with surrogate gradient descent, but they achieve high performance only on simple data sets, such as MNIST. In fact, the discrepancy between the forward spike activation function and the backward surrogate gradient function during training limits the learning capability of deep SNNs. Nevertheless, a series of studies have shown that SNNs can be trained from scratch using the surrogate gradient descent approach. For example, <xref ref-type="bibr" rid="B16">Kim and Panda (2020)</xref> proposed a technique called batch normalization through time (BNTT) for training SNNs, which dynamically changes the parameters and implicitly acts as a dynamic threshold. They also proposed a spike activation lift training approach, which is essentially a threshold fine-tuning or initialization step before the actual training (<xref ref-type="bibr" rid="B14">Kim et al., 2021a</xref>,<xref ref-type="bibr" rid="B15">b</xref>). These two methods can train deep SNN models and have been tested on complicated data sets, such as DVS, CIFAR100, and Tiny ImageNet. They demonstrate high performance on deep SNN models, which can be scaled to more realistic applications. Therefore, in the next step, the proposed HESFOL model will be combined with the BNTT algorithm for deep network training. For example, the proposed ITL approach will be added to the current BNTT framework to explore learning robustness or efficiency, and the HESFOL model can be used to model a single layer in a deep SNN architecture. Thanks to its spiking dendrites, the HESFOL model can naturally solve the credit assignment problem between feedforward and feedback pathways, which is meaningful for more complicated tasks and practical situations.</p>
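<p>The mismatch between the forward spike function and the backward surrogate gradient mentioned above can be made concrete with a minimal Python sketch. The fast-sigmoid surrogate and the names <italic>spike</italic>, <italic>surrogate_grad</italic>, and <italic>slope</italic> are assumptions chosen for illustration; they are not taken from BNTT or from the HESFOL implementation.</p>
<preformat>
```python
def spike(v, threshold=1.0):
    """Forward pass: Heaviside step, emits 1.0 once the potential reaches threshold."""
    return 1.0 if v >= threshold else 0.0


def surrogate_grad(v, threshold=1.0, slope=5.0):
    """Backward pass (assumed form): derivative of a fast sigmoid, used in
    place of the true derivative of the step function."""
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2


def finite_difference(f, v, eps=1e-4):
    """Central-difference estimate of df/dv at v."""
    return (f(v + eps) - f(v - eps)) / (2.0 * eps)


potentials = [0.5, 0.9, 1.1, 1.5]
mismatch = [(v, finite_difference(spike, v), surrogate_grad(v)) for v in potentials]
for v, true_grad, sg in mismatch:
    print(f"v={v:.1f}  step derivative={true_grad:.3f}  surrogate={sg:.3f}")
```
</preformat>
<p>Away from the threshold, the step function has a zero derivative, whereas the surrogate stays positive. The backward pass therefore propagates a gradient that the forward pass does not strictly have, and this approximation error is what the paragraph above refers to as the discrepancy.</p>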
<p>Another direction for future work is to apply the proposed HESFOL model to tasks beyond recognition. Previous research has shown a range of possibilities for SNNs to target complicated tasks other than visual recognition. For example, <xref ref-type="bibr" rid="B17">Kim and Panda (2021)</xref> presented a visual explanation technique to analyze and explain the internal spiking behavior of deep temporal SNNs, with the aim of making SNNs ubiquitous. <xref ref-type="bibr" rid="B14">Kim et al. (2021a)</xref> explored applications of SNNs beyond classification and presented semantic segmentation networks configured with spiking neurons. <xref ref-type="bibr" rid="B35">Venkatesha et al. (2021)</xref> designed a federated learning method to train decentralized and privacy-preserving SNNs. In addition, <xref ref-type="bibr" rid="B15">Kim et al. (2021b)</xref> proposed PrivateSNN, which aims to build low-power SNNs from a pre-trained ANN model without leaking sensitive information contained in a data set. All these studies motivate extending the HESFOL model to other fields, such as federated learning and privacy preservation.</p>
</sec>
</sec>
<sec id="S6" sec-type="conclusion">
<title>Conclusion</title>
<p>In this work, we first introduced an entropy-based scheme for SNNs to realize robust few-shot learning performance. We developed a novel spike-based framework based on entropy theory, namely, the HESFOL model, to implement a gradient-based few-shot learning scheme in a recurrent SNN architecture. Several types of tasks were employed to test the few-shot learning performance, including the accuracy and robustness of learning. Experimental results on spiking patterns, the Omniglot data set, and a motor control task reveal that the proposed HESFOL model can improve the accuracy and robustness of spike-driven few-shot learning. The proposed framework offers a novel insight into improving spike-based machine learning performance based on entropy theory, which is meaningful for the fast development of brain-inspired intelligence and neuromorphic computing. It can be applied to unmanned systems, neuro-robotic control, and edge computing in the Internet-of-Things (IoT).</p>
</sec>
<sec id="S7" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.</p>
</sec>
<sec id="S8">
<title>Author Contributions</title>
<p>SY developed and tested the algorithms and wrote the manuscript with contributions from BL-B and BC. BL-B and BC conceptualized the problem and the technical framework. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="pudiscl1" sec-type="disclaimer">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec id="S9" sec-type="funding-information">
<title>Funding</title>
<p>This study was funded partly by the National Natural Science Foundation of China (Grant Nos. 62006170, 62088102, and U21A20485) and partly by the China Postdoctoral Science Foundation (Grant Nos. 2020M680885 and 2021T140510).</p>
</sec>
<ack>
<p>All authors would like to thank the editor and reviewer for their comments on this manuscript.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Al Zoubi</surname> <given-names>O.</given-names></name> <name><surname>Awad</surname> <given-names>M.</given-names></name> <name><surname>Kasabov</surname> <given-names>N. K.</given-names></name></person-group> (<year>2018</year>). <article-title>Anytime multipurpose emotion recognition from EEG data using a Liquid State Machine based framework.</article-title> <source><italic>Artif. Intell. Med.</italic></source> <volume>86</volume> <fpage>1</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1016/j.artmed.2018.01.001</pub-id> <pub-id pub-id-type="pmid">29366532</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>B.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Lu</surname> <given-names>N.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Cao</surname> <given-names>J.</given-names></name> <name><surname>Qin</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Mixture correntropy for robust learning.</article-title> <source><italic>Pattern Recognit.</italic></source> <volume>79</volume> <fpage>318</fpage>&#x2013;<lpage>327</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2018.02.010</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>B. D.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Principe</surname> <given-names>J. C.</given-names></name></person-group> (<year>2019a</year>). <article-title>Maximum correntropy criterion with variable center.</article-title> <source><italic>IEEE Signal Process. Lett.</italic></source> <volume>26</volume> <fpage>1212</fpage>&#x2013;<lpage>1216</lpage>. <pub-id pub-id-type="doi">10.1109/lsp.2019.2925692</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>B. D.</given-names></name> <name><surname>Xing</surname> <given-names>L.</given-names></name> <name><surname>Zhao</surname> <given-names>H.</given-names></name> <name><surname>Du</surname> <given-names>S.</given-names></name> <name><surname>Principe</surname> <given-names>J. C.</given-names></name></person-group> (<year>2019b</year>). <article-title>Effects of outliers on the maximum correntropy estimation: a robustness analysis.</article-title> <source><italic>IEEE Trans. Syst. Man Cybern. Syst.</italic></source> <volume>51</volume> <fpage>4007</fpage>&#x2013;<lpage>4012</lpage>. <pub-id pub-id-type="doi">10.1109/tsmc.2019.2931403</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ding</surname> <given-names>J.</given-names></name> <name><surname>Yu</surname> <given-names>Z.</given-names></name> <name><surname>Tian</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>Optimal ANN-SNN conversion for fast and accurate inference in deep spiking neural networks.</article-title> <source><italic>arXiv [Preprint]</italic></source> <pub-id pub-id-type="doi">10.48550/arXiv.2105.11654</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Du</surname> <given-names>B.</given-names></name> <name><surname>Tang</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Tao</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>Robust graph-based semisupervised learning for noisy labeled data <italic>via</italic> maximum correntropy criterion.</article-title> <source><italic>IEEE Trans. Cybern.</italic></source> <volume>49</volume> <fpage>1440</fpage>&#x2013;<lpage>1453</lpage>. <pub-id pub-id-type="doi">10.1109/TCYB.2018.2804326</pub-id> <pub-id pub-id-type="pmid">29994595</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Esser</surname> <given-names>S. K.</given-names></name> <name><surname>Merolla</surname> <given-names>P. A.</given-names></name> <name><surname>Arthur</surname> <given-names>J. V.</given-names></name> <name><surname>Cassidy</surname> <given-names>A. S.</given-names></name> <name><surname>Appuswamy</surname> <given-names>R.</given-names></name> <name><surname>Andreopoulos</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Convolutional networks for fast, energy-efficient neuromorphic computing.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>113</volume> <fpage>11441</fpage>&#x2013;<lpage>11446</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1604850113</pub-id> <pub-id pub-id-type="pmid">27651489</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Falez</surname> <given-names>P.</given-names></name> <name><surname>Tirilly</surname> <given-names>P.</given-names></name> <name><surname>Bilasco</surname> <given-names>I. M.</given-names></name> <name><surname>Devienne</surname> <given-names>P.</given-names></name> <name><surname>Boulet</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Multi-layered spiking neural network with target timestamp threshold adaptation and STDP</article-title>,&#x201D; in <source><italic>Proceedings of the 2019 IEEE International Joint Conference on Neural Networks (IJCNN)</italic></source>, <publisher-loc>Washington, DC</publisher-loc>, <fpage>1</fpage>&#x2013;<lpage>8</lpage>.</citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fischer</surname> <given-names>B.</given-names></name> <name><surname>Buhmann</surname> <given-names>J. M.</given-names></name></person-group> (<year>2003</year>). <article-title>Bagging for path-based clustering.</article-title> <source><italic>IEEE Trans. Pattern Anal. Mach. Intell.</italic></source> <volume>25</volume> <fpage>1411</fpage>&#x2013;<lpage>1415</lpage>. <pub-id pub-id-type="doi">10.1109/tpami.2003.1240115</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gidaris</surname> <given-names>S.</given-names></name> <name><surname>Bursuc</surname> <given-names>A.</given-names></name> <name><surname>Komodakis</surname> <given-names>N.</given-names></name> <name><surname>P&#x00E9;rez</surname> <given-names>P.</given-names></name> <name><surname>Cord</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Boosting few-shot visual learning with self-supervision</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE/CVF International Conference on Computer Vision</italic></source>, <publisher-loc>Washington, DC</publisher-loc>, <fpage>8059</fpage>&#x2013;<lpage>8068</lpage>.</citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goelet</surname> <given-names>P.</given-names></name> <name><surname>Castellucci</surname> <given-names>V. F.</given-names></name> <name><surname>Schacher</surname> <given-names>S.</given-names></name> <name><surname>Kandel</surname> <given-names>E. R.</given-names></name></person-group> (<year>1986</year>). <article-title>The long and the short of long&#x2013;term memory&#x2014;a molecular framework.</article-title> <source><italic>Nature</italic></source> <volume>322</volume> <fpage>419</fpage>&#x2013;<lpage>422</lpage>. <pub-id pub-id-type="doi">10.1038/322419a0</pub-id> <pub-id pub-id-type="pmid">2874497</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heravi</surname> <given-names>A. R.</given-names></name> <name><surname>Hodtani</surname> <given-names>G. A.</given-names></name></person-group> (<year>2018</year>). <article-title>A new correntropy-based conjugate gradient backpropagation algorithm for improving training in neural networks.</article-title> <source><italic>IEEE Trans. Neural Netw. Learn. Syst.</italic></source> <volume>29</volume> <fpage>6252</fpage>&#x2013;<lpage>6263</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2018.2827778</pub-id> <pub-id pub-id-type="pmid">29993752</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>R.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Yan</surname> <given-names>R.</given-names></name> <name><surname>Tang</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>Few-shot learning in spiking neural networks by multi-timescale optimization.</article-title> <source><italic>Neural Comp.</italic></source> <volume>33</volume> <fpage>2439</fpage>&#x2013;<lpage>2472</lpage>. <pub-id pub-id-type="doi">10.1162/neco_a_01423</pub-id> <pub-id pub-id-type="pmid">34280263</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>Y.</given-names></name> <name><surname>Chough</surname> <given-names>J.</given-names></name> <name><surname>Panda</surname> <given-names>P.</given-names></name></person-group> (<year>2021a</year>). <article-title>Beyond classification: directly training spiking neural networks for semantic segmentation.</article-title> <source><italic>arXiv [Preprint]</italic></source> <pub-id pub-id-type="doi">10.48550/arXiv.2110.07742</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>Y.</given-names></name> <name><surname>Venkatesha</surname> <given-names>Y.</given-names></name> <name><surname>Panda</surname> <given-names>P.</given-names></name></person-group> (<year>2021b</year>). <article-title>PrivateSNN: fully privacy-preserving spiking neural networks.</article-title> <source><italic>arXiv [Preprint]</italic></source></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>Y.</given-names></name> <name><surname>Panda</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>Revisiting batch normalization for training low-latency deep spiking neural networks from scratch.</article-title> <source><italic>Front. Neurosci.</italic></source> <volume>15</volume>:<issue>773954</issue>. <pub-id pub-id-type="doi">10.3389/fnins.2021.773954</pub-id> <pub-id pub-id-type="pmid">34955725</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>Y.</given-names></name> <name><surname>Panda</surname> <given-names>P.</given-names></name></person-group> (<year>2021</year>). <article-title>Visual explanations from spiking neural networks using inter-spike intervals.</article-title> <source><italic>Sci. Rep.</italic></source> <volume>11</volume>:<issue>19037</issue>. <pub-id pub-id-type="doi">10.1038/s41598-021-98448-0</pub-id> <pub-id pub-id-type="pmid">34561513</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koch</surname> <given-names>G.</given-names></name> <name><surname>Zemel</surname> <given-names>R.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). &#x201C;<article-title>Siamese neural networks for one-shot image recognition</article-title>,&#x201D; in <source><italic>Proceedings of the International Conference on Machine Learning</italic></source>, <volume>Vol. 2</volume> <publisher-loc>Atlanta, GA</publisher-loc>.</citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>Z.</given-names></name> <name><surname>Jiang</surname> <given-names>X.</given-names></name> <name><surname>Kot</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Deep coupled resnet for low-resolution face recognition.</article-title> <source><italic>IEEE Signal Process. Lett.</italic></source> <volume>25</volume> <fpage>526</fpage>&#x2013;<lpage>530</lpage>. <pub-id pub-id-type="doi">10.1109/lsp.2018.2810121</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>S.</given-names></name> <name><surname>Guan</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Xue</surname> <given-names>F.</given-names></name> <name><surname>Zhou</surname> <given-names>H.</given-names></name></person-group> (<year>2018</year>). &#x201C;<article-title>Improving liquid state machine in temporal pattern classification</article-title>,&#x201D; in <source><italic>Proceedings of the 15th International Conference on Control, Automation, Robotics and Vision (ICARCV)</italic></source>, <publisher-loc>Singapore</publisher-loc>, <fpage>88</fpage>&#x2013;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.3389/fnins.2018.00524</pub-id> <pub-id pub-id-type="pmid">30190670</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>X.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Zhao</surname> <given-names>W.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Short-term wind speed forecasting <italic>via</italic> stacked extreme learning machine with generalized correntropy.</article-title> <source><italic>IEEE Trans. Ind. Inform.</italic></source> <volume>14</volume> <fpage>4963</fpage>&#x2013;<lpage>4971</lpage>. <pub-id pub-id-type="doi">10.1109/tii.2018.2854549</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Merolla</surname> <given-names>P. A.</given-names></name> <name><surname>Arthur</surname> <given-names>J. V.</given-names></name> <name><surname>Alvarez-Icaza</surname> <given-names>R.</given-names></name> <name><surname>Cassidy</surname> <given-names>A. S.</given-names></name> <name><surname>Sawada</surname> <given-names>J.</given-names></name> <name><surname>Akopyan</surname> <given-names>F.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>A million spiking-neuron integrated circuit with a scalable communication network and interface.</article-title> <source><italic>Science</italic></source> <volume>345</volume> <fpage>668</fpage>&#x2013;<lpage>673</lpage>. <pub-id pub-id-type="doi">10.1126/science.1254642</pub-id> <pub-id pub-id-type="pmid">25104385</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Panda</surname> <given-names>P.</given-names></name> <name><surname>Roy</surname> <given-names>K.</given-names></name></person-group> (<year>2017</year>). <article-title>Learning to generate sequences with combination of Hebbian and non-Hebbian plasticity in recurrent spiking neural networks.</article-title> <source><italic>Front. Neurosci.</italic></source> <volume>11</volume>:<issue>693</issue>. <pub-id pub-id-type="doi">10.3389/fnins.2017.00693</pub-id> <pub-id pub-id-type="pmid">29311774</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paredes-Vall&#x00E9;s</surname> <given-names>F.</given-names></name> <name><surname>Scheper</surname> <given-names>K. Y. W.</given-names></name> <name><surname>de Croon</surname> <given-names>G. C. H. E.</given-names></name></person-group> (<year>2019</year>). <article-title>Unsupervised learning of a hierarchical spiking neural network for optical flow estimation: from events to global motion perception.</article-title> <source><italic>IEEE Trans. Pattern Anal. Mach. Intell.</italic></source> <volume>42</volume> <fpage>2051</fpage>&#x2013;<lpage>2064</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2019.2903179</pub-id> <pub-id pub-id-type="pmid">30843817</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pei</surname> <given-names>J.</given-names></name> <name><surname>Deng</surname> <given-names>L.</given-names></name> <name><surname>Song</surname> <given-names>S.</given-names></name> <name><surname>Zhao</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Towards artificial general intelligence with hybrid Tianjic chip architecture.</article-title> <source><italic>Nature</italic></source> <volume>572</volume> <fpage>106</fpage>&#x2013;<lpage>111</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-1424-8</pub-id> <pub-id pub-id-type="pmid">31367028</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qiao</surname> <given-names>N.</given-names></name> <name><surname>Mostafa</surname> <given-names>H.</given-names></name> <name><surname>Corradi</surname> <given-names>F.</given-names></name> <name><surname>Osswald</surname> <given-names>M.</given-names></name> <name><surname>Stefanini</surname> <given-names>F.</given-names></name> <name><surname>Sumislawska</surname> <given-names>D.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128K synapses.</article-title> <source><italic>Front. Neurosci.</italic></source> <volume>9</volume>:<issue>141</issue>. <pub-id pub-id-type="doi">10.3389/fnins.2015.00141</pub-id> <pub-id pub-id-type="pmid">25972778</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rodrigues</surname> <given-names>C. F.</given-names></name> <name><surname>Riley</surname> <given-names>G.</given-names></name> <name><surname>Luj&#x00E1;n</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). &#x201C;<article-title>SyNERGY: an energy measurement and prediction framework for convolutional neural networks on Jetson TX1</article-title>,&#x201D; in <source><italic>Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp)</italic></source>, <publisher-loc>Washington, DC</publisher-loc>, <fpage>375</fpage>&#x2013;<lpage>382</lpage>.</citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roy</surname> <given-names>K.</given-names></name> <name><surname>Jaiswal</surname> <given-names>A.</given-names></name> <name><surname>Panda</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Towards spike-based machine intelligence with neuromorphic computing.</article-title> <source><italic>Nature</italic></source> <volume>575</volume> <fpage>607</fpage>&#x2013;<lpage>617</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-1677-2</pub-id> <pub-id pub-id-type="pmid">31776490</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Santoro</surname> <given-names>A.</given-names></name> <name><surname>Bartunov</surname> <given-names>S.</given-names></name> <name><surname>Botvinick</surname> <given-names>M.</given-names></name> <name><surname>Wierstra</surname> <given-names>D.</given-names></name> <name><surname>Lillicrap</surname> <given-names>T.</given-names></name></person-group> (<year>2016</year>). &#x201C;<article-title>Meta-learning with memory-augmented neural networks</article-title>,&#x201D; in <source><italic>Proceedings of the 33rd International Conference on Machine Learning</italic></source>, <volume>Vol. 48</volume> <publisher-loc>New York NY</publisher-loc>, <fpage>1842</fpage>&#x2013;<lpage>1850</lpage>.</citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Singh</surname> <given-names>S.</given-names></name> <name><surname>Okun</surname> <given-names>A.</given-names></name> <name><surname>Jackson</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>Learning to play Go from scratch.</article-title> <source><italic>Nature</italic></source> <volume>550</volume> <fpage>336</fpage>&#x2013;<lpage>337</lpage>. <pub-id pub-id-type="doi">10.1038/550336a</pub-id> <pub-id pub-id-type="pmid">29052631</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soures</surname> <given-names>N.</given-names></name> <name><surname>Kudithipudi</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Deep liquid state machines with neural plasticity for video activity recognition.</article-title> <source><italic>Front. Neurosci.</italic></source> <volume>13</volume>:<issue>686</issue>. <pub-id pub-id-type="doi">10.3389/fnins.2019.00686</pub-id> <pub-id pub-id-type="pmid">31333404</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strack</surname> <given-names>R.</given-names></name></person-group> (<year>2019</year>). <article-title>Deep learning in imaging.</article-title> <source><italic>Nat. Methods</italic></source> <volume>16</volume>:<issue>17</issue>.</citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>Q.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Chua</surname> <given-names>T. S.</given-names></name> <name><surname>Schiele</surname> <given-names>B.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Meta-transfer learning for few-shot learning</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</italic></source>, (<publisher-loc>Piscataway, NJ</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>403</fpage>&#x2013;<lpage>412</lpage>.</citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tolkach</surname> <given-names>Y.</given-names></name> <name><surname>Dohmg&#x00F6;rgen</surname> <given-names>T.</given-names></name> <name><surname>Toma</surname> <given-names>M.</given-names></name> <name><surname>Kristiansen</surname> <given-names>G.</given-names></name></person-group> (<year>2020</year>). <article-title>High-accuracy prostate cancer pathology using deep learning.</article-title> <source><italic>Nat. Mach. Intell.</italic></source> <volume>2</volume> <fpage>411</fpage>&#x2013;<lpage>418</lpage>. <pub-id pub-id-type="doi">10.1038/s42256-020-0200-7</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Venkatesha</surname> <given-names>Y.</given-names></name> <name><surname>Kim</surname> <given-names>Y.</given-names></name> <name><surname>Tassiulas</surname> <given-names>L.</given-names></name> <name><surname>Panda</surname> <given-names>P.</given-names></name></person-group> (<year>2021</year>). <article-title>Federated learning with spiking neural networks.</article-title> <source><italic>IEEE Trans. Signal Process.</italic></source> <volume>69</volume> <fpage>6183</fpage>&#x2013;<lpage>6194</lpage>. <pub-id pub-id-type="doi">10.1109/tsp.2021.3121632</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Hafidh</surname> <given-names>B.</given-names></name> <name><surname>Dong</surname> <given-names>H.</given-names></name> <name><surname>El Saddik</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>Sitting posture recognition using a spiking neural network.</article-title> <source><italic>IEEE Sens. J.</italic></source> <volume>21</volume> <fpage>1779</fpage>&#x2013;<lpage>1786</lpage>. <pub-id pub-id-type="doi">10.1109/jsen.2020.3016611</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>T.</given-names></name> <name><surname>Cao</surname> <given-names>J.</given-names></name> <name><surname>Dai</surname> <given-names>H.</given-names></name> <name><surname>Lei</surname> <given-names>B.</given-names></name> <name><surname>Zeng</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>Robust maximum mixture correntropy criterion based one-class classification algorithm.</article-title> <source><italic>IEEE Intell. Syst.</italic></source> <volume>2021</volume>:<issue>1</issue>. <pub-id pub-id-type="doi">10.1109/mis.2021.3122958</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Yao</surname> <given-names>Q.</given-names></name> <name><surname>Kwok</surname> <given-names>J. T.</given-names></name> <name><surname>Ni</surname> <given-names>L. M.</given-names></name></person-group> (<year>2020</year>). <article-title>Generalizing from a few examples: a survey on few-shot learning.</article-title> <source><italic>ACM Comput. Surv.</italic></source> <volume>53</volume> <fpage>1</fpage>&#x2013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1145/3386252</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wijesinghe</surname> <given-names>P.</given-names></name> <name><surname>Srinivasan</surname> <given-names>G.</given-names></name> <name><surname>Panda</surname> <given-names>P.</given-names></name> <name><surname>Roy</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>Analysis of liquid ensembles for enhancing the performance and accuracy of liquid state machines.</article-title> <source><italic>Front. Neurosci.</italic></source> <volume>13</volume>:<issue>504</issue>. <pub-id pub-id-type="doi">10.3389/fnins.2019.00504</pub-id> <pub-id pub-id-type="pmid">31191219</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xing</surname> <given-names>L.</given-names></name> <name><surname>Chen</surname> <given-names>B.</given-names></name> <name><surname>Du</surname> <given-names>S.</given-names></name> <name><surname>Gu</surname> <given-names>Y.</given-names></name> <name><surname>Zheng</surname> <given-names>N.</given-names></name></person-group> (<year>2019</year>). <article-title>Correntropy-based multiview subspace clustering.</article-title> <source><italic>IEEE Trans. Cybern.</italic></source> <volume>51</volume> <fpage>3298</fpage>&#x2013;<lpage>3311</lpage>. <pub-id pub-id-type="doi">10.1109/TCYB.2019.2952398</pub-id> <pub-id pub-id-type="pmid">31794416</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>S.</given-names></name> <name><surname>Gao</surname> <given-names>T.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Deng</surname> <given-names>B.</given-names></name> <name><surname>Lansdell</surname> <given-names>B.</given-names></name> <name><surname>Linares-Barranco</surname> <given-names>B.</given-names></name></person-group> (<year>2021a</year>). <article-title>Efficient spike-driven learning with dendritic event-based processing.</article-title> <source><italic>Front. Neurosci.</italic></source> <volume>15</volume>:<issue>601109</issue>. <pub-id pub-id-type="doi">10.3389/fnins.2021.601109</pub-id> <pub-id pub-id-type="pmid">33679295</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>S.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Deng</surname> <given-names>B.</given-names></name> <name><surname>Azghadi</surname> <given-names>M. R.</given-names></name> <name><surname>Linares-Barranco</surname> <given-names>B.</given-names></name></person-group> (<year>2021b</year>). <article-title>Neuromorphic context-dependent learning framework with fault-tolerant spike routing.</article-title> <source><italic>IEEE Trans. Neural Netw. Learn. Syst.</italic></source> <volume>2021</volume> <fpage>1</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2021.3084250</pub-id> <pub-id pub-id-type="pmid">34115596</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zadeh</surname> <given-names>S. G.</given-names></name> <name><surname>Schmid</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>Bias in cross-entropy-based training of deep survival networks.</article-title> <source><italic>IEEE Trans. Pattern Anal. Mach. Intell.</italic></source> <volume>43</volume> <fpage>3126</fpage>&#x2013;<lpage>3137</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2020.2979450</pub-id> <pub-id pub-id-type="pmid">32149626</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Dai</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>T.</given-names></name> <name><surname>Harandi</surname> <given-names>M.</given-names></name> <name><surname>Barnes</surname> <given-names>N.</given-names></name> <name><surname>Hartley</surname> <given-names>R.</given-names></name></person-group> (<year>2020</year>). <article-title>Learning saliency from single noisy labelling: a robust model fitting perspective.</article-title> <source><italic>IEEE Trans. Pattern Anal. Mach. Intell.</italic></source> <volume>43</volume> <fpage>2866</fpage>&#x2013;<lpage>2873</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2020.3046486</pub-id> <pub-id pub-id-type="pmid">33351750</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zheng</surname> <given-names>H.</given-names></name> <name><surname>Wu</surname> <given-names>Y.</given-names></name> <name><surname>Deng</surname> <given-names>L.</given-names></name> <name><surname>Hu</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name></person-group> (<year>2020</year>). <article-title>Going deeper with directly-trained larger spiking neural networks.</article-title> <source><italic>arXiv [Preprint]</italic></source> <pub-id pub-id-type="doi">10.48550/arXiv.2011.05280</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zheng</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>B.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Qin</surname> <given-names>W.</given-names></name></person-group> (<year>2020</year>). <article-title>Mixture correntropy-based kernel extreme learning machines.</article-title> <source><italic>IEEE Trans. Neural Netw. Learn. Syst.</italic></source> <volume>33</volume> <fpage>811</fpage>&#x2013;<lpage>825</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2020.3029198</pub-id> <pub-id pub-id-type="pmid">33079685</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zou</surname> <given-names>J.</given-names></name> <name><surname>Huss</surname> <given-names>M.</given-names></name> <name><surname>Abid</surname> <given-names>A.</given-names></name> <name><surname>Mohammadi</surname> <given-names>P.</given-names></name> <name><surname>Torkamani</surname> <given-names>A.</given-names></name> <name><surname>Telenti</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>A primer on deep learning in genomics.</article-title> <source><italic>Nat. Genet.</italic></source> <volume>51</volume> <fpage>12</fpage>&#x2013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1038/s41588-018-0295-5</pub-id> <pub-id pub-id-type="pmid">30478442</pub-id></citation></ref>
</ref-list>
</back>
</article>