<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Neurosci.</journal-id>
<journal-title>Frontiers in Computational Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5188</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fncom.2023.1207361</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Local minimization of prediction errors drives learning of invariant object representations in a generative network model of visual perception</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes"><name><surname>Brucklacher</surname> <given-names>Matthias</given-names></name><xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2270162/overview"/>
</contrib>
<contrib contrib-type="author"><name><surname>Boht&#x00E9;</surname> <given-names>Sander M.</given-names></name><xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/21438/overview"/>
</contrib>
<contrib contrib-type="author"><name><surname>Mejias</surname> <given-names>Jorge F.</given-names></name><xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/98694/overview"/>
</contrib>
<contrib contrib-type="author"><name><surname>Pennartz</surname> <given-names>Cyriel M. A.</given-names></name><xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2725/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Cognitive and Systems Neuroscience Group, Swammerdam Institute for Life Sciences, University of Amsterdam</institution>, <addr-line>Amsterdam</addr-line>, <country>Netherlands</country></aff>
<aff id="aff2"><sup>2</sup><institution>Machine Learning Group, Centrum Wiskunde &#x0026; Informatica</institution>, <addr-line>Amsterdam</addr-line>, <country>Netherlands</country></aff>
<author-notes>
<fn fn-type="edited-by" id="fn0002">
<p>Edited by: Arpan Banerjee, National Brain Research Centre (NBRC), India</p>
</fn>
<fn fn-type="edited-by" id="fn0003">
<p>Reviewed by: John Magnotti, University of Pennsylvania, United States; Vignesh Muralidharan, Indian Institute of Technology Jodhpur, India</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Matthias Brucklacher, <email>m.m.brucklacher@uva.nl</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>25</day>
<month>09</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>17</volume>
<elocation-id>1207361</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>04</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>31</day>
<month>08</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Brucklacher, Boht&#x00E9;, Mejias and Pennartz.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Brucklacher, Boht&#x00E9;, Mejias and Pennartz</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>The ventral visual processing hierarchy of the cortex needs to fulfill at least two key functions: perceived objects must be mapped to high-level representations independently of the precise viewing conditions, and a generative model must be learned that allows, for instance, occluded information to be filled in guided by visual experience. Here, we show how a multilayered predictive coding network can learn to recognize objects from the bottom up and to generate specific representations via a top-down pathway through a single learning rule: the local minimization of prediction errors. Trained on sequences of continuously transformed objects, neurons in the highest network area become tuned to object identity, invariant to precise position, comparable to inferotemporal neurons in macaques. Drawing on this, the dynamic properties of invariant object representations reproduce experimentally observed hierarchies of timescales from low to high levels of the ventral processing stream. The predicted faster decorrelation of error-neuron activity compared to representation neurons is of relevance for the experimental search for neural correlates of prediction errors. Lastly, the generative capacity of the network is confirmed by reconstructing specific object images, robust to partial occlusion of the inputs. By learning invariance from temporal continuity within a generative model, the approach generalizes the predictive coding framework to dynamic inputs in a more biologically plausible way than self-supervised networks with non-local error-backpropagation. This was achieved simply by shifting the training paradigm to dynamic inputs, with little change in architecture and learning rule from static input-reconstructing Hebbian predictive coding networks.</p>
</abstract>
<kwd-group>
<kwd>self-supervised learning</kwd>
<kwd>predictive coding</kwd>
<kwd>generative model</kwd>
<kwd>vision</kwd>
<kwd>hierarchy</kwd>
<kwd>representation learning</kwd>
<kwd>Hebbian learning</kwd>
<kwd>video</kwd>
</kwd-group>
<counts>
<fig-count count="8"/>
<table-count count="1"/>
<equation-count count="6"/>
<ref-count count="83"/>
<page-count count="15"/>
<word-count count="11083"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="sec1"><label>1.</label>
<title>Introduction</title>
<p>How networks of neurons in the brain infer the identity of objects from limited sensory information is one of the preeminent questions of neurobiology. Strengthening theories of generative perception (<xref ref-type="bibr" rid="ref20">Gregory, 1980</xref>; <xref ref-type="bibr" rid="ref51">Mumford, 1992</xref>; <xref ref-type="bibr" rid="ref61">Rao and Ballard, 1999</xref>; <xref ref-type="bibr" rid="ref17">Friston, 2010</xref>; <xref ref-type="bibr" rid="ref58">Pennartz et al., 2019</xref>), evidence has accumulated to suggest that the mammalian perceptual system relies on various forms of prediction to facilitate this process. Across time, repetition suppression that requires explicit expectations (<xref ref-type="bibr" rid="ref72">Summerfield et al., 2008</xref>; <xref ref-type="bibr" rid="ref75">Todorovic et al., 2011</xref>), encoding of deviations from temporal expectations in the macaque inferotemporal and prefrontal cortex (<xref ref-type="bibr" rid="ref67">Schwiedrzik and Freiwald, 2017</xref>; <xref ref-type="bibr" rid="ref6">Bellet et al., 2021</xref>), and encoding of expected movement outcomes in mouse V1 (<xref ref-type="bibr" rid="ref41">Leinweber et al., 2017</xref>) show that the brain constantly tries to predict future inputs. V1 activity evoked by illusory contours (<xref ref-type="bibr" rid="ref4">Bartels, 2014</xref>; <xref ref-type="bibr" rid="ref34">Kok and de Lange, 2014</xref>), encoding of information from occluded scene areas in early visual areas of humans (<xref ref-type="bibr" rid="ref69">Smith and Muckli, 2010</xref>), and modulation of neural responses by expectations based on the surrounding context (<xref ref-type="bibr" rid="ref33">Knierim and van Essen, 1992</xref>) show that predictions are made not only forward in time, but also across space (in the present). 
According to predictive coding theory, these predictions are mediated by corticocortical top-down connections (<xref ref-type="bibr" rid="ref58">Pennartz et al., 2019</xref>) and then corrected based on the received bottom-up input (<xref ref-type="bibr" rid="ref61">Rao and Ballard, 1999</xref>), in line with hierarchical Bayesian perception (<xref ref-type="bibr" rid="ref39">Lee and Mumford, 2003</xref>). Predictive coding models have successfully explained properties of the visual system such as end-stopping in V1 neurons and the learning of wavelet-like receptive fields (<xref ref-type="bibr" rid="ref61">Rao and Ballard, 1999</xref>), as well as V1 activity evoked by illusory contours (<xref ref-type="bibr" rid="ref45">Lotter et al., 2020</xref>; <xref ref-type="bibr" rid="ref56">Pang et al., 2021</xref>). However, these studies focus on low-level effects, while the learned higher-level representations have been investigated much less (although see <xref ref-type="bibr" rid="ref13">Dora et al., 2021</xref> for learning of sparse representations).</p>
<p>Continuously generated by the awake brain, neural representations of the external world form a partial solution to the problem of inference, arguably constituting the basis of conscious experience (<xref ref-type="bibr" rid="ref57">Pennartz, 2015</xref>), decision-making and adaptive planning (<xref ref-type="bibr" rid="ref9">Butz and Kutter, 2016</xref>). They can be loosely defined as activity patterns in response to sensory stimulation elicited by an object. Especially important is the ability to represent multiple views of the same object in similar patterns of activity. These invariant representations have two key advantages: first, information acquired about an object (such as a novel action associated with it) can be linked to only one representation, making learning more efficient. Second, as illustrated in <xref rid="fig1" ref-type="fig">Figure 1</xref>, the newly acquired invariant information about single objects generalizes automatically across all viewing conditions, facilitating learning from few examples. Evidence for invariant neural representations comes from the ventral temporal lobe (<xref ref-type="bibr" rid="ref24">Haxby et al., 2001</xref>), the hippocampus in humans (<xref ref-type="bibr" rid="ref60">Quiroga et al., 2005</xref>), inferotemporal cortex of rhesus (<xref ref-type="bibr" rid="ref12">Desimone et al., 1984</xref>; <xref ref-type="bibr" rid="ref43">Logothetis et al., 1995</xref>) and macaque monkeys (<xref ref-type="bibr" rid="ref16">Freiwald and Tsao, 2010</xref>) as well as rats&#x2019; laterolateral extrastriate area (LL) (<xref ref-type="bibr" rid="ref73">Tafazoli et al., 2012</xref>, <xref ref-type="bibr" rid="ref74">2017</xref>). Current theories of how neurons come to acquire such a specialized tuning either fail to account for fundamental aspects of brain circuitry and physiology or rely on artificial learning paradigms. 
To construct useful representations, biological systems are limited to mostly unsupervised learning (from unlabeled data) and local learning rules, whereas machine vision algorithms based on neural networks typically rely on large amounts of labeled training data and use mechanisms like weight-sharing (<xref ref-type="bibr" rid="ref37">LeCun et al., 1989</xref>). These mechanisms facilitate generalization across viewing conditions but lack a biological foundation.</p>
<fig position="float" id="fig1"><label>Figure 1</label>
<caption>
<p>View-invariant representations for efficient cognition. <bold>(A)</bold> Barely escaping an attack, the monkey learns to associate an action (&#x201C;flee,&#x201D; encoded by the neural pattern in primary motor area M1) with the activity pattern in its retinal ganglion cells (RGCs, bottom) triggered by the image of an approaching eagle. Active cells are shown in white. <bold>(B-1)</bold> When the monkey later encounters a similar eagle from a different angle, an invariant higher-level representation (center) can still trigger the same action. <bold>(B-2)</bold> Without invariant coding, the action does not generalize to this viewing condition. Red box: scope of this paper: how do multiple low-level activity patterns become linked to one high-level, invariant representation?</p>
</caption>
<graphic xlink:href="fncom-17-1207361-g001.tif"/>
</fig>
<p>A biologically plausible approach to learning view-invariance from transformation sequences is so-called trace learning (<xref ref-type="bibr" rid="ref15">F&#x00F6;ldi&#x00E1;k, 1991</xref>; <xref ref-type="bibr" rid="ref14">Elliffe et al., 2000</xref>; <xref ref-type="bibr" rid="ref63">Rolls, 2012</xref>), which is linked to Slow Feature Analysis (SFA) (<xref ref-type="bibr" rid="ref71">Sprekeler et al., 2007</xref>). It is based on the idea that temporal proximity between sensory patterns should be reflected in representational similarity, resting on the assumption that the causes in the world (objects, etc.) vary more slowly than the stimulation patterns they evoke on the retina. Indeed, there is evidence for the importance of temporal stimulus continuity for learning of transformation-tolerance in early visual areas of rats (<xref ref-type="bibr" rid="ref48">Matteucci and Zoccolan, 2020</xref>) and area IT of monkeys (<xref ref-type="bibr" rid="ref42">Li and DiCarlo, 2008</xref>). Based on this principle of representing consecutive inputs similarly, <xref ref-type="bibr" rid="ref22">Halvagal and Zenke (2022)</xref> recently showed that a more intricate learning rule with additional variance maximization leads to disentangled high-level representations. Other self-supervised models avoid representational collapse through contrasting examples (<xref ref-type="bibr" rid="ref28">Illing et al., 2021</xref>).</p>
<p>However, all of these models process information in a strictly feedforward manner or limit the role of feedback connections to a modulatory function, in contrast to evidence on retinotopic, content-carrying feedback connections in the visual cortex (<xref ref-type="bibr" rid="ref83">Zmarz and Keller, 2016</xref>; <xref ref-type="bibr" rid="ref47">Marques et al., 2018</xref>; <xref ref-type="bibr" rid="ref54">Pak et al., 2020</xref>). Here, we propose a common underlying learning mechanism for both high-level representations and a generative model capable of reconstructing specific sensory inputs: the minimization of local prediction errors through inference and learning.</p>
<p>Like the abovementioned feedforward models of invariance learning, predictive coding offers a mechanism for maintenance of higher-level representations: they are only updated when lower levels send up error signals. It can be implemented in a hierarchical neural network model of the visual processing stream using local, Hebbian learning. Furthermore, it is intimately related to the abovementioned slowness principle, which states that the most meaningful features often change on a slow timescale (<xref ref-type="bibr" rid="ref81">Wiskott and Sejnowski, 2002</xref>), because extracted causes tend to be good predictors for future input (<xref ref-type="bibr" rid="ref11">Creutzig and Sprekeler, 2008</xref>). To sum up, predictive coding is a promising candidate to explain learning of invariant object representations within the framework of generative modeling.</p>
<p>To acquire transformation-tolerance from temporal continuity, input sequences are required. Most predictive coding models so far, however, either operate on static inputs (<xref ref-type="bibr" rid="ref61">Rao and Ballard, 1999</xref>; <xref ref-type="bibr" rid="ref70">Spratling, 2017</xref>; <xref ref-type="bibr" rid="ref13">Dora et al., 2021</xref>) or use non-local learning rules (<xref ref-type="bibr" rid="ref30">Jiang and Rao, 2022</xref>) such as backpropagation (<xref ref-type="bibr" rid="ref64">Rumelhart et al., 1985</xref>; <xref ref-type="bibr" rid="ref68">Singer et al., 2019</xref>) and biologically implausible LSTM units (<xref ref-type="bibr" rid="ref44">Lotter et al., 2016</xref>, <xref ref-type="bibr" rid="ref45">2020</xref>). Here, we train multilayered predictive coding networks with only small architectural modifications from <xref ref-type="bibr" rid="ref61">Rao and Ballard (1999)</xref> and <xref ref-type="bibr" rid="ref13">Dora et al. (2021)</xref> on transformation sequences with purely Hebbian learning. We confirm learning of a generative model, showing that top-down predictions made by the network approximate the original input. Importantly, these predictions are not forward in time, but across retinotopic space, representing the current input. Presented with partially occluded input sequences, the network pattern-completes the occluded areas through top-down feedback, mimicking functions of human V1 and V2. While reconstructions from lower areas are more faithful, predictive neurons in the network&#x2019;s higher areas develop view-invariant representations akin to responses of neurons in the inferotemporal area of primate cortex: input stimuli shown in temporal proximity are represented similarly. A decoding analysis confirms that distinct objects are well separable. Lastly, the temporal dynamics of the neural subpopulations are analyzed and compared to recent electrophysiological data from rats. 
As in the experiment, temporal stability of representation neurons (measured by the decay of autocorrelation) increases as one moves up the hierarchy. In addition, the model makes the prediction that high-level error-coding neurons operate on a faster timescale than their representational counterparts.</p>
</sec>
<sec sec-type="methods" id="sec2"><label>2.</label>
<title>Methods</title>
<p>We developed a neural network consisting of four hierarchically arranged areas. Applying the principles of predictive computation, we restricted ourselves to the minimally necessary components, but other connectivity patterns are conceivable [suggested, e.g., by <xref ref-type="bibr" rid="ref25">Heeger (2017)</xref>]. As in previous implementations of predictive coding (<xref ref-type="bibr" rid="ref61">Rao and Ballard, 1999</xref>; <xref ref-type="bibr" rid="ref13">Dora et al., 2021</xref>), each area contains two subpopulations of neurons that are illustrated in <xref rid="fig2" ref-type="fig">Figure 2</xref>:<list list-type="order">
<list-item>
<p>Representation neurons collectively hold the &#x201C;inferred causes,&#x201D; in higher areas corresponding to perceptual content. Together with the synaptic connections towards lower areas, they generate top-down predictions to match the current representations in the area below.</p>
</list-item>
<list-item>
<p>Error neurons measure the mismatch between representation neuron activity (in the lowest area: the sensory input) and top-down predictions.</p>
</list-item>
</list></p>
<fig position="float" id="fig2"><label>Figure 2</label>
<caption>
<p>Model architecture and inference on video sequences. <bold>(A)</bold> Representational activity <inline-formula>
<mml:math id="M1">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> in area l (l &#x2208; {1,2}) is influenced by both top-down predictions highlighted in red via the respective top-down errors <inline-formula>
<mml:math id="M2">
<mml:mrow>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and by the bottom-up errors<inline-formula>
<mml:math id="M3">
<mml:mrow>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mstyle mathvariant="bold">
<mml:mn>1</mml:mn>
</mml:mstyle>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Intra-area connections between representation neurons (circles) and respective error neurons (squares) are one-to-one, while inter-area connections are all-to-all. Synaptic connections between neurons are drawn as filled circles if inhibitory and as triangles if excitatory. <bold>(B)</bold> A sequence of input images is fed into the lowest area of the network across subsequent moments in time<inline-formula>
<mml:math id="M4">
<mml:mrow>
<mml:mspace width="0.25em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mstyle mathvariant="bold">
<mml:mn>2</mml:mn>
</mml:mstyle>
<mml:mi mathvariant="normal">,</mml:mi>
<mml:mo>&#x2026;</mml:mo>
<mml:mi mathvariant="normal">,</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The network maintains representational activity through time and thus uses it as a prior for the inference of subsequent representations.</p>
</caption>
<graphic xlink:href="fncom-17-1207361-g002.tif"/>
</fig>
<p>Some models, such as that of <xref ref-type="bibr" rid="ref65">Sacramento et al. (2018)</xref>, suggest computation of errors in dendrites, but based on the evidence for neural encoding of errors (<xref ref-type="bibr" rid="ref83">Zmarz and Keller, 2016</xref>; <xref ref-type="bibr" rid="ref19">Green et al., 2023</xref>), we assign dedicated neurons to encode them. Development of such error-tuned neurons has been modeled by <xref ref-type="bibr" rid="ref26">Hert&#x00E4;g and Sprekeler (2020)</xref> in cortical microcircuits and by <xref ref-type="bibr" rid="ref1">Ali et al. (2021)</xref> as a result of energy efficiency. While the number of neurons in the input area depends on the dimensions of the dataset and varied between 784 and 1,156, the consecutive areas consisted of [2,000, 5,000, 30] neurons (for Area 1, 2, and 3, respectively), except where noted otherwise. This choice is supported by an analysis of how altering the number of neurons affects decoding performance in <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.13</xref>.</p>
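The dimensions described above can be made concrete with a short NumPy sketch. This is our illustration, not the authors' released code: the names (`AREA_SIZES`, `init_weights`) are ours, the 784-pixel input assumes 28&#x00D7;28 images, and the weight initialization follows the description in Section 2.2 (zero-centered Gaussian, clipped at zero, scaled by the size of the higher area).

```python
import numpy as np

# Illustrative sketch of the network dimensions given in the text
# (assumed 28x28 = 784-pixel inputs; names are ours, not the authors').
AREA_SIZES = [784, 2000, 5000, 30]  # input area, then Areas 1-3

rng = np.random.default_rng(0)

def init_weights(sizes, std=0.5):
    """One weight matrix per pair of adjacent areas.

    Values are drawn from a zero-centered Gaussian, clipped at zero to
    prevent negative weights, and divided by the number of neurons in
    the next (higher) area, as described in Section 2.2.
    """
    weights = []
    for lower, higher in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, std, size=(lower, higher))
        w = np.clip(w, 0.0, None) / higher
        # w maps higher-area activity down to a prediction for the area below
        weights.append(w)
    return weights

weights = init_weights(AREA_SIZES)

# Each area holds two subpopulations: representation neurons y_l and
# error neurons beta_l (no error neurons above the top area).
y = [np.full(n, 0.1) for n in AREA_SIZES]      # uniform, low initial activity
beta = [np.zeros(n) for n in AREA_SIZES[:-1]]
```

The matrix shapes make the all-to-all inter-area connectivity explicit: `weights[l]` has one row per neuron in area `l` and one column per neuron in area `l+1`.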
<sec id="sec3"><label>2.1.</label>
<title>Inference: updating neural activity</title>
<p>At the start of a sequence, all neural activity is set to a uniform, low value (unless stated otherwise in the Results section). While an image is presented to the network, the representation neurons of the lowest area linearly reflect the pixel-wise intensity of the input (at the bottom of <xref rid="fig2" ref-type="fig">Figure 2B</xref>). Error neurons in area <inline-formula>
<mml:math id="M5">
<mml:mi>l</mml:mi>
</mml:math>
</inline-formula> receive excitatory input from the activity <inline-formula>
<mml:math id="M6">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> of associated representation neurons as shown in the one-to-one connections in <xref rid="fig2" ref-type="fig">Figure 2A</xref>, and are inhibited by the summed-up predictions <inline-formula>
<mml:math id="M7">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x2322;</mml:mo>
</mml:mover>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>from the higher area:</p>
<disp-formula id="E1">
<mml:math id="M8">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x2322;</mml:mo>
</mml:mover>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>W</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math></disp-formula>
<p>where bold letters indicate vectors and matrices and <inline-formula>
<mml:math id="M10">
<mml:mrow>
<mml:msubsup>
<mml:mi>W</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> denotes the symmetric weight matrix between area <italic>l</italic> and area <italic>l&#x2009;+&#x2009;1</italic> from the previous time step (the weights will change during learning). Strictly symmetric weight matrices, as frequently used in predictive coding models (<xref ref-type="bibr" rid="ref61">Rao and Ballard, 1999</xref>; <xref ref-type="bibr" rid="ref13">Dora et al., 2021</xref>), lead to a weight transport problem during learning. However, it has been shown that, in combination with weight decay, symmetric weights can be obtained by a learning rule comparable to ours without explicitly enforcing symmetry (<xref ref-type="bibr" rid="ref2">Alonso and Neftci, 2021</xref>), since the locally available pre- and postsynaptic activities that determine the weight change are identical (symmetric) for each pair of feedforward and feedback connections. Each representation neuron receives inhibitory input from one error neuron in the same area and excitatory input from the weighted bottom-up errors and thus changes its activation state at each time step (see &#x201C;inference&#x201D; in Alg S1):</p>
<disp-formula id="E2">
<mml:math id="M11">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>&#x03F5;</mml:mi>
<mml:mrow>
<mml:mi>inf</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>This adjustment of neural activation state <inline-formula>
<mml:math id="M12">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (akin to membrane potential) of representation neurons can be interpreted as matching top-down predictions better than before (and thus reducing activity of the associated error neuron) and sending down predictions that better match representation neuron activity in the area below (thus reducing errors there). The rate at which neuronal activation is changed is governed by the parameter <inline-formula>
<mml:math id="M13">
<mml:mrow>
<mml:msub>
<mml:mi>&#x03F5;</mml:mi>
<mml:mrow>
<mml:mi>inf</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.05</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> referred to in the following as the <italic>inference rate</italic>. The activation state <inline-formula>
<mml:math id="M14">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is now translated into an output firing rate <inline-formula>
<mml:math id="M15">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>:</p>
<disp-formula id="E3">
<mml:math id="M16">
<mml:mrow>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mi>&#x03D5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>&#x0394;</mml:mi>
<mml:mi>x</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>3</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math></disp-formula>
<p>where <inline-formula>
<mml:math id="M18">
<mml:mi>&#x03D5;</mml:mi>
</mml:math>
</inline-formula> denotes the sigmoid activation function, and <inline-formula>
<mml:math id="M19">
<mml:mrow>
<mml:msub>
<mml:mi>&#x0394;</mml:mi>
<mml:mi>x</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> a constant lateral offset of the firing threshold. The saturation of the sigmoid for large inputs corresponds to a maximal firing rate of the representation neurons, in contrast to the more artificial (rectified) linear activation functions used in <xref ref-type="bibr" rid="ref61">Rao and Ballard (1999)</xref> and <xref ref-type="bibr" rid="ref13">Dora et al. (2021)</xref> that do not have an upper bound.</p>
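The inference dynamics above can be summarized in a short NumPy sketch. This is our illustration under stated assumptions, not the authors' code: names such as `inference_step` are ours, the lowest area's clamping to pixel intensities is assumed to be handled by the caller, and the Hebbian weight update of Section 2.2 is included for completeness as an outer product between bottom-up errors and higher-area firing rates.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inference_step(x, weights, eps_inf=0.05, delta_x=0.0):
    """One inference update over all areas (sketch of Eqs. 1-3).

    x[l] is the activation state of area l's representation neurons;
    weights[l] maps area l+1 down to a prediction for area l.
    Area 0 (the input) is left unchanged here.
    """
    # Eq. 3: firing rates from activation states (sigmoid with offset delta_x)
    y = [sigmoid(xl + delta_x) for xl in x]
    # Eq. 1: error = representation activity minus summed top-down prediction
    beta = [y[l] - weights[l] @ y[l + 1] for l in range(len(weights))]
    new_x = [xl.copy() for xl in x]
    for l in range(1, len(x)):
        bottom_up = weights[l - 1].T @ beta[l - 1]    # excitatory, weighted
        top_down = beta[l] if l < len(beta) else 0.0  # top area: no such error
        # Eq. 2: nudge activation state at rate eps_inf
        new_x[l] = x[l] + eps_inf * (bottom_up - top_down)
    return new_x, y, beta

def hebbian_update(w, beta_lower, y_upper, eps_learn=0.01):
    """Sketch of the Hebbian weight change described in Section 2.2."""
    return w + eps_learn * np.outer(beta_lower, y_upper)
```

Running `inference_step` for several iterations on a clamped input, then applying `hebbian_update` to each weight matrix, mirrors the inference-then-learning loop described in the text.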
</sec>
<sec id="sec4"><label>2.2.</label>
<title>Learning without labels: updating synaptic strengths</title>
<p>Before training, weights are initialized to random values drawn from a Gaussian distribution centered at zero with a standard deviation of 0.5, clipped at zero to prevent negative weights, and divided by the number of neurons in the next (higher) area. After 10 inference steps, long-term adaptation of synaptic weights is conducted in a Hebbian manner, strengthening synapses between active error neurons in area <italic>l</italic> and simultaneously active representation neurons in the area above (<italic>l&#x2009;+&#x2009;1</italic>):</p>
<disp-formula id="E4">
<mml:math id="M20">
<mml:mrow>
<mml:msubsup>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>&#x03F5;</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x00B7;</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mi>y</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:msup>
<mml:mrow></mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="thickmathspace"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>4</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>with learning rate <inline-formula>
<mml:math id="M21">
<mml:mrow>
<mml:msub>
<mml:mi>&#x03F5;</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Apart from omitting weight decay, normalization, and a gating mechanism, we thus use the same learning rule as <xref ref-type="bibr" rid="ref61">Rao and Ballard (1999)</xref> and <xref ref-type="bibr" rid="ref13">Dora et al. (2021)</xref>. Because synaptic efficacy changes much more slowly than the membrane potential dynamics, weights are assumed to be constant between these updates. In Equation <inline-formula>
<mml:math id="M22">
<mml:mn>4</mml:mn>
</mml:math>
</inline-formula>, the sign of the prediction error controls the direction of the weight change. If the prediction is too large relative to the activity of the representation neurons in this area, the error is negative, and the weight mediating the prediction is reduced. As a result, given the same prediction, the error in the next time step will be smaller. This stabilizing effect on the response of error neurons is familiar from the work of <xref ref-type="bibr" rid="ref77">Vogels et al. (2011)</xref>, which showed how Hebbian plasticity regulates inhibitory input to reduce firing and achieve a globally balanced state.</p>
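<p>The initialization described in the first paragraph of this section and the Hebbian update of Equation 4 can be sketched as follows (an illustrative reimplementation with our own function names, not the authors' code):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_lower, n_higher, sigma=0.5):
    """Gaussian initialization centered at zero, clipped at zero to
    prevent negative weights, and divided by the size of the higher area."""
    W = rng.normal(0.0, sigma, size=(n_lower, n_higher))
    return np.clip(W, 0.0, None) / n_higher

def hebbian_update(W, error_lower, y_higher, eps_learn=0.01):
    """Equation 4: W <- W + eps_learn * beta_{l-1} y_l^T.

    error_lower: prediction errors beta_{l-1} in the lower area;
    y_higher: firing rates y_l of representation neurons in the area above.
    The sign of the error sets the direction of the weight change.
    """
    return W + eps_learn * np.outer(error_lower, y_higher)
```

A negative error paired with an active representation neuron weakens the corresponding weight, reproducing the stabilizing effect on error neurons described above.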
<p>To summarize, both the balanced, excitatory-inhibitory wiring of the network and the unsupervised adaptation of weights based on remaining prediction errors align representations with predictions, and thus reduce error neuron activity. The sum of squared prediction errors can then be seen as an implicit objective function: the inference steps conduct an approximate gradient descent on it, taking into account only the sign, not the value, of the derivative of the activation function, unlike (<xref ref-type="bibr" rid="ref79">Whittington and Bogacz, 2017</xref>), while learning conducts an exact gradient descent on it.</p>
</sec>
<sec id="sec5"><label>2.3.</label>
<title>Training procedure</title>
<p>We trained the network on temporally dynamic inputs, using short video sequences. After validating network performance on moving horizontal and vertical bars, we switched to using 10 digits of the MNIST handwritten digits dataset (one per digit from 0 to 9). Each sequence contained six gradually transformed images, and separate datasets were created for translational motion, rotation, and scaling (<xref rid="fig3" ref-type="fig">Figure 3</xref>). For translational and rotational motion, two transformation speeds were used, differing in the overlap between consecutive images. The examples shown in <xref rid="fig3" ref-type="fig">Figure 3</xref> are from the dataset with the larger step size (&#x201C;fast&#x201D; condition). To further examine the robustness of the training paradigm under more realistic and less sparse inputs, random noise patterns were added to the image background during training. A final dataset consisted of five high-pass filtered images of toy objects (an airplane shown in the last row of <xref rid="fig3" ref-type="fig">Figure 3</xref>, a sports car, a truck, a lion, and a tin man) from the smallNORB dataset (<xref ref-type="bibr" rid="ref38">LeCun et al., 2004</xref>), undergoing a rotation.</p>
<fig position="float" id="fig3"><label>Figure 3</label>
<caption>
<p>Example sequences from the stimulus datasets. First row: digit translation (indicated by the horizontal arrow) without noise. Second row: digit scaling (expanding arrows) with noise. Third and fourth row: rotation (indicated by the counterclockwise arrow) of digit/toy plane without noise.</p>
</caption>
<graphic xlink:href="fncom-17-1207361-g003.tif"/>
</fig>
<p>The network was trained on the 10 (for the toy objects: five) sequences, each presenting a different digit, for multiple epochs. Each sequence was iterated 10 times in a row (e.g., the sequence of a moving digit &#x2018;6&#x2019;) before switching to the next (of digit &#x2018;7&#x2019;). All hyperparameters are summarized in <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.1</xref>. This repetition of individual sequences drastically improved network performance and could be achieved by the brain through a replay or reactivation mechanism (observed in visual cortex by <xref ref-type="bibr" rid="ref29">Ji and Wilson, 2007</xref> and <xref ref-type="bibr" rid="ref82">Xu et al., 2012</xref>; see also <xref ref-type="bibr" rid="ref80">Wilson and McNaughton, 1994</xref>; <xref ref-type="bibr" rid="ref36">Lansink et al., 2009</xref>). For laterally moving stimuli, repeated presentation can also be achieved by object-tracking saccades, which move the stimulus repeatedly across the same photoreceptors on the retina. At the beginning of each sequence, activity was reset to uniform, low values, the most information-neutral state. This assumption is justified for objects that are seen independently of each other; for instance, not every &#x2018;6&#x2019; is followed by a &#x2018;7&#x2019; (but see <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.15</xref> for how this assumption can be relaxed). For each image, multiple inference-learning cycles (Equations 1&#x2013;4) were conducted before switching to the next image in the sequence. A training epoch consisted of one iteration through all sequences in the dataset. <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.2</xref> contains the pseudocode for the nested training loops.</p>
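<p>The nested training loops can be sketched as follows. This is an illustrative skeleton under our own naming assumptions: the <monospace>network</monospace> object and its <monospace>reset</monospace>, <monospace>infer</monospace>, and <monospace>learn</monospace> hooks are placeholders, not the authors' implementation (for the actual pseudocode, see Supplementary material 1.2 of the paper):</p>

```python
def train(network, sequences, n_epochs=1, n_repetitions=10, n_cycles=10):
    """Sketch of the nested training loops.

    sequences: list of sequences, each a list of frames (e.g., one moving
    digit per sequence). Within an epoch, every sequence is repeated
    n_repetitions times; on each frame, n_cycles inference-learning
    cycles (Equations 1-4) are run before moving to the next frame.
    """
    for _ in range(n_epochs):
        for sequence in sequences:
            for _ in range(n_repetitions):
                network.reset()                # uniform, low activity: the
                                               # information-neutral state
                for frame in sequence:
                    for _ in range(n_cycles):
                        network.infer(frame)   # inference steps (Eqs. 1-3)
                        network.learn()        # Hebbian update (Eq. 4)
    return network
```

The reset is applied at the start of each sequence presentation, reflecting the assumption that objects are seen independently of each other.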
</sec>
<sec id="sec6"><label>2.4.</label>
<title>Analysis of neural representations</title>
<p>To quantify to what extent the network learned representations that are invariant to transformation, while at the same time retaining meaningful information about sample identity, we combined representational similarity analysis (<xref ref-type="bibr" rid="ref35">Kriegeskorte et al., 2008</xref>) with linear decoding. The distance <italic>d</italic> between two representations <inline-formula>
<mml:math id="M23">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M24">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, vectors of neural activity in a model area, was measured via cosine dissimilarity:</p>
<disp-formula id="E5">
<mml:math id="M25">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mi mathvariant="normal">,</mml:mi>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x00B7;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2225;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2225;</mml:mo>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mo>&#x2225;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2225;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math></disp-formula>
<p>Linear decoding was conducted by mapping the inferred representations through a fully connected layer to a layer with one neuron per class label. We implemented this via the linear model class and fitting function of the sklearn library in Python.<xref rid="fn0001" ref-type="fn"><sup>1</sup></xref> Decodability was then measured by the classification accuracy on representations that the decoder had not been presented with before. How well the decoder generalized from representations of a subset of samples from each sequence to the other views of the object is a direct measure of downstream usefulness in the scenario outlined in <xref rid="fig1" ref-type="fig">Figure 1</xref>.</p>
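<p>The linear decoding step can be sketched as below. The paper used sklearn's linear model class; to keep the sketch self-contained we substitute a least-squares one-hot readout (a fully connected layer with one output neuron per class label), so the estimator, function names, and train/test split here are our assumptions, not the authors' exact setup:</p>

```python
import numpy as np

def fit_linear_decoder(X, y, n_classes):
    """Least-squares fit of a linear readout with one output per class.

    X: (n_samples, n_features) inferred representations;
    y: integer class labels. Returns a weight matrix including a bias row.
    """
    targets = np.eye(n_classes)[y]                  # one-hot labels
    Xb = np.hstack([X, np.ones((len(X), 1))])       # append bias column
    W, *_ = np.linalg.lstsq(Xb, targets, rcond=None)
    return W

def decoder_accuracy(W, X, y):
    """Classification accuracy on held-out representations."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))
```

In the paper, this fit-and-evaluate step was repeated in a stratified k-fold manner (k = 3), fitting on 2/3 of the representations and evaluating on the remaining 1/3.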
</sec>
</sec>
<sec sec-type="results" id="sec7"><label>3.</label>
<title>Results</title>
<p>We trained the network on sequences of moving objects as specified in the Methods section, and focused on the evolving high-level representations, resulting neural dynamics, and generative input-reconstructing capacities of the network, all in comparison to neurobiology.</p>
<sec id="sec8"><label>3.1.</label>
<title>Transformation-invariant stimulus representations</title>
<p>We found that neurons in network area 3 became tuned to samples in a position-invariant manner. To quantify invariance, we analyzed the neural representations in the highest area of trained networks (<xref rid="fig2" ref-type="fig">Figure 2</xref>) under changes of the input. More specifically, inference was run on still images from the training datasets until convergence was reached (see <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.3</xref> for a description of convergence). Then, pairwise cosine distances between the inferred area 3 representations quantified the dissimilarity between representations of the same sample, e.g., one digit in different transformation states (within-sequence), or of different samples (across-sequence). All pairwise values were plotted in Representational Dissimilarity Matrices (RDMs, <xref ref-type="bibr" rid="ref35">Kriegeskorte et al., 2008</xref>) in <xref rid="fig4" ref-type="fig">Figure 4</xref>.</p>
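<p>The construction of an RDM from a stack of converged activity vectors can be sketched as follows (illustrative code under our own naming; it applies the cosine dissimilarity of Equation 5 pairwise):</p>

```python
import numpy as np

def rdm(representations):
    """Representational Dissimilarity Matrix.

    representations: (n_stimuli, n_neurons) array of converged activity
    vectors, ordered sequence-wise so that within-sequence comparisons
    form blocks on the diagonal. Returns the symmetric matrix of pairwise
    cosine dissimilarities (Equation 5).
    """
    R = np.asarray(representations, dtype=float)
    R_unit = R / np.linalg.norm(R, axis=1, keepdims=True)
    return 1.0 - R_unit @ R_unit.T
```

Low entries indicate similar activity patterns; entries near 1 indicate orthogonal activity vectors, matching the color convention of Figure 4.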
<fig position="float" id="fig4"><label>Figure 4</label>
<caption>
<p>Representations invariant to viewing conditions are learned without data labels. The matrices depict cosine dissimilarity between representations in area 3. Each of the rows and columns in these plots corresponds to one input image (i.e., a digit sample in a specific spatial configuration), thus each matrix is symmetrical. Along each dimension, samples are ordered sequence-wise, i.e., rows and columns 0&#x2013;5, 6&#x2013;11 etc. are the same object in six different transformation states. Low values shown in purple correspond to similar activity patterns, i.e., a similar set of neurons represents the stimuli given by the combination of row and column, high values shown in yellow correspond to orthogonal activity vectors. <bold>(A)</bold> Baseline, an untrained network tested on the translationally moving digits dataset, for untrained versions of the other RDMs see <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S8</xref>. <bold>(B&#x2013;E)</bold> Networks trained and tested on one of the three datasets of ten rotating, translating (with and without noise) and scaling digits show a clear block-diagonal structure with low values for comparisons within sequences. <bold>(F)</bold> Network trained and tested on five rotation sequences of toy objects.</p>
</caption>
<graphic xlink:href="fncom-17-1207361-g004.tif"/>
</fig>
<p>Indicating invariance, RDMs of trained networks showed high similarity within sequences; for instance, digits &#x201C;1&#x201D;, &#x201C;2&#x201D;, etc. were each represented by highly similar activity patterns in area 3, irrespective of position. Representations of samples from different sequences, such as digit &#x201C;1&#x201D; and digit &#x201C;2&#x201D; at the same position, were distinct, as indicated by high dissimilarity in the matrix elements off the block diagonal. The same held true for the rotating and scaling digits (<xref rid="fig4" ref-type="fig">Figures 4B</xref>,<xref rid="fig4" ref-type="fig">E</xref>) as well as for the five rotating toy objects (<xref rid="fig4" ref-type="fig">Figure 4F</xref>). <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.9</xref> contains a proof-of-principle demonstration of learning multiple transformations in the same network. Noise (shown for translational motion in <xref rid="fig4" ref-type="fig">Figure 4D</xref> versus noiseless motion in <xref rid="fig4" ref-type="fig">Figure 4C</xref>) slightly degraded the clarity of the RDM but preserved the overall structure well. Additionally, the structure of the RDM proved quite tolerant to smaller weight initializations (<xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.7</xref>).</p>
<p>Invariance of representations was a consequence of learning from inputs that transform continuously in time, as evidenced by the RDMs of the untrained network, which showed very little structure (<xref rid="fig4" ref-type="fig">Figure 4A</xref>, note the different color scale, cosine distance below 0.001). Networks trained on the static frames of the sequences, in which activity was reset after each frame, also lacked a block-diagonal structure (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S2</xref>), illustrating the role of continuous motion in the training paradigm: it provides the temporal structure within which subsequent inputs can be assumed to be caused by the same object. Interestingly, we did not find an influence of sequence order on decoding accuracy (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S11</xref>), suggesting that temporal continuity of the input transformations (as shown by the comparison to the static training paradigm), but not spatial continuity, was necessary for successful representation learning. The Hebbian learning rule thus groups together consecutive inputs in a manner reminiscent of contrastive, self-supervised methods (<xref ref-type="bibr" rid="ref55">Van Den Oord et al., 2019</xref>; <xref ref-type="bibr" rid="ref28">Illing et al., 2021</xref>; <xref ref-type="bibr" rid="ref22">Halvagal and Zenke, 2022</xref>) that explicitly penalize dissimilarity in the loss function. Here, the higher-level representation from the previous timestep provides a target for the consecutive inputs, reminiscent of implementations of supervised learning with local learning rules (<xref ref-type="bibr" rid="ref40">Lee et al., 2015</xref>; <xref ref-type="bibr" rid="ref79">Whittington and Bogacz, 2017</xref>; <xref ref-type="bibr" rid="ref21">Haider et al., 2021</xref>).</p>
<p>Area 3-representations were informative about the identity of the sample moving in a sequence, as decodability improved with training (<xref rid="fig5" ref-type="fig">Figures 5A</xref>,<xref rid="fig5" ref-type="fig">B</xref>). In addition to its behavioral relevance, the decodability of representations quantifies the learned within-sequence invariance. A biologically plausible way to make high-level object representations available to downstream processes (such as action selection, <xref rid="fig1" ref-type="fig">Figure 1</xref>) is a layer of weighted synaptic connections, i.e., a linear decoder, that infers object identity. We simulated this through a linear mapping of the converged area 3-activity vectors, obtained as above, to 10 object identity-encoding neurons (digits &#x201C;0&#x201D;, &#x201C;1&#x201D;, &#x2026;, &#x201C;9&#x201D;). After fitting the decoding model to 2/3 of the representations, evaluation was conducted on the remaining 1/3 in a stratified k-fold manner (with <italic>k</italic>&#x2009;=&#x2009;3). After around five training epochs, area 3 representations were decoded more accurately than the raw input signal, whether the latter was assessed with a linear decoder or with k-means clustering (<xref rid="fig5" ref-type="fig">Figure 5A</xref>). The model also outperformed linear Slow Feature Analysis (SFA) (<xref ref-type="bibr" rid="ref81">Wiskott and Sejnowski, 2002</xref>) of the raw inputs (for details see <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.12</xref>). This was confirmed across almost all datasets used (<xref rid="tab1" ref-type="table">Table 1</xref>), and the advantage even grew when the transformation step size was increased, resulting in smaller overlap between consecutive images (&#x201C;fast&#x201D; conditions in <xref rid="tab1" ref-type="table">Table 1</xref>, shown in the first and third rows of <xref rid="fig3" ref-type="fig">Figure 3</xref>). Across the hierarchy, higher network areas developed more invariant representations than lower areas (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S9</xref>).</p>
<fig position="float" id="fig5"><label>Figure 5</label>
<caption>
<p>High-area representations encode object identity. Decoding accuracy of a linear decoder operating on area 3-representations of our predictive coding network trained in the continuous paradigm (PC-continuous), plotted across training epochs (iterations through the whole dataset). <bold>(A)</bold> Accuracy quickly rises above the performance of k-means clustering, SFA, and linear decoding directly on the input data (LD inputs) for the rotating toy objects dataset. Error bars for all figures are computed across four random seeds for the weight initializations. <bold>(B)</bold> Influence of continuous training: decoding accuracy in networks trained on continuous sequences (continuous lines) is increased compared to networks trained on isolated (static) frames of the sequences. <bold>(C)</bold> When increasing the size of the dataset from 10 to 200 sequences, the network of original size maintains a decoding accuracy far above chance level. Here, accuracy is significantly improved when the number of neurons in [area 1, area 2, area 3] is increased from [2,000, 500, 30] (green curve) to [4,000, 2,000, 90] neurons (blue curve). <bold>(D)</bold> Decoder accuracy on a previously unseen validation set of 200 randomly selected and transformed digits.</p>
</caption>
<graphic xlink:href="fncom-17-1207361-g005.tif"/>
</fig>
<table-wrap position="float" id="tab1"><label>Table 1</label>
<caption>
<p>Decoding accuracy (in percent) across datasets and models.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Dataset</th>
<th align="center" valign="top">PC-continuous, area 3</th>
<th align="center" valign="top">PC-static, area 3</th>
<th align="center" valign="top"><italic>k</italic>-means input</th>
<th align="center" valign="top">Linear decoding input</th>
<th align="center" valign="top">SFA input</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="bottom">Toy objects rotating</td>
<td align="center" valign="middle">
<bold>95.83&#x2009;&#x00B1;&#x2009;5.46</bold>
</td>
<td align="center" valign="middle">18.33&#x2009;&#x00B1;&#x2009;1.67</td>
<td align="center" valign="middle">53.33</td>
<td align="center" valign="middle">83.33</td>
<td align="center" valign="middle">73.33</td>
</tr>
<tr>
<td align="left" valign="bottom">Digits rotating</td>
<td align="center" valign="middle">
<bold>94.17&#x2009;&#x00B1;&#x2009;2.50</bold>
</td>
<td align="center" valign="middle">13.33&#x2009;&#x00B1;&#x2009;2.64</td>
<td align="center" valign="middle">48.33</td>
<td align="center" valign="middle">85.00</td>
<td align="center" valign="middle">70.00</td>
</tr>
<tr>
<td align="left" valign="bottom">Digits rotating, noise</td>
<td align="center" valign="middle">
<bold>90.00&#x2009;&#x00B1;&#x2009;5.14</bold>
</td>
<td align="center" valign="middle">15.83&#x2009;&#x00B1;&#x2009;1.86</td>
<td align="center" valign="middle">N/A</td>
<td align="center" valign="middle">N/A</td>
<td align="center" valign="middle">N/A</td>
</tr>
<tr>
<td align="left" valign="bottom">Digits scaling</td>
<td align="center" valign="middle">91.67&#x2009;&#x00B1;&#x2009;2.64</td>
<td align="center" valign="middle">75.83&#x2009;&#x00B1;&#x2009;1.44</td>
<td align="center" valign="middle">68.33</td>
<td align="center" valign="middle">
<bold>100.00</bold>
</td>
<td align="center" valign="middle">70.00</td>
</tr>
<tr>
<td align="left" valign="bottom">Digits scaling, noise</td>
<td align="center" valign="middle">
<bold>86.67&#x2009;&#x00B1;&#x2009;4.25</bold>
</td>
<td align="center" valign="middle">71.25&#x2009;&#x00B1;&#x2009;12.33</td>
<td align="center" valign="middle">N/A</td>
<td align="center" valign="middle">N/A</td>
<td align="center" valign="middle">N/A</td>
</tr>
<tr>
<td align="left" valign="bottom">Digits translation</td>
<td align="center" valign="middle">
<bold>83.75&#x2009;&#x00B1;&#x2009;6.50</bold>
</td>
<td align="center" valign="middle">40.42&#x2009;&#x00B1;&#x2009;4.31</td>
<td align="center" valign="middle">60.00</td>
<td align="center" valign="middle">75.00</td>
<td align="center" valign="middle">70.00</td>
</tr>
<tr>
<td align="left" valign="bottom">Digits translation, noise</td>
<td align="center" valign="middle">
<bold>93.33&#x2009;&#x00B1;&#x2009;4.08</bold>
</td>
<td align="center" valign="middle">47.92&#x2009;&#x00B1;&#x2009;6.60</td>
<td align="center" valign="middle">N/A</td>
<td align="center" valign="middle">N/A</td>
<td align="center" valign="middle">N/A</td>
</tr>
<tr>
<td align="left" valign="bottom">Digits fast rotation</td>
<td align="center" valign="middle">
<bold>94.17&#x2009;&#x00B1;&#x2009;2.50</bold>
</td>
<td align="center" valign="middle">13.33&#x2009;&#x00B1;&#x2009;2.64</td>
<td align="center" valign="middle">31.67</td>
<td align="center" valign="middle">40.00</td>
<td align="center" valign="middle">70.00</td>
</tr>
<tr>
<td align="left" valign="bottom">Digits fast translation</td>
<td align="center" valign="middle">
<bold>94.58&#x2009;&#x00B1;&#x2009;3.80</bold>
</td>
<td align="center" valign="middle">12.92&#x2009;&#x00B1;&#x2009;0.72</td>
<td align="center" valign="middle">38.33</td>
<td align="center" valign="middle">25.00</td>
<td align="center" valign="middle">70.00</td>
</tr>
<tr>
<td align="left" valign="bottom">Digits fast translation, noise</td>
<td align="center" valign="middle">
<bold>87.50&#x2009;&#x00B1;&#x2009;4.49</bold>
</td>
<td align="center" valign="middle">14.17&#x2009;&#x00B1;&#x2009;3.00</td>
<td align="center" valign="middle">N/A</td>
<td align="center" valign="middle">N/A</td>
<td align="center" valign="middle">N/A</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The best performing decoder per dataset is marked in bold. The left column is the predictive coding network trained in the continuous manner put forward in this paper.</p>
</table-wrap-foot>
</table-wrap>
<p>Decodability of network representations was maintained when the dataset size was significantly increased. We tested this by training networks on up to 20 random digits per digit class (totaling 200 sequences of the fast translations). As shown in <xref rid="fig5" ref-type="fig">Figure 5C</xref>, the network maintained above 60% linear decoding accuracy of digit class, while an enlarged version of the network (blue curve in <xref rid="fig5" ref-type="fig">Figure 5C</xref>) further improved on this. On the other hand, increasing the dataset size negatively affected the invariance structure of the RDMs (<xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.8</xref>), putatively due to the limitations discussed in Section 4.4.</p>
<p>Lastly, generalization performance for the remaining MNIST dataset was measured by decoding accuracy on previously unseen digits. Here, accuracy was above 60% when more than 100 training sequences were used (<xref rid="fig5" ref-type="fig">Figure 5D</xref>). In the enlarged network, decoding accuracy rose above 75% (the blue line in <xref rid="fig5" ref-type="fig">Figure 5D</xref>), confirming the network&#x2019;s capacity to generalize. The small standard deviation between randomly initialized runs indicates the representativeness of the chosen validation subset.</p>
<p>The continuous training paradigm improved decoding performance in comparison to networks trained on static inputs. There, decoding performance dropped from the initial value and was consistently more than 20 percentage points worse than in the continuously trained network (<xref rid="fig5" ref-type="fig">Figure 5B</xref> and <xref rid="tab1" ref-type="table">Table 1</xref>). This can partially be explained by the learning of more sample-specific and thus less invariant representations in the static training paradigm, where activity was not carried over from one image to the next (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S2</xref>).</p>
</sec>
<sec id="sec9"><label>3.2.</label>
<title>Temporal stability of representations</title>
<p>Without explicitly integrated constraints, the network developed a hierarchy of timescales in which representations in higher network areas decorrelated more slowly over inference time than in lower areas. We quantified this by measuring the autocorrelation <italic>R</italic> during presentation of rotating digits. It is defined as</p>
<disp-formula id="E6">
<mml:math id="M27">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>z</mml:mi>
<mml:mi mathvariant="normal">,</mml:mi>
<mml:mi>&#x0394;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x0394;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x0394;</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mi>z</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x00B7;</mml:mo>
<mml:mi>z</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>&#x0394;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mspace width="thickmathspace"/>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>6</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math></disp-formula>
<p>where &#x0394; is the time lag, measured in inference steps, between the points to be compared; <italic>T</italic> is the duration of each sequence (6,000 inference steps); <italic>N</italic> is the number of neurons in the subpopulation; and <bold>z</bold>(t) is the activity vector of the subpopulation (averaged across 10 inference steps). High values indicate similar, non-zero activities and thus high temporal stability. The resulting autocorrelation curves for time lags between 0 and the length of an individual sequence are shown in <xref rid="fig6" ref-type="fig">Figures 6A</xref>,<xref rid="fig6" ref-type="fig">B</xref>, averaged across the 10 rotation sequences. From these curves, decay constants were inferred by measuring the time until decay to 1/e of the initial value. If that value was not reached by the end of the sequence, we extrapolated linearly through the values at &#x0394;&#x2009;=&#x2009;0 and &#x0394;&#x2009;=&#x2009;6,000 time steps. Additionally, we varied the stimulus timescale by dividing the number of inference steps per frame by the rotation speed. The resulting decay constants showed a clear and robust hierarchy across network areas, as well as a positive correlation with the stimulus timescale (<xref rid="fig6" ref-type="fig">Figure 6C</xref>). A significant difference was found between representation neurons in area 3 and area 1 (mean difference at speed one: 7,377 time steps, <italic>p</italic>&#x2009;=&#x2009;1.61e-2). <italic>p</italic>-values were determined by a Games-Howell post-hoc test (an extension of the familiar Tukey post-hoc test that does not assume equal variances) following rejection of the null hypothesis across the six populations in a Welch&#x2019;s ANOVA, as described in more detail in <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.16</xref>.
A smaller but still significant difference was observed between representation neurons in area 2 and area 1 (4,996 time steps, <italic>p</italic>&#x2009;=&#x2009;9.68e-4). In error-coding neurons, the hierarchy was less pronounced, but area 0 and area 2 nonetheless differed significantly (<italic>p</italic>&#x2009;=&#x2009;9.50e-5). Comparison to a statically trained network with the same architecture, which failed to develop a temporal hierarchy in representations (<xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.4</xref>), showed that the temporal hierarchy was not built into the model architecture but is instead an emergent property of the model under the continuous training paradigm. This is underlined by the fact that the same inference rate was used in all network areas. The hierarchy in representational dynamics, as well as its positive correlation with the stimulus dynamics, agrees with experimental findings in rat visual cortex (<xref ref-type="bibr" rid="ref59">Piasini et al., 2021</xref>). There, the authors computed neuronal timescales from the decay of autocorrelation in a similar manner and found more stable activity patterns in higher areas of rat visual cortex (<xref rid="fig6" ref-type="fig">Figure 6D</xref>).</p>
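<p>Equation 6 and the extraction of a decay constant can be sketched as follows (an illustrative simplification with our own function names; it omits the linear extrapolation used in the paper for curves that do not decay within the sequence):</p>

```python
import numpy as np

def autocorrelation(z, lag):
    """Equation 6: R(z, lag), averaged over neurons and valid time points.

    z: (T, N) array of binned activity for N neurons over T steps.
    """
    T, N = z.shape
    return float(np.sum(z[:T - lag] * z[lag:]) / (N * (T - lag)))

def decay_constant(z, lags):
    """First lag at which the autocorrelation falls to 1/e of its
    zero-lag value; returns None if that never happens within the
    sequence (the paper then extrapolates linearly instead)."""
    threshold = autocorrelation(z, 0) / np.e
    for lag in lags:
        if autocorrelation(z, lag) <= threshold:
            return lag
    return None
```

Applied per subpopulation (representation or error neurons of one area), this yields the decay constants compared across areas in Figure 6C.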
<fig position="float" id="fig6"><label>Figure 6</label>
<caption>
<p>The network develops a hierarchy of timescales comparable to experimental data from rodent visual cortex. <bold>(A)</bold> Temporal decay of activity autocorrelation for the representation neuron subpopulations (RNs). <bold>(B)</bold> Decay of autocorrelation for the error-coding subpopulations (ENs). <bold>(C)</bold> Inferred decay constants per subpopulation across stimulus timescales. Error bars denote one standard deviation across four randomly initialized networks (<italic>p</italic>-values are given in the main text). Increasing the speed by a factor of two corresponds to 50% fewer inference steps per frame. <bold>(D)</bold> Comparison to experimental evidence from rat visual cortex. Hierarchical ordering of intrinsic timescales, measured as the decay constants of activity autocorrelation, from V1 across lateromedial (LM), laterointermediate (LI) to laterolateral (LL) visual areas, adapted from <xref ref-type="bibr" rid="ref59">Piasini et al. (2021)</xref> under the license <ext-link xlink:href="https://creativecommons.org/licenses/by/4.0" ext-link-type="uri">https://creativecommons.org/licenses/by/4.0</ext-link>. &#x002A;&#x002A;&#x002A;<italic>p</italic>&#x2009;=&#x2009;5e-7, 1e-13, 2e-14, respectively for LM, LI, and LL.</p>
</caption>
<graphic xlink:href="fncom-17-1207361-g006.tif"/>
</fig>
<p>The decay speed of autocorrelation also allowed us to differentiate between quickly decorrelating error neurons and more persistent representation neurons in higher network areas. Error-coding neurons in area 2 showed a shorter activity timescale than representation neurons within the same area. The difference equaled 4,991 time steps (<italic>p</italic>&#x2009;=&#x2009;9.88e-4), compared to only 211 time steps in the statically trained network (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S3</xref>). In this context, <xref ref-type="bibr" rid="ref59">Piasini et al. (2021)</xref> discussed the following scenario: when perceiving a continuously moving object, its identity is predictable over time. One could therefore expect a diminishing firing rate in neurons representing this object, in contrast to their evidence of larger timescales in higher visual areas. Our results reconcile the framework of predictive coding with these empirical observations by differentiating between quickly decorrelating error signals and persistent representations. Remarkably, this prediction about the consequences of predictive coding circuitry for the activity autocorrelation timescales of error and representation neurons has, to our knowledge, not been proposed before. Here it is important to mention the extensive literature on the analysis of different frequency bands in cortical feedforward and feedback signal propagation [summarized in <xref ref-type="bibr" rid="ref5">Bastos et al. (2012)</xref> from a predictive coding perspective]. These sources did not, however, address temporal stability, and the two concepts are not easily connected: it is, for instance, conceivable to have low-frequency signals that quickly decorrelate, or high-frequency signals that are maintained over time.</p>
</sec>
<sec id="sec10"><label>3.3.</label>
<title>Generative capacity</title>
<p>The network learned a generative model of the visual inputs, as shown by successful input reconstruction through the network&#x2019;s top-down pathway (<xref rid="fig7" ref-type="fig">Figure 7</xref>). Since areas further up in the ventral processing stream of the cerebral cortex are thought to encode object identity, it is interesting to ask to what extent they are able to encode fully detailed scene information, or whether they contain only reduced information (such as object identity). To examine the functioning of this reverse pathway under the continuous transformation training paradigm, we investigated the representational content in each area by reconstructing sensory inputs in a top-down manner. After training, a static input image was presented until network activity converged (<xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.3</xref>). Then, the input was blanked out and the inferred activity pattern (representation) from a selected area was propagated back down to the input neurons via the top-down weights. Area by area, activity <inline-formula>
<mml:math id="M31">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> of representation neurons was set by the descending predictions <inline-formula>
<mml:math id="M32">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x2322;</mml:mo>
</mml:mover>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (see <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.5</xref> for details).</p>
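The top-down reconstruction procedure, clamping a latent representation in a selected area and letting the descending predictions set the activity of each area below until the input neurons are reached, can be sketched as follows. The weight-list layout, the activation function, and all names are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
import numpy as np

def reconstruct_input(y_top, top_down_weights, act=np.tanh):
    """Propagate a latent representation down to the input area.

    top_down_weights[l] maps the activity of area l+1 to the prediction
    for area l, so top_down_weights[0] predicts the input area.
    Activation choice and layout are assumptions for illustration.
    """
    y = y_top
    for W in reversed(top_down_weights):
        # The descending prediction sets the representation activity
        # of the area below, one area at a time.
        y = act(W @ y)
    return y  # predicted activity pattern of the input neurons
```

Reconstructing from a higher area simply uses a longer suffix of the weight list, so the trade-off between invariance and reconstruction detail can be probed area by area in this scheme.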
<fig position="float" id="fig7"><label>Figure 7</label>
<caption>
<p>Learning of a generative model. <bold>(A)</bold> Illustration of top-down reconstructions in the model with invariant representations. The first column depicts original input images from different datasets. Columns two to four show the activity pattern in the input area generated by propagating latent representations from different network areas to the input layer in a top-down manner. The symbols at the beginning of each row indicate the underlying transformation: translation, rotation, scaling, and rotation, respectively (as in <xref rid="fig3" ref-type="fig">Figure 3</xref>). In early network areas, representations inferred from sensory inputs carried enough information to reconstruct the input image once it was removed. Reconstructions from higher areas were less accurate. <bold>(B)</bold> Mean squared reconstruction errors (MSE), comparing the original input to the reconstructions at the pixel level. The vanishingly small vertical bars indicate the standard deviation across four random seeds.</p>
</caption>
<graphic xlink:href="fncom-17-1207361-g007.tif"/>
</fig>
<p>As shown in <xref rid="fig7" ref-type="fig">Figure 7A</xref>, the accuracy of reconstruction strongly depended on the area it was initiated from. While predictions from latent representations in area <italic>1</italic> gave rise to reconstructions that resembled the original inputs and achieved low reconstruction errors (<xref rid="fig7" ref-type="fig">Figure 7B</xref>), higher areas were less accurate. From there, reconstructions were either blurry or showed the stimulus in a different position, rotational angle, or scale than presented prior to reconstruction (e.g., the &#x201C;0&#x201D; from area 3 in the second row of <xref rid="fig7" ref-type="fig">Figure 7</xref>). This follows logically from the invariance achieved in these higher areas, from where a single generalized representation cannot suffice to regenerate many specific images. Despite this limitation in obtaining precise reconstructions, which resulted from training on extended sequences instead of individual frames, area 1 representation neurons in all networks contained enough information to regenerate the inputs, confirming that the network had learned a generative model of the dataset.</p>
</sec>
<sec id="sec11"><label>3.4.</label>
<title>Reconstructing objects from occluded scenes</title>
<p>The generative capacity of the network&#x2019;s top-down pathway was further confirmed by its ability to reconstruct whole objects from partially occluded sequences, as shown in <xref rid="fig8" ref-type="fig">Figure 8</xref>. A behaviorally relevant use of a generative pathway is the ability to fill in missing information, for instance when guessing what the whole scene may look like and planning an action toward occluded parts of an object. To investigate filling-in in the model, we presented occluded test sequences to the network trained on laterally moving digits (the same network as before). After inference on each frame of the test dataset, the predictions sent down to the lowest network area were normalized and plotted retinotopically in <xref rid="fig8" ref-type="fig">Figure 8</xref>. Details on the reconstruction process can be found in <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.6</xref>. Indeed, predictions sent toward the lowest area carried information about the occluded parts (<xref rid="fig8" ref-type="fig">Figure 8A</xref>).</p>
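The pixel-level error measure used to quantify filling-in, a mean squared error restricted to the region covered by the occluder, could be computed along these lines. The function name and the boolean-mask convention are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def occluded_mse(original, reconstruction, occluder_mask):
    """MSE between an original frame and its top-down reconstruction,
    evaluated only at pixels covered by the occluder.

    occluder_mask: boolean array, True where the occluder hid the input
    (an illustrative convention for this sketch).
    """
    diff = (original - reconstruction)[occluder_mask]
    return float(np.mean(diff ** 2))
```

Restricting the average to the occluded pixels isolates the generative contribution of the top-down pathway, since the network receives no bottom-up evidence in that region.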
<fig position="float" id="fig8"><label>Figure 8</label>
<caption>
<p>Reconstruction of partially occluded sequences. <bold>(A)</bold> First and third rows: the input sequences shown to the PC-continuous network, with the occluder outlined in red. Rows two and four: images arising from top-down predictions sent to area 0, which carried information about occluded areas of the input. <bold>(B)</bold> Comparison of a continuously trained predictive coding network to a purely feedforward network (no reconstruction) and a predictive coding network trained on static images. Shown is the mean squared reconstruction error in the occluded part, averaged across all ten sequences, rising as the occluded field becomes larger (plotted over the first to last image of the occlusion sequence). The vanishingly small error bars indicate the standard deviation across four network initializations.</p>
</caption>
<graphic xlink:href="fncom-17-1207361-g008.tif"/>
</fig>
<p>As the input deteriorated, predictions also visibly degraded, resulting in a rising MSE (<xref rid="fig8" ref-type="fig">Figure 8B</xref>). The continuously trained network consistently achieved slightly but significantly better reconstructions than its counterpart trained on static images (for a more detailed analysis, see <xref ref-type="supplementary-material" rid="SM1">Supplementary material 1.6</xref>). An independent t-test resulted in <italic>p</italic>&#x2009;&#x003C;&#x2009;6e-4 for all sequence frames except the first, unoccluded frame, where the difference was non-significant. That the difference was small can be explained by two opposing mechanisms: on the one hand, memorization of specific frames putatively aids reconstruction in the statically trained network (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S2</xref>); on the other hand, the availability of invariant object identity from the temporal context can be expected to improve reconstruction in the continuously trained network. Overall, the availability of top-down information in occluded fields of network area 0 is comparable to the presence of concealed scene information observed in early visual areas of humans (<xref ref-type="bibr" rid="ref69">Smith and Muckli, 2010</xref>), which cannot be explained by purely feedforward models of perception. Unlike auto-associative models of sequential pattern completion (<xref ref-type="bibr" rid="ref27">Herz et al., 1989</xref>), our network forms hierarchical representations, comparable to <xref ref-type="bibr" rid="ref28">Illing et al. (2021)</xref>.</p>
</sec>
</sec>
<sec sec-type="discussions" id="sec12"><label>4.</label>
<title>Discussion</title>
<sec id="sec13"><label>4.1.</label>
<title>Summary of results</title>
<p>We have shown how networks that minimize local prediction errors learn object representations invariant to the precise viewing conditions in higher network areas (<xref rid="fig4" ref-type="fig">Figure 4</xref>), while acquiring a generative model in which especially lower areas are able to reconstruct specific inputs (<xref rid="fig7" ref-type="fig">Figures 7</xref>, <xref rid="fig8" ref-type="fig">8</xref>). The learned high-level representations distinguish between different objects, as linear decoding accuracy of object identity was high (<xref rid="fig5" ref-type="fig">Figure 5</xref>). Comparison to considerably worse decoding performance in networks trained on static images underlined the importance of temporally continuous transformation for the learning process (<xref rid="fig5" ref-type="fig">Figure 5A</xref>), noting that spatially ordered sequences (as in, e.g., visual object motion) are not strictly necessary (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S11</xref>). Focusing on the implications for neural dynamics, learning from temporally continuous transformations such as continuous motion led to a hierarchy of timescales in representation neurons that showed more slowly changing activity in higher areas, where they notably differed from the more quickly varying error neurons (<xref rid="fig6" ref-type="fig">Figure 6</xref>).</p>
</sec>
<sec id="sec14"><label>4.2.</label>
<title>A generative model to learn invariant representations</title>
<p>Without the need for explicit data labels, the model developed meaningful, decodable representations purely by Hebbian learning. Linking slowly varying predictions in higher areas to more quickly changing inputs in lower areas led to the emergence of temporally stable representations without an explicit constraint for slowness, as used for example in <xref ref-type="bibr" rid="ref81">Wiskott and Sejnowski (2002)</xref>. At the same time, the model acquired generative capacity that enables reconstruction of partially occluded stimuli, in line with retinotopic and content-carrying feedback connections to V1 (<xref ref-type="bibr" rid="ref69">Smith and Muckli, 2010</xref>; <xref ref-type="bibr" rid="ref47">Marques et al., 2018</xref>); see also <xref ref-type="bibr" rid="ref58">Pennartz et al. (2019)</xref> for a review of predictive feedback mechanisms. Other neuron-level models of invariance learning (<xref ref-type="bibr" rid="ref37">LeCun et al., 1989</xref>; <xref ref-type="bibr" rid="ref15">F&#x00F6;ldi&#x00E1;k, 1991</xref>; <xref ref-type="bibr" rid="ref63">Rolls, 2012</xref>; <xref ref-type="bibr" rid="ref22">Halvagal and Zenke, 2022</xref>) account neither for such feedback nor for the experimentally observed explicit encoding of mismatch between prediction and observation (<xref ref-type="bibr" rid="ref83">Zmarz and Keller, 2016</xref>; <xref ref-type="bibr" rid="ref41">Leinweber et al., 2017</xref>), and they used considerably more complex learning rules requiring a larger set of assumptions (<xref ref-type="bibr" rid="ref22">Halvagal and Zenke, 2022</xref>). Conversely, auto-associative Hopfield-type models that learn dynamic pattern completion from local learning rules (<xref ref-type="bibr" rid="ref27">Herz et al., 1989</xref>; <xref ref-type="bibr" rid="ref7">Brea et al., 2013</xref>) do not learn hierarchical invariant representations like the proposed model does. By solving the task of invariance learning in agreement with the generativity of sensory cortical systems, the model strengthens the claim that predictive coding circuits are fundamental building blocks of the brain&#x2019;s perceptual pathways.</p>
</sec>
<sec id="sec15"><label>4.3.</label>
<title>Related work</title>
<p>We argue that the model generalizes predictive coding to moving stimuli in a biologically more plausible way than other approaches (<xref ref-type="bibr" rid="ref44">Lotter et al., 2016</xref>, <xref ref-type="bibr" rid="ref45">2020</xref>; <xref ref-type="bibr" rid="ref1">Ali et al., 2021</xref>) that rely on error backpropagation, which is non-local (<xref ref-type="bibr" rid="ref64">Rumelhart et al., 1985</xref>), or on the equally non-local backpropagation through time (BPTT, <xref ref-type="bibr" rid="ref1">Ali et al., 2021</xref>). BPTT achieves global gradient descent and thus generally offers performance benefits over Hebbian learning rules. However, it is not straightforward to combine BPTT with invariance learning from temporal structure, and direct comparison is thus difficult. As our network is based on the principles developed by <xref ref-type="bibr" rid="ref61">Rao and Ballard (1999)</xref>, its basic neural circuitry is shared with other implementations of predictive coding with local learning rules derived from it, such as <xref ref-type="bibr" rid="ref79">Whittington and Bogacz (2017)</xref> and <xref ref-type="bibr" rid="ref13">Dora et al. (2021)</xref>. In terms of the scope of the current model, the focus on representational invariance and on the consequences of training on dynamic inputs clearly distinguishes the present approach from <xref ref-type="bibr" rid="ref13">Dora et al. (2021)</xref>. Mechanistic differences are biologically motivated, such as the omission of a gating term used by <xref ref-type="bibr" rid="ref13">Dora et al. (2021)</xref> that depended on the partial derivative with respect to presynaptic neuronal activity. This minimizes the set of necessary assumptions compared to other implementations that require such a term in inference (<xref ref-type="bibr" rid="ref79">Whittington and Bogacz, 2017</xref>; <xref ref-type="bibr" rid="ref13">Dora et al., 2021</xref>) and/or learning (<xref ref-type="bibr" rid="ref13">Dora et al., 2021</xref>). Unlike <xref ref-type="bibr" rid="ref13">Dora et al. (2021)</xref>, the present implementation also does not require weight regularization that depends on information not readily available at the synapses.</p>
</sec>
<sec id="sec16"><label>4.4.</label>
<title>Limitations in performance</title>
<p>Although sufficient for learning invariant representations on the datasets considered here, the fully connected architecture we used can be expected to limit the degree of representational invariance (as visible, e.g., in the structure of the RDMs) for more complex datasets. However, it has been shown that the lack of inductive bias in fully connected models can be compensated for by training on larger amounts of data (<xref ref-type="bibr" rid="ref3">Bachmann et al., 2023</xref>). Here, the self-supervised nature of our model is an advantage, as it does not require labeled data. Another interesting extension of the model would be to investigate other common types of transformation, such as rotation of three-dimensional objects into the plane. Based on the model&#x2019;s ability to deal with the scaling transformation and 3D toy objects, we do not expect any fundamental obstacle: the temporal structure of the transformation is what matters, not the way it affects the image.</p>
<p>Fully connected areas may also restrict performance on out-of-sample testing. Here, a combination of receptive field-like local filters with a pooling mechanism (<xref ref-type="bibr" rid="ref62">Riesenhuber and Poggio, 1999</xref>) may help the network become tolerant to the varying configurations of individual features that comprise objects of the same class. Using a weakly supervised paradigm could improve decoding accuracy even further: it has been shown that, under constraints whose discussion is beyond the scope of this paper, inversely connected predictive coding networks can perform exact backpropagation when the highest-layer activities are clamped in a supervised manner (<xref ref-type="bibr" rid="ref79">Whittington and Bogacz, 2017</xref>; <xref ref-type="bibr" rid="ref66">Salvatori et al., 2021</xref>).</p>
<p>Input reconstructions from higher network areas degraded as representations became more invariant. This is a direct consequence of Equation 7: each element from the set of area 3 representations casts a unique prediction to the area below. Consequently, multiple different (i.e., not invariant) area 3 patterns would be necessary to fully reconstruct a sequence of inputs. Thus, either the invariance in area 3 or the faithfulness of the reconstruction suffers. Nevertheless, the network as a whole appeared to strike a good balance in the trade-off between memorizing information to reconstruct individual samples in lower areas (hence the better reconstruction accuracy from area 1 in <xref rid="fig7" ref-type="fig">Figure 7</xref>) and abstracting over the sequence, with area 3 representing object identity invariantly (<xref rid="fig5" ref-type="fig">Figure 5</xref>), fitting theoretical descriptions of multilevel perception (ch. 9 in <xref ref-type="bibr" rid="ref57">Pennartz, 2015</xref>). The more detailed and sample-specific information may provide useful input to the action-oriented dorsal processing stream (<xref ref-type="bibr" rid="ref18">Goodale and Milner, 1992</xref>), whereas the hierarchy of the ventral visual cortex extracts object identity and relevant concepts (<xref ref-type="bibr" rid="ref50">Mishkin et al., 1983</xref>).</p>
</sec>
<sec id="sec17"><label>4.5.</label>
<title>Hypotheses on the neural circuitry of predictive coding</title>
<p>The model captures neural response properties in early and high-level areas of the visual cortical hierarchy: retinotopic (<xref ref-type="bibr" rid="ref47">Marques et al., 2018</xref>) and information-carrying (<xref ref-type="bibr" rid="ref69">Smith and Muckli, 2010</xref>) feedback to early visual areas (cf. <xref rid="fig7" ref-type="fig">Figures 7</xref>, <xref rid="fig8" ref-type="fig">8</xref>), as well as invariant (<xref ref-type="bibr" rid="ref43">Logothetis et al., 1995</xref>; <xref ref-type="bibr" rid="ref16">Freiwald and Tsao, 2010</xref>) and object-specific representations (cf. <xref rid="fig4" ref-type="fig">Figure 4</xref>) in the temporal lobe (<xref ref-type="bibr" rid="ref12">Desimone et al., 1984</xref>; <xref ref-type="bibr" rid="ref24">Haxby et al., 2001</xref>; <xref ref-type="bibr" rid="ref60">Quiroga et al., 2005</xref>). While there is ample evidence for a hierarchy of timescales in the visual processing streams of humans (<xref ref-type="bibr" rid="ref23">Hasson et al., 2008</xref>), primates (<xref ref-type="bibr" rid="ref52">Murray et al., 2014</xref>) and rodents (<xref ref-type="bibr" rid="ref59">Piasini et al., 2021</xref>), with larger temporal stability in higher areas, the compatibility of this evidence with deep predictive coding is debated (<xref ref-type="bibr" rid="ref59">Piasini et al., 2021</xref>). Our simulation result of increasingly long timescales further up the network hierarchy may help to reconcile predictive coding with the experimental evidence. Notably, a similar result was recently found in another predictive coding model, albeit one with only two layers and without explicit error representations (<xref ref-type="bibr" rid="ref30">Jiang and Rao, 2022</xref>). Compared to the emergence of temporal hierarchies purely as a result of dynamics in spiking neurons (<xref ref-type="bibr" rid="ref76">van Meegen and van Albada, 2021</xref>) or in large-scale models (<xref ref-type="bibr" rid="ref10">Chaudhuri et al., 2015</xref>; <xref ref-type="bibr" rid="ref49">Mejias and Wang, 2022</xref>), our model provides a complementary account, postulating that the temporal hierarchy develops as a consequence of a functional computation: learning invariance by local error minimization.</p>
<p>What novel insights can be extracted about the brain&#x2019;s putative use of predictive algorithms? Theories of predictive coding range from limiting it to a few functions, such as subtraction of corollary discharges to compensate for self-motion (<xref ref-type="bibr" rid="ref41">Leinweber et al., 2017</xref>) and input reconstruction (<xref ref-type="bibr" rid="ref61">Rao and Ballard, 1999</xref>), to claiming extended versions of it, namely the free energy principle, as the most important organizational principle of the brain (<xref ref-type="bibr" rid="ref17">Friston, 2010</xref>). PC models provide a critical step toward making theories of perception and imagery quantitative and falsifiable, as well as toward guiding experimental research (<xref ref-type="bibr" rid="ref58">Pennartz et al., 2019</xref>). Based on the simulation results, error neurons in higher visual areas operate on a much shorter activity timescale than their representational counterparts. This comparison of distinct subpopulations may provide an additional angle for measuring neural correlates of prediction errors [for a review, see <xref ref-type="bibr" rid="ref78">Walsh et al., 2020</xref>], as representation neuron responses have barely been considered in experimental work so far. In combination with work on the encoding of errors in superficial, and representations in deep, cortical layers (<xref ref-type="bibr" rid="ref5">Bastos et al., 2012</xref>; <xref ref-type="bibr" rid="ref32">Keller and Mrsic-Flogel, 2018</xref>; <xref ref-type="bibr" rid="ref58">Pennartz et al., 2019</xref>; <xref ref-type="bibr" rid="ref31">Jordan and Keller, 2020</xref>), area- and layer-wise recordings of characteristic timescales could lead to a better understanding of the cortical microcircuits underlying predictive coding. 
Layer-wise investigations also show distinct patterns of feedforward and feedback connectivity (<xref ref-type="bibr" rid="ref46">Markov et al., 2014</xref>) and information processing (<xref ref-type="bibr" rid="ref53">Oude Lohuis et al., 2022</xref>). Only with knowledge of these microcircuits can models of finer granularity be constructed.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="sec18"><label>5.</label>
<title>Conclusion</title>
<p>Predictive coding is a theory with great explanatory power, but with unclear scope. Here, we go beyond the original scope of pure input-reconstruction and find that predictive coding networks can additionally solve an important computational problem of vision. Our results are in line with experimental data from multiple species, strengthening predictive coding as a fundamental theory of mammalian perception.</p>
</sec>
<sec sec-type="data-availability" id="sec19">
<title>Data availability statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: <ext-link xlink:href="https://github.com/matthias-brucklacher/PCInvariance" ext-link-type="uri">https://github.com/matthias-brucklacher/PCInvariance</ext-link>.</p>
</sec>
<sec id="sec20">
<title>Author contributions</title>
<p>MB implemented the model, conducted the analyses, and wrote the first draft of the manuscript. All authors contributed to the conception and design of the study throughout the project, manuscript revision, read, and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="sec22">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec100" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack>
<p>The authors would like to thank Shirin Dora and Kwangjun Lee for constructive discussions. This project has received funding from the European Union&#x2019;s Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 945539 (Human Brain Project SGA3; to CP and SB). We acknowledge the use of Fenix Infrastructure resources, which are partially funded from the European Union&#x2019;s Horizon 2020 research and innovation program through the ICEI project under the grant agreement No. 800858. A previous version of this manuscript (<xref ref-type="bibr" rid="ref8">Brucklacher et al., 2022</xref>) can be found as a preprint at <ext-link xlink:href="https://www.biorxiv.org/content/10.1101/2022.07.18.500392v3" ext-link-type="uri">https://www.biorxiv.org/content/10.1101/2022.07.18.500392v3</ext-link>.</p>
</ack>
<sec sec-type="supplementary-material" id="sec23">
<title>Supplementary material</title>
<p>The Supplementary material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fncom.2023.1207361/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fncom.2023.1207361/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ali</surname> <given-names>A.</given-names></name> <name><surname>Ahmad</surname> <given-names>N.</given-names></name> <name><surname>de Groot</surname> <given-names>E.</given-names></name> <name><surname>van Gerven</surname> <given-names>M.A.J.</given-names></name> <name><surname>Kietzmann</surname> <given-names>T.C.</given-names></name></person-group>, (<year>2021</year>). <source>Predictive coding is a consequence of energy efficiency in recurrent neural networks (SSRN scholarly paper no. 3976481)</source>. <publisher-name>Social Science Research Network</publisher-name>, <publisher-loc>Rochester, NY</publisher-loc>.</citation></ref>
<ref id="ref2">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Alonso</surname> <given-names>N.</given-names></name> <name><surname>Neftci</surname> <given-names>E.</given-names></name></person-group>, (<year>2021</year>). <article-title>Tightening the biological constraints on gradient-based predictive coding</article-title>, in: <conf-name>International conference on neuromorphic systems 2021. Presented at the ICONS 2021</conf-name>, <publisher-name>ACM</publisher-name>, <publisher-loc>Knoxville, TN, USA</publisher-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>9</lpage>.</citation></ref>
<ref id="ref3">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Bachmann</surname> <given-names>G.</given-names></name> <name><surname>Anagnostidis</surname> <given-names>S.</given-names></name> <name><surname>Hofmann</surname> <given-names>T.</given-names></name></person-group> (<year>2023</year>). <article-title>Scaling MLPs: a tale of inductive bias</article-title>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2306.13575</pub-id>,</citation></ref>
<ref id="ref4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bartels</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Visual perception: early visual cortex fills in the gaps</article-title>. <source>Curr. Biol.</source> <volume>24</volume>, <fpage>R600</fpage>&#x2013;<lpage>R602</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cub.2014.05.055</pub-id>, PMID: <pub-id pub-id-type="pmid">25004362</pub-id></citation></ref>
<ref id="ref5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bastos</surname> <given-names>A. M.</given-names></name> <name><surname>Usrey</surname> <given-names>W. M.</given-names></name> <name><surname>Adams</surname> <given-names>R. A.</given-names></name> <name><surname>Mangun</surname> <given-names>G. R.</given-names></name> <name><surname>Fries</surname> <given-names>P.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2012</year>). <article-title>Canonical microcircuits for predictive coding</article-title>. <source>Neuron</source> <volume>76</volume>, <fpage>695</fpage>&#x2013;<lpage>711</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuron.2012.10.038</pub-id>, PMID: <pub-id pub-id-type="pmid">23177956</pub-id></citation></ref>
<ref id="ref6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bellet</surname> <given-names>M. E.</given-names></name> <name><surname>Gay</surname> <given-names>M.</given-names></name> <name><surname>Bellet</surname> <given-names>J.</given-names></name> <name><surname>Jarraya</surname> <given-names>B.</given-names></name> <name><surname>Dehaene</surname> <given-names>S.</given-names></name> <name><surname>van Kerkoerle</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Prefrontal neural ensembles encode an internal model of visual sequences and their violations</article-title>. doi: <pub-id pub-id-type="doi">10.1101/2021.10.04.463064</pub-id></citation></ref>
<ref id="ref7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brea</surname> <given-names>J.</given-names></name> <name><surname>Senn</surname> <given-names>W.</given-names></name> <name><surname>Pfister</surname> <given-names>J.-P.</given-names></name></person-group> (<year>2013</year>). <article-title>Matching recall and storage in sequence learning with spiking neural networks</article-title>. <source>J. Neurosci.</source> <volume>33</volume>, <fpage>9565</fpage>&#x2013;<lpage>9575</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.4098-12.2013</pub-id>, PMID: <pub-id pub-id-type="pmid">23739954</pub-id></citation></ref>
<ref id="ref8">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Brucklacher</surname> <given-names>M.</given-names></name> <name><surname>Bohte</surname> <given-names>S. M.</given-names></name> <name><surname>Mejias</surname> <given-names>J. F.</given-names></name> <name><surname>Pennartz</surname> <given-names>C. M. A.</given-names></name></person-group> (<year>2022</year>). <article-title>Local minimization of prediction errors drives learning of invariant object representations in a generative network model of visual perception</article-title>. doi: <pub-id pub-id-type="doi">10.1101/2022.07.18.500392</pub-id></citation></ref>
<ref id="ref9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Butz</surname> <given-names>M. V.</given-names></name> <name><surname>Kutter</surname> <given-names>E. F.</given-names></name></person-group> (<year>2016</year>). <source>How the mind comes into being: introducing cognitive science from a functional and computational perspective</source>. Oxford: <publisher-name>Oxford University Press</publisher-name>.</citation></ref>
<ref id="ref10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chaudhuri</surname> <given-names>R.</given-names></name> <name><surname>Knoblauch</surname> <given-names>K.</given-names></name> <name><surname>Gariel</surname> <given-names>M.-A.</given-names></name> <name><surname>Kennedy</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>X.-J.</given-names></name></person-group> (<year>2015</year>). <article-title>A large-scale circuit mechanism for hierarchical dynamical processing in the primate cortex</article-title>. <source>Neuron</source> <volume>88</volume>, <fpage>419</fpage>&#x2013;<lpage>431</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuron.2015.09.008</pub-id>, PMID: <pub-id pub-id-type="pmid">26439530</pub-id></citation></ref>
<ref id="ref11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Creutzig</surname> <given-names>F.</given-names></name> <name><surname>Sprekeler</surname> <given-names>H.</given-names></name></person-group> (<year>2008</year>). <article-title>Predictive coding and the slowness principle: an information-theoretic approach</article-title>. <source>Neural Comput.</source> <volume>20</volume>, <fpage>1026</fpage>&#x2013;<lpage>1041</lpage>. doi: <pub-id pub-id-type="doi">10.1162/neco.2008.01-07-455</pub-id>, PMID: <pub-id pub-id-type="pmid">18085988</pub-id></citation></ref>
<ref id="ref12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Desimone</surname> <given-names>R.</given-names></name> <name><surname>Albright</surname> <given-names>T. D.</given-names></name> <name><surname>Gross</surname> <given-names>C. G.</given-names></name> <name><surname>Bruce</surname> <given-names>C.</given-names></name></person-group> (<year>1984</year>). <article-title>Stimulus-selective properties of inferior temporal neurons in the macaque</article-title>. <source>J. Neurosci.</source> <volume>4</volume>, <fpage>2051</fpage>&#x2013;<lpage>2062</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.04-08-02051.1984</pub-id>, PMID: <pub-id pub-id-type="pmid">6470767</pub-id></citation></ref>
<ref id="ref13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dora</surname> <given-names>S.</given-names></name> <name><surname>Bohte</surname> <given-names>S. M.</given-names></name> <name><surname>Pennartz</surname> <given-names>C.</given-names></name></person-group> (<year>2021</year>). <article-title>Deep gated Hebbian predictive coding accounts for emergence of complex neural response properties along the visual cortical hierarchy</article-title>. <source>Front. Comput. Neurosci.</source> <volume>15</volume>:<fpage>666131</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fncom.2021.666131</pub-id></citation></ref>
<ref id="ref14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elliffe</surname> <given-names>M. C.</given-names></name> <name><surname>Rolls</surname> <given-names>E. T.</given-names></name> <name><surname>Parga</surname> <given-names>N.</given-names></name> <name><surname>Renart</surname> <given-names>A.</given-names></name></person-group> (<year>2000</year>). <article-title>A recurrent model of transformation invariance by association</article-title>. <source>Neural Netw.</source> <volume>13</volume>, <fpage>225</fpage>&#x2013;<lpage>237</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0893-6080(99)00096-9</pub-id>, PMID: <pub-id pub-id-type="pmid">10935762</pub-id></citation></ref>
<ref id="ref15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>F&#x00F6;ldi&#x00E1;k</surname> <given-names>P.</given-names></name></person-group> (<year>1991</year>). <article-title>Learning invariance from transformation sequences</article-title>. <source>Neural Comput.</source> <volume>3</volume>, <fpage>194</fpage>&#x2013;<lpage>200</lpage>. doi: <pub-id pub-id-type="doi">10.1162/neco.1991.3.2.194</pub-id></citation></ref>
<ref id="ref16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freiwald</surname> <given-names>W. A.</given-names></name> <name><surname>Tsao</surname> <given-names>D. Y.</given-names></name></person-group> (<year>2010</year>). <article-title>Functional compartmentalization and viewpoint generalization within the macaque face-processing system</article-title>. <source>Science</source> <volume>330</volume>, <fpage>845</fpage>&#x2013;<lpage>851</lpage>. doi: <pub-id pub-id-type="doi">10.1126/science.1194908</pub-id>, PMID: <pub-id pub-id-type="pmid">21051642</pub-id></citation></ref>
<ref id="ref17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2010</year>). <article-title>The free-energy principle: a unified brain theory?</article-title> <source>Nat. Rev. Neurosci.</source> <volume>11</volume>, <fpage>127</fpage>&#x2013;<lpage>138</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nrn2787</pub-id></citation></ref>
<ref id="ref18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goodale</surname> <given-names>M. A.</given-names></name> <name><surname>Milner</surname> <given-names>A. D.</given-names></name></person-group> (<year>1992</year>). <article-title>Separate visual pathways for perception and action</article-title>. <source>Trends Neurosci.</source> <volume>15</volume>, <fpage>20</fpage>&#x2013;<lpage>25</lpage>. doi: <pub-id pub-id-type="doi">10.1016/0166-2236(92)90344-8</pub-id></citation></ref>
<ref id="ref19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Green</surname> <given-names>J.</given-names></name> <name><surname>Bruno</surname> <given-names>C. A.</given-names></name> <name><surname>Traunm&#x00FC;ller</surname> <given-names>L.</given-names></name> <name><surname>Ding</surname> <given-names>J.</given-names></name> <name><surname>Hrvatin</surname> <given-names>S.</given-names></name> <name><surname>Wilson</surname> <given-names>D. E.</given-names></name> <etal/></person-group>. (<year>2023</year>). <article-title>A cell-type-specific error-correction signal in the posterior parietal cortex</article-title>. <source>Nature</source> <volume>620</volume>, <fpage>366</fpage>&#x2013;<lpage>373</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41586-023-06357-1</pub-id>, PMID: <pub-id pub-id-type="pmid">37468637</pub-id></citation></ref>
<ref id="ref20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gregory</surname> <given-names>R. L.</given-names></name></person-group> (<year>1980</year>). <article-title>Perceptions as hypotheses</article-title>. <source>Philos. Trans. R. Soc. Lond. B Biol. Sci.</source> <volume>290</volume>, <fpage>181</fpage>&#x2013;<lpage>197</lpage>. doi: <pub-id pub-id-type="doi">10.1098/rstb.1980.0090</pub-id>, PMID: <pub-id pub-id-type="pmid">6106237</pub-id></citation></ref>
<ref id="ref21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Haider</surname> <given-names>P.</given-names></name> <name><surname>Ellenberger</surname> <given-names>B.</given-names></name> <name><surname>Kriener</surname> <given-names>L.</given-names></name> <name><surname>Jordan</surname> <given-names>J.</given-names></name> <name><surname>Senn</surname> <given-names>W.</given-names></name> <name><surname>Petrovici</surname> <given-names>M. A.</given-names></name></person-group> (<year>2021</year>). &#x201C;<article-title>Latent equilibrium: a unified learning theory for arbitrarily fast computation with arbitrarily slow neurons</article-title>&#x201D; in <source>Advances in neural information processing systems</source>. ed. <person-group person-group-type="editor"><name><surname>Ranzato</surname> <given-names>M.</given-names></name></person-group>, et al. (Red Hook, New York, United States: <publisher-name>Curran Associates, Inc</publisher-name>), <fpage>17839</fpage>&#x2013;<lpage>17851</lpage>.</citation></ref>
<ref id="ref22">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Halvagal</surname> <given-names>M. S.</given-names></name> <name><surname>Zenke</surname> <given-names>F.</given-names></name></person-group> (<year>2022</year>). <article-title>The combination of Hebbian and predictive plasticity learns invariant object representations in deep sensory networks</article-title>. doi: <pub-id pub-id-type="doi">10.1101/2022.03.17.484712</pub-id></citation></ref>
<ref id="ref23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hasson</surname> <given-names>U.</given-names></name> <name><surname>Yang</surname> <given-names>E.</given-names></name> <name><surname>Vallines</surname> <given-names>I.</given-names></name> <name><surname>Heeger</surname> <given-names>D. J.</given-names></name> <name><surname>Rubin</surname> <given-names>N.</given-names></name></person-group> (<year>2008</year>). <article-title>A hierarchy of temporal receptive windows in human cortex</article-title>. <source>J. Neurosci.</source> <volume>28</volume>, <fpage>2539</fpage>&#x2013;<lpage>2550</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.5487-07.2008</pub-id>, PMID: <pub-id pub-id-type="pmid">18322098</pub-id></citation></ref>
<ref id="ref24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haxby</surname> <given-names>J. V.</given-names></name> <name><surname>Gobbini</surname> <given-names>M. I.</given-names></name> <name><surname>Furey</surname> <given-names>M. L.</given-names></name> <name><surname>Ishai</surname> <given-names>A.</given-names></name> <name><surname>Schouten</surname> <given-names>J. L.</given-names></name> <name><surname>Pietrini</surname> <given-names>P.</given-names></name></person-group> (<year>2001</year>). <article-title>Distributed and overlapping representations of faces and objects in ventral temporal cortex</article-title>. <source>Science</source> <volume>293</volume>, <fpage>2425</fpage>&#x2013;<lpage>2430</lpage>. doi: <pub-id pub-id-type="doi">10.1126/science.1063736</pub-id>, PMID: <pub-id pub-id-type="pmid">11577229</pub-id></citation></ref>
<ref id="ref25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heeger</surname> <given-names>D. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Theory of cortical function</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>114</volume>, <fpage>1773</fpage>&#x2013;<lpage>1782</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1619788114</pub-id>, PMID: <pub-id pub-id-type="pmid">28167793</pub-id></citation></ref>
<ref id="ref26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hert&#x00E4;g</surname> <given-names>L.</given-names></name> <name><surname>Sprekeler</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>Learning prediction error neurons in a canonical interneuron circuit</article-title>. <source>elife</source> <volume>9</volume>:<fpage>e57541</fpage>. doi: <pub-id pub-id-type="doi">10.7554/eLife.57541</pub-id>, PMID: <pub-id pub-id-type="pmid">32820723</pub-id></citation></ref>
<ref id="ref27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Herz</surname> <given-names>A.</given-names></name> <name><surname>Sulzer</surname> <given-names>B.</given-names></name> <name><surname>K&#x00FC;hn</surname> <given-names>R.</given-names></name> <name><surname>van Hemmen</surname> <given-names>J. L.</given-names></name></person-group> (<year>1989</year>). <article-title>Hebbian learning reconsidered: representation of static and dynamic objects in associative neural nets</article-title>. <source>Biol. Cybern.</source> <volume>60</volume>, <fpage>457</fpage>&#x2013;<lpage>467</lpage>. doi: <pub-id pub-id-type="doi">10.1007/BF00204701</pub-id>, PMID: <pub-id pub-id-type="pmid">11455966</pub-id></citation></ref>
<ref id="ref28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Illing</surname> <given-names>B.</given-names></name> <name><surname>Ventura</surname> <given-names>J.</given-names></name> <name><surname>Bellec</surname> <given-names>G.</given-names></name> <name><surname>Gerstner</surname> <given-names>W.</given-names></name></person-group> (<year>2021</year>). &#x201C;<article-title>Local plasticity rules can learn deep representations using self-supervised contrastive predictions</article-title>&#x201D; in <source>Advances in neural information processing systems</source> (Red Hook, New York, United States: <publisher-name>Curran Associates, Inc</publisher-name>), <fpage>30365</fpage>&#x2013;<lpage>30379</lpage>.</citation></ref>
<ref id="ref29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ji</surname> <given-names>D.</given-names></name> <name><surname>Wilson</surname> <given-names>M. A.</given-names></name></person-group> (<year>2007</year>). <article-title>Coordinated memory replay in the visual cortex and hippocampus during sleep</article-title>. <source>Nat. Neurosci.</source> <volume>10</volume>, <fpage>100</fpage>&#x2013;<lpage>107</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nn1825</pub-id>, PMID: <pub-id pub-id-type="pmid">17173043</pub-id></citation></ref>
<ref id="ref30">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>L. P.</given-names></name> <name><surname>Rao</surname> <given-names>R. P. N.</given-names></name></person-group> (<year>2022</year>). <article-title>Dynamic predictive coding: a new model of hierarchical sequence learning and prediction in the cortex</article-title>. doi: <pub-id pub-id-type="doi">10.1101/2022.06.23.497415</pub-id></citation></ref>
<ref id="ref31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jordan</surname> <given-names>R.</given-names></name> <name><surname>Keller</surname> <given-names>G. B.</given-names></name></person-group> (<year>2020</year>). <article-title>Opposing influence of top-down and bottom-up input on excitatory layer 2/3 neurons in mouse primary visual cortex</article-title>. <source>Neuron</source> <volume>108</volume>, <fpage>1194</fpage>&#x2013;<lpage>1206.e5</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuron.2020.09.024</pub-id>, PMID: <pub-id pub-id-type="pmid">33091338</pub-id></citation></ref>
<ref id="ref32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Keller</surname> <given-names>G. B.</given-names></name> <name><surname>Mrsic-Flogel</surname> <given-names>T. D.</given-names></name></person-group> (<year>2018</year>). <article-title>Predictive processing: a canonical cortical computation</article-title>. <source>Neuron</source> <volume>100</volume>, <fpage>424</fpage>&#x2013;<lpage>435</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuron.2018.10.003</pub-id>, PMID: <pub-id pub-id-type="pmid">30359606</pub-id></citation></ref>
<ref id="ref33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Knierim</surname> <given-names>J. J.</given-names></name> <name><surname>van Essen</surname> <given-names>D. C.</given-names></name></person-group> (<year>1992</year>). <article-title>Neuronal responses to static texture patterns in area V1 of the alert macaque monkey</article-title>. <source>J. Neurophysiol.</source> <volume>67</volume>, <fpage>961</fpage>&#x2013;<lpage>980</lpage>. doi: <pub-id pub-id-type="doi">10.1152/jn.1992.67.4.961</pub-id>, PMID: <pub-id pub-id-type="pmid">1588394</pub-id></citation></ref>
<ref id="ref34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kok</surname> <given-names>P.</given-names></name> <name><surname>de Lange</surname> <given-names>F. P.</given-names></name></person-group> (<year>2014</year>). <article-title>Shape perception simultaneously up- and downregulates neural activity in the primary visual cortex</article-title>. <source>Curr. Biol.</source> <volume>24</volume>, <fpage>1531</fpage>&#x2013;<lpage>1535</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cub.2014.05.042</pub-id>, PMID: <pub-id pub-id-type="pmid">24980501</pub-id></citation></ref>
<ref id="ref35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kriegeskorte</surname> <given-names>N.</given-names></name> <name><surname>Mur</surname> <given-names>M.</given-names></name> <name><surname>Bandettini</surname> <given-names>P. A.</given-names></name></person-group> (<year>2008</year>). <article-title>Representational similarity analysis &#x2013; connecting the branches of systems neuroscience</article-title>. <source>Front. Syst. Neurosci.</source> <volume>2</volume>:<fpage>4</fpage>. doi: <pub-id pub-id-type="doi">10.3389/neuro.06.004.2008</pub-id></citation></ref>
<ref id="ref36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lansink</surname> <given-names>C. S.</given-names></name> <name><surname>Goltstein</surname> <given-names>P. M.</given-names></name> <name><surname>Lankelma</surname> <given-names>J. V.</given-names></name> <name><surname>McNaughton</surname> <given-names>B. L.</given-names></name> <name><surname>Pennartz</surname> <given-names>C. M. A.</given-names></name></person-group> (<year>2009</year>). <article-title>Hippocampus leads ventral striatum in replay of place-reward information</article-title>. <source>PLoS Biol.</source> <volume>7</volume>:<fpage>e1000173</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pbio.1000173</pub-id>, PMID: <pub-id pub-id-type="pmid">19688032</pub-id></citation></ref>
<ref id="ref37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Boser</surname> <given-names>B.</given-names></name> <name><surname>Denker</surname> <given-names>J. S.</given-names></name> <name><surname>Henderson</surname> <given-names>D.</given-names></name> <name><surname>Howard</surname> <given-names>R. E.</given-names></name> <name><surname>Hubbard</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>1989</year>). <article-title>Backpropagation applied to handwritten zip code recognition</article-title>. <source>Neural Comput.</source> <volume>1</volume>, <fpage>541</fpage>&#x2013;<lpage>551</lpage>. doi: <pub-id pub-id-type="doi">10.1162/neco.1989.1.4.541</pub-id></citation></ref>
<ref id="ref38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>F. J.</given-names></name> <name><surname>Bottou</surname> <given-names>L.</given-names></name></person-group> (<year>2004</year>). <article-title>Learning methods for generic object recognition with invariance to pose and lighting</article-title>. <source>Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004)</source> <volume>2</volume>, <fpage>97</fpage>&#x2013;<lpage>104</lpage>. doi: <pub-id pub-id-type="doi">10.1109/CVPR.2004.144</pub-id></citation></ref>
<ref id="ref39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>T. S.</given-names></name> <name><surname>Mumford</surname> <given-names>D.</given-names></name></person-group> (<year>2003</year>). <article-title>Hierarchical Bayesian inference in the visual cortex</article-title>. <source>JOSA A</source> <volume>20</volume>, <fpage>1434</fpage>&#x2013;<lpage>1448</lpage>. doi: <pub-id pub-id-type="doi">10.1364/JOSAA.20.001434</pub-id></citation></ref>
<ref id="ref40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>D.-H.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Fischer</surname> <given-names>A.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name></person-group> (<year>2015</year>). &#x201C;<article-title>Difference target propagation</article-title>&#x201D; in <source>Machine learning and knowledge discovery in databases, lecture notes in computer science</source>. eds. <person-group person-group-type="editor"><name><surname>Appice</surname> <given-names>A.</given-names></name> <name><surname>Rodrigues</surname> <given-names>P. P.</given-names></name> <name><surname>Santos Costa</surname> <given-names>V.</given-names></name> <name><surname>Soares</surname> <given-names>C.</given-names></name> <name><surname>Gama</surname> <given-names>J.</given-names></name> <name><surname>Jorge</surname> <given-names>A.</given-names></name></person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>), <fpage>498</fpage>&#x2013;<lpage>515</lpage>.</citation></ref>
<ref id="ref41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leinweber</surname> <given-names>M.</given-names></name> <name><surname>Ward</surname> <given-names>D. R.</given-names></name> <name><surname>Sobczak</surname> <given-names>J. M.</given-names></name> <name><surname>Attinger</surname> <given-names>A.</given-names></name> <name><surname>Keller</surname> <given-names>G. B.</given-names></name></person-group> (<year>2017</year>). <article-title>A sensorimotor circuit in mouse cortex for visual flow predictions</article-title>. <source>Neuron</source> <volume>95</volume>, <fpage>1420</fpage>&#x2013;<lpage>1432.e5</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuron.2017.08.036</pub-id>, PMID: <pub-id pub-id-type="pmid">28910624</pub-id></citation></ref>
<ref id="ref42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>N.</given-names></name> <name><surname>DiCarlo</surname> <given-names>J. J.</given-names></name></person-group> (<year>2008</year>). <article-title>Unsupervised natural experience rapidly alters invariant object representation in visual cortex</article-title>. <source>Science</source> <volume>321</volume>, <fpage>1502</fpage>&#x2013;<lpage>1507</lpage>. doi: <pub-id pub-id-type="doi">10.1126/science.1160028</pub-id>, PMID: <pub-id pub-id-type="pmid">18787171</pub-id></citation></ref>
<ref id="ref43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Logothetis</surname> <given-names>N. K.</given-names></name> <name><surname>Pauls</surname> <given-names>J.</given-names></name> <name><surname>Poggio</surname> <given-names>T.</given-names></name></person-group> (<year>1995</year>). <article-title>Shape representation in the inferior temporal cortex of monkeys</article-title>. <source>Curr. Biol.</source> <volume>5</volume>, <fpage>552</fpage>&#x2013;<lpage>563</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0960-9822(95)00108-4</pub-id></citation></ref>
<ref id="ref44">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Lotter</surname> <given-names>W.</given-names></name> <name><surname>Kreiman</surname> <given-names>G.</given-names></name> <name><surname>Cox</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>Deep predictive coding networks for video prediction and unsupervised learning</article-title>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1605.08104</pub-id></citation></ref>
<ref id="ref45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lotter</surname> <given-names>W.</given-names></name> <name><surname>Kreiman</surname> <given-names>G.</given-names></name> <name><surname>Cox</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>A neural network trained for prediction mimics diverse features of biological neurons and perception</article-title>. <source>Nat. Mach. Intel.</source> <volume>2</volume>, <fpage>210</fpage>&#x2013;<lpage>219</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s42256-020-0170-9</pub-id>, PMID: <pub-id pub-id-type="pmid">34291193</pub-id></citation></ref>
<ref id="ref46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Markov</surname> <given-names>N. T.</given-names></name> <name><surname>Vezoli</surname> <given-names>J.</given-names></name> <name><surname>Chameau</surname> <given-names>P.</given-names></name> <name><surname>Falchier</surname> <given-names>A.</given-names></name> <name><surname>Quilodran</surname> <given-names>R.</given-names></name> <name><surname>Huissoud</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Anatomy of hierarchy: feedforward and feedback pathways in macaque visual cortex</article-title>. <source>J. Comp. Neurol.</source> <volume>522</volume>, <fpage>225</fpage>&#x2013;<lpage>259</lpage>. doi: <pub-id pub-id-type="doi">10.1002/cne.23458</pub-id>, PMID: <pub-id pub-id-type="pmid">23983048</pub-id></citation></ref>
<ref id="ref47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marques</surname> <given-names>T.</given-names></name> <name><surname>Nguyen</surname> <given-names>J.</given-names></name> <name><surname>Fioreze</surname> <given-names>G.</given-names></name> <name><surname>Petreanu</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <article-title>The functional organization of cortical feedback inputs to primary visual cortex</article-title>. <source>Nat. Neurosci.</source> <volume>21</volume>, <fpage>757</fpage>&#x2013;<lpage>764</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41593-018-0135-z</pub-id>, PMID: <pub-id pub-id-type="pmid">29662217</pub-id></citation></ref>
<ref id="ref48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Matteucci</surname> <given-names>G.</given-names></name> <name><surname>Zoccolan</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>Unsupervised experience with temporal continuity of the visual environment is causally involved in the development of V1 complex cells</article-title>. <source>Sci. Adv.</source> <volume>6</volume>:<fpage>eaba3742</fpage>. doi: <pub-id pub-id-type="doi">10.1126/sciadv.aba3742</pub-id>, PMID: <pub-id pub-id-type="pmid">32523998</pub-id></citation></ref>
<ref id="ref49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mejias</surname> <given-names>J. F.</given-names></name> <name><surname>Wang</surname> <given-names>X.-J.</given-names></name></person-group> (<year>2022</year>). <article-title>Mechanisms of distributed working memory in a large-scale network of macaque neocortex</article-title>. <source>elife</source> <volume>11</volume>:<fpage>e72136</fpage>. doi: <pub-id pub-id-type="doi">10.7554/eLife.72136</pub-id>, PMID: <pub-id pub-id-type="pmid">35200137</pub-id></citation></ref>
<ref id="ref50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mishkin</surname> <given-names>M.</given-names></name> <name><surname>Ungerleider</surname> <given-names>L. G.</given-names></name> <name><surname>Macko</surname> <given-names>K. A.</given-names></name></person-group> (<year>1983</year>). <article-title>Object vision and spatial vision: two cortical pathways</article-title>. <source>Trends Neurosci.</source> <volume>6</volume>, <fpage>414</fpage>&#x2013;<lpage>417</lpage>. doi: <pub-id pub-id-type="doi">10.1016/0166-2236(83)90190-X</pub-id></citation></ref>
<ref id="ref51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mumford</surname> <given-names>D.</given-names></name></person-group> (<year>1992</year>). <article-title>On the computational architecture of the neocortex</article-title>. <source>Biol. Cybern.</source> <volume>66</volume>, <fpage>241</fpage>&#x2013;<lpage>251</lpage>. doi: <pub-id pub-id-type="doi">10.1007/BF00198477</pub-id></citation></ref>
<ref id="ref52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murray</surname> <given-names>J. D.</given-names></name> <name><surname>Bernacchia</surname> <given-names>A.</given-names></name> <name><surname>Freedman</surname> <given-names>D. J.</given-names></name> <name><surname>Romo</surname> <given-names>R.</given-names></name> <name><surname>Wallis</surname> <given-names>J. D.</given-names></name> <name><surname>Cai</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>A hierarchy of intrinsic timescales across primate cortex</article-title>. <source>Nat. Neurosci.</source> <volume>17</volume>, <fpage>1661</fpage>&#x2013;<lpage>1663</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nn.3862</pub-id>, PMID: <pub-id pub-id-type="pmid">25383900</pub-id></citation></ref>
<ref id="ref53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oude Lohuis</surname> <given-names>M. N.</given-names></name> <name><surname>Pie</surname> <given-names>J. L.</given-names></name> <name><surname>Marchesi</surname> <given-names>P.</given-names></name> <name><surname>Montijn</surname> <given-names>J. S.</given-names></name> <name><surname>de Kock</surname> <given-names>C. P. J.</given-names></name> <name><surname>Pennartz</surname> <given-names>C. M. A.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Multisensory task demands temporally extend the causal requirement for visual cortex in perception</article-title>. <source>Nat. Commun.</source> <volume>13</volume>:<fpage>2864</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41467-022-30600-4</pub-id>, PMID: <pub-id pub-id-type="pmid">35606448</pub-id></citation></ref>
<ref id="ref54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pak</surname> <given-names>A.</given-names></name> <name><surname>Ryu</surname> <given-names>E.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Chubykin</surname> <given-names>A. A.</given-names></name></person-group> (<year>2020</year>). <article-title>Top-down feedback controls the cortical representation of illusory contours in mouse primary visual cortex</article-title>. <source>J. Neurosci.</source> <volume>40</volume>, <fpage>648</fpage>&#x2013;<lpage>660</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.1998-19.2019</pub-id>, PMID: <pub-id pub-id-type="pmid">31792152</pub-id></citation></ref>
<ref id="ref56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pang</surname> <given-names>Z.</given-names></name> <name><surname>O&#x2019;May</surname> <given-names>C. B.</given-names></name> <name><surname>Choksi</surname> <given-names>B.</given-names></name> <name><surname>VanRullen</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>Predictive coding feedback results in perceived illusory contours in a recurrent neural network</article-title>. <source>Neural Netw.</source> <volume>144</volume>, <fpage>164</fpage>&#x2013;<lpage>175</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neunet.2021.08.024</pub-id>, PMID: <pub-id pub-id-type="pmid">34500255</pub-id></citation></ref>
<ref id="ref57">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Pennartz</surname> <given-names>C. M.</given-names></name></person-group> (<year>2015</year>). <source>The brain&#x2019;s representational power: on consciousness and the integration of modalities</source>. Cambridge, Massachusetts: <publisher-name>MIT Press</publisher-name>.</citation></ref>
<ref id="ref58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pennartz</surname> <given-names>C. M.</given-names></name> <name><surname>Dora</surname> <given-names>S.</given-names></name> <name><surname>Muckli</surname> <given-names>L.</given-names></name> <name><surname>Lorteije</surname> <given-names>J. A.</given-names></name></person-group> (<year>2019</year>). <article-title>Towards a unified view on pathways and functions of neural recurrent processing</article-title>. <source>Trends Neurosci.</source> <volume>42</volume>, <fpage>589</fpage>&#x2013;<lpage>603</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tins.2019.07.005</pub-id>, PMID: <pub-id pub-id-type="pmid">31399289</pub-id></citation></ref>
<ref id="ref59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Piasini</surname> <given-names>E.</given-names></name> <name><surname>Soltuzu</surname> <given-names>L.</given-names></name> <name><surname>Muratore</surname> <given-names>P.</given-names></name> <name><surname>Caramellino</surname> <given-names>R.</given-names></name> <name><surname>Vinken</surname> <given-names>K.</given-names></name> <name><surname>de Beeck</surname> <given-names>H. O.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Temporal stability of stimulus representation increases along rodent visual cortical hierarchies</article-title>. <source>Nat. Commun.</source> <volume>12</volume>, <fpage>1</fpage>&#x2013;<lpage>19</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41467-021-24456-3</pub-id></citation></ref>
<ref id="ref60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quiroga</surname> <given-names>R. Q.</given-names></name> <name><surname>Reddy</surname> <given-names>L.</given-names></name> <name><surname>Kreiman</surname> <given-names>G.</given-names></name> <name><surname>Koch</surname> <given-names>C.</given-names></name> <name><surname>Fried</surname> <given-names>I.</given-names></name></person-group> (<year>2005</year>). <article-title>Invariant visual representation by single neurons in the human brain</article-title>. <source>Nature</source> <volume>435</volume>, <fpage>1102</fpage>&#x2013;<lpage>1107</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nature03687</pub-id></citation></ref>
<ref id="ref61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rao</surname> <given-names>R. P.</given-names></name> <name><surname>Ballard</surname> <given-names>D. H.</given-names></name></person-group> (<year>1999</year>). <article-title>Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects</article-title>. <source>Nat. Neurosci.</source> <volume>2</volume>, <fpage>79</fpage>&#x2013;<lpage>87</lpage>. doi: <pub-id pub-id-type="doi">10.1038/4580</pub-id>, PMID: <pub-id pub-id-type="pmid">10195184</pub-id></citation></ref>
<ref id="ref62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Riesenhuber</surname> <given-names>M.</given-names></name> <name><surname>Poggio</surname> <given-names>T.</given-names></name></person-group> (<year>1999</year>). <article-title>Hierarchical models of object recognition in cortex</article-title>. <source>Nat. Neurosci.</source> <volume>2</volume>, <fpage>1019</fpage>&#x2013;<lpage>1025</lpage>. doi: <pub-id pub-id-type="doi">10.1038/14819</pub-id></citation></ref>
<ref id="ref63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rolls</surname> <given-names>E. T.</given-names></name></person-group> (<year>2012</year>). <article-title>Invariant visual object and face recognition: neural and computational bases, and a model, VisNet</article-title>. <source>Front. Comput. Neurosci.</source> <volume>6</volume>:<fpage>35</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fncom.2012.00035</pub-id></citation></ref>
<ref id="ref64">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rumelhart</surname> <given-names>D. E.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name> <name><surname>Williams</surname> <given-names>R. J.</given-names></name></person-group> (<year>1985</year>). <source>Learning internal representations by error propagation</source>. La Jolla, California, United States: <publisher-name>Institute for Cognitive Science, University of California, San Diego</publisher-name>.</citation></ref>
<ref id="ref65">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sacramento</surname> <given-names>J.</given-names></name> <name><surname>Ponte Costa</surname> <given-names>R.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Senn</surname> <given-names>W.</given-names></name></person-group> (<year>2018</year>). &#x201C;<article-title>Dendritic cortical microcircuits approximate the backpropagation algorithm</article-title>&#x201D; in <source>Advances in neural information processing systems</source>. ed. <person-group person-group-type="editor"><name><surname>Bengio</surname> <given-names>S.</given-names></name></person-group>, et al. (Red Hook, New York, United States: <publisher-name>Curran Associates, Inc.</publisher-name>)</citation></ref>
<ref id="ref66">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Salvatori</surname> <given-names>T.</given-names></name> <name><surname>Song</surname> <given-names>Y.</given-names></name> <name><surname>Lukasiewicz</surname> <given-names>T.</given-names></name> <name><surname>Bogacz</surname> <given-names>R.</given-names></name> <name><surname>Xu</surname> <given-names>Z.</given-names></name></person-group> (<year>2021</year>). <article-title>Predictive coding can do exact backpropagation on convolutional and recurrent neural networks</article-title>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2103.03725</pub-id></citation></ref>
<ref id="ref67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwiedrzik</surname> <given-names>C. M.</given-names></name> <name><surname>Freiwald</surname> <given-names>W. A.</given-names></name></person-group> (<year>2017</year>). <article-title>High-level prediction signals in a low-level area of the macaque face-processing hierarchy</article-title>. <source>Neuron</source> <volume>96</volume>, <fpage>89</fpage>&#x2013;<lpage>97.e4</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuron.2017.09.007</pub-id>, PMID: <pub-id pub-id-type="pmid">28957679</pub-id></citation></ref>
<ref id="ref68">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Singer</surname> <given-names>Y.</given-names></name> <name><surname>Willmore</surname> <given-names>B. D. B.</given-names></name> <name><surname>King</surname> <given-names>A. J.</given-names></name> <name><surname>Harper</surname> <given-names>N. S.</given-names></name></person-group> (<year>2019</year>). <article-title>Hierarchical temporal prediction captures motion processing from retina to higher visual cortex</article-title>. doi: <pub-id pub-id-type="doi">10.1101/575464</pub-id></citation></ref>
<ref id="ref69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>F. W.</given-names></name> <name><surname>Muckli</surname> <given-names>L.</given-names></name></person-group> (<year>2010</year>). <article-title>Nonstimulated early visual areas carry information about surrounding context</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>107</volume>, <fpage>20099</fpage>&#x2013;<lpage>20103</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1000233107</pub-id>, PMID: <pub-id pub-id-type="pmid">21041652</pub-id></citation></ref>
<ref id="ref70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spratling</surname> <given-names>M. W.</given-names></name></person-group> (<year>2017</year>). <article-title>A hierarchical predictive coding model of object recognition in natural images</article-title>. <source>Cogn. Comput.</source> <volume>9</volume>, <fpage>151</fpage>&#x2013;<lpage>167</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s12559-016-9445-1</pub-id>, PMID: <pub-id pub-id-type="pmid">28413566</pub-id></citation></ref>
<ref id="ref71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sprekeler</surname> <given-names>H.</given-names></name> <name><surname>Michaelis</surname> <given-names>C.</given-names></name> <name><surname>Wiskott</surname> <given-names>L.</given-names></name></person-group> (<year>2007</year>). <article-title>Slowness: an objective for spike-timing&#x2013;dependent plasticity?</article-title> <source>PLoS Comput. Biol.</source> <volume>3</volume>:<fpage>e112</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.0030112</pub-id>, PMID: <pub-id pub-id-type="pmid">17604445</pub-id></citation></ref>
<ref id="ref72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Summerfield</surname> <given-names>C.</given-names></name> <name><surname>Trittschuh</surname> <given-names>E. H.</given-names></name> <name><surname>Monti</surname> <given-names>J. M.</given-names></name> <name><surname>Mesulam</surname> <given-names>M.-M.</given-names></name> <name><surname>Egner</surname> <given-names>T.</given-names></name></person-group> (<year>2008</year>). <article-title>Neural repetition suppression reflects fulfilled perceptual expectations</article-title>. <source>Nat. Neurosci.</source> <volume>11</volume>, <fpage>1004</fpage>&#x2013;<lpage>1006</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nn.2163</pub-id>, PMID: <pub-id pub-id-type="pmid">19160497</pub-id></citation></ref>
<ref id="ref73">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tafazoli</surname> <given-names>S.</given-names></name> <name><surname>Di Filippo</surname> <given-names>A.</given-names></name> <name><surname>Zoccolan</surname> <given-names>D.</given-names></name></person-group> (<year>2012</year>). <article-title>Transformation-tolerant object recognition in rats revealed by visual priming</article-title>. <source>J. Neurosci.</source> <volume>32</volume>, <fpage>21</fpage>&#x2013;<lpage>34</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.3932-11.2012</pub-id>, PMID: <pub-id pub-id-type="pmid">22219267</pub-id></citation></ref>
<ref id="ref74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tafazoli</surname> <given-names>S.</given-names></name> <name><surname>Safaai</surname> <given-names>H.</given-names></name> <name><surname>De Franceschi</surname> <given-names>G.</given-names></name> <name><surname>Rosselli</surname> <given-names>F. B.</given-names></name> <name><surname>Vanzella</surname> <given-names>W.</given-names></name> <name><surname>Riggi</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Emergence of transformation-tolerant representations of visual objects in rat lateral extrastriate cortex</article-title>. <source>eLife</source> <volume>6</volume>:<fpage>e22794</fpage>. doi: <pub-id pub-id-type="doi">10.7554/eLife.22794</pub-id>, PMID: <pub-id pub-id-type="pmid">28395730</pub-id></citation></ref>
<ref id="ref75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Todorovic</surname> <given-names>A.</given-names></name> <name><surname>van Ede</surname> <given-names>F.</given-names></name> <name><surname>Maris</surname> <given-names>E.</given-names></name> <name><surname>de Lange</surname> <given-names>F. P.</given-names></name></person-group> (<year>2011</year>). <article-title>Prior expectation mediates neural adaptation to repeated sounds in the auditory cortex: an MEG study</article-title>. <source>J. Neurosci.</source> <volume>31</volume>, <fpage>9118</fpage>&#x2013;<lpage>9123</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.1425-11.2011</pub-id>, PMID: <pub-id pub-id-type="pmid">21697363</pub-id></citation></ref>
<ref id="ref55">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Van Den Oord</surname> <given-names>A.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Vinyals</surname> <given-names>O.</given-names></name></person-group> (<year>2019</year>). <article-title>Representation learning with contrastive predictive coding</article-title>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1807.03748</pub-id></citation></ref>
<ref id="ref76">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Meegen</surname> <given-names>A.</given-names></name> <name><surname>van Albada</surname> <given-names>S. J.</given-names></name></person-group> (<year>2021</year>). <article-title>Microscopic theory of intrinsic timescales in spiking neural networks</article-title>. <source>Phys. Rev. Res.</source> <volume>3</volume>:<fpage>043077</fpage>. doi: <pub-id pub-id-type="doi">10.1103/PhysRevResearch.3.043077</pub-id></citation></ref>
<ref id="ref77">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vogels</surname> <given-names>T. P.</given-names></name> <name><surname>Sprekeler</surname> <given-names>H.</given-names></name> <name><surname>Zenke</surname> <given-names>F.</given-names></name> <name><surname>Clopath</surname> <given-names>C.</given-names></name> <name><surname>Gerstner</surname> <given-names>W.</given-names></name></person-group> (<year>2011</year>). <article-title>Inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks</article-title>. <source>Science</source> <volume>334</volume>, <fpage>1569</fpage>&#x2013;<lpage>1573</lpage>. doi: <pub-id pub-id-type="doi">10.1126/science.1211095</pub-id>, PMID: <pub-id pub-id-type="pmid">22075724</pub-id></citation></ref>
<ref id="ref78">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walsh</surname> <given-names>K. S.</given-names></name> <name><surname>McGovern</surname> <given-names>D. P.</given-names></name> <name><surname>Clark</surname> <given-names>A.</given-names></name> <name><surname>O&#x2019;Connell</surname> <given-names>R. G.</given-names></name></person-group> (<year>2020</year>). <article-title>Evaluating the neurophysiological evidence for predictive processing as a model of perception</article-title>. <source>Ann. N. Y. Acad. Sci.</source> <volume>1464</volume>, <fpage>242</fpage>&#x2013;<lpage>268</lpage>. doi: <pub-id pub-id-type="doi">10.1111/nyas.14321</pub-id>, PMID: <pub-id pub-id-type="pmid">32147856</pub-id></citation></ref>
<ref id="ref79">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whittington</surname> <given-names>J. C.</given-names></name> <name><surname>Bogacz</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity</article-title>. <source>Neural Comput.</source> <volume>29</volume>, <fpage>1229</fpage>&#x2013;<lpage>1262</lpage>. doi: <pub-id pub-id-type="doi">10.1162/NECO_a_00949</pub-id>, PMID: <pub-id pub-id-type="pmid">28333583</pub-id></citation></ref>
<ref id="ref80">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wilson</surname> <given-names>M. A.</given-names></name> <name><surname>McNaughton</surname> <given-names>B. L.</given-names></name></person-group> (<year>1994</year>). <article-title>Reactivation of hippocampal ensemble memories during sleep</article-title>. <source>Science</source> <volume>265</volume>, <fpage>676</fpage>&#x2013;<lpage>679</lpage>. doi: <pub-id pub-id-type="doi">10.1126/science.8036517</pub-id></citation></ref>
<ref id="ref81">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wiskott</surname> <given-names>L.</given-names></name> <name><surname>Sejnowski</surname> <given-names>T. J.</given-names></name></person-group> (<year>2002</year>). <article-title>Slow feature analysis: unsupervised learning of invariances</article-title>. <source>Neural Comput.</source> <volume>14</volume>, <fpage>715</fpage>&#x2013;<lpage>770</lpage>. doi: <pub-id pub-id-type="doi">10.1162/089976602317318938</pub-id>, PMID: <pub-id pub-id-type="pmid">11936959</pub-id></citation></ref>
<ref id="ref82">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>S.</given-names></name> <name><surname>Jiang</surname> <given-names>W.</given-names></name> <name><surname>Poo</surname> <given-names>M.-M.</given-names></name> <name><surname>Dan</surname> <given-names>Y.</given-names></name></person-group> (<year>2012</year>). <article-title>Activity recall in a visual cortical ensemble</article-title>. <source>Nat. Neurosci.</source> <volume>15</volume>, <fpage>449</fpage>&#x2013;<lpage>455</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nn.3036</pub-id>, PMID: <pub-id pub-id-type="pmid">22267160</pub-id></citation></ref>
<ref id="ref83">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zmarz</surname> <given-names>P.</given-names></name> <name><surname>Keller</surname> <given-names>G. B.</given-names></name></person-group> (<year>2016</year>). <article-title>Mismatch receptive fields in mouse visual cortex</article-title>. <source>Neuron</source> <volume>92</volume>, <fpage>766</fpage>&#x2013;<lpage>772</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuron.2016.09.057</pub-id>, PMID: <pub-id pub-id-type="pmid">27974161</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001">
<p><sup>1</sup><ext-link xlink:href="https://pypi.org/project/sklearn-sfa/" ext-link-type="uri">https://pypi.org/project/sklearn-sfa/</ext-link></p>
</fn>
</fn-group>
</back>
</article>