<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="article-commentary">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title>Frontiers in Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnins.2013.00026</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Frontiers Commentary Article</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Exploiting temporal continuity of views to learn visual object invariance</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Bart</surname> <given-names>Evgeniy</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Hegd&#x000E9;</surname> <given-names>Jay</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Palo Alto Research Center</institution> <country>Palo Alto, CA, USA</country></aff>
<aff id="aff2"><sup>2</sup><institution>James and Jean Culver Vision Discovery Institute and Department of Ophthalmology, Medical College of Georgia, Georgia Regents University</institution> <country>Augusta, GA, USA</country></aff>
<author-notes>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: <email>bart&#x00040;parc.com</email>; <email>jay&#x00040;hegde.us</email></p></fn>
<fn fn-type="edited-by"><p>Edited by: Misha Tsodyks, Weizmann Institute of Science, Israel</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Misha Tsodyks, Weizmann Institute of Science, Israel</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>01</day>
<month>03</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="collection">
<year>2013</year>
</pub-date>
<volume>7</volume>
<elocation-id>26</elocation-id>
<history>
<date date-type="received">
<day>05</day>
<month>02</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>02</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2013 Bart and Hegd&#x000E9;.</copyright-statement>
<copyright-year>2013</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.</p>
</license>
</permissions>
<related-article id="RA1" related-article-type="commentary-article" journal-id="Front. Comput. Neurosci." journal-id-type="nlm-ta" vol="6" page="37" ext-link-type="pmc">A commentary on <article-title>Learning and disrupting invariance in visual recognition with a temporal association rule</article-title> by Isik, L., Leibo, J. Z., and Poggio, T. (2012). Front. Comput. Neurosci. 6:37. doi: 10.3389/fncom.2012.00037</related-article>
<counts>
<fig-count count="0"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="10"/>
<page-count count="2"/>
<word-count count="1470"/>
</counts>
</article-meta>
</front>
<body>
<p>In an ever-changing visual world, the appearance of visual objects changes constantly. Yet, our perception of a given object stays robust despite the variations in the image. The mechanisms that implement this perceptual invariance are partially known (e.g., Logothetis et al., <xref ref-type="bibr" rid="B5">1995</xref>). It is also known that these mechanisms are at least in part learned from experience, but the learning processes involved are not yet fully understood.</p>
<p>Theoretical studies have suggested that the visual system may achieve this learning by using temporal association. The underlying idea is that the object currently in view is likely to be the same object that was in view a moment ago, even if its appearance has changed in the meantime due to factors such as relative motion. The visual system may therefore learn an association between potentially different-looking images if they appear in temporal succession. Learning invariance in this manner is known as temporal trace learning (F&#x000F6;ldi&#x000E1;k, <xref ref-type="bibr" rid="B2">1991</xref>; see Rolls, <xref ref-type="bibr" rid="B6">2012</xref>, for a review), and it is the focus of the study by Isik and colleagues.</p>
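<p>The idea can be made concrete with a minimal sketch, assuming a single linear unit and a trace rule of the general form proposed by F&#x000F6;ldi&#x000E1;k (<xref ref-type="bibr" rid="B2">1991</xref>): the unit keeps a decaying trace of its recent activity, and its weights move toward the current input in proportion to that trace. The function names, the two toy &#x0201C;views,&#x0201D; and the parameter values <italic>alpha</italic> and <italic>delta</italic> are illustrative assumptions, not values taken from any of the studies discussed here.</p>
<preformat><![CDATA[
```python
def trace_learning(sequence, weights, alpha=0.1, delta=0.5):
    """Train one linear unit with a trace rule (illustrative sketch).

    trace <- (1 - delta) * trace + delta * y    (decaying memory of activity)
    w     <- w + alpha * trace * (x - w)        (move weights toward input)
    """
    w = list(weights)
    trace = 0.0
    for x in sequence:
        y = sum(wi * xi for wi, xi in zip(w, x))   # instantaneous response
        trace = (1 - delta) * trace + delta * y    # blend with recent activity
        w = [wi + alpha * trace * (xi - wi) for wi, xi in zip(w, x)]
    return w


def response(weights, x):
    """Response of the linear unit to input x."""
    return sum(wi * xi for wi, xi in zip(weights, x))


# Two hypothetical, non-overlapping "views" of the same object.
view_a, view_b = [1.0, 0.0], [0.0, 1.0]
sequence = [view_a, view_b] * 200          # the views alternate in time
initial = [1.0, 0.0]                       # unit initially tuned to view A only

with_trace = trace_learning(sequence, initial, delta=0.5)
no_memory = trace_learning(sequence, initial, delta=1.0)  # trace equals current y

print(response(with_trace, view_b))  # clearly above zero: the views are associated
print(response(no_memory, view_b))   # stays at zero: no association is learned
```
]]></preformat>
<p>In this toy setting, a unit that initially responds only to the first view comes to respond to both views when a trace is kept, whereas setting <italic>delta</italic> to 1 removes the memory of recent activity and no association forms.</p>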
<p>Previous psychophysical studies by other groups have shown that the visual system can indeed exploit temporal continuity to learn invariance (Wallis and B&#x000FC;lthoff, <xref ref-type="bibr" rid="B9">1999</xref>, <xref ref-type="bibr" rid="B10">2001</xref>; Cox et al., <xref ref-type="bibr" rid="B1">2005</xref>). Studies have also shown that, as predicted by the trace learning rule, the visual system can be made to learn false invariance by simulating false temporal continuity between distinct objects. Under the right experimental conditions, adult subjects can be made to confuse two completely different objects with each other after as little as 1 h of training (Li and DiCarlo, <xref ref-type="bibr" rid="B4">2010</xref>). This suggests that the visual system can and does rely on temporal continuity of objects to infer invariance, and that the ability to learn using this method persists in adulthood.</p>
<p>It also raises, however, a troubling question. If the visual system can be made to learn false invariance in this way, what is to prevent false invariances from disrupting object recognition all the time? This is a real possibility, because spurious temporal continuities are not at all uncommon under natural viewing conditions. Rapid movement (of the object or the observer) or sudden occlusions may cause distinct objects to be observed in close temporal proximity. Note that although some other learning rules (e.g., continuous spatial transformation learning; Ullman, <xref ref-type="bibr" rid="B8">1996</xref>) may be more robust to this type of disruption, it is known that at least in some cases basic temporal association learning is used by the visual system (Li and DiCarlo, <xref ref-type="bibr" rid="B4">2010</xref>). So what minimizes this invariance disruption and keeps object recognition robust and stable?</p>
<p>The study by Isik and colleagues provides a compelling potential answer. The authors describe a plausible network model of the primate visual cortex in which simulated visual cortical neurons learn invariance by using a version of the temporal trace rule. The model is based on the previously described HMAX model (Serre et al., <xref ref-type="bibr" rid="B7">2007</xref>). HMAX is a hierarchical feed-forward model that consists of multiple layers of visual neurons. Each layer extracts increasingly complex shape features of the image based on the input from the lower layer, and passes them on to the next higher layer [for details, see Figure 1 of Serre et al. (<xref ref-type="bibr" rid="B7">2007</xref>)]. Thus, each neuron in a given layer &#x0201C;listens to,&#x0201D; and integrates information from, multiple neurons in the previous layer, so that neurons in the topmost layer, arguably corresponding to those in the primate inferotemporal cortex, collectively contain a complex representation of the objects in the various input images.</p>
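<p>The alternation of feature extraction and pooling that gives such a hierarchy its tolerance to image changes can be sketched in a few lines. The example below is a one-dimensional toy, assuming a single template-matching stage followed by a max-pooling stage, in the spirit of the simple-cell-like and complex-cell-like layers of HMAX. The function names, the template, and the pooling size are illustrative assumptions rather than parameters of the published model.</p>
<preformat><![CDATA[
```python
def s_layer(signal, template):
    """Template matching at every position (simple-cell-like stage)."""
    k = len(template)
    return [sum(t * s for t, s in zip(template, signal[i:i + k]))
            for i in range(len(signal) - k + 1)]


def c_layer(s_responses, pool_size):
    """Max pooling over neighbouring positions (complex-cell-like stage)."""
    return [max(s_responses[i:i + pool_size])
            for i in range(0, len(s_responses), pool_size)]


template = [1, -1, 1]                      # hypothetical shape feature
image = [0, 0, 1, -1, 1, 0, 0, 0]          # feature present at position 2
shifted = [0, 0, 0, 1, -1, 1, 0, 0]        # same feature, shifted by one

s_image = s_layer(image, template)
s_shifted = s_layer(shifted, template)
c_image = c_layer(s_image, pool_size=4)
c_shifted = c_layer(s_shifted, pool_size=4)

print(s_image, s_shifted)   # position-specific responses differ
print(c_image, c_shifted)   # pooled responses are identical
```
]]></preformat>
<p>In this toy case the template-matching responses change when the feature shifts by one position, while the pooled responses do not, which is the kind of tolerance that higher layers build on.</p>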
<p>The authors augmented HMAX to incorporate a simplified, but effective, implementation of learning by temporal association, called the &#x0201C;modified trace rule.&#x0201D; This augmented model was able to reproduce a diagnostic feature of invariance learning: when trained with smooth temporal variations of a given object, such as a face [see Figure 1, <italic>top left</italic>, of Isik et al. (<xref ref-type="bibr" rid="B3">2012</xref>)], the neurons in the topmost layer of the network, individually and collectively, did learn an invariant representation of that object.</p>
<p>The authors then studied the behavior of the model when trained with image sequences that contained false temporal continuity. In each such sequence, images at all positions showed the same object (e.g., a face), except for one position (called the &#x0201C;swap position&#x0201D;) that showed a different object [e.g., a car; see Figure 1, <italic>top right</italic>, of Isik et al. (<xref ref-type="bibr" rid="B3">2012</xref>)]. As expected, invariance tuning of each cell trained with such a sequence was disrupted, with the cell responding to the main object at most locations, but responding more strongly to the swap object at the swap location. Thus, individual cells did faithfully learn false invariance.</p>
<p>However, the neuronal population as a whole still robustly represented all stimuli. The reason is that in the simulated experiment (as under natural viewing conditions), the disruptions were relatively infrequent, and their locations were random. As a result, for any given location, the majority of cells responded consistently, thus producing consistent population-level encoding. The authors found that disruptions of continuity in the training sequences did not appreciably affect the overall population response until the amount of altered exposure was as high as 25%. As expected, robustness of invariance improved as the size of the neural population increased. This confirms the intuition that invariances that rely on larger neural populations are harder to disrupt. Altogether, the central contribution of this model is the demonstration that a highly plausible implementation of trace learning can capture known key characteristics of object invariance, including the conditions in which it remains robust and the conditions in which it does not.</p>
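<p>The population-level argument can be illustrated with a toy simulation. The sketch below assumes that each simulated cell is independently mistuned at each location with a fixed probability, and that the population is read out by a simple majority vote. Both assumptions, along with all names and parameter values, are simplifications introduced here for illustration; the sketch does not reproduce the authors' analysis or the 25% figure.</p>
<preformat><![CDATA[
```python
import random


def train_population(n_cells, n_locations, swap_prob, rng):
    """Return per-cell tuning: True where a cell still prefers the main object.

    Each cell is independently mistuned at each location with probability
    swap_prob, a stand-in for altered exposure at a random swap position.
    """
    return [[rng.random() >= swap_prob for _ in range(n_locations)]
            for _ in range(n_cells)]


def population_readout(cells, location):
    """Majority vote across cells at one location."""
    votes = sum(cell[location] for cell in cells)
    return votes * 2 > len(cells)


def readout_accuracy(cells):
    """Fraction of locations at which the majority vote is correct."""
    n_locations = len(cells[0])
    correct = sum(population_readout(cells, loc) for loc in range(n_locations))
    return correct / n_locations


rng = random.Random(0)  # fixed seed so the run is repeatable
cells = train_population(n_cells=200, n_locations=9, swap_prob=0.1, rng=rng)

# Cells that are mistuned at one or more locations.
mistuned_cells = sum(not all(cell) for cell in cells)
print(mistuned_cells, "of", len(cells), "cells are mistuned somewhere")
print("population readout accuracy:", readout_accuracy(cells))
```
]]></preformat>
<p>Many individual cells in this sketch carry at least one false association, yet the majority vote remains correct at every location because the errors fall at different locations in different cells.</p>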
<p>The computational framework the authors have developed can also be used to address additional important questions about invariance. For example, it can be used to test whether invariance can be disrupted more easily when recognizing more similar objects (e.g., when distinguishing between several faces, as opposed to between cups and sailboats). It can also be used to compare invariances to various kinds of transformations, such as out-of-plane rotations or illumination changes.</p>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cox</surname> <given-names>D. D.</given-names></name> <name><surname>Meier</surname> <given-names>P.</given-names></name> <name><surname>Oertelt</surname> <given-names>N.</given-names></name> <name><surname>DiCarlo</surname> <given-names>J. J.</given-names></name></person-group> (<year>2005</year>). <article-title>&#x02018;Breaking&#x02019; position-invariant object recognition</article-title>. <source>Nat. Neurosci</source>. <volume>8</volume>, <fpage>1145</fpage>&#x02013;<lpage>1147</lpage>. <pub-id pub-id-type="doi">10.1038/nn1519</pub-id><pub-id pub-id-type="pmid">16116453</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>F&#x000F6;ldi&#x000E1;k</surname> <given-names>P.</given-names></name></person-group> (<year>1991</year>). <article-title>Learning invariance from transformation sequences</article-title>. <source>Neural Comput</source>. <volume>3</volume>, <fpage>194</fpage>&#x02013;<lpage>200</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1991.3.2.194</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Isik</surname> <given-names>L.</given-names></name> <name><surname>Leibo</surname> <given-names>J. Z.</given-names></name> <name><surname>Poggio</surname> <given-names>T.</given-names></name></person-group> (<year>2012</year>). <article-title>Learning and disrupting invariance in visual recognition with a temporal association rule</article-title>. <source>Front. Comput. Neurosci</source>. <volume>6</volume>:<issue>37</issue>. <pub-id pub-id-type="doi">10.3389/fncom.2012.00037</pub-id><pub-id pub-id-type="pmid">22754523</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>N.</given-names></name> <name><surname>DiCarlo</surname> <given-names>J. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Unsupervised natural visual experience rapidly reshapes size-invariant object representation in inferior temporal cortex</article-title>. <source>Neuron</source> <volume>67</volume>, <fpage>1062</fpage>&#x02013;<lpage>1075</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2010.08.029</pub-id><pub-id pub-id-type="pmid">20869601</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Logothetis</surname> <given-names>N. K.</given-names></name> <name><surname>Pauls</surname> <given-names>J.</given-names></name> <name><surname>Poggio</surname> <given-names>T.</given-names></name></person-group> (<year>1995</year>). <article-title>Shape representation in the inferior temporal cortex of monkeys</article-title>. <source>Curr. Biol</source>. <volume>5</volume>, <fpage>552</fpage>&#x02013;<lpage>563</lpage>. <pub-id pub-id-type="doi">10.1016/S0960-9822(95)00108-4</pub-id><pub-id pub-id-type="pmid">7583105</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rolls</surname> <given-names>E. T.</given-names></name></person-group> (<year>2012</year>). <article-title>Invariant visual object and face recognition: neural and computational bases, and a model, VisNet</article-title>. <source>Front. Comput. Neurosci</source>. <volume>6</volume>:<issue>35</issue>. <pub-id pub-id-type="doi">10.3389/fncom.2012.00035</pub-id><pub-id pub-id-type="pmid">22723777</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Serre</surname> <given-names>T.</given-names></name> <name><surname>Wolf</surname> <given-names>L.</given-names></name> <name><surname>Bileschi</surname> <given-names>S.</given-names></name> <name><surname>Riesenhuber</surname> <given-names>M.</given-names></name> <name><surname>Poggio</surname> <given-names>T.</given-names></name></person-group> (<year>2007</year>). <article-title>Robust object recognition with cortex-like mechanisms</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>29</volume>, <fpage>411</fpage>&#x02013;<lpage>426</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2007.56</pub-id><pub-id pub-id-type="pmid">17224612</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ullman</surname> <given-names>S.</given-names></name></person-group> (<year>1996</year>). <source>High-Level Vision. Object Recognition and Visual Cognition</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wallis</surname> <given-names>G.</given-names></name> <name><surname>B&#x000FC;lthoff</surname> <given-names>H. H.</given-names></name></person-group> (<year>1999</year>). <article-title>Learning to recognize objects</article-title>. <source>Trends Cogn. Sci</source>. <volume>3</volume>, <fpage>22</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1016/S1364-6613(98)01261-3</pub-id><pub-id pub-id-type="pmid">10234223</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wallis</surname> <given-names>G.</given-names></name> <name><surname>B&#x000FC;lthoff</surname> <given-names>H. H.</given-names></name></person-group> (<year>2001</year>). <article-title>Effects of temporal association on recognition memory</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>98</volume>, <fpage>4800</fpage>&#x02013;<lpage>4804</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.071028598</pub-id><pub-id pub-id-type="pmid">11287633</pub-id></citation>
</ref>
</ref-list>
</back>
</article>