<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2013.00388</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Review Article</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Speech through ears and eyes: interfacing the senses with the supramodal brain</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>van Wassenhove</surname> <given-names>Virginie</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Cognitive Neuroimaging Unit, Brain Dynamics, INSERM, U992</institution> <country>Gif/Yvette, France</country></aff>
<aff id="aff2"><sup>2</sup><institution>NeuroSpin Center, CEA, DSV/I2BM</institution> <country>Gif/Yvette, France</country></aff>
<aff id="aff3"><sup>3</sup><institution>Cognitive Neuroimaging Unit, University Paris-Sud</institution> <country>Gif/Yvette, France</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Nicholas Altieri, Idaho State University, USA</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Nicholas Altieri, Idaho State University, USA; Luc H. Arnal, New York University, USA</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Virginie van Wassenhove, CEA/DSV/I2BM/Neurospin, B&#x000E2;t 145 Point courrier 156, Gif/Yvette 91191, France e-mail: <email>Virginie.van-Wassenhove&#x00040;cea.fr</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Frontiers in Language Sciences, a specialty of Frontiers in Psychology.</p></fn>
</author-notes>
<pub-date pub-type="epreprint">
<day>28</day>
<month>04</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="epub">
<day>12</day>
<month>07</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="collection">
<year>2013</year>
</pub-date>
<volume>4</volume>
<elocation-id>388</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>04</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>06</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2013 van Wassenhove.</copyright-statement>
<copyright-year>2013</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.</p>
</license>
</permissions>
<abstract><p>Our understanding of auditory-visual (AV) speech integration has greatly benefited from recent advances in neurosciences and multisensory research. AV speech integration raises numerous questions relevant to the computational rules needed for binding information (within and across sensory modalities), the representational format in which speech information is encoded in the brain (e.g., auditory vs. articulatory), or how AV speech ultimately interfaces with the linguistic system. The following non-exhaustive review provides a set of empirical findings and theoretical questions that have fed the original proposal for predictive coding in AV speech processing. More recently, predictive coding has pervaded many fields of inquiry and positively reinforced the need to refine the notion of internal models in the brain, together with their implications for the interpretation of neural activity recorded with various neuroimaging techniques. However, it is argued here that the strength of predictive coding frameworks resides in the specificity of the generative internal models, not in their generality; specifically, internal models come with a set of rules applied to particular representational formats, which themselves depend on the levels and the network structure at which predictive operations occur. As such, predictive coding in AV speech needs to specify the level(s) and the kinds of internal predictions that are necessary to account for the perceptual benefits or illusions observed in the field. Among those specifications, the actual content of a prediction comes first and foremost, followed by the representational granularity of that prediction in time. This review presents a focused discussion on these issues.</p></abstract>
<kwd-group>
<kwd>analysis-by-synthesis</kwd>
<kwd>predictive coding</kwd>
<kwd>multisensory integration</kwd>
<kwd>Bayesian priors</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="200"/>
<page-count count="17"/>
<word-count count="14458"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<title>Introduction</title>
<p>In natural conversational settings, watching an interlocutor&#x00027;s face does not solely provide information about the speaker&#x00027;s identity or emotional state: the kinematics of the face articulating speech can robustly influence the processing and comprehension of auditory speech. Although audiovisual (AV) speech perception is ecologically relevant, classic models have predominantly accounted for speech processing on the basis of acoustic inputs (e.g., Figure <xref ref-type="fig" rid="F1">1</xref>). From an evolutionary standpoint, proximal communication naturally engages multisensory interactions (i.e., vision, audition, and touch), but only recently has multisensory integration in the communication system of primates begun to be investigated neurophysiologically (Ghazanfar and Logothetis, <xref ref-type="bibr" rid="B61">2003</xref>; Barraclough et al., <xref ref-type="bibr" rid="B16">2005</xref>; Ghazanfar et al., <xref ref-type="bibr" rid="B63">2005</xref>, <xref ref-type="bibr" rid="B60">2008</xref>; Kayser et al., <xref ref-type="bibr" rid="B85">2007</xref>, <xref ref-type="bibr" rid="B84">2010</xref>; Kayser and Logothetis, <xref ref-type="bibr" rid="B83">2009</xref>; Arnal and Giraud, <xref ref-type="bibr" rid="B6">2012</xref>). Advances in multisensory research have raised core issues: how early does multisensory integration occur during perceptual processing (Talsma et al., <xref ref-type="bibr" rid="B176">2010</xref>)? In which representational format do sensory modalities interface for supramodal (Pascual-Leone and Hamilton, <xref ref-type="bibr" rid="B127">2001</xref>; Voss and Zatorre, <xref ref-type="bibr" rid="B190">2012</xref>) and speech analysis (Summerfield, <xref ref-type="bibr" rid="B174">1987</xref>; Altieri et al., <xref ref-type="bibr" rid="B4">2011</xref>)? Which neuroanatomical pathways are implicated (Calvert and Thesen, <xref ref-type="bibr" rid="B36">2004</xref>; Ghazanfar and Schroeder, <xref ref-type="bibr" rid="B62">2006</xref>; Driver and Noesselt, <xref ref-type="bibr" rid="B55">2008</xref>; Murray and Spierer, <xref ref-type="bibr" rid="B120">2011</xref>)? In Humans, visual speech plays an important role in social interactions (de Gelder et al., <xref ref-type="bibr" rid="B51">1999</xref>) but also, and crucially, interfaces with the language system at various depths of linguistic processing (e.g., McGurk and MacDonald, <xref ref-type="bibr" rid="B113">1976</xref>; Auer, <xref ref-type="bibr" rid="B10">2002</xref>; Brancazio, <xref ref-type="bibr" rid="B25">2004</xref>; Campbell, <xref ref-type="bibr" rid="B40">2008</xref>). AV speech thus provides an appropriate model to address the emergence of supramodal or abstract representations in the Human mind and to build upon a rich theoretical and empirical framework elaborated in linguistic research in general (Chomsky, <xref ref-type="bibr" rid="B45">2000</xref>) and in speech research in particular (Chomsky and Halle, <xref ref-type="bibr" rid="B46">1968</xref>; Liberman and Mattingly, <xref ref-type="bibr" rid="B98">1985</xref>).</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Classic information-theoretic description of speech processing</bold>. Classic models of speech processing have been construed on the basis of the acoustics of speech, leaving aside the important contribution of visual speech inputs. As a result, the main question in audiovisual (AV) speech processing has been: when does visual speech information integrate with auditory speech? The two main alternatives are before (acoustic or phonetic features, &#x0201C;early&#x0201D; integration) or after (&#x0201C;late&#x0201D; integration) the phonological categorization of the auditory speech inputs (see also Schwartz et al., <xref ref-type="bibr" rid="B149">1998</xref>). However, this model unrealistically frames and biases the question of &#x0201C;when&#x0201D; by imposing a serial, linear, and hierarchical architecture on speech processing.</p></caption>
<graphic xlink:href="fpsyg-04-00388-g0001.tif"/>
</fig>
</sec>
<sec>
<title>Weighting sensory evidence against internal non-invariance</title>
<p>Speech theories have seldom incorporated visual information as raw material for speech processing (Green, <xref ref-type="bibr" rid="B71">1996</xref>; Schwartz et al., <xref ref-type="bibr" rid="B149">1998</xref>), although normal hearing and hearing-impaired populations greatly benefit from looking at the interlocutor&#x00027;s face (Sumby and Pollack, <xref ref-type="bibr" rid="B173">1954</xref>; Erber, <xref ref-type="bibr" rid="B56">1978</xref>; MacLeod and Summerfield, <xref ref-type="bibr" rid="B105">1987</xref>; Grant and Seitz, <xref ref-type="bibr" rid="B68">1998</xref>, <xref ref-type="bibr" rid="B69">2000</xref>). If any benefit for speech encoding is to be gained in the integration of AV information, the informational content provided by each sensory modality is likely to be partially, but not solely, redundant, i.e., complementary. For instance, the efficiency of AV speech integration is known to depend not only on the amount of information extracted in each sensory modality but also on its variability (Grant et al., <xref ref-type="bibr" rid="B70">1998</xref>). Understanding the limitations and processing constraints of each sensory modality is thus important for understanding how non-invariance in speech signals leads to invariant representations in the brain. In that regard, should speech processing be considered &#x0201C;special&#x0201D;? The historical debate is outside the scope of this review, but it is here considered that positing an internal model dedicated to speech analysis is legitimate to account for (i) the need for invariant representations in the brain, (ii) the parsimonious sharing of generative rules for perception/production, and (iii) the ultimate interfacing of the (AV) communication system with the Human linguistic system. As such, this review focuses on the specificities of AV speech, not on the general guiding principles of multisensory (AV) integration.</p>
<sec>
<title>Temporal parsing and non-invariance</title>
<p>A canonical puzzle in (auditory, visual and AV) speech processing is how the brain correctly parses a continuous flow of sensory information. Like auditory speech, the visible kinematics of articulatory gestures hardly provide an invariant structuring of information over time (Kent, <xref ref-type="bibr" rid="B86">1983</xref>; Tuller and Kelso, <xref ref-type="bibr" rid="B181">1984</xref>; Saltzman and Munhall, <xref ref-type="bibr" rid="B144">1989</xref>; Schwartz et al., <xref ref-type="bibr" rid="B151">2012</xref>), yet temporal information in speech is critical (Rosen, <xref ref-type="bibr" rid="B140">1992</xref>; Greenberg, <xref ref-type="bibr" rid="B72">1998</xref>). Auditory speech is typically sufficient to provide a high level of intelligibility (e.g., over the phone) and, accordingly, the auditory system can parse incoming speech information with high temporal acuity (Poeppel, <xref ref-type="bibr" rid="B130">2003</xref>; Morillon et al., <xref ref-type="bibr" rid="B117">2010</xref>; Giraud and Poeppel, <xref ref-type="bibr" rid="B65">2012</xref>). Conversely, visual speech alone leads to poor intelligibility scores (Campbell, <xref ref-type="bibr" rid="B38">1989</xref>; Massaro, <xref ref-type="bibr" rid="B111">1998</xref>), and visual processing is characterized by a slower sampling rate (Busch and VanRullen, <xref ref-type="bibr" rid="B28">2010</xref>). The slow timescales over which visible articulatory gestures evolve (and are extracted by the observer&#x00027;s brain) constrain the representational granularity of visual information to visemes, categories much less distinctive than phonemes.</p>
<p>In auditory neuroscience, the specificity of phonetic processing and phonological categorization has long been investigated (Maiste et al., <xref ref-type="bibr" rid="B108">1995</xref>; Simos et al., <xref ref-type="bibr" rid="B161">1998</xref>; Li&#x000E9;geois et al., <xref ref-type="bibr" rid="B99">1999</xref>; Sharma and Dorman, <xref ref-type="bibr" rid="B158">1999</xref>; Philips et al., <xref ref-type="bibr" rid="B128">2000</xref>). The peripheral mammalian auditory system has been proposed to efficiently encode a broad category of natural acoustic signals by using a time-frequency representation (Lewicki, <xref ref-type="bibr" rid="B95">2002</xref>; Smith and Lewicki, <xref ref-type="bibr" rid="B163">2006</xref>). In this body of work, the characteristics of auditory filters heavily depend on the statistical characteristics of sounds: as such, auditory neural coding schemes show plasticity as a function of acoustic inputs. The intrinsic neural tuning properties allow for multiple modes of acoustic processing with trade-offs in the time and frequency domains, which naturally partition the time-frequency space into sub-regions. Complementary findings show that efficient coding can be realized for speech inputs (Smith and Lewicki, <xref ref-type="bibr" rid="B163">2006</xref>), supporting the notion that the statistical properties of auditory speech can drive different modes of information extraction in the same neural populations, an observation consistent with the &#x0201C;speech mode&#x0201D; hypothesis (Remez et al., <xref ref-type="bibr" rid="B139">1998</xref>; Tuomainen et al., <xref ref-type="bibr" rid="B182">2005</xref>; Stekelenburg and Vroomen, <xref ref-type="bibr" rid="B168">2012</xref>).</p>
<p>In visual speech, how the brain derives speech-relevant information from seeing the dynamics of the facial articulators remains unclear. While the neuropsychology of lipreading has been thoroughly described (Campbell, <xref ref-type="bibr" rid="B37">1986</xref>, <xref ref-type="bibr" rid="B38">1989</xref>, <xref ref-type="bibr" rid="B39">1992</xref>), very few studies have specifically addressed the neural underpinnings of visual speech processing (Calvert, <xref ref-type="bibr" rid="B32">1997</xref>; Calvert and Campbell, <xref ref-type="bibr" rid="B34">2003</xref>). Visual speech is a particular form of biological motion that readily engages some face-specific sub-processes (Campbell, <xref ref-type="bibr" rid="B37">1986</xref>, <xref ref-type="bibr" rid="B39">1992</xref>) but remains functionally independent from typical face processing modules (Campbell, <xref ref-type="bibr" rid="B39">1992</xref>). Insights into the neural bases of visual speech processing may be provided by studies of biological motion (Grossman et al., <xref ref-type="bibr" rid="B73">2000</xref>; Vaina et al., <xref ref-type="bibr" rid="B183">2001</xref>; Servos et al., <xref ref-type="bibr" rid="B157">2002</xref>), and the finding of mouth-movement-specific cells in temporal cortex provides a complementary starting point (Desimone and Gross, <xref ref-type="bibr" rid="B54">1979</xref>; Puce et al., <xref ref-type="bibr" rid="B133">1998</xref>; Hans-Otto, <xref ref-type="bibr" rid="B75">2001</xref>). Additionally, case studies (specifically, prosopagnosia and akinetopsia) have suggested that both form and motion are necessary for the processing of visual and AV speech (Campbell et al., <xref ref-type="bibr" rid="B41">1990</xref>; Campbell, <xref ref-type="bibr" rid="B39">1992</xref>). In line with this, an unexplored hypothesis for the neural encoding of facial kinematics is the use of form-from-motion computations (Cathiard and Abry, <xref ref-type="bibr" rid="B43">2007</xref>), which could help the implicit recovery of articulatory commands from seeing the speaking face (e.g., Viviani et al., <xref ref-type="bibr" rid="B189">2011</xref>).</p>
</sec>
<sec>
<title>Active sampling of visual speech cues</title>
<p>In spite of the limited informational content provided by visual speech (most articulatory gestures remain hidden), AV speech integration is resilient to further degradation of the visual speech signal. Numerous filtering approaches do not suppress integration (Rosenblum and Salda&#x000F1;a, <xref ref-type="bibr" rid="B142">1996</xref>; Campbell and Massaro, <xref ref-type="bibr" rid="B42">1997</xref>; Jordan et al., <xref ref-type="bibr" rid="B82">2000</xref>; MacDonald et al., <xref ref-type="bibr" rid="B103">2000</xref>), suggesting the use of multiple visual cues [e.g., luminance patterns (Jordan et al., <xref ref-type="bibr" rid="B82">2000</xref>); kinematics (Rosenblum and Salda&#x000F1;a, <xref ref-type="bibr" rid="B142">1996</xref>)]. Additionally, neither the gender (Walker et al., <xref ref-type="bibr" rid="B193">1995</xref>) nor the familiarity (Rosenblum and Yakel, <xref ref-type="bibr" rid="B143">2001</xref>) of the face impacts the robustness of AV speech integration. As will be discussed later, AV speech integration also remains resilient to large AV asynchronies (cf. <italic>Resilient temporal integration and the co-modulation hypothesis</italic>). Visual kinematics alone are sufficient to maintain a high rate of AV integration (Rosenblum and Salda&#x000F1;a, <xref ref-type="bibr" rid="B142">1996</xref>), but whether foveal (i.e., explicit lip-reading with focus on the mouth area) or extra-foveal (e.g., global kinematics) information is most relevant for visemic categorization remains unclear.</p>
<p>Interestingly, gaze fixations 10&#x02013;20&#x000B0; away from the mouth are sufficient to extract relevant speech information, but numerous eye movements have also been reported (Vatikiotis-Bateson et al., <xref ref-type="bibr" rid="B188">1998</xref>; Par&#x000E9; et al., <xref ref-type="bibr" rid="B126">2003</xref>). It is noteworthy that changes of gaze direction can be crucial for the extraction of auditory information, as neural tuning properties throughout the auditory pathway are modulated by gaze direction (Werner-Reiss et al., <xref ref-type="bibr" rid="B197">2003</xref>) and auditory responses are affected by changes in visual fixations (Rajkai et al., <xref ref-type="bibr" rid="B135">2008</xref>; van Wassenhove et al., <xref ref-type="bibr" rid="B185">2012</xref>). These results suggest an interesting working hypothesis: the active scanning of a speaker&#x00027;s face may compensate for the slow sampling rate of the visual system.</p>
<p>Hence, despite the impoverished signals provided by visual speech, additional degradation does not fully prevent AV speech integration. As such, (supramodal) AV speech processing is more likely than not a natural mode of processing in which the contribution of visual speech to the perceptual outcome may be regulated as a function of the needs for perceptual completion in the system.</p>
</sec>
<sec>
<title>AV speech mode hypothesis</title>
<p>Several findings have suggested that AV signals displayed in a speech vs. a non-speech mode influence both behavioral and electrophysiological responses (Tuomainen et al., <xref ref-type="bibr" rid="B182">2005</xref>; Stekelenburg and Vroomen, <xref ref-type="bibr" rid="B168">2012</xref>). Several observations could complement this view. First, lip-reading stands as a natural ability that is difficult to improve (as opposed to reading ability; Campbell, <xref ref-type="bibr" rid="B39">1992</xref>) and is a good predictor of AV speech integration (Grant et al., <xref ref-type="bibr" rid="B70">1998</xref>). In line with these observations, and as will be discussed later on, AV speech integration undergoes a critical acquisition period (Schorr et al., <xref ref-type="bibr" rid="B146">2005</xref>).</p>
<p>Second, within the context of an internal speech model, AV speech integration is not arbitrary and follows principled internal rules. In the seminal work of McGurk and MacDonald (<xref ref-type="bibr" rid="B113">1976</xref>; MacDonald and McGurk, <xref ref-type="bibr" rid="B102">1978</xref>), two types of phenomena illustrate principled ways in which AV speech integration occurs. In <italic>fusion</italic>, dubbing an auditory bilabial (e.g., [ba] or [pa]) onto a visual velar place of articulation (e.g., [ga] or [ka]) leads to an illusory fused alveolar percept (e.g., [da] or [ta], respectively). Conversely, in <italic>combination</italic>, dubbing an auditory [ga] onto a visual place of articulation [ba] leads to the illusory combination percept [bga]. Fusion has been used as an index of automatic AV speech integration because it leads to a unique perceptual outcome that is nothing like any of the original sensory inputs (i.e., neither a [ga] nor a [ba], but a third percept). Combination has been much less studied: unlike fusion, the resulting percept is not unique but rather a product of co-articulated speech information (such as [bga]). Both fusion and combination provide convenient (albeit arguable) indices of whether AV speech integration has occurred or not. These effects can be generalized across places-of-articulation in stop-consonants, such that any auditory bilabial dubbed onto a visual velar results in a misperceived alveolar. These two kinds of illusory AV speech outputs illustrate the complexity of AV interactions and suggest that the informational content carried by each sensory modality determines the nature of AV interactions during speech processing. A strong hypothesis is that internal principles should depend on the articulatory repertoire of a given language, and few cross-linguistic studies have addressed this issue (Sekiyama and Tohkura, <xref ref-type="bibr" rid="B156">1991</xref>; Sekiyama, <xref ref-type="bibr" rid="B153">1994</xref>, <xref ref-type="bibr" rid="B154">1997</xref>).</p>
<p>Inherent to the speech mode hypothesis is the attentional-independence of speech analysis. Automaticity in AV speech processing (and in multisensory integration) is a matter of great debate (Talsma et al., <xref ref-type="bibr" rid="B176">2010</xref>). A recent finding (Alsius and Munhall, <xref ref-type="bibr" rid="B2">2013</xref>) suggests that conscious awareness of a face is not necessary for McGurk effects (cf. also Vidal et al. submitted, pers. communication). While attention may regulate the weight of sensory information being processed in each sensory modality&#x02014;e.g., via selective attention (Lakatos et al., <xref ref-type="bibr" rid="B92">2008</xref>; Schroeder and Lakatos, <xref ref-type="bibr" rid="B147">2009</xref>)&#x02014;attention does not a priori overtake the internal generative rules for speech processing. In other words, while the strength of AV speech integration can be modulated (Tiippana et al., <xref ref-type="bibr" rid="B179">2003</xref>; Soto-Faraco et al., <xref ref-type="bibr" rid="B164">2004</xref>; Alsius et al., <xref ref-type="bibr" rid="B3">2005</xref>; van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>), AV speech integration is not fully abolished in integrators.</p>
<p>The robustness and principled ways in which visual speech influences auditory speech processing suggest that the neural underpinnings of AV speech integration rely on specific computational mechanisms that are constrained by the internal rules of the speech processing system&#x02014;and possibly modulated by attentional focus on one or the other stream of information. I now elaborate on possible predictive implementations and tenets of AV speech integration.</p>
</sec>
</sec>
<sec>
<title>Predictive coding, priors, and the Bayesian brain</title>
<p>A majority of mental operations are cognitively impenetrable, i.e., inaccessible to conscious awareness (Pylyshyn, <xref ref-type="bibr" rid="B134">1984</xref>; Kihlstrom, <xref ref-type="bibr" rid="B88">1987</xref>). Proposed more than a century ago [Parrot (cf. Allik and Konstabel, <xref ref-type="bibr" rid="B1">2005</xref>); Helmholtz (cf. MacKay, <xref ref-type="bibr" rid="B104">1958</xref>; Barlow, <xref ref-type="bibr" rid="B14">1990</xref>); Wundt (<xref ref-type="bibr" rid="B199">1874</xref>)], the notion of unconscious inference later cast sensory processing as a means to remove redundant information from the incoming signals, based on the learned natural statistics of sensory events. For instance, efficient coding disambiguates incoming sensory information using mutual inhibition as a means to decorrelate mixed signals: a network can locally generate hypotheses on the basis of a known (learned) matrix from which inversion can be drawn for prediction (Barlow, <xref ref-type="bibr" rid="B13">1961</xref>; Srinivasan et al., <xref ref-type="bibr" rid="B166">1982</xref>; Barlow and F&#x000F6;ldiak, <xref ref-type="bibr" rid="B15">1989</xref>). Predictive coding can be local, for instance with a specific instantiation in the architecture of the retina (Hosoya et al., <xref ref-type="bibr" rid="B78">2005</xref>). Early predictive models have essentially focused on the removal of redundant information in the spatial domain. Recently, predictive models have incorporated more sophisticated levels of predictions (Harth et al., <xref ref-type="bibr" rid="B76">1987</xref>; Rao and Ballard, <xref ref-type="bibr" rid="B136">1999</xref>; Friston, <xref ref-type="bibr" rid="B59">2005</xref>). For instance, Harth et al. (<xref ref-type="bibr" rid="B76">1987</xref>) proposed a predictive model in which feedback connectivity shapes the extraction of information early in the visual hierarchy, and such regulation of V1 activity in the analysis of sensory inputs has also been tested (Sharma et al., <xref ref-type="bibr" rid="B160">2003</xref>). The initial conception of &#x0201C;top&#x02013;down&#x0201D; regulation has been complemented with the notion that feed-forward connections may not carry the extracted information <italic>per se</italic> but rather the residual error between &#x0201C;top&#x02013;down&#x0201D; internal predictions and the incoming sensory evidence (Rao and Ballard, <xref ref-type="bibr" rid="B136">1999</xref>).</p>
<p>A growing body of evidence supports the view that the brain is a hierarchically organized inferential system in which internal hypotheses or predictions are generated at higher levels and tested against evidence at lower levels along the neural pathways (Friston, <xref ref-type="bibr" rid="B59">2005</xref>): predictions are carried by backward and lateral connections, whereas prediction errors are carried by forward projections. Predictive coding schemes have thus gone from local circuitries to brain systems, seemingly suggesting that access to high-level representations is necessary to formulate efficient predictions.</p>
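<p>To make the notion of residual error concrete, the following minimal sketch (in Python) illustrates the basic scheme attributed to Rao and Ballard (<xref ref-type="bibr" rid="B136">1999</xref>): a top&#x02013;down prediction of the sensory input is generated from an internal representation, and only the mismatch between prediction and input is propagated forward. The weights, input values, and learning rate are hypothetical and purely illustrative; they are not parameters of any of the cited models.</p>
<preformat>
import numpy as np

# Minimal, illustrative single-layer predictive coding loop (hypothetical values).
rng = np.random.default_rng(0)
n_input, n_latent = 8, 3
W = rng.normal(size=(n_input, n_latent))   # generative ("top-down") weights
x = rng.normal(size=n_input)               # incoming sensory evidence
r = np.zeros(n_latent)                     # internal (latent) hypothesis

learning_rate = 0.05
for step in range(200):
    prediction = W @ r                     # top-down prediction of the input
    error = x - prediction                 # residual error carried forward
    r += learning_rate * (W.T @ error)     # refine the hypothesis to reduce the error

print("remaining residual error (L2 norm):", np.linalg.norm(x - W @ r))
</preformat>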
<sec>
<title>Fixed vs. informed priors</title>
<p>Conservatively, any architectural constraint (e.g., connectivity pattern, gross neuroanatomical pathways), knowledge and circuitry acquired during a sensitive and before a critical period, or the endowment of the system can all be considered deterministic or <italic>fixed priors</italic>. Conversely, <italic>informed priors</italic> are any form of knowledge acquired through experience and updated through plastic changes.</p>
<p>At the system level, a common neurophysiological index taken as evidence for predictive coding in cortex is the mismatch negativity (MMN) response (N&#x000E4;&#x000E4;t&#x000E4;nen et al., <xref ref-type="bibr" rid="B122">1978</xref>; N&#x000E4;&#x000E4;t&#x000E4;nen, <xref ref-type="bibr" rid="B121">1995</xref>): the MMN is classically elicited by the presentation of a rare event (&#x0007E;20% of the time) in the context of standard events (&#x0007E;80% of the time). The most convincing evidence for the MMN as a residual error resulting from the comparison of an internal prediction with incoming sensory evidence is the case of the MMN to omission, namely an MMN elicited when an event is omitted in a predictable sequence of events (Tervaniemi et al., <xref ref-type="bibr" rid="B177">1994</xref>; Yabe et al., <xref ref-type="bibr" rid="B200">1997</xref>; Czigler et al., <xref ref-type="bibr" rid="B50">2006</xref>). Other classes of electrophysiological responses have been interpreted as residual errors elicited by a deviance at different levels of perceptual or linguistic complexities (e.g., the N400; Lau et al., <xref ref-type="bibr" rid="B94">2008</xref>). Recent findings have also pointed to the hierarchical level at which statistical contingencies can be incorporated in a predictive model (Wacongne et al., <xref ref-type="bibr" rid="B192">2011</xref>). Altogether, these results are in line with recent hierarchical models of predictive coding, in which the complexity of the prediction depends on the depth of recursion in the predictive model (Kiebel et al., <xref ref-type="bibr" rid="B87">2008</xref>).</p>
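<p>As a purely illustrative sketch of why a rare deviant (or an omission) yields a large residual error under the temporary statistics of an oddball session, the short Python example below computes the surprise (negative log-probability, in bits) of standards and deviants given the 80/20 design described above; the sequence itself and the session length are hypothetical.</p>
<preformat>
import numpy as np

# Toy oddball stream: 80% standards ("S"), 20% deviants ("D"), as in the design above.
rng = np.random.default_rng(1)
sequence = rng.choice(["S", "D"], size=50, p=[0.8, 0.2])

# Surprise (in bits) of each event under the statistics learned within the session.
p = {"S": 0.8, "D": 0.2}
surprise = [-np.log2(p[event]) for event in sequence]

print("mean surprise, standards:", np.mean([s for s, e in zip(surprise, sequence) if e == "S"]))
print("mean surprise, deviants :", np.mean([s for s, e in zip(surprise, sequence) if e == "D"]))
# An omitted event in a fully predictable sequence has a probability close to zero under
# the internal model, so its surprise diverges: the MMN to omission is the limiting case.
</preformat>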
<p>In AV speech, the seminal work of Sams and Aulanko (<xref ref-type="bibr" rid="B145">1991</xref>) used an MMN paradigm with magnetoencephalography (MEG). Using congruent and incongruent (McGurk: audio [pa] dubbed onto visual [ka]) stimuli, the authors found that the presentation of an incongruent (congruent) AV speech deviant in a stream of congruent (incongruent) AV speech standards elicited a robust auditory MMN. Since then, a series of MMN studies has replicated these findings (Colin et al., <xref ref-type="bibr" rid="B47">2002</xref>; M&#x000F6;tt&#x000F6;nen et al., <xref ref-type="bibr" rid="B118">2002</xref>, <xref ref-type="bibr" rid="B119">2004</xref>), and the sources of the MMN were consistently located in auditory association areas, about 150 to 200 ms following auditory onset, and in the superior temporal sulcus from 250 ms on. The bulk of the literature using MMN in AV speech therefore suggests that internal predictions generated in the auditory regions incorporate visual information relevant for the analysis of speech.</p>
<p>Critically, it is here argued that internal models invoked for speech processing are part of the cognitive architecture, i.e., likely endowed with fixed priors for the analysis of (speech) inputs. The benefit of positing an internal model is precisely to account for robust and invariant internal representations that are resilient to the ever-changing fluctuations of a sensory environment. As such, a predictive model should help refine the internal representations in light of sensory evidence, not entirely shape the internal prediction on the basis of the temporary environmental statistics.</p>
<p>In this context, the temporal statistics of stimuli using an MMN paradigm (e.g., 80% standards, 20% deviants) confine predictions to the temporary experimental context: the residual error is context-specific and tied to the temporary statistics of inputs provided within a particular experimental session. Thus, the MMN may not necessarily reveal fixed priors or specific hard-wired constraints of the system. An internal model should provide a means to stabilize non-invariance in order to counteract the highly variable nature of speech utterances irrespective of the temporally local context. A strong prediction is thus that the fixed priors of an internal model should supersede the temporary statistics of stimuli during a particular experimental session. Specifically, if predictive coding is a canonical operation of cortical function, residual errors should be the rule, not the exception, and residual errors should be informative with respect to the content of the prediction, not only with respect to the temporal statistics of the sensory evidence. Following this observation, an experimental design using an equal number of different types of stimuli should reveal predictive coding indices that specifically target the hard constraints or fixed priors of the system. In AV speech, auditory event-related potentials elicited by the presentation of AV speech stimuli show dependencies on the content of visual speech stimuli: auditory event-related potentials could thus be interpreted as the resulting residual errors of a comparison process between auditory and visual speech inputs (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>).</p>
<p>The argument elaborated here is that to enable a clear interpretation of neurophysiological and neuroimaging data using predictive approaches, the description of the internal model being tested, along with the levels at which predictions are expected to occur (hence, the representational format and content of the internal predictors), has become necessary. For instance, previous electrophysiological indices of AV speech integration (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>), including latency (interpreted as visual modulations of auditory responses that are speech content-dependent) and amplitude (interpreted as visual modulations of auditory responses that are speech content-independent) effects, are not incompatible with the amplitude effects reported in other studies (e.g., Stekelenburg and Vroomen, <xref ref-type="bibr" rid="B167">2007</xref>). AV speech integration implicates speech-specific predictions (e.g., phonetic, syllabic, articulatory representations) but also entails more general operations such as temporal expectation or attentional modulation. As such, the latency effects showed speech selectivity whereas amplitude effects did not; the former may index speech-content predictions coupled with temporal expectations, whereas the latter may inform on general predictive rules. Hierarchical levels can operate predictively in a non-exclusive and parallel manner. The benefit of predictive coding approaches is thus the refinement of internal generative models, their specificity with regard to the combinatorial rules that are being used, and the representational formats and contents of the different levels of predictions implicated in the model.</p>
</sec>
<sec>
<title>Bayesian implementation of predictive coding</title>
<p>Can Bayesian computations serve predictive coding for speech processing? Recent advances in computational neurosciences have offered a wealth of insights into the Bayesian brain (Den&#x000E8;ve and Pouget, <xref ref-type="bibr" rid="B53">2004</xref>; Ernst and B&#x000FC;lthoff, <xref ref-type="bibr" rid="B57">2004</xref>; Ma et al., <xref ref-type="bibr" rid="B101">2006</xref>; Yuille and Kersten, <xref ref-type="bibr" rid="B201">2006</xref>) and have opened new and essential avenues for the interpretation of perceptual and cognitive operations.</p>
<p>AV speech research has seen the emergence of one of the first Bayesian models for perception, the Fuzzy Logical Model of Perception or FLMP (Massaro, <xref ref-type="bibr" rid="B110">1987</xref>, <xref ref-type="bibr" rid="B111">1998</xref>). In the initial FLMP, the detection and the evaluation stages in speech processing were independent and eventually merged into a single evaluation process (Massaro, <xref ref-type="bibr" rid="B111">1998</xref>). At this level, each speech signal is independently evaluated against prototypes in memory and assigned a &#x0201C;fuzzy truth value&#x0201D; representing how well the input matches a given prototype. The fuzzy truth value could range from 0 (does not match at all) to 1 (exactly matches the prototype); the prototypical feature represents the ideal value that an exemplar of the prototype holds&#x02014;i.e., 1 in fuzzy logic&#x02014;hence the probability that a feature is present in the speech inputs. The prototypes are defined as speech categories that provide an ensemble of features and their conjunctions (Massaro, <xref ref-type="bibr" rid="B110">1987</xref>). In AV speech processing, the 0 to 1 mapping in each sensory modality allowed the use of Bayesian conditional probabilities, and computations would take the following form: what is the probability that an AV speech input is a [ba] given a 0.6 probability of being a bilabial in the auditory domain and a 0.7 probability in the visual domain? The best outcome is selected based on the goodness-of-fit determined by prior evidence through a maximum likelihood procedure. Hence, in this scheme, the independence of sensory modalities is necessary to allow the combination of two feature estimates (e.g., places-of-articulation), and a compromise is reached at the decision stage through adjustments of the model with additional sensory evidence. In the FLMP, phonological categorization is thus replaced by a syllabic-like stage (and word structuring) as constrained by the classic phonological rules.</p>
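<p>To make the combination rule concrete, the sketch below (in Python) applies the multiplicative FLMP rule in which the support for each response alternative is the product of its auditory and visual fuzzy truth values, normalized across alternatives (relative goodness). Only the 0.6 and 0.7 bilabial supports come from the worked example above; the remaining values and the set of alternatives are hypothetical.</p>
<preformat>
# Illustrative sketch of the FLMP combination rule (Massaro, 1987, 1998).
# Fuzzy truth values are hypothetical, except the 0.6/0.7 bilabial supports quoted above.
auditory = {"ba": 0.6, "da": 0.3, "ga": 0.1}   # support from the acoustic signal
visual   = {"ba": 0.7, "da": 0.2, "ga": 0.1}   # support from the visible articulation

# Multiplicative integration followed by relative-goodness normalization.
support = {k: auditory[k] * visual[k] for k in auditory}
total = sum(support.values())
posterior = {k: v / total for k, v in support.items()}

print(posterior)   # P("ba" | A, V) = 0.6*0.7 / (0.6*0.7 + 0.3*0.2 + 0.1*0.1), about 0.86
</preformat>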
<p>A major criticism of this early Bayesian model for speech perception pertains to the fitting adjustments of the FLMP, which would either overfit or be inappropriate for the purpose of predicting integration (Grant, <xref ref-type="bibr" rid="B66">2002</xref>; Schwartz, <xref ref-type="bibr" rid="B150">2003</xref>). Additional discussions have pointed to the lack of a clear account of the format of auditory and visual speech representations in such models (Altieri et al., <xref ref-type="bibr" rid="B4">2011</xref>). More recent work has notably proposed a parallel architecture to account for AV speech integration efficiency, in line with the interplay of inhibitory and excitatory effects seen in neuroimaging data (Altieri and Townsend, <xref ref-type="bibr" rid="B5">2011</xref>).</p>
</sec>
</sec>
<sec>
<title>Analysis-by-synthesis (AbyS)</title>
<p>In the seminal description of Analysis-by-Synthesis (AbyS, Figure <xref ref-type="fig" rid="F2">2</xref>) for auditory speech processing by Halle and Stevens (<xref ref-type="bibr" rid="B74">1962</xref>), and in line with the Motor Theory of Speech Perception (Liberman et al., <xref ref-type="bibr" rid="B97">1967</xref>; Liberman and Mattingly, <xref ref-type="bibr" rid="B98">1985</xref>), the internal representations used for the production and perception of speech are shared. Specifically, AbyS sketched a predictive implementation for the analysis of auditory speech: the internalized rules for speech production make it possible to generate hypotheses about which acoustic inputs would come next (Stevens, <xref ref-type="bibr" rid="B169">1960</xref>). From a computational standpoint, AbyS provides the representational system and the fixed priors (internal rules) constraining the computations of Bayesian probabilities at the comparison stages. The comparison of auditory and visual speech inputs with internalized articulatory commands can be compatible with Bayesian computations.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Analysis-by-synthesis (Halle and Stevens, <xref ref-type="bibr" rid="B74">1962</xref>)</bold>. In the original proposal, two major successive predictive modules are postulated: articulatory analysis followed by subphonetic analysis. In both modules, the generative rules of speech production are used to emit and refine predictions of the incoming sensory signal (articulatory analysis) or residual error from the previous stage (subphonetic analysis).</p></caption>
<graphic xlink:href="fpsyg-04-00388-g0002.tif"/>
</fig>
<p>In the AbyS, auditory inputs (after preliminary spectral analysis; Poeppel et al., <xref ref-type="bibr" rid="B131">2008</xref>) are matched against the internal articulatory rules that would be used to produce the utterance (Halle and Stevens, <xref ref-type="bibr" rid="B74">1962</xref>). Internal speech production rules can take on continuous values, as the set of commands in speech production changes as a function of time, but &#x0201C;a given articulatory configuration may not be reached before the motion toward the next must be initiated&#x0201D; (Halle and Stevens, <xref ref-type="bibr" rid="B74">1962</xref>). Although the internal rules provide a continuous evaluation of the parameters, the evaluation process can operate on a different temporal scale, such that the units of speech remain discrete and articulatory-based. By analogy with the overlap of articulatory commands, the auditory speech inputs contain the traces of preceding and following context (namely, co-articulation effects). Hence, the continuous assignment of values need not bear a one-to-one relationship with the original input signals, and overlapping streams of information extraction (for instance, via temporal encoding windows) may enable this process.</p>
<sec>
<title>Amodal predictions</title>
<p>This early model provided one possible implementation of a forward-in-time, predictive view of sensory analysis (Stevens, <xref ref-type="bibr" rid="B169">1960</xref>; Halle and Stevens, <xref ref-type="bibr" rid="B74">1962</xref>). Since then, AbyS has been re-evaluated in light of recent evidence for predictive coding in speech perception (Poeppel et al., <xref ref-type="bibr" rid="B131">2008</xref>). The internally generated hypotheses are constrained by phonological rules, and their distinctive features serve as the discrete units for speech production/perception (Poeppel et al., <xref ref-type="bibr" rid="B131">2008</xref>). The non-invariance of incoming speech inputs can be compensated for by the existence of trading cues matched against the invariant built-in internal rules of the speech system. In particular, the outcome of the comparison process (i.e., the residual error) enables an active correction of the perceptual outcome (i.e., recalibrating so as to match the best-fitting value of the production output).</p>
<p>In conversational settings, the visible articulatory gestures for speech production have recently been argued to precede the auditory utterance by an average of 100&#x02013;300 ms (Chandrasekaran et al., <xref ref-type="bibr" rid="B44">2009</xref>). The natural precedence of visual speech features could initiate the generation of internal hypotheses as to the incoming auditory speech inputs. This working hypothesis was tested with EEG and MEG by comparing the auditory evoked responses elicited by auditory and AV speech stimuli (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>; Figure <xref ref-type="fig" rid="F3">3</xref>). The early auditory evoked responses elicited by AV speech showed (i) shorter latencies and (ii) reduced amplitudes compared to those elicited by auditory speech alone (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>; Arnal et al., <xref ref-type="bibr" rid="B7">2009</xref>). Crucially, the latency shortening of auditory evoked responses was a function of the ease with which participants categorized visual speech alone, such that a [pa] led to shorter latencies than a [ka] or a [ta]. In the context of AbyS, the reliability with which visual speech can trigger internal predictions for incoming auditory speech constrains the analysis of auditory speech (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>; Poeppel et al., <xref ref-type="bibr" rid="B131">2008</xref>; Arnal et al., <xref ref-type="bibr" rid="B7">2009</xref>, <xref ref-type="bibr" rid="B9">2011</xref>).</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Auditory event-related potentials in response to auditory (blue), visual (green), and AV (red) non-sense syllables</bold>. (Panel <bold>A</bold>) Scalp distribution of auditory ERPs to auditory, visual and AV speech presentation. (Panel <bold>B</bold>) Latency (bottom left) and absolute amplitude (bottom right) differences of the auditory ERPs (N1 is blue, P2 is red) as a function of correct identification (CI) of visual speech. The better the identification rate in visual speech alone, the earlier the N1/P2 complex occurred. A similar amplitude decrease for N1 (less negative) and P2 (less positive) was observed for all congruent and incongruent AV presentations as compared to A presentations (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>).</p></caption>
<graphic xlink:href="fpsyg-04-00388-g0003.tif"/>
</fig>
</sec>
<sec>
<title>Temporal encoding windows and temporal windows of integration</title>
<p>Two features of the AbyS model are of particular interest here (Figure <xref ref-type="fig" rid="F5">5</xref>). First, visual speech is argued to predict auditory speech in part because of the natural precedence of incoming visual speech inputs; second, AV speech integration tolerates large AV asynchronies without affecting optimal integration (Massaro et al., <xref ref-type="bibr" rid="B112">1996</xref>; Conrey and Pisoni, <xref ref-type="bibr" rid="B49">2006</xref>; van Wassenhove et al., <xref ref-type="bibr" rid="B186">2007</xref>; Maier et al., <xref ref-type="bibr" rid="B107">2011</xref>). In one of these studies (van Wassenhove et al., <xref ref-type="bibr" rid="B186">2007</xref>), two sets of AV speech stimuli (voiced and voiceless auditory bilabials dubbed onto visual velars) were desynchronized and tested using two types of task: (i) a speech identification task (&#x0201C;what do you hear while looking at the talking face?&#x0201D;) and (ii) a temporal synchrony judgment task (&#x0201C;were the AV stimuli in- or out-of-sync?&#x0201D;). Results showed that both AV speech identification and temporal judgment tolerated about 250 ms of AV desynchrony in McGurked and congruent syllables. The duration of the &#x0201C;temporal window of integration&#x0201D; found in these experiments approximated the average syllabic duration across languages, suggesting that syllables may be an important unit of computation in AV speech processing. Additionally, this temporal window of integration showed an asymmetry, such that visual leads were better tolerated than auditory leads&#x02014;with respect to the strength of AV integration. This suggested that the temporal resolutions for the processing of speech information arriving in each sensory modality may actually differ, in agreement with the natural sampling strategies found in auditory and visual systems. This interpretation could now be refined (Figure <xref ref-type="fig" rid="F4">4</xref>).</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Temporal window of integration in AV speech</bold>. (Panel <bold>A</bold>) Illustration of results in a simultaneity judgment task (top) and a speech identification task (bottom) (van Wassenhove et al., <xref ref-type="bibr" rid="B186">2007</xref>). Simultaneity ratings observed for congruent (top, filled symbols) and incongruent (top, open symbols) AV speech as a function of AV desynchrony. Auditory dominated (bottom, blue), visual dominated (bottom, green) or McGurk fusion (bottom, orange) responses as a function of desynchrony using McGurked syllables. The combination of the auditory encoding (blue arrow: tolerance to visual lags) and visual encoding (green arrow: tolerance to visual leads) forms the temporal encoding window for AV speech integration. (Panel <bold>B</bold>) Schematic illustration distinguishing temporal encoding and temporal integration windows. The temporal resolution reflected in the encoding window corresponds to the necessary or obligatory time for speech encoding; the temporal resolution reflected in the integration window corresponds to the encoding window plus the tolerated temporal noise leading to less-than-optimal encoding performance.</p></caption>
<graphic xlink:href="fpsyg-04-00388-g0004.tif"/>
</fig>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>Analysis-by-synthesis (AbyS) in AV speech processing</bold>. Two analytical routes are posited on the basis of the original AbyS proposal, namely a subphonetic feature and an articulatory analysis of incoming speech inputs. The privileged route for auditory processing is subphonetic by virtue of the fine temporal precision afforded by the auditory system; the privileged route for visual speech analysis is articulatory by virtue of the slower temporal resolution of the visual system and the kinds of information provided by the interlocutor&#x00027;s face. Evidence for the coexistence of two modes of speech processing or temporal multiplexing of AV speech can be drawn from the asymmetry of the temporal window of integration in AV speech (cf. Figure <xref ref-type="fig" rid="F4">4</xref>). Although both stages are posited to run in parallel, predictions in both streams are elaborated on the basis of the generative rules of speech production. The predictive mode of AV speech processing is notably marked by a decreased amplitude of the auditory evoked responses (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>; Arnal et al., <xref ref-type="bibr" rid="B7">2009</xref>), and residual errors have been characterized either by latency shifts of the auditory evoked responses commensurate with the gain of information in visual speech (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>) or by later amplitude differences commensurate with the detected incongruence of auditory and visual speech inputs (Arnal et al., <xref ref-type="bibr" rid="B7">2009</xref>). AbyS is thus a predictive model operating on temporal multiplexing of speech (i.e., parallel and predictive processing of speech features on two temporal scales) and is compatible with recently proposed neurophysiological implementations of predictive speech coding (Poeppel, <xref ref-type="bibr" rid="B130">2003</xref>; Giraud and Poeppel, <xref ref-type="bibr" rid="B65">2012</xref>).</p></caption>
<graphic xlink:href="fpsyg-04-00388-g0005.tif"/>
</fig>
<p>The &#x0201C;temporal window of integration&#x0201D; can be seen as the integration of two temporal encoding windows (following the precise specifications of Theunissen and Miller, <xref ref-type="bibr" rid="B178">1995</xref>), namely: the encoding window needed by the auditory system to reach phonological categorization is determined by the tolerance to visual speech lags, whereas the encoding window needed for the visual system to reach visemic categorization is illustrated by the tolerance to auditory speech lags. Hence, the original &#x0201C;temporal window of integration&#x0201D; is a misnomer: the original report describing a plateau within which the order of auditory and visual speech information did not diminish the rate of integration specifically illustrates the &#x0201C;temporal encoding window&#x0201D; of AV speech, i.e., <italic>the necessary time needed for the speech system to elaborate a final outcome or to establish a robust residual error from the two analytical streams in the AbyS framework</italic>. The tolerated asynchronies measured by just-noticeable-differences (Vroomen and Keetels, <xref ref-type="bibr" rid="B191">2010</xref>) or thresholds should be interpreted as the actual &#x0201C;temporal integration window,&#x0201D; namely, the tolerance to temporal noise in the integrative system. Said differently, <italic>the fixed constraints are the temporal encoding windows; the tolerance to noise is reflected in the temporal integration windows</italic>.</p>
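<p>The distinction between an encoding window and an integration window can be illustrated with a toy asymmetric window function (Python sketch below): a flat plateau within which integration is maximal (the encoding window), flanked by an asymmetric decay reflecting the tolerance to temporal noise (the integration window). The plateau widths and decay constant are hypothetical values loosely inspired by the asymmetric, roughly 250 ms tolerance described above; they are not fitted parameters from any of the cited studies.</p>
<preformat>
import numpy as np

def integration_probability(asynchrony_ms,
                            plateau_auditory_lead=80.0,   # hypothetical tolerance to auditory leads (ms)
                            plateau_visual_lead=200.0,    # hypothetical tolerance to visual leads (ms)
                            decay_ms=100.0):
    """Toy asymmetric window: probability of AV integration vs. asynchrony.

    Negative asynchrony means the auditory signal leads; positive means visual leads.
    Within the plateau (the temporal encoding window) integration is maximal; outside
    it, integration decays with the tolerated temporal noise (the integration window).
    """
    a = np.asarray(asynchrony_ms, dtype=float)
    overshoot = np.maximum(a - plateau_visual_lead, 0) + np.maximum(-a - plateau_auditory_lead, 0)
    return np.exp(-overshoot / decay_ms)

soas = np.array([-300, -100, 0, 100, 250, 400])   # stimulus onset asynchronies (ms)
print(dict(zip(soas.tolist(), np.round(integration_probability(soas), 2).tolist())))
</preformat>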
<p>Temporal windows of integration or &#x0201C;temporal binding windows&#x0201D; (Stevenson et al., <xref ref-type="bibr" rid="B172">2012</xref>) have been observed for various AV stimuli and prompted some promising models for the integration of multisensory information (Colonius and Diederich, <xref ref-type="bibr" rid="B48">2004</xref>). Consistent with the distinction between encoding and integration windows described above, a refined precision of temporal integration/binding windows can be obtained after training (Powers et al., <xref ref-type="bibr" rid="B132">2009</xref>), with training likely limited by the temporal encoding resolution of the system. Interestingly, a recent study has shown that the width of an individual&#x00027;s temporal integration window for non-speech stimuli could predict the strength of AV speech integration (Stevenson et al., <xref ref-type="bibr" rid="B172">2012</xref>). Whether direct inferences can be drawn between the conscious simultaneity of AV events (overt comparison of event timing entails segregation) and AV speech (integration of AV speech content) is, however, increasingly controversial. For instance, temporal windows in patients with schizophrenia obtained in a timing task are poor predictors of their ability to bind AV speech information (Martin et al., <xref ref-type="bibr" rid="B109">2012</xref>), suggesting that distinct neural processes are implicated in the two tasks (in spite of identical AV speech stimuli). Future work in the field will likely help disambiguate which neural operations are sufficient and necessary for conscious timing and which are necessary for binding operations.</p>
</sec>
<sec>
<title>Oscillations and temporal windows</title>
<p>In this context, one could question whether the precedence of visual speech is a prerequisite for predictive coding in AV speech and specifically, whether the ordering of speech inputs in each sensory modality may affect the posited predictive scheme. This would certainly be an issue if speech analysis followed serial computations operating on a very refined temporal grain. As seen in studies of desynchronized AV speech, this does not seem to be the case: the integrative system operates on temporal windows within which order is not essential (cf. van Wassenhove, <xref ref-type="bibr" rid="B184">2009</xref> for a discussion on this topic) and both auditory and visual systems likely use different sampling rates in their acquisition of sensory evidence (cf. Temporal parsing and non-invariance).</p>
<p>Recent models of speech processing have formulated clear mechanistic hypotheses implicating neural oscillations: the temporal logistics of cortical activity naturally impose temporal granularities on the parsing and the integration of speech information (Giraud and Poeppel, <xref ref-type="bibr" rid="B65">2012</xref>). For instance, the default oscillatory activity observed in the speech network (Morillon et al., <xref ref-type="bibr" rid="B117">2010</xref>) is consistent with the posited temporal multiplexing of speech inputs. If the oscillatory hypothesis is on the right track, it is thus very unlikely that the dynamic constraints, as measured by the temporal encoding (and not integration) window, can be changed, considering that cortical rhythms (Wang, <xref ref-type="bibr" rid="B196">2010</xref>) provide the dynamic architecture for neural operations. The role of oscillations for predictive operations in cortex has further been reviewed elsewhere (Arnal and Giraud, <xref ref-type="bibr" rid="B6">2012</xref>).</p>
<p>Additionally, visual speech may confer a natural rhythmicity to the syllabic parsing of auditory speech information (Schroeder et al., <xref ref-type="bibr" rid="B148">2008</xref>; Giraud and Poeppel, <xref ref-type="bibr" rid="B65">2012</xref>), and this could be accounted for by phase-resetting mechanisms across sensory modalities. Accordingly, recent MEG work illustrates phase consistencies during the presentation of AV information (Luo et al., <xref ref-type="bibr" rid="B100">2010</xref>; Zion Golumbic et al., <xref ref-type="bibr" rid="B202">2013</xref>). Several relevant oscillatory regimes [namely theta (4 Hz, &#x0007E;250 ms), beta (&#x0007E;20 Hz, 50 ms) and gamma (&#x0003E;40 Hz, 25 ms)] have also been reported that may constrain the integration of AV speech (Arnal et al., <xref ref-type="bibr" rid="B9">2011</xref>). This growing body of findings provides structuring constraints on speech processing&#x02014;i.e., fixed priors. Consistent with neurophysiology, AbyS incorporates temporal multiplexing for speech processing, whereby parallel temporal resolutions are used to represent relevant speech information at the segmental and syllabic scales (Poeppel, <xref ref-type="bibr" rid="B130">2003</xref>; Poeppel et al., <xref ref-type="bibr" rid="B131">2008</xref>). In AV speech, each sensory modality may thus operate with a preferred temporal granularity, and it is the integration of the two processing streams that effectively reflects the temporal encoding window. Such parallel encoding may also be compatible with recent efforts in modeling AV speech integration (Altieri and Townsend, <xref ref-type="bibr" rid="B5">2011</xref>).</p>
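<p>The correspondence between the oscillatory regimes listed above and the temporal granularities they impose is simply the period of each rhythm; the short calculation below (in Python) makes this mapping explicit using the frequencies quoted in the text.</p>
<preformat>
# Oscillatory regimes and the temporal windows (periods) they impose, from the values above.
bands_hz = {"theta": 4.0, "beta": 20.0, "gamma": 40.0}
windows_ms = {name: 1000.0 / hz for name, hz in bands_hz.items()}
print(windows_ms)   # {'theta': 250.0, 'beta': 50.0, 'gamma': 25.0}
</preformat>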
</sec>
</sec>
<sec>
<title>Critical period in AV speech perception: acquisition of fixed priors</title>
<p>During development, the acquisition of speech production could involve an imitative stage linking visual speech perception to speech production. In principle, such an imitative stage would allow children to learn how to articulate speech sounds by explicitly reproducing their caretakers&#x00027; facial gestures. However, mounting evidence suggests that imitation does not operate on a blank-slate system; rather, internal motor representations for speech are readily available early on. First, the gestural repertoire is already very rich only 3 weeks after birth, suggesting an innate ability for the articulation of elementary speech sounds (Meltzoff and Moore, <xref ref-type="bibr" rid="B115">1979</xref>; Dehaene-Lambertz et al., <xref ref-type="bibr" rid="B52">2002</xref>). Second, auditory inputs alone are sufficient for infants to accurately reproduce simple speech sounds and enable the recognition of visual speech inputs matching utterances that have only been heard (Kuhl and Meltzoff, <xref ref-type="bibr" rid="B90">1982</xref>, <xref ref-type="bibr" rid="B91">1984</xref>). Furthermore, during speech acquisition, infants do not see their own gestures: consequently, infants can only correct their own speech production via auditory feedback or via matching a peer&#x00027;s gestures (provided visually) to their own production, i.e., via proprioception (Meltzoff, <xref ref-type="bibr" rid="B114">1999</xref>).</p>
<p>Comparatively few studies have addressed the question of AV speech processing during development. The simplest detection of AV synchrony has been argued to emerge first, followed by duration, rate, and rhythm matching across sensory modalities in the first 10 months of an infant&#x00027;s life (Lewkowicz, <xref ref-type="bibr" rid="B96">2000</xref>). In the spatial domain, multisensory associations are established slowly during the first 2 years of life, suggesting that the more complex the pattern, the later the acquisition, in agreement with the &#x0201C;increasing specificity hypothesis&#x0201D; (Gibson, <xref ref-type="bibr" rid="B64">1969</xref>; Spelke, <xref ref-type="bibr" rid="B165">1981</xref>). Three-and-a-half-month-old infants are sensitive to natural temporal structures, but only later on (7 months) are arbitrary multisensory associations detected (e.g., pitch and shape, Bahrick, <xref ref-type="bibr" rid="B11">1992</xref>; emotion matching in strangers, Walker-Andrews, <xref ref-type="bibr" rid="B194">1986</xref>). However, early sensitivity to complex AV speech events has been reported in 5-month-old infants, who can detect the congruency of auditory speech inputs with facial articulatory movements (Rosenblum et al., <xref ref-type="bibr" rid="B141">1997</xref>). The spatiotemporal structuring of arbitrary patterns, as well as the nature and ecological relevance of incoming information, are likely to be important factors in the tuning of a supramodal system. The acquisition of cross-sensory equivalences seems to undergo a perceptual restructuring that can be seen as a fine-tuning of perceptual grouping (Gestalt-like) rules.</p>
<p>Children born deaf who received cochlear implants at various ages provide an opportunity to investigate the importance of age at implantation for the development of AV speech perception (Bergeson and Pisoni, <xref ref-type="bibr" rid="B19">2004</xref>). A substantial proportion of children who receive cochlear implants learn to perceive speech remarkably well using their implants (Waltzman et al., <xref ref-type="bibr" rid="B195">1997</xref>; Svirsky et al., <xref ref-type="bibr" rid="B175">2000</xref>; Balkany et al., <xref ref-type="bibr" rid="B12">2002</xref>) and are able to integrate congruent AV speech stimuli (Bergeson et al., <xref ref-type="bibr" rid="B20">2003</xref>, <xref ref-type="bibr" rid="B21">2005</xref>; Niparko et al., <xref ref-type="bibr" rid="B123">2010</xref>). In one study (Schorr et al., <xref ref-type="bibr" rid="B146">2005</xref>), children born deaf who had received cochlear implants were tested with McGurk stimuli [visual [ka] dubbed with auditory [pa]; McGurk and MacDonald, <xref ref-type="bibr" rid="B113">1976</xref>]. The main hypothesis was that experience plays a critical role in forming AV associations for speech perception. In this study, most children with cochlear implants did not experience reliable McGurk effects, and their AV speech perception was essentially dominated by lip-reading, consistent with their hearing impairment. However, the likelihood of consistent McGurk illusory reports depended on the age at which children received their cochlear implants: children who exhibited consistent McGurk illusions had received their implants before 30 months of age, whereas children who received implants after 30 months of age did not show consistent McGurk effects. These results demonstrated that AV speech integration is shaped by experience early in life: when auditory experience with speech was provided early via a cochlear implant, the likelihood of acquiring strong AV speech fusion was greatly increased. These results suggested the existence of a sensitive period for AV speech perception (Sharma et al., <xref ref-type="bibr" rid="B159">2002</xref>).</p>
<p>To date, however, whether the temporal constraints and neurophysiological indices of AV speech integration in development are comparable to those observed in adults remains unclear.</p>
</sec>
<sec>
<title>Resilient temporal integration and the co-modulation hypothesis</title>
<p>In natural scenes, diverse sensory cues help the brain select and integrate relevant information to build internal representations. In the context of perceptual invariance and supramodal processing, auditory pitch and visual spatial frequency have been shown to undergo automatic cross-sensory matching (Maeda et al., <xref ref-type="bibr" rid="B106">2004</xref>; Evans and Treisman, <xref ref-type="bibr" rid="B58">2010</xref>). Additionally, auditory and visual signals showing slow temporal fluctuations are most likely to undergo automatic integration (K&#x000F6;sem and van Wassenhove, <xref ref-type="bibr" rid="B89">2012</xref>). In AV speech, the acoustic envelope and the movements of the lips show high correlation or co-modulation (Grant and Seitz, <xref ref-type="bibr" rid="B69">2000</xref>; Remez, <xref ref-type="bibr" rid="B138">2003</xref>), naturally locked to the articulatory gestures of the face. Crucially, this co-modulation shows specificity: AV speech intelligibility shows a similar range of tolerance to asynchronies when the spectral characteristics of the acoustic signal preserve the feature information specific to the articulation (i.e., the F2/F3 formant region) (Grant and Greenberg, <xref ref-type="bibr" rid="B67">2001</xref>). These local correlations have recently been argued to promote AV speech integration even when visual speech information is consciously suppressed (Alsius and Munhall, <xref ref-type="bibr" rid="B2">2013</xref>). Taken together, these results suggest that the correlation of auditory and visual speech signals serves as a strong (bottom-up) cue for integration, enabling the brain to correctly track signals belonging to the same person, as indicated by recent neurophysiological findings (Zion Golumbic et al., <xref ref-type="bibr" rid="B202">2013</xref>).</p>
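<p>As a purely illustrative aid (a minimal sketch, not an analysis taken from the studies cited above), the co-modulation of auditory and visual speech could be quantified by correlating the acoustic amplitude envelope with a lip-aperture time series over a range of audio lags; the signal names, sampling rates, and lag range below are assumptions made for the example.</p>
<preformat>
# Minimal sketch (assumed toy inputs): quantify AV co-modulation by correlating
# the acoustic amplitude envelope with a lip-aperture time series.
import numpy as np
from scipy.signal import hilbert, resample

def amplitude_envelope(audio, audio_sr, frame_rate=100):
    """Hilbert amplitude envelope of the audio, resampled to the video frame rate."""
    env = np.abs(hilbert(audio))                        # instantaneous amplitude
    n_frames = int(len(audio) / audio_sr * frame_rate)  # match the lip-aperture sampling
    return resample(env, n_frames)

def lagged_correlation(envelope, lip_aperture, frame_rate=100, max_lag_ms=300):
    """Pearson correlation for audio lags up to max_lag_ms
    (positive lag = audio delayed relative to the lip signal)."""
    max_lag = int(max_lag_ms / 1000 * frame_rate)
    lags_ms, corrs = [], []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, v = envelope[lag:], lip_aperture[:len(lip_aperture) - lag]
        else:
            a, v = envelope[:lag], lip_aperture[-lag:]
        n = min(len(a), len(v))
        corrs.append(np.corrcoef(a[:n], v[:n])[0, 1])
        lags_ms.append(lag * 1000.0 / frame_rate)
    return np.array(lags_ms), np.array(corrs)
</preformat>
<p>In such a sketch, the lag at which the correlation peaks would index the natural audio lag relative to the visible articulation, and the width of the high-correlation region would relate to the tolerance to asynchronies discussed above.</p>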
<p>These observations need to be reconciled with an efficient predictive coding framework, as the speech content provided by audition and vision likely undergoes a non-correlative operation. Such an operation would be necessary to account for the typical informational gain observed in AV speech studies, in line with a previously sketched idea (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>), with the proposed distinction between correlated and complementary modes of AV speech processing (Campbell, <xref ref-type="bibr" rid="B40">2008</xref>), and with AV speech integration models (Altieri and Townsend, <xref ref-type="bibr" rid="B5">2011</xref>).</p>
<p>In this context, while there is ample evidence that speaking rate has a substantial impact on AV speech perception, little is known about the effect of speaking rate on the temporal encoding window. Changes in speaking rate naturally impact the kinematics of speech production, hence the acoustic and visual properties of speech. It is unclear to what extent the posited hard temporal constraints on AV speech integration may be flexible under various speaking rates. Facial kinematics comprise different kinds of cues that can vary with rate, including the motion of surface structures, the velocity patterns of the articulators, and frequency components over a wide spectrum. Any or all of these could contribute differently to AV speech integration for fast and slow speech and could thus perturb the integration process.</p>
<p>In two experiments (Brungart et al., <xref ref-type="bibr" rid="B27">2007</xref>, <xref ref-type="bibr" rid="B26">2008</xref>), the resilience of AV speech intelligibility was tested against noise, AV asynchrony, and speaking rate. In a first experiment, AV speech recordings of phrases from the Modified Rhyme Test (MRT) were accelerated or decelerated (Brungart et al., <xref ref-type="bibr" rid="B27">2007</xref>). Eight levels of speaking rate were tested, ranging from 0.6 to 20 syllables per second (syl/s). Results showed that the benefits of AV speech were preserved at speaking rates as fast as 12.5 syl/s but disappeared when the rate was increased to 20 syl/s. Importantly, AV speech performance did not benefit from phrases presented slower than their original speaking rates. Using the same experimental material, both the speaking rate and the degree of AV asynchrony were then varied (Brungart et al., <xref ref-type="bibr" rid="B26">2008</xref>). At the fastest speaking rates, AV speech enhancement was maximal when the audio signal was delayed by a slightly larger amount (&#x0007E;150 ms) relative to visual speech, and performance degraded relatively rapidly as the audio delay departed from this optimal value. As the speaking rate decreased, the range of delays yielding enhanced AV benefit increased, suggesting that participants tolerated a wider range of AV asynchronies when the speaking rate was relatively slow. However, there was no compelling evidence that the optimal delay for AV enhancement systematically changed with the talker&#x00027;s speaking rate. Finally, when acoustic noise was added, the benefit of visual cues degraded rapidly with faster speaking rates, although AV speech integration in noise occurred at all speaking rates slower than 7.8 syl/s. AV speech benefits were observed in all conditions, suggesting that the co-modulation of AV speech information robustly drives integration.</p>
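<p>As a purely arithmetical aside (an illustration, not an analysis from Brungart et al.), the speaking rates reported above can be converted into mean syllable durations and compared with the temporal granularities discussed in the oscillatory framework; only the rates explicitly mentioned in the text are listed below.</p>
<preformat>
# Toy conversion: mean syllable duration at the speaking rates mentioned above.
rates_syl_per_s = [0.6, 7.8, 12.5, 20.0]    # subset of the eight tested levels
for rate in rates_syl_per_s:
    duration_ms = 1000.0 / rate              # one syllable lasts 1/rate seconds
    print(f"{rate:5.1f} syl/s corresponds to {duration_ms:7.1f} ms per syllable")
</preformat>
<p>At 12.5 syl/s a syllable lasts about 80 ms, whereas at 20 syl/s it lasts only 50 ms, which may help situate the loss of AV benefit at the fastest rate with respect to the syllabic-scale windows discussed earlier.</p>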
</sec>
<sec>
<title>Neural mechanisms for AV speech processing: convergence and divergence</title>
<p>Two reliable electrophysiological markers for AV speech integration are (i) an amplitude decrease (Besle et al., <xref ref-type="bibr" rid="B24">2004</xref>; J&#x000E4;&#x000E4;skel&#x000E4;inen et al., <xref ref-type="bibr" rid="B80">2004</xref>; van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>; Bernstein et al., <xref ref-type="bibr" rid="B22">2008</xref>; Arnal et al., <xref ref-type="bibr" rid="B7">2009</xref>; Pilling, <xref ref-type="bibr" rid="B129">2009</xref>) and (ii) latency shifts (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>; Arnal et al., <xref ref-type="bibr" rid="B7">2009</xref>) of the auditory evoked responses. Decreased amplitude of the auditory response to visual speech inputs was originally observed when participants were presented with a video of a face articulating the same or a different vowel, the auditory vowel being delivered 500 ms after the presentation of the face (J&#x000E4;&#x000E4;skel&#x000E4;inen et al., <xref ref-type="bibr" rid="B80">2004</xref>). In this study, visual speech inputs were interpreted as leading to the adaptation of the subset of auditory neurons responsive to that feature. However, no difference in amplitude was observed when the visual stimuli were drawn from the same or from a different phonetic category, suggesting non-specific interactions of visual speech information with the early auditory analysis of speech. The amplitude reduction of the auditory evoked responses observed in EEG and MEG is supported by intracranial recordings (Reale et al., <xref ref-type="bibr" rid="B137">2007</xref>; Besle et al., <xref ref-type="bibr" rid="B23">2008</xref>). In particular, Besle et al. (<xref ref-type="bibr" rid="B23">2008</xref>) reported two kinds of AV interactions in the secondary auditory association cortices following the first influence of visual speech in this region: at the onset of the auditory syllable, the initial visual influence disappeared and the amplitude of the auditory response decreased compared to the auditory-alone presentation. Similar amplitude reductions were observed in response to AV syllables over the left lateral pSTG (Reale et al., <xref ref-type="bibr" rid="B137">2007</xref>).</p>
<p>In all of these studies, the reported amplitude reduction spanned a couple of hundred milliseconds, consistent with the implication of low-frequency neural oscillations. In monkey neurophysiology, decreased low-frequency power in auditory cortex has been reported in the context of AV communication (Kayser and Logothetis, <xref ref-type="bibr" rid="B83">2009</xref>). Based on a set of neurophysiological recordings in monkeys, it was proposed that visual inputs change the excitability of auditory cortex by resetting the phase of ongoing oscillations (Schroeder et al., <xref ref-type="bibr" rid="B148">2008</xref>); recent evidence using an AV cocktail-party design (Zion Golumbic et al., <xref ref-type="bibr" rid="B202">2013</xref>) supports this hypothesis. Additional MEG findings suggest that the tracking of AV speech information may be dealt with by phase-coupling of auditory and visual cortices (Luo et al., <xref ref-type="bibr" rid="B100">2010</xref>). In the context of a recent neurocomputational framework for speech processing (Giraud and Poeppel, <xref ref-type="bibr" rid="B65">2012</xref>), visual speech would thus influence ongoing auditory activity so as to condition the analysis of auditory speech events. Whether this tracking is specific to speech content remains unclear. The decreased amplitude of auditory evoked responses may be related to the phase entrainment between auditory and visual speech or to the power decrease in low-frequency regimes. However, since no clear correlation between amplitude modulation and phonetic content has been observed, this mechanism does not appear to carry the content of the speech representation, consistent with the lack of visemic or AV speech congruency effects (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>; Arnal et al., <xref ref-type="bibr" rid="B7">2009</xref>) and with a previously proposed interpretation (Arnal et al., <xref ref-type="bibr" rid="B7">2009</xref>, <xref ref-type="bibr" rid="B9">2011</xref>).</p>
<p>With respect to latency shifts, two studies reported modulations of auditory evoked responses as a function of visemic information: one interpreted the effects on auditory evoked responses as carrying the residual error (van Wassenhove et al., <xref ref-type="bibr" rid="B187">2005</xref>) and another reported late residual errors at about 400 ms (Arnal et al., <xref ref-type="bibr" rid="B7">2009</xref>). The specificity of this modulation remains unsettled: visual inputs have been reported to change the excitability of auditory cortex by resetting the phase of ongoing oscillations (Lakatos et al., <xref ref-type="bibr" rid="B92">2008</xref>), but an amplification of the signal would then have been predicted in auditory cortex (Schroeder et al., <xref ref-type="bibr" rid="B148">2008</xref>). A recent study (Zion Golumbic et al., <xref ref-type="bibr" rid="B202">2013</xref>) implicates the role of attention in selecting or predicting relevant auditory inputs on the basis of visual information. This interpretation would be in line with the notion that visual speech information serves to increase the salience of relevant auditory information for further processing. To what extent phase-resetting mechanisms are speech-specific or more generally implicated in modulating the gain of sensory inputs remains to be determined, along with the implication of specific frequency regimes. Recent findings suggest that multiplexing of speech features could be accomplished in different frequency regimes (Arnal et al., <xref ref-type="bibr" rid="B9">2011</xref>), with coupling between auditory and visual cortices realized via the STS. The directionality of these interactions remains to be thoroughly described in order to understand how specific informational content propagates across these connected regions. Recent work in monkey neurophysiology has started to address these issues (Kayser et al., <xref ref-type="bibr" rid="B84">2010</xref>; Panzeri et al., <xref ref-type="bibr" rid="B125">2010</xref>).</p>
<p>It is noteworthy that MEG, EEG, and intracranial EEG (sEEG) data can contrast with fMRI and PET findings, in which enhanced and supra-additive BOLD activations have been reported in response to the presentation of visual and AV speech. Both enhanced and sub-additive activations in mSTG, pSTG, and pSTS have been reported, together with the left inferior frontal gyrus (BA 44/45), premotor cortex (BA 6), and anterior cingulate gyrus (BA 32), to the presentation of congruent and incongruent AV speech, respectively (Calvert, <xref ref-type="bibr" rid="B32">1997</xref>; Calvert et al., <xref ref-type="bibr" rid="B33">1999</xref>, <xref ref-type="bibr" rid="B35">2000</xref>; Hasson et al., <xref ref-type="bibr" rid="B77">2007</xref>; Skipper et al., <xref ref-type="bibr" rid="B162">2007</xref>). Other fMRI findings (Callan et al., <xref ref-type="bibr" rid="B31">2003</xref>) have shown significant activation of the MTG, STS, and STG in response to the presentation of AV speech in noise; BOLD activation consistent with the inverse effectiveness principle in these same regions (MTG, STS, and STG) has also been reported for stimuli providing information on the place of articulation (Callan et al., <xref ref-type="bibr" rid="B30">2004</xref>). The left posterior STS has been shown to be sensitive to incongruent AV speech (Calvert et al., <xref ref-type="bibr" rid="B35">2000</xref>; Wright et al., <xref ref-type="bibr" rid="B198">2003</xref>; Miller and D&#x00027;Esposito, <xref ref-type="bibr" rid="B116">2005</xref>). Using fMRI and PET, Sekiyama et al. (<xref ref-type="bibr" rid="B155">2003</xref>) tested the McGurk effect with two levels of auditory noise; comparison between the low and high SNR conditions revealed a left-lateralized activation in the posterior STS and BA 22, thalamus, and cerebellum. However, not all studies support the inverse effectiveness principle in auditory cortex (Calvert et al., <xref ref-type="bibr" rid="B33">1999</xref>; Jones and Callan, <xref ref-type="bibr" rid="B81">2003</xref>). Desynchronizing AV McGurk syllables does not significantly affect activation of the STS or auditory cortex (Olson et al., <xref ref-type="bibr" rid="B124">2002</xref>; Jones and Callan, <xref ref-type="bibr" rid="B81">2003</xref>), whereas others report significant and systematic activation of Heschl&#x00027;s gyrus (HG) as a function of desynchrony (Miller and D&#x00027;Esposito, <xref ref-type="bibr" rid="B116">2005</xref>). Recent fMRI studies have reported specialized neural populations in the Superior Temporal Sulcus (STS, in monkey) or Superior Temporal Cortex (STC, human homolog). The organization of this multisensory region is known to be patchy (Beauchamp et al., <xref ref-type="bibr" rid="B17">2004</xref>) but recognized to be an essential part of the AV speech integration network (Arnal et al., <xref ref-type="bibr" rid="B7">2009</xref>; Beauchamp et al., <xref ref-type="bibr" rid="B18">2010</xref>). The middle STC (mSTC) is a prime area for the detection of AV asynchrony and the integration of AV speech (Bushara et al., <xref ref-type="bibr" rid="B29">2001</xref>; Miller and D&#x00027;Esposito, <xref ref-type="bibr" rid="B116">2005</xref>; Stevenson et al., <xref ref-type="bibr" rid="B170">2010</xref>, <xref ref-type="bibr" rid="B171">2011</xref>).
At least two neural subpopulations may coexist in this region: a synchrony population (S-mSTC), showing increased activation to AV speech stimuli when the auditory and visual streams are in synchrony, and a bimodal population (B-mSTC), showing the opposite pattern, namely a decrease of activation with the presentation of synchronized AV speech streams (Stevenson et al., <xref ref-type="bibr" rid="B170">2010</xref>, <xref ref-type="bibr" rid="B171">2011</xref>). These results may help shed light on the contribution of neural subpopulations in mSTC to the computation of redundant information vs. efficient coding for AV speech processing.</p>
<p>Using fMRI, the contribution of motor cortices has also been tested in the perception of auditory, visual, and AV speech (Skipper et al., <xref ref-type="bibr" rid="B162">2007</xref>). In these experiments, participants actively produced syllables or passively perceived auditory, visual, and AV stimuli in the scanner. The AV stimuli consisted of both congruent AV [pa], [ka], and [ta] and McGurk fusion stimuli (audio [pa] dubbed onto a face articulating [ka]). The main results showed that the cortical activation pattern during the perception of visual and AV, but not auditory, speech greatly overlapped with that observed during speech production. The areas showing above 50% overlap between production and perception were the bilateral anterior and posterior Superior Temporal cortices (STa and STp, respectively) and the ventral premotor cortex (PMv). The perception of McGurk fusion stimuli elicited patterns of activation that correlated differently across cortical areas with the perception of a congruent AV [pa] (the auditory component of the McGurk fusion stimulus), AV [ka] (the visual component of the McGurk fusion stimulus), or AV [ta] (the perceived illusory [ta] elicited by the McGurk fusion stimulus). Activations observed in frontal motor areas and in auditory and somatosensory cortices during McGurk presentation correlated more with the perceived syllable (AV [ta]) than with the presented syllables in either sensory modality (A [pa], V [ka]). In visual cortices, activation correlated most with the presentation of a congruent AV [ka]. Overall, these results were interpreted in the context of a forward model of speech processing.</p>
</sec>
<sec>
<title>Outstanding questions</title>
<p>First, what is (in) a prediction? Although computational models provide interesting constraints with which to work, we cannot currently separate temporal predictions from speech-content predictions (e.g., Arnal and Giraud, <xref ref-type="bibr" rid="B6">2012</xref>). One important finding relevant to the context of speech is that an amplitude decrease can be seen as a general marker of predictive coding in auditory cortex (e.g., Todorovic and de Lange, <xref ref-type="bibr" rid="B180">2012</xref>) and, more specifically, during speech production (Houde and Jordan, <xref ref-type="bibr" rid="B79">1998</xref>).</p>
<p>Second, what anchors are used to parse visual speech information or, what are the &#x0201C;edges&#x0201D; (Giraud and Poeppel, <xref ref-type="bibr" rid="B65">2012</xref>) of visual speech information? Complementarily, can we use cortical responses to derive the distinctive features of visual speech (Luo et al., <xref ref-type="bibr" rid="B100">2010</xref>)?</p>
<p>Third, in the context of fixed temporal constraints for speech processing, how early can temporal encoding/integration windows be characterized in babies? Is the co-modulation hypothesis a general guiding principle for multisensory integration or a specific feature of AV speech?</p>
<p>Finally, the implication of the motor system in the analysis of speech inputs has been a long-standing debate undergoing increasing refinement (e.g., Scott et al., <xref ref-type="bibr" rid="B152">2009</xref>). The inherent rhythmicity of speech production naturally constrains the structure of the resulting auditory and visual speech signals. Is the primary encoding mode of AV speech articulatory or acoustic (e.g., Altieri et al., <xref ref-type="bibr" rid="B4">2011</xref>; Schwartz et al., <xref ref-type="bibr" rid="B151">2012</xref>)?</p>
<sec>
<title>Conflict of interest statement</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</sec>
</body>
<back>
<ack>
<p>This work was supported by a Marie Curie IRG-24299, an ERC-StG-263584, and an ANR10JCJ-1904 awarded to Virginie van Wassenhove.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Allik</surname> <given-names>J.</given-names></name> <name><surname>Konstabel</surname> <given-names>K.</given-names></name></person-group> (<year>2005</year>). <article-title>G. F. Parrot and the theory of unconscious inferences</article-title>. <source>J. Hist. Behav. Sci</source>. <volume>41</volume>, <fpage>317</fpage>&#x02013;<lpage>330</lpage>. <pub-id pub-id-type="doi">10.1002/jhbs.20114</pub-id><pub-id pub-id-type="pmid">16196051</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alsius</surname> <given-names>A.</given-names></name> <name><surname>Munhall</surname> <given-names>K. G.</given-names></name></person-group> (<year>2013</year>). <article-title>Detection of audiovisual speech correspondences without visual awareness</article-title>. <source>Psychol. Sci</source>. <volume>24</volume>, <fpage>423</fpage>&#x02013;<lpage>431</lpage>. <pub-id pub-id-type="doi">10.1177/0956797612457378</pub-id><pub-id pub-id-type="pmid">23462756</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alsius</surname> <given-names>A.</given-names></name> <name><surname>Navarra</surname> <given-names>J.</given-names></name> <name><surname>Campbell</surname> <given-names>R.</given-names></name> <name><surname>Soto-Faraco</surname> <given-names>S.</given-names></name></person-group> (<year>2005</year>). <article-title>Audiovisual integration of speech falters under high attention demands</article-title>. <source>Curr. Biol</source>. <volume>15</volume>, <fpage>839</fpage>&#x02013;<lpage>843</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2005.03.046</pub-id><pub-id pub-id-type="pmid">15886102</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Altieri</surname> <given-names>N.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>Townsend</surname> <given-names>J. T.</given-names></name></person-group> (<year>2011</year>). <article-title>Some behavioral and neurobiological constraints on theories of audiovisual speech integration: a review and suggestions for new directions</article-title>. <source>Seeing Perceiving</source> <volume>24</volume>, <fpage>513</fpage>&#x02013;<lpage>539</lpage>. <pub-id pub-id-type="doi">10.1163/187847611X595864</pub-id><pub-id pub-id-type="pmid">21968081</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Altieri</surname> <given-names>N.</given-names></name> <name><surname>Townsend</surname> <given-names>J. T.</given-names></name></person-group> (<year>2011</year>). <article-title>An assessment of behavioral dynamic information processing measures in audiovisual speech perception</article-title>. <source>Front. Psychol</source>. <volume>2</volume>:<issue>238</issue>. <pub-id pub-id-type="doi">10.3389/fpsyg.2011.00238</pub-id><pub-id pub-id-type="pmid">21980314</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arnal</surname> <given-names>L. H.</given-names></name> <name><surname>Giraud</surname> <given-names>A. L.</given-names></name></person-group> (<year>2012</year>). <article-title>Cortical oscillations and sensory predictions</article-title>. <source>Trends Cogn. Sci</source>. <volume>16</volume>, <fpage>390</fpage>&#x02013;<lpage>398</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2012.05.003</pub-id><pub-id pub-id-type="pmid">22682813</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arnal</surname> <given-names>L.</given-names></name> <name><surname>Morillon</surname> <given-names>B.</given-names></name> <name><surname>Kell</surname> <given-names>C.</given-names></name> <name><surname>Giraud</surname> <given-names>A.</given-names></name></person-group> (<year>2009</year>). <article-title>Dual neural routing of visual facilitation in speech processing</article-title>. <source>J. Neurosci</source>. <volume>29</volume>, <fpage>13445</fpage>&#x02013;<lpage>13453</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.3194-09.2009</pub-id><pub-id pub-id-type="pmid">19864557</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arnal</surname> <given-names>L. H.</given-names></name> <name><surname>Wyart</surname> <given-names>V.</given-names></name> <name><surname>Giraud</surname> <given-names>A. L.</given-names></name></person-group> (<year>2011</year>). <article-title>Transitions in neural oscillations reflect prediction errors generated in audiovisual speech</article-title>. <source>Nat. Neurosci</source>. <volume>14</volume>, <fpage>797</fpage>&#x02013;<lpage>801</lpage>. <pub-id pub-id-type="doi">10.1038/nn.2810</pub-id><pub-id pub-id-type="pmid">21552273</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Auer</surname> <given-names>E. J.</given-names></name></person-group> (<year>2002</year>). <article-title>The influence of the lexicon on speech read word recognition: contrasting segmental and lexical distinctiveness</article-title>. <source>Psychon. Bull. Rev</source>. <volume>9</volume>, <fpage>341</fpage>&#x02013;<lpage>347</lpage>. <pub-id pub-id-type="doi">10.3758/BF03196291</pub-id><pub-id pub-id-type="pmid">12120798</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bahrick</surname> <given-names>L. E.</given-names></name></person-group> (<year>1992</year>). <article-title>Infants&#x00027; perceptual differentiation of amodal and modality-specific audio-visual relations</article-title>. <source>J. Exp. Child Psychol</source>. <volume>53</volume>, <fpage>180</fpage>&#x02013;<lpage>199</lpage>. <pub-id pub-id-type="doi">10.1016/0022-0965(92)90048-B</pub-id><pub-id pub-id-type="pmid">1578197</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Balkany</surname> <given-names>T. J.</given-names></name> <name><surname>Hodges</surname> <given-names>A. V.</given-names></name> <name><surname>Eshraghi</surname> <given-names>A. A.</given-names></name> <name><surname>Butts</surname> <given-names>S.</given-names></name> <name><surname>Bricker</surname> <given-names>K.</given-names></name> <name><surname>Lingvai</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2002</year>). <article-title>Cochlear implants in children-a review</article-title>. <source>Acta Otolaryngol</source>. <volume>122</volume>, <fpage>356</fpage>&#x02013;<lpage>362</lpage>. <pub-id pub-id-type="doi">10.1080/00016480260000012</pub-id><pub-id pub-id-type="pmid">12125989</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Barlow</surname> <given-names>H.</given-names></name></person-group> (<year>1961</year>). &#x0201C;<article-title>Possible principles underlying the transformations of sensory messages</article-title>,&#x0201D; in <source>Sensory Communication</source>, ed <person-group person-group-type="editor"><name><surname>Rosenblith</surname> <given-names>W.</given-names></name></person-group> (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>217</fpage>&#x02013;<lpage>234</lpage>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barlow</surname> <given-names>H.</given-names></name></person-group> (<year>1990</year>). <article-title>Conditions for versatile learning, Helmholtz&#x00027;s unconscious inference, and the task of perception</article-title>. <source>Vision Res</source>. <volume>30</volume>, <fpage>1561</fpage>&#x02013;<lpage>1571</lpage>. <pub-id pub-id-type="doi">10.1016/0042-6989(90)90144-A</pub-id><pub-id pub-id-type="pmid">2288075</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Barlow</surname> <given-names>H.</given-names></name> <name><surname>F&#x000F6;ldiak</surname> <given-names>P.</given-names></name></person-group> (<year>1989</year>). &#x0201C;<article-title>Adaptation and decorrelation in the cortex</article-title>,&#x0201D; in <source>The Computing Neuron</source>, eds <person-group person-group-type="editor"><name><surname>Durbin</surname> <given-names>R.</given-names></name> <name><surname>Miall</surname> <given-names>C.</given-names></name> <name><surname>Mitchison</surname> <given-names>G.</given-names></name></person-group> (<publisher-loc>Wokingham</publisher-loc>: <publisher-name>Addison-Wesley</publisher-name>), <fpage>54</fpage>&#x02013;<lpage>72</lpage>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barraclough</surname> <given-names>N. E.</given-names></name> <name><surname>Xiao</surname> <given-names>D.</given-names></name> <name><surname>Baker</surname> <given-names>C. I.</given-names></name> <name><surname>Oram</surname> <given-names>M. W.</given-names></name> <name><surname>Perrett</surname> <given-names>D. I.</given-names></name></person-group> (<year>2005</year>). <article-title>Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions</article-title>. <source>J. Cogn. Neurosci</source>. <volume>17</volume>, <fpage>377</fpage>&#x02013;<lpage>391</lpage>. <pub-id pub-id-type="doi">10.1162/0898929053279586</pub-id><pub-id pub-id-type="pmid">15813999</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beauchamp</surname> <given-names>M. S.</given-names></name> <name><surname>Argall</surname> <given-names>B. D.</given-names></name> <name><surname>Bodurka</surname> <given-names>J.</given-names></name> <name><surname>Duyn</surname> <given-names>J. H.</given-names></name> <name><surname>Martin</surname> <given-names>A.</given-names></name></person-group> (<year>2004</year>). <article-title>Unraveling multisensory integration: patchy organization within human STS multisensory cortex</article-title>. <source>Nat. Neurosci</source>. <volume>7</volume>, <fpage>1190</fpage>&#x02013;<lpage>1192</lpage>. <pub-id pub-id-type="doi">10.1038/nn1333</pub-id><pub-id pub-id-type="pmid">15475952</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beauchamp</surname> <given-names>M. S.</given-names></name> <name><surname>Nath</surname> <given-names>A. R.</given-names></name> <name><surname>Pasalar</surname> <given-names>S.</given-names></name></person-group> (<year>2010</year>). <article-title>fMRI-Guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect</article-title>. <source>J. Neurosci</source>. <volume>30</volume>, <fpage>2414</fpage>&#x02013;<lpage>2417</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.4865-09.2010</pub-id><pub-id pub-id-type="pmid">20164324</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bergeson</surname> <given-names>T. R.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name></person-group> (<year>2004</year>). &#x0201C;<article-title>Audiovisual speech perception in deaf adults and children following cochlear implantation</article-title>,&#x0201D; in <source>Handbook of Multisensory Integration</source>, eds <person-group person-group-type="editor"><name><surname>Calvert</surname> <given-names>G.</given-names></name> <name><surname>Spence</surname> <given-names>C.</given-names></name> <name><surname>Stein</surname> <given-names>B. E.</given-names></name></person-group> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>749</fpage>&#x02013;<lpage>772</lpage>.</citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bergeson</surname> <given-names>T. R.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>Davis</surname> <given-names>R. A.</given-names></name></person-group> (<year>2003</year>). <article-title>A longitudinal study of audiovisual speech perception by children with hearing loss who have cochlear implants</article-title>. <source>Volta Rev</source>. <volume>103</volume>, <fpage>347</fpage>&#x02013;<lpage>370</lpage>. <pub-id pub-id-type="pmid">21743753</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bergeson</surname> <given-names>T. R.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>Davis</surname> <given-names>R. A.</given-names></name></person-group> (<year>2005</year>). <article-title>Development of audiovisual comprehension skills in prelingually deaf children with cochlear implants</article-title>. <source>Ear Hear</source>. <volume>26</volume>, <fpage>149</fpage>&#x02013;<lpage>164</lpage>. <pub-id pub-id-type="doi">10.1097/00003446-200504000-00004</pub-id><pub-id pub-id-type="pmid">15809542</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bernstein</surname> <given-names>L.</given-names></name> <name><surname>Auer</surname> <given-names>E. J.</given-names></name> <name><surname>Wagner</surname> <given-names>M.</given-names></name> <name><surname>Ponton</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). <article-title>Spatiotemporal dynamics of audiovisual speech processing</article-title>. <source>Neuroimage</source> <volume>39</volume>, <fpage>423</fpage>&#x02013;<lpage>435</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2007.08.035</pub-id><pub-id pub-id-type="pmid">17920933</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Besle</surname> <given-names>J.</given-names></name> <name><surname>Fischer</surname> <given-names>C.</given-names></name> <name><surname>Bidet-Caulet</surname> <given-names>A.</given-names></name> <name><surname>Lecaignard</surname> <given-names>F.</given-names></name> <name><surname>Bertrand</surname> <given-names>O.</given-names></name> <name><surname>Giard</surname> <given-names>M. H.</given-names></name></person-group> (<year>2008</year>). <article-title>Visual activation and audiovisual interactions in the auditory cortex during speech perception: intracranial recordings in humans</article-title>. <source>J. Neurosci</source>. <volume>28</volume>, <fpage>14301</fpage>&#x02013;<lpage>14310</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.2875-08.2008</pub-id><pub-id pub-id-type="pmid">19109511</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Besle</surname> <given-names>J.</given-names></name> <name><surname>Fort</surname> <given-names>A.</given-names></name> <name><surname>Delpuech</surname> <given-names>C.</given-names></name> <name><surname>Giard</surname> <given-names>M.-H.</given-names></name></person-group> (<year>2004</year>). <article-title>Bimodal speech: early suppressive visual effects in human auditory cortex</article-title>. <source>Eur. J. Neurosci</source>. <volume>20</volume>, <fpage>2225</fpage>&#x02013;<lpage>2234</lpage>. <pub-id pub-id-type="doi">10.1111/j.1460-9568.2004.03670.x</pub-id><pub-id pub-id-type="pmid">15450102</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brancazio</surname> <given-names>L.</given-names></name></person-group> (<year>2004</year>). <article-title>Lexical influences in audiovisual speech perception</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform</source>. <volume>30</volume>, <fpage>445</fpage>&#x02013;<lpage>463</lpage>. <pub-id pub-id-type="doi">10.1037/0096-1523.30.3.445</pub-id><pub-id pub-id-type="pmid">15161378</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Brungart</surname> <given-names>D.</given-names></name> <name><surname>Iyer</surname> <given-names>N.</given-names></name> <name><surname>Simpson</surname> <given-names>B.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name></person-group> (<year>2008</year>). &#x0201C;<article-title>The effects of temporal asynchrony on the intelligibility of accelerated speech</article-title>,&#x0201D; in <source>International Conference on Auditory-Visual Speech Processing (AVSP)</source>, (<publisher-loc>Moreton Island, QLD</publisher-loc>: <publisher-name>Tangalooma Wild Dolphin Resort</publisher-name>).</citation>
</ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Brungart</surname> <given-names>D.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name> <name><surname>Brandewie</surname> <given-names>E.</given-names></name> <name><surname>Romigh</surname> <given-names>G.</given-names></name></person-group> (<year>2007</year>). &#x0201C;<article-title>The effects of temporal acceleration and deceleration on auditory-visual speech perception</article-title>,&#x0201D; in <source>International Conference on Auditory-Visual Speech Processing (AVSP)</source> (<publisher-loc>Hilvarenbeek</publisher-loc>).</citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Busch</surname> <given-names>N. A.</given-names></name> <name><surname>VanRullen</surname> <given-names>R.</given-names></name></person-group> (<year>2010</year>). <article-title>Spontaneous EEG oscillations reveal periodic sampling of visual attention</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>107</volume>, <fpage>16048</fpage>&#x02013;<lpage>16053</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1004801107</pub-id><pub-id pub-id-type="pmid">20805482</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bushara</surname> <given-names>K. O.</given-names></name> <name><surname>Grafman</surname> <given-names>J.</given-names></name> <name><surname>Hallett</surname> <given-names>M.</given-names></name></person-group> (<year>2001</year>). <article-title>Neural correlates of auditory-visual stimulus onset asynchrony detection</article-title>. <source>J. Neurosci</source>. <volume>21</volume>, <fpage>300</fpage>&#x02013;<lpage>304</lpage>. <pub-id pub-id-type="pmid">11150347</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Callan</surname> <given-names>D. E.</given-names></name> <name><surname>Jones</surname> <given-names>J. A.</given-names></name> <name><surname>Munhall</surname> <given-names>K. G.</given-names></name> <name><surname>Kroos</surname> <given-names>C.</given-names></name> <name><surname>Callan</surname> <given-names>A. M.</given-names></name> <name><surname>Vatikiotis-Bateson</surname> <given-names>E.</given-names></name></person-group> (<year>2004</year>). <article-title>Multisensory integration sites identified by perception of spatial wavelet filtered visual speech gesture information</article-title>. <source>J. Cogn. Neurosci</source>. <volume>16</volume>, <fpage>805</fpage>&#x02013;<lpage>816</lpage>. <pub-id pub-id-type="doi">10.1162/089892904970771</pub-id><pub-id pub-id-type="pmid">15200708</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Callan</surname> <given-names>D.</given-names></name> <name><surname>Jones</surname> <given-names>J.</given-names></name> <name><surname>Munhall</surname> <given-names>K.</given-names></name> <name><surname>Callan</surname> <given-names>A.</given-names></name> <name><surname>Kroos</surname> <given-names>C.</given-names></name> <name><surname>Vatikiotis-Bateson</surname> <given-names>E.</given-names></name></person-group> (<year>2003</year>). <article-title>Neural processes underlying perceptual enhancement by visual speech gestures</article-title>. <source>Neuroreport</source> <volume>14</volume>, <fpage>2213</fpage>&#x02013;<lpage>2218</lpage>. <pub-id pub-id-type="doi">10.1097/00001756-200312020-00016</pub-id><pub-id pub-id-type="pmid">14625450</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calvert</surname> <given-names>G. A.</given-names></name></person-group> (<year>1997</year>). <article-title>Activation of auditory cortex during silent lipreading</article-title>. <source>Science</source> <volume>276</volume>, <fpage>593</fpage>&#x02013;<lpage>596</lpage>. <pub-id pub-id-type="doi">10.1126/science.276.5312.593</pub-id><pub-id pub-id-type="pmid">9110978</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calvert</surname> <given-names>G. A.</given-names></name> <name><surname>Brammer</surname> <given-names>M. J.</given-names></name> <name><surname>Bullmore</surname> <given-names>E. T.</given-names></name> <name><surname>Campbell</surname> <given-names>R.</given-names></name> <name><surname>Iversen</surname> <given-names>S. D.</given-names></name> <name><surname>David</surname> <given-names>A.</given-names></name></person-group> (<year>1999</year>). <article-title>Response amplification in sensory-specific cortices during cross-modal binding</article-title>. <source>Neuroreport</source> <volume>10</volume>, <fpage>2619</fpage>&#x02013;<lpage>2623</lpage>. <pub-id pub-id-type="doi">10.1097/00001756-199908200-00033</pub-id><pub-id pub-id-type="pmid">10574380</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calvert</surname> <given-names>G. A.</given-names></name> <name><surname>Campbell</surname> <given-names>R.</given-names></name></person-group> (<year>2003</year>). <article-title>Reading speech from still and moving faces: the neural substrates of visible speech</article-title>. <source>J. Cogn. Neurosci</source>. <volume>15</volume>, <fpage>57</fpage>&#x02013;<lpage>70</lpage>. <pub-id pub-id-type="doi">10.1162/089892903321107828</pub-id><pub-id pub-id-type="pmid">12590843</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calvert</surname> <given-names>G. A.</given-names></name> <name><surname>Campbell</surname> <given-names>R.</given-names></name> <name><surname>Brammer</surname> <given-names>M. J.</given-names></name></person-group> (<year>2000</year>). <article-title>Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex</article-title>. <source>Curr. Biol</source>. <volume>10</volume>, <fpage>649</fpage>&#x02013;<lpage>657</lpage>. <pub-id pub-id-type="doi">10.1016/S0960-9822(00)00513-3</pub-id><pub-id pub-id-type="pmid">10837246</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calvert</surname> <given-names>G. A.</given-names></name> <name><surname>Thesen</surname> <given-names>T.</given-names></name></person-group> (<year>2004</year>). <article-title>Multisensory integration: methodological approaches and emerging principles in the human brain</article-title>. <source>J. Physiol. Paris</source> <volume>98</volume>, <fpage>191</fpage>&#x02013;<lpage>205</lpage>. <pub-id pub-id-type="doi">10.1016/j.jphysparis.2004.03.018</pub-id><pub-id pub-id-type="pmid">15477032</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Campbell</surname> <given-names>R.</given-names></name></person-group> (<year>1986</year>). <article-title>Face recognition and lipreading</article-title>. <source>Brain</source> <volume>109</volume>, <fpage>509</fpage>&#x02013;<lpage>521</lpage>. <pub-id pub-id-type="doi">10.1093/brain/109.3.509</pub-id><pub-id pub-id-type="pmid">3719288</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Campbell</surname> <given-names>R.</given-names></name></person-group> (<year>1989</year>). &#x0201C;<article-title>Lipreading</article-title>,&#x0201D; in <source>Handbook of Research on Face Processing</source>, eds <person-group person-group-type="editor"><name><surname>Young</surname> <given-names>A. W.</given-names></name> <name><surname>Ellis</surname> <given-names>H. D.</given-names></name></person-group> (<publisher-loc>Malden</publisher-loc>: <publisher-name>Blackwell Publishing</publisher-name>), <fpage>187</fpage>&#x02013;<lpage>233</lpage>.</citation>
</ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Campbell</surname> <given-names>R.</given-names></name></person-group> (<year>1992</year>). &#x0201C;<article-title>Lip-reading and the modularity of cognitive function: neuropsychological glimpses of fractionation from speech and faces</article-title>,&#x0201D; in <source>Analytic Approaches to Human Cognition</source>, eds <person-group person-group-type="editor"><name><surname>Alegria</surname> <given-names>J.</given-names></name> <name><surname>Holender</surname> <given-names>D.</given-names></name> <name><surname>Junca de Morais</surname> <given-names>J.</given-names></name> <name><surname>Radeau</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>Elsevier Science Publishers</publisher-name>), <fpage>275</fpage>&#x02013;<lpage>289</lpage>.</citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Campbell</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>The processing of audio-visual speech: empirical and neural bases</article-title>. <source>Philos. Trans. R. Soc. Lond. B Biol. Sci</source>. <volume>363</volume>, <fpage>1001</fpage>&#x02013;<lpage>1010</lpage>. <pub-id pub-id-type="doi">10.1098/rstb.2007.2155</pub-id><pub-id pub-id-type="pmid">17827105</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Campbell</surname> <given-names>R.</given-names></name> <name><surname>Garwood</surname> <given-names>J.</given-names></name> <name><surname>Franklin</surname> <given-names>S.</given-names></name> <name><surname>Howard</surname> <given-names>D.</given-names></name> <name><surname>Landis</surname> <given-names>T.</given-names></name> <name><surname>Regard</surname> <given-names>M.</given-names></name></person-group> (<year>1990</year>). <article-title>Neuropsychological studies of auditory-visual fusion illusions. Four case studies and their implications</article-title>. <source>Neuropsychologia</source> <volume>28</volume>, <fpage>787</fpage>&#x02013;<lpage>802</lpage>. <pub-id pub-id-type="doi">10.1016/0028-3932(90)90003-7</pub-id><pub-id pub-id-type="pmid">1701035</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Campbell</surname> <given-names>C.</given-names></name> <name><surname>Massaro</surname> <given-names>D. W.</given-names></name></person-group> (<year>1997</year>). <article-title>Perception of visible speech: influence of spatial quantization</article-title>. <source>Perception</source> <volume>26</volume>, <fpage>627</fpage>&#x02013;<lpage>644</lpage>. <pub-id pub-id-type="doi">10.1068/p260627</pub-id><pub-id pub-id-type="pmid">9488886</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cathiard</surname> <given-names>M.-A.</given-names></name> <name><surname>Abry</surname> <given-names>C.</given-names></name></person-group> (<year>2007</year>). &#x0201C;<article-title>Speech structure decisions from speech motion coordinations</article-title>,&#x0201D; in <source>Proceedings of the XVIth International Congress of Phonetic Sciences</source>, <publisher-loc>Saarbr&#x000FC;cken</publisher-loc>.</citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chandrasekaran</surname> <given-names>C.</given-names></name> <name><surname>Trubanova</surname> <given-names>A.</given-names></name> <name><surname>Stillittano</surname> <given-names>S.</given-names></name> <name><surname>Caplier</surname> <given-names>A.</given-names></name> <name><surname>Ghazanfar</surname> <given-names>A. A.</given-names></name></person-group> (<year>2009</year>). <article-title>The natural statistics of audiovisual speech</article-title>. <source>PLoS Comput. Biol</source>. <volume>5</volume>:<fpage>e1000436</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1000436</pub-id><pub-id pub-id-type="pmid">19609344</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chomsky</surname> <given-names>N.</given-names></name></person-group> (<year>2000</year>). &#x0201C;<article-title>Recent contributions to the theory of innate ideas</article-title>,&#x0201D; in <source>Minds, Brains and Computers: The Foundations of Cognitive Science, an Anthology</source>, eds <person-group person-group-type="editor"><name><surname>Harnish</surname> <given-names>R. M.</given-names></name> <name><surname>Cummins</surname> <given-names>D. D.</given-names></name></person-group> (<publisher-loc>Malden, MA</publisher-loc>: <publisher-name>Blackwell</publisher-name>), <fpage>452</fpage>&#x02013;<lpage>457</lpage>.</citation>
</ref>
<ref id="B46">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chomsky</surname> <given-names>N.</given-names></name> <name><surname>Halle</surname> <given-names>M.</given-names></name></person-group> (<year>1968</year>). <source>The Sound Pattern of English</source>. <publisher-loc>New York; Evanston; London</publisher-loc>: <publisher-name>Harper and Row</publisher-name>.</citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Colin</surname> <given-names>C.</given-names></name> <name><surname>Radeau</surname> <given-names>M.</given-names></name> <name><surname>Soquet</surname> <given-names>A.</given-names></name> <name><surname>Demolin</surname> <given-names>D.</given-names></name> <name><surname>Colin</surname> <given-names>F.</given-names></name> <name><surname>Deltenre</surname> <given-names>P.</given-names></name></person-group> (<year>2002</year>). <article-title>Mismatch negativity evoked by the McGurk-MacDonald effect: a phonetic representation within short-term memory</article-title>. <source>Clin. Neurophysiol</source>. <volume>113</volume>, <fpage>495</fpage>&#x02013;<lpage>506</lpage>. <pub-id pub-id-type="doi">10.1016/S1388-2457(02)00024-X</pub-id><pub-id pub-id-type="pmid">11955994</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Colonius</surname> <given-names>H.</given-names></name> <name><surname>Diederich</surname> <given-names>A.</given-names></name></person-group> (<year>2004</year>). <article-title>Multisensory interaction in saccadic reaction time: a time-window-of-integration model</article-title>. <source>J. Cogn. Neurosci</source>. <volume>16</volume>, <fpage>1000</fpage>&#x02013;<lpage>1009</lpage>. <pub-id pub-id-type="doi">10.1162/0898929041502733</pub-id><pub-id pub-id-type="pmid">15298787</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Conrey</surname> <given-names>B.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name></person-group> (<year>2006</year>). <article-title>Auditory-visual speech perception and synchrony detection for speech and nonspeech signals</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>119</volume>, <fpage>4065</fpage>. <pub-id pub-id-type="doi">10.1121/1.2195091</pub-id><pub-id pub-id-type="pmid">16838548</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Czigler</surname> <given-names>I.</given-names></name> <name><surname>Winkler</surname> <given-names>I.</given-names></name> <name><surname>Pat&#x000F3;</surname> <given-names>L.</given-names></name> <name><surname>V&#x000E1;rnagy</surname> <given-names>A.</given-names></name> <name><surname>Weisz</surname> <given-names>J.</given-names></name> <name><surname>Bal&#x000E1;zs</surname> <given-names>L.</given-names></name></person-group> (<year>2006</year>). <article-title>Visual temporal window of integration as revealed by the visual mismatch negativity event-related potential to stimulus omissions</article-title>. <source>Brain Res</source>. <volume>1104</volume>, <fpage>129</fpage>&#x02013;<lpage>140</lpage>. <pub-id pub-id-type="doi">10.1016/j.brainres.2006.05.034</pub-id><pub-id pub-id-type="pmid">16822480</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Gelder</surname> <given-names>B.</given-names></name> <name><surname>B&#x000F6;cker</surname> <given-names>K. B. E.</given-names></name> <name><surname>Tuomainen</surname> <given-names>J.</given-names></name> <name><surname>Hensen</surname> <given-names>M.</given-names></name> <name><surname>Vroomen</surname> <given-names>J.</given-names></name></person-group> (<year>1999</year>). <article-title>The combined perception of emotion from voice and face: early interaction revealed by human electric brain responses</article-title>. <source>Neurosci. Lett</source>. <volume>260</volume>, <fpage>133</fpage>&#x02013;<lpage>136</lpage>. <pub-id pub-id-type="doi">10.1016/S0304-3940(98)00963-X</pub-id><pub-id pub-id-type="pmid">10025717</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dehaene-Lambertz</surname> <given-names>G.</given-names></name> <name><surname>Dehaene</surname> <given-names>S.</given-names></name> <name><surname>Hertz-Pannier</surname> <given-names>L.</given-names></name></person-group> (<year>2002</year>). <article-title>Functional neuroimaging of speech perception in infants</article-title>. <source>Science</source> <volume>298</volume>, <fpage>2013</fpage>&#x02013;<lpage>2015</lpage>. <pub-id pub-id-type="doi">10.1126/science.1077066</pub-id><pub-id pub-id-type="pmid">12471265</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Den&#x000E8;ve</surname> <given-names>S.</given-names></name> <name><surname>Pouget</surname> <given-names>A.</given-names></name></person-group> (<year>2004</year>). <article-title>Bayesian multisensory integration and cross-modal spatial links</article-title>. <source>J. Physiol. Paris</source> <volume>98</volume>, <fpage>249</fpage>&#x02013;<lpage>258</lpage>. <pub-id pub-id-type="doi">10.1016/j.jphysparis.2004.03.011</pub-id><pub-id pub-id-type="pmid">15477036</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Desimone</surname> <given-names>R.</given-names></name> <name><surname>Gross</surname> <given-names>C. G.</given-names></name></person-group> (<year>1979</year>). <article-title>Visual areas in the temporal cortex of the macaque</article-title>. <source>Brain Res</source>. <volume>178</volume>, <fpage>363</fpage>&#x02013;<lpage>380</lpage>. <pub-id pub-id-type="doi">10.1016/0006-8993(79)90699-1</pub-id><pub-id pub-id-type="pmid">116712</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Driver</surname> <given-names>J.</given-names></name> <name><surname>Noesselt</surname> <given-names>T.</given-names></name></person-group> (<year>2008</year>). <article-title>Multisensory interplay reveals crossmodal influences on &#x02018;sensory-specific&#x02019; brain regions, neural responses, and judgments</article-title>. <source>Neuron</source> <volume>57</volume>, <fpage>11</fpage>&#x02013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2007.12.013</pub-id><pub-id pub-id-type="pmid">18184561</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Erber</surname> <given-names>N. P.</given-names></name></person-group> (<year>1978</year>). <article-title>Auditory-visual perception of speech with reduced optical clarity</article-title>. <source>J. Speech Hear. Res</source>. <volume>22</volume>, <fpage>213</fpage>&#x02013;<lpage>223</lpage>. <pub-id pub-id-type="pmid">491551</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ernst</surname> <given-names>M. O.</given-names></name> <name><surname>B&#x000FC;lthoff</surname> <given-names>H. H.</given-names></name></person-group> (<year>2004</year>). <article-title>Merging the senses into a robust percept</article-title>. <source>Trends Cogn. Sci</source>. <volume>8</volume>, <fpage>162</fpage>&#x02013;<lpage>169</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2004.02.002</pub-id><pub-id pub-id-type="pmid">15050512</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Evans</surname> <given-names>K. K.</given-names></name> <name><surname>Treisman</surname> <given-names>A.</given-names></name></person-group> (<year>2010</year>). <article-title>Natural cross-modal mappings between visual and auditory features</article-title>. <source>J. Vis</source>. <volume>10</volume>:<fpage>6</fpage>. <pub-id pub-id-type="doi">10.1167/10.1.6</pub-id><pub-id pub-id-type="pmid">21216758</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2005</year>). <article-title>A theory of cortical responses</article-title>. <source>Philos. Trans. R. Soc. Lond. B Biol. Sci</source>. <volume>360</volume>, <fpage>815</fpage>&#x02013;<lpage>836</lpage>. <pub-id pub-id-type="doi">10.1098/rstb.2005.1622</pub-id><pub-id pub-id-type="pmid">15937014</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghazanfar</surname> <given-names>A. A.</given-names></name> <name><surname>Chandrasekaran</surname> <given-names>C.</given-names></name> <name><surname>Logothetis</surname> <given-names>N. K.</given-names></name></person-group> (<year>2008</year>). <article-title>Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys</article-title>. <source>J. Neurosci</source>. <volume>28</volume>, <fpage>4457</fpage>&#x02013;<lpage>4469</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.0541-08.2008</pub-id><pub-id pub-id-type="pmid">18434524</pub-id></citation>
</ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghazanfar</surname> <given-names>A. A.</given-names></name> <name><surname>Logothetis</surname> <given-names>N. K.</given-names></name></person-group> (<year>2003</year>). <article-title>Facial expressions linked to monkey calls</article-title>. <source>Nature</source> <volume>423</volume>, <fpage>937</fpage>&#x02013;<lpage>938</lpage>. <pub-id pub-id-type="doi">10.1038/423937a</pub-id><pub-id pub-id-type="pmid">12827188</pub-id></citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghazanfar</surname> <given-names>A. A.</given-names></name> <name><surname>Schroeder</surname> <given-names>C. E.</given-names></name></person-group> (<year>2006</year>). <article-title>Is neocortex essentially multisensory?</article-title> <source>Trends Cogn. Sci</source>. <volume>10</volume>, <fpage>278</fpage>&#x02013;<lpage>285</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2006.04.008</pub-id><pub-id pub-id-type="pmid">16713325</pub-id></citation>
</ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghazanfar</surname> <given-names>A. A.</given-names></name> <name><surname>Maier</surname> <given-names>J. X.</given-names></name> <name><surname>Hoffman</surname> <given-names>K. L.</given-names></name> <name><surname>Logothetis</surname> <given-names>N. K.</given-names></name></person-group> (<year>2005</year>). <article-title>Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex</article-title>. <source>J. Neurosci</source>. <volume>25</volume>:<fpage>5004</fpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.0799-05.2005</pub-id><pub-id pub-id-type="pmid">15901781</pub-id></citation>
</ref>
<ref id="B64">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gibson</surname> <given-names>E. J.</given-names></name></person-group> (<year>1969</year>). <source>Principles of Perceptual Learning and Development</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Appleton-Century-Crofts</publisher-name>.</citation>
</ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giraud</surname> <given-names>A. L.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2012</year>). <article-title>Cortical oscillations and speech processing: emerging computational principles and operations</article-title>. <source>Nat. Neurosci</source>. <volume>15</volume>, <fpage>511</fpage>&#x02013;<lpage>517</lpage>. <pub-id pub-id-type="doi">10.1038/nn.3063</pub-id><pub-id pub-id-type="pmid">22426255</pub-id></citation>
</ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grant</surname> <given-names>K. W.</given-names></name></person-group> (<year>2002</year>). <article-title>Measures of auditory-visual integration for speech understanding: a theoretical perspective</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>112</volume>, <fpage>30</fpage>&#x02013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1121/1.1482076</pub-id><pub-id pub-id-type="pmid">12141356</pub-id></citation>
</ref>
<ref id="B67">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Grant</surname> <given-names>K. W.</given-names></name> <name><surname>Greenberg</surname> <given-names>S.</given-names></name></person-group> (<year>2001</year>). &#x0201C;<article-title>Speech intelligibility derived from asynchronous processing of auditory-visual information</article-title>,&#x0201D; in <source>Auditory-Visual Speech Processing</source> (<publisher-loc>Aalborg</publisher-loc>).</citation>
</ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grant</surname> <given-names>K. W.</given-names></name> <name><surname>Seitz</surname> <given-names>P. F.</given-names></name></person-group> (<year>1998</year>). <article-title>Measures of auditory-visual integration in nonsense syllables and sentences</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>104</volume>, <fpage>2438</fpage>&#x02013;<lpage>2450</lpage>. <pub-id pub-id-type="doi">10.1121/1.423751</pub-id><pub-id pub-id-type="pmid">10491705</pub-id></citation>
</ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grant</surname> <given-names>K. W.</given-names></name> <name><surname>Seitz</surname> <given-names>P.-F.</given-names></name></person-group> (<year>2000</year>). <article-title>The use of visible speech cues for improving auditory detection of spoken sentences</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>108</volume>, <fpage>1197</fpage>&#x02013;<lpage>1207</lpage>. <pub-id pub-id-type="doi">10.1121/1.1288668</pub-id><pub-id pub-id-type="pmid">11008820</pub-id></citation>
</ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grant</surname> <given-names>K. W.</given-names></name> <name><surname>Walden</surname> <given-names>B. E.</given-names></name> <name><surname>Seitz</surname> <given-names>P.-F.</given-names></name></person-group> (<year>1998</year>). <article-title>Auditory-visual speech recognition by hearing-impaired subjects: consonant recognition, sentence recognition and auditory-visual integration</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>103</volume>, <fpage>2677</fpage>&#x02013;<lpage>2690</lpage>. <pub-id pub-id-type="doi">10.1121/1.422788</pub-id><pub-id pub-id-type="pmid">9604361</pub-id></citation>
</ref>
<ref id="B71">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Green</surname> <given-names>K. P.</given-names></name></person-group> (<year>1996</year>). &#x0201C;<article-title>The use of auditory and visual information in phonetic perception</article-title>,&#x0201D; in <source>Speechreading by Humans and Machines</source>, eds <person-group person-group-type="editor"><name><surname>Stork</surname> <given-names>D. G.</given-names></name> <name><surname>Hennecke</surname> <given-names>M. E.</given-names></name></person-group> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>), <fpage>55</fpage>&#x02013;<lpage>77</lpage>.</citation>
</ref>
<ref id="B72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Greenberg</surname> <given-names>S.</given-names></name></person-group> (<year>1998</year>). <article-title>A syllable-centric framework for the evolution of spoken language</article-title>. <source>Behav. Brain Sci</source>. <volume>21</volume>, <fpage>267</fpage>&#x02013;<lpage>268</lpage>. <pub-id pub-id-type="doi">10.1017/S0140525X98311176</pub-id></citation>
</ref>
<ref id="B73">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossman</surname> <given-names>E.</given-names></name> <name><surname>Donnelly</surname> <given-names>M.</given-names></name> <name><surname>Price</surname> <given-names>R.</given-names></name> <name><surname>Pickens</surname> <given-names>D.</given-names></name> <name><surname>Morgan</surname> <given-names>V.</given-names></name> <name><surname>Neighbor</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2000</year>). <article-title>Brain areas involved in perception of biological motion</article-title>. <source>J. Cogn. Neurosci</source>. <volume>12</volume>, <fpage>711</fpage>&#x02013;<lpage>720</lpage>. <pub-id pub-id-type="doi">10.1162/089892900562417</pub-id><pub-id pub-id-type="pmid">11054914</pub-id></citation>
</ref>
<ref id="B74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Halle</surname> <given-names>M.</given-names></name> <name><surname>Stevens</surname> <given-names>K. N.</given-names></name></person-group> (<year>1962</year>). <article-title>Speech recognition: a model and a program for research</article-title>. <source>IRE Trans. Inf. Theor</source>. <volume>8</volume>, <fpage>155</fpage>&#x02013;<lpage>159</lpage>. <pub-id pub-id-type="doi">10.1109/TIT.1962.1057686</pub-id></citation>
</ref>
<ref id="B75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Karnath</surname> <given-names>H.-O.</given-names></name></person-group> (<year>2001</year>). <article-title>New insights into the functions of the superior temporal cortex</article-title>. <source>Nat. Rev. Neurosci</source>. <volume>2</volume>, <fpage>568</fpage>&#x02013;<lpage>576</lpage>. <pub-id pub-id-type="doi">10.1038/35086057</pub-id><pub-id pub-id-type="pmid">11484000</pub-id></citation>
</ref>
<ref id="B76">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harth</surname> <given-names>E.</given-names></name> <name><surname>Unnikrishnan</surname> <given-names>K. P.</given-names></name> <name><surname>Pandya</surname> <given-names>A. S.</given-names></name></person-group> (<year>1987</year>). <article-title>The inversion of sensory processing by feedback pathways: a model of visual cognitive functions</article-title>. <source>Science</source> <volume>237</volume>, <fpage>184</fpage>&#x02013;<lpage>187</lpage>. <pub-id pub-id-type="doi">10.1126/science.3603015</pub-id><pub-id pub-id-type="pmid">3603015</pub-id></citation>
</ref>
<ref id="B77">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hasson</surname> <given-names>U.</given-names></name> <name><surname>Skipper</surname> <given-names>J.</given-names></name> <name><surname>Nusbaum</surname> <given-names>H.</given-names></name> <name><surname>Small</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>Abstract coding of audiovisual speech: beyond sensory representation</article-title>. <source>Neuron</source> <volume>56</volume>, <fpage>1116</fpage>&#x02013;<lpage>1126</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2007.09.037</pub-id><pub-id pub-id-type="pmid">18093531</pub-id></citation>
</ref>
<ref id="B78">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hosoya</surname> <given-names>T.</given-names></name> <name><surname>Baccus</surname> <given-names>S. A.</given-names></name> <name><surname>Meister</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>Dynamic predictive coding by the retina</article-title>. <source>Nature</source> <volume>436</volume>, <fpage>71</fpage>&#x02013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1038/nature03689</pub-id><pub-id pub-id-type="pmid">16001064</pub-id></citation>
</ref>
<ref id="B79">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Houde</surname> <given-names>J. F.</given-names></name> <name><surname>Jordan</surname> <given-names>M. I.</given-names></name></person-group> (<year>1998</year>). <article-title>Sensorimotor adaptation in speech production</article-title>. <source>Science</source> <volume>279</volume>, <fpage>1213</fpage>&#x02013;<lpage>1216</lpage>. <pub-id pub-id-type="doi">10.1126/science.279.5354.1213</pub-id><pub-id pub-id-type="pmid">9469813</pub-id></citation>
</ref>
<ref id="B80">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>J&#x000E4;&#x000E4;skel&#x000E4;inen</surname> <given-names>I. P.</given-names></name> <name><surname>Ojanen</surname> <given-names>V.</given-names></name> <name><surname>Ahveninen</surname> <given-names>J.</given-names></name> <name><surname>Auranen</surname> <given-names>T.</given-names></name> <name><surname>Lev&#x000E4;nen</surname> <given-names>S.</given-names></name> <name><surname>M&#x000F6;tt&#x000F6;nen</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2004</year>). <article-title>Adaptation of neuromagnetic N1 responses to phonetic stimuli by visual speech in humans</article-title>. <source>Neuroreport</source> <volume>15</volume>, <fpage>2741</fpage>&#x02013;<lpage>2744</lpage>. <pub-id pub-id-type="pmid">15597045</pub-id></citation>
</ref>
<ref id="B81">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>J.</given-names></name> <name><surname>Callan</surname> <given-names>D.</given-names></name></person-group> (<year>2003</year>). <article-title>Brain activity during audiovisual speech perception: an fMRI study of the McGurk effect</article-title>. <source>Neuroreport</source> <volume>14</volume>, <fpage>1129</fpage>&#x02013;<lpage>1133</lpage>. <pub-id pub-id-type="doi">10.1097/00001756-200306110-00006</pub-id><pub-id pub-id-type="pmid">12821795</pub-id></citation>
</ref>
<ref id="B82">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jordan</surname> <given-names>T. R.</given-names></name> <name><surname>McCotter</surname> <given-names>M. V.</given-names></name> <name><surname>Thomas</surname> <given-names>S. M.</given-names></name></person-group> (<year>2000</year>). <article-title>Visual and audiovisual speech perception with color and gray-scale facial images</article-title>. <source>Percept. Psychophys</source>. <volume>62</volume>, <fpage>1394</fpage>&#x02013;<lpage>1404</lpage>. <pub-id pub-id-type="doi">10.3758/BF03212141</pub-id><pub-id pub-id-type="pmid">11143451</pub-id></citation>
</ref>
<ref id="B83">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kayser</surname> <given-names>C.</given-names></name> <name><surname>Logothetis</surname> <given-names>N. K.</given-names></name></person-group> (<year>2009</year>). <article-title>Directed interactions between auditory and superior temporal cortices and their role in sensory integration</article-title>. <source>Front. Integr. Neurosci</source>. <volume>3</volume>:<fpage>7</fpage>. <pub-id pub-id-type="doi">10.3389/neuro.07.007.2009</pub-id><pub-id pub-id-type="pmid">19503750</pub-id></citation>
</ref>
<ref id="B84">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kayser</surname> <given-names>C.</given-names></name> <name><surname>Logothetis</surname> <given-names>N. K.</given-names></name> <name><surname>Panzeri</surname> <given-names>S.</given-names></name></person-group> (<year>2010</year>). <article-title>Visual enhancement of the information representation in auditory cortex</article-title>. <source>Curr. Biol</source>. <volume>20</volume>, <fpage>19</fpage>&#x02013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2009.10.068</pub-id><pub-id pub-id-type="pmid">20036538</pub-id></citation>
</ref>
<ref id="B85">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kayser</surname> <given-names>C.</given-names></name> <name><surname>Petkov</surname> <given-names>C. I.</given-names></name> <name><surname>Augath</surname> <given-names>M.</given-names></name> <name><surname>Logothetis</surname> <given-names>N. K.</given-names></name></person-group> (<year>2007</year>). <article-title>Functional imaging reveals visual modulation of specific fields in auditory cortex</article-title>. <source>J. Neurosci</source>. <volume>27</volume>, <fpage>1824</fpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.4737-06.2007</pub-id><pub-id pub-id-type="pmid">17314280</pub-id></citation>
</ref>
<ref id="B86">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kent</surname> <given-names>R. D.</given-names></name></person-group> (<year>1983</year>). &#x0201C;<article-title>The segmental organization of speech, Chapter 4</article-title>,&#x0201D; in <source>The Production of Speech</source>, ed. <person-group person-group-type="editor"><name><surname>MacNeilage</surname> <given-names>P. F.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>), <fpage>57</fpage>&#x02013;<lpage>89</lpage>.</citation>
</ref>
<ref id="B87">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiebel</surname> <given-names>S. J.</given-names></name> <name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2008</year>). <article-title>A hierarchy of time-scales and the brain</article-title>. <source>PLoS Comput. Biol</source>. <volume>4</volume>:<fpage>e1000209</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1000209</pub-id><pub-id pub-id-type="pmid">19008936</pub-id></citation>
</ref>
<ref id="B88">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kihlstrom</surname> <given-names>J. F.</given-names></name></person-group> (<year>1987</year>). <article-title>The cognitive unconscious</article-title>. <source>Science</source> <volume>237</volume>, <fpage>1445</fpage>&#x02013;<lpage>1452</lpage>. <pub-id pub-id-type="doi">10.1126/science.3629249</pub-id><pub-id pub-id-type="pmid">3629249</pub-id></citation>
</ref>
<ref id="B89">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>K&#x000F6;sem</surname> <given-names>A.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name></person-group> (<year>2012</year>). <article-title>Temporal structure in audiovisual sensory selection</article-title>. <source>PLoS ONE</source> <volume>7</volume>:<fpage>e40936</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0040936</pub-id><pub-id pub-id-type="pmid">22829899</pub-id></citation>
</ref>
<ref id="B90">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuhl</surname> <given-names>P.</given-names></name> <name><surname>Meltzoff</surname> <given-names>A.</given-names></name></person-group> (<year>1982</year>). <article-title>The bimodal perception of speech in infancy</article-title>. <source>Science</source> <volume>218</volume>, <fpage>1138</fpage>&#x02013;<lpage>1141</lpage>. <pub-id pub-id-type="doi">10.1126/science.7146899</pub-id><pub-id pub-id-type="pmid">7146899</pub-id></citation>
</ref>
<ref id="B91">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuhl</surname> <given-names>P.</given-names></name> <name><surname>Meltzoff</surname> <given-names>A. N.</given-names></name></person-group> (<year>1984</year>). <article-title>The intermodal representation of speech in infants</article-title>. <source>Infant Behav. Dev</source>. <volume>7</volume>, <fpage>361</fpage>&#x02013;<lpage>381</lpage>. <pub-id pub-id-type="doi">10.1016/S0163-6383(84)80050-8</pub-id></citation>
</ref>
<ref id="B92">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lakatos</surname> <given-names>P.</given-names></name> <name><surname>Karmos</surname> <given-names>G.</given-names></name> <name><surname>Mehta</surname> <given-names>A.</given-names></name> <name><surname>Ulbert</surname> <given-names>I.</given-names></name> <name><surname>Schroeder</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). <article-title>Entrainment of neuronal oscillations as a mechanism of attentional selection</article-title>. <source>Science</source> <volume>320</volume>, <fpage>110</fpage>&#x02013;<lpage>113</lpage>. <pub-id pub-id-type="doi">10.1126/science.1154735</pub-id><pub-id pub-id-type="pmid">18388295</pub-id></citation>
</ref>
<ref id="B94">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lau</surname> <given-names>E. F.</given-names></name> <name><surname>Phillips</surname> <given-names>C.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2008</year>). <article-title>A cortical network for semantics: (de)constructing the N400</article-title>. <source>Nat. Rev. Neurosci</source>. <volume>9</volume>, <fpage>920</fpage>&#x02013;<lpage>933</lpage>. <pub-id pub-id-type="doi">10.1038/nrn2532</pub-id><pub-id pub-id-type="pmid">19020511</pub-id></citation>
</ref>
<ref id="B95">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lewicki</surname> <given-names>M. S.</given-names></name></person-group> (<year>2002</year>). <article-title>Efficient coding of natural sounds</article-title>. <source>Nat. Neurosci</source>. <volume>5</volume>, <fpage>356</fpage>&#x02013;<lpage>363</lpage>. <pub-id pub-id-type="doi">10.1038/nn831</pub-id><pub-id pub-id-type="pmid">11896400</pub-id></citation>
</ref>
<ref id="B96">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lewkowicz</surname> <given-names>D. J.</given-names></name></person-group> (<year>2000</year>). <article-title>The development of intersensory temporal perception: an epigenetic systems/limitations view</article-title>. <source>Psychol. Bull</source>. <volume>126</volume>, <fpage>281</fpage>&#x02013;<lpage>308</lpage>. <pub-id pub-id-type="doi">10.1037/0033-2909.126.2.281</pub-id><pub-id pub-id-type="pmid">10748644</pub-id></citation>
</ref>
<ref id="B97">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liberman</surname> <given-names>A. M.</given-names></name> <name><surname>Cooper</surname> <given-names>F. S.</given-names></name> <name><surname>Shankweiler</surname> <given-names>D. P.</given-names></name> <name><surname>Studdert-Kennedy</surname> <given-names>M.</given-names></name></person-group> (<year>1967</year>). <article-title>Perception of the speech code</article-title>. <source>Psychol. Rev</source>. <volume>74</volume>, <fpage>431</fpage>&#x02013;<lpage>461</lpage>. <pub-id pub-id-type="doi">10.1037/h0020279</pub-id><pub-id pub-id-type="pmid">4170865</pub-id></citation>
</ref>
<ref id="B98">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liberman</surname> <given-names>A. M.</given-names></name> <name><surname>Mattingly</surname> <given-names>I. G.</given-names></name></person-group> (<year>1985</year>). <article-title>The motor theory of speech perception revised</article-title>. <source>Cognition</source> <volume>21</volume>, <fpage>1</fpage>&#x02013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1016/0010-0277(85)90021-6</pub-id><pub-id pub-id-type="pmid">4075760</pub-id></citation>
</ref>
<ref id="B99">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li&#x000E9;geois-Chauvel</surname> <given-names>C.</given-names></name> <name><surname>de Graaf</surname> <given-names>J. B.</given-names></name> <name><surname>Laguitton</surname> <given-names>V.</given-names></name> <name><surname>Chauvel</surname> <given-names>P.</given-names></name></person-group> (<year>1999</year>). <article-title>Specialization of left auditory cortex for speech perception in man depends on temporal coding</article-title>. <source>Cereb. Cortex</source> <volume>9</volume>, <fpage>484</fpage>&#x02013;<lpage>496</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/9.5.484</pub-id><pub-id pub-id-type="pmid">10450893</pub-id></citation>
</ref>
<ref id="B100">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>H.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2010</year>). <article-title>Auditory cortex tracks both auditory and visual stimulus dynamics using low-frequency neuronal phase modulation</article-title>. <source>PLoS Biol</source>. <volume>8</volume>:<fpage>e1000445</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pbio.1000445</pub-id><pub-id pub-id-type="pmid">20711473</pub-id></citation>
</ref>
<ref id="B101">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>W. J.</given-names></name> <name><surname>Beck</surname> <given-names>J. M.</given-names></name> <name><surname>Latham</surname> <given-names>P. E.</given-names></name> <name><surname>Pouget</surname> <given-names>A.</given-names></name></person-group> (<year>2006</year>). <article-title>Bayesian inference with probabilistic population codes</article-title>. <source>Nat. Neurosci</source>. <volume>9</volume>, <fpage>1432</fpage>&#x02013;<lpage>1438</lpage>. <pub-id pub-id-type="doi">10.1038/nn1790</pub-id><pub-id pub-id-type="pmid">17057707</pub-id></citation>
</ref>
<ref id="B102">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>MacDonald</surname> <given-names>J.</given-names></name> <name><surname>McGurk</surname> <given-names>H.</given-names></name></person-group> (<year>1978</year>). <article-title>Visual influences on speech perception processes</article-title>. <source>Percept. Psychophys</source>. <volume>24</volume>, <fpage>253</fpage>&#x02013;<lpage>257</lpage>. <pub-id pub-id-type="doi">10.3758/BF03206096</pub-id><pub-id pub-id-type="pmid">704285</pub-id></citation>
</ref>
<ref id="B103">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>MacDonald</surname> <given-names>J.</given-names></name> <name><surname>Soren</surname> <given-names>A.</given-names></name> <name><surname>Bachmann</surname> <given-names>T.</given-names></name></person-group> (<year>2000</year>). <article-title>Hearing by eye: how much spatial degradation can be tolerated?</article-title> <source>Perception</source> <volume>29</volume>, <fpage>1155</fpage>&#x02013;<lpage>1168</lpage>. <pub-id pub-id-type="doi">10.1068/p3020</pub-id><pub-id pub-id-type="pmid">11220208</pub-id></citation>
</ref>
<ref id="B104">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>MacKay</surname> <given-names>D. M.</given-names></name></person-group> (<year>1958</year>). <article-title>Perceptual stability of a stroboscopically lit visual field containing self-luminous objects</article-title>. <source>Nature</source> <volume>181</volume>, <fpage>507</fpage>&#x02013;<lpage>508</lpage>. <pub-id pub-id-type="doi">10.1038/181507a0</pub-id><pub-id pub-id-type="pmid">13517199</pub-id></citation>
</ref>
<ref id="B105">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>MacLeod</surname> <given-names>A.</given-names></name> <name><surname>Summerfield</surname> <given-names>Q.</given-names></name></person-group> (<year>1987</year>). <article-title>Quantifying the contribution of vision to speech perception in noise</article-title>. <source>Br. J. Audiol</source>. <volume>21</volume>, <fpage>131</fpage>&#x02013;<lpage>141</lpage>. <pub-id pub-id-type="doi">10.3109/03005368709077786</pub-id><pub-id pub-id-type="pmid">3594015</pub-id></citation>
</ref>
<ref id="B106">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maeda</surname> <given-names>F.</given-names></name> <name><surname>Kanai</surname> <given-names>R.</given-names></name> <name><surname>Shimojo</surname> <given-names>S.</given-names></name></person-group> (<year>2004</year>). <article-title>Changing pitch induced visual motion illusion</article-title>. <source>Curr. Biol</source>. <volume>14</volume>, <fpage>R990</fpage>&#x02013;<lpage>R991</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2004.11.018</pub-id><pub-id pub-id-type="pmid">15589145</pub-id></citation>
</ref>
<ref id="B107">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maier</surname> <given-names>J. X.</given-names></name> <name><surname>Di Luca</surname> <given-names>M.</given-names></name> <name><surname>Noppeney</surname> <given-names>U.</given-names></name></person-group> (<year>2011</year>). <article-title>Audiovisual asynchrony detection in human speech</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform</source>. <volume>37</volume>, <fpage>245</fpage>&#x02013;<lpage>256</lpage>. <pub-id pub-id-type="doi">10.1037/a0019952</pub-id><pub-id pub-id-type="pmid">20731507</pub-id></citation>
</ref>
<ref id="B108">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maiste</surname> <given-names>A. C.</given-names></name> <name><surname>Wiens</surname> <given-names>A. S.</given-names></name> <name><surname>Hunt</surname> <given-names>M. J.</given-names></name> <name><surname>Scherg</surname> <given-names>M.</given-names></name> <name><surname>Picton</surname> <given-names>T. W.</given-names></name></person-group> (<year>1995</year>). <article-title>Event-related potentials and the categorical perception of speech sounds</article-title>. <source>Ear Hear</source>. <volume>16</volume>, <fpage>68</fpage>&#x02013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1097/00003446-199502000-00006</pub-id><pub-id pub-id-type="pmid">7774771</pub-id></citation>
</ref>
<ref id="B109">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martin</surname> <given-names>B.</given-names></name> <name><surname>Giersch</surname> <given-names>A.</given-names></name> <name><surname>Huron</surname> <given-names>C.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name></person-group> (<year>2012</year>). <article-title>Temporal event structure and timing in schizophrenia: preserved binding in a longer &#x0201C;now&#x0201D;</article-title>. <source>Neuropsychologia</source> <volume>51</volume>, <fpage>358</fpage>&#x02013;<lpage>371</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuropsychologia.2012.07.002</pub-id><pub-id pub-id-type="pmid">22813430</pub-id></citation>
</ref>
<ref id="B110">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Massaro</surname> <given-names>D. W.</given-names></name></person-group> (<year>1987</year>). <source>Speech Perception by Ear and Eye: a Paradigm for Psychological Inquiry</source>. <publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Lawrence Erlbaum Associates, Inc</publisher-name>.</citation>
</ref>
<ref id="B111">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Massaro</surname> <given-names>D. W.</given-names></name></person-group> (<year>1998</year>). <source>Perceiving Talking Faces</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B112">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Massaro</surname> <given-names>D. W.</given-names></name> <name><surname>Cohen</surname> <given-names>M. M.</given-names></name> <name><surname>Smeele</surname> <given-names>P. M. T.</given-names></name></person-group> (<year>1996</year>). <article-title>Perception of asynchronous and conflicting visual and auditory speech</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>100</volume>, <fpage>1777</fpage>. <pub-id pub-id-type="doi">10.1121/1.417342</pub-id><pub-id pub-id-type="pmid">8817903</pub-id></citation>
</ref>
<ref id="B113">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McGurk</surname> <given-names>H.</given-names></name> <name><surname>MacDonald</surname> <given-names>J.</given-names></name></person-group> (<year>1976</year>). <article-title>Hearing lips and seeing voices</article-title>. <source>Nature</source> <volume>264</volume>, <fpage>746</fpage>&#x02013;<lpage>748</lpage>. <pub-id pub-id-type="doi">10.1038/264746a0</pub-id><pub-id pub-id-type="pmid">1012311</pub-id></citation>
</ref>
<ref id="B114">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meltzoff</surname> <given-names>A. N.</given-names></name></person-group> (<year>1999</year>). <article-title>Origins of theory of mind, cognition and communication</article-title>. <source>J. Commun. Disord</source>. <volume>32</volume>, <fpage>251</fpage>&#x02013;<lpage>269</lpage>. <pub-id pub-id-type="doi">10.1016/S0021-9924(99)00009-X</pub-id><pub-id pub-id-type="pmid">10466097</pub-id></citation>
</ref>
<ref id="B115">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meltzoff</surname> <given-names>A. N.</given-names></name> <name><surname>Moore</surname> <given-names>M. K.</given-names></name></person-group> (<year>1979</year>). <article-title>Interpreting &#x0201C;imitative&#x0201D; responses in early infancy</article-title>. <source>Science</source> <volume>205</volume>, <fpage>217</fpage>&#x02013;<lpage>219</lpage>. <pub-id pub-id-type="doi">10.1126/science.451596</pub-id><pub-id pub-id-type="pmid">451596</pub-id></citation>
</ref>
<ref id="B116">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miller</surname> <given-names>L.</given-names></name> <name><surname>D&#x00027;Esposito</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>Perceptual fusion and stimulus coincidence in the cross-modal integration of speech</article-title>. <source>J. Neurosci</source>. <volume>25</volume>, <fpage>5884</fpage>&#x02013;<lpage>5893</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.0896-05.2005</pub-id><pub-id pub-id-type="pmid">15976077</pub-id></citation>
</ref>
<ref id="B117">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morillon</surname> <given-names>B.</given-names></name> <name><surname>Lehongre</surname> <given-names>K.</given-names></name> <name><surname>Frackowiak</surname> <given-names>R. S. J.</given-names></name> <name><surname>Ducorps</surname> <given-names>A.</given-names></name> <name><surname>Kleinschmidt</surname> <given-names>A.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Neurophysiological origin of human brain asymmetry for speech and language</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>107</volume>, <fpage>18688</fpage>&#x02013;<lpage>18693</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1007189107</pub-id><pub-id pub-id-type="pmid">20956297</pub-id></citation>
</ref>
<ref id="B118">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>M&#x000F6;tt&#x000F6;nen</surname> <given-names>R.</given-names></name> <name><surname>Krause</surname> <given-names>C.</given-names></name> <name><surname>Tiippana</surname> <given-names>K.</given-names></name> <name><surname>Sams</surname> <given-names>M.</given-names></name></person-group> (<year>2002</year>). <article-title>Processing of changes in visual speech in the human auditory cortex</article-title>. <source>Brain Res. Cogn. Brain Res</source>. <volume>13</volume>, <fpage>417</fpage>&#x02013;<lpage>425</lpage>. <pub-id pub-id-type="doi">10.1016/S0926-6410(02)00053-8</pub-id><pub-id pub-id-type="pmid">11919005</pub-id></citation>
</ref>
<ref id="B119">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>M&#x000F6;tt&#x000F6;nen</surname> <given-names>R.</given-names></name> <name><surname>Sch&#x000FC;rmann</surname> <given-names>M.</given-names></name> <name><surname>Sams</surname> <given-names>M.</given-names></name></person-group> (<year>2004</year>). <article-title>Time course of multisensory interactions during audiovisual speech perception in humans: a magnetoencephalographic study</article-title>. <source>Neurosci. Lett</source>. <volume>363</volume>, <fpage>112</fpage>&#x02013;<lpage>115</lpage>. <pub-id pub-id-type="doi">10.1016/j.neulet.2004.03.076</pub-id><pub-id pub-id-type="pmid">15172096</pub-id></citation>
</ref>
<ref id="B120">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murray</surname> <given-names>M. M.</given-names></name> <name><surname>Spierer</surname> <given-names>L.</given-names></name></person-group> (<year>2011</year>). <article-title>Multisensory integration: what you see is where you hear</article-title>. <source>Curr. Biol</source>. <volume>21</volume>, <fpage>R229</fpage>&#x02013;<lpage>R231</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2011.01.064</pub-id><pub-id pub-id-type="pmid">21419991</pub-id></citation>
</ref>
<ref id="B121">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>N&#x000E4;&#x000E4;t&#x000E4;nen</surname> <given-names>R.</given-names></name></person-group> (<year>1995</year>). <article-title>The mismatch negativity: a powerful tool for cognitive neuroscience</article-title>. <source>Ear Hear</source>. <volume>16</volume>, <fpage>6</fpage>&#x02013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1097/00003446-199502000-00002</pub-id><pub-id pub-id-type="pmid">7774770</pub-id></citation>
</ref>
<ref id="B122">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>N&#x000E4;&#x000E4;t&#x000E4;nen</surname> <given-names>R.</given-names></name> <name><surname>Gaillard</surname> <given-names>A. W.</given-names></name> <name><surname>M&#x000E4;ntysalo</surname> <given-names>S.</given-names></name></person-group> (<year>1978</year>). <article-title>Early selective-attention effect on evoked potential reinterpreted</article-title>. <source>Acta Psychol</source>. <volume>42</volume>, <fpage>313</fpage>&#x02013;<lpage>329</lpage>. <pub-id pub-id-type="doi">10.1016/0001-6918(78)90006-9</pub-id><pub-id pub-id-type="pmid">685709</pub-id></citation>
</ref>
<ref id="B123">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Niparko</surname> <given-names>J. K.</given-names></name> <name><surname>Tobey</surname> <given-names>E. A.</given-names></name> <name><surname>Thal</surname> <given-names>D. J.</given-names></name> <name><surname>Eisenberg</surname> <given-names>L. S.</given-names></name> <name><surname>Wang</surname> <given-names>N. Y.</given-names></name> <name><surname>Quittner</surname> <given-names>A. L.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Spoken language development in children following cochlear implantation</article-title>. <source>JAMA</source> <volume>303</volume>, <fpage>1498</fpage>&#x02013;<lpage>1506</lpage>. <pub-id pub-id-type="doi">10.1001/jama.2010.451</pub-id><pub-id pub-id-type="pmid">20407059</pub-id></citation>
</ref>
<ref id="B124">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Olson</surname> <given-names>I.</given-names></name> <name><surname>Gatenby</surname> <given-names>J.</given-names></name> <name><surname>Gore</surname> <given-names>J.</given-names></name></person-group> (<year>2002</year>). <article-title>A comparison of bound and unbound audio-visual information processing in the human cerebral cortex</article-title>. <source>Brain Res. Cogn. Brain Res</source>. <volume>14</volume>, <fpage>129</fpage>&#x02013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1016/S0926-6410(02)00067-8</pub-id><pub-id pub-id-type="pmid">12063136</pub-id></citation>
</ref>
<ref id="B125">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Panzeri</surname> <given-names>S.</given-names></name> <name><surname>Brunel</surname> <given-names>N.</given-names></name> <name><surname>Logothetis</surname> <given-names>N. K.</given-names></name> <name><surname>Kayser</surname> <given-names>C.</given-names></name></person-group> (<year>2010</year>). <article-title>Sensory neural codes using multiplexed temporal scales</article-title>. <source>Trends Neurosci</source>. <volume>33</volume>, <fpage>111</fpage>&#x02013;<lpage>120</lpage>. <pub-id pub-id-type="doi">10.1016/j.tins.2009.12.001</pub-id><pub-id pub-id-type="pmid">20045201</pub-id></citation>
</ref>
<ref id="B126">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Par&#x000E9;</surname> <given-names>M.</given-names></name> <name><surname>Richler</surname> <given-names>R. C.</given-names></name> <name><surname>Ten Hove</surname> <given-names>M.</given-names></name></person-group> (<year>2003</year>). <article-title>Gaze behavior in audiovisual speech perception: the influence of ocular fixations on the McGurk effect</article-title>. <source>Percept. Psychophys</source>. <volume>65</volume>, <fpage>553</fpage>&#x02013;<lpage>567</lpage>. <pub-id pub-id-type="doi">10.3758/BF03194582</pub-id><pub-id pub-id-type="pmid">12812278</pub-id></citation>
</ref>
<ref id="B127">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pascual-Leone</surname> <given-names>A.</given-names></name> <name><surname>Hamilton</surname> <given-names>R.</given-names></name></person-group> (<year>2001</year>). <article-title>The metamodal organization of the brain</article-title>. <source>Prog. Brain Res</source>. <volume>134</volume>, <fpage>427</fpage>&#x02013;<lpage>445</lpage>. <pub-id pub-id-type="doi">10.1016/S0079-6123(01)34028-1</pub-id><pub-id pub-id-type="pmid">11702559</pub-id></citation>
</ref>
<ref id="B128">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Phillips</surname> <given-names>C.</given-names></name> <name><surname>Pellathy</surname> <given-names>T.</given-names></name> <name><surname>Marantz</surname> <given-names>A.</given-names></name> <name><surname>Yellin</surname> <given-names>E.</given-names></name> <name><surname>Wexler</surname> <given-names>K.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2000</year>). <article-title>Auditory cortex accesses phonological categories: an MEG mismatch study</article-title>. <source>J. Cogn. Neurosci</source>. <volume>12</volume>, <fpage>1038</fpage>&#x02013;<lpage>1055</lpage>. <pub-id pub-id-type="doi">10.1162/08989290051137567</pub-id><pub-id pub-id-type="pmid">11177423</pub-id></citation>
</ref>
<ref id="B129">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pilling</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>Auditory event-related potentials (ERPs) in audiovisual speech perception</article-title>. <source>J. Speech Lang. Hear. Res</source>. <volume>52</volume>, <fpage>1073</fpage>&#x02013;<lpage>1081</lpage>. <pub-id pub-id-type="doi">10.1044/1092-4388(2009/07-0276)</pub-id><pub-id pub-id-type="pmid">19641083</pub-id></citation>
</ref>
<ref id="B130">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2003</year>). <article-title>The analysis of speech in different temporal integration windows: cerebral lateralization as asymmetric sampling in time</article-title>. <source>Speech Commun</source>. <volume>41</volume>, <fpage>245</fpage>&#x02013;<lpage>255</lpage>. <pub-id pub-id-type="doi">10.1016/S0167-6393(02)00107-3</pub-id></citation>
</ref>
<ref id="B131">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Poeppel</surname> <given-names>D.</given-names></name> <name><surname>Idsardi</surname> <given-names>W. J.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name></person-group> (<year>2008</year>). <article-title>Speech perception at the interface of neurobiology and linguistics</article-title>. <source>Philos. Trans. R. Soc. Lond. B Biol. Sci</source>. <volume>363</volume>, <fpage>1071</fpage>&#x02013;<lpage>1086</lpage>. <pub-id pub-id-type="doi">10.1098/rstb.2007.2160</pub-id><pub-id pub-id-type="pmid">17890189</pub-id></citation>
</ref>
<ref id="B132">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Powers</surname> <given-names>A. R.</given-names></name> <name><surname>Hillock</surname> <given-names>A. R.</given-names></name> <name><surname>Wallace</surname> <given-names>M. T.</given-names></name></person-group> (<year>2009</year>). <article-title>Perceptual training narrows the temporal window of multisensory binding</article-title>. <source>J. Neurosci</source>. <volume>29</volume>, <fpage>12265</fpage>&#x02013;<lpage>12274</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.3501-09.2009</pub-id><pub-id pub-id-type="pmid">19793985</pub-id></citation>
</ref>
<ref id="B133">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Puce</surname> <given-names>A.</given-names></name> <name><surname>Allison</surname> <given-names>T.</given-names></name> <name><surname>Bentin</surname> <given-names>S.</given-names></name> <name><surname>Gore</surname> <given-names>J. C.</given-names></name> <name><surname>McCarthy</surname> <given-names>G.</given-names></name></person-group> (<year>1998</year>). <article-title>Temporal cortex activation in humans viewing eye and mouth movements</article-title>. <source>J. Neurosci</source>. <volume>18</volume>, <fpage>2188</fpage>&#x02013;<lpage>2199</lpage>. <pub-id pub-id-type="pmid">9482803</pub-id></citation>
</ref>
<ref id="B134">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Pylyshyn</surname> <given-names>Z.</given-names></name></person-group> (<year>1984</year>). <source>Computation and Cognition: Towards a Foundation for Cognitive Science</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B135">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rajkai</surname> <given-names>C.</given-names></name> <name><surname>Lakatos</surname> <given-names>P.</given-names></name> <name><surname>Chen</surname> <given-names>C.</given-names></name> <name><surname>Pincze</surname> <given-names>Z.</given-names></name> <name><surname>Karmos</surname> <given-names>G.</given-names></name> <name><surname>Schroeder</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). <article-title>Transient cortical excitation at the onset of visual fixation</article-title>. <source>Cereb. Cortex</source> <volume>18</volume>, <fpage>200</fpage>&#x02013;<lpage>209</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhm046</pub-id><pub-id pub-id-type="pmid">17494059</pub-id></citation>
</ref>
<ref id="B136">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rao</surname> <given-names>R. P. N.</given-names></name> <name><surname>Ballard</surname> <given-names>D. H.</given-names></name></person-group> (<year>1999</year>). <article-title>Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects</article-title>. <source>Nat. Neurosci</source>. <volume>2</volume>, <fpage>79</fpage>&#x02013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1038/4580</pub-id><pub-id pub-id-type="pmid">10195184</pub-id></citation>
</ref>
<ref id="B137">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reale</surname> <given-names>R.</given-names></name> <name><surname>Calvert</surname> <given-names>G.</given-names></name> <name><surname>Thesen</surname> <given-names>T.</given-names></name> <name><surname>Jenison</surname> <given-names>R.</given-names></name> <name><surname>Kawasaki</surname> <given-names>H.</given-names></name> <name><surname>Oya</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>Auditory-visual processing represented in the human superior temporal gyrus</article-title>. <source>Neuroscience</source> <volume>145</volume>, <fpage>162</fpage>&#x02013;<lpage>184</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroscience.2006.11.036</pub-id><pub-id pub-id-type="pmid">17241747</pub-id></citation>
</ref>
<ref id="B138">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Remez</surname> <given-names>R.</given-names></name></person-group> (<year>2003</year>). <article-title>Establishing and maintaining perceptual coherence: unimodal and multimodal evidence</article-title>. <source>J. Phon</source>. <volume>31</volume>, <fpage>293</fpage>&#x02013;<lpage>304</lpage>. <pub-id pub-id-type="doi">10.1016/S0095-4470(03)00042-1</pub-id></citation>
</ref>
<ref id="B139">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Remez</surname> <given-names>R. E.</given-names></name> <name><surname>Fellowes</surname> <given-names>J. M.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>Goh</surname> <given-names>W. D.</given-names></name> <name><surname>Rubin</surname> <given-names>P. E.</given-names></name></person-group> (<year>1998</year>). <article-title>Multimodal perceptual organization of speech: evidence from tone analogs of spoken utterances</article-title>. <source>Speech Commun</source>. <volume>26</volume>, <fpage>65</fpage>&#x02013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1016/S0167-6393(98)00050-8</pub-id><pub-id pub-id-type="pmid">21423823</pub-id></citation>
</ref>
<ref id="B140">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosen</surname> <given-names>S.</given-names></name></person-group> (<year>1992</year>). <article-title>Temporal information in speech: acoustic, auditory and linguistic aspects</article-title>. <source>Philos. Trans. R. Soc. Lond. B</source> <volume>336</volume>, <fpage>367</fpage>&#x02013;<lpage>373</lpage>. <pub-id pub-id-type="doi">10.1098/rstb.1992.0070</pub-id><pub-id pub-id-type="pmid">1354376</pub-id></citation>
</ref>
<ref id="B141">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosenblum</surname> <given-names>L.</given-names></name> <name><surname>Schmuckler</surname> <given-names>M. A.</given-names></name> <name><surname>Johnson</surname> <given-names>J. A.</given-names></name></person-group> (<year>1997</year>). <article-title>The McGurk effect in infants</article-title>. <source>Percept. Psychophys</source>. <volume>59</volume>, <fpage>347</fpage>&#x02013;<lpage>357</lpage>. <pub-id pub-id-type="doi">10.3758/BF03211902</pub-id><pub-id pub-id-type="pmid">9136265</pub-id></citation>
</ref>
<ref id="B142">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosenblum</surname> <given-names>L. D.</given-names></name> <name><surname>Salda&#x000F1;a</surname> <given-names>H. M.</given-names></name></person-group> (<year>1996</year>). <article-title>An audiovisual test of kinematic primitives for visual speech perception</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform</source>. <volume>22</volume>, <fpage>318</fpage>&#x02013;<lpage>331</lpage>. <pub-id pub-id-type="doi">10.1037/0096-1523.22.2.318</pub-id><pub-id pub-id-type="pmid">8934846</pub-id></citation>
</ref>
<ref id="B143">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosenblum</surname> <given-names>L.</given-names></name> <name><surname>Yakel</surname> <given-names>D. A.</given-names></name></person-group> (<year>2001</year>). <article-title>The McGurk effect from single and mixed speaker stimuli</article-title>. <source>Acoust. Res. Lett. Online</source> <volume>2</volume>, <fpage>67</fpage>&#x02013;<lpage>72</lpage>. <pub-id pub-id-type="doi">10.1121/1.1366356</pub-id></citation>
</ref>
<ref id="B144">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saltzman</surname> <given-names>E. L.</given-names></name> <name><surname>Munhall</surname> <given-names>K. G.</given-names></name></person-group> (<year>1989</year>). <article-title>A dynamical approach to gestural patterning in speech production</article-title>. <source>Ecol. Psychol</source>. <volume>1</volume>, <fpage>333</fpage>&#x02013;<lpage>382</lpage>. <pub-id pub-id-type="doi">10.1207/s15326969eco0104_2</pub-id></citation>
</ref>
<ref id="B145">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sams</surname> <given-names>M.</given-names></name> <name><surname>Aulanko</surname> <given-names>R.</given-names></name></person-group> (<year>1991</year>). <article-title>Seeing speech: visual information from lip movements modifies activity in the human auditory cortex</article-title>. <source>Neurosci. Lett</source>. <volume>127</volume>, <fpage>141</fpage>&#x02013;<lpage>147</lpage>. <pub-id pub-id-type="doi">10.1016/0304-3940(91)90914-F</pub-id><pub-id pub-id-type="pmid">1881611</pub-id></citation>
</ref>
<ref id="B146">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schorr</surname> <given-names>E.</given-names></name> <name><surname>Fox</surname> <given-names>N.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name> <name><surname>Knudsen</surname> <given-names>E.</given-names></name></person-group> (<year>2005</year>). <article-title>Auditory-visual fusion in speech perception in children with cochlear implants</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>102</volume>, <fpage>18748</fpage>&#x02013;<lpage>18750</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0508862102</pub-id><pub-id pub-id-type="pmid">16339316</pub-id></citation>
</ref>
<ref id="B147">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schroeder</surname> <given-names>C.</given-names></name> <name><surname>Lakatos</surname> <given-names>P.</given-names></name></person-group> (<year>2009</year>). <article-title>Low-frequency neuronal oscillations as instruments of sensory selection</article-title>. <source>Trends Neurosci</source>. <volume>32</volume>, <fpage>9</fpage>&#x02013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1016/j.tins.2008.09.012</pub-id><pub-id pub-id-type="pmid">19012975</pub-id></citation>
</ref>
<ref id="B148">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schroeder</surname> <given-names>C.</given-names></name> <name><surname>Lakatos</surname> <given-names>P.</given-names></name> <name><surname>Kajikawa</surname> <given-names>Y.</given-names></name> <name><surname>Partan</surname> <given-names>S.</given-names></name> <name><surname>Puce</surname> <given-names>A.</given-names></name></person-group> (<year>2008</year>). <article-title>Neuronal oscillations and visual amplification of speech</article-title>. <source>Trends Cogn. Sci</source>. <volume>12</volume>, <fpage>106</fpage>&#x02013;<lpage>113</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2008.01.002</pub-id><pub-id pub-id-type="pmid">18280772</pub-id></citation>
</ref>
<ref id="B149">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Schwartz</surname> <given-names>J.</given-names></name> <name><surname>Robert-Ribes</surname> <given-names>J.</given-names></name> <name><surname>Escudier</surname> <given-names>P.</given-names></name></person-group> (<year>1998</year>). &#x0201C;<article-title>Ten years after Summerfield: a taxonomy of models for audio-visual fusion in speech perception</article-title>,&#x0201D; in <source>Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-Visual Speech</source>, eds <person-group person-group-type="editor"><name><surname>Campbell</surname> <given-names>R.</given-names></name> <name><surname>Dodd</surname> <given-names>B.</given-names></name> <name><surname>Burnham</surname> <given-names>D.</given-names></name></person-group> (<publisher-loc>East Sussex</publisher-loc>: <publisher-name>Psychology Press</publisher-name>), <fpage>85</fpage>&#x02013;<lpage>108</lpage>.</citation>
</ref>
<ref id="B150">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Schwartz</surname> <given-names>J.-L.</given-names></name></person-group> (<year>2003</year>). &#x0201C;<article-title>Why the FLMP should not be applied to McGurk data&#x02026;or how to better compare models in the Bayesian framework</article-title>,&#x0201D; in <source>AVSP - International Conference on Audio-Visual Speech Processing</source>, (<publisher-loc>St-Jorioz</publisher-loc>).</citation>
</ref>
<ref id="B151">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwartz</surname> <given-names>J.-L.</given-names></name> <name><surname>Basirat</surname> <given-names>A.</given-names></name> <name><surname>M&#x000E9;nard</surname> <given-names>L.</given-names></name> <name><surname>Sato</surname> <given-names>M.</given-names></name></person-group> (<year>2012</year>). <article-title>The Perception-for-Action-Control Theory (PACT): a perceptuo-motor theory of speech perception</article-title>. <source>J. Neurolinguistics</source> <volume>25</volume>, <fpage>336</fpage>&#x02013;<lpage>354</lpage>. <pub-id pub-id-type="doi">10.1016/j.jneuroling.2009.12.004</pub-id></citation>
</ref>
<ref id="B152">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scott</surname> <given-names>S. K.</given-names></name> <name><surname>McGettigan</surname> <given-names>C.</given-names></name> <name><surname>Eisner</surname> <given-names>F.</given-names></name></person-group> (<year>2009</year>). <article-title>A little more conversation, a little less action-candidate roles for the motor cortex in speech perception</article-title>. <source>Nat. Rev. Neurosci</source>. <volume>10</volume>, <fpage>295</fpage>&#x02013;<lpage>302</lpage>. <pub-id pub-id-type="doi">10.1038/nrn2603</pub-id><pub-id pub-id-type="pmid">19277052</pub-id></citation>
</ref>
<ref id="B153">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sekiyama</surname> <given-names>K.</given-names></name></person-group> (<year>1994</year>). <article-title>Differences in auditory-visual speech perception between Japanese and Americans: McGurk effect as a function of incompatibility</article-title>. <source>J. Acoust. Soc. Jpn. (E)</source> <volume>15</volume>, <fpage>143</fpage>&#x02013;<lpage>158</lpage>. <pub-id pub-id-type="doi">10.1250/ast.15.143</pub-id></citation>
</ref>
<ref id="B154">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sekiyama</surname> <given-names>K.</given-names></name></person-group> (<year>1997</year>). <article-title>Cultural and linguistic factors in audiovisual speech processing: the McGurk effect in Chinese subjects</article-title>. <source>Percept. Psychophys</source>. <volume>59</volume>, <fpage>73</fpage>&#x02013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.3758/BF03206849</pub-id><pub-id pub-id-type="pmid">9038409</pub-id></citation>
</ref>
<ref id="B155">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sekiyama</surname> <given-names>K.</given-names></name> <name><surname>Kanno</surname> <given-names>I.</given-names></name> <name><surname>Miura</surname> <given-names>S.</given-names></name> <name><surname>Sugita</surname> <given-names>Y.</given-names></name></person-group> (<year>2003</year>). <article-title>Auditory-visual speech perception examined by fMRI and PET</article-title>. <source>Neurosci. Res</source>. <volume>47</volume>, <fpage>277</fpage>&#x02013;<lpage>287</lpage>. <pub-id pub-id-type="doi">10.1016/S0168-0102(03)00214-1</pub-id><pub-id pub-id-type="pmid">14568109</pub-id></citation>
</ref>
<ref id="B156">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sekiyama</surname> <given-names>K.</given-names></name> <name><surname>Tohkura</surname> <given-names>Y.</given-names></name></person-group> (<year>1991</year>). <article-title>McGurk effect in non-English listeners: few visual effects for Japanese subjects hearing Japanese syllables of high auditory intelligibility</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>90</volume>, <fpage>1797</fpage>&#x02013;<lpage>1805</lpage>. <pub-id pub-id-type="doi">10.1121/1.401660</pub-id><pub-id pub-id-type="pmid">1960275</pub-id></citation>
</ref>
<ref id="B157">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Servos</surname> <given-names>P.</given-names></name> <name><surname>Osu</surname> <given-names>R.</given-names></name> <name><surname>Santi</surname> <given-names>A.</given-names></name> <name><surname>Kawato</surname> <given-names>M.</given-names></name></person-group> (<year>2002</year>). <article-title>The neural substrates of biological motion perception: an fMRI study</article-title>. <source>Cereb. Cortex</source> <volume>12</volume>, <fpage>772</fpage>&#x02013;<lpage>782</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/12.7.772</pub-id><pub-id pub-id-type="pmid">12050089</pub-id></citation>
</ref>
<ref id="B158">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sharma</surname> <given-names>A.</given-names></name> <name><surname>Dorman</surname> <given-names>M. F.</given-names></name></person-group> (<year>1999</year>). <article-title>Cortical auditory evoked potential correlates of categorical perception of voice-onset time</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>106</volume>, <fpage>1078</fpage>&#x02013;<lpage>1083</lpage>. <pub-id pub-id-type="doi">10.1121/1.428048</pub-id><pub-id pub-id-type="pmid">10462812</pub-id></citation>
</ref>
<ref id="B159">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sharma</surname> <given-names>A.</given-names></name> <name><surname>Dorman</surname> <given-names>M. F.</given-names></name> <name><surname>Spahr</surname> <given-names>A. J.</given-names></name></person-group> (<year>2002</year>). <article-title>Rapid development of cortical auditory evoked potentials after early cochlear implantation</article-title>. <source>Neuroreport</source> <volume>13</volume>, <fpage>1365</fpage>&#x02013;<lpage>1368</lpage>. <pub-id pub-id-type="doi">10.1097/00001756-200207190-00030</pub-id><pub-id pub-id-type="pmid">12151804</pub-id></citation>
</ref>
<ref id="B160">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sharma</surname> <given-names>J.</given-names></name> <name><surname>Dragoi</surname> <given-names>V.</given-names></name> <name><surname>Tenenbaum</surname> <given-names>J. B.</given-names></name></person-group> (<year>2003</year>). <article-title>V1 neurons signal acquisition of an internal representation of stimulus location</article-title>. <source>Science</source> <volume>300</volume>, <fpage>1758</fpage>&#x02013;<lpage>1763</lpage>. <pub-id pub-id-type="doi">10.1126/science.1081721</pub-id><pub-id pub-id-type="pmid">12805552</pub-id></citation>
</ref>
<ref id="B161">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simos</surname> <given-names>P. G.</given-names></name> <name><surname>Diehl</surname> <given-names>R. L.</given-names></name> <name><surname>Breier</surname> <given-names>J. I.</given-names></name> <name><surname>Molis</surname> <given-names>M. R.</given-names></name> <name><surname>Zouridakis</surname> <given-names>G.</given-names></name> <name><surname>Papanicolaou</surname> <given-names>A. C.</given-names></name></person-group> (<year>1998</year>). <article-title>MEG correlates of categorical perception of a voice onset time continuum in humans</article-title>. <source>Cogn. Brain Res</source>. <volume>7</volume>, <fpage>215</fpage>&#x02013;<lpage>219</lpage>. <pub-id pub-id-type="doi">10.1016/S0926-6410(98)00037-8</pub-id><pub-id pub-id-type="pmid">9774735</pub-id></citation>
</ref>
<ref id="B162">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Skipper</surname> <given-names>J.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name> <name><surname>Nusbaum</surname> <given-names>H.</given-names></name> <name><surname>Small</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception</article-title>. <source>Cereb. Cortex</source> <volume>17</volume>, <fpage>2387</fpage>&#x02013;<lpage>2399</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhl147</pub-id><pub-id pub-id-type="pmid">17218482</pub-id></citation>
</ref>
<ref id="B163">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>E. C.</given-names></name> <name><surname>Lewicki</surname> <given-names>M. S.</given-names></name></person-group> (<year>2006</year>). <article-title>Efficient auditory coding</article-title>. <source>Nature</source> <volume>439</volume>, <fpage>978</fpage>&#x02013;<lpage>982</lpage>. <pub-id pub-id-type="doi">10.1038/nature04485</pub-id><pub-id pub-id-type="pmid">16495999</pub-id></citation>
</ref>
<ref id="B164">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soto-Faraco</surname> <given-names>S.</given-names></name> <name><surname>Navarra</surname> <given-names>J.</given-names></name> <name><surname>Alsius</surname> <given-names>A.</given-names></name></person-group> (<year>2004</year>). <article-title>Assessing automaticity in audiovisual speech integration: evidence from the speeded classification task</article-title>. <source>Cognition</source> <volume>92</volume>, <fpage>B13</fpage>&#x02013;<lpage>B23</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2003.10.005</pub-id><pub-id pub-id-type="pmid">15019556</pub-id></citation>
</ref>
<ref id="B165">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spelke</surname> <given-names>E. S.</given-names></name></person-group> (<year>1981</year>). <article-title>The infant&#x00027;s acquisition of knowledge of bimodally specified events</article-title>. <source>J. Exp. Child Psychol</source>. <volume>31</volume>, <fpage>279</fpage>&#x02013;<lpage>299</lpage>. <pub-id pub-id-type="doi">10.1016/0022-0965(81)90018-7</pub-id><pub-id pub-id-type="pmid">7217891</pub-id></citation>
</ref>
<ref id="B166">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Srinivasan</surname> <given-names>M. V.</given-names></name> <name><surname>Laughlin</surname> <given-names>S. B.</given-names></name> <name><surname>Dubs</surname> <given-names>A.</given-names></name></person-group> (<year>1982</year>). <article-title>Predictive coding: a fresh view of inhibition in the retina</article-title>. <source>Proc. R. Soc. Lond. B Biol. Sci</source>. <volume>216</volume>, <fpage>427</fpage>&#x02013;<lpage>459</lpage>. <pub-id pub-id-type="doi">10.1098/rspb.1982.0085</pub-id><pub-id pub-id-type="pmid">6129637</pub-id></citation>
</ref>
<ref id="B167">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stekelenburg</surname> <given-names>J.</given-names></name> <name><surname>Vroomen</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <article-title>Neural correlates of multisensory integration of ecologically valid audiovisual events</article-title>. <source>J. Cogn. Neurosci</source>. <volume>19</volume>, <fpage>1964</fpage>&#x02013;<lpage>1973</lpage>. <pub-id pub-id-type="doi">10.1162/jocn.2007.19.12.1964</pub-id><pub-id pub-id-type="pmid">17892381</pub-id></citation>
</ref>
<ref id="B168">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stekelenburg</surname> <given-names>J. J.</given-names></name> <name><surname>Vroomen</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>Electrophysiological evidence for a multisensory speech-specific mode of perception</article-title>. <source>Neuropsychologia</source> <volume>50</volume>, <fpage>1425</fpage>&#x02013;<lpage>1431</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuropsychologia.2012.02.027</pub-id><pub-id pub-id-type="pmid">22410413</pub-id></citation>
</ref>
<ref id="B169">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevens</surname> <given-names>K.</given-names></name></person-group> (<year>1960</year>). <article-title>Toward a model of speech perception</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>32</volume>, <fpage>45</fpage>&#x02013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1121/1.1907874</pub-id></citation>
</ref>
<ref id="B170">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevenson</surname> <given-names>R. A.</given-names></name> <name><surname>Altieri</surname> <given-names>N. A.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>James</surname> <given-names>T. W.</given-names></name></person-group> (<year>2010</year>). <article-title>Neural processing of asynchronous audiovisual speech perception</article-title>. <source>Neuroimage</source> <volume>49</volume>, <fpage>3308</fpage>&#x02013;<lpage>3318</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2009.12.001</pub-id><pub-id pub-id-type="pmid">20004723</pub-id></citation>
</ref>
<ref id="B171">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevenson</surname> <given-names>R. A.</given-names></name> <name><surname>Van DerKlok</surname> <given-names>R. M.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>James</surname> <given-names>T. W.</given-names></name></person-group> (<year>2011</year>). <article-title>Discrete neural substrates underlie complementary audiovisual speech integration processes</article-title>. <source>Neuroimage</source> <volume>55</volume>, <fpage>1339</fpage>&#x02013;<lpage>1345</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.12.063</pub-id><pub-id pub-id-type="pmid">21195198</pub-id></citation>
</ref>
<ref id="B172">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevenson</surname> <given-names>R. A.</given-names></name> <name><surname>Zemtsov</surname> <given-names>R. K.</given-names></name> <name><surname>Wallace</surname> <given-names>M. T.</given-names></name></person-group> (<year>2012</year>). <article-title>Individual differences in the multisensory temporal binding window predict susceptibility to audiovisual illusions</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform</source>. <volume>38</volume>, <fpage>1517</fpage>&#x02013;<lpage>1529</lpage>. <pub-id pub-id-type="doi">10.1037/a0027339</pub-id><pub-id pub-id-type="pmid">22390292</pub-id></citation>
</ref>
<ref id="B173">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sumby</surname> <given-names>W.</given-names></name> <name><surname>Pollack</surname> <given-names>I.</given-names></name></person-group> (<year>1954</year>). <article-title>Visual contribution to speech intelligibility in noise</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>26</volume>, <fpage>212</fpage>&#x02013;<lpage>215</lpage>. <pub-id pub-id-type="doi">10.1121/1.1907309</pub-id></citation>
</ref>
<ref id="B174">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Summerfield</surname> <given-names>A. Q.</given-names></name></person-group> (<year>1987</year>). &#x0201C;<article-title>Some preliminaries to a comprehensive account of audio-visual speech perception</article-title>,&#x0201D; in <source>Hearing by Eye</source>, eds <person-group person-group-type="editor"><name><surname>Dodd</surname> <given-names>B.</given-names></name> <name><surname>Campbell</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>Erlbaum Associates</publisher-name>), <fpage>3</fpage>&#x02013;<lpage>51</lpage>.</citation>
</ref>
<ref id="B175">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Svirsky</surname> <given-names>M. A.</given-names></name> <name><surname>Robbins</surname> <given-names>A. M.</given-names></name> <name><surname>Kirk</surname> <given-names>K. I.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>Miyamoto</surname> <given-names>R. T.</given-names></name></person-group> (<year>2000</year>). <article-title>Language development in profoundly deaf children with cochlear implants</article-title>. <source>Psychol. Sci</source>. <volume>11</volume>, <fpage>153</fpage>&#x02013;<lpage>158</lpage>. <pub-id pub-id-type="doi">10.1111/1467-9280.00231</pub-id><pub-id pub-id-type="pmid">11273423</pub-id></citation>
</ref>
<ref id="B176">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Talsma</surname> <given-names>D.</given-names></name> <name><surname>Senkowski</surname> <given-names>D.</given-names></name> <name><surname>Soto-Faraco</surname> <given-names>S.</given-names></name> <name><surname>Woldorff</surname> <given-names>M. G.</given-names></name></person-group> (<year>2010</year>). <article-title>The multifaceted interplay between attention and multisensory integration</article-title>. <source>Trends Cogn. Sci</source>. <volume>14</volume>, <fpage>400</fpage>&#x02013;<lpage>410</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2010.06.008</pub-id><pub-id pub-id-type="pmid">20675182</pub-id></citation>
</ref>
<ref id="B177">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tervaniemi</surname> <given-names>M.</given-names></name> <name><surname>Maury</surname> <given-names>S.</given-names></name> <name><surname>N&#x000E4;&#x000E4;t&#x000E4;nen</surname> <given-names>R.</given-names></name></person-group> (<year>1994</year>). <article-title>Neural representations of abstract stimulus features in the human brain as reflected by the mismatch negativity</article-title>. <source>Neuroreport</source> <volume>5</volume>, <fpage>844</fpage>&#x02013;<lpage>846</lpage>. <pub-id pub-id-type="doi">10.1097/00001756-199403000-00027</pub-id><pub-id pub-id-type="pmid">8018861</pub-id></citation>
</ref>
<ref id="B178">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Theunissen</surname> <given-names>F.</given-names></name> <name><surname>Miller</surname> <given-names>J. P.</given-names></name></person-group> (<year>1995</year>). <article-title>Temporal encoding in nervous systems: a rigorous definition</article-title>. <source>J. Comput. Neurosci</source>. <volume>2</volume>, <fpage>149</fpage>&#x02013;<lpage>162</lpage>. <pub-id pub-id-type="doi">10.1007/BF00961885</pub-id><pub-id pub-id-type="pmid">8521284</pub-id></citation>
</ref>
<ref id="B179">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tiippana</surname> <given-names>K.</given-names></name> <name><surname>Andersen</surname> <given-names>T. S.</given-names></name> <name><surname>Sams</surname> <given-names>M.</given-names></name></person-group> (<year>2003</year>). <article-title>Visual attention modulates audiovisual speech perception</article-title>. <source>Eur. J. Cogn. Psychol</source>. <volume>16</volume>, <fpage>457</fpage>&#x02013;<lpage>472</lpage>. <pub-id pub-id-type="doi">10.1080/09541440340000268</pub-id><pub-id pub-id-type="pmid">22623217</pub-id></citation>
</ref>
<ref id="B180">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Todorovic</surname> <given-names>A.</given-names></name> <name><surname>de Lange</surname> <given-names>F. P.</given-names></name></person-group> (<year>2012</year>). <article-title>Repetition suppression and expectation suppression are dissociable in time in early auditory evoked fields</article-title>. <source>J. Neurosci</source>. <volume>32</volume>, <fpage>13389</fpage>&#x02013;<lpage>13395</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.2227-12.2012</pub-id><pub-id pub-id-type="pmid">23015429</pub-id></citation>
</ref>
<ref id="B181">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tuller</surname> <given-names>B.</given-names></name> <name><surname>Kelso</surname> <given-names>J. A.</given-names></name></person-group> (<year>1984</year>). <article-title>The timing of articulatory gestures: evidence for relational invariants</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>76</volume>, <fpage>1030</fpage>&#x02013;<lpage>1036</lpage>. <pub-id pub-id-type="doi">10.1121/1.391421</pub-id><pub-id pub-id-type="pmid">6501697</pub-id></citation>
</ref>
<ref id="B182">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tuomainen</surname> <given-names>J.</given-names></name> <name><surname>Andersen</surname> <given-names>T. S.</given-names></name> <name><surname>Tiippana</surname> <given-names>K.</given-names></name> <name><surname>Sams</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>Audio-visual speech perception is special</article-title>. <source>Cognition</source> <volume>96</volume>, <fpage>B13</fpage>&#x02013;<lpage>B22</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2004.10.004</pub-id><pub-id pub-id-type="pmid">15833302</pub-id></citation>
</ref>
<ref id="B183">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vaina</surname> <given-names>L. M.</given-names></name> <name><surname>Solomon</surname> <given-names>J.</given-names></name> <name><surname>Chowdhury</surname> <given-names>S.</given-names></name> <name><surname>Sinha</surname> <given-names>P.</given-names></name> <name><surname>Belliveau</surname> <given-names>J. W.</given-names></name></person-group> (<year>2001</year>). <article-title>Functional neuroanatomy of biological motion perception in humans</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>98</volume>, <fpage>11656</fpage>&#x02013;<lpage>11661</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.191374198</pub-id><pub-id pub-id-type="pmid">11553776</pub-id></citation>
</ref>
<ref id="B184">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Wassenhove</surname> <given-names>V.</given-names></name></person-group> (<year>2009</year>). <article-title>Minding time in an amodal representational space</article-title>. <source>Philos. Trans. R. Soc. B Biol. Sci</source>. <volume>364</volume>, <fpage>1815</fpage>&#x02013;<lpage>1830</lpage>. <pub-id pub-id-type="doi">10.1098/rstb.2009.0023</pub-id><pub-id pub-id-type="pmid">19487185</pub-id></citation>
</ref>
<ref id="B185">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>van Wassenhove</surname> <given-names>V.</given-names></name> <name><surname>Ghazanfar</surname> <given-names>A.</given-names></name> <name><surname>Munhall</surname> <given-names>K.</given-names></name> <name><surname>Schroeder</surname> <given-names>C.</given-names></name></person-group> (<year>2012</year>). &#x0201C;<article-title>Bridging the gap between human and non human studies of audiovisual integration</article-title>,&#x0201D; in <source>The New Handbook of Multisensory Processing</source>, ed <person-group person-group-type="editor"><name><surname>Stein</surname> <given-names>B. E.</given-names></name></person-group> (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>153</fpage>&#x02013;<lpage>167</lpage>.</citation>
</ref>
<ref id="B186">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Wassenhove</surname> <given-names>V.</given-names></name> <name><surname>Grant</surname> <given-names>K.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2007</year>). <article-title>Temporal window of integration in auditory-visual speech perception</article-title>. <source>Neuropsychologia</source> <volume>45</volume>, <fpage>598</fpage>&#x02013;<lpage>607</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuropsychologia.2006.01.001</pub-id><pub-id pub-id-type="pmid">16530232</pub-id></citation>
</ref>
<ref id="B187">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Wassenhove</surname> <given-names>V.</given-names></name> <name><surname>Grant</surname> <given-names>K. W.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2005</year>). <article-title>Visual speech speeds up the neural processing of auditory speech</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>102</volume>, <fpage>1181</fpage>&#x02013;<lpage>1186</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0408949102</pub-id><pub-id pub-id-type="pmid">15647358</pub-id></citation>
</ref>
<ref id="B188">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vatikiotis-Bateson</surname> <given-names>E.</given-names></name> <name><surname>Eigsti</surname> <given-names>I.-M.</given-names></name> <name><surname>Yano</surname> <given-names>S.</given-names></name> <name><surname>Munhall</surname> <given-names>K. G.</given-names></name></person-group> (<year>1998</year>). <article-title>Eye movement of perceivers during audiovisual speech perception</article-title>. <source>Percept. Psychophys</source>. <volume>60</volume>, <fpage>926</fpage>&#x02013;<lpage>940</lpage>. <pub-id pub-id-type="doi">10.3758/BF03211929</pub-id><pub-id pub-id-type="pmid">9718953</pub-id></citation>
</ref>
<ref id="B189">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Viviani</surname> <given-names>P.</given-names></name> <name><surname>Figliozzi</surname> <given-names>F.</given-names></name> <name><surname>Lacquaniti</surname> <given-names>F.</given-names></name></person-group> (<year>2011</year>). <article-title>The perception of visible speech: estimation of speech rate and detection of time reversals</article-title>. <source>Exp. Brain Res</source>. <volume>215</volume>, <fpage>141</fpage>&#x02013;<lpage>161</lpage>. <pub-id pub-id-type="doi">10.1007/s00221-011-2883-9</pub-id><pub-id pub-id-type="pmid">21986668</pub-id></citation>
</ref>
<ref id="B190">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Voss</surname> <given-names>P.</given-names></name> <name><surname>Zatorre</surname> <given-names>R. J.</given-names></name></person-group> (<year>2012</year>). <article-title>Organization and reorganization of sensory-deprived cortex</article-title>. <source>Curr. Biol</source>. <volume>22</volume>, <fpage>R168</fpage>&#x02013;<lpage>R173</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2012.01.030</pub-id><pub-id pub-id-type="pmid">22401900</pub-id></citation>
</ref>
<ref id="B191">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vroomen</surname> <given-names>J.</given-names></name> <name><surname>Keetels</surname> <given-names>M.</given-names></name></person-group> (<year>2010</year>). <article-title>Perception of intersensory synchrony: a tutorial review</article-title>. <source>Atten. Percept. Psychophys</source>. <volume>72</volume>, <fpage>871</fpage>&#x02013;<lpage>884</lpage>. <pub-id pub-id-type="doi">10.3758/APP.72.4.871</pub-id><pub-id pub-id-type="pmid">20436185</pub-id></citation>
</ref>
<ref id="B192">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wacongne</surname> <given-names>C.</given-names></name> <name><surname>Labyt</surname> <given-names>E.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name> <name><surname>Bekinschtein</surname> <given-names>T.</given-names></name> <name><surname>Naccache</surname> <given-names>L.</given-names></name> <name><surname>Dehaene</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Evidence for a hierarchy of predictions and prediction errors in human cortex</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>108</volume>, <fpage>20754</fpage>&#x02013;<lpage>20759</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1117807108</pub-id><pub-id pub-id-type="pmid">22147913</pub-id></citation>
</ref>
<ref id="B193">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walker</surname> <given-names>S.</given-names></name> <name><surname>Bruce</surname> <given-names>V.</given-names></name> <name><surname>O&#x00027;Malley</surname> <given-names>C.</given-names></name></person-group> (<year>1995</year>). <article-title>Facial identity and facial speech processing: familiar faces and voices in the McGurk effect</article-title>. <source>Percept. Psychophys</source>. <volume>57</volume>, <fpage>1124</fpage>&#x02013;<lpage>1133</lpage>. <pub-id pub-id-type="doi">10.3758/BF03208369</pub-id><pub-id pub-id-type="pmid">8539088</pub-id></citation>
</ref>
<ref id="B194">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walker-Andrews</surname> <given-names>A. S.</given-names></name></person-group> (<year>1986</year>). <article-title>Intermodal perception of expressive behaviors: relation of eye and voice</article-title>. <source>Dev. Psychol</source>. <volume>22</volume>, <fpage>373</fpage>&#x02013;<lpage>377</lpage>. <pub-id pub-id-type="doi">10.1037/0012-1649.22.3.373</pub-id></citation>
</ref>
<ref id="B195">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Waltzman</surname> <given-names>S. B.</given-names></name> <name><surname>Cohen</surname> <given-names>N. L.</given-names></name> <name><surname>Gomolin</surname> <given-names>L. H.</given-names></name> <name><surname>Green</surname> <given-names>J. E.</given-names></name> <name><surname>Shapiro</surname> <given-names>W. H.</given-names></name> <name><surname>Hoffman</surname> <given-names>R. A.</given-names></name> <etal/></person-group>. (<year>1997</year>). <article-title>Open-set speech perception in congenitally deaf children using cochlear implants</article-title>. <source>Am. J. Otol</source>. <volume>18</volume>, <fpage>342</fpage>&#x02013;<lpage>349</lpage>. <pub-id pub-id-type="pmid">9149829</pub-id></citation>
</ref>
<ref id="B196">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Neurophysiological and computational principles of cortical rhythms in cognition</article-title>. <source>Physiol. Rev</source>. <volume>90</volume>, <fpage>1195</fpage>&#x02013;<lpage>1268</lpage>. <pub-id pub-id-type="doi">10.1152/physrev.00035.2008</pub-id><pub-id pub-id-type="pmid">20664082</pub-id></citation>
</ref>
<ref id="B197">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Werner-Reiss</surname> <given-names>U.</given-names></name> <name><surname>Kelly</surname> <given-names>K.</given-names></name> <name><surname>Trause</surname> <given-names>A.</given-names></name> <name><surname>Underhill</surname> <given-names>A.</given-names></name> <name><surname>Groh</surname> <given-names>J.</given-names></name></person-group> (<year>2003</year>). <article-title>Eye position affects activity in primary auditory cortex of primates</article-title>. <source>Curr. Biol</source>. <volume>13</volume>, <fpage>554</fpage>&#x02013;<lpage>562</lpage>. <pub-id pub-id-type="doi">10.1016/S0960-9822(03)00168-4</pub-id><pub-id pub-id-type="pmid">12676085</pub-id></citation>
</ref>
<ref id="B198">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wright</surname> <given-names>T.</given-names></name> <name><surname>Pelphrey</surname> <given-names>K.</given-names></name> <name><surname>Allison</surname> <given-names>T.</given-names></name> <name><surname>McKeown</surname> <given-names>M.</given-names></name> <name><surname>McCarthy</surname> <given-names>G.</given-names></name></person-group> (<year>2003</year>). <article-title>Polysensory interactions along lateral temporal regions evoked by audiovisual speech</article-title>. <source>Cereb. Cortex</source> <volume>13</volume>, <fpage>1034</fpage>&#x02013;<lpage>1043</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/13.10.1034</pub-id><pub-id pub-id-type="pmid">12967920</pub-id></citation>
</ref>
<ref id="B199">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wundt</surname> <given-names>W.</given-names></name></person-group> (<year>1874</year>). <source>Grundz&#x000FC;ge der physiologischen Psychologie</source>. <publisher-loc>Leipzig</publisher-loc>: <publisher-name>Engelmann</publisher-name>.</citation>
</ref>
<ref id="B200">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yabe</surname> <given-names>H.</given-names></name> <name><surname>Tervaniemi</surname> <given-names>M.</given-names></name> <name><surname>Reinikainen</surname> <given-names>K.</given-names></name> <name><surname>N&#x000E4;&#x000E4;t&#x000E4;nen</surname> <given-names>R.</given-names></name></person-group> (<year>1997</year>). <article-title>Temporal window of integration revealed by MMN to sound omission</article-title>. <source>Neuroreport</source> <volume>8</volume>, <fpage>1971</fpage>&#x02013;<lpage>1974</lpage>. <pub-id pub-id-type="doi">10.1097/00001756-199705260-00035</pub-id><pub-id pub-id-type="pmid">9223087</pub-id></citation>
</ref>
<ref id="B201">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuille</surname> <given-names>A.</given-names></name> <name><surname>Kersten</surname> <given-names>D.</given-names></name></person-group> (<year>2006</year>). <article-title>Vision as Bayesian inference: analysis by synthesis</article-title>. <source>Trends Cogn. Sci</source>. <volume>10</volume>, <fpage>301</fpage>&#x02013;<lpage>308</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2006.05.002</pub-id><pub-id pub-id-type="pmid">16784882</pub-id></citation>
</ref>
<ref id="B202">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zion Golumbic</surname> <given-names>E.</given-names></name> <name><surname>Cogan</surname> <given-names>G. B.</given-names></name> <name><surname>Schroeder</surname> <given-names>C. E.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>). <article-title>Visual input enhances selective speech envelope tracking in auditory cortex at a &#x0201C;cocktail party&#x0201D;</article-title>. <source>J. Neurosci</source>. <volume>33</volume>, <fpage>1417</fpage>&#x02013;<lpage>1426</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.3675-12.2013</pub-id><pub-id pub-id-type="pmid">23345218</pub-id></citation>
</ref>
</ref-list>
</back>
</article>