<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Hum. Neurosci.</journal-id>
<journal-title>Frontiers in Human Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Hum. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5161</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnhum.2017.00174</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Venezia</surname> <given-names>Jonathan H.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/66246/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Vaden</surname> <given-names>Kenneth I.</given-names> <suffix>Jr.</suffix></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/106105/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Rong</surname> <given-names>Feng</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/26470/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Maddox</surname> <given-names>Dale</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Saberi</surname> <given-names>Kourosh</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/14412/overview"/>
</contrib> 
<contrib contrib-type="author">
<name><surname>Hickok</surname> <given-names>Gregory</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/17205/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>VA Loma Linda Healthcare System</institution> <country>Loma Linda, CA, USA</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Otolaryngology&#x02014;Head and Neck Surgery, Medical University of South Carolina</institution> <country>Charleston, SC, USA</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Cognitive Sciences, Center for Cognitive Neuroscience and Engineering, University of California</institution> <country>Irvine, CA, USA</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Edmund C. Lalor, University of Rochester, USA</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Karsten Specht, University of Bergen, Norway; Ruth Campbell, University College London, UK; Iain DeWitt, National Institute on Deafness and Other Communication Disorders, USA</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Jonathan H. Venezia <email>jonathan.venezia&#x00040;va.gov</email></p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>07</day>
<month>04</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>11</volume>
<elocation-id>174</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>12</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>03</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017 Venezia, Vaden Jr., Rong, Maddox, Saberi and Hickok.</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>Venezia, Vaden Jr., Rong, Maddox, Saberi and Hickok</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract><p>The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. <italic>Post hoc</italic> analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.</p></abstract>
<kwd-group>
<kwd>audiovisual speech</kwd>
<kwd>superior temporal sulcus</kwd>
<kwd>fMRI</kwd>
<kwd>visual motion</kwd>
<kwd>functional gradient</kwd>
</kwd-group>
<contract-num rid="cn001">DC03681</contract-num>
<contract-num rid="cn001">DC010775</contract-num>
<contract-sponsor id="cn001">National Institutes of Health<named-content content-type="fundref-id">10.13039/100000002</named-content></contract-sponsor>
<counts>
<fig-count count="7"/>
<table-count count="1"/>
<equation-count count="3"/>
<ref-count count="99"/>
<page-count count="17"/>
<word-count count="13100"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<title>Introduction</title>
<p>The superior temporal sulcus (STS) is activated during a variety of perceptual tasks including audiovisual integration (Beauchamp et al., <xref ref-type="bibr" rid="B8">2004b</xref>; Amedi et al., <xref ref-type="bibr" rid="B2">2005</xref>), speech perception (Binder et al., <xref ref-type="bibr" rid="B15">2000</xref>, <xref ref-type="bibr" rid="B16">2008</xref>; Hickok and Poeppel, <xref ref-type="bibr" rid="B41">2004</xref>, <xref ref-type="bibr" rid="B42">2007</xref>; Price, <xref ref-type="bibr" rid="B67">2010</xref>), and biological motion perception (Allison et al., <xref ref-type="bibr" rid="B1">2000</xref>; Grossman et al., <xref ref-type="bibr" rid="B35">2000</xref>, <xref ref-type="bibr" rid="B37">2005</xref>; Grossman and Blake, <xref ref-type="bibr" rid="B36">2002</xref>; Beauchamp et al., <xref ref-type="bibr" rid="B10">2003</xref>; Puce and Perrett, <xref ref-type="bibr" rid="B69">2003</xref>). It has been widely established that auditory speech perception is influenced by visual speech information (Sumby and Pollack, <xref ref-type="bibr" rid="B90">1954</xref>; McGurk and MacDonald, <xref ref-type="bibr" rid="B55">1976</xref>; Dodd, <xref ref-type="bibr" rid="B29">1977</xref>; Reisberg et al., <xref ref-type="bibr" rid="B73">1987</xref>; Callan et al., <xref ref-type="bibr" rid="B19">2003</xref>), which is represented in part within biological motion circuits that specify the shape and position of vocal tract articulators. This high-level visual information is hypothesized to interact with auditory speech representations in the STS (Callan et al., <xref ref-type="bibr" rid="B19">2003</xref>). 
Indeed, the STS is well-positioned to integrate auditory and visual inputs as it lies between visual association cortex in the posterior lateral temporal region (Beauchamp et al., <xref ref-type="bibr" rid="B9">2002</xref>) and auditory association cortex in the superior temporal gyrus (Rauschecker et al., <xref ref-type="bibr" rid="B72">1995</xref>; Kaas and Hackett, <xref ref-type="bibr" rid="B46">2000</xref>; Wessinger et al., <xref ref-type="bibr" rid="B96">2001</xref>). In nonhuman primates, polysensory fields in STS have been shown to receive convergent input from unimodal auditory and visual cortical regions (Seltzer and Pandya, <xref ref-type="bibr" rid="B79">1978</xref>, <xref ref-type="bibr" rid="B80">1994</xref>; Lewis and Van Essen, <xref ref-type="bibr" rid="B50">2000</xref>) and these fields contain auditory, visual and bimodal neurons (Benevento et al., <xref ref-type="bibr" rid="B12">1977</xref>; Bruce et al., <xref ref-type="bibr" rid="B18">1981</xref>; Dahl et al., <xref ref-type="bibr" rid="B25">2009</xref>). Furthermore, human functional neuroimaging evidence supports the notion that the STS is a multisensory convergence zone for speech (Calvert et al., <xref ref-type="bibr" rid="B22">2000</xref>; Wright et al., <xref ref-type="bibr" rid="B97">2003</xref>; Beauchamp et al., <xref ref-type="bibr" rid="B7">2004a</xref>, <xref ref-type="bibr" rid="B11">2010</xref>; Szycik et al., <xref ref-type="bibr" rid="B91">2008</xref>; Stevenson and James, <xref ref-type="bibr" rid="B86">2009</xref>; Stevenson et al., <xref ref-type="bibr" rid="B85">2010</xref>, <xref ref-type="bibr" rid="B88">2011</xref>; Nath and Beauchamp, <xref ref-type="bibr" rid="B59">2011</xref>, <xref ref-type="bibr" rid="B60">2012</xref>).</p>
<p>However, it remains unclear what role, if any, biological-motion-sensitive regions of the STS play in multimodal speech perception. By and large, facial motion&#x02014;including natural facial motion (Puce et al., <xref ref-type="bibr" rid="B68">1998</xref>), movements of facial line drawings (Puce et al., <xref ref-type="bibr" rid="B70">2003</xref>), and point-light facial motion (Bernstein et al., <xref ref-type="bibr" rid="B14">2011</xref>)&#x02014;yields activation quite posteriorly in the STS, a location that is potentially distinct from auditory and visual speech-related activations. The results of a meta-analysis (Figure <xref ref-type="fig" rid="F1">1</xref>) performed using NeuroSynth (Yarkoni et al., <xref ref-type="bibr" rid="B98">2011</xref>) show peak activations for dynamic facial expressions, audiovisual speech, and auditory speech sounds that are distributed posterior-to-anterior along the STS, respectively. Previous work has established a similar visual-to-auditory gradient within regions of the STS that respond to audiovisual speech (Wright et al., <xref ref-type="bibr" rid="B97">2003</xref>), and the gradient in Figure <xref ref-type="fig" rid="F1">1</xref> further suggests that neural populations near the posterior STS are active in visual processing related to facial and biological motion perception. It is hypothesized that posterior-visual STS regions facilitate the extraction of abstract properties from biological motion stimuli (e.g., action class or action goal), defined by their invariance to specific features including motion kinematics, image size, or viewpoint (Lestou et al., <xref ref-type="bibr" rid="B49">2008</xref>; Grossman et al., <xref ref-type="bibr" rid="B38">2010</xref>). 
Likewise, facial motion computations in posterior STS could contribute to abstracted speech representations (Bernstein and Liebenthal, <xref ref-type="bibr" rid="B13">2014</xref>), although the relationship between biological motion and audiovisual speech systems in the STS has not been completely characterized.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Meta-analyses.</bold> A posterior to anterior gradient of effects related to visual and auditory speech information in bilateral superior temporal sulcus (STS). Three separate meta-analyses were performed using NeuroSynth (<ext-link ext-link-type="uri" xlink:href="http://neurosynth.org">http://neurosynth.org</ext-link>) to identify studies that only included healthy participants and reported effects in STS. Two custom meta-analyses (dynamic facial expressions, audiovisual speech) and one term-based meta-analysis (speech sounds) were performed (see color key for details). The FDR-corrected (<italic>p</italic> &#x0003C; 0.01) reverse inference Z-statistic maps for each meta-analysis were downloaded from NeuroSynth for plotting. Results from dynamic facial expressions (blue), audiovisual speech (green), and speech sounds (red) meta-analyses are plotted on the study-specific template in MNI space (see &#x0201C;Study-Specific Anatomical Template&#x0201D; Section) and restricted to an STS region of interest to highlight the spatial distribution of effects within the STS (see &#x0201C;STS Region of Interest Analysis&#x0201D; Section).</p></caption>
<graphic xlink:href="fnhum-11-00174-g0001.tif"/>
</fig>
<p>A challenge to characterizing spatially proximal and functionally related multimodal speech systems within the STS is that neuroimaging studies of visual speech processing typically factor out activation to nonspeech facial-motion control stimuli (Campbell et al., <xref ref-type="bibr" rid="B23">2001</xref>; Okada and Hickok, <xref ref-type="bibr" rid="B63">2009</xref>; Bernstein et al., <xref ref-type="bibr" rid="B14">2011</xref>). This strategy could limit sensitivity to cortical regions that respond to both speech and nonspeech facial motion, outlined above as supporting action encoding and other pre-linguistic perceptual processes. Similar arguments have been made regarding the interpretation of contrast-based neuroimaging studies in the auditory speech domain (Okada et al., <xref ref-type="bibr" rid="B64">2010</xref>; Stoppelman et al., <xref ref-type="bibr" rid="B89">2013</xref>). Moreover, visual speech/lipreading studies have not directly examined visual-speech-specific responses with respect to functionally defined auditory and/or multimodal speech networks. The goal of the present fMRI experiment was to more completely characterize modality-dependent and modality-independent responses to speech, particularly within the STS. As such, we set out to map the network of auditory, visual and audiovisual<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> speech processing regions of the STS in unprecedented detail. Our investigation was guided by the following questions: (1) Does the anterior-posterior gradient of auditory and visual responses in STS observed across studies (Figure <xref ref-type="fig" rid="F1">1</xref>) replicate within a single, independent group of participants? If so, (2) at what level of processing do speech-specific representations emerge in the STS; in particular, do posterior-visual regions of the STS play a role in speech processing? 
Participants were presented with auditory and visual speech (consonant-vowel (CV) syllables) and nonspeech stimuli (spectrally rotated syllables, nonspeech facial gestures) to enable measurement of modality-dependent and modality-independent responses in the STS that are hypothesized to contribute to speech recognition.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and Methods</title>
<sec id="s2-1">
<title>Participants</title>
<p>This study was approved by the UC Irvine Institutional Review Board and carried out according to the Declaration of Helsinki. All participants gave written informed consent prior to their participation. Twenty (three females) right-handed native English speakers between 18 and 30 years of age participated in the study. All volunteers had normal or corrected-to-normal vision, normal hearing by self-report, no known history of neurological disease, and no other contraindications for MRI. Two participants were excluded from MRI analysis, leaving <italic>N</italic> = 18 for the imaging analysis (see below).</p>
</sec>
<sec id="s2-2">
<title>Stimuli and Procedure</title>
<sec id="s2-2-1">
<title>Stimuli</title>
<p>Six two-second video clips were recorded for each of five experimental conditions featuring a single male actor shown from the neck up (Figure <xref ref-type="fig" rid="F2">2</xref>). In three speech conditions&#x02014;auditory speech (A), visual speech (V), and audiovisual speech (AV)&#x02014;the stimuli were six visually distinguishable CV syllables (&#x0005C;ba&#x0005C;, &#x0005C;tha&#x0005C;, &#x0005C;va&#x0005C;, &#x0005C;bi&#x0005C;, &#x0005C;thi&#x0005C;, &#x0005C;vi&#x0005C;). In the A condition, clips consisted of a still frame of the actor&#x02019;s face paired with auditory recordings of the syllables (44.1 kHz, 16-bit resolution). In the V condition, videos of the actor producing the syllables were presented without sound (30 frames/s). In the AV condition, videos of the actor producing the syllables were presented simultaneously with congruent auditory recordings. There were also two nonspeech conditions&#x02014;spectrally rotated speech (R) and nonspeech facial gestures (G). In the R condition, spectrally inverted (Blesser, <xref ref-type="bibr" rid="B17">1972</xref>) versions of the auditory syllable recordings were presented along with a still frame of the actor. Rotated speech stimuli were created from the original auditory syllable recordings by first bandpass filtering (100&#x02013;3900 Hz) and then spectrally inverting about the filter&#x02019;s center frequency (2000 Hz). Spectral rotation preserves the spectrotemporal complexity of speech, producing a stimulus that is acoustically similar to clear speech but unintelligible (Scott et al., <xref ref-type="bibr" rid="B76">2000</xref>; Narain et al., <xref ref-type="bibr" rid="B58">2003</xref>; Okada et al., <xref ref-type="bibr" rid="B64">2010</xref>) or, in the case of sublexical speech tokens, significantly less discriminable (Liebenthal et al., <xref ref-type="bibr" rid="B51">2005</xref>). 
In the G condition, the actor produced the following series of nonspeech lower-face gestures without sound: partial opening of the mouth with leftward deviation, partial opening of the mouth with rightward deviation, opening of the mouth with lip protrusion, tongue protrusion, lower lip biting, and lip retraction. These gestures contain movements of a similar extent and duration as those used to produce the syllables in the speech conditions, but cannot be construed as speech (Campbell et al., <xref ref-type="bibr" rid="B23">2001</xref>). A rest condition was included consisting of a still frame of the actor with no sound. All auditory speech stimuli were bandpass filtered to match the bandwidth of the rotated speech stimuli. All auditory stimuli were normalized to equal root-mean-square amplitude.</p>
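<p>The rotation and normalization steps described above can be illustrated with a minimal Python sketch. It is not the code used to create the stimuli: it assumes NumPy, brick-wall FFT filters, and inversion by cosine modulation at twice the center frequency, and the function names and the target RMS value are hypothetical.</p>
<preformat><![CDATA[
```python
import numpy as np


def fft_bandpass(x, fs, lo, hi):
    """Brick-wall bandpass: zero every FFT bin outside [lo, hi] Hz (assumed filter type)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def spectrally_rotate(x, fs, lo=100.0, hi=3900.0):
    """Invert the spectrum of x about the center of the [lo, hi] passband."""
    centre = (lo + hi) / 2.0                      # 2000 Hz for a 100-3900 Hz band
    band = fft_bandpass(x, fs, lo, hi)            # step 1: bandpass filter
    t = np.arange(len(x)) / fs
    # Modulating by a cosine at 2 * centre creates images at (2*centre - f)
    # and (2*centre + f); the second bandpass keeps only the inverted image.
    shifted = 2.0 * band * np.cos(2.0 * np.pi * (2.0 * centre) * t)
    return fft_bandpass(shifted, fs, lo, hi)


def rms_normalise(x, target_rms=0.05):
    """Scale x to a common root-mean-square amplitude (target value is illustrative)."""
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))
```
]]></preformat>
<p>Under these assumptions, a component at frequency f inside the passband is mapped to 4000 minus f Hz, so energy near the low edge of the band ends up near the high edge and vice versa.</p>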
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Example stimuli from each experimental condition</bold>.</p></caption>
<graphic xlink:href="fnhum-11-00174-g0002.tif"/>
</fig>
<p>Twelve-second blocks were created by concatenating the individual video clips in each experimental condition. Each block contained all six of the clips from that condition (i.e., all six CV syllables, all six rotated CV syllables, or all six nonspeech facial gestures). The clips were concatenated in random order to form 35 distinct blocks in each condition. Five additional &#x0201C;oddball&#x0201D; blocks were created for each condition including rest, consisting of five within-condition clips and a single oddball clip from one of the other conditions (e.g., an oddball block might contain five A clips and a single AV clip). Oddball clips were placed at random in one of the five positions following the first clip in the block. An oddball could deviate from the standards either visually (e.g., a V clip in a G block), acoustically (e.g., an A clip in an R block), or both (e.g., an AV clip in an R block or an A clip in a V block). Each of these types of deviation occurred with equal frequency so that participants would attend equally to auditory and visual components of the stimuli. We selected the oddball task because it did not force participants to explicitly categorize or identify individual stimuli&#x02014;particularly speech sounds&#x02014;within a block. The oddball task asked participants to detect deviance on the basis of stimulus condition rather than stimulus identity <italic>per se</italic>. This low-level task ensured that speech-related activations were not contaminated by linguistic and/or verbal working memory demands unrelated to sensory-perceptual processing.</p>
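<p>The block construction described above can be sketched as follows. This is an illustration only, using the Python standard library; the function names are hypothetical, and how the five within-condition clips of an oddball block were chosen is an assumption (here, a random subset in random order).</p>
<preformat><![CDATA[
```python
import random


def standard_block(clips, rng):
    """Return all six clips of one condition in random order."""
    block = list(clips)
    rng.shuffle(block)
    return block


def oddball_block(clips, oddball_clip, rng):
    """Return five within-condition clips plus one oddball clip.

    The oddball is never first: it lands in one of the five positions
    that follow the first clip (0-based indices 1 to 5).
    """
    block = rng.sample(list(clips), 5)   # assumption: random subset of five clips
    position = rng.randrange(1, 6)       # 1, 2, 3, 4 or 5
    block.insert(position, oddball_clip)
    return block
```
]]></preformat>
<p>For example, <monospace>oddball_block(a_clips, av_clip, random.Random())</monospace> would produce an A block containing a single AV oddball, where <monospace>a_clips</monospace> and <monospace>av_clip</monospace> are placeholder names for the stimulus files.</p>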
<sec id="s2-2-1-1">
<title>Motion energy in visual speech vs. nonspeech facial gestures</title>
<p>Bernstein et al. (<xref ref-type="bibr" rid="B14">2011</xref>) point out that speech and nonspeech facial gestures such as those used in the present study may not be well-matched on a number of low-level characteristics including total motion energy. To test this, we computed an estimate of the total motion energy in our V and G stimuli as follows. For each video clip, a frame-by-frame estimate of the vertical and horizontal optical flow velocity was calculated using the Horn-Schunck algorithm (Horn and Schunck, <xref ref-type="bibr" rid="B43">1981</xref>) implemented in MATLAB. The total motion energy in each clip was computed as the root-mean-square optical flow velocity across both image dimensions and all frames. A condition-level estimate of the total motion energy was computed by summing the estimated motion energy across all six clips in a given condition. Using this approach, we found that nonspeech facial gestures (G) had 33% more total motion energy than speech gestures (V). As such, we should expect that brain regions sensitive to motion energy would activate more to G than V.</p>
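<p>The motion energy estimate described above can be approximated with the following Python sketch. It is a simplified stand-in for the MATLAB implementation, not a reproduction of it: it assumes NumPy, grayscale frames, central-difference image derivatives, and a hypothetical smoothness weight and iteration count.</p>
<preformat><![CDATA[
```python
import numpy as np


def _local_mean(f):
    """Weighted neighbour average used in the Horn-Schunck update (edges replicated)."""
    p = np.pad(f, 1, mode="edge")
    return ((p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 6.0
            + (p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]) / 12.0)


def horn_schunck(frame1, frame2, alpha=1.0, n_iter=100):
    """Estimate horizontal (u) and vertical (v) flow between two grayscale frames.

    alpha and n_iter are illustrative values, not those used in the study.
    """
    f1 = np.asarray(frame1, dtype=float)
    f2 = np.asarray(frame2, dtype=float)
    iy, ix = np.gradient((f1 + f2) / 2.0)   # spatial derivatives (rows, columns)
    it = f2 - f1                            # temporal derivative
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    for _ in range(n_iter):
        u_bar, v_bar = _local_mean(u), _local_mean(v)
        common = (ix * u_bar + iy * v_bar + it) / (alpha ** 2 + ix ** 2 + iy ** 2)
        u = u_bar - ix * common
        v = v_bar - iy * common
    return u, v


def clip_motion_energy(frames, **kwargs):
    """RMS optical flow velocity across both image dimensions and all frame pairs."""
    squared = []
    for a, b in zip(frames[:-1], frames[1:]):
        u, v = horn_schunck(a, b, **kwargs)
        squared.append(u ** 2)
        squared.append(v ** 2)
    return float(np.sqrt(np.mean(squared)))


def condition_motion_energy(clips, **kwargs):
    """Sum the clip-level motion energy over all clips in a condition."""
    return sum(clip_motion_energy(frames, **kwargs) for frames in clips)
```
]]></preformat>
<p>The ratio of <monospace>condition_motion_energy</monospace> for the G clips to that for the V clips would then give the relative difference in total motion energy between the two conditions.</p>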
</sec>
</sec>
<sec id="s2-2-2">
<title>Procedure</title>
<p>Functional imaging runs consisted of pseudo-random presentation of 21 blocks, three from each condition along with three rest blocks and three oddball blocks. Blocks were separated by a 500 ms inter-block interval during which a black fixation cross was presented against a gray background. Participants were instructed to press a button each time an oddball was detected. The experiment started with a short practice session inside the scanner during which participants were exposed to a single block from each condition including a rest block and an oddball block. Participants were then scanned for ten functional runs immediately followed by acquisition of a high-resolution T1 anatomical volume. Auditory stimuli were presented through an MR-compatible headset (ResTech) and stimulus delivery and timing were controlled using Cogent software<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> implemented in MATLAB 6 (Mathworks Inc., Natick, MA, USA).</p>
</sec>
</sec>
<sec id="s2-3">
<title>Image Acquisition</title>
<p>MR images were obtained on a Philips Achieva 3T (Philips Medical Systems, Andover, MA, USA) fitted with an 8-channel SENSE receiver/head coil, at the Research Imaging Center facility at the University of California, Irvine. We collected a total of 1090 echo planar imaging (EPI) volumes over 10 runs using single pulse Gradient Echo EPI (matrix = 112 &#x000D7; 110, TR = 2.5 s, TE = 25 ms, voxel size = 1.957 &#x000D7; 1.957 &#x000D7; 1.5 mm, flip angle = 90&#x000B0;). Forty-four axial slices provided whole-brain coverage. Slices were acquired sequentially with a 0.5 mm gap. After the functional scans, a high-resolution anatomical image was acquired with a magnetization prepared rapid acquisition gradient echo (MPRAGE) pulse sequence in the axial plane (matrix = 240 &#x000D7; 240, TR = 11 ms, TE = 3.54 ms, voxel size = 1 &#x000D7; 1 &#x000D7; 1 mm).</p>
</sec>
<sec id="s2-4">
<title>Data Analysis</title>
<sec id="s2-4-1">
<title>Behavioral Data Analysis</title>
<p>The Signal Detection Theory measure <italic>d&#x02032;</italic> was calculated to determine performance on the oddball detection task (Green and Swets, <xref ref-type="bibr" rid="B34">1966</xref>). A hit was defined as a positive response (button press) to an oddball block, while a false alarm was defined as a positive response to a non-oddball block. The hit rate (H) was calculated as the number of hits divided by the total number of oddball blocks, while the false alarm rate (F) was calculated as the number of false alarms divided by the number of non-oddball blocks, and <italic>d&#x02032;</italic> was calculated as:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:msup><mml:mi>d</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mo>&#x00A0;</mml:mo><mml:mo>=</mml:mo><mml:mo>&#x00A0;</mml:mo><mml:msup><mml:mi>&#x03A6;</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mtext>1</mml:mtext></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>H</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mi>&#x03A6;</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mtext>1</mml:mtext></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where &#x003A6; is the standard normal cumulative distribution function. Participants with a <italic>d&#x02032;</italic> more than 1.5 standard deviations below the group mean were excluded from further analysis (<italic>N</italic> = 2; see &#x0201C;Results&#x0201D; Section). We also calculated H separately for each oddball type (A, V, AV, R, G). These condition-specific hit rates were entered in a repeated measures ANOVA with Greenhouse-Geisser correction.</p>
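<p>As a worked illustration of Equation (1) and the exclusion rule, the following Python sketch uses only the standard library. The function names are hypothetical, the use of the sample standard deviation is an assumption, and no correction is applied for hit or false alarm rates of exactly 0 or 1 (the inverse normal is undefined there and the call raises an error).</p>
<preformat><![CDATA[
```python
from statistics import NormalDist, mean, stdev


def d_prime(hits, n_oddball, false_alarms, n_standard):
    """d' = inverse-normal(H) - inverse-normal(F), computed from block counts."""
    hit_rate = hits / n_oddball
    false_alarm_rate = false_alarms / n_standard
    z = NormalDist().inv_cdf     # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


def excluded_participants(d_primes, n_sd=1.5):
    """Indices of participants whose d' falls more than n_sd SDs below the group mean.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    cutoff = mean(d_primes) - n_sd * stdev(d_primes)
    return [i for i, d in enumerate(d_primes) if d < cutoff]
```
]]></preformat>
<p>For instance, with purely illustrative counts of 27 hits in 30 oddball blocks and 18 false alarms in 180 non-oddball blocks, H = 0.9 and F = 0.1, giving a <italic>d&#x02032;</italic> of approximately 2.56.</p>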
</sec>
<sec id="s2-4-2">
<title>Neuroimaging Analysis</title>
<p>Minus the two participants excluded for poor behavioral performance, the total sample size for neuroimaging analysis was <italic>N</italic> = 18.</p>
<sec id="s2-4-2-1">
<title>Study-specific anatomical template</title>
<p>A study-specific anatomical template image was created using symmetric diffeomorphic registration (SyN) in the Advanced Normalization Tools (ANTS v2.0.0/2.1.0) software (Avants and Gee, <xref ref-type="bibr" rid="B4">2004</xref>; Avants et al., <xref ref-type="bibr" rid="B5">2008</xref>). Each participant&#x02019;s T1 anatomical image was submitted to the template-construction processing stream in ANTS (buildtemplateparallel.sh), which comprises rigid and SyN registration steps. For SyN, we used a cross correlation similarity metric (Avants et al., <xref ref-type="bibr" rid="B6">2011</xref>) with a three-level multi-resolution registration with 50 &#x000D7; 70 &#x000D7; 10 iterations. The whole-head template was skull stripped in ANTS (antsBrainExtraction.sh) and a brain+cerebellum mask of the skull-stripped template was inverse warped to each participant&#x02019;s native space and used to skull strip the individual participant T1 images. These skull-stripped images were then re-registered to the skull-stripped template using SyN. The resulting study-specific anatomical template was then aligned to the MNI-space ICBM 152 nonlinear atlas version 2009c (Fonov et al., <xref ref-type="bibr" rid="B32">2009</xref>, <xref ref-type="bibr" rid="B33">2011</xref>) using a 12-parameter affine registration in ANTS. The ICBM atlas, which had better tissue contrast than our study-specific template image, was diffeomorphically warped (SyN) to the study-specific template in MNI space. The warped version of the ICBM atlas was used to plot the functional data.</p>
</sec>
<sec id="s2-4-2-2">
<title>Preprocessing</title>
<p>Preprocessing of the functional data was performed using AFNI (v16.0.11) software<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>. For each run, slice timing correction was performed followed by realignment (motion correction) and coregistration of the EPI images to the high resolution anatomical image. Spatial normalization was then performed by applying the set of rigid, diffeomorphic, and affine transformations mapping each participant&#x02019;s anatomical image to the study-specific template in MNI space (antsApplyTransforms, linear interpolation). Images were then spatially smoothed with an isotropic 6-mm full-width half-maximum (FWHM) Gaussian kernel, and each run was scaled to have a mean of 100 across time at each voxel.</p>
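<p>The final scaling step can be sketched in a few lines of Python. This is a NumPy illustration rather than the AFNI commands actually run; the function name and the handling of zero-mean (out-of-brain) voxels are assumptions.</p>
<preformat><![CDATA[
```python
import numpy as np


def scale_run(timeseries):
    """Scale each voxel of one run to a temporal mean of 100.

    timeseries: array of shape (n_voxels, n_timepoints).
    Voxels whose mean is not positive are set to zero (assumed convention).
    """
    data = np.asarray(timeseries, dtype=float)
    voxel_mean = data.mean(axis=-1, keepdims=True)
    scaled = np.zeros(data.shape, dtype=float)
    np.divide(data * 100.0, voxel_mean, out=scaled, where=voxel_mean > 0)
    return scaled
```
]]></preformat>
<p>With every voxel centered on 100, a regression coefficient of 1 corresponds to a change of 1% of the voxel mean, which is why the first-level parameter estimates can be read as percent signal change.</p>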
</sec>
<sec id="s2-4-2-3">
<title>First level analysis</title>
<p>First level regression analysis (AFNI 3dDeconvolve) was performed in individual subjects. To create the regressors of interest, a stimulus-timing vector was created for each experimental condition and convolved with a model hemodynamic response function. The &#x0201C;still face&#x0201D; rest condition was not modeled explicitly and was thus included in the baseline term. An additional 12 regressors corresponding to motion parameters determined during the realignment stage of preprocessing along with their temporal derivatives were entered into the model. Oddball blocks were modeled as a single regressor of no interest. Individual time points were censored from analysis when more than 10% of in-brain voxels were identified as outliers (AFNI 3dToutcount) or when the Euclidean norm of the motion derivatives exceeded 0.4.</p>
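<p>The censoring rule can be illustrated with the following Python sketch. It assumes NumPy, takes the six realignment parameters and the per-volume outlier fraction as given inputs, and uses hypothetical names; details of the AFNI implementation (for example, whether neighboring time points are also censored) are not reproduced.</p>
<preformat><![CDATA[
```python
import numpy as np


def censor_mask(motion, outlier_fraction, enorm_limit=0.4, outlier_limit=0.10):
    """Return a boolean array that is True for time points kept in the regression.

    motion:           (n_timepoints, 6) realignment parameters.
    outlier_fraction: (n_timepoints,) fraction of in-brain voxels flagged as outliers.
    """
    motion = np.asarray(motion, dtype=float)
    # Backward difference of each motion parameter; the first volume gets zero.
    derivative = np.diff(motion, axis=0, prepend=motion[:1])
    enorm = np.sqrt((derivative ** 2).sum(axis=1))   # Euclidean norm per volume
    too_much_motion = enorm > enorm_limit
    too_many_outliers = np.asarray(outlier_fraction) > outlier_limit
    return ~(too_much_motion | too_many_outliers)
```
]]></preformat>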
</sec>
<sec id="s2-4-2-4">
<title>Group analysis</title>
<p>A second-level analysis of variance was performed on the first-level parameter estimates (henceforth &#x0201C;percent signal change&#x0201D; (PSC)) from each participant, treating &#x0201C;participant&#x0201D; as a random effect. Activation images (mean PSC) and statistical parametric maps (<italic>t</italic>-statistics) were created for each individual condition. Significantly active voxels were defined as those for which <italic>t</italic>-statistics exceeded the <italic>p</italic> &#x0003C; 0.005 level with a cluster extent threshold of 185 voxels. This cluster threshold held the family-wise error rate (FWER) less than 0.05 as determined by Monte Carlo simulation using AFNI 3dClustSim with padding to minimize edge effects (Eklund et al., <xref ref-type="bibr" rid="B30">2016</xref>). Estimates of smoothness in the data were drawn from the residual error time series for each participant after first-level analysis (AFNI 3dFWHMx). These estimates were averaged across participants separately in each voxel dimension for input to 3dClustSim.</p>
<p>To identify multisensory regions at the group level, we performed the conjunction A&#x02229;V, and to identify regions sensitive to facial motion at the group level, we performed the conjunction V&#x02229;G. Conjunctions were performed by constructing minimum <italic>t</italic>-maps (e.g., the minimum <italic>t</italic>-statistic across [A, V] at each voxel) and these maps were thresholded at <italic>p</italic> &#x0003C; 0.005 with a cluster extent threshold of 185 voxels (FWER &#x0003C; 0.05) as for individual condition maps. This tests the &#x0201C;conjunction null&#x0201D; hypothesis (Nichols et al., <xref ref-type="bibr" rid="B61">2005</xref>). We also performed contrasts for activations greater for speech than nonspeech, matched for input modality: A &#x0003E; R and V &#x0003E; G.</p>
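<p>The minimum-statistic conjunction and cluster-extent threshold can be sketched as follows. This Python illustration assumes NumPy, takes the critical <italic>t</italic> value as an input rather than deriving it from the degrees of freedom, and assumes face-touching (6-neighbor) connectivity when forming clusters; the function names are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import deque

import numpy as np


def conjunction_map(t_a, t_b):
    """Voxelwise minimum of two t-maps (minimum-statistic conjunction)."""
    return np.minimum(t_a, t_b)


def cluster_threshold(t_map, t_crit, min_voxels):
    """Keep supra-threshold voxels that belong to clusters of at least min_voxels."""
    above = t_map > t_crit
    keep = np.zeros_like(above)
    seen = np.zeros_like(above)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for start in zip(*np.nonzero(above)):
        if seen[start]:
            continue
        seen[start] = True
        queue, cluster = deque([start]), [start]
        while queue:                       # breadth-first flood fill of one cluster
            x, y, z = queue.popleft()
            for dx, dy, dz in steps:
                nb = (x + dx, y + dy, z + dz)
                inside = all(0 <= c < s for c, s in zip(nb, above.shape))
                if inside and above[nb] and not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
                    cluster.append(nb)
        if len(cluster) >= min_voxels:
            for voxel in cluster:
                keep[voxel] = True
    return keep
```
]]></preformat>
<p>Because the minimum is taken before thresholding, a voxel survives only if both conditions individually exceed the critical value, which is the sense in which the procedure tests the conjunction null.</p>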
</sec>
<sec id="s2-4-2-5">
<title>STS region of interest analysis</title>
<p>An STS ROI was generated from the TT_desai_ddpmaps probabilistic atlas distributed with AFNI. The atlas, which is based on Freesurfer demarcation of the STS in 61 brains (see Liebenthal et al., <xref ref-type="bibr" rid="B53">2014</xref>), shows for each voxel the percentage of brains in which that voxel was included in the STS. Left and right hemisphere probabilistic maps of the STS were thresholded at 30% to create a binary mask for ROI analysis. The ROI mask, which was originally aligned to the TT_N27 (Colin) template brain distributed with AFNI, was then aligned to our study-specific template in MNI space by first warping the TT_N27 template to our study-specific template using a 12-parameter affine transformation in ANTS, and then applying the transformation matrix to the STS binary mask image using nearest neighbor interpolation. The left and right STS ROIs were then subdivided by splitting the STS into eight equal-length subregions along the anterior-posterior axis. The center of mass of each STS subregion is provided in Table <xref ref-type="table" rid="T1">1</xref>.</p>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption><p><bold>Centers of mass of superior temporal sulcus (STS) subregions (MNI coordinates)</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th align="center" colspan="3">Left STS</th>
<th align="center" colspan="3">Right STS</th>
</tr>
<tr>
<th align="left"><bold>Subregion number</bold></th>
<th align="center"><italic>x</italic></th>
<th align="center"><italic>y</italic></th>
<th align="center"><italic>z</italic></th>
<th align="center"><italic>x</italic></th>
<th align="center"><italic>y</italic></th>
<th align="center"><italic>z</italic></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">(Anterior) 1</td>
<td align="center">&#x02212;52</td>
<td align="center">&#x02212;4</td>
<td align="center">&#x02212;17</td>
<td align="center">52</td>
<td align="center">&#x02212;4</td>
<td align="center">&#x02212;17</td>
</tr>
<tr>
<td align="left">2</td>
<td align="center">&#x02212;54</td>
<td align="center">&#x02212;13</td>
<td align="center">&#x02212;11</td>
<td align="center">53</td>
<td align="center">&#x02212;15</td>
<td align="center">&#x02212;10</td>
</tr>
<tr>
<td align="left">3</td>
<td align="center">&#x02212;54</td>
<td align="center">&#x02212;22</td>
<td align="center">&#x02212;7</td>
<td align="center">52</td>
<td align="center">&#x02212;24</td>
<td align="center">&#x02212;5</td>
</tr>
<tr>
<td align="left">4</td>
<td align="center">&#x02212;54</td>
<td align="center">&#x02212;32</td>
<td align="center">&#x02212;2</td>
<td align="center">52</td>
<td align="center">&#x02212;32</td>
<td align="center">0</td>
</tr>
<tr>
<td align="left">5</td>
<td align="center">&#x02212;54</td>
<td align="center">&#x02212;42</td>
<td align="center">4</td>
<td align="center">51</td>
<td align="center">&#x02212;41</td>
<td align="center">10</td>
</tr>
<tr>
<td align="left">6</td>
<td align="center">&#x02212;50</td>
<td align="center">&#x02212;52</td>
<td align="center">15</td>
<td align="center">50</td>
<td align="center">&#x02212;50</td>
<td align="center">18</td>
</tr>
<tr>
<td align="left">7</td>
<td align="center">&#x02212;46</td>
<td align="center">&#x02212;62</td>
<td align="center">21</td>
<td align="center">47</td>
<td align="center">&#x02212;58</td>
<td align="center">18</td>
</tr>
<tr>
<td align="left">(Posterior) 8</td>
<td align="center">&#x02212;43</td>
<td align="center">&#x02212;71</td>
<td align="center">23</td>
<td align="center">45</td>
<td align="center">&#x02212;65</td>
<td align="center">22</td>
</tr>
</tbody>
</table>
</table-wrap>
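<p>One possible reading of the ROI construction is sketched below. It assumes that &#x0201C;equal-length&#x0201D; refers to equal extent along the MNI <italic>y</italic> axis and that the 30% threshold is inclusive; neither detail is stated in the text, and the function names and toy voxel list are hypothetical.</p>
<preformat><![CDATA[
```python
def subdivide_roi(voxels, prob_threshold=30.0, n_sub=8):
    """Threshold a probabilistic map and label voxels by subregion.

    voxels: list of (x, y, z, probability_percent) tuples.
    Returns a dict mapping (x, y, z) to a subregion number, where
    1 is the most anterior (largest y) and n_sub the most posterior.
    """
    kept = [v for v in voxels if v[3] >= prob_threshold]
    ys = [v[1] for v in kept]
    y_max, y_min = max(ys), min(ys)
    length = (y_max - y_min) / n_sub  # extent of one subregion along y
    labels = {}
    for x, y, z, _ in kept:
        index = int((y_max - y) / length) + 1 if length > 0 else 1
        labels[(x, y, z)] = min(index, n_sub)  # posterior edge joins last bin
    return labels


def center_of_mass(labels, subregion):
    """Unweighted mean coordinate of the voxels in one subregion."""
    points = [p for p, s in labels.items() if s == subregion]
    return tuple(sum(axis) / len(points) for axis in zip(*points))


# Toy mask: nine voxels at 50% plus one low-probability voxel.
toy = [(-50, y, 0, 50.0) for y in range(0, -90, -10)]
toy.append((-50, -90, 0, 10.0))
labels = subdivide_roi(toy)
posterior_center = center_of_mass(labels, 8)
```
]]></preformat>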
<sec id="s2-4-2-5-1">
<title>Generalized linear mixed model: auditory, visual, speech effects on BOLD</title>
<p>A generalized linear mixed model (GLMM) regression analysis was performed to characterize the extent to which BOLD contrast changed within STS subregions during auditory, visual and speech presentations. For each participant, block-by-block PSC estimates for each condition were extracted from each of the eight STS subregions in each hemisphere using &#x0201C;Least Squares&#x02014;Separate&#x0201D; (LS-S) regression (Mumford et al., <xref ref-type="bibr" rid="B56">2012</xref>). In the LS-S regression (AFNI 3dLSS), the model included one regressor of interest modeling a single block from a given condition (e.g., V), and five nuisance regressors modeling: (1) all other blocks in the condition of interest (e.g., V); and (2) all blocks in the remaining conditions (e.g., A, AV, R, G). Run-level baseline and drift terms were included in order to remove global signal differences and differential trends across runs. Repeating this for each block in each condition produced an LS-S &#x0201C;time series&#x0201D; with block-level BOLD estimates that served as input to the GLMM. The GLMM parameterized the five conditions (A, V, AV, R, G) to capture modality-dependent BOLD changes. First, an auditory parameter (AP) was coded as AP = 1 for the A, AV, and R conditions, and AP = 0 for the V and G conditions. Second, a visual parameter (VP) was coded as VP = 1 for the V, AV, and G conditions, and VP = 0 for the A and R conditions. Finally, a speech parameter (SP) was coded as SP = 1 for the A, V and AV conditions, and SP = 0 for the R and G conditions. The GLMM, including fixed effects for AP, VP and SP, as well as participant-level random intercept and random slope terms for AP, VP and SP, was fit to the block-level BOLD data separately for each STS subregion in each hemisphere. The predictor variables were scaled within participant and region (<italic>m</italic> = 0, sd = 1) to account for potentially unbalanced observations.
Separately for each participant, observations with extreme BOLD values were excluded from the model based on the following formula for outlier detection:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mi>C</mml:mi><mml:mo>=</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo>*</mml:mo><mml:msqrt><mml:mrow><mml:mi>&#x03C0;</mml:mi><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msqrt><mml:mo>*</mml:mo><mml:mi>M</mml:mi><mml:mi>A</mml:mi><mml:mi>D</mml:mi><mml:mo>;</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>&#x03A6;</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mn>0.001</mml:mn><mml:mo>/</mml:mo><mml:mi>N</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where C is the outlier cut-off, MAD is the median absolute deviation, &#x003A6; is the standard normal cumulative distribution function, and <italic>N</italic> is the number of time points (blocks). The GLMM equation can be expressed as follows:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mi>B</mml:mi><mml:mi>O</mml:mi><mml:mi>L</mml:mi><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00A0;</mml:mo><mml:mo>~</mml:mo><mml:mo>&#x00A0;</mml:mo><mml:mi>A</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>V</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>S</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00A0;</mml:mo><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>1</mml:mtext><mml:mo>+</mml:mo><mml:mi>A</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>V</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>S</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>&#x007C;</mml:mo><mml:mi>s</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mo>,</mml:mo></mml:math></disp-formula>
<p>where <italic>s</italic> stands for subject and <italic>c</italic> for condition.</p>
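<p>The condition coding and the outlier cut-off in Equation (2) can be illustrated with a short sketch. The coding table follows the AP, VP and SP definitions given above. The rule that flags an observation when its absolute deviation from the median exceeds <italic>C</italic> is an assumption, since the text defines only the cut-off itself; the function name and example values are hypothetical.</p>
<preformat><![CDATA[
```python
import math
from statistics import NormalDist, median

# (AP, VP, SP) indicator values for each experimental condition.
CODING = {
    "A": (1, 0, 1),
    "V": (0, 1, 1),
    "AV": (1, 1, 1),
    "R": (1, 0, 0),
    "G": (0, 1, 0),
}


def outlier_cutoff(values, tail_prob=0.001):
    """Return (C, center) with C = alpha * sqrt(pi / 2) * MAD.

    alpha is the standard normal quantile at 1 - tail_prob / N and
    MAD is the median absolute deviation from the median.
    """
    n = len(values)
    center = median(values)
    mad = median([abs(v - center) for v in values])
    alpha = NormalDist().inv_cdf(1 - tail_prob / n)
    return alpha * math.sqrt(math.pi / 2) * mad, center


# Hypothetical block-level BOLD estimates for one participant.
bold = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 6.0]
cutoff, center = outlier_cutoff(bold)
# Assumed flagging rule: drop values farther than C from the median.
kept = [v for v in bold if abs(v - center) <= cutoff]
```
]]></preformat>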
<p>Non-parametric significance tests were performed after calculating <italic>t</italic>-scores for AP, VP and SP effects. Because the ROI statistic results were not spatially independent, family-wise error corrected significance was calculated by permuting predictor variables without replacement. Empirical null distributions of <italic>t</italic>-scores were computed by randomly permuting condition order and recalculating <italic>t</italic>-scores (10,000 reshuffled samples). Each permutation was applied to AP, VP and SP conditions identically to preserve covariance among the parameters. Furthermore, the same permutation was used for each STS subregion to preserve spatial dependencies. The dependent variable (BOLD) was not reordered. After each permutation, the maximum and minimum <italic>t</italic>-scores across STS subregions were used to create empirical <italic>max-t</italic> and <italic>min-t</italic> distributions for each fixed effect (AP, VP, SP). Observed <italic>t</italic>-scores (<italic>t</italic><sub>obs</sub>) were compared to the <italic>max-t</italic> or <italic>min-t</italic> distributions to calculate one-tailed <italic>p</italic>-values: <italic>P(max-t &#x0003E; t<sub>obs</sub>)</italic> for positive <italic>t</italic>-scores, or <italic>P(min-t &#x0003C; t<sub>obs</sub>)</italic> for negative scores. Tests with <italic>p</italic> &#x0003C; 0.05 were considered significant with FWER &#x0003C; 0.05 across all STS subregions (Nichols and Holmes, <xref ref-type="bibr" rid="B62">2002</xref>).</p>
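<p>The final step of the max-t procedure, converting permutation results into corrected one-tailed <italic>p</italic>-values, can be sketched as follows. The sketch assumes that the permuted <italic>t</italic>-scores have already been obtained by refitting the model after each reshuffle (not shown); the function name and the small example matrix are hypothetical.</p>
<preformat><![CDATA[
```python
def fwer_p_values(t_obs, t_null):
    """One-tailed max-t / min-t corrected p-values.

    t_obs:  observed t-score for each subregion (one fixed effect).
    t_null: list of permutations, each a list of t-scores with one
            entry per subregion, from the same reshuffle.
    """
    max_t = [max(row) for row in t_null]
    min_t = [min(row) for row in t_null]
    n_perm = len(t_null)
    p_values = []
    for t in t_obs:
        if t >= 0:
            # P(max-t > t_obs) for positive scores
            p_values.append(sum(m > t for m in max_t) / n_perm)
        else:
            # P(min-t < t_obs) for negative scores
            p_values.append(sum(m < t for m in min_t) / n_perm)
    return p_values


# Hypothetical null t-scores: four permutations by three subregions.
null_scores = [
    [0.5, -1.0, 2.0],
    [1.5, -0.2, -2.5],
    [3.0, 0.1, -0.4],
    [-0.3, 0.8, 1.1],
]
p_corrected = fwer_p_values([2.5, -2.0, 0.9], null_scores)
```
]]></preformat>
<p>Taking the extreme statistic across subregions within each permutation is what provides control of the family-wise error rate over the whole set of subregions.</p>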
</sec>
<sec id="s2-4-2-5-2">
<title>Principal component analysis</title>
<p>To summarize changes in the pattern of responses across conditions throughout the STS, we conducted a principal component analysis on the group activation images for each of the five experimental conditions. In the PCA, each voxel of the left or right STS was considered as a separate variable, and each experimental condition was considered as a separate observation. The first two principal components were extracted. Each component yielded a score for each experimental condition, where the pattern of scores demonstrated how patterns of activation across conditions separated along that principal dimension. Each component also yielded a set of coefficients across voxels, where the sign of the coefficient determined which conditions that voxel preferred and the magnitude of the coefficient determined the extent to which that voxel&#x02019;s activation followed the pattern described by the condition scores.</p>
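<p>The layout of this PCA, with conditions as observations and voxels as variables, can be sketched with a singular value decomposition. The sketch assumes that each voxel was mean-centered across conditions before decomposition, which is the usual default but is not stated explicitly; the function name and the random input are hypothetical stand-ins for the group activation images.</p>
<preformat><![CDATA[
```python
import numpy as np


def condition_pca(activation, n_components=2):
    """PCA of an (n_conditions, n_voxels) activation matrix.

    Returns condition scores (n_conditions, k), voxel coefficients
    (n_voxels, k) and the proportion of variance explained (k,).
    """
    centered = activation - activation.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    coefficients = vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, coefficients, explained[:n_components]


# Hypothetical input: 5 conditions (A, V, AV, R, G) by 200 voxels.
rng = np.random.default_rng(0)
activation = rng.normal(size=(5, 200))
scores, coefficients, explained = condition_pca(activation)
```
]]></preformat>
<p>Projecting the centered data onto a column of the coefficient matrix recovers the corresponding condition scores, which is why the sign of a voxel&#x02019;s coefficient indicates which conditions that voxel prefers.</p>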
</sec>
<sec id="s2-4-2-5-3">
<title><italic>Post hoc</italic> analysis of visual STS subregions</title>
<p>The results of the GLMM analysis (see &#x0201C;GLMM Region of Interest Results&#x0201D; Section) revealed several posterior STS subregions in each hemisphere that showed a significant preference for stimuli containing visual information (V, AV, G) vs. only auditory information (A, R). For the five visually responsive/preferring subregions (three right hemisphere, two left hemisphere), we performed a repeated-measures ANOVA comparing PSC in V, AV and G (Greenhouse-Geisser correction for violations of sphericity; <italic>&#x003B1;</italic> = 0.05, uncorrected). In STS regions that did not exhibit significant univariate differences for V, AV and G, multivariate pattern classification analysis (MVPA) was performed to determine whether these conditions could be distinguished in terms of spatial patterns of activity. This analysis was crucial because regions that activated significantly during visual presentations, yet did not distinguish between visual conditions on the basis of univariate activation, could have nonetheless carried important, condition-specific information in multivariate activation patterns (Mur et al., <xref ref-type="bibr" rid="B57">2009</xref>). MVPA was achieved using a support vector machine (SVM; MATLAB Bioinformatics Toolbox v3.1, The MathWorks, Inc., Natick, MA, USA) as the pattern classification method. Two pairwise classifications were performed. First, activity patterns were used to discriminate between two different types of facial motion (V vs. G). Second, activity patterns were used to differentiate between visual and audiovisual trials with identical visual information (V vs. AV). Both MVPA tests were conducted on BOLD time series data in native space. The STS subregions in the group anatomical space were spatially transformed into native space using the inverse of the transformations mapping each participant&#x02019;s anatomical image to the study-specific template (ANTS, nearest neighbor interpolation).</p>
<p>Inputs to the classifier were estimates of activation to each block calculated using LS-S regression as described above. LS-S coefficients representing all 15 blocks for each condition were calculated and stored with appropriate run labels at each voxel. Prior to classification, LS-S coefficients for each ROI were <italic>z</italic>-scored across voxels for each block, effectively removing mean amplitude differences across blocks (Mumford et al., <xref ref-type="bibr" rid="B56">2012</xref>; Coutanche, <xref ref-type="bibr" rid="B24">2013</xref>).</p>
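<p>The normalization step can be sketched as follows. The sketch uses the population standard deviation, which is an assumption because the text does not say which estimator was used; the function name and example values are hypothetical.</p>
<preformat><![CDATA[
```python
from statistics import mean, pstdev


def zscore_across_voxels(block_patterns):
    """z-score each block's pattern across the voxels of one ROI.

    block_patterns: list of blocks, each a list of LS-S coefficients
    with one entry per voxel. Every returned pattern has mean 0 and
    unit standard deviation, so overall amplitude no longer differs
    between blocks. A block with identical values in every voxel
    would need special handling (zero standard deviation).
    """
    normalized = []
    for pattern in block_patterns:
        m, sd = mean(pattern), pstdev(pattern)
        normalized.append([(v - m) / sd for v in pattern])
    return normalized


# Two hypothetical blocks with the same shape but different amplitude.
z_patterns = zscore_across_voxels([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
```
]]></preformat>
<p>In this toy case the two blocks become identical after normalization, because they differ only in overall amplitude and not in spatial pattern.</p>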
<p>We performed SVM classification on the LS-S data using a leave-one-out cross validation approach within-subject (Vapnik, <xref ref-type="bibr" rid="B94">1999</xref>). In each iteration, we used data from 9 of the 10 functional scan runs to train an SVM classifier and then used the trained classifier to test the data from the remaining run. The SVM-estimated condition labels for the testing data set were then compared with the real labels to compute classification sensitivity. For each pairwise classification, one condition was arbitrarily defined as signal and the other as noise. A classifier hit was counted when the SVM-estimated condition label matched the real condition label for the &#x0201C;signal&#x0201D; condition, and a false alarm was counted when the SVM-estimated label did not match the real condition label in the &#x0201C;noise&#x0201D; condition. A measure of sensitivity, <italic>d&#x02032;</italic>, was calculated following the formula for a yes-no experiment (equation 1, above). Classification <italic>d&#x02032;</italic> for each subject was derived by averaging the <italic>d&#x02032;</italic> scores across all leave-one-out runs, and an overall <italic>d&#x02032;</italic> was computed by averaging across subjects for each pairwise classification.</p>
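<p>The fold structure and the sensitivity measure can be sketched as follows. The sketch assumes that the yes-no formula is the standard <italic>d&#x02032;</italic> = z(hit rate) &#x02212; z(false alarm rate); the half-trial adjustment for rates of 0 or 1 is an added assumption, and all function names are hypothetical. The classifier itself is not shown.</p>
<preformat><![CDATA[
```python
from statistics import NormalDist


def leave_one_run_out(run_labels):
    """Yield (held_out_run, train_indices, test_indices) per fold."""
    for held_out in sorted(set(run_labels)):
        train = [i for i, r in enumerate(run_labels) if r != held_out]
        test = [i for i, r in enumerate(run_labels) if r == held_out]
        yield held_out, train, test


def d_prime(hits, n_signal, false_alarms, n_noise):
    """Yes-no sensitivity: z(hit rate) - z(false alarm rate)."""
    def rate(count, total):
        # Assumption: clip rates of 0 or 1 by half a trial so that
        # the normal quantile stays finite.
        return min(max(count / total, 0.5 / total), 1 - 0.5 / total)

    z = NormalDist().inv_cdf
    return z(rate(hits, n_signal)) - z(rate(false_alarms, n_noise))


# Hypothetical run labels for six blocks spread over three runs.
folds = list(leave_one_run_out([1, 1, 2, 2, 3, 3]))

# Counts of the form used for the behavioral task (30 signal and
# 150 noise trials): 14 hits with 4 or 2 false alarms.
d_first = d_prime(14, 30, 4, 150)
d_second = d_prime(14, 30, 2, 150)
```
]]></preformat>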
<p>Classification <italic>d&#x02032;</italic> scores were evaluated statistically using a nonparametric bootstrap method (Lunneborg, <xref ref-type="bibr" rid="B54">2000</xref>). Classification procedures were repeated 10,000 times for each pairwise classification within each individual data set, with the condition labels reshuffled per repetition. This provided an empirical null distribution of <italic>d&#x02032;</italic> for each subject and pairwise classification. A bootstrap-T approach was used to assess the significance of the classification <italic>d&#x02032;</italic> across participants. For each repetition of the bootstrap, a <italic>t</italic>-test of the bootstrapped <italic>d&#x02032;</italic> scores across all subjects against the ideal chance <italic>d&#x02032;</italic> score (zero) was performed. The observed <italic>t</italic>-score (<italic>t</italic><sub>obs</sub>) obtained from the true data was then statistically tested against the empirical null distribution of <italic>t</italic>-scores (<italic>t</italic><sub>null</sub>). A <italic>p</italic>-value was calculated as <italic>P</italic>(<italic>t</italic><sub>null</sub> &#x0003E; <italic>t</italic><sub>obs</sub>), where <italic>p</italic> &#x0003C; 0.05 determined that <italic>d&#x02032;</italic> was significantly greater than chance across subjects.</p>
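<p>The group-level test can be sketched as follows, assuming that the per-subject null <italic>d&#x02032;</italic> scores from the label-reshuffled classifications are already available. The function names and the small example arrays are hypothetical.</p>
<preformat><![CDATA[
```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu=0.0):
    """One-sample t-statistic of values against mu."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))


def bootstrap_t_p(d_obs, d_null):
    """Compare the observed group t-score with its empirical null.

    d_obs:  observed d-prime, one value per subject.
    d_null: list of repetitions, each holding one null d-prime per
            subject from a label-reshuffled classification.
    Returns (t_obs, p) with p = P(t_null > t_obs).
    """
    t_obs = one_sample_t(d_obs)
    t_null = [one_sample_t(rep) for rep in d_null]
    p = sum(t > t_obs for t in t_null) / len(t_null)
    return t_obs, p


# Hypothetical scores: four subjects, three null repetitions.
observed = [1.0, 1.2, 0.8, 1.1]
null_reps = [
    [0.1, -0.2, 0.3, 0.0],
    [-0.1, 0.2, -0.3, 0.1],
    [0.5, 0.5, 0.6, 0.5],
]
t_observed, p_value = bootstrap_t_p(observed, null_reps)
```
]]></preformat>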
</sec>
</sec>
</sec>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec id="s3-1">
<title>Behavior</title>
<p>Based on their performance in the condition-oddball detection task, two participants were below the behavioral cutoff and were excluded from further analysis (<italic>d&#x02032;</italic> = 1.85, hits = 14/30, false alarms = 4/150; and <italic>d&#x02032;</italic> = 2.13, hits = 14/30, false alarms = 2/150). The remaining eighteen participants performed well on the task (mean <italic>d&#x02032;</italic> = 3.40 &#x000B1; 0.14 SEM, mean hits = 26/30 &#x000B1; 0.56 SEM, mean false alarms = 3/150 &#x000B1; 0.84 SEM), which indicated that they attended to both auditory and visual components of the stimuli. Among the participants whose performance exceeded the behavioral cutoff, there was not a significant difference in hit rate across conditions (<italic>F</italic><sub>(2.3,38.7)</sub> = 2.07, <italic>p</italic> = 0.13).</p>
</sec>
<sec id="s3-2">
<title>Neuroimaging</title>
<sec id="s3-2-1">
<title>Whole-Brain Results</title>
<p>Activation maps for each of the five experimental conditions relative to rest are shown in Figure <xref ref-type="fig" rid="F3">3</xref> (FWER &#x0003C; 0.05). Visual facial gestures (V, AV, G) activated bilateral primary and secondary visual cortices, lateral occipital-temporal visual regions, inferior and middle temporal gyri, and posterior STS. Conditions containing auditory information (A, AV, R) activated supratemporal auditory regions, the lateral superior temporal gyrus, and portions of the STS bilaterally. All conditions except for R activated bilateral inferior frontal regions. We tested directly for voxels showing an enhanced response to intelligible speech by computing the contrasts A &#x0003E; R and V &#x0003E; G. The A &#x0003E; R contrast (not displayed) did not yield any significant differences at the group level. Although this is not consistent with previous imaging work (Scott et al., <xref ref-type="bibr" rid="B76">2000</xref>; Narain et al., <xref ref-type="bibr" rid="B58">2003</xref>; Liebenthal et al., <xref ref-type="bibr" rid="B51">2005</xref>; Okada et al., <xref ref-type="bibr" rid="B64">2010</xref>), we believe that our use of sublexical stimuli may have contributed to this null result. The V &#x0003E; G contrast yielded a visual speech network consistent with previous work (Campbell et al., <xref ref-type="bibr" rid="B23">2001</xref>; Callan et al., <xref ref-type="bibr" rid="B20">2004</xref>; Okada and Hickok, <xref ref-type="bibr" rid="B63">2009</xref>; Bernstein et al., <xref ref-type="bibr" rid="B14">2011</xref>; Hertrich et al., <xref ref-type="bibr" rid="B40">2011</xref>), including bilateral STS, left inferior frontal gyrus, and a host of inferior parietal and frontal sensory-motor brain regions (Figure <xref ref-type="fig" rid="F4">4B</xref>).</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>BOLD effects during each experimental condition.</bold> Results are shown on an inflated surface rendering of the study-specific template in MNI space. Top: speech conditions (A, V, AV). Bottom: nonspeech conditions (R, G). All maps thresholded at an uncorrected voxel-wise <italic>p</italic> &#x0003C; 0.005 with a cluster threshold of 185 voxels (family-wise error rate (FWER) corrected <italic>p</italic> &#x0003C; 0.05). PSC, percent signal change.</p></caption>
<graphic xlink:href="fnhum-11-00174-g0003.tif"/>
</fig>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Whole-brain conjunction analyses. (A)</bold> Group-level conjunction results plotted on an inflated surface rendering of the study-specific template in MNI space. Significant responses during visual speech and nonspeech facial gestures (V&#x02229;G, teal) were observed in the posterior STS. Significant responses during auditory speech and visual speech (A&#x02229;V, yellow) were observed in the middle STS (mSTS). <bold>(B)</bold> Conjunction analyses are plotted together with the contrast V vs. G (blue), which highlights regions that activate preferentially to visual speech vs. nonspeech facial gestures. These visual-speech-specific regions fall anterior to V&#x02229;G in the STS and overlap strongly with A&#x02229;V (pink). All maps were thresholded at voxel-wise <italic>p</italic> &#x0003C; 0.005 (uncorrected) with a cluster-extent threshold of 185 voxels (FWER corrected <italic>p</italic> &#x0003C; 0.05).</p></caption>
<graphic xlink:href="fnhum-11-00174-g0004.tif"/>
</fig>
<p>Results from the conjunction analyses demonstrated overlapping Auditory and Visual speech effects (A&#x02229;V) in STS/STG locations that were anterior with respect to Visual speech and nonspeech-Gesture effects (V&#x02229;G) in both the left (LH) and right (RH) hemispheres (Figure <xref ref-type="fig" rid="F4">4A</xref>). MNI coordinates for the STS peak conjunction effects were: A&#x02229;V LH = &#x02212;61, &#x02212;42, 6; A&#x02229;V RH = 59, &#x02212;32, 2; V&#x02229;G LH = &#x02212;49, &#x02212;52, 10; V&#x02229;G RH = 57, &#x02212;44, 10. Significant conjunction effects for both A&#x02229;V and V&#x02229;G were observed in the left inferior frontal sulcus and bilateral middle frontal gyrus. Effects specific to A&#x02229;V were present in the left temporoparietal junction, while effects specific to V&#x02229;G were seen in bilateral visual cortices including hMT, right inferior frontal sulcus, and bilateral precentral sulcus/gyrus. Visual activations in which speech was preferred (V &#x0003E; G) exhibited considerable overlap with A&#x02229;V but not with V&#x02229;G (Figure <xref ref-type="fig" rid="F4">4B</xref>), suggesting that multisensory-responsive STS activates preferentially to visual speech. Note that some of the V &#x0003E; G activation on the ventral bank of the STS was due to deactivation in the G condition, rather than large activations in the V condition (see Visual Speech in Figure <xref ref-type="fig" rid="F3">3</xref>).</p>
</sec>
<sec id="s3-2-2">
<title>STS Region of Interest Results</title>
<sec id="s3-2-2-1">
<title>GLMM region of interest results</title>
<p>The results of the GLMM analysis revealed significant positive effects of AP (preference for auditory stimuli) in anterior- or mid-STS subregions (Figure <xref ref-type="fig" rid="F5">5</xref>; LH: 2, 4, 5; RH: 1&#x02013;4), while significant positive effects of VP (preference for visual stimuli) were observed primarily in posterior STS subregions (LH: 6&#x02013;7; RH: 5&#x02013;7). Overlapping positive effects of AP and VP were observed in two right hemisphere subregions (RH: 2, 4). Significant positive effects of SP (preference for speech vs. nonspeech) were observed in mid-STS subregions in both hemispheres (LH: 4&#x02013;5, RH: 3&#x02013;4). The spatial distribution of AP, VP, and SP effects was somewhat different across hemispheres. First, positive effects of AP were smaller or nonsignificant in anterior STS subregions in the left hemisphere, while such effects were larger and consistently significant in the right hemisphere. Second, positive effects of VP extended from posterior to anterior STS subregions in the right but not the left hemisphere. Finally, the transition zone from primarily visual to multimodal activation appeared to localize differently within each hemisphere&#x02014;subregion 5 in the left hemisphere and subregion 4 in the right hemisphere. However, the broad pattern&#x02014;namely, a transition from VP to SP/mixed to AP moving posterior to anterior&#x02014;was maintained across hemispheres.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>ROI analyses based on linear mixed effects modeling. (A)</bold> Axial cut-away rendering of the study-specific template in MNI space showing left and right hemispheres overlaid with probabilistically defined STS regions of interest. Individual STS subregions have been color-coded (arbitrary colormap) and numbered 1&#x02013;8 moving anterior to posterior. <bold>(B,C)</bold> Results of the generalized linear mixed model (GLMM) analysis are plotted by subregion number separately for the left <bold>(B)</bold> and right <bold>(C)</bold> hemispheres. Group average fixed effects of the auditory parameter (AP), visual parameter (VP) and speech parameter (SP) are given by the heights of the red, blue and green bars respectively. Effects are shown separately for each STS subregion (1&#x02013;8, horizontal axis). Significant effects (FWER corrected <italic>p</italic> &#x0003C; 0.05) are each marked with an asterisk. Error bars reflect 1 SEM.</p></caption>
<graphic xlink:href="fnhum-11-00174-g0005.tif"/>
</fig>
</sec>
<sec id="s3-2-2-2">
<title>Principal component analysis</title>
<p>We also used a data-driven approach to capture patterns of activation across the STS. Group mean activations in each of our five experimental conditions and across all voxels of the STS were entered into a principal component analysis considering each voxel as a variable and each condition as an observation. The analysis was performed separately for left and right hemisphere STS ROIs, without splitting into subregions. The first two principal components explained 79.83% and 17.09% of the variance in the left STS, respectively, and 81.96% and 15.80% of the variance in the right STS, respectively. In Figure <xref ref-type="fig" rid="F6">6</xref>, we list the condition scores and plot the voxel coefficients for each principal component. In both hemispheres, the first principal component (PC1) primarily described activation differences between unimodal auditory (A, R) and unimodal visual (V, G) conditions. As such, large positive condition scores were observed for V and G, while large negative condition scores were observed for A and R. Therefore, voxels that loaded positively on PC1 were &#x0201C;visual-preferring&#x0201D; while voxels that loaded negatively on PC1 were &#x0201C;auditory-preferring.&#x0201D; As can be seen in Figure <xref ref-type="fig" rid="F6">6</xref> (top), voxel coefficients transitioned from positive (visual-preferring) in the posterior STS to negative (auditory-preferring) in the anterior STS in both hemispheres, with the positive-negative boundary closely aligned to the posterior-most extent of the Sylvian fissure. This pattern was especially clear in the left hemisphere, whereas visual-preferring voxels in the right hemisphere extended more anteriorly and along the ventral bank of the anterior STS. In both hemispheres, the largest negative coefficients were located on the dorsal bank of the mid-anterior STS, and the largest positive coefficients were located on the ventral bank of the posterior STS.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>Principal component maps.</bold> Condition scores and voxel coefficients for the first two principal components are shown. Principal component analyses were performed separately for left and right STS. Voxel coefficients are displayed on inflated surface renderings of the study-specific template. Color maps indicate the sign and magnitude for voxel coefficients. Condition scores are displayed as bar plots beneath the relevant brain image, with conditions color-coded as in Figures <xref ref-type="fig" rid="F2">2</xref>, <xref ref-type="fig" rid="F3">3</xref>, <xref ref-type="fig" rid="F5">5</xref>. Voxels with large positive (negative) coefficients activated preferentially to conditions with positive (negative) scores. For example, positive voxels for the first principal component responded maximally during the G and V conditions, while negative voxels responded most to R and A.</p></caption>
<graphic xlink:href="fnhum-11-00174-g0006.tif"/>
</fig>
<p>The second principal component (PC2) was essentially a &#x0201C;multisensory speech&#x0201D; component. In both hemispheres, the condition scores for PC2 were large and positive for AV, followed in order by V, A, R and finally G, which had a large negative condition score. As such, voxels that loaded positively on PC2 preferred multisensory speech, while voxels that loaded negatively on PC2 preferred unisensory (primarily visual) nonspeech. As can be seen in Figure <xref ref-type="fig" rid="F6">6</xref> (bottom), large positive voxel coefficients were observed primarily on the dorsal bank of the middle and mid-posterior STS, while negative voxel coefficients were observed mostly in the posterior, visual regions of the STS.</p>
<p>To further emphasize the transition in voxel activation patterns moving from posterior STS regions to more anterior STS regions, we generated a series of principal component biplots (Figure <xref ref-type="fig" rid="F7">7</xref>). The biplot is a two-dimensional characterization of voxel activation patterns along the first two principal dimensions (PC1 and PC2). On each biplot, scaled condition scores (orange circles) and voxel coefficients (blue vectors) are plotted together in the same space. The biplot can be interpreted as follows. Conditions that evoked similar patterns of activation across STS voxels have similar scores, and thus the orange circles corresponding to those conditions will be physically closer to each other on the biplot. A single blue vector represents each voxel and the voxel&#x02019;s condition preference is given by the direction and magnitude of the vector; that is, the vector will point toward the preferred condition(s) and the length of the vector describes the strength of that preference. We show separate biplots for each STS subregion in the left (Figure <xref ref-type="fig" rid="F7">7</xref>, top) and right (Figure <xref ref-type="fig" rid="F7">7</xref>, bottom) hemispheres. In the series of biplots for each hemisphere, we observe a gradual transition from visually-preferring voxels in posterior subregions (6&#x02013;8) which point toward (i.e., prefer) visual conditions (AV, V, G), to multisensory voxels in mid-STS subregions (4&#x02013;5) which primarily point toward AV, to auditory-preferring voxels in anterior STS subregions (1&#x02013;3) which point toward auditory conditions (AV, A, R). Minor differences exist between the hemispheres but the overall pattern is clearly maintained.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p><bold>Condition preferences based on mean activity changes.</bold> Series of PCA biplots spanning all eight STS subregions are displayed for the right (top) and left (bottom) hemispheres. Each PCA biplot shows voxel coefficients as blue vectors, with orange circles representing the scaled principal component scores for each experimental condition. Conditions are labeled on the left-most plots for each hemisphere and these labels apply to the neighboring plots located to the right. On each plot, the first principal dimension is represented along the abscissa and the second principal dimension along the ordinate. The range of the axes (labeled on the bottom left plot) is identical for all 16 plots. Voxel coefficient vectors point toward the condition(s) preferred in terms of mean activity; shorter vectors correspond to voxels that did not exhibit a strong preference. These results clearly demonstrate a functional-anatomic gradient of activation preferences transitioning from visual (subregions 6&#x02013;8) to audiovisual (subregions 4&#x02013;5) to auditory (subregions 1&#x02013;3) moving posterior to anterior.</p></caption>
<graphic xlink:href="fnhum-11-00174-g0007.tif"/>
</fig>
</sec>
<sec id="s3-2-2-3">
<title><italic>Post hoc</italic> analysis of visual STS subregions</title>
<p>In the GLMM analysis (see &#x0201C;GLMM Region of Interest Results&#x0201D; Section) we found that a number of posterior STS subregions were significantly activated by conditions containing visual (facial) motion but not by auditory conditions. In a <italic>post hoc</italic> analysis, we tested whether activation in these posterior subregions differed across our three facial motion conditions (V, AV, G). Although a significant visual effect in the GLMM indicates that (relatively) increased activation was present in all three conditions, differential activation may have emerged for the following reasons: differences in total motion energy in G vs. V, AV (see &#x0201C;Motion Energy in Visual Speech vs. Nonspeech Facial Gestures&#x0201D; Section); effects of bimodal stimulation, i.e., AV vs. V, G; or visual-speech specificity, i.e., V, AV vs. G. In the left hemisphere, significant activation to facial motion had been observed in subregions 6 and 7 (Figure <xref ref-type="fig" rid="F5">5</xref>, top). A significant difference in activation between the facial motion conditions was observed in subregion 7 (<italic>F</italic><sub>(2,34)</sub> = 9.72, <italic>p</italic> &#x0003C; 0.01) but not subregion 6 (<italic>F</italic><sub>(2,34)</sub> = 0.33, <italic>p</italic> = 0.63). In the right hemisphere, significant activation to facial motion had been observed in subregions 5&#x02013;7 (Figure <xref ref-type="fig" rid="F5">5</xref>, bottom). Significant differences in activation between the facial motion conditions were observed in subregion 7 (<italic>F</italic><sub>(2,34)</sub> = 15.27, <italic>p</italic> &#x0003C; 0.001), and subregion 6 (<italic>F</italic><sub>(2,34)</sub> = 7.56, <italic>p</italic> &#x0003C; 0.01), but not subregion 5 (<italic>F</italic><sub>(2,34)</sub> = 1.01, <italic>p</italic> = 0.36). 
In both hemispheres, subregions exhibiting significant differences in activation between the facial motion conditions showed a consistent pattern: G &#x0003E; V &#x0003E; AV. This pattern matched the pattern of activation observed in the visual motion area hMT, defined here using a term-based meta-analysis for &#x0201C;visual motion&#x0201D; in NeuroSynth. Thus, the more posterior visual STS subregions were sensitive to motion energy and were (partially) inhibited by multisensory stimulation, mirroring hMT, while the anterior-most visual STS subregions did not respond differentially across the facial motion conditions.</p>
<p>For these anterior-most visual STS subregions (left subregion 6, right subregion 5), we conducted an MVPA using an SVM to determine whether facial motion conditions could be distinguished in terms of the pattern of activation across voxels, even though no differences were observed in terms of average activation magnitude. We conducted two pairwise classifications: V vs. G, which tested for representational differences between two different classes of facial motion, and V vs. AV, which tested for representational differences between unisensory and multisensory versions of the same facial motion stimulus. In fact, V and G were successfully discriminated on the basis of activation patterns in the left (d&#x02032; = 1.26, <italic>t</italic><sub>(17)</sub> = 4.16, <italic>p</italic> &#x0003C; 0.01) and right (d&#x02032; = 0.99, <italic>t</italic><sub>(17)</sub> = 5.00, <italic>p</italic> &#x0003C; 0.01) hemispheres, while V and AV were not successfully discriminated in either hemisphere (both <italic>p</italic> &#x0003E; 0.05). Thus, the anterior-most visual STS subregions did not respond differentially to V and G in terms of overall magnitude, despite differences in total motion energy, but did distinguish these conditions in their patterns of activation across voxels. However, patterns of activation were indistinguishable for V and AV, which contained identical facial motion information.</p>
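<p>Classifier sensitivity of the kind reported above is often summarized as d&#x02032;, the difference between the z-transformed hit and false-alarm rates when one class is treated as the signal. The sketch below shows one common formulation, with a log-linear correction added as an assumption; the function name and the trial counts are hypothetical, and the computation used in the present study may differ in detail.</p>
<preformat>
```python
from statistics import NormalDist


def classifier_dprime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with one class as 'signal'."""
    # Log-linear correction keeps both rates away from 0 and 1 so the
    # inverse normal CDF stays finite (an assumption of this sketch).
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical confusion counts for a V vs. G classification
# (V treated as the signal class).
chance = classifier_dprime(10, 10, 10, 10)
above_chance = classifier_dprime(15, 5, 5, 15)
print(chance, above_chance)
```
</preformat>
<p>A d&#x02032; of zero corresponds to chance-level classification, so per-participant values can be tested against zero across the group.</p>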
</sec>
</sec>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>In the present fMRI study, we set out to answer two questions concerning the organization of multisensory speech streams in the STS: (1) Does activation follow a posterior-to-anterior gradient from facial motion processing regions, to multisensory speech regions, to auditory regions? And if so (2) where along this gradient do speech-specific representations emerge in the STS; in particular, do posterior-visual regions of the STS play a role in speech processing? To answer these questions we presented participants with a variety of speech and nonspeech conditions: auditory speech, visual speech, audiovisual speech, spectrally rotated speech and nonspeech facial gestures. Briefly, we confirmed within a single group of participants that activation in the STS does follow a posterior-to-anterior gradient (visual &#x02192; multisensory &#x02192; auditory) as observed across studies in Figure <xref ref-type="fig" rid="F1">1</xref>. We found that speech-specific representations emerged in multisensory regions of the middle STS (mSTS), but we also found that speech could be distinguished from nonspeech in multivariate patterns of activation within posterior visual regions immediately bordering the multisensory regions.</p>
<sec id="s4-1">
<title>A Posterior-to-Anterior Functional Gradient in the STS</title>
<p>Different analysis methods converged to reveal a posterior-to-anterior functional organization of the STS. First, in a whole-brain analysis, we performed the conjunctions V&#x02229;G and A&#x02229;V. The logic here was that V&#x02229;G should identify voxels that responded to two types (speech and nonspeech) of visual stimuli (i.e., biological/facial motion regions), while A&#x02229;V should identify voxels that responded to speech across multiple input modalities (i.e., audiovisual speech regions). Both conjunctions reliably identified voxels in the STS bilaterally. Crucially, we observed that activations to A&#x02229;V were located anterior to, and were largely non-overlapping with, activations to V&#x02229;G (Figure <xref ref-type="fig" rid="F4">4A</xref>), providing support for a posterior-anterior functional organization within the STS.</p>
<p>Second, in an anatomical ROI analysis, we used a parameterization approach to examine patterns of activation across eight STS subregions divided evenly along the anterior-posterior axis (see &#x0201C;STS Region of Interest Analysis&#x0201D; Section). An AP coded for increased activation in conditions containing an auditory signal (A, AV and R), while a VP coded for increased activation in conditions containing facial motion (V, AV and G). In accordance with the whole-brain results, we observed significant VP activation in posterior STS subregions bilaterally and significant AP activation in mid- or anterior-STS subregions bilaterally (Figure <xref ref-type="fig" rid="F5">5</xref>). A clear transition from VP to AP occurred in the mid-posterior STS in both hemispheres.</p>
<p>Third, using a data-driven approach, a principal component analysis revealed that the posterior-anterior distinction between visual and auditory activation explained &#x0007E;80% of the variance in activation patterns across STS voxels. The PCA was performed on mean activation across participants in each of the five experimental conditions, treating voxels as variables and conditions as observations. The first principal component (PC1), which distinguished maximally between A/R on the one hand (large negative condition scores), and V/G on the other hand (large positive condition scores), loaded positively (i.e., visual activation) on posterior STS voxels and loaded negatively (i.e., auditory activation) on anterior STS voxels. A clear transition from positive- to negative-loading voxels was observed in the mid-posterior STS in both hemispheres (Figure <xref ref-type="fig" rid="F6">6</xref>, PC1). Visual activations extended slightly more anterior in the right STS, which was also observed in the parameterization analysis. The second principal component (PC2), which distinguished maximally between AV (large positive condition score) and G (large negative condition score), loaded positively (i.e., multisensory-speech activation) on voxels in the middle and mid-posterior STS where visual activation transitioned to auditory activation (Figure <xref ref-type="fig" rid="F6">6</xref>, PC2). A PCA biplot analysis (Figure <xref ref-type="fig" rid="F7">7</xref>) demonstrated that activation transitioned gradually from visual, to multisensory, to auditory moving posterior to anterior.</p>
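<p>The conjunction logic of the whole-brain analysis described above can be sketched as the intersection of thresholded statistical maps. This is a minimal illustration only: the voxel statistics, the threshold and the function names are hypothetical, and the thresholding procedure actually applied in the study is not reproduced here.</p>
<preformat>
```python
def suprathreshold(stat_map, threshold):
    """Return the set of voxel ids whose statistic meets the threshold."""
    return {v for v, t in stat_map.items() if t >= threshold}


def conjunction(map_a, map_b, threshold):
    """Voxels significant in both maps (intersection of thresholded maps)."""
    return suprathreshold(map_a, threshold).intersection(
        suprathreshold(map_b, threshold))


# Hypothetical statistics for six voxels ordered posterior (0) to anterior (5).
T_V = {0: 4.1, 1: 3.9, 2: 3.6, 3: 3.4, 4: 1.0, 5: 0.4}
T_G = {0: 4.5, 1: 4.0, 2: 3.2, 3: 1.1, 4: 0.6, 5: 0.2}
T_A = {0: 0.3, 1: 0.8, 2: 1.5, 3: 3.5, 4: 4.2, 5: 4.4}

THRESHOLD = 3.0  # illustrative cutoff, not the threshold used in the study

v_and_g = conjunction(T_V, T_G, THRESHOLD)  # facial-motion voxels
a_and_v = conjunction(T_A, T_V, THRESHOLD)  # audiovisual speech voxels
overlap = v_and_g.intersection(a_and_v)     # voxels shared by both conjunctions
print(sorted(v_and_g), sorted(a_and_v), sorted(overlap))
```
</preformat>
<p>In this toy example the facial-motion conjunction covers the three posterior-most voxels, the audiovisual speech conjunction falls on a more anterior voxel, and the two sets do not overlap.</p>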
<p>Previous studies have demonstrated that visual speech and nonspeech facial gestures tend to co-activate only the posterior regions of the STS (Campbell et al., <xref ref-type="bibr" rid="B23">2001</xref>; Bernstein et al., <xref ref-type="bibr" rid="B14">2011</xref>), and it has also been demonstrated that visual speech activations diminish moving from posterior to anterior in the STS, while auditory speech activations remain elevated (Wright et al., <xref ref-type="bibr" rid="B97">2003</xref>). Moreover, face-specific functional connectivity has been observed between face-selective regions of the fusiform gyrus and the pSTS (Zhang et al., <xref ref-type="bibr" rid="B99">2009</xref>; Turk-Browne et al., <xref ref-type="bibr" rid="B92">2010</xref>), and both task-based and meta-analytic functional connectivity analyses show coupling between pSTS and V5/MT (Lahnakoski et al., <xref ref-type="bibr" rid="B47">2012</xref>; Erickson et al., <xref ref-type="bibr" rid="B31">2017</xref>). Haxby et al. (<xref ref-type="bibr" rid="B39">2000</xref>) suggest that the pSTS is involved in processing changeable or dynamic aspects of faces (see also Said et al., <xref ref-type="bibr" rid="B75">2010</xref>). Recent evidence suggests that more anterior STS regions, specifically the mSTS, are crucial for auditory speech processing (Specht et al., <xref ref-type="bibr" rid="B84">2009</xref>; Liebenthal et al., <xref ref-type="bibr" rid="B52">2010</xref>; Bernstein and Liebenthal, <xref ref-type="bibr" rid="B13">2014</xref>).</p>
<p>In accordance with these and the present findings, a recent study by Deen et al. (<xref ref-type="bibr" rid="B27">2015</xref>) revealed a posterior-to-anterior functional-anatomic organization of the STS using a range of socially relevant stimuli/tasks. Specifically, Deen et al. (<xref ref-type="bibr" rid="B27">2015</xref>) found a reliable posterior-to-anterior ordering of task-related response preferences: the posterior-most STS was activated by a theory of mind task, followed by activations to biological motion in the pSTS, activation to dynamic faces in the pSTS, activation to voices in the mSTS, and activation to a language task in the anterior STS. However, unlike in the present study, they found significant overlap between activations within the pSTS that were related to biological motion, faces, and voices. This may be due to the use of non-speech human vocalizations such as coughing and laughter in their study (see Stevenson and James, <xref ref-type="bibr" rid="B86">2009</xref>). Alternatively, overlapping face-voice activations may have occurred in what we observe presently as a multisensory &#x0201C;transition zone&#x0201D; in the mid-posterior STS.</p>
<p>The present study has mapped the organization of the STS for multisensory speech processing in more detail than these previous studies. Overall, the results indicate the presence of a posterior-to-anterior functional gradient in the STS moving from facial motion processing, to multisensory processing, to auditory processing.</p>
</sec>
<sec id="s4-2">
<title>Speech-Specific Activations in Middle STS</title>
<p>Having established the existence of a posterior-to-anterior processing gradient in the STS, we wanted to ascertain the locations at which speech-specific activation was present. Before proceeding, we should note that an original goal of this study was to examine speech-specific activation within the auditory and visual modalities separately, namely by using the conditions R and G as within-modality nonspeech controls (see &#x0201C;Group Analysis&#x0201D; Section). However, no voxels in the brain showed greater activation for A than R, and the pattern of activation across STS voxels was extremely similar for A and R (Figure <xref ref-type="fig" rid="F7">7</xref>). We believe that spectral rotation may have failed to completely remove phonetic information from the speech stimuli (Liebenthal et al., <xref ref-type="bibr" rid="B51">2005</xref>). The result was that &#x0201C;speech-specific&#x0201D; activations were driven primarily by differences between the speech conditions (A, V, AV) and nonspeech facial motion (G).</p>
<p>With that said, our results consistently showed that speech-specific activations emerged in multisensory regions of the middle and mid-posterior STS. In our whole-brain analysis, activation for V &#x0003E; G overlapped strongly with A&#x02229;V, but not with V&#x02229;G (Figure <xref ref-type="fig" rid="F4">4B</xref>). In our parameterization analysis, the SP, which coded for increased activation in conditions containing speech (A, V, AV), was significant in bilateral STS subregions in the middle and mid-posterior STS where activation transitioned from visual to auditory along the anterior-posterior axis (Figure <xref ref-type="fig" rid="F5">5</xref>). Our principal component analysis revealed strong activation preferences for multisensory speech (AV) in this &#x0201C;transition zone&#x0201D; in both hemispheres (Figure <xref ref-type="fig" rid="F7">7</xref>). Moreover, PC2, which maximally distinguished between multisensory speech (AV) and visual nonspeech (G), loaded most strongly on mSTS voxels in both hemispheres.</p>
<p>A recent fMRI study (Bernstein et al., <xref ref-type="bibr" rid="B14">2011</xref>) employed a rather comprehensive set of visual speech and nonspeech stimuli (but no auditory stimuli), demonstrating that a more anterior region of the left pSTS/pMTG responded preferentially to orofacial visual motion when it was speech-related, while more posterior regions of pSTS responded to orofacial motion whether or not it was speech-related. The authors dubbed the anterior speech-related area the &#x0201C;temporal visual speech area&#x0201D; (TVSA). Bernstein et al. (<xref ref-type="bibr" rid="B14">2011</xref>) and Bernstein and Liebenthal (<xref ref-type="bibr" rid="B13">2014</xref>) have described the TVSA as a high-level, modal visual area. However, our study shows that visual-speech-specific activations also occur in the multisensory STS. Bernstein and Liebenthal (<xref ref-type="bibr" rid="B13">2014</xref>) have suggested that the TVSA feeds directly into speech-related regions of multisensory STS (Stevenson and James, <xref ref-type="bibr" rid="B86">2009</xref>). We are aware of no studies that have established unequivocally the level at which auditory and visual speech signals interact in multisensory STS&#x02014;specifically, whether multisensory speech signals interact at the phonological level, or if, as others have suggested (Calvert et al., <xref ref-type="bibr" rid="B21">1999</xref>; Skipper et al., <xref ref-type="bibr" rid="B81">2007</xref>; Arnal et al., <xref ref-type="bibr" rid="B3">2009</xref>), the outcome of multisensory integration merely informs phonological mechanisms in other brain regions. Bernstein et al. (<xref ref-type="bibr" rid="B14">2011</xref>) and Bernstein and Liebenthal (<xref ref-type="bibr" rid="B13">2014</xref>) suggest that speech sounds are categorized downstream in more anterior auditory regions of the STS. 
In the present study, we did not observe speech-specific activations in anterior subregions of the STS (Figure <xref ref-type="fig" rid="F5">5</xref>), though it should be noted that our task did not require explicit categorization or discrimination of speech sounds. Moreover, we employed sublexical stimuli whereas other studies indicate that speech-related activations in the anterior STS are most prominent in response to word- or sentence-level stimuli (Scott et al., <xref ref-type="bibr" rid="B76">2000</xref>, <xref ref-type="bibr" rid="B77">2006</xref>; Davis and Johnsrude, <xref ref-type="bibr" rid="B26">2003</xref>; Specht and Reul, <xref ref-type="bibr" rid="B83">2003</xref>; Leff et al., <xref ref-type="bibr" rid="B48">2008</xref>; DeWitt and Rauschecker, <xref ref-type="bibr" rid="B28">2012</xref>). Some of our own work suggests that anterior speech-related activations may reflect higher-level combinatorial processing or extraction of prosody rather than analysis of speech sounds <italic>per se</italic> (Humphries et al., <xref ref-type="bibr" rid="B45">2001</xref>, <xref ref-type="bibr" rid="B44">2005</xref>; Rogalsky and Hickok, <xref ref-type="bibr" rid="B74">2009</xref>; Okada et al., <xref ref-type="bibr" rid="B64">2010</xref>).</p>
</sec>
<sec id="s4-3">
<title>Role of Visual STS Subregions in Speech Perception</title>
<p>We were particularly interested in ascertaining the role, if any, of posterior visual STS subregions in the perception of visual speech. Presently, visual STS subregions are defined as those that showed increased activation to conditions containing facial motion (V, AV, G) relative to auditory-only conditions (A, R), i.e., significant positive effects of VP but not AP in our GLMM analysis (see &#x0201C;GLMM Region of Interest Results&#x0201D; Section). For each of these subregions, we tested for differences in activation across the facial motion conditions. Speech-specific activations (i.e., V, AV vs. G) were not expected given the results of our whole-brain analysis (see &#x0201C;Whole-Brain Results&#x0201D; Section), which demonstrated that visual-speech-specific activation (V &#x0003E; G) was located farther anterior in the STS. Nonetheless, the possibility remained that a more fine-grained analysis of posterior STS subregions would reveal such effects. We were also interested in testing whether visual subregions would be sensitive to differences in total motion energy across conditions (G &#x0003E; [V, AV]). Specifically, we took advantage of the fact that the nonspeech facial gestures in our G stimuli produced more total motion energy than the speech gestures in our V/AV stimuli (see &#x0201C;Motion Energy in Visual Speech vs. Nonspeech Facial Gestures&#x0201D; Section). We wanted to know whether activation in posterior-visual subregions of the STS would increase with total motion energy, as would be expected for canonical visual motion regions, or if activation would be relatively insensitive to low-level motion kinematics. 
Recent studies suggest that, indeed, activation in the pSTS may be relatively insensitive to motion kinematics, image size, or viewpoint (Lestou et al., <xref ref-type="bibr" rid="B49">2008</xref>; Grossman et al., <xref ref-type="bibr" rid="B38">2010</xref>), and some investigators have suggested that the pSTS codes high-level aspects of biological motion such as action goals or intentions (Pelphrey et al., <xref ref-type="bibr" rid="B66">2004</xref>; Vander Wyk et al., <xref ref-type="bibr" rid="B93">2009</xref>). Therefore, we were most interested in determining which, if any, of the subregions <italic>did not</italic> demonstrate differential activation on the basis of total motion energy (i.e., G &#x0003E; V, AV), and, for these subregions, whether the pattern of activation across voxels would discriminate among the facial motion conditions (V vs. G; V vs. AV).</p>
<p>In fact, we found a significant effect of facial motion condition in several of the posterior visual STS subregions (see &#x0201C;<italic>Post hoc</italic> Analysis of Visual STS Subregions&#x0201D; Section). Namely, activation followed the pattern G &#x0003E; V &#x0003E; AV, which was the same pattern exhibited by a canonical visual motion area, hMT. However, no significant effect of condition was observed for the visual subregion immediately bordering the mSTS &#x0201C;transition zone&#x0201D; in which activation preferences changed from visual to auditory/multisensory (left hemisphere subregion 6, right hemisphere subregion 5; Figure <xref ref-type="fig" rid="F5">5</xref>). In these anterior-most visual pSTS subregions, activation was nearly identical for V, AV and G (<italic>p</italic> &#x0003E; 0.3). Crucially, while these subregions did not distinguish between facial motion conditions in terms of univariate activation, they did distinguish between speech and nonspeech facial motion (V vs. G) in terms of the multivariate pattern of activation across voxels (i.e., using MVPA). Activation patterns were, however, not influenced by presentation modality (V vs. AV), and therefore information coded in the multivariate patterns reflects the class of visual motion stimulus (speech vs. nonspeech). To summarize, these particular visual pSTS subregions: (a) can be distinguished from neighboring visual STS subregions located immediately posterior because they do not show sensitivity to total motion energy; (b) can be distinguished from neighboring multisensory STS subregions located immediately anterior because they do not activate to auditory-only stimuli and do not activate preferentially to visual speech vs. nonspeech gestures; and (c) nonetheless distinguish between visual speech and nonspeech on the basis of multivariate patterns of activation. 
We therefore conclude that these posterior visual STS subregions immediately bordering the mSTS &#x0201C;transition zone&#x0201D; code for high-level aspects of facial actions, and that speech actions can be distinguished from nonspeech actions on the basis of population-level representations of these high-level features (see also Said et al., <xref ref-type="bibr" rid="B75">2010</xref>).</p>
</sec>
<sec id="s4-4">
<title>Hemispheric Differences</title>
<p>In terms of hemispheric differences, perhaps the most striking pattern observed in the present data is the broad similarity in STS activation preferences across hemispheres. This is most clearly observable in the PCA (Figures <xref ref-type="fig" rid="F6">6</xref>, <xref ref-type="fig" rid="F7">7</xref>), which demonstrates very similar patterns of condition scores and per-STS-subregion voxel preferences across hemispheres. However, some subtle differences in hemispheric organization were observed. First, the extent of visual-speech-specific activation (V &#x0003E; G) was greater in the left STS (Figure <xref ref-type="fig" rid="F4">4B</xref>). This was supported by the results of the GLMM analysis (Figure <xref ref-type="fig" rid="F5">5</xref>) which demonstrated that the response to VP (which includes nonspeech condition G) was generally larger in mid- and anterior-STS subregions of the right vs. the left hemisphere. The same pattern was revealed in the coefficient maps of PC1 in the PCA (Figure <xref ref-type="fig" rid="F6">6</xref>, top). Thus, overall, it seems the strength of speech-specific activations in the right hemisphere was lower than in the left hemisphere. This concurs with previous imaging studies investigating effects of intelligibility with visual or audiovisual speech (Callan et al., <xref ref-type="bibr" rid="B19">2003</xref>, <xref ref-type="bibr" rid="B20">2004</xref>; Sekiyama et al., <xref ref-type="bibr" rid="B78">2003</xref>; Okada and Hickok, <xref ref-type="bibr" rid="B63">2009</xref>), and may be generally related to the idea of a &#x0201C;hemispheric lateralization gradient&#x0201D; (Peelle, <xref ref-type="bibr" rid="B65">2012</xref>; Specht, <xref ref-type="bibr" rid="B82">2013</xref>) in which stronger patterns of left-hemisphere lateralization emerge at higher levels of analysis in speech processing (e.g., auditory vs. phonological vs. lexical-semantic or syntactic). 
The speech-specific activations observed presently could reflect sublexical phonological analysis, for which lateralization theories would predict an intermediate level of left-hemisphere lateralization. Second, the location of the multisensory &#x0201C;transition zone&#x0201D; was slightly different across hemispheres (left subregion 6/5, right subregion 5/4). We believe this merely reflects differential alignment of our anatomically-defined STS ROIs across hemispheres; the functional pattern is nearly identical. Third, the GLMM analysis revealed stronger auditory activation (AP) in anterior STS subregions of the right vs. the left hemisphere (Figure <xref ref-type="fig" rid="F5">5</xref>). While we can only speculate as to the reason for this, one possibility is that anterior regions of the right STS perform a more general acoustical (perhaps prosodic) analysis of speech-like signals, while anterior regions of the left hemisphere perform higher-level linguistic analyses. This notion is in line with theories of anterior STS function discussed above (see &#x0201C;Speech-Specific Activations in Middle STS&#x0201D; Section) and with lateralization theories discussed here.</p>
</sec>
<sec sec-type="conclusion" id="s4-5">
<title>Conclusion</title>
<p>In the present fMRI experiment, we measured activation to a range of auditory and visual speech (A, V, AV) and nonspeech (R, G) stimuli, focusing particularly on the pattern of activation in the STS. The results demonstrated the following: (1) activation in the STS follows a posterior-to-anterior functional gradient from facial motion processing, to multisensory processing, to auditory processing; (2) speech-specific activations arise in multisensory regions of the middle STS; and (3) abstract representations of visible facial gestures emerge in visual regions of the pSTS that immediately border the multisensory regions. We therefore suggest a functional-anatomic workflow for speech processing in the STS: lower-level aspects of facial motion are processed in the posterior-most visual STS subregions; high-level/abstract aspects of facial motion are extracted in the pSTS immediately bordering mSTS; visual and auditory speech representations are integrated in mSTS; and integrated percepts feed into speech processing streams (Hickok and Poeppel, <xref ref-type="bibr" rid="B42">2007</xref>; Rauschecker and Scott, <xref ref-type="bibr" rid="B71">2009</xref>), potentially including auditory-phonological systems for speech sound categorization in more-anterior regions of the STS (Specht et al., <xref ref-type="bibr" rid="B84">2009</xref>; Liebenthal et al., <xref ref-type="bibr" rid="B52">2010</xref>; Bernstein and Liebenthal, <xref ref-type="bibr" rid="B13">2014</xref>).</p>
</sec>
</sec>
<sec id="s5">
<title>Author Contributions</title>
<p>JHV and GH conceptualized and designed the research. JHV, DM and KS prepared the stimuli. JHV and DM collected the data. JHV, KIV and FR analyzed the data. JHV wrote the manuscript. All authors interpreted the data, provided critical feedback and revised the manuscript.</p>
</sec>
<sec id="s6">
<title>Funding</title>
<p>During this investigation, JHV was supported by the National Institute on Deafness and Other Communication Disorders (National Institutes of Health, NIH) Award DC010775 from the University of California, Irvine, CA, USA. The investigation was supported by the National Institute on Deafness and Other Communication Disorders Award DC03681 to GH.</p>
</sec>
<sec id="s7">
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack>
<p>Other than editing the final manuscript, this work was completed while JHV was a graduate student and postdoctoral fellow at the University of California, Irvine, CA, USA.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Allison</surname> <given-names>T.</given-names></name> <name><surname>Puce</surname> <given-names>A.</given-names></name> <name><surname>McCarthy</surname> <given-names>G.</given-names></name></person-group> (<year>2000</year>). <article-title>Social perception from visual cues: role of the STS region</article-title>. <source>Trends Cogn. Sci.</source> <volume>4</volume>, <fpage>267</fpage>&#x02013;<lpage>278</lpage>. <pub-id pub-id-type="doi">10.1016/s1364-6613(00)01501-1</pub-id><pub-id pub-id-type="pmid">10859571</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amedi</surname> <given-names>A.</given-names></name> <name><surname>von Kriegstein</surname> <given-names>K.</given-names></name> <name><surname>van Atteveldt</surname> <given-names>N.</given-names></name> <name><surname>Beauchamp</surname> <given-names>M.</given-names></name> <name><surname>Naumer</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>Functional imaging of human crossmodal identification and object recognition</article-title>. <source>Exp. Brain Res.</source> <volume>166</volume>, <fpage>559</fpage>&#x02013;<lpage>571</lpage>. <pub-id pub-id-type="doi">10.1007/s00221-005-2396-5</pub-id><pub-id pub-id-type="pmid">16028028</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arnal</surname> <given-names>L. H.</given-names></name> <name><surname>Morillon</surname> <given-names>B.</given-names></name> <name><surname>Kell</surname> <given-names>C. A.</given-names></name> <name><surname>Giraud</surname> <given-names>A. L.</given-names></name></person-group> (<year>2009</year>). <article-title>Dual neural routing of visual facilitation in speech processing</article-title>. <source>J. Neurosci.</source> <volume>29</volume>, <fpage>13445</fpage>&#x02013;<lpage>13453</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.3194-09.2009</pub-id><pub-id pub-id-type="pmid">19864557</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Avants</surname> <given-names>B.</given-names></name> <name><surname>Duda</surname> <given-names>J. T.</given-names></name> <name><surname>Kim</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Pluta</surname> <given-names>J.</given-names></name> <name><surname>Gee</surname> <given-names>J. C.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>Multivariate analysis of structural and diffusion imaging in traumatic brain injury</article-title>. <source>Acad. Radiol.</source> <volume>15</volume>, <fpage>1360</fpage>&#x02013;<lpage>1375</lpage>. <pub-id pub-id-type="doi">10.1016/j.acra.2008.07.007</pub-id><pub-id pub-id-type="pmid">18995188</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Avants</surname> <given-names>B.</given-names></name> <name><surname>Gee</surname> <given-names>J. C.</given-names></name></person-group> (<year>2004</year>). <article-title>Geodesic estimation for large deformation anatomical shape averaging and interpolation</article-title>. <source>Neuroimage</source> <volume>23</volume>, <fpage>S139</fpage>&#x02013;<lpage>S150</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2004.07.010</pub-id><pub-id pub-id-type="pmid">15501083</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Avants</surname> <given-names>B. B.</given-names></name> <name><surname>Tustison</surname> <given-names>N. J.</given-names></name> <name><surname>Song</surname> <given-names>G.</given-names></name> <name><surname>Cook</surname> <given-names>P. A.</given-names></name> <name><surname>Klein</surname> <given-names>A.</given-names></name> <name><surname>Gee</surname> <given-names>J. C.</given-names></name></person-group> (<year>2011</year>). <article-title>A reproducible evaluation of ANTs similarity metric performance in brain image registration</article-title>. <source>Neuroimage</source> <volume>54</volume>, <fpage>2033</fpage>&#x02013;<lpage>2044</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.09.025</pub-id><pub-id pub-id-type="pmid">20851191</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beauchamp</surname> <given-names>M. S.</given-names></name> <name><surname>Argall</surname> <given-names>B. D.</given-names></name> <name><surname>Bodurka</surname> <given-names>J.</given-names></name> <name><surname>Duyn</surname> <given-names>J. H.</given-names></name> <name><surname>Martin</surname> <given-names>A.</given-names></name></person-group> (<year>2004a</year>). <article-title>Unraveling multisensory integration: patchy organization within human STS multisensory cortex</article-title>. <source>Nat. Neurosci.</source> <volume>7</volume>, <fpage>1190</fpage>&#x02013;<lpage>1192</lpage>. <pub-id pub-id-type="doi">10.1038/nn1333</pub-id><pub-id pub-id-type="pmid">15475952</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beauchamp</surname> <given-names>M. S.</given-names></name> <name><surname>Lee</surname> <given-names>K. E.</given-names></name> <name><surname>Argall</surname> <given-names>B. D.</given-names></name> <name><surname>Martin</surname> <given-names>A.</given-names></name></person-group> (<year>2004b</year>). <article-title>Integration of auditory and visual information about objects in superior temporal sulcus</article-title>. <source>Neuron</source> <volume>41</volume>, <fpage>809</fpage>&#x02013;<lpage>823</lpage>. <pub-id pub-id-type="doi">10.1016/s0896-6273(04)00070-4</pub-id><pub-id pub-id-type="pmid">15003179</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beauchamp</surname> <given-names>M. S.</given-names></name> <name><surname>Lee</surname> <given-names>K. E.</given-names></name> <name><surname>Haxby</surname> <given-names>J. V.</given-names></name> <name><surname>Martin</surname> <given-names>A.</given-names></name></person-group> (<year>2002</year>). <article-title>Parallel visual motion processing streams for manipulable objects and human movements</article-title>. <source>Neuron</source> <volume>34</volume>, <fpage>149</fpage>&#x02013;<lpage>159</lpage>. <pub-id pub-id-type="doi">10.1016/s0896-6273(02)00642-6</pub-id><pub-id pub-id-type="pmid">11931749</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beauchamp</surname> <given-names>M. S.</given-names></name> <name><surname>Lee</surname> <given-names>K. E.</given-names></name> <name><surname>Haxby</surname> <given-names>J. V.</given-names></name> <name><surname>Martin</surname> <given-names>A.</given-names></name></person-group> (<year>2003</year>). <article-title>fMRI responses to video and point-light displays of moving humans and manipulable objects</article-title>. <source>J. Cogn. Neurosci.</source> <volume>15</volume>, <fpage>991</fpage>&#x02013;<lpage>1001</lpage>. <pub-id pub-id-type="doi">10.1162/089892903770007380</pub-id><pub-id pub-id-type="pmid">14614810</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beauchamp</surname> <given-names>M. S.</given-names></name> <name><surname>Nath</surname> <given-names>A. R.</given-names></name> <name><surname>Pasalar</surname> <given-names>S.</given-names></name></person-group> (<year>2010</year>). <article-title>fMRI-guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect</article-title>. <source>J. Neurosci.</source> <volume>30</volume>, <fpage>2414</fpage>&#x02013;<lpage>2417</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.4865-09.2010</pub-id><pub-id pub-id-type="pmid">20164324</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benevento</surname> <given-names>L. A.</given-names></name> <name><surname>Fallon</surname> <given-names>J.</given-names></name> <name><surname>Davis</surname> <given-names>B.</given-names></name> <name><surname>Rezak</surname> <given-names>M.</given-names></name></person-group> (<year>1977</year>). <article-title>Auditory-visual interaction in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey</article-title>. <source>Exp. Neurol.</source> <volume>57</volume>, <fpage>849</fpage>&#x02013;<lpage>872</lpage>. <pub-id pub-id-type="doi">10.1016/0014-4886(77)90112-1</pub-id><pub-id pub-id-type="pmid">411682</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bernstein</surname> <given-names>L. E.</given-names></name> <name><surname>Jiang</surname> <given-names>J.</given-names></name> <name><surname>Pantazis</surname> <given-names>D.</given-names></name> <name><surname>Lu</surname> <given-names>Z. L.</given-names></name> <name><surname>Joshi</surname> <given-names>A.</given-names></name></person-group> (<year>2011</year>). <article-title>Visual phonetic processing localized using speech and nonspeech face gestures in video and point-light displays</article-title>. <source>Hum. Brain Mapp.</source> <volume>32</volume>, <fpage>1660</fpage>&#x02013;<lpage>1676</lpage>. <pub-id pub-id-type="doi">10.1002/hbm.21139</pub-id><pub-id pub-id-type="pmid">20853377</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bernstein</surname> <given-names>L. E.</given-names></name> <name><surname>Liebenthal</surname> <given-names>E.</given-names></name></person-group> (<year>2014</year>). <article-title>Neural pathways for visual speech perception</article-title>. <source>Front. Neurosci.</source> <volume>8</volume>:<fpage>386</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2014.00386</pub-id><pub-id pub-id-type="pmid">25520611</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Binder</surname> <given-names>J.</given-names></name> <name><surname>Frost</surname> <given-names>J.</given-names></name> <name><surname>Hammeke</surname> <given-names>T.</given-names></name> <name><surname>Bellgowan</surname> <given-names>P.</given-names></name> <name><surname>Springer</surname> <given-names>J.</given-names></name> <name><surname>Kaufman</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2000</year>). <article-title>Human temporal lobe activation by speech and nonspeech sounds</article-title>. <source>Cereb. Cortex</source> <volume>10</volume>, <fpage>512</fpage>&#x02013;<lpage>528</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/10.5.512</pub-id><pub-id pub-id-type="pmid">10847601</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Binder</surname> <given-names>J. R.</given-names></name> <name><surname>Swanson</surname> <given-names>S. J.</given-names></name> <name><surname>Hammeke</surname> <given-names>T. A.</given-names></name> <name><surname>Sabsevitz</surname> <given-names>D. S.</given-names></name></person-group> (<year>2008</year>). <article-title>A comparison of five fMRI protocols for mapping speech comprehension systems</article-title>. <source>Epilepsia</source> <volume>49</volume>, <fpage>1980</fpage>&#x02013;<lpage>1997</lpage>. <pub-id pub-id-type="doi">10.1111/j.1528-1167.2008.01683.x</pub-id><pub-id pub-id-type="pmid">18513352</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blesser</surname> <given-names>B.</given-names></name></person-group> (<year>1972</year>). <article-title>Speech perception under conditions of spectral transformation: I. Phonetic characteristics</article-title>. <source>J. Speech Hear. Res.</source> <volume>15</volume>, <fpage>5</fpage>&#x02013;<lpage>41</lpage>. <pub-id pub-id-type="doi">10.1044/jshr.1501.05</pub-id><pub-id pub-id-type="pmid">5012812</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bruce</surname> <given-names>C.</given-names></name> <name><surname>Desimone</surname> <given-names>R.</given-names></name> <name><surname>Gross</surname> <given-names>C. G.</given-names></name></person-group> (<year>1981</year>). <article-title>Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque</article-title>. <source>J. Neurophysiol.</source> <volume>46</volume>, <fpage>369</fpage>&#x02013;<lpage>384</lpage>. <pub-id pub-id-type="pmid">6267219</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Callan</surname> <given-names>D. E.</given-names></name> <name><surname>Jones</surname> <given-names>J. A.</given-names></name> <name><surname>Munhall</surname> <given-names>K.</given-names></name> <name><surname>Callan</surname> <given-names>A. M.</given-names></name> <name><surname>Kroos</surname> <given-names>C.</given-names></name> <name><surname>Vatikiotis-Bateson</surname> <given-names>E.</given-names></name></person-group> (<year>2003</year>). <article-title>Neural processes underlying perceptual enhancement by visual speech gestures</article-title>. <source>Neuroreport</source> <volume>14</volume>, <fpage>2213</fpage>&#x02013;<lpage>2218</lpage>. <pub-id pub-id-type="doi">10.1097/01.wnr.0000095492.38740.8f</pub-id><pub-id pub-id-type="pmid">14625450</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Callan</surname> <given-names>D. E.</given-names></name> <name><surname>Jones</surname> <given-names>J. A.</given-names></name> <name><surname>Munhall</surname> <given-names>K.</given-names></name> <name><surname>Kroos</surname> <given-names>C.</given-names></name> <name><surname>Callan</surname> <given-names>A. M.</given-names></name> <name><surname>Vatikiotis-Bateson</surname> <given-names>E.</given-names></name></person-group> (<year>2004</year>). <article-title>Multisensory integration sites identified by perception of spatial wavelet filtered visual speech gesture information</article-title>. <source>J. Cogn. Neurosci.</source> <volume>16</volume>, <fpage>805</fpage>&#x02013;<lpage>816</lpage>. <pub-id pub-id-type="doi">10.1162/089892904970771</pub-id><pub-id pub-id-type="pmid">15200708</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calvert</surname> <given-names>G. A.</given-names></name> <name><surname>Brammer</surname> <given-names>M. J.</given-names></name> <name><surname>Bullmore</surname> <given-names>E. T.</given-names></name> <name><surname>Campbell</surname> <given-names>R.</given-names></name> <name><surname>Iversen</surname> <given-names>S. D.</given-names></name> <name><surname>David</surname> <given-names>A. S.</given-names></name></person-group> (<year>1999</year>). <article-title>Response amplification in sensory-specific cortices during crossmodal binding</article-title>. <source>Neuroreport</source> <volume>10</volume>, <fpage>2619</fpage>&#x02013;<lpage>2623</lpage>. <pub-id pub-id-type="doi">10.1097/00001756-199908200-00033</pub-id><pub-id pub-id-type="pmid">10574380</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calvert</surname> <given-names>G. A.</given-names></name> <name><surname>Campbell</surname> <given-names>R.</given-names></name> <name><surname>Brammer</surname> <given-names>M. J.</given-names></name></person-group> (<year>2000</year>). <article-title>Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex</article-title>. <source>Curr. Biol.</source> <volume>10</volume>, <fpage>649</fpage>&#x02013;<lpage>657</lpage>. <pub-id pub-id-type="doi">10.1016/s0960-9822(00)00513-3</pub-id><pub-id pub-id-type="pmid">10837246</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Campbell</surname> <given-names>R.</given-names></name> <name><surname>MacSweeney</surname> <given-names>M.</given-names></name> <name><surname>Surguladze</surname> <given-names>S.</given-names></name> <name><surname>Calvert</surname> <given-names>G.</given-names></name> <name><surname>McGuire</surname> <given-names>P.</given-names></name> <name><surname>Suckling</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2001</year>). <article-title>Cortical substrates for the perception of face actions: an fMRI study of the specificity of activation for seen speech and for meaningless lower-face acts (gurning)</article-title>. <source>Cogn. Brain Res.</source> <volume>12</volume>, <fpage>233</fpage>&#x02013;<lpage>243</lpage>. <pub-id pub-id-type="doi">10.1016/s0926-6410(01)00054-4</pub-id><pub-id pub-id-type="pmid">11587893</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coutanche</surname> <given-names>M. N.</given-names></name></person-group> (<year>2013</year>). <article-title>Distinguishing multi-voxel patterns and mean activation: why, how and what does it tell us?</article-title> <source>Cogn. Affect. Behav. Neurosci.</source> <volume>13</volume>, <fpage>667</fpage>&#x02013;<lpage>673</lpage>. <pub-id pub-id-type="doi">10.3758/s13415-013-0186-2</pub-id><pub-id pub-id-type="pmid">23857415</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dahl</surname> <given-names>C. D.</given-names></name> <name><surname>Logothetis</surname> <given-names>N. K.</given-names></name> <name><surname>Kayser</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>Spatial organization of multisensory responses in temporal association cortex</article-title>. <source>J. Neurosci.</source> <volume>29</volume>, <fpage>11924</fpage>&#x02013;<lpage>11932</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.3437-09.2009</pub-id><pub-id pub-id-type="pmid">19776278</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davis</surname> <given-names>M. H.</given-names></name> <name><surname>Johnsrude</surname> <given-names>I. S.</given-names></name></person-group> (<year>2003</year>). <article-title>Hierarchical processing in spoken language comprehension</article-title>. <source>J. Neurosci.</source> <volume>23</volume>, <fpage>3423</fpage>&#x02013;<lpage>3431</lpage>. <pub-id pub-id-type="pmid">12716950</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deen</surname> <given-names>B.</given-names></name> <name><surname>Koldewyn</surname> <given-names>K.</given-names></name> <name><surname>Kanwisher</surname> <given-names>N.</given-names></name> <name><surname>Saxe</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>Functional organization of social perception and cognition in the superior temporal sulcus</article-title>. <source>Cereb. Cortex</source> <volume>25</volume>, <fpage>4596</fpage>&#x02013;<lpage>4609</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhv111</pub-id><pub-id pub-id-type="pmid">26048954</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>DeWitt</surname> <given-names>I.</given-names></name> <name><surname>Rauschecker</surname> <given-names>J. P.</given-names></name></person-group> (<year>2012</year>). <article-title>Phoneme and word recognition in the auditory ventral stream</article-title>. <source>Proc. Natl. Acad. Sci. U S A</source> <volume>109</volume>, <fpage>E505</fpage>&#x02013;<lpage>E514</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1113427109</pub-id><pub-id pub-id-type="pmid">22308358</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dodd</surname> <given-names>B.</given-names></name></person-group> (<year>1977</year>). <article-title>The role of vision in the perception of speech</article-title>. <source>Perception</source> <volume>6</volume>, <fpage>31</fpage>&#x02013;<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1068/p060031</pub-id><pub-id pub-id-type="pmid">840618</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eklund</surname> <given-names>A.</given-names></name> <name><surname>Nichols</surname> <given-names>T. E.</given-names></name> <name><surname>Knutsson</surname> <given-names>H.</given-names></name></person-group> (<year>2016</year>). <article-title>Cluster failure: why fMRI inferences for spatial extent have inflated false-positive rates</article-title>. <source>Proc. Natl. Acad. Sci. U S A</source> <volume>113</volume>, <fpage>7900</fpage>&#x02013;<lpage>7905</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1602413113</pub-id><pub-id pub-id-type="pmid">27357684</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Erickson</surname> <given-names>L. C.</given-names></name> <name><surname>Rauschecker</surname> <given-names>J. P.</given-names></name> <name><surname>Turkeltaub</surname> <given-names>P. E.</given-names></name></person-group> (<year>2017</year>). <article-title>Meta-analytic connectivity modeling of the human superior temporal sulcus</article-title>. <source>Brain Struct. Funct.</source> <volume>222</volume>, <fpage>267</fpage>&#x02013;<lpage>285</lpage>. <pub-id pub-id-type="doi">10.1007/s00429-016-1215-z</pub-id><pub-id pub-id-type="pmid">27003288</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fonov</surname> <given-names>V.</given-names></name> <name><surname>Evans</surname> <given-names>A. C.</given-names></name> <name><surname>Botteron</surname> <given-names>K.</given-names></name> <name><surname>Almli</surname> <given-names>C. R.</given-names></name> <name><surname>McKinstry</surname> <given-names>R. C.</given-names></name> <name><surname>Collins</surname> <given-names>D. L.</given-names></name></person-group> (<year>2011</year>). <article-title>Unbiased average age-appropriate atlases for pediatric studies</article-title>. <source>Neuroimage</source> <volume>54</volume>, <fpage>313</fpage>&#x02013;<lpage>327</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.07.033</pub-id><pub-id pub-id-type="pmid">20656036</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fonov</surname> <given-names>V.</given-names></name> <name><surname>Evans</surname> <given-names>A.</given-names></name> <name><surname>McKinstry</surname> <given-names>R.</given-names></name> <name><surname>Almli</surname> <given-names>C.</given-names></name> <name><surname>Collins</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). <article-title>Unbiased nonlinear average age-appropriate brain templates from birth to adulthood</article-title>. <source>Neuroimage</source> <volume>47</volume>:<fpage>S102</fpage>. <pub-id pub-id-type="doi">10.1016/s1053-8119(09)70884-5</pub-id></citation></ref>
<ref id="B34"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Green</surname> <given-names>D. M.</given-names></name> <name><surname>Swets</surname> <given-names>J. A.</given-names></name></person-group> (<year>1966</year>). <source>Signal Detection Theory and Psychophysics.</source> <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Wiley</publisher-name>.</citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossman</surname> <given-names>E. D.</given-names></name> <name><surname>Blake</surname> <given-names>R.</given-names></name></person-group> (<year>2002</year>). <article-title>Brain areas active during visual perception of biological motion</article-title>. <source>Neuron</source> <volume>35</volume>, <fpage>1167</fpage>&#x02013;<lpage>1175</lpage>. <pub-id pub-id-type="doi">10.1016/s0896-6273(02)00897-8</pub-id><pub-id pub-id-type="pmid">12354405</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossman</surname> <given-names>E. D.</given-names></name> <name><surname>Battelli</surname> <given-names>L.</given-names></name> <name><surname>Pascual-Leone</surname> <given-names>A.</given-names></name></person-group> (<year>2005</year>). <article-title>Repetitive TMS over posterior STS disrupts perception of biological motion</article-title>. <source>Vision Res.</source> <volume>45</volume>, <fpage>2847</fpage>&#x02013;<lpage>2853</lpage>. <pub-id pub-id-type="doi">10.1016/j.visres.2005.05.027</pub-id><pub-id pub-id-type="pmid">16039692</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossman</surname> <given-names>E.</given-names></name> <name><surname>Donnelly</surname> <given-names>M.</given-names></name> <name><surname>Price</surname> <given-names>R.</given-names></name> <name><surname>Pickens</surname> <given-names>D.</given-names></name> <name><surname>Morgan</surname> <given-names>V.</given-names></name> <name><surname>Neighbor</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2000</year>). <article-title>Brain areas involved in perception of biological motion</article-title>. <source>J. Cogn. Neurosci.</source> <volume>12</volume>, <fpage>711</fpage>&#x02013;<lpage>720</lpage>. <pub-id pub-id-type="doi">10.1162/089892900562417</pub-id><pub-id pub-id-type="pmid">11054914</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossman</surname> <given-names>E. D.</given-names></name> <name><surname>Jardine</surname> <given-names>N. L.</given-names></name> <name><surname>Pyles</surname> <given-names>J. A.</given-names></name></person-group> (<year>2010</year>). <article-title>fMR-adaptation reveals invariant coding of biological motion on the human STS</article-title>. <source>Front. Hum. Neurosci.</source> <volume>4</volume>:<fpage>15</fpage>. <pub-id pub-id-type="doi">10.3389/neuro.09.015.2010</pub-id><pub-id pub-id-type="pmid">20431723</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haxby</surname> <given-names>J. V.</given-names></name> <name><surname>Hoffman</surname> <given-names>E. A.</given-names></name> <name><surname>Gobbini</surname> <given-names>M. I.</given-names></name></person-group> (<year>2000</year>). <article-title>The distributed human neural system for face perception</article-title>. <source>Trends Cogn. Sci.</source> <volume>4</volume>, <fpage>223</fpage>&#x02013;<lpage>233</lpage>. <pub-id pub-id-type="doi">10.1016/s1364-6613(00)01482-0</pub-id><pub-id pub-id-type="pmid">10827445</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hertrich</surname> <given-names>I.</given-names></name> <name><surname>Dietrich</surname> <given-names>S.</given-names></name> <name><surname>Ackermann</surname> <given-names>H.</given-names></name></person-group> (<year>2011</year>). <article-title>Cross-modal interactions during perception of audiovisual speech and nonspeech signals: an fMRI study</article-title>. <source>J. Cogn. Neurosci.</source> <volume>23</volume>, <fpage>221</fpage>&#x02013;<lpage>237</lpage>. <pub-id pub-id-type="doi">10.1162/jocn.2010.21421</pub-id><pub-id pub-id-type="pmid">20044895</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hickok</surname> <given-names>G.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2004</year>). <article-title>Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language</article-title>. <source>Cognition</source> <volume>92</volume>, <fpage>67</fpage>&#x02013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2003.10.011</pub-id><pub-id pub-id-type="pmid">15037127</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hickok</surname> <given-names>G.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2007</year>). <article-title>The cortical organization of speech processing</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>8</volume>, <fpage>393</fpage>&#x02013;<lpage>402</lpage>. <pub-id pub-id-type="doi">10.1038/nrn2113</pub-id><pub-id pub-id-type="pmid">17431404</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Horn</surname> <given-names>B. K.</given-names></name> <name><surname>Schunck</surname> <given-names>B. G.</given-names></name></person-group> (<year>1981</year>). <article-title>Determining optical flow</article-title>. <source>Artif. Intell.</source> <volume>17</volume>, <fpage>185</fpage>&#x02013;<lpage>203</lpage>. <pub-id pub-id-type="doi">10.1016/0004-3702(81)90024-2</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Humphries</surname> <given-names>C.</given-names></name> <name><surname>Love</surname> <given-names>T.</given-names></name> <name><surname>Swinney</surname> <given-names>D.</given-names></name> <name><surname>Hickok</surname> <given-names>G.</given-names></name></person-group> (<year>2005</year>). <article-title>Response of anterior temporal cortex to syntactic and prosodic manipulations during sentence processing</article-title>. <source>Hum. Brain Mapp.</source> <volume>26</volume>, <fpage>128</fpage>&#x02013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1002/hbm.20148</pub-id><pub-id pub-id-type="pmid">15895428</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Humphries</surname> <given-names>C.</given-names></name> <name><surname>Willard</surname> <given-names>K.</given-names></name> <name><surname>Buchsbaum</surname> <given-names>B.</given-names></name> <name><surname>Hickok</surname> <given-names>G.</given-names></name></person-group> (<year>2001</year>). <article-title>Role of anterior temporal cortex in auditory sentence comprehension: an fMRI study</article-title>. <source>Neuroreport</source> <volume>12</volume>, <fpage>1749</fpage>&#x02013;<lpage>1752</lpage>. <pub-id pub-id-type="doi">10.1097/00001756-200106130-00046</pub-id><pub-id pub-id-type="pmid">11409752</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaas</surname> <given-names>J. H.</given-names></name> <name><surname>Hackett</surname> <given-names>T. A.</given-names></name></person-group> (<year>2000</year>). <article-title>Subdivisions of auditory cortex and processing streams in primates</article-title>. <source>Proc. Natl. Acad. Sci. U S A</source> <volume>97</volume>, <fpage>11793</fpage>&#x02013;<lpage>11799</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.97.22.11793</pub-id><pub-id pub-id-type="pmid">11050211</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lahnakoski</surname> <given-names>J. M.</given-names></name> <name><surname>Glerean</surname> <given-names>E.</given-names></name> <name><surname>Salmi</surname> <given-names>J.</given-names></name> <name><surname>J&#x000E4;&#x000E4;skel&#x000E4;inen</surname> <given-names>I. P.</given-names></name> <name><surname>Sams</surname> <given-names>M.</given-names></name> <name><surname>Hari</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Naturalistic fMRI mapping reveals superior temporal sulcus as the hub for the distributed brain network for social perception</article-title>. <source>Front. Hum. Neurosci.</source> <volume>6</volume>:<fpage>233</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2012.00233</pub-id><pub-id pub-id-type="pmid">22905026</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leff</surname> <given-names>A. P.</given-names></name> <name><surname>Schofield</surname> <given-names>T. M.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name> <name><surname>Crinion</surname> <given-names>J. T.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name> <name><surname>Price</surname> <given-names>C. J.</given-names></name></person-group> (<year>2008</year>). <article-title>The cortical dynamics of intelligible speech</article-title>. <source>J. Neurosci.</source> <volume>28</volume>, <fpage>13209</fpage>&#x02013;<lpage>13215</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.2903-08.2008</pub-id><pub-id pub-id-type="pmid">19052212</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lestou</surname> <given-names>V.</given-names></name> <name><surname>Pollick</surname> <given-names>F. E.</given-names></name> <name><surname>Kourtzi</surname> <given-names>Z.</given-names></name></person-group> (<year>2008</year>). <article-title>Neural substrates for action understanding at different description levels in the human brain</article-title>. <source>J. Cogn. Neurosci.</source> <volume>20</volume>, <fpage>324</fpage>&#x02013;<lpage>341</lpage>. <pub-id pub-id-type="doi">10.1162/jocn.2008.20021</pub-id><pub-id pub-id-type="pmid">18275338</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lewis</surname> <given-names>J. W.</given-names></name> <name><surname>Van Essen</surname> <given-names>D. C.</given-names></name></person-group> (<year>2000</year>). <article-title>Corticocortical connections of visual, sensorimotor and multimodal processing areas in the parietal lobe of the macaque monkey</article-title>. <source>J. Comp. Neurol.</source> <volume>428</volume>, <fpage>112</fpage>&#x02013;<lpage>137</lpage>. <pub-id pub-id-type="doi">10.1002/1096-9861(20001204)428:1&#x0003C;112::AID-CNE8&#x0003E;3.0.co;2-9</pub-id><pub-id pub-id-type="pmid">11058227</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liebenthal</surname> <given-names>E.</given-names></name> <name><surname>Binder</surname> <given-names>J. R.</given-names></name> <name><surname>Spitzer</surname> <given-names>S. M.</given-names></name> <name><surname>Possing</surname> <given-names>E. T.</given-names></name> <name><surname>Medler</surname> <given-names>D. A.</given-names></name></person-group> (<year>2005</year>). <article-title>Neural substrates of phonemic perception</article-title>. <source>Cereb. Cortex</source> <volume>15</volume>, <fpage>1621</fpage>&#x02013;<lpage>1631</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhi040</pub-id><pub-id pub-id-type="pmid">15703256</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liebenthal</surname> <given-names>E.</given-names></name> <name><surname>Desai</surname> <given-names>R.</given-names></name> <name><surname>Ellingson</surname> <given-names>M. M.</given-names></name> <name><surname>Ramachandran</surname> <given-names>B.</given-names></name> <name><surname>Desai</surname> <given-names>A.</given-names></name> <name><surname>Binder</surname> <given-names>J. R.</given-names></name></person-group> (<year>2010</year>). <article-title>Specialization along the left superior temporal sulcus for auditory categorization</article-title>. <source>Cereb. Cortex</source> <volume>20</volume>, <fpage>2958</fpage>&#x02013;<lpage>2970</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhq045</pub-id><pub-id pub-id-type="pmid">20382643</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liebenthal</surname> <given-names>E.</given-names></name> <name><surname>Desai</surname> <given-names>R. H.</given-names></name> <name><surname>Humphries</surname> <given-names>C.</given-names></name> <name><surname>Sabri</surname> <given-names>M.</given-names></name> <name><surname>Desai</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>The functional organization of the left STS: a large scale meta-analysis of PET and fMRI studies of healthy adults</article-title>. <source>Front. Neurosci.</source> <volume>8</volume>:<fpage>289</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2014.00289</pub-id><pub-id pub-id-type="pmid">25309312</pub-id></citation></ref>
<ref id="B54"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Lunneborg</surname> <given-names>C. E.</given-names></name></person-group> (<year>2000</year>). <source>Data Analysis by Resampling: Concepts and Applications.</source> <publisher-loc>Pacific Grove, CA</publisher-loc>: <publisher-name>Duxbury</publisher-name>.</citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McGurk</surname> <given-names>H.</given-names></name> <name><surname>MacDonald</surname> <given-names>J.</given-names></name></person-group> (<year>1976</year>). <article-title>Hearing lips and seeing voices</article-title>. <source>Nature</source> <volume>264</volume>, <fpage>746</fpage>&#x02013;<lpage>748</lpage>. <pub-id pub-id-type="doi">10.1038/264746a0</pub-id><pub-id pub-id-type="pmid">1012311</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mumford</surname> <given-names>J. A.</given-names></name> <name><surname>Turner</surname> <given-names>B. O.</given-names></name> <name><surname>Ashby</surname> <given-names>F. G.</given-names></name> <name><surname>Poldrack</surname> <given-names>R. A.</given-names></name></person-group> (<year>2012</year>). <article-title>Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses</article-title>. <source>Neuroimage</source> <volume>59</volume>, <fpage>2636</fpage>&#x02013;<lpage>2643</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2011.08.076</pub-id><pub-id pub-id-type="pmid">21924359</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mur</surname> <given-names>M.</given-names></name> <name><surname>Bandettini</surname> <given-names>P. A.</given-names></name> <name><surname>Kriegeskorte</surname> <given-names>N.</given-names></name></person-group> (<year>2009</year>). <article-title>Revealing representational content with pattern-information fMRI&#x02014;an introductory guide</article-title>. <source>Soc. Cogn. Affect. Neurosci.</source> <volume>4</volume>, <fpage>101</fpage>&#x02013;<lpage>109</lpage>. <pub-id pub-id-type="doi">10.1093/scan/nsn044</pub-id><pub-id pub-id-type="pmid">19151374</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Narain</surname> <given-names>C.</given-names></name> <name><surname>Scott</surname> <given-names>S. K.</given-names></name> <name><surname>Wise</surname> <given-names>R. J.</given-names></name> <name><surname>Rosen</surname> <given-names>S.</given-names></name> <name><surname>Leff</surname> <given-names>A.</given-names></name> <name><surname>Iversen</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>Defining a left-lateralized response specific to intelligible speech using fMRI</article-title>. <source>Cereb. Cortex</source> <volume>13</volume>, <fpage>1362</fpage>&#x02013;<lpage>1368</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhg083</pub-id><pub-id pub-id-type="pmid">14615301</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nath</surname> <given-names>A. R.</given-names></name> <name><surname>Beauchamp</surname> <given-names>M. S.</given-names></name></person-group> (<year>2011</year>). <article-title>Dynamic changes in superior temporal sulcus connectivity during perception of noisy audiovisual speech</article-title>. <source>J. Neurosci.</source> <volume>31</volume>, <fpage>1704</fpage>&#x02013;<lpage>1714</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.4853-10.2011</pub-id><pub-id pub-id-type="pmid">21289179</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nath</surname> <given-names>A. R.</given-names></name> <name><surname>Beauchamp</surname> <given-names>M. S.</given-names></name></person-group> (<year>2012</year>). <article-title>A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion</article-title>. <source>Neuroimage</source> <volume>59</volume>, <fpage>781</fpage>&#x02013;<lpage>787</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2011.07.024</pub-id><pub-id pub-id-type="pmid">21787869</pub-id></citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nichols</surname> <given-names>T.</given-names></name> <name><surname>Brett</surname> <given-names>M.</given-names></name> <name><surname>Andersson</surname> <given-names>J.</given-names></name> <name><surname>Wager</surname> <given-names>T.</given-names></name> <name><surname>Poline</surname> <given-names>J.-B.</given-names></name></person-group> (<year>2005</year>). <article-title>Valid conjunction inference with the minimum statistic</article-title>. <source>Neuroimage</source> <volume>25</volume>, <fpage>653</fpage>&#x02013;<lpage>660</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2004.12.005</pub-id><pub-id pub-id-type="pmid">15808966</pub-id></citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nichols</surname> <given-names>T. E.</given-names></name> <name><surname>Holmes</surname> <given-names>A. P.</given-names></name></person-group> (<year>2002</year>). <article-title>Nonparametric permutation tests for functional neuroimaging: a primer with examples</article-title>. <source>Hum. Brain Mapp.</source> <volume>15</volume>, <fpage>1</fpage>&#x02013;<lpage>25</lpage>. <pub-id pub-id-type="doi">10.1002/hbm.1058</pub-id><pub-id pub-id-type="pmid">11747097</pub-id></citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okada</surname> <given-names>K.</given-names></name> <name><surname>Hickok</surname> <given-names>G.</given-names></name></person-group> (<year>2009</year>). <article-title>Two cortical mechanisms support the integration of visual and auditory speech: a hypothesis and preliminary data</article-title>. <source>Neurosci. Lett.</source> <volume>452</volume>, <fpage>219</fpage>&#x02013;<lpage>223</lpage>. <pub-id pub-id-type="doi">10.1016/j.neulet.2009.01.060</pub-id><pub-id pub-id-type="pmid">19348727</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okada</surname> <given-names>K.</given-names></name> <name><surname>Rong</surname> <given-names>F.</given-names></name> <name><surname>Venezia</surname> <given-names>J.</given-names></name> <name><surname>Matchin</surname> <given-names>W.</given-names></name> <name><surname>Hsieh</surname> <given-names>I. H.</given-names></name> <name><surname>Saberi</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech</article-title>. <source>Cereb. Cortex</source> <volume>20</volume>, <fpage>2486</fpage>&#x02013;<lpage>2495</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhp318</pub-id><pub-id pub-id-type="pmid">20100898</pub-id></citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peelle</surname> <given-names>J. E.</given-names></name></person-group> (<year>2012</year>). <article-title>The hemispheric lateralization of speech processing depends on what &#x0201C;speech&#x0201D; is: a hierarchical perspective</article-title>. <source>Front. Hum. Neurosci.</source> <volume>6</volume>:<fpage>309</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2012.00309</pub-id><pub-id pub-id-type="pmid">23162455</pub-id></citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pelphrey</surname> <given-names>K. A.</given-names></name> <name><surname>Morris</surname> <given-names>J. P.</given-names></name> <name><surname>McCarthy</surname> <given-names>G.</given-names></name></person-group> (<year>2004</year>). <article-title>Grasping the intentions of others: the perceived intentionality of an action influences activity in the superior temporal sulcus during social perception</article-title>. <source>J. Cogn. Neurosci.</source> <volume>16</volume>, <fpage>1706</fpage>&#x02013;<lpage>1716</lpage>. <pub-id pub-id-type="doi">10.1162/0898929042947900</pub-id><pub-id pub-id-type="pmid">15701223</pub-id></citation></ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Price</surname> <given-names>C. J.</given-names></name></person-group> (<year>2010</year>). <article-title>The anatomy of language: a review of 100 fMRI studies published in 2009</article-title>. <source>Ann. N Y Acad. Sci.</source> <volume>1191</volume>, <fpage>62</fpage>&#x02013;<lpage>88</lpage>. <pub-id pub-id-type="doi">10.1111/j.1749-6632.2010.05444.x</pub-id><pub-id pub-id-type="pmid">20392276</pub-id></citation></ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Puce</surname> <given-names>A.</given-names></name> <name><surname>Allison</surname> <given-names>T.</given-names></name> <name><surname>Bentin</surname> <given-names>S.</given-names></name> <name><surname>Gore</surname> <given-names>J. C.</given-names></name> <name><surname>McCarthy</surname> <given-names>G.</given-names></name></person-group> (<year>1998</year>). <article-title>Temporal cortex activation in humans viewing eye and mouth movements</article-title>. <source>J. Neurosci.</source> <volume>18</volume>, <fpage>2188</fpage>&#x02013;<lpage>2199</lpage>. <pub-id pub-id-type="pmid">9482803</pub-id></citation></ref>
<ref id="B69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Puce</surname> <given-names>A.</given-names></name> <name><surname>Perrett</surname> <given-names>D.</given-names></name></person-group> (<year>2003</year>). <article-title>Electrophysiology and brain imaging of biological motion</article-title>. <source>Philos. Trans. R. Soc. Lond. B Biol. Sci.</source> <volume>358</volume>, <fpage>435</fpage>&#x02013;<lpage>445</lpage>. <pub-id pub-id-type="doi">10.1098/rstb.2002.1221</pub-id><pub-id pub-id-type="pmid">12689371</pub-id></citation></ref>
<ref id="B70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Puce</surname> <given-names>A.</given-names></name> <name><surname>Syngeniotis</surname> <given-names>A.</given-names></name> <name><surname>Thompson</surname> <given-names>J. C.</given-names></name> <name><surname>Abbott</surname> <given-names>D. F.</given-names></name> <name><surname>Wheaton</surname> <given-names>K. J.</given-names></name> <name><surname>Castiello</surname> <given-names>U.</given-names></name></person-group> (<year>2003</year>). <article-title>The human temporal lobe integrates facial form and motion: evidence from fMRI and ERP studies</article-title>. <source>Neuroimage</source> <volume>19</volume>, <fpage>861</fpage>&#x02013;<lpage>869</lpage>. <pub-id pub-id-type="doi">10.1016/s1053-8119(03)00189-7</pub-id><pub-id pub-id-type="pmid">12880814</pub-id></citation></ref>
<ref id="B71"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rauschecker</surname> <given-names>J. P.</given-names></name> <name><surname>Scott</surname> <given-names>S. K.</given-names></name></person-group> (<year>2009</year>). <article-title>Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing</article-title>. <source>Nat. Neurosci.</source> <volume>12</volume>, <fpage>718</fpage>&#x02013;<lpage>724</lpage>. <pub-id pub-id-type="doi">10.1038/nn.2331</pub-id><pub-id pub-id-type="pmid">19471271</pub-id></citation></ref>
<ref id="B72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rauschecker</surname> <given-names>J. P.</given-names></name> <name><surname>Tian</surname> <given-names>B.</given-names></name> <name><surname>Hauser</surname> <given-names>M.</given-names></name></person-group> (<year>1995</year>). <article-title>Processing of complex sounds in the macaque nonprimary auditory cortex</article-title>. <source>Science</source> <volume>268</volume>, <fpage>111</fpage>&#x02013;<lpage>114</lpage>. <pub-id pub-id-type="doi">10.1126/science.7701330</pub-id><pub-id pub-id-type="pmid">7701330</pub-id></citation></ref>
<ref id="B73"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Reisberg</surname> <given-names>D.</given-names></name> <name><surname>McLean</surname> <given-names>J.</given-names></name> <name><surname>Goldfield</surname> <given-names>A.</given-names></name></person-group> (<year>1987</year>). &#x0201C;<article-title>Easy to hear but hard to understand: a lip-reading advantage with intact auditory stimuli</article-title>,&#x0201D; in <source>Hearing by Eye: The Psychology of Lip-Reading</source>, eds <person-group person-group-type="editor"><name><surname>Dodd</surname> <given-names>B.</given-names></name> <name><surname>Campbell</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Erlbaum</publisher-name>), <fpage>97</fpage>&#x02013;<lpage>114</lpage>.</citation></ref>
<ref id="B74"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rogalsky</surname> <given-names>C.</given-names></name> <name><surname>Hickok</surname> <given-names>G.</given-names></name></person-group> (<year>2009</year>). <article-title>Selective attention to semantic and syntactic features modulates sentence processing networks in anterior temporal cortex</article-title>. <source>Cereb. Cortex</source> <volume>19</volume>, <fpage>786</fpage>&#x02013;<lpage>796</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhn126</pub-id><pub-id pub-id-type="pmid">18669589</pub-id></citation></ref>
<ref id="B75"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Said</surname> <given-names>C. P.</given-names></name> <name><surname>Moore</surname> <given-names>C. D.</given-names></name> <name><surname>Engell</surname> <given-names>A. D.</given-names></name> <name><surname>Todorov</surname> <given-names>A.</given-names></name> <name><surname>Haxby</surname> <given-names>J. V.</given-names></name></person-group> (<year>2010</year>). <article-title>Distributed representations of dynamic facial expressions in the superior temporal sulcus</article-title>. <source>J. Vis.</source> <volume>10</volume>:<fpage>11</fpage>. <pub-id pub-id-type="doi">10.1167/10.5.11</pub-id><pub-id pub-id-type="pmid">20616141</pub-id></citation></ref>
<ref id="B76"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scott</surname> <given-names>S. K.</given-names></name> <name><surname>Blank</surname> <given-names>C. C.</given-names></name> <name><surname>Rosen</surname> <given-names>S.</given-names></name> <name><surname>Wise</surname> <given-names>R. J.</given-names></name></person-group> (<year>2000</year>). <article-title>Identification of a pathway for intelligible speech in the left temporal lobe</article-title>. <source>Brain</source> <volume>123</volume>, <fpage>2400</fpage>&#x02013;<lpage>2406</lpage>. <pub-id pub-id-type="doi">10.1093/brain/123.12.2400</pub-id><pub-id pub-id-type="pmid">11099443</pub-id></citation></ref>
<ref id="B77"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scott</surname> <given-names>S. K.</given-names></name> <name><surname>Rosen</surname> <given-names>S.</given-names></name> <name><surname>Lang</surname> <given-names>H.</given-names></name> <name><surname>Wise</surname> <given-names>R. J. S.</given-names></name></person-group> (<year>2006</year>). <article-title>Neural correlates of intelligibility in speech investigated with noise vocoded speech&#x02014;a positron emission tomography study</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>120</volume>, <fpage>1075</fpage>&#x02013;<lpage>1083</lpage>. <pub-id pub-id-type="doi">10.1121/1.2216725</pub-id><pub-id pub-id-type="pmid">16938993</pub-id></citation></ref>
<ref id="B78"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sekiyama</surname> <given-names>K.</given-names></name> <name><surname>Kanno</surname> <given-names>I.</given-names></name> <name><surname>Miura</surname> <given-names>S.</given-names></name> <name><surname>Sugita</surname> <given-names>Y.</given-names></name></person-group> (<year>2003</year>). <article-title>Auditory-visual speech perception examined by fMRI and PET</article-title>. <source>Neurosci. Res.</source> <volume>47</volume>, <fpage>277</fpage>&#x02013;<lpage>287</lpage>. <pub-id pub-id-type="doi">10.1016/s0168-0102(03)00214-1</pub-id><pub-id pub-id-type="pmid">14568109</pub-id></citation></ref>
<ref id="B79"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seltzer</surname> <given-names>B.</given-names></name> <name><surname>Pandya</surname> <given-names>D. N.</given-names></name></person-group> (<year>1978</year>). <article-title>Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey</article-title>. <source>Brain Res.</source> <volume>149</volume>, <fpage>1</fpage>&#x02013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1016/0006-8993(78)90584-x</pub-id><pub-id pub-id-type="pmid">418850</pub-id></citation></ref>
<ref id="B80"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seltzer</surname> <given-names>B.</given-names></name> <name><surname>Pandya</surname> <given-names>D. N.</given-names></name></person-group> (<year>1994</year>). <article-title>Parietal, temporal and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey: a retrograde tracer study</article-title>. <source>J. Comp. Neurol.</source> <volume>343</volume>, <fpage>445</fpage>&#x02013;<lpage>463</lpage>. <pub-id pub-id-type="doi">10.1002/cne.903430308</pub-id><pub-id pub-id-type="pmid">8027452</pub-id></citation></ref>
<ref id="B81"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Skipper</surname> <given-names>J. I.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name> <name><surname>Nusbaum</surname> <given-names>H. C.</given-names></name> <name><surname>Small</surname> <given-names>S. L.</given-names></name></person-group> (<year>2007</year>). <article-title>Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception</article-title>. <source>Cereb. Cortex</source> <volume>17</volume>, <fpage>2387</fpage>&#x02013;<lpage>2399</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhl147</pub-id><pub-id pub-id-type="pmid">17218482</pub-id></citation></ref>
<ref id="B82"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Specht</surname> <given-names>K.</given-names></name></person-group> (<year>2013</year>). <article-title>Mapping a lateralization gradient within the ventral stream for auditory speech perception</article-title>. <source>Front. Hum. Neurosci.</source> <volume>7</volume>:<fpage>629</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2013.00629</pub-id><pub-id pub-id-type="pmid">24106470</pub-id></citation></ref>
<ref id="B84"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Specht</surname> <given-names>K.</given-names></name> <name><surname>Osnes</surname> <given-names>B.</given-names></name> <name><surname>Hugdahl</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>Detection of differential speech specific processes in the temporal lobe using fMRI and a dynamic &#x0201C;sound morphing&#x0201D; technique</article-title>. <source>Hum. Brain Mapp.</source> <volume>30</volume>, <fpage>3436</fpage>&#x02013;<lpage>3444</lpage>. <pub-id pub-id-type="doi">10.1002/hbm.20768</pub-id><pub-id pub-id-type="pmid">19347876</pub-id></citation></ref>
<ref id="B83"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Specht</surname> <given-names>K.</given-names></name> <name><surname>Reul</surname> <given-names>J.</given-names></name></person-group> (<year>2003</year>). <article-title>Functional segregation of the temporal lobes into highly differentiated subsystems for auditory perception: an auditory rapid event-related fMRI-task</article-title>. <source>Neuroimage</source> <volume>20</volume>, <fpage>1944</fpage>&#x02013;<lpage>1954</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2003.07.034</pub-id><pub-id pub-id-type="pmid">14683700</pub-id></citation></ref>
<ref id="B85"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevenson</surname> <given-names>R. A.</given-names></name> <name><surname>Altieri</surname> <given-names>N. A.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>James</surname> <given-names>T. W.</given-names></name></person-group> (<year>2010</year>). <article-title>Neural processing of asynchronous audiovisual speech perception</article-title>. <source>Neuroimage</source> <volume>49</volume>, <fpage>3308</fpage>&#x02013;<lpage>3318</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2009.12.001</pub-id><pub-id pub-id-type="pmid">20004723</pub-id></citation></ref>
<ref id="B87"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevenson</surname> <given-names>R. A.</given-names></name> <name><surname>Ghose</surname> <given-names>D.</given-names></name> <name><surname>Fister</surname> <given-names>J. K.</given-names></name> <name><surname>Sarko</surname> <given-names>D. K.</given-names></name> <name><surname>Altieri</surname> <given-names>N. A.</given-names></name> <name><surname>Nidiffer</surname> <given-names>A. R.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Identifying and quantifying multisensory integration: a tutorial review</article-title>. <source>Brain Topogr.</source> <volume>27</volume>, <fpage>707</fpage>&#x02013;<lpage>730</lpage>. <pub-id pub-id-type="doi">10.1007/s10548-014-0365-7</pub-id><pub-id pub-id-type="pmid">24722880</pub-id></citation></ref>
<ref id="B86"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevenson</surname> <given-names>R. A.</given-names></name> <name><surname>James</surname> <given-names>T. W.</given-names></name></person-group> (<year>2009</year>). <article-title>Audiovisual integration in human superior temporal sulcus: inverse effectiveness and the neural processing of speech and object recognition</article-title>. <source>Neuroimage</source> <volume>44</volume>, <fpage>1210</fpage>&#x02013;<lpage>1223</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2008.09.034</pub-id><pub-id pub-id-type="pmid">18973818</pub-id></citation></ref>
<ref id="B88"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevenson</surname> <given-names>R. A.</given-names></name> <name><surname>VanDerKlok</surname> <given-names>R. M.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>James</surname> <given-names>T. W.</given-names></name></person-group> (<year>2011</year>). <article-title>Discrete neural substrates underlie complementary audiovisual speech integration processes</article-title>. <source>Neuroimage</source> <volume>55</volume>, <fpage>1339</fpage>&#x02013;<lpage>1345</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.12.063</pub-id><pub-id pub-id-type="pmid">21195198</pub-id></citation></ref>
<ref id="B89"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stoppelman</surname> <given-names>N.</given-names></name> <name><surname>Harpaz</surname> <given-names>T.</given-names></name> <name><surname>Ben Shachar</surname> <given-names>M.</given-names></name></person-group> (<year>2013</year>). <article-title>Do not throw out the baby with the bath water: choosing an effective baseline for a functional localizer of speech processing</article-title>. <source>Brain Behav.</source> <volume>3</volume>, <fpage>211</fpage>&#x02013;<lpage>222</lpage>. <pub-id pub-id-type="doi">10.1002/brb3.129</pub-id><pub-id pub-id-type="pmid">23785653</pub-id></citation></ref>
<ref id="B90"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sumby</surname> <given-names>W. H.</given-names></name> <name><surname>Pollack</surname> <given-names>I.</given-names></name></person-group> (<year>1954</year>). <article-title>Visual contribution to speech intelligibility in noise</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>26</volume>, <fpage>212</fpage>&#x02013;<lpage>215</lpage>. <pub-id pub-id-type="doi">10.1121/1.1907309</pub-id></citation></ref>
<ref id="B91"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Szycik</surname> <given-names>G. R.</given-names></name> <name><surname>Tausche</surname> <given-names>P.</given-names></name> <name><surname>M&#x000FC;nte</surname> <given-names>T. F.</given-names></name></person-group> (<year>2008</year>). <article-title>A novel approach to study audiovisual integration in speech perception: localizer fMRI and sparse sampling</article-title>. <source>Brain Res.</source> <volume>1220</volume>, <fpage>142</fpage>&#x02013;<lpage>149</lpage>. <pub-id pub-id-type="doi">10.1016/j.brainres.2007.08.027</pub-id><pub-id pub-id-type="pmid">17880929</pub-id></citation></ref>
<ref id="B92"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Turk-Browne</surname> <given-names>N. B.</given-names></name> <name><surname>Norman-Haignere</surname> <given-names>S. V.</given-names></name> <name><surname>McCarthy</surname> <given-names>G.</given-names></name></person-group> (<year>2010</year>). <article-title>Face-specific resting functional connectivity between the fusiform gyrus and posterior superior temporal sulcus</article-title>. <source>Front. Hum. Neurosci.</source> <volume>4</volume>:<fpage>176</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2010.00176</pub-id><pub-id pub-id-type="pmid">21151362</pub-id></citation></ref>
<ref id="B93"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vander Wyk</surname> <given-names>B. C.</given-names></name> <name><surname>Hudac</surname> <given-names>C. M.</given-names></name> <name><surname>Carter</surname> <given-names>E. J.</given-names></name> <name><surname>Sobel</surname> <given-names>D. M.</given-names></name> <name><surname>Pelphrey</surname> <given-names>K. A.</given-names></name></person-group> (<year>2009</year>). <article-title>Action understanding in the superior temporal sulcus region</article-title>. <source>Psychol. Sci.</source> <volume>20</volume>, <fpage>771</fpage>&#x02013;<lpage>777</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9280.2009.02359.x</pub-id><pub-id pub-id-type="pmid">19422619</pub-id></citation></ref>
<ref id="B94"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Vapnik</surname> <given-names>V. N.</given-names></name></person-group> (<year>1999</year>). <source>The Nature of Statistical Learning Theory.</source> <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer-Verlag New York</publisher-name>.</citation></ref>
<ref id="B95"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Venezia</surname> <given-names>J. H.</given-names></name> <name><surname>Matchin</surname> <given-names>W.</given-names></name> <name><surname>Hickok</surname> <given-names>G.</given-names></name></person-group> (<year>2015</year>). &#x0201C;<article-title>Multisensory integration and audiovisual speech perception</article-title>,&#x0201D; in <source>Brain Mapping: An Encyclopedic Reference</source>, (Vol 2) ed. <person-group person-group-type="editor"><name><surname>Toga</surname> <given-names>A. W.</given-names></name></person-group> (<publisher-loc>Elsevier</publisher-loc>: <publisher-name>Academic Press</publisher-name>), <fpage>565</fpage>&#x02013;<lpage>572</lpage>.</citation></ref>
<ref id="B96"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wessinger</surname> <given-names>C.</given-names></name> <name><surname>VanMeter</surname> <given-names>J.</given-names></name> <name><surname>Tian</surname> <given-names>B.</given-names></name> <name><surname>Van Lare</surname> <given-names>J.</given-names></name> <name><surname>Pekar</surname> <given-names>J.</given-names></name> <name><surname>Rauschecker</surname> <given-names>J.</given-names></name></person-group> (<year>2001</year>). <article-title>Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance imaging</article-title>. <source>J. Cogn. Neurosci.</source> <volume>13</volume>, <fpage>1</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1162/089892901564108</pub-id><pub-id pub-id-type="pmid">11224904</pub-id></citation></ref>
<ref id="B97"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wright</surname> <given-names>T. M.</given-names></name> <name><surname>Pelphrey</surname> <given-names>K. A.</given-names></name> <name><surname>Allison</surname> <given-names>T.</given-names></name> <name><surname>McKeown</surname> <given-names>M. J.</given-names></name> <name><surname>McCarthy</surname> <given-names>G.</given-names></name></person-group> (<year>2003</year>). <article-title>Polysensory interactions along lateral temporal regions evoked by audiovisual speech</article-title>. <source>Cereb. Cortex</source> <volume>13</volume>, <fpage>1034</fpage>&#x02013;<lpage>1043</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/13.10.1034</pub-id><pub-id pub-id-type="pmid">12967920</pub-id></citation></ref>
<ref id="B98"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yarkoni</surname> <given-names>T.</given-names></name> <name><surname>Poldrack</surname> <given-names>R. A.</given-names></name> <name><surname>Nichols</surname> <given-names>T. E.</given-names></name> <name><surname>Van Essen</surname> <given-names>D. C.</given-names></name> <name><surname>Wager</surname> <given-names>T. D.</given-names></name></person-group> (<year>2011</year>). <article-title>Large-scale automated synthesis of human functional neuroimaging data</article-title>. <source>Nat. Methods</source> <volume>8</volume>, <fpage>665</fpage>&#x02013;<lpage>670</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.1635</pub-id><pub-id pub-id-type="pmid">21706013</pub-id></citation></ref>
<ref id="B99"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Tian</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Lee</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>Intrinsically organized network for face perception during the resting state</article-title>. <source>Neurosci. Lett.</source> <volume>454</volume>, <fpage>1</fpage>&#x02013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1016/j.neulet.2009.02.054</pub-id><pub-id pub-id-type="pmid">19429043</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>We will use the terms &#x0201C;audiovisual&#x0201D;, &#x0201C;multisensory&#x0201D; and &#x0201C;multimodal&#x0201D; to describe brain regions that respond to more than one sensory modality. We will not test directly for multisensory <italic>interactions</italic> using criteria established in the animal electrophysiological literature (see Stevenson et al., <xref ref-type="bibr" rid="B87">2014</xref>; Venezia et al., <xref ref-type="bibr" rid="B95">2015</xref>). Therefore, our use of these terms will not strictly distinguish between regions that respond to two different unimodal signals and regions that <italic>prefer</italic> multimodal stimulation.</p></fn>
<fn id="fn0002"><p><sup>2</sup><ext-link ext-link-type="uri" xlink:href="http://www.vislab.ucl.ac.uk/cogent_2000.php">http://www.vislab.ucl.ac.uk/cogent_2000.php</ext-link></p></fn>
<fn id="fn0003"><p><sup>3</sup><ext-link ext-link-type="uri" xlink:href="http://afni.nimh.nih.gov/afni">http://afni.nimh.nih.gov/afni</ext-link></p></fn>
</fn-group>
</back>
</article>