<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Hum. Neurosci.</journal-id>
<journal-title>Frontiers in Human Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Hum. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5161</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnhum.2023.1225976</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Hearing, seeing, and feeling speech: the neurophysiological correlates of trimodal speech perception</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Hansmann</surname> <given-names>Doreen</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2318829/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Derrick</surname> <given-names>Donald</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/134392/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Theys</surname> <given-names>Catherine</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2245510/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Psychology, Speech and Hearing, University of Canterbury</institution>, <addr-line>Christchurch</addr-line>, <country>New Zealand</country></aff>
<aff id="aff2"><sup>2</sup><institution>New Zealand Institute of Language, Brain and Behaviour, University of Canterbury</institution>, <addr-line>Christchurch</addr-line>, <country>New Zealand</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: James W. Dias, Medical University of South Carolina, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Koichi Yokosawa, Hokkaido University, Japan; Josko Soda, University of Split, Croatia</p></fn>
<corresp id="c001">&#x002A;Correspondence: Doreen Hansmann, <email>doreen.hansmann@canterbury.ac.nz</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>29</day>
<month>08</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>17</volume>
<elocation-id>1225976</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>05</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>08</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Hansmann, Derrick and Theys.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Hansmann, Derrick and Theys</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<sec>
<title>Introduction</title>
<p>To perceive speech, our brains process information from different sensory modalities. Previous electroencephalography (EEG) research has established that audio-visual information provides an advantage compared to auditory-only information during early auditory processing. In addition, behavioral research showed that auditory speech perception is not only enhanced by visual information but also by tactile information, transmitted by puffs of air arriving at the skin and aligned with speech. The current EEG study aimed to investigate whether the behavioral benefits of bimodal audio-aerotactile and trimodal audio-visual-aerotactile speech presentation are reflected in cortical auditory event-related neurophysiological responses.</p>
</sec>
<sec>
<title>Methods</title>
<p>To examine the influence of multimodal information on speech perception, 20 listeners performed a two-alternative forced-choice syllable identification task at three different signal-to-noise ratios.</p>
</sec>
<sec>
<title>Results</title>
<p>Behavioral results showed increased syllable identification accuracy when auditory information was complemented with visual information, but did not show the same effect for the addition of tactile information. Similarly, EEG results showed an amplitude suppression for the auditory N1 and P2 event-related potentials for the audio-visual and audio-visual-aerotactile modalities compared to auditory and audio-aerotactile presentations of the syllable /pa/. No statistically significant difference was present between the audio-aerotactile and auditory-only modalities.</p>
</sec>
<sec>
<title>Discussion</title>
<p>Current findings are consistent with past EEG research showing a visually induced amplitude suppression during early auditory processing. In addition, the significant neurophysiological effect of audio-visual but not audio-aerotactile presentation is in line with the large benefit of visual information but comparatively much smaller effect of aerotactile information on auditory speech perception previously identified in behavioral research.</p>
</sec>
</abstract>
<kwd-group>
<kwd>audio-visual speech perception</kwd>
<kwd>audio-tactile speech perception</kwd>
<kwd>trimodal speech perception</kwd>
<kwd>multisensory integration</kwd>
<kwd>EEG</kwd>
<kwd>auditory evoked potentials</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="48"/>
<page-count count="11"/>
<word-count count="7798"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Speech and Language</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>1. Introduction</title>
<p>Speech perception is a multimodal process. Listeners rely not only on the auditory signal but also on the visual information provided by the speaker&#x2019;s articulatory movements, especially when the acoustic signal is degraded. <xref ref-type="bibr" rid="B41">Sumby and Pollack (1954)</xref> laid the groundwork for this knowledge with their behavioral study. By presenting words in different noise environments, in auditory-only and audio-visual conditions, they demonstrated that as the acoustic signal became more degraded, the audio-visual condition led to improved word intelligibility compared to the auditory-only condition. Further groundbreaking evidence of audio-visual interaction was provided by <xref ref-type="bibr" rid="B32">McGurk and MacDonald (1976)</xref>, who showed that presenting mismatching stimuli (e.g., auditory /ba/ and visual /ga/) resulted in perception of a fused response /da/ (i.e., the &#x201C;McGurk effect&#x201D;). The interference of visual information with auditory perception provided further evidence for the integration of bimodal information during speech perception. Since these seminal works, the findings have been extensively replicated and expanded upon (for a review see <xref ref-type="bibr" rid="B31">Mallick et al., 2015</xref>).</p>
<p>In addition to the well-established influence of audio-visual presentation on speech perception, audio-tactile influences have also been demonstrated. Early behavioral evidence showed that feeling a speaker&#x2019;s facial movements led to an enhancement of speech perception in deafblind perceivers (<xref ref-type="bibr" rid="B1">Alcorn, 1932</xref>) as well as trained healthy listeners (<xref ref-type="bibr" rid="B37">Reed et al., 1978</xref>, <xref ref-type="bibr" rid="B36">1982</xref>). Further support for tactile influences on auditory speech perception comes from studies using vibro-tactile systems in deaf as well as normal-hearing adults (e.g., <xref ref-type="bibr" rid="B14">De Filippo, 1984</xref>; <xref ref-type="bibr" rid="B48">Weisenberger, 1989</xref>; <xref ref-type="bibr" rid="B8">Bernstein et al., 1991</xref>; <xref ref-type="bibr" rid="B47">Waldstein and Boothroyd, 1995</xref>). However, all these studies required extended participant training and used stimuli that are not representative of sensory information typically available during face-to-face interactions.</p>
<p>Evidence of audio-tactile integration in untrained subjects comes from behavioral studies investigating the effect of small air puffs (i.e., aerotactile information) on the listener&#x2019;s skin (<xref ref-type="bibr" rid="B23">Gick and Derrick, 2009</xref>; <xref ref-type="bibr" rid="B24">Gick et al., 2010</xref>; <xref ref-type="bibr" rid="B17">Derrick and Gick, 2013</xref>). The air puff mimics a phonetic feature of specific speech sounds, distinguishing between high (e.g., /pa/) and low stop-release air flow (e.g., /ba/). By putting a hand in front of their face while saying /pa/, a native English speaker will experience a very noticeable burst of air. Doing the same with /ba/, they will experience low to unnoticeable air flow instead. <xref ref-type="bibr" rid="B23">Gick and Derrick (2009)</xref> applied air puffs to participants&#x2019; skin (either on their neck or hand) simultaneously with a degraded auditory signal. Their results showed that the presence of an air puff, irrespective of body location, enhanced correct identification of aspirated stimuli (e.g., /pa/) but also interfered with correct identification of unaspirated stimuli (e.g., /ba/). These findings demonstrate that aerotactile information can interact with the auditory signal in a similar way to visual information in two-alternative forced-choice experiments; however, this effect has not been observed in more complex tasks (<xref ref-type="bibr" rid="B19">Derrick et al., 2019a</xref>).</p>
<p>Interestingly, aerotactile information can also influence speech perception in the absence of auditory signals. In a study on visual-aerotactile speech perception, participants watched silent videos of a person saying /ba/ or /pa/, either alone or with air puffs applied to the skin (<xref ref-type="bibr" rid="B10">Bicevskis et al., 2016</xref>). Participants were more likely to identify a syllable as /pa/ when air puffs were present, demonstrating the integration of aerotactile information with visual information when an auditory signal is absent. This finding further confirms the multisensory nature of speech perception, with different bimodal sensory cues being integrated during the perception process.</p>
<p>Together, these studies demonstrated that speech perception can be a bimodal audio-visual and audio-tactile process, with some support for visuo-tactile speech perception. <xref ref-type="bibr" rid="B20">Derrick et al. (2019b)</xref> extended these findings to trimodal audio-visual-tactile integration in speech perception. In a two-way forced-choice auditory syllable-in-noise classification task (/pa/ or /ga/), both visual and aerotactile information altered the signal-to-noise ratio (SNR) threshold for accurate identification of auditory signals. However, the strength of the effect of each modality differed. The visual component had a strong influence on auditory syllable-in-noise identification, resulting in a 28.0 dB improvement in SNR between matching and mismatching visual stimulus presentations. In comparison, the tactile component had a much smaller but significant influence, leading to a 1.6 dB SNR decrease in required auditory clarity. The results also showed that the three modalities provided additive influences. Visual and tactile information combined had a stronger influence on auditory speech perception than visual information alone, and the latter had a stronger influence than tactile stimuli alone. These findings demonstrate the simultaneous influence of both visual and tactile signals on auditory speech perception, illustrating a truly multimodal effect on behavioral responses.</p>
<p>To gain a better understanding of the processes contributing to multimodal speech perception, behavioral findings have been complemented by studies on neurophysiological processing. In an early EEG study, <xref ref-type="bibr" rid="B45">van Wassenhove et al. (2005)</xref> investigated the influence of audio-visual information on the N1 (negative peak approximately 100 ms following stimulus presentation) and P2 (positive peak approximately 200 ms following stimulus presentation) early cortical auditory event-related potentials (ERPs). In a three-alternative forced choice task, three syllables differing in visual saliency (/pa/, /ka/, and /ta/) were presented in matching and mismatching auditory, visual, and audio-visual conditions. Results showed that latencies of the auditory N1/P2 were reduced for audio-visual signals compared to the auditory-only condition, indicating faster auditory processing. The degree of visual saliency interacted with the temporal facilitation, with stronger visual predictors resulting in faster onset latencies (<italic>p</italic> &#x003C; <italic>t</italic> &#x003C; <italic>k</italic>). These findings were replicated in later studies (e.g., <xref ref-type="bibr" rid="B3">Arnal et al., 2009</xref>; <xref ref-type="bibr" rid="B33">Paris et al., 2016</xref>). <xref ref-type="bibr" rid="B45">van Wassenhove et al. (2005)</xref> also observed reduced N1/P2 amplitudes for audio-visual signals compared to auditory-only ones, independent of the saliency of the visual stimuli. They suggested that the N1/P2 suppression is independent of the featural content of speech input but rather reflects a more global bimodal integration process during which the preceding visual input leads to deactivation of the auditory cortices (<xref ref-type="bibr" rid="B45">van Wassenhove et al., 2005</xref>; see also <xref ref-type="bibr" rid="B3">Arnal et al., 2009</xref>; <xref ref-type="bibr" rid="B7">Baart et al., 2014</xref>).</p>
<p>A reduced and earlier N1/P2 complex in audio-visual compared to auditory-only conditions can also be observed for non-speech stimuli (e.g., handclapping; <xref ref-type="bibr" rid="B40">Stekelenburg and Vroomen, 2007</xref>; <xref ref-type="bibr" rid="B46">Vroomen and Stekelenburg, 2010</xref>). However, N1 suppression was only observed when visual information preceded and reliably predicted sound onset for both non-speech and speech stimuli, indicating that audio-visual N1 suppression depends on anticipatory visual information. In addition, bimodal integration can be observed not only for well-known or familiar perceptual experiences but also for audio-visual stimuli associated with less daily life experience (<xref ref-type="bibr" rid="B33">Paris et al., 2016</xref>; <xref ref-type="bibr" rid="B44">Treille et al., 2018</xref>). For example, <xref ref-type="bibr" rid="B44">Treille et al. (2018)</xref> used a facial view of lip movements or a sagittal view of tongue movements as visual stimuli. Both stimulus types interacted with the auditory speech signal, resulting in reduced P2 amplitude and latency in both audio-visual conditions compared to the auditory-only condition. This finding suggests that prior associative audio-visual experience is not necessary for bimodal interaction, and that dynamic and phonetic informational cues are sharable across modalities by relying on the listener&#x2019;s implicit knowledge of speech production (<xref ref-type="bibr" rid="B44">Treille et al., 2018</xref>).</p>
<p>In contrast to studies on bimodal audio-visual processing, EEG studies focusing on audio-tactile effects are scarce. <xref ref-type="bibr" rid="B42">Treille et al. (2014a)</xref> compared the bimodal effects of audio-visual and audio-haptic speech perception. Participants, inexperienced with audio-haptic speech perception, were seated in front of the experimenter and had to keep their eyes closed while placing their right hand on the experimenter&#x2019;s lips and cheek to feel the speech gestures. In a two-alternative forced-choice task, two syllables (/pa/ or /ta/) were presented auditorily, visually, and/or haptically. In line with previous research, the N1 amplitude was attenuated and its latency reduced during audio-visual compared to auditory-only speech perception. Tactile information also led to a speeding up of N1 in audio-haptic compared to auditory-only speech perception, indicating that tactile information can also accelerate this early auditory processing. As with visual information, articulatory movements and therefore tactile information precede the onset of the acoustic signal, which may lead to a speeding-up of N1 due to constraints put on subsequent auditory processing (<xref ref-type="bibr" rid="B45">van Wassenhove et al., 2005</xref>; <xref ref-type="bibr" rid="B42">Treille et al., 2014a</xref>). In a follow-up study, <xref ref-type="bibr" rid="B43">Treille et al. (2014b)</xref> also reported a haptically induced N1 amplitude suppression. However, this finding was stimulus dependent (i.e., /pa/ but not /ta/ and /ka/ syllables), possibly because the stronger saliency of the bilabial rounding movements for /pa/ conveyed a stronger predictive signal to facilitate the onset of the auditory event (<xref ref-type="bibr" rid="B40">Stekelenburg and Vroomen, 2007</xref>; <xref ref-type="bibr" rid="B46">Vroomen and Stekelenburg, 2010</xref>). <xref ref-type="bibr" rid="B43">Treille et al.
(2014b)</xref> also reported shorter P2 latencies in the audio-visual and audio-haptic compared to the auditory-only condition. This latency effect was independent of stimulus type or the degree of visual saliency, differing from earlier findings (<xref ref-type="bibr" rid="B45">van Wassenhove et al., 2005</xref>). The authors argued that differences between experimental settings and natural stimulus variability may be possible reasons for the different results.</p>
<p>Taken together, the neurophysiological findings showed that the auditory N1/P2 complex is modulated by information from different sensory modalities. While the evidence for audio-visual integration dominates, the limited audio-haptic findings indicate that tactile information is integrated in a similar manner to visual information, by contributing predictive information about the incoming auditory event. It is important to note that although the audio-haptic experience is less familiar or less natural than the audio-visual one, N1/P2 modulations could be observed, suggesting that prior associative sensory experience is not needed for a noticeable audio-tactile interaction during early auditory processing. However, these findings have not yet been extended to more natural aerotactile stimuli. In addition, behavioral research showed a trimodal auditory-visual-tactile signal processing advantage compared to uni- and bi-modal speech stimuli (<xref ref-type="bibr" rid="B20">Derrick et al., 2019b</xref>), but we do not yet understand how the brain integrates all three modalities together. The current EEG study therefore aimed to identify whether (1) congruent audio-aerotactile speech signals led to neurophysiological processing advantages compared to auditory-only presentation, and (2) trimodal audio-visual-aerotactile presentation of speech led to additional auditory processing enhancement beyond bimodally presented information. We hypothesized that we would observe decreased amplitudes of the auditory N1 and P2 ERPs during matching audio-visual (AV) and audio-aerotactile (AT) stimulus presentation compared to auditory-only (A) stimuli, as well as an additional decrease in amplitude for trimodal signals (AVT) compared to bimodal speech stimuli. Based on behavioral findings (<xref ref-type="bibr" rid="B20">Derrick et al., 2019b</xref>), it was expected that the tactile effect would be smaller than the visual effect (i.e., AVT &#x003C; AV &#x003C; AT &#x003C; A).</p>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>2. Materials and methods</title>
<sec id="S2.SS1">
<title>2.1. Participants</title>
<p>Twenty adult New Zealand English speakers (3 males, 17 females, <italic>M</italic> = 23 years, <italic>SD</italic> = 4.8) were recruited. Participants completed a demographic information sheet and underwent an audiological screening. Pure tone audiometry testing was carried out for frequencies of 500 Hz, 1 kHz, 2 kHz, and 4 kHz using an Interacoustics AS608 screening audiometer. Average pure tone thresholds were calculated, and if the threshold was less than or equal to 25 dB hearing level, hearing sensitivity was considered within the normal range. None of the participants had a history of neurological disease or visual, speech, language, or hearing problems. Participants received a &#x0024;20 voucher as compensation for their time. The study was approved by the University of Canterbury&#x2019;s Human Ethics Committee (HEC 2017/23/LR-PS) and participants provided written informed consent.</p>
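The screening criterion above (pass if the average pure-tone threshold is at or below 25 dB hearing level) can be sketched as a small check. This is a hypothetical helper for illustration only, not the authors' clinical procedure; the function and variable names are assumptions:

```python
# Minimal sketch of the pure-tone-average (PTA) screening criterion described
# above. Thresholds are in dB HL at 500 Hz, 1 kHz, 2 kHz, and 4 kHz.
# Names are illustrative, not from the study.

def passes_screening(thresholds_db_hl, cutoff_db_hl=25.0):
    """Return True if the average pure-tone threshold is <= cutoff (normal range)."""
    pta = sum(thresholds_db_hl) / len(thresholds_db_hl)
    return pta <= cutoff_db_hl

# Example: thresholds at 500 Hz, 1 kHz, 2 kHz, 4 kHz; average 17.5 dB HL
print(passes_screening([10, 15, 20, 25]))  # True
```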
</sec>
<sec id="S2.SS2">
<title>2.2. Stimuli</title>
<sec id="S2.SS2.SSS1">
<title>2.2.1. Recording of stimuli</title>
<p>The stimuli in this experiment are a subset of the stimuli from <xref ref-type="bibr" rid="B19">Derrick et al. (2019a)</xref>. One female speaker, producing forty tokens of /pa/ and /ga/ each, was recorded in a sound-attenuated room with a professional lighting setup. The video was recorded on a Sony MediaPro PMW-EX3 video camera set to record with the MPEG2 HD35 HL codec, with a resolution of 1920 by 1080 pixels (16:9 aspect ratio), a frame rate of 25 frames per second (fps), and a hardware-synched linear pulse-code-modulation (LPCM) 16-bit stereo audio recording at 48,000 Hz. The video was then converted to a time-preserving H.264 codec in yuv420p format encapsulated in an MP4 package, with audio extracted using FFMPEG (<xref ref-type="bibr" rid="B21">FFmpeg Developers, 2016</xref>). The audio was segmented in Praat (<xref ref-type="bibr" rid="B11">Boersma and Weenink, 2023</xref>), and the authors jointly selected ten recordings of each syllable that matched in duration, intensity, fundamental frequency, and phonation. In addition, the facial motion of each token was inspected to eliminate any case of eye-blink or noticeably distinguishable head motion. The video showed the complete face of the speaker, in frontal view from above the neck to the top of the head (see <xref ref-type="fig" rid="F1">Figure 1</xref>).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Screenshots of /pa/ and /ga/ video stimuli for the visual (V) conditions, and the blurred and still lower face for the auditory-only (A) condition. Reprinted with permission from <xref ref-type="bibr" rid="B20">Derrick et al. (2019b)</xref>, Acoustical Society of America (ASA).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-1225976-g001.tif"/>
</fig>
</sec>
<sec id="S2.SS2.SSS2">
<title>2.2.2. Creation of stimuli</title>
<p>The ten /pa/ and ten /ga/ tokens were sorted by length to form the closest duration-matched pairs. Software was written in R (<xref ref-type="bibr" rid="B35">R Development Core Team, 2018</xref>), WarbleR (<xref ref-type="bibr" rid="B2">Araya-Salas and Smith-Vidaurre, 2017</xref>), FFmpeg (<xref ref-type="bibr" rid="B21">FFmpeg Developers, 2016</xref>), and the Bourne Again Shell (BASH). The software took the timing of each video file and extracted the video with a 750 ms lead time (prior to motion onset) and a 500 ms follow time. For each video stimulus, it produced a version with right-channel audio from the original and left-channel audio that was either empty (for no air flow stimuli) or contained an 80 ms 12 kHz maximum intensity sine wave used to operate our custom air flow system. In addition to the audio-visual (AV) condition, for each video, a version was produced with a blurred and still lower face for the auditory-only (A) condition (see <xref ref-type="fig" rid="F1">Figure 1</xref>). To generate speech noise, the recordings of the speech tokens were randomly superimposed 10,000 times within a 10 s looped sound file using an automated process written in R (<xref ref-type="bibr" rid="B35">R Development Core Team, 2018</xref>), WarbleR (<xref ref-type="bibr" rid="B2">Araya-Salas and Smith-Vidaurre, 2017</xref>), and FFmpeg (<xref ref-type="bibr" rid="B21">FFmpeg Developers, 2016</xref>). Noise created using this method has a spectrum that is nearly identical to the long-term spectrum of the speech tokens from that speaker (<xref ref-type="bibr" rid="B39">Smits et al., 2004</xref>; <xref ref-type="bibr" rid="B26">Jansen et al., 2010</xref>). This type of noise has similar efficacy regardless of the volume at which it is presented, allowing for effective application of the signal-to-noise ratios used in this experiment. The software then overlaid the right-channel audio with speech noise, creating a video file for each token with SNRs of &#x2013;8, &#x2013;14, and &#x2013;20 dB. The volume of the stimuli was kept constant at &#x223C;60 dB by adjusting the level of the noise (<xref ref-type="bibr" rid="B20">Derrick et al., 2019b</xref>), ensuring that each token was of similar maximum amplitude for maximum comfort during the experiments.</p>
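The noise-generation and SNR-mixing steps described above can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' R/WarbleR/FFmpeg pipeline; all function and parameter names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility (illustrative choice)

def speech_shaped_noise(tokens, fs, duration_s=10.0, n_overlays=10_000):
    """Superimpose speech tokens at random offsets in a looped buffer, as described
    above, yielding noise whose spectrum approximates the speaker's long-term
    speech spectrum. `tokens` is a list of 1-D sample arrays."""
    noise = np.zeros(int(duration_s * fs))
    for _ in range(n_overlays):
        tok = tokens[rng.integers(len(tokens))]
        start = rng.integers(len(noise))
        idx = (start + np.arange(len(tok))) % len(noise)  # wrap around: looped file
        noise[idx] += tok
    return noise / np.max(np.abs(noise))  # normalize to full scale

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that 10*log10(P_speech / P_noise) equals `snr_db`, mix."""
    n = noise[: len(speech)]
    gain = np.sqrt(np.mean(speech**2) / (np.mean(n**2) * 10 ** (snr_db / 10)))
    return speech + gain * n
```

Mixing each token at gains derived this way yields the &#x2013;8, &#x2013;14, and &#x2013;20 dB SNR conditions while the speech level itself stays fixed.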
</sec>
<sec id="S2.SS2.SSS3">
<title>2.2.3. Presentation of stimuli</title>
<p>The experiment consisted of five conditions: auditory-only, visual-only, audio-visual, audio-tactile, and audio-visual-tactile. Bimodal and multimodal conditions were only presented in a congruent context (e.g., air flow only for /pa/ tokens but not for /ga/ tokens), leading to eight different types of stimuli across the five conditions. The visual-only condition was presented with constant white noise at &#x223C;60 dB, while the other conditions were presented at three different SNRs (&#x2013;8, &#x2013;14, and &#x2013;20 dB). For each item, 50 trials were presented, leading to a total of 1,000 stimulus presentations over the entire EEG recording session (see <xref ref-type="table" rid="T1">Table 1</xref>).</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Type and number of trials per modality and SNR (total of 1,000 trials presented over 10 blocks).</p></caption>
<table cellspacing="5" cellpadding="5" frame="box" rules="all">
<thead>
<tr>
<td valign="top" align="left" style="color:#ffffff;background-color: #7f8080;">Modalities/SNRs</td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;">&#x2212;8 dB</td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;">&#x2212;14 dB</td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;">&#x2212;20 dB</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="4" style="background-color: #dcdcdc;"><bold>Audio only (A)</bold></td>
</tr>
<tr>
<td valign="top" align="left">pa</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left">ga</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left" colspan="4" style="background-color: #dcdcdc;"><bold>Audio-visual (AV)</bold></td>
</tr>
<tr>
<td valign="top" align="left">pa</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left">ga</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left" colspan="4" style="background-color: #dcdcdc;"><bold>Audio-tactile (AT)</bold></td>
</tr>
<tr>
<td valign="top" align="left">pa</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left" colspan="4" style="background-color: #dcdcdc;"><bold>Audio-visual-tactile (AVT)</bold></td>
</tr>
<tr>
<td valign="top" align="left">pa</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color: #dcdcdc;"><bold>Visual only (V)</bold></td>
<td valign="top" align="center" colspan="3" style="background-color: #dcdcdc;"><bold>Noise</bold></td>
</tr>
<tr>
<td valign="top" align="left">pa</td>
<td valign="top" align="center" colspan="3">50</td>
</tr>
<tr>
<td valign="top" align="left">ga</td>
<td valign="top" align="center" colspan="3">50</td>
</tr>
</tbody>
</table></table-wrap>
<p>During the auditory-only condition, the video of the speaker was presented with a blurred rectangle covering the lower face and articulatory movements (see <xref ref-type="fig" rid="F1">Figure 1</xref>). For the bi- and trimodal conditions, the auditory, tactile, and visual signals were presented simultaneously (<xref ref-type="bibr" rid="B15">Derrick et al., 2009</xref>). <xref ref-type="fig" rid="F2">Figure 2</xref> shows a schematic overview of the experimental setup. Sound was presented through EARtone 3A Insert Headphones in both ears at &#x223C;60 dB, simultaneous with the relevant video. Visual stimuli were displayed on a computer screen placed 1 m in front of the participant. The tactile signal involved a slight, inaudible, cutaneous air flow presented at the suprasternal notch of the participant via a Murata piezoelectric air pump positioned 3 cm in front of the participant. The 80 ms 12 kHz sine wave was used to operate our air flow production system (<xref ref-type="bibr" rid="B16">Derrick and De Rybel, 2015</xref>). The air flow system uses a Murata microblower, a 20 mm &#x00D7; 20 mm &#x00D7; 1.85 mm piezoelectric air pump with up to 0.8 L/min of flow, a maximum pressure of 19.38 cmH<sub>2</sub>O, and an approximately 30 ms 5&#x2013;95% intensity rise time, allowing artificial approximation of the continuously varying air flow in speech (<xref ref-type="bibr" rid="B16">Derrick and De Rybel, 2015</xref>). The sine wave in the left channel turns on the air flow system at full capacity, generating its highest air flow with a duration within the range of the voice onset time of a word-onset velar voiceless stop (/ga/), and at the long end of length for that of a labial voiceless stop (/pa/) (<xref ref-type="bibr" rid="B29">Lisker and Abramson, 1967</xref>).</p>
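The stereo layout described above, speech in the right channel and an 80 ms 12 kHz full-scale control tone in the left channel to trigger the air pump, can be sketched as follows. This is a hypothetical NumPy illustration, not the authors' software; the function names and the onset parameter are assumptions:

```python
import numpy as np

FS = 48_000  # Hz, matching the audio sample rate reported above

def make_control_tone(duration_s=0.080, freq_hz=12_000, fs=FS):
    """80 ms maximum-intensity 12 kHz sine wave used to switch on the air pump."""
    t = np.arange(int(duration_s * fs)) / fs
    return np.sin(2 * np.pi * freq_hz * t)

def build_stereo_stimulus(speech, airflow, onset_sample=0, fs=FS):
    """Right channel: speech audio. Left channel: pump control tone, or silence
    for no-airflow stimuli. Returns an (n_samples, 2) array [left, right]."""
    left = np.zeros_like(speech)
    if airflow:
        tone = make_control_tone(fs=fs)
        end = min(onset_sample + len(tone), len(left))
        left[onset_sample:end] = tone[: end - onset_sample]
    return np.stack([left, speech], axis=1)
```

Keeping the control signal in a spare audio channel lets the pump onset stay sample-accurate relative to the speech, which matters when tactile and auditory signals must be simultaneous.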
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Experimental setup.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-1225976-g002.tif"/>
</fig>
</sec>
</sec>
<sec id="S2.SS3">
<title>2.3. EEG recording and procedure</title>
<p>Participants were given a two-alternative forced-choice (2AFC) task. They were told that they might perceive some noise and puffs of air along with the speech syllables they needed to identify. EEG data were continuously recorded using a 64-channel BioSemi ActiveTwo system (BioSemi B.V., Amsterdam, Netherlands). Data were sampled at 250 Hz. Electrooculograms (EOG) were recorded from individual electrodes placed next to the left and right eye for horizontal eye movements and above and below the left eye for vertical eye movements. Two additional electrodes were placed on the left and right ear lobes for off-line referencing. During the EEG recording, participants were seated in a soundproof booth. The presentation of trials was controlled by the E-Prime 2.0 software (E-Studio; version 2.0, Psychology Software Tools, Inc., Pittsburgh, PA, USA). Each trial began with the presentation of a black fixation cross in the middle of a light gray background, which was shown for 120 ms. Then a stimulus from one of the five conditions (A, V, AV, AT, and AVT) was presented for 2,000 ms. A 320 ms pause followed in order to avoid possible interference between speech identification and the motor response induced by the participants&#x2019; button press (<xref ref-type="bibr" rid="B22">Ganesh et al., 2014</xref>). Then a question mark appeared on the screen and participants were required to categorize each syllable by pressing one of two buttons corresponding to /pa/ or /ga/ (counterbalanced between subjects) on the response box, with no time limit. After participants selected an answer, they were shown a blank screen during the inter-trial interval, which was randomized between 800 and 1,000 ms.</p>
<p>Prior to the actual experiment, which consisted of 10 blocks of 100 trials each, participants were given three practice items. Trials within blocks were randomized across modalities, but stimuli from the same modality were not presented more than twice in a row. The order of blocks was randomized for each participant. After each block, participants took a 2- to 3-min rest and continued the experiment by pressing the spacebar. The experiment lasted approximately 1.5 h, including subject preparation, explanations, and pauses between blocks.</p>
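<p>The within-block randomization constraint described above (no more than two consecutive stimuli from the same modality) can be sketched as a rejection-sampling shuffle. This is an illustrative reconstruction in Python, not the E-Prime procedure actually used; the function names are hypothetical.</p>

```python
import random

def has_long_run(seq, max_run):
    """True if any value repeats more than `max_run` times in a row."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False

def constrained_shuffle(trials, max_run=2, rng=None):
    """Randomize trial order, re-shuffling until no modality label
    appears more than `max_run` times consecutively (rejection sampling)."""
    rng = rng or random.Random()
    order = list(trials)
    while True:
        rng.shuffle(order)
        if not has_long_run(order, max_run):
            return order
```

<p>Rejection sampling keeps every valid ordering equally likely, at the cost of re-shuffling; with 100 trials spread over a handful of modalities, a valid order is typically found after a modest number of re-shuffles.</p>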
</sec>
<sec id="S2.SS4">
<title>2.4. Data analyses</title>
<p>Three participants were excluded from data analysis due to technical problems during the EEG recording. For the remaining 17 participants, offline data analysis was conducted using BrainVision Analyzer 2.0 (<xref ref-type="bibr" rid="B12">Brain Products GmbH, 2021</xref>, M&#x00FC;nchen, Germany). Data were re-referenced to the averaged voltage of the two earlobe electrodes and band-pass filtered from 0.1 to 30 Hz (slope: 24 dB/octave). The signal was then segmented into 1,100-ms-long epochs, from 100 ms pre-stimulus to 1,000 ms post-stimulus. Only trials with correct responses were included in further EEG analyses. A semi-automated routine with additional visual inspection was used to exclude epochs that contained artifacts (voltages exceeding &#x00B1;100 &#x03BC;V at any channel). The mean artifact rate was 11%. Epochs were baseline corrected using the EEG data from &#x2212;100 to 0 ms relative to stimulus onset. Averaged ERPs for the A, AV, AT, and AVT conditions were calculated for each participant. The Cz electrode was used for ERP analysis, as in previous reports (<xref ref-type="bibr" rid="B7">Baart et al., 2014</xref>, <xref ref-type="bibr" rid="B5">2017</xref>; <xref ref-type="bibr" rid="B4">Baart, 2016</xref>). Based on visual inspection of the grand average waveforms, a time window from 100 to 250 ms was selected that encompassed the auditory N1 and P2 components. The averaged EEG activity was extracted from three 50 ms time bins (e.g., <xref ref-type="bibr" rid="B38">Schepers et al., 2013</xref>; <xref ref-type="bibr" rid="B6">Baart and Samuel, 2015</xref>). Multimodal integration effects on N1 and P2 were analyzed by comparing the A, AV, AT, and AVT responses (<xref ref-type="bibr" rid="B4">Baart, 2016</xref>). Statistical analyses were carried out using a three-way repeated measures ANOVA for each syllable type (/ga/ and /pa/) with time window, modality, and SNR as within-subject factors. An alpha level of 0.05 was used to determine statistical significance, and Bonferroni&#x2019;s correction was applied in further <italic>post hoc</italic> analyses.</p>
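<p>The segmentation, baseline-correction, artifact-rejection, and time-bin parameters described above can be summarized in a short NumPy sketch. The study used BrainVision Analyzer; the code below is an illustrative reconstruction with hypothetical array and function names, assuming continuous data in microvolts at the 250 Hz sampling rate of the recording.</p>

```python
import numpy as np

FS = 250                     # sampling rate of the recording (Hz)
PRE_MS, POST_MS = 100, 1000  # epoch window around stimulus onset (ms)
THRESH_UV = 100.0            # artifact criterion: +/-100 microvolts

def epoch_and_clean(eeg, onsets):
    """eeg: (n_channels, n_samples) continuous data in microvolts.
    onsets: stimulus-onset sample indices (correct trials only).
    Returns baseline-corrected, artifact-free epochs with shape
    (n_kept, n_channels, n_times) plus the indices of kept trials."""
    pre = PRE_MS * FS // 1000    # 25 samples before onset
    post = POST_MS * FS // 1000  # 250 samples after onset
    epochs, kept = [], []
    for i, t in enumerate(onsets):
        if t - pre < 0 or t + post > eeg.shape[1]:
            continue  # epoch would run past the recording edges
        ep = eeg[:, t - pre:t + post].astype(float)
        # Reject the whole epoch if any channel exceeds +/-100 microvolts.
        if np.abs(ep).max() > THRESH_UV:
            continue
        # Baseline correction: subtract the mean of the -100 to 0 ms window.
        ep -= ep[:, :pre].mean(axis=1, keepdims=True)
        epochs.append(ep)
        kept.append(i)
    return np.array(epochs), kept

def bin_means(erp, bins=((100, 150), (150, 200), (200, 250))):
    """Mean amplitude of an averaged waveform (epoch starting at -100 ms)
    within 50 ms bins specified in ms post-stimulus."""
    out = []
    for lo, hi in bins:
        a = (PRE_MS + lo) * FS // 1000
        b = (PRE_MS + hi) * FS // 1000
        out.append(erp[a:b].mean())
    return out
```

<p>Averaging the kept epochs per condition and applying <monospace>bin_means</monospace> to the Cz waveform yields the three 50 ms mean amplitudes entered into the ANOVA.</p>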
<p>Behavioral data were recorded during the EEG experiment in the form of accuracy data. A two-way repeated measures ANOVA for each syllable type (/pa/ and /ga/) was conducted with modality and SNR as within-subject factors. Greenhouse&#x2013;Geisser correction was applied whenever sphericity was violated, and Bonferroni&#x2019;s correction was used in further <italic>post hoc</italic> analyses. Significance was defined at the <italic>p</italic> &#x003C; 0.05 level.</p>
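<p>For illustration, the Greenhouse&#x2013;Geisser epsilon used to correct the degrees of freedom can be estimated from the covariance of the repeated-measures levels. The sketch below implements the standard estimator and is not the statistics package actually used; epsilon equals 1 when sphericity holds and has a lower bound of 1/(<italic>k</italic> &#x2212; 1).</p>

```python
import numpy as np

def greenhouse_geisser_epsilon(data):
    """Greenhouse-Geisser epsilon for a one-way repeated measures design.
    data: (n_subjects, k) array, one column per within-subject level.
    The corrected degrees of freedom are epsilon*(k - 1) and
    epsilon*(k - 1)*(n_subjects - 1)."""
    k = data.shape[1]
    S = np.cov(data, rowvar=False)        # k x k covariance of the levels
    C = np.eye(k) - np.ones((k, k)) / k   # centering matrix
    Sc = C @ S @ C                        # double-centered covariance
    return np.trace(Sc) ** 2 / ((k - 1) * np.trace(Sc @ Sc))
```

<p>Multiplying the nominal degrees of freedom by this epsilon produces the fractional degrees of freedom reported with the <italic>F</italic> statistics in the Results.</p>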
</sec>
</sec>
<sec id="S3" sec-type="results">
<title>3. Results</title>
<sec id="S3.SS1">
<title>3.1. Behavioral data</title>
<sec id="S3.SS1.SSS1">
<title>3.1.1. Accuracy /pa/ syllable</title>
<p>The number of correct trials in each condition is reported in <xref ref-type="fig" rid="F3">Figure 3</xref>. The ANOVA showed a main effect for both <italic>SNR</italic> [<italic>F</italic><sub>(1.18,18.94)</sub> = 37.33, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.70] and <italic>modality</italic> [<italic>F</italic><sub>(1.89,30.28)</sub> = 17.29, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.52], with syllables in the AV and AVT conditions identified more accurately than in the A-only (all <italic>post hoc</italic> analyses <italic>p</italic> &#x003C; 0.001) and AT conditions (all <italic>post hoc</italic> analyses <italic>p</italic> &#x003C; 0.001). The interaction between <italic>SNR</italic> and <italic>modality</italic> was also significant [<italic>F</italic><sub>(2.29,36.65)</sub> = 15.05, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.49], revealing modality effects at the &#x2212;14 and &#x2212;20 dB SNR levels but not at &#x2212;8 dB. At &#x2212;14 dB SNR, listeners responded more accurately to AV and AVT trials than to A (<italic>post hoc</italic> analyses <italic>p</italic> = 0.015 and <italic>p</italic> &#x003C; 0.01, respectively) and AT trials (<italic>post hoc</italic> analyses <italic>p</italic> = 0.003 and <italic>p</italic> &#x003C; 0.001, respectively). The same pattern was observed at the &#x2212;20 dB level, with AV and AVT identified more accurately than A only (all <italic>post hoc</italic> analyses <italic>p</italic> &#x003C; 0.001) and AT (all <italic>post hoc</italic> analyses <italic>p</italic> = 0.002). No difference in accuracy was found between A and AT at &#x2212;14 and &#x2212;20 dB (<italic>post hoc</italic> analyses <italic>p</italic> = 1.0 and <italic>p</italic> = 0.61, respectively).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Accuracy data for syllable /pa/ for auditory-only (A), audio-visual (AV), audio-tactile (AT), and audio-visual-tactile (AVT) conditions at each SNR level (&#x2013;8, &#x2013;14, &#x2013;20 dB). Error bars represent 95% binomial confidence intervals.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-1225976-g003.tif"/>
</fig>
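<p>The 95% binomial confidence intervals shown in the error bars can be computed in several ways; the Wilson score interval sketched below is one common choice and is an assumption here, as the exact method is not specified.</p>

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion
    (z = 1.96 gives a 95% interval)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

<p>For example, 45 correct responses out of 50 trials give an interval of approximately (0.79, 0.96); the interval is asymmetric around the observed proportion and always stays within [0, 1], unlike the normal approximation.</p>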
</sec>
<sec id="S3.SS1.SSS2">
<title>3.1.2. Accuracy /ga/ syllable</title>
<p>The ANOVA showed a main effect for both <italic>SNR</italic> [<italic>F</italic><sub>(1.09,17.53)</sub> = 39.91, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.71] and <italic>modality</italic> [<italic>F</italic><sub>(1,16)</sub> = 50.74, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.76], with listeners identifying trials more accurately in the AV than in the A condition. The ANOVA further revealed an interaction effect between <italic>SNR</italic> and <italic>modality</italic> [<italic>F</italic><sub>(1.17,18.79)</sub> = 44.63, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.74], indicating a difference between modalities only at the &#x2212;20 dB level, with listeners responding more accurately when additional visual information was present (<italic>post hoc</italic> analysis <italic>p</italic> &#x003C; 0.001; see <xref ref-type="fig" rid="F4">Figure 4</xref>).</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Accuracy data for syllable /ga/ for auditory-only (A) and audio-visual (AV) conditions at each SNR level (&#x2013;8, &#x2013;14, &#x2013;20 dB). Error bars represent 95% binomial confidence intervals.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-1225976-g004.tif"/>
</fig>
</sec>
</sec>
<sec id="S3.SS2">
<title>3.2. ERP data</title>
<p><xref ref-type="fig" rid="F5">Figure 5</xref> shows the grand averaged responses obtained for each modality. Visual inspection indicated a reduced amplitude of the audio-visual and audio-visual-tactile N1 and P2 auditory ERPs compared to the auditory-only condition, which was confirmed by the statistical analyses. Visual inspection also indicated a reduced N1 amplitude in the audio-tactile compared to the auditory-only condition for /pa/. However, this difference was not statistically significant.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Grand-average of auditory evoked potentials for /pa/ and /ga/ syllables at Cz in different modalities illustrating the N1/P2 effects. <bold>(A)</bold> ERPs for /pa/ in auditory-only (A), audio-tactile (AT), audio-visual (AV), and audio-visual-tactile (AVT) conditions. <bold>(B)</bold> ERPs for /ga/ during auditory-only (A) and audio-visual (AV) conditions.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-1225976-g005.tif"/>
</fig>
<sec id="S3.SS2.SSS1">
<title>3.2.1. 100&#x2013;250 ms time window: /pa/ syllable</title>
<p>The ANOVA showed a significant main effect for <italic>time window</italic> [<italic>F</italic><sub>(2,32)</sub> = 68.13, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.81], and an interaction with <italic>modality</italic> [<italic>F</italic><sub>(6,96)</sub> = 13.86, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.46]. No other statistically significant main effects or interactions were present. <italic>Post hoc</italic> comparisons testing the four modalities (A, AV, AT, AVT) against each other within each time window showed a main effect of <italic>modality</italic> in the 100&#x2013;150 ms time window, <italic>F</italic><sub>(3,150)</sub> = 5.25, <italic>p</italic> = 0.002, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.09, and in the 200&#x2013;250 ms time window, <italic>F</italic><sub>(3,150)</sub> = 12.79, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.20. In the 100&#x2013;150 ms time window, A (<italic>M</italic> = &#x2212;3.99 &#x03BC;V, <italic>SD</italic> = 2.95) showed a more negative (i.e., larger) amplitude than AV (<italic>M</italic> = &#x2212;3.15 &#x03BC;V, <italic>SD</italic> = 3.14) and AVT (<italic>M</italic> = &#x2212;3.14 &#x03BC;V, <italic>SD</italic> = 3.21). No significant difference was found for A (<italic>M</italic> = &#x2212;3.99 &#x03BC;V, <italic>SD</italic> = 2.95) compared to AT (<italic>M</italic> = &#x2212;3.49 &#x03BC;V, <italic>SD</italic> = 3.01; <italic>p</italic> = 0.17). In the 200&#x2013;250 ms time window, A (<italic>M</italic> = 6.30 &#x03BC;V, <italic>SD</italic> = 3.10) showed a more positive (i.e., larger) amplitude than AV (<italic>M</italic> = 4.78 &#x03BC;V, <italic>SD</italic> = 2.93) and AVT (<italic>M</italic> = 4.69 &#x03BC;V, <italic>SD</italic> = 3.17). 
In addition, AT (<italic>M</italic> = 6.19 &#x03BC;V, <italic>SD</italic> = 2.86) resulted in a more positive amplitude than AV (<italic>M</italic> = 4.78 &#x03BC;V, <italic>SD</italic> = 2.93) and AVT (<italic>M</italic> = 4.69 &#x03BC;V, <italic>SD</italic> = 3.17). No significant difference was found for A (<italic>M</italic> = 6.30 &#x03BC;V, <italic>SD</italic> = 3.10) compared to AT (<italic>M</italic> = 6.19 &#x03BC;V, <italic>SD</italic> = 2.86) in this time window (<italic>p</italic> = 1.0). Similarly, no significant differences were identified in the 150&#x2013;200 ms time window [<italic>F</italic><sub>(3,150)</sub> = 0.75, <italic>p</italic> = 0.52, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.01].</p>
</sec>
<sec id="S3.SS2.SSS2">
<title>3.2.2. 100&#x2013;250 ms time window: /ga/ syllable</title>
<p>The ANOVA revealed a significant main effect for <italic>time window</italic> [<italic>F</italic><sub>(2,32)</sub> = 60.46, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.79], and an interaction with <italic>modality</italic> [<italic>F</italic><sub>(2,32)</sub> = 11.22, <italic>p</italic> &#x003C; 0.001, &#x03B7;<italic><sub><italic>p</italic></sub></italic><sup>2</sup> = 0.41]. No other significant main effects or interactions were found. <italic>Post hoc</italic> pairwise comparisons showed only a significantly reduced amplitude for AV (<italic>M</italic> = 4.89 &#x03BC;V, <italic>SD</italic> = 3.02) compared to A (<italic>M</italic> = 6.57 &#x03BC;V, <italic>SD</italic> = 3.18) in the 200&#x2013;250 ms time window, <italic>t</italic>(50) = 4.34, <italic>p</italic> &#x003C; 0.001.</p>
</sec>
</sec>
</sec>
<sec id="S4" sec-type="discussion">
<title>4. Discussion</title>
<p>Previous EEG research showed that speech perception is a multimodal process, demonstrating a neurophysiological processing advantage for bimodal audio-visual and audio-haptic signals over auditory-only ones (<xref ref-type="bibr" rid="B45">van Wassenhove et al., 2005</xref>; <xref ref-type="bibr" rid="B34">Pilling, 2009</xref>; <xref ref-type="bibr" rid="B42">Treille et al., 2014a</xref>,<xref ref-type="bibr" rid="B43">b</xref>). The present study aimed to expand previous work by investigating bimodal audio-aerotactile as well as trimodal audio-visuo-aerotactile integration in speech perception. Given that this study is the first to investigate the influence of aerotactile and trimodal stimuli, we used a basic EEG paradigm with a two-alternative forced-choice identification task, presenting two syllables (/pa/ and /ga/) at three different noise levels.</p>
<p>The main findings of our study showed that presenting congruent visual information led to significant amplitude reductions compared to auditory-only information, whereas the presentation of aerotactile speech signals did not lead to similar neurophysiological processing advantages in the auditory N1/P2 complex. Comparison of the trimodal AVT with the AV condition further confirmed the negative result related to the presentation of air puffs. Our results therefore did not confirm our hypothesis of a decrease in amplitudes with the addition of modalities (i.e., AVT &#x003C; AV &#x003C; AT &#x003C; A). This is consistent with the behavioral /pa/ findings at &#x2212;20 dB SNR in the current study, showing no significant difference between the audio-tactile and auditory-only conditions, while the increase in accuracy in the two conditions with additional visual information was statistically significant. These findings are in line with previous behavioral trimodal speech perception results that showed a very small, albeit statistically significant, effect of aerotactile information compared to a strong influence of visual information on auditory syllable-in-noise identification (<xref ref-type="bibr" rid="B20">Derrick et al., 2019b</xref>).</p>
<p>As auditory, visual, and aerotactile speech outputs share information in optimal environments, the benefits of multimodal presentation become evident when one of the signals is degraded (e.g., <xref ref-type="bibr" rid="B41">Sumby and Pollack, 1954</xref>; <xref ref-type="bibr" rid="B23">Gick and Derrick, 2009</xref>). The relatively small behavioral effect of aerotactile information in previous studies (<xref ref-type="bibr" rid="B19">Derrick et al., 2019a</xref>,<xref ref-type="bibr" rid="B20">b</xref>) and the absence of significant behavioral and neurophysiological effects of this type of information in the current study suggest that aerotactile stimuli may only significantly enhance speech perception when there is no redundancy between this signal and all other information available.</p>
<p>The non-significant effect of the aerotactile information can further be interpreted in the context of existing EEG research investigating hand-to-face speech perception. <xref ref-type="bibr" rid="B43">Treille et al. (2014b)</xref> compared AV and audio-haptic (AH) modalities and reported a reduced N1 amplitude in both the AV and AH modalities compared to the A modality. Importantly, the attenuation of N1 in the AH modality was restricted to the syllable /pa/ (not /ta/ or /ka/) due to the dependency of the N1 amplitude on the temporal relationship of sensory input. Due to the haptic saliency of the bilabial closing movements involved in /pa/, this syllable was more reliable in predicting the sound onset than /ta/ and /ka/ (<xref ref-type="bibr" rid="B43">Treille et al., 2014b</xref>). In the present study, the air puff and auditory event were perceived simultaneously (<xref ref-type="bibr" rid="B15">Derrick et al., 2009</xref>), so no anticipatory cue was available in the tactile signal, unlike in the haptically perceived /pa/. This could explain the absence of an N1 amplitude attenuation for the aerotactile signal and would be in line with past AV studies showing that N1 suppression depends on the leading visual signal and how well it predicts the onset of the auditory signal (<xref ref-type="bibr" rid="B40">Stekelenburg and Vroomen, 2007</xref>; <xref ref-type="bibr" rid="B46">Vroomen and Stekelenburg, 2010</xref>; <xref ref-type="bibr" rid="B7">Baart et al., 2014</xref>). It is worth noting that in the grand-average data (<xref ref-type="fig" rid="F5">Figure 5</xref>) the tactile component does appear to have a visible effect on the N1 component of the auditory evoked potential. Although the visual effect is stronger, tactile-induced suppression could indicate a potential processing advantage of AT over A, which would imply integration of tactile information regardless of its effectiveness in predicting sound onset (<xref ref-type="bibr" rid="B45">van Wassenhove et al., 2005</xref>). However, this remains highly speculative at this stage and requires further investigation.</p>
<p>Our AV compared to A results are consistent with previous studies reporting decreased auditory-evoked N1 and P2 ERPs when visual information accompanied the acoustic signal (<xref ref-type="bibr" rid="B28">Klucharev et al., 2003</xref>; <xref ref-type="bibr" rid="B9">Besle et al., 2004</xref>; <xref ref-type="bibr" rid="B45">van Wassenhove et al., 2005</xref>; <xref ref-type="bibr" rid="B13">Brunelli&#x00E8;re et al., 2013</xref>), confirming that visual information evoked an amplitude suppression during early auditory processing. For /pa/, AV signals yielded less negative ERPs in the 100&#x2013;150 ms window (i.e., N1) and less positive ERPs in the 200&#x2013;250 ms window (i.e., P2) compared to A signals. For /ga/ the AV compared to A amplitude reduction was only observed in the 200&#x2013;250 ms window. A similar finding has been reported by <xref ref-type="bibr" rid="B7">Baart et al. (2014)</xref>. Using stimuli that started with an alveolar instead of labial place of articulation, they observed an attenuation of the auditory P2 amplitude but not of the N1 amplitude. As the N1 amplitude is sensitive to a temporal relationship between visual and auditory signals (<xref ref-type="bibr" rid="B40">Stekelenburg and Vroomen, 2007</xref>; <xref ref-type="bibr" rid="B46">Vroomen and Stekelenburg, 2010</xref>), N1 suppression is dependent on anticipatory visual motion and its prediction of sound onset. Based on these findings, <xref ref-type="bibr" rid="B7">Baart et al. (2014)</xref> suggested that the temporal prediction of sound onset in alveolar stimuli may be less effective compared to stimuli that start with a labial place of articulation, resulting in a lack of N1 amplitude reduction. As /ga/ has a velar place of articulation, the absence of an attenuated N1 could be attributed to a less effective visual signal in the prediction of sound onset compared to the labial /pa/. 
Of note, however, is that other studies have reported N1 amplitude reductions for both labial and velar stimuli (<xref ref-type="bibr" rid="B45">van Wassenhove et al., 2005</xref>; <xref ref-type="bibr" rid="B13">Brunelli&#x00E8;re et al., 2013</xref>; <xref ref-type="bibr" rid="B43">Treille et al., 2014b</xref>). Inconsistencies of N1 and P2 effects across studies have been attributed to different factors, including variability in experimental tasks and associated cognitive load, experimental settings, data processing and analyses [see <xref ref-type="bibr" rid="B4">Baart (2016)</xref> for review and meta-analysis].</p>
<p>Our behavioral results showed a significant interaction effect between modality and SNR level, similar to previous behavioral research showing increased reliance on the visual signal as speech becomes more degraded (e.g., <xref ref-type="bibr" rid="B41">Sumby and Pollack, 1954</xref>; <xref ref-type="bibr" rid="B20">Derrick et al., 2019b</xref>). In contrast, the EEG results did not show the same interaction effect. The lack of a significant effect of varying noise levels in the current study could be attributed to the use of a simple two-alternative forced-choice speech identification task. Visual information seemed to dominate prediction of the incoming syllable no matter how much noise obscured the auditory signal. Similar findings have been reported previously. For example, <xref ref-type="bibr" rid="B25">Gilbert et al. (2012)</xref> also used a two-alternative forced-choice task (/ba/ vs. /ga/) in four different acoustic environments (quiet, 0 dB SNR, &#x2013;9 dB SNR, &#x2013;18 dB SNR) and did not find an effect of SNR on audio-visual speech perception. Using a larger set of syllables would prevent participants from relying on an elimination strategy and should be considered in future experiments (<xref ref-type="bibr" rid="B30">Liu et al., 2013</xref>).</p>
<p>While the airflow pump used in the current study has successfully been used in previous bi- and tri-modal behavioral aerotactile research (<xref ref-type="bibr" rid="B19">Derrick et al., 2019a</xref>,<xref ref-type="bibr" rid="B20">b</xref>), a limitation of this artificial air flow pump is that it produces the same pressure (max 1.5 kPa) as speech but only one twelfth of the air flow (0.8 l/min) that speech normally generates (11.1 l/min) (<xref ref-type="bibr" rid="B19">Derrick et al., 2019a</xref>). To compensate for the lower air flow, the system was placed close to the skin (3 cm from the participant), but it covers a smaller area of skin than a speaker&#x2019;s breath does from a close speaking distance. This smaller impact area may reduce the response of fast-adapting type II mechanoreceptors (Pacinian corpuscles), which require only 0.01 mm of skin indentation to respond but must be stimulated over an area about the size of a hand (<xref ref-type="bibr" rid="B27">Johnson, 2002</xref>). In addition, recent work on speech air flow shows that air flow patterns in speech are produced with much more widely varying penetration speeds than previously recognized (see <xref ref-type="bibr" rid="B18">Derrick et al., 2022</xref>). Future studies with state-of-the-art air flow systems may be able to address these factors.</p>
<p>In order to achieve our goals without placing an excessive burden on participants (&#x003E;1.5 h of EEG recording), we kept the paradigm as simple as possible. This entailed using a limited number of trials to represent each stimulus (50 trials per modality per SNR level), which is a limitation. Future studies using a larger number of trials or a more complex paradigm could provide more insights into the relevance of aerotactile information during early auditory processing in speech perception. Future studies may also consider the analysis of visual-only and tactile-only modalities to test predictions in an additive model framework (e.g., AT-T &#x2260; A).</p>
</sec>
<sec id="S5" sec-type="conclusion">
<title>5. Conclusion</title>
<p>We reported the first EEG study to investigate the effect of bimodal audio-aerotactile and trimodal audio-visuo-aerotactile information on early auditory processing. Our findings provided support for the facilitation of auditory processing following presentation of congruent visual information. Our results did not confirm our hypothesis of an additional beneficial effect of aerotactile information on the neurophysiological processing of auditory signals. Together, the present findings confirm the large benefit of visual information on auditory speech perception in noise and are in line with the comparatively much smaller effect of aerotactile information identified in previous behavioral research.</p>
</sec>
<sec id="S6" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="S7" sec-type="ethics-statement">
<title>Ethics statement</title>
<p>The studies involving humans were approved by the Human Ethics Committee, University of Canterbury, Christchurch, New Zealand. The study was conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.</p>
</sec>
<sec id="S8" sec-type="author-contributions">
<title>Author contributions</title>
<p>DH was mainly responsible for data collection, data analysis, and manuscript writing. DD and CT wrote sections of the manuscript. DH and DD were mainly responsible for the production of figures and the table. All authors contributed to the conception and design of the study and manuscript revision, and read and approved the submitted version.</p>
</sec>
</body>
<back>
<sec id="S9" sec-type="funding-information">
<title>Funding</title>
<p>This research was partially supported by the MBIE fund (awarded to DD) and UC Marsden Seed Funding (awarded to CT and DD).</p>
</sec>
<ack><p>We would like to thank our participants for their involvement in this research. Special thanks to John Chrisstoffels (University of Canterbury School of Fine Arts) for his cinematography, to Claire Elliott for providing our stimuli, and to research assistant Ruth Chiam.</p>
</ack>
<sec id="S10" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="S11" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alcorn</surname> <given-names>S.</given-names></name></person-group> (<year>1932</year>). <article-title>The Tadoma method.</article-title> <source><italic>Volta Rev.</italic></source> <volume>34</volume> <fpage>195</fpage>&#x2013;<lpage>198</lpage>.</citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Araya-Salas</surname> <given-names>M.</given-names></name> <name><surname>Smith-Vidaurre</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>warbleR: An r package to streamline analysis of animal acoustic signals.</article-title> <source><italic>Methods Ecol. Evol.</italic></source> <volume>8</volume> <fpage>184</fpage>&#x2013;<lpage>191</lpage>. <pub-id pub-id-type="doi">10.1111/2041-210X.12624</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arnal</surname> <given-names>L. H.</given-names></name> <name><surname>Morillon</surname> <given-names>B.</given-names></name> <name><surname>Kell</surname> <given-names>C. A.</given-names></name> <name><surname>Giraud</surname> <given-names>A.-L.</given-names></name></person-group> (<year>2009</year>). <article-title>Dual neural routing of visual facilitation in speech processing.</article-title> <source><italic>J. Neurosci.</italic></source> <volume>29</volume> <fpage>13445</fpage>&#x2013;<lpage>13453</lpage>. <pub-id pub-id-type="doi">10.1523/jneurosci.3194-09.2009</pub-id> <pub-id pub-id-type="pmid">19864557</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baart</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>Quantifying lip-read-induced suppression and facilitation of the auditory N1 and P2 reveals peak enhancements and delays.</article-title> <source><italic>Psychophysiology</italic></source> <volume>53</volume> <fpage>1295</fpage>&#x2013;<lpage>1306</lpage>. <pub-id pub-id-type="doi">10.1111/psyp.12683</pub-id> <pub-id pub-id-type="pmid">27295181</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baart</surname> <given-names>M.</given-names></name> <name><surname>Lindborg</surname> <given-names>A.</given-names></name> <name><surname>Andersen</surname> <given-names>T. S.</given-names></name></person-group> (<year>2017</year>). <article-title>Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception.</article-title> <source><italic>Eur. J. Neurosci.</italic></source> <volume>46</volume> <fpage>2578</fpage>&#x2013;<lpage>2583</lpage>. <pub-id pub-id-type="doi">10.1111/ejn.13734</pub-id> <pub-id pub-id-type="pmid">28976045</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baart</surname> <given-names>M.</given-names></name> <name><surname>Samuel</surname> <given-names>A. G.</given-names></name></person-group> (<year>2015</year>). <article-title>Turning a blind eye to the lexicon: ERPs show no cross-talk between lip-read and lexical context during speech sound processing.</article-title> <source><italic>J. Mem. Lang.</italic></source> <volume>85</volume> <fpage>42</fpage>&#x2013;<lpage>59</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2015.06.008</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baart</surname> <given-names>M.</given-names></name> <name><surname>Stekelenburg</surname> <given-names>J. J.</given-names></name> <name><surname>Vroomen</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Electrophysiological evidence for speech-specific audiovisual integration.</article-title> <source><italic>Neuropsychologia</italic></source> <volume>53</volume> <fpage>115</fpage>&#x2013;<lpage>121</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuropsychologia.2013.11.011</pub-id> <pub-id pub-id-type="pmid">24291340</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bernstein</surname> <given-names>L. E.</given-names></name> <name><surname>Demorest</surname> <given-names>M. E.</given-names></name> <name><surname>Coulter</surname> <given-names>D. C.</given-names></name> <name><surname>O&#x2019;Connell</surname> <given-names>M. P.</given-names></name></person-group> (<year>1991</year>). <article-title>Lipreading sentences with vibrotactile vocoders: Performance of normal-hearing and hearing-impaired subjects.</article-title> <source><italic>J. Acoust. Soc. Am.</italic></source> <volume>90</volume> <fpage>2971</fpage>&#x2013;<lpage>2984</lpage>. <pub-id pub-id-type="doi">10.1121/1.401771</pub-id> <pub-id pub-id-type="pmid">1838561</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Besle</surname> <given-names>J.</given-names></name> <name><surname>Fort</surname> <given-names>A.</given-names></name> <name><surname>Delpuech</surname> <given-names>C.</given-names></name> <name><surname>Giard</surname> <given-names>M.-H.</given-names></name></person-group> (<year>2004</year>). <article-title>Bimodal speech: Early suppressive visual effects in human auditory cortex.</article-title> <source><italic>Eur. J. Neurosci.</italic></source> <volume>20</volume> <fpage>2225</fpage>&#x2013;<lpage>2234</lpage>. <pub-id pub-id-type="doi">10.1111/j.1460-9568.2004.03670.x</pub-id> <pub-id pub-id-type="pmid">15450102</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bicevskis</surname> <given-names>K.</given-names></name> <name><surname>Derrick</surname> <given-names>D.</given-names></name> <name><surname>Gick</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title>Visual-tactile integration in speech perception: Evidence for modality neutral speech primitives.</article-title> <source><italic>J. Acoust. Soc. Am.</italic></source> <volume>140</volume> <fpage>3531</fpage>&#x2013;<lpage>3539</lpage>. <pub-id pub-id-type="doi">10.1121/1.4965968</pub-id> <pub-id pub-id-type="pmid">27908052</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boersma</surname> <given-names>P.</given-names></name> <name><surname>Weenink</surname> <given-names>D.</given-names></name></person-group> (<year>2023</year>). <source><italic>Praat: Doing phonetics by computer (version 6.0.52) [computer program]</italic></source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.praat.org">http://www.praat.org</ext-link> <comment>(accessed May 9, 2023)</comment>.</citation></ref>
<ref id="B12"><citation citation-type="journal"><collab>Brain Products GmbH</collab> (<year>2021</year>). <source><italic>BrainVision Analyzer (Version 2.2. 2).</italic></source> <publisher-loc>Gilching</publisher-loc>: <publisher-name>Brain Products GmbH</publisher-name>.</citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brunelli&#x00E8;re</surname> <given-names>A.</given-names></name> <name><surname>S&#x00E1;nchez-Garc&#x00ED;a</surname> <given-names>C.</given-names></name> <name><surname>Ikumi</surname> <given-names>N.</given-names></name> <name><surname>Soto-Faraco</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>Visual information constrains early and late stages of spoken-word recognition in sentence context.</article-title> <source><italic>Int/ J. Psychophysiol.</italic></source> <volume>89</volume> <fpage>136</fpage>&#x2013;<lpage>147</lpage>. <pub-id pub-id-type="doi">10.1016/j.ijpsycho.2013.06.016</pub-id> <pub-id pub-id-type="pmid">23797145</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Filippo</surname> <given-names>C. L.</given-names></name></person-group> (<year>1984</year>). <article-title>Laboratory projects in tactile aids to lipreading.</article-title> <source><italic>Ear Hear.</italic></source> <volume>5</volume> <fpage>211</fpage>&#x2013;<lpage>227</lpage>. <pub-id pub-id-type="doi">10.1097/00003446-198407000-00006</pub-id> <pub-id pub-id-type="pmid">6468779</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Derrick</surname> <given-names>D.</given-names></name> <name><surname>Anderson</surname> <given-names>P.</given-names></name> <name><surname>Gick</surname> <given-names>B.</given-names></name> <name><surname>Green</surname> <given-names>S.</given-names></name></person-group> (<year>2009</year>). <article-title>Characteristics of air puffs produced in English &#x201C;pa&#x201D;: Experiments and simulations.</article-title> <source><italic>J. Acoust. Soc. Am.</italic></source> <volume>125</volume> <fpage>2272</fpage>&#x2013;<lpage>2281</lpage>. <pub-id pub-id-type="doi">10.1121/1.3081496</pub-id> <pub-id pub-id-type="pmid">19354402</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Derrick</surname> <given-names>D.</given-names></name> <name><surname>De Rybel</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). <source><italic>System for audio analysis and perception enhancement.</italic></source> <comment>PCT Patent Number WO 2015/122785. PCT/NZ2015/050014</comment>. <publisher-loc>Geneva</publisher-loc>: <publisher-name>World Intellectual Property Organization</publisher-name>.</citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Derrick</surname> <given-names>D.</given-names></name> <name><surname>Gick</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>Aerotactile integration from distal skin stimuli.</article-title> <source><italic>Multisens. Res.</italic></source> <volume>26</volume> <fpage>405</fpage>&#x2013;<lpage>416</lpage>. <pub-id pub-id-type="doi">10.1163/22134808-00002427</pub-id> <pub-id pub-id-type="pmid">24649526</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Derrick</surname> <given-names>D.</given-names></name> <name><surname>Kabaliuk</surname> <given-names>N.</given-names></name> <name><surname>Longworth</surname> <given-names>L.</given-names></name> <name><surname>Pishyar-Dehkordi</surname> <given-names>P.</given-names></name> <name><surname>Jermy</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>Speech air flow with and without face masks.</article-title> <source><italic>Sci. Rep.</italic></source> <volume>12</volume>:<issue>837</issue>. <pub-id pub-id-type="doi">10.1038/s41598-021-04745-z</pub-id> <pub-id pub-id-type="pmid">35039580</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Derrick</surname> <given-names>D.</given-names></name> <name><surname>Madappallimattam</surname> <given-names>J.</given-names></name> <name><surname>Theys</surname> <given-names>C.</given-names></name></person-group> (<year>2019a</year>). <article-title>Aero-tactile integration during speech perception: Effect of response and stimulus characteristics on syllable identification.</article-title> <source><italic>J. Acoust. Soc. Am.</italic></source> <volume>146</volume> <fpage>1605</fpage>&#x2013;<lpage>1614</lpage>. <pub-id pub-id-type="doi">10.1121/1.5125131</pub-id> <pub-id pub-id-type="pmid">31590504</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Derrick</surname> <given-names>D.</given-names></name> <name><surname>Hansmann</surname> <given-names>D.</given-names></name> <name><surname>Theys</surname> <given-names>C.</given-names></name></person-group> (<year>2019b</year>). <article-title>Tri-modal speech: Audio-visual-tactile integration in speech perception.</article-title> <source><italic>J. Acoust. Soc. Am.</italic></source> <volume>146</volume> <fpage>3495</fpage>&#x2013;<lpage>3504</lpage>. <pub-id pub-id-type="doi">10.1121/1.5134064</pub-id> <pub-id pub-id-type="pmid">31795693</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><collab>FFmpeg Developers</collab> (<year>2016</year>). <source><italic>FFmpeg tool [sofware]</italic></source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://ffmpeg.org/">http://ffmpeg.org/</ext-link> <comment>(accessed May 9, 2023)</comment>.</citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ganesh</surname> <given-names>A. C.</given-names></name> <name><surname>Berthommier</surname> <given-names>F.</given-names></name> <name><surname>Vilain</surname> <given-names>C.</given-names></name> <name><surname>Sato</surname> <given-names>M.</given-names></name> <name><surname>Schwartz</surname> <given-names>J. L.</given-names></name></person-group> (<year>2014</year>). <article-title>A possible neurophysiological correlate of audiovisual binding and unbinding in speech perception.</article-title> <source><italic>Front. Psychol.</italic></source> <volume>5</volume>:<issue>1340</issue>. <pub-id pub-id-type="doi">10.3389/fpsyg.2014.01340</pub-id> <pub-id pub-id-type="pmid">25505438</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gick</surname> <given-names>B.</given-names></name> <name><surname>Derrick</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). <article-title>Aero-tactile integration in speech perception.</article-title> <source><italic>Nature</italic></source> <volume>462</volume> <fpage>502</fpage>&#x2013;<lpage>504</lpage>.</citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gick</surname> <given-names>B.</given-names></name> <name><surname>Ikegami</surname> <given-names>Y.</given-names></name> <name><surname>Derrick</surname> <given-names>D.</given-names></name></person-group> (<year>2010</year>). <article-title>The temporal window of audio-tactile integration in speech perception.</article-title> <source><italic>J. Acoust. Soc. Am.</italic></source> <volume>128</volume> <fpage>EL342</fpage>&#x2013;<lpage>EL346</lpage>. <pub-id pub-id-type="doi">10.1121/1.3505759</pub-id> <pub-id pub-id-type="pmid">21110549</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gilbert</surname> <given-names>J. L.</given-names></name> <name><surname>Lansing</surname> <given-names>C. R.</given-names></name> <name><surname>Garnsey</surname> <given-names>S. M.</given-names></name></person-group> (<year>2012</year>). <article-title>Seeing facial motion affects auditory processing in noise.</article-title> <source><italic>Attent. Percept. Psychophys.</italic></source> <volume>74</volume> <fpage>1761</fpage>&#x2013;<lpage>1781</lpage>. <pub-id pub-id-type="doi">10.3758/s13414-012-0375-z</pub-id> <pub-id pub-id-type="pmid">23070884</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jansen</surname> <given-names>S.</given-names></name> <name><surname>Luts</surname> <given-names>H.</given-names></name> <name><surname>Wagener</surname> <given-names>K. C.</given-names></name> <name><surname>Frachet</surname> <given-names>B.</given-names></name> <name><surname>Wouters</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>The French digit triplet test: A hearing screening tool for speech intelligibility in noise.</article-title> <source><italic>Int. J. Audiol.</italic></source> <volume>49</volume> <fpage>378</fpage>&#x2013;<lpage>387</lpage>. <pub-id pub-id-type="doi">10.3109/14992020903431272</pub-id> <pub-id pub-id-type="pmid">20380611</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>K. O.</given-names></name></person-group> (<year>2002</year>). &#x201C;<article-title>Neural basis of haptic perception</article-title>,&#x201D; in <source><italic>Seven&#x2019;s handbook of experimental psychology</italic></source>, <edition>3rd Edn</edition>, <role>eds</role> <person-group person-group-type="editor"><name><surname>Pashler</surname> <given-names>H.</given-names></name> <name><surname>Yantis</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Wiley</publisher-name>), <fpage>537</fpage>&#x2013;<lpage>583</lpage>.</citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klucharev</surname> <given-names>V.</given-names></name> <name><surname>Mottonen</surname> <given-names>R.</given-names></name> <name><surname>Sams</surname> <given-names>M.</given-names></name></person-group> (<year>2003</year>). <article-title>Electrophysiological indicators of phonetic and non-phonetic multisensory interactions during audiovisual speech perception.</article-title> <source><italic>Cogn. Brain Res.</italic></source> <volume>18</volume> <fpage>65</fpage>&#x2013;<lpage>75</lpage>. <pub-id pub-id-type="doi">10.1016/j.cogbrainres.2003.09.004</pub-id> <pub-id pub-id-type="pmid">14659498</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lisker</surname> <given-names>L.</given-names></name> <name><surname>Abramson</surname> <given-names>A. S.</given-names></name></person-group> (<year>1967</year>). <article-title>Some effects of context on voice onset time in English stops.</article-title> <source><italic>Lang. Speech</italic></source> <volume>10</volume> <fpage>1</fpage>&#x2013;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1177/002383096701000101</pub-id> <pub-id pub-id-type="pmid">6044530</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>B.</given-names></name> <name><surname>Lin</surname> <given-names>Y.</given-names></name> <name><surname>Gao</surname> <given-names>X.</given-names></name> <name><surname>Dang</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>Correlation between audio&#x2013;visual enhancement of speech in different noise environments and SNR: A combined behavioral and electrophysiological study.</article-title> <source><italic>Neuroscience</italic></source> <volume>247</volume> <fpage>145</fpage>&#x2013;<lpage>151</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroscience.2013.05.007</pub-id> <pub-id pub-id-type="pmid">23673276</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mallick</surname> <given-names>D. B.</given-names></name> <name><surname>Magnotti</surname> <given-names>J. F.</given-names></name> <name><surname>Beauchamp</surname> <given-names>M. S.</given-names></name></person-group> (<year>2015</year>). <article-title>Variability and stability in the McGurk effect: Contributions of participants, stimuli, time, and response type.</article-title> <source><italic>Psychono. Bull. Rev.</italic></source> <volume>22</volume> <fpage>1299</fpage>&#x2013;<lpage>1307</lpage>. <pub-id pub-id-type="doi">10.3758/s13423-015-0817-4</pub-id> <pub-id pub-id-type="pmid">25802068</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McGurk</surname> <given-names>H.</given-names></name> <name><surname>MacDonald</surname> <given-names>J.</given-names></name></person-group> (<year>1976</year>). <article-title>Hearing lips and seeing voices.</article-title> <source><italic>Nature</italic></source> <volume>264</volume> <fpage>746</fpage>&#x2013;<lpage>748</lpage>.</citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paris</surname> <given-names>T.</given-names></name> <name><surname>Kim</surname> <given-names>J.</given-names></name> <name><surname>Davis</surname> <given-names>C.</given-names></name></person-group> (<year>2016</year>). <article-title>Using EEG and stimulus context to probe the modelling of auditory-visual speech.</article-title> <source><italic>Cortex.</italic></source> <volume>75</volume> <fpage>220</fpage>&#x2013;<lpage>230</lpage>. <pub-id pub-id-type="doi">10.1016/j.cortex.2015.03.010</pub-id> <pub-id pub-id-type="pmid">26045213</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pilling</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>Auditory event-related potentials (ERPs) in audiovisual speech perception</article-title>. <source><italic>J. Speech Lang. Hear. Res</italic></source>. <volume>52</volume>, <fpage>1073</fpage>&#x2013;<lpage>1081</lpage>. <pub-id pub-id-type="doi">10.1044/1092-4388(2009/07-0276)</pub-id> <pub-id pub-id-type="pmid">19641083</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><collab>R Development Core Team</collab> (<year>2018</year>). <source><italic>R: A language and environment for statistical computing.</italic></source> <publisher-loc>Vienna</publisher-loc>: <publisher-name>R Development Core Team</publisher-name>.</citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reed</surname> <given-names>C. M.</given-names></name> <name><surname>Durlach</surname> <given-names>N. I.</given-names></name> <name><surname>Braida</surname> <given-names>L. D.</given-names></name> <name><surname>Schultz</surname> <given-names>M. C.</given-names></name></person-group> (<year>1982</year>). <article-title>Analytic Study of the Tadoma method.</article-title> <source><italic>J. Speech Lang. Hear. Res.</italic></source> <volume>25</volume> <fpage>108</fpage>&#x2013;<lpage>116</lpage>. <pub-id pub-id-type="doi">10.1044/jshr.2501.108</pub-id> <pub-id pub-id-type="pmid">7087411</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reed</surname> <given-names>C. M.</given-names></name> <name><surname>Rubin</surname> <given-names>S. I.</given-names></name> <name><surname>Braida</surname> <given-names>L. D.</given-names></name> <name><surname>Durlach</surname> <given-names>N. I.</given-names></name></person-group> (<year>1978</year>). <article-title>Analytic study of the Tadoma method: Discrimination ability of untrained observers.</article-title> <source><italic>J. Speech Hear. Res.</italic></source> <volume>21</volume> <fpage>625</fpage>&#x2013;<lpage>637</lpage>. <pub-id pub-id-type="doi">10.1044/jshr.2104.625</pub-id> <pub-id pub-id-type="pmid">745365</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schepers</surname> <given-names>I. M.</given-names></name> <name><surname>Schneider</surname> <given-names>T. R.</given-names></name> <name><surname>Hipp</surname> <given-names>J. F.</given-names></name> <name><surname>Engel</surname> <given-names>A. K.</given-names></name> <name><surname>Senkowski</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>). <article-title>Noise alters beta-band activity in superior temporal cortex during audiovisual speech processing.</article-title> <source><italic>Neuroimage</italic></source> <volume>70</volume> <fpage>101</fpage>&#x2013;<lpage>112</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2012.11.066</pub-id> <pub-id pub-id-type="pmid">23274182</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smits</surname> <given-names>C.</given-names></name> <name><surname>Kapteyn</surname> <given-names>T. S.</given-names></name> <name><surname>Houtgast</surname> <given-names>T.</given-names></name></person-group> (<year>2004</year>). <article-title>Development and validation of an automatic speech-in-noise screening test by telephone.</article-title> <source><italic>Int. J. Audiol.</italic></source> <volume>43</volume> <fpage>15</fpage>&#x2013;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1080/14992020400050004</pub-id> <pub-id pub-id-type="pmid">14974624</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stekelenburg</surname> <given-names>J. J.</given-names></name> <name><surname>Vroomen</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <article-title>Neural correlates of multisensory integration of ecologically valid audiovisual events.</article-title> <source><italic>J. Cogn. Neurosci.</italic></source> <volume>19</volume> <fpage>1964</fpage>&#x2013;<lpage>1973</lpage>. <pub-id pub-id-type="doi">10.1162/jocn.2007.19.12.1964</pub-id> <pub-id pub-id-type="pmid">17892381</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sumby</surname> <given-names>W. H.</given-names></name> <name><surname>Pollack</surname> <given-names>I.</given-names></name></person-group> (<year>1954</year>). <article-title>Visual contribution to speech intelligibility in noise.</article-title> <source><italic>J. Acoust. Soc. Am.</italic></source> <volume>26</volume> <fpage>212</fpage>&#x2013;<lpage>215</lpage>. <pub-id pub-id-type="doi">10.1121/1.1907309</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Treille</surname> <given-names>A.</given-names></name> <name><surname>Cordeboeuf</surname> <given-names>A.</given-names></name> <name><surname>Vilain</surname> <given-names>C.</given-names></name> <name><surname>Sato</surname> <given-names>M.</given-names></name></person-group> (<year>2014a</year>). <article-title>Haptic and visual informaiton speed up the neural processing of auditory speech in live dyadic interactions.</article-title> <source><italic>Neurophychologia</italic></source> <volume>57</volume> <fpage>71</fpage>&#x2013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuropsychologia.2014.02.004</pub-id> <pub-id pub-id-type="pmid">24530236</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Treille</surname> <given-names>A.</given-names></name> <name><surname>Vilain</surname> <given-names>C.</given-names></name> <name><surname>Sato</surname> <given-names>M.</given-names></name></person-group> (<year>2014b</year>). <article-title>The sound of your lips: Electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception.</article-title> <source><italic>Front. Psychol.</italic></source> <volume>5</volume>:<issue>420</issue>. <pub-id pub-id-type="doi">10.3389/fpsyg.2014.00420</pub-id> <pub-id pub-id-type="pmid">24860533</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Treille</surname> <given-names>A.</given-names></name> <name><surname>Vilain</surname> <given-names>C.</given-names></name> <name><surname>Schwartz</surname> <given-names>J.-L.</given-names></name> <name><surname>Hueber</surname> <given-names>T.</given-names></name> <name><surname>Sato</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Electrophysiological evidence for Audio-visuo-lingual speech integration.</article-title> <source><italic>Neuropsychologia</italic></source> <volume>109</volume> <fpage>126</fpage>&#x2013;<lpage>133</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuropsychologia.2017.12.024</pub-id> <pub-id pub-id-type="pmid">29248497</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Wassenhove</surname> <given-names>V.</given-names></name> <name><surname>Grant</surname> <given-names>K. W.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2005</year>). <article-title>Visual speech speeds up the neural processing of auditory speech.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>102</volume> <fpage>1181</fpage>&#x2013;<lpage>1186</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0408949102</pub-id> <pub-id pub-id-type="pmid">15647358</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vroomen</surname> <given-names>J.</given-names></name> <name><surname>Stekelenburg</surname> <given-names>J. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Visual anticipatory information modulates multisensory interactions of artificial audiovisual stimuli.</article-title> <source><italic>J. Cogn. Neurosci.</italic></source> <volume>22</volume> <fpage>1583</fpage>&#x2013;<lpage>1596</lpage>. <pub-id pub-id-type="doi">10.1162/jocn.2009.21308</pub-id> <pub-id pub-id-type="pmid">19583474</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Waldstein</surname> <given-names>R. S.</given-names></name> <name><surname>Boothroyd</surname> <given-names>A.</given-names></name></person-group> (<year>1995</year>). <article-title>Speechreading supplemented by single-channel and multichannel tactile displays of voice fundamental frequency.</article-title> <source><italic>J. Speech Lang. Hear. Res.</italic></source> <volume>38</volume> <fpage>690</fpage>&#x2013;<lpage>705</lpage>. <pub-id pub-id-type="doi">10.1044/jshr.3803.690</pub-id> <pub-id pub-id-type="pmid">7674660</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weisenberger</surname> <given-names>J.</given-names></name></person-group> (<year>1989</year>). <article-title>Tactile aids for speech perception and production by hearing-impaired people.</article-title> <source><italic>Volta Rev.</italic></source> <volume>91</volume> <fpage>79</fpage>&#x2013;<lpage>100</lpage>.</citation></ref>
</ref-list>
</back>
</article>
