<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Commun.</journal-id>
<journal-title>Frontiers in Communication</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Commun.</abbrev-journal-title>
<issn pub-type="epub">2297-900X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fcomm.2022.874215</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Communication</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Face-Masked Speech Intelligibility: The Influence of Speaking Style, Visual Information, and Background Noise</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Pycha</surname> <given-names>Anne</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1384755/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Cohn</surname> <given-names>Michelle</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1148373/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zellou</surname> <given-names>Georgia</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/934154/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Linguistics, University of Wisconsin</institution>, <addr-line>Milwaukee, WI</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Linguistics, University of California, Davis</institution>, <addr-line>Davis, CA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: S&#x000F3;nia Frota, University of Lisbon, Portugal</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Luis Jesus, University of Aveiro, Portugal; Tim Ziemer, University of Bremen, Germany; Stefanie Shattuck-Hufnagel, Massachusetts Institute of Technology, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Anne Pycha <email>pycha&#x00040;uwm.edu</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Language Sciences, a section of the journal Frontiers in Communication</p></fn></author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>05</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>7</volume>
<elocation-id>874215</elocation-id>
<history>
<date date-type="received">
<day>11</day>
<month>02</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>04</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Pycha, Cohn and Zellou.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Pycha, Cohn and Zellou</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>The current study investigates the intelligibility of face-masked speech while manipulating speaking style, presence of visual information about the speaker, and level of background noise. Speakers produced sentences while in both face-masked and non-face-masked conditions in clear and casual speaking styles. Two online experiments presented the sentences to listeners in multi-talker babble at different signal-to-noise ratios: &#x02212;6 dB SNR and &#x02212;3 dB SNR. Listeners completed a word identification task accompanied by either no visual information or visual information indicating whether the speaker was wearing a face mask or not (congruent with the actual face-masking condition). Across both studies, intelligibility is higher for clear speech. Intelligibility is also higher for face-masked speech, suggesting that speakers adapt their productions to be more intelligible in the presence of a physical barrier, namely a face mask. In addition, intelligibility is boosted when listeners are given visual cues that the speaker is wearing a face mask, but only at higher noise levels. We discuss these findings in terms of theories of speech production and perception.</p></abstract>
<kwd-group>
<kwd>speech production</kwd>
<kwd>speech perception</kwd>
<kwd>speech intelligibility</kwd>
<kwd>face mask</kwd>
<kwd>background noise</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="3"/>
<equation-count count="0"/>
<ref-count count="66"/>
<page-count count="13"/>
<word-count count="10496"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>During the COVID-19 pandemic, face masks became commonplace throughout the world. Despite their efficacy in helping to prevent virus transmission, face masks present an obstacle for speech communication (Bottalico et al., <xref ref-type="bibr" rid="B8">2020</xref>; Hampton et al., <xref ref-type="bibr" rid="B22">2020</xref>; Saunders et al., <xref ref-type="bibr" rid="B53">2021</xref>). To begin with, masks obscure speakers&#x00027; mouths and therefore deprive listeners of visual cues that can be used to support comprehension (Giovanelli et al., <xref ref-type="bibr" rid="B20">2021</xref>; Truong and Weber, <xref ref-type="bibr" rid="B61">2021</xref>). Even for the audio signal, face masks act as a physical barrier for sound waves and have been shown to reduce signal transmission from the mouth (specifically, a &#x0201C;simulated&#x0201D; mouth consisting of a loudspeaker in a dummy head; Palmiero et al., <xref ref-type="bibr" rid="B41">2016</xref>). In overcoming this communicative challenge, both speakers and listeners might play a role. Speakers, for example, can modulate their speaking style to enhance intelligibility. Listeners, for their part, can make use of additional cues, such as visual information about the face-masked status of the speaker, and they may also adjust their listening strategies in response to signal degradation. In the current study, the goal is to pinpoint the ways in which these speaker and listener adaptations interact during speech communication while wearing a face mask. To that end, the current study investigates the intelligibility of face-masked speech while manipulating speaking style, availability of visual information about the speaker, and level of background noise. In doing so, this work evaluates adaptation theories of speech production, as well as social and cognitive accounts of speech perception.</p>
<sec>
<title>Face Masks and Speakers</title>
<p>In everyday conversations, people often speak casually. But when listening conditions are difficult, speakers may adapt by shifting to a &#x0201C;clear&#x0201D; speech style (Lindblom, <xref ref-type="bibr" rid="B32">1990</xref>). In the presence of background noise, for example, speakers&#x00027; productions become louder, slower, and higher-pitched (the Lombard effect; Lombard, <xref ref-type="bibr" rid="B34">1911</xref>; Brumm and Zollinger, <xref ref-type="bibr" rid="B11">2011</xref>). Clear speech produces intelligibility benefits across a wide range of situations (for review, see Smiljani&#x00107; and Bradlow, <xref ref-type="bibr" rid="B56">2009</xref>), including face-mask situations. For example, Smiljani&#x00107; et al. (<xref ref-type="bibr" rid="B57">2021</xref>) found that clear speech produced with a face mask increased intelligibility, compared to casual speech produced with or without a face mask. In a similar vein, Yi et al. (<xref ref-type="bibr" rid="B64">2021</xref>) found that, across both face-masked and non-face-masked conditions in speech-shaped noise (SSN) and multitalker babble, clear speech was better understood than conversational speech. Furthermore, in an audio-only condition, they found similar word identification accuracy in SSN for clear face-masked speech and conversational non-face-masked speech, suggesting that the clear speech style compensated for the signal degradation from the face mask.</p>
<p>In related work, the current authors have also shown that clear speech style boosts intelligibility in face-masked situations (Cohn et al., <xref ref-type="bibr" rid="B12">2021</xref>), although the pattern of results differed from those of other studies. Crucially, these findings showed that listeners&#x00027; comprehension accuracy was actually <italic>greater</italic> in a face-masked clear condition than in a non-face-masked clear condition. No such boost occurred for the casual style, which does not demand that the speaker produce clarity; nor did it occur for a positive-emotional speaking style, which does not demand clarity either, but has nevertheless been shown to produce intelligibility benefits for listeners (Dupuis and Pichora-Fuller, <xref ref-type="bibr" rid="B15">2008</xref>). Note that this pattern is inconsistent with <italic>automatic adaptation accounts</italic> of speech production (e.g., Junqua, <xref ref-type="bibr" rid="B28">1993</xref>), which claim that, in the presence of a communication challenge (such as noise, or a face mask), speakers will adapt their productions automatically regardless of speech style. However, this pattern is consistent with <italic>targeted adaptation accounts</italic> (Hazan et al., <xref ref-type="bibr" rid="B25">2015</xref>; Garnier et al., <xref ref-type="bibr" rid="B19">2018</xref>), which claim that speakers adapt to challenges by actively tailoring their productions to specific communicative needs of a given situation; here, the need to speak clearly while also overcoming the physical barrier of the mask.</p>
<p>The current study attempts to replicate the clear vs. casual pattern of speech style results reported by Cohn et al. (<xref ref-type="bibr" rid="B12">2021</xref>), but also extend this line of research to investigate how the pattern changes when different demands are made of the listener.</p></sec>
<sec>
<title>Face Masks and Listeners</title>
<p>While several studies have addressed the role of the speaker in face-masked communication, less is known about the role of the listener. In general, previous research has demonstrated that listener beliefs and behaviors affect their interpretation of the speech signal, and the same can be expected to hold true in face-masked situations. Here, the focus is on two different features that have been shown to influence the listener: their use of visual cues about the speaker, and their response to different levels of signal degradation.</p>
<sec>
<title>Integrating Cues About the Speaker</title>
<p>Listeners&#x00027; experiences of speech are shaped by their beliefs about the identity or origin of the speaker. Many studies investigating this issue have asked participants to listen to an audio signal accompanied by pictures of talkers with different apparent ethnic or racial identities. Results have shown that listeners interpret the same speech signal differently, depending upon whether they believe the speaker is foreign-born or native (e.g., Rubin, <xref ref-type="bibr" rid="B49">1992</xref>; McGowan, <xref ref-type="bibr" rid="B38">2015</xref>; Ingvalson et al., <xref ref-type="bibr" rid="B27">2017</xref>).</p>
<p>Two different <italic>social perception</italic> models have been proposed to account for these effects. According to a <italic>bias account</italic>, bias against non-dominant groups reduces attention to the speech signal (Rubin and Smith, <xref ref-type="bibr" rid="B50">1990</xref>; Rubin, <xref ref-type="bibr" rid="B49">1992</xref>; Kang and Rubin, <xref ref-type="bibr" rid="B30">2009</xref>; Lippi-Green, <xref ref-type="bibr" rid="B33">2011</xref>). This model predicts reduced intelligibility for non-dominant speaker groups, correlated with the degree to which they are the object of bias within a particular societal context. In contrast, an <italic>alignment account</italic> proposes that the modulating factor is not bias per se, but rather the fit between social expectations and the signal (Babel and Russell, <xref ref-type="bibr" rid="B5">2015</xref>; McGowan, <xref ref-type="bibr" rid="B38">2015</xref>). This model predicts reduced intelligibility when listeners&#x00027; expectations about a speaker do not match the speech that they produce, and enhanced intelligibility when they do match, regardless of whether the expectations concern a dominant or a non-dominant group.</p>
<p>The literature contains empirical support for both <italic>bias</italic> and <italic>alignment</italic> theories. Rubin (<xref ref-type="bibr" rid="B49">1992</xref>), for example, examined the perception of native-accented American English speech that was accompanied either by a photo of a person with Asian facial features, or by a photo of a person with Caucasian facial features. Although the speech samples were the same across conditions, American English listeners showed better comprehension in the Caucasian photo condition, in line with the predictions of the <italic>bias account</italic>. Other studies have also reported reduced intelligibility or increased accentedness ratings for non-dominant social groups, including a Syrian identity presented alongside German speech (Fiedler et al., <xref ref-type="bibr" rid="B18">2019</xref>), an image of a person from Morocco accompanying Dutch speech (Hanul&#x000ED;kov&#x000E1;, <xref ref-type="bibr" rid="B23">2018</xref>), and an image of a person from South Asia accompanying English speech (Kutlu, <xref ref-type="bibr" rid="B31">2020</xref>). Applying these results to the current study, one potential bias against face-masked speakers is that they are difficult to understand. One would therefore predict speech intelligibility to decrease whenever listeners are presented with an image of a face-masked speaker, compared to an image of a non-face-masked speaker.</p>
<p>Several studies have made observations that challenge the <italic>bias account</italic>. McGowan (<xref ref-type="bibr" rid="B38">2015</xref>) conducted a study similar to that of Rubin (<xref ref-type="bibr" rid="B49">1992</xref>), except that the speech samples consisted of Chinese-accented (specifically, Mandarin-accented) English, rather than native-accented English. Some listener participants had very limited exposure to Chinese-accented English, while other participants were of Chinese-American heritage. Results for both groups showed that accuracy was higher when speech was accompanied by a photo of a person with Asian facial features, compared to a person with Caucasian facial features. This finding is not compatible with a bias account: if bias against a non-dominant social group reduces attention to the signal, one would not expect better accuracy in the Asian photo condition. Instead, this finding is compatible with an <italic>alignment account</italic>, whereby consistency, or alignment, between visual information (here, a photo) and the speech signal leads to better language comprehension. Yi et al. (<xref ref-type="bibr" rid="B65">2013</xref>), Babel and Russell (<xref ref-type="bibr" rid="B5">2015</xref>), and Gnevsheva (<xref ref-type="bibr" rid="B21">2018</xref>) also report findings that are compatible with an <italic>alignment account</italic>. Relatedly, a study by McLaughlin et al. (<xref ref-type="bibr" rid="B39">2022</xref>) finds no evidence for implicit racial biases in audio-visual benefits for accented vs. unaccented speech, further challenging a <italic>bias account</italic>. Applying these results to the current study, people plausibly have certain expectations about face-masked speakers (e.g., they produce speech that is sometimes altered by a physical barrier). Under the <italic>alignment account</italic>, one expects enhanced intelligibility whenever listeners are given information about the speaker that supports their expectations.</p>
<p>In many of the studies in this literature, the accompanying images relied upon phenotypical traits determined in large part by genetic factors, such as hair color and facial features, or on apparent region-of-origin (e.g., Niedzielski, <xref ref-type="bibr" rid="B40">1999</xref>; Hay et al., <xref ref-type="bibr" rid="B24">2006</xref>). The images used in the current study are of a different nature, because face masks constitute a transient, non-phenotypical, non-regional characteristic of a speaker. It remains an open question whether such characteristics can also affect speech intelligibility, but at least one study suggests that they might. D&#x00027;Onofrio (<xref ref-type="bibr" rid="B14">2019</xref>) presented participants with audio recordings accompanied by photos of the same individual with different clothing, hairstyle, and facial expressions, and reported that these different stylistic presentations (or &#x0201C;personae&#x0201D;) affected lexical recall. In the current study, line drawings of the same individual either with or without a face mask are presented to listeners in order to test whether this affects intelligibility.</p></sec>
<sec>
<title>Listener Responses to Signal Degradation</title>
<p>In everyday communication, listeners confront many factors that potentially make the speech signal more difficult to understand, such as foreign accents and background noise, as well as face masks. In theory, one might expect each of these factors to affect listener behavior in a simple linear fashion. In reality, the existing literature suggests more complex scenarios. To begin with, the impact of degraded signals extends beyond intelligibility and affects other cognitive variables, such as listener effort. Complicating the picture further, different sources of degradation do not always combine in an additive fashion.</p>
<p>Research on listener effort has focused on speech signals presented in the presence of background noise at different signal-to-noise ratios (SNR). As SNR becomes lower, listeners generally do worse on listening tasks, as expected (e.g., Pichora-Fuller et al., <xref ref-type="bibr" rid="B42">1995</xref>; Fallon et al., <xref ref-type="bibr" rid="B17">2000</xref>). This is true for face-masked speech as well: Toscano and Toscano (<xref ref-type="bibr" rid="B60">2021</xref>) found that comprehension accuracy was at ceiling across face-mask conditions at &#x0002B;13 dB SNR, but accuracy was significantly lower for masked speech conditions at &#x02212;3 dB SNR. Less conspicuously, SNR also affects effort: as SNR becomes lower, listeners give higher ratings of their listening effort (Rudner et al., <xref ref-type="bibr" rid="B51">2012</xref>). Again, the same holds true for face-masked speech: Brown et al. (<xref ref-type="bibr" rid="B10">2021</xref>) reported higher effort ratings for face-masked conditions, compared to non-face-masked conditions. In addition to subjective effort ratings, SNR has been shown to modulate pupil responses (Zekveld et al., <xref ref-type="bibr" rid="B66">2010</xref>), recall performance (Rabbitt, <xref ref-type="bibr" rid="B45">1966</xref>, <xref ref-type="bibr" rid="B46">1968</xref>), and performance on simultaneous non-speech tasks (e.g., Broadbent, <xref ref-type="bibr" rid="B9">1958</xref>; Sarampalis et al., <xref ref-type="bibr" rid="B52">2009</xref>; for an overview, see Strand et al., <xref ref-type="bibr" rid="B58">2018</xref>). These results highlight the fact that listening is not a passive activity, but a complex cognitive behavior, as proposed by <italic>cognitive accounts</italic> (Heald and Nusbaum, <xref ref-type="bibr" rid="B26">2014</xref>).</p>
<p>Research on different sources of degradation underscores a similar point. For example, Smiljani&#x00107; et al. (<xref ref-type="bibr" rid="B57">2021</xref>) examined two such sources: face masks worn by a speaker, and background noise (six-talker babble). Their results showed that in quiet conditions, face-masked speech was just as intelligible as non-face-masked speech (see also Magee et al., <xref ref-type="bibr" rid="B36">2020</xref>). In noisy conditions, however, the presence of a face mask decreased intelligibility compared to the no-mask condition. This suggests that the listeners&#x00027; experience of signal degradation may have emerged from the specific combination of face mask plus background noise, rather than from each factor independently.</p>
<p>Complex interactions have also been reported for other types of challenging signals. For example, Adank et al. (<xref ref-type="bibr" rid="B3">2009</xref>) asked participants to do a sentence verification task with audio recordings in two different English accents (Southeastern Britain vs. Glasgow) accompanied by three different levels of background noise. Their results show a significant interaction between accent and noise level, suggesting that each accent-plus-noise combination may have placed a unique demand on the listener. van Wijngaarden et al. (<xref ref-type="bibr" rid="B63">2002</xref>) and Rogers et al. (<xref ref-type="bibr" rid="B47">2006</xref>) report related results. More broadly, Adank (<xref ref-type="bibr" rid="B2">2012</xref>) found that while background noise and a non-native accent both led to increased difficulty for listeners, these two sources of degradation correlated with increased activity in different regions of the cortex, suggesting that listeners apply different strategies for comprehending speech-in-noise and foreign accents (see also Van Engen and Peelle, <xref ref-type="bibr" rid="B62">2014</xref>). The takeaway message from this line of work is that each different degradation combination may have the potential to elicit a distinct pattern of listener behavior.</p>
<p>In addition to these considerations, it is also well established that SNR interacts with visual information. For example, the audio-visual benefit derived from observing a speaker&#x00027;s lip and face movements varies according to the degree of intelligibility (Ross et al., <xref ref-type="bibr" rid="B48">2007</xref>) and level of background noise (Sumby and Pollack, <xref ref-type="bibr" rid="B59">1954</xref>). Given this previous work using dynamic information as portrayed in video clips, we might also expect that SNR would interact with static visual images of a speaker. The current study pursued these questions of listener behavior by presenting face-masked and non-face-masked speech at two different SNRs. In Experiment 1, we presented stimuli in noise at &#x02212;6 dB SNR; in Experiment 2, we presented them at &#x02212;3 dB SNR. We manipulated SNR across experiments, rather than within a single experiment, so that the no-image condition of Experiment 1 could stand alone as a replication of our previous study (Cohn et al., <xref ref-type="bibr" rid="B12">2021</xref>), which was conducted at &#x02212;6 dB SNR. From a simple perspective, one might expect the highest levels of comprehension to occur for non-face-masked speech at the higher, potentially easier SNR, and the lowest levels of comprehension for face-masked speech at the lower, potentially more difficult SNR. One might also expect that any advantages conferred by the presence of a visual image would decrease at the easier SNR. However, given the results discussed above, as well as recent findings on speech-style interactions (Cohn et al., <xref ref-type="bibr" rid="B12">2021</xref>), more complex results are anticipated. These findings will speak to theories of speech production and perception, with the overarching goal of elucidating the impact of face masks on comprehension during everyday communication.</p></sec></sec>
<sec>
<title>Current Study and Predictions</title>
<p>Two online experiments reported here investigate intelligibility of American English target words in sentences produced with or without a fabric face mask, across two speaking styles (casual and clear), accompanied by either no image or an image of the speaker (presented as a line drawing). Thus, each experiment crossed three factors, with two levels each: 2 face-mask conditions &#x000D7; 2 speaking styles &#x000D7; 2 image conditions. Sentences were presented in multi-talker babble, at &#x02212;6 dB SNR (noisier) in Experiment 1 and &#x02212;3 dB SNR (less noisy) in Experiment 2.</p>
<p>In both experiments, an effect of speech style is predicted, such that sentences produced in clear speech will exhibit higher target-word accuracy rates than those produced in casual speech, in line with prior work (Smiljani&#x00107; and Bradlow, <xref ref-type="bibr" rid="B56">2009</xref>). Crucially, speech style is also predicted to interact with face-mask conditions. In Experiment 1 at &#x02212;6 dB SNR, identical to the SNR used in the authors&#x00027; previous work (Cohn et al., <xref ref-type="bibr" rid="B12">2021</xref>), a replication of the prior finding is expected: that is, face-masked speech should be <italic>more</italic> intelligible than non-face-masked speech in the clear style, with no such effect in the casual style. This pattern would support a <italic>targeted adaptation account</italic> of production (Lindblom, <xref ref-type="bibr" rid="B32">1990</xref>). According to this account, speakers balance production-oriented and listener-oriented factors in order to tune the speech signal to the communication needs of a particular situation. Our previous and currently expected findings support this idea because they suggest that, while speakers do tune their speech for the specific situation of trying to speak clearly while wearing a face mask, they do not make changes in the absence of a defined communicative goal, even when wearing a face mask. In Experiment 2, at &#x02212;3 dB SNR, an interaction between style and face-masking is also predicted. However, in accordance with <italic>cognitive accounts</italic> (Heald and Nusbaum, <xref ref-type="bibr" rid="B26">2014</xref>), the reduced demands on the listener might allow participants to behave differently toward the speech signal, resulting in a different interaction with speech style than in Experiment 1. 
For example, given the reduced importance of clear speech in quieter conditions, it is possible that the advantage for clear face-masked speech may be reduced or disappear entirely in Experiment 2.</p>
<p>Also in both experiments, an effect of image is predicted. As proposed by an <italic>alignment account</italic>, overall greater intelligibility for face-masked speech is predicted when the participants also see an image of a masked speaker, because listeners receive visual information about the speaker which is consistent (or &#x0201C;matched&#x0201D;) with the signal. Alternatively, the <italic>bias account</italic> would predict overall lower intelligibility when participants see the face-masked image, because listeners may hold a bias against face-masked speakers that they are more difficult to understand.</p></sec></sec>
<sec id="s2">
<title>Experiment 1: &#x02212;6 dB SNR</title>
<p>Experiment 1, conducted online, tests the intelligibility of spoken sentences in a 2 (face-mask vs. no-face-mask) &#x000D7; 2 (clear vs. casual speech) &#x000D7; 2 (no image vs. image) design. Sentences were presented in multi-talker babble at &#x02212;6 dB SNR.</p>
<sec>
<title>Methods</title>
<sec>
<title>Participants</title>
<p>Listener participants (<italic>n</italic> = 112) were native English speakers from the United States, all undergraduates at the University of California, Davis, recruited from the Psychology subject pool (mean age = 19.45 years, SD = 1.46 years; 86 female, 23 male, 3 non-binary). No participant reported any hearing difficulty.</p></sec>
<sec>
<title>Auditory Stimuli</title>
<p>A set of 154<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> low-predictability sentences from the Speech-Perception-in-Noise (SPIN) corpus was selected (Kalikow et al., <xref ref-type="bibr" rid="B29">1977</xref>). The full set of sentences was produced by both a female and a male speaker using a head-mounted microphone (Shure WH20XLR)<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>, an audio mixer (Steinberg UR12), and face masks made of fabric. Speakers produced the same set of sentences (in the same order), first face-masked and then non-face-masked, in three styles: clear, casual, and positive-emotional. The positive-emotional style is not analyzed here, in order to constrain the scope of the present work. Each speaker produced the sentences for a real interlocutor (the other speaker), who wrote down the final word of each sentence as it was produced. This setup was chosen in light of prior work showing that speakers naturally produce more intelligible speech in the presence of a real interlocutor than for an imagined one (Scarborough and Zellou, <xref ref-type="bibr" rid="B54">2013</xref>). Speakers were given explicit instructions about how to produce each style. For clear speech, the instructions were: &#x0201C;In this condition, speak clearly to someone who may have trouble understanding you.&#x0201D; For casual speech, the instructions were: &#x0201C;In this condition, say the sentences in a natural, casual manner.&#x0201D; The recordings used in the current study are identical to those used in the authors&#x00027; previous investigation of face-masked speech (Cohn et al., <xref ref-type="bibr" rid="B12">2021</xref>).</p>
<p>Because each style and masking condition was recorded in one long sound file, we force-aligned the productions with the Montreal Forced Aligner (MFA) (McAuliffe et al., <xref ref-type="bibr" rid="B37">2017</xref>) to obtain consistent boundaries for segmenting each sentence. <xref ref-type="fig" rid="F1">Figure 1</xref> plots the long-term average spectra (LTAS) of the 154 recorded sentences across the four production conditions (2 face-masking conditions &#x000D7; 2 speech styles), calculated (Quen&#x000E9; and van Delft, <xref ref-type="bibr" rid="B44">2010</xref>) and plotted with Praat (Boersma and Weenink, <xref ref-type="bibr" rid="B7">2021</xref>) (relative to 2 &#x000D7; 10<sup>&#x02212;5</sup> Pa, the default in Praat<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>). Note that the LTAS was calculated for unmodified sentences (i.e., not intensity normalized). As the figure shows, both clear speech conditions exhibit greater intensity than the casual conditions, particularly above 2.5 kHz. Furthermore, within both clear and casual styles, the masked condition exhibits slightly higher intensity at some higher frequencies (2.5&#x02013;5 kHz) than the unmasked condition.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>(Color online) Long-term average spectra (LTAS) of originally recorded sentences, by face-mask and speaking style condition. The y-axis shows sound pressure level (SPL) relative to 2 &#x000D7; 10<sup>&#x02212;5</sup> Pa (the default in Praat). Note that the spectra are for sentences prior to the other pre-processing steps (e.g., normalizing intensity to an average of 60 dB and mixing with noise).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomm-07-874215-g0001.tif"/>
</fig>
<p>After each sentence had been segmented from the recording, we normalized the intensities to an average of 60 dB (relative to 2e<sup>&#x02212;05</sup> Pascal) in Praat. Multi-talker babble (MTB) was created using American English voices generated from Amazon Polly (Joanna, Salli, Joey, Matthew) producing the &#x0201C;Rainbow Passage&#x0201D; (Fairbanks, <xref ref-type="bibr" rid="B16">1960</xref>) [intensity normalized to an average of 60 dB (relative to 2e<sup>&#x02212;05</sup> Pascal) and resampled to 44.1 kHz in Praat]. For each stimulus sentence, a 5-s sample from each Polly voice was randomly selected and mixed into a mono channel. Each sentence was mixed with its unique 4-talker babble recording at &#x02212;6 dB SNR; the sentence started 500 ms after MTB onset and ended 500 ms before MTB offset. The intensity of each sentence-plus-MTB stimulus was then normalized to 60 dB (relative to 2e<sup>&#x02212;05</sup> Pascal) in Praat. Additionally, two sound calibration sentences (&#x0201C;Bill heard we asked about the host&#x0201D;, &#x0201C;I&#x00027;m talking about the bench&#x0201D;), produced by the two speakers but not included in the SPIN trials, were also normalized in intensity to 60 dB. Normalizing the intensity of all sound files ensured that they would be at a consistent volume throughout the experiment, although it does not reflect the actual SPL (which would vary based on each participant&#x00027;s playback hardware).</p></sec>
<sec>
<title>Picture Stimuli</title>
<p>An open-source line drawing formed the basis of the speaker images (<xref ref-type="fig" rid="F2">Figure 2</xref>). In selecting the drawing, the goal was to choose a relatively abstract image, devoid of many specific cues to speaker identity, that could realistically accompany either a male or a female voice. To create the face-masked version of the speaker, an adapted image of a fabric face-mask was pasted onto the drawing.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Face-masked and non-face-masked images used as visual information about the speaker.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomm-07-874215-g0002.tif"/>
</fig></sec>
<sec>
<title>Procedure</title>
<p>Participants completed the experiment online via Qualtrics. In order to ensure that participants could hear the stimuli properly, the study began with two sound calibration questions. They heard two sentences (&#x0201C;Bill heard we asked about the host&#x0201D;, &#x0201C;I&#x00027;m talking about the bench&#x0201D;) and were asked to select the correct sentence from a set of options containing phonological competitors of the final word (e.g., &#x0201C;Bill heard we asked about the coast&#x0201D;, &#x0201C;Bill heard we asked about the toast&#x0201D;). If they did not select the correct sentence, they were asked to complete the sound calibration again. Once participants passed the calibration procedure, they were instructed not to change the volume until the experiment ended<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref>.</p>
<p>Next, participants were familiarized with the stimuli and the experimental task. A series of instructions introduced them to the noisy background of other talkers, the two target talkers, and the task of typing the final word of each sentence. In situations where the participants were unsure about the final word, they were encouraged to guess.</p>
<p>Two pseudorandomized lists of the SPIN sentences were generated. The first half of the list was randomly presented in either the No-Image block (no picture, 52 trials), or in the Image block (with a picture, 52 trials). In the Image block, listeners were presented with an image of a face (<xref ref-type="fig" rid="F2">Figure 2</xref>) that was always congruent with the actual face-masking condition of the recording (i.e., a face-masked picture for face-masked recordings, and a non-face-masked picture for non-face-masked recordings). The second half of the list was randomly presented in the other block. The ordering of blocks (No-Image, Image) was counterbalanced across participants, as was the correspondence between list half and block. All participants heard each sentence once (balanced across speaker, condition, and speaking style). Note that participants were also exposed to a positive-emotional speaking style, not analyzed here.</p>
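<p>As a minimal sketch of how such a counterbalancing scheme could be set up, the Python fragment below crosses the two block orders with the two possible assignments of list halves to blocks, and cycles participants through the resulting four cells. The actual Qualtrics logic is not described here, so the names (<monospace>BLOCK_ORDERS</monospace>, <monospace>LIST_TO_BLOCK</monospace>, <monospace>assign</monospace>) and the modulo-based assignment are assumptions made purely for illustration.</p>
<preformat>
```python
from itertools import product

# Two possible block orders (hypothetical labels taken from the text).
BLOCK_ORDERS = [("No-Image", "Image"), ("Image", "No-Image")]

# Two possible mappings from block type to list half.
LIST_TO_BLOCK = [
    {"No-Image": "first half", "Image": "second half"},
    {"No-Image": "second half", "Image": "first half"},
]

# Crossing order with mapping gives four counterbalancing cells.
CELLS = list(product(BLOCK_ORDERS, LIST_TO_BLOCK))


def assign(participant_index):
    """Return the (block, list half) sequence for one participant.

    Assumption: participants are cycled through the cells in order,
    which keeps the cells balanced when N is a multiple of four.
    """
    order, mapping = CELLS[participant_index % len(CELLS)]
    return [(block, mapping[block]) for block in order]


for index in range(4):
    print(index, assign(index))
```
</preformat>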
<p>Thus, for this experiment, each participant heard 104 sentences with MTB at &#x02212;6 dB SNR. For each trial, participants typed the final word of the sentence.</p></sec></sec>
<sec>
<title>Analysis</title>
<p>Participants&#x00027; typed responses for the target words were converted to lowercase and stripped of punctuation and extra spacing, using regex in R (version 4.1.2). Accuracy in target word identification was scored as binomial data (1 = correct, 0 = incorrect), and modeled with a mixed effects logistic regression using the <italic>lme4</italic> R package (Bates et al., <xref ref-type="bibr" rid="B6">2015</xref>). Fixed effects included Face-Masking Condition (face-masked, non-face-masked), Speaking Style (clear, casual), Visual Information (no image, image) and all possible interactions. The initial random effects structure included by-Listener and by-Speaker random intercepts, as well as by-Listener random slopes for Visual Information, and by-Listener and by-Speaker random slopes for Speaking Style and Face-Masking Condition<xref ref-type="fn" rid="fn0005"><sup>5</sup></xref>. Models including by-Listener and/or by-Speaker random slopes for Speaking Style and/or Face-Masking Condition resulted in singularity errors, so these slopes were dropped from the final model. The retained model lmer syntax is: Accuracy &#x0007E; Face-Masking Condition<sup>&#x0002A;</sup>Visual Information<sup>&#x0002A;</sup>Speaking Style &#x0002B; (1&#x0002B; Visual Information | Listener) &#x0002B; (1 | Speaker).</p></sec>
<sec>
<title>Results</title>
<p><xref ref-type="fig" rid="F3">Figure 3</xref> displays word identification accuracy across conditions, and <xref ref-type="table" rid="T1">Table 1</xref> provides the output of the statistical model. The model showed an effect of Face-Masking Condition wherein listeners were more accurate for face-masked speech. There was also an effect of Speaking Style, such that listeners were more accurate at identifying target words for clear speech than for casual speech. Face-Masking Condition also interacted with Visual Information: face-masked speech was <italic>more</italic> intelligible when presented with an image. Face-Masking Condition also interacted with Speaking Style, revealing higher accuracy for face-masked clear speech than the other conditions. No other interactions were observed.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>(Color online) Target word identification accuracy for Experiment 1, &#x02212;6 dB SNR. The bars show the mean for each speech style, face-masking, and image condition. The error bars indicate standard errors of the mean. Individual points show mean accuracy for each participant across conditions.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomm-07-874215-g0003.tif"/>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Summary statistics for the mixed effects logistic regression model for Experiment 1, &#x02212;6 dB SNR.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>Coef</bold></th>
<th valign="top" align="center"><bold>SE</bold></th>
<th valign="top" align="center"><bold>z</bold></th>
<th valign="top" align="center"><bold><italic>p</italic></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">(Intercept)</td>
<td valign="top" align="center">&#x02212;0.71</td>
<td valign="top" align="center">0.33</td>
<td valign="top" align="center">&#x02212;2.14</td>
<td valign="top" align="center">0.03</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked)</td>
<td valign="top" align="center">0.08</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">3.71</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Visual information (image)</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.63</td>
<td valign="top" align="center">0.53</td>
</tr>
<tr>
<td valign="top" align="left">Speaking style (clear)</td>
<td valign="top" align="center">0.29</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">13.69</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Visual information (image)</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">2.47</td>
<td valign="top" align="center">0.01</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Speaking style (clear)</td>
<td valign="top" align="center">0.08</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">3.86</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Visual information (image) &#x0002A; Speaking style (clear)</td>
<td valign="top" align="center">&#x02212;3.2e-03</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">&#x02212;0.15</td>
<td valign="top" align="center">0.88</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Visual information (image) &#x0002A; Speaking style (clear)</td>
<td valign="top" align="center">&#x02212;0.01</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">&#x02212;0.54</td>
<td valign="top" align="center">0.59</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Num. observations = 11,455, Num. listeners = 112, Num. speakers = 2</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Discussion of Experiment 1</title>
<p>The results of Experiment 1 show that intelligibility is higher for face-masked speech than for non-face-masked speech. On the face of it, this result would seem unexpected, given that face masks act as a physical barrier which reduces speech transmission from the mouth (Palmiero et al., <xref ref-type="bibr" rid="B41">2016</xref>). However, this result is less surprising in light of findings showing that Lombard adjustments result in more intelligible speech in noisy conditions (Junqua, <xref ref-type="bibr" rid="B28">1993</xref>; Lu and Cooke, <xref ref-type="bibr" rid="B35">2008</xref>), which suggests that the speakers who recorded the stimulus sentences made adjustments to overcome the face-mask barrier, and that these adjustments were advantageous for listeners with competing background noise.</p>
<p>The results of Experiment 1 also indicate that intelligibility is higher for clear speech than for casual speech. This finding was expected, given the clear speech intelligibility benefit (Smiljani&#x00107; and Bradlow, <xref ref-type="bibr" rid="B56">2009</xref>). Furthermore, intelligibility was higher for face-masked clear speech than for the other conditions. This finding replicates the results of previous work that presented identical stimuli at the same noise level, namely &#x02212;6 dB SNR (Cohn et al., <xref ref-type="bibr" rid="B12">2021</xref>). This pattern of results supports a <italic>targeted adaptation account</italic> of speech production (e.g., Lindblom, <xref ref-type="bibr" rid="B32">1990</xref>), and suggests speakers actively tailor their productions in response to the communicative situation (here, the need to overcome the barrier of the mask while also following the instructions to speak clearly).</p>
<p>Finally, Experiment 1 shows that intelligibility is higher for face-masked speech in the visual information condition, compared to other conditions. Thus, participants were more accurate when they knew that the speaker was wearing a face mask. This finding provides support for <italic>alignment accounts</italic> (e.g., McGowan, <xref ref-type="bibr" rid="B38">2015</xref>), which claim that listeners benefit from information about speakers, as long as it is consistent with information in the speech signal. Such a finding is difficult to reconcile with <italic>bias accounts</italic> (e.g., Rubin, <xref ref-type="bibr" rid="B49">1992</xref>), which claim that intelligibility decreases when listeners are biased against a speaker (e.g., &#x0201C;people with face masks are hard to understand&#x0201D;).</p>
<p>As discussed above, listening is a complex behavior that is actively shaped by the communicative context, and previous work has provided support for this idea by showing that listeners respond to face-masked speech differently at different SNRs (Toscano and Toscano, <xref ref-type="bibr" rid="B60">2021</xref>). Therefore, Experiment 2 tested the factors of speech style, face-masking, and visual information at a higher, less noisy SNR, &#x02212;3 dB.</p></sec></sec>
<sec id="s3">
<title>Experiment 2: &#x02212;3 dB SNR</title>
<p>The design of Experiment 2, also conducted online, was identical to that of Experiment 1. The only difference was that MTB was mixed with the target sentences at &#x02212;3 dB SNR.</p>
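<p>To illustrate what mixing at a given SNR involves, the following Python sketch scales a babble signal so that the speech-to-babble RMS ratio matches a target SNR in dB, and then adds the speech at a fixed onset. This is not the Praat procedure used to create the stimuli; the function names (<monospace>rms</monospace>, <monospace>babble_gain_for_snr</monospace>, <monospace>mix_at_snr</monospace>, <monospace>snr_db_of</monospace>) and the toy sine-wave signals are hypothetical and serve only as an example.</p>
<preformat>
```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def babble_gain_for_snr(speech, babble, snr_db):
    """Gain for the babble so that 20*log10(rms(speech)/rms(gain*babble)) == snr_db."""
    target_babble_rms = rms(speech) / (10 ** (snr_db / 20))
    return target_babble_rms / rms(babble)


def mix_at_snr(speech, babble, snr_db, onset):
    """Scale the babble to the target SNR and add the speech `onset` samples in.

    Assumption: the babble is long enough to contain the whole sentence.
    """
    gain = babble_gain_for_snr(speech, babble, snr_db)
    mixed = [gain * b for b in babble]
    for i, s in enumerate(speech):
        mixed[onset + i] += s
    return mixed


def snr_db_of(speech, noise):
    """SNR in dB computed from the RMS amplitudes of two signals."""
    return 20 * math.log10(rms(speech) / rms(noise))


# Toy signals standing in for a sentence and a babble clip (illustrative only).
speech = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
babble = [0.3 * math.sin(2 * math.pi * 13 * n / 200) for n in range(200)]

for target in (-6.0, -3.0):
    gain = babble_gain_for_snr(speech, babble, target)
    scaled = [gain * b for b in babble]
    print(target, round(snr_db_of(speech, scaled), 6))
```
</preformat>
<p>Under these assumptions, moving from &#x02212;6 dB to &#x02212;3 dB SNR lowers the babble amplitude by a factor of 10<sup>3/20</sup> (roughly 1.41) while leaving the speech unchanged.</p>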
<sec>
<title>Methods</title>
<sec>
<title>Participants</title>
<p>One hundred sixteen native English speakers from the United States participated in Experiment 2 (mean age = 19.86 years, sd = 1.94 years; 86 female, 28 male, 2 non-binary). They were recruited through the University of California, Davis Psychology subjects pool. All participants reported no hearing difficulty. None of the participants for Experiment 2 had previously participated in Experiment 1.</p></sec>
<sec>
<title>Stimuli</title>
<p>Stimuli consisted of the same 154 SPIN recorded sentences in the face-masked and speech style conditions used in Experiment 1. Randomly selected clips of Amazon Polly talkers were generated to create a novel production of 4-talker babble for each sentence (full method described in <bold>Section Auditory Stimuli</bold>). The SPIN sentences were mixed with 4-talker babble at &#x02212;3 dB SNR and normalized in intensity to 60 dB (relative to 2e<sup>&#x02212;05</sup> Pascal).</p></sec>
<sec>
<title>Procedure</title>
<p>The procedure was identical to that in Experiment 1.</p></sec></sec>
<sec>
<title>Analysis</title>
<p>Accuracy was scored with the same methods as in Experiment 1. A model including by-Listener and/or by-Speaker random slopes for Face-Masking Condition and/or Speaking Style resulted in singularity errors. The retained model lmer syntax is: Accuracy &#x0007E; Face-Masking Condition<sup>&#x0002A;</sup>Visual Information<sup>&#x0002A;</sup>Speaking Style &#x0002B; (1&#x0002B; Visual Information | Listener) &#x0002B; (1 | Speaker).</p></sec>
<sec>
<title>Results</title>
<p><xref ref-type="fig" rid="F4">Figure 4</xref> displays word identification accuracy across conditions, and <xref ref-type="table" rid="T2">Table 2</xref> provides the output of the statistical model. The model revealed an effect of Face-Masking Condition wherein listeners were more accurate for face-masked speech than non-face-masked speech. Additionally, there was an effect of Speaking Style, indicating that listeners were better at identifying words produced in clear speech than in casual speech. There were also several interactions. First, Face-Masking Condition interacted with Visual Information, such that face-masked speech was less intelligible in the image condition than in the no-image condition. Additionally, there was an interaction between Face-Masking Condition and Speaking Style, where there was less of an increase for clear face-masked speech than for <italic>casual</italic> face-masked speech (recall that the factors were sum coded), seen in <xref ref-type="fig" rid="F4">Figure 4</xref>. Finally, Visual Information and Speaking Style interacted, where accuracy was lower in the image condition for clear speech.</p>
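<p>To make the sum-coded estimates easier to read, the sketch below converts them into log-odds and predicted accuracy for each design cell. It assumes that each factor is coded &#x0002B;1 for the level named in Table 2 (face-masked, image, clear) and &#x02212;1 for the other level, which is one common sum-coding convention but is not stated explicitly in the text. The coefficients are the rounded estimates from Table 2, random effects are ignored, and the helper names are hypothetical, so the printed values are illustrative rather than reported results.</p>
<preformat>
```python
import math

# Rounded fixed-effect estimates (log-odds) copied from Table 2.
# Keys list the factors that enter each term; () is the intercept.
COEFS = {
    (): 0.03,
    ("mask",): 0.07,
    ("image",): 0.01,
    ("clear",): 0.19,
    ("mask", "image"): -0.05,
    ("mask", "clear"): -0.06,
    ("image", "clear"): -0.05,
    ("mask", "image", "clear"): -0.01,
}


def cell_log_odds(mask, image, clear):
    """Sum each coefficient times the product of its factor codes (+1 or -1)."""
    codes = {"mask": mask, "image": image, "clear": clear}
    total = 0.0
    for terms, beta in COEFS.items():
        contrast = 1
        for name in terms:
            contrast *= codes[name]
        total += beta * contrast
    return total


def inverse_logit(x):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Assumed coding: +1 = face-masked / image / clear, -1 = the other level.
for mask in (1, -1):
    for image in (1, -1):
        for clear in (1, -1):
            eta = cell_log_odds(mask, image, clear)
            print(mask, image, clear, round(eta, 2), round(inverse_logit(eta), 3))
```
</preformat>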
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>(Color online) Target word identification accuracy for Experiment 2, &#x02212;3 dB SNR. The bars show the mean for each speech style, face-masking, and image condition. The error bars indicate standard errors of the mean. Individual points show mean accuracy for each participant across conditions.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomm-07-874215-g0004.tif"/>
</fig>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Summary statistics for the mixed effects logistic regression model of Experiment 2, &#x02212;3 dB SNR.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>Coef</bold></th>
<th valign="top" align="center"><bold>SE</bold></th>
<th valign="top" align="center"><bold>z</bold></th>
<th valign="top" align="center"><bold><italic>p</italic></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">(Intercept)</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.29</td>
<td valign="top" align="center">0.09</td>
<td valign="top" align="center">0.93</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked)</td>
<td valign="top" align="center">0.07</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">3.66</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Visual information (image)</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.83</td>
</tr>
<tr>
<td valign="top" align="left">Speaking style (clear)</td>
<td valign="top" align="center">0.19</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">9.4</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Visual information (image)</td>
<td valign="top" align="center">&#x02212;0.05</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">&#x02212;2.6</td>
<td valign="top" align="center">&#x0003C;0.01</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Speaking style (clear)</td>
<td valign="top" align="center">&#x02212;0.06</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">&#x02212;2.94</td>
<td valign="top" align="center">&#x0003C;0.01</td>
</tr>
<tr>
<td valign="top" align="left">Visual information (image) &#x0002A; Speaking style (clear)</td>
<td valign="top" align="center">&#x02212;0.05</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">&#x02212;2.61</td>
<td valign="top" align="center">&#x0003C;0.01</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Visual information (image) &#x0002A; Speaking style (clear)</td>
<td valign="top" align="center">&#x02212;0.01</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">&#x02212;0.62</td>
<td valign="top" align="center">0.53</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Num. observations = 11,868, Num. listeners = 116, Num. speakers = 2</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Discussion of Experiment 2</title>
<p>Some of the basic findings of Experiment 2 were similar to those of Experiment 1: intelligibility was higher for face-masked speech compared to non-face-masked speech, consistent with an automatic, highly generalized response to a barrier, as proposed by Junqua (<xref ref-type="bibr" rid="B28">1993</xref>). Additionally, we see the clear speech intelligibility effect (Smiljani&#x00107; and Bradlow, <xref ref-type="bibr" rid="B56">2009</xref>), with higher accuracy for clear speech compared to casual speech.</p>
<p>Other results from Experiment 2, however, differ from those of Experiment 1. As revealed by a comparison of <xref ref-type="fig" rid="F3">Figures 3</xref>, <xref ref-type="fig" rid="F4">4</xref>, overall accuracy was higher in Experiment 2 (less noisy at &#x02212;3 dB SNR), compared to Experiment 1 (noisier at &#x02212;6 dB SNR). This is an expected outcome which is consistent with previous work examining SNRs (e.g., Pichora-Fuller et al., <xref ref-type="bibr" rid="B42">1995</xref>; Fallon et al., <xref ref-type="bibr" rid="B17">2000</xref>).</p>
<p>In addition to this across-the-board change, the results of Experiment 2 also differ in their patterning. Whereas, in Experiment 1, face-masked clear speech was more intelligible than other conditions, this effect is less apparent in Experiment 2. One finding is that we see a more consistent increase in intelligibility at &#x02212;3 dB SNR for the face-masked <italic>casual</italic> conditions. It is not immediately apparent why this should be the case. One speculation is that different levels of background noise set up different expectations for listeners. With more background noise, listeners might come to expect a clearer style, because they are aware that the speaker must make adjustments in order to be understood. With less background noise, listeners might come to expect a less clear, potentially more casual style, because they are aware that the conditions are easier for the speaker. Another finding from Experiment 2 concerns the role of visual information, where having additional visual cues that the speakers were masked actually reduced intelligibility, in line with <italic>bias accounts</italic> (e.g., Rubin, <xref ref-type="bibr" rid="B49">1992</xref>). Overall, the findings of Experiment 2 suggest that reliance on visual information about speakers decreases in less noisy listening conditions, with weaker intelligibility benefits for both face-masking and clear speech.</p></sec></sec>
<sec id="s4">
<title><italic>Post-hoc</italic> Analysis</title>
<p>To directly compare across the two SNRs, we fit a combined model to the accuracy data from both Experiments 1 and 2. The model structure was: Accuracy &#x0007E; Face-Masking Condition<sup>&#x0002A;</sup>Visual Information<sup>&#x0002A;</sup>Speaking Style<sup>&#x0002A;</sup>SNR &#x0002B; (1&#x0002B; Visual Information &#x0002B; Speaking Style | Listener) &#x0002B; (1 | Speaker) (note that a model including by-Listener random slopes for Face-Masking Condition resulted in a singularity error).</p>
<p>A combined plot, showing accuracy across both SNRs, is shown in <xref ref-type="fig" rid="F5">Figure 5</xref> and the output of the statistical model is provided in <xref ref-type="table" rid="T3">Table 3</xref>. Results confirmed some of the general findings observed: higher accuracy for clear speech, as well as for face-masked speech. Furthermore, as expected, we observe a sizable decrease in intelligibility at the lower SNR, &#x02212;6 dB. There was also an interaction between Speaking Style and SNR, wherein there was a larger increase for clear speech in the more difficult SNR (&#x02212;6 dB). This was further moderated by a 3-way interaction with Face-Masking Condition: accuracy was even higher for face-masked clear speech in the &#x02212;6 dB SNR. Finally, we observed a 3-way interaction between Face-Masking Condition, Visual Information, and SNR, showing higher accuracy with visual information for face-masked speech in the more difficult SNR. No other effects or interactions were observed.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>(Color online) Target word identification accuracy for both experiments. The x-axis shows signal-to-noise SNR (&#x02212;6 dB, &#x02212;3 dB). The points show the grand mean for each SNR, with the lines indicating differences for SNR for face-masking condition (solid orange line = face-masked; dotted blue line = non-face-masked). Image and speech style conditions are faceted. The error bars indicate standard errors of the mean.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomm-07-874215-g0005.tif"/>
</fig>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Summary statistics for the mixed effects logistic regression model of the combined &#x02212;3 and &#x02212;6 dB SNR data.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>Coef</bold></th>
<th valign="top" align="center"><bold>SE</bold></th>
<th valign="top" align="center"><bold>z</bold></th>
<th valign="top" align="center"><bold><italic>p</italic></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">(Intercept)</td>
<td valign="top" align="center">&#x02212;0.34</td>
<td valign="top" align="center">0.31</td>
<td valign="top" align="center">&#x02212;1.1</td>
<td valign="top" align="center">0.27</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked)</td>
<td valign="top" align="center">0.07</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">5.21</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Visual information (image)</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.58</td>
<td valign="top" align="center">0.56</td>
</tr>
<tr>
<td valign="top" align="left">Speaking style (clear)</td>
<td valign="top" align="center">0.24</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">16.38</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">SNR (&#x02212;6 dB)</td>
<td valign="top" align="center">&#x02212;0.37</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">&#x02212;10.64</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Visual information (image)</td>
<td valign="top" align="center">&#x02212;3.7e-04</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">&#x02212;0.03</td>
<td valign="top" align="center">0.98</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Speaking style (clear)</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.73</td>
<td valign="top" align="center">0.46</td>
</tr>
<tr>
<td valign="top" align="left">Visual information (image) &#x0002A; Speaking style (clear)</td>
<td valign="top" align="center">&#x02212;0.03</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">&#x02212;1.91</td>
<td valign="top" align="center">0.06</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; SNR (&#x02212;6 dB)</td>
<td valign="top" align="center">1.6e-03</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.11</td>
<td valign="top" align="center">0.91</td>
</tr>
<tr>
<td valign="top" align="left">Visual information (image) &#x0002A; SNR (&#x02212;6 dB)</td>
<td valign="top" align="center">2.3e-03</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.09</td>
<td valign="top" align="center">0.93</td>
</tr>
<tr>
<td valign="top" align="left">Speaking style (clear) &#x0002A; SNR (&#x02212;6 dB)</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">3.32</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Visual information (image) &#x0002A; Speaking style (clear)</td>
<td valign="top" align="center">&#x02212;0.01</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">&#x02212;0.81</td>
<td valign="top" align="center">0.42</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Visual information (image) &#x0002A; SNR (&#x02212;6 dB)</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">3.59</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Speaking style (clear) &#x0002A; SNR (&#x02212;6 dB)</td>
<td valign="top" align="center">0.07</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">4.75</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Visual information (image) &#x0002A; Speaking style (clear) &#x0002A; SNR (&#x02212;6 dB)</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">1.74</td>
<td valign="top" align="center">0.08</td>
</tr>
<tr>
<td valign="top" align="left">Face-masking condition (face-masked) &#x0002A; Visual information (image) &#x0002A; Speaking style (clear) &#x0002A; SNR (&#x02212;6 dB)</td>
<td valign="top" align="center">4.4e-04</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.98</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Num. observations = 23,323, Num. listeners = 228, Num. speakers = 2</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s5">
<title>General Discussion</title>
<p>The current study investigated the interaction of speaker- and listener-related factors in the comprehension of face-masked speech. The general findings, observed across the two experiments, are that intelligibility is higher when speakers wear a face mask and also when speakers use a clear speaking style. Furthermore, intelligibility can be boosted when listeners know that the speaker is wearing a face mask. Together, these observations reveal that speakers and listeners are remarkably flexible in the way they adjust their planning and comprehension processes to fit the real-time communicative context. Notably, these general findings manifested themselves in different patterns depending upon the extent of signal degradation, yielding distinct sets of interactions in a noisier situation (Experiment 1) compared to a less noisy situation (Experiment 2).</p>
<p>In both experiments, participants exhibited better overall performance when listening to face-masked speech. On the face of it, this is a surprising finding, particularly since face masks have been shown to reduce speech signal transmission from a (simulated) mouth by 3&#x02013;4% (Palmiero et al., <xref ref-type="bibr" rid="B41">2016</xref>). And yet, it is well-established that speakers make adjustments to overcome communication barriers. For example, the Lombard effect demonstrates that speakers change their productions in the presence of background noise (Lombard, <xref ref-type="bibr" rid="B34">1911</xref>; Brumm and Zollinger, <xref ref-type="bibr" rid="B11">2011</xref>) and Lombard-speech is more intelligible to listeners when it is mixed with noise (e.g., Lu and Cooke, <xref ref-type="bibr" rid="B35">2008</xref>). The current results suggest that speakers also adjust their productions in the presence of a different type of barrier, namely a face mask. Furthermore, the fact that the advantage for face-masked speech occurs in both clear and casual speech styles suggests that these adjustments occur regardless of the speaking goal and are therefore, to a certain extent, automatic (Junqua, <xref ref-type="bibr" rid="B28">1993</xref>). This finding is consistent with previous studies on how speakers behave in the presence of noise (Pick et al., <xref ref-type="bibr" rid="B43">1989</xref>) as well as more recent studies of face-masked productions (Asadi et al., <xref ref-type="bibr" rid="B4">2020</xref>).</p>
<p>In both experiments, participants also exhibited better overall performance when listening to clear speech. This is not a surprising finding, since dozens of studies have reported clear-speech advantages across a wide range of experimental conditions (Smiljani&#x00107; and Bradlow, <xref ref-type="bibr" rid="B56">2009</xref>). Also as expected, overall accuracy was lower in Experiment 1 (noisier at &#x02212;6 dB SNR) than in Experiment 2 (less noisy at &#x02212;3 dB SNR), consistent with previous work examining SNRs (e.g., Pichora-Fuller et al., <xref ref-type="bibr" rid="B42">1995</xref>; Fallon et al., <xref ref-type="bibr" rid="B17">2000</xref>). However, looking at the interactions, it is also apparent that the intelligibility benefit of clear speech depends upon the listening context. To begin with, the differences between clear and casual speech styles are more apparent in a noisier situation at &#x02212;6 dB SNR (Experiment 1) compared to a less noisy situation at &#x02212;3 dB SNR (Experiment 2). The clear speech advantage thus seems to grow with the difficulty of the listening condition.</p>
<p>Furthermore, clear and casual speech styles also interact differently with face-masking conditions. This is most apparent in Experiment 1, where the advantage for face-masked speech is strongest in the clear style, but not in the casual style, a pattern which replicates previous results at &#x02212;6 dB SNR (Cohn et al., <xref ref-type="bibr" rid="B12">2021</xref>). This pattern also suggests that, in response to a given situation, speakers may actually combine automatic adaptations with targeted adaptations. For example, a face mask is a barrier that is present regardless of how the speaker wishes to talk, and regardless of whether the speaker really wishes to be understood. A barrier that exhibits such across-the-board effects may conceivably give rise to automatic adaptations on the part of the speaker, which are not tailored to any particular communicative need, but simply serve to help overcome the barrier. The requirement to be clear, on the other hand, is a specific goal. To accomplish it, the speaker must take into account many situation-specific factors, including not just the speaker&#x00027;s own status (e.g., face-masked or not), but also the status of the listener and the surrounding environment. Such considerations conceivably give rise to targeted adaptations, tailored to the specific needs of the communicative situation. In a face-masked, clear-speech situation then, automatic and targeted adaptations may both be present. If this is the case, it may offer one explanation for why these effects have been so difficult to disentangle in previous work (e.g., Garnier et al., <xref ref-type="bibr" rid="B19">2018</xref>). Indeed, this interpretation is supported by the LTAS of the sentences: face-masked clear speech shows increased amplitude of some of the higher frequencies, suggesting that some targeting was at play. At the same time, the LTAS shows that face-masked casual speech is boosted relative to non-face-masked casual speech. 
These boosts appear to occur in the frequency range that tends to be attenuated by the presence of face-masks (above 1 kHz in Corey et al., <xref ref-type="bibr" rid="B13">2020</xref>), suggesting that speakers actively compensate for the barrier.</p>
<p>From the perspective of the listener, the fact that clear and casual speech styles interact differently with face-masking conditions across Experiments 1 and 2 is consistent with the general notion, outlined in the Introduction, that each combination of signal degradation has the potential to elicit a distinct pattern of behavior (Adank, <xref ref-type="bibr" rid="B2">2012</xref>). For example, the findings of Smiljani&#x00107; et al. (<xref ref-type="bibr" rid="B57">2021</xref>) suggested that face-masking conditions do not necessarily exhibit effects independently of noise conditions. Rather, particular <italic>combinations</italic> of these conditions gave rise to unique patterns of listener behavior. The current study supports this scenario, and, furthermore, shows that it also holds true when we combine different sources of degradation with different speech styles.</p>
<p>The current results also show that intelligibility can be boosted when listeners know that the speaker is wearing a face mask. Specifically, in the noisier situation of &#x02212;6 dB SNR (Experiment 1), face-masked speech was more intelligible with the visual presentation of a face-masked image. Regardless of the theoretical framework that we adopt, this finding suggests that listeners possess some knowledge about what face masks do to the speech signal, and apply their knowledge (&#x0201C;the speaker is wearing a mask&#x0201D;) in their interpretation of the signal. Given the timing of our study and the people who participated in it, this is not surprising. We recruited participants in Fall of 2021, well over a year into the COVID-19 pandemic. Our participants resided in California, a state with some of the strictest masking mandates in the United States. By the time that they listened to the stimuli in the current study, then, they had presumably been listening to masked speech for over a year and a half, and had familiarity with it.</p>
<p>Given this, the findings of Experiment 1 would be difficult to interpret within a bias account, in which listeners&#x00027; knowledge about face-masked speech gets incorporated into a bias (e.g., &#x0201C;it is too hard to understand&#x0201D;). Under this scenario, knowledge that the speaker is wearing a face mask should lead to lower, not higher, intelligibility. Instead, our result provides support for <italic>alignment accounts</italic> (McGowan, <xref ref-type="bibr" rid="B38">2015</xref>), which predict that comprehension should be easier whenever the characteristics of the speech signal align with social expectations about the speaker. Here, speech signals produced with a face mask aligned with participants&#x00027; expectations, built up over at least 18 months of listening, about a speaker wearing a face mask.</p>
<p>In contrast, the intelligibility boost did not occur in the less noisy situation of &#x02212;3 dB SNR (Experiment 2), where the visual image condition exhibited reduced accuracy, compared to the no-image condition. Here, one possibility is that in a relatively less noisy listening task (i.e., &#x02212;3 dB, relative to &#x02212;6 dB) in which listeners do not need to exert as much effort, bias effects could emerge. This possibility is consistent with prior work reporting a bias effect in the absence of background noise (Rubin, <xref ref-type="bibr" rid="B49">1992</xref>) but similar intelligibility in more difficult conditions (e.g., &#x02212;4 dB SNR in McLaughlin et al., <xref ref-type="bibr" rid="B39">2022</xref>). Yet, other work has shown bias effects to persist at more challenging listening conditions (e.g., &#x02212;10 dB SNR in Fiedler et al., <xref ref-type="bibr" rid="B18">2019</xref>; &#x02212;4 dB SNR in Yi et al., <xref ref-type="bibr" rid="B65">2013</xref>), suggesting that other factors are also at play (e.g., speaking style). Future work using within-subject comparisons, particularly varying listening difficulty (e.g., <italic>via</italic> SNR levels, types of noise), can further test the reliability of bias effects.</p>
<p>As noted in the Introduction, the visual information in the current study differed from most images used in the previous literature, which tend to highlight &#x0201C;phenotypical&#x0201D; characteristics of a speaker, such as ethnicity or region-of-origin. The current images differed only in the presence vs. absence of a face mask, thereby depicting different transient states of the same speaker, more similar to &#x0201C;personae&#x0201D; (D&#x00027;Onofrio, <xref ref-type="bibr" rid="B14">2019</xref>). The current results are therefore consistent with an emerging body of work which shows that transient, non-phenotypical information about a speaker also affects the process of speech comprehension (D&#x00027;Onofrio, <xref ref-type="bibr" rid="B14">2019</xref>). Note that in the work of D&#x00027;Onofrio (<xref ref-type="bibr" rid="B14">2019</xref>), images of the same individual differed in hair style, facial expression, and clothing, all of which can be chosen by a person to convey social meaning. For settings in which face masks are optional (e.g., at an outdoor concert), the decision to wear a face mask might convey social meaning in a similar manner. However, for settings in which they may be required by government or organizational mandates (e.g., at a doctor&#x00027;s office, or in a school classroom during a pandemic), the social meaning of a face mask may be largely diminished or absent. The differences between these two kinds of transient characteristics are ripe areas for investigation in future work.</p>
<p>In the current study, when visual information occurred, it was always congruent with the speech signal. That is, the image of a non-face-masked speaker always accompanied non-face-masked speech, and the image of a face-masked speaker always accompanied face-masked speech. This approach, which has been employed in previous studies (Gnevsheva, <xref ref-type="bibr" rid="B21">2018</xref>), has the advantage of ecological validity, because participants are only exposed to scenarios that are possible in everyday life. Future work examining mismatched guise (e.g., face-masked speech with unmasked image) can further test the role of bias and alignment effects. The current study also used static, black and white line drawings to provide information about the speaker. Future work with photographic images or videos could further explore the role of visual cues to support intelligibility.</p>
<p>An additional limitation of the current study is that participants were all adults. Recent work (Schwarz et al., <xref ref-type="bibr" rid="B55">2021</xref>) has shown that children also exhibit differences in the way they perceive face-masked speech, and they might also make different clear speech adaptations to overcome the mask. Furthermore, the study included only one type of face mask, a fabric face mask. Other types of masks are commonplace in medical environments (e.g., surgical masks), and they have been shown to differentially affect speech-in-noise perception (Bottalico et al., <xref ref-type="bibr" rid="B8">2020</xref>; Toscano and Toscano, <xref ref-type="bibr" rid="B60">2021</xref>). Investigating the role of visual information about different types of masks is an avenue for future work.</p>
<p>While output from the mouth is the most important source of acoustic information for speech intelligibility, there is work showing that sound additionally radiates from other parts of a speaker&#x00027;s anatomy that would not be obstructed by a face mask (e.g., the lower eyelids in Abe, <xref ref-type="bibr" rid="B1">2019</xref>). While we suspect any such effects would be negligible in high levels of background noise, as in the present study (&#x02212;6 dB and &#x02212;3 dB SNR), this raises interesting questions for future work. In particular, the extent to which speakers&#x00027; adjustments specifically increase sound radiation from these uncovered areas could shed light on the dynamic types of adjustments speakers make in the presence of communication barriers.</p>
<p>This research also has practical implications for producing and perceiving speech in a face-masked world. In response to degraded communication situations, face-masked speakers can actively modulate the way they talk, while listeners can adjust their listening strategies. Such findings are relevant for the COVID-19 pandemic, and, in settings such as hospitals and doctors&#x00027; offices, they will remain relevant well into the future.</p></sec>
<sec sec-type="data-availability" id="s6">
<title>Data Availability Statement</title>
<p>The de-identified raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p></sec>
<sec id="s7">
<title>Ethics Statement</title>
<p>The studies involving human participants were reviewed and approved by the Institutional Review Board of the University of California, Davis. The patients/participants provided their written informed consent to participate in this study.</p></sec>
<sec id="s8">
<title>Author Contributions</title>
<p>AP, MC, and GZ contributed to conception and design of the study and wrote sections of the manuscript. MC programmed the experiment and performed the statistical analysis. AP wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.</p></sec>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>This material is based upon work supported by the National Science Foundation SBE Postdoctoral Research Fellowship to MC under Grant No. 1911855.</p></sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p></sec>
</body>
<back>
<ack><p>The authors thank Melina Sarian for her help with stimulus collection.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Abe</surname> <given-names>O.</given-names></name></person-group> (<year>2019</year>). <source>Sound Radiation of Singing Voices</source> (<publisher-loc>PhD Thesis</publisher-loc>). University of Hamburg.</citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adank</surname> <given-names>P.</given-names></name></person-group> (<year>2012</year>). <article-title>The neural bases of difficult speech comprehension and speech production: two activation likelihood estimation (ALE) meta-analyses</article-title>. <source>Brain Lang.</source> <volume>122</volume>, <fpage>42</fpage>&#x02013;<lpage>54</lpage>. <pub-id pub-id-type="doi">10.1016/j.bandl.2012.04.014</pub-id><pub-id pub-id-type="pmid">22633697</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adank</surname> <given-names>P.</given-names></name> <name><surname>Evans</surname> <given-names>B. G.</given-names></name> <name><surname>Stuart-Smith</surname> <given-names>J.</given-names></name> <name><surname>Scott</surname> <given-names>S. K.</given-names></name></person-group> (<year>2009</year>). <article-title>Comprehension of familiar and unfamiliar native accents under adverse listening conditions</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform.</source> <volume>35</volume>, <fpage>520</fpage>&#x02013;<lpage>529</lpage>. <pub-id pub-id-type="doi">10.1037/a0013552</pub-id><pub-id pub-id-type="pmid">19331505</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Asadi</surname> <given-names>S.</given-names></name> <name><surname>Cappa</surname> <given-names>C. D.</given-names></name> <name><surname>Barreda</surname> <given-names>S.</given-names></name> <name><surname>Wexler</surname> <given-names>A. S.</given-names></name> <name><surname>Bouvier</surname> <given-names>N. M.</given-names></name> <name><surname>Ristenpart</surname> <given-names>W. D.</given-names></name></person-group> (<year>2020</year>). <article-title>Efficacy of masks and face coverings in controlling outward aerosol particle emission from expiratory activities</article-title>. <source>Sci. Rep.</source> <volume>10</volume>, <fpage>15665</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-020-72798-7</pub-id><pub-id pub-id-type="pmid">32973285</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Babel</surname> <given-names>M.</given-names></name> <name><surname>Russell</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Expectations and speech intelligibility</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>137</volume>, <fpage>2823</fpage>&#x02013;<lpage>2833</lpage>. <pub-id pub-id-type="doi">10.1121/1.4919317</pub-id><pub-id pub-id-type="pmid">25994710</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bates</surname> <given-names>D.</given-names></name> <name><surname>M&#x000E4;chler</surname> <given-names>M.</given-names></name> <name><surname>Bolker</surname> <given-names>B.</given-names></name> <name><surname>Walker</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). <article-title>Fitting linear mixed-effects models using lme4</article-title>. <source>J. Stat. Softw.</source> <volume>67</volume>, <fpage>1</fpage>&#x02013;<lpage>48</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v067.i01</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Boersma</surname> <given-names>P.</given-names></name> <name><surname>Weenink</surname> <given-names>D.</given-names></name></person-group> (<year>2021</year>). <source>Praat: Doing phonetics by computer (version 6.1.40)</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.fon.hum.uva.nl/praat/">https://www.fon.hum.uva.nl/praat/</ext-link></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bottalico</surname> <given-names>P.</given-names></name> <name><surname>Murgia</surname> <given-names>S.</given-names></name> <name><surname>Puglisi</surname> <given-names>G. E.</given-names></name> <name><surname>Astolfi</surname> <given-names>A.</given-names></name> <name><surname>Kirk</surname> <given-names>K. I.</given-names></name></person-group> (<year>2020</year>). <article-title>Effect of masks on speech intelligibility in auralized classrooms</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>148</volume>, <fpage>2878</fpage>&#x02013;<lpage>2884</lpage>. <pub-id pub-id-type="doi">10.1121/10.0002450</pub-id><pub-id pub-id-type="pmid">33261397</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Broadbent</surname> <given-names>D. E.</given-names></name></person-group> (<year>1958</year>). <source>Perception and Communication</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Pergamon Press</publisher-name>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname> <given-names>V. A.</given-names></name> <name><surname>Van Engen</surname> <given-names>K. J.</given-names></name> <name><surname>Peelle</surname> <given-names>J. E.</given-names></name></person-group> (<year>2021</year>). <article-title>Face mask type affects audiovisual speech intelligibility and subjective listening effort in young and older adults</article-title>. <source>Cogn. Res. Princip. Implicat.</source> <volume>6</volume>, <fpage>49</fpage>. <pub-id pub-id-type="doi">10.1186/s41235-021-00314-0</pub-id><pub-id pub-id-type="pmid">34275022</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brumm</surname> <given-names>H.</given-names></name> <name><surname>Zollinger</surname> <given-names>S. A.</given-names></name></person-group> (<year>2011</year>). <article-title>The evolution of the Lombard effect: 100 years of psychoacoustic research</article-title>. <source>Behaviour</source> <volume>148</volume>, <fpage>1173</fpage>&#x02013;<lpage>1198</lpage>. <pub-id pub-id-type="doi">10.1163/000579511X605759</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohn</surname> <given-names>M.</given-names></name> <name><surname>Pycha</surname> <given-names>A.</given-names></name> <name><surname>Zellou</surname> <given-names>G.</given-names></name></person-group> (<year>2021</year>). <article-title>Intelligibility of face-masked speech depends on speaking style: comparing casual, clear, and emotional speech</article-title>. <source>Cognition</source> <volume>210</volume>, <fpage>104570</fpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2020.104570</pub-id><pub-id pub-id-type="pmid">33450446</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Corey</surname> <given-names>R. M.</given-names></name> <name><surname>Jones</surname> <given-names>U.</given-names></name> <name><surname>Singer</surname> <given-names>A. C.</given-names></name></person-group> (<year>2020</year>). <article-title>Acoustic effects of medical, cloth, and transparent face masks on speech signals</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>148</volume>, <fpage>2371</fpage>&#x02013;<lpage>2375</lpage>. <pub-id pub-id-type="doi">10.1121/10.0002279</pub-id><pub-id pub-id-type="pmid">33138498</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>D&#x00027;Onofrio</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Complicating categories: personae mediate racialized expectations of non-native speech</article-title>. <source>J. Sociolinguistics</source> <volume>23</volume>, <fpage>346</fpage>&#x02013;<lpage>366</lpage>. <pub-id pub-id-type="doi">10.1111/josl.12368</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Dupuis</surname> <given-names>K.</given-names></name> <name><surname>Pichora-Fuller</surname> <given-names>K.</given-names></name></person-group> (<year>2008</year>). <article-title>Effects of emotional content and emotional voice on speech intelligibility in younger and older adults</article-title>. <source>Can. Acoustics</source> <volume>36</volume>, <fpage>114</fpage>&#x02013;<lpage>115</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://jcaa.caa-aca.ca/index.php/jcaa/article/view/2064/1811">https://jcaa.caa-aca.ca/index.php/jcaa/article/view/2064/1811</ext-link></citation>
</ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fairbanks</surname> <given-names>G.</given-names></name></person-group> (<year>1960</year>). <article-title>&#x0201C;The rainbow passage,&#x0201D;</article-title> in <source>Voice and Articulation Drillbook</source>. Vol. <volume>2</volume> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Harper &#x00026; Row</publisher-name>), 127p.</citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fallon</surname> <given-names>M.</given-names></name> <name><surname>Trehub</surname> <given-names>S. E.</given-names></name> <name><surname>Schneider</surname> <given-names>B. A.</given-names></name></person-group> (<year>2000</year>). <article-title>Children&#x00027;s perception of speech in multitalker babble</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>108</volume>, <fpage>3023</fpage>&#x02013;<lpage>3029</lpage>. <pub-id pub-id-type="doi">10.1121/1.1323233</pub-id><pub-id pub-id-type="pmid">11144594</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fiedler</surname> <given-names>S.</given-names></name> <name><surname>Keller</surname> <given-names>C.</given-names></name> <name><surname>Hanul&#x000ED;kov&#x000E1;</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Social expectations and intelligibility of Arabic-accented speech in noise,&#x0201D;</article-title> in <source>Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia</source>, <fpage>3085</fpage>&#x02013;<lpage>3089</lpage>.</citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garnier</surname> <given-names>M.</given-names></name> <name><surname>M&#x000E9;nard</surname> <given-names>L.</given-names></name> <name><surname>Alexandre</surname> <given-names>B.</given-names></name></person-group> (<year>2018</year>). <article-title>Hyper-articulation in Lombard speech: an active communicative strategy to enhance visible speech cues?</article-title> <source>J. Acoust. Soc. Am.</source> <volume>144</volume>, <fpage>1059</fpage>&#x02013;<lpage>1074</lpage>. <pub-id pub-id-type="doi">10.1121/1.5051321</pub-id><pub-id pub-id-type="pmid">30180713</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giovanelli</surname> <given-names>E.</given-names></name> <name><surname>Valzolgher</surname> <given-names>C.</given-names></name> <name><surname>Gessa</surname> <given-names>E.</given-names></name> <name><surname>Todeschini</surname> <given-names>M.</given-names></name> <name><surname>Pavani</surname> <given-names>F.</given-names></name></person-group> (<year>2021</year>). <article-title>Unmasking the difficulty of listening to talkers with masks: Lessons from the COVID-19 pandemic</article-title>. <source>Iperception</source> <volume>12</volume>, <fpage>1</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1177/2041669521998393</pub-id><pub-id pub-id-type="pmid">35145616</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gnevsheva</surname> <given-names>K.</given-names></name></person-group> (<year>2018</year>). <article-title>The expectation mismatch effect in accentedness perception of Asian and Caucasian non-native speakers of English</article-title>. <source>Linguistics</source> <volume>56</volume>, <fpage>581</fpage>&#x02013;<lpage>598</lpage>. <pub-id pub-id-type="doi">10.1515/ling-2018-0006</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hampton</surname> <given-names>T.</given-names></name> <name><surname>Crunkhorn</surname> <given-names>R.</given-names></name> <name><surname>Lowe</surname> <given-names>N.</given-names></name> <name><surname>Bhat</surname> <given-names>J.</given-names></name> <name><surname>Hogg</surname> <given-names>E.</given-names></name> <name><surname>Afifi</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>The negative impact of wearing personal protective equipment on communication during coronavirus disease 2019</article-title>. <source>J. Laryngol. Otol.</source> <volume>134</volume>, <fpage>577</fpage>&#x02013;<lpage>581</lpage>. <pub-id pub-id-type="doi">10.1017/S0022215120001437</pub-id><pub-id pub-id-type="pmid">32641175</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hanul&#x000ED;kov&#x000E1;</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>The effect of perceived ethnicity on spoken text comprehension under clear and adverse listening conditions</article-title>. <source>Linguistics Vanguard</source> <volume>4</volume>, <fpage>20170029</fpage>. <pub-id pub-id-type="doi">10.1515/lingvan-2017-0029</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hay</surname> <given-names>J.</given-names></name> <name><surname>Nolan</surname> <given-names>A.</given-names></name> <name><surname>Drager</surname> <given-names>K.</given-names></name></person-group> (<year>2006</year>). <article-title>From fush to feesh: exemplar priming in speech perception</article-title>. <source>Linguistic Rev.</source> <volume>23</volume>, <fpage>351</fpage>&#x02013;<lpage>379</lpage>. <pub-id pub-id-type="doi">10.1515/TLR.2006.014</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hazan</surname> <given-names>V.</given-names></name> <name><surname>Uther</surname> <given-names>M.</given-names></name> <name><surname>Grunland</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;How does foreigner-directed speech differ from other forms of listener-directed clear speaking styles?,&#x0201D;</article-title> in <source>Proceedings of ICPhS 2015. 18th International Congress of Phonetic Sciences</source> (<publisher-loc>Glasgow</publisher-loc>: <publisher-name>University of Glasgow</publisher-name>).</citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heald</surname> <given-names>S.</given-names></name> <name><surname>Nusbaum</surname> <given-names>H.</given-names></name></person-group> (<year>2014</year>). <article-title>Speech perception as an active cognitive process</article-title>. <source>Front. Syst. Neurosci.</source> <volume>8</volume>, <fpage>35</fpage>. <pub-id pub-id-type="doi">10.3389/fnsys.2014.00035</pub-id><pub-id pub-id-type="pmid">24672438</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ingvalson</surname> <given-names>E. M.</given-names></name> <name><surname>Lansford</surname> <given-names>K. L.</given-names></name> <name><surname>Federova</surname> <given-names>V.</given-names></name> <name><surname>Fernandez</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>Listeners&#x00027; attitudes toward accented talkers uniquely predicts accented speech perception</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>141</volume>, <fpage>EL234</fpage>&#x02013;<lpage>EL238</lpage>. <pub-id pub-id-type="doi">10.1121/1.4977583</pub-id><pub-id pub-id-type="pmid">28372098</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Junqua</surname> <given-names>J.</given-names></name></person-group> (<year>1993</year>). <article-title>The Lombard reflex and its role on human listeners and automatic speech recognizers</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>93</volume>, <fpage>510</fpage>&#x02013;<lpage>524</lpage>. <pub-id pub-id-type="doi">10.1121/1.405631</pub-id><pub-id pub-id-type="pmid">8423266</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kalikow</surname> <given-names>D. N.</given-names></name> <name><surname>Stevens</surname> <given-names>K. N.</given-names></name> <name><surname>Elliott</surname> <given-names>L. L.</given-names></name></person-group> (<year>1977</year>). <article-title>Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>61</volume>, <fpage>1337</fpage>&#x02013;<lpage>1351</lpage>. <pub-id pub-id-type="doi">10.1121/1.381436</pub-id><pub-id pub-id-type="pmid">881487</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kang</surname> <given-names>O.</given-names></name> <name><surname>Rubin</surname> <given-names>D. L.</given-names></name></person-group> (<year>2009</year>). <article-title>Reverse linguistic stereotyping: measuring the effect of listener expectations on speech evaluation</article-title>. <source>J. Lang. Soc. Psychol.</source> <volume>28</volume>, <fpage>441</fpage>&#x02013;<lpage>456</lpage>. <pub-id pub-id-type="doi">10.1177/0261927X09341950</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kutlu</surname> <given-names>E.</given-names></name></person-group> (<year>2020</year>). <article-title>Now you see me, now you mishear me: Raciolinguistic accounts of speech perception in different English varieties</article-title>. <source>J. Multilingual Multicult. Dev</source>. <pub-id pub-id-type="doi">10.1080/01434632.2020.1835929</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lindblom</surname> <given-names>B.</given-names></name></person-group> (<year>1990</year>). <article-title>&#x0201C;Explaining phonetic variation: a sketch of the H&#x00026;H theory,&#x0201D;</article-title> in <source>Speech Production and Speech Modelling</source>, <person-group person-group-type="editor"><name><surname>Hardcastle</surname> <given-names>W. J.</given-names></name> <name><surname>Marchal</surname> <given-names>A.</given-names></name></person-group> editors (<publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>403</fpage>&#x02013;<lpage>439</lpage>.</citation>
</ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lippi-Green</surname> <given-names>R.</given-names></name></person-group> (<year>2011</year>). <source>English With an Accent: Language, Ideology, and Discrimination in the United States</source>. <edition>2nd Edn</edition>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Routledge</publisher-name>.</citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lombard</surname> <given-names>&#x000C9;.</given-names></name></person-group> (<year>1911</year>). <article-title>Le signe de l&#x00027;&#x000E9;l&#x000E9;vation de la voix</article-title>. <source>Annales Des Maladies de l&#x00027;Oreille et Du Larynx</source> <volume>37</volume>, <fpage>101</fpage>&#x02013;<lpage>119</lpage>.</citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>Y.</given-names></name> <name><surname>Cooke</surname> <given-names>M.</given-names></name></person-group> (<year>2008</year>). <article-title>Speech production modifications produced by competing talkers, babble, and stationary noise</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>124</volume>, <fpage>3261</fpage>&#x02013;<lpage>3275</lpage>. <pub-id pub-id-type="doi">10.1121/1.2990705</pub-id><pub-id pub-id-type="pmid">19045809</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Magee</surname> <given-names>M.</given-names></name> <name><surname>Lewis</surname> <given-names>C.</given-names></name> <name><surname>Noffs</surname> <given-names>G.</given-names></name> <name><surname>Reece</surname> <given-names>H.</given-names></name> <name><surname>Chan</surname> <given-names>J. C. S.</given-names></name> <name><surname>Zaga</surname> <given-names>C. J.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Effects of face masks on acoustic analysis and speech perception: Implications for peri-pandemic protocols</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>148</volume>, <fpage>3562</fpage>&#x02013;<lpage>3568</lpage>. <pub-id pub-id-type="doi">10.1121/10.0002873</pub-id><pub-id pub-id-type="pmid">33379897</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McAuliffe</surname> <given-names>M.</given-names></name> <name><surname>Socolof</surname> <given-names>M.</given-names></name> <name><surname>Mihuc</surname> <given-names>S.</given-names></name> <name><surname>Wagner</surname> <given-names>M.</given-names></name> <name><surname>Sonderegger</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Montreal forced aligner: trainable text-speech alignment using Kaldi</article-title>. <source>Interspeech</source> <volume>2017</volume>, <fpage>498</fpage>&#x02013;<lpage>502</lpage>. <pub-id pub-id-type="doi">10.21437/Interspeech.2017-1386</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McGowan</surname> <given-names>K. B.</given-names></name></person-group> (<year>2015</year>). <article-title>Social expectation improves speech perception in noise</article-title>. <source>Lang. Speech</source> <volume>58</volume>, <fpage>502</fpage>&#x02013;<lpage>521</lpage>. <pub-id pub-id-type="doi">10.1177/0023830914565191</pub-id><pub-id pub-id-type="pmid">27483742</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McLaughlin</surname> <given-names>D. J.</given-names></name> <name><surname>Brown</surname> <given-names>V. A.</given-names></name> <name><surname>Carraturo</surname> <given-names>S.</given-names></name> <name><surname>Van Engen</surname> <given-names>K. J.</given-names></name></person-group> (<year>2022</year>). <article-title>Revisiting the relationship between implicit racial bias and audiovisual benefit for nonnative-accented speech</article-title>. <source>Atten. Percept. Psychophys</source>. <pub-id pub-id-type="doi">10.3758/s13414-021-02423-w</pub-id><pub-id pub-id-type="pmid">34988904</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Niedzielski</surname> <given-names>N.</given-names></name></person-group> (<year>1999</year>). <article-title>The effect of social information on the perception of sociolinguistic variables</article-title>. <source>J. Lang. Soc. Psychol.</source> <volume>18</volume>, <fpage>62</fpage>&#x02013;<lpage>85</lpage>. <pub-id pub-id-type="doi">10.1177/0261927X99018001005</pub-id><pub-id pub-id-type="pmid">34337391</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Palmiero</surname> <given-names>A. J.</given-names></name> <name><surname>Symons</surname> <given-names>D.</given-names></name> <name><surname>Morgan</surname> <given-names>J. W.</given-names></name> <name><surname>Shaffer</surname> <given-names>R. E.</given-names></name></person-group> (<year>2016</year>). <article-title>Speech intelligibility assessment of protective facemasks and air-purifying respirators</article-title>. <source>J. Occup. Environ. Hyg.</source> <volume>13</volume>, <fpage>960</fpage>&#x02013;<lpage>968</lpage>. <pub-id pub-id-type="doi">10.1080/15459624.2016.1200723</pub-id><pub-id pub-id-type="pmid">27362358</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pichora-Fuller</surname> <given-names>M. K.</given-names></name> <name><surname>Schneider</surname> <given-names>B. A.</given-names></name> <name><surname>Daneman</surname> <given-names>M.</given-names></name></person-group> (<year>1995</year>). <article-title>How young and old adults listen to and remember speech in noise</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>97</volume>, <fpage>593</fpage>&#x02013;<lpage>608</lpage>. <pub-id pub-id-type="doi">10.1121/1.412282</pub-id><pub-id pub-id-type="pmid">7860836</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pick</surname> <given-names>H. L.</given-names></name> <name><surname>Siegel</surname> <given-names>G. M.</given-names></name> <name><surname>Fox</surname> <given-names>P. W.</given-names></name> <name><surname>Garber</surname> <given-names>S. R.</given-names></name> <name><surname>Kearney</surname> <given-names>J. K.</given-names></name></person-group> (<year>1989</year>). <article-title>Inhibiting the Lombard effect</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>85</volume>, <fpage>894</fpage>&#x02013;<lpage>900</lpage>. <pub-id pub-id-type="doi">10.1121/1.397561</pub-id><pub-id pub-id-type="pmid">2926004</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quen&#x000E9;</surname> <given-names>H.</given-names></name> <name><surname>van Delft</surname> <given-names>L. E.</given-names></name></person-group> (<year>2010</year>). <article-title>Non-native durational patterns decrease speech intelligibility</article-title>. <source>Speech Commun.</source> <volume>52</volume>, <fpage>911</fpage>&#x02013;<lpage>918</lpage>. <pub-id pub-id-type="doi">10.1016/j.specom.2010.03.005</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rabbitt</surname> <given-names>P.</given-names></name></person-group> (<year>1966</year>). <article-title>Recognition: Memory for words correctly heard in noise</article-title>. <source>Psychon. Sci.</source> <volume>6</volume>, <fpage>383</fpage>&#x02013;<lpage>384</lpage>. <pub-id pub-id-type="doi">10.3758/BF03330948</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rabbitt</surname> <given-names>P.</given-names></name></person-group> (<year>1968</year>). <article-title>Channel-capacity, intelligibility and immediate memory</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>20</volume>, <fpage>241</fpage>&#x02013;<lpage>248</lpage>. <pub-id pub-id-type="doi">10.1080/14640746808400158</pub-id><pub-id pub-id-type="pmid">5683763</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rogers</surname> <given-names>C. L.</given-names></name> <name><surname>Lister</surname> <given-names>J. J.</given-names></name> <name><surname>Febo</surname> <given-names>D. M.</given-names></name> <name><surname>Besing</surname> <given-names>J. M.</given-names></name> <name><surname>Abrams</surname> <given-names>H. B.</given-names></name></person-group> (<year>2006</year>). <article-title>Effects of bilingualism, noise, and reverberation on speech perception by listeners with normal hearing</article-title>. <source>Appl. Psycholinguist.</source> <volume>27</volume>, <fpage>465</fpage>&#x02013;<lpage>485</lpage>. <pub-id pub-id-type="doi">10.1017/S014271640606036X</pub-id><pub-id pub-id-type="pmid">28534731</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ross</surname> <given-names>L. A.</given-names></name> <name><surname>Saint-Amour</surname> <given-names>D.</given-names></name> <name><surname>Leavitt</surname> <given-names>V. M.</given-names></name> <name><surname>Javitt</surname> <given-names>D. C.</given-names></name> <name><surname>Foxe</surname> <given-names>J. J.</given-names></name></person-group> (<year>2007</year>). <article-title>Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments</article-title>. <source>Cerebral Cortex</source> <volume>17</volume>, <fpage>1147</fpage>&#x02013;<lpage>1153</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhl024</pub-id><pub-id pub-id-type="pmid">16785256</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rubin</surname> <given-names>D. L.</given-names></name></person-group> (<year>1992</year>). <article-title>Nonlanguage factors affecting undergraduates&#x00027; judgments of nonnative English-speaking teaching assistants</article-title>. <source>Res. High. Educ.</source> <volume>33</volume>, <fpage>511</fpage>&#x02013;<lpage>531</lpage>. <pub-id pub-id-type="doi">10.1007/BF00973770</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rubin</surname> <given-names>D. L.</given-names></name> <name><surname>Smith</surname> <given-names>K. A.</given-names></name></person-group> (<year>1990</year>). <article-title>Effects of accent, ethnicity, and lecture topic on undergraduates&#x00027; perceptions of nonnative English-speaking teaching assistants</article-title>. <source>Int. J. Intercult. Relat.</source> <volume>14</volume>, <fpage>337</fpage>&#x02013;<lpage>353</lpage>. <pub-id pub-id-type="doi">10.1016/0147-1767(90)90019-S</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rudner</surname> <given-names>M.</given-names></name> <name><surname>Lunner</surname> <given-names>T.</given-names></name> <name><surname>Behrens</surname> <given-names>T.</given-names></name> <name><surname>Thor&#x000E9;n</surname> <given-names>E. S.</given-names></name> <name><surname>R&#x000F6;nnberg</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>Working memory capacity may influence perceived effort during aided speech recognition in noise</article-title>. <source>J. Am. Acad. Audiol.</source> <volume>23</volume>, <fpage>577</fpage>&#x02013;<lpage>589</lpage>. <pub-id pub-id-type="doi">10.3766/jaaa.23.7.7</pub-id><pub-id pub-id-type="pmid">22967733</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sarampalis</surname> <given-names>A.</given-names></name> <name><surname>Kalluri</surname> <given-names>S.</given-names></name> <name><surname>Edwards</surname> <given-names>B.</given-names></name> <name><surname>Hafter</surname> <given-names>E.</given-names></name></person-group> (<year>2009</year>). <article-title>Objective measures of listening effort: effects of background noise and noise reduction</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>52</volume>, <fpage>1230</fpage>&#x02013;<lpage>1240</lpage>. <pub-id pub-id-type="doi">10.1044/1092-4388(2009/08-0111)</pub-id><pub-id pub-id-type="pmid">19380604</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saunders</surname> <given-names>G. H.</given-names></name> <name><surname>Jackson</surname> <given-names>I. R.</given-names></name> <name><surname>Visram</surname> <given-names>A. S.</given-names></name></person-group> (<year>2021</year>). <article-title>Impacts of face coverings on communication: an indirect impact of COVID-19</article-title>. <source>Int. J. Audiol.</source> <volume>60</volume>, <fpage>495</fpage>&#x02013;<lpage>506</lpage>. <pub-id pub-id-type="doi">10.1080/14992027.2020.1851401</pub-id><pub-id pub-id-type="pmid">33246380</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scarborough</surname> <given-names>R.</given-names></name> <name><surname>Zellou</surname> <given-names>G.</given-names></name></person-group> (<year>2013</year>). <article-title>Clarity in communication: &#x0201C;Clear&#x0201D; speech authenticity and lexical neighborhood density effects in speech production and perception</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>134</volume>, <fpage>3793</fpage>&#x02013;<lpage>3807</lpage>. <pub-id pub-id-type="doi">10.1121/1.4824120</pub-id><pub-id pub-id-type="pmid">24180789</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwarz</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>K.</given-names></name> <name><surname>Sim</surname> <given-names>J. H.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Buchanan-Worster</surname> <given-names>E.</given-names></name> <name><surname>Post</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Speech perception through face masks by children and adults</article-title>. <source>Cambridge Language Sciences Annual Symposium.</source> <pub-id pub-id-type="doi">10.33774/coe-2021-l88qk</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smiljani&#x00107;</surname> <given-names>R.</given-names></name> <name><surname>Bradlow</surname> <given-names>A. R.</given-names></name></person-group> (<year>2009</year>). <article-title>Speaking and hearing clearly: talker and listener factors in speaking style changes</article-title>. <source>Lang. Linguist. Compass</source> <volume>3</volume>, <fpage>236</fpage>&#x02013;<lpage>264</lpage>. <pub-id pub-id-type="doi">10.1111/j.1749-818X.2008.00112.x</pub-id><pub-id pub-id-type="pmid">20046964</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smiljani&#x00107;</surname> <given-names>R.</given-names></name> <name><surname>Keerstock</surname> <given-names>S.</given-names></name> <name><surname>Meemann</surname> <given-names>K.</given-names></name> <name><surname>Ransom</surname> <given-names>S. M.</given-names></name></person-group> (<year>2021</year>). <article-title>Face masks and speaking style affect audio-visual word recognition and memory of native and non-native speech</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>149</volume>, <fpage>4013</fpage>&#x02013;<lpage>4023</lpage>. <pub-id pub-id-type="doi">10.1121/10.0005191</pub-id><pub-id pub-id-type="pmid">34241444</pub-id></citation></ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strand</surname> <given-names>J. F.</given-names></name> <name><surname>Brown</surname> <given-names>V. A.</given-names></name> <name><surname>Merchant</surname> <given-names>M. B.</given-names></name> <name><surname>Brown</surname> <given-names>H. E.</given-names></name> <name><surname>Smith</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Measuring listening effort: Convergent validity, sensitivity, and links with cognitive and personality measures</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>61</volume>, <fpage>1463</fpage>&#x02013;<lpage>1486</lpage>. <pub-id pub-id-type="doi">10.1044/2018_JSLHR-H-17-0257</pub-id><pub-id pub-id-type="pmid">29800081</pub-id></citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sumby</surname> <given-names>W. H.</given-names></name> <name><surname>Pollack</surname> <given-names>I.</given-names></name></person-group> (<year>1954</year>). <article-title>Visual contribution to speech intelligibility in noise</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>26</volume>, <fpage>212</fpage>&#x02013;<lpage>215</lpage>. <pub-id pub-id-type="doi">10.1121/1.1907309</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Toscano</surname> <given-names>J. C.</given-names></name> <name><surname>Toscano</surname> <given-names>C. M.</given-names></name></person-group> (<year>2021</year>). <article-title>Effects of face masks on speech recognition in multi-talker babble noise</article-title>. <source>PLoS ONE</source> <volume>16</volume>, <fpage>e0246842</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0246842</pub-id><pub-id pub-id-type="pmid">33626073</pub-id></citation></ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Truong</surname> <given-names>T. L.</given-names></name> <name><surname>Weber</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>Intelligibility and recall of sentences spoken by adult and child talkers wearing face masks</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>150</volume>, <fpage>1674</fpage>&#x02013;<lpage>1681</lpage>. <pub-id pub-id-type="doi">10.1121/10.0006098</pub-id><pub-id pub-id-type="pmid">34598631</pub-id></citation></ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van Engen</surname> <given-names>K. J.</given-names></name> <name><surname>Peelle</surname> <given-names>J. E.</given-names></name></person-group> (<year>2014</year>). <article-title>Listening effort and accented speech</article-title>. <source>Front. Hum. Neurosci.</source> <volume>8</volume>, <fpage>577</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2014.00577</pub-id><pub-id pub-id-type="pmid">25140140</pub-id></citation></ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Wijngaarden</surname> <given-names>S. J.</given-names></name> <name><surname>Steeneken</surname> <given-names>H. J. M.</given-names></name> <name><surname>Houtgast</surname> <given-names>T.</given-names></name></person-group> (<year>2002</year>). <article-title>Quantifying the intelligibility of speech in noise for non-native listeners</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>111</volume>, <fpage>1906</fpage>&#x02013;<lpage>1916</lpage>. <pub-id pub-id-type="doi">10.1121/1.1456928</pub-id><pub-id pub-id-type="pmid">12002873</pub-id></citation></ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yi</surname> <given-names>H.</given-names></name> <name><surname>Pingsterhaus</surname> <given-names>A.</given-names></name> <name><surname>Song</surname> <given-names>W.</given-names></name></person-group> (<year>2021</year>). <article-title>Effects of wearing face masks while using different speaking styles in noise on speech intelligibility during the COVID-19 pandemic</article-title>. <source>Front. Psychol.</source> <volume>12</volume>, <fpage>682677</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2021.682677</pub-id><pub-id pub-id-type="pmid">34295288</pub-id></citation></ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yi</surname> <given-names>H.-G.</given-names></name> <name><surname>Phelps</surname> <given-names>J. E. B.</given-names></name> <name><surname>Smiljani&#x00107;</surname> <given-names>R.</given-names></name> <name><surname>Chandrasekaran</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>Reduced efficiency of audiovisual integration for nonnative speech</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>134</volume>, <fpage>EL387</fpage>&#x02013;<lpage>EL393</lpage>. <pub-id pub-id-type="doi">10.1121/1.4822320</pub-id><pub-id pub-id-type="pmid">24181980</pub-id></citation></ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zekveld</surname> <given-names>A. A.</given-names></name> <name><surname>Kramer</surname> <given-names>S. E.</given-names></name> <name><surname>Festen</surname> <given-names>J. M.</given-names></name></person-group> (<year>2010</year>). <article-title>Pupil response as an indication of effortful listening: the influence of sentence intelligibility</article-title>. <source>Ear Hear.</source> <volume>31</volume>, <fpage>480</fpage>&#x02013;<lpage>490</lpage>. <pub-id pub-id-type="doi">10.1097/AUD.0b013e3181d4f251</pub-id><pub-id pub-id-type="pmid">20588118</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>Excluding problematic sentences with the keywords &#x0201C;slave&#x0201D; and &#x0201C;clan&#x0201D;.</p></fn>
<fn id="fn0002"><p><sup>2</sup>The microphone was located outside the mask, at the same distance from the mouth in the face-masked and non-face-masked conditions.</p></fn>
<fn id="fn0003"><p><sup>3</sup>&#x0201C;The normative auditory threshold for a 1000-Hz sine wave&#x0201D;, per Praat documentation. Therefore, sound pressures below 2 &#x000D7; 10<sup>&#x02212;5</sup> Pa yield negative dB values.</p></fn>
<fn id="fn0004"><p><sup>4</sup>Note that while we normalized the intensity of all sound files to 60 dB relative to Praat&#x00027;s default reference level (2 &#x000D7; 10<sup>&#x02212;5</sup> Pa), the actual playback levels varied across participants&#x00027; machines because this was an at-home, online experiment.</p></fn>
<fn id="fn0005"><p><sup>5</sup>Note that by-Sentence random intercepts were not included because the sentences were pseudorandomized: each sentence was always associated with a particular Visual Information, Speaking Style, and Face-masking Condition across the versions, so the assignment of sentences to conditions was not random.</p></fn>
</fn-group>
</back>
</article>