<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Commun.</journal-id>
<journal-title>Frontiers in Communication</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Commun.</abbrev-journal-title>
<issn pub-type="epub">2297-900X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fcomm.2022.896013</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Communication</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Production and Perception of Mandarin Laryngeal Contrast: The Role of Post-plosive F0</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Guo</surname> <given-names>Yuting</given-names></name>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1803982/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Kwon</surname> <given-names>Harim</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1122466/overview"/>
</contrib>
</contrib-group>
<aff><institution>Linguistics Program, Department of English, George Mason University</institution>, <addr-line>Fairfax, VA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Georgia Zellou, University of California, Davis, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Jiayin Gao, Ludwig Maximilian University of Munich, Germany; James Kirby, Ludwig Maximilian University of Munich, Germany</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Harim Kwon  <email>hkwon20&#x00040;gmu.edu</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Language Sciences, a section of the journal Frontiers in Communication</p></fn>
<fn fn-type="equal" id="fn002"><p>&#x02020;These authors share first authorship</p></fn></author-notes>
<pub-date pub-type="epub">
<day>17</day>
<month>06</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>7</volume>
<elocation-id>896013</elocation-id>
<history>
<date date-type="received">
<day>14</day>
<month>03</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>05</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Guo and Kwon.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Guo and Kwon</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>This study examines the relation between plosive aspiration and post-plosive f0 (fundamental frequency) in the production and perception of the laryngeal contrast in Mandarin. Production data from 25 Mandarin speakers showed that, in word onsets, VOTs (voice onset time) of aspirated and unaspirated plosives were different, as expected. At the same time, the speakers produced different post-plosive f0 between aspirated and unaspirated plosives, but the difference varied according to the lexical tones &#x02013; post-aspirated f0 was higher than post-unaspirated f0 in high-initial tones (i.e., lexical tones with high onset f0), but the pattern was the opposite and less robust in low-initial tones. In the perception of the same participants, VOT was the primary cue to aspiration but, when VOT was ambiguous, high post-plosive f0 yielded more aspirated responses in general. We claim that the asymmetry in f0 perturbation between high-initial and low-initial tones in production arises from different laryngeal maneuvers for different tonal targets. In low-initial tones, in which the vocal folds are slack and the glottal opening is wider, aspirated plosives have a lower subglottal air pressure than unaspirated plosives at the voicing onset, resulting in lower post-aspirated f0 than post-unaspirated f0. But in high-initial tones, the vocal folds are tense, which requires a higher trans-glottal pressure threshold to initiate phonation at the onset of voicing. As a result, the subglottal pressure does not decrease as much. Instead, the faster airflow in aspirated than unaspirated plosives gives rise to the pattern that post-aspirated f0 is higher than post-unaspirated f0. Regardless of this variation in production, our perception data suggest that Mandarin listeners generalize the f0 perturbation patterns from high-initial tones and associate high post-plosive f0 with aspirated plosives even in low-initial tone contexts. 
We cautiously claim that the observed perceptual pattern is consistent with the robustly represented production pattern, as high-initial tones are more prevalent and salient in the language and exhibit stronger f0 perturbation in the speakers&#x00027; productions.</p></abstract>
<kwd-group>
<kwd>Mandarin Chinese</kwd>
<kwd>laryngeal contrast</kwd>
<kwd>aspiration</kwd>
<kwd>fundamental frequency (f0)</kwd>
<kwd>production-perception relation</kwd>
<kwd>secondary cue</kwd>
</kwd-group>
<contract-sponsor id="cn001">George Mason University<named-content content-type="fundref-id">10.13039/100006369</named-content></contract-sponsor>
<counts>
<fig-count count="3"/>
<table-count count="5"/>
<equation-count count="0"/>
<ref-count count="56"/>
<page-count count="15"/>
<word-count count="13427"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<sec>
<title>F0 Perturbation</title>
<p>Laryngeal properties (such as voicing or aspiration) of onset plosives influence the fundamental frequency, or f0, at the onset of the following vowels. This phenomenon, commonly referred to as f0 perturbation, has been widely attested across languages, such as Cantonese (Francis et al., <xref ref-type="bibr" rid="B14">2006</xref>; Luo, <xref ref-type="bibr" rid="B37">2018</xref>), Dutch (L&#x000F6;fqvist et al., <xref ref-type="bibr" rid="B36">1989</xref>), English (House and Fairbanks, <xref ref-type="bibr" rid="B24">1953</xref>; Lehiste and Peterson, <xref ref-type="bibr" rid="B33">1961</xref>; Hombert et al., <xref ref-type="bibr" rid="B20">1979</xref>; Ohde, <xref ref-type="bibr" rid="B44">1984</xref>; L&#x000F6;fqvist et al., <xref ref-type="bibr" rid="B36">1989</xref>; Hanson, <xref ref-type="bibr" rid="B19">2009</xref>), French (Kirby and Ladd, <xref ref-type="bibr" rid="B29">2016</xref>), German (Kohler, <xref ref-type="bibr" rid="B31">1982</xref>; Hoole and Honda, <xref ref-type="bibr" rid="B23">2011</xref>), Italian (Kirby and Ladd, <xref ref-type="bibr" rid="B29">2016</xref>), Japanese (Gao and Arai, <xref ref-type="bibr" rid="B16">2019</xref>), Khmer (Kirby, <xref ref-type="bibr" rid="B28">2018</xref>), Mandarin (Xu and Xu, <xref ref-type="bibr" rid="B54">2003</xref>; Luo, <xref ref-type="bibr" rid="B37">2018</xref>), Russian (Mohr, <xref ref-type="bibr" rid="B40">1971</xref>), Spanish (Dmitrieva et al., <xref ref-type="bibr" rid="B12">2015</xref>), Thai (Gandour, <xref ref-type="bibr" rid="B15">1974</xref>; Kirby, <xref ref-type="bibr" rid="B28">2018</xref>), Vietnamese (Kirby, <xref ref-type="bibr" rid="B28">2018</xref>), Xhosa (Jessen and Roux, <xref ref-type="bibr" rid="B25">2002</xref>), Yoruba (Hombert et al., <xref ref-type="bibr" rid="B20">1979</xref>), among others. 
The most commonly reported pattern is that a (phonologically) voiced plosive has a lower post-plosive f0 than a (phonologically) voiceless one. Beyond this general pattern, several observations are worth noting.</p>
<p>First, f0 perturbation occurs in so-called true voicing languages and in aspirating languages alike. That is, it seems less relevant whether the language contrasts prevoiced vs. voiceless unaspirated categories or unaspirated vs. aspirated categories. For example, both Spanish and English show similar f0 perturbation (Dmitrieva et al., <xref ref-type="bibr" rid="B12">2015</xref>). This might be because English unaspirated plosives are phonologically voiced (Kingston and Diehl, <xref ref-type="bibr" rid="B27">1994</xref>; Hanson, <xref ref-type="bibr" rid="B19">2009</xref>). However, findings on languages with a three-way laryngeal contrast (prevoiced vs. unaspirated vs. aspirated) suggest that the difference between unaspirated and aspirated categories cannot entirely be reduced to phonological voicing. For example, Kirby (<xref ref-type="bibr" rid="B28">2018</xref>) examines Khmer, Vietnamese, and Thai, all with the three-way contrast, and finds that aspirated plosives are followed by a higher f0 than the unaspirated ones, at least for some speakers in all three languages. This provides evidence for the bona fide effects of consonantal aspiration (or the lack thereof) on the following f0.</p>
<p>Although the commonly reported pattern is that voiceless (or aspirated) plosives have a higher post-plosive f0 than voiced (or unaspirated) ones, this is not always the case. For example, Xu and Xu (<xref ref-type="bibr" rid="B54">2003</xref>) report that, in Mandarin, f0 is lower after aspirated plosives than after unaspirated plosives because aspiration causes the subglottal air pressure to decrease sharply, lowering f0 at the release of the plosives. However, Luo (<xref ref-type="bibr" rid="B37">2018</xref>) reports the opposite finding: aspiration in Mandarin raises f0 quite robustly. The cause of this discrepancy is unclear (see the section F0 Perturbation in Mandarin).</p>
<p>Second, f0 perturbation is attested in both tonal and non-tonal languages, although the effects are less robust in tonal languages. For instance, the f0 differences between English unaspirated and aspirated series can last more than 100 ms after the voicing onset, whereas they last 40&#x02013;60 ms in Yoruba, a tonal language (Hombert et al., <xref ref-type="bibr" rid="B20">1979</xref>). Other studies on tonal languages (e.g., Chen, <xref ref-type="bibr" rid="B7">2011</xref>, on Shanghainese; Gandour, <xref ref-type="bibr" rid="B15">1974</xref>, on Thai; Francis et al., <xref ref-type="bibr" rid="B14">2006</xref>, on Cantonese; Xu and Xu, <xref ref-type="bibr" rid="B54">2003</xref>, on Mandarin) also suggest that f0 perturbation is limited to the very onset of the vowel and that its exact duration is determined by the tonal contexts. Furthermore, Kirby (<xref ref-type="bibr" rid="B28">2018</xref>) reports that in Thai and Vietnamese, the perturbation effect is clearly observed in citation forms, but not in connected speech. This indicates that the effects of f0 perturbation may interact not only with tonal contexts but also with sentence-level prosody. See also Hanson (<xref ref-type="bibr" rid="B19">2009</xref>), Chen (<xref ref-type="bibr" rid="B7">2011</xref>), and Xu and Xu (<xref ref-type="bibr" rid="B54">2003</xref>), for similar effects in English, Shanghainese, and Mandarin, respectively.</p>
<p>Third, though the magnitude of the f0 perturbation is quite small (ranging 8&#x02013;16 Hz in different languages, Table 1 in Coetzee et al., <xref ref-type="bibr" rid="B10">2018</xref>), listeners use the f0 at the vowel onset to determine the preceding consonant&#x00027;s laryngeal category across different languages. English listeners, for instance, use f0 as a cue to a consonant&#x00027;s laryngeal category not only when VOT, the phonetic property that is primarily responsible for the laryngeal contrast, is ambiguous (e.g., Whalen et al., <xref ref-type="bibr" rid="B50">1990</xref>), but also when it is not ambiguous (e.g., Whalen et al., <xref ref-type="bibr" rid="B51">1993</xref>). Even in tonal languages, in which f0 is primarily responsible for carrying tonal information, and the perturbation, if any, is less consistent and temporally limited, post-plosive f0 influences listeners&#x00027; perceptual judgments of the onset plosive&#x00027;s laryngeal category. For example, Francis et al. (<xref ref-type="bibr" rid="B14">2006</xref>) report that falling f0 contours at the onset of a high-level tone signal aspirated plosives to Cantonese listeners and this perceptual pattern does not match the f0 patterns in Cantonese plosive productions. They claim that the use of post-plosive f0 as a consonantal cue, therefore, does not originate from the experience of hearing the covarying VOT and f0. Rather, Cantonese listeners&#x00027; perception shows the influence of the language-independent, general auditory enhancing effects among different phonetic properties (Kingston and Diehl, <xref ref-type="bibr" rid="B27">1994</xref>; Francis et al., <xref ref-type="bibr" rid="B14">2006</xref>).</p>
<p>Despite the universality of the phenomenon, the source of f0 perturbation is controversial. Some have argued that f0 perturbation is a physiological or physical epiphenomenon of consonantal voicing or aspiration (e.g., Hombert et al., <xref ref-type="bibr" rid="B20">1979</xref>; L&#x000F6;fqvist et al., <xref ref-type="bibr" rid="B36">1989</xref>). Several different hypotheses have been offered on the exact mechanism of f0 perturbation. First, the aerodynamic hypothesis claims that voiced plosives differ from voiceless ones in how air pressure changes during and after their oral closure, leading to differing f0 after the release. In the case of voiced plosives, supraglottal air pressure gradually builds up during the closure because voicing requires a continuous airflow through the glottis. This results in a decrease in the trans-glottal air pressure difference, which in turn leads to a decrease in f0. On the other hand, voiceless plosives have a greater volume of airflow from subglottal to supraglottal cavities upon the release, resulting in faster vocal fold vibration (but see also Xu and Xu, <xref ref-type="bibr" rid="B54">2003</xref>). Another hypothesis claims that f0 perturbation arises from the states of vocal folds during plosive voicing (e.g., Halle and Stevens, <xref ref-type="bibr" rid="B17">1971</xref>; L&#x000F6;fqvist et al., <xref ref-type="bibr" rid="B36">1989</xref>). During the plosive closure, the vocal folds remain slack for voiced plosives whereas they are stiff for voiceless plosives to halt the vibration. The tension of the vocal folds influences the f0 of the flanking vowels, such that slack vocal folds lower, and stiff vocal folds raise, the rate of their vibration. Still another hypothesis claims that f0 perturbation is due to the larynx height difference between the voiced and voiceless plosives (e.g., Honda, <xref ref-type="bibr" rid="B21">2004</xref>). 
To allow for vocal fold vibration during the closure, the larynx is lower for voiced plosives than for voiceless ones. As the larynx height is usually positively correlated with f0, voiced plosives have lower post-plosive f0 than voiceless ones.</p>
<p>Despite the differences in their exact mechanisms, these hypotheses commonly suggest that the effects of plosive voicing (or voicelessness) on the following f0 are automatic and determined by the biomechanics of the larynx. In contrast, it has also been claimed that speakers actively induce the f0 differences to enhance the phonological contrast (e.g., Kingston and Diehl, <xref ref-type="bibr" rid="B27">1994</xref>; Kingston, <xref ref-type="bibr" rid="B26">2007</xref>). Under this phonological hypothesis, post-plosive f0 is not a mere by-product of sustaining voicing during the plosive closure or aspiration after the plosive release. Rather, speakers enhance the phonological laryngeal contrast by enhancing covarying phonetic properties. This results in the plosives of different laryngeal categories having distinct post-plosive f0, prolonged beyond the very beginning of the vowel. Therefore, this hypothesis can readily explain why the languages that contrast prevoiced and voiceless plosives (e.g., Spanish) and those contrasting aspirated and unaspirated plosives (e.g., English) show similar f0 perturbation patterns. In addition, in tonal languages, speakers would not enhance consonantal contrast using post-plosive f0 because f0 plays a central role in conveying lexical (or grammatical) information (Francis et al., <xref ref-type="bibr" rid="B14">2006</xref>).</p>
<p>As pointed out in previous research (e.g., Chen, <xref ref-type="bibr" rid="B7">2011</xref>; Hoole and Honda, <xref ref-type="bibr" rid="B23">2011</xref>; Dmitrieva et al., <xref ref-type="bibr" rid="B12">2015</xref>), these two views, automatic vs. phonological, are not incompatible with each other. In fact, it is possible that the biomechanical factors determine the connection between the voicing and f0, which serves as the resource for speakers to use as an enhancement strategy for plosive laryngeal contrast. Building on this previous conversation on f0 perturbation, this study asks how speakers of a tonal language use post-plosive f0 as a consonantal cue. Focusing on the relation between plosive aspiration and post-plosive f0, we investigate the production and perception of Mandarin word-initial plosives in different tonal contexts. The rest of the introduction will briefly review the relevant background on Mandarin and present the main questions for the two experiments.</p>
</sec>
<sec>
<title>F0 in Mandarin</title>
<sec>
<title>Lexical Tones</title>
<p>Mandarin has four lexical tones, typically described as high-level (Tone 1), rising (Tone 2), low-dipping (Tone 3), and falling (Tone 4) (e.g., Xu, <xref ref-type="bibr" rid="B55">1997</xref>; Duanmu, <xref ref-type="bibr" rid="B13">2007</xref>). In this paper, tones are abbreviated as T1, T2, T3, and T4, and syllables produced with a specific tone are noted with a number added to the syllable. For example, /t<sup>h</sup>a1/ refers to the syllable /t<sup>h</sup>a/ with T1.</p>
<p>Xu (<xref ref-type="bibr" rid="B55">1997</xref>) describes the f0 contours of the four lexical tones as follows. T1 begins with a high f0 and maintains the same level through the entire vowel; T2 starts with a low f0, and then falls slightly until 20% into the vowel before rising throughout the rest of the vowel; T3, in citation form, begins with a low f0, falls to the lowest f0 at the midpoint of the vowel, and then rises sharply to the end of the syllable, although the final rise is usually absent in non-prepausal positions; and T4 starts with a high f0, and then drops sharply from 20% into the vowel until the end of the syllable. As f0 perturbation due to the onset consonant is expected to be most distinct at the beginning of the vowel (adjacent to the onset consonant), two important aspects of these tones should be noted. First, T1 and T4 begin with a high f0 while T2 and T3 begin with a low f0. Second, T1 has the most static f0 contour and, in connected speech, T2 and T4 have more dynamic f0 contours than T3 during the first half of the vowel.</p>
<p>As for the physiological properties of Mandarin tones, studies have shown that larynx height is in general positively correlated with f0 (e.g., Hall&#x000E9;, <xref ref-type="bibr" rid="B18">1994</xref>; Moisik et al., <xref ref-type="bibr" rid="B41">2014</xref>). Specifically, the larynx is higher at the syllable onsets in T1 and T4 than in T2 and T3. However, Moisik et al. (<xref ref-type="bibr" rid="B41">2014</xref>) claim that the role of larynx height may be only facilitatory and, thus, the relation between larynx height and tones is not necessarily straightforward. This suggests that speakers may utilize different laryngeal settings (including larynx height, and vocal fold tension, among other things) to produce different tonal targets in Mandarin.</p>
</sec>
<sec>
<title>F0 Perturbation in Mandarin</title>
<p>Mandarin plosives are typically classified as voiceless unaspirated and voiceless aspirated, with aspiration as the primary distinction (Mandarin plosives are henceforth referred to as <bold>unaspirated</bold> and <bold>aspirated</bold> plosives). The language does not have voiced obstruents and, thus, the voiced consonants that can occur as a word onset are sonorants, such as /m n l w j/.</p>
<p>Inconsistent results have been reported on f0 perturbation in Mandarin (e.g., Xu and Xu, <xref ref-type="bibr" rid="B54">2003</xref>; Luo, <xref ref-type="bibr" rid="B37">2018</xref>; Chi et al., <xref ref-type="bibr" rid="B8">2019</xref>). Xu and Xu (<xref ref-type="bibr" rid="B54">2003</xref>) suggest that aspiration is associated with low f0 although the specific pattern can be influenced by the tonal contexts. The lowering effect of aspiration is more robust in tones beginning with a low f0 (T2/T3, henceforth low-initial tones) than in those with a high f0 (T1/T4, high-initial tones). They attribute this pattern to the aerodynamics of the aspiration, which is characterized by a rapid outward flow of a large volume of air at the release of a plosive. This airflow, occurring between the release of oral closure and the glottal pulsing, lowers the subglottal air pressure for the aspirated plosives more than for the unaspirated ones, decreasing post-aspirated f0. The effects of aerodynamic force become even stronger when the intended pitch is low, which is realized with slack vocal folds. Therefore, at the onset of low-initial tones, the vocal folds are slack and the f0 difference between aspirated and unaspirated series is enlarged.</p>
<p>By contrast, Luo (<xref ref-type="bibr" rid="B37">2018</xref>) reports that aspiration raises the f0 and that this effect lasts longer in high-initial tones than in low-initial tones. In T2, she did not find a clear pattern of f0 perturbation. As for the source of this pattern (higher f0 after aspirated than unaspirated plosives), Luo mentions that aspiration is typically associated with high transglottal air pressure, elevated larynx, and stiff vocal folds, all of which raise the f0. On the other hand, she attributes the longer f0 perturbation in the high-initial tones than in low-initial tones to speakers&#x00027; control (e.g., Kingston and Diehl, <xref ref-type="bibr" rid="B27">1994</xref>). According to Luo (<xref ref-type="bibr" rid="B37">2018</xref>), in Mandarin, high-initial tones are more salient than low-initial ones both phonologically and perceptually. Phonologically, high-initial tones are more likely to be preserved in phonological processes, and perceptually, listeners are more accurate in perceiving high-initial tones. Assuming that tonal language speakers actively suppress the biomechanically-motivated automatic f0 perturbation to enhance the tonal contrast (e.g., Hombert et al., <xref ref-type="bibr" rid="B20">1979</xref>; Francis et al., <xref ref-type="bibr" rid="B14">2006</xref>), there is less need for this suppression when the tones are salient. Therefore, in Mandarin, high-initial tones allow for more f0 variability than low-initial tones.</p>
<p>The cause of the divergent findings in Xu and Xu (<xref ref-type="bibr" rid="B54">2003</xref>) and Luo (<xref ref-type="bibr" rid="B37">2018</xref>) is unclear. It is worth noting that the participants in both studies are all female speakers, but that the target syllables are embedded in different carrier phrases. In Luo&#x00027;s (<xref ref-type="bibr" rid="B37">2018</xref>) carrier phrase, the target syllables are always preceded by T1 whereas Xu and Xu (<xref ref-type="bibr" rid="B54">2003</xref>) use two different types of carrier phrases differing in the preceding syllable tones, T1 and T3. The two studies also differ in how they use f0 measurements in their analyses. While Xu and Xu&#x00027;s (<xref ref-type="bibr" rid="B54">2003</xref>) analyses are based on the raw f0 measured in Hz, Luo (<xref ref-type="bibr" rid="B37">2018</xref>) uses z-scored f0 normalized by speaker. The different patterns may also be due to substantial inter-speaker variation.</p>
<p>Chi et al. (<xref ref-type="bibr" rid="B8">2019</xref>) compare two male speakers&#x00027; glottal opening and oral airflow in aspirated and unaspirated plosives in T1. Their findings corroborate the possibility of inter-speaker variation. One of the two tested speakers does not show the f0-aspiration covariation but shows faster oral airflow in aspirated than unaspirated plosives, especially when preceding a low vowel /a/ (Figure 5 in Chi et al., <xref ref-type="bibr" rid="B8">2019</xref>). This speaker shows a negative relationship between the post-plosive f0 and oral airflow rate, suggesting that the post-plosive f0 decreases as the oral airflow rate increases, presumably due to consonant aspiration and the low vowel. This is consistent with the aerodynamic interpretation in Xu and Xu (<xref ref-type="bibr" rid="B54">2003</xref>). However, the other speaker does not show this airflow rate difference between aspirated and unaspirated plosives. Only this second speaker tends to produce higher f0 for aspirated than unaspirated plosives, consistent with Luo&#x00027;s (<xref ref-type="bibr" rid="B37">2018</xref>) findings, although the f0 difference is not large enough to distinguish the aspiration contrast.</p>
<p>Despite the diverging patterns and potential individual variation, the previous findings commonly suggest that the f0 perturbation in Mandarin is fairly limited to the vowel onset. This is consistent with previous findings in other tonal languages (e.g., Hombert et al., <xref ref-type="bibr" rid="B20">1979</xref>; Francis et al., <xref ref-type="bibr" rid="B14">2006</xref>; Kirby, <xref ref-type="bibr" rid="B28">2018</xref>).</p>
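<p>The contrast noted above between analyses based on raw f0 in Hz and analyses based on f0 z-scored by speaker can be illustrated with a minimal Python sketch. It is not the procedure of any cited study: the function name, the speaker labels, the toy values, and the choice of the population standard deviation are all assumptions made for illustration only.</p>
<preformat>
```python
from statistics import mean, pstdev


def zscore_by_speaker(f0_by_speaker):
    """Convert each speaker's raw f0 values (Hz) to within-speaker z-scores.

    Assumption: the population standard deviation is used; a speaker whose
    values are all identical would raise ZeroDivisionError in this sketch.
    """
    normalized = {}
    for speaker, values in f0_by_speaker.items():
        mu = mean(values)
        sigma = pstdev(values)
        normalized[speaker] = [(v - mu) / sigma for v in values]
    return normalized


# Hypothetical toy values in Hz, NOT data from any study.
toy_f0 = {
    "speaker_A": [200.0, 220.0, 240.0],
    "speaker_B": [100.0, 110.0, 120.0],
}

if __name__ == "__main__":
    for speaker, scores in zscore_by_speaker(toy_f0).items():
        print(speaker, [round(z, 3) for z in scores])
```
</preformat>
<p>In this toy case the two hypothetical speakers differ by a factor of two in raw Hz, yet their z-scored values coincide, which shows how the choice of scale can change the apparent size of a between-category f0 difference when speakers with different pitch ranges are pooled.</p>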
</sec>
</sec>
<sec>
<title>Current Study</title>
<p>This study examines the role of post-plosive f0 as a secondary cue for Mandarin plosive laryngeal contrast in two experiments. We ask how the lexical tone mediates the f0 patterns in production, as well as the listeners&#x00027; perceptual responses. The f0 at the vowel onset is expected to be influenced, interactively, by the lexical tone and the perturbation effects due to the onset consonants.</p>
<p>Experiment 1 examines the plosive production of Mandarin speakers to investigate the f0 patterns at vowel onset, influenced by the laryngeal category of the onset consonant, in CV syllables. The central questions for Experiment 1 are (1) how the aspiration (or the lack thereof) of the onset consonant changes the f0 at the onset of voicing following the onset consonant, and (2) how the tonal contexts influence the relation between consonant aspiration and f0 at voicing onset, if any. As mentioned above, the existing findings on the f0 perturbation in Mandarin are divergent and inconclusive (e.g., Xu and Xu, <xref ref-type="bibr" rid="B54">2003</xref>; Luo, <xref ref-type="bibr" rid="B37">2018</xref>; Chi et al., <xref ref-type="bibr" rid="B8">2019</xref>). We aim to provide an additional set of empirical data, including both female and male speakers, on the f0 perturbation in Mandarin.</p>
<p>Experiment 2 examines Mandarin plosive perception. In Experiment 2, we specifically ask (1) whether the f0 differences between different laryngeal categories, if any, are used by Mandarin listeners as a cue to the onset aspiration, and (2) how the tonal contexts influence the listeners&#x00027; use of f0 as a consonantal cue, if at all. It is still unknown whether Mandarin listeners use f0 as a secondary cue to the laryngeal contrast, to the best of our knowledge. Since f0 is the primary cue for lexical tones in the language, Mandarin listeners might not rely on the post-plosive f0 to determine the laryngeal category of the onset plosives. If Mandarin listeners do use the post-plosive f0 as a cue for the onset plosive, such an outcome may have different interpretations depending on the findings in Experiment 1. If the production patterns provide evidence for the perceptual patterns (i.e., if the listeners&#x00027; behaviors reflect the robust patterns present in the speakers&#x00027; production), the listeners&#x00027; behaviors can be attributed to their native language experience. On the other hand, if the listeners associate post-plosive f0 with consonant aspiration in the absence of systematic f0 perturbation patterns in Mandarin productions, their perceptual behaviors could be attributed to the general auditory enhancing effects (Kingston and Diehl, <xref ref-type="bibr" rid="B27">1994</xref>; Francis et al., <xref ref-type="bibr" rid="B14">2006</xref>).</p>
</sec>
</sec>
<sec id="s2">
<title>Experiment 1: Production</title>
<p>Experiment 1 examines Mandarin speakers&#x00027; plosive productions in CV syllables, asking how f0 at the vowel onset changes as a function of the laryngeal category of the onset consonant, in different tonal contexts.</p>
<sec>
<title>Methods</title>
<sec>
<title>Participants</title>
<p>Twenty-five native speakers of Mandarin Chinese (15 female and 10 male, mean age = 26, range = 19&#x02013;46) were recruited from the George Mason University community in Virginia, USA. They self-identified as native speakers of Mandarin, born and raised in North China. All participants had learned and spoke English as their second language, but they reported being dominant in Mandarin. The participants moved to the US at a mean age of 22 (range = 19&#x02013;35) and had lived in the US for 1&#x02013;48 months (mean = 13) at the time of testing, except for one participant (F05), who had been in the US for 20 years. After confirming that the data from this participant were not distinct from those of the rest of the group, we decided to include her in the analysis. The individual data are provided in the <xref ref-type="supplementary-material" rid="SM1">Supplementary Materials</xref>. No participants reported any history of speech or hearing disorders. The participants received monetary compensation for their participation.</p>
</sec>
<sec>
<title>Stimuli</title>
<p>The stimuli were 24 monosyllabic Mandarin words, with 3 onset consonants (aspirated, unaspirated, sonorant) &#x000D7; 2 vowel contexts (low [a], high [u]) &#x000D7; 4 lexical tones. We were mainly interested in comparing aspirated and unaspirated plosives; sonorant onsets were included as fillers. For the onset consonants, we used /t/, /t<sup>h</sup>/, and /w/, as they yielded the fewest lexical gaps when combined with the vowels /a/ and /u/. However, /t<sup>h</sup>a2/ is still lexically missing in Mandarin and, thus, was substituted with /p<sup>h</sup>a2/, as f0 patterns for /t<sup>h</sup>a2/ and /p<sup>h</sup>a2/ are known to be similar (Ohde, <xref ref-type="bibr" rid="B44">1984</xref>; Xu and Xu, <xref ref-type="bibr" rid="B54">2003</xref>). In order to avoid directly comparing syllables with different onsets, we also substituted /ta2/ with /pa2/.</p>
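<p>The factorial design described in the preceding paragraph can be sketched in Python as follows. This is only an illustrative enumeration under stated assumptions, not the procedure used to select the actual words: the ASCII labels (th and ph standing in for the aspirated plosives) and the names ONSETS, SUBSTITUTIONS, and build_stimuli are hypothetical.</p>
<preformat>
```python
from itertools import product

# Factors taken from the Stimuli description; ASCII labels are illustrative
# stand-ins for IPA symbols (th = aspirated /t/, ph = aspirated /p/).
ONSETS = {"unaspirated": "t", "aspirated": "th", "sonorant": "w"}
VOWELS = ("a", "u")
TONES = (1, 2, 3, 4)

# Substitutions described in the text: the aspirated alveolar syllable with
# /a/ and T2 is a lexical gap, so the bilabial pair replaces the alveolar
# pair in that cell for both plosive categories.
SUBSTITUTIONS = {("t", "a", 2): "p", ("th", "a", 2): "ph"}


def build_stimuli():
    """Return one (category, syllable) pair per cell of the 3 x 2 x 4 design."""
    stimuli = []
    for (category, onset), vowel, tone in product(ONSETS.items(), VOWELS, TONES):
        onset = SUBSTITUTIONS.get((onset, vowel, tone), onset)
        stimuli.append((category, f"{onset}{vowel}{tone}"))
    return stimuli


if __name__ == "__main__":
    for category, syllable in build_stimuli():
        print(category, syllable)
```
</preformat>
<p>Keeping the substitutions in a separate lookup table leaves the crossed design intact (24 cells) while making the two exceptional cells explicit.</p>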
<p>Written Mandarin words corresponding to each of the 24 syllables were selected based on the word frequency data from the Modern Chinese Balanced Corpus (Xiao, <xref ref-type="bibr" rid="B53">2010</xref>, corpus size = 100 million words). Only the words labeled as &#x0201C;most common&#x0201D; were selected. None of the selected words was a bound morpheme in Mandarin. For the complete list of stimuli, see <xref ref-type="supplementary-material" rid="SM1">Appendix A</xref>.</p>
<p>The selected words were embedded in the carrier phrase &#x08BF7;&#x08BF4;___&#x04E00;&#x06B21; (/<inline-formula><mml:math id="M1"><mml:mrow><mml:msup><mml:mrow><mml:mover accent='true'><mml:mrow><mml:mtext>t</mml:mtext><mml:mo>&#x00255;</mml:mo></mml:mrow><mml:mo stretchy='true'>&#x02322;</mml:mo></mml:mover></mml:mrow><mml:mtext>h</mml:mtext></mml:msup></mml:mrow></mml:math></inline-formula>i&#x0014B;3 &#x00282;w&#x00254;1 ____ ji2 <inline-formula><mml:math id="M2"><mml:mrow><mml:msup><mml:mrow><mml:mover accent='true'><mml:mrow><mml:mtext>tS</mml:mtext></mml:mrow><mml:mo stretchy='true'>&#x02322;</mml:mo></mml:mover></mml:mrow><mml:mtext>h</mml:mtext></mml:msup></mml:mrow></mml:math></inline-formula>i4/, &#x02018;Please say ____ one time.&#x02019;)<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref>, and were visually presented to the participants. The visual prompts included the entire carrier phrase in Chinese characters, with the stimulus word shown both in Chinese characters and in Pinyin<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>.</p>
</sec>
<sec>
<title>Procedure</title>
<p>The experiment took place in a sound-attenuated booth at the Phonetics and Phonology Lab at George Mason University. Participants were seated in front of a MacBook computer that presented the stimuli. Their productions were digitally recorded onto a separate MacBook Pro at a sampling rate of 44.1 kHz in Praat (Boersma and Weenink, <xref ref-type="bibr" rid="B5">2020</xref>), using a lapel microphone (R&#x000F8;de smartLav&#x0002B;) and an external Focusrite Scarlett Solo 2nd Generation audio interface. The microphone was attached to the participant&#x00027;s shirt on the upper chest, approximately 6 inches from the speaker&#x00027;s mouth.</p>
<p>The visual prompts for stimuli were presented to the participants one at a time in the middle of the laptop screen using PsychoPy (Peirce, <xref ref-type="bibr" rid="B45">2007</xref>). In order to elicit a comparatively stable speaking rate across participants, the sentences were presented with a fixed inter-stimulus interval of 3.5 seconds. Participants were instructed to read aloud each sentence on the laptop screen as naturally as possible. All written and oral instructions were provided in Mandarin.</p>
<p>Each of the 24 stimulus words was repeated 6 times in randomized order, resulting in a total of 144 trials per speaker. The 144 trials were presented in two blocks of 3 repetitions each, with a self-paced break between the blocks. Before the first block, a short practice block of 2 trials familiarized the participants with the task. The recording session took approximately 10 minutes.</p>
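<p>As a rough illustration of the trial structure, the sketch below builds two blocks of 3 repetitions each. It assumes that randomization was applied within each block, which the description above does not specify; the function name and the placeholder word labels are hypothetical and do not reproduce the PsychoPy script used in the experiment.</p>
<preformat><![CDATA[
```python
import random


def build_trial_list(words, reps_per_block=3, n_blocks=2, seed=None):
    """Return n_blocks lists, each holding reps_per_block shuffled copies of every word.

    Assumption: trials are shuffled within a block, not across blocks.
    """
    rng = random.Random(seed)
    blocks = []
    for _block in range(n_blocks):
        block = [word for word in words for _rep in range(reps_per_block)]
        rng.shuffle(block)
        blocks.append(block)
    return blocks


# Placeholder labels standing in for the 24 stimulus words.
words = [f"word{i:02d}" for i in range(24)]
blocks = build_trial_list(words, seed=1)
print([len(block) for block in blocks], sum(len(block) for block in blocks))
```
]]></preformat>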
</sec>
<sec>
<title>Measurements and Data Preparation</title>
<p>All measurements were taken in Praat (Boersma and Weenink, <xref ref-type="bibr" rid="B5">2020</xref>) by one of the authors (YG). Before the measurements were taken, 23 of the 3,600 tokens (144 tokens &#x000D7; 25 speakers; 0.6%) were removed due to production errors (e.g., not producing the target word, hesitation, self-correction, or unintended noise such as coughing or throat clearing). For the remaining tokens, three acoustic landmarks were labeled for each target token with a stop onset: (1) the onset of the stop burst, (2) the onset of periodicity in the vowel following the stop consonant, and (3) the offset of the vowel&#x00027;s second formant. VOT was calculated by subtracting (1) from (2), and vowel duration by subtracting (2) from (3). For the fillers with the sonorant onset, the boundary between the approximant /w/ and the following vowel was determined by visual inspection of the spectral patterns. The boundary was placed at the point where the second formant (F2) moved up from its steady state (Peterson and Lehiste, <xref ref-type="bibr" rid="B46">1960</xref>) and the amplitude increased abruptly. The higher formants were used when F2 was not informative.</p>
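<p>The two duration measures follow directly from the three landmarks. The sketch below restates that arithmetic; the class name, field names, and the landmark times in the example are hypothetical and are not measurements from the study.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Landmarks:
    burst_onset: float    # (1) onset of the stop burst, in seconds
    voicing_onset: float  # (2) onset of vowel periodicity, in seconds
    vowel_offset: float   # (3) offset of the vowel's second formant, in seconds


def vot_ms(lm):
    """VOT = landmark (2) minus landmark (1), converted to milliseconds."""
    return (lm.voicing_onset - lm.burst_onset) * 1000.0


def vowel_duration_ms(lm):
    """Vowel duration = landmark (3) minus landmark (2), in milliseconds."""
    return (lm.vowel_offset - lm.voicing_onset) * 1000.0


# Made-up landmark times for one aspirated token.
example = Landmarks(burst_onset=0.512, voicing_onset=0.597, vowel_offset=0.812)
print(round(vot_ms(example), 1), round(vowel_duration_ms(example), 1))
```
]]></preformat>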
<p>The f0 values were extracted at 20 equidistant points of the post-onset vowel, and the first 8 of these 20 values (from the first 35% of the vowel) were used in the subsequent statistical analyses. As the duration of Mandarin sentence-medial vowels varies with lexical tone (e.g., Deng et al., <xref ref-type="bibr" rid="B11">2006</xref>), the absolute duration of the first 35% of the vowel used in this time-normalized method differs across the tones (mean duration: T1, 75 ms; T2, 80 ms; T3, 75 ms; T4, 71 ms)<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>. The f0 values were extracted using a Praat script, with a 600 Hz pitch ceiling, a 75 Hz pitch floor, and a 10 ms time step. Any tracking errors were hand-corrected. In this process, an additional 5.3% of the data were removed due to unreliable f0 tracking when the vowel was not modal-voiced. A large portion of the excluded data involved creaky voice, mostly in T3 but also, to a lesser extent, in the other tones when f0 was low (see Kuang, <xref ref-type="bibr" rid="B32">2017</xref>, for a discussion of creaky voice in different Mandarin tones).</p>
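<p>The time-normalized sampling can be sketched as follows. This is not the Praat script used in the study. It assumes that the 20 points fall at 0%, 5%, &#x02026;, 95% of the vowel, which matches the time-point labels used in the results tables (0 = 0% through 7 = 35%); the function names are hypothetical.</p>
<preformat><![CDATA[
```python
def sample_proportions(n_points=20):
    """Proportions of the vowel at which f0 is sampled: 0.00, 0.05, ..., 0.95."""
    return [k / n_points for k in range(n_points)]


def sample_times(voicing_onset, vowel_offset, n_points=20):
    """Absolute sample times (s) for one vowel, from voicing onset to vowel offset."""
    duration = vowel_offset - voicing_onset
    return [voicing_onset + p * duration for p in sample_proportions(n_points)]


def analysis_window(values, n_keep=8):
    """Keep time points 0-7, i.e. the samples spanning the first 35% of the vowel."""
    return values[:n_keep]


# Made-up 200 ms vowel: the analysis window ends 70 ms after voicing onset.
times = sample_times(0.0, 0.2)
window = analysis_window(times)
print(len(times), len(window), round(window[-1], 3))
```
]]></preformat>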
</sec>
</sec>
<sec>
<title>Results</title>
<p>All statistical analyses in this study were conducted in R (R Core Team, <xref ref-type="bibr" rid="B47">2021</xref>). To investigate the f0 perturbation in different tonal contexts, we built a series of linear mixed-effects models on the normalized f0 (z-score) using the <italic>lme4</italic> package (Bates et al., <xref ref-type="bibr" rid="B3">2014</xref>). Z-scores were used instead of the raw f0 values (Hz) to facilitate comparisons across speakers. In the initial model, we included the following factors as fixed effects: <sc>onset</sc> (aspirated, unaspirated, sonorant), lexical <sc>tone</sc> (T1, T2, T3, T4), <sc>vowel</sc> height (low, high), <sc>time</sc> point (eight levels, from 0 to 7), and their interactions. <sc>Time</sc> was coded using orthogonal polynomial coding, and the remaining fixed factors were Helmert-coded. The random effects structure of the model was determined using a forward best path algorithm (Barr et al., <xref ref-type="bibr" rid="B2">2013</xref>), and the final model included a by-<sc>subject</sc> random intercept as well as by-<sc>subject</sc> slopes for <sc>onset</sc>, <sc>tone</sc>, and <sc>vowel</sc>. The best-fitting model was selected by comparing models using likelihood ratio tests. The interactions <sc>onset</sc> <sup>&#x0002A;</sup> <sc>tone</sc> <sup>&#x0002A;</sup> <sc>vowel</sc> <sup>&#x0002A;</sup> <sc>time</sc> and <sc>tone</sc> <sup>&#x0002A;</sup> <sc>time</sc> <sup>&#x0002A;</sup> <sc>vowel</sc> did not improve the model fit [&#x003C7;<sup>2</sup> = 10.60, <italic>p</italic> = 0.99; &#x003C7;<sup>2</sup> = 15.90, <italic>p</italic> = 0.78, respectively] and were therefore discarded. Consequently, the best model included the four predictors <sc>onset</sc>, <sc>tone</sc>, <sc>vowel</sc>, and <sc>time</sc>, with the three-way interactions <sc>onset</sc> <sup>&#x0002A;</sup> <sc>tone</sc> <sup>&#x0002A;</sup> <sc>time</sc>, <sc>onset</sc> <sup>&#x0002A;</sup> <sc>time</sc> <sup>&#x0002A;</sup> <sc>vowel</sc>, and <sc>onset</sc> <sup>&#x0002A;</sup> <sc>tone</sc> <sup>&#x0002A;</sup> <sc>vowel</sc>. The outcome of this final model is given in <xref ref-type="supplementary-material" rid="SM1">Appendix B (Table B1)</xref>.</p>
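<p>For readers unfamiliar with z-score normalization of f0, a minimal sketch is given below. It assumes that each f0 value is standardized against the mean and (sample) standard deviation of the same speaker&#x00027;s f0 values. The text above does not spell out the normalization procedure, so this is an assumption, and the speaker labels and f0 values are invented.</p>
<preformat><![CDATA[
```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_speaker(tokens):
    """tokens: list of (speaker_id, f0_hz) pairs; returns (speaker_id, z) pairs.

    Assumption: z = (f0 - speaker mean) / speaker sample SD.
    """
    by_speaker = defaultdict(list)
    for speaker, f0 in tokens:
        by_speaker[speaker].append(f0)
    stats = {s: (mean(v), stdev(v)) for s, v in by_speaker.items()}
    return [(s, (f0 - stats[s][0]) / stats[s][1]) for s, f0 in tokens]


# Invented values: two speakers with different pitch ranges map onto the same scale.
tokens = [("F01", 200.0), ("F01", 220.0), ("F01", 240.0),
          ("M01", 100.0), ("M01", 110.0), ("M01", 120.0)]
normalized = zscore_by_speaker(tokens)
print([round(z, 2) for _speaker, z in normalized])
```
]]></preformat>
<p>In this toy example the two speakers differ by roughly an octave in Hz, yet their normalized values coincide, which is the sense in which z-scores facilitate comparison across speakers.</p>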
<p>Here, we present <italic>p</italic>-values for each significant factor and interaction, obtained from likelihood ratio tests comparing the best model with a model lacking the factor or interaction under consideration. Significant interactions were followed up with <italic>post-hoc</italic> Tukey&#x00027;s HSD tests conducted with the <italic>emmeans</italic> package (Lenth, <xref ref-type="bibr" rid="B34">2020</xref>). If a predictor was significant in multiple interactions (or in a main effect and interactions), only the highest-level interaction is reported, along with the results of <italic>post-hoc</italic> testing.</p>
<p>We found the following significant interactions: <sc>onset: tone: vowel</sc> [&#x003C7;<sup>2</sup> = 2843.1, <italic>p</italic> &#x0003C; 0.0001], <sc>onset: tone: time</sc> [&#x003C7;<sup>2</sup> = 1667.5, <italic>p</italic> &#x0003C; 0.0001], and <sc>onset: time: vowel</sc> [&#x003C7;<sup>2</sup> = 250.8, <italic>p</italic> &#x0003C; 0.0001]. As the predictor of our main interest, <sc>onset</sc>, was involved in multiple three-way interactions, we conducted <italic>post-hoc</italic> Tukey pairwise comparisons on <sc>onset</sc> <sup>&#x0002A;</sup> <sc>tone</sc> <sup>&#x0002A;</sup> <sc>vowel</sc> <sup>&#x0002A;</sup> <sc>time</sc>. The results of the pairwise comparisons are summarized in <xref ref-type="table" rid="T1">Tables 1</xref>, <xref ref-type="table" rid="T3">3</xref>, and <xref ref-type="table" rid="T4">4</xref> as the differences between the &#x003B2; coefficient values of different onset consonants. Shaded in <xref ref-type="table" rid="T1">Tables 1</xref>, <xref ref-type="table" rid="T3">3</xref>, and <xref ref-type="table" rid="T4">4</xref> are the cells with significant f0 differences presumably attributable to the onset consonants, that is, the cells with unidirectional f0 differences that start at time point 0 (closest to the onset consonant) and continue without a break.</p>
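<p>The shading criterion can be stated procedurally: starting at time point 0, count the consecutive cells that are significant and share the sign of the first cell, and stop at the first cell that fails either condition. The sketch below is one reading of that criterion, assuming a threshold of <italic>p</italic> &#x0003C; 0.05. The function name is hypothetical, and the <italic>p</italic>-values in the example are placeholders chosen only to match the reported significance codes, not values from the analysis.</p>
<preformat><![CDATA[
```python
def shaded_run_length(cells, alpha=0.05):
    """Number of leading cells that would be shaded in one table row.

    cells: list of (difference, p_value) pairs ordered from time point 0.
    A cell counts only if it is significant and has the same sign as time point 0;
    the run ends at the first cell that is not.
    """
    if not cells:
        return 0
    first_diff, first_p = cells[0]
    if first_p >= alpha or first_diff == 0:
        return 0
    sign = 1 if first_diff > 0 else -1
    run = 0
    for diff, p_value in cells:
        if p_value < alpha and diff * sign > 0:
            run += 1
        else:
            break
    return run


# Differences from the T2/low-vowel row of the aspirated-unaspirated comparison;
# the p-values are placeholders consistent with the codes ***, ***, (*), n.s.
row = [(-0.20, 0.0005), (-0.13, 0.0005), (-0.06, 0.07), (-0.02, 0.5),
       (0.00, 0.9), (0.01, 0.9), (0.02, 0.8), (0.03, 0.6)]
print(shaded_run_length(row))
```
]]></preformat>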
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>F0 difference (z-score): aspirated&#x02013;unaspirated (Tukey HSD <italic>post-hoc</italic> pairwise comparisons).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Time points</bold></th>
<th valign="top" align="center"><bold>0 (0%)</bold></th>
<th valign="top" align="right"><bold>1 (5%)</bold></th>
<th valign="top" align="right"><bold>2 (10%)</bold></th>
<th valign="top" align="right"><bold>3 (15%)</bold></th>
<th valign="top" align="right"><bold>4 (20%)</bold></th>
<th valign="top" align="right"><bold>5 (25%)</bold></th>
<th valign="top" align="right"><bold>6 (30%)</bold></th>
<th valign="top" align="right"><bold>7 (35%)</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Tone</bold></th>
<th valign="top" align="center"><bold>Vowel</bold></th>
<th/>
<th/>
<th/>
<th/>
<th/>
<th/>
<th/>
<th/>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">T1</td>
<td valign="top" align="center">Low</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.11&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.15&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.18&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.19&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.18&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.15&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.15&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.12&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.27&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.23&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.20&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.17&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.16&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.15&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.13&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.13&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">T2</td>
<td valign="top" align="center">Low</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.20&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.13&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right">&#x02212;0.06<sup>(&#x0002A;)</sup></td>
<td valign="top" align="right">&#x02212;0.02</td>
<td valign="top" align="right">0.00</td>
<td valign="top" align="right">0.01</td>
<td valign="top" align="right">0.02</td>
<td valign="top" align="right">0.03</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">High</td>
<td valign="top" align="right">0.02</td>
<td valign="top" align="right">0.01</td>
<td valign="top" align="right">0.02</td>
<td valign="top" align="right">0.03</td>
<td valign="top" align="right">0.04</td>
<td valign="top" align="right">0.07&#x0002A;</td>
<td valign="top" align="right">0.07&#x0002A;</td>
<td valign="top" align="right">0.07&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">T3</td>
<td valign="top" align="center">Low</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.32&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.26&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.17&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.14&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.11&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.12&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.09&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.08&#x0002A;</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.14&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.16&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.14&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.13&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.11&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.10&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.09&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">&#x02212;0.07&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">T4</td>
<td valign="top" align="center">Low</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.08&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.10&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.09&#x0002A;&#x0002A;</td>
<td valign="top" align="right">0.06<sup>(&#x0002A;)</sup></td>
<td valign="top" align="right">0.03</td>
<td valign="top" align="right">&#x02212;0.04</td>
<td valign="top" align="right">&#x02212;0.06</td>
<td valign="top" align="right">&#x02212;0.09&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.29&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.23&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.15&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.09&#x0002A;&#x0002A;</td>
<td valign="top" align="right">0.05</td>
<td valign="top" align="right">0.00</td>
<td valign="top" align="right">&#x02212;0.03</td>
<td valign="top" align="right">&#x02212;0.06<sup>(&#x0002A;)</sup></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Significance codes: <sup>&#x0002A;&#x0002A;&#x0002A;</sup> for p &#x0003C; 0.001, <sup>&#x0002A;&#x0002A;</sup> for p &#x0003C; 0.01, <sup>&#x0002A;</sup> for p &#x0003C; 0.05, and <sup>(&#x0002A;)</sup> for p &#x0003C; 0.1. Shaded cells indicate significant f0 differences that are unidirectional starting from time point 0 and continuing without a break</italic>.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Aspirated plosives&#x00027; VOT in different tonal contexts (Tukey HSD <italic>post-hoc</italic> comparisons).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Tonal contrast</bold></th>
<th valign="top" align="right"><bold>Estimate (&#x003B2;)</bold></th>
<th valign="top" align="center"><bold>df</bold></th>
<th valign="top" align="right"><bold>t ratio</bold></th>
<th valign="top" align="right"><italic><bold>p</bold></italic><bold>-value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">T1&#x02013;T2</td>
<td valign="top" align="right">&#x02212;8.114</td>
<td valign="top" align="center">2347</td>
<td valign="top" align="right">&#x02212;7.271</td>
<td valign="top" align="right">&#x0003C;0.0001&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">T1&#x02013;T3</td>
<td valign="top" align="right">&#x02212;14.730</td>
<td valign="top" align="center">2347</td>
<td valign="top" align="right">&#x02212;13.244</td>
<td valign="top" align="right">&#x0003C;0.0001&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">T1&#x02013;T4</td>
<td valign="top" align="right">0.494</td>
<td valign="top" align="center">2347</td>
<td valign="top" align="right">0.444</td>
<td valign="top" align="right">0.9708</td>
</tr>
<tr>
<td valign="top" align="left">T2&#x02013;T3</td>
<td valign="top" align="right">&#x02212;6.616</td>
<td valign="top" align="center">2347</td>
<td valign="top" align="right">&#x02212;5.928</td>
<td valign="top" align="right">&#x0003C;0.0001&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">T2&#x02013;T4</td>
<td valign="top" align="right">8.608</td>
<td valign="top" align="center">2347</td>
<td valign="top" align="right">7.701</td>
<td valign="top" align="right">&#x0003C;0.0001&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">T3&#x02013;T4</td>
<td valign="top" align="right">15.224</td>
<td valign="top" align="center">2347</td>
<td valign="top" align="right">13.666</td>
<td valign="top" align="right">&#x0003C;0.0001&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Significance codes: <sup>&#x0002A;&#x0002A;&#x0002A;</sup> for p &#x0003C; 0.001, <sup>&#x0002A;&#x0002A;</sup> for p &#x0003C; 0.01, <sup>&#x0002A;</sup> for p &#x0003C; 0.05, and <sup>(&#x0002A;)</sup> for p &#x0003C; 0.1</italic>.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>F0 difference (z-score): unaspirated&#x02013;sonorant (Tukey HSD <italic>post-hoc</italic> pairwise comparisons).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Time points</bold></th>
<th valign="top" align="center"><bold>0 (0%)</bold></th>
<th valign="top" align="right"><bold>1 (5%)</bold></th>
<th valign="top" align="right"><bold>2 (10%)</bold></th>
<th valign="top" align="right"><bold>3 (15%)</bold></th>
<th valign="top" align="right"><bold>4 (20%)</bold></th>
<th valign="top" align="right"><bold>5 (25%)</bold></th>
<th valign="top" align="right"><bold>6 (30%)</bold></th>
<th valign="top" align="right"><bold>7 (35%)</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Tone</bold></th>
<th valign="top" align="center"><bold>Vowel</bold></th>
<th/>
<th/>
<th/>
<th/>
<th/>
<th/>
<th/>
<th/>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">T1</td>
<td valign="top" align="center">Low</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.13&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right">0.05</td>
<td valign="top" align="right">&#x02212;0.02</td>
<td valign="top" align="right">&#x02212;0.05</td>
<td valign="top" align="right">&#x02212;0.06<sup>(&#x0002A;)</sup></td>
<td valign="top" align="right">&#x02212;0.06<sup>(&#x0002A;)</sup></td>
<td valign="top" align="right">&#x02212;0.08&#x0002A;</td>
<td valign="top" align="right">&#x02212;0.07&#x0002A;</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.17&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.11&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.09&#x0002A;&#x0002A;</td>
<td valign="top" align="right">0.05</td>
<td valign="top" align="right">0.02</td>
<td valign="top" align="right">0.00</td>
<td valign="top" align="right">&#x02212;0.01</td>
<td valign="top" align="right">&#x02212;0.02</td>
</tr>
<tr>
<td valign="top" align="left">T2</td>
<td valign="top" align="center">Low</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.24&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.15&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right">0.06<sup>(&#x0002A;)</sup></td>
<td valign="top" align="right">0.02</td>
<td valign="top" align="right">&#x02212;0.01</td>
<td valign="top" align="right">&#x02212;0.04</td>
<td valign="top" align="right">&#x02212;0.06<sup>(&#x0002A;)</sup></td>
<td valign="top" align="right">&#x02212;0.07&#x0002A;</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.14&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.07&#x0002A;</td>
<td valign="top" align="right">0.02</td>
<td valign="top" align="right">&#x02212;0.03</td>
<td valign="top" align="right">&#x02212;0.07&#x0002A;</td>
<td valign="top" align="right">&#x02212;0.12&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right">&#x02212;0.13&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right">&#x02212;0.16&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">T3</td>
<td valign="top" align="center">Low</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.27&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.20&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.12&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.09&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.08&#x0002A;</td>
<td valign="top" align="right">0.07<sup>(&#x0002A;)</sup></td>
<td valign="top" align="right">0.05</td>
<td valign="top" align="right">0.04</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.22&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.17&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.13&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.10&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.07&#x0002A;</td>
<td valign="top" align="right">0.05</td>
<td valign="top" align="right">0.03</td>
<td valign="top" align="right">0.00</td>
</tr>
<tr>
<td valign="top" align="left">T4</td>
<td valign="top" align="center">Low</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.16&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.09&#x0002A;&#x0002A;</td>
<td valign="top" align="right">0.03</td>
<td valign="top" align="right">0.02</td>
<td valign="top" align="right">0.02</td>
<td valign="top" align="right">0.03</td>
<td valign="top" align="right">0.03</td>
<td valign="top" align="right">0.03</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.24&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.19&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.17&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.15&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.13&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.13&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.13&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.12&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Significance codes: <sup>&#x0002A;&#x0002A;&#x0002A;</sup> for p &#x0003C; 0.001, <sup>&#x0002A;&#x0002A;</sup> for p &#x0003C; 0.01, <sup>&#x0002A;</sup> for p &#x0003C; 0.05, and <sup>(&#x0002A;)</sup> for p &#x0003C; 0.1. Shaded cells indicate significant f0 differences that are unidirectional starting from time point 0 and continuing without a break</italic>.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>F0 difference (z-score): aspirated&#x02013;sonorant (Tukey HSD <italic>post-hoc</italic> pairwise comparisons).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left" colspan="2"><bold>Time points</bold></th>
<th valign="top" align="left"><bold>0 (0%)</bold></th>
<th valign="top" align="right"><bold>1 (5%)</bold></th>
<th valign="top" align="right"><bold>2 (10%)</bold></th>
<th valign="top" align="right"><bold>3 (15%)</bold></th>
<th valign="top" align="right"><bold>4 (20%)</bold></th>
<th valign="top" align="right"><bold>5 (25%)</bold></th>
<th valign="top" align="right"><bold>6 (30%)</bold></th>
<th valign="top" align="right"><bold>7 (35%)</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Tone</bold></th>
<th valign="top" align="left"><bold>Vowel</bold></th>
<th/>
<th/>
<th/>
<th/>
<th/>
<th/>
<th/>
<th/>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">T1</td>
<td valign="top" align="left">Low</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.23&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.21&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.17&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.14&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.12&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.08&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.07&#x0002A;</td>
<td valign="top" align="right">0.06<sup>(&#x0002A;)</sup></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.44&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.35&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.29&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.22&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.18&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.15&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.12&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.10&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">T2</td>
<td valign="top" align="left">Low</td>
<td valign="top" align="right">0.04</td>
<td valign="top" align="right">0.02</td>
<td valign="top" align="right">0.00</td>
<td valign="top" align="right">0.00</td>
<td valign="top" align="right">&#x02212;0.02</td>
<td valign="top" align="right">&#x02212;0.04</td>
<td valign="top" align="right">&#x02212;0.04</td>
<td valign="top" align="right">&#x02212;0.04</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.16&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.08&#x0002A;</td>
<td valign="top" align="right">0.04</td>
<td valign="top" align="right">0.00</td>
<td valign="top" align="right">&#x02212;0.03</td>
<td valign="top" align="right">&#x02212;0.05</td>
<td valign="top" align="right">&#x02212;0.07<sup>(&#x0002A;)</sup></td>
<td valign="top" align="right">&#x02212;0.08&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">T3</td>
<td valign="top" align="left">Low</td>
<td valign="top" align="right">&#x02212;0.05</td>
<td valign="top" align="right">&#x02212;0.06</td>
<td valign="top" align="right">&#x02212;0.05</td>
<td valign="top" align="right">&#x02212;0.05</td>
<td valign="top" align="right">&#x02212;0.03</td>
<td valign="top" align="right">&#x02212;0.05</td>
<td valign="top" align="right">&#x02212;0.04</td>
<td valign="top" align="right">&#x02212;0.04</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.08&#x0002A;</td>
<td valign="top" align="right">0.01</td>
<td valign="top" align="right">0.00</td>
<td valign="top" align="right">&#x02212;0.03</td>
<td valign="top" align="right">&#x02212;0.04</td>
<td valign="top" align="right">&#x02212;0.05</td>
<td valign="top" align="right">&#x02212;0.06</td>
<td valign="top" align="right">&#x02212;0.07<sup>(&#x0002A;)</sup></td>
</tr>
<tr>
<td valign="top" align="left">T4</td>
<td valign="top" align="left">Low</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.25&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.19&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.12&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.08&#x0002A;</td>
<td valign="top" align="right">0.04</td>
<td valign="top" align="right">&#x02212;0.01</td>
<td valign="top" align="right">&#x02212;0.03</td>
<td valign="top" align="right">&#x02212;0.06</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">High</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.53&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.41&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.31&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.24&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.18&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.13&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right" style="background-color:#b4b3b2">0.11&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="right">0.06<sup>(&#x0002A;)</sup></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Significance codes: <sup>&#x0002A;&#x0002A;&#x0002A;</sup> for p &#x0003C; 0.001, <sup>&#x0002A;&#x0002A;</sup> for p &#x0003C; 0.01, <sup>&#x0002A;</sup> for p &#x0003C; 0.05, and <sup>(&#x0002A;)</sup> for p &#x0003C; 0.1. Shaded cells indicate significant f0 differences that are unidirectional starting from time point 0 and continuing without a break</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p><xref ref-type="fig" rid="F1">Figure 1</xref> presents the mean f0 contours of the post-onset vowels. The contours are smoothed with loess and the shading displays a 95% confidence interval. To facilitate the visual interpretation of the figure, the z-normalized f0 is converted back to the Hz scale using the group mean (Brunelle et al., <xref ref-type="bibr" rid="B6">2020</xref>), and the f0 contours of the entire duration of the post-onset vowels are plotted instead of the first 35% used in the statistical analysis. The vertical dotted line is added to indicate the 35% threshold included in the statistical analysis. The f0 contours are time-normalized, aligned from the voicing onset to the vowel offset (see Xu and Xu, <xref ref-type="bibr" rid="B56">2021</xref>, for the comparison between different alignments). Individual speakers&#x00027; production data are presented in the <xref ref-type="supplementary-material" rid="SM1">Supplementary Materials</xref>.</p>
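<p>As an illustration of the back-conversion described above, the following Python sketch maps z-normalized f0 values onto a Hz scale. It assumes that each z-score is multiplied by a reference standard deviation and added to a reference mean. The function names, the example values, and the use of a reference standard deviation alongside the group mean are illustrative assumptions and are not taken from the analysis scripts of the study.</p>
<preformat>
```python
from statistics import mean, stdev


def z_normalize(f0_hz):
    """Return z-scores of f0 samples together with the mean and SD used."""
    mu = mean(f0_hz)
    sd = stdev(f0_hz)
    return [(x - mu) / sd for x in f0_hz], mu, sd


def z_to_hz(z_values, ref_mean_hz, ref_sd_hz):
    """Map z-scores back onto a Hz scale using reference statistics.

    Assumption: ref_mean_hz and ref_sd_hz are group-level values chosen
    only to make the plotted contours readable in Hz.
    """
    return [ref_mean_hz + z * ref_sd_hz for z in z_values]


# Hypothetical f0 samples (Hz) for one speaker, for illustration only.
samples = [200.0, 220.0, 240.0]
z_scores, speaker_mean, speaker_sd = z_normalize(samples)
restored = z_to_hz(z_scores, speaker_mean, speaker_sd)
```
</preformat>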
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Normalized F0 of Mandarin syllables.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomm-07-896013-g0001.tif"/>
</fig>
<sec>
<title>Aspirated and Unaspirated Stops</title>
<p>The f0 contours following an aspirated plosive (f0-<sc>asp</sc>) and those following an unaspirated plosive (f0-<sc>unasp</sc>) showed distinct patterns, but both the direction and the duration of the f0 differences varied according to the tonal contexts (<xref ref-type="table" rid="T1">Table 1</xref>). As for the direction of the f0 differences, f0-<sc>asp</sc> was higher than f0-<sc>unasp</sc> (indicated by positive numbers in <xref ref-type="table" rid="T1">Table 1</xref>) in T1 and T4, while the pattern showed the opposite direction in T2 and T3 (with the exception of the /t<sup>h</sup>u2/&#x0007E;/tu2/ pair, which showed no significant difference). The perturbation duration was also mediated by the tonal contexts. Specifically, the longest perturbation was observed in T1 and T3, where f0-<sc>asp</sc> differed significantly from f0-<sc>unasp</sc> throughout the selected 35% of the vowel (corresponding to a mean duration of 75 ms in both tones), followed by T4 (10&#x0007E;15% or 20&#x0007E;30 ms). The perturbation due to aspiration (or lack thereof) was fairly limited in T2: it was confined to the vowel onset (5% or 11 ms) in the /a/ context and was not significant in the /u/ context.</p>
<p>As for the effects of <sc>vowel</sc>, the syllables with the high vowel /u/ had higher f0 than those with the low /a/, showing the expected vowel-intrinsic f0 patterns (e.g., Whalen and Levitt, <xref ref-type="bibr" rid="B52">1995</xref>). This effect of vowel-intrinsic f0 was greater in high-initial tones than in low-initial tones (see <xref ref-type="fig" rid="F1">Figure 1</xref>). In addition, the difference between f0-<sc>asp</sc> and f0-<sc>unasp</sc> in high-initial tones was greater in the /u/-contexts than in the /a/-contexts, but the same difference in low-initial tones was greater in the /a/-contexts than in the /u/-contexts.</p>
<p>In addition, aspirated plosives had longer VOT than unaspirated plosives, as expected (<xref ref-type="fig" rid="F2">Figure 2</xref>). The influence of plosive <sc>aspiration</sc> (aspirated, <bold>unaspirated</bold>), lexical <sc>tone</sc> (<bold>T1</bold>, T2, T3, T4), and <sc>vowel</sc> height (<bold>low</bold>, high) on VOT (ms) was examined in a linear mixed-effects model (Bates et al., <xref ref-type="bibr" rid="B3">2014</xref>). The reference levels are bold-faced. The model included the interactions among the fixed factors and a by-<sc>subject</sc> random intercept. The model output is presented in <xref ref-type="supplementary-material" rid="SM1">Appendix B (Table B2</xref>). The results revealed a significant three-way interaction <sc>aspiration <sup>&#x0002A;</sup> tone <sup>&#x0002A;</sup> vowel</sc>, and the follow-up Tukey&#x00027;s HSD tests (Lenth, <xref ref-type="bibr" rid="B34">2020</xref>) confirmed that aspirated and unaspirated stops were significantly different [&#x003B2; = &#x02212;97.7, <italic>p</italic> &#x0003C; 0.0001]. Of interest to the current study, we also found significant effects of <sc>tone</sc> on the VOT of aspirated plosives. As shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, the VOTs of aspirated plosives were the longest in T3, followed by T2, and the shortest in T1 and T4.<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref> The results of the <italic>post-hoc</italic> Tukey&#x00027;s HSD comparisons are in <xref ref-type="table" rid="T2">Table 2</xref>. The VOTs of the unaspirated plosives did not show such effects of <sc>tone</sc>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Distribution of VOT across different tones. Dashed lines represent the mean VOT values.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomm-07-896013-g0002.tif"/>
</fig>
</sec>
<sec>
<title>Comparing Obstruents and Sonorants</title>
<p>Although the current study mainly aims to examine the f0 difference between aspirated and unaspirated plosives, we also compared f0-<sc>son</sc> (f0 following a sonorant onset) with f0-<sc>asp</sc> and f0-<sc>unasp</sc>. Across different tone and vowel contexts, f0-<sc>unasp</sc> was consistently greater than f0-<sc>son</sc>, at least at the vowel onset (see <xref ref-type="table" rid="T3">Table 3</xref>). The difference between f0-<sc>asp</sc> and f0-<sc>son</sc> was less consistent (<xref ref-type="table" rid="T4">Table 4</xref>), varying mostly with the tonal contexts, in the same way as the difference between f0-<sc>asp</sc> and f0-<sc>unasp</sc>.</p>
<p>The duration of f0 perturbation varied in different tones as well as in different vowel contexts. The difference between f0-<sc>asp</sc> and f0-<sc>son</sc> mirrored the patterns shown between f0-<sc>asp</sc> and f0-<sc>unasp</sc> in high-initial tones. The difference between f0-<sc>asp/unasp</sc> and f0-<sc>son</sc> also showed some influence of the vowel context. The difference lasted longer in the /u/-contexts than in the /a/-contexts in high-initial tones, but not in low-initial tones.</p>
</sec>
</sec>
<sec>
<title>Interim Summary and Discussion</title>
<p>To summarize, the difference between f0-<sc>asp</sc> and f0-<sc>unasp</sc> showed opposite directions in high-initial tones and low-initial tones. On the other hand, the most consistent f0 difference across different tonal contexts was observed between f0-<sc>unasp</sc> and f0-<sc>son</sc> such that f0-<sc>unasp</sc> was consistently higher than f0-<sc>son</sc>. These outcomes suggest aspiration and voicing (or lack of voicing and aspiration) separately influenced the f0 at the vowel onset.</p>
<p>First, aspirated plosives, compared to unaspirated plosives, influenced the f0 in different directions in high- vs. low-initial tones. Among the voiceless plosives, aspiration cooccurred with high f0 in the high-initial tones but with low f0 in the low-initial tones. The duration of this aspiration effect also depended on the tonal context. The difference between f0-<sc>asp</sc> and f0-<sc>unasp</sc> in the current study lasted the longest in T1 and T3, followed by T4. T2 showed little, if any, perturbation due to aspiration.</p>
<p>Second, although our main goal was to examine the perturbation due to consonant aspiration, we could also observe the voicing effect. F0-<sc>son</sc> was consistently lower than f0-<sc>unasp</sc>, suggesting that voicelessness raised (or voicing lowered) post-onset f0, consistent with the commonly observed cross-linguistic pattern. This effect was consistent throughout all tones.</p>
<p>The difference between f0-<sc>asp</sc> and f0-<sc>son</sc> seemed to reflect the interaction of these two effects. That is, if f0-<sc>son</sc> is taken as the baseline, voicelessness (in both aspirated and unaspirated plosives) raised f0, and in low-initial tones, aspiration lowered f0, resulting in little difference between f0-<sc>asp</sc> and f0-<sc>son</sc>. On the other hand, in high-initial tones, both aspiration and voicelessness raised f0, leading to a greater difference between f0-<sc>asp</sc> and f0-<sc>son</sc>.</p>
<p>The effect of vowel height interacted with the tonal contexts such that in high-initial tones, syllables with a high vowel showed greater f0 perturbation than those with a low vowel; in low-initial tones, syllables with a low vowel showed greater perturbation effects.</p>
</sec>
</sec>
<sec id="s3">
<title>Experiment 2: Perception</title>
<p>Although complicated, the observed f0 perturbation patterns in Experiment 1 can be predicted as a function of the consonant&#x00027;s laryngeal category and the lexical tone. In this regard, the findings from Experiment 1 suggest that Mandarin has a systematic f0 perturbation at least for the tested speakers. Experiment 2 examines the perception of plosive aspiration contrast by the same Mandarin speakers. The purpose is to investigate whether Mandarin speakers, who produce systematically different f0 contours after aspirated and unaspirated plosives, use the f0 information to perceive the plosives&#x00027; laryngeal categories.</p>
<sec>
<title>Methods</title>
<sec>
<title>Participants</title>
<p>The same individuals from Experiment 1 also participated in Experiment 2. Relevant to the task of Experiment 2, all participants reported being right-handed.</p>
</sec>
<sec>
<title>Stimuli</title>
<p>Perception stimuli were created by recording natural productions of the syllable /t<sup>h</sup>u/ in isolation, and manipulating them in Praat (Boersma and Weenink, <xref ref-type="bibr" rid="B5">2020</xref>) to create a series of stops covarying in VOT and f0. A female native Mandarin speaker recorded the base syllables in four tones (i.e., /t<sup>h</sup>u1/, /t<sup>h</sup>u2/, /t<sup>h</sup>u3/, /t<sup>h</sup>u4/) in isolation. Aspirated stops were selected as the base tokens, and unaspirated tokens were created by removing the aspiration from the base tokens. In our pilot work, consistent with previous studies using similar methods (e.g., Francis et al., <xref ref-type="bibr" rid="B14">2006</xref>), removing the aspiration noise and shortening the VOT resulted in more natural-sounding tokens than adding aspiration noise and lengthening the VOT. The high back vowel /u/ was selected because /u/ provides a full set (all four tones) of real Mandarin words for both aspirated and unaspirated alveolar stops. We wanted to avoid the situation in which one of the choices is a word and the other is not. In addition, the vowel contexts did not influence the results in our pilot work using both /a/ and /u/ vowels.</p>
<p>To obtain a fine-grained picture of the respective roles of VOT and f0 in the perception of Mandarin stop aspiration, 49 distinct syllables were initially created from each of the four base tokens (i.e., /t<sup>h</sup>u1/, /t<sup>h</sup>u2/, /t<sup>h</sup>u3/, /t<sup>h</sup>u4/). The 49 syllables covaried in stop VOT and post-stop f0, by fully crossing 7 steps of VOT and 7 steps of post-stop f0.</p>
<p>The mean VOT duration of the 4 base tokens was 99 ms, and the VOT step size was approximately 14 ms. Starting at the nearest zero crossing point from the end of the stop burst, about 14 ms of aspiration was manually removed incrementally in Praat until the VOT of the base token was around 14 ms. As a result, mean VOT values for each step were as follows: step 1 = 14 ms, step 2 = 28 ms, step 3 = 42 ms, step 4 = 56 ms, step 5 = 72 ms, step 6 = 86 ms, and step 7 = 99 ms.</p>
<p>Post-plosive f0 was manipulated using TD-PSOLA (Moulines and Charpentier, <xref ref-type="bibr" rid="B42">1990</xref>) as implemented in Praat. First, the first 35% of the vowel was selected, and the pitch curve of the selected vowel portion was simplified with the stylize function in Praat (frequency resolution 2 Hz). The onset f0 values of the base tokens before manipulation were T1 = 323 Hz, T2 = 241 Hz, T3 = 210 Hz, and T4 = 371 Hz. Then, to create the 7 steps of post-plosive f0, the initial pitch point was either raised or lowered by 20 Hz, 40 Hz, or 60 Hz. F0 during the rest of the first 35% of the vowel was proportionately increased or decreased. All the tokens were resynthesized with TD-PSOLA after the manipulation.</p>
<p>The manipulated tokens were checked by four native Mandarin listeners for naturalness, and all were judged to be good tokens of the original syllables. In a pilot study with four additional Mandarin listeners, VOT step 6 (84 ms) and step 7 (99 ms) never elicited different perceptual responses; the VOT step 6 stimuli were therefore removed to keep the experiment short. The final set of perception stimuli included 168 (4 tones <sup>&#x0002A;</sup> 7 steps of f0 <sup>&#x0002A;</sup> 6 steps of VOT) unique tokens.</p>
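<p>The design of the stimulus set described above can be sketched in Python as follows. The onset f0 values and the mean VOT values are those reported in the text. The sketch assumes that the unshifted base token forms the middle of the 7 f0 steps and that f0 step 1 is the lowest; the variable and function names are hypothetical, and the sketch only enumerates the design without performing any resynthesis.</p>
<preformat>
```python
from itertools import product

# Onset f0 of the four base tokens (Hz), as reported in the text.
BASE_ONSET_F0 = {"T1": 323, "T2": 241, "T3": 210, "T4": 371}

# Onset shifts in Hz. Assumption: the unshifted base token (0 Hz) is the
# middle of the 7 f0 steps, and step 1 is the lowest onset.
F0_SHIFTS_HZ = [-60, -40, -20, 0, 20, 40, 60]

# Mean VOT (ms) of the six VOT steps retained after step 6 was removed.
VOT_STEPS_MS = {1: 14, 2: 28, 3: 42, 4: 56, 5: 72, 7: 99}


def build_stimulus_grid():
    """Enumerate every tone x f0 step x VOT step combination."""
    grid = []
    for tone, (f0_step, shift), (vot_step, vot_ms) in product(
        BASE_ONSET_F0,
        enumerate(F0_SHIFTS_HZ, start=1),
        VOT_STEPS_MS.items(),
    ):
        grid.append({
            "tone": tone,
            "f0_step": f0_step,
            "onset_f0_hz": BASE_ONSET_F0[tone] + shift,
            "vot_step": vot_step,
            "vot_ms": vot_ms,
        })
    return grid


stimuli = build_stimulus_grid()
print(len(stimuli))  # 4 tones * 7 f0 steps * 6 VOT steps
```
</preformat>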
</sec>
<sec>
<title>Procedure</title>
<p>Experiment 2, the perception experiment, was conducted after the production experiment, out of the concern that listening to the stimuli would influence the subsequent productions of the related sounds. After completing the production experiment, participants took a 5-min break before beginning the perception experiment.</p>
<p>Using PsychoPy (Peirce, <xref ref-type="bibr" rid="B45">2007</xref>), the participants were presented with a forced-choice identification task. While the participants listened to each stimulus, two Chinese characters constituting the aspirated and unaspirated pair (e.g., &#x07A81;/t<sup>h</sup>u1/ vs. &#x07763;/tu1/) were displayed on the laptop screen. Thirteen participants saw the screen with /t<sup>h</sup>/-syllables on the left and /t/-syllables on the right, and 12 participants saw the opposite. The auditory stimuli were presented through Sennheiser HD 280 pro headphones. The participants were instructed to choose the word they heard by selecting one of the two characters using a Cedrus button box (model RB-740).</p>
<p>The experiment was blocked by the lexical tones and the order among the blocks was counter-balanced across participants. Within each block, each of the 42 tokens (7 f0 steps <sup>&#x0002A;</sup> 6 VOT steps) was repeated three times in different random orders. There were self-paced breaks between blocks. The entire task took about 20 minutes.</p>
</sec>
</sec>
<sec>
<title>Results</title>
<p>A total of 12,600 responses (25 participants <sup>&#x0002A;</sup> 4 blocks <sup>&#x0002A;</sup> 42 tokens <sup>&#x0002A;</sup> 3 repetitions) were collected. Prior to the statistical analyses, responses whose reaction time (measured from the onset of the audio stimulus to the button press) was more than 3 standard deviations away from the participant&#x00027;s mean (232 responses, 1.8%) were discarded. Then, to determine the influence of each acoustic property (VOT, post-plosive f0) on the identification of the onset laryngeal category, the responses (aspirated vs. unaspirated) were statistically analyzed using binary logistic regression models built with the <italic>lme4</italic> package in R (Bates et al., <xref ref-type="bibr" rid="B3">2014</xref>). The reference category for the responses was aspirated and, thus, the coefficients &#x003B2; represent the log odds of unaspirated responses. The full model initially included <sc>VOT step</sc>, <sc>F0 step</sc>, <sc>tone</sc> (T1, T2, T3, T4), and their interactions, as fixed effects. <sc>VOT step</sc> (1&#x02013;7 without step 6) and <sc>F0 step</sc> (1&#x02013;7) were included as continuous variables. <sc>Tone</sc> was orthogonally contrast coded (T1, T4 vs. T2, T3; T1 vs. T4; T2 vs. T3) to examine whether there are significant response differences between the high-initial tones (T1, T4) and the low-initial tones (T2, T3), as well as within the two tonal groups. The random effects structure of the model was determined using a forward best path algorithm (Barr et al., <xref ref-type="bibr" rid="B2">2013</xref>), and the final model included by-<sc>subject</sc> and by-<sc>word</sc> intercepts, as well as by-<sc>subject</sc> slopes for <sc>VOT step, F0 step</sc>, and <sc>tone</sc>. Interaction terms between fixed effects were included if they were directly related to our research question or if their inclusion improved the model fit based on a likelihood ratio test (<italic>p</italic> &#x0003C; 0.05).
As a result, the final model included <sc>F0 step</sc> <sup>&#x0002A;</sup> <sc>tone</sc> which was central to our research question. The full outcome of this final model is in <xref ref-type="supplementary-material" rid="SM1">Appendix B (Table B3</xref>). A graph of predicted responses is in <xref ref-type="fig" rid="F3">Figure 3</xref>. Raw response data for individual listeners are in the <xref ref-type="supplementary-material" rid="SM1">Supplementary Materials</xref>.</p>
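<p>The orthogonal contrast coding of <sc>tone</sc> described above can be illustrated with the following Python sketch, which builds one possible set of contrast weights and checks that they are orthogonal. The specific weights (0.5 and &#x02212;0.5) and the sign assigned to each tone are assumptions made for illustration; the text specifies only which tones are compared in each contrast.</p>
<preformat>
```python
# One weight triple per tone; the three positions correspond to the
# contrasts named in the text:
#   0: T1, T4 vs. T2, T3   1: T1 vs. T4   2: T2 vs. T3
# Assumption: the weights 0.5 / -0.5 and their signs are illustrative.
TONE_CONTRASTS = {
    "T1": (0.5, 0.5, 0.0),
    "T2": (-0.5, 0.0, 0.5),
    "T3": (-0.5, 0.0, -0.5),
    "T4": (0.5, -0.5, 0.0),
}


def contrast_column(j):
    """Return the weights of contrast j across the four tones."""
    return [weights[j] for weights in TONE_CONTRASTS.values()]


def dot(u, v):
    """Dot product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_coding():
    """Check that each contrast sums to zero and all pairs are orthogonal."""
    cols = [contrast_column(j) for j in range(3)]
    sums_to_zero = all(sum(c) == 0 for c in cols)
    pairwise_zero = all(
        dot(cols[i], cols[j]) == 0
        for i in range(3)
        for j in range(i + 1, 3)
    )
    return sums_to_zero and pairwise_zero


print(is_orthogonal_coding())
```
</preformat>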
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Predicted perceptual responses based on the logistic regression model. Vertical lines represent the 95% confidence interval.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomm-07-896013-g0003.tif"/>
</fig>
<p>The likelihood ratio tests comparing the best model and the model without the predictor under consideration indicated that all fixed effects significantly influenced the listeners&#x00027; responses. First, <sc>VOT step</sc> significantly contributed to model fit [&#x003C7;<sup>2</sup>= 45.56, <italic>p</italic> &#x0003C; 0.0001]. As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, VOT step 1 elicited the highest rate of unaspirated responses across the four tones and, as the VOT increased, the probability of unaspirated responses decreased. Second, the <sc>F0 step</sc> was also significant [&#x003C7;<sup>2</sup>= 14.37, <italic>p</italic> = 0.0062]: the higher the <sc>F0 step</sc>, the less likely the stimulus was to elicit unaspirated responses. Finally, as for <sc>tone</sc>, the first tonal contrast (T1, T4 vs. T2, T3) contributed significantly to model fit [&#x003C7;<sup>2</sup>= 30.47, <italic>p</italic> &#x0003C; 0.0001], and high-initial tones (T1 and T4) elicited significantly fewer unaspirated responses than low-initial tones (T2 and T3) [&#x003B2; = &#x02212;3.82, <italic>p</italic> &#x0003C; 0.0001]. The differences between the high-initial tones and the low-initial tones were the most conspicuous when VOT was short, as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. For example, while the unaspirated responses were less than 50% at VOT step 2 in tones 1 and 4, in tones 2 and 3, a similar decrease occurred at step 3. This indicates that the stimuli belonging to the second step of VOT (28 ms), for instance, were more likely to elicit aspirated responses in the high-initial tones, but unaspirated responses in the low-initial tones. The second [<italic>p</italic> = 0.74] and third [<italic>p</italic> = 0.35] tonal contrasts were not significant, suggesting that listeners&#x00027; responses in T1 vs. T4 and T2 vs. T3 were not significantly different.</p>
<p>The interaction <sc>F0 step</sc>: <sc>tone</sc> was not significant [&#x003C7;<sup>2</sup> = <sc>5.37</sc>, <italic>p</italic> = 0.15], but was included in the model as it was central to our research question. To verify whether the effects of <sc>F0 step</sc> across different tones, displayed in <xref ref-type="fig" rid="F3">Figure 3</xref>, differed significantly, <italic>post-hoc</italic> Tukey tests were performed using the emtrends() function in the <italic>emmeans</italic> package (Lenth, <xref ref-type="bibr" rid="B34">2020</xref>). It has been suggested that <italic>post-hoc</italic> analyses on non-significant interactions can be informative when the main effects of the predictors participating in an interaction are significant (e.g., Wei et al., <xref ref-type="bibr" rid="B49">2012</xref>). The results of these <italic>post-hoc</italic> analyses suggest that the effects of <sc>F0 step</sc> did not differ as a function of <sc>tone.</sc> None of the pairwise comparisons were significant, as shown in <xref ref-type="table" rid="T5">Table 5</xref>. Therefore, the current data do not provide evidence that the <sc>F0 step</sc> effects were influenced by tones. Rather, the current outcome appears to suggest that Mandarin listeners associated high post-plosive f0 with aspirated plosives across different tones.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Estimated trend of F0 step on tonal contrast (Tukey HSD <italic>post-hoc</italic> comparisons).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Tonal contrast</bold></th>
<th valign="top" align="right"><bold>Estimate (&#x003B2;)</bold></th>
<th valign="top" align="center"><bold>Standard Error</bold></th>
<th valign="top" align="right"><bold>z. ratio</bold></th>
<th valign="top" align="center"><italic><bold>p</bold></italic><bold>-value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">T1&#x02013;T2</td>
<td valign="top" align="right">&#x02212;0.206</td>
<td valign="top" align="center">0.134</td>
<td valign="top" align="right">&#x02212;1.537</td>
<td valign="top" align="center">0.4152</td>
</tr>
<tr>
<td valign="top" align="left">T1&#x02013;T3</td>
<td valign="top" align="right">&#x02212;0.150</td>
<td valign="top" align="center">0.134</td>
<td valign="top" align="right">&#x02212;1.122</td>
<td valign="top" align="center">0.6758</td>
</tr>
<tr>
<td valign="top" align="left">T1&#x02013;T4</td>
<td valign="top" align="right">&#x02212;0.323</td>
<td valign="top" align="center">0.141</td>
<td valign="top" align="right">&#x02212;2.286</td>
<td valign="top" align="center">0.1013</td>
</tr>
<tr>
<td valign="top" align="left">T2&#x02013;T3</td>
<td valign="top" align="right">0.055</td>
<td valign="top" align="center">0.125</td>
<td valign="top" align="right">0.442</td>
<td valign="top" align="center">0.9712</td>
</tr>
<tr>
<td valign="top" align="left">T2&#x02013;T4</td>
<td valign="top" align="right">&#x02212;0.118</td>
<td valign="top" align="center">0.134</td>
<td valign="top" align="right">&#x02212;0.881</td>
<td valign="top" align="center">0.8148</td>
</tr>
<tr>
<td valign="top" align="left">T3&#x02013;T4</td>
<td valign="top" align="right">&#x02212;0.173</td>
<td valign="top" align="center">0.134</td>
<td valign="top" align="right">&#x02212;1.293</td>
<td valign="top" align="center">0.5674</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Interim Summary and Discussion</title>
<p>The current findings demonstrate, as expected, that VOT is the primary cue of aspiration contrast in Mandarin. The unaspirated responses decreased as VOT became longer, across all f0 steps and lexical tones. At VOT step 1 (14 ms), which falls in the typical VOT range of the Mandarin unaspirated plosives (e.g., Rochet and Fei, <xref ref-type="bibr" rid="B48">1991</xref>), the listeners provided the highest number of unaspirated responses, and starting from VOT step 4 (56 ms), the listeners tended to give mainly aspirated responses. The VOT categorical boundary for the aspirated-unaspirated plosives seemed to be different between high-initial tones vs. low-initial tones. Specifically, the VOT categorical boundaries occurred one step earlier in the high-initial tone stimuli than in the low-initial tone stimuli. At step 2, the low-initial tone stimuli yielded mostly unaspirated responses whereas the high-initial tone stimuli were more likely to yield aspirated responses (see <xref ref-type="fig" rid="F3">Figure 3</xref>).</p>
<p>Although VOT was clearly the most influential cue for the aspiration, the listeners still used post-plosive f0 in deciding whether the plosive was aspirated or not. The current outcomes related to the f0 steps and lexical tones commonly suggest that the listeners associated high post-plosive f0 with the aspirated stops and low post-plosive f0 with unaspirated stops. The stimuli with raised f0 elicited more aspirated responses than those with lowered f0. In addition, stimuli with low-initial tones (T2, T3) elicited significantly more unaspirated responses than stimuli with high-initial tones (T1, T4). This is consistent with the pattern observed in the production experiment in which the aspirated plosives in T2 and T3 had longer VOT than those in T1 and T4 (<xref ref-type="fig" rid="F2">Figure 2</xref>). This suggests that lower post-plosive f0, whether it be a part of the lexical tone or not, made the stops with an ambiguous VOT more likely to be judged as unaspirated than as aspirated.</p>
<p>Taken together, the current results suggest that Mandarin listeners extracted both consonantal and tonal information from f0 at the vowel onset. This perceptual pattern, however, did not precisely reflect the f0 perturbation observed in the same speakers&#x00027; production patterns. In production, the difference between f0-<sc>asp</sc> and f0-<sc>unasp</sc> was not consistent across different tones, showing the opposite directions in high- vs. low-initial tones. Despite this divergent pattern in production, when VOT was ambiguous, the same speakers gave more aspirated responses in higher f0 steps both in high-initial and low-initial tones.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<sec>
<title>Post-onset F0 in Production</title>
<p>The main findings of Experiment 1, which compares f0-<sc>asp</sc>, f0-<sc>unasp</sc>, and f0-<sc>son</sc> in four tonal contexts, can be summarized as follows. First, the difference between f0-<sc>asp</sc> and f0-<sc>unasp</sc> shows the opposite directions in high-initial tones and low-initial tones. In high-initial tones, f0-<sc>asp</sc> is higher than f0-<sc>unasp</sc> whereas f0-<sc>asp</sc> is lower than f0-<sc>unasp</sc> in low-initial tones. Second, f0-<sc>unasp</sc> is consistently higher than f0-<sc>son</sc> throughout the tonal contexts. Third, the difference between f0-<sc>asp</sc> and f0-<sc>son</sc> reflects the combination of these two effects. These outcomes suggest that the f0 at the vowel onset in Mandarin shows two separate perturbation effects, one due to aspiration and the other due to voicing. Between aspirated and unaspirated voiceless plosives, f0-<sc>asp</sc> is higher in high-initial tones and lower in low-initial tones than f0-<sc>unasp</sc>. Between voiceless plosives and voiced sonorants, voicelessness raises (or voicing lowers) f0 across the tonal contexts. Consequently, the difference between f0-<sc>asp</sc> and f0-<sc>son</sc> is greater in high-initial tones than in low-initial tones.</p>
<p>The current findings on f0 perturbation due to aspiration are partially consistent with the conflicting previous findings on Mandarin. Our findings in the low-initial tones are in line with Xu and Xu (<xref ref-type="bibr" rid="B54">2003</xref>), showing that aspiration lowers the post-plosive f0, compared to f0-<sc>unasp</sc>, in low-initial tones. At the same time, we also find that in high-initial tones, aspiration raises the post-plosive f0, again compared to f0-<sc>unasp</sc>, and this outcome is consistent with Luo&#x00027;s (<xref ref-type="bibr" rid="B37">2018</xref>) findings. The raising effects of the consonantal aspiration in Luo (<xref ref-type="bibr" rid="B37">2018</xref>) are greater in high-initial tones, the lowering effects in Xu and Xu (<xref ref-type="bibr" rid="B54">2003</xref>) are greater in low-initial tones, and our data show both of these patterns. These findings, taken together, reaffirm the dichotomy between the high-initial and low-initial tones.</p>
<p>The exact source of this dichotomy is puzzling, but we suggest that the tonal dichotomy is consistent with the interpretation that the f0 perturbation due to aspiration in Mandarin is bio-mechanically motivated. The observed tonal dichotomy can be explained by the differences in the laryngeal settings utilized in different tones. According to Moisik et al. (<xref ref-type="bibr" rid="B41">2014</xref>), the larynx height in general is positively correlated with f0 in Mandarin tone productions. As the laryngeal setting influences the vocal fold tension (e.g., Honda et al., <xref ref-type="bibr" rid="B22">1999</xref>; Moisik et al., <xref ref-type="bibr" rid="B41">2014</xref>), in high tones, the larynx is usually raised and the vocal folds are stretched and stiffened whereas the larynx is lowered and the vocal folds are slackened in low tones. When vocal folds are stiffened, they are resistant to vibration (i.e., require a greater volume of air flowing more rapidly than slack folds), but once they are set to vibrate, they vibrate at a high frequency. Also, stiffer vocal folds are often accompanied by a narrower glottal opening during the voiceless portion of a plosive (e.g., McCrea and Morris, <xref ref-type="bibr" rid="B38">2005</xref>; Narayan and Bowden, <xref ref-type="bibr" rid="B43">2013</xref>). On the other hand, slackened vocal folds are more prone to vibration and a wide glottal opening during a plosive.</p>
<p>The difference in the status of the vocal folds and the glottis has two notable consequences in the current study. The first consequence is the VOT difference in high-initial vs. low-initial tones. In the current study, aspirated plosives in high-initial tones have shorter VOT than those in low-initial tones (see <xref ref-type="fig" rid="F2">Figure 2</xref>). According to McCrea and Morris (<xref ref-type="bibr" rid="B38">2005</xref>) and Narayan and Bowden (<xref ref-type="bibr" rid="B43">2013</xref>), stiff vocal folds and a narrow glottal opening result in shorter VOT of aspirated plosives, presumably accompanied by a faster airflow, in high f0 environments than in low f0 environments. The second consequence is the influence of aspiration on the post-plosive f0. Depending on the laryngeal settings for different tones, the influence of plosive aspiration on the post-plosive f0 can take different forms. According to the aerodynamic predictions, as claimed in Xu and Xu (<xref ref-type="bibr" rid="B54">2003</xref>), aspirated plosives, with a greater volume of air escaping through the glottis between the oral release and the voicing onset, have a lower subglottal air pressure than unaspirated plosives at the voicing onset. This results in the f0-<sc>asp</sc> being lower than f0-<sc>unasp</sc>. This pattern (f0-<sc>asp</sc> &#x0003C; f0-<sc>unasp</sc>) appears when the vocal folds are slack and the glottal opening is wider, as in the low-initial tones in Mandarin. We claim that, in the high-initial tones, the aerodynamic effect is manifested in a different form because of the high tension of the vocal folds. As stiff vocal folds are more resistant to vibration and require a faster airflow to vibrate, the subglottal air pressure would not go down as much even in aspirated plosives.
That is, the laryngeal setting and the resulting vocal fold tension in the high-initial tones require a higher trans-glottal pressure threshold than those in the low-initial tones, to initiate phonation at the onset of voicing after the plosive release. If the subglottal air pressure were to go down to the same extent regardless of the vocal fold tension, the trans-glottal pressure difference would not have been enough for the stiff folds to vibrate in the high-initial tones. Consequently, in the high-initial tones, the faster airflow in aspirated plosives (than in unaspirated plosives, see also Klatt et al., <xref ref-type="bibr" rid="B30">1968</xref>), when combined with the high tissue tension and the narrow glottal opening, would increase the f0-<sc>asp</sc> more than f0-<sc>unasp</sc>. Chen (<xref ref-type="bibr" rid="B7">2011</xref>) proposes a similar dichotomy (tense vocal folds in a high-f0 context and slackened vocal folds in a low-f0 context, giving rise to distinct f0 perturbation patterns) for cross-linguistic variation. Our findings suggest that the tonal dichotomy can be observed even within a language.</p>
<p>Finally, although we suggest that the f0 perturbation in Mandarin is attributable to the biomechanics of the larynx, the current findings are also consistent, in several different aspects, with the claim that speakers of tonal languages would control the f0 perturbation to enhance (or not to impede) the tonal contrast (e.g., Hombert et al., <xref ref-type="bibr" rid="B20">1979</xref>; Francis et al., <xref ref-type="bibr" rid="B14">2006</xref>). First, the magnitude of the perturbation is greater in high-initial tones than in low-initial tones. Assuming that the high tones are salient in Mandarin (Luo, <xref ref-type="bibr" rid="B37">2018</xref>) and the tones that are already salient do not need to be further enhanced, Mandarin speakers have more room for f0 variation in high-initial tones than in low-initial tones. This, according to Luo (<xref ref-type="bibr" rid="B37">2018</xref>), is the reason why the f0 raising due to aspiration is greater in high-initial tones in her study. Our findings differ from Luo&#x00027;s (<xref ref-type="bibr" rid="B37">2018</xref>) in that we observe not only the f0 raising in high-initial tones but also the lowering in low-initial tones. Still, the size of the difference between f0-<sc>asp</sc> and f0-<sc>unasp</sc> is greater in high-initial tones than in low-initial tones (<xref ref-type="table" rid="T1">Table 1</xref>), consistent with the claim that speakers would restrict the biomechanically-motivated f0 fluctuations when the tonal contrast is less salient and, thus, more vulnerable to misperception. Second, the perturbation lasts longer in the tones with a static f0 contour during the first half of the vowel than in those with a dynamic f0 contour. In Mandarin, the f0 contours for T1 and T3 are relatively steady during the first half of the vowel whereas those for T2 and T4 are more dynamic (see the section: Lexical tones). 
The current findings indicate that the difference between f0-<sc>asp</sc> and f0-<sc>unasp</sc> lasts the longest in T1 and T3, followed by T4, and then T2 (<xref ref-type="table" rid="T1">Table 1</xref>). This seems to provide evidence for the speakers&#x00027; control over f0 perturbation in a (subconscious) effort to preserve the tonal contrast. When the tones require dynamic f0 changes earlier in the vowel, speakers suppress the f0 variation automatically induced by the onset consonant. Tones with relatively steady f0 contours, on the other hand, would allow for more variability in f0 due to non-tonal factors, such as the aspiration of onset consonants.</p>
</sec>
<sec>
<title>Post-onset F0 in Perception</title>
<p>Our production data show that the f0 perturbation in Mandarin varies according to the lexical tones. As discussed in Post-onset f0 in production, this variation appears to be systematic, reflecting different laryngeal maneuvers for different tonal targets. Still, the same speakers, when they are presented with auditory stimuli varying in plosive VOT and post-plosive f0, are more likely to select the aspirated category when the post-plosive f0 is high and when VOT is ambiguous. The associations (high f0-aspirated and low f0-unaspirated) are valid even in the low-initial tones, which show the opposite perturbation pattern in production. In other words, there seems to be an intriguing mismatch between production and perception with regard to Mandarin speakers&#x00027; use of f0 as a cue for consonant aspiration.</p>
<p>We propose several different factors contributing to this apparent mismatch. First, listeners are more attentive to the phonetic patterns present in salient contexts. Since Mandarin high-initial tones are more salient than low-initial tones both phonologically and perceptually, as suggested by Luo (<xref ref-type="bibr" rid="B37">2018</xref>), listeners may use the pattern presented in the salient tones that associates high f0 with aspirated plosives even when they perceive the low-initial tone stimuli. The production patterns in less salient low-initial tones are likely to be unattended. Second, the distribution of Mandarin lexical tones also suggests that the perturbation patterns in high-initial tones are more prevalent in the language. Liu and Ma (<xref ref-type="bibr" rid="B35">1986</xref>), based on their survey of two different corpora, the National Standard Corpus of Mandarin Words and the Chinese Vocabulary Corpus, show that T4 is the most frequent (32%) and T3 is the least frequent (17%) in Mandarin. T1 and T2 each account for 24&#x0007E;25% of Mandarin words. This means that the two high-initial tones (T1 and T4) make up more than half (56&#x0007E;57%) of the Mandarin lexicon while the two low-initial tones, when combined, make up about 40% of the lexicon. In addition, T3 is subject to tone sandhi (Duanmu, <xref ref-type="bibr" rid="B13">2007</xref>), and when followed by another T3, becomes T2, which shows minimal, if any, perturbation due to aspiration (see <xref ref-type="table" rid="T1">Table 1</xref>; the same pattern is reported in Luo, <xref ref-type="bibr" rid="B37">2018</xref>). Taking all these together, Mandarin listeners are presumably exposed to the f0 perturbation pattern in which f0-<sc>asp</sc> is higher than f0-<sc>unasp</sc> more frequently than to the opposite pattern. 
Also, even in the infrequent cases when the listeners actually hear the pattern of f0-<sc>asp</sc> &#x0003C; f0-<sc>unasp</sc>, they are less likely to attend to this covariation occurring in less salient tonal contexts. Therefore, we claim that the Mandarin listeners&#x00027; perception reflects the predominant pattern in their production. The perturbation pattern from the low-initial tones (f0-<sc>asp</sc> &#x0003C; f0-<sc>unasp</sc>) is not robustly represented, as T3 is the least frequent in the language and vulnerable to sandhi, and the perturbation in T2 is weak at best. Consequently, Mandarin listeners are likely to learn, from their native language experience, that high f0 is associated with aspirated plosives and low f0 with unaspirated plosives, and to use the high post-plosive f0 as a secondary cue to consonant aspiration.</p>
<p>Francis et al. (<xref ref-type="bibr" rid="B14">2006</xref>) also report a discrepancy between production and perception, in their investigation of the f0 perturbation in Cantonese. Cantonese listeners use post-plosive f0 as a cue for consonant aspiration but Cantonese speakers&#x00027; production does not provide evidence for the association between high f0 and plosive aspiration. As the listeners&#x00027; perceptual responses cannot be explained by their native language experience, Francis et al. (<xref ref-type="bibr" rid="B14">2006</xref>) claim that the listeners&#x00027; perception is guided by language-independent, general auditory enhancing effects among different phonetic properties (e.g., Kingston and Diehl, <xref ref-type="bibr" rid="B27">1994</xref>), which could have been facilitated by the listeners&#x00027; experience with English. Unlike Francis et al. (<xref ref-type="bibr" rid="B14">2006</xref>), we do see evidence for the association between high f0 and aspiration in Mandarin speakers&#x00027; production. This suggests that the perceptual pattern observed in the current study may not be entirely due to the general auditory effects but, rather, due to the listeners&#x00027; native language experience. However, we acknowledge that we cannot rule out the potential influence of the English experience. The participants in this study speak English as their second language and were residing in Virginia, USA, at the time of testing. We still expect the English influence, if any, to be minimal since bilingual listeners&#x00027; categorization, which requires language-specific phonological judgments, shows the language mode effects (e.g., Antoniou et al., <xref ref-type="bibr" rid="B1">2012</xref>). In this study, the experiments were carried out in Mandarin by a native Mandarin-speaking experimenter, and the perception task asked the listeners to select the Mandarin character matching the stimuli they heard.</p>
</sec>
<sec>
<title>Concluding Remarks: Mandarin Aspiration Contrast</title>
<p>The current outcomes confirm that VOT is the phonetic property primarily responsible for Mandarin aspiration contrast. In production, Mandarin aspirated plosives and unaspirated plosives are well-separated by the VOT alone (<xref ref-type="fig" rid="F2">Figure 2</xref>), and Mandarin listeners primarily rely on VOT to distinguish the aspirated plosives from the unaspirated ones in perception (<xref ref-type="fig" rid="F3">Figure 3</xref>). The VOT boundary, however, seems to vary according to the tonal contexts. The VOT of aspirated plosives is greater in low-initial tones than in high-initial tones in production (<xref ref-type="fig" rid="F2">Figure 2</xref>). We suggest that this variation arises from the biomechanics of the larynx as, in high f0 ranges, VOT of aspirated stops decreases due to vocal fold tension (McCrea and Morris, <xref ref-type="bibr" rid="B38">2005</xref>; Narayan and Bowden, <xref ref-type="bibr" rid="B43">2013</xref>). In perception, Mandarin listeners are sensitive to this contextual VOT variation, providing more unaspirated responses in low-initial tones than in high-initial tones (<xref ref-type="fig" rid="F3">Figure 3</xref>). Taken together, these findings suggest that the VOT boundary for Mandarin aspiration contrast is flexible and influenced by the tonal contexts. This is comparable to the well-documented covariation between VOT and place of articulation. In production, labial plosives have the shortest VOT, with the plosives of backer places of articulation having longer VOT (e.g., Peterson and Lehiste, <xref ref-type="bibr" rid="B46">1960</xref>; Cho and Ladefoged, <xref ref-type="bibr" rid="B9">1999</xref>). And listeners attend to this systematic variation. For example, the VOT boundary between voiced and voiceless categories is at a lower VOT range in labial plosives than in velar plosives (e.g., Miller, <xref ref-type="bibr" rid="B39">1977</xref>; Benk&#x000ED;, <xref ref-type="bibr" rid="B4">2001</xref>). 
When the variation in the speech signal is systematic, although it may not be uniform across contexts, the contextual variation does not impede but facilitates listeners&#x00027; perception.</p>
<p>The current findings also provide evidence for a systematic variation in post-plosive f0 influenced both by the consonant aspiration and by the lexical tone. Depending on whether the tone begins at a high vs. low f0 range, the consonantal influence on f0 takes a different form. This can be attributed to the different laryngeal settings for different tonal targets. Despite the variation in production, Mandarin listeners use the post-plosive f0 as a secondary cue for plosive aspiration, associating high f0 with the aspirated category even in the low-initial tones, which show the opposite perturbation pattern in production. When the stimulus VOT is within a typical range of aspirated or unaspirated plosives, the listeners&#x00027; responses are predominantly determined by the stop VOT. However, when the VOT is ambiguous (step 2 in high-initial tones and step 3 in low-initial tones, <xref ref-type="fig" rid="F3">Figure 3</xref>), high post-plosive f0 stimuli, in general, yielded more aspirated responses despite a fairly large inter-listener variation (see the individual data in the <xref ref-type="supplementary-material" rid="SM1">Supplementary Material</xref>). That being said, the overall perceptual pattern pooled across the listeners may arguably originate from the f0 perturbation patterns in Mandarin production. As the high-initial tones are more salient and more prevalent in the Mandarin lexicon, the listeners attend more to the perturbation patterns present in high-initial tones (f0-<sc>asp</sc> &#x0003E; f0-<sc>unasp</sc>) than those in low-initial tones (f0-<sc>unasp</sc> &#x0003E; f0-<sc>asp</sc>). Although post-plosive f0 varies according to the tonal contexts in production, its role as the secondary cue to consonant aspiration in perception does not seem to be modulated by the tonal contexts.</p>
<p>Finally, the current study reports only the pooled results, but we should note that the data exhibit considerable individual variation in both experiments (see the <xref ref-type="supplementary-material" rid="SM1">Supplementary Material</xref>). In production, some speakers show a quite clear f0 perturbation conforming to the group pattern while others show the conforming pattern only in a few tones but not in the others. In perception, post-plosive f0 does not seem to be an informative cue to consonant aspiration for all listeners, and some listeners seem to use f0 differently than others. The reason for these variations is unclear, and they do not seem to be structured in an immediately noticeable way. Still, this individual variation is intriguing and calls for a focused investigation, which we leave for a future study.</p>
</sec>
</sec>
<sec sec-type="data-availability" id="s5">
<title>Data Availability Statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s6">
<title>Ethics Statement</title>
<p>The studies involving human participants were reviewed and approved by George Mason University IRB. The patients/participants provided their written informed consent to participate in this study.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>YG: study conception and design, data collection and analysis, interpretation of results, and writing the initial draft. HK: supervising, data analysis, data visualization, interpretation of results, and writing and revising the manuscript. Both authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>This work was supported by the George Mason University Dissertation Completion Grant awarded to YG and by the Linguistics Program at George Mason University.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec> </body>
<back>
<ack><p>This work is a revised version of part of YG&#x00027;s dissertation (<italic>Production and Perception of Laryngeal Contrasts in Mandarin and English by Mandarin Speakers</italic>. George Mason University, 2020). The authors would like to thank Georgia Zellou and two reviewers for their constructive comments, the audience at the 178th meeting of the Acoustical Society of America and LabPhon 17 where portions of this work have been presented, Steven Weinberger, Vincent Chanethom, and Doug Wulf for helpful discussions related to this work, and participants for making this study possible.</p>
</ack>
<sec sec-type="supplementary-material" id="s10">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fcomm.2022.896013/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fcomm.2022.896013/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_2.pdf" id="SM2" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Antoniou</surname> <given-names>M.</given-names></name> <name><surname>Tyler</surname> <given-names>M. D.</given-names></name> <name><surname>Best</surname> <given-names>C. T.</given-names></name></person-group> (<year>2012</year>). <article-title>Two ways to listen: do L2-dominant bilinguals perceive stop voicing according to language mode?</article-title> <source>J. Phone.</source> <volume>40</volume>, <fpage>582</fpage>&#x02013;<lpage>594</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2012.05.005</pub-id><pub-id pub-id-type="pmid">22844163</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barr</surname> <given-names>D. J.</given-names></name> <name><surname>Levy</surname> <given-names>R.</given-names></name> <name><surname>Scheepers</surname> <given-names>C.</given-names></name> <name><surname>Tily</surname> <given-names>H. J.</given-names></name></person-group> (<year>2013</year>). <article-title>Random effects structure for confirmatory hypothesis testing: keep it maximal</article-title>. <source>J. Mem. Lang.</source> <volume>68</volume>, <fpage>255</fpage>&#x02013;<lpage>278</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2012.11.001</pub-id><pub-id pub-id-type="pmid">24403724</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Bates</surname> <given-names>D.</given-names></name> <name><surname>Maechler</surname> <given-names>M.</given-names></name> <name><surname>Bolker</surname> <given-names>B.</given-names></name> <name><surname>Walker</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <source>lme4: Linear Mixed-Effects Models Using Eigen and S4. R Package Version 1.1-7</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://CRAN.R-project.org/package=lme4">http://CRAN.R-project.org/package=lme4</ext-link></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benk&#x000ED;</surname> <given-names>J. R.</given-names></name></person-group> (<year>2001</year>). <article-title>Place of articulation and first formant transition pattern both affect perception of voicing in English</article-title>. <source>J. Phone.</source> <volume>29</volume>, <fpage>1</fpage>&#x02013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.1006/jpho.2000.0128</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Boersma</surname> <given-names>P.</given-names></name> <name><surname>Weenink</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <source>Praat: Doing Phonetics by Computer, Version 6.1.12</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.praat.org/">http://www.praat.org/</ext-link></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brunelle</surname> <given-names>M.</given-names></name> <name><surname>T&#x1EA5;n</surname> <given-names>T. T.</given-names></name> <name><surname>Kirby</surname> <given-names>J.</given-names></name> <name><surname>Giang</surname> <given-names>&#x00110;. L.</given-names></name></person-group> (<year>2020</year>). <article-title>Transphonologization of voicing in Chru: studies in production and perception</article-title>. <source>Lab. Phonol</source>. 11, 15. <pub-id pub-id-type="doi">10.5334/labphon.278</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y.</given-names></name></person-group> (<year>2011</year>). <article-title>How does phonology guide phonetics in segment&#x02013;f0 interaction?</article-title> <source>J. Phone.</source> <volume>39</volume>, <fpage>612</fpage>&#x02013;<lpage>625</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2011.04.001</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chi</surname> <given-names>Y.</given-names></name> <name><surname>Honda</surname> <given-names>K.</given-names></name> <name><surname>Wei</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Glottographic and aerodynamic analysis on consonant aspiration and onset f0 in Mandarin Chinese,&#x0201D;</article-title> in <source>International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source> (<publisher-loc>Brighton</publisher-loc>), <fpage>6480</fpage>&#x02013;<lpage>6484</lpage>.</citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cho</surname> <given-names>T.</given-names></name> <name><surname>Ladefoged</surname> <given-names>P.</given-names></name></person-group> (<year>1999</year>). <article-title>Variation and universals in VOT: evidence from 18 languages</article-title>. <source>J. Phone.</source> <volume>27</volume>, <fpage>207</fpage>&#x02013;<lpage>229</lpage>. <pub-id pub-id-type="doi">10.1006/jpho.1999.0094</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coetzee</surname> <given-names>A. W.</given-names></name> <name><surname>Beddor</surname> <given-names>P. S.</given-names></name> <name><surname>Shedden</surname> <given-names>K.</given-names></name> <name><surname>Styler</surname> <given-names>W.</given-names></name> <name><surname>Wissing</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>Plosive voicing in Afrikaans: differential cue weighting and tonogenesis</article-title>. <source>J. Phone.</source> <volume>66</volume>, <fpage>185</fpage>&#x02013;<lpage>216</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2017.09.009</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deng</surname> <given-names>D.</given-names></name> <collab>&#x09093;&#x04E39;</collab> <name><surname>Feng</surname> <given-names>S.</given-names></name> <collab>&#x077F3;&#x0950B;</collab> <name><surname>Lu</surname> <given-names>S.</given-names></name> <collab>&#x05415;&#x058EB;&#x06960;</collab></person-group> (<year>2006</year>). <article-title>&#x0666E;&#x0901A;&#x08BDD;&#x04E0E;&#x053F0;&#x06E7E;&#x056FD;&#x08BED;&#x058F0;&#x08C03;&#x07684;&#x05BF9;&#x06BD4;&#x05206;&#x06790;[The contrast on tone between Putonghua and Taiwan Mandarin]</article-title>. &#x058F0;&#x05B66;&#x05B66;&#x062A5;[<italic>Sheng Xue Xue Bao &#x02013; Acta Acoustica</italic>] <volume>31</volume>, <fpage>536</fpage>&#x02013;<lpage>541</lpage>.</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dmitrieva</surname> <given-names>O.</given-names></name> <name><surname>Llanos</surname> <given-names>F.</given-names></name> <name><surname>Shultz</surname> <given-names>A. A.</given-names></name> <name><surname>Francis</surname> <given-names>A. L.</given-names></name></person-group> (<year>2015</year>). <article-title>Phonological status, not voice onset time, determines the acoustic realization of onset f0 as a secondary voicing cue in Spanish and English</article-title>. <source>J. Phone.</source> <volume>49</volume>, <fpage>77</fpage>&#x02013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2014.12.005</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Duanmu</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <source>The Phonology of Standard Chinese</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Francis</surname> <given-names>A. L.</given-names></name> <name><surname>Ciocca</surname> <given-names>V.</given-names></name> <name><surname>Wong</surname> <given-names>V. K. M.</given-names></name> <name><surname>Chan</surname> <given-names>J. K. L.</given-names></name></person-group> (<year>2006</year>). <article-title>Is fundamental frequency a cue to aspiration in initial stops?</article-title> <source>J. Acoust. Soc. Am.</source> <volume>120</volume>, <fpage>2884</fpage>&#x02013;<lpage>2895</lpage>. <pub-id pub-id-type="doi">10.1121/1.2346131</pub-id><pub-id pub-id-type="pmid">17139746</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gandour</surname> <given-names>J. T.</given-names></name></person-group> (<year>1974</year>). <article-title>Consonant types and tone in Siamese</article-title>. <source>J. Phone.</source> <volume>2</volume>, <fpage>337</fpage>&#x02013;<lpage>350</lpage>. <pub-id pub-id-type="doi">10.1016/S0095-4470(19)31303-8</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>J.</given-names></name> <name><surname>Arai</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>Plosive (de-)voicing and f0 perturbations in Tokyo Japanese: positional variation, cue enhancement, and contrast recovery</article-title>. <source>J. Phone.</source> <volume>77</volume>, <fpage>100932</fpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2019.100932</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Halle</surname> <given-names>M.</given-names></name> <name><surname>Stevens</surname> <given-names>K. N.</given-names></name></person-group> (<year>1971</year>). A Note on Laryngeal Features. Quarterly Progress Report, Research Laboratory of Electronics, MIT. <volume>101</volume>, <fpage>198</fpage>&#x02013;<lpage>213</lpage>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hall&#x000E9;</surname> <given-names>P.</given-names></name></person-group> (<year>1994</year>). <article-title>Evidence for tone-specific activity of the sternohyoid muscle in modern standard Chinese</article-title>. <source>Lang. Speech</source> <volume>37</volume>, <fpage>103</fpage>&#x02013;<lpage>123</lpage>. <pub-id pub-id-type="doi">10.1177/002383099403700201</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hanson</surname> <given-names>H. M.</given-names></name></person-group> (<year>2009</year>). <article-title>Effects of obstruent consonants on fundamental frequency at vowel onset in English</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>125</volume>, <fpage>425</fpage>&#x02013;<lpage>441</lpage>. <pub-id pub-id-type="doi">10.1121/1.3021306</pub-id><pub-id pub-id-type="pmid">19173428</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hombert</surname> <given-names>J. M.</given-names></name> <name><surname>Ohala</surname> <given-names>J. J.</given-names></name> <name><surname>Ewan</surname> <given-names>W. G.</given-names></name></person-group> (<year>1979</year>). <article-title>Phonetic explanations for the development of tones</article-title>. <source>Language</source> <volume>55</volume>, <fpage>37</fpage>&#x02013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.2307/412518</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Honda</surname> <given-names>K.</given-names></name></person-group> (<year>2004</year>). <article-title>Physiological factors causing tonal characteristics of speech: From global to local prosody</article-title>. <source>Proc Speech Prosody</source> <volume>2004</volume>, <fpage>739</fpage>&#x02013;<lpage>744</lpage>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Honda</surname> <given-names>K.</given-names></name> <name><surname>Hirai</surname> <given-names>H.</given-names></name> <name><surname>Masaki</surname> <given-names>S.</given-names></name> <name><surname>Shimada</surname> <given-names>Y.</given-names></name></person-group> (<year>1999</year>). <article-title>Role of vertical larynx movement and cervical lordosis in F0 control</article-title>. <source>Lang. Speech</source> <volume>42</volume>, <fpage>401</fpage>&#x02013;<lpage>411</lpage>. <pub-id pub-id-type="doi">10.1177/00238309990420040301</pub-id><pub-id pub-id-type="pmid">10845244</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hoole</surname> <given-names>P.</given-names></name> <name><surname>Honda</surname> <given-names>K.</given-names></name></person-group> (<year>2011</year>). <article-title>&#x0201C;Automaticity vs. feature-enhancement in the control of segmental f0,&#x0201D;</article-title> in <source>Where do phonological features come from?: Cognitive, physical and developmental bases of distinctive speech categories Language Faculty and Beyond (LFAB): Internal and external variation in linguistics</source>, eds N. Clements and R. Ridouane (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>John Benjamins</publisher-name>), <fpage>133</fpage>&#x02013;<lpage>171</lpage>.</citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>House</surname> <given-names>A. S.</given-names></name> <name><surname>Fairbanks</surname> <given-names>G.</given-names></name></person-group> (<year>1953</year>). <article-title>The influence of consonant environment upon the secondary acoustical characteristics of vowels</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>25</volume>, <fpage>105</fpage>&#x02013;<lpage>113</lpage>. <pub-id pub-id-type="doi">10.1121/1.1906982</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jessen</surname> <given-names>M.</given-names></name> <name><surname>Roux</surname> <given-names>J. C.</given-names></name></person-group> (<year>2002</year>). <article-title>Voice quality differences associated with stops and clicks in Xhosa</article-title>. <source>J. Phone.</source> <volume>30</volume>, <fpage>1</fpage>&#x02013;<lpage>52</lpage>. <pub-id pub-id-type="doi">10.1006/jpho.2001.0150</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kingston</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <article-title>&#x0201C;Segmental influences on f0: automatic or controlled?&#x0201D;</article-title> in <source>Tones and Tunes, Volume 2: Experimental Studies in Word and Sentence Prosody</source>, eds C. Gussenhoven and T. Riad (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Mouton de Gruyter</publisher-name>), <fpage>171</fpage>&#x02013;<lpage>201</lpage>.</citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kingston</surname> <given-names>J.</given-names></name> <name><surname>Diehl</surname> <given-names>R. L.</given-names></name></person-group> (<year>1994</year>). <article-title>Phonetic knowledge</article-title>. <source>Language</source> <volume>70</volume>, <fpage>419</fpage>&#x02013;<lpage>494</lpage>. <pub-id pub-id-type="doi">10.1353/lan.1994.0023</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kirby</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Onset pitch perturbations and the cross-linguistic implementation of voicing: Evidence from tonal and non-tonal languages</article-title>. <source>J. Phone.</source> <volume>71</volume>, <fpage>326</fpage>&#x02013;<lpage>354</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2018.09.009</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kirby</surname> <given-names>J.</given-names></name> <name><surname>Ladd</surname> <given-names>D. R.</given-names></name></person-group> (<year>2016</year>). <article-title>Effects of obstruent voicing on vowel F0: evidence from &#x0201C;true voicing&#x0201D; languages</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>140</volume>, <fpage>2400</fpage>&#x02013;<lpage>2411</lpage>. <pub-id pub-id-type="doi">10.1121/1.4962445</pub-id><pub-id pub-id-type="pmid">27794357</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Klatt</surname> <given-names>D. H.</given-names></name> <name><surname>Stevens</surname> <given-names>K. N.</given-names></name> <name><surname>Meade</surname> <given-names>J.</given-names></name></person-group> (<year>1968</year>). <article-title>&#x0201C;Studies of articulatory activity and airflow during speech,&#x0201D;</article-title> in <source>Sound Production in Man: Annals of the New York Academy of Sciences</source>, ed A. Bouhuys (<publisher-loc>New York, NY</publisher-loc>), <fpage>42</fpage>&#x02013;<lpage>55</lpage>.</citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kohler</surname> <given-names>K. J.</given-names></name></person-group> (<year>1982</year>). <article-title>F0 in the production of lenis and fortis plosives</article-title>. <source>Phonetica</source> <volume>39</volume>, <fpage>199</fpage>&#x02013;<lpage>218</lpage>. <pub-id pub-id-type="doi">10.1159/000261663</pub-id><pub-id pub-id-type="pmid">7156204</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuang</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Covariation between voice quality and pitch: Revisiting the case of Mandarin creaky voice</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>142</volume>, <fpage>1693</fpage>&#x02013;<lpage>1706</lpage>. <pub-id pub-id-type="doi">10.1121/1.5003649</pub-id><pub-id pub-id-type="pmid">28964062</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lehiste</surname> <given-names>I.</given-names></name> <name><surname>Peterson</surname> <given-names>G. E.</given-names></name></person-group> (<year>1961</year>). <article-title>Some basic considerations in the analysis of intonation</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>33</volume>, <fpage>419</fpage>&#x02013;<lpage>425</lpage>. <pub-id pub-id-type="doi">10.1121/1.1908681</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Lenth</surname> <given-names>R.</given-names></name></person-group> (<year>2020</year>). <source>emmeans: Estimated Marginal Means, aka Least-Squares Means. R Package Version 1.4.5</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=emmeans">https://CRAN.R-project.org/package=emmeans</ext-link></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>L. Y.</given-names></name> <collab>&#x05218;&#x08FDE;&#x05143;</collab> <name><surname>Ma</surname> <given-names>Y. F.</given-names></name> <collab>&#x09A6C;&#x04EA6;&#x051E1;</collab></person-group> (<year>1986</year>). <article-title>&#x0666E;&#x0901A;&#x08BDD;&#x058F0;&#x08C03;&#x05206;&#x05E03;&#x0548C;&#x058F0;&#x08C03;&#x07ED3;&#x06784;&#x09891;&#x05EA6;[The distribution of Mandarin tones and the frequency of tonal phrases]</article-title>. &#x08BED;&#x06587;&#x05EFA;&#x08BBE;[<italic>Language Planning</italic>] <volume>3</volume>, <fpage>21</fpage>&#x02013;<lpage>23</lpage>.</citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>L&#x000F6;fqvist</surname> <given-names>A.</given-names></name> <name><surname>Baer</surname> <given-names>T.</given-names></name> <name><surname>McGarr</surname> <given-names>N. S.</given-names></name> <name><surname>Story</surname> <given-names>R. S.</given-names></name></person-group> (<year>1989</year>). <article-title>The cricothyroid muscle in voicing control</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>85</volume>, <fpage>1314</fpage>&#x02013;<lpage>1321</lpage>. <pub-id pub-id-type="doi">10.1121/1.397462</pub-id><pub-id pub-id-type="pmid">2708673</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>Q.</given-names></name></person-group> (<year>2018</year>). <source>Consonantal Effects on F0 in Tonal Languages (Doctoral dissertation)</source>. <publisher-name>Michigan State University, East Lansing</publisher-name>.</citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McCrea</surname> <given-names>C. R.</given-names></name> <name><surname>Morris</surname> <given-names>R. J.</given-names></name></person-group> (<year>2005</year>). <article-title>The effects of fundamental frequency levels on voice onset time in normal adult male speakers</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>48</volume>, <fpage>1013</fpage>&#x02013;<lpage>1024</lpage>. <pub-id pub-id-type="doi">10.1044/1092-4388(2005/069)</pub-id><pub-id pub-id-type="pmid">16411791</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miller</surname> <given-names>J. L.</given-names></name></person-group> (<year>1977</year>). <article-title>Nonindependence of feature processing in initial consonants</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>20</volume>, <fpage>519</fpage>&#x02013;<lpage>528</lpage>. <pub-id pub-id-type="doi">10.1044/jshr.2003.519</pub-id><pub-id pub-id-type="pmid">904313</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mohr</surname> <given-names>B.</given-names></name></person-group> (<year>1971</year>). <article-title>Intrinsic variations in the speech signal</article-title>. <source>Phonetica</source> <volume>23</volume>, <fpage>65</fpage>&#x02013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1159/000259332</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moisik</surname> <given-names>S. R.</given-names></name> <name><surname>Lin</surname> <given-names>H.</given-names></name> <name><surname>Esling</surname> <given-names>J. H.</given-names></name></person-group> (<year>2014</year>). <article-title>A study of laryngeal gestures in Mandarin citation tones using simultaneous laryngoscopy and laryngeal ultrasound (SLLUS)</article-title>. <source>J. Int. Phon. Assoc.</source> <volume>44</volume>, <fpage>21</fpage>&#x02013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.1017/S0025100313000327</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moulines</surname> <given-names>E.</given-names></name> <name><surname>Charpentier</surname> <given-names>F.</given-names></name></person-group> (<year>1990</year>). <article-title>Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones</article-title>. <source>Speech Commun.</source> <volume>9</volume>, <fpage>453</fpage>&#x02013;<lpage>467</lpage>. <pub-id pub-id-type="doi">10.1016/0167-6393(90)90021-Z</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Narayan</surname> <given-names>C.</given-names></name> <name><surname>Bowden</surname> <given-names>M.</given-names></name></person-group> (<year>2013</year>). <article-title>Pitch affects voice onset time (VOT): a cross-linguistic study</article-title>. <source>Proc. Meet. Acoust.</source> <volume>19</volume>, <fpage>060095</fpage>. <pub-id pub-id-type="doi">10.1121/1.4800681</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ohde</surname> <given-names>R. N.</given-names></name></person-group> (<year>1984</year>). <article-title>Fundamental frequency as an acoustic correlate of stop consonant voicing</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>75</volume>, <fpage>224</fpage>&#x02013;<lpage>230</lpage>. <pub-id pub-id-type="doi">10.1121/1.390399</pub-id><pub-id pub-id-type="pmid">6699284</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peirce</surname> <given-names>J. W.</given-names></name></person-group> (<year>2007</year>). <article-title>PsychoPy&#x02014;psychophysics software in Python</article-title>. <source>J. Neurosci. Methods</source> <volume>162</volume>, <fpage>8</fpage>&#x02013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1016/j.jneumeth.2006.11.017</pub-id><pub-id pub-id-type="pmid">17254636</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peterson</surname> <given-names>G. E.</given-names></name> <name><surname>Lehiste</surname> <given-names>I.</given-names></name></person-group> (<year>1960</year>). <article-title>Duration of syllable nuclei in English</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>32</volume>, <fpage>693</fpage>&#x02013;<lpage>703</lpage>. <pub-id pub-id-type="doi">10.1121/1.1908183</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="web"><person-group person-group-type="author"><collab>R Core Team</collab></person-group> (<year>2021</year>). <source>R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.R-project.org">https://www.R-project.org</ext-link></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rochet</surname> <given-names>B. L.</given-names></name> <name><surname>Fei</surname> <given-names>Y.</given-names></name></person-group> (<year>1991</year>). <article-title>Effect of consonant and vowel context on Mandarin Chinese VOT: production and perception</article-title>. <source>Can. Acoust.</source> <volume>19</volume>, <fpage>105</fpage>.</citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wei</surname> <given-names>J.</given-names></name> <name><surname>Carroll</surname> <given-names>R. J.</given-names></name> <name><surname>Harden</surname> <given-names>K. K.</given-names></name> <name><surname>Wu</surname> <given-names>G.</given-names></name></person-group> (<year>2012</year>). <article-title>Comparisons of treatment means when factors do not interact in two-factorial studies</article-title>. <source>Amino Acids</source> <volume>42</volume>, <fpage>2031</fpage>&#x02013;<lpage>2035</lpage>. <pub-id pub-id-type="doi">10.1007/s00726-011-0924-0</pub-id><pub-id pub-id-type="pmid">21547361</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whalen</surname> <given-names>D. H.</given-names></name> <name><surname>Abramson</surname> <given-names>A. S.</given-names></name> <name><surname>Lisker</surname> <given-names>L.</given-names></name> <name><surname>Mody</surname> <given-names>M.</given-names></name></person-group> (<year>1990</year>). <article-title>Gradient effects of fundamental frequency on stop consonant voicing judgments</article-title>. <source>Phonetica</source> <volume>47</volume>, <fpage>36</fpage>&#x02013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1159/000261851</pub-id><pub-id pub-id-type="pmid">2277812</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whalen</surname> <given-names>D. H.</given-names></name> <name><surname>Abramson</surname> <given-names>A. S.</given-names></name> <name><surname>Lisker</surname> <given-names>L.</given-names></name> <name><surname>Mody</surname> <given-names>M.</given-names></name></person-group> (<year>1993</year>). <article-title>F0 gives voicing information even with unambiguous voice onset times</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>93</volume>, <fpage>2152</fpage>&#x02013;<lpage>2159</lpage>. <pub-id pub-id-type="doi">10.1121/1.406678</pub-id><pub-id pub-id-type="pmid">8473630</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whalen</surname> <given-names>D. H.</given-names></name> <name><surname>Levitt</surname> <given-names>A. G.</given-names></name></person-group> (<year>1995</year>). <article-title>The universality of intrinsic F0 of vowels</article-title>. <source>J. Phone.</source> <volume>23</volume>, <fpage>349</fpage>&#x02013;<lpage>366</lpage>. <pub-id pub-id-type="doi">10.1016/S0095-4470(95)80165-0</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xiao</surname> <given-names>H.</given-names></name> <collab>&#x08096;&#x0822A;</collab></person-group> (<year>2010</year>). &#x073B0;&#x04EE3;&#x06C49;&#x08BED;&#x0901A;&#x07528;&#x05E73;&#x08861;&#x08BED;&#x06599;&#x05E93;&#x05EFA;&#x08BBE;&#x04E0E;&#x05E94;&#x07528;[The construction and application of the general modern Chinese balanced corpus]. &#x0534E;&#x06587;&#x04E16;&#x0754C;[<italic>Chinese World</italic>]. <volume>106</volume>, <fpage>24</fpage>&#x02013;<lpage>29</lpage>.</citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>C. X.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name></person-group> (<year>2003</year>). <article-title>Effects of consonant aspiration on Mandarin tones</article-title>. <source>J. Int. Phon. Assoc.</source> <volume>33</volume>, <fpage>165</fpage>&#x02013;<lpage>181</lpage>. <pub-id pub-id-type="doi">10.1017/S0025100303001270</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Y.</given-names></name></person-group> (<year>1997</year>). <article-title>Contextual tonal variations in Mandarin</article-title>. <source>J. Phone.</source> <volume>25</volume>, <fpage>61</fpage>&#x02013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1006/jpho.1996.0034</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>Consonantal f0 perturbation in American English involves multiple mechanisms</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>149</volume>, <fpage>2877</fpage>&#x02013;<lpage>2895</lpage>. <pub-id pub-id-type="doi">10.1121/10.0004239</pub-id><pub-id pub-id-type="pmid">33940879</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>Note that the post-plosive f0 is likely affected by the preceding T1 in the carrier sentence (see, for example, Xu and Xu, <xref ref-type="bibr" rid="B54">2003</xref>, for discussion of these carryover effects). According to Xu and Xu (<xref ref-type="bibr" rid="B54">2003</xref>), both f0-<sc>asp</sc> and f0-<sc>unasp</sc> are higher after T1/T4 than after T2/T3, but f0-<sc>unasp</sc> shows greater carryover effects than f0-<sc>asp</sc>. If this is the case, it is possible that the preceding T1 elevated f0-<sc>unasp</sc> more than f0-<sc>asp</sc> and, consequently, that the difference between f0-<sc>asp</sc> and f0-<sc>unasp</sc> in the current results is overstated in low-initial tones but understated in high-initial tones.</p></fn>
<fn id="fn0002"><p><sup>2</sup>Pinyin was included because this experiment was designed in parallel with a separate study testing L2 learners of Mandarin. Native Mandarin speakers would not need Pinyin to read common words in Chinese.</p></fn>
<fn id="fn0003"><p><sup>3</sup>We also tried a different method, extracting f0 values every 8 ms over the first 64 ms of the post-onset vowel; the results were consistent with those obtained from the time-normalized method reported here.</p></fn>
<fn id="fn0004"><p><sup>4</sup>Note that our stimuli for T2 were bilabial /p<sup>h</sup>a2/ and /pa2/ rather than /t<sup>h</sup>a2/ and /ta2/. Because coronal plosives usually have longer VOTs than labial plosives, this is expected to influence the reported VOT values for T2. We suspect that this difference in place of articulation exaggerated the VOT difference between T2 and T3.</p></fn>
</fn-group>
</back>
</article>