<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="review-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Educ.</journal-id>
<journal-title>Frontiers in Education</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Educ.</abbrev-journal-title>
<issn pub-type="epub">2504-284X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/feduc.2024.1410795</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Education</subject>
<subj-group>
<subject>Mini Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Multimodal cues in L2 lexical tone acquisition: current research and future directions</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes"><name><surname>Farran</surname> <given-names>Bashar M.</given-names></name><xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1145354/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
</contrib>
<contrib contrib-type="author"><name><surname>Morett</surname> <given-names>Laura M.</given-names></name>
<uri xlink:href="https://loop.frontiersin.org/people/432720/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff><institution>Department of Speech, Language and Hearing Sciences, University of Missouri</institution>, <addr-line>Columbia, MO</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by" id="fn0001">
<p>Edited by: Xin Wang, Macquarie University, Australia</p>
</fn>
<fn fn-type="edited-by" id="fn0002">
<p>Reviewed by: Haiquan Huang, Hubei University of Technology, China</p>
<p>Debra Hardison, Michigan State University, United States</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Bashar M. Farran, <email>bfarran@health.missouri.edu</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>24</day>
<month>07</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>9</volume>
<elocation-id>1410795</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>04</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>07</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2024 Farran and Morett.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Farran and Morett</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>This review discusses the effectiveness of visual and haptic cues for second language (L2) lexical tone acquisition, with a special focus on observation and production of hand gestures. It explains how these cues can facilitate initial acquisition of L2 lexical tones via multimodal depictions of pitch. In doing so, it provides recommendations for incorporation of multimodal cues into L2 lexical tone pedagogy.</p>
</abstract>
<kwd-group>
<kwd>lexical tone</kwd>
<kwd>second language acquisition</kwd>
<kwd>multimodality</kwd>
<kwd>gesture</kwd>
<kwd>tonal languages</kwd>
</kwd-group>
<counts>
<fig-count count="2"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="68"/>
<page-count count="6"/>
<word-count count="4996"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Language, Culture and Diversity</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="sec1">
<label>1</label>
<title>Introduction</title>
<p>Imagine a language where the meaning of a word hinges on its pitch. This is the reality in tonal languages, where pitch, not just consonants and vowels, determines word meaning. Many of the world&#x2019;s languages are tonal, including Mandarin Chinese, Vietnamese, Thai, Yor&#x00F9;b&#x00E1; and various other African languages (<xref ref-type="bibr" rid="ref38">Maddieson, 2013</xref>). Mastery of a tonal first language (L1) comes naturally. Learning a tonal second language (L2), however, poses a unique challenge, particularly for learners whose L1 is non-tonal (<xref ref-type="bibr" rid="ref58">Wang et al., 2006</xref>, <xref ref-type="bibr" rid="ref57">2020</xref>).</p>
<p>L2 acquisition of lexical tones encompasses both perception and production. Perception often precedes production in L2 lexical tone acquisition (<xref ref-type="bibr" rid="ref59">Wang et al., 1999</xref>). However, the relationship between the two is not always straightforward: improvements in perception do not necessarily entail improvements in production, and vice versa (<xref ref-type="bibr" rid="ref31">Leather, 2011</xref>). L2 lexical tone acquisition involves perceiving auditory cues, and it also involves perceiving visual and haptic cues such as hand gestures (<xref ref-type="bibr" rid="ref17">Gullberg, 2006</xref>). The importance of these multimodal cues in facilitating L2 lexical tone perception and production has gained increasing recognition (<xref ref-type="bibr" rid="ref43">McCafferty, 2004</xref>; <xref ref-type="bibr" rid="ref21">Hostetter, 2011</xref>; <xref ref-type="bibr" rid="ref33">Lewis and Kirkhart, 2022</xref>; <xref ref-type="bibr" rid="ref66">Zhang et al., 2023</xref>). Multisensory learning integrates multiple sensory modalities. It can be more effective than unisensory learning because the brain is optimized for multisensory environments, which suggests that L2 lexical tone pedagogy could be enhanced by incorporating multisensory approaches (<xref ref-type="bibr" rid="ref53">Shams and Seitz, 2008</xref>). <xref ref-type="bibr" rid="ref37">Macedonia and Kepler (2013)</xref> argue that bringing pedagogical approaches informed by neuroscience findings into L2 instruction can significantly enhance learning. They propose a three-pronged approach: (1) utilizing multisensory experiences for vocabulary acquisition; (2) incorporating imitation exercises that leverage mirror neurons for pronunciation training; and (3) tailoring instruction to stages of brain development for optimal grammar and pronunciation outcomes. Moreover, multisensory cues can enhance learning outcomes by supporting content comprehension (<xref ref-type="bibr" rid="ref7">Dick et al., 2009</xref>).
Understanding how nonverbal cues enhance auditory representations can shed light on how multimodal approaches can be leveraged to facilitate acquisition of an unfamiliar tonal L2 (<xref ref-type="bibr" rid="ref64">Yip, 2002</xref>; <xref ref-type="bibr" rid="ref35">Liu et al., 2022</xref>).</p>
</sec>
<sec id="sec2">
<label>2</label>
<title>Auditory training methods</title>
<p>Cognitively, tonal languages require awareness of pitch, which permits discrimination, identification, and manipulation of lexical tones. In the intricate acoustic signal of speech, pitch contours coexist with multiple other cues, such as formant frequencies, amplitude, and temporal information. Thus, tonal language comprehension entails selective attention to pitch cues together with suppression of other acoustic information (<xref ref-type="bibr" rid="ref22">Huang and Johnson, 2011</xref>). This selective attention to pitch cues is shaped by experience with lexical tone. Moreover, pitch perception in tonal languages goes beyond recognizing static pitch levels. It also entails tracking rapid pitch movements and complex tonal contours over time (<xref ref-type="bibr" rid="ref9">Gandour, 1983</xref>; <xref ref-type="bibr" rid="ref62">Xie and Myers, 2015</xref>). Thus, processing of pitch within the speech stream is critical to L2 lexical tone acquisition (<xref ref-type="bibr" rid="ref23">Jasmin et al., 2020</xref>).</p>
<p>Neurologically, the ability to selectively focus on pitch involves specialized mechanisms shaped by tonal language experience (<xref ref-type="bibr" rid="ref11">Gandour et al., 2003</xref>; <xref ref-type="bibr" rid="ref63">Xu et al., 2006</xref>). Lexical tone processing involves both subcortical and cortical structures (<xref ref-type="bibr" rid="ref12">Gandour and Krishnan, 2016</xref>). Initially, L2 lexical tone processing is handled predominantly by the right hemisphere or bilaterally. With increased exposure, it becomes more left-lateralized and more similar to L1 processing (<xref ref-type="bibr" rid="ref13">Gandour et al., 2004</xref>; <xref ref-type="bibr" rid="ref55">Wang et al., 2004</xref>; <xref ref-type="bibr" rid="ref10">Gandour, 2006</xref>; <xref ref-type="bibr" rid="ref61">Xi et al., 2010</xref>; <xref ref-type="bibr" rid="ref25">Kaan et al., 2013</xref>).</p>
<p>Considering the cognitive and neurological complexities of lexical tone processing, auditory methods have been developed to facilitate L2 lexical tone learning. These methods include discrimination training, categorization training, and auditory corrective feedback.</p>
<p>Discrimination training involves exposure to contrasting pairs of tones. It is followed by testing in which learners judge whether pairs of trained tones are the same or different. For example, m&#x00E1; and m&#x00E0; could be presented consecutively in training. Discrimination between the rising and falling tones could then be tested by asking learners whether ch&#x00F3; and ch&#x00F2; sound the same or different. Discrimination tasks are perceptual: they require discerning differences in pitch contours and other acoustic cues. Discrimination training leads to significant improvements in perception of differences between lexical tones (<xref ref-type="bibr" rid="ref59">Wang et al., 1999</xref>; <xref ref-type="bibr" rid="ref60">Wayland and Guion, 2004</xref>; <xref ref-type="bibr" rid="ref20">Hao, 2012</xref>).</p>
<p>Categorization training involves exposure to labeled tones. It is followed by testing in which learners label tones presented without labels. For example, the tones in m&#x00E1; and m&#x00E0; could be labeled as rising and falling in training. Categorization could then be tested by presenting m&#x00E1; and m&#x00E0; without labels and asking learners to identify their tones as rising or falling. Unlike discrimination tasks, categorization (or identification) tasks draw on memory as well as perception, because they require mapping the acoustic features of lexical tones onto stored tone categories. Categorization training improves L2 lexical tone identification, particularly in the early stages of acquisition, but it may not be sufficient for accurate production (<xref ref-type="bibr" rid="ref30">Leather, 1990</xref>; <xref ref-type="bibr" rid="ref56">Wang et al., 2003</xref>; <xref ref-type="bibr" rid="ref8">Duanmu, 2007</xref>; <xref ref-type="bibr" rid="ref28">Ladefoged and Johnson, 2015</xref>).</p>
<p>The distinction between discrimination and categorization matters because discrimination can precede categorization in L2 lexical tone acquisition. Nevertheless, the two abilities are related and can therefore support one another. Understanding the relationship between them is essential for designing effective language learning materials. It is equally essential for speech recognition systems and other natural language processing applications for tonal languages.</p>
<p>Discrimination and categorization training based on a small set of stimuli in experimental tasks may not fully capture the natural variation of lexical tones in everyday speech. This limitation helped motivate the use of High Variability Phonetic Training (HVPT) in lexical tone learning. HVPT exposes learners to lexical tones in varying linguistic contexts or produced by multiple speakers. The aim is to approximate more closely the natural variability encountered in real-life tonal language processing (<xref ref-type="bibr" rid="ref36">Lively et al., 1994</xref>; <xref ref-type="bibr" rid="ref51">Pisoni and Lively, 1995</xref>). HVPT improves both perception and production of L2 lexical tones because it promotes generalization across contexts and speakers (<xref ref-type="bibr" rid="ref16">Guion et al., 2000</xref>; <xref ref-type="bibr" rid="ref56">Wang et al., 2003</xref>). This approach underscores the importance of exposure to diverse linguistic input for achieving robust learning outcomes.</p>
<p>Auditory corrective feedback can take three forms. In recasts, the learner hears the correct tone in response to an incorrect tone production. Contrastive feedback highlights the difference between the attempted and correct pronunciation. Explicit feedback provides verbal explanations of errors and correction techniques (<xref ref-type="bibr" rid="ref32">Lee and Lyster, 2016</xref>; <xref ref-type="bibr" rid="ref52">Saito, 2021</xref>). The effectiveness of auditory corrective feedback relies on memory as well as perception. Learners must perceive the differences between incorrect and correct tones and remember them in order to produce the correct tones subsequently. Auditory corrective feedback improves L2 lexical tone production accuracy by highlighting errors and modeling correct pronunciation (<xref ref-type="bibr" rid="ref3">Bryfonski and Ma, 2020</xref>).</p>
<p>Auditory methods have been a mainstay of L2 lexical tone instruction. However, relying solely on auditory input and feedback has inherent limitations. Learners must extract transient pitch contours from a speech signal in which pitch coexists with many other acoustic cues. Furthermore, the learner&#x2019;s L1 background and the characteristics of the target L2 tone system may limit the effectiveness of auditory methods.</p>
</sec>
<sec id="sec3">
<label>3</label>
<title>Visual cues</title>
<p>Visual cues can be powerful tools for enhancing L2 lexical tone acquisition. One approach utilizes static visual depictions of lexical tone pitch contours (<xref ref-type="fig" rid="fig1">Figure 1</xref>). These depictions may consist of lines, graphs, or color-coded charts. They visually represent the fundamental frequency (F0) variations that characterize tones (<xref ref-type="bibr" rid="ref14">Godfroid et al., 2017</xref>). Such visual depictions facilitate understanding of lexical tone contours (<xref ref-type="bibr" rid="ref68">Zhou and Olson, 2023</xref>), as evidenced by enhanced perception of lexical tones cross-linguistically (<xref ref-type="bibr" rid="ref4">Burnham et al., 2022</xref>). Moreover, visual depictions of pitch contours improve categorization of L2 lexical tones compared to auditory input alone (<xref ref-type="bibr" rid="ref6">Chun et al., 2012</xref>).</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Images of pitch contours of Mandarin lexical tones.</p>
</caption>
<graphic xlink:href="feduc-09-1410795-g001.tif"/>
</fig>
<p>Building on the benefits of visual depictions of pitch contours, another approach leverages pitch gestures to enhance L2 lexical tone learning. Pitch gestures, also known as tone gestures or tone-bearing gestures, are hand or body movements that visually depict the pitch (fundamental frequency) patterns of words or syllables (<xref ref-type="bibr" rid="ref45">Morett and Chang, 2015</xref>; <xref ref-type="fig" rid="fig2">Figure 2</xref>). Pitch gestures occur spontaneously alongside speech in tonal languages (<xref ref-type="bibr" rid="ref26">Krahmer and Swerts, 2007</xref>). They are often produced with the hands or head, but they may also include eyebrow movements or changes in body posture corresponding to tones (<xref ref-type="bibr" rid="ref1">Antoniou and Chin, 2018</xref>; <xref ref-type="bibr" rid="ref27">Lacombe et al., 2022</xref>).</p>
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>Pitch gestures for Mandarin lexical tones.</p>
</caption>
<graphic xlink:href="feduc-09-1410795-g002.tif"/>
</fig>
<p>Observing pitch gestures enhances perception and production of L2 lexical tones. Observing eye movements, head movements, and hand gestures that convey pitch contours enhances understanding and pronunciation of L2 Mandarin tones (<xref ref-type="bibr" rid="ref5">Chen and Massaro, 2008</xref>). Additionally, observing pitch gestures improves discrimination between L2 Mandarin words differing in lexical tone (<xref ref-type="bibr" rid="ref45">Morett and Chang, 2015</xref>; <xref ref-type="bibr" rid="ref44">Morett, 2023</xref>).</p>
<p>Visual cues such as observed pitch gestures provide concrete depictions of lexical tones. These depictions may strengthen learners&#x2019; mental representations of tones at encoding and retrieval, thereby enhancing tone perception and memory. Visual cues can also provide support when auditory processing is impaired or exposure to tonal languages is limited.</p>
<p>Observing pitch gestures supports L2 lexical tone perception and production, but relying solely on visual input may entail limitations. Visual depictions alone may not fully capture the richness and complexity of tonal variation, leading to incomplete or oversimplified learning outcomes. Additionally, visual depictions of lexical tones may encourage dependence on visual cues. This dependence may come at the expense of the auditory perception skills necessary for real-world communication. For example, L2 Mandarin tone learning based on visual input alone results in lower perception accuracy than learning based on combined visual and auditory input (<xref ref-type="bibr" rid="ref24">Jiang, 2017</xref>). Therefore, integrating visual cues with input from audition and other modalities may yield superior learning outcomes.</p>
<p>Theories providing explanations for the effects of visual cues on L2 lexical tone acquisition include dual coding theory and multimedia learning theory. Dual coding theory posits that information can be processed via both auditory (verbal) and visual (non-verbal) channels (<xref ref-type="bibr" rid="ref47">Paivio, 1991</xref>, <xref ref-type="bibr" rid="ref48">2014a</xref>), each of which has strengths and weaknesses. Visual cues excel at conveying spatial information and relationships, while verbal cues are better suited for conveying linear sequences and abstract concepts. When visual and verbal cues occur together, tones can be processed via both the auditory and visual channels simultaneously. The resulting multimodal representations enhance encoding, storage, and retrieval of L2 lexical tones, improving their acquisition (<xref ref-type="bibr" rid="ref49">Paivio, 2014b</xref>).</p>
<p>Multimedia learning theory emphasizes the importance of using multiple modes of representation to facilitate learning. According to this theory, combining different modalities (e.g., auditory, visual) optimizes learning outcomes and improves comprehension and retention of material (<xref ref-type="bibr" rid="ref39">Mayer, 2005</xref>, <xref ref-type="bibr" rid="ref40">2009</xref>; <xref ref-type="bibr" rid="ref18">Gullberg, 2022</xref>). The theory posits that learning is an active process that entails building connections between information presented in different modalities. Like dual coding theory, multimedia learning theory maintains that presenting corresponding verbal and visual information simultaneously can enhance learning. Building connections between the two sources leads to deeper understanding, improved retention, and enhanced knowledge transfer and real-world application (<xref ref-type="bibr" rid="ref42">Mayer and Moreno, 1998</xref>; <xref ref-type="bibr" rid="ref39">Mayer, 2005</xref>, <xref ref-type="bibr" rid="ref41">2014</xref>). For L2 lexical tone acquisition, multimodal methods that combine auditory verbal input with visual representations of pitch contours are consistent with multimedia learning theory.</p>
</sec>
<sec id="sec4">
<label>4</label>
<title>Haptic cues</title>
<p>Haptic approaches to L2 lexical tone learning use bodily movements to facilitate and reinforce production and perception of lexical tones. They are premised on the idea that physical interaction with lexical tone can enhance its cognitive processing and retention in memory. Examples include hand movements conveying tonal contours and tactile feedback corresponding to pitch changes. One promising haptic approach is gesture production, in which learners enact specific hand or arm movements to convey lexical tones. This approach capitalizes on the close connection between speech production and bodily movements. It also draws on the benefits of haptic cues for language learning.</p>
<p>Pitch gesture production improves discrimination and production of L2 lexical tones (<xref ref-type="bibr" rid="ref19">Hannah et al., 2017</xref>). Moreover, producing pitch gestures, rather than merely observing them, leads to better learning outcomes (<xref ref-type="bibr" rid="ref2">Baills et al., 2019</xref>). Producing hand gestures in conjunction with lexical tones enhances production of those tones. It also improves discernment of subtle tonal differences (<xref ref-type="bibr" rid="ref67">Zheng et al., 2018</xref>; <xref ref-type="bibr" rid="ref34">Li et al., 2020</xref>; <xref ref-type="bibr" rid="ref65">Yu et al., 2024</xref>). This suggests that producing hand movements may foster deeper understanding of tonal contrasts, enhancing L2 tone acquisition. From a neurological perspective, speech perception and production involve distributed neural networks. These networks encompass auditory and motor cortices as well as somatosensory and premotor areas (<xref ref-type="bibr" rid="ref15">Guenther and Vladusich, 2012</xref>). The involvement of these sensorimotor areas suggests that haptic cues may recruit additional neural resources, resulting in enriched representations of lexical tones.</p>
<p>Despite their potential benefits, haptic approaches to L2 lexical tone acquisition may entail challenges. First, the design and implementation of activities involving haptic cues require careful consideration. Appropriate gestures or movements must be selected and consistently mapped to lexical tones so that the associations are intuitive and easy to remember. Second, explicit instruction and feedback may be necessary to ensure that learners convey lexical tones accurately via haptic cues. Third, cultural and contextual factors may influence the acceptability and effectiveness of learning approaches involving haptic cues.</p>
<p>Multimodal methods incorporating haptic cues align with the principles of embodied cognition. This framework holds that cognitive processes are grounded in sensorimotor experiences and interactions with the physical world (<xref ref-type="bibr" rid="ref29">Lakoff and Johnson, 2017</xref>; <xref ref-type="bibr" rid="ref54">Shapiro, 2019</xref>). Embodied cognition proposes that recruiting multiple sensory modalities facilitates acquisition and representation of abstract concepts by activating relevant physical experiences via mental simulation. For lexical tone, such simulation may strengthen the connection between the acoustic features of tones and embodied experience, fostering more accurate production and perception.</p>
</sec>
<sec id="sec5">
<label>5</label>
<title>Integrated multimodal cues</title>
<p>Research has increasingly explored integrating cues from the auditory, visual, and haptic modalities to enhance perception and production of L2 lexical tones. This approach focuses on the synergistic effects of engaging multiple sensory channels. These channels provide complementary sources of information and reinforce the mapping between lexical tones and their depictions. Integrating multiple modalities engages a broad range of cognitive and sensory processes. This may enhance attention, memory, and engagement with content, leading to improved acquisition and retention of L2 lexical tones. Thus, integrating visual and haptic cues should enrich representations of lexical tones, enhancing their categorization and differentiation. Visual and haptic cues should be consistent with the vertical conceptual metaphor of pitch. This metaphor associates high pitch with upward positions and motion, and low pitch with downward positions and motion. Visual&#x2013;auditory mappings aligned with this metaphor result in accurate and robust representations of L2 lexical tones (<xref ref-type="bibr" rid="ref46">Morett et al., 2022</xref>).</p>
<p>Multimodal approaches may help overcome the challenges associated with learning L2 Mandarin tones (<xref ref-type="bibr" rid="ref50">Pelzl et al., 2022</xref>). Moreover, methods integrating visual and haptic cues are more effective than unimodal methods, highlighting the benefits of multimodality in facilitating L2 lexical tone acquisition (<xref ref-type="bibr" rid="ref14">Godfroid et al., 2017</xref>). However, the effectiveness of multimodality may depend on several factors. These include the specific combination of modalities employed, the design and implementation of instructional materials, and prior tonal language experience. The theoretical accounts discussed here help explain the effectiveness of multimodal approaches. Still, further research is needed to fully understand the underlying mechanisms and to optimize the design and implementation of multimodal instruction for L2 lexical tone acquisition.</p>
</sec>
<sec sec-type="discussion" id="sec6">
<label>6</label>
<title>Discussion</title>
<p>Moving forward, insights from this review can inform the development of strategies to enhance L2 tone acquisition. One strategy is to incorporate multimodal cues into existing curricula, leveraging techniques such as pitch gesture observation, pitch gesture production, and images of pitch contours. However, it is essential to critically evaluate existing instructional methods to determine their efficacy for both teachers and learners. To maximize effectiveness, activities should convey lexical tone intuitively by aligning with the vertical conceptual metaphor of pitch.</p>
<p>Existing research provides insight into how multimodal learning benefits L2 lexical tone acquisition, but several topics warrant further investigation. Future research should determine the optimal combination of cues in different modalities by comparing their impacts on L2 lexical tone learning across multiple outcome measures. Additionally, research on the cognitive and neural correlates of lexical tone learning is needed to better understand how multimodal input enriches tone representations. Furthermore, developing and evaluating technology-based tools would create opportunities to leverage digital technologies for multimodal L2 tone instruction. Addressing these research gaps will advance understanding of multimodal learning and its implications for L2 lexical tone acquisition. It will also inform the development of practices that facilitate L2 lexical tone learning.</p>
<p>In summary, research on the impact of multimodal cues on L2 lexical tone acquisition presents compelling evidence of their efficacy, particularly with respect to observation and production of hand gestures. Incorporating visual and haptic cues from gestures alongside auditory cues provides an enriched learning experience, enhancing perception and production of L2 lexical tones. The research reviewed here shows how observed pitch gestures and other visual depictions can complement auditory input. Gesture production and other haptic approaches can do the same. Together they result in enriched mental representations of L2 lexical tones. Taken together, this work suggests that multimodal instruction can improve L2 lexical tone learning outcomes.</p>
</sec>
<sec sec-type="author-contributions" id="sec7">
<title>Author contributions</title>
<p>BF: Writing &#x2013; original draft. LM: Writing &#x2013; review &#x0026; editing.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="sec8">
<title>Funding</title>
<p>The authors declare that financial support was received for the research, authorship, and/or publication of this article. LM was funded by US National Science Foundation CAREER award #2140073.</p>
</sec>
<sec sec-type="COI-statement" id="sec9">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="sec10">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Antoniou</surname> <given-names>M.</given-names></name> <name><surname>Chin</surname> <given-names>J. L. L.</given-names></name></person-group> (<year>2018</year>). <article-title>What can lexical tone training studies in adults tell us about tone processing in children?</article-title> <source>Front. Psychol.</source> <volume>9</volume>:<fpage>1</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2018.00001</pub-id>, PMID: <pub-id pub-id-type="pmid">29410639</pub-id></citation>
</ref>
<ref id="ref2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baills</surname> <given-names>F.</given-names></name> <name><surname>Su&#x00E1;rez-Gonz&#x00E1;lez</surname> <given-names>N.</given-names></name> <name><surname>Gonz&#x00E1;lez-Fuente</surname> <given-names>S.</given-names></name> <name><surname>Prieto</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Observing and producing pitch gestures facilitates the learning of Mandarin Chinese tones and words</article-title>. <source>Stud. Second. Lang. Acquis.</source> <volume>41</volume>, <fpage>33</fpage>&#x2013;<lpage>58</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S0272263118000074</pub-id></citation>
</ref>
<ref id="ref3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bryfonski</surname> <given-names>L.</given-names></name> <name><surname>Ma</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <article-title>Effects of implicit versus explicit corrective feedback on Mandarin tone acquisition in a SCMC learning environment</article-title>. <source>Stud. Second. Lang. Acquis.</source> <volume>42</volume>, <fpage>61</fpage>&#x2013;<lpage>88</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S0272263119000317</pub-id></citation>
</ref>
<ref id="ref4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burnham</surname> <given-names>D.</given-names></name> <name><surname>Vatikiotis-Bateson</surname> <given-names>E.</given-names></name> <name><surname>Vilela Barbosa</surname> <given-names>A.</given-names></name> <name><surname>Menezes</surname> <given-names>J. V.</given-names></name> <name><surname>Yehia</surname> <given-names>H. C.</given-names></name> <name><surname>Morris</surname> <given-names>R. H.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Seeing lexical tone: head and face motion in production and perception of Cantonese lexical tones</article-title>. <source>Speech Comm.</source> <volume>141</volume>, <fpage>40</fpage>&#x2013;<lpage>55</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.specom.2022.03.011</pub-id></citation>
</ref>
<ref id="ref5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>T. H.</given-names></name> <name><surname>Massaro</surname> <given-names>D. W.</given-names></name></person-group> (<year>2008</year>). <article-title>Seeing pitch: visual information for lexical tones of Mandarin-Chinese</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>123</volume>, <fpage>2356</fpage>&#x2013;<lpage>2366</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.2839004</pub-id>, PMID: <pub-id pub-id-type="pmid">18397038</pub-id></citation>
</ref>
<ref id="ref6">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Chun</surname> <given-names>D.</given-names></name> <name><surname>Jiang</surname> <given-names>Y.</given-names></name> <name><surname>&#x00C1;vila Reyes</surname> <given-names>N.</given-names></name></person-group> (<year>2012</year>). <article-title>Visualization of tone for learning Mandarin Chinese</article-title>. In: <source>Proceedings of the 4th Pronunciation in Second Language Learning and Teaching Conference</source>. eds. <person-group person-group-type="editor"><name><surname>Levis</surname> <given-names>J.</given-names></name> <name><surname>LeVelle</surname> <given-names>K.</given-names></name></person-group> (<publisher-loc>Ames, IA</publisher-loc>: <publisher-name>Iowa State University</publisher-name>), <fpage>77</fpage>&#x2013;<lpage>89</lpage>.</citation>
</ref>
<ref id="ref7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dick</surname> <given-names>A. S.</given-names></name> <name><surname>Goldin-Meadow</surname> <given-names>S.</given-names></name> <name><surname>Hasson</surname> <given-names>U.</given-names></name> <name><surname>Skipper</surname> <given-names>J. I.</given-names></name> <name><surname>Small</surname> <given-names>S. L.</given-names></name></person-group> (<year>2009</year>). <article-title>Co-speech gestures influence neural activity in brain regions associated with processing semantic information</article-title>. <source>Hum. Brain Mapp.</source> <volume>30</volume>, <fpage>3509</fpage>&#x2013;<lpage>3526</lpage>. doi: <pub-id pub-id-type="doi">10.1002/hbm.20774</pub-id>, PMID: <pub-id pub-id-type="pmid">19384890</pub-id></citation>
</ref>
<ref id="ref8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Duanmu</surname> <given-names>S.</given-names></name>
</person-group> (<year>2007</year>). <source>The phonology of standard Chinese</source>. <edition>2nd</edition> Edn. <publisher-loc>Oxford</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation>
</ref>
<ref id="ref9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gandour</surname> <given-names>J. T.</given-names></name>
</person-group> (<year>1983</year>). <article-title>Tone perception in far eastern languages</article-title>. <source>J. Phon.</source> <volume>11</volume>, <fpage>149</fpage>&#x2013;<lpage>175</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0095-4470(19)30813-7</pub-id></citation>
</ref>
<ref id="ref10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gandour</surname> <given-names>J. T.</given-names></name>
</person-group> (<year>2006</year>). &#x201C;<article-title>Brain mapping of Chinese speech prosody</article-title>&#x201D; In: <source>The handbook of east Asian psycholinguistics: volume 1: Chinese</source>. eds. <person-group person-group-type="editor"><name><surname>Bates</surname> <given-names>E.</given-names></name> <name><surname>Tan</surname> <given-names>L. H.</given-names></name> <name><surname>Tzeng</surname> <given-names>O. J. L.</given-names></name> <name><surname>Li</surname> <given-names>P.</given-names></name></person-group> (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>308</fpage>&#x2013;<lpage>319</lpage>. doi: <pub-id pub-id-type="doi">10.1017/CBO9780511550751.030</pub-id></citation>
</ref>
<ref id="ref11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gandour</surname> <given-names>J. T.</given-names></name> <name><surname>Dzemidzic</surname> <given-names>M.</given-names></name> <name><surname>Wong</surname> <given-names>D.</given-names></name> <name><surname>Lowe</surname> <given-names>M.</given-names></name> <name><surname>Tong</surname> <given-names>Y.</given-names></name> <name><surname>Hsieh</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>Temporal integration of speech prosody is shaped by language experience: an fMRI study</article-title>. <source>Brain Lang.</source> <volume>84</volume>, <fpage>318</fpage>&#x2013;<lpage>336</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0093-934X(02)00505-9</pub-id>, PMID: <pub-id pub-id-type="pmid">12662974</pub-id></citation>
</ref>
<ref id="ref12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gandour</surname> <given-names>J. T.</given-names></name> <name><surname>Krishnan</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). &#x201C;<article-title>Processing tone languages</article-title>&#x201D; in <source>Neurobiology of language</source> (<publisher-loc>San Diego</publisher-loc>: <publisher-name>Elsevier</publisher-name>), <fpage>1095</fpage>&#x2013;<lpage>1107</lpage>. doi: <pub-id pub-id-type="doi">10.1016/B978-0-12-407794-2.00087-0</pub-id></citation>
</ref>
<ref id="ref13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gandour</surname> <given-names>J. T.</given-names></name> <name><surname>Tong</surname> <given-names>Y.</given-names></name> <name><surname>Wong</surname> <given-names>D.</given-names></name> <name><surname>Talavage</surname> <given-names>T.</given-names></name> <name><surname>Dzemidzic</surname> <given-names>M.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2004</year>). <article-title>Hemispheric roles in the perception of speech prosody</article-title>. <source>Neuroimage</source> <volume>23</volume>, <fpage>344</fpage>&#x2013;<lpage>357</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuroimage.2004.06.004</pub-id>, PMID: <pub-id pub-id-type="pmid">15325382</pub-id></citation>
</ref>
<ref id="ref14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Godfroid</surname> <given-names>A.</given-names></name> <name><surname>Lin</surname> <given-names>C.-H.</given-names></name> <name><surname>Ryu</surname> <given-names>C.</given-names></name></person-group> (<year>2017</year>). <article-title>Hearing and seeing tone through color: an efficacy study of web-based, multimodal Chinese tone perception training</article-title>. <source>Lang. Learn.</source> <volume>67</volume>, <fpage>819</fpage>&#x2013;<lpage>857</lpage>. doi: <pub-id pub-id-type="doi">10.1111/lang.12246</pub-id></citation>
</ref>
<ref id="ref15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guenther</surname> <given-names>F. H.</given-names></name> <name><surname>Vladusich</surname> <given-names>T.</given-names></name></person-group> (<year>2012</year>). <article-title>A neural theory of speech acquisition and production</article-title>. <source>J. Neurolinguistics</source> <volume>25</volume>, <fpage>408</fpage>&#x2013;<lpage>422</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jneuroling.2009.08.006</pub-id>, PMID: <pub-id pub-id-type="pmid">22711978</pub-id></citation>
</ref>
<ref id="ref16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guion</surname> <given-names>S. G.</given-names></name> <name><surname>Flege</surname> <given-names>J. E.</given-names></name> <name><surname>Akahane-Yamada</surname> <given-names>R.</given-names></name> <name><surname>Pruitt</surname> <given-names>J. C.</given-names></name></person-group> (<year>2000</year>). <article-title>An investigation of current models of second language speech perception: the case of Japanese adults&#x2019; perception of English consonants</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>107</volume>, <fpage>2711</fpage>&#x2013;<lpage>2724</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.428657</pub-id>, PMID: <pub-id pub-id-type="pmid">10830393</pub-id></citation>
</ref>
<ref id="ref17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gullberg</surname> <given-names>M.</given-names></name>
</person-group> (<year>2006</year>). <article-title>Some reasons for studying gesture and second language acquisition (Hommage &#x00E0; Adam Kendon)</article-title>. <source>Int. Rev. Appl. Linguist. Lang. Teach.</source> <volume>44</volume>, <fpage>103</fpage>&#x2013;<lpage>124</lpage>. doi: <pub-id pub-id-type="doi">10.1515/iral.2006.004</pub-id></citation>
</ref>
<ref id="ref18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gullberg</surname> <given-names>M.</given-names></name>
</person-group> (<year>2022</year>). &#x201C;<article-title>Studying multimodal language processing</article-title>&#x201D; In: <source>The Routledge handbook of second language acquisition and psycholinguistics</source>. eds. <person-group person-group-type="editor"><name><surname>Godfroid</surname> <given-names>A.</given-names></name> <name><surname>Hopp</surname> <given-names>H.</given-names></name></person-group>. <edition>1st</edition> ed (<publisher-loc>New York</publisher-loc>: <publisher-name>Routledge</publisher-name>), <fpage>137</fpage>&#x2013;<lpage>149</lpage>. doi: <pub-id pub-id-type="doi">10.4324/9781003018872-14</pub-id></citation>
</ref>
<ref id="ref19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hannah</surname> <given-names>B.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Jongman</surname> <given-names>A.</given-names></name> <name><surname>Sereno</surname> <given-names>J. A.</given-names></name> <name><surname>Cao</surname> <given-names>J.</given-names></name> <name><surname>Nie</surname> <given-names>Y.</given-names></name></person-group> (<year>2017</year>). <article-title>Cross-modal association between auditory and visuospatial information in Mandarin tone perception in noise by native and non-native perceivers</article-title>. <source>Front. Psychol.</source> <volume>8</volume>, <fpage>1</fpage>&#x2013;<lpage>15</lpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2017.02051</pub-id></citation>
</ref>
<ref id="ref20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hao</surname> <given-names>Y.-C.</given-names></name>
</person-group> (<year>2012</year>). <article-title>Second language acquisition of Mandarin Chinese tones by tonal and non-tonal language speakers</article-title>. <source>J. Phon.</source> <volume>40</volume>, <fpage>269</fpage>&#x2013;<lpage>279</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.wocn.2011.11.001</pub-id>, PMID: <pub-id pub-id-type="pmid">30405478</pub-id></citation>
</ref>
<ref id="ref21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hostetter</surname> <given-names>A. B.</given-names></name>
</person-group> (<year>2011</year>). <article-title>When do gestures communicate? A meta-analysis</article-title>. <source>Psychol. Bull.</source> <volume>137</volume>, <fpage>297</fpage>&#x2013;<lpage>315</lpage>. doi: <pub-id pub-id-type="doi">10.1037/a0022128</pub-id>, PMID: <pub-id pub-id-type="pmid">21355631</pub-id></citation>
</ref>
<ref id="ref22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>T.</given-names></name> <name><surname>Johnson</surname> <given-names>K.</given-names></name></person-group> (<year>2011</year>). <article-title>Language specificity in speech perception: perception of Mandarin tones by native and nonnative listeners</article-title>. <source>Phonetica</source> <volume>67</volume>, <fpage>243</fpage>&#x2013;<lpage>267</lpage>. doi: <pub-id pub-id-type="doi">10.1159/000327392</pub-id>, PMID: <pub-id pub-id-type="pmid">21525779</pub-id></citation>
</ref>
<ref id="ref23">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Jasmin</surname> <given-names>K.</given-names></name> <name><surname>Sun</surname> <given-names>H.</given-names></name> <name><surname>Tierney</surname> <given-names>A. T.</given-names></name></person-group> (<year>2020</year>). <article-title>Effects of language experience on domain-general perceptual strategies</article-title>. <source>bioRxiv</source>. doi: <pub-id pub-id-type="doi">10.1101/2020.01.02.892943</pub-id></citation>
</ref>
<ref id="ref24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>Y.</given-names></name>
</person-group> (<year>2017</year>). <article-title>Examining the auditory approach: lexical effects in the perceptual judgment of Chinese L2 tone production</article-title>. <source>Chin. Second Lang. Res.</source> <volume>6</volume>, <fpage>225</fpage>&#x2013;<lpage>250</lpage>. doi: <pub-id pub-id-type="doi">10.1515/caslar-2017-0010</pub-id></citation>
</ref>
<ref id="ref25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaan</surname> <given-names>E.</given-names></name> <name><surname>Wayland</surname> <given-names>R.</given-names></name> <name><surname>Keil</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Changes in oscillatory brain networks after lexical tone training</article-title>. <source>Brain Sci.</source> <volume>3</volume>, <fpage>757</fpage>&#x2013;<lpage>780</lpage>. doi: <pub-id pub-id-type="doi">10.3390/brainsci3020757</pub-id></citation>
</ref>
<ref id="ref26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krahmer</surname> <given-names>E.</given-names></name> <name><surname>Swerts</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <article-title>The effects of visual beats on prosodic prominence: acoustic analyses, auditory perception and visual perception</article-title>. <source>J. Mem. Lang.</source> <volume>57</volume>, <fpage>396</fpage>&#x2013;<lpage>414</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jml.2007.06.005</pub-id></citation>
</ref>
<ref id="ref27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lacombe</surname> <given-names>N.</given-names></name> <name><surname>Dias</surname> <given-names>T.</given-names></name> <name><surname>Petitpierre</surname> <given-names>G.</given-names></name></person-group> (<year>2022</year>). <article-title>Can gestures give us access to thought? A systematic literature review on the role of co-thought and co-speech gestures in children with intellectual disabilities</article-title>. <source>J. Nonverbal Behav.</source> <volume>46</volume>, <fpage>119</fpage>&#x2013;<lpage>136</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10919-022-00396-4</pub-id></citation>
</ref>
<ref id="ref28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ladefoged</surname> <given-names>P.</given-names></name> <name><surname>Johnson</surname> <given-names>K.</given-names></name></person-group> (<year>2015</year>). <source>A course in phonetics</source>. <edition>7th</edition> Edn. <publisher-loc>Stamford, CT</publisher-loc>: <publisher-name>Cengage Learning</publisher-name>.</citation>
</ref>
<ref id="ref29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lakoff</surname> <given-names>G.</given-names></name> <name><surname>Johnson</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <source>Metaphors we live by</source>. <publisher-loc>Chicago</publisher-loc>: <publisher-name>University of Chicago Press</publisher-name>.</citation>
</ref>
<ref id="ref30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Leather</surname> <given-names>J.</given-names></name>
</person-group> (<year>1990</year>). &#x201C;<article-title>Perceptual and productive learning of Chinese lexical tone by Dutch and English speakers</article-title>&#x201D; In: <source>New Sounds 90: Proceedings of the Amsterdam symposium on the Acquisition of Second Language Speech</source>. eds. <person-group person-group-type="editor"><name><surname>Leather</surname> <given-names>J.</given-names></name> <name><surname>James</surname> <given-names>A.</given-names></name></person-group>. (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>University of Amsterdam</publisher-name>), <fpage>305</fpage>&#x2013;<lpage>341</lpage>.</citation>
</ref>
<ref id="ref31">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Leather</surname> <given-names>J.</given-names></name>
</person-group> (<year>2011</year>). <source>Interrelation of perceptual and productive learning in the initial acquisition of second-language tone</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>De Gruyter Mouton</publisher-name>. <fpage>75</fpage>&#x2013;<lpage>102</lpage>. doi: <pub-id pub-id-type="doi">10.1515/9783110882933.75</pub-id></citation>
</ref>
<ref id="ref32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>A. H.</given-names></name> <name><surname>Lyster</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>The effects of corrective feedback on instructed L2 speech perception</article-title>. <source>Stud. Second. Lang. Acquis.</source> <volume>38</volume>, <fpage>35</fpage>&#x2013;<lpage>64</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S0272263115000194</pub-id></citation>
</ref>
<ref id="ref33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lewis</surname> <given-names>T. N.</given-names></name> <name><surname>Kirkhart</surname> <given-names>M. W.</given-names></name></person-group> (<year>2022</year>). &#x201C;<article-title>Researching the effect of gestures on the learning and retention of vocabulary in a naturalistic setting</article-title>&#x201D; In: <source>Gesture and multimodality in second language acquisition: a research guide</source>. eds. <person-group person-group-type="editor"><name><surname>Stam</surname> <given-names>G.</given-names></name> <name><surname>Urbanski</surname> <given-names>K.</given-names></name></person-group>. <edition>1st</edition> ed (<publisher-loc>New York</publisher-loc>: <publisher-name>Routledge</publisher-name>). doi: <pub-id pub-id-type="doi">10.4324/9781003100683</pub-id></citation>
</ref>
<ref id="ref34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>P.</given-names></name> <name><surname>Baills</surname> <given-names>F.</given-names></name> <name><surname>Prieto</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>Observing and producing durational hand gestures facilitates the pronunciation of novel vowel-length contrasts</article-title>. <source>Stud. Second. Lang. Acquis.</source> <volume>42</volume>, <fpage>1015</fpage>&#x2013;<lpage>1039</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S0272263120000054</pub-id></citation>
</ref>
<ref id="ref35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>Lai</surname> <given-names>R.</given-names></name> <name><surname>Singh</surname> <given-names>L.</given-names></name> <name><surname>Kalashnikova</surname> <given-names>M.</given-names></name> <name><surname>Wong</surname> <given-names>P. C. M.</given-names></name> <name><surname>Kasisopa</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>The tone atlas of perceptual discriminability and perceptual distance: four tone languages and five language groups</article-title>. <source>Brain Lang.</source> <volume>229</volume>:<fpage>105106</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.bandl.2022.105106</pub-id>, PMID: <pub-id pub-id-type="pmid">35390675</pub-id></citation>
</ref>
<ref id="ref36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lively</surname> <given-names>S. E.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>Yamada</surname> <given-names>R. A.</given-names></name> <name><surname>Tohkura</surname> <given-names>Y.</given-names></name> <name><surname>Yamada</surname> <given-names>T.</given-names></name></person-group> (<year>1994</year>). <article-title>Training Japanese listeners to identify English /r/ and /l/. III. Long-term retention of new phonetic categories</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>96</volume>, <fpage>2076</fpage>&#x2013;<lpage>2087</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.410149</pub-id>, PMID: <pub-id pub-id-type="pmid">7963022</pub-id></citation>
</ref>
<ref id="ref37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Macedonia</surname> <given-names>M.</given-names></name> <name><surname>Kepler</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>Three good reasons why foreign language instructors need neuroscience</article-title>. <source>J. Stud. Educ.</source> <volume>3</volume>:<fpage>1</fpage>. doi: <pub-id pub-id-type="doi">10.5296/jse.v3i4.4168</pub-id></citation>
</ref>
<ref id="ref38">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Maddieson</surname> <given-names>I.</given-names></name>
</person-group> (<year>2013</year>). &#x201C;<article-title>Tone</article-title>&#x201D; in <source>The world atlas of language structures online. WALS online (v2020.3) [data set]</source>. eds. <person-group person-group-type="editor"><name><surname>Dryer</surname> <given-names>M.</given-names></name> <name><surname>Haspelmath</surname> <given-names>M.</given-names></name></person-group>. Available at: <ext-link xlink:href="https://zenodo.org/record/7385533" ext-link-type="uri">https://zenodo.org/record/7385533</ext-link></citation>
</ref>
<ref id="ref39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mayer</surname> <given-names>R. E.</given-names></name>
</person-group> (<year>2005</year>). &#x201C;<article-title>Cognitive theory of multimedia learning</article-title>&#x201D; In: <source>The Cambridge handbook of multimedia learning</source>. ed. <person-group person-group-type="editor">
<name><surname>Mayer</surname> <given-names>R.</given-names></name>
</person-group>. (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>31</fpage>&#x2013;<lpage>48</lpage>. doi: <pub-id pub-id-type="doi">10.1017/CBO9780511816819.004</pub-id></citation>
</ref>
<ref id="ref40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mayer</surname> <given-names>R. E.</given-names></name>
</person-group> (<year>2009</year>). <source>Multimedia learning</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation>
</ref>
<ref id="ref41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mayer</surname> <given-names>R. E.</given-names></name>
</person-group> (<year>2014</year>). <article-title>Incorporating motivation into multimedia learning</article-title>. <source>Learn. Instr.</source> <volume>29</volume>, <fpage>171</fpage>&#x2013;<lpage>173</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.learninstruc.2013.04.003</pub-id>, PMID: <pub-id pub-id-type="pmid">38317141</pub-id></citation>
</ref>
<ref id="ref42">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Mayer</surname> <given-names>R. E.</given-names></name> <name><surname>Moreno</surname> <given-names>R.</given-names></name></person-group> (<year>1998</year>). <article-title>A cognitive theory of multimedia learning: implications for design principles</article-title>. <source>J. Educ. Psychol.</source> <volume>91</volume>, <fpage>358</fpage>&#x2013;<lpage>368</lpage>.</citation>
</ref>
<ref id="ref43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McCafferty</surname> <given-names>S. G.</given-names></name>
</person-group> (<year>2004</year>). <article-title>Space for cognition: gesture and second language learning</article-title>. <source>Int. J. Appl. Linguist.</source> <volume>14</volume>, <fpage>148</fpage>&#x2013;<lpage>165</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1473-4192.2004.0057m.x</pub-id></citation>
</ref>
<ref id="ref44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morett</surname> <given-names>L. M.</given-names></name>
</person-group> (<year>2023</year>). <article-title>Observing gesture at learning enhances subsequent phonological and semantic processing of L2 words: an N400 study</article-title>. <source>Brain Lang.</source> <volume>246</volume>:<fpage>105327</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.bandl.2023.105327</pub-id>, PMID: <pub-id pub-id-type="pmid">37804717</pub-id></citation>
</ref>
<ref id="ref45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morett</surname> <given-names>L. M.</given-names></name> <name><surname>Chang</surname> <given-names>L.-Y.</given-names></name></person-group> (<year>2015</year>). <article-title>Emphasising sound and meaning: pitch gestures enhance Mandarin lexical tone acquisition</article-title>. <source>Lang. Cogn. Neurosci.</source> <volume>30</volume>, <fpage>347</fpage>&#x2013;<lpage>353</lpage>. doi: <pub-id pub-id-type="doi">10.1080/23273798.2014.923105</pub-id></citation>
</ref>
<ref id="ref46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morett</surname> <given-names>L. M.</given-names></name> <name><surname>Feiler</surname> <given-names>J. B.</given-names></name> <name><surname>Getz</surname> <given-names>L. M.</given-names></name></person-group> (<year>2022</year>). <article-title>Elucidating the influences of embodiment and conceptual metaphor on lexical and non-speech tone learning</article-title>. <source>Cognition</source> <volume>222</volume>:<fpage>105014</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cognition.2022.105014</pub-id>, PMID: <pub-id pub-id-type="pmid">35033864</pub-id></citation>
</ref>
<ref id="ref47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paivio</surname> <given-names>A.</given-names></name>
</person-group> (<year>1991</year>). <article-title>Dual coding theory: retrospect and current status</article-title>. <source>Can. J. Psychol.</source> <volume>45</volume>, <fpage>255</fpage>&#x2013;<lpage>287</lpage>. doi: <pub-id pub-id-type="doi">10.1037/h0084295</pub-id>, PMID: <pub-id pub-id-type="pmid">17651011</pub-id></citation>
</ref>
<ref id="ref48">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Paivio</surname> <given-names>A.</given-names></name>
</person-group> (<year>2014a</year>). &#x201C;<article-title>Bilingual dual coding theory and memory</article-title>&#x201D; in <source>Foundations of bilingual memory</source>. eds. <person-group person-group-type="editor"><name><surname>Heredia</surname> <given-names>R. R.</given-names></name> <name><surname>Altarriba</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>New York</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>41</fpage>&#x2013;<lpage>62</lpage>.</citation>
</ref>
<ref id="ref49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paivio</surname> <given-names>A.</given-names></name>
</person-group> (<year>2014b</year>). <article-title>Intelligence, dual coding theory, and the brain</article-title>. <source>Intelligence</source> <volume>47</volume>, <fpage>141</fpage>&#x2013;<lpage>158</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.intell.2014.09.002</pub-id></citation>
</ref>
<ref id="ref50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pelzl</surname> <given-names>E.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Qi</surname> <given-names>C.</given-names></name></person-group> (<year>2022</year>). <article-title>Native language experience with tones influences both phonetic and lexical processes when acquiring a second tonal language</article-title>. <source>J. Phon.</source> <volume>95</volume>:<fpage>101197</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.wocn.2022.101197</pub-id></citation>
</ref>
<ref id="ref51">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>Lively</surname> <given-names>S. E.</given-names></name></person-group> (<year>1995</year>). &#x201C;<article-title>Variability and invariance in speech perception: a new look at old problems in perceptual learning</article-title>&#x201D; In: <source>Speech perception and linguistic experience: issues in cross-language research</source>. ed. <person-group person-group-type="editor">
<name><surname>Strange</surname> <given-names>W.</given-names></name>
</person-group>. (<publisher-loc>Timonium, MD</publisher-loc>: <publisher-name>York Press</publisher-name>), <fpage>433</fpage>&#x2013;<lpage>459</lpage>.</citation>
</ref>
<ref id="ref52">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Saito</surname> <given-names>K.</given-names></name>
</person-group> (<year>2021</year>). &#x201C;<article-title>Effects of corrective feedback on second language pronunciation development</article-title>&#x201D; In: <source>The Cambridge handbook of corrective feedback in second language learning and teaching</source>. eds. <person-group person-group-type="editor"><name><surname>Kartchava</surname> <given-names>E.</given-names></name> <name><surname>Nassaji</surname> <given-names>H.</given-names></name></person-group> (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>407</fpage>&#x2013;<lpage>428</lpage>.</citation>
</ref>
<ref id="ref53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shams</surname> <given-names>L.</given-names></name> <name><surname>Seitz</surname> <given-names>A. R.</given-names></name></person-group> (<year>2008</year>). <article-title>Benefits of multisensory learning</article-title>. <source>Trends Cogn. Sci.</source> <volume>12</volume>, <fpage>411</fpage>&#x2013;<lpage>417</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tics.2008.07.006</pub-id>, PMID: <pub-id pub-id-type="pmid">18805039</pub-id></citation>
</ref>
<ref id="ref54">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Shapiro</surname> <given-names>L.</given-names></name>
</person-group> (<year>2019</year>). <source>Embodied cognition</source>. <edition>2nd</edition> Edn. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Routledge</publisher-name>.</citation>
</ref>
<ref id="ref55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Behne</surname> <given-names>D. M.</given-names></name> <name><surname>Jongman</surname> <given-names>A.</given-names></name> <name><surname>Sereno</surname> <given-names>J. A.</given-names></name></person-group> (<year>2004</year>). <article-title>The role of linguistic experience in the hemispheric processing of lexical tone</article-title>. <source>Appl. Psycholinguist.</source> <volume>25</volume>, <fpage>449</fpage>&#x2013;<lpage>466</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S0142716404001213</pub-id>, PMID: <pub-id pub-id-type="pmid">11305893</pub-id></citation>
</ref>
<ref id="ref56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Jongman</surname> <given-names>A.</given-names></name> <name><surname>Sereno</surname> <given-names>J. A.</given-names></name></person-group> (<year>2003</year>). <article-title>Acoustic and perceptual evaluation of Mandarin tone productions before and after perceptual training</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>113</volume>, <fpage>1033</fpage>&#x2013;<lpage>1043</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.1531176</pub-id>, PMID: <pub-id pub-id-type="pmid">12597196</pub-id></citation>
</ref>
<ref id="ref57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>T.</given-names></name> <name><surname>Potter</surname> <given-names>C. E.</given-names></name> <name><surname>Saffran</surname> <given-names>J. R.</given-names></name></person-group> (<year>2020</year>). <article-title>Plasticity in second language learning: the case of Mandarin tones</article-title>. <source>Lang. Learn. Dev.</source> <volume>16</volume>, <fpage>231</fpage>&#x2013;<lpage>243</lpage>. doi: <pub-id pub-id-type="doi">10.1080/15475441.2020.1737072</pub-id>, PMID: <pub-id pub-id-type="pmid">33716583</pub-id></citation>
</ref>
<ref id="ref58">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Sereno</surname> <given-names>J. A.</given-names></name> <name><surname>Jongman</surname> <given-names>A.</given-names></name></person-group> (<year>2006</year>). &#x201C;<article-title>L2 acquisition and processing of Mandarin tones</article-title>&#x201D; in <source>The handbook of East Asian psycholinguistics: volume 1: Chinese</source>. eds. <person-group person-group-type="editor"><name><surname>Li</surname> <given-names>P.</given-names></name> <name><surname>Tan</surname> <given-names>L. H.</given-names></name> <name><surname>Bates</surname> <given-names>E.</given-names></name> <name><surname>Tzeng</surname> <given-names>O. J. L.</given-names></name></person-group> (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>250</fpage>&#x2013;<lpage>256</lpage>.</citation>
</ref>
<ref id="ref59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Spence</surname> <given-names>M. M.</given-names></name> <name><surname>Jongman</surname> <given-names>A.</given-names></name> <name><surname>Sereno</surname> <given-names>J. A.</given-names></name></person-group> (<year>1999</year>). <article-title>Training American listeners to perceive Mandarin tones</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>106</volume>, <fpage>3649</fpage>&#x2013;<lpage>3658</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.428217</pub-id>, PMID: <pub-id pub-id-type="pmid">10615703</pub-id></citation>
</ref>
<ref id="ref60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wayland</surname> <given-names>R. P.</given-names></name> <name><surname>Guion</surname> <given-names>S. G.</given-names></name></person-group> (<year>2004</year>). <article-title>Training English and Chinese listeners to perceive Thai tones: a preliminary report</article-title>. <source>Lang. Learn.</source> <volume>54</volume>, <fpage>681</fpage>&#x2013;<lpage>712</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1467-9922.2004.00283.x</pub-id></citation>
</ref>
<ref id="ref61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xi</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Shu</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>P.</given-names></name></person-group> (<year>2010</year>). <article-title>Categorical perception of lexical tones in Chinese revealed by mismatch negativity</article-title>. <source>Neuroscience</source> <volume>170</volume>, <fpage>223</fpage>&#x2013;<lpage>231</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuroscience.2010.06.077</pub-id>, PMID: <pub-id pub-id-type="pmid">20633613</pub-id></citation>
</ref>
<ref id="ref62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>X.</given-names></name> <name><surname>Myers</surname> <given-names>E.</given-names></name></person-group> (<year>2015</year>). <article-title>The impact of musical training and tone language experience on talker identification</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>137</volume>, <fpage>419</fpage>&#x2013;<lpage>432</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.4904699</pub-id>, PMID: <pub-id pub-id-type="pmid">25618071</pub-id></citation>
</ref>
<ref id="ref63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Gandour</surname> <given-names>J.</given-names></name> <name><surname>Talavage</surname> <given-names>T.</given-names></name> <name><surname>Wong</surname> <given-names>D.</given-names></name> <name><surname>Dzemidzic</surname> <given-names>M.</given-names></name> <name><surname>Tong</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Activation of the left planum temporale in pitch processing is shaped by language experience</article-title>. <source>Hum. Brain Mapp.</source> <volume>27</volume>, <fpage>173</fpage>&#x2013;<lpage>183</lpage>. doi: <pub-id pub-id-type="doi">10.1002/hbm.20176</pub-id>, PMID: <pub-id pub-id-type="pmid">16035045</pub-id></citation>
</ref>
<ref id="ref64">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yip</surname> <given-names>M.</given-names></name>
</person-group> (<year>2002</year>). &#x201C;<article-title>Introduction</article-title>&#x201D; in <source>Tone</source> (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>16</lpage>.</citation>
</ref>
<ref id="ref65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Cai</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>Production rather than observation: comparison between the roles of embodiment and conceptual metaphor in L2 lexical tone learning</article-title>. <source>Learn. Instr.</source> <volume>92</volume>:<fpage>101905</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.learninstruc.2024.101905</pub-id></citation>
</ref>
<ref id="ref66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Ding</surname> <given-names>R.</given-names></name> <name><surname>Frassinelli</surname> <given-names>D.</given-names></name> <name><surname>Tuomainen</surname> <given-names>J.</given-names></name> <name><surname>Klavinskis-Whiting</surname> <given-names>S.</given-names></name> <name><surname>Vigliocco</surname> <given-names>G.</given-names></name></person-group> (<year>2023</year>). <article-title>The role of multimodal cues in second language comprehension</article-title>. <source>Sci. Rep.</source> <volume>13</volume>:<fpage>20824</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-023-47643-2</pub-id></citation>
</ref>
<ref id="ref67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zheng</surname> <given-names>A.</given-names></name> <name><surname>Hirata</surname> <given-names>Y.</given-names></name> <name><surname>Kelly</surname> <given-names>S. D.</given-names></name></person-group> (<year>2018</year>). <article-title>Exploring the effects of imitating hand gestures and head nods on L1 and L2 Mandarin tone production</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>61</volume>, <fpage>2179</fpage>&#x2013;<lpage>2195</lpage>. doi: <pub-id pub-id-type="doi">10.1044/2018_JSLHR-S-17-0481</pub-id>, PMID: <pub-id pub-id-type="pmid">30193334</pub-id></citation>
</ref>
<ref id="ref68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>A.</given-names></name> <name><surname>Olson</surname> <given-names>D.</given-names></name></person-group> (<year>2023</year>). <article-title>The use of visual feedback to train L2 lexical tone: evidence from Mandarin phonetic acquisition</article-title>. <source>Pronun. Sec. Lang. Learn. Teach. Proc.</source> <volume>13</volume>:<fpage>1</fpage>. doi: <pub-id pub-id-type="doi">10.31274/psllt.15715</pub-id></citation>
</ref>
</ref-list>
</back>
</article>