<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="brief-report">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Sci.</journal-id>
<journal-title>Frontiers in Computer Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Sci.</abbrev-journal-title>
<issn pub-type="epub">2624-9898</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fcomp.2022.885657</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Computer Science</subject>
<subj-group>
<subject>Perspective</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Thoughts on the usage of audible smiling in speech synthesis applications</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Trouvain</surname> <given-names>J&#x000FC;rgen</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1226103/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Weiss</surname> <given-names>Benjamin</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1074592/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Language Science and Technology, Saarland University</institution>, <addr-line>Saarbr&#x000FC;cken</addr-line>, <country>Germany</country></aff>
<aff id="aff2"><sup>2</sup><institution>Quality and Usability Lab, Technische Universit&#x000E4;t Berlin</institution>, <addr-line>Berlin</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Oliver Niebuhr, University of Southern Denmark, Denmark</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Yonghong Yan, Institute of Acoustics (CAS), China; Simon Stone, Technical University Dresden, Germany</p></fn>
<corresp id="c001">&#x0002A;Correspondence: J&#x000FC;rgen Trouvain <email>trouvain&#x00040;lst.uni-saarland.de</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Computer Science</p></fn></author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>09</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>4</volume>
<elocation-id>885657</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>02</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>08</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Trouvain and Weiss.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Trouvain and Weiss</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>In this perspective paper, we explore the question of how audible smiling can be integrated into speech synthesis applications. In human-human communication, smiling can serve various functions, such as signaling politeness or marking trustworthiness and other aspects that raise and maintain a speaker&#x00027;s social likeability. In human-machine communication, however, audible smiling remains nearly unexplored, although it could be advantageous in applications such as dialog systems. Our rather limited knowledge of the details of audible smiling, and of how to exploit them in speech synthesis applications, is a great challenge. The same holds for modeling smiling in spoken dialogs and testing it with users. This paper therefore argues for filling the research gaps by identifying the factors that constitute and affect audible smiling, so that it can be incorporated into speech synthesis applications. Our major claim is that research should focus on the dynamics of audible smiling on various levels.</p></abstract>
<kwd-group>
<kwd>speech synthesis</kwd>
<kwd>social signaling</kwd>
<kwd>computational paralinguistics</kwd>
<kwd>smiling</kwd>
<kwd>trustworthiness</kwd>
</kwd-group>
<contract-sponsor id="cn001">Universit&#x000E4;t des Saarlandes<named-content content-type="fundref-id">10.13039/501100005690</named-content></contract-sponsor>
<counts>
<fig-count count="1"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="50"/>
<page-count count="7"/>
<word-count count="5452"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Would users appreciate audible smiling in speech produced by machines, be it text-to-speech (TTS) systems on computers, virtual agents, or social robots? If so, how should the synthesis of audible smiling be approached? There is evidence that smiling in human-human interaction is not only visible but also audible (Tartter, <xref ref-type="bibr" rid="B42">1980</xref>; Tartter and Braun, <xref ref-type="bibr" rid="B41">1994</xref>). Visual signals are often seen as primary in face-to-face situations and acoustic signals as secondary. However, in situations restricted to the auditory-acoustic channel, for instance on the telephone, smiling can only be perceived through vocal, and hence acoustic, features.</p>
<p>Smiling can serve a multitude of functions in different everyday social settings. It can be interpreted as a marker of friendliness and politeness, it can be used to express amusement and exhilaration, and it is often used to build trust between speakers because it strongly increases their social likeability. This effective and attractive social signal clearly has great potential to be exploited in human-computer interaction (HCI), but we see a need to clarify the complexity of the topic before developing technical solutions and testing their usability. The aim of this perspective paper is thus to provide an ordered collection of thoughts on audible smiling in synthetic speech. We identify findings, but also problems, on various levels and suggest approaches to solving them. These critical thoughts are meant neither as &#x0201C;wishful thinking&#x0201D; nor as a feasibility study; rather, they are intended to shed light on a potential asset for technical systems, namely increased naturalness in the interaction with a human user. Our thoughts are always directed at the <italic>auditory</italic> channel (without ignoring the dominant visual channel).</p>
<p>For visual smiles, it is understood that the dynamic characteristics of facial and head movements inherently contribute to how a visible smile is produced, perceived, and categorized as a social signal (Frank et al., <xref ref-type="bibr" rid="B15">2003</xref>). Of particular relevance for audible smiles, these characteristics include duration: for example, smiles in videos rated as amused were longer than those rated as polite (Ambadar et al., <xref ref-type="bibr" rid="B1">2009</xref>). Examining the morphological and dynamic characteristics of smiles is required to discriminate between their various functions (Rychlowska et al., <xref ref-type="bibr" rid="B37">2017</xref>). For the acoustic channel, however, this change of perspective has not yet taken place. We therefore argue for concentrating on the dynamics of smiled speech. This change needs to happen on various levels, as the challenges concern the following:</p>
<list list-type="bullet">
<list-item><p><italic>Temporal dynamics:</italic> Since it is highly unlikely that entire utterances are articulated while smiling, the factors that govern which sections express a social signal through smiling need to be identified. In a conversation, this includes determining when spoken feedback (&#x0201C;u-hu&#x0201D;, &#x0201C;m-hm&#x0201D;, &#x0201C;yeah&#x0201D;) is audibly smiled.</p></list-item>
<list-item><p><italic>Intensity dynamics:</italic> As with other affective displays in speech, are there degrees and nuances of audible smiling, and which factors regulate them? From a perceptual point of view, humans can perceive graded intensity in synthesized smiled speech (El Haddad et al., <xref ref-type="bibr" rid="B10">2015</xref>).</p></list-item>
<list-item><p><italic>Social signaling:</italic> Smiling can be analyzed as a referring expression. However, it is still unclear whether its referents can be distinguished in speech, what they are, and which social and affective function is linked with a given smile.</p></list-item>
</list>
<p>To meet these challenges, we propose to deepen the research on the acoustic properties, perception, and interpretation of smiled speech and to identify factors affecting its dynamics, e.g., content, discourse markers, and social function/meaning.</p>
</sec>
<sec id="s2">
<title>2. Functions of smiling</title>
<sec>
<title>2.1. Affective-social components</title>
<p>Prototypical associations of smiling are positive affective states such as <italic>happiness</italic> and <italic>joy</italic>, a <italic>good mood</italic>, or <italic>contentment</italic>. Smiling can also be used for <italic>seduction</italic>, as an expression of <italic>amusement</italic> (Schr&#x000F6;der et al., <xref ref-type="bibr" rid="B39">1998</xref>), or to mark <italic>irony</italic>. On the recipient&#x00027;s side, smiling belongs to those social signals that can generate the impression of <italic>interest, friendliness</italic>, and <italic>intimacy</italic> (Floyd and Erbert, <xref ref-type="bibr" rid="B14">2003</xref>; Krumhuber et al., <xref ref-type="bibr" rid="B22">2007</xref>; Burgoon et al., <xref ref-type="bibr" rid="B5">2018</xref>).</p>
<p>These positively associated types of smiling bear an authentic character. In addition, there are non-authentic, or not genuinely felt, types of smiling. Happiness is probably often expressed with a smile, but a smile is not necessarily linked to positive emotions. Examples include situations where negative emotions are masked with an expression of joy, or situations in which individuals feel <italic>uncertainty, nervousness</italic>, or <italic>embarrassment</italic> (Keltner, <xref ref-type="bibr" rid="B18">1995</xref>). This wide range of meanings of smiling has even been replicated in artificial faces (Ochs et al., <xref ref-type="bibr" rid="B30">2017</xref>). Further examples include the expression of <italic>dominance</italic> toward others. In the visual channel, the difference between &#x0201C;authentic/felt&#x0201D; and &#x0201C;non-authentic/non-felt&#x0201D; smiling is mainly reflected in whether the eye-ring muscle (m. orbicularis oculi) contracts; a smile with this contraction is the so-called Duchenne smile (Ekman and Friesen, <xref ref-type="bibr" rid="B9">1982</xref>). This difference in muscle contraction is also the basis for separating <italic>trustworthy</italic> from <italic>deceptive</italic> behavior (Ekman and Friesen, <xref ref-type="bibr" rid="B9">1982</xref>). While this fundamental distinction seems to be established for the visual channel, it is not for the acoustic channel.</p>
<p>Numerous studies support the general impression that a smiling face is regarded as more <italic>attractive</italic> than a non-smiling face. For instance, in a Brazilian study, smiling faces were considered happier and even more attractive than faces with a neutral expression (Otta et al., <xref ref-type="bibr" rid="B34">1982</xref>). Regarding visual attractiveness, a smile enhances the positive assessment of faces, particularly of female faces (Lau, <xref ref-type="bibr" rid="B24">1982</xref>). Moreover, females typically smile more when flirting (Moore, <xref ref-type="bibr" rid="B26">1985</xref>). For male faces, this effect is less clear (Mehu et al., <xref ref-type="bibr" rid="B25">2008</xref>; Okubo et al., <xref ref-type="bibr" rid="B33">2015</xref>). Transferring these findings to synthetic voices is difficult because the non-verbal material used (photographs and videos), particularly in field research, has no reported relation to co-verbal smiling.</p>
<p>There are several forms of social smiling. They are core features for the display of <italic>politeness</italic> and <italic>friendliness</italic>, but they are not necessarily expressed with a Duchenne marker. A smile can also be used to show <italic>empathy</italic> and <italic>agreement</italic> with somebody else. Studies demonstrate that smiles rated as more genuine strongly predict judgments of <italic>trustworthiness</italic> (Centorrino et al., <xref ref-type="bibr" rid="B7">2015</xref>). In the majority of cases, genuine smiles trigger a reciprocal social action, even in HCI (Kr&#x000E4;mer et al., <xref ref-type="bibr" rid="B21">2013</xref>).</p>
<p>It would generally be helpful if synthesized speech applications could express the social functions associated with a smiling voice when appropriate. However, while state-of-the-art findings (Section 3) and data-driven models (Section 5.1) have reached a level at which audible smiling can be produced, we are not aware of a solid basis for confirming or rejecting differences between audible smiles that serve different social functions. In addition, other factors regulating the dynamics and location of audible smiling are yet to be modeled. One example is reciprocity in terms of initiating smiles and smiling back (Arias et al., <xref ref-type="bibr" rid="B2">2018</xref>). While the relevance of reciprocal smiling is evident, it is still unknown which conversational sections have to be synthesized as smiled and how the exact timing of this reciprocal mechanism works. On a broader level, however, i.e., by treating this mechanism as synchrony, it could be shown to be observable throughout whole conversations (Rauzy et al., <xref ref-type="bibr" rid="B35">2022</xref>).</p>
<p>This goal is in line with the aims of social signal processing as formulated, e.g., by Vinciarelli et al. (<xref ref-type="bibr" rid="B48">2012</xref>): &#x0201C;a human-centered vision of computing where intelligent machines seamlessly integrate and support human-human interactions, embody natural modes of human communication for interacting with their users [&#x02026;] At its heart, social intelligence aims at correct perception, accurate interpretation, and appropriate display of social signals&#x0201D;. In our view, this demand remains entirely unaddressed with respect to the appropriate and perceivable synthesis of smiled speech.</p>
</sec>
<sec>
<title>2.2. Cultural interpretations of smiling</title>
<p>It is tempting to assume that positive smiles are always realized as an &#x0201C;authentic&#x0201D; smile with a Duchenne marker. Likewise, it could be assumed that the smile of an unacquainted person is generally perceived as attractive, friendly, and definitely positive (e.g., in pictures in application letters or on personal homepages). However, there is evidence that in some non-Western cultures an authentic smile is not bound to a Duchenne marker (Thibault et al., <xref ref-type="bibr" rid="B43">2012</xref>). Moreover, a cross-cultural comparative study of face perception with subjects from more than 40 cultures showed that in some cultures, smiling faces of unacquainted persons leave a negative impression on observers (Krys et al., <xref ref-type="bibr" rid="B23">2016</xref>). Thus, smiling <italic>per se</italic> does not necessarily leave a more positive impression on the perceiver. This could also be the case for audibly transmitted smiling, particularly when it comes from a synthesized voice. Another example of cultural diversity in the usage of smiling is provided by a study in which Chinese and Dutch kindergarten children were asked to play a game, either alone or together with peers (Mui et al., <xref ref-type="bibr" rid="B27">2017</xref>). Whereas the Dutch children did not change their smiling behavior between the two conditions, the Chinese children smiled more when playing with other children.</p>
</sec>
</sec>
<sec id="s3">
<title>3. Acoustic characteristics of smiled voices</title>
<p>A clear distinction should be made between smiling and laughter. Both concepts can have similar functions and sometimes they are used as synonyms (e.g., the expression &#x0201C;s/he laughed with me&#x0201D; when actually the <italic>smile</italic> of a person was directed to another person). Laughter can occur with much variability and complexity (Truong et al., <xref ref-type="bibr" rid="B47">2019</xref>). Most forms of laughter do not overlap with speech, in contrast to &#x0201C;speech-laughs&#x0201D; (Nwokah et al., <xref ref-type="bibr" rid="B28">1999</xref>; Trouvain, <xref ref-type="bibr" rid="B45">2001</xref>) where laughter occurs while articulating. This &#x0201C;laughed speech&#x0201D; is mainly characterized by a high degree of breathiness together with a vibrato-like voice quality (often only for two syllables) and thus differs from &#x0201C;smiled speech&#x0201D; (Trouvain, <xref ref-type="bibr" rid="B45">2001</xref>; Erickson et al., <xref ref-type="bibr" rid="B13">2009</xref>).</p>
<p>Various studies have shown that smiling is also perceivable from speech without visual information (Tartter, <xref ref-type="bibr" rid="B42">1980</xref>; Tartter and Braun, <xref ref-type="bibr" rid="B41">1994</xref>). Utterances produced with a non-emotional, mechanical lip spreading are perceived as more &#x0201C;smiled&#x0201D; than utterances without lip spreading (Robson and Beck, <xref ref-type="bibr" rid="B36">1999</xref>). Perceivable smiled speech can be explained by changes in various acoustic parameters: compared to non-smiled speech, the fundamental frequency (F0) is higher due to higher overall muscular tension, and the second formant (F2) is higher because lip spreading and a raised larynx shorten the vocal tract. Articulatory synthesis has verified that each of these three factors (higher F0, lip spreading, and a raised larynx) has a perceptual effect, and that audible smiling is stronger when they are combined (Stone et al., <xref ref-type="bibr" rid="B40">2022</xref>). These effects can also be observed for the high unrounded front vowel [i], in contrast to vowels that are (more) rounded and/or lower and/or further back, e.g., [o]. This reflects Ohala&#x00027;s &#x0201C;i-face&#x0201D; for smiling and &#x0201C;o-face&#x0201D; for threatening (Ohala, <xref ref-type="bibr" rid="B31">1980</xref>, <xref ref-type="bibr" rid="B32">1984</xref>). The described tendencies have been confirmed by later studies (Schr&#x000F6;der et al., <xref ref-type="bibr" rid="B39">1998</xref>; Drahota et al., <xref ref-type="bibr" rid="B8">2008</xref>).</p>
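<p>As a rough illustration of why a shorter vocal tract raises formant frequencies, a textbook idealization (our assumption here, not the articulatory model used by Stone et al.) treats the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are <italic>F</italic><sub><italic>n</italic></sub> = (2<italic>n</italic> &#x02212; 1)<italic>c</italic>/(4<italic>L</italic>) for tube length <italic>L</italic> and speed of sound <italic>c</italic>. The tube lengths in the sketch are hypothetical; in this idealization all formants scale by the same factor, whereas real smiled speech shows vowel-dependent effects.</p>
<preformat preformat-type="code">
```python
# Uniform-tube (quarter-wavelength) approximation of vocal tract resonances.
# This is a textbook idealization, not a model of smiled speech as such.
SPEED_OF_SOUND_CM_S = 35000.0  # approx. speed of sound in warm, humid air (cm/s)


def tube_formants(length_cm, n_formants=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]


neutral = tube_formants(17.5)  # hypothetical neutral vocal tract length
smiled = tube_formants(16.5)   # hypothetical: 1 cm shorter (lip spreading, raised larynx)

for n, (f_neutral, f_smiled) in enumerate(zip(neutral, smiled), start=1):
    print(f"F{n}: {f_neutral:.0f} Hz to {f_smiled:.0f} Hz")
```
</preformat>
<p>With these assumed values, shortening the tube by 1 cm raises F2 from 1500 Hz to about 1591 Hz; how large the shift is in human speakers depends on how much lip spreading and larynx raising actually shorten the tract.</p>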
<p>The perception of smiling from the voice also depends on its perceived intensity. A cross-lingual study (Emond et al., <xref ref-type="bibr" rid="B11">2016</xref>) showed that listeners need more time to recognize a mild smile than a more intense one. The same study also revealed a linguistic advantage in recognizing audible smiles: listeners were slower at detecting smiles, and recognized fewer of them, when they did not share the speakers&#x00027; accent or language.</p>
<p>Further studies are needed to obtain a more differentiated overview of phonetic parameters such as intensity, duration, and voice quality in smiled speech. A particular focus should be on perception, especially the timing of smiling in dialogs, the perceived intensity of audible smiling, and the cross-modal aspects of smiling perception.</p>
</sec>
<sec id="s4">
<title>4. Possible applications</title>
<p>Researchers and developers are often motivated by the goal of making machines more human-like. An example of this positive transfer is a study with a human-like virtual agent in which adult subjects smiled longer at the agent when it also showed a (visual) smile; that is, the agent increased reciprocal smiling on the users&#x00027; side (Kr&#x000E4;mer et al., <xref ref-type="bibr" rid="B21">2013</xref>). However, it is not clear whether human users really benefit from a smiling interaction with a machine. For instance, in a study where 9-year-old children used social robots as learning tutors, the children achieved better results when the robots did not act in a friendly way (Kennedy et al., <xref ref-type="bibr" rid="B19">2017</xref>). This illustrates the need to shift from the mere synthesis of smiling toward the proper manifestation of audible smiles as expressions of a specific social meaning, which requires a solid basis regarding the constituting factors.</p>
<sec>
<title>4.1. Audiobooks</title>
<p>Audiobooks are a wide field of application for synthetic speech. In audiobook productions using human voices, the direct speech of the various characters in fictional literature can be voiced either by different professional speakers or by the same speaker using different voice qualities for the characters. In synthesized audiobooks, a given character or situation could be rendered with a &#x0201C;smiled voice&#x0201D;. An appropriate application of smiled speech synthesis would require a text analysis tool that finds those portions of direct speech where smiling fits. This could be done either by finding words from the semantic field of smiling (e.g., grin, mischievous, friendly) or by sentiment analysis targeting friendliness, politeness, and other functions of smiling.</p>
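<p>The keyword-based option for such a text analysis tool could be sketched as follows. This is a minimal illustration under several assumptions: direct speech is marked with straight double quotes, the cue lexicon (<monospace>SMILE_CUES</monospace>) is a small hypothetical English word list, and any cue within a fixed character window around a quotation is taken as evidence that the quoted speech may be rendered as smiled. The function name <monospace>smile_candidates</monospace> is likewise hypothetical.</p>
<preformat preformat-type="code">
```python
import re

# Hypothetical cue lexicon from the semantic field of smiling (English, lowercase).
# A real system would need a curated, language-specific lexicon.
SMILE_CUES = {
    "smile", "smiled", "smiling", "grin", "grinned", "grinning",
    "beamed", "mischievous", "mischievously", "friendly",
}

QUOTE = re.compile(r'"([^"]+)"')  # direct speech in straight double quotes
WORD = re.compile(r"[a-z]+")      # ASCII letters only, applied to lowercased text


def smile_candidates(text, window=30):
    """Return quoted segments with a smile cue in the nearby narration.

    The narration considered is up to `window` characters immediately
    before and after each quotation.
    """
    hits = []
    for match in QUOTE.finditer(text):
        start, end = match.span()
        context = text[max(0, start - window):start] + text[end:end + window]
        if SMILE_CUES.intersection(WORD.findall(context.lower())):
            hits.append(match.group(1))
    return hits


story = ('"Good morning," she said with a grin. '
         'The train had been delayed for over an hour that day. '
         '"We are late," he muttered.')
print(smile_candidates(story))  # prints ['Good morning,']
```
</preformat>
<p>Cues in the narration around a quotation (e.g., <italic>she said with a grin</italic>) thus mark that quotation as a candidate. Keyword cues will miss smiling that is implied rather than named; covering such cases would require the sentiment-analysis option.</p>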
</sec>
<sec>
<title>4.2. Social robots</title>
<p>In contrast to virtual (embodied conversational) agents, for which high-quality animation of facial expressions is possible (Ochs et al., <xref ref-type="bibr" rid="B29">2010</xref>), many social robots without a display head, such as Pepper or Nao, cannot generate a visible smile. For such robots, an audible smile could be a helpful social signal for avoiding an uncanny valley effect. Virtual agents with visual smiling were regarded as friendlier and more attractive than those without, and smiling also reinforces the impression of extroversion (Cafaro et al., <xref ref-type="bibr" rid="B6">2012</xref>).</p>
<p>A special dimension opens up when social robots have children as users, for instance as caretakers in nursing homes or as training dolls for autistic children. In the interaction between children and social robots, an increased degree of familiarity and trust seems to be essential. Important components for achieving this are non-verbal behavior, feedback control, and other forms of interaction management (Belpaeme et al., <xref ref-type="bibr" rid="B3">2018</xref>). Smiling, including audible smiling, can also play a relevant role in this context.</p>
</sec>
<sec>
<title>4.3. Dialog systems</title>
<p>Coordination in conversation depends on the proper timing of spoken signals, or even on their absence, to convey meaning and to ensure conversational flow. This kind of coordination comprises back-channel and turn-taking signals (Enfield, <xref ref-type="bibr" rid="B12">2017</xref>) as well as (automatic) convergence (Branigan et al., <xref ref-type="bibr" rid="B4">2010</xref>). Both kinds could potentially involve audible smiles, but to the best of our knowledge, empirical evidence exists only for the latter.</p>
<p>Such convergence was also observed by Kr&#x000E4;mer et al. (<xref ref-type="bibr" rid="B21">2013</xref>), in whose study adult subjects smiled longer at an artificial agent when the agent also showed a (visual) smile. This is in line with Torre et al. (<xref ref-type="bibr" rid="B44">2020</xref>), who directly tested audible smiling in a gaming scenario and found increased trustworthiness even in the face of contradicting behavioral evidence.</p>
</sec>
</sec>
<sec id="s5">
<title>5. Smiled synthetic speech</title>
<p>The challenge of integrating audible smiling into speech synthesis can be considered at different levels. For the various methods of <italic>signal generation</italic>, the limited knowledge about audible smiling in humans should be exploited. <italic>Modeling</italic> audible smiling in dialogs requires control of temporal, discourse-relevant, and cultural aspects in addition to signal generation. Finally, the <italic>evaluation</italic> of appropriateness in given applications represents the third component.</p>
<sec>
<title>5.1. Signal generation</title>
<p>Articulatory synthesis would be an obvious choice for verifying the perceptual validity of findings concerning the properties of smiled speech. Testing such analytic results perceptually (in our domain, social signals manifested by smiling and their interplay with phonetic dynamics) could greatly benefit from articulatory synthesis, which can produce nuances of intensity and dynamics in a controlled way. However, given the current increase in (and perhaps demand for) high signal quality, a data-driven approach also seems advisable. An early attempt at smiled speech in HMM-based synthesis, using parallel corpora, confirmed the perceptual effect of smiling intensity but also revealed issues with naturalness (El Haddad et al., <xref ref-type="bibr" rid="B10">2015</xref>). These issues, however, seem to have been overcome in more recent work (Kirkland et al., <xref ref-type="bibr" rid="B20">2021</xref>). Still, the typical limitation of data-driven synthesis, i.e., the difficulty of drawing conceptual conclusions such as identifying relevant factors, remains.</p>
</sec>
<sec>
<title>5.2. Modeling</title>
<p>Using smiled synthetic speech in real-world applications requires a contextually appropriate control of the synthesis that considers content and culture when selecting sections to be produced as smiled. An automatic symbolic annotation of those sections requires a language- and culture-specific model and a sentiment analysis of the text to be synthesized.</p>
<p>Interactive applications could be enhanced by generating discourse-dependent social signals, for instance reciprocal smiling. This in turn requires clarifying how smiling in human-human interaction differs between audio-visual and audio-only situations. In general, audible smiling has to be modeled with regard to its timing (duration, and start and end relative to speech) and its acoustic quality (distinctiveness from non-smiled speech).</p>
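<p>One way to make these timing and intensity requirements concrete is a span-based annotation from which a frame-wise smiling-intensity contour could be derived to control a synthesizer. The following is a minimal sketch under our own assumptions: the class <monospace>SmileSpan</monospace> and the function <monospace>smile_contour</monospace> are hypothetical, times are given in seconds relative to the start of the utterance, intensity is normalized to the range 0 to 1, onsets and offsets are linear ramps, and overlapping spans are combined by taking the maximum.</p>
<preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass
class SmileSpan:
    """Hypothetical annotation of one audibly smiled stretch of an utterance."""
    start: float       # onset in seconds, relative to utterance start
    end: float         # offset in seconds, relative to utterance start
    intensity: float   # peak degree of smiling, from 0.0 (none) to 1.0 (maximal)
    function: str      # intended social function, e.g. "politeness"
    ramp: float = 0.1  # assumed duration of the onset and offset ramps (s)

    def __post_init__(self):
        if self.start >= self.end:
            raise ValueError("a smile span needs a positive duration")
        if not 1.0 >= self.intensity >= 0.0:
            raise ValueError("intensity must lie between 0 and 1")
        if not self.ramp > 0.0:
            raise ValueError("ramp must be positive")


def clamp01(x):
    """Limit x to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def smile_contour(spans, duration, frame_rate=100):
    """Sample smiling intensity per frame, with linear onset and offset ramps.

    Overlapping spans are combined by taking the maximum value in each frame.
    """
    n_frames = int(round(duration * frame_rate))
    contour = [0.0] * n_frames
    for i in range(n_frames):
        t = i / frame_rate
        for span in spans:
            rise = clamp01((t - span.start) / span.ramp)
            fall = clamp01((span.end - t) / span.ramp)
            contour[i] = max(contour[i], span.intensity * min(rise, fall))
    return contour


# Example: a polite, moderately smiled stretch within a 1-second utterance.
feedback = SmileSpan(start=0.2, end=0.6, intensity=0.8, function="politeness")
contour = smile_contour([feedback], duration=1.0)
print(max(contour))  # prints 0.8
```
</preformat>
<p>The <monospace>function</monospace> field is carried along but not used in the contour; a fuller model would have to make timing and acoustic realization depend on the intended social function.</p>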
</sec>
<sec>
<title>5.3. Evaluation</title>
<p>Since smiling has so many functions, an evaluation must check whether the intention behind the generated smile matches the interpretation of the perceived smile in given situations. A general assessment with mean opinion scores does not seem to be the right way to evaluate this; instead, <italic>contextual appropriateness</italic> should be assessed, as demanded by Wagner et al. (<xref ref-type="bibr" rid="B49">2019</xref>). Thus, the needs and preferences of users of synthetic speech must be tested, preferably in a behavioral paradigm, i.e., not (solely) by explicit ratings but by observing behavioral (e.g., gamified) choices. Smiled voice should not be regarded as one style of expressiveness, but as a carrier mechanism for many different expressions, with each expressive function evaluated separately, as illustrated in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Different functions of smiling as evaluation parameters to test the appropriateness of signal generation, application, acoustic quality, and timing.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-885657-g0001.tif"/>
</fig>
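<p>Evaluating each function separately could be summarized as sketched below, reporting agreement per intended function instead of pooling all judgments into a single score. This is a minimal illustration under our assumptions: each trial pairs the function a smile was generated to express with the function attributed to it by a user (whether from an explicit choice or inferred from behavior), and the function name <monospace>per_function_agreement</monospace> and the labels are hypothetical.</p>
<preformat preformat-type="code">
```python
from collections import Counter


def per_function_agreement(trials):
    """Agreement between intended and perceived smile function, per function.

    `trials` is an iterable of (intended, perceived) label pairs.
    Returns a dict mapping each intended function to the share of its
    trials in which the perceived function matched it.
    """
    totals = Counter()
    matches = Counter()
    for intended, perceived in trials:
        totals[intended] += 1
        if perceived == intended:
            matches[intended] += 1
    return {function: matches[function] / totals[function] for function in totals}


trials = [
    ("politeness", "politeness"),
    ("politeness", "amusement"),
    ("amusement", "amusement"),
    ("embarrassment", "politeness"),
]
print(per_function_agreement(trials))
# prints {'politeness': 0.5, 'amusement': 1.0, 'embarrassment': 0.0}
```
</preformat>
<p>A pooled score over these four trials would be 0.5, which hides that the amused smile was always recognized while the embarrassed one never was.</p>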
</sec>
</sec>
<sec sec-type="discussion" id="s6">
<title>6. Discussion</title>
<p>There is no doubt that smiling serves a multitude of important social functions. Currently, however, politeness and friendliness are not yet core features of synthetic voices. Although the call for more and better expressiveness in speech synthesis has been around for a while (Schr&#x000F6;der, <xref ref-type="bibr" rid="B38">2009</xref>), there have hardly been any attempts to tackle this challenge for smiled voices.</p>
<p>For audiobooks, smiled voices can clearly be beneficial. In dialogical applications, there may be an advantage in making users aware of the smiled voice. Irritation caused by such unfamiliar human-likeness and unexpected peculiarity should be avoided.</p>
<p>Should speaking machines be able to smile? Anthropomorphizing non-human objects is one way to build trust. However, it could also lead to disappointed expectations regarding social competence, or even to an &#x0201C;uncanny valley of mind&#x0201D; (Gray and Wegner, <xref ref-type="bibr" rid="B16">2012</xref>).</p>
<p>What should the next steps be in the upcoming years (or decades) to achieve more human-like smiling behavior in speech synthesis? There is a research gap for smiled voice in many respects. Research on voice attractiveness still lacks the vocal aspects of smiling as an effective means by which speakers come across as sexy, likable, and charismatic (Weiss et al., <xref ref-type="bibr" rid="B50">2020</xref>). Most research on human smiling concentrates exclusively on the visual channel. In this research direction, the main objects of study are pictures of faces (often without glasses, beards, or face masks). Such material lacks the temporal dynamics, the changing intensity of the smile, and the situational and verbal context in which the smile occurred. These can be very important features when modeling smiling in speech (for synthesis or other applications).</p>
<p>In addition, the relation between visual and acoustic information is under-explored, particularly in talk-in-interaction (between humans and in HCI). Moreover, situations with machine-aided communication, such as human-robot interaction or devices for training autistic children, require a thorough understanding of the effects of smiling in the audio-only and audio-visual modalities as well as in &#x0201C;smile-in-interaction&#x0201D;.</p>
<p>Although our thoughts target the audio-only aspects of synthesized speech, they can of course also be useful for audio-visual aspects of speech synthesis, e.g., in embodied conversational agents or social robots.</p>
<p>Based on the presented, albeit limited, description of the state of research, we argue for filling the identified gaps. While, in principle, audible smiling can already be synthesized for a given stretch of speech, the challenges lie in quantitative models that incorporate the communicative factors of smiling function (<xref ref-type="fig" rid="F1">Figure 1</xref>) as well as the timing and dynamics of audible smiling in their interrelation with the linguistic and coordinating properties of speech, such as smiling duration and intensity within phrases and turns. With such models, we expect a major advance in the communicatively meaningful synthesis of, e.g., audiobooks or artificial agents&#x00027; speech. In short, basic research is needed with respect to (i) when exactly, (ii) to what degree, and (iii) for which purpose humans smile in spoken interaction. The research gap also concerns the phonetic aspects of smiled speech. How do acoustics and perception interact? How is visual information processed in combination with acoustic information in speech? How can manipulations be evaluated? Research and development both require more (annotated) data, which currently do not exist in the quality and amount needed. We did not consider other social factors, such as gender (Hiersch et al., <xref ref-type="bibr" rid="B17">2022</xref>) or status, which are known to affect the overall amount of smiling display but which we do not expect to be as impactful at this particular stage of research. Taken together, we consider our thoughts on audible smiled speech a contribution that helps to further develop <italic>social signal processing</italic> (Vinciarelli et al., <xref ref-type="bibr" rid="B48">2012</xref>).</p>
</sec>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s8">
<title>Author contributions</title>
<p>JT and BW: concept, literature research, and formulations. Both authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>We acknowledge support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) and Saarland University within the &#x02018;Open Access Publication Funding&#x02019; programme.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack><p>We thank the commentators of a conference presentation given on a similar topic (Trouvain and Weiss, <xref ref-type="bibr" rid="B46">2020</xref>). We are particularly grateful to Iona Gessinger and Mikey Elmers for comments on an earlier draft of this paper.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ambadar</surname> <given-names>Z.</given-names></name> <name><surname>Cohn</surname> <given-names>J. F.</given-names></name> <name><surname>Reed</surname> <given-names>L. I.</given-names></name></person-group> (<year>2009</year>). <article-title>All smiles are not created equal: morphology and timing of smiles perceived as amused, polite, and embarrassed/nervous</article-title>. <source>J. Nonverbal Behav</source>. <volume>33</volume>, <fpage>17</fpage>&#x02013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1007/s10919-008-0059-5</pub-id><pub-id pub-id-type="pmid">19554208</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arias</surname> <given-names>P.</given-names></name> <name><surname>Belin</surname> <given-names>P.</given-names></name> <name><surname>Aucouturier</surname> <given-names>J.-J.</given-names></name></person-group> (<year>2018</year>). <article-title>Auditory smiles trigger unconscious facial imitation</article-title>. <source>Curr. Biol</source>. <volume>28</volume>, <fpage>R782</fpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2018.05.084</pub-id><pub-id pub-id-type="pmid">30130496</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Belpaeme</surname> <given-names>T.</given-names></name> <name><surname>Vogt</surname> <given-names>P.</given-names></name> <name><surname>van den Berghe</surname> <given-names>R.</given-names></name> <name><surname>Bergmann</surname> <given-names>K.</given-names></name> <name><surname>Goksun</surname> <given-names>T.</given-names></name> <name><surname>Haas</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Guidelines for designing social robots as second language tutors</article-title>. <source>Int. J. Soc. Robot</source>. <volume>10</volume>, <fpage>325</fpage>&#x02013;<lpage>341</lpage>. <pub-id pub-id-type="doi">10.1007/s12369-018-0467-6</pub-id><pub-id pub-id-type="pmid">30996752</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Branigan</surname> <given-names>H.</given-names></name> <name><surname>Pickering</surname> <given-names>M.</given-names></name> <name><surname>Pearson</surname> <given-names>J.</given-names></name> <name><surname>McLean</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>Linguistic alignment between people and computers</article-title>. <source>J. Pragmat</source>. <volume>42</volume>, <fpage>2355</fpage>&#x02013;<lpage>2368</lpage>. <pub-id pub-id-type="doi">10.1016/j.pragma.2009.12.012</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burgoon</surname> <given-names>J.</given-names></name> <name><surname>Buller</surname> <given-names>D.</given-names></name> <name><surname>Hale</surname> <given-names>J.</given-names></name> <name><surname>Turck</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Relational messages associated with nonverbal behaviors</article-title>. <source>Hum. Commun. Res</source>. <volume>10</volume>, <fpage>351</fpage>&#x02013;<lpage>378</lpage>. <pub-id pub-id-type="doi">10.1111/j.1468-2958.1984.tb00023.x</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cafaro</surname> <given-names>A.</given-names></name> <name><surname>Vilhjalmsson</surname> <given-names>H.</given-names></name> <name><surname>Bickmore</surname> <given-names>T.</given-names></name> <name><surname>Heylen</surname> <given-names>D.</given-names></name> <name><surname>Johannsdottir</surname> <given-names>K.</given-names></name> <name><surname>Valgarosson</surname> <given-names>G.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;First impressions: users&#x00027; judgments of virtual agents&#x00027; personality and interpersonal attitude in first encounters,&#x0201D;</article-title> in <source>Proc. 12th Int&#x00027;l Conf. Intell. Virtual Agents</source> (<publisher-loc>Santa Cruz, CA</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-33197-8_7</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Centorrino</surname> <given-names>S.</given-names></name> <name><surname>Djemai</surname> <given-names>E.</given-names></name> <name><surname>Hopfensitz</surname> <given-names>A.</given-names></name> <name><surname>Milinski</surname> <given-names>M.</given-names></name> <name><surname>Seabright</surname> <given-names>P.</given-names></name></person-group> (<year>2015</year>). <article-title>Honest signaling in trust interactions: smiles rated as genuine induce trust and signal higher earning opportunities</article-title>. <source>Evol. Hum. Behav</source>. <volume>36</volume>, <fpage>8</fpage>&#x02013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1016/j.evolhumbehav.2014.08.001</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Drahota</surname> <given-names>A.</given-names></name> <name><surname>Costall</surname> <given-names>A.</given-names></name> <name><surname>Reddy</surname> <given-names>V.</given-names></name></person-group> (<year>2008</year>). <article-title>The vocal communication of different kinds of smile</article-title>. <source>Speech Commun</source>. <volume>50</volume>, <fpage>278</fpage>&#x02013;<lpage>287</lpage>. <pub-id pub-id-type="doi">10.1016/j.specom.2007.10.001</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ekman</surname> <given-names>P.</given-names></name> <name><surname>Friesen</surname> <given-names>W.</given-names></name></person-group> (<year>1982</year>). <article-title>Felt, false, and miserable smiles</article-title>. <source>J. Nonverbal Behav</source>. <volume>6</volume>, <fpage>238</fpage>&#x02013;<lpage>258</lpage>. <pub-id pub-id-type="doi">10.1007/BF00987191</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>El Haddad</surname> <given-names>K.</given-names></name> <name><surname>Cakmak</surname> <given-names>H.</given-names></name> <name><surname>Moinet</surname> <given-names>A.</given-names></name> <name><surname>Dupont</surname> <given-names>S.</given-names></name> <name><surname>Dutoit</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;An HMM-approach for synthesizing amused speech with a controllable intensity of smile,&#x0201D;</article-title> in <source>IEEE International Symposium on Signal Processing and Information Technology</source> (<publisher-loc>Abu Dhabi</publisher-loc>), <fpage>7</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1109/ISSPIT.2015.7394422</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Emond</surname> <given-names>C.</given-names></name> <name><surname>Rilliard</surname> <given-names>A.</given-names></name> <name><surname>Trouvain</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Perception of smiling in speech in different modalities by native vs. non-native speakers,&#x0201D;</article-title> in <source>Proc. Speech Prosody</source>, (<publisher-loc>Boston, MA</publisher-loc>), <fpage>639</fpage>&#x02013;<lpage>643</lpage>. <pub-id pub-id-type="doi">10.21437/SpeechProsody.2016-131</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Enfield</surname> <given-names>N. J.</given-names></name></person-group> (<year>2017</year>). <source>How We Talk. The Inner Workings of Conversation</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Basic Books</publisher-name>.</citation>
</ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Erickson</surname> <given-names>D.</given-names></name> <name><surname>Menezes</surname> <given-names>C.</given-names></name> <name><surname>Sakakibara</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Are you laughing, smiling or crying?&#x0201D;</article-title> in <source>Proc. APSIPA Summit and Conference</source> (<publisher-loc>Sapporo</publisher-loc>), <fpage>529</fpage>&#x02013;<lpage>537</lpage>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Floyd</surname> <given-names>K.</given-names></name> <name><surname>Erbert</surname> <given-names>L.</given-names></name></person-group> (<year>2003</year>). <article-title>Relational message interpretations of nonverbal matching behavior: an application of the social meaning model</article-title>. <source>J. Soc. Psychol</source>. <volume>143</volume>, <fpage>581</fpage>&#x02013;<lpage>597</lpage>. <pub-id pub-id-type="doi">10.1080/00224540309598465</pub-id><pub-id pub-id-type="pmid">14609054</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frank</surname> <given-names>M.</given-names></name> <name><surname>Ekman</surname> <given-names>P.</given-names></name> <name><surname>Friesen</surname> <given-names>W.</given-names></name></person-group> (<year>2003</year>). <article-title>Behavioral markers and recognizability of the smile of enjoyment</article-title>. <source>J. Pers. Soc. Psychol</source>. <volume>64</volume>, <fpage>83</fpage>&#x02013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1037/0022-3514.64.1.83</pub-id><pub-id pub-id-type="pmid">8421253</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gray</surname> <given-names>K.</given-names></name> <name><surname>Wegner</surname> <given-names>D.</given-names></name></person-group> (<year>2012</year>). <article-title>Feeling robots and human zombies: mind perception and the uncanny valley</article-title>. <source>Cognition</source> <volume>125</volume>, <fpage>125</fpage>&#x02013;<lpage>130</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2012.06.007</pub-id><pub-id pub-id-type="pmid">22784682</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hiersch</surname> <given-names>P.</given-names></name> <name><surname>McKeown</surname> <given-names>G.</given-names></name> <name><surname>Latu</surname> <given-names>I.</given-names></name> <name><surname>Rychlowska</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Gender differences, smiling, and economic negotiation outcomes,&#x0201D;</article-title> in <source>Proceedings of the Workshop on Smiling and Laughter across Contexts and the Life-Span within the 13th Language Resources and Evaluation Conference</source> (<publisher-loc>Marseille</publisher-loc>: <publisher-name>European Language Resources Association</publisher-name>), <fpage>11</fpage>&#x02013;<lpage>15</lpage>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Keltner</surname> <given-names>D.</given-names></name></person-group> (<year>1995</year>). <article-title>Signs of appeasement: evidence for the distinct displays of embarrassment, amusement, and shame</article-title>. <source>J. Pers. Soc. Psychol</source>. <volume>68</volume>, <fpage>441</fpage>&#x02013;<lpage>454</lpage>. <pub-id pub-id-type="doi">10.1037/0022-3514.68.3.441</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kennedy</surname> <given-names>J.</given-names></name> <name><surname>Baxter</surname> <given-names>P.</given-names></name> <name><surname>Belpaeme</surname> <given-names>T.</given-names></name></person-group> (<year>2017</year>). <article-title>The impact of robot tutor nonverbal social behavior on child learning</article-title>. <source>Front. ICT Hum. Media Interact</source>. <volume>4</volume>, <fpage>6</fpage>. <pub-id pub-id-type="doi">10.3389/fict.2017.00006</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kirkland</surname> <given-names>A.</given-names></name> <name><surname>W&#x00142;odarczak</surname> <given-names>M.</given-names></name> <name><surname>Gustafson</surname> <given-names>J.</given-names></name> <name><surname>Sz&#x000E9;kely</surname> <given-names>E.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Perception of smiling voice in spontaneous speech synthesis,&#x0201D;</article-title> in <source>Proc. 11th ISCA Speech Synthesis Workshop (SSW 11)</source>, <fpage>108</fpage>&#x02013;<lpage>112</lpage>. <pub-id pub-id-type="doi">10.21437/SSW.2021-19</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000E4;mer</surname> <given-names>N.</given-names></name> <name><surname>Kopp</surname> <given-names>S.</given-names></name> <name><surname>Becker-Asano</surname> <given-names>C.</given-names></name> <name><surname>Sommer</surname> <given-names>N.</given-names></name></person-group> (<year>2013</year>). <article-title>Smile and the world will smile with you &#x02013; the effects of a virtual agent&#x00027;s smile on users&#x00027; evaluation and behavior</article-title>. <source>Int. J. Hum. Comput. Stud</source>. <volume>71</volume>, <fpage>335</fpage>&#x02013;<lpage>349</lpage>. <pub-id pub-id-type="doi">10.1016/j.ijhcs.2012.09.006</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krumhuber</surname> <given-names>E.</given-names></name> <name><surname>Manstead</surname> <given-names>A.</given-names></name> <name><surname>Cosker</surname> <given-names>D.</given-names></name> <name><surname>Marshall</surname> <given-names>D.</given-names></name> <name><surname>Rosin</surname> <given-names>P.</given-names></name> <name><surname>Kappas</surname> <given-names>A.</given-names></name></person-group> (<year>2007</year>). <article-title>Facial dynamics as indicators of trustworthiness and cooperative behavior</article-title>. <source>Emotion</source> <volume>7</volume>, <fpage>730</fpage>&#x02013;<lpage>735</lpage>. <pub-id pub-id-type="doi">10.1037/1528-3542.7.4.730</pub-id><pub-id pub-id-type="pmid">18039040</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krys</surname> <given-names>K.</given-names></name> <name><surname>Vauclair</surname> <given-names>C.-M.</given-names></name> <name><surname>Capaldi</surname> <given-names>C. A.</given-names></name> <name><surname>Miu-Chi Lun</surname> <given-names>V.</given-names></name> <name><surname>Bond</surname> <given-names>M. H.</given-names></name> <name><surname>Dom&#x000ED;nguez-Espinosa</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Be careful where you smile: culture shapes judgments of intelligence and honesty of smiling individuals</article-title>. <source>J. Nonverbal Behav</source>. <volume>40</volume>, <fpage>101</fpage>&#x02013;<lpage>116</lpage>. <pub-id pub-id-type="doi">10.1007/s10919-015-0226-4</pub-id><pub-id pub-id-type="pmid">27194817</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lau</surname> <given-names>S.</given-names></name></person-group> (<year>1982</year>). <article-title>The effect of smiling on person perception</article-title>. <source>J. Soc. Psychol</source>. <volume>117</volume>, <fpage>63</fpage>&#x02013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1080/00224545.1982.9713408</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mehu</surname> <given-names>M.</given-names></name> <name><surname>Little</surname> <given-names>A.</given-names></name> <name><surname>Dunbar</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>Sex differences in the effect of smiling on social judgments: an evolutionary approach</article-title>. <source>J. Soc. Evol. Cult. Psychol</source>. <volume>2</volume>, <fpage>103</fpage>&#x02013;<lpage>121</lpage>. <pub-id pub-id-type="doi">10.1037/h0099351</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moore</surname> <given-names>M.</given-names></name></person-group> (<year>1985</year>). <article-title>Non-verbal courtship patterns in women: context and consequences</article-title>. <source>Ethol. Sociobiol</source>. <volume>6</volume>, <fpage>237</fpage>&#x02013;<lpage>247</lpage>. <pub-id pub-id-type="doi">10.1016/0162-3095(85)90016-0</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mui</surname> <given-names>P.</given-names></name> <name><surname>Goudbeek</surname> <given-names>M.</given-names></name> <name><surname>Swerts</surname> <given-names>M.</given-names></name> <name><surname>Hovasapian</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>Children&#x00027;s non-verbal displays of winning and losing: effects of social and cultural contexts on smiles</article-title>. <source>J. Nonverbal Behav</source>. <volume>41</volume>, <fpage>67</fpage>&#x02013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1007/s10919-016-0241-0</pub-id><pub-id pub-id-type="pmid">28203037</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nwokah</surname> <given-names>E.</given-names></name> <name><surname>Hsu</surname> <given-names>H.-C.</given-names></name> <name><surname>Davies</surname> <given-names>P.</given-names></name> <name><surname>Fogel</surname> <given-names>A.</given-names></name></person-group> (<year>1999</year>). <article-title>The integration of laughter and speech in vocal communication: a dynamic systems perspective</article-title>. <source>J. Speech Lang. Hear. Res</source>. <volume>42</volume>, <fpage>880</fpage>&#x02013;<lpage>894</lpage>. <pub-id pub-id-type="doi">10.1044/jslhr.4204.880</pub-id><pub-id pub-id-type="pmid">10450908</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ochs</surname> <given-names>M.</given-names></name> <name><surname>Niewiadomski</surname> <given-names>R.</given-names></name> <name><surname>Pelachaud</surname> <given-names>C.</given-names></name></person-group> (<year>2010</year>). <article-title>&#x0201C;How a virtual agent should smile? Morphological and dynamic characteristics of virtual agent&#x00027;s smiles,&#x0201D;</article-title> in <source>Proc. Int&#x00027;l Conf. on Intelligent Virtual Agents</source>, <fpage>427</fpage>&#x02013;<lpage>440</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-15892-6_47</pub-id><pub-id pub-id-type="pmid">21989611</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ochs</surname> <given-names>M.</given-names></name> <name><surname>Pelachaud</surname> <given-names>C.</given-names></name> <name><surname>McKeown</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>A user-perception based approach to create smiling embodied conversational agents</article-title>. <source>ACM Trans. Interact. Intell. Syst</source>. <volume>7</volume>, <fpage>1</fpage>&#x02013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1145/2925993</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ohala</surname> <given-names>J.</given-names></name></person-group> (<year>1980</year>). <article-title>The acoustic origin of the smile</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>68</volume>, <fpage>S33</fpage>. <pub-id pub-id-type="doi">10.1121/1.2004679</pub-id><pub-id pub-id-type="pmid">34719254</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ohala</surname> <given-names>J.</given-names></name></person-group> (<year>1984</year>). <article-title>An ethological perspective on common cross-language utilization of f0 of voice</article-title>. <source>Phonetica</source> <volume>41</volume>, <fpage>1</fpage>&#x02013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1159/000261706</pub-id><pub-id pub-id-type="pmid">6204347</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okubo</surname> <given-names>M.</given-names></name> <name><surname>Ishikawa</surname> <given-names>K.</given-names></name> <name><surname>Kobayashi</surname> <given-names>A.</given-names></name> <name><surname>Laeng</surname> <given-names>B.</given-names></name> <name><surname>Tommasi</surname> <given-names>L.</given-names></name></person-group> (<year>2015</year>). <article-title>Cool guys and warm husbands: the effect of smiling on male facial attractiveness for short- and long-term relationships</article-title>. <source>Evol. Psychol</source>. <volume>13</volume>, <fpage>1</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1177/1474704915600567</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Otta</surname> <given-names>E.</given-names></name> <name><surname>Abrosio</surname> <given-names>F. F. E.</given-names></name> <name><surname>Hoshino</surname> <given-names>R. L.</given-names></name></person-group> (<year>1982</year>). <article-title>Reading a smiling face: messages conveyed by various forms of smiling</article-title>. <source>Percept. Motor Skills</source> <volume>82</volume>, <fpage>1111</fpage>&#x02013;<lpage>1121</lpage>. <pub-id pub-id-type="doi">10.2466/pms.1996.82.3c.1111</pub-id><pub-id pub-id-type="pmid">8823879</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rauzy</surname> <given-names>S.</given-names></name> <name><surname>Amoyal</surname> <given-names>M.</given-names></name> <name><surname>Priego-Valverde</surname> <given-names>B.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;A measure of the smiling synchrony in the conversational face-to-face interaction corpus PACO-CHEESE,&#x0201D;</article-title> in <source>Proceedings of the Workshop on Smiling and Laughter across Contexts and the Life-Span within the 13th Language Resources and Evaluation Conference</source> (<publisher-loc>Marseille</publisher-loc>: <publisher-name>European Language Resources Association</publisher-name>), <fpage>16</fpage>&#x02013;<lpage>20</lpage>.</citation>
</ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Robson</surname> <given-names>J.</given-names></name> <name><surname>Beck</surname> <given-names>J. M.</given-names></name></person-group> (<year>1999</year>). <article-title>&#x0201C;Hearing smiles&#x02013;perceptual, acoustic and production aspects of labial spreading,&#x0201D;</article-title> in <source>Proc. 14th Int&#x00027;l Congress of Phonetic Sciences (ICPhS)</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>219</fpage>&#x02013;<lpage>222</lpage>.</citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rychlowska</surname> <given-names>M.</given-names></name> <name><surname>Jack</surname> <given-names>R. E.</given-names></name> <name><surname>Garrod</surname> <given-names>O. G. B.</given-names></name> <name><surname>Schyns</surname> <given-names>P. G.</given-names></name> <name><surname>Martin</surname> <given-names>J. D.</given-names></name> <name><surname>Niedenthal</surname> <given-names>P. M.</given-names></name></person-group> (<year>2017</year>). <article-title>Functional smiles: tools for love, sympathy, and war</article-title>. <source>Psychol. Sci</source>. <volume>28</volume>, <fpage>1259</fpage>&#x02013;<lpage>1270</lpage>. <pub-id pub-id-type="doi">10.1177/0956797617706082</pub-id><pub-id pub-id-type="pmid">28741981</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Schr&#x000F6;der</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Expressive speech synthesis: past, present, and possible futures,&#x0201D;</article-title> in <source>Affective Information Processing</source>, eds J. Tao and T. Tan (<publisher-loc>London</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>111</fpage>&#x02013;<lpage>126</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-84800-306-4_7</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Schr&#x000F6;der</surname> <given-names>M.</given-names></name> <name><surname>Auberge</surname> <given-names>V.</given-names></name> <name><surname>Cathiard</surname> <given-names>M.-A.</given-names></name></person-group> (<year>1998</year>). <article-title>&#x0201C;Can we hear smiles?&#x0201D;</article-title> in <source>Proc. Conference on Spoken Language Processing (ICSLP)</source> (<publisher-loc>Sydney, NSW</publisher-loc>), <fpage>559</fpage>&#x02013;<lpage>562</lpage>. <pub-id pub-id-type="doi">10.21437/ICSLP.1998-106</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stone</surname> <given-names>S.</given-names></name> <name><surname>Abdul-Hak</surname> <given-names>P.</given-names></name> <name><surname>Birkholz</surname> <given-names>P.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Perceptual cues for smiled voice - an articulatory synthesis study,&#x0201D;</article-title> in <source>Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2022</source>, eds O. Niebuhr, M. S. Lundmark, and H. Weston (<publisher-loc>Dresden</publisher-loc>: <publisher-name>TUD Press</publisher-name>), <fpage>131</fpage>&#x02013;<lpage>138</lpage>.</citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tartter</surname> <given-names>V.</given-names></name> <name><surname>Braun</surname> <given-names>D.</given-names></name></person-group> (<year>1994</year>). <article-title>Hearing smiles and frowns in normal and whisper registers</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>96</volume>, <fpage>2101</fpage>&#x02013;<lpage>2107</lpage>. <pub-id pub-id-type="doi">10.1121/1.410151</pub-id><pub-id pub-id-type="pmid">7963024</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tartter</surname> <given-names>V. C.</given-names></name></person-group> (<year>1980</year>). <article-title>Happy talk: perceptual and acoustic effects of smiling on speech</article-title>. <source>Percept. Psychophys</source>. <volume>27</volume>, <fpage>24</fpage>&#x02013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.3758/BF03199901</pub-id><pub-id pub-id-type="pmid">7367197</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thibault</surname> <given-names>P.</given-names></name> <name><surname>Levesque</surname> <given-names>M.</given-names></name> <name><surname>Gosselin</surname> <given-names>P.</given-names></name> <name><surname>Hess</surname> <given-names>U.</given-names></name></person-group> (<year>2012</year>). <article-title>The Duchenne marker is not a universal signal of smile authenticity&#x02013;but it can be learned!</article-title> <source>Soc. Psychol</source>. <volume>43</volume>, <fpage>215</fpage>&#x02013;<lpage>221</lpage>. <pub-id pub-id-type="doi">10.1027/1864-9335/a000122</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Torre</surname> <given-names>I.</given-names></name> <name><surname>Goslin</surname> <given-names>J.</given-names></name> <name><surname>White</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>If your device could smile: people trust happy-sounding artificial agents more</article-title>. <source>Comput. Hum. Behav</source>. <volume>105</volume>, <fpage>106215</fpage>. <pub-id pub-id-type="doi">10.1016/j.chb.2019.106215</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Trouvain</surname> <given-names>J.</given-names></name></person-group> (<year>2001</year>). <article-title>&#x0201C;Phonetic aspects of &#x02018;speech-laughs&#x02019;,&#x0201D;</article-title> in <source>Proc. Conference on Orality &#x00026; Gestuality (ORAGE)</source> (<publisher-loc>Aix-en-Provence</publisher-loc>), <fpage>634</fpage>&#x02013;<lpage>639</lpage>.</citation>
</ref>
<ref id="B46">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Trouvain</surname> <given-names>J.</given-names></name> <name><surname>Weiss</surname> <given-names>B.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;&#x000DC;berlegungen zu wahrnehmbarem L&#x000E4;cheln in synthetischen Stimmen,&#x0201D;</article-title> in <source>31st Conference Elektronische Sprachsignalverarbeitung</source> (<publisher-loc>Magdeburg</publisher-loc>), <fpage>26</fpage>&#x02013;<lpage>33</lpage>.</citation>
</ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Truong</surname> <given-names>K.</given-names></name> <name><surname>Trouvain</surname> <given-names>J.</given-names></name> <name><surname>Jansen</surname> <given-names>M.-P.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Towards an annotation scheme for complex laughter in speech corpora,&#x0201D;</article-title> in <source>Proc. Interspeech</source> (<publisher-loc>Graz</publisher-loc>), <fpage>529</fpage>&#x02013;<lpage>533</lpage>. <pub-id pub-id-type="doi">10.21437/Interspeech.2019-1557</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vinciarelli</surname> <given-names>A.</given-names></name> <name><surname>Pantic</surname> <given-names>M.</given-names></name> <name><surname>Heylen</surname> <given-names>D.</given-names></name> <name><surname>Pelachaud</surname> <given-names>C.</given-names></name> <name><surname>Poggi</surname> <given-names>I.</given-names></name> <name><surname>D&#x00027;Errico</surname> <given-names>F.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Bridging the gap between social animal and unsocial machine: a survey of social signal processing</article-title>. <source>IEEE Trans. Affect. Comput</source>. <volume>3</volume>, <fpage>69</fpage>&#x02013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1109/T-AFFC.2011.27</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wagner</surname> <given-names>P.</given-names></name> <name><surname>Beskow</surname> <given-names>J.</given-names></name> <name><surname>Betz</surname> <given-names>S.</given-names></name> <name><surname>Edlund</surname> <given-names>J.</given-names></name> <name><surname>Gustafson</surname> <given-names>J.</given-names></name> <name><surname>Henter</surname> <given-names>G. E.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>&#x0201C;Speech synthesis evaluation&#x02013;state-of-the-art assessment and suggestion for a novel research program,&#x0201D;</article-title> in <source>Proc. 10th ISCA Workshop on Speech Synthesis (SSW 10)</source> (<publisher-loc>Vienna</publisher-loc>), <fpage>105</fpage>&#x02013;<lpage>110</lpage>. <pub-id pub-id-type="doi">10.21437/SSW.2019-19</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="book"><person-group person-group-type="editor"><name><surname>Weiss</surname> <given-names>B.</given-names></name> <name><surname>Trouvain</surname> <given-names>J.</given-names></name> <name><surname>Barkat-Defradas</surname> <given-names>M.</given-names></name> <name><surname>Ohala</surname> <given-names>J.</given-names></name></person-group> (eds.). (<year>2020</year>). <source>Voice Attractiveness: Studies on Sexy, Likable, and Charismatic Speakers. Prosody, Phonology, and Phonetics</source>. <publisher-loc>Singapore</publisher-loc>: <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-981-15-6627-1</pub-id></citation>
</ref>
</ref-list> 
</back>
</article>