<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2023.1119355</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Talk like me: Exploring the feedback speech rate regulation strategy of the voice user interface for elderly people</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes"><name><surname>Wang</surname> <given-names>Junfeng</given-names></name>
<xref rid="c001" ref-type="corresp">
<sup>&#x002A;</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2129656/overview"/>
</contrib>
<contrib contrib-type="author"><name><surname>Yang</surname> <given-names>Shuyu</given-names></name>
</contrib>
<contrib contrib-type="author"><name><surname>Xu</surname> <given-names>Zhiyu</given-names></name>
</contrib>
</contrib-group>
<aff><institution>College of Design and Innovation, Shenzhen Technology University</institution>, <addr-line>Shenzhen</addr-line>, <country>China</country></aff>
<author-notes>
<fn id="fn0001" fn-type="edited-by">
<p>Edited by: Xin Zhang, Tianjin University, China</p>
</fn>
<fn id="fn0002" fn-type="edited-by">
<p>Reviewed by: Jeffrey Ho, Hong Kong Polytechnic University, Hong Kong, SAR China; Ra&#x00FA;l Marticorena, University of Burgos, Spain; Sheng Tan, Trinity University, United States</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Junfeng Wang, <email>wangjunfeng@sztu.edu.cn</email></corresp>
<fn id="fn0003" fn-type="other">
<p>This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>17</day>
<month>03</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1119355</elocation-id>
<history>
<date date-type="received">
<day>08</day>
<month>12</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>01</day>
<month>03</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Wang, Yang and Xu.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Wang, Yang and Xu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Voice user interface (VUI) is widely used in intelligent products due to its low learning cost. However, most such products do not consider the cognitive and language abilities of elderly people, which leads to low interaction efficiency, poor user experience, and unfriendliness toward them. This paper first analyzes the factors that influence the voice interaction behavior of elderly people: the speech rate of elderly people, the dialog task type, and the feedback word count. A voice interaction simulation experiment was then designed based on the Wizard of Oz testing method. Thirty subjects (<italic>M</italic>&#x2009;=&#x2009;61.86&#x2009;years old, SD&#x2009;=&#x2009;7.16; 15 males and 15 females) were invited to interact with the prototype of a voice robot through three kinds of dialog tasks and six configurations of feedback speech rate. The speech rates at which elderly people speak to a person and to a voice robot, as well as the feedback speech rates they expected for the three dialog tasks, were collected. The correlation between subjects&#x2019; speech rate and the expected feedback speech rate, and the influence of dialog task type and feedback word count on elderly people&#x2019;s expected feedback speech rate, were analyzed. The results show that elderly people speak to a voice robot at a lower speech rate than to a person, and they expected the robot&#x2019;s feedback speech rate to be lower than the rate at which they speak to the robot. There is a positive correlation between subjects&#x2019; speech rate and the expected feedback speech rate, which implies that elderly people with faster speech rates expected a faster feedback speech rate. There is no significant difference between elderly people&#x2019;s expected speech rates for non-goal-oriented and goal-oriented dialog tasks. Meanwhile, a negative correlation between the feedback word count and the expected feedback speech rate was found. This study extends the knowledge boundaries of VUI design by investigating the influencing factors of voice interaction between elderly people and VUI. The results also provide practical implications for developing suitable VUI for elderly people, especially for regulating the feedback speech rate of VUI.</p>
</abstract>
<kwd-group>
<kwd>voice user interface (VUI)</kwd>
<kwd>elderly people</kwd>
<kwd>feedback speech rate</kwd>
<kwd>regulation strategy</kwd>
<kwd>speech convergence</kwd>
</kwd-group>
<counts>
<fig-count count="7"/>
<table-count count="10"/>
<equation-count count="1"/>
<ref-count count="82"/>
<page-count count="13"/>
<word-count count="9032"/>
</counts>
</article-meta>
</front>
<body>
<sec id="sec1" sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>Research in speech recognition began in the 1950s. Early technology could only recognize the English pronunciation of the 10 digits (<xref ref-type="bibr" rid="ref40">Li, 2018</xref>; <xref ref-type="bibr" rid="ref66">Shahrebaki et al., 2018</xref>). Current voice interaction systems are employed in many devices that feature human-computer interaction technology. They can recognize complete human natural language utterances, forming a significant and booming market segment (<xref ref-type="bibr" rid="ref78">Zen et al., 2013</xref>). In the process of voice interaction, the user activates the device by speaking specific voice commands. After receiving the commands, the device recognizes them and generates feedback. The feedback is then transformed into a sound that simulates a human voice with artificial speech synthesis technology and played through the speaker to form voice feedback (<xref ref-type="bibr" rid="ref25">Hess and Zellman, 2018</xref>; <xref ref-type="bibr" rid="ref68">Singh et al., 2021</xref>).</p>
<p>The visual interaction interface necessitates learning its operation (<xref ref-type="bibr" rid="ref56">Page, 2014</xref>), understanding the meaning of graphical elements, and searching for the object to be operated within the visual range, all of which lead to high learning costs (<xref ref-type="bibr" rid="ref42">Liu and Ma, 2010</xref>; <xref ref-type="bibr" rid="ref3">Bai et al., 2020</xref>). Voice interaction, in contrast, requires only users&#x2019; short-term memory and clear verbal expression, with low learning costs. Thus, it is appropriate for children, the elderly, and people with visual impairment (<xref ref-type="bibr" rid="ref28">Huang and Liu, 2017</xref>; <xref ref-type="bibr" rid="ref81">Zhang and Zhang, 2019</xref>; <xref ref-type="bibr" rid="ref60">Pradhan et al., 2020</xref>; <xref ref-type="bibr" rid="ref27">Hua, 2021</xref>; <xref ref-type="bibr" rid="ref22">Guo et al., 2022</xref>). Some scholars have applied it to designing and developing products for the elderly. For instance, <xref ref-type="bibr" rid="ref1">Antonio et al. (2014)</xref> built an intelligent system with multichannel interaction for the elderly by collecting a language database from elderly users and constructing an elderly-specific speech recognizer. <xref ref-type="bibr" rid="ref31">Jia (2018)</xref> studied the elderly using the SJTU user research system and constructed corresponding voice interaction application scenarios for them, concluding from the experiment that, for task-oriented dialogs, short and simple dialogs can lessen the memory load of the elderly and increase task completion.</p>
<p>However, the design of current voice interaction systems has not considered the reduced hearing, cognitive, and comprehension abilities of the elderly, let alone made specific adjustments to the feedback speech rate of the system (<xref ref-type="bibr" rid="ref51">Murad et al., 2019</xref>). As a result, older users frequently have difficulty hearing or remembering while using voice interaction devices (<xref ref-type="bibr" rid="ref13">Czaja et al., 2006</xref>; <xref ref-type="bibr" rid="ref39">Lee and Coughlin, 2015</xref>), which significantly impairs their interaction effectiveness and user experience (<xref ref-type="bibr" rid="ref2">Baba et al., 2004</xref>; <xref ref-type="bibr" rid="ref38">Lee, 2015</xref>; <xref ref-type="bibr" rid="ref79">Zhang, 2021</xref>). This paper focuses on the feedback speech rate of voice interaction systems for the elderly. The feedback speech rate that the elderly expect in different task scenarios was collected through voice interaction simulation experiments, and a speech rate regulation strategy for the voice interaction system was constructed accordingly to improve the efficiency and user experience of the elderly when using the system.</p>
</sec>
<sec id="sec2">
<label>2.</label>
<title>Literature review</title>
<sec id="sec3">
<label>2.1.</label>
<title>Speech rate of the elderly</title>
<p>Communication is a process in which the speaker and the addressee exchange information through language; its purpose is to convey information. Speech accommodation theory (SAT) is a sociolinguistic theory (<xref ref-type="bibr" rid="ref76">Yuan, 1992</xref>; <xref ref-type="bibr" rid="ref46">Ma, 1998</xref>). The theory suggests that the addressee&#x2019;s speech act characteristics can serve as a reference standard by which the speaker regulates their own speech acts. Speech convergence is the phenomenon whereby, in daily conversation, the speaker&#x2019;s speech pattern (diction, speech rate, grammar, phonology, etc.) is influenced by the addressee&#x2019;s speech pattern and adjusted toward it to gain the addressee&#x2019;s approval and affirmation (<xref ref-type="bibr" rid="ref44">Beebe and Giles, 1984</xref>).</p>
<p>Speech convergence is more likely to occur when the speaker is a subordinate or junior and the conversation is serious. In this situation, the speaker&#x2019;s speech pattern will converge with that of the addressee (<xref ref-type="bibr" rid="ref8">Brennan and Clark, 1996</xref>; <xref ref-type="bibr" rid="ref4">Barr and Keysar, 2002</xref>; <xref ref-type="bibr" rid="ref21">Gijssels et al., 2016</xref>). Kemper&#x2019;s study found that when young people converse with the elderly, they slow their speech rate and reduce the length and complexity of their discourse to help the elderly understand the message, i.e., a speech regulation mechanism toward the elderly (<xref ref-type="bibr" rid="ref34">Kemper, 1994</xref>). A study by <xref ref-type="bibr" rid="ref32">Jiang (2017)</xref> noted that young people adopt speech convergence in telephone conversations with aged people, including but not limited to speech rate, average sentence length, and pause duration, aiming to assist aged people in understanding the message, gain their trust, and enhance emotional intimacy.</p>
<p>Human-computer voice interaction design aims to simulate human-to-human verbal communication (<xref ref-type="bibr" rid="ref51">Murad et al., 2019</xref>). Many intelligent voice devices define the voice interaction system as the user&#x2019;s &#x201C;intelligent voice assistant&#x201D; (<xref ref-type="bibr" rid="ref61">Rakotomalala et al., 2021</xref>) and refer to the user as the &#x201C;master&#x201D; during the conversation. This design reflects the relationship of ownership, subordination, and domination between the user and the voice device, comparable to the relationship between superiors and subordinates in interpersonal interactions (<xref ref-type="bibr" rid="ref54">Nass et al., 1994</xref>; <xref ref-type="bibr" rid="ref59">Powers and Kiesler, 2006</xref>; <xref ref-type="bibr" rid="ref55">Ostrowski et al., 2022</xref>). On the other hand, voice interaction devices are inanimate objects, and users have lower intimacy and trust during initial use. They also interact more cautiously, which is not conducive to conveying information effectively (<xref ref-type="bibr" rid="ref15">Dautenhahn, 2004</xref>; <xref ref-type="bibr" rid="ref26">H&#x00F6;flich and El Bayed, 2015</xref>; <xref ref-type="bibr" rid="ref69">Song et al., 2022</xref>). Thus, it is reasonable and essential to apply speech convergence strategies to improve the dialog relationship between voice interaction devices and users.</p>
<p>Based on speech accommodation theory, when elderly people interact with a system by voice and the system adjusts the speech characteristics of its feedback to follow theirs, the emotional experience of the elderly will be enhanced (<xref ref-type="bibr" rid="ref52">Myers et al., 2018</xref>, <xref ref-type="bibr" rid="ref53">2019</xref>). This paper analyzes the correlation between the speech rate of the elderly and their expected feedback speech rate, and summarizes feedback speech rate regulation strategies for voice interaction systems serving elderly people based on the experimental results.</p>
</sec>
<sec id="sec4">
<label>2.2.</label>
<title>Dialog task type</title>
<p>In existing research, dialog tasks are divided into non-task-oriented dialogs and task-oriented dialogs according to the purposes (<xref ref-type="bibr" rid="ref45">Luna-Garcia et al., 2018</xref>) for which users use voice interaction devices (<xref ref-type="bibr" rid="ref11">Chen et al., 2017</xref>).</p>
<p>Non-task-oriented dialogs primarily refer to forms of interaction in which users have no clear expectations or specific goals regarding the feedback from the voice interaction system. Typical applications include listening to music, stories, and operas. After being activated by the wake-up word, the device recognizes the type of content users want to listen to and starts to play the audio, and users enter the listening stage. Users do not initiate the voice interaction process again until the audio finishes or they are dissatisfied with it. In this usage scenario, the user&#x2019;s purpose is to pass the time and relieve loneliness. When interacting with the device, users need only recognize part of the information rather than deliberately memorize it; they merely have to confirm that the device is playing the desired content, which imposes a low cognitive load.</p>
<p>Task-oriented dialogs mainly refer to voice interaction systems assisting users in accomplishing specific tasks, such as checking the weather or making hotel or restaurant reservations, through single or multiple rounds of dialog. With a specific goal, users activate the device and give the corresponding voice command. Users then extract the needed information from the feedback played by the device and store it in short-term or long-term memory for use in the specific task. A more complex situation is that the voice interaction system conveys information to users through multi-round dialogs, such as multi-round quizzes and quiz games. In these dialogs, users must mobilize brain functions such as thinking and memory to participate in the interaction, and must also comprehend and analyze the received feedback in order to respond, which imposes the highest cognitive load (<xref ref-type="bibr" rid="ref64">Sayago et al., 2019</xref>; <xref ref-type="bibr" rid="ref50">Moore and Urakami, 2022</xref>).</p>
<p>As shown in <xref rid="tab1" ref-type="table">Table 1</xref>, this paper conducts a voice interaction simulation experiment based on these two kinds of dialog tasks to analyze the influence of dialog task type on the feedback speech rate expected by the elderly. However, even when people interact with a VUI without a clear goal, they still execute a dialog task, so calling such an interaction a non-task-oriented dialog can be confusing. Therefore, &#x201C;goal-oriented&#x201D; and &#x201C;non-goal-oriented&#x201D; are used hereafter to describe interactions with and without a clear goal, respectively.</p>
<table-wrap position="float" id="tab1">
<label>Table 1</label>
<caption>
<p>Functions and dialog tasks classifications of voice interaction devices.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Types of dialogue task</th>
<th align="left" valign="top">Examples of device functions</th>
<th align="left" valign="top">Interaction behavior characteristics</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">Non-goal-oriented</td>
<td align="left" valign="middle">Play music, stories, operas, etc.</td>
<td align="left" valign="middle">Users can recognize only some of the feedback information; the interaction is primarily for pleasure, with weak purposiveness.</td>
</tr>
<tr>
<td align="left" valign="middle">Goal-oriented</td>
<td align="left" valign="middle">The broadcast, weather forecast, news, English words, etc.</td>
<td align="left" valign="middle">Single- or multi-round dialogs; users have strong purposiveness, must recognize and memorize the feedback information from the system, and sometimes respond to the feedback.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="sec5">
<label>2.3.</label>
<title>Word count of single feedback</title>
<p>Human brain nerve cells gradually diminish after the age of 50 (<xref ref-type="bibr" rid="ref71">Svennerholm et al., 1997</xref>; <xref ref-type="bibr" rid="ref65">Scahill et al., 2003</xref>). Although the brain&#x2019;s basic functions can generally be maintained, brain tissue may atrophy to varying degrees in some individuals, resulting in memory loss and personality changes. The loss of brain nerve cells also weakens the brain&#x2019;s regulatory role over other organs, affecting their functional performance (<xref ref-type="bibr" rid="ref73">Wang, 2002</xref>). Compared to young people, the perceptual abilities of the elderly become blunted, and degenerative changes may occur in vision, visual perception, hearing, and auditory perception (<xref ref-type="bibr" rid="ref24">Hawthorn, 2000</xref>; <xref ref-type="bibr" rid="ref77">Yuan et al., 2006</xref>; <xref ref-type="bibr" rid="ref75">Wilkinson and Cornish, 2018</xref>).</p>
<p>Because of this deterioration in perceptual abilities, elderly people find it more challenging to accept new things (<xref ref-type="bibr" rid="ref82">Ziman and Walsh, 2018</xref>; <xref ref-type="bibr" rid="ref37">Kowalski et al., 2020</xref>) and technologies (<xref ref-type="bibr" rid="ref33">Kalimullah and Sushmitha, 2017</xref>) and to acquire external information. Therefore, voice interaction simulation experiments were designed with system feedback of different word counts to investigate the impact of the word count of a single feedback on the elderly&#x2019;s expected feedback speech rate.</p>
</sec>
<sec id="sec6">
<label>2.4.</label>
<title>Voice user interface for elderly people</title>
<p>Voice-activated human-machine interaction has developed rapidly in recent years and has attracted extensive attention from the academic community. VUIs offer elderly people multiple advantages over traditional GUI/hardware interfaces: they require fewer motor skills, deliver information efficiently, are intuitive to interact with, and carry rich meaning through tone, volume, intonation, and speed (<xref ref-type="bibr" rid="ref64">Sayago et al., 2019</xref>). More researchers are becoming involved in research on VUI for elderly people.</p>
<p><xref ref-type="bibr" rid="ref82">Ziman and Walsh (2018)</xref> studied the factors affecting seniors&#x2019; perceptions of VUI and indicated that familiarity, usability, habit, aversion to typing, and efficiency of voice input are the most critical factors influencing seniors&#x2019; perceptions and acceptance of VUI. Research focusing on the adoption and usage of VUI by elderly people with low technology use (<xref ref-type="bibr" rid="ref60">Pradhan et al., 2020</xref>; <xref ref-type="bibr" rid="ref69">Song et al., 2022</xref>) found that perceived usefulness, perceived ease of use, and trust are decisive factors that determine whether elderly people adopt VUI and that influence their attitudes toward it. After studying the patterns of tactics that people employ to overcome problems when interacting with VUI, <xref ref-type="bibr" rid="ref52">Myers et al. (2018)</xref> indicated that feedback strategy could be a worthwhile avenue for improving VUI&#x2019;s user experience.</p>
<p><xref ref-type="bibr" rid="ref48">Meena et al. (2014)</xref> proposed a data-driven approach to building models for online detection of suitable feedback response locations in the user&#x2019;s speech. The results of a user evaluation through human-computer interaction show that the model trained on speaker behavioral cues offers both smoother turn-transitions and more responsive system behavior. Other research concentrating on feedback position during conversation has identified feedback locations through multimodal models (<xref ref-type="bibr" rid="ref6">Boudin et al., 2021</xref>), an interdisciplinary corpus (<xref ref-type="bibr" rid="ref5">Boudin, 2022</xref>), and voice activity (<xref ref-type="bibr" rid="ref72">Truong et al., 2010</xref>; <xref ref-type="bibr" rid="ref16">Ekstedt and Skantze, 2022</xref>).</p>
</sec>
</sec>
<sec id="sec7" sec-type="methods">
<label>3.</label>
<title>Methods</title>
<p>This study explores the influence of dialog type and feedback word count on the user&#x2019;s expected feedback speech rate.</p>
<sec id="sec8">
<label>3.1.</label>
<title>Research design</title>
<p>This study applies Wizard of Oz testing to conduct voice interaction simulation experiments (<xref ref-type="bibr" rid="ref12">Cordasco et al., 2014</xref>; <xref ref-type="bibr" rid="ref47">Ma et al., 2022</xref>). Wizard of Oz testing is a method in which the tester acts as a &#x201C;wizard&#x201D; to manipulate the object to be tested to make it interact with the subject and collects relevant experimental data (<xref ref-type="bibr" rid="ref14">Dahlb&#x00E4;ck et al., 1993</xref>). This method is widely applied to study the usability and user acceptance of voice interaction systems, natural language applications, command languages, imaging systems, and pervasive computing applications in the prototype stage (<xref ref-type="bibr" rid="ref67">Shin et al., 2019</xref>). In the development of voice-interactive product design, Wizard of Oz can assist UX researchers in cutting costs by quickly testing the usability of products at different stages (<xref ref-type="bibr" rid="ref74">White and Lutter, 2003</xref>). Some studies used the Wizard of Oz testing to investigate the elderly&#x2019;s acceptance of smart home products equipped with voice-interactive systems (<xref ref-type="bibr" rid="ref58">Portet et al., 2013</xref>; <xref ref-type="bibr" rid="ref57">Porcheron et al., 2020</xref>).</p>
<p>Before the experiment, dialog tasks between users and voice interaction devices were designed based on the current mainstream voice interaction process. During the experiment, subjects were first informed about the experiment&#x2019;s background and the task requirements. Subjects then performed the appropriate voice interaction behavior in line with the task requirements, while the experimenter manipulated the product to provide feedback. The assistant collected subjects&#x2019; speech rate data during the process, as well as their expected feedback speech rate, with the speech rate satisfaction scale (<xref ref-type="bibr" rid="ref20">Ghosh et al., 2018</xref>). The assistant also observed and recorded subjects&#x2019; performance and attitudes. Finally, user interviews and quantitative analysis were conducted to explore users&#x2019; expectations of the feedback speech rate under different task scenarios (<xref ref-type="bibr" rid="ref30">Iniguez et al., 2021</xref>).</p>
</sec>
<sec id="sec9">
<label>3.2.</label>
<title>Subjects and settings</title>
<p>To select qualified subjects, the experimenter explained the process and conducted a pre-talk test before the experiment to ensure that subjects could hear the conversation clearly in the experimental environment and understand the overall process. Thirty elderly people with normal hearing and no obvious cognitive impairment were chosen for the study; 10 were between 60 and 70&#x2009;years old, six were 70&#x2009;years old or above, and the other 14 were between 50 and 60&#x2009;years old.</p>
<p>The experiment was conducted in a quiet indoor environment with soft light (<xref rid="fig1" ref-type="fig">Figure 1</xref>). The experimenter had previously explained the experimental task process to the subjects before the experiment. The subjects sat facing the 15.6-inch display, a Philips SPA510 dual-channel speaker was used to play the voice interaction content, and a Newman ZM02 microphone collected the user&#x2019;s voice commands.</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Environment for voice interaction simulation.</p>
</caption>
<graphic xlink:href="fpsyg-14-1119355-g001.tif"/>
</fig>
</sec>
<sec id="sec10">
<label>3.3.</label>
<title>Materials</title>
<sec id="sec11">
<label>3.3.1.</label>
<title>Dialog tasks</title>
<p>Currently, intelligent voice interaction devices are widely used in daily life. The cognitive load of the elderly is low when engaging in non-goal-oriented conversations and high when engaging in goal-oriented conversations.</p>
<p>To investigate the correlation between the user&#x2019;s expected feedback speech rate and the type of dialog task, Task 1 was set as a non-goal-oriented dialog in which an elderly person, bored at home, wants to listen to the radio for entertainment; it is a light-load interaction task. Task 2 was a goal-oriented dialog in which an elderly person checks the next day&#x2019;s weather before going out; it is a heavy-load interaction task. To find the relationship between the user&#x2019;s expected feedback speech rate and the word count of a single feedback, Task 3 was also designed as a goal-oriented dialog, in which the elderly person checks the next day&#x2019;s schedule. The word count of the feedback in Task 3 differs from that in Task 2.</p>
<p>In a text-to-speech system, the pause produced by a punctuation mark in the corpus is approximately as long as a word, so punctuation marks are counted in the total word count. <xref rid="tab2" ref-type="table">Table 2</xref> shows the three dialog tasks. The voice dialog is in Chinese, so the feedback words of the three tasks are counted in Chinese characters.</p>
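<p>As an illustration of this counting rule, the following minimal Python sketch counts Chinese characters and pause-producing punctuation marks together. The exact punctuation set and the example utterance are illustrative assumptions, not the paper&#x2019;s corpus.</p>

```python
# Word counting for Chinese feedback text: each Chinese character counts
# as one word, and each pause-producing punctuation mark also counts as
# one word (per the counting rule described above).
# The punctuation set and example sentence are illustrative assumptions.

PAUSE_PUNCTUATION = set("，。；：！？、")

def feedback_word_count(text: str) -> int:
    """Count CJK characters plus pause punctuation as 'words'."""
    hanzi = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    pauses = sum(1 for ch in text if ch in PAUSE_PUNCTUATION)
    return hanzi + pauses

# Hypothetical utterance: 11 characters + 2 punctuation marks = 13 words
print(feedback_word_count("明天深圳有小雨，记得带伞。"))
```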
<table-wrap position="float" id="tab2">
<label>Table 2</label>
<caption>
<p>Dialog task of voice interaction experiment.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="middle">No.</th>
<th align="left" valign="middle">Task</th>
<th align="center" valign="middle">Words count</th>
<th align="left" valign="middle">Type of dialog task</th>
<th align="left" valign="middle">Dialog scenario</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle" rowspan="2">Task 1</td>
<td align="left" valign="middle" rowspan="2">Listen to the news</td>
<td align="center" valign="middle" rowspan="2">37</td>
<td align="left" valign="middle" rowspan="2">Non-goal-oriented</td>
<td align="left" valign="top"><inline-graphic xlink:href="fpsyg-14-1119355-igr0001.tif"/> What is in the news today?</td>
</tr>
<tr>
<td align="left" valign="top"><inline-graphic xlink:href="fpsyg-14-1119355-igr0002.tif"/> International rating agency Fitch downgraded the credit ratings of 33 economic entities in the first half of the year due to the epidemic, which surged to a record high.</td>
</tr>
<tr>
<td align="left" valign="middle" rowspan="2">Task 2</td>
<td align="left" valign="middle" rowspan="2">Check the weather</td>
<td align="center" valign="middle" rowspan="2">37</td>
<td align="left" valign="middle" rowspan="4">Goal-oriented</td>
<td align="left" valign="top"><inline-graphic xlink:href="fpsyg-14-1119355-igr0003.tif"/> How is the weather in Shenzhen tomorrow?</td>
</tr>
<tr>
<td align="left" valign="top"><inline-graphic xlink:href="fpsyg-14-1119355-igr0004.tif"/> There will be light rain in Shenzhen tomorrow; the temperature is 28&#x2013;22 Celsius, which is suitable for wearing short sleeves or shirts. Please remember to bring an umbrella when you go out.</td>
</tr>
<tr>
<td align="left" valign="middle" rowspan="2">Task 3</td>
<td align="left" valign="middle" rowspan="2">Check the schedule</td>
<td align="center" valign="middle" rowspan="2">18</td>
<td align="left" valign="top"><inline-graphic xlink:href="fpsyg-14-1119355-igr0005.tif"/> What is the time of the physical examination tomorrow morning?</td>
</tr>
<tr>
<td align="left" valign="top"><inline-graphic xlink:href="fpsyg-14-1119355-igr0006.tif"/> Okay, your appointment for a physical examination is at 9:30 tomorrow morning.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To avoid sequential effects, a set of experiments was conducted with dialog tasks in three scenarios: Task 1, &#x201C;listen to the news,&#x201D; Task 2, &#x201C;check the weather,&#x201D; and Task 3, &#x201C;check the schedule.&#x201D; Each subject was required to complete six experiments covering all tasks, with the six configurations of feedback speech rate presented in random order.</p>
</sec>
<sec id="sec12">
<label>3.3.2.</label>
<title>Feedback speech rate</title>
<p>Speech rate typically refers to the speed of articulation, though it can also refer to the auditory perceptual impression of the pacing of words (<xref ref-type="bibr" rid="ref10">Cao, 2003</xref>). The acceptable speech rate is below 300 words/min; above this range, listeners may have difficulty following the conversation (<xref ref-type="bibr" rid="ref58">Portet et al., 2013</xref>). In addition, a speech rate of 100&#x2013;150 syllables/min is considered a &#x201C;super-slow speech rate,&#x201D; which is rare in daily conversation (<xref ref-type="bibr" rid="ref49">Meng, 2006</xref>). Therefore, six speech rates were defined as the feedback speed configurations for the experiment, as shown in <xref rid="tab3" ref-type="table">Table 3</xref>: 2.25 words/s (135 words/min), 2.75 words/s (165 words/min), 3.25 words/s (195 words/min), 3.75 words/s (225 words/min), 4.25 words/s (255 words/min), and 4.75 words/s (285 words/min).</p>
<table-wrap position="float" id="tab3">
<label>Table 3</label>
<caption>
<p>Configurations of feedback speech rate.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">No.</th>
<th align="center" valign="top">1</th>
<th align="center" valign="top">2</th>
<th align="center" valign="top">3</th>
<th align="center" valign="top">4</th>
<th align="center" valign="top">5</th>
<th align="center" valign="top">6</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">Feedback speech rate (words/s)</td>
<td align="center" valign="middle">2.25</td>
<td align="center" valign="middle">2.75</td>
<td align="center" valign="middle">3.25</td>
<td align="center" valign="middle">3.75</td>
<td align="center" valign="middle">4.25</td>
<td align="center" valign="middle">4.75</td>
</tr>
</tbody>
</table>
</table-wrap>
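<p>The six configurations follow a fixed step of 0.5 words/s, and the words/min values quoted in the text can be checked with a few lines of arithmetic:</p>

```python
# Six feedback speech-rate gears: 2.25 to 4.75 words/s in 0.5 steps.
rates_words_per_s = [2.25 + 0.5 * i for i in range(6)]
# Conversion to words/min, matching the values quoted in the text.
rates_words_per_min = [r * 60 for r in rates_words_per_s]

print(rates_words_per_s)
print(rates_words_per_min)
```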
</sec>
<sec id="sec13">
<label>3.3.3.</label>
<title>Feedback corpus</title>
<p>The text of the feedback corpus was synthesized into speech using the Swift text-to-speech system; the pronunciation source was the built-in standard female voice with a 16&#x2009;kHz sampling rate. <xref rid="fig2" ref-type="fig">Figure 2</xref> depicts the process of synthesizing the feedback corpus. Firstly, the text of the feedback corpus was designed according to the purpose of the dialog. Secondly, each speech clip was generated by Swift, the text-to-speech system. Thirdly, the synthesized speech was imported into Adobe Audition, its speech rate was modified according to the configurations in <xref rid="tab3" ref-type="table">Table 3</xref>, and the corpus was then exported for the experiment.</p>
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>The process of synthesizing the feedback corpus.</p>
</caption>
<graphic xlink:href="fpsyg-14-1119355-g002.tif"/>
</fig>
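<p>The speech-rate modification step in Adobe Audition amounts to time-stretching each synthesized clip so that it plays at the target gear. A minimal sketch of the required stretch factor, assuming the clip&#x2019;s word count and duration are known; the clip values below are hypothetical, not taken from the study:</p>

```python
def stretch_factor(word_count, duration_s, target_rate):
    """Factor by which to stretch a clip's duration so that it plays
    at target_rate words/s; a factor > 1 slows the clip down."""
    current_rate = word_count / duration_s  # words/s before stretching
    return current_rate / target_rate

# Hypothetical 37-word feedback clip synthesized with a 10 s duration:
for gear in [2.25, 2.75, 3.25, 3.75, 4.25, 4.75]:
    factor = stretch_factor(37, 10.0, gear)
    print(f"{gear} words/s -> stretch duration by {factor:.3f}x")
```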
</sec>
</sec>
<sec id="sec14">
<label>3.4.</label>
<title>Experimental process</title>
<p>The experiment was designed based on Wizard of Oz testing, in which the subject initiated the voice interaction and the experimenter acted as a &#x201C;wizard,&#x201D; operating the prototype of the voice interaction system to give feedback. The prototype consisted of an avatar and the synthesized feedback corpus. To guide subjects, slides incorporating the prototype were created (see <xref rid="fig3" ref-type="fig">Figure 3</xref>). The dialog between subjects and the voice robot prototype was simulated by switching the slides.</p>
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>The prototype of the voice interaction system.</p>
</caption>
<graphic xlink:href="fpsyg-14-1119355-g003.tif"/>
</fig>
<p>Take Task 1, &#x201C;listen to the news,&#x201D; as an example. The experimenter explained the experimental process to the subject before the experiment started. To ensure the experiment ran smoothly, the subjects were informed of the simulated scenario of the dialog task and activated the voice interaction system in the pre-test. Slide (1) is the experiment guide, showing the subjects the dialog scenario and the wake-up words. After issuing the wake-up word, subjects entered slide (2), and the experimenter played the feedback &#x201C;Hey, I&#x2019;m here!&#x201D; indicating that the system was activated. Slide (3), another experiment guide, suggested to the subjects the dialog scenario and the voice commands to be issued. During the experiment, waiting and speaking animations were added to the robot icons to enhance the subjects&#x2019; immersion.</p>
<p>The experiment was conducted with a 5-point Likert scale to evaluate subjects&#x2019; satisfaction with the feedback speech rate of the system (<xref ref-type="bibr" rid="ref17">Feng, 2002</xref>). Subjects&#x2019; satisfaction was calculated based on the options corresponding to the ratings in <xref rid="tab4" ref-type="table">Table 4</xref>. A higher satisfaction score of the feedback speech rate indicates a higher acceptance by the user.</p>
<table-wrap position="float" id="tab4">
<label>Table 4</label>
<caption>
<p>Speech rate satisfaction scale.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Options</th>
<th align="left" valign="top">Description</th>
<th align="center" valign="top">Rating</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">1</td>
<td align="left" valign="top">The system speaks too slowly, and I cannot accept it.</td>
<td align="center" valign="top">1 point</td>
</tr>
<tr>
<td align="left" valign="top">2</td>
<td align="left" valign="top">The system speaks slowly, but I can accept it.</td>
<td align="center" valign="top">3 points</td>
</tr>
<tr>
<td align="left" valign="top">3</td>
<td align="left" valign="top">The system speaks at just the right speed.</td>
<td align="center" valign="top">5 points</td>
</tr>
<tr>
<td align="left" valign="top">4</td>
<td align="left" valign="top">The system speaks fast, but I can accept it.</td>
<td align="center" valign="top">3 points</td>
</tr>
<tr>
<td align="left" valign="top">5</td>
<td align="left" valign="top">The system speaks too fast, and I cannot accept it.</td>
<td align="center" valign="top">1 point</td>
</tr>
</tbody>
</table>
</table-wrap>
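<p>The scale folds both directions of dissatisfaction (too slow, too fast) onto the same ratings, so satisfaction can be computed with a simple option-to-rating mapping. A minimal sketch; the satisfaction helper is hypothetical, while the mapping itself is taken from Table 4:</p>

```python
# Option -> rating mapping from the satisfaction scale (Table 4):
# "too slow"/"too fast" -> 1 point, "slow/fast but acceptable" -> 3,
# "just the right speed" -> 5.
RATING = {1: 1, 2: 3, 3: 5, 4: 3, 5: 1}

def satisfaction(options):
    """Mean satisfaction score over a list of chosen options."""
    return sum(RATING[o] for o in options) / len(options)

# Hypothetical responses of one subject across several sessions:
print(satisfaction([3, 3, 4, 2]))
```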
<p>While performing the experimental task, the subjects&#x2019; speech content was recorded with Adobe Audition CC2020 and a microphone in 16&#x2009;kHz, mono, 32-bit audio format.</p>
<p>When subjects finished all dialog tasks, some open-ended questions were asked. The main purpose of this post-experiment interview was to learn about elderly people&#x2019;s attitudes toward, and problems in using, the VUI and voice robot. The questions included: &#x201C;What do you think about the experience of talking with the VUI or voice robot?,&#x201D; &#x201C;Did you have any problems when you were talking with the VUI or voice robot?,&#x201D; and &#x201C;Did you always understand what the VUI or voice robot said? If not, what made it incomprehensible?&#x201D;</p>
</sec>
</sec>
<sec id="sec15" sec-type="results">
<label>4.</label>
<title>Results</title>
<sec id="sec16">
<label>4.1.</label>
<title>Speech rate</title>
<p>The number of words per second during the subjects&#x2019; speech was defined as the subject&#x2019;s speech rate, noted as <italic>V<sub>s</sub></italic> in words/s (<xref ref-type="bibr" rid="ref41">Li et al., 2019</xref>). Subjects&#x2019; raw recordings were imported into Adobe Audition 2020 for further processing. According to the speech rate test methodology proposed by <xref ref-type="bibr" rid="ref35">Kim et al. (2015)</xref>, each subject&#x2019;s complete single utterances were extracted from the recording for speech rate analysis. If the subject paused for longer than 2&#x2009;s within a single utterance, the pause was deleted to obtain the corpus in a stable state, which was used to extract the subject&#x2019;s speech rate. If the pause lasted between 1 and 2&#x2009;s, the recorded video was checked to determine whether the subject showed obvious doubt or nervousness, and only the corpus in which the subject was in a stable state was chosen. Pauses shorter than 1&#x2009;s were considered normal.</p>
<p>The number of words is denoted as <italic>S</italic> and the total duration after deleting the silent pauses is denoted as <italic>T</italic> in single speech. The subjects&#x2019; speech rate <italic>V<sub>s</sub></italic> during the voice interaction is calculated by <xref ref-type="disp-formula" rid="EQ1">Equation (1)</xref>.</p>
<disp-formula id="EQ1">
<label>(1)</label>
<mml:math id="M1">
<mml:mrow>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
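<p>Equation (1), combined with the pause-handling rules above, can be sketched as follows. Segment boundaries are assumed to come from manual annotation of the recording; for simplicity, the sketch keeps any pause shorter than 2&#x2009;s, assuming the 1&#x2013;2&#x2009;s cases have already been screened by the video review:</p>

```python
def speech_rate(word_count, segments):
    """V_s = S / T (Equation 1): word count divided by speaking time.
    segments is a list of (start_s, end_s) tuples for one utterance;
    pauses of 2 s or longer between segments are deleted, while
    shorter pauses are counted as part of the speaking time."""
    total = 0.0
    for i, (start, end) in enumerate(segments):
        total += end - start
        if i + 1 < len(segments):
            pause = segments[i + 1][0] - end
            if pause < 2.0:  # short pauses stay in the stable corpus
                total += pause
    return word_count / total

# 12 words in two segments separated by a 3 s pause (pause deleted):
print(speech_rate(12, [(0.0, 2.0), (5.0, 7.0)]))
```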
<p>Before the experiment started, the subjects&#x2019; speech when talking to the experimenter was recorded, and six sentences were extracted to analyze the speech rate at which each subject spoke to the experimenter. At the end of the experiment, six sets of speech rate data were collected from each subject separately. The mean speech rates at which subjects spoke to a person and to the voice robot are shown in <xref rid="fig4" ref-type="fig">Figure 4</xref>. One subject did not complete the experiment, leaving data from 29 subjects. A paired <italic>t</italic>-test was performed on the two kinds of speech rates, and the results show a significant difference (<italic>p</italic>&#x2009;&#x003C;&#x2009;0.001), meaning subjects speak to a voice robot at a very different speech rate than to a person.</p>
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption>
<p>Distribution of subjects&#x2019; speech rate.</p>
</caption>
<graphic xlink:href="fpsyg-14-1119355-g004.tif"/>
</fig>
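<p>The paired comparison reported above can be reproduced with a standard paired <italic>t</italic>-test. A minimal sketch using scipy, with illustrative placeholder rates rather than the study&#x2019;s data:</p>

```python
from scipy import stats

# Illustrative placeholder speech rates in words/s (NOT the study's data):
rate_to_person = [4.1, 3.8, 4.5, 3.9, 4.2, 4.0, 4.4, 3.7]
rate_to_robot = [3.5, 3.2, 3.9, 3.4, 3.6, 3.3, 3.8, 3.1]

# Paired t-test: each subject contributes one rate in each condition.
t, p = stats.ttest_rel(rate_to_person, rate_to_robot)
print(f"t = {t:.3f}, p = {p:.4f}")
```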
</sec>
<sec id="sec17">
<label>4.2.</label>
<title>Correlation between subjects&#x2019; speech rate and expected feedback speech rate</title>
<p>Experiments were conducted randomly with subjects on the three dialog tasks, each with six feedback speech rates from the voice robot. After each session, subjects were asked to score the feedback speech rate on the satisfaction scale shown in <xref rid="tab4" ref-type="table">Table 4</xref>. At the end of the experiment, the speech rate with the highest score was determined as the subject&#x2019;s expected feedback speech rate for that task. If more than one feedback speech rate configuration received the highest score, the mean of these configurations was defined as the expected feedback speech rate.</p>
<p>The subjects&#x2019; expected feedback speech rates for the three dialog tasks are, respectively, noted as <italic>W</italic><sub>1</sub>, <italic>W</italic><sub>2</sub>, and <italic>W</italic><sub>3</sub>, which are shown in <xref rid="fig5" ref-type="fig">Figure 5</xref>. A paired <italic>t</italic>-test was performed on the speech rate at which subjects spoke to the voice robot and the expected feedback speech rate. The results (<xref rid="tab5" ref-type="table">Table 5</xref>) show significant differences (<italic>p</italic>&#x2009;&#x003C;&#x2009;0.001) between the two variables for all three dialog tasks. That means, for all combinations of dialog task type and feedback word count, subjects expected a feedback speech rate lower than the rate at which they themselves spoke to the voice robot.</p>
<fig position="float" id="fig5">
<label>Figure 5</label>
<caption>
<p>Distribution of subjects&#x2019; expected speech rate of feedback.</p>
</caption>
<graphic xlink:href="fpsyg-14-1119355-g005.tif"/>
</fig>
<table-wrap position="float" id="tab5">
<label>Table 5</label>
<caption>
<p>Paired <italic>t</italic>-test of the subjects&#x2019; speech rate and the expected feedback speech rate.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th rowspan="2"/>
<th align="center" valign="middle" rowspan="2">Mean</th>
<th align="center" valign="middle" rowspan="2">Std. deviation</th>
<th align="center" valign="middle" rowspan="2">Std. error mean</th>
<th align="center" valign="middle" colspan="2">95% confidence interval of the difference</th>
<th align="center" valign="middle" rowspan="2">
<italic>t</italic>
</th>
<th align="center" valign="middle" rowspan="2">df</th>
<th align="center" valign="middle" rowspan="2">Sig. (two-tailed)</th>
</tr>
<tr>
<th align="center" valign="middle">Lower</th>
<th align="center" valign="middle">Upper</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">Pair 1 <italic>V<sub>s</sub></italic> &#x2013; <italic>W</italic><sub>1</sub></td>
<td align="center" valign="middle">0.872414</td>
<td align="center" valign="middle">0.595469</td>
<td align="center" valign="middle">0.110576</td>
<td align="center" valign="middle">0.645910</td>
<td align="center" valign="middle">1.098918</td>
<td align="center" valign="middle">7.890</td>
<td align="center" valign="middle">28</td>
<td align="center" valign="middle">0.000</td>
</tr>
<tr>
<td align="left" valign="middle">Pair 2 <italic>V<sub>s</sub></italic> &#x2013; <italic>W</italic><sub>2</sub></td>
<td align="center" valign="middle">0.827931</td>
<td align="center" valign="middle">0.517324</td>
<td align="center" valign="middle">0.096065</td>
<td align="center" valign="middle">0.631152</td>
<td align="center" valign="middle">1.024711</td>
<td align="center" valign="middle">8.618</td>
<td align="center" valign="middle">28</td>
<td align="center" valign="middle">0.000</td>
</tr>
<tr>
<td align="left" valign="middle">Pair 3 <italic>V<sub>s</sub></italic> &#x2013; <italic>W</italic><sub>3</sub></td>
<td align="center" valign="middle">0.641034</td>
<td align="center" valign="middle">0.514465</td>
<td align="center" valign="middle">0.095534</td>
<td align="center" valign="middle">0.445343</td>
<td align="center" valign="middle">0.836726</td>
<td align="center" valign="middle">6.710</td>
<td align="center" valign="middle">28</td>
<td align="center" valign="middle">0.000</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Pearson correlation analysis was conducted on the subjects&#x2019; speech rate <italic>V<sub>s</sub></italic> and the subjects&#x2019; expected system feedback speech rates <italic>W</italic><sub>1</sub>, <italic>W</italic><sub>2</sub>, and <italic>W</italic><sub>3</sub>; the results are shown in <xref rid="tab6" ref-type="table">Table 6</xref>. For Task 1, the results indicate no significant correlation (<italic>r</italic>&#x2009;=&#x2009;0.113, <italic>p</italic>&#x2009;=&#x2009;0.561) between the subjects&#x2019; speech rate and the expected feedback speech rate. For Task 2 and Task 3, there are significant positive correlations (<italic>r</italic>&#x2009;=&#x2009;0.417, <italic>p</italic>&#x2009;=&#x2009;0.025 and <italic>r</italic>&#x2009;=&#x2009;0.399, <italic>p</italic>&#x2009;=&#x2009;0.032) between the subjects&#x2019; speech rate and the expected feedback speech rate.</p>
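<p>The correlation analysis can be reproduced with scipy&#x2019;s Pearson correlation. A minimal sketch with illustrative placeholder data, not the study&#x2019;s measurements:</p>

```python
from scipy import stats

# Illustrative placeholder rates in words/s (NOT the study's data):
v_s = [3.2, 3.6, 3.9, 4.1, 3.4, 3.8, 4.3, 3.5]  # subjects' own rates
w_2 = [3.0, 3.3, 3.7, 3.9, 3.1, 3.5, 4.0, 3.2]  # expected feedback rates

r, p = stats.pearsonr(v_s, w_2)
print(f"r = {r:.3f}, p = {p:.4f}")
```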
<table-wrap position="float" id="tab6">
<label>Table 6</label>
<caption>
<p>Correlation between the subjects&#x2019; speech rate and the expected feedback speech rate.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th/>
<th align="center" valign="top">
<italic>V<sub>s</sub></italic>
</th>
<th align="center" valign="top">
<italic>W</italic>
<sub>1</sub>
</th>
<th align="center" valign="top">
<italic>W</italic>
<sub>2</sub>
</th>
<th align="center" valign="top">
<italic>W</italic>
<sub>3</sub>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="3">
<italic>V<sub>s</sub></italic>
</td>
<td align="left" valign="top">Pearson correlation</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0.113</td>
<td align="center" valign="top">0.417&#x002A;</td>
<td align="center" valign="top">0.399<sup>&#x002A;</sup></td>
</tr>
<tr>
<td align="left" valign="top">Sig. (two-tailed)</td>
<td/>
<td align="center" valign="top">0.561</td>
<td align="center" valign="top">0.025</td>
<td align="center" valign="top">0.032</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>N</italic>
</td>
<td align="center" valign="top">29</td>
<td align="center" valign="top">29</td>
<td align="center" valign="top">29</td>
<td align="center" valign="top">29</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="3">
<italic>W</italic>
<sub>1</sub>
</td>
<td align="left" valign="top">Pearson correlation</td>
<td align="center" valign="top">0.113</td>
<td align="center" valign="top">1</td>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">Sig. (two-tailed)</td>
<td align="center" valign="top">0.561</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">
<italic>N</italic>
</td>
<td align="center" valign="top">29</td>
<td align="center" valign="top">29</td>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top" rowspan="3">
<italic>W</italic>
<sub>2</sub>
</td>
<td align="left" valign="top">Pearson correlation</td>
<td align="center" valign="top">0.417&#x002A;</td>
<td/>
<td align="center" valign="top">1</td>
<td/>
</tr>
<tr>
<td align="left" valign="top">Sig. (two-tailed)</td>
<td align="center" valign="top">0.025</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">
<italic>N</italic>
</td>
<td align="center" valign="top">29</td>
<td/>
<td align="center" valign="top">29</td>
<td/>
</tr>
<tr>
<td align="left" valign="top" rowspan="3">
<italic>W</italic>
<sub>3</sub>
</td>
<td align="left" valign="top">Pearson correlation</td>
<td align="center" valign="top">0.399<sup>&#x002A;</sup></td>
<td align="center" valign="top">0.336</td>
<td/>
<td align="center" valign="top">1</td>
</tr>
<tr>
<td align="left" valign="top">Sig. (two-tailed)</td>
<td align="center" valign="top">0.032</td>
<td align="center" valign="top">0.075</td>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">
<italic>N</italic>
</td>
<td align="center" valign="top">29</td>
<td align="center" valign="top">29</td>
<td/>
<td align="center" valign="top">29</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The variations of the expected system feedback speech rates <italic>W</italic><sub>2</sub> and <italic>W</italic><sub>3</sub> with the subjects&#x2019; speech rate <italic>V<sub>s</sub></italic> are shown in <xref rid="fig6" ref-type="fig">Figures 6</xref>, <xref rid="fig7" ref-type="fig">7</xref>, respectively. A linear regression analysis was conducted to characterize the correlation between subjects&#x2019; speech rates and the expected feedback speech rates; the linear regression models are also shown in <xref rid="fig6" ref-type="fig">Figures 6</xref>, <xref rid="fig7" ref-type="fig">7</xref>. The <italic>F</italic>-test (<italic>F</italic>&#x2009;=&#x2009;5.670, <italic>p</italic>&#x2009;=&#x2009;0.025; <italic>F</italic>&#x2009;=&#x2009;5.122, <italic>p</italic>&#x2009;=&#x2009;0.032) and <italic>t</italic>-test (<italic>t</italic>&#x2009;=&#x2009;2.381, <italic>p</italic>&#x2009;=&#x2009;0.025; <italic>t</italic>&#x2009;=&#x2009;2.263, <italic>p</italic>&#x2009;=&#x2009;0.032) confirm the significance of the regression models and the regression coefficients.</p>
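<p>For a simple linear regression such as these models, the <italic>F</italic>-test of the model and the <italic>t</italic>-test of the regression coefficient coincide (<italic>F</italic>&#x2009;=&#x2009;<italic>t</italic><sup>2</sup>), which is why the two tests report the same <italic>p</italic>-values above. A minimal sketch with illustrative placeholder data, not the study&#x2019;s measurements:</p>

```python
from scipy import stats

# Illustrative placeholder rates in words/s (NOT the study's data):
v_s = [3.2, 3.6, 3.9, 4.1, 3.4, 3.8, 4.3, 3.5]  # predictor
w = [3.1, 3.3, 3.6, 3.8, 3.2, 3.5, 3.9, 3.3]    # response

res = stats.linregress(v_s, w)
print(f"W = {res.slope:.3f} * V_s + {res.intercept:.3f}")
print(f"r^2 = {res.rvalue ** 2:.3f}, slope t-test p = {res.pvalue:.4f}")
```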
<fig position="float" id="fig6">
<label>Figure 6</label>
<caption>
<p>The variation of the expected feedback speech rate <italic>W<sub>2</sub></italic> with the subjects&#x2019; speech rate <italic>V<sub>s</sub></italic>.</p>
</caption>
<graphic xlink:href="fpsyg-14-1119355-g006.tif"/>
</fig>
<fig position="float" id="fig7">
<label>Figure 7</label>
<caption>
<p>The variation of the expected feedback speech rate <italic>W<sub>3</sub></italic> with the subjects&#x2019; speech rate <italic>V<sub>s</sub></italic>.</p>
</caption>
<graphic xlink:href="fpsyg-14-1119355-g007.tif"/>
</fig>
</sec>
<sec id="sec18">
<label>4.3.</label>
<title>The influence of dialog task on expected feedback speech rate</title>
<p>A test of homogeneity of variance was conducted on the subjects&#x2019; expected feedback speech rates <italic>W</italic><sub>1</sub>, <italic>W</italic><sub>2</sub>, and <italic>W</italic><sub>3</sub>. The results are shown in <xref rid="tab7" ref-type="table">Table 7</xref>. The significance of Levene&#x2019;s test was <italic>p</italic>&#x2009;=&#x2009;0.866, which validates the homogeneity of variance of the collected data. As shown in <xref rid="tab8" ref-type="table">Table 8</xref>, a one-way ANOVA was carried out on the subjects&#x2019; expected feedback speech rates for the different types of dialog tasks. The results indicate no significant difference (<italic>p</italic>&#x2009;=&#x2009;0.065) among the three expected feedback speech rates across dialog task types. That means dialog task type did not significantly affect the expected feedback speech rate.</p>
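<p>Both checks, the homogeneity of variance test and the one-way ANOVA, are available in scipy. A minimal sketch with illustrative placeholder groups, not the study&#x2019;s data:</p>

```python
from scipy import stats

# Illustrative placeholder expected rates per task (NOT the study's data):
w1 = [3.5, 3.7, 3.4, 3.8, 3.6, 3.5, 3.9, 3.6]
w2 = [3.6, 3.8, 3.5, 3.7, 3.6, 3.4, 3.8, 3.7]
w3 = [3.8, 3.9, 3.7, 4.0, 3.8, 3.7, 4.1, 3.9]

lev_stat, lev_p = stats.levene(w1, w2, w3)  # Levene's homogeneity test
f_stat, f_p = stats.f_oneway(w1, w2, w3)    # one-way ANOVA
print(f"Levene p = {lev_p:.3f}; ANOVA F = {f_stat:.3f}, p = {f_p:.4f}")
```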
<table-wrap position="float" id="tab7">
<label>Table 7</label>
<caption>
<p>Homogeneity test of variance for <italic>W</italic><sub>1</sub>, <italic>W</italic><sub>2</sub>, and <italic>W</italic><sub>3</sub>.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th align="left" valign="top" colspan="2">Levene statistic</th>
<th align="center" valign="top">df 1</th>
<th align="center" valign="top">df 2</th>
<th align="center" valign="top">Sig.</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle" rowspan="4">Expected feedback speech rate<break/><italic>W</italic></td>
<td align="left" valign="middle">Based on average</td>
<td align="center" valign="middle">0.144</td>
<td align="center" valign="middle">2</td>
<td align="center" valign="middle">84</td>
<td align="center" valign="middle">0.866</td>
</tr>
<tr>
<td align="left" valign="middle">Based on median</td>
<td align="center" valign="middle">0.182</td>
<td align="center" valign="middle">2</td>
<td align="center" valign="middle">84</td>
<td align="center" valign="middle">0.834</td>
</tr>
<tr>
<td align="left" valign="middle">Based on the median and with adjusted degrees of freedom</td>
<td align="center" valign="middle">0.182</td>
<td align="center" valign="middle">2</td>
<td align="center" valign="middle">79.445</td>
<td align="center" valign="middle">0.834</td>
</tr>
<tr>
<td align="left" valign="middle">Based on trimmed mean</td>
<td align="center" valign="middle">0.176</td>
<td align="center" valign="middle">2</td>
<td align="center" valign="middle">84</td>
<td align="center" valign="middle">0.839</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="tab8">
<label>Table 8</label>
<caption>
<p>One-way ANOVA test for expected feedback speech rate in different types of dialog tasks.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="top" colspan="6">Expected feedback speech rate <italic>W</italic></th>
</tr>
<tr>
<th/>
<th align="center" valign="top">Sum of squares</th>
<th align="center" valign="top">df</th>
<th align="center" valign="top">Mean square</th>
<th align="center" valign="top">
<italic>F</italic>
</th>
<th align="center" valign="top">Sig.</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">Between groups</td>
<td align="center" valign="middle">0.368</td>
<td align="center" valign="middle">1</td>
<td align="center" valign="middle">0.368</td>
<td align="center" valign="middle">3.493</td>
<td align="center" valign="middle">0.065</td>
</tr>
<tr>
<td align="left" valign="middle">Within groups</td>
<td align="center" valign="middle">8.950</td>
<td align="center" valign="middle">85</td>
<td align="center" valign="middle">0.105</td>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="middle">Total</td>
<td align="center" valign="middle">9.318</td>
<td align="center" valign="middle">85</td>
<td/>
<td/>
<td/>
</tr>
</tbody>
</table>
</table-wrap>
<p>The <italic>post hoc</italic> LSD-<italic>t</italic>-test was conducted on the users&#x2019; expected system feedback speech rate <italic>W</italic><sub>1</sub>, <italic>W</italic><sub>2</sub>, and <italic>W</italic><sub>3</sub>, and the correlation between the data was examined in two pairs. The results are depicted in <xref rid="tab9" ref-type="table">Table 9</xref>. There is no significant difference between <italic>W</italic><sub>1</sub> and <italic>W</italic><sub>2</sub> (<italic>p</italic>&#x2009;=&#x2009;0.831) under the conditions of different types of dialog tasks and the same number of words of feedback, which also indicates that the dialog task type does not affect the elderly&#x2019;s satisfaction with the expected feedback speech rate.</p>
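<p>Fisher&#x2019;s LSD post hoc test is a set of pairwise <italic>t</italic>-tests that share the pooled within-group mean square from the ANOVA. A minimal sketch; the lsd_test helper is hypothetical, and the groups are illustrative placeholders, not the study&#x2019;s data:</p>

```python
import math
from scipy import stats

def lsd_test(groups):
    """Fisher's LSD post hoc test: pairwise comparisons using the
    pooled within-group mean square (MSE) from a one-way ANOVA."""
    k = len(groups)
    means = [sum(g) / len(g) for g in groups]
    sse = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_err = sum(len(g) for g in groups) - k
    mse = sse / df_err
    results = []
    for i in range(k):
        for j in range(i + 1, k):
            se = math.sqrt(mse * (1 / len(groups[i]) + 1 / len(groups[j])))
            t = (means[i] - means[j]) / se
            p = 2 * stats.t.sf(abs(t), df_err)  # two-tailed p-value
            results.append(((i + 1, j + 1), means[i] - means[j], p))
    return results

# Illustrative placeholder groups (NOT the study's data):
w1 = [3.5, 3.7, 3.4, 3.8, 3.6, 3.5, 3.9, 3.6]
w2 = [3.6, 3.8, 3.5, 3.7, 3.6, 3.4, 3.8, 3.7]
w3 = [3.8, 3.9, 3.7, 4.0, 3.8, 3.7, 4.1, 3.9]
for pair, diff, p in lsd_test([w1, w2, w3]):
    print(f"W{pair[0]} vs W{pair[1]}: diff = {diff:+.3f}, p = {p:.3f}")
```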
<table-wrap position="float" id="tab9">
<label>Table 9</label>
<caption>
<p><italic>Post hoc</italic> LSD-<italic>t</italic> test for <italic>W</italic><sub>1</sub>, <italic>W</italic><sub>2</sub>, and <italic>W</italic><sub>3</sub>.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="middle" rowspan="2">(I) Group</th>
<th align="left" valign="middle" rowspan="2">(J) Group</th>
<th align="center" valign="middle" rowspan="2">Mean value (I-J)</th>
<th align="center" valign="middle" rowspan="2">Standard error</th>
<th align="center" valign="middle" rowspan="2">Sig.</th>
<th align="center" valign="middle" colspan="2">95% confidence interval</th>
</tr>
<tr>
<th align="center" valign="top">Lower limit</th>
<th align="center" valign="top">Upper limit</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="2">
<italic>W<sub>1</sub></italic>
</td>
<td align="center" valign="top">
<italic>W<sub>2</sub></italic>
</td>
<td align="center" valign="top">&#x2212;0.018621</td>
<td align="center" valign="top">0.087207</td>
<td align="center" valign="top">0.831</td>
<td align="center" valign="top">&#x2212;0.19204</td>
<td align="center" valign="top">0.15480</td>
</tr>
<tr>
<td align="center" valign="top">
<italic>W<sub>3</sub></italic>
</td>
<td align="center" valign="top">&#x2212;0.201207<sup>&#x002A;</sup></td>
<td align="center" valign="top">0.087207</td>
<td align="center" valign="top">0.024</td>
<td align="center" valign="top">&#x2212;0.37463</td>
<td align="center" valign="top">&#x2212;0.02779</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2">
<italic>W<sub>2</sub></italic>
</td>
<td align="center" valign="top">
<italic>W<sub>1</sub></italic>
</td>
<td align="center" valign="top">0.018621</td>
<td align="center" valign="top">0.087207</td>
<td align="center" valign="top">0.831</td>
<td align="center" valign="top">&#x2212;0.15480</td>
<td align="center" valign="top">0.19204</td>
</tr>
<tr>
<td align="center" valign="top">
<italic>W<sub>3</sub></italic>
</td>
<td align="center" valign="top">&#x2212;0.182586<sup>&#x002A;</sup></td>
<td align="center" valign="top">0.087207</td>
<td align="center" valign="top">0.039</td>
<td align="center" valign="top">&#x2212;0.35601</td>
<td align="center" valign="top">&#x2212;0.00917</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2">
<italic>W<sub>3</sub></italic>
</td>
<td align="center" valign="top">
<italic>W<sub>1</sub></italic>
</td>
<td align="center" valign="top">0.201207<sup>&#x002A;</sup></td>
<td align="center" valign="top">0.087207</td>
<td align="center" valign="top">0.024</td>
<td align="center" valign="top">0.02779</td>
<td align="center" valign="top">0.37463</td>
</tr>
<tr>
<td align="center" valign="top">
<italic>W<sub>2</sub></italic>
</td>
<td align="center" valign="top">0.182586<sup>&#x002A;</sup></td>
<td align="center" valign="top">0.087207</td>
<td align="center" valign="top">0.039</td>
<td align="center" valign="top">0.00917</td>
<td align="center" valign="top">0.35601</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><sup>&#x002A;</sup> The significance level of the mean difference <italic>p</italic> &#x003C; 0.05.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="sec19">
<label>4.4.</label>
<title>The influence of the feedback word count on expected feedback speech rate</title>
<p>The feedback word count of Task 1 and Task 2 was 37, and the mean value of the subjects&#x2019; expected feedback rate was 3.61 words/s and 3.63 words/s, respectively. The word count of Task 3 was 18, and the mean value of the expected feedback speech rate was 3.81 words/s.</p>
<p>As shown in <xref rid="tab9" ref-type="table">Table 9</xref>, the subjects&#x2019; expected feedback speech rates <italic>W</italic><sub>1</sub> and <italic>W</italic><sub>3</sub> are significantly different (<italic>p</italic>&#x2009;=&#x2009;0.024). Similarly, <italic>W</italic><sub>2</sub> and <italic>W</italic><sub>3</sub> differ significantly (<italic>p</italic>&#x2009;=&#x2009;0.039). A one-way ANOVA was also carried out on the subjects&#x2019; expected feedback speech rates for the different feedback word counts. As shown in <xref rid="tab10" ref-type="table">Table 10</xref>, the results indicate a significant difference (<italic>p</italic>&#x2009;=&#x2009;0.005) among the expected feedback speech rates across feedback word counts. That means the feedback word count significantly affects the expected feedback speech rate.</p>
<table-wrap position="float" id="tab10">
<label>Table 10</label>
<caption>
<p>One-way ANOVA test for expected feedback speech rate in different word counts of dialog task.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="top" colspan="6">Expected feedback speech rate <italic>W</italic></th>
</tr>
<tr>
<th/>
<th align="center" valign="top">Sum of Squares</th>
<th align="center" valign="top">df</th>
<th align="center" valign="top">Mean square</th>
<th align="center" valign="top">
<italic>F</italic>
</th>
<th align="center" valign="top">Sig.</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">Between groups</td>
<td align="center" valign="middle">0.846</td>
<td align="center" valign="middle">1</td>
<td align="center" valign="middle">0.846</td>
<td align="center" valign="middle">8.484</td>
<td align="center" valign="middle">0.005</td>
</tr>
<tr>
<td align="left" valign="middle">Within groups</td>
<td align="center" valign="middle">8.473</td>
<td align="center" valign="middle">85</td>
<td align="center" valign="middle">0.100</td>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="middle">Total</td>
<td align="center" valign="middle">9.318</td>
<td align="center" valign="middle">85</td>
<td/>
<td/>
<td/>
</tr>
</tbody>
</table>
</table-wrap>
<p>These results suggest a difference in the expected feedback speech rate between Task 1, &#x201C;listen to the news,&#x201D; and Task 3, &#x201C;check the schedule,&#x201D; conditions that differ in both task type and feedback word count; it is therefore essential to determine which factor affects the feedback speech rate. They also indicate a difference between the subjects&#x2019; expected system feedback speech rates in Task 2 and Task 3, i.e., task scenarios with the same task type but different feedback word counts.</p>
<p>This demonstrates that the word count of system feedback impacts users&#x2019; assessment of the feedback speech rate: subjects preferred a slower feedback speech rate in task scenarios with more feedback words than in those with fewer.</p>
<p>Data analysis and post-experiment user interviews show that when the system feedback contained more words, the elderly subjects&#x2019; expected feedback speech rate was significantly slower than in the scenario with fewer feedback words. This indicates that elderly people have limited cognitive capacity for processing feedback information: when receiving longer feedback, they need more time to remember and store the information, and therefore expect a slower feedback speech rate.</p>
</sec>
</sec>
<sec id="sec20" sec-type="discussions">
<label>5.</label>
<title>Discussion</title>
<p>There is a significant difference between the speech rates at which subjects speak to a person and to the voice robot. When elderly people speak to a robot, they consciously slow their speech rate. Most elderly people are not very familiar with voice interaction technology and products, which leads them to underestimate the voice robot&#x2019;s capability (<xref ref-type="bibr" rid="ref7">Branigan et al., 2011</xref>; <xref ref-type="bibr" rid="ref36">Koulouri et al., 2016</xref>). &#x201C;I think the robot may not be as smart as people, so I speak to it at a lower speech rate to ensure it hears me clearly and understands me,&#x201D; subject No. 14 said in the interview after the experiment. Whether this phenomenon also exists in people familiar with voice user interfaces and artificial intelligence technology remains to be investigated.</p>
<p>Elderly people expect the voice robot to give feedback at a slower speech rate than their own. From the perspective of language expression, speech rate reflects one&#x2019;s cognitive, comprehension, and memory skills (<xref ref-type="bibr" rid="ref62">R&#x00F6;nnberg et al., 1989</xref>; <xref ref-type="bibr" rid="ref63">Sanju et al., 2019</xref>; <xref ref-type="bibr" rid="ref43">Lotfi et al., 2020</xref>; <xref ref-type="bibr" rid="ref29">Huiyang and Min, 2022</xref>). Elderly people want their interlocutor to talk to them at a lower speech rate to ensure they can hear and understand the speaker clearly, even when the speaker is a robot. Meanwhile, elderly people with faster speech rates expect a faster feedback speech rate, which confirms previous findings that people with faster speech rates expect their interlocutor to respond at a faster speech rate (<xref ref-type="bibr" rid="ref9">Brown et al., 1985</xref>; <xref ref-type="bibr" rid="ref23">Hargrave et al., 1994</xref>; <xref ref-type="bibr" rid="ref18">Freud et al., 2018</xref>). These findings suggest, to some extent, that speech convergence applies to the interaction between elderly people and the VUI.</p>
<p>Compared with non-goal-oriented dialog, goal-oriented dialog aims at acquiring specific information, which may require the listener to concentrate more on the speaker&#x2019;s feedback (<xref ref-type="bibr" rid="ref19">Galley, 2007</xref>; <xref ref-type="bibr" rid="ref80">Zhang et al., 2018</xref>; <xref ref-type="bibr" rid="ref70">Stigall et al., 2020</xref>). Nevertheless, the results show that dialog task type did not significantly affect the expected feedback speech rate. Subject No. 7 said, &#x201C;Regardless of whether I have a clear goal of information acquisition, I always hope to hear the voice clearly and try my best to understand what I heard.&#x201D; Based on our observations during the experiments, subjects always tried their best to listen to and remember the voice robot&#x2019;s feedback, even though they were not required to do so. This may differ slightly from the scenario of the elderly listening to the radio and music in their leisure time: nearly 83% of the subjects (<italic>N</italic>&#x2009;=&#x2009;25) said that they did not and would not remember all the details of the music and radio programs they listened to in their leisure time.</p>
<p>As mentioned above, subjects always tried their best to remember the information in the voice robot&#x2019;s feedback, so it is reasonable that feedback word count significantly affects the expected feedback speech rate. Although the experimental scenario differs from real voice user interface usage, we believe that voice user interface designers should set a reasonable feedback speech rate to ensure that users can accurately capture all the content of the feedback.</p>
</sec>
<sec id="sec21" sec-type="conclusions">
<label>6.</label>
<title>Conclusion</title>
<p>This paper focuses on the effects of the elderly&#x2019;s speech rate, the type of voice interaction task, and the word count of feedback on elderly people&#x2019;s expected feedback speech rate. It is found that the elderly&#x2019;s speech rate and the word count of a single feedback have a significant influence on the expected feedback speech rate, whereas dialog task type has no apparent effect. The faster elderly people speak, the faster the feedback speech rate they desire, but never faster than their own. The more words the feedback contains, the slower the elderly&#x2019;s expected feedback speech rate. These results also provide valuable implications for VUI user experience design: the feedback speech rate should be defined according to the elderly user&#x2019;s own speech rate and the word count of the feedback content.</p>
<p>This study was designed for both theoretical and practical application; in particular, the linear regression models relating subjects&#x2019; speech rate to their expected feedback speech rate could be applied in developing a voice robot or other applications with a voice user interface. In addition, the word count of the feedback is another factor that should be considered when defining the feedback speech rate. In this study, two typical scenarios, containing 18 and 37 Chinese words, respectively, were used in the experiment, and the results show a significant difference between their expected feedback speech rates. However, these two word counts are not guidelines to follow; more research should focus on the effect of word count or information blocks on the expected feedback speech rate.</p>
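<p>As a hypothetical illustration of how such a regulation strategy could be implemented in a VUI, the sketch below maps a user&#x2019;s measured speech rate and the feedback word count to a feedback speech rate. The slope, intercept, word-count threshold, and slowdown factor are placeholders, not the fitted values reported in this study; only the qualitative pattern (feedback tracks the user&#x2019;s rate, never exceeds it, and slows for longer feedback) follows the findings above.</p>

```python
# Hypothetical sketch of a feedback speech-rate regulation strategy.
# All numeric coefficients are illustrative placeholders, NOT the
# regression coefficients reported in this study.

def expected_feedback_rate(user_rate_wpm: float, word_count: int) -> float:
    """Predict a feedback speech rate (words per minute) from the user's
    own speech rate via a linear model, then slow it further for long
    feedback, and cap it at the user's own rate."""
    SLOPE, INTERCEPT = 0.8, 20.0      # placeholder linear-regression fit
    LONG_FEEDBACK_WORDS = 30          # placeholder threshold between scenarios
    LONG_FEEDBACK_FACTOR = 0.9        # placeholder slowdown for long feedback

    rate = SLOPE * user_rate_wpm + INTERCEPT
    if word_count >= LONG_FEEDBACK_WORDS:
        rate *= LONG_FEEDBACK_FACTOR
    # Subjects never expected feedback faster than their own speech rate.
    return min(rate, user_rate_wpm)
```

<p>In a deployed system, the placeholder coefficients would be replaced by the fitted regression parameters, and the user&#x2019;s speech rate would be estimated online from the recognized utterance length and duration.</p>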
<p>This study was carried out with Chinese participants, and the materials consisted of Chinese characters and Mandarin speech, so the results only describe the interaction between Chinese elderly people and voice robots. As different languages are intrinsically spoken at different speech rates, the problems this paper focuses on could be studied further in other languages.</p>
</sec>
<sec id="sec22" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="sec23">
<title>Ethics statement</title>
<p>The studies involving human participants were reviewed and approved by Ethics Committee of Shenzhen Technology University. The patients/participants provided their written informed consent to participate in this study.</p>
</sec>
<sec id="sec24">
<title>Author contributions</title>
<p>JW: conceptualization, methodology, investigation, supervision, project administration, and writing &#x2013; review and editing. SY and ZX: data curation. SY and JW: formal analysis. SY: visualization. ZX: materials. JW and SY: writing &#x2013; original draft preparation. All authors have read and agreed to the published version of the manuscript.</p>
</sec>
<sec id="sec25" sec-type="funding-information">
<title>Funding</title>
<p>This research was supported by the Humanities and Social Science Projects of the Ministry of Education of China (Grant No. 21YJC760078) and the Postgraduate Education and Teaching Reform Project of Guangdong Province of China (Grant No. 2022JGXM094).</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec100" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="ref1">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Antonio</surname> <given-names>T</given-names></name> <name><surname>Annika</surname> <given-names>H</given-names></name> <name><surname>Jairo</surname> <given-names>A</given-names></name> <name><surname>Nuno</surname> <given-names>A</given-names></name> <name><surname>Geza</surname> <given-names>N</given-names></name></person-group>. (<year>2014</year>). <article-title>Speech-centric multimodal interaction for easy-to-access online services &#x2013; a personal life assistant for the elderly</article-title>, In <conf-name>Proceeding of 5th International conference on software development and Technologies for Enhancing Accessibility and Fighting Info-Exclusion</conf-name>, <volume>27</volume>, <fpage>389</fpage>&#x2013;<lpage>397</lpage></citation>
</ref>
<ref id="ref2">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Baba</surname> <given-names>A.</given-names></name> <name><surname>Yoshizawa</surname> <given-names>S.</given-names></name> <name><surname>Yamada</surname> <given-names>M.</given-names></name> <name><surname>Lee</surname> <given-names>A.</given-names></name> <name><surname>Shikano</surname> <given-names>K.</given-names></name></person-group> (<year>2004</year>). &#x201C;<article-title>Acoustic models of the elderly for large-vocabulary continuous speech recognition</article-title>&#x201D; in <source>Electronics and Communications in Japan, Part II</source>, vol. <volume>87</volume>, <publisher-loc>Tokyo</publisher-loc>, <fpage>49</fpage>&#x2013;<lpage>57</lpage>.</citation>
</ref>
<ref id="ref3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bai</surname> <given-names>X.</given-names></name> <name><surname>Yu</surname> <given-names>J.</given-names></name> <name><surname>Qin</surname> <given-names>L.</given-names></name> <name><surname>Yang</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>Cognitive aging of the elderly population and interaction interface design of elderly products</article-title>. <source>Packag. Eng.</source> <volume>10</volume>, <fpage>7</fpage>&#x2013;<lpage>12</lpage>. doi: <pub-id pub-id-type="doi">10.19554/j.cnki.1001-3563.2020.10.002</pub-id></citation>
</ref>
<ref id="ref4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barr</surname> <given-names>D. J.</given-names></name> <name><surname>Keysar</surname> <given-names>B.</given-names></name></person-group> (<year>2002</year>). <article-title>Anchoring comprehension in linguistic precedents</article-title>. <source>J. Mem. Lang.</source> <volume>46</volume>, <fpage>391</fpage>&#x2013;<lpage>418</lpage>. doi: <pub-id pub-id-type="doi">10.1006/jmla.2001.2815</pub-id></citation>
</ref>
<ref id="ref44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beebe</surname> <given-names>L.</given-names></name> <name><surname>Giles</surname> <given-names>H.</given-names></name></person-group> (<year>1984</year>). <article-title>Speech-accommodation theories: a discussion in terms of second-language acquisition</article-title>. <source>Int. J. Sociol. Lang.</source> <volume>1984</volume>, <fpage>5</fpage>&#x2013;<lpage>32</lpage>. doi: <pub-id pub-id-type="doi">10.1515/ijsl.1984.46.5</pub-id></citation>
</ref>
<ref id="ref5">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Boudin</surname> <given-names>A.</given-names></name></person-group> (<year>2022</year>). &#x201C;<article-title>Interdisciplinary corpus-based approach for exploring multimodal conversational feedback</article-title>,&#x201D; in <source>International conference on multimodal interaction</source>. <conf-loc>Bengaluru: ACM</conf-loc>, <fpage>705</fpage>&#x2013;<lpage>710</lpage>.</citation>
</ref>
<ref id="ref6">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Boudin</surname> <given-names>A.</given-names></name> <name><surname>Bertrand</surname> <given-names>R.</given-names></name> <name><surname>Rauzy</surname> <given-names>S.</given-names></name> <name><surname>Ochs</surname> <given-names>M.</given-names></name> <name><surname>Blache</surname> <given-names>P.</given-names></name></person-group> (<year>2021</year>). <article-title>A multimodal model for predicting conversational feedbacks</article-title>. In <source>Text, speech, and dialogue</source>, (Eds.) <person-group person-group-type="editor"><name><surname>Ek&#x0161;tein</surname> <given-names>K.</given-names></name> <name><surname>P&#x00E1;rtl</surname> <given-names>F.</given-names></name> <name><surname>Konop&#x00ED;k</surname> <given-names>M.</given-names></name></person-group>, <volume>12848</volume>:<fpage>537</fpage>&#x2013;<lpage>549</lpage>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>.</citation>
</ref>
<ref id="ref7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Branigan</surname> <given-names>H. P.</given-names></name> <name><surname>Pickering</surname> <given-names>M. J.</given-names></name> <name><surname>Pearson</surname> <given-names>J.</given-names></name> <name><surname>McLean</surname> <given-names>J. F.</given-names></name> <name><surname>Brown</surname> <given-names>A.</given-names></name></person-group> (<year>2011</year>). <article-title>The role of beliefs in lexical alignment: evidence from dialogs with humans and computers</article-title>. <source>Cognition</source> <volume>121</volume>, <fpage>41</fpage>&#x2013;<lpage>57</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cognition.2011.05.011</pub-id>, PMID: <pub-id pub-id-type="pmid">21723549</pub-id></citation>
</ref>
<ref id="ref8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brennan</surname> <given-names>S. E.</given-names></name> <name><surname>Clark</surname> <given-names>H. H.</given-names></name></person-group> (<year>1996</year>). <article-title>Conceptual pacts and lexical choice in conversation</article-title>. <source>J. Exp. Psychol. Learn. Mem. Cogn.</source> <volume>22</volume>, <fpage>1482</fpage>&#x2013;<lpage>1493</lpage>.</citation>
</ref>
<ref id="ref9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname> <given-names>B. L.</given-names></name> <name><surname>Giles</surname> <given-names>H.</given-names></name> <name><surname>Thakerar</surname> <given-names>J. N.</given-names></name></person-group> (<year>1985</year>). <article-title>Speaker evaluations as a function of speech rate, accent and context</article-title>. <source>Lang. Commun.</source> <volume>5</volume>, <fpage>207</fpage>&#x2013;<lpage>220</lpage>. doi: <pub-id pub-id-type="doi">10.1016/0271-5309(85)90011-4</pub-id></citation>
</ref>
<ref id="ref10">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Cao</surname> <given-names>J.</given-names></name></person-group> (<year>2003</year>). <article-title>The characteristics and changes of speeches, Phonetic Research Report of Institute of Linguistics, Chinese Academy of Social Sciences</article-title>, <source>6th Chinese Academic Conference on Modern Phonetics</source>. <publisher-loc>Beijing</publisher-loc>, <fpage>143</fpage>&#x2013;<lpage>148</lpage>.</citation>
</ref>
<ref id="ref11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Yin</surname> <given-names>D.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>A survey on dialogue systems: recent advances and new Frontiers</article-title>. <source>ACM SIGKDD Explor. Newsl.</source> <volume>19</volume>, <fpage>25</fpage>&#x2013;<lpage>35</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3166054.3166058</pub-id></citation>
</ref>
<ref id="ref12">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Cordasco</surname> <given-names>G</given-names></name> <name><surname>Esposito</surname> <given-names>M</given-names></name> <name><surname>Masucci</surname> <given-names>F</given-names></name> <name><surname>Riviello</surname> <given-names>M</given-names></name> <name><surname>Esposito</surname> <given-names>A</given-names></name></person-group>, (<year>2014</year>). <article-title>Assessing voice user interfaces: the vAssist system prototype</article-title> In <conf-name>Proceeding 5th IEEE conference on cognitive infocommunications (coginfocom)</conf-name>, <conf-loc>New York</conf-loc>, <fpage>91</fpage>&#x2013;<lpage>96</lpage>, <volume>2014</volume>.</citation>
</ref>
<ref id="ref13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Czaja</surname> <given-names>S. J.</given-names></name> <name><surname>Charness</surname> <given-names>N.</given-names></name> <name><surname>Fisk</surname> <given-names>A. D.</given-names></name> <name><surname>Hertzog</surname> <given-names>C.</given-names></name> <name><surname>Nair</surname> <given-names>S. N.</given-names></name> <name><surname>Rogers</surname> <given-names>W. A.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Factors predicting the use of technology: findings from the center for research and education on aging and technology enhancement (CREATE)</article-title>. <source>Psychol. Aging</source> <volume>21</volume>, <fpage>333</fpage>&#x2013;<lpage>352</lpage>. doi: <pub-id pub-id-type="doi">10.1037/0882-7974.21.2.333</pub-id>, PMID: <pub-id pub-id-type="pmid">16768579</pub-id></citation>
</ref>
<ref id="ref14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dahlb&#x00E4;ck</surname> <given-names>N.</given-names></name> <name><surname>J&#x00F6;nsson</surname> <given-names>A.</given-names></name> <name><surname>Ahrenberg</surname> <given-names>L.</given-names></name></person-group> (<year>1993</year>). <article-title>Wizard of Oz studies &#x2014; why and how</article-title>. <source>Knowl. Based Syst.</source> <volume>6</volume>, <fpage>258</fpage>&#x2013;<lpage>266</lpage>. doi: <pub-id pub-id-type="doi">10.1016/0950-7051(93)90017-N</pub-id></citation>
</ref>
<ref id="ref15">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Dautenhahn</surname> <given-names>K.</given-names></name></person-group> (<year>2004</year>). <article-title>Robots we like to live with&#x2048; &#x2013; a developmental perspective on a personalized, life-long robot companion</article-title>. In proceedings of <conf-name>RO-MAN 2004: 13th IEEE International workshop on robot and human interactive communication (IEEE catalog no. 04TH8759)</conf-name>, <fpage>17</fpage>&#x2013;<lpage>22</lpage>.</citation>
</ref>
<ref id="ref16">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Ekstedt</surname> <given-names>E.</given-names></name> <name><surname>Skantze</surname> <given-names>G.</given-names></name></person-group> (<year>2022</year>). <article-title>Voice activity projection: self-supervised learning of turn-taking events</article-title>. <source>Interspeech</source> <volume>2022</volume>, <fpage>5190</fpage>&#x2013;<lpage>5194</lpage>. doi: <pub-id pub-id-type="doi">10.21437/Interspeech.2022-10955</pub-id></citation>
</ref>
<ref id="ref17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Feng</surname> <given-names>X</given-names></name></person-group>. (<year>2002</year>). <source>Design of Questionnaire in social survey</source>, <publisher-loc>Tianjin</publisher-loc>: <publisher-name>Tianjin People&#x2019;s Publishing House</publisher-name>.</citation>
</ref>
<ref id="ref18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freud</surname> <given-names>D.</given-names></name> <name><surname>Ezrati-Vinacour</surname> <given-names>R.</given-names></name> <name><surname>Amir</surname> <given-names>O.</given-names></name></person-group> (<year>2018</year>). <article-title>Speech rate adjustment of adults during conversation</article-title>. <source>J. Fluen. Disord.</source> <volume>57</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jfludis.2018.06.002</pub-id>, PMID: <pub-id pub-id-type="pmid">29960136</pub-id></citation>
</ref>
<ref id="ref19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Galley</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <article-title>Book Review</article-title>. <source>Ergonomics</source> <volume>50</volume>, <fpage>319</fpage>&#x2013;<lpage>321</lpage>. doi: <pub-id pub-id-type="doi">10.1080/00140130500401530</pub-id></citation>
</ref>
<ref id="ref20">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Ghosh</surname> <given-names>D</given-names></name> <name><surname>Foong</surname> <given-names>P</given-names></name> <name><surname>Zhang</surname> <given-names>S</given-names></name> <name><surname>Zhao</surname> <given-names>S</given-names></name></person-group>. (<year>2018</year>). &#x201C;<article-title>Assessing the utility of the system usability scale for evaluating voice-based user interfaces, the sixth international symposium of Chinese CHI</article-title>,&#x201D; in <source>Association for Computing Machinery</source>. <publisher-loc>New York, NY, USA</publisher-loc>, <fpage>11</fpage>&#x2013;<lpage>15</lpage>.</citation>
</ref>
<ref id="ref21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gijssels</surname> <given-names>T.</given-names></name> <name><surname>Casasanto</surname> <given-names>L. S.</given-names></name> <name><surname>Jasmin</surname> <given-names>K.</given-names></name> <name><surname>Hagoort</surname> <given-names>P.</given-names></name> <name><surname>Casasanto</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>Speech accommodation without priming: the case of pitch</article-title>. <source>Discourse Process.</source> <volume>53</volume>, <fpage>233</fpage>&#x2013;<lpage>251</lpage>. doi: <pub-id pub-id-type="doi">10.1080/0163853X.2015.1023965</pub-id></citation>
</ref>
<ref id="ref22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Yin</surname> <given-names>S.</given-names></name></person-group> (<year>2022</year>). <article-title>Adaptive aging design of smart TV VUI based on language cognition</article-title>. <source>Packag. Eng.</source> <volume>43</volume>, <fpage>50</fpage>&#x2013;<lpage>54</lpage>. doi: <pub-id pub-id-type="doi">10.19554/j.cnki.1001-3563.2022.08.007</pub-id></citation>
</ref>
<ref id="ref23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hargrave</surname> <given-names>S.</given-names></name> <name><surname>Kalinowski</surname> <given-names>J.</given-names></name> <name><surname>Stuart</surname> <given-names>A.</given-names></name> <name><surname>Armson</surname> <given-names>J.</given-names></name> <name><surname>Jones</surname> <given-names>K.</given-names></name></person-group> (<year>1994</year>). <article-title>Effect of frequency-altered feedback on stuttering frequency at normal and fast speech rates</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>37</volume>, <fpage>1313</fpage>&#x2013;<lpage>1319</lpage>. doi: <pub-id pub-id-type="doi">10.1044/jshr.3706.1313</pub-id>, PMID: <pub-id pub-id-type="pmid">7877290</pub-id></citation>
</ref>
<ref id="ref24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hawthorn</surname> <given-names>D.</given-names></name></person-group> (<year>2000</year>). <article-title>Possible implications of aging for interface designers</article-title>. <source>Interact. Comput.</source> <volume>12</volume>, <fpage>507</fpage>&#x2013;<lpage>528</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0953-5438(99)00021-1</pub-id></citation>
</ref>
<ref id="ref25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hess</surname> <given-names>A. S.</given-names></name> <name><surname>Zellman</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Could you please repeat that? Speech design best practices for minimizing errors</article-title>. <source>Proc. Hum. Fact. Ergon. Soc. Annu. Meet</source> <volume>62</volume>, <fpage>1002</fpage>&#x2013;<lpage>1006</lpage>. doi: <pub-id pub-id-type="doi">10.1177/1541931218621231</pub-id></citation>
</ref>
<ref id="ref26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>H&#x00F6;flich</surname> <given-names>JR</given-names></name> <name><surname>El Bayed</surname> <given-names>A</given-names></name></person-group>. (<year>2015</year>). <source>Perception, acceptance, and the social construction of robots&#x2014;Exploratory studies, Social Robots from a Human Perspective</source>, <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>, <fpage>39</fpage>&#x2013;<lpage>51</lpage>.</citation>
</ref>
<ref id="ref27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hua</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <article-title>Research on information interaction of digital reading products for preschool children based on user experience</article-title>. <source>Design</source> <volume>34</volume>, <fpage>121</fpage>&#x2013;<lpage>123</lpage>.</citation>
</ref>
<ref id="ref28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>T.</given-names></name></person-group> (<year>2017</year>). <article-title>Interactive innovation design of barrier-free products for the blind</article-title>. <source>Packag. Eng.</source> <volume>38</volume>, <fpage>108</fpage>&#x2013;<lpage>113</lpage>. doi: <pub-id pub-id-type="doi">10.19554/j.cnki.1001-3563.2017.24.023</pub-id></citation>
</ref>
<ref id="ref29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huiyang</surname> <given-names>S.</given-names></name> <name><surname>Min</surname> <given-names>W.</given-names></name></person-group> (<year>2022</year>). <article-title>Improving interaction experience through lexical convergence: the prosocial effect of lexical alignment in human-human and human-computer interactions</article-title>. <source>Int. J. Hum. Comput. Interact.</source> <volume>38</volume>, <fpage>28</fpage>&#x2013;<lpage>41</lpage>. doi: <pub-id pub-id-type="doi">10.1080/10447318.2021.1921367</pub-id></citation>
</ref>
<ref id="ref30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iniguez</surname> <given-names>A. L.</given-names></name> <name><surname>Gaytan</surname> <given-names>L. S.</given-names></name> <name><surname>Garcia-Ruiz</surname> <given-names>M. A.</given-names></name> <name><surname>Maciel</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>Usability questionnaires to evaluate voice user interfaces</article-title>. <source>IEEE Lat. Am. Trans.</source> <volume>19</volume>, <fpage>1468</fpage>&#x2013;<lpage>1477</lpage>. doi: <pub-id pub-id-type="doi">10.1109/TLA.2021.9468439</pub-id></citation>
</ref>
<ref id="ref31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jia</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <source>Research of AI speaker voice interaction design for the aged</source>. <publisher-loc>Shenzhen</publisher-loc>: <publisher-name>South China University of Technology</publisher-name>.</citation>
</ref>
<ref id="ref32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>F.</given-names></name></person-group> (<year>2017</year>). <article-title>Speech regulation in telephone conversation between young and old people</article-title>. <source>Education</source> <volume>28</volume>:<fpage>291</fpage>.</citation>
</ref>
<ref id="ref33">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Kalimullah</surname> <given-names>K</given-names></name> <name><surname>Sushmitha</surname> <given-names>D</given-names></name></person-group>. (<year>2017</year>). <article-title>Influence of design elements in Mobile applications on user experience of elderly people</article-title>, <conf-name>8th International conference on emerging ubiquitous systems and pervasive networks (EUSPN 2017)/7th International conference on current and future trends of information and communication Technologies in Healthcare (icth-2017) / affiliated workshops</conf-name>, <conf-loc>Amsterdam: Elsevier Science Bv</conf-loc>, <volume>113</volume>, <fpage>352</fpage>&#x2013;<lpage>359</lpage></citation>
</ref>
<ref id="ref34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kemper</surname> <given-names>S.</given-names></name></person-group> (<year>1994</year>). <article-title>Elder speak: speech accommodations to older adults</article-title>. <source>Neuropsychol. Cogn. Aging</source> <volume>1</volume>, <fpage>17</fpage>&#x2013;<lpage>28</lpage>. doi: <pub-id pub-id-type="doi">10.1080/09289919408251447</pub-id></citation>
</ref>
<ref id="ref35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Zhao</surname> <given-names>F.</given-names></name></person-group> (<year>2015</year>). <article-title>A study of speech speed in normal adults reading aloud and spontaneous speech</article-title>. <source>J. Audiol. Speech Disord</source> <volume>23</volume>, <fpage>240</fpage>&#x2013;<lpage>243</lpage>.</citation>
</ref>
<ref id="ref36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koulouri</surname> <given-names>T.</given-names></name> <name><surname>Lauria</surname> <given-names>S.</given-names></name> <name><surname>Macredie</surname> <given-names>R. D.</given-names></name></person-group> (<year>2016</year>). <article-title>Do (and say) as I say: linguistic adaptation in human&#x2013;computer dialogs</article-title>. <source>Hum. Comput. Interact.</source> <volume>31</volume>, <fpage>59</fpage>&#x2013;<lpage>95</lpage>. doi: <pub-id pub-id-type="doi">10.1080/07370024.2014.934180</pub-id></citation>
</ref>
<ref id="ref37">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Kowalski</surname> <given-names>J</given-names></name> <name><surname>Jaskulska</surname> <given-names>A</given-names></name> <name><surname>Skorupska</surname> <given-names>K</given-names></name> <name><surname>Abramczuk</surname> <given-names>K</given-names></name> <name><surname>Kopec</surname> <given-names>W</given-names></name> <name><surname>Marasek</surname> <given-names>K</given-names></name></person-group>. (<year>2020</year>). <article-title>Older adults and voice interaction: a pilot study with Google home</article-title>. In <conf-name>Proceedings of the 31st Australian conference on human-computer-interaction (ozchi'19)</conf-name>, <conf-loc>New York: Assoc Computing Machinery</conf-loc>, <fpage>423</fpage>&#x2013;<lpage>427</lpage>.</citation>
</ref>
<ref id="ref38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Aging and speech understanding</article-title>. <source>J. Audiol. Otol.</source> <volume>19</volume>, <fpage>7</fpage>&#x2013;<lpage>13</lpage>. doi: <pub-id pub-id-type="doi">10.7874/jao.2015.19.1.7</pub-id>, PMID: <pub-id pub-id-type="pmid">26185785</pub-id></citation>
</ref>
<ref id="ref39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>C.</given-names></name> <name><surname>Coughlin</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>PERSPECTIVE: older Adults' adoption of technology: an integrated approach to identifying determinants and barriers</article-title>. <source>J. Prod. Innov. Manage.</source> <volume>32</volume>, <fpage>747</fpage>&#x2013;<lpage>759</lpage>. doi: <pub-id pub-id-type="doi">10.1111/jpim.12176</pub-id></citation>
</ref>
<ref id="ref40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>X.</given-names></name></person-group> (<year>2018</year>). <article-title>Overview of speech recognition technology based on human-computer interaction</article-title>. <source>Electron. World</source> <volume>21</volume>:<fpage>105</fpage>. doi: <pub-id pub-id-type="doi">10.19353/j.cnki.dzsj.2018.21.060</pub-id></citation>
</ref>
<ref id="ref41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name></person-group> (<year>2019</year>). <article-title>Optimization of VUI feedback mechanism based on the time perception</article-title>. <source>Decor. Furnish.</source> <volume>7</volume>, <fpage>100</fpage>&#x2013;<lpage>103</lpage>. doi: <pub-id pub-id-type="doi">10.16272/j.cnki.cn11-1392/j.2019.07.023</pub-id></citation>
</ref>
<ref id="ref42">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>H</given-names></name> <name><surname>Ma</surname> <given-names>F</given-names></name></person-group>. (<year>2010</year>). <article-title>Research on visual elements of web UI design</article-title>. In <conf-name>IEEE 11th International conference on computer-aided industrial design &#x0026; conceptual design 1</conf-name>, <fpage>428</fpage>&#x2013;<lpage>430</lpage>.</citation>
</ref>
<ref id="ref43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lotfi</surname> <given-names>Y.</given-names></name> <name><surname>Samadi-Qaleh-Juqy</surname> <given-names>Z.</given-names></name> <name><surname>Moosavi</surname> <given-names>A.</given-names></name> <name><surname>Sadjedi</surname> <given-names>H.</given-names></name> <name><surname>Bakhshi</surname> <given-names>E.</given-names></name></person-group> (<year>2020</year>). <article-title>The effects of spatial auditory training on speech perception in noise in the elderly</article-title>. <source>Crescent J. Med. Biol. Sci.</source> <volume>7</volume>, <fpage>40</fpage>&#x2013;<lpage>46</lpage>.</citation>
</ref>
<ref id="ref45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luna-Garcia</surname> <given-names>H.</given-names></name> <name><surname>Mendoza-Gonzalez</surname> <given-names>R.</given-names></name> <name><surname>Gamboa-Rosales</surname> <given-names>H.</given-names></name> <name><surname>Celaya-Padilla</surname> <given-names>J.</given-names></name> <name><surname>Galvan-Tejada</surname> <given-names>C.</given-names></name> <name><surname>Lopez-Monteagudo</surname> <given-names>F.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Mental models associated to voice user interfaces for infotainment systems</article-title>. <source>Dyna</source> <volume>93</volume>:<fpage>245</fpage>. doi: <pub-id pub-id-type="doi">10.6036/8766</pub-id></citation>
</ref>
<ref id="ref46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>L.</given-names></name></person-group> (<year>1998</year>). <article-title>The formation and application of speech regulation theory</article-title>. <source>Hum. Soc. Sci. J. Hainan Univ.</source> <volume>1</volume>, <fpage>78</fpage>&#x2013;<lpage>81</lpage>.</citation>
</ref>
<ref id="ref47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>Q.</given-names></name> <name><surname>Zhou</surname> <given-names>R.</given-names></name> <name><surname>Zhang</surname> <given-names>C.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name></person-group> (<year>2022</year>). <article-title>Rationally or emotionally: how should voice user interfaces reply to users of different genders considering user experience?</article-title> <source>Cogn. Tech. Work</source> <volume>24</volume>, <fpage>233</fpage>&#x2013;<lpage>246</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10111-021-00687-8</pub-id></citation>
</ref>
<ref id="ref48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meena</surname> <given-names>R.</given-names></name> <name><surname>Skantze</surname> <given-names>G.</given-names></name> <name><surname>Gustafson</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Data-driven models for timing feedback responses in a map task dialogue system</article-title>. <source>Comput. Speech Lang.</source> <volume>28</volume>, <fpage>903</fpage>&#x2013;<lpage>922</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.csl.2014.02.002</pub-id></citation>
</ref>
<ref id="ref49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meng</surname> <given-names>G.</given-names></name></person-group> (<year>2006</year>). <article-title>Chinese language speed and listening teaching as a second language</article-title>. <source>World Chin. Teach.</source> <volume>2</volume>, <fpage>129</fpage>&#x2013;<lpage>137</lpage>.</citation>
</ref>
<ref id="ref50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moore</surname> <given-names>B. A.</given-names></name> <name><surname>Urakami</surname> <given-names>J.</given-names></name></person-group> (<year>2022</year>). <article-title>The impact of the physical and social embodiment of voice user interfaces on user distraction</article-title>. <source>Int. J. Hum. Comput. Stud.</source> <volume>161</volume>:<fpage>102784</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ijhcs.2022.102784</pub-id></citation>
</ref>
<ref id="ref51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murad</surname> <given-names>C.</given-names></name> <name><surname>Munteanu</surname> <given-names>C.</given-names></name> <name><surname>Cowan</surname> <given-names>B. R.</given-names></name> <name><surname>Clark</surname> <given-names>L.</given-names></name></person-group> (<year>2019</year>). <article-title>Revolution or evolution? Speech interaction and HCI design guidelines</article-title>. <source>IEEE Pervasive Comput.</source> <volume>18</volume>, <fpage>33</fpage>&#x2013;<lpage>45</lpage>. doi: <pub-id pub-id-type="doi">10.1109/MPRV.2019.2906991</pub-id></citation>
</ref>
<ref id="ref52">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Myers</surname> <given-names>C</given-names></name> <name><surname>Furqan</surname> <given-names>A</given-names></name> <name><surname>Nebolsky</surname> <given-names>J</given-names></name> <name><surname>Caro</surname> <given-names>K</given-names></name> <name><surname>Zhu</surname> <given-names>J</given-names></name></person-group>. (<year>2018</year>). <article-title>Patterns for how users overcome obstacles in voice user interfaces</article-title>. In <conf-name>Proceedings of the 2018 CHI conference on human factors in computing systems (CHI 2018)</conf-name>, <conf-loc>New York, NY: Assoc Computing Machinery</conf-loc>.</citation>
</ref>
<ref id="ref53">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Myers</surname> <given-names>C M</given-names></name> <name><surname>Furqan</surname> <given-names>A</given-names></name> <name><surname>Zhu</surname> <given-names>J</given-names></name></person-group>. (<year>2019</year>). <article-title>The impact of user characteristics and preferences on performance with an unfamiliar voice user interface</article-title>. In <conf-name>Proceedings of the 2019 CHI conference on human factors in computing systems (CHI 2019)</conf-name>, <conf-loc>New York, NY: Assoc Computing Machinery</conf-loc>.</citation>
</ref>
<ref id="ref54">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Nass</surname> <given-names>C</given-names></name> <name><surname>Steuer</surname> <given-names>J</given-names></name> <name><surname>Tauber</surname> <given-names>E R</given-names></name></person-group>. (<year>1994</year>). <article-title>Computers are social actors</article-title>. In <conf-name>Proceedings of the SIGCHI conference on human factors in computing systems</conf-name>, <fpage>72</fpage>&#x2013;<lpage>78</lpage>.</citation>
</ref>
<ref id="ref55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ostrowski</surname> <given-names>A. K.</given-names></name> <name><surname>Fu</surname> <given-names>J.</given-names></name> <name><surname>Zygouras</surname> <given-names>V.</given-names></name> <name><surname>Park</surname> <given-names>H. W.</given-names></name> <name><surname>Breazeal</surname> <given-names>C.</given-names></name></person-group> (<year>2022</year>). <article-title>Speed dating with voice user interfaces: understanding how families interact and perceive voice user interfaces in a group setting</article-title>. <source>Front. Robot. AI</source> <volume>8</volume>:<fpage>730992</fpage>. doi: <pub-id pub-id-type="doi">10.3389/frobt.2021.730992</pub-id></citation>
</ref>
<ref id="ref56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Page</surname> <given-names>T.</given-names></name></person-group> (<year>2014</year>). <article-title>Touchscreen mobile devices and older adults: a usability study</article-title>. <source>Int. J. Hum. Fact. Ergon.</source> <volume>3</volume>:<fpage>65</fpage>. doi: <pub-id pub-id-type="doi">10.1504/IJHFE.2014.062550</pub-id></citation>
</ref>
<ref id="ref57">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Porcheron</surname> <given-names>M</given-names></name> <name><surname>Fischer</surname> <given-names>J E</given-names></name> <name><surname>Valstar</surname> <given-names>M</given-names></name></person-group>. (<year>2020</year>). <article-title>NottReal: a tool for voice-based wizard of Oz studies</article-title>. In <conf-name>Proceedings of the 2nd conference on conversational user interfaces</conf-name>, <conf-loc>Bilbao, Spain: ACM</conf-loc>, <fpage>1</fpage>&#x2013;<lpage>3</lpage>.</citation>
</ref>
<ref id="ref58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Portet</surname> <given-names>F.</given-names></name> <name><surname>Vacher</surname> <given-names>M.</given-names></name> <name><surname>Golanski</surname> <given-names>C.</given-names></name> <name><surname>Roux</surname> <given-names>C.</given-names></name> <name><surname>Meillon</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>Design and evaluation of a smart home voice interface for the elderly: acceptability and objection aspects</article-title>. <source>Pers. Ubiquit. Comput.</source> <volume>17</volume>, <fpage>127</fpage>&#x2013;<lpage>144</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s00779-011-0470-5</pub-id></citation>
</ref>
<ref id="ref59">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Powers</surname> <given-names>A</given-names></name> <name><surname>Kiesler</surname> <given-names>S</given-names></name></person-group>. (<year>2006</year>). <article-title>The advisor robot: Tracing people's mental model from a robot's physical attributes</article-title>. In <conf-name>Proceedings of the 1st ACM SIGCHI/SIGART conference on human-robot interaction</conf-name>, <conf-loc>New York, NY: Association for Computing Machinery</conf-loc>, <fpage>218</fpage>&#x2013;<lpage>225</lpage>.</citation>
</ref>
<ref id="ref60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pradhan</surname> <given-names>A.</given-names></name> <name><surname>Lazar</surname> <given-names>A.</given-names></name> <name><surname>Findlater</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>Use of intelligent voice assistants by older adults with low technology use</article-title>. <source>ACM Trans. Comput. Hum. Interact.</source> <volume>27</volume>:<fpage>31</fpage>. doi: <pub-id pub-id-type="doi">10.1145/3373759</pub-id></citation>
</ref>
<ref id="ref61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rakotomalala</surname> <given-names>F.</given-names></name> <name><surname>Randriatsarafara</surname> <given-names>H. N.</given-names></name> <name><surname>Hajalalaina</surname> <given-names>A. R.</given-names></name></person-group> (<year>2021</year>). <article-title>Voice user Interface: literature review, challenges and future directions</article-title>. <source>Syst. Theor. Control Comput. J.</source> <volume>1</volume>, <fpage>65</fpage>&#x2013;<lpage>89</lpage>. doi: <pub-id pub-id-type="doi">10.52846/stccj.2021.1.2.26</pub-id></citation>
</ref>
<ref id="ref62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>R&#x00F6;nnberg</surname> <given-names>J.</given-names></name> <name><surname>Arlinger</surname> <given-names>S.</given-names></name> <name><surname>Lyxell</surname> <given-names>B.</given-names></name> <name><surname>Kinnefors</surname> <given-names>C.</given-names></name></person-group> (<year>1989</year>). <article-title>Visual evoked potentials: relation to adult speechreading and cognitive function</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>32</volume>, <fpage>725</fpage>&#x2013;<lpage>735</lpage>. doi: <pub-id pub-id-type="doi">10.1044/jshr.3204.725</pub-id></citation>
</ref>
<ref id="ref63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanju</surname> <given-names>K.</given-names></name> <name><surname>Himanshu</surname> <given-names>D. R.</given-names></name> <name><surname>Yadav</surname> <given-names>A. K.</given-names></name></person-group> (<year>2019</year>). <article-title>Relationship between listening, speech and language, cognition and pragmatic skill in children with cochlear implant</article-title>. <source>IP Indian J. Anat. Surg. Head, Neck Brain</source> <volume>5</volume>, <fpage>72</fpage>&#x2013;<lpage>75</lpage>. doi: <pub-id pub-id-type="doi">10.18231/j.ijashnb.2019.019</pub-id></citation>
</ref>
<ref id="ref64">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Sayago</surname> <given-names>S</given-names></name> <name><surname>Neves</surname> <given-names>B</given-names></name> <name><surname>Cowan</surname> <given-names>B</given-names></name></person-group>. (<year>2019</year>). <article-title>Voice assistants and older people: Some open issues</article-title>. In <conf-name>Proceedings of the 1st International conference on conversational user interfaces</conf-name>, <conf-loc>New York, NY: Association for Computing Machinery</conf-loc>, <fpage>1</fpage>&#x2013;<lpage>3</lpage>.</citation>
</ref>
<ref id="ref65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scahill</surname> <given-names>R. I.</given-names></name> <name><surname>Frost</surname> <given-names>C.</given-names></name> <name><surname>Jenkins</surname> <given-names>R.</given-names></name> <name><surname>Whitwell</surname> <given-names>J. L.</given-names></name> <name><surname>Rossor</surname> <given-names>M. N.</given-names></name> <name><surname>Fox</surname> <given-names>N. C.</given-names></name></person-group> (<year>2003</year>). <article-title>A longitudinal study of brain volume changes in normal aging using serial registered magnetic resonance imaging</article-title>. <source>Arch. Neurol.</source> <volume>60</volume>, <fpage>989</fpage>&#x2013;<lpage>994</lpage>. doi: <pub-id pub-id-type="doi">10.1001/archneur.60.7.989</pub-id>, PMID: <pub-id pub-id-type="pmid">12873856</pub-id></citation>
</ref>
<ref id="ref66">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Shahrebaki</surname> <given-names>AS</given-names></name> <name><surname>Imran</surname> <given-names>A</given-names></name> <name><surname>Olfati</surname> <given-names>N</given-names></name> <name><surname>Svendsen</surname> <given-names>T</given-names></name></person-group>. (<year>2018</year>). <article-title>Acoustic feature comparison for different speaking rates</article-title>. In <conf-name>Human-computer interaction: interaction technologies, HCI International 2018, Part III</conf-name>, <conf-loc>Cham</conf-loc>, <volume>10903</volume>, <fpage>176</fpage>&#x2013;<lpage>189</lpage>.</citation>
</ref>
<ref id="ref67">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Shin</surname> <given-names>A</given-names></name> <name><surname>Oh</surname> <given-names>J</given-names></name> <name><surname>Lee</surname> <given-names>J</given-names></name></person-group>. (<year>2019</year>). <article-title>Apprentice of Oz: human in the loop system for conversational robot wizard of Oz</article-title>. In <conf-name>2019 14th ACM/IEEE International conference on human-robot interaction (HRI)</conf-name>, <fpage>516</fpage>&#x2013;<lpage>517</lpage>.</citation>
</ref>
<ref id="ref68">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Singh</surname> <given-names>M</given-names></name> <name><surname>Rafat</surname> <given-names>Y</given-names></name> <name><surname>Bhatt</surname> <given-names>S</given-names></name> <name><surname>Jain</surname> <given-names>A</given-names></name> <name><surname>Dev</surname> <given-names>A</given-names></name></person-group>. (<year>2021</year>). &#x201C;<article-title>Continuous Speech Recognition Technologies-A Review</article-title>&#x201D;, in <source>Recent Developments in Acoustics</source>. <publisher-loc>Singapore</publisher-loc>: <publisher-name>Springer Singapore</publisher-name>, <fpage>85</fpage>&#x2013;<lpage>94</lpage>.</citation>
</ref>
<ref id="ref69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Cheng</surname> <given-names>P.</given-names></name></person-group> (<year>2022</year>). <article-title>The investigation of adoption of voice-user Interface (VUI) in smart home systems among Chinese older adults</article-title>. <source>Sensors</source> <volume>22</volume>:<fpage>1614</fpage>. doi: <pub-id pub-id-type="doi">10.3390/s22041614</pub-id>, PMID: <pub-id pub-id-type="pmid">35214513</pub-id></citation>
</ref>
<ref id="ref70">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Stigall</surname> <given-names>B</given-names></name> <name><surname>Waycott</surname> <given-names>J</given-names></name> <name><surname>Baker</surname> <given-names>S</given-names></name> <name><surname>Caine</surname> <given-names>K</given-names></name></person-group>. (<year>2020</year>). <article-title>Older adults&#x2019; perception and use of voice user interfaces: a preliminary review of the computing literature</article-title>. In <conf-name>Proceedings of the 31st Australian conference on human-computer interaction (OzCHI&#x2019;19)</conf-name>, <conf-loc>New York: Assoc Computing Machinery</conf-loc>, <fpage>423</fpage>&#x2013;<lpage>427</lpage>.</citation>
</ref>
<ref id="ref71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Svennerholm</surname> <given-names>L.</given-names></name> <name><surname>Bostr&#x00F6;m</surname> <given-names>K.</given-names></name> <name><surname>Jungbjer</surname> <given-names>B.</given-names></name></person-group> (<year>1997</year>). <article-title>Changes in weight and compositions of major membrane components of human brain during the span of adult human life of Swedes</article-title>. <source>Acta Neuropathol.</source> <volume>94</volume>, <fpage>345</fpage>&#x2013;<lpage>352</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s004010050717</pub-id>, PMID: <pub-id pub-id-type="pmid">9341935</pub-id></citation>
</ref>
<ref id="ref72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Truong</surname> <given-names>K. P.</given-names></name> <name><surname>Poppe</surname> <given-names>R.</given-names></name> <name><surname>Heylen</surname> <given-names>D.</given-names></name></person-group> (<year>2010</year>). <article-title>A rule-based backchannel prediction model using pitch and pause information</article-title>. <source>Interspeech</source>, <fpage>3058</fpage>&#x2013;<lpage>3061</lpage>. doi: <pub-id pub-id-type="doi">10.21437/Interspeech.2010-59</pub-id></citation>
</ref>
<ref id="ref73">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>H.</given-names></name></person-group> (<year>2002</year>). <article-title>Changes of physiological function and clinical rational drug use in the elderly</article-title>. <source>Chin. Commun. Physician</source> <volume>18</volume>, <fpage>7</fpage>&#x2013;<lpage>8</lpage>.</citation>
</ref>
<ref id="ref74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>White</surname> <given-names>K. F.</given-names></name> <name><surname>Lutter</surname> <given-names>W. G.</given-names></name></person-group> (<year>2003</year>). <article-title>Behind the curtain: lessons learned from a wizard of Oz field experiment</article-title>. <source>SIGGROUP Bullet.</source> <volume>24</volume>, <fpage>129</fpage>&#x2013;<lpage>135</lpage>. doi: <pub-id pub-id-type="doi">10.1145/1052829.1052854</pub-id></citation>
</ref>
<ref id="ref75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wilkinson</surname> <given-names>C.</given-names></name> <name><surname>Cornish</surname> <given-names>K.</given-names></name></person-group> (<year>2018</year>). <article-title>An overview of participatory design applied to physical and digital product interaction for older people</article-title>. <source>Multimodal Technol. Interact.</source> <volume>2</volume>:<fpage>79</fpage>. doi: <pub-id pub-id-type="doi">10.3390/mti2040079</pub-id></citation>
</ref>
<ref id="ref76">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>Y.</given-names></name></person-group> (<year>1992</year>). <article-title>A study of speech regulation theory in sociolinguistics</article-title>. <source>Foreign Lang. Teach. Res.</source>, <fpage>18</fpage>&#x2013;<lpage>24+80</lpage>.</citation>
</ref>
<ref id="ref77">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>J</given-names></name> <name><surname>Liberman</surname> <given-names>M</given-names></name> <name><surname>Cieri</surname> <given-names>C</given-names></name></person-group>. (<year>2006</year>). <article-title>Towards an integrated understanding of speaking rate in conversation</article-title>. In <conf-name>Proceedings of Interspeech 2006 and 9th International conference on spoken language processing</conf-name>, <volume>1-5</volume>, <conf-loc>Baixas: ISCA-INT Speech Communication Assoc</conf-loc>, <fpage>541</fpage>&#x2013;<lpage>544</lpage>.</citation>
</ref>
<ref id="ref78">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Zen</surname> <given-names>H</given-names></name> <name><surname>Senior</surname> <given-names>A</given-names></name> <name><surname>Schuster</surname> <given-names>M</given-names></name></person-group>. (<year>2013</year>). <article-title>Statistical parametric speech synthesis using deep neural networks</article-title>. In <conf-name>IEEE International conference on acoustics, speech and signal processing (ICASSP)</conf-name>, <conf-loc>New York</conf-loc>, <fpage>7962</fpage>&#x2013;<lpage>7966</lpage>.</citation>
</ref>
<ref id="ref79">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <article-title>Research on emotional design of speech interaction in the elderly</article-title>. <source>Audio Eng.</source> <volume>45</volume>, <fpage>28</fpage>&#x2013;<lpage>30</lpage>.</citation>
</ref>
<ref id="ref80">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Huang</surname> <given-names>M.</given-names></name> <name><surname>Zhao</surname> <given-names>Z.</given-names></name> <name><surname>Ji</surname> <given-names>F.</given-names></name> <name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>Zhu</surname> <given-names>X.</given-names></name></person-group> (<year>2018</year>). <article-title>Memory-augmented dialogue management for task-oriented dialogue systems</article-title>. <source>arXiv</source>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1805.00150</pub-id></citation>
</ref>
<ref id="ref81">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>Interface usability for the elderly users in the past 10 years</article-title>. <source>Packag. Eng.</source> <volume>40</volume>, <fpage>217</fpage>&#x2013;<lpage>222</lpage>.</citation>
</ref>
<ref id="ref82">
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Ziman</surname> <given-names>R</given-names></name> <name><surname>Walsh</surname> <given-names>G</given-names></name></person-group>. (<year>2018</year>). <article-title>Factors affecting seniors' perceptions of voice-enabled user interfaces</article-title>. In <conf-name>Extended abstracts of the 2018 CHI conference on human factors in computing systems (CHI 2018)</conf-name>, <conf-loc>New York, NY: Assoc Computing Machinery</conf-loc>.</citation>
</ref>
</ref-list>
</back>
</article>