<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Sci.</journal-id>
<journal-title>Frontiers in Computer Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Sci.</abbrev-journal-title>
<issn pub-type="epub">2624-9898</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fcomp.2021.770492</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Computer Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Predicting Activation Liking of People With Dementia</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Steinert</surname> <given-names>Lars</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/884663/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Putze</surname> <given-names>Felix</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/129624/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>K&#x000FC;ster</surname> <given-names>Dennis</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/389729/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Schultz</surname> <given-names>Tanja</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/748491/overview"/>
</contrib>
</contrib-group>
<aff><institution>Cognitive Systems Lab, Department of Mathematics and Computer Science, University of Bremen</institution>, <addr-line>Bremen</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Youngjun Cho, University College London, United Kingdom</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Saturnino Luz, University of Edinburgh, United Kingdom; Emilie Brotherhood, University College London, United Kingdom</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Lars Steinert <email>lars.steinert&#x00040;uni-bremen.de</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Computer Science</p></fn></author-notes>
<pub-date pub-type="epub">
<day>07</day>
<month>01</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>3</volume>
<elocation-id>770492</elocation-id>
<history>
<date date-type="received">
<day>03</day>
<month>09</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>12</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Steinert, Putze, K&#x000FC;ster and Schultz.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Steinert, Putze, K&#x000FC;ster and Schultz</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions>
<abstract><p>Physical, social and cognitive activation is an important cornerstone in non-pharmacological therapy for People with Dementia (PwD). To support long-term motivation and well-being, activation contents first need to be perceived positively. Prompting for explicit feedback, however, is intrusive and interrupts the activation flow. Automated analyses of verbal and non-verbal signals could provide an unobtrusive means of recommending suitable contents based on implicit feedback. In this study, we investigate the correlation between engagement responses and self-reported activation ratings. Subsequently, we predict ratings of PwD based on verbal and non-verbal signals in an unconstrained care setting. Applying Long-Short-Term-Memory (LSTM) networks, we can show that our classifier outperforms chance level. We further investigate which features are the most promising indicators for the prediction of activation ratings of PwD.</p></abstract>
<kwd-group>
<kwd>dementia</kwd>
<kwd>activation</kwd>
<kwd>rating prediction</kwd>
<kwd>engagement</kwd>
<kwd>LSTM</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="49"/>
<page-count count="9"/>
<word-count count="6478"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Dementia describes a syndrome that is characterized by the loss of cognitive function and behavioral changes. This includes memory, language skills, and the ability to focus and pay attention (WHO, <xref ref-type="bibr" rid="B48">2017</xref>). It has been shown that the physical, social, and cognitive stimulation of People with Dementia (PwD) has significant positive effects on their cognitive functioning (Spector et al., <xref ref-type="bibr" rid="B43">2003</xref>; Woods et al., <xref ref-type="bibr" rid="B49">2012</xref>) and can lead to a higher quality of life (Schreiner et al., <xref ref-type="bibr" rid="B38">2005</xref>; Cohen-Mansfield et al., <xref ref-type="bibr" rid="B11">2011</xref>). It is furthermore often (implicitly) assumed that activation contents need to be perceived positively to help maintain long-term motivation and well-being. This can be supported by a recommender system that suggests appropriate activation contents. Here, an activation content is defined as a stimulus of a certain type (image gallery, video, audio, quiz, game, phrase or text) on a certain topic, e.g. gardening, sports, or animals, that is intended to cognitively, socially, or physically activate PwD and that aims for the general maintenance or enhancement of the corresponding functions (Clare and Woods, <xref ref-type="bibr" rid="B8">2004</xref>). However, prompting for explicit user feedback is intrusive as it disturbs the activation flow. Studies have shown that verbal and non-verbal signals can be promising indicators for the internal states of healthy individuals (Masip et al., <xref ref-type="bibr" rid="B27">2014</xref>; Tkal&#x0010D;i&#x0010D; et al., <xref ref-type="bibr" rid="B47">2019</xref>). Even PwD who might suffer from blunted affect or aphasia might remain able to provide verbal and non-verbal signals throughout all stages of the disease (Steinert et al., <xref ref-type="bibr" rid="B46">2021</xref>). 
For this study, we use the I-CARE dataset (Schultz et al., <xref ref-type="bibr" rid="B40">2018</xref>, <xref ref-type="bibr" rid="B41">2021</xref>) which consists of verbal and non-verbal signals of PwD who used a tablet-based activation system over multiple sessions in an unconstrained care setting. Previous studies have already investigated the recognition of engagement of PwD (Steinert et al., <xref ref-type="bibr" rid="B45">2020</xref>, <xref ref-type="bibr" rid="B46">2021</xref>), which is defined as &#x0201C;the act of being occupied or involved with an external stimulus&#x0201D; (Cohen-Mansfield et al., <xref ref-type="bibr" rid="B10">2009</xref>). Here, we explicitly consider the argument that activation contents should not only be engaging but also need to be perceived positively to maintain long-term motivation and well-being. In this study, we thus first investigate the correlation between engagement responses and self-reported activation ratings. Second, we analyze if self-reported activation ratings of PwD can be predicted based on verbal and non-verbal signals. Third, we explore the permutation-based feature importance of our classifier to generate hypotheses about possible underlying mechanisms. Last, we discuss the unique challenges involved with predicting activation ratings of elderly PwD. To the best of our knowledge, there are no prior studies that have investigated the prediction of activation ratings of PwD based on verbal and non-verbal signals.</p>
</sec>
<sec id="s2">
<title>2. Related Works</title>
<p>Research into the preservation of cognitive resources of PwD has a long history. A number of studies have investigated the effects of activation on perceived well-being, affect, engagement, and other affective states. However, detecting and interpreting the verbal and non-verbal signals of PwD can be particularly challenging due to the broad range of deleterious effects of aphasia or blunted affect on communication (Jones et al., <xref ref-type="bibr" rid="B19">2015</xref>; WHO, <xref ref-type="bibr" rid="B48">2017</xref>). In this section, we will (1) provide an overview of different non-pharmacological interventions that target the activation of PwD and (2) highlight relevant research into the production of (interpretable) verbal and non-verbal signals of PwD.</p>
<p>Over 20 years ago, Olsen et al. (<xref ref-type="bibr" rid="B32">2000</xref>) introduced &#x0201C;Media Memory Lane,&#x0201D; a system that provides nostalgic music and videos to elicit long-term memory stimulation for people with Alzheimer&#x00027;s Disease (AD). An evaluation of this system with 15 day care clients showed positive effects on engagement, affect, activity-related talking, and reduced fidgeting. Astell et al. (<xref ref-type="bibr" rid="B4">2010</xref>) evaluated the Computer Interactive Reminiscence and Conversation Aid (CIRCA) system, a touch screen system that presents photographs, music and video clips to enhance the interaction between PwD and caregivers. Their study demonstrated significant differences in verbal and non-verbal behavior when comparing the system with traditional reminiscence therapy sessions. Smith et al. (<xref ref-type="bibr" rid="B42">2009</xref>) produced audiovisual biographies based on photographs and personally meaningful music in cooperation with families of PwD. They further used a television set and a DVD player as a familiar interface for their participants. Several studies have also proposed music as a promising factor in non-pharmacological approaches (Spiro, <xref ref-type="bibr" rid="B44">2010</xref>). Accordingly, Riley et al. (<xref ref-type="bibr" rid="B36">2009</xref>) introduced a touch screen system that allows PwD to create music regardless of any prior musical knowledge. Evaluating the system in three pilot studies, the authors reported engagement in the activity for all participants. Manera et al. (<xref ref-type="bibr" rid="B26">2015</xref>) developed a tablet-based kitchen and cooking simulation for elderly people with mild cognitive impairment. After four weeks of training, most participants rated the experience as interesting, highly satisfying, and as eliciting more positive than negative emotions. 
Together, these findings underline the positive effects of non-pharmacological interventions for PwD, as well as for their (in)formal caregivers.</p>
<p>Asplund et al. (<xref ref-type="bibr" rid="B3">1995</xref>) investigated affect in the facial expressions of four participants with severe dementia during activities such as morning care or playing music. The authors compared unstructured judgements of facial expressions with assessments using the Facial Action Coding System [FACS, Ekman et al. (<xref ref-type="bibr" rid="B13">2002</xref>)] and showed that while facial cues become sparse and unclear, they are still interpretable to a certain degree. Mograbi et al. (<xref ref-type="bibr" rid="B29">2012</xref>) conducted a study with 22 participants with mild to moderate dementia who watched films for emotion elicitation. The authors manually annotated facial expressions, namely happiness, surprise, fear, sadness, disgust, anger, and contempt of the PwD and the controls. While they reported little difference in their production, PwD showed a narrower range of expressions which were less intense. This is in line with other studies that report that PwD may suffer from emotional blunting (Kumfor and Piguet, <xref ref-type="bibr" rid="B22">2012</xref>; Perugia et al., <xref ref-type="bibr" rid="B35">2020</xref>). To examine the quality and the decrease of emotional responses of PwD, Magai et al. (<xref ref-type="bibr" rid="B25">1996</xref>) conducted a study with 82 PwD with moderate or severe dementia and their families. Two research assistants were trained to manually code the participants&#x00027; affective behavior, namely interest, joy, sadness, anger, contempt, fear, disgust, and knit brow expressions. Their results suggest, however, that emotional expressivity may not vary much with the stage of the disease.</p>
<p>Another important modality for the recognition of affective states is speech (Schuller, <xref ref-type="bibr" rid="B39">2018</xref>). Nazareth (<xref ref-type="bibr" rid="B31">2019</xref>) demonstrated that lexical and acoustic features can be used to predict emotional valence in spontaneous speech of elderly. However, research has shown that speech also undergoes disease-related changes in dementia, e.g. impairments in the production of prosody (Roberts et al., <xref ref-type="bibr" rid="B37">1996</xref>; Horley et al., <xref ref-type="bibr" rid="B18">2010</xref>). This is particularly pertinent in frontotemporal dementia (Budson and Kowall, <xref ref-type="bibr" rid="B6">2011</xref>).</p>
<p>Overall, there seems to be no strong direct link between the ability to produce (interpretable) verbal and non-verbal signals of emotions and the stage of the disease. It rather appears to be a combination of multiple factors such as the dementia type, co-morbidities, medication, and personality. Also, the context seems to play a role. Lee et al. (<xref ref-type="bibr" rid="B23">2017</xref>) showed that social and verbal interactions increase positive emotional responses. Notably, even the merely implicit presence of a friend has been shown to be sufficient for eliciting this effect in healthy adults (Fridlund, <xref ref-type="bibr" rid="B16">1991</xref>). Thus, emotional expressiveness appears to be extremely sensitive to contextual factors, and PwD might stand to benefit from such factors.</p>
</sec>
<sec id="s3">
<title>3. Data Collection</title>
<sec>
<title>3.1. I-CARE System</title>
<p>The dataset used in this study was collected with the I-CARE system. I-CARE is a tablet-based activation system that is designed to be jointly used by PwD and (in)formal caregivers. The system is mobile and can be used at any location with an internet connection. It provides 346 user-specific activation contents (image galleries, videos, audios, quizzes, games, phrases and texts) on various topics such as gardening, sports, baking, or animals. The system also allows for the uploading of one&#x00027;s own contents to put more emphasis on biographical work (Schultz et al., <xref ref-type="bibr" rid="B40">2018</xref>, <xref ref-type="bibr" rid="B41">2021</xref>). At the same time, it allows for a multimodal data collection using the tablet&#x00027;s camera and microphone to capture video (30 FPS) and audio signals (16 kHz), respectively. The tablet used in the present work was either a Google Pixel C (10.2-inch display) or a Huawei MediaPad M5 (10.8-inch display). <xref ref-type="fig" rid="F1">Figure 1</xref> shows an example of what an activation session could look like.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The left figure shows two participants and one instructor from a project partner, who explains the procedure. The right figure shows two participants during an activation session (&#x000A9;AWO Karlsruhe).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-03-770492-g0001.tif"/>
</fig>
</sec>
<sec>
<title>3.2. Experimental Setting</title>
<p>The data collection for this study was conducted in different care facilities in Southern Germany as a part of the I-CARE project (Schultz et al., <xref ref-type="bibr" rid="B40">2018</xref>, <xref ref-type="bibr" rid="B41">2021</xref>). Participants of the study were PwD who fulfilled the clinical criteria for dementia according to the ICD-10 system (Alzheimer dementia, vascular dementia, frontotemporal dementia, Korsakoff&#x00027;s syndrome, or Dementia Not Otherwise Specified) ranging from mild to severe, and their (in)formal caregivers. All participants provided written consent and there was no financial compensation. For this study, a setup with minimal supervision and setup requirements was selected with activation sessions taking place in private rooms or in commonly used spaces in the care facilities. The tablet was placed on a stand in front of the participant with dementia so that their face was well-aligned with the field of view of the tablet camera.</p>
<p>At the beginning of each session, the system enquired about the daily well-being (&#x0201C;How are you today?&#x0201D;) of the PwD using a smiley rating scale (positive, neutral, negative). Next, the system&#x00027;s recommender system suggested four different activation items, based on interests, personal information of the PwD, and previous ratings. The system also provided the opportunity to search for specific contents and view an activation history. Next, the PwD chose the activation content, e.g. an image gallery on baking, a video on gardening and so on. After each activation, the system asked the PwD for a rating of how well they liked the activation (&#x0201C;Did you enjoy the content?&#x0201D;), again, on a smiley rating scale (positive, neutral, negative). <xref ref-type="fig" rid="F2">Figure 2</xref> shows the thumbnail images of four activation recommendations (left) and the rating options after the activation (right). Following the smiley rating, the system went directly back to the overview with recommended activation contents. Here, the PwD could decide whether or not to continue with another activation. Usually, activation sessions consisted of multiple individual activations.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>User interface of the I-CARE system. The left figure shows the activation recommendations (top left: memory game, top right: image gallery, bottom left: video, bottom right: phrase). The right figure illustrates the rating options after the activation.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-03-770492-g0002.tif"/>
</fig>
<p>The dataset used in this study consists of 187 activation sessions comprising 804 individual activations and, correspondingly, 804 activation ratings. These sessions cover 25 PwD (gender: 15 f, 10 m; age: 58&#x02013;95 years, <italic>M</italic>: 82.4 years, <italic>SD</italic>: 9.0 years; dementia stage: 8 mild-moderate, 5 severe, 12 unspecified). Individual participants contributed different numbers of sessions (<italic>M</italic> = 7.48, <italic>SD</italic> = 2.42, <italic>Min</italic> = 2, <italic>Max</italic> = 12).</p>
</sec>
</sec>
<sec sec-type="methods" id="s4">
<title>4. Methods</title>
<sec>
<title>4.1. Rating Measurement</title>
<p>Self-reported activation ratings of the PwD were collected using a smiley rating scale (positive, neutral, negative) at the end of each activation. <xref ref-type="fig" rid="F3">Figure 3</xref> shows the distribution of activation ratings for the participants individually and in total. The colors correspond to the rating (positive = green, neutral = yellow, negative = red). It is evident that activation contents were more frequently perceived as positive than neutral or negative by most participants. A Kruskal-Wallis test shows that these differences are statistically significant (<italic>H</italic> = 54.571, <italic>p</italic> &#x0003C; 0.001). Accordingly, investigating the class distribution across all participants provides a similar picture (positive = 68.23 %, neutral = 25.46 %, negative = 6.3 %). This demonstrates that the activation contents were mostly perceived positively.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Rating distribution for individual participants <bold>(left)</bold> and in total <bold>(right)</bold> grouped based on their dementia stage. Bar colors correspond to the ratings (positive = green; neutral = yellow; negative = red). Positive ratings significantly outweigh neutral and negative ratings (<italic>H</italic> = 54.571, <italic>p</italic> &#x0003C; 0.001).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-03-770492-g0003.tif"/>
</fig>
</sec>
<sec>
<title>4.2. Engagement Analysis</title>
<p>While effective activation contents are typically perceived as positive, not all positive contents are likely to be highly engaging. Furthermore, activation contents will only be effective in the long run if they succeed in engaging PwD. Thus, predicting engagement from verbal and non-verbal signals can be regarded as a separate challenge. As shown by previous work (Steinert et al., <xref ref-type="bibr" rid="B45">2020</xref>, <xref ref-type="bibr" rid="B46">2021</xref>), engagement can indeed be automatically recognized from verbal and non-verbal signals. Engagement in I-CARE was annotated retrospectively based on audio-visual data using the &#x0201C;Video Coding-Incorporating Observed Emotion&#x0201D; (VC-IOE) protocol (Jones et al., <xref ref-type="bibr" rid="B19">2015</xref>) by two independent raters. We computed Cohen&#x00027;s Kappa (&#x003BA;) between both raters after intensive training on six random test sessions to evaluate inter-rater reliability. The VC-IOE defines different engagement dimensions which were evaluated separately. These are emotional (&#x003BA; = 0.824), verbal (&#x003BA; = 0.783), visual (&#x003BA; = 0.887), behavioral (&#x003BA; = 0.745), and agitation (&#x003BA; = 0.941) <xref ref-type="fn" rid="fn0001"><sup>1</sup></xref>. To obtain the level of engagement for each activation content, we calculated an engagement score by summing up the number of positive engagement outcomes per dimension over all frames of an activation content, divided by the total number of frames covering that activation.</p>
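<p>As an illustration only, the engagement score described above could be sketched as follows. This is not the original annotation pipeline: the function name <monospace>engagement_score</monospace>, the per-frame dictionary layout, and the choice of which VC-IOE dimensions are counted as positive outcomes are assumptions made for this example.</p>
<preformat><![CDATA[
```python
from typing import Dict, Sequence

# Assumption: which dimensions count as "positive" outcomes is chosen here
# for illustration and is not specified in this form by the paper.
POSITIVE_DIMENSIONS = ("emotional", "verbal", "visual", "behavioral")


def engagement_score(
    frames: Sequence[Dict[str, bool]],
    dimensions: Sequence[str] = POSITIVE_DIMENSIONS,
) -> float:
    """Count positive outcomes per dimension over all frames of one
    activation and divide by the number of frames of that activation."""
    if not frames:
        raise ValueError("an activation must cover at least one frame")
    positives = sum(
        1 for frame in frames for dim in dimensions if frame.get(dim, False)
    )
    return positives / len(frames)


# Hypothetical annotation of four frames of a single activation.
example_frames = [
    {"emotional": True, "verbal": False, "visual": True, "behavioral": False},
    {"emotional": False, "verbal": False, "visual": True, "behavioral": False},
    {"emotional": False, "verbal": True, "visual": True, "behavioral": True},
    {"emotional": False, "verbal": False, "visual": False, "behavioral": False},
]
score = engagement_score(example_frames)
```
]]></preformat>
<p>Because several dimensions can be positive in the same frame, a score defined in this way is not bounded by one.</p>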
<p><xref ref-type="fig" rid="F4">Figure 4</xref> shows the distribution of engagement scores with regard to the self-reported activation ratings of the participants. A Kruskal-Wallis test demonstrated a statistically significant difference (<italic>H</italic> = 7.199, <italic>p</italic> &#x0003C; 0.05) in engagement scores between the negative (<italic>M</italic> = 0.75, <italic>SD</italic> = 0.56), the neutral (<italic>M</italic> = 0.78, <italic>SD</italic> = 0.51) and the positive class (<italic>M</italic> = 0.89, <italic>SD</italic> = 0.47), indicating a small effect: engagement was slightly higher for positively rated activations than for neutrally or negatively rated ones. Similarly, a Spearman rank correlation analysis (&#x003C1; = 0.094, <italic>p</italic> &#x0003C; 0.001) showed a significant but small correlation between the engagement score and the rating of individual activation contents.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Distribution of engagement scores with regard to the self-reported activation ratings (negative, neutral, positive). There is a statistically significant difference (<italic>H</italic> = 7.199, <italic>p</italic> &#x0003C; 0.05) in the group means between the negative (<italic>M</italic> = 0.75, <italic>SD</italic> = 0.56), the neutral (<italic>M</italic> = 0.78, <italic>SD</italic> = 0.51) and the positive class (<italic>M</italic> = 0.89, <italic>SD</italic> = 0.47). Outliers were not removed from further analyses. The &#x0002A; symbol indicates the arithmetic mean.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-03-770492-g0004.tif"/>
</fig>
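<p>For readers who want to reproduce a rank correlation of this kind, a minimal sketch of the Spearman coefficient is given below. It is an illustration under stated assumptions: the numeric coding of the ratings (0 = negative, 1 = neutral, 2 = positive) and the example values are hypothetical, ties receive average ranks, and no <italic>p</italic>-value is computed.</p>
<preformat><![CDATA[
```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: coded ratings and engagement scores of seven activations.
ratings = [0, 0, 1, 1, 2, 2, 2]
scores = [0.2, 0.9, 0.4, 0.8, 0.7, 1.1, 0.6]
rho = spearman_rho(ratings, scores)
```
]]></preformat>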
</sec>
<sec>
<title>4.3. Multimodal Features</title>
<p>Human affective behavior and signaling is multimodal by nature. Thus, it can only be fully interpreted by jointly considering information from different modalities (Pantic et al., <xref ref-type="bibr" rid="B33">2005</xref>). We argue that this is especially valid for PwD in an unconstrained care setting because PwD might suffer from aphasia or blunted affect (Kumfor and Piguet, <xref ref-type="bibr" rid="B22">2012</xref>; Perugia et al., <xref ref-type="bibr" rid="B35">2020</xref>). As individual channels begin to degrade, compensation by other channels is well-known to become more important. However, PwD may not only face greater challenges when decoding signals from their interaction partners (receiver role) but also with respect to clearly encoding their own socio-emotional signals in any individual channel (sender role). The Signal-to-Noise Ratio (SNR) can also be low for some modalities due to (multiple) background speakers, room reverberation or adverse lighting conditions. Accordingly, we use video-based features (OpenFace, OpenPose, and VGG-Face) and audio-based features (ComParE, DeepSpectrum) for the prediction of activation liking of PwD.</p>
<sec>
<title>4.3.1. Video</title>
<p>The face is arguably the most important non-verbal source for information about another person&#x00027;s affective states (Kappas et al., <xref ref-type="bibr" rid="B20">2013</xref>) and can provide information about affective states throughout all stages of dementia (see section 2). Here, we use the video signal captured with the tablet&#x00027;s camera to detect, align, and crop faces from the participants with dementia. From these pre-processed video frames, we extract facial features, namely the (binary scaled) presence of 18 and the (continuously scaled) intensity of 17 Action Units (AUs)<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> ranging from 0 to 5, the location and rotation of the head (head pose), and the direction of eye gaze in world coordinates using OpenFace 2.0 (Baltrusaitis et al., <xref ref-type="bibr" rid="B5">2018</xref>). In the same vein, we extract skeleton features using OpenPose (Cao et al., <xref ref-type="bibr" rid="B7">2019</xref>) to calculate relevant features, namely the distance between shoulders, eyes, ears, hands to nose, and the visibility of the hands. Last, we apply transfer learning using the pre-trained VGG-Face network (Parkhi et al., <xref ref-type="bibr" rid="B34">2015</xref>). We retrained the network for five epochs using the FER2013 dataset with stochastic gradient descent, a learning rate of 0.0001, and a momentum of 0.9. Next, all video frames are rescaled to 224 &#x000D7; 224 pixels to match the input size of the Convolutional Neural Network (CNN), and normalized by subtracting the mean. The feature vector for each video frame is then extracted from the <italic>fc6</italic> layer of the network. Overall, concatenating the feature vectors from all feature extractors leads to a 4138-dimensional feature vector for each video frame.</p>
</sec>
<sec>
<title>4.3.2. Audio</title>
<p>The recognition of affective states from speech is also a highly active research area (Ak&#x000E7;ay and O&#x001E7;uz, <xref ref-type="bibr" rid="B1">2020</xref>). While previous research has shown that speech undergoes disease-related changes in dementia, e.g. impairments in the production of prosody (Roberts et al., <xref ref-type="bibr" rid="B37">1996</xref>; Horley et al., <xref ref-type="bibr" rid="B18">2010</xref>), recent studies suggest that speech of PwD may still help to improve the automatic recognition of engagement (Steinert et al., <xref ref-type="bibr" rid="B46">2021</xref>). We first apply denoising on all raw audio files recorded with the tablet&#x00027;s microphone to remove stationary and non-stationary background sounds, and to enhance the participants&#x00027; speech (Defossez et al., <xref ref-type="bibr" rid="B12">2020</xref>). From the denoised audio, we extract the 2013 Interspeech Computational Paralinguistics Challenge feature set (ComParE) using OpenSMILE (Eyben et al., <xref ref-type="bibr" rid="B15">2010</xref>, <xref ref-type="bibr" rid="B14">2013</xref>). We extract frame-wise (60 ms frame size; 10 ms steps) frequency-, energy-, and spectral-related Low-Level Descriptors (LLD), which leads to a 130-dimensional feature vector (65 LLDs &#x0002B; deltas) for each step of 10 ms. Next, we create mel spectrograms using Hanning windows (512 samples size, 256 samples steps). We forward spectrograms (227 &#x000D7; 227 pixels, viridis colormap) to the pre-trained CNN AlexNet to receive bottleneck features from the <italic>fc7</italic> layer, which results in a 4096-dimensional feature vector (Amiriparian et al., <xref ref-type="bibr" rid="B2">2017</xref>).</p>
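<p>To make the framing parameters above concrete, the following sketch converts them into sample counts at the 16 kHz sampling rate and builds a Hann (Hanning) window. It is an illustration only and does not call OpenSMILE or DeepSpectrum; the helper names are hypothetical, and the frame count assumes that no padding is applied at the signal edges, which the actual toolkits may handle differently.</p>
<preformat><![CDATA[
```python
import math

SAMPLE_RATE_HZ = 16_000  # audio sampling rate reported for the tablet microphone


def ms_to_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return sample_rate_hz * duration_ms // 1000


def count_frames(n_samples, frame_len, hop_len):
    """Number of complete frames, assuming no padding (an assumption)."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop_len


def hann_window(size):
    """Symmetric Hann window with `size` coefficients."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]


lld_frame = ms_to_samples(60)  # 960 samples per LLD frame
lld_hop = ms_to_samples(10)    # 160 samples between LLD frames
lld_frames_per_second = count_frames(SAMPLE_RATE_HZ, lld_frame, lld_hop)
spec_frames_per_second = count_frames(SAMPLE_RATE_HZ, 512, 256)
window = hann_window(512)
```
]]></preformat>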
</sec>
</sec>
<sec>
<title>4.4. Data Pre-processing</title>
<p>To take interpersonal and intrapersonal variations into account, we scale each feature to a range between zero and one. We assume that the verbal and non-verbal signals from the time interval shortly before the rating are likely to be most diagnostic for the subsequent activation rating. Correspondingly, we consider the 30 s of verbal and non-verbal signals before the rating was provided. Next, we slice features into 1 s segments with 25 % overlap and assign each segment to the corresponding rating label. Due to the class imbalance (see <xref ref-type="fig" rid="F3">Figure 3</xref>), we combine the neutral and negative classes to formulate a two-class prediction problem. This seems reasonable as especially the prediction of positively perceived activation contents is relevant for an individual&#x00027;s well-being and motivation (Cohen-Mansfield, <xref ref-type="bibr" rid="B9">2018</xref>). These pre-processed and labeled feature sequences are then forwarded to the classifier.</p>
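<p>The pre-processing steps above could be sketched as follows. This is a simplified illustration rather than the original pipeline: the helper names are hypothetical, the scope over which the minimum and maximum are computed is an assumption, and a feature rate of 100 values per second (one value per 10 ms step) is used only to keep the arithmetic simple.</p>
<preformat><![CDATA[
```python
def min_max_scale(column):
    """Scale one feature to the range [0, 1]; constant features map to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def last_window(sequence, rate_hz, seconds=30):
    """Keep only the final `seconds` of a feature sequence before the rating."""
    n = int(rate_hz * seconds)
    return sequence[-n:]


def slice_segments(sequence, rate_hz, segment_s=1.0, overlap=0.25):
    """Cut a sequence into fixed-length segments that overlap by `overlap`."""
    seg_len = int(rate_hz * segment_s)
    step = max(1, int(seg_len * (1.0 - overlap)))
    return [
        sequence[start:start + seg_len]
        for start in range(0, len(sequence) - seg_len + 1, step)
    ]


def binarize(rating):
    """Merge the neutral and negative classes into one non-positive class."""
    return 1 if rating == "positive" else 0


RATE_HZ = 100                        # assumed feature rate for this example
feature = list(range(4000))          # hypothetical 40 s of one feature
scaled = min_max_scale(feature)
window = last_window(scaled, RATE_HZ)
segments = slice_segments(window, RATE_HZ)
labels = [binarize("neutral")] * len(segments)  # every segment inherits the rating
```
]]></preformat>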
</sec>
<sec>
<title>4.5. Prediction and Evaluation</title>
<p>The applied prediction approach is based on Long-Short-Term-Memory (LSTM) networks which allow for the preservation of temporal dependencies. This is especially important as verbal and non-verbal signals such as speech or facial expressions are subject to continuous change, especially in interactive activation sessions. Due to the different sampling rates of the feature sets of video and audio features (ComParE and DeepSpectrum), the classifier consists of three different input branches. Each input branch consists of a CNN layer (filter size = 256, 64, 256) followed by a MaxPooling layer (pool size = 3, 5, 3). Next, outputs are forwarded to an LSTM layer (units = 512, 64, 512). The three resulting context vectors are concatenated and passed to a Dense layer (units = 256) followed by the output layer (units = 2) with a Softmax activation function which outputs the class prediction. <xref ref-type="fig" rid="F5">Figure 5</xref> shows the proposed system architecture. For regularization, we use a dropout rate of 0.3 in the LSTM layers and after the concatenation layer. We train the model for 50 epochs with a batch size of 16. We use a cross-entropy loss function and Adam optimizer with a learning rate of 0.001. To retrieve the overall rating prediction from individual segments, we apply majority voting. We apply a session-independent model evaluation through 10-fold cross-validation on session level where individual folds contain multiple sessions (18&#x02013;19) and, thus, multiple activation ratings (67&#x02013;87) ranging from negative to positive. Based on this approach, the proposed system learns behavioral characteristics elicited through subjective activation likings of multiple participants for inference on unseen sessions. The performance of our approach is compared to chance level. We select Unweighted Average Precision, Recall and F1-Score as the evaluation metrics as they are particularly suitable for unevenly distributed classes. 
To test for statistical significance between our model and the baseline, i.e. chance level, we apply a McNemar Test.</p>
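<p>The majority voting step and the session-level fold assignment described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the tie-breaking rule, and the round-robin assignment of sessions to folds are assumptions made for illustration only.</p>
<preformat>
```python
from collections import Counter


def majority_vote(segment_predictions):
    """Return the most frequent class label among segment-level predictions.

    Ties are broken in favor of the label that appears first in the input,
    which is an arbitrary choice made for this sketch.
    """
    if not segment_predictions:
        raise ValueError("at least one segment prediction is required")
    counts = Counter(segment_predictions)
    best = max(counts.values())
    for label in segment_predictions:
        if counts[label] == best:
            return label


def session_level_folds(session_ids, n_folds=10):
    """Assign whole sessions to folds so that no session is split across folds.

    session_ids holds one session identifier per sample. Sessions are
    distributed round-robin (an assumption); the result is a list of
    n_folds lists containing sample indices.
    """
    unique_sessions = sorted(set(session_ids))
    fold_of_session = {s: i % n_folds for i, s in enumerate(unique_sessions)}
    folds = [[] for _ in range(n_folds)]
    for index, session in enumerate(session_ids):
        folds[fold_of_session[session]].append(index)
    return folds
```
</preformat>
<p>In this sketch, each fold serves once as the test set while the remaining folds are used for training, so every sample of a given session ends up on the same side of the split.</p>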
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Overview of our proposed prediction model.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-03-770492-g0005.tif"/>
</fig>
</sec>
<sec>
<title>4.6. Permutation-Based Feature Importance</title>
<p>Explainable artificial intelligence has become an important research field in recent years (Linardatos et al., <xref ref-type="bibr" rid="B24">2021</xref>). Understanding the mechanisms behind the predictions of black-box classifiers such as neural networks helps to interpret their output. Accordingly, we compute permutation-based feature importances to investigate how much individual features contribute to the prediction results (Molnar, <xref ref-type="bibr" rid="B30">2020</xref>). To this end, we break the association between individual features and labels by shuffling each feature sequence and adding random noise. For particularly relevant features, this should increase the model&#x00027;s prediction error, i.e., the cross-entropy loss (Kuhn and Johnson, <xref ref-type="bibr" rid="B21">2013</xref>; Molnar, <xref ref-type="bibr" rid="B30">2020</xref>). This is useful because it (1) provides insights into which verbal and non-verbal signals are relevant for predicting the activation rating/liking of PwD and allows for comparison with healthy individuals, and (2) can help reveal irrelevant features, which can then be removed to decrease model complexity and computational costs.</p>
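<p>The general idea of permutation-based feature importance can be sketched as follows. This is a simplified illustration and not the procedure used in this study: it shuffles each feature across samples without adding noise, it reports a positive value when the loss increases (the sign convention is an assumption), and <monospace>toy_model</monospace> is a hypothetical stand-in for a trained classifier.</p>
<preformat>
```python
import math
import random


def cross_entropy(probabilities, labels):
    """Mean negative log-likelihood of the true class."""
    eps = 1e-12
    total = 0.0
    for probs, label in zip(probabilities, labels):
        total -= math.log(max(probs[label], eps))
    return total / len(labels)


def permutation_importance(predict_proba, samples, labels, n_features, seed=0):
    """Relative change (in percent) of the loss after permuting each feature.

    predict_proba is any callable that maps a list of feature vectors to a
    list of class-probability lists. Positive values mean the loss increased.
    """
    rng = random.Random(seed)
    baseline = cross_entropy(predict_proba(samples), labels)
    changes = []
    for j in range(n_features):
        # Shuffle column j across samples to break its link to the labels.
        column = [row[j] for row in samples]
        rng.shuffle(column)
        permuted = [row[:j] + [value] + row[j + 1:]
                    for row, value in zip(samples, column)]
        loss = cross_entropy(predict_proba(permuted), labels)
        changes.append(100.0 * (loss - baseline) / baseline)
    return changes


def toy_model(rows):
    """Hypothetical two-class classifier that only looks at feature 0."""
    out = []
    for row in rows:
        p_pos = 1.0 / (1.0 + math.exp(-4.0 * row[0]))
        out.append([1.0 - p_pos, p_pos])
    return out
```
</preformat>
<p>Because the toy classifier ignores its second feature, permuting that feature leaves the loss unchanged, whereas permuting the first feature cannot lower the loss on data where its sign determines the label.</p>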
</sec>
</sec>
<sec id="s5">
<title>5. Results and Discussion</title>
<p><xref ref-type="table" rid="T1">Table 1</xref> shows the prediction results as the mean (<italic>M</italic>) and standard deviation (<italic>SD</italic>) of Precision, Recall, and F1-Score for each class individually and as an unweighted average over all folds. The model is most capable of correctly predicting the positive class. A possible explanation is the imbalance toward this class (see <xref ref-type="fig" rid="F3">Figure 3</xref>). The model might not have seen sufficient variation in the data to accurately predict neutral and negative activation ratings. We also assume that participants showed only rather subtle negative expressions due to the highly supportive social context (Lee et al., <xref ref-type="bibr" rid="B23">2017</xref>).</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Prediction results based on the session-independent 10-fold cross-validation on session level.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Class</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>F1-Score</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Pos.</td>
<td valign="top" align="center">0.726 (0.096)</td>
<td valign="top" align="center">0.754 (0.209)</td>
<td valign="top" align="center">0.729 (0.127)</td>
</tr>
<tr>
<td valign="top" align="left">Neu./Neg.</td>
<td valign="top" align="center">0.308 (0.224)</td>
<td valign="top" align="center">0.364 (0.277)</td>
<td valign="top" align="center">0.328 (0.238)</td>
</tr>
<tr>
<td valign="top" align="left">Unweighted avg.</td>
<td valign="top" align="center">0.517 (0.272)</td>
<td valign="top" align="center">0.559 (0.312)</td>
<td valign="top" align="center">0.528 (0.277)</td>
</tr>
<tr>
<td valign="top" align="left">Chance</td>
<td valign="top" align="center">0.342 (0.354)</td>
<td valign="top" align="center">0.500 (0.513)</td>
<td valign="top" align="center">0.405 (0.417)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Results are reported as M (SD) of Precision, Recall, and F1-Score for each class individually and as the unweighted average over all folds</italic>.</p>
</table-wrap-foot>
</table-wrap>
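<p>As an illustration of how unweighted (macro) averages relate to per-class scores, consider the following minimal sketch. The function names and the convention of returning 0 for undefined ratios are assumptions; the study&#x00027;s own evaluation code is not shown in this article. In Table 1, for example, the per-class Precision values 0.726 and 0.308 average to 0.517, which matches the reported unweighted average.</p>
<preformat>
```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Undefined ratios are set to 0.0 (an assumption for this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def unweighted_average(y_true, y_pred, labels):
    """Average the per-class scores with equal weight per class."""
    scores = [per_class_scores(y_true, y_pred, label) for label in labels]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```
</preformat>
<p>Because every class contributes equally regardless of how many samples it has, a weak minority class lowers the average as much as a weak majority class would.</p>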
<p>Overall, the prediction model significantly (&#x003C7;<sup>2</sup> = 4.91, <italic>p</italic> &#x0003C; 0.05) outperforms the baseline. Accordingly, verbal and non-verbal signals of PwD in different stages of the disease contain sufficient information for the prediction of activation ratings, despite the challenging recording conditions. The standard deviations indicate that performance fluctuates across folds. There are several possible explanations for this fluctuation. Participants in our study contributed substantially different numbers of sessions and, thus, different numbers of training samples (see section 3.2). As individual folds do not necessarily represent the overall data distribution, predictions can be based on a variable number of training samples of the same participant. The unstable recording conditions (background speakers, room reverberation, or lighting) throughout individual sessions might further increase the heterogeneity within folds. At the same time, this seems inevitable as the I-CARE system is designed for mobile usage. Thus, these results are not comparable to those obtained from clean and unambiguous data in laboratory studies with healthy individuals.</p>
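<p>For readers unfamiliar with the McNemar test, the following minimal sketch shows how a &#x003C7;<sup>2</sup> statistic and its <italic>p</italic>-value can be obtained from the two discordant counts of a paired comparison. Whether a continuity correction was applied in this study is not stated, so the default below is an assumption, and the function names are hypothetical.</p>
<preformat>
```python
import math


def chi2_sf_1df(x):
    """Survival function of the chi-squared distribution with 1 d.o.f."""
    return math.erfc(math.sqrt(x / 2.0))


def mcnemar_test(b, c, continuity_correction=True):
    """Chi-squared McNemar test on the two discordant counts.

    b: cases the model classified correctly and the baseline did not.
    c: cases the baseline classified correctly and the model did not.
    Returns the test statistic and the p-value (1 degree of freedom).
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    diff = abs(b - c)
    if continuity_correction:
        diff = max(diff - 1, 0)
    statistic = diff ** 2 / (b + c)
    return statistic, chi2_sf_1df(statistic)
```
</preformat>
<p>With one degree of freedom, a statistic of 4.91 lies between the critical values for <italic>p</italic> = 0.05 and <italic>p</italic> = 0.025, which agrees with the significance level reported above.</p>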
<p><xref ref-type="fig" rid="F6">Figure 6</xref> provides an overview of the permutation-based feature importance averaged over all folds. The y-axis indicates the percentage change when comparing the cross-entropy loss before and after permutation. The larger the negative change, the more important we consider the feature to be. The x-axis represents all 8364 feature candidates (see section 4.3). Video-based and DeepSpectrum features appear to be important for the prediction. Video-based features in particular have been found to be important predictors in other tasks, namely the investigation of music (Tkal&#x0010D;i&#x0010D; et al., <xref ref-type="bibr" rid="B47">2019</xref>) or image (Masip et al., <xref ref-type="bibr" rid="B27">2014</xref>) preferences. The curve progression further suggests that no individual features stand out. Instead, the model relies on the combination of different features. This finding could also be due to collinearity in the features, i.e., if one feature is permuted, the model can rely on a highly correlated neighbor.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Permutation-based feature importance averaged over all folds. The y-axis represents the percentage change when comparing the cross-entropy loss before and after permutation; the x-axis shows the feature candidates. The colors indicate the set the feature belongs to (Video = blue, ComParE = orange, DeepSpectrum = green).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-03-770492-g0006.tif"/>
</fig>
</sec>
<sec sec-type="conclusions" id="s6">
<title>6. Conclusion</title>
<p>The main goal of the current study was to determine whether activation ratings of PwD can be predicted in a real-life environment. We investigated a dataset collected with the I-CARE system from 25 PwD across all stages of the disease and showed that the contents provided by the system are mainly perceived positively, which can lead to more engagement and positive mood (Cohen-Mansfield, <xref ref-type="bibr" rid="B9">2018</xref>). Moreover, participants&#x00027; verbal and non-verbal signals contain sufficient information to predict their activation ratings. We could also show that, in line with studies on healthy individuals (Masip et al., <xref ref-type="bibr" rid="B27">2014</xref>; Tkal&#x0010D;i&#x0010D; et al., <xref ref-type="bibr" rid="B47">2019</xref>), the face remains an important source of information for inferring preferences. Interestingly, in our sample, there seems to be only a weak link between observed engagement and subjective activation liking. In general, this finding is consistent with prior reviews and meta-analyses, which have demonstrated only weak to moderate associations between subjective experience and different types of physiological or behavioral responses to emotion-eliciting stimuli in healthy adults (Mauss and Robinson, <xref ref-type="bibr" rid="B28">2009</xref>; Hollenstein and Lanteigne, <xref ref-type="bibr" rid="B17">2014</xref>). However, it is remarkable that (1) this relationship appears to be even further degraded among PwD and (2) machine learning approaches based on multimodal data may still succeed in predicting subjective ratings of PwD. At the same time, our approach still faces a number of limitations. A session-independent model evaluation presupposes that annotated samples of the participants exist. While user-independent modeling would be preferable for real-world application, this seems too ambitious with a small and heterogeneous dataset. As the presented results are not easily comparable to other studies, future work could also consider the assessments of the caregivers who were present. This could provide further information about the validity of our results. Despite these limitations, the present results make an important contribution to a thus far sparsely populated part of the field with regard to predicting activation liking of PwD.</p>
</sec>
<sec sec-type="data-availability" id="s7">
<title>Data Availability Statement</title>
<p>The datasets presented in this article are not readily available because they contain data from people with dementia. Requests to access the datasets should be directed to <email>lars.steinert&#x00040;uni-bremen.de</email>.</p>
</sec>
<sec id="s8">
<title>Ethics Statement</title>
<p>The studies involving human participants were reviewed and approved by the University of Bremen. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.</p>
</sec>
<sec id="s9">
<title>Author Contributions</title>
<p>LS conceived and designed the analyses, performed the analyses, and wrote the paper. FP conceived and designed the analyses, collected the data, and wrote the paper. DK conceived and designed the analyses and wrote the paper. TS conceived and designed the analyses, collected the data, and supervised the project. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="funding-information" id="s10">
<title>Funding</title>
<p>This work was partially funded by the Klaus-Tschira-Stiftung. Data collection and development of the I-CARE system were funded by the BMBF under reference BMBF-number V4PIDO62. We also gratefully acknowledge the support of the Leibniz ScienceCampus Bremen Digital Public Health (lsc-diph.de), which is jointly funded by the Leibniz Association (W4/2018), the Federal State of Bremen, and the Leibniz Institute for Prevention Research and Epidemiology&#x02014;BIPS.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ak&#x000E7;ay</surname> <given-names>M. B.</given-names></name> <name><surname>O&#x0011F;uz</surname> <given-names>K.</given-names></name></person-group> (<year>2020</year>). <article-title>Speech emotion recognition: emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers</article-title>. <source>Speech Commun</source>. <volume>116</volume>, <fpage>56</fpage>&#x02013;<lpage>76</lpage>. <pub-id pub-id-type="doi">10.1016/j.specom.2019.12.001</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Amiriparian</surname> <given-names>S.</given-names></name> <name><surname>Gerczuk</surname> <given-names>M.</given-names></name> <name><surname>Ottl</surname> <given-names>S.</given-names></name> <name><surname>Cummins</surname> <given-names>N.</given-names></name> <name><surname>Freitag</surname> <given-names>M.</given-names></name> <name><surname>Pugachevskiy</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Snore sound classification using image-based deep spectrum features,</article-title> in <source>Interspeech 2017</source> (<publisher-loc>Stockholm</publisher-loc>), <fpage>3512</fpage>&#x02013;<lpage>3516</lpage>. <pub-id pub-id-type="doi">10.21437/Interspeech.2017-434</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Asplund</surname> <given-names>K.</given-names></name> <name><surname>Jansson</surname> <given-names>L.</given-names></name> <name><surname>Norberg</surname> <given-names>A.</given-names></name></person-group> (<year>1995</year>). <article-title>Facial expressions of patients with dementia: A comparison of two methods of interpretation</article-title>. <source>Int. Psychogeriatr</source>. <volume>7</volume>, <fpage>527</fpage>&#x02013;<lpage>534</lpage>. <pub-id pub-id-type="doi">10.1017/S1041610295002262</pub-id><pub-id pub-id-type="pmid">8833276</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Astell</surname> <given-names>A. J.</given-names></name> <name><surname>Ellis</surname> <given-names>M. P.</given-names></name> <name><surname>Bernardi</surname> <given-names>L.</given-names></name> <name><surname>Alm</surname> <given-names>N.</given-names></name> <name><surname>Dye</surname> <given-names>R.</given-names></name> <name><surname>Gowans</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Using a touch screen computer to support relationships between people with dementia and caregivers</article-title>. <source>Interact. Comput</source>. <volume>22</volume>, <fpage>267</fpage>&#x02013;<lpage>275</lpage>. <pub-id pub-id-type="doi">10.1016/j.intcom.2010.03.003</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Baltrusaitis</surname> <given-names>T.</given-names></name> <name><surname>Zadeh</surname> <given-names>A.</given-names></name> <name><surname>Lim</surname> <given-names>Y. C.</given-names></name> <name><surname>Morency</surname> <given-names>L.-P.</given-names></name></person-group> (<year>2018</year>). <article-title>Openface 2.0: facial behavior analysis toolkit,</article-title> in <source>2018 13th IEEE International Conference on Automatic Face &#x00026; Gesture Recognition (FG 2018)</source> (<publisher-loc>Xi&#x00027;an: IEEE</publisher-loc>), <fpage>59</fpage>&#x02013;<lpage>66</lpage>. <pub-id pub-id-type="doi">10.1109/FG.2018.00019</pub-id><pub-id pub-id-type="pmid">27295638</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Budson</surname> <given-names>A. E.</given-names></name> <name><surname>Kowall</surname> <given-names>N. W.</given-names></name></person-group> (<year>2011</year>). <source>The Handbook of Alzheimer&#x00027;s Disease and Other Dementias</source>, Vol. <volume>7</volume>. <publisher-loc>Hoboken, NJ</publisher-loc>: <publisher-name>John Wiley &#x00026; Sons</publisher-name>. <pub-id pub-id-type="doi">10.1002/9781444344110</pub-id><pub-id pub-id-type="pmid">25855820</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cao</surname> <given-names>Z.</given-names></name> <name><surname>Hidalgo Martinez</surname> <given-names>G.</given-names></name> <name><surname>Simon</surname> <given-names>T.</given-names></name> <name><surname>Wei</surname> <given-names>S.</given-names></name> <name><surname>Sheikh</surname> <given-names>Y. A.</given-names></name></person-group> (<year>2019</year>). <article-title>Openpose: realtime multi-person 2d pose estimation using part affinity fields</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>43</volume>, <fpage>172</fpage>&#x02013;<lpage>186</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2019.2929257</pub-id><pub-id pub-id-type="pmid">31331883</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clare</surname> <given-names>L.</given-names></name> <name><surname>Woods</surname> <given-names>R. T.</given-names></name></person-group> (<year>2004</year>). <article-title>Cognitive training and cognitive rehabilitation for people with early-stage Alzheimer&#x00027;s disease: a review</article-title>. <source>Neuropsychol. Rehabil</source>. <volume>14</volume>, <fpage>385</fpage>&#x02013;<lpage>401</lpage>. <pub-id pub-id-type="doi">10.1080/09602010443000074</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohen-Mansfield</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Do reports on personal preferences of persons with dementia predict their responses to group activities?</article-title> <source>Dement. Geriatr. Cogn. Disord</source>. <volume>46</volume>, <fpage>100</fpage>&#x02013;<lpage>108</lpage>. <pub-id pub-id-type="doi">10.1159/000491746</pub-id><pub-id pub-id-type="pmid">30145591</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohen-Mansfield</surname> <given-names>J.</given-names></name> <name><surname>Dakheel-Ali</surname> <given-names>M.</given-names></name> <name><surname>Marx</surname> <given-names>M. S.</given-names></name></person-group> (<year>2009</year>). <article-title>Engagement in persons with dementia: the concept and its measurement</article-title>. <source>Am. J. Geriatr. Psychiatry</source> <volume>17</volume>, <fpage>299</fpage>&#x02013;<lpage>307</lpage>. <pub-id pub-id-type="doi">10.1097/JGP.0b013e31818f3a52</pub-id><pub-id pub-id-type="pmid">28214783</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohen-Mansfield</surname> <given-names>J.</given-names></name> <name><surname>Marx</surname> <given-names>M. S.</given-names></name> <name><surname>Thein</surname> <given-names>K.</given-names></name> <name><surname>Dakheel-Ali</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <article-title>The impact of stimuli on affect in persons with dementia</article-title>. <source>J. Clin. Psychiatry</source> <volume>72</volume>:<fpage>480</fpage>. <pub-id pub-id-type="doi">10.4088/JCP.09m05694oli</pub-id><pub-id pub-id-type="pmid">21527124</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Defossez</surname> <given-names>A.</given-names></name> <name><surname>Synnaeve</surname> <given-names>G.</given-names></name> <name><surname>Adi</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Real time speech enhancement in the waveform domain,</article-title> in <source>Interspeech</source> (<publisher-loc>Shanghai</publisher-loc>). <pub-id pub-id-type="doi">10.21437/Interspeech.2020-2409</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ekman</surname> <given-names>P.</given-names></name> <name><surname>Friesen</surname> <given-names>W. V.</given-names></name> <name><surname>Hager</surname> <given-names>J. C.</given-names></name></person-group> (<year>2002</year>). <source>Facial Action Coding System (FACS), 2nd Edn</source>. <publisher-loc>Salt Lake City, UT</publisher-loc>: <publisher-name>Research Nexus Division of Network Information Research Corporation</publisher-name>.</citation>
</ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Eyben</surname> <given-names>F.</given-names></name> <name><surname>Weninger</surname> <given-names>F.</given-names></name> <name><surname>Gro&#x000DF;</surname> <given-names>F.</given-names></name> <name><surname>Schuller</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>Recent developments in opensmile, the Munich open-source multimedia feature extractor,</article-title> in <source>MM &#x00027;13: Proceedings of the 21st ACM International Conference on Multimedia</source> (<publisher-loc>Barcelona</publisher-loc>). <pub-id pub-id-type="doi">10.1145/2502081.2502224</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Eyben</surname> <given-names>F.</given-names></name> <name><surname>W&#x000F6;llmer</surname> <given-names>M.</given-names></name> <name><surname>Schuller</surname> <given-names>B.</given-names></name></person-group> (<year>2010</year>). <article-title>Opensmile: the Munich versatile and fast open-source audio feature extractor,</article-title> in <source>Proceedings of the 18th ACM International Conference on Multimedia, MM &#x00027;10</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>1459</fpage>&#x02013;<lpage>1462</lpage>. <pub-id pub-id-type="doi">10.1145/1873951.1874246</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fridlund</surname> <given-names>A. J.</given-names></name></person-group> (<year>1991</year>). <article-title>Sociality of solitary smiling: potentiation by an implicit audience</article-title>. <source>J. Pers. Soc. Psychol</source>. <volume>60</volume>, <fpage>229</fpage>&#x02013;<lpage>240</lpage>. <pub-id pub-id-type="doi">10.1037/0022-3514.60.2.229</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hollenstein</surname> <given-names>T.</given-names></name> <name><surname>Lanteigne</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>Models and methods of emotional concordance</article-title>. <source>Biol. Psychol</source>. <volume>98</volume>, <fpage>1</fpage>&#x02013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1016/j.biopsycho.2013.12.012</pub-id><pub-id pub-id-type="pmid">24394718</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Horley</surname> <given-names>K.</given-names></name> <name><surname>Reid</surname> <given-names>A.</given-names></name> <name><surname>Burnham</surname> <given-names>D.</given-names></name></person-group> (<year>2010</year>). <article-title>Emotional prosody perception and production in dementia of the Alzheimer&#x00027;s type</article-title>. <source>J. Speech Lang. Hear. Res</source>. <volume>53</volume>, <fpage>1132</fpage>&#x02013;<lpage>1146</lpage>. <pub-id pub-id-type="doi">10.1044/1092-4388(2010/09-0030)</pub-id><pub-id pub-id-type="pmid">20643797</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>C.</given-names></name> <name><surname>Sung</surname> <given-names>B.</given-names></name> <name><surname>Moyle</surname> <given-names>W.</given-names></name></person-group> (<year>2015</year>). <article-title>Assessing engagement in people with dementia: a new approach to assessment using video analysis</article-title>. <source>Arch. Psychiatr. Nurs</source>. <volume>29</volume>, <fpage>377</fpage>&#x02013;<lpage>382</lpage>. <pub-id pub-id-type="doi">10.1016/j.apnu.2015.06.019</pub-id><pub-id pub-id-type="pmid">26577550</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kappas</surname> <given-names>A.</given-names></name> <name><surname>Krumhuber</surname> <given-names>E.</given-names></name> <name><surname>K&#x000FC;ster</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>). <article-title>Facial behavior,</article-title> in <source>Nonverbal Communication</source>, eds J. A. Hall and M. L. Knapp (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Mouton de Gruyter</publisher-name>), <fpage>131</fpage>&#x02013;<lpage>166</lpage>. <pub-id pub-id-type="doi">10.1515/9783110238150.131</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kuhn</surname> <given-names>M.</given-names></name> <name><surname>Johnson</surname> <given-names>K.</given-names></name></person-group> (<year>2013</year>). <source>Applied Predictive Modeling</source>, Vol. <volume>26</volume>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-1-4614-6849-3</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kumfor</surname> <given-names>F.</given-names></name> <name><surname>Piguet</surname> <given-names>O.</given-names></name></person-group> (<year>2012</year>). <article-title>Disturbance of emotion processing in frontotemporal dementia: a synthesis of cognitive and neuroimaging findings</article-title>. <source>Neuropsychol. Rev</source>. <volume>22</volume>, <fpage>280</fpage>&#x02013;<lpage>297</lpage>. <pub-id pub-id-type="doi">10.1007/s11065-012-9201-6</pub-id><pub-id pub-id-type="pmid">22577002</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>K. H.</given-names></name> <name><surname>Boltz</surname> <given-names>M.</given-names></name> <name><surname>Lee</surname> <given-names>H.</given-names></name> <name><surname>Algase</surname> <given-names>D. L.</given-names></name></person-group> (<year>2017</year>). <article-title>Does social interaction matter psychological well-being in persons with dementia?</article-title> <source>Am. J. Alzheimers Dis. Other Dement</source>. <volume>32</volume>, <fpage>207</fpage>&#x02013;<lpage>212</lpage>. <pub-id pub-id-type="doi">10.1177/1533317517704301</pub-id><pub-id pub-id-type="pmid">28417644</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Linardatos</surname> <given-names>P.</given-names></name> <name><surname>Papastefanopoulos</surname> <given-names>V.</given-names></name> <name><surname>Kotsiantis</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Explainable AI: a review of machine learning interpretability methods</article-title>. <source>Entropy</source> <volume>23</volume>:<fpage>18</fpage>. <pub-id pub-id-type="doi">10.3390/e23010018</pub-id><pub-id pub-id-type="pmid">33375658</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Magai</surname> <given-names>C.</given-names></name> <name><surname>Cohen</surname> <given-names>C.</given-names></name> <name><surname>Gomberg</surname> <given-names>D.</given-names></name> <name><surname>Malatesta</surname> <given-names>C.</given-names></name> <name><surname>Culver</surname> <given-names>C.</given-names></name></person-group> (<year>1996</year>). <article-title>Emotional expression during mid- to late-stage dementia</article-title>. <source>Int. Psychogeriatr</source>. <volume>8</volume>, <fpage>383</fpage>&#x02013;<lpage>395</lpage>. <pub-id pub-id-type="doi">10.1017/S104161029600275X</pub-id><pub-id pub-id-type="pmid">9116175</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Manera</surname> <given-names>V.</given-names></name> <name><surname>Petit</surname> <given-names>P.-D.</given-names></name> <name><surname>Derreumaux</surname> <given-names>A.</given-names></name> <name><surname>Orvieto</surname> <given-names>I.</given-names></name> <name><surname>Romagnoli</surname> <given-names>M.</given-names></name> <name><surname>Lyttle</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>&#x0201C;Kitchen and cooking,&#x0201D; a serious game for mild cognitive impairment and Alzheimer&#x00027;s disease: a pilot study</article-title>. <source>Front. Aging Neurosci</source>. <volume>7</volume>:<fpage>24</fpage>. <pub-id pub-id-type="doi">10.3389/fnagi.2015.00024</pub-id><pub-id pub-id-type="pmid">25852542</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Masip</surname> <given-names>D.</given-names></name> <name><surname>North</surname> <given-names>M. S.</given-names></name> <name><surname>Todorov</surname> <given-names>A.</given-names></name> <name><surname>Osherson</surname> <given-names>D. N.</given-names></name></person-group> (<year>2014</year>). <article-title>Automated prediction of preferences using facial expressions</article-title>. <source>PLoS ONE</source> <volume>9</volume>:<fpage>e87434</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0087434</pub-id><pub-id pub-id-type="pmid">24503553</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mauss</surname> <given-names>I. B.</given-names></name> <name><surname>Robinson</surname> <given-names>M. D.</given-names></name></person-group> (<year>2009</year>). <article-title>Measures of emotion: a review</article-title>. <source>Cogn. Emot</source>. <volume>23</volume>, <fpage>209</fpage>&#x02013;<lpage>237</lpage>. <pub-id pub-id-type="doi">10.1080/02699930802204677</pub-id><pub-id pub-id-type="pmid">19809584</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mograbi</surname> <given-names>D. C.</given-names></name> <name><surname>Brown</surname> <given-names>R. G.</given-names></name> <name><surname>Morris</surname> <given-names>R. G.</given-names></name></person-group> (<year>2012</year>). <article-title>Emotional reactivity to film material in Alzheimer&#x00027;s disease</article-title>. <source>Dement. Geriatr. Cogn. Disord</source>. <volume>34</volume>, <fpage>351</fpage>&#x02013;<lpage>359</lpage>. <pub-id pub-id-type="doi">10.1159/000343930</pub-id><pub-id pub-id-type="pmid">23222153</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Molnar</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <source>Interpretable Machine Learning</source>. <publisher-loc>Morrisville</publisher-loc>: <publisher-name>lulu.com</publisher-name>.</citation>
</ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Nazareth</surname> <given-names>D. S</given-names></name></person-group> (<year>2019</year>). <article-title>Emotion recognition in dementia: advancing technology for multimodal analysis of emotion expression in everyday life,</article-title> in <source>2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)</source> (<publisher-loc>Cambridge</publisher-loc>), <fpage>45</fpage>&#x02013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1109/ACIIW.2019.8925059</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Olsen</surname> <given-names>R. V.</given-names></name> <name><surname>Hutchings</surname> <given-names>B. L.</given-names></name> <name><surname>Ehrenkrantz</surname> <given-names>E.</given-names></name></person-group> (<year>2000</year>). <article-title>&#x0201C;Media memory lane&#x0201D; interventions in an Alzheimer&#x00027;s day care center</article-title>. <source>Am. J. Alzheimers Dis</source>. <volume>15</volume>, <fpage>163</fpage>&#x02013;<lpage>175</lpage>. <pub-id pub-id-type="doi">10.1177/153331750001500307</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Pantic</surname> <given-names>M.</given-names></name> <name><surname>Sebe</surname> <given-names>N.</given-names></name> <name><surname>Cohn</surname> <given-names>J. F.</given-names></name> <name><surname>Huang</surname> <given-names>T.</given-names></name></person-group> (<year>2005</year>). <article-title>Affective multimodal human-computer interaction,</article-title> in <source>Proceedings of the 13th Annual ACM International Conference on Multimedia, MULTIMEDIA &#x00027;05</source> (<publisher-loc>Singapore</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>669</fpage>&#x02013;<lpage>676</lpage>. <pub-id pub-id-type="doi">10.1145/1101149.1101299</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Parkhi</surname> <given-names>O. M.</given-names></name> <name><surname>Vedaldi</surname> <given-names>A.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep face recognition,</article-title> in <source>British Machine Vision Conference</source> (<publisher-loc>Swansea</publisher-loc>). <pub-id pub-id-type="doi">10.5244/C.29.41</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Perugia</surname> <given-names>G.</given-names></name> <name><surname>Diaz-Boladeras</surname> <given-names>M.</given-names></name> <name><surname>Catala</surname> <given-names>A.</given-names></name> <name><surname>Barakova</surname> <given-names>E. I.</given-names></name> <name><surname>Rauterberg</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>ENGAGE-DEM: a model of engagement of people with dementia</article-title>. <source>IEEE Trans. Affect. Comput</source>. <volume>1</volume>. <pub-id pub-id-type="doi">10.1109/TAFFC.2020.2980275</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Riley</surname> <given-names>P.</given-names></name> <name><surname>Alm</surname> <given-names>N.</given-names></name> <name><surname>Newell</surname> <given-names>A.</given-names></name></person-group> (<year>2009</year>). <article-title>An interactive tool to promote musical creativity in people with dementia</article-title>. <source>Comput. Hum. Behav</source>. <volume>25</volume>, <fpage>599</fpage>&#x02013;<lpage>608</lpage>. <pub-id pub-id-type="doi">10.1016/j.chb.2008.08.014</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roberts</surname> <given-names>V. J.</given-names></name> <name><surname>Ingram</surname> <given-names>S. M.</given-names></name> <name><surname>Lamar</surname> <given-names>M.</given-names></name> <name><surname>Green</surname> <given-names>R. C.</given-names></name></person-group> (<year>1996</year>). <article-title>Prosody impairment and associated affective and behavioral disturbances in Alzheimer&#x00027;s disease</article-title>. <source>Neurology</source> <volume>47</volume>, <fpage>1482</fpage>&#x02013;<lpage>1488</lpage>. <pub-id pub-id-type="doi">10.1212/WNL.47.6.1482</pub-id><pub-id pub-id-type="pmid">8960731</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schreiner</surname> <given-names>A. S.</given-names></name> <name><surname>Yamamoto</surname> <given-names>E.</given-names></name> <name><surname>Shiotani</surname> <given-names>H.</given-names></name></person-group> (<year>2005</year>). <article-title>Positive affect among nursing home residents with Alzheimer&#x00027;s dementia: the effect of recreational activity</article-title>. <source>Aging Ment. Health</source> <volume>9</volume>, <fpage>129</fpage>&#x02013;<lpage>134</lpage>. <pub-id pub-id-type="doi">10.1080/13607860412331336841</pub-id><pub-id pub-id-type="pmid">15804629</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schuller</surname> <given-names>B. W</given-names></name></person-group> (<year>2018</year>). <article-title>Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends</article-title>. <source>Commun. ACM</source> <volume>61</volume>, <fpage>90</fpage>&#x02013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1145/3129340</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Schultz</surname> <given-names>T.</given-names></name> <name><surname>Putze</surname> <given-names>F.</given-names></name> <name><surname>Schulze</surname> <given-names>T.</given-names></name> <name><surname>Steinert</surname> <given-names>L.</given-names></name> <name><surname>Mikut</surname> <given-names>R.</given-names></name> <name><surname>Doneit</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>2018</year>). <source>I-CARE - Ein Mensch-Technik Interaktionssystem zur Individuellen Aktivierung von Menschen mit Demenz</source> (<publisher-loc>Oldenburg</publisher-loc>).</citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schultz</surname> <given-names>T.</given-names></name> <name><surname>Putze</surname> <given-names>F.</given-names></name> <name><surname>Steinert</surname> <given-names>L.</given-names></name> <name><surname>Mikut</surname> <given-names>R.</given-names></name> <name><surname>Depner</surname> <given-names>A.</given-names></name> <name><surname>Kruse</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>I-CARE-an interaction system for the individual activation of people with dementia</article-title>. <source>Geriatrics</source> <volume>6</volume>:<fpage>51</fpage>. <pub-id pub-id-type="doi">10.3390/geriatrics6020051</pub-id><pub-id pub-id-type="pmid">34068284</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>K. L.</given-names></name> <name><surname>Crete-Nishihata</surname> <given-names>M.</given-names></name> <name><surname>Damianakis</surname> <given-names>T.</given-names></name> <name><surname>Baecker</surname> <given-names>R. M.</given-names></name> <name><surname>Marziali</surname> <given-names>E.</given-names></name></person-group> (<year>2009</year>). <article-title>Multimedia biographies: a reminiscence and social stimulus tool for persons with cognitive impairment</article-title>. <source>J. Technol. Hum. Serv</source>. <volume>27</volume>, <fpage>287</fpage>&#x02013;<lpage>306</lpage>. <pub-id pub-id-type="doi">10.1080/15228830903329831</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spector</surname> <given-names>A.</given-names></name> <name><surname>Thorgrimsen</surname> <given-names>L.</given-names></name> <name><surname>Woods</surname> <given-names>B.</given-names></name> <name><surname>Royan</surname> <given-names>L.</given-names></name> <name><surname>Davies</surname> <given-names>S.</given-names></name> <name><surname>Butterworth</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>Efficacy of an evidence-based cognitive stimulation therapy programme for people with dementia: randomised controlled trial</article-title>. <source>Brit. J. Psychiatry</source> <volume>183</volume>, <fpage>248</fpage>&#x02013;<lpage>254</lpage>. <pub-id pub-id-type="doi">10.1192/bjp.183.3.248</pub-id><pub-id pub-id-type="pmid">12948999</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spiro</surname> <given-names>N</given-names></name></person-group> (<year>2010</year>). <article-title>Music and dementia: observing effects and searching for underlying theories</article-title>. <source>Aging Ment. Health</source> <volume>14</volume>, <fpage>891</fpage>&#x02013;<lpage>899</lpage>. <pub-id pub-id-type="doi">10.1080/13607863.2010.519328</pub-id><pub-id pub-id-type="pmid">21069595</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Steinert</surname> <given-names>L.</given-names></name> <name><surname>Putze</surname> <given-names>F.</given-names></name> <name><surname>K&#x000FC;ster</surname> <given-names>D.</given-names></name> <name><surname>Schultz</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>Towards engagement recognition of people with dementia in care settings,</article-title> in <source>Proceedings of the 2020 International Conference on Multimodal Interaction</source> (<publisher-loc>Virtual Event</publisher-loc>), <fpage>558</fpage>&#x02013;<lpage>565</lpage>. <pub-id pub-id-type="doi">10.1145/3382507.3418856</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Steinert</surname> <given-names>L.</given-names></name> <name><surname>Putze</surname> <given-names>F.</given-names></name> <name><surname>K&#x000FC;ster</surname> <given-names>D.</given-names></name> <name><surname>Schultz</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>Audio-visual recognition of emotional engagement of people with dementia,</article-title> in <source>Proc. Interspeech 2021</source> (<publisher-loc>Brno</publisher-loc>), <fpage>1024</fpage>&#x02013;<lpage>1028</lpage>. <pub-id pub-id-type="doi">10.21437/Interspeech.2021-567</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tkal&#x0010D;i&#x0010D;</surname> <given-names>M.</given-names></name> <name><surname>Maleki</surname> <given-names>N.</given-names></name> <name><surname>Pesek</surname> <given-names>M.</given-names></name> <name><surname>Elahi</surname> <given-names>M.</given-names></name> <name><surname>Ricci</surname> <given-names>F.</given-names></name> <name><surname>Marolt</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Prediction of music pairwise preferences from facial expressions,</article-title> in <source>Proceedings of the 24th International Conference on Intelligent User Interfaces, IUI &#x00027;19</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>150</fpage>&#x02013;<lpage>159</lpage>. <pub-id pub-id-type="doi">10.1145/3301275.3302266</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="web"><person-group person-group-type="author"><collab>WHO</collab></person-group> (<year>2017</year>). <source>Dementia</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.who.int/news-room/fact-sheets/detail/dementia">https://www.who.int/news-room/fact-sheets/detail/dementia</ext-link> (accessed August 5, 2021).</citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Woods</surname> <given-names>B.</given-names></name> <name><surname>Aguirre</surname> <given-names>E.</given-names></name> <name><surname>Spector</surname> <given-names>A. E.</given-names></name> <name><surname>Orrell</surname> <given-names>M.</given-names></name></person-group> (<year>2012</year>). <article-title>Cognitive stimulation to improve cognitive functioning in people with dementia</article-title>. <source>Cochrane Database Syst. Rev</source>. <volume>2</volume>:<fpage>CD005562</fpage>. <pub-id pub-id-type="doi">10.1002/14651858.CD005562.pub2</pub-id><pub-id pub-id-type="pmid">22336813</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>The VC-IOE further proposes collective engagement as a dimension, defined as &#x0201C;Encouraging others to interact with STIMULUS. Introducing STIMULUS to others.&#x0201D; (Jones et al., <xref ref-type="bibr" rid="B19">2015</xref>). We interpreted &#x0201C;others&#x0201D; as third persons who did not originally take part in the session. Because collective engagement was not observed in this dataset, we excluded this dimension.</p></fn>
<fn id="fn0002"><p><sup>2</sup>AU01, AU02, AU04, AU05, AU06, AU07, AU09, AU10, AU12, AU14, AU15, AU17, AU20, AU23, AU25, AU26, AU45. For AU28, OpenFace only provides information about whether the AU is present.</p></fn>
</fn-group>
</back>
</article>