<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Aging Neurosci.</journal-id>
<journal-title>Frontiers in Aging Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Aging Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1663-4365</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnagi.2023.1122799</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Aging Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Learning implicit sentiments in Alzheimer&#x00027;s disease recognition with contextual attention features</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" equal-contrib="yes">
<name><surname>Liu</surname> <given-names>Ning</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1066710/overview"/>
</contrib>
<contrib contrib-type="author" equal-contrib="yes">
<name><surname>Yuan</surname> <given-names>Zhenming</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x02020;</sup></xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes">
<name><surname>Chen</surname> <given-names>Yan</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1693967/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Chuan</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Wang</surname> <given-names>Lingxing</given-names></name>
<xref ref-type="aff" rid="aff5"><sup>5</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/703125/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Science/School of Big Data Science, Zhejiang University of Science and Technology</institution>, <addr-line>Hangzhou</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>School of Information Science and Technology, Hangzhou Normal University, Hangzhou</institution>, <addr-line>Zhejiang</addr-line>, <country>China</country></aff>
<aff id="aff3"><sup>3</sup><institution>International Unresponsive Wakefulness Syndrome and Consciousness Science Institute, Hangzhou Normal University</institution>, <addr-line>Hangzhou</addr-line>, <country>China</country></aff>
<aff id="aff4"><sup>4</sup><institution>School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou</institution>, <addr-line>Fujian</addr-line>, <country>China</country></aff>
<aff id="aff5"><sup>5</sup><institution>Department of Neurology, Second Affiliated Hospital of Fujian Medical University, Quanzhou</institution>, <addr-line>Fujian</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Saturnino Luz, University of Edinburgh, United Kingdom</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Fasih Haider, University of Edinburgh, United Kingdom; Jianping Qiao, Shandong Normal University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Lingxing Wang <email>lxing502&#x00040;fjmu.edu.cn</email></corresp>
<fn fn-type="equal" id="fn001"><p>&#x02020;These authors have contributed equally to this work</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>17</day>
<month>05</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>15</volume>
<elocation-id>1122799</elocation-id>
<history>
<date date-type="received">
<day>13</day>
<month>12</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>05</day>
<month>04</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Liu, Yuan, Chen, Liu and Wang.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Liu, Yuan, Chen, Liu and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<sec>
<title>Background</title>
<p>Alzheimer&#x00027;s disease (AD) is difficult to diagnose on the basis of language because the emotion in transcripts is implicit. We frame the task as supervised fuzzy implicit emotion classification at the document level. Recent neural network-based approaches have not paid attention to the implicit sentiments entailed in AD transcripts.</p>
</sec>
<sec>
<title>Method</title>
<p>A two-level attention mechanism is proposed to detect deep semantic information at the word and sentence levels, which enables the model to attend differentially to more and less important words and sentences when constructing the document representation. Specifically, a document vector was built by progressively aggregating important words into sentence vectors and important sentences into a document vector.</p>
</sec>
<sec>
<title>Results</title>
<p>Experimental results showed that our method achieved the best accuracy of 91.6% on annotated public Pitt corpora, which validates the effectiveness of our model in learning implicit sentiment representations.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>The proposed model can qualitatively select informative words and sentences using attention layers, and this method also provides good inspiration for AD diagnosis based on implicit sentiment transcripts.</p>
</sec></abstract>
<kwd-group>
<kwd>Alzheimer&#x00027;s disease</kwd>
<kwd>attention</kwd>
<kwd>deep learning</kwd>
<kwd>feature extraction</kwd>
<kwd>machine learning</kwd>
</kwd-group>
<contract-sponsor id="cn001">Natural Science Foundation of Fujian Province<named-content content-type="fundref-id">10.13039/501100003392</named-content></contract-sponsor>
<counts>
<fig-count count="5"/>
<table-count count="5"/>
<equation-count count="18"/>
<ref-count count="66"/>
<page-count count="12"/>
<word-count count="8091"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Alzheimer&#x00027;s Disease and Related Dementias</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1. Introduction</title>
<p>Alzheimer&#x00027;s disease (AD) is a progressive and irreversible degeneration of the brain (Mattson, <xref ref-type="bibr" rid="B33">2004</xref>), and early diagnosis and intervention are essential as there is currently no effective method to cure AD. A previous study (Mueller et al., <xref ref-type="bibr" rid="B37">2018</xref>) showed that the first sign of the disease is the deterioration of language; therefore, early diagnosis based on language has gradually become a research hotspot. With the development of artificial intelligence (AI), natural language processing (NLP), and machine learning technology, diagnosing AD with these technologies has become possible, and language-based AI technology may be used as a preliminary diagnostic tool for people with cognitive impairment. In NLP terms, this is a text classification problem.</p>
<p>Emotion recognition (text classification) can be classified into three levels according to previous studies (Medhat et al., <xref ref-type="bibr" rid="B34">2014</xref>; Yadollahi et al., <xref ref-type="bibr" rid="B62">2017</xref>), namely, the aspect, sentence, and document levels (Xu et al., <xref ref-type="bibr" rid="B61">2015</xref>; Yadollahi et al., <xref ref-type="bibr" rid="B62">2017</xref>), as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. Meanwhile, texts at the document level can be classified as explicit or implicit emotions. Explicit sentiment refers to the obvious emotional words used to express sentiment polarity, and the classification model can extract these key emotional words and provide a large weight to perform the classification task accurately. Unlike explicit expressions, implicit sentiment analysis indicates that the sentences have no obvious emotional words but can still convey a clear sentiment polarity in the context (Russo et al., <xref ref-type="bibr" rid="B46">2015</xref>). The model cannot extract these important emotional words for text classification correctly, which may lead to worse classification performance.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Classification of emotional recognition (blue is the character of the transcripts in this study).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnagi-15-1122799-g0001.tif"/>
</fig>
<p>Reviews of explicit and implicit sentiments are presented in <xref ref-type="table" rid="T1">Table 1</xref>. In explicit expression, words such as &#x0201C;lovely&#x0201D;, &#x0201C;beautiful&#x0201D;, &#x0201C;bad&#x0201D;, and &#x0201C;like&#x0201D; have an obvious feeling tendency that can be captured toward a particular aspect by the classification model. Implicit sentiments may express emotions that cannot be easily found, such as irony, anger, and depression. According to a previous study (Xu et al., <xref ref-type="bibr" rid="B61">2015</xref>), approximately 30% of reviews contain implicit aspects of emotional classification. For example, the sentence &#x0201C;We cannot bite the dog anymore when bitten by a mad dog&#x0201D; obviously expresses a sense of irony and negativity. &#x0201C;Sales of your company in a year cannot match us for a month&#x0201D; also expresses a negative meaning that indicates a poor sale. &#x0201C;The waiter poured water over me and walked away&#x0201D; means poor service, and although it contains no opinion words, it can be clearly interpreted as negative. These sentences must extract deep semantic information to be correctly classified. However, the text in this study is clearly different from explicit and implicit expressions as it does not have any emotional words or tendencies. An example of our transcripts is presented below.</p>
<disp-quote><p><italic>The scene is in the in the kitchen. The mother is wiping dishes and the water is running on the floor, a child is trying to get a boy is trying to get cookies outta out a jar and he&#x00027;s about to tip over on a stool. The little girl is reacting to his falling, it seems to be summer out, the window is open</italic>.</p></disp-quote>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Reviews containing explicit and implicit sentiments.</p></caption>
<table frame="box" rules="all">
<tbody>
<tr>
<td valign="top" align="left" rowspan="3"><bold>Explicit expression</bold></td>
<td valign="top" align="left">What a <bold>lovely</bold> girl!</td>
</tr>
<tr>
<td valign="top" align="left">It&#x00027;s a <bold>beautiful</bold> day. I <bold>like</bold> it.</td>
</tr>
<tr>
<td valign="top" align="left">The service of this hotel is <bold>bad</bold>, I must complain.</td>
</tr>
<tr>
<td valign="top" align="left" rowspan="3"><bold>Implicit expression</bold></td>
<td valign="top" align="left">Yeah, we can&#x00027;t bite the dog anymore when bitten by a mad dog.</td>
</tr>
<tr>
<td valign="top" align="left">Sales of your company in a year cannot match us for a month.</td>
</tr>
<tr>
<td valign="top" align="left">The waiter poured water over me and walked away</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>Bold means obvious emotional words.</p>
</table-wrap-foot>
</table-wrap>
<p>The text above is an example from our dataset; it contains no emotional words, only a description of a picture. The well-known Boston Diagnostic Aphasia Examination (Chen et al., <xref ref-type="bibr" rid="B6">2019</xref>) was used for AD diagnosis. However, our text is an implicit expression that does not convey a clear sentiment polarity in context, and even humans cannot judge its emotional polarity. We call texts with these characteristics &#x0201C;fuzzy emotions&#x0201D;. In contrast, when humans can judge the emotional polarity of a text despite its implicit expression, we call it &#x0201C;obvious emotion&#x0201D; in the implicit document. Fuzzy emotional document classification includes unsupervised, supervised, and semi-supervised methods. In this study, classifying transcripts of voice recordings for AD diagnosis is a supervised fuzzy implicit emotion classification task at the document level. Sentiment analysis classification is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<p>The long implicit transcripts classified in this study lack emotional words and context-dependent features. Compared with explicit classification, classifying fuzzy implicit text is more difficult because it lacks obvious emotional words and polarity, which makes it hard for a deep learning model to extract effective features from the transcripts, although extracting such features is essential for AD diagnosis. In this study, a classification model combining word-level and sentence-level attention mechanisms was designed in view of the dependence of implicit expression on contextual content. Not all words and sentences in the text are equally relevant to the final classification, and previous deep learning models paid little attention to the differing importance of words and sentences. Specifically, a bidirectional gated recurrent unit (GRU) was used to obtain vectors from the transcript, and an attention mechanism at the word and sentence levels was used to extract deep semantic features for better representation. Experiments showed that the accuracy on the public Pitt datasets with five-fold cross-validation was 91.6%, which is competitive with other similar studies.</p>
</sec>
<sec id="s2">
<title>2. Related work</title>
<sec>
<title>2.1. Implicit sentiment classification</title>
<p>Many studies have mentioned the presence of implicit sentiments in text classification. For example, Toprak et al. (<xref ref-type="bibr" rid="B53">2010</xref>) and Russo et al. (<xref ref-type="bibr" rid="B46">2015</xref>) proposed implicit polarity (polar facts) and provided a corpus with an implicit sentiment. Choi and Wiebe (<xref ref-type="bibr" rid="B9">2014</xref>) proposed a &#x0002B;/- EffectWordNet lexicon to recognize implicit sentiment, assuming that sentiment analysis was related to states and events which had a positive or negative effect on the entity. Deng and Wiebe (<xref ref-type="bibr" rid="B12">2014</xref>) detected implicit sentiment via inference over explicit expressions and the so-called goodFor/badFor events. Memory networks (Tang et al., <xref ref-type="bibr" rid="B51">2016</xref>; Chen et al., <xref ref-type="bibr" rid="B7">2017</xref>; Wang et al., <xref ref-type="bibr" rid="B58">2018</xref>), graph neural networks (Sun et al., <xref ref-type="bibr" rid="B50">2019</xref>; Zhang et al., <xref ref-type="bibr" rid="B66">2019</xref>; Wang et al., <xref ref-type="bibr" rid="B57">2020</xref>), and pretrained knowledge (Xu et al., <xref ref-type="bibr" rid="B60">2019</xref>; Rietzler et al., <xref ref-type="bibr" rid="B42">2020</xref>; Dai et al., <xref ref-type="bibr" rid="B11">2021</xref>) were all used to capture aspect-related information from the text. Meanwhile, some studies used the attention mechanism, which was first proposed by Bahdanau et al. (<xref ref-type="bibr" rid="B2">2014</xref>) for machine translation, to extract implicit sentiment. It usually has better performance as it can extract the importance of different parts in texts. For example, a study by He et al. (<xref ref-type="bibr" rid="B17">2018</xref>) used syntax information from a dependency tree to enhance the attention-based model. The studies by Toprak et al. (<xref ref-type="bibr" rid="B53">2010</xref>) and Zehra et al. 
(<xref ref-type="bibr" rid="B65">2021</xref>) used different attention mechanisms to identify aspect-related contexts. In the study by He et al. (<xref ref-type="bibr" rid="B17">2018</xref>), two methods were proposed to improve attention effectiveness. First, they introduced an attention model that incorporates syntactic information into the attention mechanism. Second, they proposed a method for target representation that could better capture the semantic meaning of the opinion target. In a study by Tang et al. (<xref ref-type="bibr" rid="B52">2020</xref>), a dependency graph enhanced dual-transformer network was proposed, in which a dual-transformer structure supports the reinforcement of graph-based representation learning. Ma et al. (<xref ref-type="bibr" rid="B30">2017</xref>) proposed an interactive attention network to learn the relationship between contexts and targets, which is mainly based on the concept that both contexts and targets should be treated specifically. Wang et al. (<xref ref-type="bibr" rid="B59">2016</xref>) proposed an attention-based long short-term memory (LSTM) network for aspect-level text classification and obtained state-of-the-art performance on SemEval 2014 datasets. However, these studies all address implicit classification with obvious emotions, and to the best of our knowledge, there are no studies of fuzzy implicit emotion classification other than those in the AD diagnosis area.</p>
</sec>
<sec>
<title>2.2. AD diagnosis based on speech and its transcripts</title>
<p>There are three main methods to distinguish AD and mild cognitive impairment (MCI) from normal controls (NC) in this area. The first uses traditional machine learning in combination with manual feature extraction, which requires professional knowledge to extract effective features. Although this method is more interpretable, its performance is only moderate. The second approach uses deep learning models to recognize AD and MCI, and its performance is usually better than that of the first method. However, its interpretability is worse because deep learning is a &#x0201C;black box&#x0201D; and it is difficult to understand the meaning of the automatically extracted features. The third approach combines the first two methods and may further improve the performance of deep learning. It highlights the important linguistic or phonetic features in participants&#x00027; language description tasks, which may provide significant guidance for the clinical diagnosis of AD.</p>
<p>The first method relies on manually extracted conventional, phonetic, and linguistic features. For example, the study by Luz (<xref ref-type="bibr" rid="B26">2017</xref>), to the best of our knowledge, was the first to analyze speech data exclusively, without transcripts. It extracted low-level acoustic features, such as speech rate, vocalization events, and the number of utterances, from the recordings, trained Bayesian classifiers on them, and achieved 68% accuracy in classifying AD and elderly controls. Fraser et al. (<xref ref-type="bibr" rid="B14">2016</xref>) extracted 42 mel-frequency cepstral coefficient (MFCC) features (Chen et al., <xref ref-type="bibr" rid="B5">2014</xref>) from the Pitt datasets, and theirs was the first study to carry out an acoustic-prosodic analysis. Another study by Roark et al. (<xref ref-type="bibr" rid="B44">2011</xref>) employed automatic speech recognition (ASR) and natural language processing (NLP) to classify MCI and healthy participants; the extracted features included pause frequency and duration. Finally, the SVM classifier obtained the best AUC of 0.861 by combining linguistic features, automated speech, and cognitive test scores. Jarrold et al. (<xref ref-type="bibr" rid="B20">2014</xref>) extracted 41 features, including the mean and standard deviation of the duration of pauses, speech rate, and consonants and vowels. The datasets included nine AD patients, 13 semantic dementia patients, nine healthy controls, nine frontotemporal dementia patients, and eight progressive nonfluent aphasia patients. Zehra et al. 
(<xref ref-type="bibr" rid="B65">2021</xref>) extracted speech rate (Luz, <xref ref-type="bibr" rid="B27">2013</xref>) and graph-based features by encoding patterns from the Carolina Conversations Collection (Pope and Davis, <xref ref-type="bibr" rid="B40">2011</xref>) and used a logistic regression classifier to obtain an accuracy of 85% when distinguishing AD from non-AD participants. Toth et al. (<xref ref-type="bibr" rid="B54">2018</xref>) found that pauses could not be detected reliably by human annotators, whereas using an ASR system improved the effectiveness. They analyzed the speech of 48 MCI patients and 38 healthy controls (HC) and extracted acoustic features such as the length of utterance, hesitation ratio, filled pauses, and speech tempo. Finally, ASR-extracted features in combination with a random forest classifier yielded the best results (75% accuracy). Antonsson et al. (<xref ref-type="bibr" rid="B1">2021</xref>) quantitatively measured semantic ability, used the Support Vector Machine (SVM) classifier to recognize AD, and finally obtained the best area under the curve (AUC) of 0.93. Clarke et al. (<xref ref-type="bibr" rid="B10">2013</xref>) measured 286 linguistic features to train the SVM classifier, and the final accuracy obtained was 50&#x02013;78% for MCI vs. HC, 59&#x02013;90% for AD vs. HC, and 62&#x02013;78% for AD&#x0002B;MCI vs. HC. Meanwhile, the study found that the speech task impacts the accuracy of AD detection more than the length of the sample. R&#x00027;mani and James (<xref ref-type="bibr" rid="B43">2021</xref>) investigated the use of linguistic features together with x-vector and i-vector methods (Snyder et al., <xref ref-type="bibr" rid="B49">2018</xref>), phonetic features originally devised for speaker identification, for AD detection and achieved 85.4% accuracy with random forests and SVM. Shamila et al. 
(<xref ref-type="bibr" rid="B47">2021</xref>) used the Carolinas Conversations Collection Classification Model (Pope and Davis, <xref ref-type="bibr" rid="B40">2011</xref>), investigated conversational features such as pauses, dysfluencies, overlaps, and other elements for AD detection, and finally achieved the best accuracy of 90% in Alzheimer&#x00027;s Dementia Recognition through Spontaneous Speech (ADReSS) datasets. Zehra et al. (<xref ref-type="bibr" rid="B65">2021</xref>) developed acoustic and linguistic features by combining a regularized logistic regression classifier, achieving an accuracy of 85.4% on DementiaBank datasets.</p>
<p>Deep learning models used for AD recognition in the second method include Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), LSTM, Transformer, and BERT. For instance, in the study by Fritsch et al. (<xref ref-type="bibr" rid="B15">2019</xref>), the n-gram language model was enhanced by creating a neural network language model with LSTM, which obtained an accuracy of 85.6%. A study by Chen et al. (<xref ref-type="bibr" rid="B6">2019</xref>) proposed a network based on the attention mechanism composed of GRU and CNN modules and obtained a state-of-the-art accuracy of 97% in distinguishing individuals with AD from NC. Balagopalan et al. (<xref ref-type="bibr" rid="B3">2021</xref>) used a pretrained BERT model to distinguish AD from NC on the ADReSS datasets and achieved an accuracy of 83.33%, thus outperforming manually extracted acoustic and linguistic features. Guo et al. (<xref ref-type="bibr" rid="B16">2021</xref>) trained a BERT model on DementiaBank and ADReSS datasets of different sizes and demonstrated that larger datasets yield better performance than smaller ones. Meghanani et al. (<xref ref-type="bibr" rid="B35">2021</xref>) compared two approaches for AD recognition: one employed the fastText model and the other used the CNN model. The fastText model outperformed the CNN model and achieved the best classification accuracy of 83.3%.</p>
<p>The third method combines the advantages of the first two methods: deep learning models combined with manually extracted acoustic or linguistic features can further improve model performance. For example, the champion of the 2020 challenge at Interspeech, the world&#x00027;s premier conference on speech research (Yuan et al., <xref ref-type="bibr" rid="B64">2020</xref>), combined the Baidu ERNIE model and pause information with three different sizes (extracted with Penn Phonetics Lab Forced Aligner) and achieved the best accuracy of 89.6%. From this study, we can conclude that pause is an important and distinguishing feature for AD recognition. Pranav and Veeky (<xref ref-type="bibr" rid="B41">2021</xref>) employed a deep learning model in combination with acoustic and linguistic features on ADReSS (78 AD vs. 78 HC) datasets and DementiaBank datasets, respectively. The model that combines linguistic features performed better than the model that combines acoustic features, with accuracies of 88% and 73%, respectively. This method, to the best of our knowledge, is the most promising research direction for the future.</p>
</sec>
</sec>
<sec id="s3">
<title>3. Attention network</title>
<sec>
<title>3.1. GRU-based sequence encoder</title>
<p>GRU is a variant of LSTM (Hochreiter and Schmidhuber, <xref ref-type="bibr" rid="B19">1997</xref>) that mitigates the problem of vanishing or exploding gradients in recurrent neural networks while preserving the long-term memory ability of LSTM and simplifying its structure. GRU can capture the dependence of words in sentences and hence is widely used in text classification, machine translation, and other tasks. GRU mainly includes two types of gates: the update gate and the reset gate. The update gate takes over the roles of the forget gate and the input gate in LSTM, and the reset gate controls how much of the previous hidden state is used when computing the new candidate state.</p>
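<p>As a minimal sketch of the two gates described above, the following pure-Python code performs a single GRU update. It is an illustration under stated assumptions rather than the implementation used in this study: the names gru_step, affine, and init_params, the weight keys, the dimensions, and the random initialization are all hypothetical, and the final interpolation follows one common convention in which the update gate weights the candidate state.</p>
<preformat>
```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(wij * xj for wij, xj in zip(W[i], x))
        + sum(uij * hj for uij, hj in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def gru_step(x, h_prev, p):
    """One GRU update; p holds hypothetical weights Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh."""
    # Update gate: how much of the state is overwritten by the candidate.
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h_prev, p["bz"])]
    # Reset gate: how much of the previous state feeds the candidate.
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h_prev, p["br"])]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated, p["bh"])]
    # Interpolate between the old state and the candidate (assumed convention).
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]


def init_params(input_dim, hidden_dim, seed=0):
    """Random toy weights; a trained model would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in "zrh":
        p["W" + gate] = mat(hidden_dim, input_dim)
        p["U" + gate] = mat(hidden_dim, hidden_dim)
        p["b" + gate] = [0.0] * hidden_dim
    return p


params = init_params(input_dim=3, hidden_dim=2)
h = [0.0, 0.0]
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    h = gru_step(x, h, params)
```
</preformat>
<p>Under this convention, an update gate close to zero carries the previous state forward almost unchanged, which is how the unit can retain long-range information.</p>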
</sec>
<sec>
<title>3.2. Model structure</title>
<p>The attention mechanism (Vaswani et al., <xref ref-type="bibr" rid="B56">2017</xref>) can select the most valuable information from texts. In natural language processing tasks such as machine translation and text classification, it can not only improve the performance of the model but also visualize the valuable information inside the text. For text classification, the attention mechanism highlights the importance of words and sentences in the final classification. The entire model structure includes four parts: a word encoder, word attention, a sentence encoder, and sentence attention. The structure of the model is illustrated in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>The model architecture of the attention network.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnagi-15-1122799-g0002.tif"/>
</fig>
<sec>
<title>3.2.1. Word encoder</title>
<p>We embedded words into vectors through an embedding matrix <italic>We</italic> and used a bidirectional GRU to obtain an annotation for each word by summarizing information from both directions, so that each annotation incorporates contextual content from the whole sentence.</p>
<p>Suppose a document contains <italic>L</italic> sentences [s<sub>1</sub>,s<sub>2</sub>,...,s<sub>L</sub>], where <italic>s</italic><sub><italic>i</italic></sub> is the <italic>i</italic>th sentence with i &#x02208; [1, L]; the input of the model is the words of all these sentences in the transcript. The <italic>i</italic>th sentence contains <italic>T</italic><sub><italic>i</italic></sub> words, and <italic>w</italic><sub><italic>it</italic></sub> is the <italic>t</italic>th word in the <italic>i</italic>th sentence. Each word was mapped into a vector <italic>x</italic><sub><italic>it</italic></sub> through an embedding matrix, <italic>We</italic> [Eq. (1)]. The hidden vector <italic>h</italic><sub><italic>it</italic></sub> was obtained from the bidirectional GRU [Eqs. (2) and (3)], where <italic>T</italic> denotes the length of the current sentence. Contextual information from the whole sentence is captured through this bidirectional calculation.</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>W</mml:mi><mml:mi>e</mml:mi><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>t</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>T</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mover class="overrightarrow"><mml:mrow><mml:mi>G</mml:mi><mml:mi>R</mml:mi><mml:mi>U</mml:mi></mml:mrow><mml:mo>&#x020D7;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mi>t</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>T</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x02190;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mover class="overleftarrow"><mml:mrow><mml:mi>G</mml:mi><mml:mi>R</mml:mi><mml:mi>U</mml:mi></mml:mrow><mml:mo>&#x020D6;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:mi>t</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>T</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p><inline-formula><mml:math id="M11a"><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x02190;</mml:mo></mml:mover></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> is the final word vector, which concatenates the forward and backward hidden states and summarizes the information of the entire sentence centered on <italic>w</italic><sub><italic>it</italic></sub>. The input consists of the words of every sentence <italic>s</italic><sub><italic>i</italic></sub>, with <italic>i</italic> &#x02208; [<italic>1, L</italic>], in the transcript, i.e., [<italic>s</italic><sub>1</sub><italic>, s</italic><sub>2</sub><italic>,&#x02026;, s</italic><sub><italic>L</italic></sub>].</p>
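<p>The bidirectional encoding of Eqs. (2) and (3) can be illustrated with a minimal sketch. It assumes a generic recurrent update function passed in as <italic>step</italic>; the toy cell below is a placeholder running sum, not a GRU, and all function names are hypothetical rather than taken from the authors&#x00027; implementation.</p>
<preformat>
```python
def bidirectional_encode(inputs, step, hidden_size):
    """Run `step` in both directions and concatenate the states per position."""

    def run(sequence):
        state = [0.0] * hidden_size
        states = []
        for x in sequence:
            state = step(x, state)
            states.append(state)
        return states

    forward = run(inputs)                # reads t = 1..T
    backward = run(inputs[::-1])[::-1]   # reads t = T..1, then re-aligned to positions
    # h_it = [forward state, backward state] for every position t
    return [f + b for f, b in zip(forward, backward)]


def toy_step(x, state):
    """Placeholder recurrent cell (NOT a GRU): element-wise running sum."""
    return [s + xi for s, xi in zip(state, x)]


# Three one-dimensional "word embeddings" (hypothetical values).
encoded = bidirectional_encode([[1.0], [2.0], [3.0]], toy_step, hidden_size=1)
print(encoded)
```
</preformat>
<p>Each output vector is twice the hidden size, because the state that has read the words up to position <italic>t</italic> is joined with the state that has read the words from the end of the sentence back to <italic>t</italic>.</p>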
</sec>
<sec>
<title>3.2.2. Word attention</title>
<p>Not all words contribute equally to the representation of a sentence. Thus, we introduce an attention mechanism to extract informative words that are important to the meaning of a sentence and integrate them into the representation of sentence vectors.</p>
<disp-formula id="E5"><label>(4)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo class="qopname">tanh</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E6"><label>(5)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mo class="qopname">max</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E7"><label>(6)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>t</italic><sub><italic>w</italic></sub> is the word-level context vector, a high-level representation used to judge which words are informative; it is initialized randomly and learned jointly during the training process. The hidden vector <italic>h</italic><sub><italic>it</italic></sub> was first passed through a perceptron layer (Eq. 4) to obtain its hidden representation <italic>s</italic><sub><italic>it</italic></sub>. The importance of each word was then measured as the similarity between <italic>s</italic><sub><italic>it</italic></sub> and the context vector <italic>t</italic><sub><italic>w</italic></sub>, which the softmax function normalizes into the attention weight <italic>m</italic><sub><italic>it</italic></sub>. Finally, we calculated the sentence vector <italic>p</italic><sub><italic>i</italic></sub> as the sum of the word vectors <italic>h</italic><sub><italic>it</italic></sub> weighted by <italic>m</italic><sub><italic>it</italic></sub>.</p>
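<p>As an illustration of Eqs. (4)&#x02013;(6), the following sketch computes the attention weights and the pooled sentence vector in plain Python. The function names and all numeric values are hypothetical and chosen only to make the example easy to follow; the sketch is not the authors&#x00027; implementation.</p>
<preformat>
```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden, weight, bias, context):
    """Pool T hidden vectors into one vector.

    hidden:  list of T vectors h_it, each of length d
    weight:  k x d matrix W_w given as a list of rows
    bias:    length-k vector b_w
    context: length-k context vector t_w
    """
    scores = []
    for h in hidden:
        # Eq. (4): s_it = tanh(W_w h_it + b_w)
        s = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(weight, bias)]
        # Similarity between s_it and the context vector t_w
        scores.append(sum(a * c for a, c in zip(s, context)))
    # Eq. (5): normalize the similarities over the T words
    weights = softmax(scores)
    # Eq. (6): p_i = sum over t of m_it * h_it
    dim = len(hidden[0])
    pooled = [sum(m * h[j] for m, h in zip(weights, hidden)) for j in range(dim)]
    return weights, pooled


# Toy example: T = 3 words, d = 2, k = 2 (all values hypothetical).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
context = [1.0, 1.0]
weights, pooled = attention_pool(hidden, weight, bias, context)
print(weights, pooled)
```
</preformat>
<p>In this toy setting the third word is most similar to the context vector and therefore receives the largest weight, while the first two words receive equal weights.</p>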
</sec>
<sec>
<title>3.2.3. Sentence encoder</title>
<p>Similarly, we used a bidirectional GRU to encode the sentence vectors <italic>s</italic><sub><italic>i</italic></sub> obtained from word attention (denoted <italic>p</italic><sub><italic>i</italic></sub> in Eq. 6).</p>
<disp-formula id="E8"><label>(7)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mover class="overrightarrow"><mml:mrow><mml:mi>G</mml:mi><mml:mi>R</mml:mi><mml:mi>U</mml:mi></mml:mrow><mml:mo>&#x020D7;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E9"><label>(8)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo>&#x02190;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mover class="overleftarrow"><mml:mrow><mml:mi>G</mml:mi><mml:mi>R</mml:mi><mml:mi>U</mml:mi></mml:mrow><mml:mo>&#x020D6;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>L</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M11"><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x02190;</mml:mo></mml:mover></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> focuses on sentence <italic>s</italic><sub><italic>i</italic></sub> while summarizing the neighboring sentences around it.</p>
</sec>
<sec>
<title>3.2.4. Sentence attention</title>
<p>To highlight the contribution of important sentences to the representation of a document, the importance of sentences can be measured using the attention mechanism and the sentence-level context vector <italic>s</italic><sub><italic>w</italic></sub>.</p>
<disp-formula id="E10"><label>(9)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo class="qopname">tanh</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E11"><label>(10)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mo class="qopname">max</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E12"><label>(11)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>p</italic> is a document vector that summarizes the information of the sentences in a document. The sentence-level context vector <italic>s</italic><sub><italic>w</italic></sub> is initialized randomly and learned jointly during the entire training process.</p>
</sec>
<sec>
<title>3.2.5. Document classification</title>
<p>The document vector <italic>p</italic> is a high-level representation of the document and can be used as a feature for text classification.</p>
<disp-formula id="E13"><label>(12)</label><mml:math id="M15"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mo class="qopname">max</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mi>p</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The loss function in this study is the negative log-likelihood of the correct labels.</p>
<disp-formula id="E14"><label>(13)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>L</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mo class="qopname">log</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>j</italic> is the label of document <italic>d</italic>. Finally, the output of the model is a binary classification result obtained by using the softmax function.</p>
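<p>Eqs. (12) and (13) can be made concrete with a small worked example. The sketch below assumes two-dimensional document vectors and an identity weight matrix, both hypothetical, and computes the class probabilities and the summed negative log-likelihood; the function names are illustrative only.</p>
<preformat>
```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(doc_vector, w, b):
    """Eq. (12): t = softmax(w p + b), with w given as a list of rows."""
    logits = [sum(wi * x for wi, x in zip(row, doc_vector)) + bi
              for row, bi in zip(w, b)]
    return softmax(logits)


def nll_loss(probabilities, labels):
    """Eq. (13): negative sum of the log-probabilities of the true labels."""
    return -sum(math.log(p[j]) for p, j in zip(probabilities, labels))


# Two toy documents with true labels 0 and 1 (hypothetical values).
docs = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
w = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]

probs = [classify(p, w, b) for p in docs]
loss = nll_loss(probs, labels)
print(probs, loss)
```
</preformat>
<p>The loss shrinks toward zero as the probability assigned to each correct label approaches one, which is the behavior the training procedure exploits.</p>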
</sec>
</sec>
</sec>
<sec id="s4">
<title>4. Experiments</title>
<sec>
<title>4.1. Pitt corpus</title>
<p>We performed experiments on the public Pitt Corpus of the DementiaBank (<ext-link ext-link-type="uri" xlink:href="https://sla.talkbank.org/TBB/dementia/English/Pitt">https://sla.talkbank.org/TBB/dementia/English/Pitt</ext-link>) (Becker et al., <xref ref-type="bibr" rid="B4">1994</xref>), which was gathered longitudinally on a yearly basis. The dataset consisted of audio recordings and the corresponding transcripts of spontaneous picture description tasks performed by patients with AD and cognitively normal subjects. The participants, all of whom were speakers of English, were asked to describe the cookie theft picture (shown in <xref ref-type="fig" rid="F3">Figure 3</xref>) from the Boston Diagnostic Aphasia Examination (Chen et al., <xref ref-type="bibr" rid="B6">2019</xref>). The transcripts of the voice recordings were gathered as part of Alzheimer&#x00027;s and related dementia studies by the University of Pittsburgh School of Medicine. Every audio file had an associated transcript, allowing for acoustic and lexical analyses in parallel; each speech sample was recorded and then manually transcribed at the word level using the codes for the human analysis of transcripts (CHAT) coding system (MacWhinney, <xref ref-type="bibr" rid="B31">2021</xref>). Every transcript came with automatically generated morphosyntactic analysis, such as repetition markers, tense descriptions, and standard part-of-speech tagging. Note that we stripped the accompanying dysfluency annotations, morphological analysis, POS tags, and other associated information from the utterances, leaving only the plain text; because a deep learning model does not require manually extracted features, we aimed to create a fully automated system that needs no human annotators. After data preprocessing, 498 participants were enrolled in this study, including 242 normal controls and 256 people with possible or probable AD, and their corresponding transcripts were obtained.
We divided the dataset into training, validation, and test sets in a ratio of approximately 8:1:1, which yielded 400, 50, and 48 samples, respectively. Demographic information is shown in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
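<p>A split of 498 samples into 400, 50, and 48 can be sketched as follows. The text does not state how samples were shuffled or whether the split was stratified by diagnosis, so the seeded random shuffle and the function name below are assumptions made for illustration only.</p>
<preformat>
```python
import random


def split_dataset(items, sizes, seed=0):
    """Shuffle `items` reproducibly and cut them into consecutive chunks."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    splits, start = [], 0
    for size in sizes:
        splits.append(shuffled[start:start + size])
        start += size
    return splits


# Hypothetical integer IDs standing in for the 498 samples.
sample_ids = list(range(498))
train, val, test = split_dataset(sample_ids, (400, 50, 48))
print(len(train), len(val), len(test))
```
</preformat>
<p>Fixing the seed keeps the three sets identical across runs, which matters when the validation set is used for model selection.</p>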
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Cookie theft picture.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnagi-15-1122799-g0003.tif"/>
</fig>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Demographics of Pitt datasets.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919498;color:#ffffff">
<th/>
<th valign="top" align="center"><bold>CTRL (242)</bold></th>
<th valign="top" align="center"><bold>Possible/probable AD (256)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Age (years)</td>
<td valign="top" align="center">65.2 (7.8)</td>
<td valign="top" align="center">71.8 (8.5)</td>
</tr> <tr>
<td valign="top" align="left">Education (years)</td>
<td valign="top" align="center">14.1 (2.4)</td>
<td valign="top" align="center">12.5 (2.9)</td>
</tr> <tr>
<td valign="top" align="left">Gender (male/female)</td>
<td valign="top" align="center">86/156</td>
<td valign="top" align="center">90/166</td>
</tr> <tr>
<td valign="top" align="left">Mini-mental state exam</td>
<td valign="top" align="center">29.1 (1.1)</td>
<td valign="top" align="center">18.5 (5.1)</td>
</tr></tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>4.2. Model configuration and structure</title>
<p>Documents were split into sentences, and every sentence was tokenized using Stanford CoreNLP (Manning et al., <xref ref-type="bibr" rid="B32">2014</xref>). For word embedding, three methods were compared to obtain the best performance in this study: word2vec from Google (Mikolov et al., <xref ref-type="bibr" rid="B36">2013</xref>), GloVe (<ext-link ext-link-type="uri" xlink:href="https://nlp.stanford.edu/projects/glove/">https://nlp.stanford.edu/projects/glove/</ext-link>) from Stanford University, which provides four pretrained vector files (50d, 100d, 200d, and 300d), and FastText (<ext-link ext-link-type="uri" xlink:href="https://fasttext.cc/docs/en/crawl-vectors.html">https://fasttext.cc/docs/en/crawl-vectors.html</ext-link>) from Facebook. GloVe and FastText needed a shorter training time, while word2vec required a longer time. After comparison, we adopted Stanford&#x00027;s publicly available pretrained 100-dimensional GloVe embeddings, which gave the best performance. We obtained the word embeddings on the training and validation splits and then used them to initialize <italic>W</italic><sub><italic>e</italic></sub>. The number of GRU units was set to 100, and the dense layer dimension at the word level was set to 50. The proposed model was trained for a fixed 10 epochs and evaluated on the validation set after every epoch. The word-level weight and context weight were initialized randomly from a normal distribution (mean = 0, std = 0.1), and the sentence-level weight and context weight were initialized in the same way. Word bias and sentence bias were initialized randomly in the training stage. We used the Adam optimizer with a learning rate of 0.01 and applied dropout at a rate of 0.35 to the output of every functional layer. All the aforementioned parameters were trained on the training set, and the best model was selected based on the accuracy on the validation set.
The same settings can be applied to the other models.</p>
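<p>The settings described in this section can be collected in one place, as in the sketch below. The class name, field names, and the shape of the word-level attention weight are hypothetical; in particular, the sketch assumes that 100 GRU units are used per direction, so that the bidirectional output has 200 dimensions.</p>
<preformat>
```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class HanConfig:
    """Hyperparameters reported in the text (field names are hypothetical)."""
    embedding_dim: int = 100      # 100-dimensional GloVe vectors
    gru_units: int = 100
    word_dense_dim: int = 50      # dense layer dimension at the word level
    epochs: int = 10
    learning_rate: float = 0.01   # used with the Adam optimizer
    dropout_rate: float = 0.35
    init_mean: float = 0.0
    init_std: float = 0.1


def init_weights(rows, cols, config, seed=0):
    """Draw a rows x cols matrix from N(init_mean, init_std)."""
    rng = random.Random(seed)
    return [[rng.gauss(config.init_mean, config.init_std) for _ in range(cols)]
            for _ in range(rows)]


config = HanConfig()
# Assumption: the word attention layer maps the 2 * gru_units wide
# bidirectional state to word_dense_dim dimensions.
word_weight = init_weights(config.word_dense_dim, 2 * config.gru_units, config)
print(len(word_weight), len(word_weight[0]))
```
</preformat>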
</sec>
<sec>
<title>4.3. Results and analysis</title>
<p>In this study, we evaluated the effectiveness of our model with five-fold cross-validation: in each fold, four subsets were used for training and one for testing, and the results were averaged across the folds. The relationship between the actual and predicted classes is presented in <xref ref-type="table" rid="T3">Table 3</xref>, and the formulas for accuracy, precision, recall, and F1 score are given in Eqs. (14)&#x02013;(17).</p>
<disp-formula id="E15"><label>(14)</label><mml:math id="M17"><mml:mtable class="eqnarray" columnalign="center"><mml:mtr><mml:mtd><mml:mi>A</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mi>u</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E16"><label>(15)</label><mml:math id="M18"><mml:mtable class="eqnarray" columnalign="center"><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E17"><label>(16)</label><mml:math id="M19"><mml:mtable class="eqnarray" columnalign="center"><mml:mtr><mml:mtd><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E18"><label>(17)</label><mml:math id="M20"><mml:mtable class="eqnarray" columnalign="center"><mml:mtr><mml:mtd><mml:mi>F</mml:mi><mml:mn>1</mml:mn><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
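<p>The four metrics in Eqs. (14)&#x02013;(17) can be computed directly from the confusion-matrix counts. The sketch below uses hypothetical counts for a 48-sample test set with two errors in each class; the assumption of 24 samples per class is ours and is made only for illustration.</p>
<preformat>
```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    return {
        "accuracy": (tn + tp) / (tn + fp + fn + tp),   # Eq. (14)
        "precision": tp / (tp + fp),                   # Eq. (15)
        "recall": tp / (tp + fn),                      # Eq. (16)
        "f1": 2 * tp / (2 * tp + fp + fn),             # Eq. (17)
    }


# Hypothetical counts: 48 samples, two misclassified in each class.
metrics = classification_metrics(tp=22, fp=2, fn=2, tn=22)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```
</preformat>
<p>With four errors in 48 samples the accuracy is 44/48, regardless of how the samples are distributed between the two classes; precision, recall, and F1 do depend on that distribution.</p>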
<p><xref ref-type="table" rid="T4">Table 4</xref> shows the performance of previous studies on the Pitt datasets. These studies may have used different subsets of the Pitt Cookie Theft corpus, so the results summarized in <xref ref-type="table" rid="T4">Table 4</xref> are not always directly comparable. In addition, the list of studies is not exhaustive. Of all the studies in <xref ref-type="table" rid="T4">Table 4</xref>, the first set of studies (Becker et al., <xref ref-type="bibr" rid="B4">1994</xref>; Clarke et al., <xref ref-type="bibr" rid="B10">2013</xref>; Yancheva and Rudzicz, <xref ref-type="bibr" rid="B63">2016</xref>; Sirts et al., <xref ref-type="bibr" rid="B48">2017</xref>; Hern&#x000E1;ndez-Dom&#x000ED;nguez et al., <xref ref-type="bibr" rid="B18">2018</xref>; Fraser et al., <xref ref-type="bibr" rid="B13">2019</xref>; Li et al., <xref ref-type="bibr" rid="B22">2019</xref>; Antonsson et al., <xref ref-type="bibr" rid="B1">2021</xref>; R&#x00027;mani and James, <xref ref-type="bibr" rid="B43">2021</xref>; Zehra et al., <xref ref-type="bibr" rid="B65">2021</xref>) used a feature extraction &#x0002B; machine learning method, and the best accuracy was 85.4%. The second set of studies (Karlekar et al., <xref ref-type="bibr" rid="B21">2018</xref>; Orimaye et al., <xref ref-type="bibr" rid="B38">2018</xref>; Fritsch et al., <xref ref-type="bibr" rid="B15">2019</xref>; Pan et al., <xref ref-type="bibr" rid="B39">2019</xref>; Balagopalan et al., <xref ref-type="bibr" rid="B3">2021</xref>; Guo et al., <xref ref-type="bibr" rid="B16">2021</xref>; Meghanani et al., <xref ref-type="bibr" rid="B35">2021</xref>) used deep learning methods, of which the best accuracy was 91.1% (Karlekar et al., <xref ref-type="bibr" rid="B21">2018</xref>).
The rest of the studies (Yuan et al., <xref ref-type="bibr" rid="B64">2020</xref>; Pranav and Veeky, <xref ref-type="bibr" rid="B41">2021</xref>; Roshanzamir et al., <xref ref-type="bibr" rid="B45">2021</xref>; Tristan and Saturnino Analysis, <xref ref-type="bibr" rid="B55">2021</xref>) used deep learning models in combination with acoustic features or linguistic features. The study by Yuan et al. (<xref ref-type="bibr" rid="B64">2020</xref>) obtained the best accuracy of 89.6%, the highest in Interspeech 2020. Our method obtained the best accuracy of 91.6%, which is 0.5 percentage points higher than the best performance reported by Karlekar et al. (<xref ref-type="bibr" rid="B21">2018</xref>). The confusion matrix of our study is shown in <xref ref-type="fig" rid="F4">Figure 4</xref>; only two AD and two NC samples among the 48 in the test set were misclassified.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Relationship between the predicted and true classes.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919498;color:#ffffff">
<th/>
<th valign="top" align="left" colspan="2"><bold>True class</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Predicted class</td>
<td valign="top" align="left">Positive</td>
<td valign="top" align="left">Negative</td>
</tr> <tr>
<td valign="top" align="left">Positive</td>
<td valign="top" align="left">True positive (TP)</td>
<td valign="top" align="left">False positive (FP)</td>
</tr> <tr>
<td valign="top" align="left">Negative</td>
<td valign="top" align="left">False negative (FN)</td>
<td valign="top" align="left">True negative (TN)</td>
</tr></tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>AD vs. CTRL classification scores (%) on Pitt datasets.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919498;color:#ffffff">
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="left"><bold>Embedding</bold></th>
<th valign="top" align="left"><bold>Classifier</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>AUC</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Antonsson et al. (<xref ref-type="bibr" rid="B1">2021</xref>)</td>
<td valign="top" align="left">Semantic features</td>
<td valign="top" align="left">SVM</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Clarke et al. (<xref ref-type="bibr" rid="B10">2013</xref>)</td>
<td valign="top" align="left">286 Linguistic features</td>
<td valign="top" align="left">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">50&#x02013;78 for MCI vs. HC, 59&#x02013;90 for AD vs. HC, and 62&#x02013;78 for AD&#x0002B;MCI vs. HC</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">R&#x00027;mani and James (<xref ref-type="bibr" rid="B43">2021</xref>)</td>
<td valign="top" align="left">x-vectors and i-vectors features (Roark et al., <xref ref-type="bibr" rid="B44">2011</xref>)</td>
<td valign="top" align="left">Random Forests and SVM</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">85.4</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Zehra et al. (<xref ref-type="bibr" rid="B65">2021</xref>)</td>
<td valign="top" align="left">Hand-Crafted acoustic and linguistic features</td>
<td valign="top" align="left">Logistic Regression</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">85.4</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Becker et al. (<xref ref-type="bibr" rid="B4">1994</xref>)</td>
<td valign="top" align="left">35 Hand-Crafted Features</td>
<td valign="top" align="left">Logistic Regression<break/>(LR)</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">81.92</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Yancheva and Rudzicz (<xref ref-type="bibr" rid="B63">2016</xref>)</td>
<td valign="top" align="left">12 Cluster-Based Features&#x0002B;LS&#x00026;A</td>
<td valign="top" align="left">Random Forest</td>
<td valign="top" align="center">80.00</td>
<td valign="top" align="center">80.00</td>
<td valign="top" align="center">80.00</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">80.00</td>
</tr> <tr>
<td valign="top" align="left">Sirts et al. (<xref ref-type="bibr" rid="B48">2017</xref>)</td>
<td valign="top" align="left">Cluster&#x0002B;PID&#x0002B;SID Features</td>
<td valign="top" align="left">LR</td>
<td valign="top" align="center">74.4 &#x000B1; 1.5</td>
<td valign="top" align="center">72.5<break/>&#x000B1; 1.2</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">72.7 &#x000B1; 1.2</td>
</tr> <tr>
<td valign="top" align="left">Hern&#x000E1;ndez-Dom&#x000ED;nguez et al. (<xref ref-type="bibr" rid="B18">2018</xref>)</td>
<td valign="top" align="left">105 Hand-Crafted Features</td>
<td valign="top" align="left">SVM</td>
<td valign="top" align="center">81.00</td>
<td valign="top" align="center">81.00</td>
<td valign="top" align="center">79.00</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">81.00</td>
</tr> <tr>
<td valign="top" align="left">Li et al. (<xref ref-type="bibr" rid="B22">2019</xref>)</td>
<td valign="top" align="left">185 Hand-Crafted Features</td>
<td valign="top" align="left">LR</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">77</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Fraser et al. (<xref ref-type="bibr" rid="B14">2016</xref>)</td>
<td valign="top" align="left">Info and LM Features</td>
<td valign="top" align="left">SVM</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">75</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">77</td>
</tr> <tr>
<td valign="top" align="left">Fritsch et al. (<xref ref-type="bibr" rid="B15">2019</xref>)</td>
<td valign="top" align="left">n-gram</td>
<td valign="top" align="left">NNLM&#x0002B;LSTM</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">85.6</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Balagopalan et al. (<xref ref-type="bibr" rid="B3">2021</xref>)</td>
<td valign="top" align="left">-</td>
<td valign="top" align="left">BERT</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">83.33</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Guo et al. (<xref ref-type="bibr" rid="B16">2021</xref>)</td>
<td valign="top" align="left">-</td>
<td valign="top" align="left">BERT</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">82.1</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Meghanani et al. (<xref ref-type="bibr" rid="B35">2021</xref>)</td>
<td valign="top" align="left">-</td>
<td valign="top" align="left">FastText</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">83.3</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Karlekar et al. (<xref ref-type="bibr" rid="B21">2018</xref>)</td>
<td valign="top" align="left">POS-tagged data</td>
<td valign="top" align="left">CNN-RNN</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">91.1</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Orimaye et al. (<xref ref-type="bibr" rid="B38">2018</xref>)</td>
<td valign="top" align="left">n-grams</td>
<td valign="top" align="left">D2NN</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">88.9</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Pan et al. (<xref ref-type="bibr" rid="B39">2019</xref>)</td>
<td valign="top" align="left">GloVe Word Embedding Sequence</td>
<td valign="top" align="left">BiLSTM|GRU<break/>Hierarchical Attention</td>
<td valign="top" align="center">84.02</td>
<td valign="top" align="center">84.97</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">84.43</td>
</tr> <tr>
<td valign="top" align="left">Yuan et al. (<xref ref-type="bibr" rid="B64">2020</xref>)</td>
<td valign="top" align="left">Encoding of pauses&#x0002B;ERNIE Embedding</td>
<td valign="top" align="left">ERNIE</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">89.6</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Tristan and Saturnino Analysis (<xref ref-type="bibr" rid="B55">2021</xref>)</td>
<td valign="top" align="left">Word cooccurrence graphs</td>
<td valign="top" align="left">Machine Learning</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">66.7</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Roshanzamir et al. (<xref ref-type="bibr" rid="B45">2021</xref>)</td>
<td valign="top" align="left">BERT<sub>Base</sub></td>
<td valign="top" align="left">LR</td>
<td valign="top" align="center">90.31 &#x000B1; 7.36</td>
<td valign="top" align="center">76.52<break/>&#x000B1; 8.06</td>
<td valign="top" align="center">84.46 &#x000B1; 6.31</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">82.72 &#x000B1; 7.21</td>
</tr> <tr>
<td valign="top" align="left">Roshanzamir et al. (<xref ref-type="bibr" rid="B45">2021</xref>)</td>
<td valign="top" align="left">BERT<sub>Large</sub></td>
<td valign="top" align="left">LR</td>
<td valign="top" align="center">90.57 &#x000B1; 3.18</td>
<td valign="top" align="center">84.34<break/>&#x000B1; 7.58</td>
<td valign="top" align="center">88.08 &#x000B1; 4.48</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">87.23 &#x000B1; 5.20</td>
</tr> <tr>
<td valign="top" align="left">Pranav and Veeky (<xref ref-type="bibr" rid="B41">2021</xref>)</td>
<td valign="top" align="left">Linguistic features</td>
<td valign="top" align="left">Deep learning</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">88</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">Our method</td>
<td valign="top" align="left">GRU</td>
<td valign="top" align="left">Softmax</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
<td valign="top" align="center"><bold>91.6</bold></td>
<td valign="top" align="center">-</td>
<td valign="top" align="center">-</td>
</tr></tbody>
</table>
</table-wrap>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Result of the confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnagi-15-1122799-g0004.tif"/>
</fig>
</sec>
<sec>
<title>4.4. Ablation study on attention network</title>
<p>We validated the contribution of each level with an ablation study, as shown in <xref ref-type="table" rid="T5">Table 5</xref>. Removing the word level (-Word) reduces accuracy on the Pitt dataset by 1.4 percentage points, from 91.6 to 90.2%. Removing the sentence level (-Sentence) reduces accuracy by 2.3 percentage points, to 89.3%, which is a larger loss than removing the word level. The ablation experiment therefore indicates that both the word level and the sentence level are essential to our model.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Ablation study on our model.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919498;color:#ffffff">
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>Drop</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Our Model</td>
<td valign="top" align="center">91.6</td>
<td valign="top" align="center">-</td>
</tr> <tr>
<td valign="top" align="left">(-Word)</td>
<td valign="top" align="center">90.2</td>
<td valign="top" align="center">1.4</td>
</tr> <tr>
<td valign="top" align="left">(-Sentence)</td>
<td valign="top" align="center">89.3</td>
<td valign="top" align="center">2.3</td>
</tr></tbody>
</table>
</table-wrap>
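<p>The drops in Table 5 can be read as absolute differences in accuracy, in percentage points, between the full model and each ablated variant. The short sketch below recomputes them from the reported accuracies; the function name <italic>ablation_drops</italic> and the dictionary layout are illustrative assumptions and are not part of the original experiments.</p>
<preformat>
```python
# Accuracy values (percent) copied from Table 5.
accuracy = {"Our Model": 91.6, "(-Word)": 90.2, "(-Sentence)": 89.3}


def ablation_drops(results, full_key="Our Model"):
    """Return the absolute accuracy drop of each ablated variant.

    The drop is expressed in percentage points and rounded to one
    decimal place, matching the precision used in Table 5.
    """
    full = results[full_key]
    return {
        name: round(full - acc, 1)
        for name, acc in results.items()
        if name != full_key
    }


drops = ablation_drops(accuracy)
print(drops)
```
</preformat>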
</sec>
<sec>
<title>4.5. Visualization of attention features</title>
<p>Because of the hierarchical structure, we normalized each word weight by its sentence weight so that only important words in important sentences are emphasized. To validate that the proposed model can select informative words and sentences, we visualized the contextual attention features in <xref ref-type="fig" rid="F5">Figure 5</xref>. Each line is a sentence; green denotes the word weight and red denotes the sentence weight. The study by Liu and Yuan (<xref ref-type="bibr" rid="B24">2022</xref>) indicates that a complete description by a healthy speaker should include the following seed words: boy, girl, woman, cookie, stool, sink, overflow, fall, window, curtain, plate, cloth, jar, water, cupboard, dish, kitchen, garden, take, wash, reach, attention, and see. In the AD group, we found three problems in linguistic expression. First, the model attended to only a few seed words, such as &#x0201C;boy&#x0201D;, &#x0201C;girl&#x0201D;, &#x0201C;mother&#x0201D;, &#x0201C;floor&#x0201D;, and &#x0201C;window&#x0201D;, and the description was much shorter than that of the NC group. The participant could not describe the picture completely, which reduces the adequacy of the discourse information to some extent. Second, the model localized filler words such as &#x0201C;uh&#x0201D; and &#x0201C;um&#x0201D;; the study by Yuan et al. (<xref ref-type="bibr" rid="B64">2020</xref>) indicates that people with AD use more &#x0201C;uh&#x0201D; and &#x0201C;um&#x0201D; than NC. A pause usually follows &#x0201C;uh&#x0201D; or &#x0201C;um&#x0201D;, and the participant may be unable to find appropriate words or sentences, which impairs verbal fluency. Third, the model accurately localized personal pronouns such as &#x0201C;he&#x0201D; and &#x0201C;she&#x0201D;, together with the corresponding sentences. This suggests that people with AD may have word-finding difficulty and substitute pronouns for specific nouns, which impairs sentence expression and meaningful output.</p>
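<p>The weighting used for the visualization can be sketched as follows. This section does not state the exact normalization, so the sketch assumes the simplest reading: each word weight is multiplied by the weight of the sentence that contains it. The function name <italic>scale_word_weights</italic> and the toy values are hypothetical and are not taken from the experiments.</p>
<preformat>
```python
def scale_word_weights(sentence_weights, word_weights):
    """Scale each word's attention weight by the weight of its sentence.

    sentence_weights: list of floats, one per sentence.
    word_weights: list of lists of floats, one inner list per sentence.
    Assumption: plain multiplication; the paper may use another scaling.
    """
    if len(sentence_weights) != len(word_weights):
        raise ValueError("need one list of word weights per sentence")
    return [
        [s * w for w in words]
        for s, words in zip(sentence_weights, word_weights)
    ]


# Toy values for illustration only.
sentence_weights = [0.75, 0.25]
word_weights = [[0.5, 0.5], [0.75, 0.25]]

scaled = scale_word_weights(sentence_weights, word_weights)
print(scaled)
```
</preformat>
<p>In this toy example, the word with the largest raw weight (0.75) sits in the less important sentence, so after scaling it ranks below both words of the more important sentence, which is the behavior the hierarchical visualization is meant to show.</p>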
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>An example of AD and NC from Pitt dataset. <bold>(A)</bold> Prediction AD. <bold>(B)</bold> Prediction NC.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnagi-15-1122799-g0005.tif"/>
</fig>
<p>In the normal control (NC) group, our model selected more seed words, such as scene, kitchen, mother, dish, water, garden, boy, girl, window, curtain, and breeze, together with their corresponding sentences, indicating a rich vocabulary and integrated semantic expression. In addition, the model selected modifiers such as &#x0201C;little&#x0201D;, &#x0201C;short&#x0201D;, &#x0201C;gentle&#x0201D;, and &#x0201C;almost&#x0201D;, which reflect sufficient discourse information and coherent discourse.</p>
</sec>
</sec>
<sec id="s5">
<title>5. Conclusion</title>
<p>Many studies on language-based AD diagnosis have focused on deep learning methods (Liu et al., <xref ref-type="bibr" rid="B25">2021</xref>, <xref ref-type="bibr" rid="B23">2022</xref>; Chen and Liu, <xref ref-type="bibr" rid="B8">2022</xref>) because traditional feature extraction is blind and incomplete and performs relatively worse than deep learning. Meanwhile, with the development of deep learning, newer methods such as contrastive learning, unsupervised learning, and multimodal feature fusion can be used to differentiate AD from normal controls.</p>
<p>This study combined deep learning with an attention mechanism to identify the important words in each sentence, which form the sentence representation, and the important sentences in a document, which form the representation of the whole document. We combined contextual features with the attention mechanism and studied the classification of implicit affective sentences based on the bi-GRU model. The bi-GRU encoder in our model can be replaced by other recurrent models, such as RNN and LSTM. Because implicit and explicit texts differ in expression, the proposed model learns fuzzy implicit sentiment with contextual attention features to improve classification performance. Compared with a general classification model, our model can extract more valuable information at the word and sentence levels. Experimental results on the public Pitt dataset show that our model outperforms the other classification models in AD diagnosis. Deep learning models are often considered &#x0201C;a blind box&#x0201D; (Meghanani et al., <xref ref-type="bibr" rid="B35">2021</xref>): they are less interpretable than traditional machine learning methods because they do not expose feature information that humans can understand. However, the attention weights of our model can be visualized to show the informative words and sentences that affect classification, which may provide a reference for the detection and rehabilitation of people with cognitive dysfunction from a linguistic perspective.</p>
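<p>The two-level pooling summarized above can be illustrated with a minimal sketch. It is a simplification under stated assumptions and not the implementation used in the experiments: the bi-GRU encoders and any learned projection are omitted, hidden states are given as plain lists, and attention scores are dot products with hypothetical context vectors (<italic>word_context</italic>, <italic>sentence_context</italic>).</p>
<preformat>
```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by its similarity to a context vector, then sum.

    Returns the pooled vector and the attention weights.
    """
    scores = [sum(a * b for a, b in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(dim)
    ]
    return pooled, weights


# Toy document: two sentences, each a list of 2-d word states.
# In the real model these states would come from a bi-GRU encoder.
document = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.0, 0.0], [1.0, 0.0]],
]
word_context = [1.0, 0.0]       # hypothetical context vector
sentence_context = [0.0, 1.0]   # hypothetical context vector

# Word level: pool the words of each sentence into a sentence vector.
sentence_vectors = [
    attention_pool(words, word_context)[0] for words in document
]
# Sentence level: pool the sentence vectors into a document vector.
document_vector, sentence_weights = attention_pool(
    sentence_vectors, sentence_context
)
print(sentence_weights, document_vector)
```
</preformat>
<p>The same pooling function is applied at both levels, which is why the recurrent encoder can be swapped without changing the attention step.</p>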
<p>However, our study has potential limitations. For example, the corpus we used may contain recordings taken over multiple visits by the same patient, which might bias the model because the training and test sets may contain recordings from the same patient. To eliminate such bias, Luz et al. (<xref ref-type="bibr" rid="B28">2020</xref>, <xref ref-type="bibr" rid="B29">2021</xref>), for example, employed a one-to-one matching approach and a propensity score matching strategy, respectively. The datasets of the ADReSS challenge in 2020 were created precisely to avoid this and other potential sources of bias (such as gender and age). In future work, we will take measures to eliminate these potential biases.</p>
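<p>One simple safeguard against the patient overlap described above is to split the data by patient rather than by recording. The sketch below illustrates such a subject-wise split; it is a different and simpler measure than the matching strategies used for the ADReSS datasets, and the record layout, the identifiers, and the function name <italic>split_by_patient</italic> are hypothetical.</p>
<preformat>
```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that no patient appears in both sets.

    records: list of (patient_id, transcript) pairs.
    The patient_id field is a hypothetical identifier.
    """
    patient_ids = sorted({pid for pid, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test


# Toy records: some patients have more than one visit.
records = [
    ("p01", "visit 1 transcript"),
    ("p01", "visit 2 transcript"),
    ("p02", "visit 1 transcript"),
    ("p03", "visit 1 transcript"),
    ("p03", "visit 2 transcript"),
    ("p04", "visit 1 transcript"),
    ("p05", "visit 1 transcript"),
]

train, test = split_by_patient(records, test_fraction=0.4)
print(len(train), len(test))
```
</preformat>
<p>Because whole patients are assigned to one side of the split, all visits of a given patient stay together, at the cost of a test set whose size in recordings varies with the patients drawn.</p>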
</sec>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.</p>
</sec>
<sec sec-type="ethics-statement" id="s7">
<title>Ethics statement</title>
<p>Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.</p>
</sec>
<sec sec-type="author-contributions" id="s8">
<title>Author contributions</title>
<p>ZY gave some good suggestions and revised the parameters of the model. YC revised the background introduction. All authors contributed to the article and approved the submitted version.</p>
</sec>
</body>
<back>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Antonsson</surname> <given-names>M.</given-names></name> <name><surname>Fors</surname> <given-names>K. L.</given-names></name> <name><surname>Eckerstr&#x000F6;m</surname> <given-names>M.</given-names></name> <name><surname>Kokkinakis</surname> <given-names>D.</given-names></name></person-group> (<year>2021</year>). <article-title>Using a discourse task to explore semantic ability in persons with cognitive impairment</article-title>. <source>Front. Aging Neurosci</source>. <volume>12</volume>, <fpage>607449</fpage>. <pub-id pub-id-type="doi">10.3389/fnagi.2020.607449</pub-id><pub-id pub-id-type="pmid">33536894</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bahdanau</surname> <given-names>D.</given-names></name> <name><surname>Cho</surname> <given-names>K.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name></person-group> (<year>2014</year>). <article-title>Neural machine translation by jointly learning to align and translate</article-title>. <source>arXiv</source>. <fpage>1</fpage>&#x02013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.48550/arXiv.1409.0473</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Balagopalan</surname> <given-names>A.</given-names></name> <name><surname>Benjamin</surname> <given-names>E.</given-names></name> <name><surname>Jessica</surname> <given-names>R.</given-names></name> <name><surname>Frank</surname> <given-names>R.</given-names></name> <name><surname>Jekaterina</surname> <given-names>N.</given-names></name></person-group> (<year>2021</year>). <article-title>Comparing pre-trained and feature-based models for prediction of alzheimer&#x00027;s disease based on speech</article-title>. <source>Front. Aging. Neurosci.</source> <volume>13</volume>, <fpage>635945</fpage>. <pub-id pub-id-type="doi">10.3389/fnagi.2021.635945</pub-id><pub-id pub-id-type="pmid">33986655</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Becker</surname> <given-names>J.</given-names></name> <name><surname>Boller</surname> <given-names>F.</given-names></name> <name><surname>Lopez</surname> <given-names>O.</given-names></name> <name><surname>Saxton</surname> <given-names>J.</given-names></name> <name><surname>McGonigle</surname> <given-names>K.</given-names></name></person-group> (<year>1994</year>). <article-title>The natural history of Alzheimer&#x00027;s disease: Description of study cohort and accuracy of diagnosis</article-title>. <source>Arch. Neurol</source>. <volume>51</volume>, <fpage>585</fpage>&#x02013;<lpage>594</lpage>. <pub-id pub-id-type="doi">10.1001/archneur.1994.00540180063015</pub-id><pub-id pub-id-type="pmid">8198470</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>A feature study for classification-based speech separation at low signal-to-noise ratios</article-title>. <source>IEEE/ACM Trans Audio Speech Lang Process</source>. <volume>22</volume>, <fpage>1993</fpage>&#x02013;<lpage>2002</lpage>. <pub-id pub-id-type="doi">10.1109/TASLP.2014.2359159</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Zhu</surname> <given-names>J.</given-names></name> <name><surname>Ye</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>An attention-based hybrid network for automatic detection of alzheimer&#x00027;s disease from narrative speech</article-title>, in <source>Interspeech.</source> (<publisher-loc>Baltimore, MD</publisher-loc>: <publisher-name>The Association for Computer Linguistics</publisher-name>). <pub-id pub-id-type="doi">10.21437/Interspeech.2019-2872</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>P.</given-names></name> <name><surname>Sun</surname> <given-names>Z.</given-names></name> <name><surname>Bing</surname> <given-names>L.</given-names></name> <name><surname>Yang</surname> <given-names>W.</given-names></name></person-group> (<year>2017</year>). <article-title>Recurrent attention network on memory for aspect sentiment analysis</article-title>, in <source>Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.</source> <publisher-loc>Copenhagen, Denmark</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>452</fpage>&#x02013;<lpage>461</lpage>. <pub-id pub-id-type="doi">10.18653/v1/D17-1047</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>N.</given-names></name></person-group> (<year>2022</year>). <article-title>Using multimodel features to diagnose mild cognitive impairment and Alzheimer&#x00027;s disease</article-title>. <source>AEMCME</source>. <volume>3</volume>. <fpage>322</fpage>&#x02013;<lpage>332</lpage>.</citation>
</ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Choi</surname> <given-names>Y.</given-names></name> <name><surname>Wiebe</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>&#x0002B;/- EffectWordNet: Sense-level lexicon acquisition for opinion inference</article-title>, in <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).</source> <publisher-loc>Doha, Qatar</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>, p. <fpage>1181</fpage>&#x02013;<lpage>1191</lpage>. <pub-id pub-id-type="doi">10.3115/v1/D14-1125</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clarke</surname> <given-names>N.</given-names></name> <name><surname>Barrick</surname> <given-names>T. R.</given-names></name> <name><surname>Garrard</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>Comparison of connected speech tasks for detecting early Alzheimer&#x00027;s disease and mild cognitive impairment using natural language processing and machine learning</article-title>. <source>Front. Comp. Sci</source>. <volume>3</volume>, <fpage>634360</fpage>. <pub-id pub-id-type="doi">10.3389/fcomp.2021.634360</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>J.</given-names></name> <name><surname>Yan</surname> <given-names>H.</given-names></name> <name><surname>Sun</surname> <given-names>T.</given-names></name> <name><surname>Liu</surname> <given-names>P.</given-names></name> <name><surname>Qiu</surname> <given-names>X.</given-names></name></person-group> (<year>2021</year>). <article-title>Does syntax matter? a strong baseline for aspect-based sentiment analysis with RoBERTa</article-title>, in <source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>. Online: Association for Computational Linguistics. p <fpage>1816</fpage>&#x02013;<lpage>1829</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2021.naacl-main.146</pub-id><pub-id pub-id-type="pmid">36568019</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Deng</surname> <given-names>L.</given-names></name> <name><surname>Wiebe</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Sentiment propagation via implicature constraints</article-title>, in <source>Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics</source>. <publisher-loc>Gothenburg, Sweden</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>377</fpage>&#x02013;<lpage>385</lpage>. <pub-id pub-id-type="doi">10.3115/v1/E14-1040</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Fraser</surname> <given-names>K. C.</given-names></name> <name><surname>Linz</surname> <given-names>N.</given-names></name> <name><surname>Li</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Multilingual prediction of Alzheimer&#x00027;s disease through domain adaptation and concept-based language modelling</article-title>, in <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).</source> (<publisher-loc>Minneapolis, MN</publisher-loc>: <publisher-name>Association for Computational Linguistics: Human Language Technologies</publisher-name>) p. <fpage>3659</fpage>&#x02013;<lpage>3670</lpage>. <ext-link ext-link-type="uri" xlink:href="https://aclanthology.org/N19-1367.pdf">https://aclanthology.org/N19-1367.pdf</ext-link></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fraser</surname> <given-names>K. C.</given-names></name> <name><surname>Meltzer</surname> <given-names>J. A.</given-names></name> <name><surname>Rudzicz</surname> <given-names>F.</given-names></name></person-group> (<year>2016</year>). <article-title>Linguistic features identify Alzheimer&#x00027;s disease in narrative speech</article-title>. <source>J. Alzheimer&#x00027;s Dis</source>. <volume>49</volume>, <fpage>407</fpage>&#x02013;<lpage>422</lpage>. <pub-id pub-id-type="doi">10.3233/JAD-150520</pub-id><pub-id pub-id-type="pmid">26484921</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fritsch</surname> <given-names>J.</given-names></name> <name><surname>Wankerl</surname> <given-names>S.</given-names></name> <name><surname>N&#x000F6;th</surname> <given-names>E.</given-names></name></person-group> (<year>2019</year>). <article-title>Automatic diagnosis of alzheimer&#x00027;s disease using neural network language models</article-title>, in <source>ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source> (<publisher-loc>Brighton</publisher-loc>: <publisher-name>ICASSP 2019 IEEE International Conference on Acoustics</publisher-name>) p. <fpage>5841</fpage>&#x02013;<lpage>5845</lpage>. <pub-id pub-id-type="doi">10.1109/ICASSP.2019.8682690</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>C. Y.</given-names></name> <name><surname>Carol</surname> <given-names>R.</given-names></name> <name><surname>Serguei</surname> <given-names>P.</given-names></name> <name><surname>Trevor</surname> <given-names>C.</given-names></name></person-group> (<year>2021</year>). <article-title>Crossing the &#x0201C;cookie theft&#x0201D; corpus chasm: applying what BERT learns from outside data to the ADReSS challenge dementia detection task</article-title>. <source>Front. Comp. Sci.</source> <volume>3</volume>, <fpage>642517</fpage>. <pub-id pub-id-type="doi">10.3389/fcomp.2021.642517</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>R.</given-names></name> <name><surname>Lee</surname> <given-names>W.</given-names></name> <name><surname>Ng</surname> <given-names>H. T.</given-names></name> <name><surname>Dahlmeier</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>Effective attention modeling for aspect-level sentiment classification</article-title>, in <source>Proceedings of the 27th International Conference on Computational Linguistics.</source> <publisher-loc>Santa Fe, New Mexico, USA</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>1121</fpage>&#x02013;<lpage>1131</lpage>. <pub-id pub-id-type="doi">10.18653/v1/P18-2092</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hern&#x000E1;ndez-Dom&#x000ED;nguez</surname> <given-names>L.</given-names></name> <name><surname>Ratt&#x000E9;</surname> <given-names>S.</given-names></name> <name><surname>Sierra-Mart&#x000ED;nez</surname> <given-names>G.</given-names></name> <name><surname>Roche-Bergua</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Computer-based evaluation of Alzheimer&#x00027;s disease and mild cognitive impairment patients during a picture description task</article-title>. <source>Alzheimer&#x00027;s Dementia</source>. <volume>10</volume>, <fpage>260</fpage>&#x02013;<lpage>268</lpage>. <pub-id pub-id-type="doi">10.1016/j.dadm.2018.02.004</pub-id><pub-id pub-id-type="pmid">29780871</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hochreiter</surname> <given-names>S.</given-names></name> <name><surname>Schmidhuber</surname> <given-names>J.</given-names></name></person-group> (<year>1997</year>). <article-title>Long short-term memory</article-title>. <source>Neural Computat.</source> <volume>9</volume>, <fpage>1735</fpage>&#x02013;<lpage>1780</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1997.9.8.1735</pub-id><pub-id pub-id-type="pmid">9377276</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jarrold</surname> <given-names>W.</given-names></name> <name><surname>Peintner</surname> <given-names>B.</given-names></name> <name><surname>Wilkins</surname> <given-names>D.</given-names></name> <name><surname>Vergyri</surname> <given-names>D.</given-names></name> <name><surname>Richey</surname> <given-names>C.</given-names></name> <name><surname>Gorno-Tempini</surname> <given-names>M. L.</given-names></name> <name><surname>Ogar</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Aided diagnosis of dementia type through computer-based analysis of spontaneous speech</article-title>. <source>CLPsych.</source> <volume>11</volume>, <fpage>27</fpage>&#x02013;<lpage>37</lpage>. <pub-id pub-id-type="doi">10.3115/v1/W14-3204</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Karlekar</surname> <given-names>S.</given-names></name> <name><surname>Niu</surname> <given-names>T.</given-names></name> <name><surname>Bansal</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Detecting Linguistic Characteristics of Alzheimer&#x00027;s Dementia by Interpreting Neural Models</article-title>, in <source>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)</source>, (<publisher-loc>New Orleans, LA</publisher-loc>: <publisher-name>Association for Computational Linguistics: Human Language Technologies</publisher-name>) p. <fpage>701</fpage>&#x02013;<lpage>707</lpage>. <pub-id pub-id-type="doi">10.18653/v1/N18-2110</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>B.</given-names></name> <name><surname>Hsu</surname> <given-names>Y. T.</given-names></name> <name><surname>Rudzicz</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>Detecting dementia in mandarin Chinese using transfer learning from a parallel corpus</article-title>. <source>arXiv</source>. <pub-id pub-id-type="doi">10.18653/v1/N19-1199</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>N.</given-names></name> <name><surname>Luo</surname> <given-names>K.</given-names></name> <name><surname>Yuan</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>Y. A.</given-names></name></person-group> (<year>2022</year>). <article-title>A transfer learning method for detecting Alzheimer&#x00027;s disease based on speech and natural language processing</article-title>. <source>Front. Public Health</source>. <volume>2</volume>, <fpage>772592</fpage>. <pub-id pub-id-type="doi">10.3389/fpubh.2022.772592</pub-id><pub-id pub-id-type="pmid">35493375</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>N.</given-names></name> <name><surname>Yuan</surname> <given-names>Z.</given-names></name></person-group> (<year>2022</year>). <article-title>Spontaneous language analysis in alzheimer&#x00027;s disease: evaluation of natural language processing technique for analyzing lexical performance</article-title>. <source>J. Shanghai Jiao Tong Univ. (Sci.).</source> <volume>27</volume>, <fpage>160</fpage>&#x02013;<lpage>167</lpage>. <pub-id pub-id-type="doi">10.1007/s12204-021-2384-3</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>N.</given-names></name> <name><surname>Yuan</surname> <given-names>Z.</given-names></name> <name><surname>Tang</surname> <given-names>Q.</given-names></name></person-group> (<year>2021</year>). <article-title>Improving Alzheimer&#x00027;s disease detection for speech based on feature purification network</article-title>. <source>Front. Public Health</source>. <volume>12</volume>, <fpage>835960</fpage>. <pub-id pub-id-type="doi">10.3389/fpubh.2021.835960</pub-id><pub-id pub-id-type="pmid">35310782</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Luz</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>Longitudinal monitoring and detection of Alzheimer&#x00027;s type dementia from spontaneous speech data</article-title>, in <source>Procs. of the Intl. Symp on Comp. Based Medical Systems (CBMS)</source>. <publisher-loc>Manhattan, New York</publisher-loc>: <publisher-name>IEEE</publisher-name>. p. <fpage>45</fpage>&#x02013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1109/CBMS.2017.41</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luz</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>Automatic identification of experts and performance prediction in the multimodal math data corpus through analysis of speech interaction</article-title>. <source>Procs I. C. M. A. C. M. I.</source> <volume>2013</volume>, <fpage>575</fpage>&#x02013;<lpage>582</lpage>. <pub-id pub-id-type="doi">10.1145/2522848.2533788</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Luz</surname> <given-names>S.</given-names></name> <name><surname>Haider</surname> <given-names>F.</given-names></name> <name><surname>de la Fuente Garcia</surname> <given-names>S.</given-names></name> <name><surname>Fromm</surname> <given-names>D.</given-names></name> <name><surname>Macwhinney</surname> <given-names>B.</given-names></name></person-group> (<year>2020</year>). <source>Alzheimer&#x00027;s Dementia Recognition Through Spontaneous Speech: The ADReSS Challenge</source>. <publisher-loc>Shanghai</publisher-loc>: <publisher-name>Interspeech</publisher-name>.</citation>
</ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Luz</surname> <given-names>S.</given-names></name> <name><surname>Haider</surname> <given-names>F.</given-names></name> <name><surname>de la Fuente Garcia</surname> <given-names>S.</given-names></name> <name><surname>Fromm</surname> <given-names>D.</given-names></name> <name><surname>Macwhinney</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). <source>Detecting cognitive decline using speech only: The ADReSSo Challenge</source>. <publisher-loc>Virtual conference</publisher-loc>: <publisher-name>Interspeech</publisher-name>.</citation>
</ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>D.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>H.</given-names></name></person-group> (<year>2017</year>). <article-title>Interactive attention networks for aspect-level sentiment classification</article-title>, in <source>Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17.</source> (<publisher-loc>Melbourne</publisher-loc>: <publisher-name>International Joint Conferences on Artificial Intelligence</publisher-name>) p. <fpage>4068</fpage>&#x02013;<lpage>4074</lpage>. <pub-id pub-id-type="doi">10.24963/ijcai.2017/568</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>MacWhinney</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). <source>Tools for Analyzing Talk Part 1: The CHAT Transcription Format</source>. <publisher-loc>Pittsburgh, PA</publisher-loc>: <publisher-name>Carnegie Mellon University</publisher-name>.</citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Manning</surname> <given-names>C. D.</given-names></name> <name><surname>Surdeanu</surname> <given-names>M.</given-names></name> <name><surname>Bauer</surname> <given-names>J.</given-names></name> <name><surname>Finkel</surname> <given-names>J.</given-names></name> <name><surname>Bethard</surname> <given-names>S. J.</given-names></name> <name><surname>McClosky</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>The Stanford CoreNLP natural language processing toolkit</article-title>, in <source>Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>. p. <fpage>55</fpage>&#x02013;<lpage>60</lpage>.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mattson</surname> <given-names>M. P.</given-names></name></person-group> (<year>2004</year>). <article-title>Pathways towards and away from Alzheimer&#x00027;s disease</article-title>. <source>Nature</source>. <volume>430</volume>, <fpage>631</fpage>&#x02013;<lpage>639</lpage>. <pub-id pub-id-type="doi">10.1038/nature02621</pub-id><pub-id pub-id-type="pmid">15295589</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Medhat</surname> <given-names>W.</given-names></name> <name><surname>Hassan</surname> <given-names>A.</given-names></name> <name><surname>Korashy</surname> <given-names>H.</given-names></name></person-group> (<year>2014</year>). <article-title>Sentiment analysis algorithms and applications: a survey</article-title>. <source>AIN Shams Engineering J.</source> <volume>5</volume>, <fpage>1093</fpage>&#x02013;<lpage>1113</lpage>. <pub-id pub-id-type="doi">10.1016/j.asej.2014.04.011</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meghanani</surname> <given-names>A.</given-names></name> <name><surname>Anoop</surname> <given-names>C. S.</given-names></name> <name><surname>Ganesan</surname> <given-names>R. A.</given-names></name></person-group> (<year>2021</year>). <article-title>Recognition of Alzheimer&#x00027;s dementia from the transcriptions of spontaneous speech using fasttext and CNN models</article-title>. <source>Front. Comp. Sci.</source> <volume>3</volume>, <fpage>624558</fpage>. <pub-id pub-id-type="doi">10.3389/fcomp.2021.624558</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mikolov</surname> <given-names>T.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Chen</surname> <given-names>K.</given-names></name> <name><surname>Corrado</surname> <given-names>G. S.</given-names></name> <name><surname>Dean</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>Distributed representations of words and phrases and their compositionality</article-title>, in <source>Advances in neural information processing systems.</source> (<publisher-loc>Lake Tahoe, NV</publisher-loc>: <publisher-name>27th Annual Conference on Neural Information Processing Systems</publisher-name>) p. <fpage>3111</fpage>&#x02013;<lpage>3119</lpage>.</citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mueller</surname> <given-names>K. D.</given-names></name> <name><surname>Koscik</surname> <given-names>R. L.</given-names></name> <name><surname>Hermann</surname> <given-names>B.</given-names></name> <name><surname>Johnson</surname> <given-names>S. C.</given-names></name> <name><surname>Turkstra</surname> <given-names>L. S.</given-names></name></person-group> (<year>2018</year>). <article-title>Declines in connected language are associated with very early mild cognitive impairment: Results from the Wisconsin Registry for Alzheimer&#x00027;s Prevention</article-title>. <source>Front. Aging Neurosci</source>. <volume>9</volume>, <fpage>00437</fpage>. <pub-id pub-id-type="doi">10.3389/fnagi.2017.00437</pub-id><pub-id pub-id-type="pmid">29375365</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Orimaye</surname> <given-names>S. O.</given-names></name> <name><surname>Wong</surname> <given-names>S. M.</given-names></name> <name><surname>Wong</surname> <given-names>C. P.</given-names></name> <name><surname>Liang</surname> <given-names>P.</given-names></name></person-group> (<year>2018</year>). <article-title>Deep language space neural network for classifying mild cognitive impairment and Alzheimer-type dementia</article-title>. <source>PLoS ONE.</source> <volume>13</volume>, <fpage>e0205636</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0205636</pub-id><pub-id pub-id-type="pmid">30870510</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pan</surname> <given-names>Y.</given-names></name> <name><surname>Mirheidari</surname> <given-names>B.</given-names></name> <name><surname>Reuber</surname> <given-names>M.</given-names></name> <name><surname>Venneri</surname> <given-names>A.</given-names></name> <name><surname>Blackburn</surname> <given-names>D.</given-names></name> <name><surname>Christensen</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Automatic Hierarchical Attention Neural Network for Detecting AD</article-title>, in <source>Proc. Interspeech.</source> p. <fpage>4105</fpage>&#x02013;<lpage>4109</lpage>. <pub-id pub-id-type="doi">10.21437/Interspeech.2019-1799</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pope</surname> <given-names>C.</given-names></name> <name><surname>Davis</surname> <given-names>B. H.</given-names></name></person-group> (<year>2011</year>). <article-title>Finding a balance: the carolinas conversation collection</article-title>. <source>Corpus Linguist. Lingu. Theory</source>. <volume>7</volume>, <fpage>143</fpage>&#x02013;<lpage>161</lpage>. <pub-id pub-id-type="doi">10.1515/cllt.2011.007</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pranav</surname> <given-names>M.</given-names></name> <name><surname>Veeky</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). <article-title>Acoustic and language based deep learning approaches for Alzheimer&#x00027;s dementia detection from spontaneous speech</article-title>. <source>Front. Aging Neurosci</source>. <volume>13</volume>, <fpage>623607</fpage>. <pub-id pub-id-type="doi">10.3389/fnagi.2021.623607</pub-id><pub-id pub-id-type="pmid">33613269</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rietzler</surname> <given-names>A.</given-names></name> <name><surname>Stabinger</surname> <given-names>S.</given-names></name> <name><surname>Opitz</surname> <given-names>P.</given-names></name> <name><surname>Engl</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Adapt or get left behind: Domain adaptation through BERT language model finetuning for aspect-target sentiment classification</article-title>, in <source>Proceedings of the 12th Language Resources and Evaluation Conference.</source> <publisher-loc>Marseille, France</publisher-loc>: <publisher-name>European Language Resources Association</publisher-name>. p. <fpage>4933</fpage>&#x02013;<lpage>4941</lpage>.</citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>R&#x00027;mani</surname> <given-names>H.</given-names></name> <name><surname>James</surname> <given-names>G.</given-names></name></person-group> (<year>2021</year>). <article-title>Classifying Alzheimer&#x00027;s disease using audio and text-based representations of speech</article-title>. <source>Front. Psychol</source>. <volume>11</volume>, <fpage>624137</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2020.624137</pub-id><pub-id pub-id-type="pmid">33519651</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roark</surname> <given-names>B.</given-names></name> <name><surname>Mitchell</surname> <given-names>M.</given-names></name> <name><surname>Hosom</surname> <given-names>J.-P.</given-names></name> <name><surname>Hollingshead</surname> <given-names>K.</given-names></name> <name><surname>Kaye</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>Spoken language derived measures for detecting mild cognitive impairment</article-title>. <source>IEEE/AC Trans Audio Speech Lang Process</source>. <volume>19</volume>, <fpage>2081</fpage>&#x02013;<lpage>2090</lpage>. <pub-id pub-id-type="doi">10.1109/TASL.2011.2112351</pub-id><pub-id pub-id-type="pmid">22199464</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roshanzamir</surname> <given-names>A.</given-names></name> <name><surname>Aghajan</surname> <given-names>H.</given-names></name> <name><surname>Baghshah</surname> <given-names>M. S.</given-names></name></person-group> (<year>2021</year>). <article-title>Transformer-based deep neural network language models for Alzheimer&#x00027;s disease risk assessment from targeted speech</article-title>. <source>BMC</source>. <volume>21</volume>, <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1186/s12911-021-01456-3</pub-id><pub-id pub-id-type="pmid">33750385</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Russo</surname> <given-names>I.</given-names></name> <name><surname>Caselli</surname> <given-names>T.</given-names></name> <name><surname>Strapparava</surname> <given-names>C.</given-names></name></person-group> (<year>2015</year>). <article-title>SemEval-2015 task 9: CLIPEval implicit polarity of events</article-title>, in <source>Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)</source>. <publisher-loc>Denver, Colorado</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>443</fpage>&#x02013;<lpage>450</lpage>. <pub-id pub-id-type="doi">10.18653/v1/S15-2077</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shamila</surname> <given-names>N.</given-names></name> <name><surname>Morteza</surname> <given-names>R.</given-names></name> <name><surname>Julian</surname> <given-names>H.</given-names></name> <name><surname>Matthew</surname> <given-names>P.</given-names></name></person-group> (<year>2021</year>). <article-title>Alzheimer&#x00027;s dementia recognition from spontaneous speech using disfluency and interactional features</article-title>. <source>Front. Comp. Sci</source>. <volume>3</volume>, <fpage>640669</fpage>. <pub-id pub-id-type="doi">10.3389/fcomp.2021.640669</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sirts</surname> <given-names>K.</given-names></name> <name><surname>Piguet</surname> <given-names>O.</given-names></name> <name><surname>Johnson</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Idea density for predicting Alzheimer&#x00027;s disease from transcribed speech</article-title>. <source>Alzheimer S.</source> <fpage>322</fpage>&#x02013;<lpage>332</lpage>. <pub-id pub-id-type="doi">10.18653/v1/K17-1033</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Snyder</surname> <given-names>D.</given-names></name> <name><surname>Garcia-Romero</surname> <given-names>D.</given-names></name> <name><surname>Sell</surname> <given-names>G.</given-names></name> <name><surname>Povey</surname> <given-names>D.</given-names></name> <name><surname>Khudanpur</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>X-vectors: Robust DNN Embeddings for Speaker Recognition</article-title>, in <source>Procs IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source> (<publisher-loc>New York City</publisher-loc>: <publisher-name>IEEE</publisher-name>) p. <fpage>5329</fpage>&#x02013;<lpage>5333</lpage>. <pub-id pub-id-type="doi">10.1109/icassp.2018.8461375</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>R.</given-names></name> <name><surname>Mensah</surname> <given-names>S.</given-names></name> <name><surname>Mao</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name></person-group> (<year>2019</year>). <article-title>Aspect-level sentiment analysis via convolution over dependency tree</article-title>, in <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>. <publisher-loc>Hong Kong, China</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>5679</fpage>&#x02013;<lpage>5688</lpage>. <pub-id pub-id-type="doi">10.18653/v1/D19-1569</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tang</surname> <given-names>D.</given-names></name> <name><surname>Qin</surname> <given-names>B.</given-names></name> <name><surname>Liu</surname> <given-names>T.</given-names></name></person-group> (<year>2016</year>). <article-title>Aspect level sentiment classification with deep memory network</article-title>, in <source>Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>. <publisher-loc>Austin, Texas</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>214</fpage>&#x02013;<lpage>224</lpage>. <pub-id pub-id-type="doi">10.18653/v1/D16-1021</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname> <given-names>H.</given-names></name> <name><surname>Ji</surname> <given-names>D.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Zhou</surname> <given-names>Q.</given-names></name></person-group> (<year>2020</year>). <article-title>Dependency graph enhanced dual transformer structure for aspect-based sentiment classification</article-title>, in <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>. Online: Association for Computational Linguistics. p. <fpage>6578</fpage>&#x02013;<lpage>6588</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2020.acl-main.588</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Toprak</surname> <given-names>C.</given-names></name> <name><surname>Jakob</surname> <given-names>N.</given-names></name> <name><surname>Gurevych</surname> <given-names>I.</given-names></name></person-group> (<year>2010</year>). <article-title>Sentence and expression level annotation of opinions in user-generated discourse</article-title>, in <source>Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source>. <publisher-loc>Uppsala, Sweden</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>575</fpage>&#x02013;<lpage>584</lpage>.</citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Toth</surname> <given-names>L.</given-names></name> <name><surname>Hoffmann</surname> <given-names>I.</given-names></name> <name><surname>Gosztolya</surname> <given-names>G.</given-names></name> <name><surname>Vincze</surname> <given-names>V.</given-names></name> <name><surname>Szatloczki</surname> <given-names>G.</given-names></name> <name><surname>Banreti</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>A speech recognition-based solution for the automatic detection of mild cognitive impairment from spontaneous speech</article-title>. <source>Curr. Alzheimer Res</source>. <volume>15</volume>, <fpage>130</fpage>&#x02013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.2174/1567205014666171121114930</pub-id><pub-id pub-id-type="pmid">29165085</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tristan</surname> <given-names>M.</given-names></name> <name><surname>Saturnino</surname> <given-names>L.</given-names></name></person-group> (<year>2021</year>). <article-title>Analysis and classification of word co-occurrence networks from Alzheimer&#x00027;s patients and controls</article-title>. <source>Front. Comp. Sci</source>. <volume>3</volume>, <fpage>649508</fpage>. <pub-id pub-id-type="doi">10.3389/fcomp.2021.649508</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vaswani</surname> <given-names>A.</given-names></name> <name><surname>Shazeer</surname> <given-names>N.</given-names></name> <name><surname>Parmar</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Attention Is All You Need</article-title> in <source>arXiv 2017 Proc of Advances in Neural Information Processing Systems</source> p. <fpage>5998</fpage>&#x02013;<lpage>6008</lpage>.</citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>K.</given-names></name> <name><surname>Shen</surname> <given-names>W.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Quan</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>R.</given-names></name></person-group> (<year>2020</year>). <article-title>Relational graph attention network for aspect-based sentiment analysis</article-title>, in <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>. Online: Association for Computational Linguistics. p. <fpage>3229</fpage>&#x02013;<lpage>3238</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2020.acl-main.295</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Mazumder</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>B.</given-names></name> <name><surname>Zhou</surname> <given-names>M.</given-names></name> <name><surname>Chang</surname> <given-names>Y.</given-names></name></person-group> (<year>2018</year>). <article-title>Target-sensitive memory networks for aspect sentiment classification</article-title>, in <source>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).</source> <publisher-loc>Melbourne, Australia</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>957</fpage>&#x02013;<lpage>967</lpage>. <pub-id pub-id-type="doi">10.18653/v1/P18-1088</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>M.</given-names></name> <name><surname>Zhu</surname> <given-names>X.</given-names></name> <name><surname>Zhao</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>Attention-based LSTM for aspect level sentiment classification</article-title>, in <source>Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>. <publisher-loc>Austin, Texas</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>606</fpage>&#x02013;<lpage>615</lpage>. <pub-id pub-id-type="doi">10.18653/v1/D16-1058</pub-id><pub-id pub-id-type="pmid">35145460</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>H.</given-names></name> <name><surname>Liu</surname> <given-names>B.</given-names></name> <name><surname>Shu</surname> <given-names>L.</given-names></name> <name><surname>Yu</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>BERT post-training for review reading comprehension and aspect-based sentiment analysis</article-title>, in <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>. <publisher-loc>Minneapolis, Minnesota</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>2324</fpage>&#x02013;<lpage>2335</lpage>.</citation>
</ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name></person-group> (<year>2015</year>). <article-title>Implicit feature identification in Chinese reviews using explicit topic mining model</article-title>. <source>Knowledge-Based Syst</source>. <volume>76</volume>, <fpage>166</fpage>&#x02013;<lpage>175</lpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2014.12.012</pub-id></citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yadollahi</surname> <given-names>A.</given-names></name> <name><surname>Shahraki</surname> <given-names>A. G.</given-names></name> <name><surname>Zaiane</surname> <given-names>O. R.</given-names></name></person-group> (<year>2017</year>). <article-title>Current state of text sentiment analysis from opinion to emotion mining</article-title>. <source>ACM Computing Surveys (CSUR).</source> <volume>50</volume>, <fpage>25</fpage>. <pub-id pub-id-type="doi">10.1145/3057270</pub-id></citation>
</ref>
<ref id="B63">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yancheva</surname> <given-names>M.</given-names></name> <name><surname>Rudzicz</surname> <given-names>F.</given-names></name></person-group> (<year>2016</year>). <article-title>Vector-space topic models for detecting Alzheimer&#x00027;s disease</article-title>, in <source>Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).</source> (<publisher-loc>Berlin, Germany</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>) p. <fpage>2337</fpage>&#x02013;<lpage>2346</lpage>. <pub-id pub-id-type="doi">10.18653/v1/P16-1221</pub-id></citation>
</ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>J.</given-names></name> <name><surname>Bian</surname> <given-names>Y.</given-names></name> <name><surname>Cai</surname> <given-names>X.</given-names></name> <name><surname>Huang</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Disfluencies and Fine-Tuning Pre-Trained Language Models for Detection of Alzheimer&#x00027;s Disease</article-title>, in <source>Interspeech</source>. <pub-id pub-id-type="doi">10.21437/Interspeech.2020-2516</pub-id></citation>
</ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zehra</surname> <given-names>S.</given-names></name> <name><surname>Jeffrey</surname> <given-names>S.</given-names></name> <name><surname>Mashrura</surname> <given-names>T.</given-names></name> <name><surname>Shi-ang</surname> <given-names>Q.</given-names></name> <name><surname>Eleni</surname> <given-names>S.</given-names></name> <name><surname>Russell</surname> <given-names>G.</given-names></name></person-group> (<year>2021</year>). <article-title>Learning language and acoustic models for identifying Alzheimer&#x00027;s dementia from speech</article-title>. <source>Front. Comp. Sci.</source> <volume>3</volume>, <fpage>624659</fpage>. <pub-id pub-id-type="doi">10.3389/fcomp.2021.624659</pub-id><pub-id pub-id-type="pmid">26484921</pub-id></citation></ref>
<ref id="B66">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>C.</given-names></name> <name><surname>Li</surname> <given-names>Q.</given-names></name> <name><surname>Song</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Aspect-based sentiment classification with aspect specific graph convolutional networks</article-title>, in <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).</source> <publisher-loc>Hong Kong, China</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. p. <fpage>4568</fpage>&#x02013;<lpage>4578</lpage>. <pub-id pub-id-type="doi">10.18653/v1/D19-1464</pub-id></citation>
</ref>
</ref-list>
</back>
</article>