<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Lang. Sci.</journal-id>
<journal-title>Frontiers in Language Sciences</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Lang. Sci.</abbrev-journal-title>
<issn pub-type="epub">2813-4605</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/flang.2023.1100774</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Language Sciences</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Modeling speech processing in case of neurogenic speech and language disorders: neural dysfunctions, brain lesions, and speech behavior</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Kr&#x000F6;ger</surname> <given-names>Bernd J.</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/87539/overview"/>
</contrib>
</contrib-group>
<aff><institution>Department of Phoniatrics, Pedaudiology, and Communication Disorders, RWTH Aachen University</institution>, <addr-line>Aachen</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Qing Cai, East China Normal University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Elena Barbieri, Northwestern University, United States; Ya-Ning Chang, National Chung Cheng University, Taiwan</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Bernd J. Kr&#x000F6;ger <email>bernd.kroeger&#x00040;rwth-aachen.de</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>10</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>2</volume>
<elocation-id>1100774</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>11</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>09</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Kr&#x000F6;ger.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Kr&#x000F6;ger</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions>
<abstract>
<p>Computer-implemented neural models of speech processing can simulate patients suffering from neurogenic speech and language disorders such as aphasia, dysarthria, apraxia of speech, and neurogenic stuttering. When neural dysfunctions are inserted into these quantitative models, simulated speech production and perception tasks uncover a variety of speech symptoms. Neural model dysfunctions can be differentiated with respect to type (dysfunction of neuron cells or of neural connections), location (dysfunction appearing in a specific buffer or submodule of the model), and severity (percentage of affected neurons or neural connections in that specific submodule or buffer). It can be shown that quantitative computer-implemented neural models of speech processing make it possible to refine the definition of neurogenic speech disorders by unfolding the relation between an inserted neural dysfunction and the resulting simulated speech behavior, whereas the analysis of neural deficits (e.g., brain lesions) uncovered by imaging experiments with real patients does not necessarily allow the neurofunctional deficit to be determined precisely and thus does not necessarily yield a precise neurofunctional definition of a neurogenic speech and language disorder. Furthermore, it can be shown that quantitative computer-implemented neural speech processing models are able to simulate complex communication scenarios as they appear in medical screenings, e.g., tasks like picture naming, word comprehension, or repetition of words or non-words (syllable sequences) used for diagnostic purposes, as well as speech tasks appearing in speech therapy scenarios (treatments). Moreover, neural speech processing models which can simulate neural learning are able to simulate the progress in the overall speech processing skills of a model (patient) resulting from specific treatment scenarios, provided these scenarios can be simulated. Thus, quantitative neural models can be used to sharpen screening and treatment scenarios and thus increase their effectiveness by varying specific parameters of screening as well as of treatment scenarios.</p></abstract>
<kwd-group>
<kwd>neural model of speech processing</kwd>
<kwd>speech production</kwd>
<kwd>speech perception</kwd>
<kwd>speech disorder</kwd>
<kwd>neural dysfunction</kwd>
<kwd>brain lesion</kwd>
<kwd>communication scenarios</kwd>
<kwd>medical screening</kwd>
</kwd-group>
<counts>
<fig-count count="2"/>
<table-count count="4"/>
<equation-count count="0"/>
<ref-count count="69"/>
<page-count count="19"/>
<word-count count="19986"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Language Processing</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1. Introduction</title>
<p>Neural models of speech processing comprise the modeling of speech production and speech perception/comprehension. Production models start with the specification of a verbal intention at the semantic or concept level and generate lemmata and phonological forms (cognitive-linguistic model part). These models subsequently initiate the motor execution processes, including articulatory movement generation, acoustic signal generation, and sensory feedback signal generation (sensorimotor model part). Well-known neural models representing the <italic>cognitive-linguistic part</italic> of speech production have been developed by Dell (<xref ref-type="bibr" rid="B15">1986</xref>) and Dell et al. (<xref ref-type="bibr" rid="B16">2007</xref>, <xref ref-type="bibr" rid="B17">2013</xref>; spreading activation model), Roelofs (<xref ref-type="bibr" rid="B54">1992</xref>, <xref ref-type="bibr" rid="B55">1997</xref>, <xref ref-type="bibr" rid="B56">2014</xref>; WEAVER model), and Levelt et al. (<xref ref-type="bibr" rid="B44">1999</xref>; word production model). Well-known neural models representing the <italic>sensorimotor part</italic> have been developed by Guenther (<xref ref-type="bibr" rid="B26">2006</xref>, <xref ref-type="bibr" rid="B27">2016</xref>; DIVA model), Guenther et al. (<xref ref-type="bibr" rid="B28">2006</xref>; DIVA model), and Bohland et al. (<xref ref-type="bibr" rid="B7">2010</xref>; GODIVA model). A biologically inspired feedback-aware speech task control approach has been introduced by Parrell et al. (<xref ref-type="bibr" rid="B51">2019</xref>; FACTS), and a spiking neuron model covering the linguistic and sensorimotor parts has been developed by Kr&#x000F6;ger et al. (<xref ref-type="bibr" rid="B41">2012</xref>, <xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>; ACT model). All these neural models are concrete, quantitatively implemented, and validated by computer simulation of realistic communication scenarios.</p>
<p>A comprehensive, neurobiologically motivated but not yet computer-implemented model of <italic>speech perception and comprehension</italic> was introduced by Hickok and Poeppel (<xref ref-type="bibr" rid="B29">2007</xref>, <xref ref-type="bibr" rid="B30">2016</xref>). This model comprises modules for the spectro-temporal analysis of incoming acoustic speech signals and for phonological processing, and then splits into a ventral processing stream, including lexical, semantic, and grammatical processing, and a dorsal stream for further auditory, somatosensory, and motor processing.</p>
<p>Combined production-perception models (<italic>speech processing models</italic>) are needed if the simulation of speech learning (i.e., modeling of <italic>speech acquisition</italic>) is of interest (developmental neural models; see Warlaumont and Finnegan, <xref ref-type="bibr" rid="B67">2016</xref>; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B38">2022</xref>). During the babbling phase&#x02014;which appears in the first year of life&#x02014;the first sensorimotor relations are established (mainly auditory-to-motor relations); later, during the imitation phase, in which the toddler imitates speech items produced by caretakers, the mental lexicon and the grammatical repertoire of the target language are learned and stored. These speech acquisition phases need to be simulated in a realistic neural speech processing model. Moreover, complete neural speech processing models are needed for the simulation of <italic>speech communication scenarios</italic>, i.e., scenarios comprising capabilities like listening to and speaking with a communication partner. It should be kept in mind that all medical screenings for the diagnosis of speech disorders include such communication scenarios between test supervisor (communication partner) and patient (model).</p>
<p>It will be shown that the neural models reviewed here are able to unfold the complex associations between <italic>neural dysfunctions</italic> and <italic>symptoms of disordered speech</italic> in the case of neurogenic speech and language disorders. Thus, models can help to refine the definition of speech and language disorders, because an underlying neural dysfunction, which is the basis for the definition of a speech disorder, can be clearly defined in a neural model, while a lesioned brain region (i.e., anatomical information), which may be localized in a patient by imaging techniques (e.g., Crinion et al., <xref ref-type="bibr" rid="B11">2013</xref>), does not necessarily point in a one-to-one relation to a specific neural dysfunction (i.e., functional information) or to a specific speech and language disorder. While functional information is directly defined in a model as the distortion inserted into a specific neural subnetwork, in patients this information needs to be extracted indirectly from behavioral data by asking them to perform specific speech and language tests in order to collect relevant speech errors from these screening scenarios.</p>
<p>In this paper we will concentrate on four types of neurogenic speech and language disorders, i.e., aphasia, dysarthria, apraxia of speech, and neurogenic stuttering. <italic>Aphasia</italic> can be defined as a disorder resulting from neural dysfunctions arising in the cognitive-linguistic part of the speech processing network. Aphasia can affect the activation of a word at the lexical level even if motor processes are intact, or can affect the comprehension of a word even if auditory perception is intact (e.g., Roelofs, <xref ref-type="bibr" rid="B56">2014</xref>). <italic>Dysarthria</italic> and <italic>apraxia of speech</italic> result from neural dysfunctions in the sensorimotor part of the brain including the peripheral motor neuron system. All types of <italic>dysarthria</italic> reflect functional deficits appearing during motor execution even in the case of fully functional articulatory organs (Kearney and Guenther, <xref ref-type="bibr" rid="B32">2019</xref>). <italic>Apraxia of speech</italic> reflects deficits in motor planning and motor programming (Van der Merwe, <xref ref-type="bibr" rid="B66">2021</xref>). <italic>Neurogenic stuttering</italic> reflects deficits in the initiation of execution of motor programs (Chang and Guenther, <xref ref-type="bibr" rid="B8">2020</xref>).</p>
<p><italic>Symptoms of speech and language disorders</italic> typically appear in communication situations and can be evoked in speech tasks like picture naming, narration tasks, word, non-word (logatome), or syllable repetition tasks, or in word or sentence comprehension tasks. For all types of neurogenic speech and language disorders, diagnosis procedures (also called screenings) comprise a <italic>battery of tests</italic>, and most of these tests are <italic>speech-mediated tasks</italic> (i.e., the supervisor instructs the patient verbally, gives test items verbally, and the patient answers verbally; a non-speech-mediated task would be picture pointing as in the Token Test, where even the target words could be presented non-verbally, for example as written text). Well-known and widely used <italic>screenings</italic> in case of suspected aphasia are, e.g., the Token Test (De Renzi and Vignolo, <xref ref-type="bibr" rid="B14">1962</xref>; De Renzi and Faglioni, <xref ref-type="bibr" rid="B13">1978</xref>), the Frenchay Aphasia Screening Test (FAST, Enderby et al., <xref ref-type="bibr" rid="B21">1987</xref>), the Acute Aphasia Screening Protocol (Crary et al., <xref ref-type="bibr" rid="B10">1989</xref>), the Aachen Aphasia Bedside Test (Biniek et al., <xref ref-type="bibr" rid="B6">1992</xref>), and the Bedside Western Aphasia Battery (Kertesz, <xref ref-type="bibr" rid="B33">2006</xref>). These screenings typically (i) assess comprehension, e.g., by pointing to objects on cards portraying a scene and/or geometric shapes, or by executing simple movements based on instructions given by the test supervisor, (ii) assess expression, e.g., by describing a picture, repeating words, or naming objects displayed on pictures, (iii) assess reading capabilities by reading words or short texts, and (iv) assess writing capabilities by writing words or a short text which describes a scene displayed on pictures.</p>
<p>Screenings for detecting dysarthria or apraxia of speech are often combined with screenings for differential diagnosis with respect to suspected aphasia and are sometimes subsumed as screenings for neurological communicative disorders (e.g., Araki et al., <xref ref-type="bibr" rid="B2">2021</xref>) or as screenings for the differential diagnosis of different types of neurogenic speech disorders (e.g., Allison et al., <xref ref-type="bibr" rid="B1">2020</xref>). These screenings include verbal-linguistic sections (e.g., word and non-word repetition, object naming, word writing, dictation) and articulatory sections, including non-speech tasks like oral movement analysis and tasks like diadochokinesis, i.e., repeating syllable sequences like [badaga] or [pataka] as often and as fast as possible. Screenings for apraxia of speech likewise include verbal-linguistic and articulatory tests like word and non-word repetition, sentence production, and phonological awareness tests. Here, the analysis of the speech items uttered by patients additionally comprises phonetic transcription to identify prosodic and segmental errors (Ballard et al., <xref ref-type="bibr" rid="B3">2016</xref>; Allison et al., <xref ref-type="bibr" rid="B1">2020</xref>).</p>
<p><italic>Treatments</italic> for all neurogenic speech disorders mainly comprise practice for improving speaking capabilities in sentence and word production. During ongoing treatment, the training concentrates on speech items of increasing length and complexity. Lexical learning strategies for the association of phonological word form and meaning aim to widen the vocabulary of patients suffering from different forms of aphasia (Tippett et al., <xref ref-type="bibr" rid="B64">2015</xref>). Treatments for patients suffering from apraxia of speech focus on practicing syllable production in order to learn the pronunciation of different speech sounds in typical speech-like environments and in combination with other speech sounds within a syllable (Ballard et al., <xref ref-type="bibr" rid="B4">2000</xref>). In the case of dysarthria, detailed advice is additionally given for increasing or reducing speaking rate, improving intelligibility, or increasing speech and non-speech motor capabilities of the neuromuscular system of several articulators, including respiration and phonation (Palmer and Enderby, <xref ref-type="bibr" rid="B50">2007</xref>).</p>
</sec>
<sec id="s2">
<title>2. Functional location, type, and severity of neural dysfunctions</title>
<p>In this paper a comprehensive sketch of a computer-implementable speech processing model is introduced (<xref ref-type="fig" rid="F1">Figure 1</xref> and Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B38">2022</xref>). <xref ref-type="fig" rid="F1">Figure 1</xref> illustrates that models of speech processing can be divided into <italic>functional subnetworks</italic> or <italic>modules</italic> which specify <italic>functional locations of parts of the neural network</italic>, and that each of these subnetworks or modules can be associated with specific <italic>cortical and subcortical brain regions</italic>. The model sketch presented in <xref ref-type="fig" rid="F1">Figure 1</xref> comprises a <italic>cognitive-linguistic model part</italic>, for which the associations of subnetworks or modules to brain regions are outlined by Roelofs (<xref ref-type="bibr" rid="B56">2014</xref>), and a <italic>sensorimotor model part</italic>, for which these relations are outlined by Guenther (<xref ref-type="bibr" rid="B26">2006</xref>), Bohland et al. (<xref ref-type="bibr" rid="B7">2010</xref>), and Kearney and Guenther (<xref ref-type="bibr" rid="B32">2019</xref>).</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>(A)</bold> Functional architecture of a sketch of a neural model of speech processing, including <bold>(B)</bold> localization of modules of the cognitive-linguistic subnetwork and <bold>(C)</bold> localization of modules of the sensorimotor subnetwork. Light gray modules and connections within <bold>(B)</bold> represent sensorimotor components. Mental lexicon and mental syllabary are the central knowledge and skill repositories within the cognitive-linguistic and sensorimotor parts of the model, as indicated in <bold>(A)</bold>. The mental lexicon stores concepts, lemmata, and phonological forms of already learned words. The mental syllabary stores motor plans, motor programs, auditory states, and somatosensory states of already learned syllables. Mental lexicon: &#x0201C;in&#x0201D; marks input buffers and &#x0201C;out&#x0201D; marks output buffers of the mental lexicon at the phonological, lemma, and concept (i.e., semantic) level. &#x0201C;in&#x0201D; buffers are activated in connection with auditory and somatosensory processing [see part <bold>(C)</bold> of this figure] and indicate buffers in primary auditory and somatosensory cortex areas. &#x0201C;out&#x0201D; buffers at all three levels (concept, lemma, and phonological form) are activated in connection with production processes and indicate buffers in the temporal as well as the frontal lobe of the neocortex. All input and output buffers are interconnected (i.e., associated with each other by neural connections) in order to represent the whole semantic, wordform, and phonological form knowledge stored in the mental lexicon for a specific target language. Cortico-cortical loops: (i) orange arrows indicate connections between specific cortical modules (neural buffers) and the subcortical action control module (action control loop including basal ganglia and thalamus). Action control is needed to guarantee the correct process flow in any production or perception task, including motor program execution. Basal ganglia and thalamus (orange) are located centrally, while cortical modules (black) are located laterally (neocortex) in <bold>(B, C)</bold>. Orange dashed arrows in <bold>(A)</bold> indicate the transfer of feedback information for the action control loop in case of learning (see Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B38">2022</xref>). (ii) Green arrows indicate connections between specific cortical modules (neural buffers) and the subcortical motor feedback loop (comprising parts of the pons, cerebellum, and thalamus). This second cortico-cortical loop is indicated by green lines and by a green box or green text in <bold>(A, C)</bold>. While the action control loop controls cognitive as well as sensorimotor processes, the motor feedback loop acts only on the sensorimotor components of the neural network. The bidirectional dorsal pathway <bold>(B)</bold> connects areas of the posterior superior temporal gyrus pSTG with two main areas in the frontal lobe, i.e., premotor cortex PMC and posterior inferior frontal gyrus pIFG. The bidirectional ventral pathway <bold>(B)</bold> connects areas of pIFG with two areas in the temporal gyrus, i.e., with anterior up to posterior regions of the STG/STS (i.e., a route connecting the three levels of the mental lexicon from phonological form via lemma to concept) as well as with the anterior inferior temporal gyrus, also called ventral anterior temporal lobe (Stefaniak et al., <xref ref-type="bibr" rid="B61">2020</xref>). The information given in this figure is based on Friederici (<xref ref-type="bibr" rid="B22">2011</xref>), Ueno et al. (<xref ref-type="bibr" rid="B65">2011</xref>), Roelofs (<xref ref-type="bibr" rid="B56">2014</xref>), Stefaniak et al.
(<xref ref-type="bibr" rid="B61">2020</xref>), and Miller and Guenther (<xref ref-type="bibr" rid="B48">2021</xref>). Semantic processing [<bold>(B)</bold>, see Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>, and cf. combinatorial network in Hickok and Poeppel, <xref ref-type="bibr" rid="B29">2007</xref>] is a part of overall cognitive processing [see <bold>(A)</bold>].</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="flang-02-1100774-g0001.tif"/>
</fig>
<p>This model sketch is mainly based on two computer-implemented model approaches, i.e., on the DIVA/GODIVA approach for sensorimotor control of speech production (Guenther, <xref ref-type="bibr" rid="B26">2006</xref>, <xref ref-type="bibr" rid="B27">2016</xref>; Guenther et al., <xref ref-type="bibr" rid="B28">2006</xref>; Bohland et al., <xref ref-type="bibr" rid="B7">2010</xref>) and on the WEAVER approach for word-form encoding (Roelofs, <xref ref-type="bibr" rid="B54">1992</xref>, <xref ref-type="bibr" rid="B55">1997</xref>, <xref ref-type="bibr" rid="B56">2014</xref>). The WEAVER model (as published by Roelofs, <xref ref-type="bibr" rid="B56">2014</xref>) reflects the cognitive-linguistic part of the model sketch (<xref ref-type="fig" rid="F1">Figure 1</xref>) and is based on the word production model published by Levelt et al. (<xref ref-type="bibr" rid="B44">1999</xref>). The DIVA/GODIVA model (as published by Miller and Guenther, <xref ref-type="bibr" rid="B48">2021</xref>) reflects the sensorimotor part of the model sketch (for the differences between DIVA and GODIVA see below in this section). WEAVER as well as DIVA/GODIVA are implemented using second generation neural networks (see <xref ref-type="app" rid="A1">Appendix A</xref>). The network model developed by Kr&#x000F6;ger et al. (<xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>) aims for a complete representation of a speech processing network as given in <xref ref-type="fig" rid="F1">Figure 1</xref> including cognitive-linguistic and sensorimotor processing (see below) and uses a third generation neural network approach (see <xref ref-type="app" rid="A1">Appendices A</xref>&#x02013;<xref ref-type="app" rid="A1">C</xref>).</p>
<p>The model sketch (<xref ref-type="fig" rid="F1">Figure 1</xref>) comprises a mental lexicon and a mental syllabary as central knowledge and skill repositories within the cognitive-linguistic and sensorimotor parts of the model (<xref ref-type="fig" rid="F1">Figure 1A</xref>). The core of the <italic>mental lexicon</italic>&#x02014;storing and processing cognitive speech states (concepts, lemmata, and phonological forms)&#x02014;is located in the temporal lobe. Its brain locations overlap with the network part representing the auditory input states of syllables within the mental syllabary. Phonological representations&#x02014;the output of the mental lexicon and the input to the sensorimotor part of the speech production network&#x02014;are located in the posterior part of the frontal lobe near the syllable initiation module.</p>
<p>Motor program states and somatosensory states of syllables as part of the <italic>mental syllabary</italic> are stored in the inferior parts of the frontal and parietal lobe (the transformation of phonological states into motor plans is described in detail by Bohland et al., <xref ref-type="bibr" rid="B7">2010</xref>, i.e., within the GODIVA model, and is in accordance with the model concept given by Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B38">2022</xref>; the processing using the mental syllabary is described in detail in the DIVA model, see, e.g., Guenther et al., <xref ref-type="bibr" rid="B28">2006</xref>). Speech perception is mainly located in the superior and posterior parts of the temporal lobe, and comprehension leads to lexical activations in the anterior part of the temporal lobe. The cortico-cortical feedback loop including basal ganglia and thalamus (<italic>action control loop</italic>) can be activated from many cortical regions and feeds neural activations back to different parts of the cortical speech processing network, while the cortico-cortical control loop including the cerebellum via the pons (<italic>motor feedback loop</italic>) is activated mainly by the production part of the sensorimotor network and feeds its activations back to this region, activating the learned auditory and somatosensory states for currently produced syllables as well as motor program execution.</p>
<p>While imaging and lesion studies support a strong correlation between structural (anatomical) brain locations and functional aspects (functional modules) of the neural network (e.g., Batista-Garc&#x000ED;a-Ram&#x000F3; and Fern&#x000E1;ndez-Verdecia, <xref ref-type="bibr" rid="B5">2018</xref>; Litwi&#x00144;czuk et al., <xref ref-type="bibr" rid="B46">2022</xref>), this does not imply that functional modules cannot lie in close neighborhood or even overlap spatially in several regions of the brain. Thus, <italic>functional deficits</italic> appearing in modules or subnetworks of the neural speech processing network cannot always be easily associated with a specific <italic>localization of dysfunctional (e.g., damaged) regions within the brain</italic>. Moreover, in the case of developmental speech and language disorders (i.e., a delay in learning and storing data within the speech processing neural network; difficulties in learning), in the case of speech and language disorders resulting from neurodegenerative diseases, or in the case of aging, which may lead to slow and limited degeneration of the neural network, imaging studies may not indicate any specific anatomical regions or structural abnormalities which directly uncover the underlying neural deficit that is probably responsible for the occurring speech or language disorder. In all these cases, specific screenings are needed in order to collect the behavioral data relevant for diagnosing a speech and language disorder correctly.</p>
<p>It is possible to insert <italic>neural dysfunctions</italic> of any type and severity into any functional subnetwork or module (i.e., functional location) of a speech processing neural network model. Thus, a modeled neural dysfunction can be specified with respect to <italic>functional location, severity</italic>, and <italic>neural type</italic>. The <italic>functional location</italic> (i.e., the specific module or subnetwork of the model which is affected) correlates with a lesioned brain region which can be identified on the basis of functional imaging data, but in many cases the identified brain areas hosting a specific subnetwork, module, or buffer of the speech processing neural network are relatively broad (e.g., Golfinopoulos et al., <xref ref-type="bibr" rid="B25">2010</xref>; Kearney and Guenther, <xref ref-type="bibr" rid="B32">2019</xref>). The <italic>severity</italic> of a dysfunction is defined as the percentage of non-functioning neurons or non-functioning neural connections within a module or subnetwork of the neural model (e.g., Roelofs, <xref ref-type="bibr" rid="B56">2014</xref>; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>). The <italic>neural type of a dysfunction</italic> separates dysfunctions of neurons (cells, specifically the cell body), dysfunctions of synapses (synaptic connections), and dysfunctions of connecting pathways (axons, dendrites) within the modeled neural network (e.g., Roelofs, <xref ref-type="bibr" rid="B56">2014</xref>). Moreover, some models are capable of varying the concentration of neurotransmitters such as dopamine in specific modules or subnetworks (e.g., in the striatum of the basal ganglia; Civier et al., <xref ref-type="bibr" rid="B9">2013</xref>; Senft et al., <xref ref-type="bibr" rid="B59">2016</xref>, <xref ref-type="bibr" rid="B60">2018</xref>). In these models, an abnormal concentration (too low or too high) of a transmitter substance can be introduced to instantiate a further type of neural dysfunction.</p>
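The severity and neural-type parameters described above can be made concrete with a minimal sketch, assuming that one module of the network is represented by a plain weight matrix; the function names and matrix dimensions are illustrative and are not taken from any of the cited models.

```python
import numpy as np

def lesion_connections(weights, severity, rng):
    """Connection-type dysfunction: a random fraction (the severity)
    of the synaptic weights within one module is set to zero."""
    lesioned = weights.copy()
    n_off = int(round(severity * lesioned.size))
    idx = rng.choice(lesioned.size, size=n_off, replace=False)
    lesioned.flat[idx] = 0.0
    return lesioned

def lesion_neurons(weights, severity, rng):
    """Cell-type dysfunction: whole neurons are silenced, i.e., all
    incoming and outgoing weights of the chosen units are zeroed."""
    lesioned = weights.copy()
    n_off = int(round(severity * lesioned.shape[0]))
    idx = rng.choice(lesioned.shape[0], size=n_off, replace=False)
    lesioned[idx, :] = 0.0   # outgoing connections
    lesioned[:, idx] = 0.0   # incoming connections (square module)
    return lesioned

rng = np.random.default_rng(0)
w = rng.normal(size=(20, 20))             # one module's weight matrix
w_mild = lesion_connections(w, 0.1, rng)  # 10% severity, connection type
w_severe = lesion_neurons(w, 0.5, rng)    # 50% severity, cell type
```

A simulated screening task could then be run with the lesioned matrix in place of the intact weights, and the resulting error pattern compared with patient behavior.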
<p>As already stated above, the linguistic-cognitive part of the model sketch given in <xref ref-type="fig" rid="F1">Figure 1</xref> is mainly based on the WEAVER model and the sensorimotor part is based on the DIVA/GODIVA model.</p>
<p>WEAVER (Roelofs, <xref ref-type="bibr" rid="B56">2014</xref>) is a second generation or <italic>node-and-link neural network</italic> (see <xref ref-type="app" rid="A1">Appendix A</xref>) consisting of seven <italic>node layers</italic> (or simply <italic>layers</italic>), separating the concept level, lemma level, phonological form level, and syllable motor program level. Two input-/output-layers are labeled as lexical input and output layers, and phonological forms are separated into input and output layers as well, while the lemma and concept levels do not separate input and output forms. The syllable motor program layer in WEAVER is comparable to the motor plan level in our model sketch (<xref ref-type="fig" rid="F1">Figure 1</xref>). Links build up neural pathways connecting the different layers of the model, i.e., the layers representing the concept, lemma, lexical in-/output, phoneme in-/output, and motor plan level. From the functional viewpoint of neural processing, these inter-layer neural connections or inter-layer links can also be labeled <italic>neural mappings</italic>, while phonemes, lemmata, and concepts are represented as <italic>neural states</italic>. Each state is represented in WEAVER by a specific node in a neural layer. Thus, this network type uses a local representation of states. The performance of the WEAVER network for simulating different types of aphasia and the temporal specification of increasing/decreasing node activation are discussed in Sections 3 and 5.</p>
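The node-and-link principle with local state representation can be illustrated by a toy network in which activation injected at a concept node spreads along directed links toward a motor plan node while decaying over time; the network size, link pattern, and decay value are illustrative assumptions, not WEAVER's published parameters.

```python
import numpy as np

DECAY = 0.4   # fraction of activation lost per time step (illustrative)
STEPS = 5

# links[i, j] is the link strength from node j to node i; each node
# locally represents one state: 0 concept, 1 lemma, 2 phonological
# form, 3 motor plan.
links = np.zeros((4, 4))
links[1, 0] = 1.0   # concept to lemma
links[2, 1] = 1.0   # lemma to phonological form
links[3, 2] = 1.0   # phonological form to motor plan

act = np.zeros(4)
act[0] = 1.0        # external input activates the concept node

for _ in range(STEPS):
    # each step: activation decays and simultaneously spreads along links
    act = (1.0 - DECAY) * act + links.dot(act)
```

After a few steps the motor plan node carries the highest activation; zeroing the row and column of, e.g., the phonological node in `links` would block this flow, which in such a toy setting mimics a word-form retrieval deficit.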
<p>The DIVA/GODIVA approach differentiates 10 layers, also called <italic>neural maps</italic> in the context of this modeling approach (Guenther et al., <xref ref-type="bibr" rid="B28">2006</xref>; Bohland et al., <xref ref-type="bibr" rid="B7">2010</xref>; Miller and Guenther, <xref ref-type="bibr" rid="B48">2021</xref>). These neural maps and their hypothetical locations in the brain are discussed in Section 3 of this paper. The neural maps (i.e., the initiation map, speech sound map, auditory target, state, and error maps, somatosensory target, state, and error maps, articulator map, and feedback control map) and the cortical mappings connecting these neural maps are displayed in <xref ref-type="fig" rid="F1">Figure 1A</xref>. Here, the speech sound map is labeled motor plan map (or motor plan buffer), the feedback map is part of the mental syllabary buffer, and the articulator map is part of the motor execution buffer. This renaming of map labels in <xref ref-type="fig" rid="F1">Figure 1</xref> results from the separation of motor planning and motor programming and from quantifying motor plans and programs with respect to the concept of speech actions or articulatory gestures (see Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B38">2022</xref>). The functioning of the DIVA/GODIVA model and the modeling of speech disorders are discussed in Sections 3 and 5.</p>
<p>Moreover, the model sketch presented in this paper is also in accordance with three further computational neural network models cited in this review (see below).</p>
<list list-type="simple">
<list-item><p>(i) The <italic>spreading activation model</italic> introduced by Dell (<xref ref-type="bibr" rid="B15">1986</xref>) and further developed later (see Dell et al., <xref ref-type="bibr" rid="B16">2007</xref>, <xref ref-type="bibr" rid="B17">2013</xref>) is a three-layer second generation neural network modeling lexical processing. The three layers (phonemes, words, semantic features) are interconnected by bidirectional mappings between the phoneme and word layers and between the word and semantic feature layers. For each mapping, all nodes of one layer are connected with all nodes of the other layer in both directions. This allows the typical spreading of activation from one layer to another. The approach is mainly used for modeling aphasic speech disorders. In later versions of the model (Dell et al., <xref ref-type="bibr" rid="B17">2013</xref>), a fourth layer, i.e., an auditory input layer, is added to model the auditory-phonetic-to-phonological conversion in more detail (see Section 5).</p></list-item>
<list-item><p>(ii) The LICHTHEIM 2 model (Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>) is also based on a second generation neural network and separates seven neural layers. Four of these layers are hidden layers (no specification of the type of states is needed there), whereas all other network models discussed in this paper specify the type of state for each layer or buffer in order to characterize it functionally as, e.g., a concept, lemma, phonological form, sensory, or motor layer. The hidden layers of this network model are chosen for neuroanatomical reasons as neural hubs within the ventral and dorsal routes of speech processing. The mappings connecting the layers are bidirectional. Three of the seven layers are defined as input/output layers, i.e., an auditory input layer, a motor output layer, and a semantic in-/output layer, which receives semantic input information, e.g., in picture naming tasks, and generates semantic output information, e.g., in word comprehension tasks. Thus, this model can be represented by <xref ref-type="fig" rid="F1">Figure 1</xref> at least partially. It comprises an auditory input layer and a neural pathway toward the concept and semantic processing layer via the temporal lobe and further to the motor plan/program layer (ventral pathway). Moreover, it comprises a neural pathway from the auditory input layer to the motor plan/program layer via the parietal lobe (dorsal pathway). The hidden layers of the LICHTHEIM 2 model cannot be directly associated with intermediate functional layers of our model sketch, but it can be hypothesized that the two hidden layers within the temporal lobe which are part of the ventral pathway are related to lexical processing (concept, lemma, and phonological form levels in <xref ref-type="fig" rid="F1">Figure 1</xref>). The two further hidden layers which appear in LICHTHEIM 2&#x02014;one within the dorsal route, located in the parietal lobe and directly connecting the auditory input and motor domains, and the other within the ventral route, connecting layers of the temporal and frontal lobes and located in the pars opercularis/pars triangularis&#x02014;are not easily interpretable in our model sketch.</p></list-item>
<list-item><p>(iii) The ACT model in its current state (speech action model, ACT; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>) is probably represented most exactly in <xref ref-type="fig" rid="F1">Figure 1</xref> (for its neurocomputational realization see <xref ref-type="app" rid="A3">Appendix C</xref>). This model uses the spiking neural network approach developed in the NEF-SPA framework (Neural Engineering Framework, NEF, augmented by the Semantic Pointer Architecture, SPA; see <xref ref-type="app" rid="A2">Appendix B</xref>) and is capable of representing and processing cognitive states, i.e., concept states, lemma states, and phonological form states within the perception and production pathways of the mental lexicon (see <xref ref-type="fig" rid="F1">Figure 1A</xref>), as well as sensorimotor states, i.e., motor plan and motor program states within the further (lower) production pathway and somatosensory and auditory states within the feedback perception pathway. Cognitive-linguistic states are hosted in <italic>cognitive-linguistic state buffers</italic> or <italic>SPA-buffers</italic> (see <xref ref-type="fig" rid="F1">Figure 1A</xref>; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>; <xref ref-type="app" rid="A2">Appendix B</xref>). Higher-level motor states are hosted in the <italic>motor plan and motor program buffers</italic> (also SPA-buffers, see Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B38">2022</xref>). Lower-level motor states (i.e., syllable oscillators and gesture movement trajectory estimators) are hosted in <italic>lower-level state buffers</italic>, called <italic>neuron ensembles</italic> or <italic>NEF-ensembles</italic>.</p></list-item>
</list>
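<p>The fully connected, bidirectional spreading of activation used in the model described under (i) can be sketched in a few lines; layer sizes, weights, decay, and number of steps are invented for illustration and are not the fitted parameters of the model:</p>

```python
import numpy as np

# Hypothetical spreading-activation sketch with three layers (semantic
# features, words, phonemes) and bidirectional all-to-all mappings.
# All numerical values are illustrative, not fitted model parameters.
n_sem, n_word, n_phon = 6, 4, 8
rng = np.random.default_rng(1)
W_sw = rng.random((n_word, n_sem)) * 0.1   # semantic <-> word mapping
W_wp = rng.random((n_phon, n_word)) * 0.1  # word <-> phoneme mapping
decay = 0.5

sem = np.zeros(n_sem)
sem[:3] = 1.0                # activate semantic features of a target word
word = np.zeros(n_word)
phon = np.zeros(n_phon)

for _ in range(8):           # a few time steps of spreading
    # each layer receives activation from both neighboring layers
    word = decay * word + W_sw @ sem + W_wp.T @ phon
    phon = decay * phon + W_wp @ word
    sem = decay * sem + W_sw.T @ word

selected_word = int(np.argmax(word))  # the most active word node wins
```

<p>Weakening or lesioning one of the two mappings in such a sketch would be the entry point for simulating aphasic error patterns of the kind discussed in Section 5.</p>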
<p>So far, the cognitive-linguistic part as well as the production side of the sensorimotor part of the model sketch (<xref ref-type="fig" rid="F1">Figure 1</xref>) have been computer-implemented using a spiking neuron approach (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>; see also <xref ref-type="app" rid="A3">Appendix C</xref>). The feedback loop of the sensorimotor part of the model sketch has been implemented, beside DIVA, in a spatio-temporal activation averaging model (STAA model, a second generation neural network model; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B42">2014</xref>; Kr&#x000F6;ger and Cao, <xref ref-type="bibr" rid="B39">2015</xref>; Kr&#x000F6;ger and Bekolay, <xref ref-type="bibr" rid="B37">2019</xref>, p. 133ff), while spiking neural networks (SNNs) are also called third generation neural network approaches (see Maass, <xref ref-type="bibr" rid="B47">1997</xref>).</p>
</sec>
<sec id="s3">
<title>3. Anatomical locations of modules or sub-networks</title>
<p>Computer-implemented neural network models simulate neural functionality. These models clearly define subnetworks which are responsible for specific functional subtasks, e.g., for selecting and activating a concept, lemma, or phonological form, or for activating a stored syllable motor plan. Imaging techniques make it possible to specify exactly those brain regions which are activated when a specific functional task is performed, and thus to associate neural functionality with brain areas. Indefrey and Levelt (<xref ref-type="bibr" rid="B31">2004</xref>) and Roelofs (<xref ref-type="bibr" rid="B56">2014</xref>) assume that concepts, which are stored in the <italic>mental lexicon</italic>, are represented in anterior-ventral temporal cortex; lemmas in the mid-section of the left middle temporal gyrus; and input and output lexical forms of lemmas as well as input phonemes in the left posterior superior and middle temporal gyrus (Wernicke&#x00027;s area), while output phonemes are stored in the left posterior inferior frontal gyrus (Broca&#x00027;s area). Syllable motor representations, which are stored in the <italic>mental syllabary</italic>, are represented in the ventral precentral gyrus. Inter-lobe neural associations appear especially between phonological input and output forms, located in part in the temporal lobe and in part in the frontal lobe (see <xref ref-type="fig" rid="F1">Figure 1B</xref>). These associations are structurally realized by the left arcuate fasciculus and uncinate fasciculus.</p>
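<p>The module-to-region assignments summarized above can be collected in a simple lookup table, e.g., as a starting point for relating model dysfunctions to lesion sites. The region labels simply restate Indefrey and Levelt (2004) and Roelofs (2014) as cited in the text; the data structure itself is purely illustrative:</p>

```python
# Illustrative lookup from functional module to hypothesized brain region,
# restating the assignments cited in the text (Indefrey and Levelt, 2004;
# Roelofs, 2014). The code is only an illustrative data structure.
MODULE_TO_REGION = {
    "concepts": "anterior-ventral temporal cortex",
    "lemmas": "mid-section of left middle temporal gyrus",
    "lexical forms and input phonemes":
        "left posterior superior and middle temporal gyrus (Wernicke's area)",
    "output phonemes": "left posterior inferior frontal gyrus (Broca's area)",
    "syllable motor programs": "ventral precentral gyrus",
}

def region_of(module):
    # Return the hypothesized anatomical location of a functional module.
    return MODULE_TO_REGION.get(module, "unknown")
```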
<p>Guenther (<xref ref-type="bibr" rid="B26">2006</xref>), Guenther et al. (<xref ref-type="bibr" rid="B28">2006</xref>), Golfinopoulos et al. (<xref ref-type="bibr" rid="B25">2010</xref>), and Kearney and Guenther (<xref ref-type="bibr" rid="B32">2019</xref>) assume that the <italic>initiation map</italic>, which activates motor plans and motor programs and thus starts syllable execution as postulated in the DIVA and GODIVA models (Guenther, <xref ref-type="bibr" rid="B26">2006</xref>; Bohland et al., <xref ref-type="bibr" rid="B7">2010</xref>), is located in the supplementary motor area, on the medial wall of the frontal cortex. The speech sound map (a term used in DIVA and GODIVA models and represented by mental syllabary in our model sketch, <xref ref-type="fig" rid="F1">Figure 1A</xref>) is assumed to be located in left ventral premotor cortex, i.e., in the ventral precentral gyrus and in the surrounding portions of posterior inferior frontal gyrus and of the anterior insula.</p>
<p>The <italic>articulation map</italic> (execution map in model sketch, <xref ref-type="fig" rid="F1">Figure 1A</xref>) which directly activates motor neurons controlling the movements of the speech articulators is located within the ventral motor cortex (primary motor cortex). Neural buffers hosting the <italic>auditory target, state, and error maps</italic> are located within the ventral auditory cortex (temporal lobe), and those hosting the <italic>somatosensory target, state, and error maps</italic> are located in the ventral somatosensory cortex (parietal lobe). The detailed organization and functioning of the feedforward and feedback motor control system within the sensorimotor part of the model sketch using these maps is described by Guenther (<xref ref-type="bibr" rid="B26">2006</xref>) and Kearney and Guenther (<xref ref-type="bibr" rid="B32">2019</xref>) and in the context of our model sketch it is described by Kr&#x000F6;ger et al. (<xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>).</p>
<p>Two cortico-cortical feedback loops (action control loop and motor feedback loop) including the basal ganglia, cerebellum, and thalamus are introduced in the model sketch (<xref ref-type="fig" rid="F1">Figure 1A</xref>). The <italic>action control loop</italic> (see orange arrows, orange box, and orange text in <xref ref-type="fig" rid="F1">Figures 1A</xref>, <xref ref-type="fig" rid="F1">C</xref>) is responsible for all cognitive control processes needed for the temporal sequencing of cognitive and sensorimotor processes in each situation, e.g., paying attention to specific incoming sensory information, deciding how to react in a specific situation, and activating motor processes for reacting. In the case of a speech task like picture naming, this can be the sequence of visual perception (activation of a visual state), word recognition (activation of a concept from the mental lexicon), and word production (phonological form activation; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>). Moreover, action control also comprises motor planning in the form of motor plan and motor program activation, as well as motor execution (see planning and motor loops in Bohland et al., <xref ref-type="bibr" rid="B7">2010</xref> and in Miller and Guenther, <xref ref-type="bibr" rid="B48">2021</xref>). This control loop starts and ends in different areas of the neocortex and includes the basal ganglia and thalamus at its center (see <xref ref-type="fig" rid="F1">Figures 1B</xref>, <xref ref-type="fig" rid="F1">C</xref>, solid orange lines). A second feedback loop, here called the <italic>motor feedback loop</italic> (called cerebellar loop in Kearney and Guenther, <xref ref-type="bibr" rid="B32">2019</xref>; green arrows and boxes in <xref ref-type="fig" rid="F1">Figure 1A</xref> and green lines and green text in <xref ref-type="fig" rid="F1">Figure 1C</xref>), is responsible for activating feedback control processes that compare stored and current auditory and somatosensory states and thus in part acts on syllable execution as well. This loop is activated together with the current target states at the mental syllabary, comprises the cerebellum and thalamus at its center, and allows correction of sensory feedback states as well as of motor program states (see <xref ref-type="fig" rid="F1">Figures 1A</xref>, <xref ref-type="fig" rid="F1">C</xref>, green lines).</p>
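<p>The core operation of the motor feedback loop, comparing a stored sensory target state with the current sensory state and deriving a correction, can be sketched as a simple proportional error update; the gain value and state dimensionality are hypothetical:</p>

```python
import numpy as np

def feedback_correction(target, state, gain=0.3):
    # One step of sensory feedback control: the error map activity is the
    # difference between the stored target and the current state; the
    # corrective motor adjustment is proportional to it. Gain is illustrative.
    error = np.asarray(target) - np.asarray(state)
    return gain * error

# Example: the current auditory state deviates from the stored target
target = np.array([1.0, 0.5, 0.0])
state = np.array([0.8, 0.5, 0.2])
correction = feedback_correction(target, state)
```

<p>In the model sketch, such corrections act on the sensory feedback states as well as on the motor program states, as described above.</p>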
<p>While most approaches discussed above can be represented by <italic>functionally defined</italic> box-and-arrow models, i.e., by box-and-arrow plots in which the boxes or modules define partial <italic>functions</italic> of speech processing like the semantic-to-lemma or lemma-to-phonological form transformation (see, e.g., Roelofs, <xref ref-type="bibr" rid="B56">2014</xref>), the LICHTHEIM 2 approach (Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>) can be represented by a <italic>neuroanatomically defined</italic> box-and-arrow model. Here, the architecture of the network and thus its modules are defined with respect to the <italic>neuroanatomy</italic> of the brain; it should be kept in mind, however, that these brain regions are mainly defined on the basis of knowledge gathered from functional imaging experiments, so that these regions, too, can be delineated on the basis of <italic>neurofunctionality</italic>.</p>
<p>The LICHTHEIM 2 model is based on a second generation neural network model and separates seven neural layers representing different cortical areas. The model (i) connects the auditory input layer (mid-superior temporal gyrus/sulcus, mSTG, mSTS) with the semantic layer (anterior STG/STS) by two intermediate or hidden layers within the temporal lobe via the ventral pathway, (ii) connects the semantic layer with the motor output layer (insular motor cortex) via two intermediate or hidden layers within the temporal and frontal lobe as a further part of the ventral pathway (main part of the ventral pathway), and (iii) connects the auditory input layer and motor output layer via one further intermediate or hidden layer within the parietal lobe (i.e., a hidden layer within the inferior supramarginal gyrus, inf SMG) by the dorsal pathway.</p>
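<p>The three pathway segments (i)-(iii) can be written out as an explicit adjacency structure; the layer names are abbreviated placeholders and the encoding is purely illustrative, not the original LICHTHEIM 2 implementation:</p>

```python
# Illustrative encoding of the LICHTHEIM 2 pathway segments (i)-(iii) as
# described in the text; layer names are abbreviated placeholders, not
# identifiers from the original implementation.
VENTRAL_TO_SEMANTIC = ["auditory_input_mSTG", "hidden_temporal_1",
                       "hidden_temporal_2", "semantic_aSTG"]         # (i)
VENTRAL_TO_MOTOR = ["semantic_aSTG", "hidden_temporal_frontal_1",
                    "hidden_temporal_frontal_2", "motor_output"]     # (ii)
DORSAL = ["auditory_input_mSTG", "hidden_inf_SMG", "motor_output"]   # (iii)

def links(path):
    # All mappings are bidirectional, so links are unordered pairs.
    return {frozenset(pair) for pair in zip(path, path[1:])}

ALL_LINKS = links(VENTRAL_TO_SEMANTIC) | links(VENTRAL_TO_MOTOR) | links(DORSAL)
```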
<p>Within the functional part of our model sketch (<xref ref-type="fig" rid="F1">Figure 1A</xref>), the part of the ventral route located in the temporal lobe together with the dorsal route (both routes as defined by Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>) can be interpreted as the production pathway of our model sketch (<xref ref-type="fig" rid="F1">Figure 1A</xref>), leading from semantic processing via lexical processing (from the concept buffer via the lemma buffer to the phonological form buffer) toward motor processing (via the motor plan and motor program buffers toward motor execution). In the other (i.e., perceptual) direction (the perception pathway in our model sketch), the dorsal pathway (from the frontal via the parietal toward the temporal lobe, as defined by Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>) can be interpreted as the sensory feedback processing pathway including the somatosensory and auditory state, target, and error buffers (<xref ref-type="fig" rid="F1">Figure 1A</xref>), while the part of the ventral pathway located in the temporal lobe (as defined by Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>) also realizes perceptual lexical processing (i.e., comprehension).</p>
<p>In contrast to all other neurocomputational models, and as stated above, the LICHTHEIM 2 approach does not specify the intermediate or hidden layers with respect to sensorimotor or linguistic functions (e.g., phonological, higher-level auditory, or higher-level somatosensory representations) and thus can be interpreted as a basically neuroanatomical approach. Only the input and output layers are defined with respect to function, i.e., the semantic layer for the activation of word meanings, the acoustic input layer for the activation of phonetic-acoustic features, and the motor output layer for the activation of motor programs. Thus, the model can also be interpreted as an early version of <italic>deep learning networks</italic> (for an overview of deep learning neural networks in speech processing see, e.g., Nassif et al., <xref ref-type="bibr" rid="B49">2019</xref>; Roger et al., <xref ref-type="bibr" rid="B57">2022</xref>). A further hidden-layer neural network model for speech processing, including the ventral visual pathway but omitting a part of the speech processing ventral pathway (connecting the anterior part of the temporal lobe directly with the inferior part of the frontal lobe), has been developed by Weems and Reggia (<xref ref-type="bibr" rid="B68">2006</xref>).</p>
</sec>
<sec id="s4">
<title>4. Disorders: symptoms, types of dysfunctions, lesioned brain regions</title>
<sec>
<title>4.1. Aphasia</title>
<p><italic>Aphasias</italic> are characterized by a loss of language knowledge and language skills. The production and/or comprehension of words or entire sentences can be disrupted. The cause is damage to parts of the central nervous system, e.g., after stroke or traumatic brain injury (acute-onset aphasias), or a progressive neurodegenerative disease. The acute-onset aphasias can be subdivided into <italic>Broca aphasia</italic>, primarily a disturbance of speech production, <italic>Wernicke aphasia</italic>, primarily a disturbance of speech comprehension, and <italic>global aphasia</italic>, a mixture of both.</p>
<p>Other types of aphasia are conduction aphasia and transcortical aphasia. In the case of <italic>conduction aphasia</italic>, both speech production and speech comprehension are widely unaffected, but patients have difficulty repeating unfamiliar words as well as non-words (logatomes). <italic>Transcortical aphasias</italic> can be subdivided into three types. In the case of <italic>transcortical motor aphasia</italic>, the initiation of learned words (stored in the mental lexicon) is affected (e.g., picture naming is difficult), but the patient can still repeat words or syllables (direct repetition of auditory stimuli). In the case of <italic>transcortical sensory aphasia</italic>, repeating (auditorily presented) syllables, words, or phrases is also possible, but without understanding the words or the meaning of the entire utterance. In the case of <italic>transcortical mixed aphasia</italic>, words can be understood and produced only to a limited extent in picture naming or storytelling scenarios, but they can be repeated perfectly.</p>
<p>All these subtypes of aphasia can be understood and differentiated more easily on the basis of a definition of the <italic>functional deficits</italic> in the neural network model sketch (<xref ref-type="fig" rid="F1">Figure 1A</xref>), i.e., on the basis of the <italic>model dysfunctions</italic> described in <xref ref-type="table" rid="T1">Table 1</xref>. Thus, in the case of Broca aphasia mainly the phonological output buffer, in the case of Wernicke aphasia mainly the phonological input buffer, and in the case of global aphasia the phonological input <italic>and</italic> output buffers are dysfunctional (see Section 5).</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Subtypes of aphasias, core symptoms, affected brain regions, and model dysfunctions following Roelofs (<xref ref-type="bibr" rid="B56">2014</xref>).</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Type of aphasia</bold></th>
<th valign="top" align="left"><bold>Core symptoms deficits in:</bold></th>
<th valign="top" align="left"><bold>Damaged brain regions (in language dominant hemisphere)</bold></th>
<th valign="top" align="left"><bold>Neural (model) dysfunction disruptions in:</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Broca aphasia</td>
<td valign="top" align="left">Word production</td>
<td valign="top" align="left">Broca area (posterior inferior frontal gyrus)</td>
<td valign="top" align="left">Phonological output buffer</td>
</tr> <tr>
<td valign="top" align="left">Wernicke aphasia</td>
<td valign="top" align="left">Word comprehension</td>
<td valign="top" align="left">Wernicke area (posterior superior and middle temporal gyrus)</td>
<td valign="top" align="left">Phonological input buffer</td>
</tr> <tr>
<td valign="top" align="left">Global aphasia</td>
<td valign="top" align="left">Word production and comprehension</td>
<td valign="top" align="left">Broca and Wernicke area (parts of frontal and temporal lobe)</td>
<td valign="top" align="left">Phonological output as well as phonological input buffer</td>
</tr> <tr>
<td valign="top" align="left">Transcortical motor aphasia</td>
<td valign="top" align="left">Word production without word repetition</td>
<td valign="top" align="left">Anterior superior frontal lobe</td>
<td valign="top" align="left">Network between lexical output buffer (lemma level) and phonological output buffer</td>
</tr> <tr>
<td valign="top" align="left">Transcortical sensory aphasia</td>
<td valign="top" align="left">Word comprehension without word repetition</td>
<td valign="top" align="left">Inferior temporal lobe</td>
<td valign="top" align="left">Network between phonological input and lexical input buffer (lemma level)</td>
</tr> <tr>
<td valign="top" align="left">Transcortical mixed aphasia</td>
<td valign="top" align="left">Word production and comprehension without word repetition</td>
<td valign="top" align="left">Anterior superior frontal lobe and inferior temporal lobe</td>
<td valign="top" align="left">Network between lexical in- or output and phonological level on production and perception side</td>
</tr>
<tr>
<td valign="top" align="left">Conduction aphasia</td>
<td valign="top" align="left">Logatome repetition (and repetition of low-frequent complex words)</td>
<td valign="top" align="left">Left dorsal stream (Sylvian parietal temporal boundary and arcuate fasciculus)</td>
<td valign="top" align="left">Direct neural connections between input and output phoneme level</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In the case of the transcortical aphasias, the neural connections from/toward the phonological buffers are dysfunctional (i.e., dysfunctions in connecting the lemma and concept levels with the phonological input level of the mental lexicon in the case of transcortical sensory aphasia, and dysfunctions in connecting these lexical levels with the phonological output level in the case of transcortical motor aphasia); in the case of conduction aphasia, the shortcut connection between the phonological input and output buffers is dysfunctional. Here, input buffers are the buffers on the perception pathway, while output buffers lie on the production pathway in <xref ref-type="fig" rid="F1">Figure 1A</xref>. Modeling of these subtypes of aphasia has been demonstrated successfully by Roelofs (<xref ref-type="bibr" rid="B56">2014</xref>) and Kr&#x000F6;ger et al. (<xref ref-type="bibr" rid="B43">2020</xref>), as described in Section 5 (modeling of symptoms).</p>
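<p>The subtype-to-dysfunction assignments of <xref ref-type="table" rid="T1">Table 1</xref> can be restated as a small lookup, e.g., for selecting which model part to lesion in a simulation; the keys and values simply restate the table, and the function itself is hypothetical:</p>

```python
# Lookup restating Table 1 (following Roelofs, 2014): aphasia subtype ->
# dysfunctional part of the model. Purely illustrative.
APHASIA_TO_DYSFUNCTION = {
    "Broca": "phonological output buffer",
    "Wernicke": "phonological input buffer",
    "global": "phonological output and input buffers",
    "transcortical motor": "lemma-to-phonological-output network",
    "transcortical sensory": "phonological-input-to-lemma network",
    "transcortical mixed": "lexical-phonological networks on both sides",
    "conduction": "direct input-output phoneme connections",
}

def part_to_lesion(subtype):
    # Return the model part to lesion for a given aphasia subtype.
    return APHASIA_TO_DYSFUNCTION[subtype]
```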
<p>The neural dysfunctions named above for the several subtypes of aphasia refer to our functional model sketch (<xref ref-type="fig" rid="F1">Figure 1A</xref>) but are also in accordance with the simulation results of the LICHTHEIM 2 model (Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>). There, three subtypes of conduction aphasia can be differentiated with respect to neural dysfunctions within two different cortical locations, i.e., the inferior supramarginal gyrus (iSMG) and the insular motor cortex, and with respect to the dorsal pathway connecting these two cortical areas. These locations and pathways are part of the dorsal route of speech processing and thus are independent of lexical processing, which is activated via the ventral processing route (Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>). Broca aphasia here results from neural dysfunctions within the frontal operculum and the anterior inferior frontal lobe (aIFL; Stefaniak et al., <xref ref-type="bibr" rid="B61">2020</xref>), which also hosts the phonological output buffer in terms of our model sketch (<xref ref-type="fig" rid="F1">Figure 1B</xref>). Wernicke-type aphasia is here analyzed as a neural dysfunction within the auditory input layer, but it should be kept in mind that the corresponding layers for representing and processing auditory input and its phonological interpretation are located nearby in the posterior part of the superior temporal gyrus (pSTG). A further very interesting feature of the LICHTHEIM 2 model is that it can simulate post-stroke recovery phenomena for different types of aphasia (Stefaniak et al., <xref ref-type="bibr" rid="B61">2020</xref>; see Section 5).</p>
</sec>
<sec>
<title>4.2. Apraxia of speech</title>
<p><italic>Apraxia of speech</italic> (AOS) can be defined as a dysfunction of speech motor planning and/or speech motor programming. It is a neurogenic speech disorder which can be acquired (as a result of stroke or traumatic brain injury), can have its origin in a neurodegenerative disorder (primary progressive apraxia of speech), or can have its origin in developmental problems (childhood apraxia of speech). On the one hand, apraxia of speech does not involve the cognitive-linguistic part of the speech processing system; thus, the patient is aware of their self-produced speech errors. On the other hand, the (peripheral) neuromuscular system and the articulation apparatus, including all speech articulators, are intact as well. The affected modules are the planning and programming components in connection with parts of the mental syllabary, i.e., the central components of the sensorimotor part of the speech processing model in terms of our model sketch (<xref ref-type="fig" rid="F1">Figure 1A</xref>; see also Van der Merwe, <xref ref-type="bibr" rid="B66">2021</xref>). The core symptoms, damaged brain regions, and the model dysfunctions arising in the case of apraxia of speech are listed in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Core symptoms, affected brain regions, and neural model dysfunctions for apraxia of speech (see Miller and Guenther, <xref ref-type="bibr" rid="B48">2021</xref>; Van der Merwe, <xref ref-type="bibr" rid="B66">2021</xref>).</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Cause</bold></th>
<th valign="top" align="left"><bold>Core symptoms</bold></th>
<th valign="top" align="left"><bold>Damaged brain regions (in language dominant hemisphere)</bold></th>
<th valign="top" align="left"><bold>Neural (model) dysfunction disruptions in:</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">AOS: damage of stored motor programs</td>
<td valign="top" align="left">Reduced speaking rate, sound prolongations and pauses between sounds, sound and syllable segregation, groping, speech initiation difficulties, increased segment duration and intersegment duration while peak velocities of articulatory movements remain unchanged, increase in speech errors with increase in syllable or word complexity or with increase in speaking rate, reduced coarticulation, islands of error-free speech chunks, good awareness of self-produced speech errors</td>
<td valign="top" align="left">Lateral prefrontal and premotor areas, ventral premotor cortex, ventral precentral gyrus and surrounding portions of posterior inferior frontal gyrus and anterior insula</td>
<td valign="top" align="left">Damaged or destroyed motor programs within mental syllabary</td>
</tr>
<tr>
<td valign="top" align="left">AOS: damage of plan-to-program transformation</td>
<td/>
<td valign="top" align="left">Ventral premotor cortex</td>
<td valign="top" align="left">Motor plan-to-program transformation network: no activation of motor programs even if motor plan is available or generation of faulty motor programs</td>
</tr>
 
<tr>
<td valign="top" align="left">AOS: dysfunction of phono-to-motor planning</td>
<td/>
<td valign="top" align="left">Pre-SMA, supplementary motor area (SMA), left posterior inferior frontal sulcus (pIFS)</td>
<td valign="top" align="left">Phono output buffer, initiation map, cortical connections between mental syllabary and motor plan map</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Following Miller and Guenther (<xref ref-type="bibr" rid="B48">2021</xref>), three main neurofunctional causes are discussed in the case of AOS: (a) damage to pre-learned motor programs stored in the mental syllabary; (b) damage in the motor plan-to-motor program transformation network (activation of a motor program if a motor plan is already activated, or assembling of a motor program if a stored program is not available or only partially available); (c) dysfunction of phonological sequence-to-motor plan selection (selection of a motor plan from the continuous flow of phonological sound sequences during the production process). This separation of causes is based on assumptions derived from the box-and-arrow version of the GODIVA model (see Figures 1, 2 in Miller and Guenther, <xref ref-type="bibr" rid="B48">2021</xref>, pp. 430f), but it should be noted that these three causes do not lead to a separation of symptoms (see <xref ref-type="table" rid="T2">Table 2</xref>). Simulations of word and phrase production based on GODIVA model versions including these neural dysfunctions still need to be realized in order to simulate the symptoms listed in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<p>An illustration of incorrect motor plan-to-program transformation (i.e., motor plan realization) is the occurrence of incorrectly timed speech gestures within a syllable, i.e., the incorrect temporal coordination of the gestures appearing within the motor plans of syllables (for motor plan realizations as gesture scores, see Kr&#x000F6;ger and Bekolay, <xref ref-type="bibr" rid="B37">2019</xref>; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>). For example, a mistiming between the labial closing gesture and the glottal opening gesture producing the speech segment [p] may lead to the impression that the speaker produces a [p] or a [b], i.e., the listener sometimes perceives a voiced and sometimes a voiceless segment even though the glottal opening gesture is present in both cases. The incorrect plan-to-program transformation here leads to a faulty shift of the glottal gesture toward earlier points in time, which creates the perceptual impression of a voiced version of this segment because the glottal gesture is increasingly hidden behind the labial closure [see <xref ref-type="fig" rid="F2">Figure 2</xref>; the transition of the glottal gesture to the left side from <bold>(A)</bold> to <bold>(C)</bold>]. If the glottal gesture is produced even earlier relative to the labial closing gesture, we can even get the effect of pre-aspiration [<sup>h</sup>b], which can also be observed as a symptom in speakers suffering from apraxia of speech (<xref ref-type="fig" rid="F2">Figure 2D</xref>; see Kr&#x000F6;ger, <xref ref-type="bibr" rid="B34">2021</xref>). Thus, these speakers may be able to produce four variants of the segment, i.e., [<sup>h</sup>b] -&#x0003E; [b] -&#x0003E; [p] -&#x0003E; [p<sup>h</sup>], just by shifting the glottal opening gesture from &#x0201C;early&#x0201D; to &#x0201C;late&#x0201D; with respect to the labial closing gesture.</p>
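<p>The perceptual consequence of shifting the glottal opening gesture relative to the labial closure can be sketched as a simple timing rule; the millisecond boundaries are invented for illustration and are not measured parameters of the model:</p>

```python
def percept_of_labial_stop(glottal_lag_ms):
    # Classify the auditory impression of a labial stop from the timing lag
    # (in ms) of the glottal opening gesture relative to the labial closure
    # release; negative = glottal gesture shifted earlier. The category
    # boundaries are hypothetical illustration values, not model parameters.
    if glottal_lag_ms < -60:
        return "pre-aspirated [hb]"  # glottal opening much too early
    elif glottal_lag_ms < -20:
        return "voiced [b]"          # opening hidden behind the closure
    elif glottal_lag_ms < 30:
        return "voiceless [p]"       # roughly correct timing
    else:
        return "aspirated [ph]"      # opening extends past the release

# Shifting the glottal gesture from early to late steps through the four
# variants described in the text: [hb] -> [b] -> [p] -> [ph]
variants = [percept_of_labial_stop(t) for t in (-80, -40, 0, 50)]
```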
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Four motor plans (gesture score notation, see Kr&#x000F6;ger and Bekolay, <xref ref-type="bibr" rid="B37">2019</xref>; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>) of CV-syllables in which the temporal location of the glottal opening gesture is shifted toward earlier points in time relative to the labial closing gesture from <bold>(A&#x02013;D)</bold>. The phonetic transcription of the auditory impression of each syllable is given above its motor plan. Blue rectangles indicate the temporal duration of a gesture: the vocalic tongue lowering gesture producing an /a/; the labial closing gesture; the velopharyngeal tight closing gestures (needed for the realization of obstruents) and closing gestures (needed for the realization of all other non-nasal speech sounds); and the glottal opening gestures (producing voiceless sounds, if timed correctly) and closing gestures (producing phonation). Light blue portions indicate movement parts of a gesture; dark blue portions indicate time intervals in which the gesture target is reached (see Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B38">2022</xref>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="flang-02-1100774-g0002.tif"/>
</fig>
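<p>As a toy illustration (not part of any of the models discussed here), the shift described above can be sketched as a mapping from the lag of the glottal opening gesture to the perceived plosive variant; the lag boundaries used below are invented, and only the ordering of the four variants follows the text.</p>

```python
# Toy mapping (not from the paper): perceived variant of a labial plosive as
# a function of the lag of the glottal opening gesture relative to the labial
# closing gesture. The boundary values (0 and 40 ms) are invented; only the
# ordering of the four variants follows the text.

def perceived_variant(glottal_lag_ms):
    """glottal_lag_ms: onset of glottal opening minus onset of labial closure."""
    if glottal_lag_ms > 40:
        return "[ph]"    # opening extends past the release: aspirated
    if glottal_lag_ms > 0:
        return "[p]"     # opening inside the closure: plain voiceless
    if glottal_lag_ms > -40:
        return "[b]"     # opening hidden behind the closure: heard as voiced
    return "[hb]"        # opening precedes the closure: pre-aspiration

# shifting the glottal gesture from "early" to "late":
variants = [perceived_variant(lag) for lag in (-60, -20, 20, 60)]
```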
</sec>
<sec>
<title>4.3. Dysarthria</title>
<p>Dysarthria is caused by neural dysfunctions in the neuromuscular system as well as in the two cortico-cortical feedback loops involving the basal ganglia or the cerebellum. Several types of dysarthria can be differentiated. Understanding the neurofunctional background of these types requires a detailed understanding of the organization of the entire sensorimotor part of the speech processing system, including its frontend, i.e., the neuromuscular system activated during motor execution. Kearney and Guenther (<xref ref-type="bibr" rid="B32">2019</xref>) give an overview of the influence of the cortico-cortical feedback loop via the basal ganglia (BG) and thalamus (action control loop) and of the feedback loop via the cerebellum and thalamus (motor feedback loop) on these neurogenic speech disorders.</p>
<p>The neurogenic speech disorder associated with damage to the cerebellum is <italic>ataxic dysarthria</italic>. Here, damage to parts of the cerebellum, caused, e.g., by trauma or by vascular disease, leads to disturbances in the interaction of feedforward motor signals and feedback sensory signals within the motor feedback loop (Kearney and Guenther, <xref ref-type="bibr" rid="B32">2019</xref>; green arrows in <xref ref-type="fig" rid="F1">Figure 1A</xref>). Dysfunctions within the motor feedback loop thus lead to deficits in generating precisely timed control commands and hence in the direct online control of articulation (ibid., and see <xref ref-type="table" rid="T3">Table 3</xref>).</p>
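<p>The effect of a weakened motor feedback loop can be sketched with a minimal, purely illustrative control loop: a biased feedforward command is corrected online by sensory feedback, and reducing the feedback gain (standing in for cerebellar damage) leaves a larger uncorrected articulatory error. All gains and step counts below are invented.</p>

```python
# Minimal control sketch (all numbers invented): a biased feedforward command
# moves an articulator toward its target while the motor feedback loop adds
# an online correction. A weakened loop is modeled as a reduced feedback gain
# and leaves a larger final articulatory error.

def final_error(feedback_gain, steps=20):
    target, x = 1.0, 0.0
    for _ in range(steps):
        feedforward = 0.02                               # faulty open-loop step (undershoots)
        correction = feedback_gain * 0.1 * (target - x)  # online feedback correction
        x = x + feedforward + correction
    return abs(target - x)

intact = final_error(1.0)      # functional motor feedback loop
weakened = final_error(0.3)    # cerebellar damage: reduced feedback gain
open_loop = final_error(0.0)   # no online correction at all
```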
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Core symptoms, affected brain regions, and neural model dysfunctions for different subtypes of dysarthria and for neurogenic stuttering (see Golfinopoulos et al., <xref ref-type="bibr" rid="B25">2010</xref>; Kearney and Guenther, <xref ref-type="bibr" rid="B32">2019</xref>; Chang and Guenther, <xref ref-type="bibr" rid="B8">2020</xref>; Miller and Guenther, <xref ref-type="bibr" rid="B48">2021</xref>).</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Type</bold></th>
<th valign="top" align="left"><bold>Core symptoms</bold></th>
<th valign="top" align="left"><bold>Damaged or dysfunctional brain regions</bold></th>
<th valign="top" align="left"><bold>Neural (model) dysfunction</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Ataxic dysarthria</td>
<td valign="top" align="left">Less coordinated and less controlled articulatory movements for vowels and consonants; less controlled loudness and pitch; less controlled stress and intonation patterns</td>
<td valign="top" align="left">Superior medial cerebellum, superior lateral cerebellum, ventral lateral nucleus of the thalamus</td>
<td valign="top" align="left">Cortico-cortical loop including cerebellum and thalamus (motor feedback loop); premotor-to-cerebellum connections; thalamus-to-premotor and primary motor cortex connections -&#x0003E; weakening of feedback control</td>
</tr> <tr>
<td valign="top" align="left">Hypokinetic dysarthria</td>
<td valign="top" align="left">Reduced range for pitch and loudness; undershoot in vocalic and consonantal articulation (reduction); compensation by lengthening of gesture duration; longer syllable duration</td>
<td valign="top" align="left">Basal ganglia: putamen, globus pallidus, substantia nigra pars reticulata; thalamus: ventral anterior and lateral nucleus</td>
<td valign="top" align="left">Cortico-cortical loop including basal ganglia and thalamus (action control loop); premotor to basal ganglia connections; thalamus-to-premotor connections -&#x0003E; under-activation of initiation</td>
</tr> <tr>
<td valign="top" align="left">Hyperkinetic dysarthria</td>
<td valign="top" align="left">Harsh voice quality; overshooting articulatory gestures. Articulatory timing deficits; imprecise articulation of consonants and vowels; articulatory breakdowns</td>
<td valign="top" align="left">Basal ganglia: putamen, globus pallidus, substantia nigra pars reticulata; thalamus: ventral anterior and lateral nucleus</td>
<td valign="top" align="left">Cortico-cortical loop including basal ganglia and thalamus (action control loop); premotor to basal ganglia connections; thalamus-to-premotor connections -&#x0003E; overactivation of initiation</td>
</tr> <tr>
<td valign="top" align="left">Spastic dysarthria</td>
<td valign="top" align="left">Strained voice; slow articulation</td>
<td valign="top" align="left">Primary motor cortex, upper motor neuron</td>
<td valign="top" align="left">Articulation map (execution); neuromuscular system (too high muscle tone)</td>
</tr> <tr>
<td valign="top" align="left">Flaccid dysarthria</td>
<td valign="top" align="left">Breathy voice, short phrases; increased nasal resonance</td>
<td valign="top" align="left">Brainstem and midbrain, lower motor neuron (cranial nerves)</td>
<td valign="top" align="left">Neuromuscular system (too low muscle tone)</td>
</tr>
<tr>
<td valign="top" align="left">Neurogenic stuttering</td>
<td valign="top" align="left">Involuntary, frequent disruptions of speech; part-word repetitions; sound prolongations; silent blocks</td>
<td valign="top" align="left">Basal ganglia: putamen, internal parts of globus pallidus, substantia nigra pars reticulata</td>
<td valign="top" align="left">Cortico-cortical loop including basal ganglia and thalamus (action control loop); impairment of connections between cortex and basal ganglia; impairment in functions of basal ganglia -&#x0003E; malfunction in initiation of motor programs</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The dysfunctions occurring in the action control loop (basal ganglia and thalamus) lead to hypokinetic or hyperkinetic dysarthria. These dysfunctions can be caused by neurodegenerative diseases such as Parkinson&#x00027;s disease (reduced functionality of the striatum, as part of the basal ganglia, due to dopamine depletion) or Huntington&#x00027;s disease (cell death caused by gene mutation, first in the striatum and later in cortical neurons). This results in a malfunction of the whole action control loop, which in speech production leads to under- or overactivation of states within the initiation map, the motor program map, and the motor execution map (see orange arrows within the sensorimotor part of the model sketch displayed in <xref ref-type="fig" rid="F1">Figure 1A</xref>; the motor execution map is called articulation map in the DIVA/GODIVA model). This in turn produces under- or overactivation of (mainly syllabic) motor plans and motor programs, which subsequently affects the correct appearance of all speech gestures within each syllable and also leads to an incorrect timing of whole syllables.</p>
<p>Underactivation of the neural states mentioned above occurs in <italic>hypokinetic dysarthria</italic> and leads to symptoms such as reduced articulatory movements and a decreased pitch and loudness range (see <xref ref-type="table" rid="T3">Table 3</xref>). Moreover, underactivation of neural states within the initiation map weakens motor plan activation and thus leads to longer syllable durations and slowed articulatory movements. Overactivation of states within the initiation map leads to neural overactivation at the motor plan and motor program level and may be the source of the neural malfunction occurring in <italic>hyperkinetic dysarthria</italic>. Overactivation of motor plans and programs, and thus of articulatory gestures, leads to a harsh voice quality and to overshooting articulatory gestures. Overshoot destroys the timing of gestures as defined in the motor plan and may lead to imprecise articulation of consonants and distorted vowels, and&#x02014;if gesture activation does not stop&#x02014;to irregular articulatory breakdowns.</p>
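<p>A minimal sketch of this under-/overactivation idea, with all parameters invented: an initiation-map gain scales the commanded gesture target, so that a gain below 1 produces articulatory undershoot (the intended target is never reached) and a gain above 1 produces overshoot.</p>

```python
# Hypothetical sketch: an initiation-map gain scales the activation of a
# syllabic motor plan before execution. Gain below 1 (underactivation,
# hypokinetic) yields undershoot; gain above 1 (overactivation, hyperkinetic)
# yields overshoot. Rate, step count, and tolerance are invented.

def execute_gesture(initiation_gain, target=1.0, steps=30, rate=0.2):
    """Integrate a gesture toward its (gain-scaled) commanded target.
    Returns the peak displacement and the step at which the intended
    target was reached within 5 percent (None if never)."""
    x, peak, reached = 0.0, 0.0, None
    for t in range(steps):
        x = x + rate * (initiation_gain * target - x)
        peak = max(peak, x)
        if reached is None and 0.05 * target >= abs(target - x):
            reached = t + 1
    return peak, reached

hypo_peak, hypo_reached = execute_gesture(0.8)   # undershoot, target missed
norm_peak, norm_reached = execute_gesture(1.0)
hyper_peak, _ = execute_gesture(1.3)             # overshoot beyond the target
```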
<p><italic>Spastic dysarthria</italic> is caused by an impairment of the upper motor neurons located in the primary motor cortex, while <italic>flaccid dysarthria</italic> is caused by an impairment of the lower motor neurons located in the midbrain and brain stem. Thus, both subtypes of dysarthria can be labeled impairments of the neuromuscular system. Patients suffering from flaccid dysarthria show symptoms like breathy voice, resulting from insufficient glottal closing gestures; short phrases, resulting from too brief an activation of all gestures within an utterance; increased nasal resonance, resulting from imperfect closure of the velopharyngeal port; and imprecise articulation (Kr&#x000F6;ger, <xref ref-type="bibr" rid="B35">2022</xref>). These symptoms can easily be attributed to reduced muscle tension (too low a muscle tonus) and reduced duration of all muscle activations. Spastic dysarthria leads to symptoms like strained voice and slow articulation (ibid.). This behavior typically results from too high a muscle tonus and too long an activation of muscle actions; in this case it is difficult to complete speech gestures within the normal time intervals pre-specified by the motor programs of syllables. Detailed simulations of these symptoms require the implementation of more detailed neuromuscular systems as part of articulatory models within the entire neural speech processing model (see Section 7).</p>
</sec>
<sec>
<title>4.4. Neurogenic stuttering</title>
<p>Stuttering can appear after stroke or as a comorbidity of neurological diseases, but in most cases stuttering is a developmental disorder, typically emerging at 2&#x02013;5 years of age in about 3%&#x02013;8% of preschool-aged children (Chang and Guenther, <xref ref-type="bibr" rid="B8">2020</xref>). Developmental stuttering resolves without treatment within 2 years in 75% of cases (ibid.). Core symptoms of stuttering are involuntary, frequent disruptions of ongoing speech such as part-word repetitions, sound prolongations, and silent blocks, which interrupt fluent speech (Chang and Guenther, <xref ref-type="bibr" rid="B8">2020</xref>). Because many functional causes of stuttering are discussed in the literature, we concentrate here on dysfunctions of the neural system and thus label the disorder discussed here <italic>neurogenic stuttering</italic>.</p>
<p>Civier et al. (<xref ref-type="bibr" rid="B9">2013</xref>) and Chang and Guenther (<xref ref-type="bibr" rid="B8">2020</xref>) claim that the functional deficit underlying neurogenic stuttering is a malfunction within the cortico-cortical feedback loop including basal ganglia and thalamus (action control loop). One of the responsibilities of this cortico-cortical loop is <italic>to initiate the execution of speech motor programs</italic>. If this initiation process does not work properly, it may cause interruptions by blocking the production and execution of a syllable, or of a whole utterance directly at its beginning. Civier et al. (<xref ref-type="bibr" rid="B9">2013</xref>) simulated the production of syllable sequences using the GODIVA model by impairing (i) the projections between cortical regions and the input regions of the basal ganglia, i.e., the cortical input projections toward the basal ganglia, labeled as white matter abnormalities (ibid.), and (ii) the striatum as part of the basal ganglia, by reducing the dopamine level within that part of the model. These simulations reproduce typical dysfluency symptoms like part-word or syllable repetitions, sound prolongations, and silent blocks, which all interrupt fluent speech. Similar results were obtained by reducing the dopamine level in the striatum of the basal ganglia in a spiking neuron model (Senft et al., <xref ref-type="bibr" rid="B59">2016</xref>, <xref ref-type="bibr" rid="B60">2018</xref>).</p>
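<p>The initiation impairment can be sketched, under invented parameter values, as a leaky accumulator whose drive is scaled by a dopamine-dependent striatal gain; a moderately reduced gain delays the threshold crossing (prolongation of the preceding syllable), while a strongly reduced gain prevents it entirely (silent block).</p>

```python
# Sketch loosely following the impairment logic described above (thresholds,
# gains, and deadline invented): initiation of the next motor program as a
# leaky accumulator whose drive is scaled by a dopamine-dependent striatal
# gain inside the action control loop.

def initiation_steps(dopamine_level, threshold=1.0, deadline=100):
    """Steps until the initiation signal crosses threshold; None = block."""
    signal = 0.0
    for t in range(deadline):
        drive = 0.05 * dopamine_level            # striatal gain scales cortical drive
        signal = signal + drive - 0.01 * signal  # leaky integration
        if signal >= threshold:
            return t + 1
    return None                                  # no release before the deadline

normal = initiation_steps(1.0)      # fluent initiation
reduced = initiation_steps(0.5)     # delayed release: syllable prolongation
blocked = initiation_steps(0.15)    # threshold never crossed: silent block
```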
</sec>
</sec>
<sec id="s5">
<title>5. Simulation of symptoms</title>
<p>The main questions to be answered in this chapter for each existing computer-implemented neural model are: Which speech and language disorder is intended to be simulated with the dysfunctional model? Which dysfunctions (type and location) are inserted into the model to simulate these speech and language disorders? Which simulation scenarios (screening tasks) were run with the dysfunctional model in order to reproduce all typical symptoms of the targeted speech and language disorder?</p>
<p>Five main groups of computer-implemented neural models will be discussed here, i.e., WEAVER, Dell&#x00027;s spreading activation model, LICHTHEIM 2, DIVA/GODIVA, and ACT. The WEAVER model of Roelofs (<xref ref-type="bibr" rid="B56">2014</xref>) as well as the <italic>spreading activation model</italic> of Dell et al. (<xref ref-type="bibr" rid="B17">2013</xref>) are second generation network models (see <xref ref-type="app" rid="A1">Appendix A</xref>) comprising semantic, lemma, and phoneme <italic>layers</italic> (comparable to <italic>neural maps</italic> in DIVA/GODIVA and to <italic>buffers</italic> in ACT) of <italic>nodes</italic> (comparable to cells in DIVA/GODIVA and to neuron ensembles in ACT) connected by bidirectional inter-layer <italic>links</italic> or inter-layer <italic>neural pathways</italic> (representing <italic>neural connections</italic>). While the terms layers, nodes, and links define <italic>second generation neural network approaches</italic> (using spatio-temporal activation averaging, see Kr&#x000F6;ger and Bekolay, <xref ref-type="bibr" rid="B37">2019</xref>, p.
133ff and see <xref ref-type="app" rid="A1">Appendix A</xref>) the terms <italic>buffers, cells</italic> and <italic>synaptic connections</italic> are used in <italic>adaptive neural network approaches</italic> (DIVA: Guenther, <xref ref-type="bibr" rid="B26">2006</xref>, <xref ref-type="bibr" rid="B27">2016</xref>; Guenther et al., <xref ref-type="bibr" rid="B28">2006</xref>, GODIVA: Bohland et al., <xref ref-type="bibr" rid="B7">2010</xref>) and in third generation neural network approaches also called <italic>spiking neuron models</italic> (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>; Senft et al., <xref ref-type="bibr" rid="B59">2016</xref>, <xref ref-type="bibr" rid="B60">2018</xref>; Stille et al., <xref ref-type="bibr" rid="B63">2020</xref>; also labeled as ACT model, see Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B41">2012</xref>; and see <xref ref-type="app" rid="A1">Appendices A</xref>, <xref ref-type="app" rid="A1">B</xref>).</p>
<p>The <italic>WEAVER model</italic> (Roelofs, <xref ref-type="bibr" rid="B56">2014</xref>) can simulate cognitive-linguistic production and perception/comprehension processes and is able to generate typical speech symptoms appearing in different forms of aphasia, i.e., Broca&#x00027;s, Wernicke&#x00027;s, conduction, transcortical motor, transcortical sensory, and mixed transcortical aphasia. These symptoms comprise the production or comprehension of a wrong word, a complete failure of word production, or, in the case of nonwords (meaningless syllables or syllable sequences), the production of no, a wrong, or a reduced sequence of speech sounds. These symptoms typically appear in question-answering scenarios and are evoked in medical screenings (questions by the test supervisor; answers by the patient) comprising <italic>picture naming, word repetition, word comprehension, or logatome (i.e., nonword) repetition tasks</italic> (see the introduction for the definition of these tasks and see <xref ref-type="table" rid="T4">Table 4</xref>). The LICHTHEIM 2 approach (Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>) can simulate different types of conduction aphasia as well as Broca&#x00027;s and Wernicke&#x00027;s aphasia by simulating the same types of tests, i.e., naming, word comprehension, and word and logatome repetition. The brain lesions inserted into this model are not primarily functional but defined by cortical location; they can, however, be interpreted functionally, as already explained in Section 3.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Speech disorders and listing of tasks for the generation of associated speech errors already simulated by using different computer-implemented quantitative neural models of speech processing.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Neural model</bold></th>
<th valign="top" align="left"><bold>Modeled speech disorders</bold></th>
<th valign="top" align="left"><bold>Modeled tasks</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Spreading activation model: Dell, <xref ref-type="bibr" rid="B15">1986</xref>; Schwartz et al., <xref ref-type="bibr" rid="B58">2006</xref>; Dell et al., <xref ref-type="bibr" rid="B16">2007</xref>, <xref ref-type="bibr" rid="B17">2013</xref></td>
<td valign="top" align="left">Different subtypes of aphasia, speech errors in normal (healthy) speakers</td>
<td valign="top" align="left">Naming, word and nonword repetition</td>
</tr> <tr>
<td valign="top" align="left">WEAVER: Levelt et al., <xref ref-type="bibr" rid="B44">1999</xref>; Roelofs, <xref ref-type="bibr" rid="B56">2014</xref></td>
<td valign="top" align="left">Different subtypes of aphasia (Broca&#x00027;s, Wernicke&#x00027;s, conduction, transcortical motor, transcortical sensory, and mixed transcortical)</td>
<td valign="top" align="left">Naming, word and nonword repetition, word comprehension</td>
</tr> <tr>
<td valign="top" align="left">LICHTHEIM 2: Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>; Stefaniak et al., <xref ref-type="bibr" rid="B61">2020</xref> (see also the hidden layer neural network model developed by Weems and Reggia, <xref ref-type="bibr" rid="B68">2006</xref>)</td>
<td valign="top" align="left">Different subtypes of aphasia including post-stroke recovery</td>
<td valign="top" align="left">Picture naming, word and nonword repetition, word comprehension</td>
</tr> <tr>
<td valign="top" align="left">DIVA/GODIVA: Guenther et al., <xref ref-type="bibr" rid="B28">2006</xref>; Bohland et al., <xref ref-type="bibr" rid="B7">2010</xref>; Civier et al., <xref ref-type="bibr" rid="B9">2013</xref>; Guenther, <xref ref-type="bibr" rid="B27">2016</xref>; Senft et al., <xref ref-type="bibr" rid="B59">2016</xref>, <xref ref-type="bibr" rid="B60">2018</xref></td>
<td valign="top" align="left">Apraxia of speech, different subtypes of dysarthria, neurogenic stuttering</td>
<td valign="top" align="left">Syllable or word production, syllable repetition (diadochokinesis)</td>
</tr> <tr>
<td valign="top" align="left">ACT: Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>; Stille et al., <xref ref-type="bibr" rid="B63">2020</xref></td>
<td valign="top" align="left">Different subtypes of aphasia, developmental speech disorders concerning lexical access problems, speech errors in normal (healthy) speakers</td>
<td valign="top" align="left">Picture naming with semantic and phonological cues, word and nonword repetition, word comprehension</td>
</tr>
<tr>
<td valign="top" align="left">ACT: Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref></td>
<td valign="top" align="left">No disorder: &#x0201C;healthy subject (model)&#x0201D;</td>
<td valign="top" align="left">Picture naming disturbed by distractor words via auditory channel</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The task is called &#x0201C;production&#x0201D; in case of DIVA/GODIVA in order to refer to the fact that the input level of this model is the phonological level or motor plan level. Modeling of a &#x0201C;naming&#x0201D; task means direct word activation at the semantic level (model does not include a visual input pathway); modeling of a &#x0201C;picture naming&#x0201D; task means that activation starts at the visual input level.</p>
</table-wrap-foot>
</table-wrap>
<p>To evoke these symptoms by simulation, two types of neural dysfunctions can be chosen in WEAVER (ibid.): (i) a reduction in the strength of neural activation appearing in the <italic>nodes</italic>, reflecting a specific percentage of inactive or dead neurons, or (ii) a reduction in the strength of neural activation forwarded over <italic>synaptic connections</italic> between neurons, reflecting a specific percentage of inactive or dead synaptic connections or links. These model dysfunctions can be inserted in neuron buffers at the concept, lemma, or phonological form levels on the perception or production pathway, or in the neural connections between these buffers within the production or perception pathway (see <xref ref-type="fig" rid="F1">Figure 1</xref>). The severity of a dysfunction is modeled in WEAVER by (i) the parameter <italic>decay rate</italic>, affecting the nodes (i.e., the model layers), and (ii) the parameter <italic>connection weight</italic>, affecting the links (i.e., the connections between layers). Thus, the stronger the decay rate or the lower the connection weight, the higher the number of damaged nodes or links and the higher the rate of symptoms produced in simulated speech tasks.</p>
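<p>How these two severity parameters translate into error rates can be sketched as follows; the activation dynamics and numbers below are invented for illustration, not WEAVER&#x00027;s actual equations. A target node competes with a weaker competitor over a few spreading-activation steps, and either a higher decay rate or a lower connection weight shrinks the target&#x00027;s advantage against selection noise.</p>

```python
# Hedged sketch of the two WEAVER-style lesion parameters described above:
# within-node decay and between-layer connection weight. A naming response is
# counted as an error when the target node fails to dominate a competitor
# after a few spreading-activation steps. All numbers are illustrative.

import random

def naming_error_rate(decay, weight, trials=500, seed=0):
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        target, competitor = 0.0, 0.0
        for _ in range(10):  # spreading-activation steps
            target = target * (1.0 - decay) + weight * 1.0    # strong input
            competitor = competitor * (1.0 - decay) + weight * 0.4
        if competitor + rng.gauss(0.0, 0.5) > target:         # noisy selection
            errors += 1
    return errors / trials

mild = naming_error_rate(decay=0.5, weight=1.0)
high_decay = naming_error_rate(decay=0.9, weight=1.0)   # damaged nodes
low_weight = naming_error_rate(decay=0.5, weight=0.2)   # damaged links
```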
<p>A comprehensive description of typical simulations for generating symptoms in different forms of aphasia using the WEAVER model is given by Roelofs (<xref ref-type="bibr" rid="B56">2014</xref>). The neural dysfunctions corresponding to different forms of aphasia are inserted at the phonological, lemma, and concept input and output layers in the form of an increased decay rate, which models an increasing number of dysfunctional neurons within these layers. Furthermore, neural dysfunctions are inserted in the form of a decreased connection weight to model an increasing number of dysfunctional neural connections between layers (see chapter 2 of this paper and ibid., p. 37f). Three types of tests were simulated. (i) Word production is simulated by introducing neural activation at the phonological input layer representing the phonological word form of a specific target word and by evaluating whether the correct activation appears at the syllable nodes below the phonological output layer (i.e., the motor plan nodes in <xref ref-type="fig" rid="F1">Figure 1</xref>). (ii) Word comprehension is simulated in the same way concerning target word activation, but here the activation at the concept level layer is evaluated. (iii) Logatome repetition is simulated in the same way as word production, but here target syllables are activated for which no corresponding lemma or concept exists (i.e., phonologically well-formed syllables without word meaning in the target language). If logatomes are not part of the model vocabulary, logatome repetition can be simulated with words or syllables of the target language if the neural connection between the phonological form level and the lemma level on the perception side is interrupted in the model, in order to avoid a coactivation of word forms and concepts.</p>
<p>Simulation results of these tests are counted as errors if no neural activation arises at the concept level (comprehension test) or at the motor plan level (word production and logatome repetition tests), or if the occurring neural activation pattern represents a wrong word. For all types of aphasia, the error rate increases with increasing neural damage to layers or neural connections in at least one of the three tests. Thus, a strong increase in error rate appears for the word production and repetition tests in case of neural damage within the phonological output layer (Broca&#x00027;s aphasia); for word comprehension and repetition in case of neural damage within the phonological input layer (Wernicke&#x00027;s aphasia); for logatome repetition in case of neural damage within the neural pathway between the input and output phonological layers (conduction aphasia); for word production only in case of neural damage within the neural pathway between the lemma and output phonological layers (transcortical motor aphasia); for word comprehension only in case of neural damage within the neural pathway between the input phonological and lemma layers (transcortical sensory aphasia); and for word production and comprehension in case of neural damage within the neural pathways between the lemma buffers and concept layers on both the perception (input) and production (output) pathways (mixed aphasia; see ibid., <xref ref-type="fig" rid="F2">Figure 2</xref> on p. 38).</p>
<p>Dell (<xref ref-type="bibr" rid="B15">1986</xref>) introduces a <italic>weight-decay approach</italic> for inserting dysfunctions into his <italic>spreading activation model</italic>: a network-wide reduction in connection weight (the weight parameter, affecting <italic>links</italic> and thus the connections between layers) combined with a decrease or increase in the activation-decay rate (the decay parameter, affecting the activation of <italic>nodes</italic> within a layer). In later versions of the spreading activation model, the weight parameter is split into separate <italic>semantic (s-weight) and phonological (p-weight) weights</italic> representing different functional locations of dysfunctions (between the semantic and word layers and between the word and phoneme layers), while the activation decay in nodes cannot be changed. The continuity thesis (Schwartz et al., <xref ref-type="bibr" rid="B58">2006</xref>, p. 232) implies that the strength of each of these parameters shifts the model performance from normal to random (i.e., toward incorrect or abnormal behavior), while the ratio of these parameters, i.e., the degree of dominance of one of them, characterizes the type of disorder. An auditory input layer and a further weight parameter (nl-weight) are introduced by Dell et al. (<xref ref-type="bibr" rid="B17">2013</xref>); this parameter describes dysfunctions between the auditory input layer (added in this new model variant) and the phoneme layer and additionally allows the modeling of auditory input disorders. The tasks simulated by spreading activation models were <italic>(picture) naming</italic>, inserting primary or input neural activation at the semantic layer, and <italic>word and nonword repetition</italic>, inserting neural activation at the auditory layer of the model (see <xref ref-type="table" rid="T4">Table 4</xref>). Furthermore, Dell et al.
(<xref ref-type="bibr" rid="B17">2013</xref>) introduce a concept which allows cortical locations of brain lesions to be associated with model dysfunctions by relating model and patient behavioral data. The simulation results are comparable to those generated by the WEAVER model (Roelofs, <xref ref-type="bibr" rid="B56">2014</xref>).</p>
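<p>The division of labor between s-weight and p-weight can be sketched as follows (all numbers invented): lowering the s-weight makes lexical selection fail more often (semantic errors), while lowering the p-weight makes phoneme selection fail more often (phonological errors), so the ratio of the two parameters shapes the error profile.</p>

```python
# Illustrative sketch (all parameters invented): the s-weight scales the
# semantic-to-word step and the p-weight the word-to-phoneme step of a naming
# trial. Lowering one weight biases the error profile toward semantic or
# phonological errors, in line with the continuity thesis mentioned above.

import random

def error_profile(s_weight, p_weight, trials=2000, seed=1):
    """Return (semantic_errors, phonological_errors) for a naming task."""
    rng = random.Random(seed)
    semantic = phonological = 0
    for _ in range(trials):
        # lexical selection fails when noise overcomes the s-weighted
        # advantage of the target lemma over a semantic competitor
        if rng.gauss(0.0, 0.3) > 0.4 * s_weight:
            semantic += 1
        # otherwise phoneme selection may fail analogously (p-weighted)
        elif rng.gauss(0.0, 0.3) > 0.4 * p_weight:
            phonological += 1
    return semantic, phonological

healthy = error_profile(1.0, 1.0)
s_lesion = error_profile(0.2, 1.0)   # mostly semantic errors
p_lesion = error_profile(1.0, 0.2)   # mostly phonological errors
```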
<p>In the LICHTHEIM 2 model (Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref>), brain lesions are defined from a neuroanatomical viewpoint as <italic>white matter damage</italic> and <italic>gray matter damage</italic>. Gray matter damage is modeled as the addition of white noise to the activation pattern of nodes within a specific network layer (damage within a specific <italic>layer</italic>), while white matter damage is modeled as a partial removal of neural links of a neural pathway interconnecting two neighboring neural layers of the network model (damage of a specific <italic>neural pathway</italic>). Both types of damage are applied in combination with increasing severity, leading to an increasing number of speech errors in the simulated naming, word comprehension, and word and logatome repetition tasks.</p>
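<p>The two lesion types can be sketched on a toy pathway and layer (dimensions and weights invented, not taken from LICHTHEIM 2): white matter damage zeroes a random fraction of links, and gray matter damage adds white noise to the receiving layer&#x00027;s activations.</p>

```python
# Sketch with invented dimensions: the two LICHTHEIM 2 lesion types applied
# to one pathway and layer of a toy network. White matter damage removes a
# random fraction of links (weights set to zero); gray matter damage adds
# white noise to the receiving layer's activation pattern.

import random

def lesioned_forward(inputs, weights, noise_sd, removed_fraction, seed=42):
    rng = random.Random(seed)
    # white matter damage: zero out a fraction of the pathway's links
    damaged = [[0.0 if removed_fraction > rng.random() else w for w in row]
               for row in weights]
    # forward pass into the layer
    out = [sum(w * x for w, x in zip(row, inputs)) for row in damaged]
    # gray matter damage: add white noise to the layer activations
    return [a + rng.gauss(0.0, noise_sd) for a in out]

inputs = [1.0, 0.5, -0.5]
weights = [[0.2, 0.4, 0.6], [0.1, 0.3, 0.5]]
intact = lesioned_forward(inputs, weights, noise_sd=0.0, removed_fraction=0.0)
severe = lesioned_forward(inputs, weights, noise_sd=0.0, removed_fraction=1.0)
```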
<p>Because the LICHTHEIM 2 model can model lexical relearning by applying a set of training stimuli in the form of auditory and related motor activation patterns to the auditory and motor layers and by adjusting the synaptic connections of the hidden layers along the lexical route within the temporal lobe, it is capable of simulating post-stroke neural recovery via the ventral route if the dorsal route is damaged (see Stefaniak et al., <xref ref-type="bibr" rid="B61">2020</xref>, p. 47ff). This relearning (readjustment of synaptic link weights) can be interpreted as post-stroke learning or recovery.</p>
<p>The <italic>adaptive neural production models DIVA</italic> and <italic>GODIVA</italic> developed by Guenther (<xref ref-type="bibr" rid="B26">2006</xref>, <xref ref-type="bibr" rid="B27">2016</xref>), Guenther et al. (<xref ref-type="bibr" rid="B28">2006</xref>), and Bohland et al. (<xref ref-type="bibr" rid="B7">2010</xref>) simulate <italic>neural processes of speech learning</italic> (early phases of speech acquisition, i.e., babbling and imitation) and <italic>neural processes of feedback-controlled speech production</italic> (sensorimotor part of the speech production model; feedforward and feedback control). These models comprise <italic>motor planning</italic> (i.e., selection of executable chunks at the phonological level), <italic>motor program selection</italic>, and <italic>motor program execution</italic>. The DIVA/GODIVA model components (modules or subnetworks) are associated with specific cortical as well as subcortical brain regions (Kearney and Guenther, <xref ref-type="bibr" rid="B32">2019</xref>, and see above, chapter &#x0201C;anatomical locations&#x0201D;), and several modules or subnetworks can be identified in the DIVA/GODIVA models which cause symptoms of dysarthria or apraxia of speech (Kearney and Guenther, <xref ref-type="bibr" rid="B32">2019</xref>, pp. 11ff; Miller and Guenther, <xref ref-type="bibr" rid="B48">2021</xref>, pp. 432ff, and see above, chapter &#x0201C;disorders&#x0201D;). A concrete simulation study using the GODIVA model has been performed for simulating neurogenic stuttering (Civier et al., <xref ref-type="bibr" rid="B9">2013</xref>, <xref ref-type="table" rid="T3">Table 3</xref>). 
Model parameters characterizing neural dysfunctions were (i) the <italic>number of defective cells</italic> within distinct cortical or subcortical modules, (ii) the <italic>number of defective neural connection weights</italic> between cells of distinct modules of the model, and (iii) a <italic>change in dopamine level</italic> within the striatal component of the modeled basal ganglia module. A typical symptom which can be simulated using this model in a <italic>word production task</italic> is a slower initiation of the execution of a motor program (ibid., p. 272), leading to prolongations of preceding syllables as well as to silent blocks.</p>
<p>The <italic>spiking neuron model</italic> developed by Kr&#x000F6;ger et al. (<xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>), also called the <italic>ACT model</italic> (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B41">2012</xref>), is capable of simulating speech errors as produced by normally speaking subjects (<italic>picture naming task</italic> without and with auditory distractor signals, Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>, see also <xref ref-type="table" rid="T4">Table 4</xref>); speech errors produced by speakers suffering from different forms of aphasia (<italic>word comprehension tasks</italic> and <italic>word and nonword production tasks</italic>, Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>, p. 13f) and by subjects suffering from developmental (neurogenic) speech deficits concerning lexical access (<italic>picture naming tasks</italic> with auditorily presented phonological or semantic cues, Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>, p. 14f); and symptoms of neurogenic stuttering which result from changes in dopamine level within the basal ganglia (<italic>syllable repetition tasks</italic>, Senft et al., <xref ref-type="bibr" rid="B59">2016</xref>).</p>
<p>For simulating different forms of aphasia, the same types of simulations are used for ACT as described above for the WEAVER model, i.e., simulations of word production, word comprehension, and logatome repetition, and the same buffers and neural pathways are disturbed. In the ACT model, however, we are able to directly deactivate a specific number of neurons in the phonological form, lemma, and concept buffers and, in the case of the neural pathways, to directly deactivate a specific number of neurons within the associative memories, because ACT directly models neurons within buffers and within associative memories, while in WEAVER the neuron modeling is more indirect, using nodes and links (for the definition of nodes, links, neurons, neuron buffers, and associative memories see Kr&#x000F6;ger and Bekolay, <xref ref-type="bibr" rid="B37">2019</xref>, p. 133ff).</p>
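The computational difference can be made concrete: if a buffer is treated as a vector of neuron activities and an associative memory as a weight matrix, then "deactivating neurons" is a direct operation on those arrays. The following toy Python sketch shows a concept-to-lemma association with direct neuron ablation (the word set, the orthogonal codes, and the winner-take-all read-out are invented for illustration and are not the published ACT implementation):

```python
import numpy as np

# Orthogonal toy codes for three words at the concept and the lemma level.
words = {"ball": 0, "dog": 1, "cup": 2}

def code(i, n=6):
    v = np.zeros(n)
    v[2 * i] = v[2 * i + 1] = 1.0   # two active neurons per word
    return v

concepts = {w: code(i) for w, i in words.items()}
lemmas = {w: code(i) for w, i in words.items()}

# Hebbian-style associative memory mapping concept states to lemma states.
W = sum(np.outer(lemmas[w], concepts[w]) for w in words)

def recall(word, ablated_rows=()):
    """Concept-to-lemma mapping; `ablated_rows` directly deactivates neurons
    of the associative memory's output, as in an ACT-style simulation."""
    out = W @ concepts[word]
    out[list(ablated_rows)] = 0.0                  # deactivated neurons
    scores = {w: lemmas[w] @ out for w in lemmas}  # match against stored codes
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None      # None: no word produced
```

With no ablation each word is recalled correctly; ablating the output neurons that carry a word's code produces an omission, a direct analog of the "no word produced" errors described above.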
<p><italic>Changes in dopamine level</italic> were also induced in the GODIVA model (Civier et al., <xref ref-type="bibr" rid="B9">2013</xref>), which resulted in typical symptoms of neurogenic stuttering, i.e., silent blocks and prolongation of syllables. In the ACT model used by Senft et al. (<xref ref-type="bibr" rid="B59">2016</xref>, <xref ref-type="bibr" rid="B60">2018</xref>), changes in dopamine level (here, a reduction) led to stuttering symptoms such as omission of syllables, errors in syllable ordering, and repetitions of the same syllable. The task used here was <italic>repetition of a pre-learned sequence of syllables</italic> like [ba-da-ga] (this kind of task is also called <italic>diadochokinesis</italic>).</p>
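One way to picture how a lowered dopamine level can produce syllable omissions is a gating account: a syllable is released for execution only when its build-up activation, scaled by the dopamine level, crosses a release threshold within a limited time window. The following Python sketch implements that idea under invented assumptions (the build-up rate, threshold, and time window are arbitrary; this is not the Senft et al. implementation):

```python
def repeat_sequence(syllables, dopamine=1.0, threshold=1.0, steps_per_syl=3):
    """Toy basal-ganglia gate: each syllable's activation builds up over
    `steps_per_syl` time steps, scaled by the dopamine level; the syllable is
    produced only if the threshold is crossed in time."""
    produced = []
    for syl in syllables:
        activation = 0.0
        for _ in range(steps_per_syl):
            activation += 0.5 * dopamine      # activation build-up per step
            if activation >= threshold:       # gate opens: syllable released
                produced.append(syl)
                break
        # if the gate never opened within the window, the syllable is omitted
    return produced
```

At a normal dopamine level the full [ba-da-ga] sequence is produced, while a sufficiently reduced level leads to omitted syllables, mirroring one of the symptom classes described above.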
<p>Simulations of a <italic>picture naming task</italic> performed with the normal model (healthy subject in the ACT model, Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>) lead to speech errors (a wrong word is produced or no word is produced) if the production process is disturbed by perceptual events such as <italic>distractor words</italic> presented during the production of the target word. Distractor words are most effective if they are semantically and/or phonologically similar to the target word presented by the picture. An interesting result of this simulation study is that even the normal picture naming task, executed in the ACT model without inserting any neural dysfunction, produces a low rate of speech errors, as is the case for normal speakers in normal conversation or reading scenarios. This error rate increases dramatically when distractor words are included.</p>
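The similarity effect can be caricatured as a competition between the picture-driven target word and the auditorily presented distractor word, where similarity to the target raises the distractor's activation. The Python sketch below is purely illustrative (the activation values, the similarity weighting, and the production threshold are invented, not fitted to the cited study):

```python
def name_picture(target_act=1.0, distractor_sim=0.0, noise=0.0):
    """Toy lexical competition: the more similar the distractor is to the
    target (0..1), the higher its activation; the stronger candidate wins,
    and nothing is produced if no candidate reaches threshold."""
    distractor_act = 0.4 + 0.5 * distractor_sim + noise
    if max(target_act, distractor_act) < 0.5:
        return None                   # no word produced (omission)
    return "target" if target_act >= distractor_act else "distractor"
```

Without a distractor the target wins; a highly similar distractor plus a little noise can win the competition, i.e., a wrong word is produced.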
<p>Simulation of <italic>picture naming</italic> after inserting model dysfunctions concerning neurons (<italic>rate of ablated neurons within a model buffer</italic>) and concerning neural connections (<italic>rate of ablated neurons in an association memory</italic>, see Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>) for simulating subtypes of aphasia leads to a reduction of correct word productions in <italic>picture naming</italic> tasks (<xref ref-type="table" rid="T3">Table 3</xref>). The errors appearing here are productions of wrong words or of no word at all. The same holds for the <italic>non-word repetition</italic> task: the rate of correctly repeated syllable sequences decreases, and errors such as production of a wrong syllable sequence or no syllable production at all appear. Comparable results appear for the <italic>word comprehension</italic> task: the comprehension rate decreases and, besides correct word meanings, more and more wrong meanings become activated at the semantic level of the neural model, or no item is activated at that level, as the severity of the neural dysfunction (rate of ablated neurons) is increased in the model (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B38">2022</xref>).</p>
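The relation between severity (rate of ablated neurons) and correct-production rate can be estimated by Monte-Carlo simulation, as a toy stand-in for the full model runs. In the sketch below a word counts as correctly produced if more than half of its code neurons survive ablation; the code size, the survival criterion, and the trial count are assumptions for illustration only:

```python
import numpy as np

def correct_rate(ablation_rate, n_code=10, trials=2000, seed=0):
    """Monte-Carlo estimate of the correct-production rate when each of a
    word's `n_code` code neurons is independently ablated with probability
    `ablation_rate` (toy severity curve, not the published model)."""
    rng = np.random.default_rng(seed)
    survives = rng.random((trials, n_code)) >= ablation_rate
    # a trial counts as "correct word produced" if a majority of code
    # neurons survived the ablation
    return float((survives.sum(axis=1) > n_code // 2).mean())
```

Sweeping `ablation_rate` from 0 to 1 yields a monotonically decreasing curve, the kind of severity-to-symptom relation the simulations above report.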
<p>Using the ACT model, Stille et al. (<xref ref-type="bibr" rid="B63">2020</xref>) were able to show that phonological or semantic <italic>cues</italic>, given by the environment (by a communication partner), can increase the rate of correct word productions in a <italic>picture naming</italic> task if a target word is not produced in a first trial (i.e., the model is not able to activate the correct word at the semantic or phonological level). Here, neural dysfunctions were inserted at the semantic level of the production pathway in order to model lexical access problems.</p>
<p>In general, the simulations performed using the ACT approach (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>; Stille et al., <xref ref-type="bibr" rid="B63">2020</xref>) suggest that the &#x0201C;punctual&#x0201D; model dysfunctions implemented for modeling different types of speech disorders lead to a variety of speech symptoms, such as (i) phonological distortions if a wrong but phonologically similar syllable or word is activated, (ii) a drop-out of word production or word comprehension if the activation of an item at the phonological, lemma, or concept level is too low, and (iii) wrong word production or wrong word comprehension if further (non-similar) items are co-activated at the phonological, lemma, or concept level.</p>
</sec>
<sec id="s6">
<title>6. Neural models for medical research: modeling of screening scenarios</title>
<p>Besides a detailed neurobiologically inspired architecture, a neural model of speech processing needs to be able to simulate different <italic>communication scenarios</italic> as they typically appear in <italic>medical screenings</italic> (the modeled speech tasks are already mentioned in chapter &#x0201C;Simulation of symptoms&#x0201D; in this paper). Thus, the neural model needs to be able to react to an auditory or visual input (stimulus), and the model should include initiation processes for activating the production process at the cognitive-linguistic level (semantic, lemma, and phonological level) by these stimuli, for further activating motor plans, motor programs, and motor execution, and for activating the perception and comprehension process. In addition, neural models should include halt or correction procedures during an ongoing production process (see the generation of an error signal by comparing intended and produced sensorimotor signals via the sensorimotor feedback loop; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>). The model features mentioned above are available in the DIVA/GODIVA model (Guenther et al., <xref ref-type="bibr" rid="B28">2006</xref>; Bohland et al., <xref ref-type="bibr" rid="B7">2010</xref>), in the ACT model (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B41">2012</xref>, <xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>), and in part in the WEAVER model of Roelofs (<xref ref-type="bibr" rid="B54">1992</xref>, <xref ref-type="bibr" rid="B55">1997</xref>, <xref ref-type="bibr" rid="B56">2014</xref>) as well as in the spreading activation model of Dell (<xref ref-type="bibr" rid="B15">1986</xref>) and Dell et al. (<xref ref-type="bibr" rid="B16">2007</xref>, <xref ref-type="bibr" rid="B17">2013</xref>).</p>
<p>The simulation of communication scenarios like those appearing in medical screenings between test supervisor and patient requires (i) an always active perceptive input channel, even during ongoing production processes, (ii) the modeling of temporal aspects of forwarding neural activation patterns along the perception and production pathways, e.g., the modeling of the concept to lemma to phonological form conversion, (iii) the modeling of cortical and sensorimotor action control (modeling of the cortico-cortical feedback loop including basal ganglia and thalamus) for initiating the specific cognitive and/or motor action that is dominant at specific points in time, and (iv) the modeling of motor feedback (modeling of the cortico-cortical feedback loop including cerebellum and thalamus) in order to simulate online control for all motor actions.</p>
<p>Especially the action control component is important for modeling different communication scenarios (i.e., different tasks as part of a medical screening), because different scenarios require the activation of different processing paths within the neural model. In the case of <italic>picture naming</italic>, an external visual stimulus (e.g., a picture displaying a specific object) is initially processed by the visual object recognition module, leading to a neural activation at the entry of the cognitive-linguistic module which represents the visually activated item at the semantic level of the perception pathway (<xref ref-type="fig" rid="F1">Figure 1A</xref>). Because the patient (the model) is instructed to name the object in this task, this semantic activation is directly forwarded toward the production pathway (skipping further cognitive processing), which results in a cascade of neural transformations from the semantic state via the lemma state to the phonological form state, and further from the motor plan via motor program activation toward execution of the motor program by activating the neuromuscular system. Subsequently, online feedback procedures activate the auditory and somatosensory states of the produced speech item (sensory feedback states), which can be compared with the auditory and somatosensory expectations (learned and stored target states) activated earlier within the production process; this may lead to an online repair of a poorly articulated syllable or to a halt of articulation in case of a severe articulation error. 
In the case of normal speech, the sensory expectations (target states stored in the mental syllabary) mostly match the feedback signals produced during articulation (currently produced sensory feedback states); no error signal is generated, and thus the neural processing for generating an articulatory correction or a halt signal, which is realized by the action control loop (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>), need not be activated.</p>
<p><italic>Word repetition</italic> tasks start at the auditory input level, i.e., with the activation of an auditory (input) state by a signal (stimulus) produced by the communication partner (external speaker, <xref ref-type="fig" rid="F1">Figure 1A</xref>; test supervisor in a medical screening scenario). The auditory state activated at the auditory input buffer activates an auditory state of the mental syllabary (syllable level) or can be analyzed into smaller chunks, leading to an activation of speech sound candidates (e.g., Guenther et al., <xref ref-type="bibr" rid="B28">2006</xref>). In both cases this leads to an activation of phonological (input) states and subsequently to an activation of lemma candidates and of a semantic state within the perception/comprehension pathway (<xref ref-type="fig" rid="F1">Figure 1A</xref>). In the word repetition task, the activated semantic state (concept buffer, see <xref ref-type="fig" rid="F1">Figure 1A</xref>) directly activates the production pathway; a word candidate is then selected within the production pathway, processed, and its motor program executed in the same way as described above for the picture naming task (e.g., Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>).</p>
<p>In the case of <italic>non-word repetition (logatome repetition)</italic>, i.e., repetition of a syllable sequence with no meaning in the target language (mother tongue or learned language), the neural activation at the phonological level does not lead to an activation of a lemma and a concept stored in the mental lexicon. In this case, the shortcut between the phonological input level and the phonological output level is activated (dashed black line in <xref ref-type="fig" rid="F1">Figure 1A</xref>; and see Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>), which leads to a direct activation of the phonological form within the production pathway of the model (phonological output buffer, <xref ref-type="fig" rid="F1">Figure 1A</xref>) and to further activations of motor plans and motor programs, and subsequently to motor execution of the activated logatome or syllable sequence.</p>
<p><italic>Word comprehension tasks</italic> activate the same neural pathway as already mentioned for the word repetition task, but here the processing ends at the level of concept activation (activation of a meaning, e.g., Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>). In a medical screening task for word comprehension, the target word is presented acoustically by the test supervisor, and the patient is asked to point to one specific picture within a list of pictures, so that the test supervisor can see the word candidate selected by the patient. This pointing procedure, although part of the scenario, is not included in most models, because neural activations can be accessed directly at all levels within the model (here, at the concept level), which allows a direct monitoring of the concept selected by the patient (model) even without an explicit motor reaction.</p>
<p>In all of the tasks (communication scenarios) described above, the patient is already prepared, or <italic>primed</italic>, to give a specific motor reaction once an input stimulus is seen or heard, i.e., to speak by using the speech articulation apparatus or to gesture by using the arm-hand apparatus. In medical screenings, the action control loop is thus already prepared for activating a specific sequence of cognitive and motor actions as a consequence of this priming procedure, i.e., of preparing and instructing the patient or model to execute a specific task. Therefore, even if a neural model does not include an action control loop (i.e., no model of basal ganglia and thalamus), the model can be shaped to execute a specific task by activating a specific input buffer (e.g., the auditory or the visual input buffer), which always leads to a chain of co-activations of further buffers performing the neural processing required for that task. In this way, tasks can also be performed in the spreading activation model of Dell (<xref ref-type="bibr" rid="B15">1986</xref>) and Dell et al. (<xref ref-type="bibr" rid="B16">2007</xref>, <xref ref-type="bibr" rid="B17">2013</xref>) and in the WEAVER model of Roelofs (<xref ref-type="bibr" rid="B54">1992</xref>, <xref ref-type="bibr" rid="B55">1997</xref>, <xref ref-type="bibr" rid="B56">2014</xref>) without an explicit modeling of action control.</p>
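The task-dependent routing described in the preceding paragraphs (picture naming, word repetition, non-word repetition, word comprehension) can be summarized as a small dispatch sketch in Python. This is a schematic control-flow illustration only: the lexicon is reduced to a concept-to-phonological-form dictionary, and all buffer dynamics are collapsed into lookups (the function and key names are invented):

```python
def run_task(task, stimulus, lexicon):
    """Route a stimulus through the model pathways as the instructed task
    ('priming') requires. `lexicon` maps concepts to phonological forms."""
    if task == "picture_naming":
        concept = stimulus                 # visual object recognition
        return lexicon.get(concept)        # concept -> lemma -> phon. form -> motor
    if task == "word_repetition":
        # comprehension pathway: phonological input activates a concept ...
        concept = next((c for c, ph in lexicon.items() if ph == stimulus), None)
        return lexicon.get(concept)        # ... which re-enters the production pathway
    if task == "nonword_repetition":
        return stimulus                    # phonological input -> output shortcut
    if task == "word_comprehension":
        # processing ends at the concept level; no motor output required
        return next((c for c, ph in lexicon.items() if ph == stimulus), None)
    raise ValueError(f"unknown task: {task}")
```

Each branch corresponds to one of the screening scenarios above; the non-word branch bypasses the lexicon entirely, matching the dashed shortcut in Figure 1A.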
<p>A relatively complex medical screening task based on a complex communication scenario has been simulated by Stille et al. (<xref ref-type="bibr" rid="B63">2020</xref>) using the ACT model. Here, as part of a <italic>picture naming task</italic>&#x02014;designed for quantifying lexical access mechanisms&#x02014;<italic>semantic and phonological cues</italic> were provided auditorily (by the test supervisor) only in those cases where a word was not directly produced by the patient (by the model) within a time interval of a few seconds. In these cases, a second and a third word production trial is started by providing acoustic phonological or semantic cues in parallel to the still available visual information. Thus, in this task, the action control loop first allows word production directly from visual input (a picture representing the target word, for example a picture of a ball) and later allows word production based on visual input supplemented by auditory input (the test supervisor gives a phonological cue like &#x0201C;the word starts with a [b]&#x0201D; or a semantic cue like &#x0201C;the object can be thrown or kicked&#x0201D;). Stille et al. (<xref ref-type="bibr" rid="B63">2020</xref>) showed that, when modeling a mental lexicon as acquired by children suffering from lexical access problems, lexical dysfunctions can be divided into &#x0201C;within level&#x0201D; and &#x0201C;between level&#x0201D; dysfunctions: neural dysfunctions appearing within the neural buffers which store and order concept, lemma, or phonological form information with respect to semantic, grammatical, or phonological information, and neural dysfunctions appearing in the neural pathways between neural buffers which forward information from concept to lemma or from lemma to phonological form buffers, i.e., in buffers representing the association of concepts with their lemmata as well as of lemmata with their phonological forms. 
Based on the simulations, Stille et al. (<xref ref-type="bibr" rid="B63">2020</xref>) found performance differences for lexical selection and activation processes depending on whether the neural dysfunctions are located at the semantic and/or at the phonological level of the mental lexicon.</p>
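The three-trial cueing protocol of this screening scenario can be sketched as a small control loop in Python: a first picture-only trial, followed by up to two retries in which a phonological or semantic cue is added to the still-visible picture. Here `try_produce` is a hypothetical stand-in for one full production attempt of the model; the protocol structure follows the description above, but the interface is invented:

```python
def naming_with_cues(try_produce, cues):
    """Three-trial cueing protocol: trial 1 uses the picture only; trials 2
    and 3 add an auditory cue (phonological or semantic) if no word was
    produced so far. Returns the produced word or None."""
    word = try_produce(cue=None)        # trial 1: visual input only
    for cue in cues[:2]:                # trials 2 and 3: picture + auditory cue
        if word is not None:
            break
        word = try_produce(cue=cue)
    return word
```

Counting how often a word is recovered only in the cued trials gives exactly the kind of quantification of lexical access mechanisms this screening task was designed for.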
</sec>
<sec id="s7">
<title>7. Discussion</title>
<p>Neural modeling of speech processing allows unfolding the relations between the <italic>location, type, and severity of a neural dysfunction</italic> inserted into a model and the <italic>type and frequency of arising speech symptoms</italic> when simulations of specific communication scenarios (speech tasks) are performed using this model. In such a research endeavor, the location of a dysfunction within a neural model is defined in a functional and not directly in an anatomical way. While a <italic>functional subnetwork or module</italic> of a neural model can be associated with an <italic>anatomic location</italic> in a direct way (see Roelofs, <xref ref-type="bibr" rid="B56">2014</xref> in the case of the WEAVER model; Guenther, <xref ref-type="bibr" rid="B26">2006</xref>; Guenther et al., <xref ref-type="bibr" rid="B28">2006</xref>; Kearney and Guenther, <xref ref-type="bibr" rid="B32">2019</xref> in the case of the DIVA/GODIVA models; and Ueno et al., <xref ref-type="bibr" rid="B65">2011</xref> in the case of the LICHTHEIM 2 model), it is not always simple to identify a specific neural dysfunction on the basis of disruptions or damage appearing in a specific anatomical location within the central nervous system of a patient. Thus, it is not easy to associate <italic>brain lesions, disruptions, or abnormalities</italic> appearing in a specific anatomic location of the central nervous system (possibly identified by neuroimaging methods applied to patients) with <italic>functional deficits</italic> in speech and language processing. For example, a brain lesion arising in the mid part of the temporal lobe may result in dysfunctions of different lexical submodules or buffers, for example the lemma and concept buffers, on the production as well as on the perception pathway. 
In the case of a neurodegenerative abnormality of the basal ganglia, we need to know in detail how this defect affects the cortico-striatal association network in order to differentiate dysfunctions (functional deficits) with respect to connections of the action control loop with the sensorimotor part or with the linguistic-cognitive part of the speech processing network. Thus, inserting defined neural dysfunctions into computer-implemented quantitative neural models and examining the behavioral results generated (i.e., simulated) in speech tasks allows refining the <italic>definition of a speech or language disorder</italic> with respect to its etiology. It should be kept in mind that the LICHTHEIM 2 model plays an intermediate role here, because this model is not defined primarily in the neurofunctional domain (only the input and output layers are defined in a functional way) but refers to the neuroanatomical domain: the model separates different intermediate (or hidden) neural layers by specifying their neuroanatomical location, but without specifying these hidden layers directly in a functional sense (e.g., as representing phonological, lemma, or concept forms).</p>
<p>Moreover, it should be kept in mind that a model could generate plausible simulation results even if the underlying neural mechanisms implemented in that model are not (strictly) similar to those appearing in humans. In order to keep the similarity of natural and simulated neural processes high, however, all neural models mentioned in this paper incorporate basic as well as advanced neurofunctional knowledge gained from natural data, as available in the contemporary literature.</p>
<p>Furthermore, models are helpful for defining functionality and for associating specific functionality appearing in specific brain regions with specific submodules of neural models, but it should be kept in mind that the hypothetical association of brain regions with submodules of neural models does not automatically strengthen the neurobiological reality of a model.</p>
<p>Nevertheless, the potential applications of neural models in medical research are manifold. (i) Neural models of speech processing allow simulating medical screenings (tasks which are used for the diagnosis of speech and language disorders) by simulating the corresponding communication scenarios between patient and test supervisor. If the location, type, and severity of an inserted dysfunction are varied in the model, the simulation results could uncover the <italic>sensitivity of a screening task</italic> with respect to a specific neural dysfunction. Thus, simulations of screening tasks allow estimating the effectiveness of a screening task in uncovering a specific speech and language disorder. (ii) Because speech screening tasks usually are shaped or configured manually, based on the experience of leading experts in the field of diagnosis and therapy of speech and language disorders, this information concerning the effectiveness or sensitivity of a specific screening task could help to <italic>optimize screenings</italic> by varying all available <italic>task scenario parameters</italic>, like the type, number, or complexity of test items, the number or repetition of trials, etc. (iii) Because of potential learning and familiarity effects and for ethical reasons, a screening task can be undertaken only once with a patient, while model simulations can be repeated as often as necessary using the same computer-implemented model. Optimizing a speech screening on patients alone would require a high number of patients, all suffering from the same type and severity of a speech or language disorder, to generate meaningful results; such a research endeavor is difficult to realize because of the need for a high number of well-diagnosed patients, while neural models can fulfill these demands more easily. 
A model is capable of repeating a screening task as often as needed, and the location, type, and severity of the (inserted) neural dysfunction are clearly defined. This could result in a high reliability of the results concerning the association of location, type, and severity of a neural dysfunction with the type and number of relevant speech symptoms. Moreover, no <italic>ethical conflicts</italic> appear when using computer models. (iv) All aspects discussed above are also applicable to the development of <italic>therapy scenarios</italic>. Because learning effects can be simulated in neural models as well, it is possible to quantify these learning effects and thus to quantify the efficiency of a therapy scenario for different types and severity levels of a specific speech and language disorder. Moreover, it is possible to vary all parameters of a therapy scenario (e.g., the length of intervention, the number and type of speech items trained in the therapy scenario, different designs concerning the increase in complexity of test items during a therapy, etc.) in order to find an optimally shaped treatment procedure (therapy scenario). It needs to be stated here, however, that all these ideas for simulation experiments have not been realized thus far. This shortcoming is mainly due to the current state of the art in neural modeling: most models are currently used mainly for simulating typical behavioral effects, like the production of striking symptoms, in order to exemplify the quality of modeling already reached today; further model development and further simulations are needed to simulate more complex screening or therapy scenarios and to deliver statistically significant results which allow modifying these scenarios toward more efficiency.</p>
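Point (i) above, estimating the sensitivity of a screening task by repeated simulation, can be sketched as a simple procedure: run the screening's item list at each severity level of an inserted dysfunction and report the smallest severity at which the error rate exceeds a detection threshold. The Python sketch below is an illustrative procedure only; `model` stands in for a full simulated patient, and the threshold value is an assumption:

```python
def screening_sensitivity(model, severities, items, threshold=0.2):
    """Return the smallest dysfunction severity at which the screening's
    error rate exceeds `threshold`, i.e., the severity the screening can
    just detect; None if the screening never detects the dysfunction.
    `model(item, severity)` returns the simulated patient's response."""
    for severity in sorted(severities):
        errors = sum(model(item, severity) != item for item in items)
        if errors / len(items) > threshold:
            return severity
    return None
```

Comparing this detection point across different screening tasks (different `items` lists) would quantify which task is most sensitive to a given dysfunction, which is exactly the optimization use case described above.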
<p>The <italic>detailedness of neural models</italic> of <italic>speech production</italic> is already at a relatively high level for the cognitive-linguistic model part (Roelofs, <xref ref-type="bibr" rid="B56">2014</xref>; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>) as well as for the sensorimotor model part (Guenther, <xref ref-type="bibr" rid="B26">2006</xref>; Guenther et al., <xref ref-type="bibr" rid="B28">2006</xref>; Bohland et al., <xref ref-type="bibr" rid="B7">2010</xref>; Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B38">2022</xref>). This allows a <italic>detailed modeling of speech and language disorders</italic> like aphasia with respect to all lexical aspects (production and comprehension deficits) as well as of apraxia of speech (production deficits with respect to planning and programming of syllables and syllable sequences). While the modeling of vocal tract geometries and vocal fold dynamics, including all aerodynamic and acoustic relations, is at a high level as well (for a review see Kr&#x000F6;ger, <xref ref-type="bibr" rid="B35">2022</xref>), there are still <italic>deficits in modeling the neuromuscular system</italic>, and in particular there is no overall consensus concerning a control concept for neural activation at the level of the neuromuscular system directly controlling the speech articulators (ibid.). This limits current modeling endeavors, for example concerning the simulation of articulatory consequences of dysarthric speech disorders, especially in the case of flaccid dysarthria and of spastic dysarthria (abnormal muscle tone). But due to the existence of very detailed models of the basal ganglia and of the cortico-cortical control loop including basal ganglia and thalamus (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B38">2022</xref>), the modeling of other types of dysarthria is already possible, even if it has not been exemplified yet. 
In the case of neurogenic stuttering, the detailed implementation of this cortico-cortical action control loop has already given plausible simulation results for symptoms appearing in stuttering (see Civier et al., <xref ref-type="bibr" rid="B9">2013</xref>).</p>
<p>Unfortunately, the neurobiologically based quantitative modeling of <italic>speech perception</italic> and <italic>speech comprehension</italic> is not as developed as that of speech production. Specifically, there exist no comprehensive models which include the important concept of brain waves (gamma and theta waves, see Hickok and Poeppel, <xref ref-type="bibr" rid="B29">2007</xref>; Ghitza, <xref ref-type="bibr" rid="B23">2011</xref>; Ghitza et al., <xref ref-type="bibr" rid="B24">2013</xref>), which is needed in order to model speech perception and speech comprehension realistically in a neurobiologically grounded way.</p>
<p>Because the <italic>microscopic functional level of natural neural networks</italic>&#x02014;i.e., the cellular level of neurons and their functioning within subnetworks as well as within the whole network of speech processing&#x02014;is not, or not easily, accessible by imaging or by functional electro-analytical methods up to now (e.g., Batista-Garc&#x000ED;a-Ram&#x000F3; and Fern&#x000E1;ndez-Verdecia, <xref ref-type="bibr" rid="B5">2018</xref>), <italic>neurobiologically inspired and computer-implemented quantitative neural models</italic> are currently an important and advantageous research tool for obtaining a detailed and quantitative impression of the <italic>neurobiological functioning of speech processing</italic> (production and perception).</p>
<p>Finally, it needs to be stated that even if it seems attractive to develop computational models which may be able to mimic the functionality of the neural system of speech production and speech perception, including comprehension, or of other human capabilities, it should be kept in mind that all these models up to now are of limited benefit from the viewpoint of answering fundamental research questions. The idea of understanding speech and language disorders from a functional point of view is not new (e.g., Lichtheim, <xref ref-type="bibr" rid="B45">1885</xref>), and much of the information given in chapter 4 of this paper concerning the functional specification of different types of speech and language disorders may also be deducible from box-and-arrow models (e.g., Datteri and Laudisa, <xref ref-type="bibr" rid="B12">2014</xref>). But the chance to derive quantitative results (for example, concerning severity levels of a disorder, the sensitivity of a screening in detecting a specific disorder, or the effectiveness of a therapy method in strengthening a specific speaking behavior of a model patient) can be seized by simulating specific tasks, and this opens a path toward increasing the efficiency of diagnosis and therapy tools. As already mentioned above, this increase in efficiency can hardly be reached by other methods: a huge number of patients would have to participate in such a research endeavor, these patients would have to pass a number of screenings or therapy modules without a clear therapeutic benefit for them, and, moreover, all patients recruited for this kind of study would need to be excellently diagnosed, should suffer from a specific and isolated type of disorder, and the exact degree of severity of the disorder would need to be known as well, in order to get meaningful results.</p>
</sec>
<sec sec-type="author-contributions" id="s8">
<title>Author contributions</title>
<p>The author confirms being the sole contributor of this work and has approved it for publication.</p>
</sec>
</body>
<back>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Allison</surname> <given-names>K. M.</given-names></name> <name><surname>Cordella</surname> <given-names>C.</given-names></name> <name><surname>Iuzzini-Seigel</surname> <given-names>J.</given-names></name> <name><surname>Green</surname> <given-names>J. R.</given-names></name></person-group> (<year>2020</year>). <article-title>Differential diagnosis of apraxia of speech in children and adults: a scoping review</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>63</volume>, <fpage>2952</fpage>&#x02013;<lpage>2994</lpage>. <pub-id pub-id-type="doi">10.1044/2020_JSLHR-20-00061</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Araki</surname> <given-names>K.</given-names></name> <name><surname>Hirano</surname> <given-names>Y.</given-names></name> <name><surname>Kozono</surname> <given-names>M.</given-names></name> <name><surname>Fujitani</surname> <given-names>J.</given-names></name> <name><surname>Shimizu</surname> <given-names>E.</given-names></name></person-group> (<year>2021</year>). <article-title>The screening test for aphasia and dysarthria (STAD) for patients with neurological communicative disorders: a large-scale, multicenter validation study in Japan</article-title>. <source>Folia Phoniatr. Logop.</source> <volume>74</volume>, <fpage>195</fpage>&#x02013;<lpage>208</lpage>. <pub-id pub-id-type="doi">10.1159/000519381</pub-id><pub-id pub-id-type="pmid">34510047</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ballard</surname> <given-names>K. J.</given-names></name> <name><surname>Azizi</surname> <given-names>L.</given-names></name> <name><surname>Duffy</surname> <given-names>J. R.</given-names></name> <name><surname>McNeil</surname> <given-names>M. R.</given-names></name> <name><surname>Halaki</surname> <given-names>M.</given-names></name> <name><surname>O&#x00027;Dwyer</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>A predictive model for diagnosing stroke-related apraxia of speech</article-title>. <source>Neuropsychologia</source> <volume>81</volume>, <fpage>129</fpage>&#x02013;<lpage>139</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuropsychologia.2015.12.010</pub-id><pub-id pub-id-type="pmid">26707715</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ballard</surname> <given-names>K. J.</given-names></name> <name><surname>Granier</surname> <given-names>J. P.</given-names></name> <name><surname>Robin</surname> <given-names>D. A.</given-names></name></person-group> (<year>2000</year>). <article-title>Understanding the nature of apraxia of speech: theory, analysis, and treatment</article-title>. <source>Aphasiology</source> <volume>14</volume>, <fpage>969</fpage>&#x02013;<lpage>995</lpage>. <pub-id pub-id-type="doi">10.1080/02687030050156575</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Batista-Garc&#x000ED;a-Ram&#x000F3;</surname> <given-names>K.</given-names></name> <name><surname>Fern&#x000E1;ndez-Verdecia</surname> <given-names>C. I.</given-names></name></person-group> (<year>2018</year>). <article-title>What we know about the brain structure-function relationship</article-title>. <source>Behav. Sci.</source> <volume>8</volume>, <fpage>39</fpage>. <pub-id pub-id-type="doi">10.3390/bs8040039</pub-id><pub-id pub-id-type="pmid">29670045</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Biniek</surname> <given-names>R.</given-names></name> <name><surname>Huber</surname> <given-names>W.</given-names></name> <name><surname>Glindemann</surname> <given-names>R.</given-names></name> <name><surname>Willmes</surname> <given-names>K.</given-names></name> <name><surname>Klumm</surname> <given-names>H.</given-names></name></person-group> (<year>1992</year>). <article-title>[The Aachen aphasia bedside test&#x02013;criteria for validity of psychologic tests] Der Aachener Aphasie-Bedside-Test&#x02013;Testpsychologische G&#x000FC;tekriterien</article-title>. <source>Nervenarzt</source> <volume>63</volume>, <fpage>473</fpage>&#x02013;<lpage>479</lpage>.<pub-id pub-id-type="pmid">1381813</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bohland</surname> <given-names>J. W.</given-names></name> <name><surname>Bullock</surname> <given-names>D.</given-names></name> <name><surname>Guenther</surname> <given-names>F. H.</given-names></name></person-group> (<year>2010</year>). <article-title>Neural representations and mechanisms for the performance of simple speech sequences</article-title>. <source>J. Cogn. Neurosci</source>. <volume>22</volume>, <fpage>1504</fpage>&#x02013;<lpage>1529</lpage>. <pub-id pub-id-type="doi">10.1162/jocn.2009.21306</pub-id><pub-id pub-id-type="pmid">19583476</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>S. E.</given-names></name> <name><surname>Guenther</surname> <given-names>F. H.</given-names></name></person-group> (<year>2020</year>). <article-title>Involvement of the cortico-basal ganglia-thalamocortical loop in developmental stuttering</article-title>. <source>Front. Psychol.</source> <volume>10</volume>, <fpage>3088</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2019.03088</pub-id><pub-id pub-id-type="pmid">32047456</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Civier</surname> <given-names>O.</given-names></name> <name><surname>Bullock</surname> <given-names>D.</given-names></name> <name><surname>Max</surname> <given-names>L.</given-names></name> <name><surname>Guenther</surname> <given-names>F. H.</given-names></name></person-group> (<year>2013</year>). <article-title>Computational modeling of stuttering caused by impairments in a basal ganglia thalamo-cortical circuit involved in syllable selection and initiation</article-title>. <source>Brain Lang.</source> <volume>126</volume>, <fpage>263</fpage>&#x02013;<lpage>278</lpage>. <pub-id pub-id-type="doi">10.1016/j.bandl.2013.05.016</pub-id><pub-id pub-id-type="pmid">23872286</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crary</surname> <given-names>M. A.</given-names></name> <name><surname>Haak</surname> <given-names>N. J.</given-names></name> <name><surname>Malinsky</surname> <given-names>A. E.</given-names></name></person-group> (<year>1989</year>). <article-title>Preliminary psychometric evaluation of an acute aphasia screening protocol</article-title>. <source>Aphasiology</source> <volume>3</volume>, <fpage>611</fpage>&#x02013;<lpage>618</lpage>. <pub-id pub-id-type="doi">10.1080/02687038908249027</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crinion</surname> <given-names>J.</given-names></name> <name><surname>Holland</surname> <given-names>A. L.</given-names></name> <name><surname>Copland</surname> <given-names>D. A.</given-names></name> <name><surname>Thompson</surname> <given-names>C. K.</given-names></name> <name><surname>Hillis</surname> <given-names>A. E.</given-names></name></person-group> (<year>2013</year>). <article-title>Neuroimaging in aphasia treatment research: quantifying brain lesions after stroke</article-title>. <source>Neuroimage</source> <volume>73</volume>, <fpage>208</fpage>&#x02013;<lpage>214</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2012.07.044</pub-id><pub-id pub-id-type="pmid">22846659</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Datteri</surname> <given-names>E.</given-names></name> <name><surname>Laudisa</surname> <given-names>F.</given-names></name></person-group> (<year>2014</year>). <article-title>Box-and-arrow explanations need not be more abstract than neuroscientific mechanism descriptions</article-title>. <source>Front. Psychol.</source> <volume>5</volume>, <fpage>464</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2014.00464</pub-id><pub-id pub-id-type="pmid">24904480</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Renzi</surname> <given-names>E.</given-names></name> <name><surname>Faglioni</surname> <given-names>P.</given-names></name></person-group> (<year>1978</year>). <article-title>Normative data and screening power of a shortened version of the token test</article-title>. <source>Cortex</source> <volume>14</volume>, <fpage>41</fpage>&#x02013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1016/S0010-9452(78)80006-9</pub-id><pub-id pub-id-type="pmid">16295108</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Renzi</surname> <given-names>E.</given-names></name> <name><surname>Vignolo</surname> <given-names>L. A.</given-names></name></person-group> (<year>1962</year>). <article-title>The token test: a sensitive test to detect receptive disturbances in aphasics</article-title>. <source>Brain</source> <volume>85</volume>, <fpage>665</fpage>&#x02013;<lpage>678</lpage>. <pub-id pub-id-type="doi">10.1093/brain/85.4.665</pub-id><pub-id pub-id-type="pmid">14026018</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dell</surname> <given-names>G. S.</given-names></name></person-group> (<year>1986</year>). <article-title>A spreading activation theory of retrieval in language production</article-title>. <source>Psychol. Rev.</source> <volume>93</volume>, <fpage>283</fpage>&#x02013;<lpage>321</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.93.3.283</pub-id><pub-id pub-id-type="pmid">3749399</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dell</surname> <given-names>G. S.</given-names></name> <name><surname>Martin</surname> <given-names>N.</given-names></name> <name><surname>Schwartz</surname> <given-names>M. F.</given-names></name></person-group> (<year>2007</year>). <article-title>A case-series test of the interactive two-step model of lexical access: predicting word repetition from picture naming</article-title>. <source>J. Mem. Lang.</source> <volume>56</volume>, <fpage>490</fpage>&#x02013;<lpage>520</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2006.05.007</pub-id><pub-id pub-id-type="pmid">21085621</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dell</surname> <given-names>G. S.</given-names></name> <name><surname>Schwartz</surname> <given-names>M. F.</given-names></name> <name><surname>Nozari</surname> <given-names>N.</given-names></name> <name><surname>Faseyitan</surname> <given-names>O.</given-names></name> <name><surname>Coslett</surname> <given-names>H. B.</given-names></name></person-group> (<year>2013</year>). <article-title>Voxel-based lesion-parameter mapping: identifying the neural correlates of a computational model of word production</article-title>. <source>Cognition</source> <volume>128</volume>, <fpage>380</fpage>&#x02013;<lpage>396</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2013.05.007</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eliasmith</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <source>How to Build a Brain: A Neural Architecture for Biological Cognition</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>. <pub-id pub-id-type="doi">10.1093/acprof:oso/9780199794546.001.0001</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eliasmith</surname> <given-names>C.</given-names></name> <name><surname>Anderson</surname> <given-names>C. H.</given-names></name></person-group> (<year>2003</year>). <source>Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eliasmith</surname> <given-names>C.</given-names></name> <name><surname>Stewart</surname> <given-names>T. C.</given-names></name> <name><surname>Choo</surname> <given-names>X.</given-names></name> <name><surname>Bekolay</surname> <given-names>T.</given-names></name> <name><surname>DeWolf</surname> <given-names>T.</given-names></name> <name><surname>Tang</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>A large-scale model of the functioning brain</article-title>. <source>Science</source> <volume>338</volume>, <fpage>1202</fpage>&#x02013;<lpage>1205</lpage>. <pub-id pub-id-type="doi">10.1126/science.1225266</pub-id><pub-id pub-id-type="pmid">23197532</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Enderby</surname> <given-names>P.</given-names></name> <name><surname>Wood</surname> <given-names>V.</given-names></name> <name><surname>Wade</surname> <given-names>D.</given-names></name></person-group> (<year>1987</year>). <source>Frenchay Aphasia Screening Test: (FAST)</source>. <publisher-loc>Cornwall, UK</publisher-loc>: <publisher-name>Test Manual Whurr Publishers</publisher-name>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friederici</surname> <given-names>A. D.</given-names></name></person-group> (<year>2011</year>). <article-title>The brain basis of language processing: from structure to function</article-title>. <source>Physiol. Rev.</source> <volume>91</volume>, <fpage>1357</fpage>&#x02013;<lpage>1392</lpage>. <pub-id pub-id-type="doi">10.1152/physrev.00006.2011</pub-id><pub-id pub-id-type="pmid">22013214</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghitza</surname> <given-names>O.</given-names></name></person-group> (<year>2011</year>). <article-title>Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm</article-title>. <source>Front. Psychol.</source> <volume>2</volume>, <fpage>130</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2011.00130</pub-id><pub-id pub-id-type="pmid">21743809</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghitza</surname> <given-names>O.</given-names></name> <name><surname>Giraud</surname> <given-names>A. L.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>). <article-title>Neuronal oscillations and speech perception: critical-band temporal envelopes are the essence</article-title>. <source>Front. Hum. Neurosci.</source> <volume>6</volume>, <fpage>340</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2012.00340</pub-id><pub-id pub-id-type="pmid">23316150</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Golfinopoulos</surname> <given-names>E.</given-names></name> <name><surname>Tourville</surname> <given-names>J. A.</given-names></name> <name><surname>Guenther</surname> <given-names>F. H.</given-names></name></person-group> (<year>2010</year>). <article-title>The integration of large-scale neural network modeling and functional brain imaging in speech motor control</article-title>. <source>Neuroimage</source> <volume>52</volume>, <fpage>862</fpage>&#x02013;<lpage>874</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2009.10.023</pub-id><pub-id pub-id-type="pmid">19837177</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guenther</surname> <given-names>F. H.</given-names></name></person-group> (<year>2006</year>). <article-title>Cortical interactions underlying the production of speech sounds</article-title>. <source>J. Commun. Disord</source>. <volume>39</volume>, <fpage>350</fpage>&#x02013;<lpage>365</lpage>. <pub-id pub-id-type="doi">10.1016/j.jcomdis.2006.06.013</pub-id><pub-id pub-id-type="pmid">16887139</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guenther</surname> <given-names>F. H.</given-names></name></person-group> (<year>2016</year>). <source>Neural Control of Speech</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>. <pub-id pub-id-type="doi">10.7551/mitpress/10471.001.0001</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guenther</surname> <given-names>F. H.</given-names></name> <name><surname>Ghosh</surname> <given-names>S. S.</given-names></name> <name><surname>Tourville</surname> <given-names>J. A.</given-names></name></person-group> (<year>2006</year>). <article-title>Neural modeling and imaging of the cortical interactions underlying syllable production</article-title>. <source>Brain Lang.</source> <volume>96</volume>, <fpage>280</fpage>&#x02013;<lpage>301</lpage>. <pub-id pub-id-type="doi">10.1016/j.bandl.2005.06.001</pub-id><pub-id pub-id-type="pmid">16040108</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hickok</surname> <given-names>G.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2007</year>). <article-title>The cortical organization of speech processing</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>8</volume>, <fpage>393</fpage>&#x02013;<lpage>402</lpage>. <pub-id pub-id-type="doi">10.1038/nrn2113</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hickok</surname> <given-names>G.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Neural basis of speech perception,&#x0201D;</article-title> in <source>Neurobiology of Language</source>, eds G. Hickok and S. L. Small (Cambridge, MA: Academic Press), <fpage>299</fpage>&#x02013;<lpage>310</lpage>. <pub-id pub-id-type="doi">10.1016/B978-0-12-407794-2.00025-0</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Indefrey</surname> <given-names>P.</given-names></name> <name><surname>Levelt</surname> <given-names>W. J. M.</given-names></name></person-group> (<year>2004</year>). <article-title>The spatial and temporal signatures of word production components</article-title>. <source>Cognition</source> <volume>92</volume>, <fpage>101</fpage>&#x02013;<lpage>144</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2002.06.001</pub-id><pub-id pub-id-type="pmid">15037128</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kearney</surname> <given-names>E.</given-names></name> <name><surname>Guenther</surname> <given-names>F. H.</given-names></name></person-group> (<year>2019</year>). <article-title>Articulating: the neural mechanisms of speech production</article-title>. <source>Lang. Cogn. Neurosci.</source> <volume>34</volume>, <fpage>1214</fpage>&#x02013;<lpage>1229</lpage>. <pub-id pub-id-type="doi">10.1080/23273798.2019.1589541</pub-id><pub-id pub-id-type="pmid">31777753</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kertesz</surname> <given-names>A.</given-names></name></person-group> (<year>2006</year>). <source>Western Aphasia Battery Revised</source>. <publisher-loc>San Antonio, TX</publisher-loc>: <publisher-name>Harcourt Assessment</publisher-name>. <pub-id pub-id-type="doi">10.1037/t15168-000</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Modeling dysfunctions in the coordination of voice and supraglottal articulation in neurogenic speech disorders,&#x0201D;</article-title> in <source>Models and Analysis of Vocal Emissions for Biomedical Applications</source>, ed. C. Manfredi (<publisher-loc>Firenze</publisher-loc>: <publisher-name>Firenze University Press</publisher-name>), <fpage>79</fpage>&#x02013;<lpage>82</lpage>.</citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name></person-group> (<year>2022</year>). <article-title>Computer-implemented articulatory models for speech production: a review</article-title>. <source>Front. Robot. AI</source> <volume>9</volume>, <fpage>796739</fpage>. <pub-id pub-id-type="doi">10.3389/frobt.2022.796739</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name> <name><surname>Bafna</surname> <given-names>T.</given-names></name> <name><surname>Cao</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Emergence of an action repository as part of a biologically inspired model of speech processing: the role of somatosensory information in learning phonetic-phonological sound features</article-title>. <source>Front. Psychol</source>. 10, 1462. <pub-id pub-id-type="doi">10.3389/fpsyg.2019.01462</pub-id><pub-id pub-id-type="pmid">31354560</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name> <name><surname>Bekolay</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <source>Neural Modeling of Speech Processing and Speech Learning. An Introduction</source>. Berlin: Springer International Publishing. ISBN 978-3-030-15852-1. <pub-id pub-id-type="doi">10.1007/978-3-030-15853-8</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name> <name><surname>Bekolay</surname> <given-names>T.</given-names></name> <name><surname>Cao</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>On the emergence of phonological knowledge and on motor planning and motor programming in a developmental model of speech production</article-title>. <source>Front. Hum. Neurosci</source>. 16, 844529. <pub-id pub-id-type="doi">10.3389/fnhum.2022.844529</pub-id><pub-id pub-id-type="pmid">35634209</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name> <name><surname>Cao</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>The emergence of phonetic-phonological features in a biologically inspired model of speech processing</article-title>. <source>J. Phon.</source> <volume>53</volume>, <fpage>88</fpage>&#x02013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2015.09.006</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name> <name><surname>Crawford</surname> <given-names>E.</given-names></name> <name><surname>Bekolay</surname> <given-names>T.</given-names></name> <name><surname>Eliasmith</surname> <given-names>C.</given-names></name></person-group> (<year>2016</year>). <article-title>Modeling interactions between speech production and perception: speech error detection at semantic and phonological levels and the inner speech loop</article-title>. <source>Front. Comput. Neurosci</source>. 10, 51. <pub-id pub-id-type="doi">10.3389/fncom.2016.00051</pub-id><pub-id pub-id-type="pmid">27303287</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name> <name><surname>Kannampuzha</surname> <given-names>J.</given-names></name> <name><surname>Eckers</surname> <given-names>C.</given-names></name> <name><surname>Heim</surname> <given-names>S.</given-names></name> <name><surname>Kaufmann</surname> <given-names>E.</given-names></name> <name><surname>Neuschaefer-Rube</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>&#x0201C;The neurophonetic model of speech processing ACT: structure, knowledge acquisition, and function modes,&#x0201D;</article-title> in <source>Cognitive Behavioural Systems, LNCS 7403</source>, eds A. Esposito, A. M. Esposito, A. Vinciarelli, R. Hoffmann, and V. C. M&#x000FC;ller (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>398</fpage>&#x02013;<lpage>404</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-34584-5_35</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name> <name><surname>Kannampuzha</surname> <given-names>J.</given-names></name> <name><surname>Kaufmann</surname> <given-names>E.</given-names></name></person-group> (<year>2014</year>). <article-title>Associative learning and self-organization as basic principles for simulating speech acquisition, speech production, and speech perception</article-title>. <source>EPJ Nonlinear Biomed. Phys.</source> <volume>2</volume>, <fpage>2</fpage>. <pub-id pub-id-type="doi">10.1140/epjnbp15</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name> <name><surname>Stille</surname> <given-names>C.</given-names></name> <name><surname>Blouw</surname> <given-names>P.</given-names></name> <name><surname>Bekolay</surname> <given-names>T.</given-names></name> <name><surname>Stewart</surname> <given-names>T. C.</given-names></name></person-group> (<year>2020</year>). <article-title>Hierarchical sequencing and feedforward and feedback control mechanisms in speech production: a preliminary approach for modeling normal and disordered speech</article-title>. <source>Front. Comput. Neurosci</source>. 14, 99. <pub-id pub-id-type="doi">10.3389/fncom.2020.573554</pub-id><pub-id pub-id-type="pmid">33262697</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Levelt</surname> <given-names>W. J. M.</given-names></name> <name><surname>Roelofs</surname> <given-names>A.</given-names></name> <name><surname>Meyer</surname> <given-names>A. S.</given-names></name></person-group> (<year>1999</year>). <article-title>A theory of lexical access in speech production</article-title>. <source>Behav. Brain Sci</source>. <volume>22</volume>, <fpage>1</fpage>&#x02013;<lpage>75</lpage>. <pub-id pub-id-type="doi">10.1017/S0140525X99001776</pub-id><pub-id pub-id-type="pmid">11301520</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lichtheim</surname> <given-names>L.</given-names></name></person-group> (<year>1885</year>). <article-title>On aphasia</article-title>. <source>Brain</source> <volume>7</volume>, <fpage>433</fpage>&#x02013;<lpage>484</lpage>. <pub-id pub-id-type="doi">10.1093/brain/7.4.433</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Litwi&#x00144;czuk</surname> <given-names>M. C.</given-names></name> <name><surname>Muhlert</surname> <given-names>N.</given-names></name> <name><surname>Cloutman</surname> <given-names>L.</given-names></name> <name><surname>Trujillo-Barreto</surname> <given-names>N.</given-names></name> <name><surname>Woollams</surname> <given-names>A.</given-names></name></person-group> (<year>2022</year>). <article-title>Combination of structural and functional connectivity explains unique variation in specific domains of cognitive function</article-title>. <source>Neuroimage</source> <volume>262</volume>, <fpage>119531</fpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2022.119531</pub-id><pub-id pub-id-type="pmid">35931312</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maass</surname> <given-names>W.</given-names></name></person-group> (<year>1997</year>). <article-title>Networks of spiking neurons: the third generation of neural network models</article-title>. <source>Neural Netw.</source> <volume>10</volume>, <fpage>1659</fpage>&#x02013;<lpage>1671</lpage>. <pub-id pub-id-type="doi">10.1016/S0893-6080(97)00011-7</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miller</surname> <given-names>H. E.</given-names></name> <name><surname>Guenther</surname> <given-names>F. H.</given-names></name></person-group> (<year>2021</year>). <article-title>Modelling speech motor programming and apraxia of speech in the DIVA/GODIVA neurocomputational framework</article-title>. <source>Aphasiology</source> <volume>35</volume>, <fpage>424</fpage>&#x02013;<lpage>441</lpage>. <pub-id pub-id-type="doi">10.1080/02687038.2020.1765307</pub-id><pub-id pub-id-type="pmid">34108793</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nassif</surname> <given-names>A. B.</given-names></name> <name><surname>Shahin</surname> <given-names>I.</given-names></name> <name><surname>Attili</surname> <given-names>I. M.</given-names></name> <name><surname>Shaalan</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>Speech recognition using deep neural networks: a systematic review</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>19143</fpage>&#x02013;<lpage>19165</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2019.2896880</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Palmer</surname> <given-names>R.</given-names></name> <name><surname>Enderby</surname> <given-names>P.</given-names></name></person-group> (<year>2007</year>). <article-title>Methods of speech therapy treatment for stable dysarthria: a review</article-title>. <source>Adv. Speech Lang. Pathol.</source> <volume>9</volume>, <fpage>140</fpage>&#x02013;<lpage>153</lpage>. <pub-id pub-id-type="doi">10.1080/14417040600970606</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parrell</surname> <given-names>B.</given-names></name> <name><surname>Ramanarayanan</surname> <given-names>V.</given-names></name> <name><surname>Nagarajan</surname> <given-names>S.</given-names></name> <name><surname>Houde</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>The FACTS model of speech motor control: fusing state estimation and task-based control</article-title>. <source>PLoS Comput. Biol.</source> <volume>15</volume>, <fpage>e1007321</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1007321</pub-id><pub-id pub-id-type="pmid">31479444</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Ponulak</surname> <given-names>F.</given-names></name> <name><surname>Kasinski</surname> <given-names>A.</given-names></name></person-group> (<year>2011</year>). <article-title>Introduction to spiking neural networks: information processing, learning and applications</article-title>. <source>Acta Neurobiol. Exp.</source> <volume>71</volume>, <fpage>409</fpage>&#x02013;<lpage>433</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/nlmcatalog?term=%22Acta&#x0002B;Neurobiol&#x0002B;Exp&#x0002B;%28Wars%29%22%5BTitle&#x0002B;Abbreviation%5D">https://www.ncbi.nlm.nih.gov/nlmcatalog?term=%22Acta&#x0002B;Neurobiol&#x0002B;Exp&#x0002B;%28Wars%29%22%5BTitle&#x0002B;Abbreviation%5D</ext-link><pub-id pub-id-type="pmid">22237491</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rockland</surname> <given-names>K. S.</given-names></name> <name><surname>Ichinohe</surname> <given-names>N.</given-names></name></person-group> (<year>2004</year>). <article-title>Some thoughts on cortical minicolumns</article-title>. <source>Exp. Brain Res.</source> <volume>158</volume>, <fpage>265</fpage>&#x02013;<lpage>277</lpage>. <pub-id pub-id-type="doi">10.1007/s00221-004-2024-9</pub-id><pub-id pub-id-type="pmid">15365664</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roelofs</surname> <given-names>A.</given-names></name></person-group> (<year>1992</year>). <article-title>A spreading-activation theory of lemma retrieval in speaking</article-title>. <source>Cognition</source> <volume>42</volume>, <fpage>107</fpage>&#x02013;<lpage>142</lpage>. <pub-id pub-id-type="doi">10.1016/0010-0277(92)90041-F</pub-id><pub-id pub-id-type="pmid">1582154</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roelofs</surname> <given-names>A.</given-names></name></person-group> (<year>1997</year>). <article-title>The WEAVER model of word-form encoding in speech production</article-title>. <source>Cognition</source> <volume>64</volume>, <fpage>249</fpage>&#x02013;<lpage>284</lpage>. <pub-id pub-id-type="doi">10.1016/S0010-0277(97)00027-9</pub-id><pub-id pub-id-type="pmid">9426503</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roelofs</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>A dorsal-pathway account of aphasic language production: the WEAVER&#x0002B;&#x0002B;/ARC model</article-title>. <source>Cortex</source> <volume>59</volume>, <fpage>33</fpage>&#x02013;<lpage>48</lpage>. <pub-id pub-id-type="doi">10.1016/j.cortex.2014.07.001</pub-id><pub-id pub-id-type="pmid">25128898</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roger</surname> <given-names>V.</given-names></name> <name><surname>Farinas</surname> <given-names>J.</given-names></name> <name><surname>Pinquier</surname> <given-names>J.</given-names></name></person-group> (<year>2022</year>). <article-title>Deep neural networks for automatic speech processing: a survey from large corpora to limited data</article-title>. <source>J. Audio Speech Music Proc.</source> <volume>2022</volume>, <fpage>19</fpage>. <pub-id pub-id-type="doi">10.1186/s13636-022-00251-w</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwartz</surname> <given-names>M. F.</given-names></name> <name><surname>Dell</surname> <given-names>G. S.</given-names></name> <name><surname>Martin</surname> <given-names>N.</given-names></name> <name><surname>Gahl</surname> <given-names>S.</given-names></name> <name><surname>Sobel</surname> <given-names>P.</given-names></name></person-group> (<year>2006</year>). <article-title>A case-series test of the interactive two-step model of lexical access: evidence from picture naming</article-title>. <source>J. Mem. Lang.</source> <volume>54</volume>, <fpage>228</fpage>&#x02013;<lpage>264</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2005.10.001</pub-id><pub-id pub-id-type="pmid">21085621</pub-id></citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Senft</surname> <given-names>V.</given-names></name> <name><surname>Stewart</surname> <given-names>T. C.</given-names></name> <name><surname>Bekolay</surname> <given-names>T.</given-names></name> <name><surname>Eliasmith</surname> <given-names>C.</given-names></name> <name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name></person-group> (<year>2016</year>). <article-title>Reduction of dopamine in basal ganglia and its effects on syllable sequencing in speech: a computer simulation study</article-title>. <source>Basal Ganglia</source> <volume>6</volume>, <fpage>7</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1016/j.baga.2015.10.003</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Senft</surname> <given-names>V.</given-names></name> <name><surname>Stewart</surname> <given-names>T. C.</given-names></name> <name><surname>Bekolay</surname> <given-names>T.</given-names></name> <name><surname>Eliasmith</surname> <given-names>C.</given-names></name> <name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name></person-group> (<year>2018</year>). <article-title>Inhibiting basal ganglia regions reduces syllable sequencing errors in parkinson&#x00027;s disease: a computer simulation study</article-title>. <source>Front. Comput. Neurosci.</source> <volume>12</volume>, <fpage>41</fpage>. <pub-id pub-id-type="doi">10.3389/fncom.2018.00041</pub-id><pub-id pub-id-type="pmid">29928197</pub-id></citation></ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stefaniak</surname> <given-names>J. D.</given-names></name> <name><surname>Halai</surname> <given-names>A. D.</given-names></name> <name><surname>Lambon Ralph</surname> <given-names>M. A.</given-names></name></person-group> (<year>2020</year>). <article-title>The neural and neurocomputational bases of recovery from post-stroke aphasia</article-title>. <source>Nat. Rev. Neurol.</source> <volume>16</volume>, <fpage>43</fpage>&#x02013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1038/s41582-019-0282-1</pub-id><pub-id pub-id-type="pmid">31772339</pub-id></citation></ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stewart</surname> <given-names>T. C.</given-names></name> <name><surname>Eliasmith</surname> <given-names>C.</given-names></name></person-group> (<year>2014</year>). <article-title>Large-scale synthesis of functional spiking neural circuits</article-title>. <source>Proc. IEEE</source> <volume>102</volume>, <fpage>881</fpage>&#x02013;<lpage>898</lpage>. <pub-id pub-id-type="doi">10.1109/JPROC.2014.2306061</pub-id></citation>
</ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stille</surname> <given-names>C.</given-names></name> <name><surname>Bekolay</surname> <given-names>T.</given-names></name> <name><surname>Blouw</surname> <given-names>P.</given-names></name> <name><surname>Kr&#x000F6;ger</surname> <given-names>B. J.</given-names></name></person-group> (<year>2020</year>). <article-title>Modeling the mental lexicon as part of long-term and working memory and simulating lexical access in a naming task including semantic and phonological cues</article-title>. <source>Front. Psychol.</source> <volume>11</volume>, <fpage>1594</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2020.01594</pub-id><pub-id pub-id-type="pmid">32774315</pub-id></citation></ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tippett</surname> <given-names>D. C.</given-names></name> <name><surname>Hillis</surname> <given-names>A. E.</given-names></name> <name><surname>Tsapkini</surname> <given-names>K.</given-names></name></person-group> (<year>2015</year>). <article-title>Treatment of primary progressive aphasia</article-title>. <source>Curr. Treat. Options Neurol.</source> <volume>17</volume>, <fpage>362</fpage>. <pub-id pub-id-type="doi">10.1007/s11940-015-0362-5</pub-id></citation>
</ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ueno</surname> <given-names>T.</given-names></name> <name><surname>Saito</surname> <given-names>S.</given-names></name> <name><surname>Rogers</surname> <given-names>T. T.</given-names></name> <name><surname>Lambon Ralph</surname> <given-names>M. A.</given-names></name></person-group> (<year>2011</year>). <article-title>Lichtheim 2: synthesizing aphasia and the neural basis of language in a neurocomputational model of the dual dorsal-ventral language pathways</article-title>. <source>Neuron</source> <volume>72</volume>, <fpage>385</fpage>&#x02013;<lpage>396</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2011.09.013</pub-id><pub-id pub-id-type="pmid">22017995</pub-id></citation></ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van der Merwe</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>New perspectives on speech motor planning and programming in the context of the four-level model and its implications for understanding the pathophysiology underlying apraxia of speech and other motor speech disorders</article-title>. <source>Aphasiology</source> <volume>35</volume>, <fpage>397</fpage>&#x02013;<lpage>423</lpage>. <pub-id pub-id-type="doi">10.1080/02687038.2020.1765306</pub-id></citation>
</ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Warlaumont</surname> <given-names>A. S.</given-names></name> <name><surname>Finnegan</surname> <given-names>M. K.</given-names></name></person-group> (<year>2016</year>). <article-title>Learning to produce syllabic speech sounds via reward-modulated neural plasticity</article-title>. <source>PLoS ONE</source> <volume>11</volume>, <fpage>e0145096</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0145096</pub-id><pub-id pub-id-type="pmid">26808148</pub-id></citation></ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weems</surname> <given-names>S. A.</given-names></name> <name><surname>Reggia</surname> <given-names>J. A.</given-names></name></person-group> (<year>2006</year>). <article-title>Simulating single word processing in the classic aphasia syndromes based on the Wernicke&#x02013;Lichtheim&#x02013;Geschwind theory</article-title>. <source>Brain Lang.</source> <volume>98</volume>, <fpage>291</fpage>&#x02013;<lpage>309</lpage>. <pub-id pub-id-type="doi">10.1016/j.bandl.2006.06.001</pub-id><pub-id pub-id-type="pmid">16828860</pub-id></citation></ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yamazaki</surname> <given-names>K.</given-names></name> <name><surname>Vo-Ho</surname> <given-names>V. K.</given-names></name> <name><surname>Bulsara</surname> <given-names>D.</given-names></name> <name><surname>Le</surname> <given-names>N.</given-names></name></person-group> (<year>2022</year>). <article-title>Spiking neural networks and their applications: a review</article-title>. <source>Brain Sci.</source> <volume>12</volume>, <fpage>863</fpage>. <pub-id pub-id-type="doi">10.3390/brainsci12070863</pub-id></citation>
</ref>
</ref-list>
<app-group>
<title>Appendix</title>
<app id="A1">
<title>Appendix A: second and third generation artificial neural networks</title>
<p>Second generation neural networks (node-and-link networks) are composed of nodes (ensembles of neurons) and links or edges (connections between nodes). From a neurobiological perspective, nodes can be interpreted as bundles of neighboring neurons, i.e., as sets of neurons within a small brain region. Nodes are functionally characterized by their activation level, which is calculated from the input activation stemming from all preceding nodes connected to the node via neural links. Thus, a node does not directly represent a neuron in a narrow neurobiological sense but summarizes the neural activity of a set of neurons (i.e., spatial averaging) over a time interval (temporal averaging; not less than 10 ms in most of these networks; see the spatio-temporal averaging approach, STAA, Kr&#x000F6;ger and Bekolay, <xref ref-type="bibr" rid="B37">2019</xref>, p. 133 ff).</p>
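As a minimal sketch of this activation rule (the logistic squashing function and all numbers are illustrative assumptions, not taken from any specific model discussed here):

```python
import math

def node_activation(inputs, weights, bias=0.0):
    """Activation of one node: squashed weighted sum of the activations
    of all preceding nodes connected to it via links (the logistic
    function is one common choice; the exact nonlinearity varies)."""
    net = sum(a * w for a, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))  # squash to the interval (0, 1)

# Three preceding nodes feed this node; each activation already stands
# for the spatially and temporally averaged firing of a neuron bundle.
preceding = [0.9, 0.2, 0.5]
link_weights = [1.5, -0.8, 0.4]
print(round(node_activation(preceding, link_weights), 3))  # 0.801
```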
<p>An important characteristic of node-and-link networks is that nodes are organized in layers representing states such as phonological forms, lemmata, concepts, motor plans, or auditory and somatosensory states. If the layers and nodes of a second generation network are interpreted in a neuroanatomical context, a layer would represent a small portion of the cortical brain surface and a node would probably represent a cortical column. In this context it must be emphasized that these artificial neural network layers should not be confused with the neuroanatomically defined cortical layers ordered in parallel to the neocortical surface (see, e.g., Rockland and Ichinohe, <xref ref-type="bibr" rid="B53">2004</xref>). Interpreted neurobiologically, the layers of an artificial neural network usually represent different small areas within different cortical regions, e.g., areas hosting and processing auditory forms, phonological forms, lemmata, and concepts in the temporal lobe, or areas hosting motor plan and motor program forms in the frontal lobe. Moreover, a node within such a layer can be assumed to represent all neurons of a cortical column within the specific cortical area represented by that layer. The same holds for the buffers (replacing layers) and spiking neurons (replacing nodes) defined in third generation spiking neural networks (see below). In these second or third generation network models the nodes of a layer (or spiking neurons in a buffer) are connected with the nodes (spiking neurons) of another layer (buffer). Thus, links (neural connections) between different layers or buffers can be interpreted as neural connections between cortical columns in different cortical areas.</p>
<p>A further characteristic of second generation neural networks or node-and-link networks is their lack of explicit temporal processing. If the network has already been trained and is running in performance mode (e.g., speech production or speech perception mode), the input layer of the network model is normally activated by a stimulus and a resulting neural activation pattern appears at the output layer within one simulation step (called a &#x0201C;performance trial&#x0201D;) without any further temporal specification. In the case of speech processing, the input and output neural activation patterns (applied to the input layer and appearing at the output layer) thus need to encode the entire syllable-, word-, or phrase-sized auditory or motor pattern, so that the temporal organization of motor or auditory patterns is coded intrinsically in these neural activation patterns (see, e.g., Kr&#x000F6;ger and Cao, <xref ref-type="bibr" rid="B39">2015</xref>).</p>
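Because a trained second generation network maps input to output in a single performance trial, a whole syllable-sized time course must be packed into one static input vector. A hypothetical sketch (the feature and frame counts are invented toy dimensions):

```python
# A syllable's auditory pattern, sampled as n_features spectral values
# at n_frames points in time, is flattened into one static vector, so
# the temporal course is coded intrinsically in the input-layer nodes.
n_features, n_frames = 4, 10  # assumed toy dimensions

# pattern[t][f]: activation of feature f at time frame t
pattern = [[0.1 * (t + f) for f in range(n_features)]
           for t in range(n_frames)]

input_vector = [value for frame in pattern for value in frame]
print(len(input_vector))  # 40 input nodes carry the whole time course
```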
<p>During learning or training of these second generation neural networks (training mode, e.g., for supervised learning), a set of input/output training stimuli is applied to the network several times, i.e., in several training epochs. Comparable to the performance mode, a single training step does not need any temporal specification (like one performance trial). But because the link weights of the neural network are altered in each training step, and because the network performs better and better with an increasing number of training steps and training epochs, this leads to the notion of an increase in learning (increase in knowledge) over time.</p>
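The training-mode behavior can be sketched with a toy supervised-learning loop (the delta rule on a single linear node is a stand-in for whichever learning rule a given model actually uses; all values are invented):

```python
# Toy task: one linear node learns weights mapping input pairs to targets.
training_set = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.5), ([1.0, 1.0], 0.0)]
weights = [0.0, 0.0]
rate = 0.2

def epoch_error():
    return sum((t - sum(w * x for w, x in zip(weights, xs))) ** 2
               for xs, t in training_set)

errors = []
for epoch in range(50):                 # several training epochs
    for xs, target in training_set:     # one training step per stimulus
        out = sum(w * x for w, x in zip(weights, xs))
        delta = target - out
        # link weights change only slightly per training step ...
        weights = [w + rate * delta * x for w, x in zip(weights, xs)]
    errors.append(epoch_error())

# ... but performance improves steadily across epochs.
print([round(w, 3) for w in weights])  # [0.5, -0.5]
```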
<p>Third generation neural networks (spiking neural networks, SNNs, e.g., Ponulak and Kasinski, <xref ref-type="bibr" rid="B52">2011</xref>; Yamazaki et al., <xref ref-type="bibr" rid="B69">2022</xref>) aim for a neurobiologically plausible modeling of neurons and neural connections. Here, spiking neuron models are used for modeling the neurobiology of a neuron including its synaptic connections from preceding neurons [i.e., modeling the increase in voltage of the cell membrane potential resulting from incoming presynaptic spikes, the generation of a postsynaptic pulse (or spike) if the firing threshold of the membrane potential is reached, the post-spike refractory period for the membrane potential, the temporal delay stemming from synaptic input connections, etc.]. Thus, the temporal features of signal processing are modeled in a more neurobiologically inspired way in third generation neural networks, and, in contrast to second generation neural networks, temporal features need not be set externally for this type of network (e.g., the setting of activation decay rates for nodes, see Dell et al., <xref ref-type="bibr" rid="B17">2013</xref>; Roelofs, <xref ref-type="bibr" rid="B56">2014</xref>). In third generation network models, temporal parameters are directly controlled by the synapse model (the temporal delay for transforming an incoming presynaptic spike into a specific postsynaptic current which increases or decreases the membrane potential, depending on whether the synaptic connection is excitatory or inhibitory) and by the kernel or cell model of the neuron (setting the duration of the post-spike refractory period and the time constant for the rate of increase or decrease of the membrane potential when a presynaptic spike enters an excitatory or inhibitory synaptic connection).</p>
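A minimal sketch of such a leaky integrate-and-fire cell model (all parameter values, including the membrane time constant, threshold, refractory period, and the constant input current, are illustrative assumptions, not fitted to any of the models discussed):

```python
def simulate_lif(input_current, dt=0.001, tau=0.02,
                 threshold=1.0, refractory=0.002, steps=1000):
    """Leaky integrate-and-fire neuron: input current raises the
    membrane potential; a spike fires at threshold, followed by a
    reset and a refractory period. Returns the spike times (s)."""
    v, refr_left, spike_times = 0.0, 0.0, []
    for step in range(steps):
        t = step * dt
        if refr_left > 0.0:          # post-spike refractory period
            refr_left -= dt
            continue
        # leaky integration toward the input current (time constant tau)
        v += dt / tau * (input_current - v)
        if v >= threshold:           # firing threshold reached
            spike_times.append(t)
            v = 0.0                  # reset membrane potential
            refr_left = refractory
    return spike_times

spikes = simulate_lif(input_current=1.5)  # constant suprathreshold drive
print(len(spikes))  # number of spikes in 1 s of simulated time
```

A subthreshold current (e.g., 0.5) produces no spikes at all, since the membrane potential saturates below threshold.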
<p>In third generation or spiking neural networks, time is thus modeled directly within the basic unit, i.e., within the spiking neuron model including its synapses. Simulations can be performed here as a function of time. Even static input signals are applied to an input buffer of the model during a defined time interval, leading to defined spike trains for further processing within the neural network. Thus, in the NEF-SPA framework (see <xref ref-type="app" rid="A2">Appendix B</xref>), the forwarding of a neural state from buffer to buffer, with or without further processing by intermediately connected associative memories, leads to a specific delay between input and output signal as a result of neurobiologically inspired synaptic processing (here for leaky integrate-and-fire neurons, LIF neurons, see Eliasmith, <xref ref-type="bibr" rid="B18">2013</xref>). In our ACT model this processing delay is about 50 ms from phonological form to concept activation (perception pathway) or, vice versa, from concept to phonological form activation (production pathway) (see, e.g., Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>). Moreover, this implicit modeling of time allows a straightforward modeling of action selection as is needed for task execution, even in the case of simple speech tasks (see, e.g., Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>).</p>
<p>In order to include temporal aspects in second generation network models, a temporal model can be added, as has been done for WEAVER and for DIVA/GODIVA. Here, an activation time interval is defined (e.g., 10 ms) and neural activation is recalculated for a sequence of these time intervals. The neural activation of each node within each layer decreases by a certain amount per time interval, and the activation of each node per time interval results from the activation level of the preceding time step, altered only slightly by each new incoming inhibitory or excitatory activation from presynaptic nodes in the current time step. This also allows a modeling of action selection processes as introduced by Bohland et al. (<xref ref-type="bibr" rid="B7">2010</xref>), e.g., for modeling the chunking of a phonological input chain with respect to motor program selection.</p>
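A minimal sketch of this time-interval update (the decay factor, input gain, and input schedule are assumed toy values, not parameters of WEAVER or DIVA/GODIVA):

```python
# Per 10 ms time interval, each node's activation decays by a fixed
# factor and is nudged slightly by the summed weighted presynaptic
# input, so activation evolves gradually over a sequence of intervals.
decay = 0.9   # fraction of activation retained per interval (assumed)
gain = 0.1    # scaling of new incoming activation per interval (assumed)

def step(activation, net_input):
    return decay * activation + gain * net_input

trace = [0.0]
for interval in range(60):               # 60 x 10 ms = 600 ms simulated
    net = 1.0 if interval < 30 else 0.0  # input active for first 300 ms
    trace.append(step(trace[-1], net))

peak = max(trace)
print(round(peak, 3), round(trace[-1], 3))  # 0.958 0.041
```

Activation rises gradually while the input is on and decays back toward zero afterwards, which is exactly what makes gradual competition and action selection between nodes possible.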
</app>
<app id="A2">
<title>Appendix B: The NEF-SPA framework of third generation spiking neural networks</title>
<p>The NEF-SPA framework (Neural Engineering Framework, NEF, augmented by the Semantic Pointer Architecture, SPA; see Eliasmith and Anderson, <xref ref-type="bibr" rid="B19">2003</xref>; Eliasmith, <xref ref-type="bibr" rid="B18">2013</xref>; Stewart and Eliasmith, <xref ref-type="bibr" rid="B62">2014</xref>) allows the development of large-scale brain models including peripheral modules (i.e., for sensory input and motor output processing, see Eliasmith et al., <xref ref-type="bibr" rid="B20">2012</xref>) by using a third generation neural network approach. This framework delivers basic elements for hosting neural states (neuron ensembles and neuron buffers) and for forwarding and/or processing neural information by defining the synaptic weights of all neural connections associating two neuron ensembles or neuron buffers. The default neuron model is the leaky integrate-and-fire neuron, i.e., a spiking neuron model capable of modeling synaptic processing (excitatory as well as inhibitory synaptic connections), all temporal features concerning the increase or decrease of the membrane potential resulting from incoming presynaptic spikes, and the triggering of postsynaptic spikes. Cognitive and higher-level sensory and motor states are hosted in this network type by higher-level state buffers, also called SPA-buffers (e.g., lexical states, see Section 5 of this paper). Lower-level motor states (i.e., syllable oscillators and gesture movement trajectory estimators; see Section 5 of this paper) as well as lower-level auditory states are hosted by lower-level state buffers, called neuron ensembles or NEF-ensembles.</p>
<p>The Semantic Pointer Architecture (SPA), which is based on the NEF, represents cognitive and higher-level sensory and motor states in the form of vectors in a D-dimensional vector space. Different vector spaces need to be defined for different types of items, e.g., for words, lemmata, phonological forms, motor plans, and motor programs as well as for higher-level auditory, somatosensory, and phonetic forms representing syllables. Semantic, grammatical, or phonological similarities as well as similarities of motor plans, motor programs, or higher-level auditory, somatosensory, and phonetic forms can be modeled and stored as sets of S-pointers in semantic pointer networks (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>). Typical examples of similarities are (i) at the semantic level: &#x0201C;boy&#x0201D; and &#x0201C;girl&#x0201D; are similar items with respect to the superordinate item &#x0201C;humans,&#x0201D; and &#x0201C;dog&#x0201D; and &#x0201C;cat&#x0201D; are similar items with respect to the superordinate item &#x0201C;animals&#x0201D;; (ii) at the lemma level: &#x0201C;dog&#x0201D; and &#x0201C;cat&#x0201D; are nouns, while &#x0201C;to bark&#x0201D; and &#x0201C;to meow&#x0201D; are verbs; (iii) at other levels: the syllables /dog/ and /dodge/ are phonologically as well as phonetically, auditorily, and motorically similar, because both words start with the same consonant followed by the same vowel.</p>
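This similarity structure can be illustrated with a toy vector-space sketch (the construction of the vectors here, blending a shared superordinate vector with item-specific noise, is entirely hypothetical; actual S-pointer vocabularies are built differently):

```python
import math, random

random.seed(1)
D = 128  # toy dimensionality; a full vocabulary would use D around 500

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def random_vector():
    return unit([random.gauss(0.0, 1.0) for _ in range(D)])

def mix(a, b):
    # hypothetical construction: blend a shared superordinate vector
    # with item-specific noise, so related items get similar vectors
    return unit([0.5 * x + 0.5 * y for x, y in zip(a, b)])

def similarity(a, b):
    return sum(x * y for x, y in zip(a, b))  # dot product of unit vectors

animal = random_vector()
dog = mix(animal, random_vector())   # "dog" shares structure with "animal"
cat = mix(animal, random_vector())   # so does "cat"
table = random_vector()              # unrelated item

print(similarity(dog, cat) > similarity(dog, table))  # related pair wins
```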
<p>A semantic pointer or S-pointer is a mathematical construct pointing to a specific item and to its neural state (e.g., &#x0201C;dog,&#x0201D; &#x0201C;cat&#x0201D;). Different sets of S-pointers appear in different D-dimensional vector spaces and thus define different item categories (e.g., concept, lemma, phonological form, etc.). Each S-pointer defines its own neural activation pattern, which represents one item as a neural state in a state buffer. State buffers are implemented in the NEF-SPA framework as a set of D neuron ensembles (NEF-ensembles), where each neuron ensemble is a set of N neurons (typically N = 20 to 100) representing a &#x0201C;value,&#x0201D; while the whole SPA-buffer can represent a D-dimensional S-pointer (typically D = 500 in the case of representing a full vocabulary of a language, Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>). This concept of the SPA based on the NEF allows a straightforward implementation and a direct combination of cognitive-linguistic modules (mainly using SPA-buffers) with lower-level sensorimotor modules (mainly using NEF-ensembles) for building up large-scale spiking neural networks, e.g., for speech processing (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>).</p>
<p>Concerning the use of associative memories within the NEF-SPA framework (see below, <xref ref-type="app" rid="A3">Appendix C</xref>), it should be mentioned that a direct connection of two buffers just leads to a (simple) forwarding of neural information without further information processing. The synaptic weights here model a (new) coding of each S-pointer in each buffer, but the underlying meaning of an activated neural state (of an item) in simply connected buffers remains the same, e.g., a phonological item stays the same phonological item in the next buffer, even if the neural activation pattern (i.e., the coding and decoding of neural activity in each buffer) differs from buffer to buffer. In order to transform a state from one vocabulary into another vocabulary (e.g., from phonological forms to lemmata and so on), an associative memory needs to be interposed between both buffers. This makes the modeling of neural pathways a little more complex but allows the modeling of neural dysfunctions in buffers as well as in neural pathways by using the same process, i.e., by ablating neurons. Thus, ablating neurons in buffers is used for modeling dysfunctions within buffers, while ablating neurons within associative memories is used for modeling dysfunctions within the neural pathways between buffers (see, e.g., Stille et al., <xref ref-type="bibr" rid="B63">2020</xref>).</p>
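The role of an interposed associative memory, and its ablation-based lesioning, can be sketched with a simple linear heteroassociative memory (a stand-in for the actual NEF-SPA construction; the two-item vocabularies, the dimensionality, and the row-zeroing lesion are all toy assumptions):

```python
import math, random

random.seed(2)
D = 64  # toy dimensionality (assumed; not the D used in the ACT model)

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def rand_vec():
    return unit([random.gauss(0.0, 1.0) for _ in range(D)])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# W maps each "phonological" key vector onto its associated "lemma"
# target vector (outer-product construction over the two toy items).
keys = [rand_vec() for _ in range(2)]
targets = [rand_vec() for _ in range(2)]
W = [[sum(targets[k][i] * keys[k][j] for k in range(2)) for j in range(D)]
     for i in range(D)]

def recall(mat, key):
    return [dot(row, key) for row in mat]

def ablate(mat, fraction):
    # lesion: zero a random fraction of rows, standing in for
    # ablated neurons inside the associative memory
    out = [row[:] for row in mat]
    for i in random.sample(range(D), int(fraction * D)):
        out[i] = [0.0] * D
    return out

intact = dot(unit(recall(W, keys[0])), targets[0])
lesioned = dot(unit(recall(ablate(W, 0.5), keys[0])), targets[0])
print(round(intact, 2), round(lesioned, 2))
```

With the memory intact the recalled state closely matches the associated target; zeroing half of the rows degrades the mapped state, mirroring how ablating neurons in the pathway produces a graded dysfunction.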
<p>The definition of all basic elements used within the NEF-SPA framework (e.g., buffers, memories, simple neural connection pathways, S-pointer networks, binding and unbinding buffers, etc.) is neurobiologically motivated by the theoretical background of the framework (see Eliasmith, <xref ref-type="bibr" rid="B18">2013</xref>; Stewart and Eliasmith, <xref ref-type="bibr" rid="B62">2014</xref>). It has been shown that only a few basic NEF and SPA elements allow the development of large-scale brain models capable of modeling a wide range of human behavior (cognitive as well as sensorimotor aspects of behavior, see Eliasmith et al., <xref ref-type="bibr" rid="B20">2012</xref>; Eliasmith, <xref ref-type="bibr" rid="B18">2013</xref>; Stewart and Eliasmith, <xref ref-type="bibr" rid="B62">2014</xref>).</p>
</app>
<app id="A3">
<title>Appendix C: The ACT model</title>
<p>A third generation SNN (see <xref ref-type="app" rid="A1">Appendix A</xref>), developed in the NEF-SPA context (see <xref ref-type="app" rid="A2">Appendix B</xref>), is used for implementing the cognitive-linguistic part of the model as well as the production side of the sensorimotor part, i.e., the motor plan, motor program, and motor execution buffers of our ACT model (see <xref ref-type="fig" rid="F1">Figure 1A</xref> and Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B40">2016</xref>, <xref ref-type="bibr" rid="B43">2020</xref>, <xref ref-type="bibr" rid="B38">2022</xref>; the name ACT goes back to an early model of the sensorimotor part: the speech action model, see Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B41">2012</xref>). The neural pathways and mappings connecting the lexical and sensorimotor state buffers are realized using intermediate associative memories. Associative memories, as further elements which can be incorporated in the neural pathways connecting SPA-buffers, are needed in order to map states of one item category onto states of another item category (e.g., concepts onto lemmata or lemmata onto phonological forms and vice versa). The semantic pointer networks for concepts, lemmata, and phonological forms are represented and stored as part of the mental lexicon (<xref ref-type="fig" rid="F1">Figure 1A</xref>), and the semantic pointer networks for motor plans, motor programs, and auditory and somatosensory states of syllables are represented and stored as part of the mental syllabary (Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B41">2012</xref>). Not yet implemented in terms of this NEF-SPA third generation neural network is the feedback loop within the sensorimotor part of our model sketch (i.e., the somatosensory and auditory state, target, and error buffers for auditory and somatosensory processing, see also <xref ref-type="fig" rid="F1">Figures 1A</xref>, <xref ref-type="fig" rid="F1">C</xref>). The sensorimotor part of our model sketch is, however, already implemented as a second generation neural network (node-and-link network) and is used for simulating early stages of speech acquisition such as the babbling and imitation phases of infants and toddlers (see Kr&#x000F6;ger et al., <xref ref-type="bibr" rid="B42">2014</xref>, <xref ref-type="bibr" rid="B36">2019</xref>; Kr&#x000F6;ger and Cao, <xref ref-type="bibr" rid="B39">2015</xref>). A shortcoming of the second generation network approach is that lower-level auditory, somatosensory, and motor states of syllables, words, or phrases cannot be represented in a temporally flexible way; here, a fixed time window (e.g., of about 500 ms) must be defined for all types of states representing syllables, words, or phrases.</p>
</app>
</app-group>
</back>
</article>