<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frai.2021.730570</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Language Models Explain Word Reading Times Better Than Empirical Predictability</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Hofmann</surname> <given-names>Markus J.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/25011/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Remus</surname> <given-names>Steffen</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Biemann</surname> <given-names>Chris</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Radach</surname> <given-names>Ralph</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/373750/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Kuchinke</surname> <given-names>Lars</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/25987/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Psychology, University of Wuppertal</institution>, <addr-line>Wuppertal</addr-line>, <country>Germany</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Informatics, Universit&#x000E4;t Hamburg</institution>, <addr-line>Hamburg</addr-line>, <country>Germany</country></aff>
<aff id="aff3"><sup>3</sup><institution>International Psychoanalytic University</institution>, <addr-line>Berlin</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Massimo Stella, University of Exeter, United Kingdom</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Aaron Veldre, The University of Sydney, Australia; Marco Silvio Giuseppe Senaldi, McGill University, Canada; David Balota, Washington University in St. Louis, United States, in collaboration with reviewer AK; Abhilasha Kumar, Indiana University, United States, in collaboration with reviewer DB</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Markus J. Hofmann <email>mhofmann&#x00040;uni-wuppertal.de</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Language and Computation, a section of the journal Frontiers in Artificial Intelligence</p></fn></author-notes>
<pub-date pub-type="epub">
<day>02</day>
<month>02</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>4</volume>
<elocation-id>730570</elocation-id>
<history>
<date date-type="received">
<day>25</day>
<month>06</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>12</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Hofmann, Remus, Biemann, Radach and Kuchinke.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Hofmann, Remus, Biemann, Radach and Kuchinke</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>Though there is a strong consensus that word length and frequency are the most important single-word features determining visual-orthographic access to the mental lexicon, there is less agreement as how to best capture syntactic and semantic factors. The traditional approach in cognitive reading research assumes that word predictability from sentence context is best captured by cloze completion probability (CCP) derived from human performance data. We review recent research suggesting that probabilistic language models provide deeper explanations for syntactic and semantic effects than CCP. Then we compare CCP with three probabilistic language models for predicting word viewing times in an English and a German eye tracking sample: (1) Symbolic n-gram models consolidate syntactic and semantic short-range relations by computing the probability of a word to occur, given two preceding words. (2) Topic models rely on subsymbolic representations to capture long-range semantic similarity by word co-occurrence counts in documents. (3) In recurrent neural networks (RNNs), the subsymbolic units are trained to predict the next word, given all preceding words in the sentences. To examine lexical retrieval, these models were used to predict single fixation durations and gaze durations to capture rapidly successful and standard lexical access, and total viewing time to capture late semantic integration. The linear item-level analyses showed greater correlations of all language models with all eye-movement measures than CCP. Then we examined non-linear relations between the different types of predictability and the reading times using generalized additive models. N-gram and RNN probabilities of the present word more consistently predicted reading performance compared with topic models or CCP. For the effects of last-word probability on current-word viewing times, we obtained the best results with n-gram models. Such count-based models seem to best capture short-range access that is still underway when the eyes move on to the subsequent word. The prediction-trained RNN models, in contrast, better predicted early preprocessing of the next word. In sum, our results demonstrate that the different language models account for differential cognitive processes during reading. We discuss these algorithmically concrete blueprints of lexical consolidation as theoretically deep explanations for human reading.</p></abstract>
<kwd-group>
<kwd>language models</kwd>
<kwd>n-gram model</kwd>
<kwd>topic model</kwd>
<kwd>recurrent neural network model</kwd>
<kwd>predictability</kwd>
<kwd>generalized additive models</kwd>
<kwd>eye movements</kwd>
</kwd-group>
<contract-sponsor id="cn001">Deutsche Forschungsgemeinschaft<named-content content-type="fundref-id">10.13039/501100001659</named-content></contract-sponsor>
<counts>
<fig-count count="3"/>
<table-count count="8"/>
<equation-count count="4"/>
<ref-count count="84"/>
<page-count count="20"/>
<word-count count="15234"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Concerning the influence of single-word properties, there is a strong consensus in the word recognition literature that word length and frequency are the most reliable predictors of lexical access (e.g., Reichle et al., <xref ref-type="bibr" rid="B66">2003</xref>; New et al., <xref ref-type="bibr" rid="B53">2006</xref>; Adelman and Brown, <xref ref-type="bibr" rid="B1">2008</xref>; Brysbaert et al., <xref ref-type="bibr" rid="B11">2011</xref>). Though for instance, Baayen (<xref ref-type="bibr" rid="B3">2010</xref>) suggests that a large part of the variance explained by word frequency is better explained by contextual word features, we here use these single-word properties as a baseline to set the challenge for contextual word properties to explain more variance than the single-word properties.</p>
<p>In contrast to single-word frequency, the question of how to best capture contextual word properties is controversial. The traditional psychological predictor variables are based on human performance. When aiming to quantify how syntactic and semantic contextual word features influence the reading of the present word, Taylor&#x00027;s (<xref ref-type="bibr" rid="B77">1953</xref>) cloze completion probability (CCP) still represents the performance-based state of the art for predicting sentence reading in psychological research (Kutas and Federmeier, <xref ref-type="bibr" rid="B40">2011</xref>; Staub, <xref ref-type="bibr" rid="B75">2015</xref>). Participants of a pre-experimental study are given a sentence with a missing word, and the relative number of participants completing the respective word are then taken to define CCP. This human performance is then used to account for another human performance such as reading. Westbury (<xref ref-type="bibr" rid="B81">2016</xref>), however, suggests that a to-be-explained variable, the explanandum, should be selected from a different domain than the explaining variable, the explanans (Hempel and Oppenheim, <xref ref-type="bibr" rid="B28">1948</xref>). When two directly observable variables, such as CCP and reading times, are connected, for instance Feigl (<xref ref-type="bibr" rid="B20">1945</xref>, p. 285) suggests that this corresponds to a &#x02018;&#x0201C;low-grade&#x00027; explanation.&#x0201D; Models of eye movement control, however, were &#x0201C;not intended to be a deep explanation of language processing, [&#x02026; because they do] not account for the many effects of higher-level linguistic processing on eye movements&#x0201D; (Reichle et al., <xref ref-type="bibr" rid="B66">2003</xref>, p. 450).</p>
<p>Language models offer a deeper level of explanation, because they computationally specify how the prediction is generated. Therefore, they incorporate what can be called the three mnestic stages of the mental lexicon (cf. Paller and Wagner, <xref ref-type="bibr" rid="B58">2002</xref>; Hofmann et al., <xref ref-type="bibr" rid="B30">2018</xref>). All memory starts with experience, which is reflected by a text corpus (cf. Hofmann et al., <xref ref-type="bibr" rid="B32">2020</xref>). The language models provide an algorithmic description of how long-term lexical knowledge is consolidated from this experience (Landauer and Dumais, <xref ref-type="bibr" rid="B41">1997</xref>; Hofmann et al., <xref ref-type="bibr" rid="B30">2018</xref>). Based on the consolidated syntactic and semantic lexical knowledge, language models are then exposed to the same materials that participants read and thus predict lexical retrieval. In the present study, we evaluate their predictions for viewing times during sentence reading (e.g., Staub, <xref ref-type="bibr" rid="B75">2015</xref>).</p>
<p>We will compare CCP as a human-performance based explanation of reading against three types of language models. The probability that a word occurs, given two preceding words, is reflected in n-gram models, which capture syntactic and short-range semantic knowledge (cf. e.g., Kneser and Ney, <xref ref-type="bibr" rid="B39">1995</xref>; McDonald and Shillcock, <xref ref-type="bibr" rid="B48">2003a</xref>). This is a fully symbolic model, because the smallest unit of meaning representation consist of words. Second, we test topic models that are trained from word co-occurrence in documents, thus reflecting long-range semantics (Landauer and Dumais, <xref ref-type="bibr" rid="B41">1997</xref>; Blei et al., <xref ref-type="bibr" rid="B8">2003</xref>; Griffiths et al., <xref ref-type="bibr" rid="B26">2007</xref>; Pynte et al., <xref ref-type="bibr" rid="B61">2008a</xref>). Finally, recurrent neural networks (RNNs) most closely reflect the cloze completion procedure, because their hidden units are trained to predict a target word by all preceding words in a sentence (Elman, <xref ref-type="bibr" rid="B16">1990</xref>; Frank, <xref ref-type="bibr" rid="B21">2009</xref>; Mikolov, <xref ref-type="bibr" rid="B50">2012</xref>). In contrast to the n-gram model, topic and RNN models distribute the meaning of a word across several subsymbolic units that do not represent human-understandable meaning by themselves.</p>
<sec>
<title>Eye Movements and Cloze Completion Probabilities</title>
<p>While the eyes sweep over a sequence of words during reading, they remain relatively still for some time, which is generally called fixation duration (e.g., Inhoff and Radach, <xref ref-type="bibr" rid="B33">1998</xref>; Rayner, <xref ref-type="bibr" rid="B65">1998</xref>). A very fast and efficient word recognition is obtained when a word can be recognized at a single glance. In this type of fixation event, the single-fixation duration (SFD) informs about this rapid and successful lexical access. When further fixations are required to recognize the word before the eyes move on to the next word, the duration of these further fixations is added to the gaze duration (GD). This is an eye movement measure that reflects &#x0201C;standard&#x0201D; lexical access in all words, while it may also represent syntactic and semantic integration (cf. e.g., Inhoff and Radach, <xref ref-type="bibr" rid="B33">1998</xref>; Rayner, <xref ref-type="bibr" rid="B65">1998</xref>; Radach and Kennedy, <xref ref-type="bibr" rid="B64">2013</xref>). Finally, the eyes may come back to the respective word, and when the fixation times of these further fixations are added, this is reflected in the total viewing time (TVT)&#x02014;an eye movement measure that reflects the full semantic integration of a word into the current language context (e.g., Radach and Kennedy, <xref ref-type="bibr" rid="B64">2013</xref>).</p>
<p>Though CCP is known to affect all three types of fixation times, the result patterns considerably vary between studies (see e.g., Frisson et al., <xref ref-type="bibr" rid="B24">2005</xref>; Staub, <xref ref-type="bibr" rid="B75">2015</xref>; Brothers and Kuperberg, <xref ref-type="bibr" rid="B10">2021</xref>, for reviews). A potential explanation is that CCP represents an all-in variable (Staub, <xref ref-type="bibr" rid="B75">2015</xref>). The cloze can be completed because the word is expected from syntactic, semantic and/or event-based information&#x02014;a term that refers to idiomatic expressions in very frequently co-occurring words (cf. Staub et al., <xref ref-type="bibr" rid="B76">2015</xref>).</p>
<p>By shedding light on the consolidation mechanisms, language models are expected to complement future models of eye-movement control, which do not provide a deep explanation to linguistic processes (Reichle et al., <xref ref-type="bibr" rid="B66">2003</xref>, p. 450). Models of eye-movement control, however, provide valuable insights how lexical access and eye-movements interact. These models assume that lexical access is primarily driven by word length, frequency and CCP-based predictability of the presently fixated word (e.g., Reichle et al., <xref ref-type="bibr" rid="B66">2003</xref>; Engbert et al., <xref ref-type="bibr" rid="B18">2005</xref>; Snell et al., <xref ref-type="bibr" rid="B73">2018</xref>). This reflects the simplifying eye-mind assumption, which &#x0201C;posits that the interpretation of a word occurs while that word is being fixated, and that the eye remains fixated on that word until the processing has been completed&#x0201D; (Just and Carpenter, <xref ref-type="bibr" rid="B34">1984</xref>, p. 169). Current models of eye-movement control, however, reject the idea that lexical processing exclusively occurs during the fixation of a word (Reichle et al., <xref ref-type="bibr" rid="B66">2003</xref>; Engbert et al., <xref ref-type="bibr" rid="B18">2005</xref>; see also Anderson et al., <xref ref-type="bibr" rid="B2">2004</xref>; Kliegl et al., <xref ref-type="bibr" rid="B38">2006</xref>). Lexical processing can still be underway when the eyes move on to the subsequent word, which can occur, for instance, if the first word is particularly difficult to process (Reilly and Radach, <xref ref-type="bibr" rid="B67">2006</xref>). Therefore, lexical processing of the last word can still have a considerable impact on the viewing times of the currently fixated word. Moreover, when a word is currently fixated at a point in time, lexical access of the next word can already start (Reilly and Radach, <xref ref-type="bibr" rid="B67">2006</xref>).</p>
<p>When trying to characterize the time course of single-word and contextual word properties, for instance the EZ-reader model suggests that there are two stages of lexical processing that are both influenced by word frequency and predictability. The first stage represents a &#x0201C;familiarity&#x0201D; check and the identification of an orthographic word form&#x02014;this stage is primarily driven by word frequency. The second stage additionally involves the (phonological and) semantic word form&#x02014;therefore, CCP has a stronger impact on this stage of processing. Please also note that attention can already shift to the next word, while the present word is fixated. When the next word is short, highly frequent and/or highly predictable, it can be skipped, and the saccade is programmed toward the word after the next word (Reichle et al., <xref ref-type="bibr" rid="B66">2003</xref>).</p>
</sec>
<sec>
<title>Language Models in Eye Movement Research</title>
<sec>
<title>Symbolic Representations in N-Gram Models</title>
<p>Symbolic n-gram models are so-called count-based models (Baroni et al., <xref ref-type="bibr" rid="B5">2014</xref>; Mandera et al., <xref ref-type="bibr" rid="B45">2017</xref>). Cases in which all n words co-occur are counted and related to the count of the preceding n-1 words in a text corpus. McDonald and Shillcock (<xref ref-type="bibr" rid="B48">2003a</xref>) were the first who tested whether a simple 2-gram model can predict eye movement data. They calculated the transitional probability that a word occurs at position n given the preceding word at position <italic>n</italic>-1. Then they paired preceding verbs with likely and less likely target nouns and showed significant effects on early SFD, but no effects on later GD (but see Frisson et al., <xref ref-type="bibr" rid="B24">2005</xref>). Effects on GD were subsequently revealed using multiple regression analyses of eye movements, suggesting that 2-gram models also account for lexical access in all words (McDonald and Shillcock, <xref ref-type="bibr" rid="B49">2003b</xref>; see also Demberg and Keller, <xref ref-type="bibr" rid="B14">2008</xref>). McDonald and Shillcock (<xref ref-type="bibr" rid="B49">2003b</xref>) discussed that the 2-gram transitional probability reflects a relatively low-level process, while it does probably not capture high-level conceptual knowledge, corroborating the assumption that n-gram models reflect syntactic and short-range semantic information. Boston et al. (<xref ref-type="bibr" rid="B9">2008</xref>) analyzed the viewing times in the Potsdam Sentence Corpus (PSC, Kliegl et al., <xref ref-type="bibr" rid="B37">2004</xref>) and found effects of transitional probability for all three fixation measures (SFD, GD, and TVT). Moreover, they found that these effects were descriptively larger than the CCP effects (see also Hofmann et al., <xref ref-type="bibr" rid="B29">2017</xref>).</p>
<p>Smith and Levy (<xref ref-type="bibr" rid="B72">2013</xref>, p. 303) examined larger sequences of words by using a 3-gram model to show last- and present-word probability effects on GD during discourse reading (Kneser and Ney, <xref ref-type="bibr" rid="B39">1995</xref>). Moreover, they showed that these n-gram probability effects are logarithmic (but cf. Brothers and Kuperberg, <xref ref-type="bibr" rid="B10">2021</xref>). For their statistical analyses, Smith and Levy (<xref ref-type="bibr" rid="B72">2013</xref>) selected generalized additive models (GAMs) that can well capture the phenomenon that a predictor may perform better or worse in certain range of the predictor variable. They showed that the 3-gram probability of the last word still has a considerable impact on the GDs of the current word. Therefore, this type of language model can well predict that contextual integration of the last word is still underway at the fixation of the current word. Of some interest is that Smith and Levy (<xref ref-type="bibr" rid="B72">2013</xref>) suggest that CCP may predict reading performance well, when comparing extremely predictable with extremely unpredictable words. Hofmann et al. (<xref ref-type="bibr" rid="B29">2017</xref>, e.g., <bold>Figure 3</bold>), however, provide data showing that a 3-gram model may provide more accurate predictions at the lower end of the predictability distribution.</p>
</sec>
<sec>
<title>Latent Semantic Dimensions</title>
<p>The best-known computational approach to semantics in psychology is probably latent semantic analysis (LSA, Landauer and Dumais, <xref ref-type="bibr" rid="B41">1997</xref>). A factor-analytic-inspired approach is used to compute latent semantic dimensions that determine which words do frequently occur together in documents. This allows to address the long-range similarity of words and sentences by calculating the cosine distance (Deerwester et al., <xref ref-type="bibr" rid="B13">1990</xref>). Wang et al. (<xref ref-type="bibr" rid="B80">2010</xref>) addressed the influence of transitional probability and LSA similarity of the target to the preceding content word. They found that transitional probability predicts lexical access, while the long-range semantics reflected by LSA particularly predicts late semantic integration [but see Pynte et al. (<xref ref-type="bibr" rid="B61">2008a</xref>,<xref ref-type="bibr" rid="B62">b</xref>) for LSA effects on SFD and GD, and Luke and Christianson (<xref ref-type="bibr" rid="B44">2016</xref>) for LSA effects on TVT, GD, and even earlier viewing time measures].</p>
<p>In a recent study, Bianchi et al. (<xref ref-type="bibr" rid="B6">2020</xref>) contrasted the GD predictions of an n-gram model with the predictions of an LSA-based match of the current word with the preceding nine words during discourse reading. They found that LSA did not provide effects over and above the n-gram model. The LSA-based predictions improved, however, when further adding the LSA-based contextual match of the next word. This indicates that such a document-level, long-range type of semantics might be particularly effective when taking the predictabilities of the non-fixated words into account.</p>
<p>LSA has been challenged by another dimension-reducing approach in accounting for eye movement data. Blei et al. (<xref ref-type="bibr" rid="B8">2003</xref>) introduced the topic model as a Bayesian, mere probabilistic language modeling alternative. Much as LSA, topic models are trained to reflect long-range relations based on the co-occurrence of words in documents. Griffiths et al. (<xref ref-type="bibr" rid="B26">2007</xref>) showed that topic models provide better model performance than LSA in many psychological tasks, such as synonym judgment or semantic priming. They calculated the probability of a word to occur, given topical matches with the preceding words in the sentence. This topic model-based predictor, but not LSA cosine accounted for Sereno&#x00027;s et al. (<xref ref-type="bibr" rid="B70">1992</xref>) finding that GDs and TVTs of a subordinate meaning are larger than in a frequency-matched non-ambiguous word (Sereno et al., <xref ref-type="bibr" rid="B70">1992</xref>).</p>
<p>Though Hofmann et al. (<xref ref-type="bibr" rid="B29">2017</xref>) also found topic model effects on SFD data, their results suggested that long-range semantics provides comparably poor predictions. The short-range semantics and syntax provided by n-gram models, in contrast, provided a much better performance, particularly when the language models are trained by a corpus consisting of movie and film subtitles. In sum, the literature on document-level semantics presently provides no consistent picture. Long-range semantic effects might be comparably small (e.g., Hofmann et al., <xref ref-type="bibr" rid="B29">2017</xref>), but they may be more likely to deliver consistent results when the analysis is not constrained to the long-range contextual match of the present, but also of other words (Bianchi et al., <xref ref-type="bibr" rid="B6">2020</xref>). A more consistent picture might emerge, when also short-range predictability is considered, as reflected e.g., in n-gram models (Wang et al., <xref ref-type="bibr" rid="B80">2010</xref>; Bianchi et al., <xref ref-type="bibr" rid="B6">2020</xref>).</p>
</sec>
<sec>
<title>(Recurrent) Neural Networks</title>
<p>Neural network models are deeply rooted in the tradition of connectionist modeling (e.g., Seidenberg and McClelland, <xref ref-type="bibr" rid="B69">1989</xref>; McClelland and Rogers, <xref ref-type="bibr" rid="B47">2003</xref>). In the last decade, these models were advanced in the machine learning community to successfully recognize pictures or machine translation (e.g., LeCun et al., <xref ref-type="bibr" rid="B42">2015</xref>). In the processing of word stimuli, one of the most well-known of these models is the word2vec model, in which a set of hidden units is for instance trained to predict the surrounding words by the present word (Mikolov et al., <xref ref-type="bibr" rid="B51">2013</xref>). This model is able to predict association ratings (Hofmann et al., <xref ref-type="bibr" rid="B30">2018</xref>) or semantic priming (e.g., Mandera et al., <xref ref-type="bibr" rid="B45">2017</xref>). The neural network that most closely approximates the cloze task, however, is the recurrent neural network model (RNN), because it is trained to predict the next word by the preceding sentence context. In RNN models, words are presented at an input layer, and a set of hidden units is trained to predict the probability of the next word at the output layer (Elman, <xref ref-type="bibr" rid="B16">1990</xref>). The hidden layer is copied to a (recurrent) context layer after the presentation of each word. Thus, the network gains a computationally concrete form of short-term memory (Mikolov et al., <xref ref-type="bibr" rid="B51">2013</xref>). Such a network provides large hidden-unit cosine distances between syntactic classes such as verbs and nouns, lower between non-living and living objects, and even lower between mammals and fishes, suggesting that RNNs reflect syntactic and short-range semantic information at the level of the sentence (Elman, <xref ref-type="bibr" rid="B16">1990</xref>). Frank and Bod (<xref ref-type="bibr" rid="B22">2011</xref>) show that RNNs can account for syntactic effects in viewing times, because they absorb variance previously explainable by a hierarchical phrase-structure approach.</p>
<p>Frank (<xref ref-type="bibr" rid="B21">2009</xref>) used a simple RNN to successfully predict GDs during discourse reading. When adding transitional probability to their multiple regression analyses, both predictors revealed significant effects. Such a result demonstrates that prediction-based models such as RNNs and count-based n-gram models probably reflect different types of &#x0201C;predictability.&#x0201D; Hofmann et al. (<xref ref-type="bibr" rid="B29">2017</xref>) showed that an n-gram model, a topic model, and an RNN model together can significantly outperform CCP for the prediction of SFD. It is, however, unclear whether this finding can be replicated in a different data set and generalized to other viewing time measures.</p>
<p>Some recent studies compared other types of neural network models to CCP (Bianchi et al., <xref ref-type="bibr" rid="B6">2020</xref>; Wilcox et al., <xref ref-type="bibr" rid="B82">2020</xref>; Lopukhina et al., <xref ref-type="bibr" rid="B43">2021</xref>). For example, Bianchi et al. (<xref ref-type="bibr" rid="B6">2020</xref>) explored the usefulness of word2vec. Because they did not find stable word2vec predictions for eye movement data, they decided against a closer examination of this approach. Rather they relied on fasttext&#x02014;another non-recurrent neural model, in which the hidden units are trained to predict the present word by the surrounding language context (Mikolov et al., <xref ref-type="bibr" rid="B52">2018</xref>). Moreover, Bianchi et al. (<xref ref-type="bibr" rid="B6">2020</xref>) evaluated the performance of an n-gram model and LSA. When comparing the performance of these language models, they obtained the most reliable GD predictions for their n-gram model, followed by CCP, while LSA and fasttext provided relatively poor predictions. In sum, studies comparing CCP to language models support the view that CCP-based and language-model-based predictors account for different though partially overlapping variances in eye-movement data (Bianchi et al., <xref ref-type="bibr" rid="B6">2020</xref>; Lopukhina et al., <xref ref-type="bibr" rid="B43">2021</xref>) that seem related to syntactic, as well as early and late semantic processing during reading.</p>
</sec>
<sec>
<title>The Present Study</title>
<p>The present study was designed to overcome the limitations of the pilot study of Hofmann et al. (<xref ref-type="bibr" rid="B29">2017</xref>), which compared an n-gram, a topics and an RNN model with respect to the prediction of CCP, electrophysiological and SFD data in only the PSC data set. They found that RNN models and n-gram models provide a similar performance in predicting these data, while the topics model made remarkably worse predictions. In the present study, we focused on eye movements and aimed to replicate the SFD effects with a second sample, which was published by Schilling et al. (<xref ref-type="bibr" rid="B68">1998</xref>). Moreover, we aimed to examine the dynamics of lexical processing. By modeling a set of three viewing time parameters (SFD, GD and TVT), we will be able to compare the predictions of CCP and different language models for early rapid (SFD) and standard (GD) lexical access, and their predictions for full semantic integration (TVT). In their linear multiple regression analysis on item-level data, Hofmann et al. (<xref ref-type="bibr" rid="B29">2017</xref>) found that the three language models together account for around 30% of reproducible variance in SFD data&#x02014;as opposed to 18% for the CCP model. Though the three language models together significantly outperformed the CCP-based approach, they used Fisher-Yates significance <italic>z</italic>-to-<italic>t</italic>-tests as a conservative approach, because aggregating over items results in a strong loss of variance. Therefore, n-gram and RNN models alone outperformed CCP always at a descriptive level, but the differences were not significant. Here we applied a model comparison approach to evaluate the model fit in comparison to a baseline model, using standard log likelihood tests (e.g., Baayen et al., <xref ref-type="bibr" rid="B4">2008</xref>). This approach will also test the assumptions of different short- and long-range syntactic and semantic processes that we expect to be reflected by the parameters of the three different language models selected.</p>
<p>Such a statistical approach, however, is based on unaggregated data. As Lopukhina et al. (<xref ref-type="bibr" rid="B43">2021</xref>) pointed out, the predictability effects in such analyses are relatively small. For instance Kliegl et al. (<xref ref-type="bibr" rid="B38">2006</xref>; cf. <bold>Table 5</bold>) found that CCP can account for up to 0.38% of the variance in viewing times &#x02013; thus it is important to evaluate the usefulness of language models in highly powered samples. On the other hand, a smaller sample reflects a more typical experimental situation. The present study was designed to replicate and extend previous analyses of viewing time parameters using two independent eye-movement data sets, a very large sample of CCP and eye movement data, the PSC, and a sample that is more typical for eye movement experiments, the SRC.</p>
<p>In addition to a simple item-level analysis as a standard benchmark for visual word recognition models (Spieler and Balota, <xref ref-type="bibr" rid="B74">1997</xref>), that were applied more thoroughly in a previous set of analyses (Hofmann et al., <xref ref-type="bibr" rid="B29">2017</xref>), we here applied Smith and Levy&#x00027;s (<xref ref-type="bibr" rid="B72">2013</xref>) generalized additive modeling approach with a logarithmic link function (but cf. Brothers and Kuperberg, <xref ref-type="bibr" rid="B10">2021</xref>). The computed GAMs rely on fixation-event-level viewing time parameters as the dependent variables. We used a standard set of baseline predictors for reading and lexical access, and then extended this baseline model by CCP- and/or language-model-based predictors for the present, last and next words. To test for reproducibility, our analyses will be based on the two eye-movement data sets that are most frequently used for testing models of eye-movement control: the EZ-reader model (Reichle et al., <xref ref-type="bibr" rid="B66">2003</xref>) was tested with the SRC data set; and the SWIFT and the OB-1 reader models were used to predict viewing times in the PSC (Kliegl et al., <xref ref-type="bibr" rid="B37">2004</xref>; cf. Engbert et al., <xref ref-type="bibr" rid="B18">2005</xref>; Snell et al., <xref ref-type="bibr" rid="B73">2018</xref>). GAMs are non-linear extensions of the generalized linear models that allow predictors to be modeled as a sum of smooth functions and therefore allow better adaptations to curvilinear and wiggly predictor-criterion relationships (Wood, <xref ref-type="bibr" rid="B83">2017</xref>).</p>
<p>Different language models are expected to explain differential and independent proportions of variance in the viewing time parameters. While an n-gram model reflects short-range semantics, we expect it to be predictor of all viewing time measures (e.g., Boston et al., <xref ref-type="bibr" rid="B9">2008</xref>). A subsymbolic topic model that reflects long-range semantics should be preferred over the other language models in predicting GD and TVT and semantic integration into memory (Sereno et al., <xref ref-type="bibr" rid="B70">1992</xref>; Griffiths et al., <xref ref-type="bibr" rid="B26">2007</xref>), particularly when other forms of predictability are additionally taken into account (Wang et al., <xref ref-type="bibr" rid="B80">2010</xref>; Bianchi et al., <xref ref-type="bibr" rid="B6">2020</xref>). Previous studies examining RNN models found effects on SFD and GD (e.g., Frank, <xref ref-type="bibr" rid="B21">2009</xref>; Hofmann et al., <xref ref-type="bibr" rid="B29">2017</xref>). Thus, it is an open empirical question whether predict-based models do not only affect lexical access, but also late semantic integration. As these models are trained to predict the next word, they may be particularly useful to examine early lexical preprocessing of the next word.</p>
</sec>
</sec>
</sec>
<sec id="s2">
<title>Method</title>
<sec>
<title>Language Model Simulations</title>
<p>All language models were trained by <bold>corpora</bold> derived from movie and film subtitles.<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> The English Subtitles training corpus consisted of 110 thousand films and movies that were used for document-level training of the topic models. We used the 128 million utterances as the sentence-level, in order to train the n-gram and RNN models in the English corpus, which in all consisted of 716 million tokens. The German corpus consisted of 7 thousand movies, 7 million utterances/sentences comprising of 54 million tokens.</p>
<p>Statistical <bold>n-gram</bold> models for words are defined by a sequence of <italic>n</italic> words, in which the probability of the <italic>n</italic><sup>th</sup> word depends on a Markov chain of the previous <italic>n</italic>-1 words (see, e.g., Chen and Goodman, <xref ref-type="bibr" rid="B12">1999</xref>; Manning and Sch&#x000FC;tze, <xref ref-type="bibr" rid="B46">1999</xref>). Here we set <italic>n</italic> = 3 and thus computed the conditional probability of a word <italic>w</italic><sub><italic>n</italic></sub>, given the two previous words (<italic>w</italic><sub><italic>n</italic>&#x02212;1</sub> &#x02026;<italic> w</italic><sub>1</sub>; Smith and Levy, <xref ref-type="bibr" rid="B72">2013</xref>).</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02026;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02026;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02026;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We used Kneser-Ney-smoothed 3-gram models, relying on the BerkleyLM implementation (Pauls and Klein, <xref ref-type="bibr" rid="B59">2011</xref>).<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> These models were trained by the subtitles corpora to capture lexical memory consolidation (cf. Hofmann et al., <xref ref-type="bibr" rid="B30">2018</xref>). For modeling lexical retrieval, we computed the conditional probabilities for the sentences presented in the SRC and the PSC data set (cf. below). Since n-gram models only rely on the most recent history for predicting the next word, they fail to account for longer-range phenomena and semantic coherence (see Biemann et al., <xref ref-type="bibr" rid="B7">2012</xref>).</p>
<p>For training the <bold>topic</bold> models, we used the procedure by Griffiths and Steyvers (2004), who infer per-topic word distributions and per-document topic distributions through a Gibbs sampling process. The empirically observable probability of a word <italic>w</italic> to occur in a document <italic>d</italic> is thus approximated by the sum of the products of the probabilities a word, given the respective topic <italic>z</italic>, and the topic, given the respective word:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02026;</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>*</mml:mo><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Therefore, words frequently co-occurring in the same documents receive a high probability in the same topic. We use Phan and Nguyen&#x00027;s (<xref ref-type="bibr" rid="B60">2007</xref>) Gibbs-LDA implementation<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref> for training a latent dirichlet allocation (LDA) model with <italic>N</italic> = 200 topics (default values for &#x003B1; = 0.25 and &#x003B2; = 0.001; Blei et al., <xref ref-type="bibr" rid="B8">2003</xref>). The per-document topic distributions are trained in form of a topic-document matrix [<italic>p(z</italic><sub><italic>i</italic></sub><italic>|d)</italic>], allowing to classify documents by topical similarities, and used for inference of new (unseen) &#x0201C;documents&#x0201D; at retrieval.</p>
<p>For modeling lexical retrieval of the SRC and PSC text samples, we successively iterate over the words of the particular sentence and create a new LDA document representation <italic>d</italic> for each word at time <italic>i</italic> and its entire history of words in the same sentence:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02026;</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>In this case, <italic>d</italic> refers to the history of the current sentence including the current word <italic>w</italic><sub><italic>i</italic></sub>, where we are only interested in the probability of <italic>w</italic><sub><italic>i</italic></sub>. We here computed the probabilities of the current word <italic>w</italic><sub><italic>i</italic></sub> given its history as a mixture of its topical components (cf. Griffiths et al., <xref ref-type="bibr" rid="B26">2007</xref>, p. 231f), and thus address the topical matches of the present word with the preceding words in the sentence context.</p>
<p>For the <bold>RNN</bold> model simulations, we relied on the faster RNN-LM implementation<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref>, which can be trained on huge data sets and very large vocabularies (cf. Mikolov, <xref ref-type="bibr" rid="B50">2012</xref>). The input and target output units consist of so-called one-hot vectors with one entry for each word in the lexicon of this model. If the respective word is present, the entry corresponds to 1, while the entries remain 0 for all other words. At the input level, the entire sentence history is given word-by-word and the models objective is to predict the probability of the next word at the output level. Therefore, the connection weights of the input and output layer to the hidden layer are optimized. At model initialization, all weights are assigned randomly. As soon as the first word is presented to the input layer, the output probability of the respective word unit is compared to the actual word, and the larger the difference, the larger will be the connection weight change (i.e., backpropagation by a simple delta rule). When the second word of a sentence then serves as input, the state of the hidden layer after the first word is copied to a context layer (cf. <bold>Figure 2</bold> in Elman, <xref ref-type="bibr" rid="B16">1990</xref>). This (recurrent) context layer is used to inform the current prediction. Therefore, the RNN obtains a form of short-term memory (Mikolov, <xref ref-type="bibr" rid="B50">2012</xref>; cf. Mikolov et al., <xref ref-type="bibr" rid="B51">2013</xref>). We trained a model with 400 hidden units and used the hierarchical softmax provided by faster-RNN with a temperature of 0.6, using a sigmoid activation function for all layers. For computing lexical retrieval, we used the entire history of a sentence up to the current word and computed the probability for that particular word.</p>
</sec>
<sec>
<title>Cloze Completion and Eye Movement Data</title>
<p>The CCP and eye movement data of the SRC and the PSC were retrieved from Engelmann et al. (<xref ref-type="bibr" rid="B19">2013</xref>).<xref ref-type="fn" rid="fn0005"><sup>5</sup></xref> The SRC data set contains incremental cloze task and eye movement data for 48 sentences and 536 words that were initially published by Schilling et al. (<xref ref-type="bibr" rid="B68">1998</xref>). The PSC data set provides the same data for 144 sentences and 1,138 words (Kliegl et al., <xref ref-type="bibr" rid="B37">2004</xref>, <xref ref-type="bibr" rid="B38">2006</xref>).</p>
<p>The sentence length of the PSC ranges from 5 to 11 words (<italic>M</italic> = 7.9; <italic>SD</italic> = 1.39) and from 8 to 14 words in the SRC (<italic>M</italic> = 11.17; <italic>SD</italic> = 1.36). As last-word probability cannot be computed for the first word in a sentence, and next-word probability cannot be computed for the last word of a sentence, we excluded fixation durations on the first and the last words of each sentence from analyses. Four words of the PSC (e.g., &#x0201C;Andend&#x000F6;rfern,&#x0201D; villages of the Andes) did not occur in the training corpus and were excluded from analyses. This resulted in the 440 target words for the SRC and the 846 target words for the PSC analyses. The respective participant sample sizes and number of sentences are summarized in <xref ref-type="table" rid="T1">Table 1</xref> (see Schilling et al., <xref ref-type="bibr" rid="B68">1998</xref>; Kliegl et al., <xref ref-type="bibr" rid="B37">2004</xref>, <xref ref-type="bibr" rid="B38">2006</xref>, for further details). <xref ref-type="table" rid="T2">Table 2</xref> shows example sentences, in which one type of predictability is higher than the other predictability scores. In general, CCP distributes the probability space across a much smaller number of potential completion candidates. Therefore, the mean probabilities are comparably high (SRC: <italic>p</italic> = 0.3; PSC: <italic>p</italic> = 0.2). The mean of the computed predictability scores, in contrast, provide 2&#x02013;3 leading zeros. Moreover, the computed predictability scores by far provide greater probability ranges.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Overview about cloze completion and eye movement (EM) data used for the present study.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Data set</bold></th>
<th valign="top" align="center"><bold>Sentences</bold></th>
<th valign="top" align="center"><bold>Targets</bold></th>
<th valign="top" align="center"><bold>Language</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Participants</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Rows of data in analysis</bold></th>
</tr>
<tr>
<th/>
<th/>
<th/>
<th/>
<th valign="top" align="center"><bold>CCP</bold></th>
<th valign="top" align="center"><bold>EM</bold></th>
<th valign="top" align="center"><bold>SFD</bold></th>
<th valign="top" align="center"><bold>GD</bold></th>
<th valign="top" align="center"><bold>TVT</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">SRC</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">440</td>
<td valign="top" align="center">English</td>
<td valign="top" align="center">20</td>
<td valign="top" align="center">30</td>
<td valign="top" align="center">6,451</td>
<td valign="top" align="center">8,671</td>
<td valign="top" align="center">8,736</td>
</tr>
<tr>
<td valign="top" align="left">PSC</td>
<td valign="top" align="center">144</td>
<td valign="top" align="center">846</td>
<td valign="top" align="center">German</td>
<td valign="top" align="center">272</td>
<td valign="top" align="center">222</td>
<td valign="top" align="center">100,975</td>
<td valign="top" align="center">134,835</td>
<td valign="top" align="center">135,021</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Example sentences and the probabilities of the four types of predictability.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="center" colspan="5" style="border-bottom: thin solid #000000;"><bold>SRC</bold></th>
<th valign="top" align="center" colspan="5" style="border-bottom: thin solid #000000;"><bold>PSC</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Word</bold></th>
<th valign="top" align="center"><bold>CCP</bold></th>
<th valign="top" align="center"><bold>N-gram</bold></th>
<th valign="top" align="center"><bold>Topic</bold></th>
<th valign="top" align="center"><bold>RNN</bold></th>
<th valign="top" align="left"><bold>Word</bold></th>
<th valign="top" align="center"><bold>CCP</bold></th>
<th valign="top" align="center"><bold>N-gram</bold></th>
<th valign="top" align="center"><bold>Topic</bold></th>
<th valign="top" align="center"><bold>RNN</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Bill</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">1e-4</td>
<td valign="top" align="center">1e-3</td>
<td valign="top" align="center">2e-5</td>
<td valign="top" align="left">In</td>
<td valign="top" align="center">1e-2</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">3e-2</td>
<td valign="top" align="center">4e-3</td>
</tr>
<tr>
<td valign="top" align="left">complained</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">3e-6</td>
<td valign="top" align="center">1e-4</td>
<td valign="top" align="center">1e-6</td>
<td valign="top" align="left">der</td>
<td valign="top" align="center">7e-1</td>
<td valign="top" align="center">1e-1</td>
<td valign="top" align="center">1e-2</td>
<td valign="top" align="center">1e-1</td>
</tr>
<tr>
<td valign="top" align="left">that</td>
<td valign="top" align="center">3e-1</td>
<td valign="top" align="center">1e-1</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">6e-2</td>
<td valign="top" align="left">Klosterschule</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">2e-6</td>
<td valign="top" align="center">4e-5</td>
<td valign="top" align="center">4e-5</td>
</tr>
<tr>
<td valign="top" align="left">the</td>
<td valign="top" align="center">2e-1</td>
<td valign="top" align="center">1e-1</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">1e-2</td>
<td valign="top" align="left">herrschen</td>
<td valign="top" align="center">2e-2</td>
<td valign="top" align="center">1e-6</td>
<td valign="top" align="center">8e-4</td>
<td valign="top" align="center">3e-4</td>
</tr>
<tr>
<td valign="top" align="left">magazine</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">1e-4</td>
<td valign="top" align="center">3e-4</td>
<td valign="top" align="center">3e-5</td>
<td valign="top" align="left"><bold>Schwester</bold></td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">5e-5</td>
<td valign="top" align="center"><bold>4e-2</bold></td>
<td valign="top" align="center">1e-9</td>
</tr>
<tr>
<td valign="top" align="left">included</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">3e-7</td>
<td valign="top" align="center">2e-4</td>
<td valign="top" align="center">1e-5</td>
<td valign="top" align="left">Agathe</td>
<td valign="top" align="center">1e-2</td>
<td valign="top" align="center">7e-7</td>
<td valign="top" align="center">1e-4</td>
<td valign="top" align="center">1e-8</td>
</tr>
<tr>
<td valign="top" align="left">more</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">4e-4</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">6e-4</td>
<td valign="top" align="left">und</td>
<td valign="top" align="center">9e-1</td>
<td valign="top" align="center">5e-3</td>
<td valign="top" align="center">4e-3</td>
<td valign="top" align="center">4e-2</td>
</tr>
<tr>
<td valign="top" align="left">adds</td>
<td valign="top" align="center"><bold>4e-1</bold></td>
<td valign="top" align="center">6e-7</td>
<td valign="top" align="center">2e-5</td>
<td valign="top" align="center">1e-8</td>
<td valign="top" align="left">Schwester</td>
<td valign="top" align="center">5e-1</td>
<td valign="top" align="center">1e-4</td>
<td valign="top" align="center">2e-2</td>
<td valign="top" align="center">8e-5</td>
</tr>
<tr>
<td valign="top" align="left">than</td>
<td valign="top" align="center">9e-1</td>
<td valign="top" align="center">2e-4</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">1e-3</td>
<td valign="top" align="left">Maria</td>
<td valign="top" align="center">1e-1</td>
<td valign="top" align="center">5e-4</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">2e-3</td>
</tr>
<tr>
<td valign="top" align="left">articles</td>
<td valign="top" align="center">8e-1</td>
<td valign="top" align="center">4e-6</td>
<td valign="top" align="center">5e-5</td>
<td valign="top" align="center">4e-6</td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">The</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">6e-4</td>
<td valign="top" align="center">3e-3</td>
<td valign="top" align="center">2e-2</td>
<td valign="top" align="left">Er</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">1e-2</td>
<td valign="top" align="center">2e-2</td>
<td valign="top" align="center">2e-2</td>
</tr>
<tr>
<td valign="top" align="left">drunk</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">6e-5</td>
<td valign="top" align="center">9e-4</td>
<td valign="top" align="center">2e-5</td>
<td valign="top" align="left">h&#x000E4;tte</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">5e-3</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">2e-3</td>
</tr>
<tr>
<td valign="top" align="left"><bold>driver</bold></td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center"><bold>2e-2</bold></td>
<td valign="top" align="center">2e-4</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="left">nicht</td>
<td valign="top" align="center">2e-2</td>
<td valign="top" align="center">4e-2</td>
<td valign="top" align="center">2e-2</td>
<td valign="top" align="center">3e-2</td>
</tr>
<tr>
<td valign="top" align="left">lost</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">6e-6</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">1e-5</td>
<td valign="top" align="left">auch</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">3e-4</td>
<td valign="top" align="center">9e-4</td>
<td valign="top" align="center">4e-3</td>
</tr>
<tr>
<td valign="top" align="left">control</td>
<td valign="top" align="center">4e-1</td>
<td valign="top" align="center">4e-1</td>
<td valign="top" align="center">3e-3</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="left">noch</td>
<td valign="top" align="center">7e-1</td>
<td valign="top" align="center">1e-1</td>
<td valign="top" align="center">1e-3</td>
<td valign="top" align="center">3e-2</td>
</tr>
<tr>
<td valign="top" align="left">crashed</td>
<td valign="top" align="center">5e-2</td>
<td valign="top" align="center">9e-8</td>
<td valign="top" align="center">1e-4</td>
<td valign="top" align="center">4e-7</td>
<td valign="top" align="left">am</td>
<td valign="top" align="center">1e-2</td>
<td valign="top" align="center">3e-3</td>
<td valign="top" align="center">3e-3</td>
<td valign="top" align="center">5e-3</td>
</tr>
<tr>
<td valign="top" align="left">into</td>
<td valign="top" align="center">4e-1</td>
<td valign="top" align="center">9e-2</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">1e-1</td>
<td valign="top" align="left"><bold>Telefon</bold></td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">4e-3</td>
<td valign="top" align="center">4e-3</td>
<td valign="top" align="center"><bold>4e-2</bold></td>
</tr>
<tr>
<td valign="top" align="left">a</td>
<td valign="top" align="center">6e-1</td>
<td valign="top" align="center">2e-1</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">1e-1</td>
<td valign="top" align="left">n&#x000F6;rgeln</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">8e-8</td>
<td valign="top" align="center">5e-5</td>
<td valign="top" align="center">6e-8</td>
</tr>
<tr>
<td valign="top" align="left">street</td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">9e-4</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">4e-3</td>
<td valign="top" align="left">sollen</td>
<td valign="top" align="center">7e-1</td>
<td valign="top" align="center">2e-4</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">6e-4</td>
</tr>
<tr>
<td valign="top" align="left">sign</td>
<td valign="top" align="center">6e-1</td>
<td valign="top" align="center">2e-2</td>
<td valign="top" align="center">3e-3</td>
<td valign="top" align="center">6e-4</td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">and</td>
<td valign="top" align="center">8e-1</td>
<td valign="top" align="center">1e-3</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">3e-3</td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">died</td>
<td valign="top" align="center">7e-1</td>
<td valign="top" align="center">3e-5</td>
<td valign="top" align="center">2e-3</td>
<td valign="top" align="center">4e-4</td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left"><italic>M</italic></td>
<td valign="top" align="center">3e-1</td>
<td valign="top" align="center">5e-2</td>
<td valign="top" align="center">1e-3</td>
<td valign="top" align="center">3e-2</td>
<td valign="top" align="left"><italic>M</italic></td>
<td valign="top" align="center">2e-1</td>
<td valign="top" align="center">2e-2</td>
<td valign="top" align="center">8e-3</td>
<td valign="top" align="center">1e-2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>SD</italic></td>
<td valign="top" align="center">4e-1</td>
<td valign="top" align="center">1e-1</td>
<td valign="top" align="center">1e-3</td>
<td valign="top" align="center">7e-2</td>
<td valign="top" align="left"><italic>SD</italic></td>
<td valign="top" align="center">3e-1</td>
<td valign="top" align="center">7e-2</td>
<td valign="top" align="center">2e-2</td>
<td valign="top" align="center">3e-2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Min</italic></td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">1e-9</td>
<td valign="top" align="center">3e-6</td>
<td valign="top" align="center">2e-10</td>
<td valign="top" align="left"><italic>Min</italic></td>
<td valign="top" align="center">6e-3</td>
<td valign="top" align="center">1e-10</td>
<td valign="top" align="center">2e-6</td>
<td valign="top" align="center">4e-13</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Max</italic></td>
<td valign="top" align="center">1e&#x0002B;0</td>
<td valign="top" align="center">1e&#x0002B;0</td>
<td valign="top" align="center">2e-2</td>
<td valign="top" align="center">5e-1</td>
<td valign="top" align="left"><italic>Max</italic></td>
<td valign="top" align="center">1e&#x0002B;0</td>
<td valign="top" align="center">9e-1</td>
<td valign="top" align="center">2e-1</td>
<td valign="top" align="center">5e-1</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Examples sentences were selected to illustrate one case, in which one type of predictability is particularly high (bold). Translations (PSC): In the convent school, nun Agathe, and nun Maria rule (upper). He should not have moaned at the telephone, as well (lower sentence)</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>To compute SRC-based <italic>CCP</italic> scores comparable to the PSC (Kliegl et al., <xref ref-type="bibr" rid="B38">2006</xref>), we used the empirical cloze completion probabilities (<italic>ccp</italic>) and logit-transformed them (<italic>CCP</italic> in formula 4). Because Kliegl&#x00027;s et al. (<xref ref-type="bibr" rid="B37">2004</xref>) sample was based on 83 complete predictability protocols, cloze completion probabilities of 0 and 1 were replaced by 1/(2<sup>&#x0002A;</sup>83) and 1&#x02212;[1/(2<sup>&#x0002A;</sup>83)] for the SRC, to obtain the same extreme values.</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>C</mml:mi><mml:mi>C</mml:mi><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:msup><mml:mrow><mml:mn>5</mml:mn></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mtext>&#x000A0;</mml:mtext><mml:mo class="qopname">log</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Since lexical processing efficiency varies with landing position of the eye within a word (e.g., O&#x00027;Regan and Jacobs, <xref ref-type="bibr" rid="B56">1992</xref>; Vitu et al., <xref ref-type="bibr" rid="B78">2001</xref>), we computed relative landing positions by dividing the landing letter by the word length. The optimal viewing position is usually slightly left to the middle of the word, granting optimal visual processing of the word (e.g., Nuthmann et al., <xref ref-type="bibr" rid="B54">2005</xref>). Therefore, we will use the landing position as a covariate to partial out variance explainable by suboptimal landing positions (cf. e.g., Vitu et al., <xref ref-type="bibr" rid="B78">2001</xref>; Kliegl et al., <xref ref-type="bibr" rid="B38">2006</xref>; Pynte et al., <xref ref-type="bibr" rid="B62">2008b</xref>). For all eye movement measures, we excluded fixation durations below 70 ms (e.g., Radach et al., <xref ref-type="bibr" rid="B63">2013</xref>). The upper cutoff was defined by examining the data distributions and excluding the range in which only a few trials remained for analyses. We excluded durations 800 ms or greater for SFD (21 fixation durations for SRC and 13 for PSC), 1,200 ms for GD (12 for SRC and 0 for PSC), and 1,600 ms for TVT analyses (7 for SRC and 0 for PSC). This resulted in the row numbers used for the respective analyses given in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
</sec>
<sec>
<title>Data Analysis</title>
<p>First, we calculated simple linear item-level correlations between the predictor variables and the mean SFD, GD and TVT data (see <xref ref-type="table" rid="T3">Table 3</xref>). In addition to the logit-transformed CCPs and the log<sub>10</sub>-transformed language model probabilities (Kliegl et al., <xref ref-type="bibr" rid="B38">2006</xref>; Smith and Levy, <xref ref-type="bibr" rid="B72">2013</xref>), we also explored the correlations of the non-transformed probability values with SFD, GD and TVT data, respectively: In the SRC data set, CCP provided correlations of &#x02212;0.28, &#x02212;0.33, and &#x02212;0.39; n-gram models of &#x02212;0.11, &#x02212;0.16 and &#x02212;0.21; topic models of &#x02212;0.35, &#x02212;0.47 and &#x02212;0.52; and RNN models provided correlations of &#x02212;0.16, &#x02212;0.23, and &#x02212;0.25, respectively. In the PSC data set, the SFD, GD and TVT correlations with CCP were &#x02212;0.20, &#x02212;0.26, &#x02212;0.31; those of n-gram models were &#x02212;0.16, &#x02212;0.18 and &#x02212;0.19; topics models provided correlations of &#x02212;0.19, &#x02212;0.18 and &#x02212;0.17; and RNN models of &#x02212;0.19, &#x02212;0.21, &#x02212;0.22. In sum, the transformed probabilities always provided higher correlations with all fixation durations than the untransformed probabilities (cf. <xref ref-type="table" rid="T3">Table 3</xref>). Therefore, the present analyses focus on transformed values.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Correlations between word properties for the SRC (below diagonal) and PSC (above diagonal), and the item-level means of the SFD, GD, and TVT data of the present word.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>1</bold></th>
<th valign="top" align="center"><bold>2</bold>.</th>
<th valign="top" align="center"><bold>3</bold></th>
<th valign="top" align="center"><bold>4</bold></th>
<th valign="top" align="center"><bold>5</bold></th>
<th valign="top" align="center"><bold>6</bold></th>
<th valign="top" align="center"><bold>7</bold></th>
<th valign="top" align="center"><bold>8</bold></th>
<th valign="top" align="center"><bold>9</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1. Length</td>
<td/>
<td valign="top" align="center">&#x02212;0.62</td>
<td valign="top" align="center">&#x02212;0.40</td>
<td valign="top" align="center">&#x02212;0.47</td>
<td valign="top" align="center">&#x02212;0.46</td>
<td valign="top" align="center">&#x02212;0.51</td>
<td valign="top" align="center">0.28</td>
<td valign="top" align="center">0.62</td>
<td valign="top" align="center">0.57</td>
</tr>
<tr>
<td valign="top" align="left">2. Frequency</td>
<td valign="top" align="center">&#x02212;0.76</td>
<td/>
<td valign="top" align="center">0.52</td>
<td valign="top" align="center">0.71</td>
<td valign="top" align="center">0.70</td>
<td valign="top" align="center">0.75</td>
<td valign="top" align="center">&#x02212;0.35</td>
<td valign="top" align="center">&#x02212;0.49</td>
<td valign="top" align="center">&#x02212;0.50</td>
</tr>
<tr>
<td valign="top" align="left">3. CCP</td>
<td valign="top" align="center">&#x02212;0.48</td>
<td valign="top" align="center">0.58</td>
<td/>
<td valign="top" align="center">0.56</td>
<td valign="top" align="center">0.36</td>
<td valign="top" align="center">0.56</td>
<td valign="top" align="center"><bold>&#x02212;0.27</bold></td>
<td valign="top" align="center"><bold>&#x02212;0.34</bold></td>
<td valign="top" align="center"><bold>&#x02212;0.40</bold></td>
</tr>
<tr>
<td valign="top" align="left">4. N&#x02013;gram</td>
<td valign="top" align="center">&#x02212;0.58</td>
<td valign="top" align="center">0.74</td>
<td valign="top" align="center">0.63</td>
<td/>
<td valign="top" align="center">0.61</td>
<td valign="top" align="center">0.79</td>
<td valign="top" align="center"><italic><bold>&#x02212;0.41</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.49</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.51</bold></italic></td>
</tr>
<tr>
<td valign="top" align="left">5. Topic</td>
<td valign="top" align="center">&#x02212;0.67</td>
<td valign="top" align="center">0.80</td>
<td valign="top" align="center">0.45</td>
<td valign="top" align="center">0.69</td>
<td/>
<td valign="top" align="center">0.61</td>
<td valign="top" align="center"><italic><bold>&#x02212;0.35</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.43</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.41</bold></italic></td>
</tr>
<tr>
<td valign="top" align="left">6. RNN</td>
<td valign="top" align="center">&#x02212;0.65</td>
<td valign="top" align="center">0.81</td>
<td valign="top" align="center">0.58</td>
<td valign="top" align="center">0.84</td>
<td valign="top" align="center">0.75</td>
<td/>
<td valign="top" align="center"><italic><bold>&#x02212;0.47</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.51</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.53</bold></italic></td>
</tr>
<tr>
<td valign="top" align="left">7. SFD</td>
<td valign="top" align="center">0.35</td>
<td valign="top" align="center">&#x02212;0.50</td>
<td valign="top" align="center"><bold>&#x02212;0.33</bold></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.39</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.40</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.44</bold></italic></td>
<td/>
<td valign="top" align="center">0.81</td>
<td valign="top" align="center">0.79</td>
</tr>
<tr>
<td valign="top" align="left">8. GD</td>
<td valign="top" align="center">0.54</td>
<td valign="top" align="center">&#x02212;0.62</td>
<td valign="top" align="center"><bold>&#x02212;0.38</bold></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.51</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.55</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.55</bold></italic></td>
<td valign="top" align="center">0.86</td>
<td/>
<td valign="top" align="center">0.95</td>
</tr>
<tr>
<td valign="top" align="left">9. TVT</td>
<td valign="top" align="center">0.61</td>
<td valign="top" align="center">&#x02212;0.66</td>
<td valign="top" align="center"><bold>&#x02212;0.44</bold></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.55</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.58</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;0.60</bold></italic></td>
<td valign="top" align="center">0.78</td>
<td valign="top" align="center">0.90</td>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Highlighting was used to illustrate that the language models (italics and bold) provide always larger correlations with the three viewing time measures than CCP (bold) (see below for discussions)</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>For non-linear fixation-event based analyses of the non-aggregated eye-movement data, we relied on GAMs using thin plate regression splines from the mgcv-package (version 1.8) in R (Hastie and Tibshirani, <xref ref-type="bibr" rid="B27">1990</xref>; Wood, <xref ref-type="bibr" rid="B83">2017</xref>). As several models of eye-movement control rely on the gamma distribution (Reichle et al., <xref ref-type="bibr" rid="B66">2003</xref>; Engbert et al., <xref ref-type="bibr" rid="B18">2005</xref>), we here also used gamma functions with a logarithmic link function (cf. Smith and Levy, <xref ref-type="bibr" rid="B72">2013</xref>). GAMs have the advantage to model non-linear smooth functions, i.e., the GAM aims to find the best value for the smoothing parameter in an iterative process. Because smooth functions are modeled by additional parameters, the amount of smoothness is penalized in GAMs, i.e., the model aims to reduce the number of parameters of the smooth function and thus to avoid overfitting. The effective degrees of freedom (edf) parameter describes the resulting amount of smoothness (see <xref ref-type="table" rid="T8">Table 8</xref> below). Of note is, that an edf of 1 is present if the model penalized the smooth term to a linear relationship. Edf&#x00027;s close to 0 indicate that the predictor has zero wiggliness and can be interpreted to be penalized out of the model (Wood, <xref ref-type="bibr" rid="B83">2017</xref>). Though Baayen (<xref ref-type="bibr" rid="B3">2010</xref>) suggested that word frequency can be seen as a collector variable that actually also contains variance from contextual word features (cf. Ong and Kliegl, <xref ref-type="bibr" rid="B55">2008</xref>), our baseline GAMs contained single-word properties. We computed a baseline GAM consisting of the length and frequency of the present, last and next word as predictors (cf. Kliegl et al., <xref ref-type="bibr" rid="B38">2006</xref>). To reduce the correlations between the language models trained by the subtitles corpora and the frequency measures, word frequency estimates were taken from the Leipzig corpora collection<xref ref-type="fn" rid="fn0006"><sup>6</sup></xref> The English corpus consisted of 105 million unique sentences and 1.8 billion words, and the German corpus consisted of 70 million unique sentences and 1.1 billion words (Goldhahn et al., <xref ref-type="bibr" rid="B25">2012</xref>). We used Leipzig word frequency classes that relate the frequency of each word to the frequency of the most frequent word using the definition that the most common word is 2<sup>class</sup> more frequent than the word of which the frequency is given (&#x0201C;der&#x0201D; in German and &#x0201C;the&#x0201D; in English; e.g., Hofmann et al., <xref ref-type="bibr" rid="B31">2011</xref>, <xref ref-type="bibr" rid="B30">2018</xref>). Moreover, we inserted landing site into the baseline GAM (e.g., Pynte et al., <xref ref-type="bibr" rid="B62">2008b</xref>), to absorb variance resulting from mislocated fixations.</p>
<p>We added the different types of predictability of the present word to the baseline model and tested whether the resulting GAM performs better than the baseline GAM (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>). Then we successively added the predictability scores of the last and next words and tested whether the novel GAM performs better than the preceding GAM. Finally, we also tested whether a GAM model including all language-model-based predictors provides better predictions than the GAM including CCP scores (Hofmann et al., <xref ref-type="bibr" rid="B29">2017</xref>).</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Generalized additive models (GCV, <italic>R</italic><sup>2</sup>) for single-fixation duration (SFD) and &#x003C7;<sup>2</sup> tests (df) against the previous model for significant increments in explained variance (<sup>&#x0002A;</sup><italic>p</italic> &#x0003C; 0.05).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>SRC</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>PSC</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="left"><bold>GCV</bold></th>
<th valign="top" align="left"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="left"><bold>Deviance (df)</bold></th>
<th valign="top" align="left"><bold>GCV</bold></th>
<th valign="top" align="left"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="left"><bold>Deviance (df)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Baseline</td>
<td valign="top" align="left">0.1273</td>
<td valign="top" align="left">4.17</td>
<td valign="top" align="left">Baseline</td>
<td valign="top" align="left">0.0969</td>
<td valign="top" align="left">3.99</td>
<td valign="top" align="left">Baseline</td>
</tr>
<tr>
<td valign="top" align="left"><bold>CCP</bold></td>
<td valign="top" align="left"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="left">0.1273</td>
<td valign="top" align="left">0.02</td>
<td valign="top" align="left">0.4 (1.9)</td>
<td valign="top" align="left">0.0968</td>
<td valign="top" align="left">0.1</td>
<td valign="top" align="left">10.7 (8)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="left"><bold>0.1272</bold></td>
<td valign="top" align="left"><bold>0.06</bold></td>
<td valign="top" align="left"><bold>1.1 (2.7)&#x0002A;</bold></td>
<td valign="top" align="left"><bold>0.0967</bold></td>
<td valign="top" align="left"><bold>0.05</bold></td>
<td valign="top" align="left"><bold>7.5 (9.1)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="left">0.1271</td>
<td valign="top" align="left">0.02</td>
<td valign="top" align="left">0.5 (1.1)</td>
<td valign="top" align="left">0.0964</td>
<td valign="top" align="left">0.31</td>
<td valign="top" align="left">36.4 (9.5)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>N-gram</bold></td>
<td valign="top" align="left"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="left"><bold>0.127</bold></td>
<td valign="top" align="left"><bold>0.29</bold></td>
<td valign="top" align="left"><bold>3.1 (5.6)&#x0002A;</bold></td>
<td valign="top" align="left"><bold>0.0966</bold></td>
<td valign="top" align="left"><bold>0.27</bold></td>
<td valign="top" align="left"><bold>33 (9.1)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="left"><bold>0.1265</bold></td>
<td valign="top" align="left"><bold>0.44</bold></td>
<td valign="top" align="left"><bold>5.5 (9.4)&#x0002A;</bold></td>
<td valign="top" align="left"><bold>0.0964</bold></td>
<td valign="top" align="left"><bold>0.14</bold></td>
<td valign="top" align="left"><bold>15.5 (8.7)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="left">0.1265</td>
<td valign="top" align="left">&#x02212;0.02</td>
<td valign="top" align="left">0 (0.8)</td>
<td valign="top" align="left">0.0963</td>
<td valign="top" align="left">0.07</td>
<td valign="top" align="left">9.6 (8.5)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Topic</bold></td>
<td valign="top" align="left"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="left">0.1272</td>
<td valign="top" align="left">0.12</td>
<td valign="top" align="left">1.4 (4.6)</td>
<td valign="top" align="left">0.0967</td>
<td valign="top" align="left">0.21</td>
<td valign="top" align="left">23.1 (8.4)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="left">0.1272</td>
<td valign="top" align="left">0.04</td>
<td valign="top" align="left">1.1 (5.8)</td>
<td valign="top" align="left">0.0963</td>
<td valign="top" align="left">0.28</td>
<td valign="top" align="left">33.8 (9.2)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="left">0.1272</td>
<td valign="top" align="left">0.09</td>
<td valign="top" align="left">1.3 (4.4)</td>
<td valign="top" align="left">0.0963</td>
<td valign="top" align="left">0.07</td>
<td valign="top" align="left">9.6 (9.2)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>RNN</bold></td>
<td valign="top" align="left"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="left"><bold>0.1271</bold></td>
<td valign="top" align="left"><bold>0.16</bold></td>
<td valign="top" align="left"><bold>1.6 (1.2)&#x0002A;</bold></td>
<td valign="top" align="left"><bold>0.0966</bold></td>
<td valign="top" align="left"><bold>0.27</bold></td>
<td valign="top" align="left"><bold>31.3 (9.6)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="left">0.1271</td>
<td valign="top" align="left">&#x02212;0.02</td>
<td valign="top" align="left">0 (0.9)</td>
<td valign="top" align="left">0.0966</td>
<td valign="top" align="left">0.03</td>
<td valign="top" align="left">3.7 (8.7)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="left"><bold>0.1269</bold></td>
<td valign="top" align="left"><bold>0.27</bold></td>
<td valign="top" align="left"><bold>3.6 (11)&#x0002A;</bold></td>
<td valign="top" align="left"><bold>0.0964</bold></td>
<td valign="top" align="left"><bold>0.16</bold></td>
<td valign="top" align="left"><bold>18 (8.7)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left"><bold>N-gram</bold> <bold>&#x0002B;</bold> <bold>Topic</bold> <bold>&#x0002B;</bold> <bold>RNN</bold></td>
<td valign="top" align="left"><bold>Full CCP model</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">(Present &#x0002B; last &#x0002B; next)</td>
<td valign="top" align="left"><bold>0.1262</bold></td>
<td valign="top" align="left"><bold>1.08</bold></td>
<td valign="top" align="left"><bold>13 (33.6)&#x0002A;</bold></td>
<td valign="top" align="left"><bold>0.0957</bold></td>
<td valign="top" align="left"><bold>0.69</bold></td>
<td valign="top" align="left"><bold>81.5 (49.1)&#x0002A;</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Consistent GAM model improvements in both data sets are marked bold</italic>.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Generalized additive models (GCV, <italic>R</italic><sup>2</sup>) for gaze duration (GD) and &#x003C7;<sup>2</sup> tests (df) against the previous model for significant increments in explained variance (&#x0002A;<italic>p</italic> &#x0003C; 0.05).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>SRC</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>PSC</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>GCV</bold></th>
<th valign="top" align="center"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="center"><bold>Deviance (df)</bold></th>
<th valign="top" align="center"><bold>GCV</bold></th>
<th valign="top" align="center"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="center"><bold>Deviance (df)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Baseline</td>
<td valign="top" align="center">0.1656</td>
<td valign="top" align="center">8.19</td>
<td valign="top" align="center">Baseline</td>
<td valign="top" align="center">0.145</td>
<td valign="top" align="center">10.52</td>
<td valign="top" align="center">Baseline</td>
</tr>
<tr>
<td valign="top" align="left"><bold>CCP</bold></td>
<td valign="top" align="center"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="center">0.1656</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.5 (1.6)</td>
<td valign="top" align="center">0.1448</td>
<td valign="top" align="center">0.09</td>
<td valign="top" align="center">31.2 (8.3)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="center"><bold>0.1654</bold></td>
<td valign="top" align="center"><bold>0.14</bold></td>
<td valign="top" align="center"><bold>5 (11.1)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1447</bold></td>
<td valign="top" align="center"><bold>0.05</bold></td>
<td valign="top" align="center"><bold>14.2 (9)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="center"><bold>0.1652</bold></td>
<td valign="top" align="center"><bold>0.08</bold></td>
<td valign="top" align="center"><bold>2 (0.6)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1444</bold></td>
<td valign="top" align="center"><bold>0.17</bold></td>
<td valign="top" align="center"><bold>45.8 (9.4)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left"><bold>N-gram</bold></td>
<td valign="top" align="center"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="center"><bold>0.1651</bold></td>
<td valign="top" align="center"><bold>0.32</bold></td>
<td valign="top" align="center"><bold>4.3 (0.9)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1446</bold></td>
<td valign="top" align="center"><bold>0.24</bold></td>
<td valign="top" align="center"><bold>66.8 (9)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="center"><bold>0.1648</bold></td>
<td valign="top" align="center"><bold>0.14</bold></td>
<td valign="top" align="center"><bold>3.8 (5.2)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1443</bold></td>
<td valign="top" align="center"><bold>0.13</bold></td>
<td valign="top" align="center"><bold>34.1 (8.8)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="center">0.1648</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">1.1 (5.7)</td>
<td valign="top" align="center">0.1443</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">13.5 (8.7)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Topic</bold></td>
<td valign="top" align="center"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="center">0.1657</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">&#x02212;0.6 (0.6)</td>
<td valign="top" align="center">0.1448</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">38.1 (7.7)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="center">0.1656</td>
<td valign="top" align="center">0.06</td>
<td valign="top" align="center">2.4 (6.9)</td>
<td valign="top" align="center">0.1444</td>
<td valign="top" align="center">0.14</td>
<td valign="top" align="center">47 (9.1)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="center">0.1656</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0.1 (1.2)</td>
<td valign="top" align="center">0.1444</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">10.2 (9)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>RNN</bold></td>
<td valign="top" align="center"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="center"><bold>0.1653</bold></td>
<td valign="top" align="center"><bold>0.21</bold></td>
<td valign="top" align="center"><bold>4 (5.3)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1446</bold></td>
<td valign="top" align="center"><bold>0.22</bold></td>
<td valign="top" align="center"><bold>64.1 (8.9)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="center">0.1652</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">2.2 (5.7)</td>
<td valign="top" align="center">0.1446</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">6.6 (7.9)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="center"><bold>0.1651</bold></td>
<td valign="top" align="center"><bold>0.09</bold></td>
<td valign="top" align="center"><bold>2.4 (4.8)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1444</bold></td>
<td valign="top" align="center"><bold>0.12</bold></td>
<td valign="top" align="center"><bold>26 (7.8)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left"><bold>N-gram</bold> <bold>&#x0002B;</bold> <bold>Topic</bold> <bold>&#x0002B;</bold> <bold>RNN</bold></td>
<td valign="top" align="center"><bold>Full CCP model</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">(Present &#x0002B; last &#x0002B; next)</td>
<td valign="top" align="center"><bold>0.1644</bold></td>
<td valign="top" align="center"><bold>0.67</bold></td>
<td valign="top" align="center"><bold>14.1 (28.2)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1434</bold></td>
<td valign="top" align="center"><bold>0.55</bold></td>
<td valign="top" align="center"><bold>145.8 (52)&#x0002A;</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Consistent GAM model improvements in both data sets are marked bold</italic>.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>Generalized additive models (GCV, <italic>R</italic><sup>2</sup>) for total viewing time (TVT) and &#x003C7;<sup>2</sup> tests (df) against the previous model for significant increments in explained variance (&#x0002A;<italic>p</italic> &#x0003C; 0.05).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>SRC</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>PSC</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>GCV</bold></th>
<th valign="top" align="center"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="center"><bold>Deviance (df)</bold></th>
<th valign="top" align="center"><bold>GCV</bold></th>
<th valign="top" align="center"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="center"><bold>Deviance (df)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Baseline</td>
<td valign="top" align="center">0.1933</td>
<td valign="top" align="center">9.73</td>
<td valign="top" align="center">Baseline</td>
<td valign="top" align="center">0.1952</td>
<td valign="top" align="center">9.94</td>
<td valign="top" align="center">Baseline</td>
</tr>
<tr>
<td valign="top" align="left"><bold>CCP</bold></td>
<td valign="top" align="center"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="center"><bold>0.1931</bold></td>
<td valign="top" align="center"><bold>0.13</bold></td>
<td valign="top" align="center"><bold>2.8 (2.2)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1943</bold></td>
<td valign="top" align="center"><bold>0.32</bold></td>
<td valign="top" align="center"><bold>134.4 (8.3)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="center">0.1931</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.6 (1.9)</td>
<td valign="top" align="center">0.194</td>
<td valign="top" align="center">0.14</td>
<td valign="top" align="center">42.8 (9.3)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="center"><bold>0.1929</bold></td>
<td valign="top" align="center"><bold>0.14</bold></td>
<td valign="top" align="center"><bold>4.2 (7.8)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1938</bold></td>
<td valign="top" align="center"><bold>0.09</bold></td>
<td valign="top" align="center"><bold>32.6 (9.2)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left"><bold>N-gram</bold></td>
<td valign="top" align="center"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="center"><bold>0.1926</bold></td>
<td valign="top" align="center"><bold>0.39</bold></td>
<td valign="top" align="center"><bold>8.8 (7.2)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1942</bold></td>
<td valign="top" align="center"><bold>0.33</bold></td>
<td valign="top" align="center"><bold>139.6 (8.9)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="center"><bold>0.1925</bold></td>
<td valign="top" align="center"><bold>0.09</bold></td>
<td valign="top" align="center"><bold>2.7 (5.5)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1937</bold></td>
<td valign="top" align="center"><bold>0.23</bold></td>
<td valign="top" align="center"><bold>81.8 (8.9)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="center">0.1925</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">1.7 (5.6)</td>
<td valign="top" align="center">0.1935</td>
<td valign="top" align="center">0.07</td>
<td valign="top" align="center">26.4 (8.8)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Topic</bold></td>
<td valign="top" align="center"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="center"><bold>0.1932</bold></td>
<td valign="top" align="center"><bold>0.15</bold></td>
<td valign="top" align="center"><bold>4.1 (7.9)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1949</bold></td>
<td valign="top" align="center"><bold>0.12</bold></td>
<td valign="top" align="center"><bold>48.2 (8.8)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="center">0.1932</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">2.1 (8.1)</td>
<td valign="top" align="center">0.1945</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">56.8 (8.8)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="center">0.1933</td>
<td valign="top" align="center">&#x02212;0.01</td>
<td valign="top" align="center">0 (0.9)</td>
<td valign="top" align="center">0.1944</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">17.1 (9.3)&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>RNN</bold></td>
<td valign="top" align="center"><bold>Baseline</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Baseline &#x0002B; present</td>
<td valign="top" align="center"><bold>0.1926</bold></td>
<td valign="top" align="center"><bold>0.33</bold></td>
<td valign="top" align="center"><bold>6.1 (0.2)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1942</bold></td>
<td valign="top" align="center"><bold>0.34</bold></td>
<td valign="top" align="center"><bold>138.2 (8.2)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Last</td>
<td valign="top" align="center"><bold>0.1925</bold></td>
<td valign="top" align="center"><bold>0.12</bold></td>
<td valign="top" align="center"><bold>3.4 (6.9)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.194</bold></td>
<td valign="top" align="center"><bold>0.1</bold></td>
<td valign="top" align="center"><bold>32.5 (8.7)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x0002B; Next</td>
<td valign="top" align="center"><bold>0.1923</bold></td>
<td valign="top" align="center"><bold>0.16</bold></td>
<td valign="top" align="center"><bold>5.4 (11.5)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1939</bold></td>
<td valign="top" align="center"><bold>0.04</bold></td>
<td valign="top" align="center"><bold>18.8 (8.8)&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left"><bold>N-gram</bold> <bold>&#x0002B;</bold> <bold>Topic</bold> <bold>&#x0002B;</bold> <bold>RNN</bold></td>
<td valign="top" align="center"><bold>Full CCP model</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">(Present &#x0002B; last &#x0002B; next)</td>
<td valign="top" align="center"><bold>0.1917</bold></td>
<td valign="top" align="center"><bold>0.64</bold></td>
<td valign="top" align="center"><bold>16.6 (19.9)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>0.1924</bold></td>
<td valign="top" align="center"><bold>0.52</bold></td>
<td valign="top" align="center"><bold>206 (53.5)&#x0002A;</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Consistent GAM model improvements in both data sets are marked bold</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>As model benchmarks, we report the generalized cross-validation score (GCV). This is an estimate of the mean prediction error based on a leave-one-out cross validation process. Better models provide a lower GCV (Wood, <xref ref-type="bibr" rid="B83">2017</xref>). We also report the difference in the percentage of explained variance relative to the preceding or baseline model (%&#x00394;<italic>R</italic><sup>2</sup>, derived from adjusted <italic>R</italic><sup>2</sup>-values). We also tested whether a subsequent GAM provides significantly greater log likelihood than the previous model using &#x003C7;<sup>2</sup>-tests (anova function in R; <italic>p</italic> = 0.05, cf. <xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>). To provide a measure that can be interpreted in a similar fashion as the residual sum of squares for linear models, we further report the difference of the deviance of the last and the present model (e.g., Wood, <xref ref-type="bibr" rid="B83">2017</xref>). If this term is negative, this indicates that the latter model provides a better account for the data. We also report the difference of the degrees of freedom (df) of the models to be compared. Negative values indicate that the previous GAM is more complex.</p>
<p>In the second set of GAM comparisons, we compare the performance of each single language model to the performance of the CCP. For this purpose, we use the predictability scores for all positions (present, last, and next word), and compared each language model to CCP (see <xref ref-type="table" rid="T7">Table 7</xref> below). To examine the predictors themselves and to be able to directly compare the contribution of human-generated and model-based predictors in explaining variance in viewing times, we also generated a final GAM model for each viewing time parameter comprising all types of predictability. For these models we finally report the <italic>F</italic>-values, effective degrees of freedom and the levels of significance (cf. <xref ref-type="table" rid="T8">Table 8</xref>). We evaluate the functional forms of the effects that are most reproducible across all analyses in the final model, while setting all non-inspected variables to their mean value (cf. <xref ref-type="fig" rid="F1">Figures 1</xref>&#x02013;<xref ref-type="fig" rid="F3">3</xref> below).</p>
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p>&#x003C7;<sup>2</sup> tests whether the respective language model performed better than CCP.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="6" style="border-bottom: thin solid #000000;"><bold>SRC</bold></th>
<th valign="top" align="center" colspan="6" style="border-bottom: thin solid #000000;"><bold>PSC</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>SFD</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>GD</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>TVT</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>SFD</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>GD</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>TVT</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="center"><bold>Deviance</bold></th>
<th valign="top" align="center"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="center"><bold>Deviance</bold></th>
<th valign="top" align="center"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="center"><bold>Deviance</bold></th>
<th valign="top" align="center"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="center"><bold>Deviance</bold></th>
<th valign="top" align="center"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="center"><bold>Deviance</bold></th>
<th valign="top" align="center"><bold>%&#x00394;<italic>R<sup><bold>2</bold></sup></italic></bold></th>
<th valign="top" align="center"><bold>Deviance</bold></th>
</tr>
<tr>
<th/>
<th/>
<th valign="top" align="center"><bold>(df)</bold></th>
<th/>
<th valign="top" align="center"><bold>(df)</bold></th>
<th/>
<th valign="top" align="center"><bold>(df)</bold></th>
<th/>
<th valign="top" align="center"><bold>(df)</bold></th>
<th/>
<th valign="top" align="center"><bold>(df)</bold></th>
<th/>
<th valign="top" align="center"><bold>(df)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">N-gram</td>
<td valign="top" align="center"><bold>0.6</bold></td>
<td valign="top" align="center"><bold>6.5 (10.1)&#x0002A;</bold></td>
<td valign="top" align="center">0.24</td>
<td valign="top" align="center">1.8 (&#x02212;1.4)</td>
<td valign="top" align="center"><bold>0.21</bold></td>
<td valign="top" align="center"><bold>5.6 (6.5)&#x0002A;</bold></td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">3.6 (&#x02212;0.2)</td>
<td valign="top" align="center">0.09</td>
<td valign="top" align="center">23.2 (&#x02212;0.2)</td>
<td valign="top" align="center">0.09</td>
<td valign="top" align="center">38 (&#x02212;0.2)</td>
</tr>
<tr>
<td valign="top" align="left">Topic</td>
<td valign="top" align="center">0.14</td>
<td valign="top" align="center">1.8 (9)</td>
<td valign="top" align="center"><italic><bold>&#x02212;0.17</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;5.7 (&#x02212;4.6)&#x0002A;</bold></italic></td>
<td valign="top" align="center">&#x02212;0.15</td>
<td valign="top" align="center">&#x02212;1.4 (5)</td>
<td valign="top" align="center"><bold>0.1</bold></td>
<td valign="top" align="center"><bold>12 (0.2)&#x0002A;</bold></td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">4 (&#x02212;1)</td>
<td valign="top" align="center">&#x02212;0.24</td>
<td valign="top" align="center">&#x02212;87.7 (0)</td>
</tr>
<tr>
<td valign="top" align="left">RNN</td>
<td valign="top" align="center"><bold>0.31</bold></td>
<td valign="top" align="center"><bold>3.2 (7.4)&#x0002A;</bold></td>
<td valign="top" align="center">0.17</td>
<td valign="top" align="center">1 (2.5)</td>
<td valign="top" align="center"><bold>0.33</bold></td>
<td valign="top" align="center"><bold>7.3 (6.7)&#x0002A;</bold></td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">&#x02212;1.6 (0.3)</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">5.4 (&#x02212;2.1)</td>
<td valign="top" align="center"><italic><bold>&#x02212;0.06</bold></italic></td>
<td valign="top" align="center"><italic><bold>&#x02212;20.3 (&#x02212;1.1)&#x0002A;</bold></italic></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Positive deviance (df) suggests better performance of the language model, and negative deviance indicates that CCP fits better (&#x0002A;p &#x0003C; 0.05)</italic>.</p>
<p><italic>For a better overview, language models performing better were marked bold, and CCP performing better was marked in italics and bold</italic>.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T8">
<label>Table 8</label>
<caption><p>The <italic>F</italic>-values of the predictors (effective degrees of freedom), their significance level and the total amount of variance explained in models including all predictors at the same time.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>SFD</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>GD</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>TVT</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Baseline</bold></th>
<th valign="top" align="center"><bold>SRC</bold></th>
<th valign="top" align="center"><bold>PSC</bold></th>
<th valign="top" align="center"><bold>SRC</bold></th>
<th valign="top" align="center"><bold>PSC</bold></th>
<th valign="top" align="center"><bold>SRC</bold></th>
<th valign="top" align="center"><bold>PSC</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Landing site</td>
<td valign="top" align="center">2.9 (1.9)&#x0002A;</td>
<td valign="top" align="center">12.8 (8.6)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">8.4 (4.1)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">12.8 (8.7)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">10.5 (4.5)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">10.7 (8.5)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">Length</td>
<td valign="top" align="center">0.6 (2.0)</td>
<td valign="top" align="center">19.6 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">10.2 (3.2)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">372.5 (9.0)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">17.2 (2.2)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">248.9 (9.0)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">Length_last</td>
<td valign="top" align="center">6.9 (7.0)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">29.1 (8.9)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">6.5 (6.3)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">26.8 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">10.2 (8.5)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">24.0 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">Length_next</td>
<td valign="top" align="center">2.1 (6.5)&#x0002A;</td>
<td valign="top" align="center">12.3 (8.2)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">3.7 (8.7)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">10.5 (8.5)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">3.4 (8.6)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">8.9 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">Frequency</td>
<td valign="top" align="center">5.0 (4.7)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">29.4 (8.7)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">6.9 (4.6)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">65.2 (9.0)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">4.8 (6.1)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">81.3 (8.9)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">Frequency_last</td>
<td valign="top" align="center">1.8 (1.1)</td>
<td valign="top" align="center">6.2 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">1.6 (1.0)</td>
<td valign="top" align="center">6.6 (8.6)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">6.6 (3.3)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">24.0 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">Frequency_next</td>
<td valign="top" align="center">1.5 (5.5)</td>
<td valign="top" align="center">17.5 (8.5)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">2.0 (7.5)&#x0002A;</td>
<td valign="top" align="center">27.2 (8.7)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">2.7 (7.5)&#x0002A;&#x0002A;</td>
<td valign="top" align="center">22.6 (8.9)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>CCP</bold></td>
<td valign="top" align="center">0.7 (1.0)</td>
<td valign="top" align="center">12.1 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">0.8 (3.0)</td>
<td valign="top" align="center">17.4 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">0.7 (1.0)</td>
<td valign="top" align="center">37.1 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">CCP_last</td>
<td valign="top" align="center">1.9 (6.0).</td>
<td valign="top" align="center">2.1 (2.3)</td>
<td valign="top" align="center"><bold>2.5 (6.0)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>5.5 (8.8)&#x0002A;&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center">0.9 (2.0)</td>
<td valign="top" align="center">14.0 (8.9)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">CCP_next</td>
<td valign="top" align="center">2.0 (1.0)</td>
<td valign="top" align="center">19.0 (8.7)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center"><bold>7.0 (1.0)&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center"><bold>14.1 (8.8)&#x0002A;&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center">1.9 (1.0)</td>
<td valign="top" align="center">8.5 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>N-gram</bold></td>
<td valign="top" align="center"><bold>2.7 (4.2)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>14.0 (6.4)&#x0002A;&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center"><bold>3.8 (6.4)&#x0002A;&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center"><bold>12.8 (8.9)&#x0002A;&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center"><bold>4.2 (3.5)&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center"><bold>10.5 (8.8)&#x0002A;&#x0002A;&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">N-gram_last</td>
<td valign="top" align="center"><bold>2.9 (7.4)&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center"><bold>12.3 (8.0)&#x0002A;&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center"><bold>7.4 (1.0)&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center"><bold>14.6 (8.9)&#x0002A;&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center">2.0 (5.6).</td>
<td valign="top" align="center">18.6 (8.9)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">N-gram_next</td>
<td valign="top" align="center">3.7 (1.0).</td>
<td valign="top" align="center">6.0 (8.4)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">2.7 (1.0)</td>
<td valign="top" align="center">8.5 (7.7)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">5.3 (1.0)&#x0002A;</td>
<td valign="top" align="center">10.7 (7.8)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Topic</bold></td>
<td valign="top" align="center">2.1 (2.4).</td>
<td valign="top" align="center">14.6 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">1.6 (2.9)</td>
<td valign="top" align="center">14.2 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">1.8 (2.1)</td>
<td valign="top" align="center">13.4 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">Topic_last</td>
<td valign="top" align="center">1.7 (5.4).</td>
<td valign="top" align="center">19.6 (8.3)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">3.1 (6.5)&#x0002A;&#x0002A;</td>
<td valign="top" align="center">19.0 (8.4)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">1.9 (6.6).</td>
<td valign="top" align="center">19.7 (8.5)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left">Topic_next</td>
<td valign="top" align="center">1.8 (2.4)</td>
<td valign="top" align="center">9.2 (8.6)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">0.3 (1.1)</td>
<td valign="top" align="center">19.0 (8.4)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">0.2 (1.0)</td>
<td valign="top" align="center">11.2 (8.9)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>RNN</bold></td>
<td valign="top" align="center"><bold>4.4 (1.0)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>5.3 (8.5)&#x0002A;&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center">1.5 (3.6)</td>
<td valign="top" align="center">5.8 (8.1)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center"><bold>8.6 (1.0)&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center"><bold>8.5 (8.4)&#x0002A;&#x0002A;&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">RNN_last</td>
<td valign="top" align="center">1.5 (1.0)</td>
<td valign="top" align="center">9.2 (8.8)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">3.3 (5.5)&#x0002A;&#x0002A;</td>
<td valign="top" align="center">5.7 (8.5)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center"><bold>3.6 (3.9)&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center"><bold>7.4 (8.8)&#x0002A;&#x0002A;&#x0002A;</bold></td>
</tr>
<tr>
<td valign="top" align="left">RNN_next</td>
<td valign="top" align="center"><bold>2.3 (5.6)&#x0002A;</bold></td>
<td valign="top" align="center"><bold>6.7 (7.8)&#x0002A;&#x0002A;&#x0002A;</bold></td>
<td valign="top" align="center">1.6 (8.2)</td>
<td valign="top" align="center">11.0 (7.4)&#x0002A;&#x0002A;&#x0002A;</td>
<td valign="top" align="center">1.8 (3.1)</td>
<td valign="top" align="center">8.3 (8.7)&#x0002A;&#x0002A;&#x0002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Total</bold> <italic><bold>R</bold><sup><bold>2</bold></sup></italic> <bold>(%)</bold></td>
<td valign="top" align="center">5.45</td>
<td valign="top" align="center">5.36</td>
<td valign="top" align="center">9.16</td>
<td valign="top" align="center">11.5</td>
<td valign="top" align="center">10.7</td>
<td valign="top" align="center">11.3</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>p &#x0003C; 0.1; &#x0002A; p &#x0003C; 0.05; &#x0002A;&#x0002A; p &#x0003C; 0.01; &#x0002A;&#x0002A;&#x0002A; p &#x0003C; 0.001. Consistent findings over both data sets in <bold>Tables 4&#x02013;6</bold> and this Table and are marked bold</italic>.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Cloze completion probability effects on gaze duration. The x-axes show logit-transformed cloze completion probability (CCP), while the y-axes demonstrate its influence on gaze durations (ms) of the full GAM models (cf. <xref ref-type="table" rid="T8">Table 8</xref>). <bold>(A,B)</bold> Illustrate effects of last-word CCP on gaze durations of the currently fixated word. <bold>(C,D)</bold> Demonstrate that the CCP of the next word has an influence on the gaze durations of the currently fixated word. <bold>(A,C)</bold> Display effects in the SRC data set and <bold>(B,D)</bold> illustrate the results of the PSC data set. Particularly in a logit-range around 0.5&#x02013;1.5, the CCP of the surrounding words seems to delay fixations. Shaded areas indicate standard errors.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-04-730570-g0001.tif"/>
</fig>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>N-gram probability effects on gaze durations. The x-axes display log-transformed n-gram probability, while the y-axes demonstrate its influence on gaze durations (ms) of the full GAM models (cf. <xref ref-type="table" rid="T8">Table 8</xref>). <bold>(A,B)</bold> Demonstrate the effects of the present-word n-gram probabilities, while <bold>(C,D)</bold> Illustrate the influence of the n-gram probability of the last word on the gaze durations of the currently fixated word. Results of the SRC data set are given in <bold>(A,C)</bold>, while PSC data are illustrated in <bold>(B,D)</bold>. Large log-transformed n-gram probabilities lead to an approximately linear decrease of gaze durations, particularly in a log-range of &#x02212;8 to &#x02212;1. Shaded areas indicate standard errors.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-04-730570-g0002.tif"/>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>RNN probability effects on single-fixation durations. X-axes show log-transformed RNN probabilities, while the y-axes demonstrate the most reliable present and next-word RNN effects obtained in single-fixation durations. The <bold>(A,B)</bold> demonstrate the effect of present-word RNN probability. Relatively linear decreases of single-fixation durations are obtained, particularly at a log-RNN probability of &#x02212;10 and higher. <bold>(C,D)</bold> Illustrate the effects of log RNN probability of the next word. Log RNN probabilities of the next word lower than &#x02212;7 seem to delay the fixation duration of the currently fixated word. <bold>(A,C)</bold> show SRC data and <bold>(B,D)</bold> illustrate PSC data. Shaded areas indicate standard errors.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-04-730570-g0003.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<p>Our <bold>simple item-level correlations</bold> revealed that all language models provided larger correlations with SFD, GD, and TVT data than CCP (<xref ref-type="table" rid="T3">Table 3</xref>), demonstrating that language models provide a better account for viewing times than CCP. Moreover, there are substantial correlations between all predictor variables, making analyses with multiple predictors prone to fit error variance. Therefore, we will focus our conclusions on those findings that can be reproduced in different types of analyses (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T8">8</xref>; Wagenmakers et al., <xref ref-type="bibr" rid="B79">2011</xref>). When turning to these <bold>non-linear GAM analyses</bold> at the level of each fixation event, we found that nearly any predictor accounts for variance in the PSC data set. This suggests that all types of predictability account for viewing time variance, once there is sufficient statistical power. When we examined typically sized samples in the SRC, only the most robust effects make a contribution. Therefore, we will also focus our conclusions on those effects that can be reproduced in both samples (see <xref ref-type="table" rid="T8">Table 8</xref> for a summary of all results).</p>
<p>Concerning the <bold>CCP</bold> analyses, the only findings that can be reproduced in both data samples and across all non-linear analyses was the influence of last- and next-word CCP on GD data (<xref ref-type="table" rid="T5">Tables 5</xref>, <xref ref-type="table" rid="T8">8</xref>). These effects seem quite robust and can be examined in <xref ref-type="fig" rid="F1">Figure 1</xref>. When all types of predictability are included in the GAM (<xref ref-type="table" rid="T8">Table 8</xref>), CCP of the last and next word seems to prolong GDs particularly in the range of logit-transformed CCPs of around 0.5&#x02013;1.5 (<xref ref-type="fig" rid="F1">Figure 1</xref>). We see this as a preliminary evidence that this type of predictability might be particularly prone to predict that high-CCP last or next words are processed during present-word fixations.</p>
<p>In general, CCP effects seem to be most reliable in GD data. CCP outperformed the topic model in the SRC data set for GD predictions (<xref ref-type="table" rid="T7">Table 7</xref>). Please also note that the only type of predictability that failed to improve the GAM predictions in the highly powered PSC sample was the CCP-effect of the last word in SFD data (<xref ref-type="table" rid="T8">Table 8</xref>).</p>
<p>The language models not only showed greater correlations with all viewing time measures than CCP (<xref ref-type="table" rid="T3">Table 3</xref>), but they also delivered a larger number of consistent findings in our extensive set of non-linear analyses (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T8">8</xref>). There are some CCP effects worth to be highlighted that are exclusively apparent in the analyses using only a single type of predictability. There are last-word CCP effects in SFD data (<xref ref-type="table" rid="T4">Table 4</xref>), and a present-word effect in the TVT data (<xref ref-type="table" rid="T6">Table 6</xref>). In addition to the fully consistent last-word GD effects (<xref ref-type="table" rid="T5">Tables 5</xref>, <xref ref-type="table" rid="T8">8</xref>, <xref ref-type="fig" rid="F1">Figure 1</xref>), this lends further support to the hypothesis that CCP is particularly prone to reflect late processes. Moreover, there are also consistent next-word effects in the GD and TVT analyses of both samples (<xref ref-type="table" rid="T5">Tables 5</xref>, <xref ref-type="table" rid="T6">6</xref>) that are, however, often better explained by other types of predictability in the analysis containing all predictors (<xref ref-type="table" rid="T8">Table 8</xref>).</p>
<p>Our non-linear analyses revealed that the <bold>n-gram</bold>-based probabilities of the present word can account for variance in all three viewing time measures. Moreover, the n-gram probability of the last word reproducibly accounted for variance in SFD and GD data. We illustrate these effects in <xref ref-type="fig" rid="F2">Figure 2</xref>, suggesting that high log-transformed n-gram probabilities exhibit approximately linear decreases in GD, particularly in the range of log-transformed probabilities from &#x02212;8 to &#x02212;2 (cf. Smith and Levy, <xref ref-type="bibr" rid="B72">2013</xref>).</p>
<p>The present and last-word n-gram effects can be consistently obtained in the analyses of only a single type of predictability (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>), as well as in the analysis containing all predictors (<xref ref-type="table" rid="T8">Table 8</xref>). Moreover, the n-gram-based GAM, including the present, the last and the next word predictor, provided significantly better predictions than the CCP-based GAM in the SRC data set for SFD and TVT data (<xref ref-type="table" rid="T7">Table 7</xref>). This result pattern suggests that an n-gram model is more appropriate than CCP for predicting eye-movements in relatively small samples.</p>
<p>Concerning the non-linear analyses of the <bold>topic</bold> models, we found no effects that can be reproduced across all analyses (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>, <xref ref-type="table" rid="T8">8</xref>). Thus, we found an even less consistent picture for the topic model than for CCP. When tested against each other, CCP provided better predictions than the topics model in the GD data of the SRC, while the reverse result pattern was obtained for SFD data in the PSC (<xref ref-type="table" rid="T7">Table 7</xref>). In the analyses relying on a single type of predictability, we obtained a TVT effect for the present word that can be reproduced across both samples (<xref ref-type="table" rid="T6">Table 6</xref>). In the analyses containing all types of predictability, only the last-word&#x00027;s topical fit with the preceding sentence revealed a GD effect in the SRC data set that can be reproduced across both samples (<xref ref-type="table" rid="T8">Table 8</xref>). These result patterns may tentatively point at a late semantic integration effect of long-range semantics by topic model predictions.</p>
<p>The examination of <bold>RNN</bold> models revealed consistent findings across all non-linear analyses for the present word in SFD and TVT data (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>, <xref ref-type="table" rid="T8">8</xref>). Though consistent next-word predictability effects were obtained for all viewing time measures in the analyses containing only a single type of predictability (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>), only the next-word RNN effect in SFD data was reproducible in the analyses containing all predictors (<xref ref-type="table" rid="T8">Table 8</xref>). This result pattern indicates that an RNN may be particularly useful for investigating (parafoveal) preprocessing in rapidly successful lexical access.</p>
<p>Therefore, we relied on SFD data to illustrate the functional form of the RNN effects in <xref ref-type="fig" rid="F3">Figure 3</xref>. RNN probabilities of the present word reduce SFDs, particularly at a log-transformed probability of &#x02212;10 and higher. Concerning the influence of the next word, log-probabilities lower than &#x02212;7 seem to delay the SFDs of the current word. This provides some preliminary evidence that an extremely low RNN probability of the word in the right parafovea might leads to parafoveal preprocessing of the next word.</p>
<p>Next-word probability effects, however, are not the only domain, in which RNN models can account for other variance than the other types of predictability. We also obtained consistent last-word RNN-based GAM model fit improvements and significant effects of last-word probabilities in the analysis including all predictors for TVT data (<xref ref-type="table" rid="T6">Tables 6</xref>, <xref ref-type="table" rid="T8">8</xref>)&#x02014;a result that probably points at the generalization capabilities of this subsymbolic model. For the SRC data set, the RNN provided significantly better predictions in SFD and TVT data, while CCP outperformed the RNN model in the TVT data of the PSC data set (<xref ref-type="table" rid="T7">Table 7</xref>).</p>
<p>When summing up the results of language-model-based vs. the CCP-based GAM models, language models outperformed CCP in 5 comparisons, while CCP provided significantly better fitting GAM models in 2 comparisons (<xref ref-type="table" rid="T7">Table 7</xref>). Moreover, the three language models together always accounted for more viewing time variance than CCP (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>). When the predictors of all three language models are together incorporated into a final GAM, this accounted for more viewing time variance than CCP (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>). Thus, a combined set of language model predictors is appropriate to explain eye-movement patterns.</p>
</sec>
<sec id="s4">
<title>General Discussion</title>
<sec>
<title>Language Models Account for More Reproducible Variance</title>
<p>In the present study, we compared CCP to word probabilities calculated from n-gram-, topic- and RNN-models for predicting fixation durations in the SRC and PSC data set. Already simple item-level analyses showed that all language models provided greater linear correlations with all viewing time measures than CCP (<xref ref-type="table" rid="T3">Table 3</xref>). We also explored the possibility that the raw word probabilities provide a linear relationship with reading times (Brothers and Kuperberg, <xref ref-type="bibr" rid="B10">2021</xref>). The transformed probabilities, however, always provided greater correlations with the three reading time measures than the raw probabilities. Therefore, our analyses support Smith and Levy&#x00027;s (<xref ref-type="bibr" rid="B72">2013</xref>) conclusion that the best prediction is obtained with log-transformed language model probabilities (cf. Wilcox et al., <xref ref-type="bibr" rid="B82">2020</xref>).</p>
<p>When contrasting each language model against CCP in our non-linear analyses, there was no single language model that provided consistently better performance than CCP for the same viewing time measure in both data sets (<xref ref-type="table" rid="T7">Table 7</xref>). Rather, such comparisons seem to depend on a number of factors such as the chosen data set, language, participants and materials, demonstrating the need for further studies. A particularly important factor should be the number of participants in the CCP sample: In the small SRC data set, the language models outperformed CCP in 4 cases, while CCP was significantly better only 1 time. In the large CCP data set of the PSC, in contrast, both CCP and the language models outperformed each other for 1 time. When examining the amount of explained variance, however, the language models usually provided greater gains in explained variances than CCP: The n-gram and RNN models provided increments in explained variance ranging between 0.33 and 0.6% over CCP in the SFD and TVT data of the SRC data set, in which CCP however provided better predictions than the topic model (0.17%, <xref ref-type="table" rid="T7">Table 7</xref>). For the PSC data, there was a topic model gain of 0.1% over CCP in SFD data, but a CCP gain over the RNN model of 0.06% of variance. In sum, the language models provided better predictions than CCP in 5 cases&#x02014;CCP provided better predictions in 2 cases (<xref ref-type="table" rid="T7">Table 7</xref>). Finally, the three language models together always outperformed CCP (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>), supporting Hofmann&#x00027;s et al. (<xref ref-type="bibr" rid="B29">2017</xref>) conclusion derived from linear item-level based multiple regression analysis. Therefore, language models not only provide a deeper explanation for the consolidation mechanisms of the mental lexicon, but they also often perform better than CCP in accounting for viewing times.</p>
</sec>
<sec>
<title>CCP Effects Set a Challenge for Unexplained Predictive Processes</title>
<p>Nevertheless, CCP can still make a large contribution toward understanding the complex interplay of differential predictive processes. Though the results are less consistent than in the n-gram and RNN analyses, at least the last- and next-word CCP effects on GD data are reproducible in two eye movement samples and over several analyses (<xref ref-type="table" rid="T5">Tables 5</xref>, <xref ref-type="table" rid="T8">8</xref>; cf. <xref ref-type="table" rid="T7">Table 7</xref>). This suggests that CCP accounts for variance that is not reflected by the language models we investigated (cf. e.g., Bianchi et al., <xref ref-type="bibr" rid="B6">2020</xref>). When exploring the functional form of the GD effects of the surrounding words, <xref ref-type="fig" rid="F1">Figure 1</xref> indicated that a high CCP of the last and next word leads to a relatively linear increase in GD, particularly in a logit-transformed predictability range of around 0.5&#x02013;1.5. This might indicate that when between around 73 to 95% of the participants of the cloze completion sample agree that the cloze can be completed with this particular word, CCP might represent a reliable predictor of GD. As opposed to functional forms of the language model effects, CCP was the only variable that predicted an increase of viewing times with increasing levels of predictability. So CCP might represent an ongoing processing of the last word, or a type of parafoveal preprocessing that delays the present fixation (cf. <xref ref-type="fig" rid="F3">Figures 3C,D</xref>, for other parafoveal preprocessing effects).</p>
<p>When having a large CCP sample available, the usefulness of CCP increases, as can be examined in the PSC effects of the present study (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T8">8</xref>). Though CCP can be considered an &#x0201C;all-in&#x0201D; variable, containing to some degree co-occurrence-based, semantic and syntactic responses (Staub et al., <xref ref-type="bibr" rid="B76">2015</xref>), CCP samples might probably vary with respect to the amount of these different types of linguistic structure that is contained in these samples. This might in part explain some inconsistency between our SRC and PSC samples. We suggest that future studies should more closely evaluate which type of information is contained in which particular CCP sample, in order to obtain a scientifically deeper explanation for the portions of the variance that can presently still be better accounted for by CCP (Shaoul et al., <xref ref-type="bibr" rid="B71">2014</xref>; Luke and Christianson, <xref ref-type="bibr" rid="B44">2016</xref>; Hofmann et al., <xref ref-type="bibr" rid="B29">2017</xref>; Lopukhina et al., <xref ref-type="bibr" rid="B43">2021</xref>). <xref ref-type="table" rid="T3">Table 3</xref> shows that the correlations of n-gram and RNN models with the CCP data are larger than the correlations with topics models in both data samples. This suggests that the present cloze completion procedures were more sensitive to short-range syntax and semantics rather to long-range semantics. These CCP scores were based on sentence context&#x02014;a stronger contribution of long-range semantics could be probably expected when the cloze (and reading) tasks are based on paragraph data (e.g., Kennedy et al., <xref ref-type="bibr" rid="B35">2013</xref>).</p>
<p>Though last-word SFD effects (<xref ref-type="table" rid="T4">Table 4</xref>) and present-word TVT effects (<xref ref-type="table" rid="T6">Table 6</xref>) of CCP seem to be better explained by the language models (<xref ref-type="table" rid="T8">Table 8</xref>), this result pattern confirms the prediction of the EZ-reader model that CCP particularly reflects late processes during reading (Reichle et al., <xref ref-type="bibr" rid="B66">2003</xref>).</p>
</sec>
<sec>
<title>Symbolic Short-Range Semantics and Syntax in N-Gram Models</title>
<p>While it has often been claimed that CCP reflects semantic processes (e.g., Staub et al., <xref ref-type="bibr" rid="B76">2015</xref>), it is difficult to define what &#x0201C;semantics&#x0201D; exactly means. Here we offer scientifically deep explanations relying on the computationally concrete definitions of probabilistic language models (Reichle et al., <xref ref-type="bibr" rid="B66">2003</xref>; Westbury, <xref ref-type="bibr" rid="B81">2016</xref>), which allow for a deeper understanding of the consolidation mechanisms. An n-gram model is a simple count-based model that relies exclusively on symbolic representations of words. We call it a short-range &#x0201C;semantics and syntax&#x0201D; model, because it is trained from the immediately preceding words. The n-gram model reflects sequential-syntagmatic &#x0201C;low-level&#x0201D; information (e.g., McDonald and Shillcock, <xref ref-type="bibr" rid="B49">2003b</xref>).</p>
<p>The present word&#x00027;s n-gram probability accounted for early successful lexical access as reflected in SFD, standard lexical access as reflected in the GD, as well as late integration as reflected in the TVT. Moreover, the last word&#x00027;s n-gram probability accounted for lagged effects on SFD and GD data, which replicates and extends Smith and Levy&#x00027;s (<xref ref-type="bibr" rid="B72">2013</xref>) findings. The examination of the functional form also confirms their conclusion that last- and present-word log n-gram probability provides a (near-)linear relationship with GD (see also Wilcox et al., <xref ref-type="bibr" rid="B82">2020</xref>)&#x02014;at least in the log-transformed probability range of &#x02212;8 to &#x02212;2 (<xref ref-type="fig" rid="F2">Figure 2</xref>). Such a near-linear relationship was also obtained for the log RNN probability of the present word with SFD data (<xref ref-type="fig" rid="F3">Figures 3A,B</xref>).</p>
<p>The present- and last-word effects of the n-gram model were remarkably consistent across the two different eye-movement samples, as well as over different analyses (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>, <xref ref-type="table" rid="T8">8</xref>; cf. Wagenmakers et al., <xref ref-type="bibr" rid="B79">2011</xref>). Our data support the view that n-gram models not only explain early effects of lexical access (e.g., McDonald and Shillcock, <xref ref-type="bibr" rid="B48">2003a</xref>; Frisson et al., <xref ref-type="bibr" rid="B24">2005</xref>), but can also be used for the study of late semantic integration (e.g., Boston et al., <xref ref-type="bibr" rid="B9">2008</xref>). Moreover, they seem to be a highly useful tool, when aiming to demonstrate the limitations of the eye-mind assumption (Just and Carpenter, <xref ref-type="bibr" rid="B34">1984</xref>). N-gram models consistently predict lagged effects that reflect the sustained semantic processing of the last word during the current fixation of a word. In sum, count-based syntactic and short-range semantic processes can reliably explain last-word and present-word probability effects (e.g., Smith and Levy, <xref ref-type="bibr" rid="B72">2013</xref>; Baroni et al., <xref ref-type="bibr" rid="B5">2014</xref>).</p>
</sec>
<sec>
<title>Less Consistent Findings in Long-Range Topic Models</title>
<p>Topic models provided the least consistent picture over the two different samples and the different types of analyses (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>, <xref ref-type="table" rid="T8">8</xref>). Topic models are count-based subsymbolic approaches to long-range semantics, which is consolidated from the co-occurrences in whole documents. As the same is true for LSA, they can well be compared to previous LSA-findings (Kintsch and Mangalath, <xref ref-type="bibr" rid="B36">2011</xref>). For instance, Bianchi et al. (<xref ref-type="bibr" rid="B6">2020</xref>) obtained remarkably weaker LSA-based effects than n-gram-based GD effects&#x02014;thus our results are in line with the results of this study (cf. Hofmann et al., <xref ref-type="bibr" rid="B29">2017</xref>). When they combined LSA with an n-gram model and additionally included next-word semantic matches in their regression model, some of the LSA-based effects became significant. This greater robustness of the LSA effects in Bianchi et al. (<xref ref-type="bibr" rid="B6">2020</xref>) may be well-explained by Kintsch and Mangalath&#x00027;s (<xref ref-type="bibr" rid="B36">2011</xref>) proposal, that syntactic factors should be additionally considered when aiming to address long-range meaning and comprehension. As opposed to Bianchi&#x00027;s et al. (<xref ref-type="bibr" rid="B6">2020</xref>) next-word GD effects, however, the present study revealed last-word GD effects of long-range semantics in the analysis considering all predictors (<xref ref-type="table" rid="T8">Table 8</xref>). And when examining only the present word&#x00027;s topical fit with the preceding sentence, Wang&#x00027;s et al. (<xref ref-type="bibr" rid="B80">2010</xref>) conclusion was corroborated that (long-range) lexical knowledge of whole documents is best reflected in TVT data (see <xref ref-type="table" rid="T6">Table 6</xref>).</p>
<p>In sum, long-range semantic models provide a less consistent picture than the other language models (cf. Lopukhina et al., <xref ref-type="bibr" rid="B43">2021</xref>). The results become somewhat more consistent when short-range relations are additionally taken into account. Given that the effects occur in the last, present or next word when short-range semantics is included, this may point at a positional variability of long-range semantics that is integrated at some point in time during retrieval. We think that this results from the position-insensitive training procedure. Therefore, rather than completely rejecting the eye-mind assumption by proposing that &#x0201C;the process of retrieval is independent of eye movements&#x0201D; (Anderson et al., <xref ref-type="bibr" rid="B2">2004</xref>, p. 225), we suggest that long-range semantics is position-insensitive at consolidation. Therefore, it is also position-insensitive at lexical retrieval and the effects can hardly be constrained to the last, present or next word, even if short-range relations are considered (Wang et al., <xref ref-type="bibr" rid="B80">2010</xref>; Bianchi et al., <xref ref-type="bibr" rid="B6">2020</xref>).</p>
<p>Finally, it should be taken into account that we here examined single sentence reading. This may explain the superiority of language models that are trained at the sentence level. Document-level training might be superior when examining words embedded in paragraphs or documents. This hypothesis is in part confirmed by Luke and Christianson (<xref ref-type="bibr" rid="B44">2016</xref>). They computed the similarity of each word to the preceding paragraph and found relatively robust LSA effects (Luke and Christianson, <xref ref-type="bibr" rid="B44">2016</xref>, e.g., Tables 41&#x02013;45).</p>
</sec>
<sec>
<title>RNN Models: An Alternative View on the Mental Lexicon</title>
<p>Short-range semantics and syntax can be much more reliably constrained to present-, last-, or next-word processing, as already demonstrated by the consistent present and last-word n-gram effects. For the RNN probabilities, the simple linear item-level correlations with SFD, GD, and TVT data were largest, replicating, and extending the results of Hofmann et al. (<xref ref-type="bibr" rid="B29">2017</xref>). For the non-linear analyses, the present word&#x00027;s RNN probabilities provided the most consistent results for SFD and TVT (<xref ref-type="table" rid="T5">Tables 5</xref>, <xref ref-type="table" rid="T6">6</xref>, <xref ref-type="table" rid="T8">8</xref>). Though GD was significant when only examining a single language model (<xref ref-type="table" rid="T5">Table 5</xref>), this result could not be confirmed in the analysis in which all predictors competed for viewing time variance (<xref ref-type="table" rid="T8">Table 8</xref>). We propose that this difference can be explained by considering how these short-range models are trained for consolidated information. The n-gram model is symbolic; thus, the prediction is only made for a particular word form. And when testing for standard lexical access (that is reflected in the GD), a perfect match of the predicted and the observed word form may explain the superiority of the n-gram model in this endeavor (<xref ref-type="table" rid="T8">Table 8</xref>).</p>
<p>On the other hand, both language models accounted for SFD and TVT variance in the analysis containing all types of predictability (<xref ref-type="table" rid="T8">Table 8</xref>). The co-existence of these effects may be explained by the proposal that both models represent different views on the mental lexicon (Elman, <xref ref-type="bibr" rid="B17">2004</xref>; Frank, <xref ref-type="bibr" rid="B21">2009</xref>). The n-gram model represents a &#x0201C;static&#x0201D; view, in which large lists of word sequences and their frequencies are stored, to see which word is more likely to occur in a context, given this large &#x0201C;dictionary&#x0201D; of word sequences. The RNN model, in contrast, has only a few hundred hidden units that reflect the &#x0201C;mental state&#x0201D; of the model (Elman, <xref ref-type="bibr" rid="B17">2004</xref>). As a result of this small &#x0201C;mental space,&#x0201D; neural models have to compress the word information, which may, for instance, explain their generalization capabilities: When such a model is trained to learn statements such as &#x0201C;robin is a bird&#x0201D; and &#x0201C;robin can fly,&#x0201D; and it later learns only a few facts about a novel bird, e.g., &#x0201C;sparrow can fly,&#x0201D; &#x0201C;sparrow&#x0201D; obtains a similar hidden unit representation as &#x0201C;robin&#x0201D; (McClelland and Rogers, <xref ref-type="bibr" rid="B47">2003</xref>). Therefore, a neural model can complete the cloze &#x0201C;sparrow is a &#x02026;&#x0201D; with &#x0201C;bird,&#x0201D; even if it never was presented this particular combination of words. For instance, in our PSC example stimulus sentence &#x0201C;Die [The] Richter [judges] der [of the] Landwirtschaftsschau [agricultural show] pr&#x000E4;mieren [award a price to] Rhabarber [rhubarb] und [and] Mangold [mangold],&#x0201D; the word &#x0201C;pr&#x000E4;mieren&#x0201D; (award a price) has a relatively low n-gram probability of 1.549e-10, but a higher RNN probability of 1.378e-5, because the n-gram model has never seen the word n-gram &#x0201C;der Landwirtschaftsschau pr&#x000E4;mieren [agricultural show award a price to],&#x0201D; but the RNN model&#x00027;s hidden units are capable of inferring such information from similar contexts (e.g., Wu et al., <xref ref-type="bibr" rid="B84">2021</xref>). In sum, both views on the mental lexicon account for different portions of variance in word viewing times (<xref ref-type="table" rid="T8">Table 8</xref>). The n-gram model may explain variance resulting from an exact match of a word form in that context, while for instance generalized information may be better explained by the RNN model.</p>
<p>There are, however, also differences in the result patterns that are best captured by the two language models, respectively. A notable difference between count-based, symbolic knowledge in the n-gram vs. predict-based, subsymbolic knowledge in the RNN lies in their capability to account for last-word vs. next-word effects. While the n-gram model obtained remarkably consistent findings for the last word, next-word SFD effects are better captured by an RNN (<xref ref-type="table" rid="T4">Tables 4</xref>, <xref ref-type="table" rid="T8">8</xref>). This corroborates our conclusion that the two views on the mental lexicon account for different effects. The search for a concrete word form, given the preceding word forms in the n-gram model, may take some time. Therefore, the probabilities of the last word still affect the processing of the present word. As opposed to this static view on a huge mental lexicon, the hidden units in the RNN model are trained to predict the next word (Baroni et al., <xref ref-type="bibr" rid="B5">2014</xref>). When such a predicted word to the right of the present fixation position may cross a log RNN probability of &#x02212;7, its presence can be verified (<xref ref-type="fig" rid="F3">Figures 3C,D</xref>). Therefore, the RNN-based probability of the next word may elicit fast and successful lexical access of the present word, as reflected in the SFD.</p>
</sec>
<sec>
<title>Limitations and Outlook</title>
<p>Model comparison is known to be related to the numbers of parameters included in the models, thus a comparison of the GAMs comparing all three language models with the CCP predictor might overemphasize the effects of the language models (<xref ref-type="table" rid="T4">Tables 4</xref>&#x02013;<xref ref-type="table" rid="T6">6</xref>). However, we think that language models allow for a deeper understanding of natural language processing of humans than CCP does, because language models provide a computational definition how &#x0201C;predictability&#x0201D; is consolidated from experience. Moreover, the conclusion that language models account for more variance than CCP is also corroborated by all other analyses (<xref ref-type="table" rid="T3">Tables 3</xref>&#x02013;<xref ref-type="table" rid="T8">8</xref>). Sometimes it is stated that GAMs in general are prone to overfitting. CCP and even more so the model-generated predictability scores are highly correlated with word frequency, potentially leading to overfitting on the one hand. On the other hand, it is more difficult for the language models to account for additional variance, because their higher correlation (i.e., their higher shared variances) with word frequency (<xref ref-type="table" rid="T3">Table 3</xref>). Wood (<xref ref-type="bibr" rid="B83">2017</xref>) discusses that the penalization procedure that leads to the smoothing parameters and also the GCV procedure inherent in the gam() function in R specifically tackles overfitting. We further addressed this question by focusing on consistent results, visible across the computation for two independent eye-movement samples and different types of analyses. This approach reduced the number of robust and consistent findings. If one would have ignored the non-linear nature of the relationships between predictors and dependent variables and only examined the simple linear effects reported in <xref ref-type="table" rid="T3">Table 3</xref>, however, the advantages of the language models over CPP would have been much clearer.</p>
<p>This also leads to the typical concern for analyses on unaggregated data that they account for only a small &#x0201C;portion&#x0201D; of variance. For instance, Duncan (<xref ref-type="bibr" rid="B15">1975</xref>, p. 65) suggests to &#x0201C;eschew altogether the task of dividing up <italic>R</italic><sup>2</sup> into unique causal components&#x0201D; (quoted from Kliegl et al., <xref ref-type="bibr" rid="B38">2006</xref>, p. 22). It is clear that unaggregated data contain a lot of noise, for instance resulting from a random walk in the activation of lexical units, random errors in saccade length, and also &#x0201C;[s]accade programs are generated autonomously, so that fixation durations are basically realizations of a random variable&#x0201D; (Engbert et al., <xref ref-type="bibr" rid="B18">2005</xref>, p. 781). Therefore, if we like to estimate the &#x0201C;unique casual component&#x0201D; of semantic and syntactic processes, we need to rely on aggregated data. <xref ref-type="table" rid="T3">Table 3</xref> suggests that these top-down factors represented by language account for a reasonable 15&#x02013;36% of the viewing time variance, while CCP accounted for 7&#x02013;19%.</p>
<p>It also becomes clear that the two data sets (PSC and SRC), differ in important aspects. First, the CCP samples differ in size, thus they probably provide a different signal-to-noise ratio. A second difference is that also the PSC eye-movement sample is larger and thus has more statistical power to identify significant eye-movement effects of single predictors. Third, the CCP measure is derived from different participant samples. We would also point to the fact, that the eye-movement and CCP samples were collected at different times and come from different countries, which may also explain some differences between the obtained effects. Fourth, also the English and the German subtitles training corpora may contain slightly different information. Having these limitations in mind, we feel that the most consistent findings discussed in the previous sections represent robust effects to evaluate the functioning and the predictions of language models. Our rather conservative approach might miss some effects that are actually apparent, though they are explainable by these four major differences between the samples. Therefore, future studies might more closely characterize the CCP participant samples. They may examine the same participants&#x00027; eye movements to obtain predictability estimates that are representative for the participants&#x02014;which might increase the amount of explainable variance (cf. Hofmann et al., <xref ref-type="bibr" rid="B32">2020</xref>). Finally, google n-gram training corpora may help to obtain training corpora stemming from the same time as the eye-movement data collection.</p>
<p>Though we were able to discriminate between long-range semantics and short-range relations that can be differentiated into count-based symbolic and predict-based subsymbolic representations, we like to point at the fact that the short-range relations could also be separated into semantic and syntactic effects. For example, RNN models have previously also been related to syntax processing (Elman, <xref ref-type="bibr" rid="B16">1990</xref>). Therefore, syntactic information may alternatively explain the very early and very late effects (Friederici, <xref ref-type="bibr" rid="B23">2002</xref>). To examine whether semantic or syntactic effects are at play, a promising start for further evaluations may examine content vs. function words, which may even lead to more consistent findings for long-range semantic models. Further analysis may focus on language models that take into account syntactic information (e.g., Pad&#x000F3; and Lapata, <xref ref-type="bibr" rid="B57">2007</xref>; cf. Frank, <xref ref-type="bibr" rid="B21">2009</xref>).</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s5">
<title>Conclusion</title>
<p>Understanding the complex interplay of different types of predictability for reading is a challenging endeavor, but we think that our review and our data point at differential contributions of count-based and predict-based models in the domain of short-range knowledge. Count-based models better capture last-word effects, predict-based models better capture early next-word effects, while present-word probabilities both make an independent contribution to viewing times. In contrast, CCP is a rather all-in predictor, that probably covers both types of semantics: short-range and long-range. But we have shown that language models with their differential foci are better suited for a deeper explanation for eye-movement behavior, and thus applicable in theory development for models of eye-movement control. Finally, we hope that we made clear that these relatively simple language models are highly useful for understanding differential lexical access and semantic integration parameters that are reflected in differential viewing time parameters.</p>
</sec>
<sec sec-type="data-availability" id="s6">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found at: <ext-link ext-link-type="uri" xlink:href="https://clarinoai.informatik.uni-leipzig.de/fedora/objects/mrr:11022000000001F2FB/datastreams/EngelmannVasishthEngbertKliegl2013_1.0/content">https://clarinoai.informatik.uni-leipzig.de/fedora/objects/mrr:11022000000001F2FB/datastreams/EngelmannVasishthEngbertKliegl2013_1.0/content</ext-link>. The data and analysis examples of the present study can be found under <ext-link ext-link-type="uri" xlink:href="https://osf.io/z7d3y/?view_only=be48ab71ccd14da5b0413269c150d2f9">https://osf.io/z7d3y/?view_only=be48ab71ccd14da5b0413269c150d2f9</ext-link>.</p>
</sec>
<sec id="s7">
<title>Ethics Statement</title>
<p>Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>SR provided the language models. MH analyzed the data. LK checked and refined the analyses. All authors wrote the paper (major writing: MH).</p>
</sec>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>This paper was funded by a grant of the Deutsche Forschungsgemeinschaft to MH (HO 5139/2-1 and 2-2).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec> </body>
<back>
<ack><p>We like to thank Albrecht Inhoff, Arthur Jacobs, Reinhold Kliegl, and the reviewers of previous submissions for their helpful comments.</p>
</ack>

<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adelman</surname> <given-names>J. S.</given-names></name> <name><surname>Brown</surname> <given-names>G. D.</given-names></name></person-group> (<year>2008</year>). <article-title>Modeling lexical decision: The form of frequency and diversity effects</article-title>. <source>Psychol. Rev.</source> <volume>115</volume>, <fpage>214</fpage>&#x02013;<lpage>229</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.115.1.214</pub-id><pub-id pub-id-type="pmid">18211194</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anderson</surname> <given-names>J. R.</given-names></name> <name><surname>Bothell</surname> <given-names>D.</given-names></name> <name><surname>Douglass</surname> <given-names>S.</given-names></name></person-group> (<year>2004</year>). <article-title>Eye movements do not reflect retrieval processes: limits of the eye-mind hypothesis</article-title>. <source>Psychol. Sci.</source> <volume>15</volume>, <fpage>225</fpage>&#x02013;<lpage>231</lpage>. <pub-id pub-id-type="doi">10.1111/j.0956-7976.2004.00656.x</pub-id><pub-id pub-id-type="pmid">15043638</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baayen</surname> <given-names>H. B..</given-names></name></person-group> (<year>2010</year>). <article-title>Demythologizing the word frequency effect: a discriminative learning perspective</article-title>. <source>Ment. Lex.</source> <volume>5</volume>, <fpage>436</fpage>&#x02013;<lpage>461</lpage>. <pub-id pub-id-type="doi">10.1075/ml.5.3.10baa</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baayen</surname> <given-names>R. H.</given-names></name> <name><surname>Davidson</surname> <given-names>D. J.</given-names></name> <name><surname>Bates</surname> <given-names>D. M.</given-names></name></person-group> (<year>2008</year>). <article-title>Mixed-effects modeling with crossed random effects for subjects and items</article-title>. <source>J. Mem. Lang.</source> <volume>59</volume>, <fpage>390</fpage>&#x02013;<lpage>412</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2007.12.005</pub-id><pub-id pub-id-type="pmid">33635157</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Baroni</surname> <given-names>M.</given-names></name> <name><surname>Dinu</surname> <given-names>G.</given-names></name> <name><surname>Kruszewski</surname> <given-names>G.</given-names></name></person-group> (<year>2014</year>). <article-title>Don&#x00027;t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors,</article-title> in <source>Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol. 1</source> (<publisher-loc>Baltimore, MD</publisher-loc>), <fpage>238</fpage>&#x02013;<lpage>247</lpage>. <pub-id pub-id-type="doi">10.3115/v1/P14-1023</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bianchi</surname> <given-names>B.</given-names></name> <name><surname>Bengolea Monz&#x000F3;n</surname> <given-names>G.</given-names></name> <name><surname>Ferrer</surname> <given-names>L.</given-names></name> <name><surname>Fern&#x000E1;ndez Slezak</surname> <given-names>D.</given-names></name> <name><surname>Shalom</surname> <given-names>D. E.</given-names></name> <name><surname>Kamienkowski</surname> <given-names>J. E.</given-names></name></person-group> (<year>2020</year>). <article-title>Human and computer estimations of Predictability of words in written language</article-title>. <source>Sci. Rep.</source> <volume>10</volume>:<fpage>4396</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-020-61353-z</pub-id><pub-id pub-id-type="pmid">32157161</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Biemann</surname> <given-names>C.</given-names></name> <name><surname>Roos</surname> <given-names>S.</given-names></name> <name><surname>Weihe</surname> <given-names>K.</given-names></name></person-group> (<year>2012</year>). <article-title>Quantifying semantics using complex network analysis,</article-title> in <source>24th International Conference on Computational Linguistics&#x02013;Proceedings of COLING 2012: Technical Papers</source> (<publisher-loc>Mumbai</publisher-loc>), <fpage>263</fpage>&#x02013;<lpage>278</lpage>.</citation>
</ref>
<ref id="B8">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Blei</surname> <given-names>D. M.</given-names></name> <name><surname>Ng</surname> <given-names>A. Y.</given-names></name> <name><surname>Jordan</surname> <given-names>M. I.</given-names></name></person-group> (<year>2003</year>). <article-title>Latent dirichlet allocation</article-title>. <source>J. Mach. Learn. Res.</source> <volume>3</volume>, <fpage>993</fpage>&#x02013;<lpage>1022</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf?TB_iframe=true&#x00026;width=370.8&#x00026;height=658.8">https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf?TB_iframe=true&#x00026;width=370.8&#x00026;height=658.8</ext-link></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boston</surname> <given-names>M. F.</given-names></name> <name><surname>Hale</surname> <given-names>J.</given-names></name> <name><surname>Kliegl</surname> <given-names>R.</given-names></name> <name><surname>Patil</surname> <given-names>U.</given-names></name> <name><surname>Vasishth</surname> <given-names>S.</given-names></name></person-group> (<year>2008</year>). <article-title>Parsing costs as predictors of reading difficulty: an evaluation using the potsdam sentence corpus</article-title>. <source>J. Eye Move. Res.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.16910/jemr.2.1.1</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brothers</surname> <given-names>T.</given-names></name> <name><surname>Kuperberg</surname> <given-names>G. R.</given-names></name></person-group> (<year>2021</year>). <article-title>Word predictability effects are linear, not logarithmic: implications for probabilistic models of sentence comprehension</article-title>. <source>J. Mem. Lang.</source> <volume>116</volume>:<fpage>104174</fpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2020.104174</pub-id><pub-id pub-id-type="pmid">33100508</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brysbaert</surname> <given-names>M.</given-names></name> <name><surname>Buchmeier</surname> <given-names>M.</given-names></name> <name><surname>Conrad</surname> <given-names>M.</given-names></name> <name><surname>Jacobs</surname> <given-names>A. M.</given-names></name> <name><surname>B&#x000F6;lte</surname> <given-names>J.</given-names></name> <name><surname>B&#x000F6;hl</surname> <given-names>A.</given-names></name></person-group> (<year>2011</year>). <article-title>The word frequency effect: a review of recent developments and implications for the choice of frequency estimates in German</article-title>. <source>Exp. Psychol.</source> <volume>58</volume>, <fpage>412</fpage>&#x02013;<lpage>424</lpage>. <pub-id pub-id-type="doi">10.1027/1618-3169/a000123</pub-id><pub-id pub-id-type="pmid">21768069</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>S. F.</given-names></name> <name><surname>Goodman</surname> <given-names>J.</given-names></name></person-group> (<year>1999</year>). <article-title>Empirical study of smoothing techniques for language modeling</article-title>. <source>Comp. Speech Lang.</source> <volume>13</volume>, <fpage>359</fpage>&#x02013;<lpage>394</lpage>. <pub-id pub-id-type="doi">10.1006/csla.1999.0128</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deerwester</surname> <given-names>S.</given-names></name> <name><surname>Dumais</surname> <given-names>S. T.</given-names></name> <name><surname>Furnas</surname> <given-names>G. W.</given-names></name> <name><surname>Landauer</surname> <given-names>T. K.</given-names></name> <name><surname>Harshman</surname> <given-names>R.</given-names></name></person-group> (<year>1990</year>). <article-title>Indexing by latent semantic analysis</article-title>. <source>J. Am. Soc. Inform. Sci.</source> <volume>41</volume>, <fpage>391</fpage>&#x02013;<lpage>407</lpage>. <pub-id pub-id-type="doi">10.1002/(SICI)1097-4571(199009)41:6&#x0003C;391::AID-ASI1&#x0003E;3.0.CO;2-9</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Demberg</surname> <given-names>V.</given-names></name> <name><surname>Keller</surname> <given-names>F.</given-names></name></person-group> (<year>2008</year>). <article-title>Data from eye-tracking corpora as evidence for theories of syntactic processing complexity</article-title>. <source>Cognition</source> <volume>109</volume>, <fpage>193</fpage>&#x02013;<lpage>210</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2008.07.008</pub-id><pub-id pub-id-type="pmid">18930455</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Duncan</surname> <given-names>O. D..</given-names></name></person-group> (<year>1975</year>). <source>Introduction to Structural Equation Models, 1st Edn.</source> <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Academic Press</publisher-name>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elman</surname> <given-names>J. L..</given-names></name></person-group> (<year>1990</year>). <article-title>Finding structure in time</article-title>. <source>Cogn. Sci.</source> <volume>14</volume>, <fpage>179</fpage>&#x02013;<lpage>211</lpage>. <pub-id pub-id-type="doi">10.1207/s15516709cog1402_1</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elman</surname> <given-names>J. L..</given-names></name></person-group> (<year>2004</year>). <article-title>An alternative view of the mental lexicon</article-title>. <source>Trends Cogn. Sci.</source> <volume>8</volume>, <fpage>301</fpage>&#x02013;<lpage>306</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2004.05.003</pub-id><pub-id pub-id-type="pmid">15242689</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Engbert</surname> <given-names>R.</given-names></name> <name><surname>Nuthmann</surname> <given-names>A.</given-names></name> <name><surname>Richter</surname> <given-names>E. M.</given-names></name> <name><surname>Kliegl</surname> <given-names>R.</given-names></name></person-group> (<year>2005</year>). <article-title>SWIFT: a dynamical model of saccade generation during reading</article-title>. <source>Psychol. Rev.</source> <volume>112</volume>, <fpage>777</fpage>&#x02013;<lpage>813</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.112.4.777</pub-id><pub-id pub-id-type="pmid">16262468</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Engelmann</surname> <given-names>F.</given-names></name> <name><surname>Vasishth</surname> <given-names>S.</given-names></name> <name><surname>Engbert</surname> <given-names>R.</given-names></name> <name><surname>Kliegl</surname> <given-names>R.</given-names></name></person-group> (<year>2013</year>). <article-title>A framework for modeling the interaction of syntactic processing and eye movement control</article-title>. <source>Top. Cogn. Sci.</source> <volume>5</volume>, <fpage>452</fpage>&#x02013;<lpage>474</lpage>. <pub-id pub-id-type="doi">10.1111/tops.12026</pub-id><pub-id pub-id-type="pmid">23681560</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Feigl</surname> <given-names>H..</given-names></name></person-group> (<year>1945</year>). <article-title>Rejoinders and second thoughts (Symposium on operationism)</article-title>. <source>Psychol. Rev.</source> <volume>52</volume>, <fpage>284</fpage>&#x02013;<lpage>288</lpage>. <pub-id pub-id-type="doi">10.1037/h0063275</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Frank</surname> <given-names>S..</given-names></name></person-group> (<year>2009</year>). <article-title>Surprisal-based comparison between a symbolic and a connectionist model of sentence processing,</article-title> in <source>Proceedings of the 31st Annual Conference of the Cognitive Science Society</source> (<publisher-loc>Amsterdam</publisher-loc>), <fpage>1139</fpage>&#x02013;<lpage>1144</lpage>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frank</surname> <given-names>S. L.</given-names></name> <name><surname>Bod</surname> <given-names>R.</given-names></name></person-group> (<year>2011</year>). <article-title>Insensitivity of the human sentence-processing system to hierarchical structure</article-title>. <source>Psychol. Sci.</source> <volume>22</volume>, <fpage>829</fpage>&#x02013;<lpage>834</lpage>. <pub-id pub-id-type="doi">10.1177/0956797611409589</pub-id><pub-id pub-id-type="pmid">21586764</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friederici</surname> <given-names>A. D..</given-names></name></person-group> (<year>2002</year>). <article-title>Towards a neural basis of auditory sentence processing</article-title>. <source>Trends Cogn. Sci.</source> <volume>6</volume>, <fpage>78</fpage>&#x02013;<lpage>84</lpage>. <pub-id pub-id-type="doi">10.1016/S1364-6613(00)01839-8</pub-id><pub-id pub-id-type="pmid">15866191</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frisson</surname> <given-names>S.</given-names></name> <name><surname>Rayner</surname> <given-names>K.</given-names></name> <name><surname>Pickering</surname> <given-names>M. J.</given-names></name></person-group> (<year>2005</year>). <article-title>Effects of contextual predictability and transitional probability on eye movements during reading</article-title>. <source>J. Exp. Psychol. Learn. Mem. Cogn.</source> <volume>31</volume>, <fpage>862</fpage>&#x02013;<lpage>877</lpage>. <pub-id pub-id-type="doi">10.1037/0278-7393.31.5.862</pub-id><pub-id pub-id-type="pmid">16248738</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Goldhahn</surname> <given-names>D.</given-names></name> <name><surname>Eckart</surname> <given-names>T.</given-names></name> <name><surname>Quasthoff</surname> <given-names>U.</given-names></name></person-group> (<year>2012</year>). <article-title>Building large monolingual dictionaries at the leipzig corpora collection: from 100 to 200 languages,</article-title> in <source>Proceedings of the 8th International Conference on Language Resources and Evaluation</source> (<publisher-loc>Istanbul</publisher-loc>), <fpage>759</fpage>&#x02013;<lpage>765</lpage>.</citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Griffiths</surname> <given-names>T. L.</given-names></name> <name><surname>Steyvers</surname> <given-names>M.</given-names></name> <name><surname>Tenenbaum</surname> <given-names>J. B.</given-names></name></person-group> (<year>2007</year>). <article-title>Topics in semantic representation</article-title>. <source>Psychol. Rev.</source> <volume>114</volume>, <fpage>211</fpage>&#x02013;<lpage>244</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.114.2.211</pub-id><pub-id pub-id-type="pmid">17500626</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hastie</surname> <given-names>T.</given-names></name> <name><surname>Tibshirani</surname> <given-names>R.</given-names></name></person-group> (<year>1990</year>). <article-title>Exploring the nature of covariate effects in the proportional hazards model</article-title>. <source>Int. Biometr. Soc.</source> <volume>46</volume>, <fpage>1005</fpage>&#x02013;<lpage>1016</lpage>. <pub-id pub-id-type="doi">10.2307/2532444</pub-id><pub-id pub-id-type="pmid">1964808</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hempel</surname> <given-names>C. G.</given-names></name> <name><surname>Oppenheim</surname> <given-names>P.</given-names></name></person-group> (<year>1948</year>). <article-title>Studies on the logic of explanation</article-title>. <source>Philos. Sci.</source> <volume>15</volume>, <fpage>135</fpage>&#x02013;<lpage>175</lpage>. <pub-id pub-id-type="doi">10.1086/286983</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hofmann</surname> <given-names>M. J.</given-names></name> <name><surname>Biemann</surname> <given-names>C.</given-names></name> <name><surname>Remus</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>Benchmarking n-grams, topic models and recurrent neural networks by cloze completions, EEGs and eye movements,</article-title> in <source>Cognitive Approach to Natural Language Processing</source>, eds <person-group person-group-type="editor"><name><surname>Sharp</surname> <given-names>B.</given-names></name> <name><surname>Sedes</surname> <given-names>F.</given-names></name> <name><surname>Lubaszewsk</surname> <given-names>W.</given-names></name></person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>ISTE Press Ltd, Elsevier</publisher-name>), <fpage>197</fpage>&#x02013;<lpage>215</lpage>. <pub-id pub-id-type="doi">10.1016/B978-1-78548-253-3.50010-X</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hofmann</surname> <given-names>M. J.</given-names></name> <name><surname>Biemann</surname> <given-names>C.</given-names></name> <name><surname>Westbury</surname> <given-names>C. F.</given-names></name> <name><surname>Murusidze</surname> <given-names>M.</given-names></name> <name><surname>Conrad</surname> <given-names>M.</given-names></name> <name><surname>Jacobs</surname> <given-names>A. M.</given-names></name></person-group> (<year>2018</year>). <article-title>Simple co-occurrence statistics reproducibly predict association ratings</article-title>. <source>Cogn. Sci.</source> <volume>42</volume>, <fpage>2287</fpage>&#x02013;<lpage>2312</lpage>. <pub-id pub-id-type="doi">10.1111/cogs.12662</pub-id><pub-id pub-id-type="pmid">30098213</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hofmann</surname> <given-names>M. J.</given-names></name> <name><surname>Kuchinke</surname> <given-names>L.</given-names></name> <name><surname>Biemann</surname> <given-names>C.</given-names></name> <name><surname>Tamm</surname> <given-names>S.</given-names></name> <name><surname>Jacobs</surname> <given-names>A. M.</given-names></name></person-group> (<year>2011</year>). <article-title>Remembering words in context as predicted by an associative read-out model</article-title>. <source>Front. Psychol.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2011.00252</pub-id><pub-id pub-id-type="pmid">22007183</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hofmann</surname> <given-names>M. J.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>L.</given-names></name> <name><surname>R&#x000F6;lke</surname> <given-names>A.</given-names></name> <name><surname>Radach</surname> <given-names>R.</given-names></name> <name><surname>Biemann</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>Individual corpora predict fast memory retrieval during reading,</article-title> in <source>Proceedings of the 6th Workshop on Cognitive Aspects of the Lexicon (CogALex-VI)</source> (<publisher-loc>Barcelona</publisher-loc>).</citation>
</ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Inhoff</surname> <given-names>A. W.</given-names></name> <name><surname>Radach</surname> <given-names>R.</given-names></name></person-group> (<year>1998</year>). <article-title>Definition and computation of oculomotor measures in the study of cognitive processes,</article-title> in <source>Eye Guidance in Reading and Scene Perception</source> (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>Elsevier Science Ltd</publisher-name>), <fpage>29</fpage>&#x02013;<lpage>53</lpage>. <pub-id pub-id-type="doi">10.1016/B978-008043361-5/50003-1</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Just</surname> <given-names>M. A.</given-names></name> <name><surname>Carpenter</surname> <given-names>P. A.</given-names></name></person-group> (<year>1984</year>). <article-title>Using eye fixations to study reading comprehension,</article-title> in <source>New Methods in Reading Comprehension Research</source>, eds <person-group person-group-type="editor"><name><surname>Kieras</surname> <given-names>D. E.</given-names></name> <name><surname>Just</surname> <given-names>M. A.</given-names></name></person-group> (<publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Erlbaum</publisher-name>), <fpage>151</fpage>&#x02013;<lpage>169</lpage>.</citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kennedy</surname> <given-names>A.</given-names></name> <name><surname>Pynte</surname> <given-names>J.</given-names></name> <name><surname>Murray</surname> <given-names>W. S.</given-names></name> <name><surname>Paul</surname> <given-names>S. A.</given-names></name></person-group> (<year>2013</year>). <article-title>Frequency and predictability effects in the dundee corpus: an eye movement analysis</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>66</volume>, <fpage>601</fpage>&#x02013;<lpage>618</lpage>. <pub-id pub-id-type="doi">10.1080/17470218.2012.676054</pub-id><pub-id pub-id-type="pmid">22643118</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kintsch</surname> <given-names>W.</given-names></name> <name><surname>Mangalath</surname> <given-names>P.</given-names></name></person-group> (<year>2011</year>). <article-title>The construction of meaning</article-title>. <source>Top. Cogn. Sci.</source> <volume>3</volume>, <fpage>346</fpage>&#x02013;<lpage>370</lpage>. <pub-id pub-id-type="doi">10.1111/j.1756-8765.2010.01107.x</pub-id><pub-id pub-id-type="pmid">25164299</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kliegl</surname> <given-names>R.</given-names></name> <name><surname>Grabner</surname> <given-names>E.</given-names></name> <name><surname>Rolfs</surname> <given-names>M.</given-names></name> <name><surname>Engbert</surname> <given-names>R.</given-names></name></person-group> (<year>2004</year>). <article-title>Length, frequency, and predictability effects of words on eye movements in reading</article-title>. <source>Euro. J. Cogn. Psychol.</source> <volume>16</volume>, <fpage>262</fpage>&#x02013;<lpage>284</lpage>. <pub-id pub-id-type="doi">10.1080/09541440340000213</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kliegl</surname> <given-names>R.</given-names></name> <name><surname>Nuthmann</surname> <given-names>A.</given-names></name> <name><surname>Engbert</surname> <given-names>R.</given-names></name></person-group> (<year>2006</year>). <article-title>Tracking the mind during reading: the influence of past, present, and future words on fixation durations</article-title>. <source>J. Exp. Psychol. Gen.</source> <volume>135</volume>, <fpage>12</fpage>&#x02013;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1037/0096-3445.135.1.12</pub-id><pub-id pub-id-type="pmid">16478314</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kneser</surname> <given-names>R.</given-names></name> <name><surname>Ney</surname> <given-names>H.</given-names></name></person-group> (<year>1995</year>). <article-title>Improved backing-off for m-gram language modeling,</article-title> in <source>Proceeding IEEE International Conference on Acoustics, Speech and Signal Processing</source> (<publisher-loc>Detroit, MI</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>181</fpage>&#x02013;<lpage>184</lpage>.</citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kutas</surname> <given-names>M.</given-names></name> <name><surname>Federmeier</surname> <given-names>K. D.</given-names></name></person-group> (<year>2011</year>). <article-title>Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP)</article-title>. <source>Annu. Rev. Psychol.</source> <volume>62</volume>, <fpage>621</fpage>&#x02013;<lpage>647</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.psych.093008.131123</pub-id><pub-id pub-id-type="pmid">20809790</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Landauer</surname> <given-names>T. K.</given-names></name> <name><surname>Dumais</surname> <given-names>S. T.</given-names></name></person-group> (<year>1997</year>). <article-title>A solution to platos problem: the latent semantic analysis theory of acquisition, induction and representation of knowledge</article-title>. <source>Psychol. Rev.</source> <volume>104</volume>, <fpage>211</fpage>&#x02013;<lpage>240</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.104.2.211</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep learning</article-title>. <source>Nature</source> <volume>521</volume>, <fpage>436</fpage>&#x02013;<lpage>444</lpage>. <pub-id pub-id-type="doi">10.1038/nature14539</pub-id><pub-id pub-id-type="pmid">26017442</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lopukhina</surname> <given-names>A.</given-names></name> <name><surname>Lopukhin</surname> <given-names>K.</given-names></name> <name><surname>Laurinavichyute</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>Morphosyntactic but not lexical corpus-based probabilities can substitute for cloze probabilities in reading experiments</article-title>. <source>PLoS ONE</source> <volume>16</volume>:<fpage>e246133</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0246133</pub-id><pub-id pub-id-type="pmid">33508029</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luke</surname> <given-names>S. G.</given-names></name> <name><surname>Christianson</surname> <given-names>K.</given-names></name></person-group> (<year>2016</year>). <article-title>Limits on lexical prediction during reading</article-title>. <source>Cogn. Psychol.</source> <volume>88</volume>, <fpage>22</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1016/j.cogpsych.2016.06.002</pub-id><pub-id pub-id-type="pmid">27376659</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mandera</surname> <given-names>P.</given-names></name> <name><surname>Keuleers</surname> <given-names>E.</given-names></name> <name><surname>Brysbaert</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: a review and empirical validation</article-title>. <source>J. Mem. Lang.</source> <volume>92</volume>, <fpage>57</fpage>&#x02013;<lpage>78</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2016.04.001</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Manning</surname> <given-names>C. D.</given-names></name> <name><surname>Sch&#x000FC;tze</surname> <given-names>H.</given-names></name></person-group> (<year>1999</year>). <source>Foundations of Statistical Natural Language Processing</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>The MIT Press</publisher-name>.</citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McClelland</surname> <given-names>J. L.</given-names></name> <name><surname>Rogers</surname> <given-names>T. T.</given-names></name></person-group> (<year>2003</year>). <article-title>The parallel distributed processing approach to semantic cognition</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>4</volume>, <fpage>310</fpage>&#x02013;<lpage>322</lpage>. <pub-id pub-id-type="doi">10.1038/nrn1076</pub-id><pub-id pub-id-type="pmid">12671647</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McDonald</surname> <given-names>S. A.</given-names></name> <name><surname>Shillcock</surname> <given-names>R. C.</given-names></name></person-group> (<year>2003a</year>). <article-title>Eye movements reveal the on-line computation of lexical probabilities during reading</article-title>. <source>Psychol. Sci.</source> <volume>14</volume>, <fpage>648</fpage>&#x02013;<lpage>652</lpage>. <pub-id pub-id-type="doi">10.1046/j.0956-7976.2003.psci_1480.x</pub-id><pub-id pub-id-type="pmid">14629701</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McDonald</surname> <given-names>S. A.</given-names></name> <name><surname>Shillcock</surname> <given-names>R. C.</given-names></name></person-group> (<year>2003b</year>). <article-title>Low-level predictive inference in reading: the influence of transitional probabilities on eye movements</article-title>. <source>Vision Res.</source> <volume>43</volume>, <fpage>1735</fpage>&#x02013;<lpage>1751</lpage>. <pub-id pub-id-type="doi">10.1016/S0042-6989(03)00237-2</pub-id><pub-id pub-id-type="pmid">12818344</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mikolov</surname> <given-names>T..</given-names></name></person-group> (<year>2012</year>). <source>Statistical Language Models Based on Neural Networks.</source> (PhD thesis), Miyazaki: Brno University of Technology, Brno (Czechia).</citation>
</ref>
<ref id="B51">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Mikolov</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>K.</given-names></name> <name><surname>Corrado</surname> <given-names>G.</given-names></name> <name><surname>Dean</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <source>Efficient Estimation of Word Representations in Vector Space</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1301.3781">https://arxiv.org/abs/1301.3781</ext-link>.</citation>
</ref>
<ref id="B52">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mikolov</surname> <given-names>T.</given-names></name> <name><surname>Grave</surname> <given-names>&#x000C9;.</given-names></name> <name><surname>Bojanowski</surname> <given-names>P.</given-names></name> <name><surname>Puhrsch</surname> <given-names>C.</given-names></name> <name><surname>Joulin</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Advances in pre-training distributed word representations,</article-title> in <source>Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</source>.</citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>New</surname> <given-names>B.</given-names></name> <name><surname>Ferrand</surname> <given-names>L.</given-names></name> <name><surname>Pallier</surname> <given-names>C.</given-names></name> <name><surname>Brysbaert</surname> <given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>Reexamining the word length effect in visual word recognition: new evidence from the english lexicon project</article-title>. <source>Psychon. Bull. Rev.</source> <volume>13</volume>, <fpage>45</fpage>&#x02013;<lpage>52</lpage>. <pub-id pub-id-type="doi">10.3758/BF03193811</pub-id><pub-id pub-id-type="pmid">16724767</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nuthmann</surname> <given-names>A.</given-names></name> <name><surname>Engbert</surname> <given-names>R.</given-names></name> <name><surname>Kliegl</surname> <given-names>R.</given-names></name></person-group> (<year>2005</year>). <article-title>Mislocated fixations during reading and the inverted optimal viewing position effect</article-title>. <source>Vision Res.</source> <volume>45</volume>, <fpage>2201</fpage>&#x02013;<lpage>2217</lpage>. <pub-id pub-id-type="doi">10.1016/j.visres.2005.02.014</pub-id><pub-id pub-id-type="pmid">15924936</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ong</surname> <given-names>J. K. Y.</given-names></name> <name><surname>Kliegl</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>Conditional co-occurrence probability acts like frequency in predicting fixation durations</article-title>. <source>J. Eye Mov. Res.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.16910/jemr.2.1.3</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x00027;Regan</surname> <given-names>J. K.</given-names></name> <name><surname>Jacobs</surname> <given-names>A. M.</given-names></name></person-group> (<year>1992</year>). <article-title>Optimal viewing position effect in word recognition: a challenge to current theory</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform.</source> <volume>18</volume>, <fpage>185</fpage>&#x02013;<lpage>197</lpage>. <pub-id pub-id-type="doi">10.1037/0096-1523.18.1.185</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pad&#x000F3;</surname> <given-names>S.</given-names></name> <name><surname>Lapata</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <article-title>Dependency-based construction of semantic space models</article-title>. <source>Comput. Lingu.</source> <volume>33</volume>, <fpage>161</fpage>&#x02013;<lpage>199</lpage>. <pub-id pub-id-type="doi">10.1162/coli.2007.33.2.161</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paller</surname> <given-names>K. A.</given-names></name> <name><surname>Wagner</surname> <given-names>A. D.</given-names></name></person-group> (<year>2002</year>). <article-title>Observing the transformation of experience into memory</article-title>. <source>Trends Cogn. Sci.</source> <volume>6</volume>, <fpage>93</fpage>&#x02013;<lpage>102</lpage>. <pub-id pub-id-type="doi">10.1016/S1364-6613(00)01845-3</pub-id><pub-id pub-id-type="pmid">15866193</pub-id></citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pauls</surname> <given-names>A.</given-names></name> <name><surname>Klein</surname> <given-names>D.</given-names></name></person-group> (<year>2011</year>). <article-title>Faster and smaller n-gram language models,</article-title> in <source>ACL-HLT 2011&#x02013;Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Vol. 1</source>, <publisher-loc>Portland, OR</publisher-loc>, <fpage>258</fpage>&#x02013;<lpage>267</lpage>.</citation>
</ref>
<ref id="B60">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Phan</surname> <given-names>X.-H.</given-names></name> <name><surname>Nguyen</surname> <given-names>C.-T.</given-names></name></person-group> (<year>2007</year>). <source>GibbsLDA&#x0002B;&#x0002B;: A C/C&#x0002B;&#x0002B; Implementation of Latent Dirichlet Allocation (LDA)</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://gibbslda.sourceforge.net">http://gibbslda.sourceforge.net</ext-link></citation>
</ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pynte</surname> <given-names>J.</given-names></name> <name><surname>New</surname> <given-names>B.</given-names></name> <name><surname>Kennedy</surname> <given-names>A.</given-names></name></person-group> (<year>2008a</year>). <article-title>A multiple regression analysis of syntactic and semantic influences in reading normal text</article-title>. <source>J. Eye Mov. Res.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.16910/jemr.2.1.4</pub-id><pub-id pub-id-type="pmid">18701125</pub-id></citation></ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pynte</surname> <given-names>J.</given-names></name> <name><surname>New</surname> <given-names>B.</given-names></name> <name><surname>Kennedy</surname> <given-names>A.</given-names></name></person-group> (<year>2008b</year>). <article-title>On-line contextual influences during reading normal text: a multiple-regression analysis</article-title>. <source>Vision Res.</source> <volume>48</volume>, <fpage>2172</fpage>&#x02013;<lpage>2183</lpage>. <pub-id pub-id-type="doi">10.1016/j.visres.2008.02.004</pub-id><pub-id pub-id-type="pmid">18701125</pub-id></citation></ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Radach</surname> <given-names>R.</given-names></name> <name><surname>Inhoff</surname> <given-names>A. W.</given-names></name> <name><surname>Glover</surname> <given-names>L.</given-names></name> <name><surname>Vorstius</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>Contextual constraint and N &#x0002B; 2 preview effects in reading</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>66</volume>, <fpage>619</fpage>&#x02013;<lpage>633</lpage>. <pub-id pub-id-type="doi">10.1080/17470218.2012.761256</pub-id><pub-id pub-id-type="pmid">23394582</pub-id></citation></ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Radach</surname> <given-names>R.</given-names></name> <name><surname>Kennedy</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Eye movements in reading: some theoretical context</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>66</volume>, <fpage>429</fpage>&#x02013;<lpage>452</lpage>. <pub-id pub-id-type="doi">10.1080/17470218.2012.750676</pub-id><pub-id pub-id-type="pmid">23289943</pub-id></citation></ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rayner</surname> <given-names>K..</given-names></name></person-group> (<year>1998</year>). <article-title>Eye movements in reading and information processing: 20 years of research</article-title>. <source>Psychol. Bull.</source> <volume>124</volume>, <fpage>372</fpage>&#x02013;<lpage>422</lpage>. <pub-id pub-id-type="doi">10.1037/0033-2909.124.3.372</pub-id><pub-id pub-id-type="pmid">9849112</pub-id></citation></ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reichle</surname> <given-names>E. D.</given-names></name> <name><surname>Rayner</surname> <given-names>K.</given-names></name> <name><surname>Pollatsek</surname> <given-names>A.</given-names></name></person-group> (<year>2003</year>). <article-title>The E-Z reader model of eye-movement control in reading: comparisons to other models</article-title>. <source>Behav. Brain Sci.</source> <volume>26</volume>, <fpage>445</fpage>&#x02013;<lpage>476</lpage>. <pub-id pub-id-type="doi">10.1017/S0140525X03000104</pub-id><pub-id pub-id-type="pmid">15067951</pub-id></citation></ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reilly</surname> <given-names>R. G.</given-names></name> <name><surname>Radach</surname> <given-names>R.</given-names></name></person-group> (<year>2006</year>). <article-title>Some empirical tests of an interactive activation model of eye movement control in reading</article-title>. <source>Cogn. Syst. Res.</source> <volume>7</volume>, <fpage>34</fpage>&#x02013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1016/j.cogsys.2005.07.006</pub-id></citation>
</ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schilling</surname> <given-names>H. H.</given-names></name> <name><surname>Rayner</surname> <given-names>K.</given-names></name> <name><surname>Chumbley</surname> <given-names>J. I.</given-names></name></person-group> (<year>1998</year>). <article-title>Comparing naming, lexical decision, and eye fixation times: word frequency effects and individual differences</article-title>. <source>Mem. Cogn.</source> <volume>26</volume>, <fpage>1270</fpage>&#x02013;<lpage>1281</lpage>. <pub-id pub-id-type="doi">10.3758/BF03201199</pub-id><pub-id pub-id-type="pmid">9847550</pub-id></citation></ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seidenberg</surname> <given-names>M. S.</given-names></name> <name><surname>McClelland</surname> <given-names>J. L.</given-names></name></person-group> (<year>1989</year>). <article-title>A distributed, developmental model of word recognition and naming</article-title>. <source>Psychol. Rev.</source> <volume>96</volume>, <fpage>523</fpage>&#x02013;<lpage>568</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.96.4.523</pub-id><pub-id pub-id-type="pmid">2798649</pub-id></citation></ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sereno</surname> <given-names>S. C.</given-names></name> <name><surname>Pacht</surname> <given-names>J. M.</given-names></name> <name><surname>Rayner</surname> <given-names>K.</given-names></name></person-group> (<year>1992</year>). <article-title>The effect of meaning frequency on processing lexically ambiguous words: evidence from eye fixations</article-title>. <source>Psychol. Sci.</source> <volume>3</volume>, <fpage>296</fpage>&#x02013;<lpage>300</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9280.1992.tb00676.x</pub-id></citation>
</ref>
<ref id="B71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shaoul</surname> <given-names>C.</given-names></name> <name><surname>Baayen</surname> <given-names>R. H.</given-names></name> <name><surname>Westbury</surname> <given-names>C. F.</given-names></name></person-group> (<year>2014</year>). <article-title>N -gram probability effects in a cloze task</article-title>. <source>Ment. Lex.</source> <volume>9</volume>, <fpage>437</fpage>&#x02013;<lpage>472</lpage>. <pub-id pub-id-type="doi">10.1075/ml.9.3.04sha</pub-id></citation>
</ref>
<ref id="B72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>N. J.</given-names></name> <name><surname>Levy</surname> <given-names>R.</given-names></name></person-group> (<year>2013</year>). <article-title>The effect of word predictability on reading time is logarithmic</article-title>. <source>Cognition</source> <volume>128</volume>, <fpage>302</fpage>&#x02013;<lpage>319</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2013.02.013</pub-id><pub-id pub-id-type="pmid">23747651</pub-id></citation></ref>
<ref id="B73">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Snell</surname> <given-names>J.</given-names></name> <name><surname>van Leipsig</surname> <given-names>S.</given-names></name> <name><surname>Grainger</surname> <given-names>J.</given-names></name> <name><surname>Meeter</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>OB1-reader: a model of word recognition and eye movements in text reading</article-title>. <source>Psychol. Rev.</source> <volume>125</volume>, <fpage>969</fpage>&#x02013;<lpage>984</lpage>. <pub-id pub-id-type="doi">10.1037/rev0000119</pub-id><pub-id pub-id-type="pmid">30080066</pub-id></citation></ref>
<ref id="B74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spieler</surname> <given-names>D. H.</given-names></name> <name><surname>Balota</surname> <given-names>D.</given-names></name></person-group> (<year>1997</year>). <article-title>Bringing computational models of word naming down to the item level</article-title>. <source>Psychol. Sci.</source> <volume>8</volume>, <fpage>411</fpage>&#x02013;<lpage>416</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9280.1997.tb00453.x</pub-id></citation>
</ref>
<ref id="B75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Staub</surname> <given-names>A..</given-names></name></person-group> (<year>2015</year>). <article-title>The effect of lexical predictability on eye movements in reading: critical review and theoretical interpretation</article-title>. <source>Lang. Linguist. Compass</source> <volume>9</volume>, <fpage>311</fpage>&#x02013;<lpage>327</lpage>. <pub-id pub-id-type="doi">10.1111/lnc3.12151</pub-id></citation>
</ref>
<ref id="B76">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Staub</surname> <given-names>A.</given-names></name> <name><surname>Grant</surname> <given-names>M.</given-names></name> <name><surname>Astheimer</surname> <given-names>L.</given-names></name> <name><surname>Cohen</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>The influence of cloze probability and item constraint on cloze task response time</article-title>. <source>J. Mem. Lang.</source> <volume>82</volume>, <fpage>1</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2015.02.004</pub-id></citation>
</ref>
<ref id="B77">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taylor</surname> <given-names>W. L..</given-names></name></person-group> (<year>1953</year>). <article-title>&#x0201C;Cloze&#x0201D; procedure: A new tool for measuring readability</article-title>. <source>J. Q.</source> <volume>30</volume>, <fpage>415</fpage>&#x02013;<lpage>433</lpage>. <pub-id pub-id-type="doi">10.1177/107769905303000401</pub-id></citation>
</ref>
<ref id="B78">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vitu</surname> <given-names>F.</given-names></name> <name><surname>McConkie</surname> <given-names>G. W.</given-names></name> <name><surname>Kerr</surname> <given-names>P.</given-names></name> <name><surname>O&#x00027;Regan</surname> <given-names>J. K.</given-names></name></person-group> (<year>2001</year>). <article-title>Fixation location effects on fixation durations during reading: an inverted optimal viewing position effect</article-title>. <source>Vision Res.</source> <volume>41</volume>, <fpage>3513</fpage>&#x02013;<lpage>3533</lpage>. <pub-id pub-id-type="doi">10.1016/S0042-6989(01)00166-3</pub-id><pub-id pub-id-type="pmid">11718792</pub-id></citation></ref>
<ref id="B79">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wagenmakers</surname> <given-names>E.-J.</given-names></name> <name><surname>Wetzels</surname> <given-names>R.</given-names></name> <name><surname>Borsboom</surname> <given-names>D.</given-names></name> <name><surname>van der Maas</surname> <given-names>H. L. J.</given-names></name></person-group> (<year>2011</year>). <article-title>Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011)</article-title>. <source>J. Pers. Soc. Psychol.</source> <volume>100</volume>, <fpage>426</fpage>&#x02013;<lpage>432</lpage>. <pub-id pub-id-type="doi">10.1037/a0022790</pub-id><pub-id pub-id-type="pmid">21280965</pub-id></citation></ref>
<ref id="B80">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Pomplun</surname> <given-names>M.</given-names></name> <name><surname>Chen</surname> <given-names>M.</given-names></name> <name><surname>Ko</surname> <given-names>H.</given-names></name> <name><surname>Rayner</surname> <given-names>K.</given-names></name></person-group> (<year>2010</year>). <article-title>Estimating the effect of word predictability on eye movements in Chinese reading using latent semantic analysis and transitional probability</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>63</volume>, <fpage>37</fpage>&#x02013;<lpage>41</lpage>. <pub-id pub-id-type="doi">10.1080/17470210903380814</pub-id><pub-id pub-id-type="pmid">19998069</pub-id></citation></ref>
<ref id="B81">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Westbury</surname> <given-names>C..</given-names></name></person-group> (<year>2016</year>). <article-title>Pay no attention to that man behind the curtain</article-title>. <source>Ment. Lex.</source> <volume>11</volume>, <fpage>350</fpage>&#x02013;<lpage>374</lpage>. <pub-id pub-id-type="doi">10.1075/ml.11.3.02wes</pub-id><pub-id pub-id-type="pmid">32552478</pub-id></citation></ref>
<ref id="B82">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Wilcox</surname> <given-names>E. G.</given-names></name> <name><surname>Gauthier</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Qian</surname> <given-names>P.</given-names></name> <name><surname>Levy</surname> <given-names>R.</given-names></name></person-group> (<year>2020</year>). <source>On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2006.01912">https://arxiv.org/abs/2006.01912</ext-link></citation>
</ref>
<ref id="B83">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wood</surname> <given-names>S. N..</given-names></name></person-group> (<year>2017</year>). <source>Generalized Additive Models: An Introduction With R.</source> <publisher-loc>Boca Raton, FL</publisher-loc>: <publisher-name>CRC press</publisher-name>. <pub-id pub-id-type="doi">10.1201/9781315370279</pub-id></citation>
</ref>
<ref id="B84">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>Z.</given-names></name> <name><surname>Rincon</surname> <given-names>D.</given-names></name> <name><surname>Gu</surname> <given-names>Q.</given-names></name> <name><surname>Christofides</surname> <given-names>P. D.</given-names></name></person-group> (<year>2021</year>). <article-title>Statistical machine learning in model predictive control of nonlinear processes</article-title>. <source>Mathematics</source> <volume>9</volume>, <fpage>1</fpage>&#x02013;<lpage>37</lpage>. <pub-id pub-id-type="doi">10.3390/math9161912</pub-id><pub-id pub-id-type="pmid">22336388</pub-id></citation></ref>
</ref-list>

<fn-group>
<fn id="fn0001"><p><sup>1</sup><ext-link ext-link-type="uri" xlink:href="http://www.opensubtitles.org">www.opensubtitles.org</ext-link></p></fn>
<fn id="fn0002"><p><sup>2</sup><ext-link ext-link-type="uri" xlink:href="https://code.google.com/p/berkeleylm/">https://code.google.com/p/berkeleylm/</ext-link></p></fn>
<fn id="fn0003"><p><sup>3</sup><ext-link ext-link-type="uri" xlink:href="http://gibbslda.sourceforge.net/">http://gibbslda.sourceforge.net/</ext-link></p></fn>
<fn id="fn0004"><p><sup>4</sup><ext-link ext-link-type="uri" xlink:href="https://github.com/yandex/faster-rnnlm">https://github.com/yandex/faster-rnnlm</ext-link></p></fn>
<fn id="fn0005"><p><sup>5</sup><ext-link ext-link-type="uri" xlink:href="https://clarinoai.informatik.uni-leipzig.de/fedora/objects/mrr:11022000000001F2FB/datastreams/EngelmannVasishthEngbertKliegl2013_1.0/content">https://clarinoai.informatik.uni-leipzig.de/fedora/objects/mrr:11022000000001F2FB/datastreams/EngelmannVasishthEngbertKliegl2013_1.0/content</ext-link></p></fn>
<fn id="fn0006"><p><sup>6</sup><ext-link ext-link-type="uri" xlink:href="http://www.corpora.uni-leipzig.de/en?corpusId=deu_newscrawl-public_201">http://www.corpora.uni-leipzig.de/en?corpusId=deu_newscrawl-public_201</ext-link></p></fn>
</fn-group>

</back>
</article>