<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Syst. Neurosci.</journal-id>
<journal-title>Frontiers in Systems Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Syst. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5137</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnsys.2022.865453</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Spatiotemporal Signatures of Surprise Captured by Magnetoencephalography</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Mousavi</surname> <given-names>Zahra</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/1576423/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Kiani</surname> <given-names>Mohammad Mahdi</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/1392197/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Aghajan</surname> <given-names>Hamid</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1576913/overview"/>
</contrib>
</contrib-group>
<aff><institution>Department of Electrical Engineering, Sharif University of Technology</institution>, <addr-line>Tehran</addr-line>, <country>Iran</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Iiro P. J&#x00E4;&#x00E4;skel&#x00E4;inen, Aalto University, Finland</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Kaisu Lankinen, Massachusetts General Hospital and Harvard Medical School, United States; Seppo P. Ahlfors, Massachusetts General Hospital and Harvard Medical School, United States</p></fn>
<corresp id="c001">&#x002A;Correspondence: Hamid Aghajan, <email>aghajan@ee.sharif.edu</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>06</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>16</volume>
<elocation-id>865453</elocation-id>
<history>
<date date-type="received">
<day>29</day>
<month>01</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>05</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2022 Mousavi, Kiani and Aghajan.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Mousavi, Kiani and Aghajan</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Surprise and social influence are linked through several neuropsychological mechanisms. By garnering attention, causing arousal, and motivating engagement, surprise provides a context for effective or durable social influence. Attention to a surprising event motivates the formation of an explanation or updating of models, while high arousal experiences due to surprise promote memory formation. They both encourage engagement with the surprising event through efforts aimed at understanding the situation. By affecting the behavior of the individual or a social group <italic>via</italic> setting an attractive engagement context, surprise plays an important role in shaping personal and social change. Surprise is an outcome of the brain&#x2019;s function in constantly anticipating the future of sensory inputs based on past experiences. When new sensory data is different from the brain&#x2019;s predictions shaped by recent trends, distinct neural signals are generated to report this surprise. As a quantitative approach to modeling the generation of brain surprise, input stimuli containing surprising elements are employed in experiments such as oddball tasks during which brain activity is recorded. Although surprise has been well characterized in many studies, an information-theoretical model to describe and predict the surprise level of an external stimulus in the recorded MEG data has not been reported to date, and setting forth such a model is the main objective of this paper. Through mining trial-by-trial MEG data in an oddball task according to theoretical definitions of surprise, the proposed surprise decoding model employs the entire epoch of the brain response to a stimulus to measure surprise and assesses which collection of temporal/spatial components in the recorded data can provide optimal power for describing the brain&#x2019;s surprise. 
We considered three different theoretical formulations for surprise assuming the brain acts as an ideal observer that calculates transition probabilities to estimate the generative distribution of the input. We found that middle temporal components and the right and left fronto-central regions offer the strongest power for decoding surprise. Our findings provide a practical and rigorous method for measuring the brain&#x2019;s surprise, which can be employed in conjunction with behavioral data to evaluate the interactive and social effects of surprising events.</p>
</abstract>
<kwd-group>
<kwd>brain surprise</kwd>
<kwd>shift in belief</kwd>
<kwd>surprise decoder</kwd>
<kwd>oddball task</kwd>
<kwd>magnetoencephalography</kwd>
<kwd>decoding power</kwd>
<kwd>temporal/spatial MEG components</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="4"/>
<equation-count count="4"/>
<ref-count count="93"/>
<page-count count="15"/>
<word-count count="11316"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>Introduction</title>
<p>The predictive coding framework (<xref ref-type="bibr" rid="B72">Rao and Ballard, 1999</xref>) postulates that the brain is constantly predicting its incoming sensory input. Past inputs are used by the brain to form prior knowledge while receiving the most recent input leads to updating of this belief in the Bayesian brain model (<xref ref-type="bibr" rid="B23">Friston, 2005</xref>; <xref ref-type="bibr" rid="B20">Doya et al., 2007</xref>; for a review see <xref ref-type="bibr" rid="B47">Kok and de Lange, 2015</xref>). An input different from what the brain has predicted will be surprising in that it generates a form of response measurable by brain imaging techniques. This surprise (or prediction error) has been quantified in the literature based on the expectation of a near-optimal observer who attempts to estimate the generative distribution of the input (<xref ref-type="bibr" rid="B83">Shannon, 1948</xref>; <xref ref-type="bibr" rid="B3">Baldi, 2002</xref>; <xref ref-type="bibr" rid="B22">Faraji et al., 2018</xref>). In addition, the quantified surprise has been widely shown to be reflected in the brain response, especially in the components of Event-Related Potentials (ERP) (<xref ref-type="bibr" rid="B46">Knill and Pouget, 2004</xref>; <xref ref-type="bibr" rid="B85">Strange et al., 2005</xref>; <xref ref-type="bibr" rid="B57">Mars et al., 2008</xref>; <xref ref-type="bibr" rid="B24">Friston, 2009</xref>; <xref ref-type="bibr" rid="B44">Itti and Baldi, 2009</xref>; <xref ref-type="bibr" rid="B4">Baldi and Itti, 2010</xref>; <xref ref-type="bibr" rid="B59">Meyniel et al., 2016</xref>; <xref ref-type="bibr" rid="B82">Seer et al., 2016</xref>; <xref ref-type="bibr" rid="B60">Modirshanechi et al., 2019</xref>; <xref ref-type="bibr" rid="B61">Musiolek et al., 2019</xref>). These studies underscore the importance and suitability of surprise to describe the neural activity in an uncertain environment.</p>
<p>A strong link exists between the concept of the brain&#x2019;s surprise and social influence. Generation of a surprise signal by the brain instigates other functions which lead to eliciting the attention of the individual and influencing the course of cognitive processes involved in perception, memory formation, decision making, and engagement with the situation. Surprising events lead to engagement with the prevailing event through mechanisms such as attention and arousal (<xref ref-type="bibr" rid="B76">Russell and Barrett, 1999</xref>). When expectations about the sequence of events in a given context are violated, elevated attention levels are called for by the brain in order to find an explanation for the error. Surprising events hence attract attention and can lead to engagement with the source of surprise (<xref ref-type="bibr" rid="B79">Sch&#x00FC;tzwohl, 1998</xref>; <xref ref-type="bibr" rid="B40">Horstmann, 2002</xref>; <xref ref-type="bibr" rid="B44">Itti and Baldi, 2009</xref>). The occurrence of surprise means that the brain&#x2019;s model of the current event could not predict the particular instance recently observed and thus the model may need to be adjusted to make better predictions. Therefore, surprise changes what is believed and can hence influence its recipient by shaping both their perception and future behavior (<xref ref-type="bibr" rid="B71">Petty and Cacioppo, 1986</xref>; <xref ref-type="bibr" rid="B52">Loewenstein, 2019</xref>).</p>
<p>In addition, surprise is connected to high-arousal experiences (<xref ref-type="bibr" rid="B76">Russell and Barrett, 1999</xref>). Efforts by the brain aimed at making sense of the situation promote memory for the event (<xref ref-type="bibr" rid="B11">Bradley et al., 1992</xref>). Moreover, people tend to share surprising content with each other, which gives surprise the potential for large-scale social impact (<xref ref-type="bibr" rid="B38">Heath et al., 2001</xref>). By setting an attractive engagement context, surprise influences the behavior of the individual or a social group and plays an important role in promoting personal and social change.</p>
<p>Studying the characteristics of surprise plays an important role in understanding how the mechanisms of attention and arousal, learning and memory formation, and decision to engage are formed in the brain. A remarkable observation is that the unpredictability of an instance in a sequence of stimuli which leads to a high value of surprise produces distinct brain signals in the process of eliciting the attention of the observer (<xref ref-type="bibr" rid="B57">Mars et al., 2008</xref>; <xref ref-type="bibr" rid="B29">Garrido et al., 2016</xref>; <xref ref-type="bibr" rid="B75">Rubin et al., 2016</xref>; <xref ref-type="bibr" rid="B82">Seer et al., 2016</xref>). In this context, surprise is often represented by a parameter that the brain attempts to minimize during the process of learning and perceptual inference (<xref ref-type="bibr" rid="B77">Schmidhuber, 2010</xref>; <xref ref-type="bibr" rid="B73">Roesch et al., 2012</xref>; <xref ref-type="bibr" rid="B27">Friston and Frith, 2015</xref>; <xref ref-type="bibr" rid="B26">Friston et al., 2017</xref>; <xref ref-type="bibr" rid="B22">Faraji et al., 2018</xref>).</p>
<p>A recent study argued that surprise minimization not only plays a key role in the cognitive processes of a single agent but can also be applied in multi-agent frameworks to describe social phenomena such as cooperation and social decision-making and to explain the emergence of social rules for two agents (<xref ref-type="bibr" rid="B37">Hartwig and Peters, 2020</xref>). Importantly, <xref ref-type="bibr" rid="B81">Schwartenbeck et al. (2015)</xref> showed that in a simple binary choice setup, a surprise minimization paradigm could explain decision making better than utility maximization. In the context of predictive coding, the brain tries to avoid surprise to prevent stress, which can, in the long term, lead to heart disease, depression, and type 2 diabetes (<xref ref-type="bibr" rid="B70">Peters et al., 2017</xref>).</p>
<p>Shannon surprise (<xref ref-type="bibr" rid="B83">Shannon, 1948</xref>) has been widely used as a measure for quantifying surprise based on the likelihood of the data (<xref ref-type="bibr" rid="B85">Strange et al., 2005</xref>; <xref ref-type="bibr" rid="B57">Mars et al., 2008</xref>; <xref ref-type="bibr" rid="B49">Kolossa et al., 2015</xref>; <xref ref-type="bibr" rid="B59">Meyniel et al., 2016</xref>; <xref ref-type="bibr" rid="B75">Rubin et al., 2016</xref>; <xref ref-type="bibr" rid="B82">Seer et al., 2016</xref>; <xref ref-type="bibr" rid="B60">Modirshanechi et al., 2019</xref>). The more &#x201C;unlikely&#x201D; an input is, the larger its Shannon surprise. The Bayesian surprise measures the difference between the estimated generative distribution of the received stimuli before and after the arrival of each input. It therefore quantifies how the belief about the distribution of the input is &#x201C;updated&#x201D; or &#x201C;shifted&#x201D; after receiving each stimulus. This concept of surprise was introduced by <xref ref-type="bibr" rid="B3">Baldi (2002)</xref> and has since been used by many researchers (<xref ref-type="bibr" rid="B57">Mars et al., 2008</xref>; <xref ref-type="bibr" rid="B44">Itti and Baldi, 2009</xref>; <xref ref-type="bibr" rid="B4">Baldi and Itti, 2010</xref>; <xref ref-type="bibr" rid="B82">Seer et al., 2016</xref>; <xref ref-type="bibr" rid="B61">Musiolek et al., 2019</xref>). <xref ref-type="bibr" rid="B22">Faraji et al. (2018)</xref> introduced an alternative quantification of surprise, named the confidence-corrected surprise, which reflects the &#x201C;unexpectedness&#x201D; (not unlikeliness) of the input by comparing the estimated posterior distribution of the input with that of a na&#x00EF;ve observer (who bases his model on the most recent input and a uniform prior) using the Kullback&#x2013;Leibler (KL) divergence (<xref ref-type="bibr" rid="B50">Kullback, 1997</xref>; <xref ref-type="bibr" rid="B15">Cover, 1999</xref>).</p>
<p>Temporal components of magnetoencephalography (MEG) recordings that represent surprise have not been investigated as extensively as those of electroencephalography (EEG) data. Nevertheless, some studies have focused on how the violation of an expected event in a sequence of stimuli is reflected in the MEG response (<xref ref-type="bibr" rid="B13">Chait et al., 2007</xref>; <xref ref-type="bibr" rid="B89">Todorovic et al., 2011</xref>; <xref ref-type="bibr" rid="B92">Wacongne et al., 2011</xref>; <xref ref-type="bibr" rid="B88">Todorovic and de Lange, 2012</xref>; <xref ref-type="bibr" rid="B86">Strauss et al., 2015</xref>; <xref ref-type="bibr" rid="B5">Barascud et al., 2016</xref>; <xref ref-type="bibr" rid="B39">Heilbron and Chait, 2018</xref>). These studies include reports on the observation of mismatch components in the brain&#x2019;s MEG response to unpredicted stimuli or novelty.</p>
<p>Previous surprise modeling studies mainly base their conclusions on a single component extracted from the EEG data, with the MMN (mismatch negativity) (<xref ref-type="bibr" rid="B28">Garrido et al., 2009</xref>; <xref ref-type="bibr" rid="B51">Lieder et al., 2013</xref>) or the P300 (<xref ref-type="bibr" rid="B84">Squires et al., 1976</xref>; <xref ref-type="bibr" rid="B57">Mars et al., 2008</xref>; <xref ref-type="bibr" rid="B48">Kolossa et al., 2013</xref>) or both (<xref ref-type="bibr" rid="B67">Ostwald et al., 2012</xref>) serving as the main components revealing the occurrence of surprise. Abnormal values in these components have also been proposed as biomarkers for cognitive disorders such as schizophrenia and Alzheimer&#x2019;s disease (<xref ref-type="bibr" rid="B64">Nieuwenhuis et al., 2005</xref>; <xref ref-type="bibr" rid="B68">Patel et al., 2005</xref>; <xref ref-type="bibr" rid="B6">Barcelo et al., 2006</xref>; <xref ref-type="bibr" rid="B21">Duncan et al., 2009</xref>), reflecting their importance not only in understanding the behavior of the normal brain in handling surprise, but also in the detection of a number of brain disorders. While such single-component analysis simplifies the ensuing effort to develop an encoder or a decoder for the brain surprise, it ignores the possible contribution of other temporal components corresponding to different post-stimulus latencies.</p>
<p>Recent studies have proposed models using the entire temporal signals for decoding Shannon surprise (<xref ref-type="bibr" rid="B54">Maheu et al., 2019</xref>; <xref ref-type="bibr" rid="B60">Modirshanechi et al., 2019</xref>; <xref ref-type="bibr" rid="B31">Gijsen et al., 2021</xref>), assuming that the entire epoch of the response might be modulated by the statistical properties of the input sequence. We take a similar approach in this paper and mine trial-by-trial MEG data to analyze how the entire epoch of the brain response reflects the prediction error and which collection of temporal/spatial components provides optimal power for describing the brain&#x2019;s surprise.</p>
<p>In a study by <xref ref-type="bibr" rid="B60">Modirshanechi et al. (2019)</xref>, the density of significant temporal features for decoding Shannon surprise was compared in the middle and late segments of EEG data and no significant difference was observed between these two segments in terms of decoding surprise. Also, <xref ref-type="bibr" rid="B54">Maheu et al. (2019)</xref> conducted a study on MEG data with participants exposed to auditory sequences with different statistical regularities, and modeled the activity of the brain with Shannon surprise levels using several learning models. <xref ref-type="bibr" rid="B31">Gijsen et al. (2021)</xref> described the EEG dynamics of the somatosensory learning system in terms of its neural surprise signatures.</p>
<p>In the current study, aside from considering different concepts of surprise, the value of each of the temporal components is assessed and compared with the others in MEG records of an auditory oddball task. In addition, analytical definitions are proposed for the early, middle, and late segments based on a method that partitions the response of each trial into three temporal segments according to the behavior of each segment in describing surprise. We compare the middle part of the recorded response and the late part in terms of reflecting the surprise of the brain. We aim to examine whether there is one temporal component or a subset of components that best describes each of the three mentioned surprise concepts. We also perform a sensor-level analysis to identify the best locations on the scalp to capture information about surprise from neural activities.</p>
<p>The repetition-break plot structure (<xref ref-type="bibr" rid="B53">Loewenstein and Heath, 2009</xref>) is one of the recipes proposed for eliciting surprise in studies on its social influence (<xref ref-type="bibr" rid="B52">Loewenstein, 2019</xref>). In computational frameworks for studying surprise based on measured brain signals, oddball experiments are employed in which repeated exposure to surprising instances of the stimuli allows for trial averaging and noise reduction. The current study focuses on binary oddball tasks and formulates its definition of surprise assuming a transition probability matrix that describes the generative distribution of the stimulus sequence (<xref ref-type="bibr" rid="B59">Meyniel et al., 2016</xref>). Considering the generative distribution as a Markov process, this transition probability matrix serves as a sufficient statistic to describe the distribution. It was shown in <xref ref-type="bibr" rid="B59">Meyniel et al. (2016)</xref> that this assumption leads to a surprise value (prediction error) that is highly correlated with the P300 response. Also, <xref ref-type="bibr" rid="B31">Gijsen et al. (2021)</xref> showed that this first-order transition probability is the best inference model in terms of goodness of fit to EEG data.</p>
<p>The paper sets forth comparative results for the mentioned surprise decoders, and statistically elaborates on the relative importance of the different channels/temporal components in decoding the three surprise concepts (Shannon, Bayesian, and confidence-corrected surprise which, respectively, represent the unlikeliness, updating, and the unexpectedness of the input) elicited by the stimuli. The results support the Bayesian learning assumption and provide evidence for predictive coding.</p>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>Materials and Methods</title>
<p><xref ref-type="fig" rid="F1">Figure 1</xref> provides an overview of the data flow and the decoding approach used in our analysis.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>The overall diagram of the decoding model of temporal analysis. The steps are explained in the Section &#x201C;Materials and Methods&#x201D;. <bold>(A)</bold> The scheme of the decoding model and machine learning tools. The power of decoding is measured by the fraction of variance that is explained (<italic>R</italic>-squared). <bold>(B)</bold> The processes performed on the preprocessed MEG data to acquire features for regression. The feature matrix is shown by <italic>S</italic><sub><italic>N&#x00D7;p</italic></sub>. The length of each feature is <italic>p</italic> and the number of features is <italic>N</italic>. <bold>(C)</bold> The surprise calculation module using the oddball sequence of stimuli, consisting of standards S and deviants D, as input and generating labels for training the regression model. The labels vector is shown as <italic>Y</italic><sub><italic>N&#x00D7;1</italic></sub>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnsys-16-865453-g001.tif"/>
</fig>
<sec id="S2.SS1">
<title>Dataset and Task</title>
<p>Our analysis is applied to a dataset consisting of MEG responses recorded in an auditory oddball task (<xref ref-type="bibr" rid="B54">Maheu et al., 2019</xref>). In this task, the standard and deviant stimuli were two different French syllables randomly drawn from a Bernoulli distribution, with a probability of 2/3 for the frequent (standard) syllable and 1/3 for the deviant syllable. Each syllable lasted about 200 milliseconds and the interval between two successive stimuli was 1400 milliseconds. The data record consisted of one block of stimuli with around 405 trials.</p>
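<p>As an illustration only, the stimulus statistics described above can be mimicked with a short Python sketch. The function name <monospace>make_oddball_sequence</monospace>, the random seed, and the coding of a standard as 0 and a deviant as 1 are assumptions made for this example; the sketch is not the stimulus-generation code used to record the dataset.</p>
<preformat>
```python
import random


def make_oddball_sequence(n_trials=405, p_deviant=1 / 3, seed=0):
    """Draw a binary oddball sequence: 0 = standard, 1 = deviant.

    Hypothetical helper for illustration; each stimulus is drawn
    independently with probability p_deviant of being a deviant.
    """
    rng = random.Random(seed)
    return [1 if p_deviant > rng.random() else 0 for _ in range(n_trials)]


sequence = make_oddball_sequence()
n_deviants = sum(sequence)
print(len(sequence), "trials,", n_deviants, "deviants")
```
</preformat>
<p>With about 405 trials, the fraction of deviants in any single simulated block will only approximate 1/3.</p>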
<p>Participants included 11 females and 9 males, aged between 18 and 25. The data of two subjects were removed because of their excessive head movements. To ensure that the participants paid attention to the task, they were asked every 12&#x2013;18 trials to predict the next stimulus (a standard or a deviant) using one of two buttons.</p>
<p>Brain activity was recorded by a 306-channel (102 magnetometers and 204 gradiometers) whole-head Elekta Neuromag MEG system using a sampling rate of 1000 Hz and a hardware-based band-pass filter of 0.1&#x2013;330 Hz.</p>
</sec>
<sec id="S2.SS2">
<title>Preprocessing of Data</title>
<p>The following preprocessing steps were performed on the raw data as reported by <xref ref-type="bibr" rid="B54">Maheu et al. (2019)</xref>: Raw MEG data were corrected for between-session head movement and bad channels. The data were then epoched from &#x2212;250 ms to 1 s and cleaned of powerline, muscle, and other movement artifacts. Trials containing muscle artifacts were detected using semi-automatic methods (based on the variance of signals across sensors and first-order derivatives of signals over time) and removed. Next, the data were low-pass filtered below 30 Hz and downsampled to 250 Hz. Eye blinks and cardiac artifacts were removed using ICA (Independent Component Analysis) (<xref ref-type="bibr" rid="B7">Bell and Sejnowski, 1995</xref>). Finally, the data were baseline corrected using a window of 250 ms before the stimulus onset. Similar to the earlier study (<xref ref-type="bibr" rid="B54">Maheu et al., 2019</xref>), the analysis was performed only on the data of the magnetometers using the EEGLAB toolbox (<xref ref-type="bibr" rid="B19">Delorme and Makeig, 2004</xref>).</p>
<p>For temporal analysis, we performed ICA (<xref ref-type="bibr" rid="B7">Bell and Sejnowski, 1995</xref>) to obtain independent sources of the MEG record as features of the regression model. We chose FastICA (<xref ref-type="bibr" rid="B42">Hyv&#x00E4;rinen, 1999</xref>) for this data because the high number of channels (102) could render the InfoMax algorithm excessively slow. Using FastICA, we obtained an average (over subjects) of 69 independent components for the entire set of sensors. We also considered the interval of [&#x2212;200 ms, 600 ms] as the response period and reduced the number of samples by downsampling to 80 samples per epoch. We took each trial as a feature (i.e., one observation for the regression), so the number of features used for training was <italic>N</italic> &#x2208; [400,&#x2004;409] (equal to the number of stimuli in the block, which varied between the participants). For each trial, we concatenated the vectors of the independent components into a single longer vector, which serves as the decoder input. Thus, the maximum dimension of each feature was around <italic>p</italic> = 80&#x00D7;69 = 5520 (equal to the number of time samples multiplied by the number of independent components). The advantage of using independent components instead of the data of the channels is that the resulting feature vectors contain lower dependencies between their elements.</p>
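<p>The concatenation step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names <monospace>trial_to_feature</monospace> and <monospace>build_feature_matrix</monospace> are hypothetical, the order in which components are concatenated is assumed, and the data are synthetic zeros standing in for real independent-component activations.</p>
<preformat>
```python
def trial_to_feature(trial_components):
    """Concatenate the component time courses of one trial into one vector."""
    feature = []
    for time_course in trial_components:
        feature.extend(time_course)
    return feature


def build_feature_matrix(trials):
    """Stack one concatenated vector per trial (rows correspond to trials)."""
    return [trial_to_feature(trial) for trial in trials]


# Synthetic stand-in: 3 trials, 69 components, 80 time samples each.
n_components, n_samples = 69, 80
demo_trials = [
    [[0.0] * n_samples for _ in range(n_components)]
    for _ in range(3)
]
S = build_feature_matrix(demo_trials)
print(len(S), len(S[0]))  # prints: 3 5520
```
</preformat>
<p>Each row has length 80&#x00D7;69 = 5520, matching the dimension <italic>p</italic> given in the text; with real data the number of rows would equal the number of trials <italic>N</italic>.</p>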
<p>For spatial analysis, we selected the interval of [&#x2212;200 ms, 600 ms] as the response period for the recorded signal of each channel and reduced the number of samples to 80 by downsampling.</p>
</sec>
<sec id="S2.SS3">
<title>Ideal Observer Model</title>
<p>A fundamental question in the Bayesian brain literature is how the brain learns the distribution of the sensory stimuli. The brain is assumed to be a near-optimal estimator of the probability of the input sequence based on a generative model with Bayesian inference (<xref ref-type="bibr" rid="B57">Mars et al., 2008</xref>; <xref ref-type="bibr" rid="B16">Daunizeau et al., 2010</xref>; <xref ref-type="bibr" rid="B25">Friston, 2012</xref>; <xref ref-type="bibr" rid="B59">Meyniel et al., 2016</xref>; <xref ref-type="bibr" rid="B75">Rubin et al., 2016</xref>; <xref ref-type="bibr" rid="B60">Modirshanechi et al., 2019</xref>). More precisely, the brain holds a prior belief about the environment and updates it after each stimulus arrives. In addition, in order to initialize the inference process, it is presumed that the brain begins with the assumption of equally probable input types regardless of exposure to any previous blocks of stimuli (<xref ref-type="bibr" rid="B85">Strange et al., 2005</xref>; <xref ref-type="bibr" rid="B36">Harrison et al., 2006</xref>; <xref ref-type="bibr" rid="B9">Bestmann et al., 2008</xref>; <xref ref-type="bibr" rid="B59">Meyniel et al., 2016</xref>).</p>
<p>Two crucial questions arise here: what exactly constitutes the statistics that the brain attempts to learn from the recent history of observations, and what mechanism it employs to arrive at an optimal estimate of this probability.</p>
</sec>
<sec id="S2.SS4">
<title>Transition Probabilities</title>
<p>In an oddball experiment, each stimulus can be denoted by a binary random variable <italic>x<sup>i</sup></italic> for <italic>i</italic> = 1,&#x2026;,<italic>T</italic>, where <italic>T</italic> is the length of the stimulus sequence. We set <italic>x<sup>i</sup></italic> = 0 if the <italic>i</italic>th stimulus is a standard and <italic>x<sup>i</sup></italic> = 1 otherwise. This variable follows a Bernoulli distribution with parameters <italic>p</italic><sub>0</sub> and <italic>p</italic><sub>1</sub> = 1&#x2212;<italic>p</italic><sub>0</sub> as the probabilities of the standard and deviant stimuli, respectively. Based on the hypothesis that the sequence of items has been generated by a &#x201C;Markovian&#x201D; generative process, the sequence can be modeled by the probabilities of transition between the stimulus types. For a binary oddball sequence, the transition probabilities can be stated as a 2&#x00D7;2 matrix, which can be estimated by counting the number of successive transitions (<xref ref-type="bibr" rid="B59">Meyniel et al., 2016</xref>). It has been demonstrated that describing the stimulus sequence with the transition probability matrix statistically outperforms the single-parameter approach in describing the brain&#x2019;s response (<xref ref-type="bibr" rid="B59">Meyniel et al., 2016</xref>).</p>
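<p>Counting successive transitions can be sketched as follows. This is a plain frequency estimate written for illustration, not necessarily the estimator used by the ideal observer in this study. The function name <monospace>estimate_transition_matrix</monospace>, the layout in which columns index the previous stimulus, and the fallback value of 0.5 for a stimulus type with no observed transitions are all assumptions of this example.</p>
<preformat>
```python
def estimate_transition_matrix(sequence):
    """Estimate a 2x2 transition matrix from a binary sequence by counting.

    Entry [a][b] estimates the probability of moving from stimulus type b
    to stimulus type a, so each column sums to 1.
    """
    counts = [[0, 0], [0, 0]]  # counts[a][b]: number of b-to-a transitions
    for prev, curr in zip(sequence, sequence[1:]):
        counts[curr][prev] += 1

    matrix = [[0.0, 0.0], [0.0, 0.0]]
    for b in (0, 1):
        column_total = counts[0][b] + counts[1][b]
        for a in (0, 1):
            # Assumed fallback: 0.5 when no transition out of b was observed.
            matrix[a][b] = counts[a][b] / column_total if column_total else 0.5
    return matrix


example = [0, 0, 1, 0, 0, 0, 1, 1, 0]  # 0 = standard, 1 = deviant
theta = estimate_transition_matrix(example)
print(theta)
```
</preformat>
<p>In the example sequence, five transitions start from a standard (three to a standard, two to a deviant) and three start from a deviant (two to a standard, one to a deviant), so the estimated columns are approximately (0.6, 0.4) and (0.67, 0.33).</p>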
<p>For a binary oddball sequence, the definition of the model parameter &#x03B8; can be stated in the form of a 2&#x00D7;2 matrix:</p>
<disp-formula id="S2.Ex1">
<mml:math id="M1">
<mml:mrow>
<mml:mrow>
<mml:mi>&#x03B8;</mml:mi>
<mml:mo>&#x225C;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mtable columnspacing="5pt" displaystyle="true" rowspacing="0pt">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
<mml:mtd columnalign="center">
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
<mml:mtd columnalign="center">
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
</mml:mtable>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>p</italic><sub><italic>a</italic>|<italic>b</italic></sub> is the probability of transition from stimulus type <italic>b</italic> to stimulus type <italic>a</italic>. Since the sum of each column of this matrix is equal to 1, we can reduce the model parameter&#x2019;s definition to a vector &#x03B8;:</p>
<disp-formula id="S2.Ex2">
<mml:math id="M2">
<mml:mrow>
<mml:mrow>
<mml:mi>&#x03B8;</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mtable displaystyle="true" rowspacing="0pt">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:msub>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:msub>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
</mml:mtable>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mtable displaystyle="true" rowspacing="0pt">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
</mml:mtable>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>Based on this definition, the likelihood of a sequence of observations <bold>X</bold><italic><sup>j</sup></italic> of length <italic>j</italic> is:</p>
<disp-formula id="S2.E1">
<label>(1)</label>
<mml:math id="M3">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msup>
<mml:mtext mathvariant="bold">X</mml:mtext>
<mml:mi>j</mml:mi>
</mml:msup>
<mml:mo stretchy="false">|</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mpadded width="+2.8pt">
<mml:mn>0.5</mml:mn>
</mml:mpadded>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msubsup>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:msup>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msubsup>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msubsup>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:msup>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msubsup>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where &#x03B8;<italic><sup>j</sup></italic>, with elements <inline-formula><mml:math id="INEQ13"><mml:msubsup><mml:mi mathvariant="normal">&#x03B8;</mml:mi><mml:mrow><mml:mn>0</mml:mn><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>j</mml:mi></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="INEQ14"><mml:msubsup><mml:mi mathvariant="normal">&#x03B8;</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mi>j</mml:mi></mml:msubsup></mml:math></inline-formula>, is the parameter vector estimated after receiving the <italic>j</italic> inputs denoted by the vector <bold>X</bold><italic><sup>j</sup></italic>; the probability of the first stimulus is assumed to be <inline-formula><mml:math id="INEQ16"><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:math></inline-formula>; and <inline-formula><mml:math id="INEQ17"><mml:msubsup><mml:mi>n</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mi>j</mml:mi></mml:msubsup></mml:math></inline-formula> is the number of transitions from stimulus type <italic>b</italic> to stimulus type <italic>a</italic> in the <italic>j</italic> observations up to the present sample.</p>
<p>The parameter <inline-formula><mml:math id="INEQ18"><mml:msubsup><mml:mi>n</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mi>j</mml:mi></mml:msubsup></mml:math></inline-formula> can be computed in different ways depending on the forgetting model assumed for memory (<xref ref-type="bibr" rid="B41">Huettel et al., 2002</xref>; <xref ref-type="bibr" rid="B45">Kiebel et al., 2008</xref>; <xref ref-type="bibr" rid="B35">Harrison et al., 2011</xref>; <xref ref-type="bibr" rid="B59">Meyniel et al., 2016</xref>). In this paper, we adopt a leaky integration method to account for earlier observations. In this method, the most recent stimulus is given the maximum weight, and the weights of the preceding observations decrease exponentially, with a parameter <italic>w</italic> (the integration coefficient), moving backward toward earlier observations (<xref ref-type="bibr" rid="B59">Meyniel et al., 2016</xref>).</p>
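<p>As a minimal sketch of the leaky integration described above, the following Python code accumulates exponentially weighted transition counts. The function name <monospace>leaky_transition_counts</monospace>, the weight exp(&#x2212;<italic>k</italic>/<italic>w</italic>) assigned to a transition lying <italic>k</italic> steps before the most recent one, and the toy sequence are assumptions made for illustration; the kernel actually used follows <xref ref-type="bibr" rid="B59">Meyniel et al. (2016)</xref> and may differ in detail.</p>
<preformat><![CDATA[
```python
import math


def leaky_transition_counts(seq, w):
    """Exponentially weighted transition counts, keyed as (a, b) for b -> a.

    Assumption of this sketch: a transition that lies k steps before the
    most recent one receives the weight exp(-k / w).
    """
    counts = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    n_trans = len(seq) - 1
    for i in range(1, len(seq)):
        k = n_trans - i  # k = 0 for the most recent transition
        counts[(seq[i], seq[i - 1])] += math.exp(-k / w)
    return counts


# Hypothetical binary stimulus sequence (0 and 1 are the two stimulus types).
seq = [0, 0, 1, 0, 0, 0, 1, 1]
counts = leaky_transition_counts(seq, w=4.0)
```
]]></preformat>
<p>With a very large <monospace>w</monospace> the weights approach 1 and the result reduces to ordinary transition counts, whereas a small <monospace>w</monospace> makes the counts depend mostly on the latest transitions.</p>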
<p>Eq. 1 is the product of two Binomial distributions, each representing one of the two elements of the vector &#x03B8;. Using a Beta distribution, the conjugate prior of the Binomial distribution, as the prior for each of these elements (here a flat Beta(1, 1) prior, which gives the leading 1 in each argument of Eq. 2), the posterior distribution of &#x03B8;<italic><sup>j</sup></italic> after <italic>j</italic> inputs is the product of two Beta distributions:</p>
<disp-formula id="S2.E2">
<label>(2)</label>
<mml:math id="M4">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msup>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:msup>
<mml:mtext mathvariant="bold">X</mml:mtext>
<mml:mi>j</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mtext>Beta</mml:mtext>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mtext>Beta</mml:mtext>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>To sum up, Eq. 2 gives the posterior distribution of the two-dimensional parameter vector &#x03B8;<italic><sup>j</sup></italic> that describes the stimulus-generating process. The next step is to use this posterior to calculate the theoretical surprise inherent in the stimulus sequence.</p>
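<p>The update in Eq. 2 can be illustrated with a short Python sketch that maps transition counts to the parameters of the two Beta posteriors. The function names and the toy counts are hypothetical, and using the posterior mean as a point estimate of each transition probability is an assumption of this sketch rather than a step stated in the text.</p>
<preformat><![CDATA[
```python
def beta_posterior(counts):
    """Beta parameters of Eq. 2 for theta_0|1 and theta_1|0.

    counts[(a, b)] holds the (possibly leaky) number of b -> a transitions.
    """
    post_01 = (1.0 + counts[(0, 1)], 1.0 + counts[(1, 1)])
    post_10 = (1.0 + counts[(1, 0)], 1.0 + counts[(0, 0)])
    return post_01, post_10


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical transition counts, keyed as (a, b) for b -> a.
counts = {(0, 1): 1.0, (1, 1): 1.0, (1, 0): 2.0, (0, 0): 3.0}
post_01, post_10 = beta_posterior(counts)
theta_01_hat = beta_mean(*post_01)  # assumed point estimate of theta_0|1
theta_10_hat = beta_mean(*post_10)  # assumed point estimate of theta_1|0
```
]]></preformat>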
</sec>
<sec id="S2.SS5">
<title>Surprise Calculation</title>
<p>In the previous section, we estimated the stimulus-generating distribution assuming the transition probability matrix as the sufficient statistic. When the brain encounters a stimulus that was not predicted using this estimated distribution, it may produce a &#x201C;surprise&#x201D; response reflecting the prediction error (<xref ref-type="bibr" rid="B57">Mars et al., 2008</xref>; <xref ref-type="bibr" rid="B51">Lieder et al., 2013</xref>; <xref ref-type="bibr" rid="B59">Meyniel et al., 2016</xref>; <xref ref-type="bibr" rid="B75">Rubin et al., 2016</xref>; <xref ref-type="bibr" rid="B60">Modirshanechi et al., 2019</xref>). The literature offers three mathematical approaches to quantifying this surprise, referred to here as Shannon, Bayesian, and confidence-corrected surprise. These approaches are elaborated, and the formulas for the three surprise measures are derived in full, in the <xref ref-type="supplementary-material" rid="TS1">Supplementary Material.</xref> The resulting surprise values form the label vector <italic>Y</italic><sub><italic>N&#x00D7;1</italic></sub> used to train the regressor.</p>
</sec>
<sec id="S2.SS6">
<title>Temporal Analysis</title>
<p>We first seek to identify the single time instances or time intervals that best regress the surprise value of the stimuli. To this end, we define four regimes for selecting samples from the temporal data record (<xref ref-type="fig" rid="F2">Figure 2</xref>):</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Different temporal component selection regimes are used to define feature vectors as inputs to the decoder. <bold>(A)</bold> All temporal components of a trial are used (Entire epoch). <bold>(B)</bold> Each single temporal point <italic>t</italic> is used (Samples). <bold>(C)</bold> Temporal components in the range of [&#x2212;200 ms, <italic>t</italic>] are used (Intervals). <bold>(D)</bold> Temporal segments are used with optimum <italic>t</italic><sub><italic>1</italic></sub> and <italic>t</italic><sub><italic>2</italic></sub> (Segments).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnsys-16-865453-g002.tif"/>
</fig>
<list list-type="simple">
<list-item>
<label>1.</label>
<p><bold>Entire epoch:</bold> The total response time (&#x2212;200 ms to 600 ms) is used for regression to identify all significant coefficients (<xref ref-type="fig" rid="F2">Figure 2A</xref>).</p>
</list-item>
<list-item>
<label>2.</label>
<p><bold>Samples:</bold> A single sample at time <italic>t</italic> is employed as the decoder&#x2019;s input (<xref ref-type="fig" rid="F2">Figure 2B</xref>), and this operation is repeated for all values of <italic>t</italic> to determine their relative powers in estimating the surprise of the stimuli.</p>
</list-item>
<list-item>
<label>3.</label>
<p><bold>Intervals:</bold> To evaluate the significance of an interval of accumulated temporal samples from the beginning of the epoch to the current target time, the interval from &#x2212;200 ms to time <italic>t</italic> is used as input to the decoder (<xref ref-type="fig" rid="F2">Figure 2C</xref>). This operation is repeated for all values of <italic>t</italic>. This allows the decoder to exploit the dependency among the temporal samples in the recorded data.</p>
</list-item>
<list-item>
<label>4.</label>
<p><bold>Segments (Baseline, Early, Middle, and Late):</bold> To evaluate the regression power of the target time interval and to compare the segments of the MEG records, a range of temporal samples is used as the decoder&#x2019;s input feature vector (<xref ref-type="fig" rid="F2">Figure 2D</xref>).</p>
</list-item>
</list>
<p>Four disjoint time segments are identified for coarse-level segmentation of the response profile: From &#x2212;200 ms pre-stimulus to time 0 (Baseline), from time 0 to <italic>t</italic><sub><italic>1</italic></sub> (Early components), from <italic>t</italic><sub><italic>1</italic></sub> to <italic>t</italic><sub><italic>2</italic></sub> (Middle components), and from <italic>t</italic><sub><italic>2</italic></sub> to 600 ms (Late components) (<xref ref-type="fig" rid="F2">Figure 2D</xref>).</p>
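<p>The four regimes amount to different selections of columns along the time axis of a trial, which the following Python sketch makes explicit. The 10 ms sampling grid (80 points from &#x2212;200 ms to 590 ms), the half-open segment boundaries, and the function name <monospace>select_indices</monospace> are assumptions for illustration; the example borders (60 ms, 350 ms) are the Shannon values from <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<preformat><![CDATA[
```python
def select_indices(times, regime, t=None, t1=None, t2=None, segment=None):
    """Indices of the temporal samples used as decoder input in each regime."""
    if regime == "entire":
        return list(range(len(times)))
    if regime == "samples":
        return [i for i, x in enumerate(times) if x == t]
    if regime == "intervals":
        return [i for i, x in enumerate(times) if x <= t]
    if regime == "segments":
        # Half-open intervals [lo, hi) are an assumption of this sketch.
        bounds = {
            "baseline": (times[0], 0),
            "early": (0, t1),
            "middle": (t1, t2),
            "late": (t2, times[-1] + 1),
        }
        lo, hi = bounds[segment]
        return [i for i, x in enumerate(times) if lo <= x < hi]
    raise ValueError("unknown regime: " + regime)


# Assumed time grid in ms: 80 samples at 10 ms spacing.
times = list(range(-200, 600, 10))
middle = select_indices(times, "segments", t1=60, t2=350, segment="middle")
```
]]></preformat>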
<p>In our work, the values of <italic>t</italic><sub><italic>1</italic></sub> and <italic>t</italic><sub><italic>2</italic></sub> are chosen to give decoding-based definitions of the Early, Middle, and Late segments, using the decoding powers obtained in the <italic>Samples</italic> regime. After analyzing the <italic>Samples</italic> regime, we define <italic>t</italic><sub><italic>1</italic></sub> as the first time point at which the decoding power reaches 10% of its global maximum. Furthermore, observing that two local maxima exist in the middle and late responses, we define <italic>t</italic><sub><italic>2</italic></sub> as the point of minimum decoding power in the interval [250 ms, 400 ms], which separates the Middle and Late segments (see <xref ref-type="fig" rid="F3">Figure 3A</xref> in Section &#x201C;Temporal Analysis&#x201D;). Whenever the names of these segments are capitalized, we mean the segments with boundaries defined by this approach.</p>
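<p>The rule for the two borders can be sketched in Python as follows. The function name <monospace>segment_borders</monospace>, the restriction of the search for <italic>t</italic><sub><italic>1</italic></sub> to post-stimulus samples, the 10 ms grid, and the synthetic two-peak curve <monospace>toy_power</monospace> are assumptions for illustration and do not reproduce the measured decoding powers.</p>
<preformat><![CDATA[
```python
import math


def segment_borders(times, power, frac=0.10, window=(250, 400)):
    """Return (t1, t2) from a Samples-regime decoding-power curve."""
    peak = max(power)
    # t1: first post-stimulus time whose power reaches frac of the global
    # maximum (searching only t >= 0 is an assumption of this sketch).
    t1 = next(t for t, r in zip(times, power) if t >= 0 and r >= frac * peak)
    # t2: time of minimum power inside the window between the two peaks.
    t2 = min((r, t) for t, r in zip(times, power)
             if window[0] <= t <= window[1])[1]
    return t1, t2


def toy_power(t):
    """Synthetic curve with a large middle peak and a smaller late peak."""
    if t < 50:
        return 0.0
    middle_peak = 0.10 * math.exp(-((t - 200) / 80.0) ** 2)
    late_peak = 0.04 * math.exp(-((t - 480) / 60.0) ** 2)
    return middle_peak + late_peak


times = list(range(-200, 600, 10))  # assumed 10 ms grid, in ms
power = [toy_power(t) for t in times]
t1, t2 = segment_borders(times, power)
```
]]></preformat>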
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p><bold>(A)</bold> Surprise decoding powers of the three surprise quantifications and chance level for the <italic>Samples</italic> regime. <bold>(B)</bold> Statistical analysis of the <italic>Samples</italic> regime. Each diagram illustrates an 80 &#x00D7; 80 matrix of <italic>p</italic>-values in logarithmic scale obtained by <italic>t</italic>-test. The red areas indicate the presence of significant differences between the two samples representing the horizontal and vertical axes values. <bold>(C)</bold> Surprise decoding powers of the three surprise quantifications and chance level for the temporal regime of <italic>Intervals</italic> [-200 ms, <italic>t</italic>]. <bold>(D)</bold> Statistical analysis of the <italic>Intervals</italic> regime. Each diagram illustrates an 80 &#x00D7; 80 matrix of <italic>p</italic>-values in logarithmic scale obtained by <italic>t</italic>-test to evaluate the significance of difference between decoding powers of each pair of features in the <italic>Intervals</italic> regime. <bold>(E)</bold> Box plots of decoding powers for different <italic>Segments</italic> (early, middle, and late). In these boxplots, horizontal lines indicate the median of the data and the boxes extend from the lower to upper quartile values. The whiskers extending from the boxes indicate the range of the data. Flier points are those past the ends of the whiskers. <bold>(F)</bold> The decoding power of different values of <italic>w</italic> (vertical axis) over time (horizontal axis) for different definitions of surprise in the <italic>Intervals</italic> and <italic>Samples</italic> regimes. Colors denote the decoding power. Note that the <italic>p</italic>-values are shown in a logarithmic scale for better visualization.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnsys-16-865453-g003.tif"/>
</fig>
</sec>
<sec id="S2.SS7">
<title>Spatial Analysis</title>
<p>Spatial analysis is performed in essentially the same fashion as the temporal analysis, but the feature matrix is defined so as to allow comparison of the different magnetometers in capturing the most surprise-correlated brain activity (see <xref ref-type="fig" rid="F4">Figure 4</xref>). As before, the decoding model is a Lasso linear regression module, and the labels for this regressor are the calculated theoretical surprise values. We perform two analyses on the data of each channel (magnetometer):</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Different feature matrices used for temporal (<italic>Samples</italic> regime) and spatial analysis: <italic>S</italic><sub><italic>N&#x00D7;c</italic></sub> and <italic>S</italic><sub><italic>N&#x00D7;T</italic></sub>, respectively.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnsys-16-865453-g004.tif"/>
</fig>
<list list-type="simple">
<list-item>
<label>1.</label>
<p>The feature matrix fed to the regression module is an <italic>N</italic>&#x00D7;<italic>T</italic> matrix, i.e., all temporal samples are used to decode the level of surprise for each magnetometer.</p>
</list-item>
<list-item>
<label>2.</label>
<p>The feature matrix fed to the regression module is an <italic>N</italic>&#x00D7;<italic>T</italic>&#x2032; matrix, where <italic>T</italic>&#x2032; &#x003C; <italic>T</italic>, meaning that a portion of the temporal samples is used to regress the level of surprise. The goal is to add a temporal view to the spatial analysis in order to compare the surprise-decoding regions on the scalp across different temporal segments: Early, Middle, and Late.</p>
</list-item>
</list>
</sec>
<sec id="S2.SS8">
<title>The Decoder Design</title>
<p>The decoder we use for this analysis was introduced by <xref ref-type="bibr" rid="B60">Modirshanechi et al. (2019)</xref>, and we modified its input features as well as the surprise labels to fit our analysis as described above. More details about the decoder can be found in <xref ref-type="bibr" rid="B60">Modirshanechi et al. (2019)</xref>.</p>
<p>Briefly, the decoder consists mainly of a single linear regression module. A Lasso linear regression model takes as input the feature matrix <italic>S</italic><sub><italic>N</italic>&#x00D7;<italic>p</italic>&#x2032;</sub>, extracted from the data according to one of the four temporal feature selection regimes described above (<italic>N</italic> is the number of feature vectors and <italic>p</italic>&#x2032; &#x2264; <italic>p</italic> is the dimension of each feature vector, which depends on the temporal feature selection regime), together with the label vector <italic>Y</italic><sub><italic>N&#x00D7;1</italic></sub>, calculated from the input stimulus sequence according to one of the three definitions of surprise (see <xref ref-type="fig" rid="F1">Figure 1</xref>). The Lasso regressor minimizes the reconstruction error subject to an added sparsity penalty, which eliminates input features that may be irrelevant to the reconstruction of surprise and helps avoid overfitting to the training data.</p>
<p>To evaluate the trained model on held-out test data, we used fivefold cross-validation with the <italic>R</italic>-squared measure as the decoding power. These values were compared to chance levels to test (and reject) the hypothesis that the input features are independent of the surprise labels. Since the decoding power is a function of the integration coefficient <italic>w</italic> (the parameter defining the weights of the window of integration), we report, for each regression, the maximum decoding power across all <italic>w</italic> values, i.e., the decoding power at the best integration coefficient. We also report and analyze the best values of the integration coefficient averaged over subjects. After the Lasso regressor removed the features with zero coefficients, the remaining features were presumed effective and were used in describing the surprise.</p>
<p>At the end of each decoding analysis task, to judge the resulting <italic>R</italic>-squared values, we tested the hypothesis that <italic>S</italic><sub><italic>N&#x00D7;p</italic></sub> and <italic>Y</italic><sub><italic>N&#x00D7;1</italic></sub> are independent of each other (<xref ref-type="bibr" rid="B74">Rouder et al., 2009</xref>; <xref ref-type="bibr" rid="B60">Modirshanechi et al., 2019</xref>). This was done by randomly permuting the vector <italic>Y</italic><sub><italic>N&#x00D7;1</italic></sub> and taking the <italic>R</italic>-squared value of the resulting regression each time as the chance level (<xref ref-type="bibr" rid="B69">Pereira et al., 2009</xref>).</p>
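<p>The permutation procedure for the chance level can be sketched in Python as follows. To keep the example self-contained, an ordinary least-squares fit on a single synthetic feature stands in for the Lasso decoder; this substitution, the contiguous fivefold split, the number of permutations, and all function names are assumptions of the sketch and not the configuration used in the study.</p>
<preformat><![CDATA[
```python
import random


def fit_line(x, y):
    """Least-squares slope and intercept for one feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def cv_r2(x, y, k=5):
    """Out-of-sample R-squared pooled over k contiguous folds."""
    n = len(x)
    ss_res = ss_tot = 0.0
    for f in range(k):
        test = range(f * n // k, (f + 1) * n // k)
        held = set(test)
        xtr = [x[i] for i in range(n) if i not in held]
        ytr = [y[i] for i in range(n) if i not in held]
        slope, icpt = fit_line(xtr, ytr)
        mtr = sum(ytr) / len(ytr)  # baseline predictor: training mean
        for i in test:
            ss_res += (y[i] - (slope * x[i] + icpt)) ** 2
            ss_tot += (y[i] - mtr) ** 2
    return 1.0 - ss_res / ss_tot


def chance_levels(x, y, n_perm=200, seed=0):
    """R-squared values obtained after randomly permuting the labels."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_perm):
        yp = y[:]
        rng.shuffle(yp)
        out.append(cv_r2(x, yp))
    return out


# Synthetic data: one feature that carries some information about the label.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.5 * a + rng.gauss(0, 1) for a in x]
r2 = cv_r2(x, y)
chance = chance_levels(x, y)
mean_chance = sum(chance) / len(chance)
```
]]></preformat>
<p>Because the cross-validated <italic>R</italic>-squared is computed on held-out data, it can fall slightly below zero when the labels are permuted, which is consistent with small negative chance levels.</p>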
<p>The entire analysis was performed separately for each subject and for each type of surprise. We used Matlab to design and simulate the decoder.</p>
</sec>
</sec>
<sec id="S3" sec-type="results">
<title>Results</title>
<p><xref ref-type="table" rid="T1">Tables 1</xref>&#x2013;<xref ref-type="table" rid="T4">4</xref> and <xref ref-type="fig" rid="F3">Figures 3</xref>, <xref ref-type="fig" rid="F5">5</xref>, <xref ref-type="fig" rid="F6">6</xref> summarize our results, which are described in detail next.</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Decoding power (<italic>R</italic><sup>2</sup> values), chance level, and <italic>p</italic>-values of <italic>t</italic>-tests comparing chance levels and decoding powers for the three definitions of surprise for the temporal regime of <italic>Entire epoch</italic>.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Decoding power</td>
<td valign="top" align="center">Shannon</td>
<td valign="top" align="center">Confidence-corrected</td>
<td valign="top" align="center">Bayesian</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Entire epoch</td>
<td valign="top" align="center">0.134</td>
<td valign="top" align="center">0.070</td>
<td valign="top" align="center">0.033</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Chance level</bold></td>
<td valign="top" colspan="3"/>
</tr>
<tr>
<td valign="top" align="left">Entire epoch</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0050</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0051</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0050</td>
</tr>
<tr>
<td valign="top" align="left"><bold><italic>p</italic>-values</bold></td>
<td valign="top" colspan="3"/>
</tr>
<tr>
<td valign="top" align="left">Entire epoch</td>
<td valign="top" align="center"><bold>0.000100</bold></td>
<td valign="top" align="center"><bold>0.000112</bold></td>
<td valign="top" align="center"><bold>0.000747</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p><italic>Highlighted p-values are the ones lower than significance level using Bonferroni correction (equal to 0.0042).</italic></p></fn>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>The temporal borders separating the Early, Middle, and Late segments obtained from partitioning the decoding power curves in <xref ref-type="fig" rid="F3">Figure 3A</xref>.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center">Shannon</td>
<td valign="top" align="center">Bayesian</td>
<td valign="top" align="center">Confidence-corrected</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">(<italic>t</italic><sub><italic>1</italic></sub>, <italic>t</italic><sub><italic>2</italic></sub>) (see <xref ref-type="fig" rid="F2">Figure 2D</xref>)</td>
<td valign="top" align="center">(60, 350)</td>
<td valign="top" align="center">(50, 360)</td>
<td valign="top" align="center">(60, 380)</td>
</tr>
</tbody>
</table></table-wrap>
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Decoding power, chance level, and <italic>p</italic>-values of <italic>t</italic>-tests comparing chance levels and decoding powers for the three definitions of surprise for the temporal regime of <italic>Segments</italic>.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Decoding power</td>
<td valign="top" align="center">Shannon</td>
<td valign="top" align="center">Bayesian</td>
<td valign="top" align="center">Confidence-corrected</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Early</td>
<td valign="top" align="center">0.004 &#x00B1; 0.007</td>
<td valign="top" align="center">0.002 &#x00B1; 0.001</td>
<td valign="top" align="center">0.002 &#x00B1; 0.001</td>
</tr>
<tr>
<td valign="top" align="left">Middle</td>
<td valign="top" align="center">0.147 &#x00B1; 0.116</td>
<td valign="top" align="center">0.056 &#x00B1; 0.066</td>
<td valign="top" align="center">0.087 &#x00B1; 0.088</td>
</tr>
<tr>
<td valign="top" align="left">Late</td>
<td valign="top" align="center">0.036 &#x00B1; 0.046</td>
<td valign="top" align="center">0.025 &#x00B1; 0.035</td>
<td valign="top" align="center">0.021 &#x00B1; 0.038</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Chance level</bold></td>
<td valign="top" colspan="3"/>
</tr>
<tr>
<td valign="top" align="left">Early</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0050</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0051</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0051</td>
</tr>
<tr>
<td valign="top" align="left">Middle</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0050</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0051</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0051</td>
</tr>
<tr>
<td valign="top" align="left">Late</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0050</td>
<td valign="top" align="center">&#x2212;0.0004 &#x00B1; 0.0069</td>
<td valign="top" align="center">&#x2212;0.0031 &#x00B1; 0.0051</td>
</tr>
<tr>
<td valign="top" align="left"><bold><italic>p</italic>-values</bold></td>
<td valign="top" colspan="3"/>
</tr>
<tr>
<td valign="top" align="left">Early</td>
<td valign="top" align="center">0.015457</td>
<td valign="top" align="center"><bold>0.000160</bold></td>
<td valign="top" align="center"><bold>0.000112</bold></td>
</tr>
<tr>
<td valign="top" align="left">Middle</td>
<td valign="top" align="center"><bold>0.000001</bold></td>
<td valign="top" align="center"><bold>0.000228</bold></td>
<td valign="top" align="center"><bold>0.000035</bold></td>
</tr>
<tr>
<td valign="top" align="left">Late</td>
<td valign="top" align="center"><bold>0.002373</bold></td>
<td valign="top" align="center">0.005639</td>
<td valign="top" align="center">0.024679</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p><italic>Highlighted p-values are the ones lower than significance level using Bonferroni correction (equal to 0.0042).</italic></p></fn>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T4">
<label>TABLE 4</label>
<caption><p>Results of the ANOVA test for comparing three temporal segments of Early, Middle, and Late of the <italic>Segments</italic> regime in decoding each surprise.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Surprise model</td>
<td valign="top" align="center">ANOVA <italic>F</italic>-value</td>
<td valign="top" align="center"><italic>p</italic>-value (Middle vs. Late)</td>
<td valign="top" align="center"><italic>p</italic>-value (Early vs. Middle)</td>
<td valign="top" align="center"><italic>p</italic>-value (Early vs. Late)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Shannon</td>
<td valign="top" align="center">18.909</td>
<td valign="top" align="center"><bold>0.0007</bold></td>
<td valign="top" align="center"><bold>1.31e-05</bold></td>
<td valign="top" align="center"><bold>0.0079</bold></td>
</tr>
<tr>
<td valign="top" align="left">Bayesian</td>
<td valign="top" align="center">7.575</td>
<td valign="top" align="center">0.0919</td>
<td valign="top" align="center"><bold>0.0017</bold></td>
<td valign="top" align="center"><bold>0.0106</bold></td>
</tr>
<tr>
<td valign="top" align="left">Confidence-corrected</td>
<td valign="top" align="center">11.937</td>
<td valign="top" align="center"><bold>0.0077</bold></td>
<td valign="top" align="center"><bold>0.0003</bold></td>
<td valign="top" align="center">0.0501</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p><italic>Highlighted p-values are the ones lower than significance level using Bonferroni correction (equal to 0.0167).</italic></p></fn>
</table-wrap-foot>
</table-wrap>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p><bold>(A)</bold> Decoding power of magnetometers averaged over subjects for the three surprise values and chance level. <bold>(B)</bold> Channel locations. <bold>(C)</bold> Decoding powers of the three surprise values in different channels using the entire response epoch. <bold>(D)</bold> <italic>p</italic>-values of <italic>t</italic>-tests of the difference between decoding powers of pairs of surprise values. Channels with lower <italic>p</italic>-values are shown in yellow. Note that a different color scale is used in each plot for better visibility of areas with high decoding powers. Also, <italic>p</italic>-values are logarithmically scaled for better visualization.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnsys-16-865453-g005.tif"/>
</fig>
<sec id="S3.SS1">
<title>Temporal Analysis</title>
<p>Here we describe the decoding powers of the three quantifications of surprise when each is employed as label for training surprise decoders.</p>
<sec id="S3.SS1.SSS1">
<title>Entire Epoch</title>
<p>The <italic>R</italic><sup>2</sup> values and chance levels when using the entire epoch are presented in <xref ref-type="table" rid="T1">Table 1</xref>. The mean <italic>R</italic><sup>2</sup> values are at least ten times larger in magnitude than the mean chance levels. We conducted <italic>t</italic>-tests to examine whether the decoding powers differ significantly from the chance levels. Using the Bonferroni correction (<xref ref-type="bibr" rid="B10">Bonferroni, 1936</xref>) for 12 tests, we set the significance level to 0.0042. These tests were corrected jointly with those of <xref ref-type="table" rid="T3">Table 3</xref>, which compare decoding powers with chance levels for the <italic>Segments</italic> regime: three surprise models and four temporal selections (the entire epoch plus the Early, Middle, and Late segments) give 12 tests. We observed that the decoding powers are significantly higher than chance level in the <italic>Entire epoch</italic> regime. The values of chance level and decoding power in this table can be considered as upper bounds for the goodness-of-fit measurements in the other three temporal regimes. We selected the maximum value among the three chance levels to plot for comparison in <xref ref-type="fig" rid="F3">Figures 3A,C</xref> and <xref ref-type="fig" rid="F5">5A</xref>.</p>
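<p>The corrected thresholds quoted in the tables follow from dividing a family-wise significance level by the number of tests, as the short Python check below shows. The family-wise level of 0.05 is an assumption, inferred from the fact that it reproduces the reported thresholds of 0.0042 (12 tests) and 0.0167 (3 tests); the function name is hypothetical.</p>
<preformat><![CDATA[
```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


# Assumed family-wise level of 0.05.
alpha_12 = bonferroni_alpha(0.05, 12)  # Tables 1 and 3: 3 models x 4 selections
alpha_3 = bonferroni_alpha(0.05, 3)    # Table 4: 3 pairwise comparisons

# p-values reported in Table 1 for the Entire epoch regime.
table1_p = (0.000100, 0.000112, 0.000747)
all_significant = all(p < alpha_12 for p in table1_p)
```
]]></preformat>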
</sec>
<sec id="S3.SS1.SSS2">
<title>Samples Regime</title>
<p>The decoding power of this regression model is illustrated in <xref ref-type="fig" rid="F3">Figure 3A</xref> for different values of <italic>t</italic> &#x2208; [&#x2212;200 ms, 600 ms]. Because only one time sample in each epoch is used to describe the trial&#x2019;s surprise, relatively low <italic>R</italic><sup>2</sup> levels are to be expected. In the curves of <xref ref-type="fig" rid="F3">Figure 3A</xref>, the middle and late components appear to describe surprise better than the early components. In addition, one noticeable peak is observed in the middle segment. The fact that the <italic>Samples</italic> regime identifies time points in the middle segment of the MEG response as having the highest surprise-decoding powers (for each of the three definitions of surprise) is a notable observation of our study.</p>
</sec>
<sec id="S3.SS1.SSS3">
<title>Intervals Regime</title>
<p><xref ref-type="fig" rid="F3">Figure 3C</xref> illustrates the decoding powers of decoders trained using an interval of temporal samples in the range of [&#x2212;200 ms, <italic>t</italic>] for different values of <italic>t</italic>. This regime is expected to reveal the time instant at which enough evidence has been accumulated from the response to achieve a confident decoding performance. In each curve, the <italic>R</italic><sup>2</sup> value stays close to zero until around 100 ms, when there is a considerable rise in the decoding power. This increase occurs in the temporal range which we called the middle segment in our <italic>Segments</italic> regime. The decoding power increases only slightly after around 250 ms. We can deduce that the response components do not add much information about surprise after around 250 ms.</p>
</sec>
<sec id="S3.SS1.SSS4">
<title>Segments Regime</title>
<p>First, the time points that best partition the entire post-onset epoch into three parts are obtained based on the method described in Section &#x201C;Preprocessing of Data&#x201D; and reported in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<p>The <italic>R</italic><sup>2</sup> values and chance levels obtained when the data points in each of the segments named Early, Middle, and Late are used for decoding surprise are presented in <xref ref-type="table" rid="T3">Table 3</xref>. We observe that for the Early segment, the <italic>R</italic><sup>2</sup> values and the chance levels are close to each other. We conducted <italic>t</italic>-tests to examine whether the decoding powers differ significantly from the chance levels, with the significance level corrected to 0.0042 using Bonferroni correction. We observed that decoding powers are significantly higher than chance level in the Middle segment for all three surprise models. However, we did not observe this significance for the Early and Late segments in any of the surprise models. This result is expected since the early segment of the response epoch is known to have little or no information about surprise and has been reported to mainly reflect the physical aspects of the stimuli (<xref ref-type="bibr" rid="B87">Sur and Sinha, 2009</xref>). To explain this result, we note that even though the characteristics of the two types of stimuli (standards and deviants) are different from each other, the components recorded during the early processing of the stimuli do not appear to account for the stimuli&#x2019;s surprise. In other words, these processes seem to create signatures in the recorded response that cannot be significantly differentiated from each other as far as their potential confound with the brain&#x2019;s surprise is concerned. This observation, which our statistical analysis also reveals, provides further evidence that early sensory processes in the brain apply generic sets of operations to all stimuli, as the surprise aspects of the input are not yet known to the brain.</p>
<p>We can further add that even though the differences between the characteristics of the two types of stimuli may affect the early part of the recorded brain response (which might be observed as differences between the two responses when the usual trial-averaging techniques are used and decoding the surprise of each trial is not an objective), such differences in the recorded response cannot be used to decode the surprise that is embedded in the input sequence. In other words, this lack of differentiability in terms of surprise decoding between the early parts of the responses to the stimuli can itself serve as an indication that the input characteristics do not introduce any confound into the decoding process employed in our model.</p>
<p>We observe in <xref ref-type="table" rid="T3">Table 3</xref> that the Middle segment demonstrates significant values of decoding power. <xref ref-type="fig" rid="F3">Figure 3E</xref> shows the variation of the decoding powers of different segments across the subjects using three boxplots for the three surprise values.</p>
</sec>
<sec id="S3.SS1.SSS5">
<title>Significance of Temporal Features</title>
<p><xref ref-type="table" rid="T4">Table 4</xref> shows the results of repeated-measures ANOVA (analysis of variance) (<xref ref-type="bibr" rid="B32">Gueorguieva and Krystal, 2004</xref>) comparing the decoding powers of the three segments of Early, Middle, and Late, with data from the different subjects as the statistical samples. The <italic>F</italic>-value of the ANOVA and the <italic>p</italic>-values of the <italic>post hoc</italic> analysis are reported in the table. We conducted three one-way ANOVA tests, each corresponding to a surprise model. The significance level is corrected to 0.0167 using Bonferroni correction.</p>
<p>The table indicates not only that the Early segment is significantly less powerful than the Middle segment, but also that a significant difference exists between the Middle and Late segments in the <italic>R</italic><sup>2</sup> of the Shannon and confidence-corrected surprise values. This is because the Late components, as was also observed in the results of the <italic>Samples</italic> regime, are significantly less powerful than the Middle components. However, this is not the case for the Bayesian surprise, which offers relatively similar decoding powers for the Middle and Late segments.</p>
<p>Similarly, in <xref ref-type="fig" rid="F3">Figure 3B</xref>, the relative importance of temporal components for decoding surprise is assessed for decoders based on the <italic>Samples</italic> regime. In this figure, each panel shows an 80 &#x00D7; 80 matrix of &#x2212;<italic>log</italic><sub>2</sub>(<italic>p</italic>-value) entries coded as colors, representing <italic>1600</italic> tests performed to evaluate the significance of the difference between the decoding powers of each pair of features in the <italic>Samples</italic> regime. We used the logarithmic <italic>p</italic>-value scale to afford a wider range for better visualization. Note that these are uncorrected <italic>p</italic>-values, as we only want to compare the relative levels of the <italic>p</italic>-values here and scaling all of them (using Bonferroni correction) has no impact on this comparison. A similar plot is shown in <xref ref-type="fig" rid="F3">Figure 3D</xref> for the <italic>Intervals</italic> regime.</p>
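<p>The logarithmic scale used for these maps can be illustrated with a minimal Python sketch. The <italic>p</italic>-values and the scaling factor below are hypothetical and are not taken from our results. The sketch also shows why a uniform rescaling of all <italic>p</italic>-values leaves their relative levels unchanged: on the &#x2212;<italic>log</italic><sub>2</sub> scale it shifts every entry by the same constant.</p>
<preformat>
```python
import math


def neg_log2(p_value: float) -> float:
    """Map a p-value in (0, 1] to -log2(p); a smaller p gives a larger value."""
    return -math.log2(p_value)


# Hypothetical p-values, for illustration only.
example_p_values = (0.5, 0.05, 0.001)
for p in example_p_values:
    print(p, round(neg_log2(p), 2))

# Multiplying every p-value by the same factor m (hypothetical here) shifts
# each entry by log2(m), so differences between entries are preserved.
m = 12
gap_raw = neg_log2(0.001) - neg_log2(0.05)
gap_scaled = neg_log2(m * 0.001) - neg_log2(m * 0.05)
print(round(gap_raw, 6) == round(gap_scaled, 6))
```
</preformat>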
<p>In the <italic>Samples</italic> regime, there is no single time instance with significantly better decoding power (for decoding any of the three surprise quantifications) than all the other temporal points (<xref ref-type="fig" rid="F3">Figure 3B</xref>).</p>
<p>In the <italic>Intervals</italic> regime, the relatively narrow diagonal blue line around the point (200 ms, 200 ms) shows the rapid rising behavior of the decoding power when the points of the middle segment are included (<xref ref-type="fig" rid="F3">Figure 3D</xref>). Furthermore, after around 250 ms (in <xref ref-type="fig" rid="F3">Figure 3D</xref>), adding new temporal components as features for decoding surprise (for any of the three definitions) does not lead to a significantly higher decoding power.</p>
</sec>
<sec id="S3.SS1.SSS6">
<title>The Effect of Integration Coefficient</title>
<p>In <xref ref-type="fig" rid="F3">Figure 3F</xref>, the decoding powers of the designed decoders are plotted for different integration coefficients in the range of [1,100]. Two different behaviors can be observed for the three surprise quantifications. For the Shannon and confidence-corrected surprise values, when <italic>w</italic> is not small, a relatively high decoding power is observed. However, for the Bayesian surprise <italic>w</italic> needs to be relatively small in order to obtain high decoding powers.</p>
<p>In addition, in the <italic>Samples</italic> regime of this analysis, the best integration coefficient does not depend strongly on time. In other words, the best <italic>w</italic> differs little between the middle and late components (<xref ref-type="fig" rid="F3">Figure 3F</xref>).</p>
</sec>
</sec>
<sec id="S3.SS2">
<title>Spatial Analysis</title>
<p>In this part, the decoding power of each of the 102 magnetometers is first obtained for the three surprise quantifications using the entire temporal epoch as the input feature for the regressors. In <xref ref-type="fig" rid="F5">Figure 5A</xref>, the decoding power is averaged over subjects and plotted for all channels. The decoding power is clearly greater than the chance level listed in <xref ref-type="table" rid="T1">Table 1</xref>, so the assumption of independence between surprise values and the entire epoch of the MEG data can be rejected. Interestingly, for almost all magnetometers, the MEG data decodes Shannon surprise best and Bayesian surprise worst. These comparisons are also statistically assessed using paired <italic>t</italic>-tests to see whether the difference in decoding power between each pair of surprise values is significant for each channel, considering the subjects as samples. The resulting <italic>p</italic>-values are plotted as topographic maps in <xref ref-type="fig" rid="F5">Figure 5D</xref>, with lower <italic>p</italic>-values shown in yellow. The <italic>p</italic>-values are uncorrected and shown on a logarithmic scale for better visualization. These plots are produced using the FieldTrip toolbox (<xref ref-type="bibr" rid="B65">Oostenveld et al., 2011</xref>) on the &#x201C;neuromag306mag&#x201D; layout, which is shown in <xref ref-type="fig" rid="F5">Figure 5B</xref>. Then, the average values of the decoding power over the subjects are plotted as topographic maps in <xref ref-type="fig" rid="F5">Figure 5C</xref>, in which the brighter channels are the best magnetometers that can be selected for decoding Shannon (middle), Bayesian (left), and confidence-corrected (right) surprise values.</p>
<p>In the second part of the analysis, the decoding power of each channel is assessed temporally for the three defined segments of Early, Middle, and Late (see <xref ref-type="fig" rid="F2">Figure 2</xref>). The goal is to gain insight into the spatiotemporal value of the data in terms of describing surprise. <xref ref-type="fig" rid="F6">Figure 6</xref> depicts topographic plots of decoding powers for the three surprise quantifications for each of the mentioned temporal segments. The Middle segment possesses the highest level of decoding power, and the Late segment offers a lower decoding power compared to the Middle segment. These topographies in the Middle and Late segments include the areas reported by <xref ref-type="bibr" rid="B92">Wacongne et al. (2011)</xref> for the effect of local mismatch at 120 ms and the effect of global deviance at 350 ms after the onset. Local mismatch and global deviance can lead to high theoretical surprise, which relates the temporal samples reported by <xref ref-type="bibr" rid="B92">Wacongne et al. (2011)</xref> to our results. In addition, <xref ref-type="bibr" rid="B86">Strauss et al. (2015)</xref> reported the effect of local mismatch at 150 ms and of global deviance at 350 ms for MEG data, which correspond to the temporal ranges used to decode surprise in the Middle and Late segments in our analysis.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>Sensor-level topographies of decoding powers of three surprise values in different channels using the Early, Middle, and Late segments. Note that a different color scale is used in each plot for better visibility of areas with high decoding powers. Also, <italic>p</italic>-values are logarithmically scaled for better visualization.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnsys-16-865453-g006.tif"/>
</fig>
</sec>
</sec>
<sec id="S4" sec-type="discussion">
<title>Discussion</title>
<sec id="S4.SS1">
<title>Evidence for Bayesian Brain and Ideal Observer</title>
<p>The assumptions of the Bayesian brain and the ideal observer (<xref ref-type="bibr" rid="B46">Knill and Pouget, 2004</xref>; <xref ref-type="bibr" rid="B8">Behrens et al., 2007</xref>; <xref ref-type="bibr" rid="B58">Mathys et al., 2011</xref>; <xref ref-type="bibr" rid="B62">Nassar et al., 2012</xref>, <xref ref-type="bibr" rid="B63">2010</xref>) are embedded in the way we have calculated the theoretical surprise of each stimulus. Although there are three different approaches for defining this surprise, all are based on the parameters learned following the Bayesian brain and ideal observer assumptions. Our results demonstrate the feasibility of decoding these three quantifications of the theoretical surprise on a trial-by-trial basis with significant decoding power, and hence provide new evidence for supporting the Bayesian brain and ideal observer assumptions.</p>
</sec>
<sec id="S4.SS2">
<title>Optimal Use of Temporal Components in Measuring Surprise</title>
<p>A remarkable distinction of our work is that we have not considered any single predefined temporal sample as a representative for the surprise of the brain. Extracting a reliable single temporal value from each epoch (even after epoch averaging, which is a common practice in ERP analysis) is a complex and rather <italic>ad hoc</italic> procedure (<xref ref-type="bibr" rid="B18">Debener et al., 2005</xref>; <xref ref-type="bibr" rid="B12">Cecotti and Graser, 2010</xref>; <xref ref-type="bibr" rid="B90">Turnip et al., 2011</xref>; <xref ref-type="bibr" rid="B2">Amini et al., 2013</xref>; <xref ref-type="bibr" rid="B48">Kolossa et al., 2013</xref>). In our approach, we use data from the entire response on a trial-by-trial basis to derive the surprise of the brain as a linear combination of the samples of the response with optimally determined weights.</p>
</sec>
<sec id="S4.SS3">
<title>Optimal Use of the Spatially Distributed Effects of Surprise</title>
<p>Earlier studies based on fMRI data analysis have reported that the Shannon and Bayesian surprise values are modulated in different brain regions (<xref ref-type="bibr" rid="B67">Ostwald et al., 2012</xref>; <xref ref-type="bibr" rid="B66">O&#x2019;Reilly et al., 2013</xref>; <xref ref-type="bibr" rid="B80">Schwartenbeck et al., 2016</xref>; <xref ref-type="bibr" rid="B91">Visalli et al., 2019</xref>). In addition, the well-known surprise-related components of the ERP signal such as MMN and P300 have been shown to emanate from the fronto-parietal and the fronto-central regions of the brain, respectively (<xref ref-type="bibr" rid="B30">Giard et al., 1990</xref>). In the current study, we have not imposed any spatial preferences between the magnetometers or among the ICs with spatial distributions close to the known sources of surprise in the brain. This choice offers generality to our analysis through employing all available data and letting the decoders capture all the relevant information during the training procedure. In fact, we employed a sparse regression model, which forces the coefficients of the surprise-irrelevant temporal/spatial components to be zero.</p>
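<p>How a sparsity-inducing penalty sets irrelevant coefficients exactly to zero can be illustrated with the soft-thresholding operator. The Python sketch below is only an illustration under stated assumptions: it assumes an L1 (lasso-type) penalty, which this section does not specify, and the weights and penalty value are hypothetical. For an orthonormal design, the lasso solution is obtained by applying this operator to the unpenalized least-squares weights.</p>
<preformat>
```python
def soft_threshold(value: float, penalty: float) -> float:
    """Shrink value toward zero by penalty; return 0.0 when |value| is at most penalty."""
    if value > penalty:
        return value - penalty
    if -value > penalty:
        return value + penalty
    return 0.0


# Hypothetical unpenalized weights for four temporal components.
raw_weights = [0.90, -0.05, 0.02, -0.40]
# Hypothetical regularization strength.
penalty = 0.10

# Small weights are set exactly to zero; large ones are shrunk by the penalty.
sparse_weights = [soft_threshold(w, penalty) for w in raw_weights]
print(sparse_weights)
```
</preformat>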
</sec>
<sec id="S4.SS4">
<title>Optimizing the Timescale of Integration</title>
<p>The best description for the Bayesian surprise derived from the brain&#x2019;s response occurs when a rather short window of integration is used. This behavior stems from the very definition of the Bayesian surprise. The value of the KL divergence constantly decreases as we increase the timescale of integration since the two distributions involved become closer to each other. Given the rather short window of integration involved in keeping track of the Bayesian surprise, this quantification of surprise tends to be more sensitive to fluctuations in the recorded data compared to the Shannon and confidence-corrected surprises. The latter two use longer windows of integration, and are hence more robust to such fluctuations and can provide more accurate estimates of the underlying statistics of the input sequence generation process. This superiority is reflected in the higher decoding performance for these two concepts of surprise over the Bayesian surprise as illustrated across all of our results.</p>
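<p>The shrinking of the KL divergence with a longer history can be illustrated numerically. The following Python sketch is not the observer model used in this study. It assumes a simple Beta-Bernoulli belief over the deviant probability, uses the number of past observations as a stand-in for the timescale of integration, and takes the divergence from the belief before the new observation to the belief after it (the direction is also an assumption). All names and parameter values are hypothetical.</p>
<preformat>
```python
import math


def beta_log_pdf(theta: float, a: float, b: float) -> float:
    """Log density of a Beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)


def kl_after_one_deviant(n_past: int, p_deviant: float = 0.2, grid: int = 2000) -> float:
    """KL(posterior || prior) when one more deviant follows n_past observations."""
    # Pseudo-counts after n_past observations, starting from a flat Beta(1, 1).
    a = 1 + p_deviant * n_past
    b = 1 + (1 - p_deviant) * n_past
    step = 1.0 / grid
    kl = 0.0
    for i in range(grid):
        theta = (i + 0.5) * step  # midpoint rule on (0, 1)
        log_prior = beta_log_pdf(theta, a, b)
        log_post = beta_log_pdf(theta, a + 1, b)  # belief after one more deviant
        kl += math.exp(log_post) * (log_post - log_prior) * step
    return kl


# A longer history makes the two beliefs closer, so the divergence shrinks.
for n in (5, 20, 80):
    print(n, round(kl_after_one_deviant(n), 4))
```
</preformat>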
</sec>
<sec id="S4.SS5">
<title>Magnetoencephalography and Electroencephalography Comparison</title>
<p><xref ref-type="bibr" rid="B55">Malmivuo (2012)</xref> suggested that MEG and EEG recordings are only partially independent. While EEG-based studies have provided an understanding of the temporal and spatial signatures of surprise, the better signal-to-noise ratio and readability of the MEG recordings compared to EEG (<xref ref-type="bibr" rid="B33">H&#x00E4;m&#x00E4;l&#x00E4;inen et al., 1993</xref>; <xref ref-type="bibr" rid="B86">Strauss et al., 2015</xref>) offer opportunities based on MEG data for further examination of the mechanisms that generate surprise in the brain. A larger number of recording sensors distributed more densely across the head, as is often the case for MEG recordings, provides better coverage of local activity beneath the scalp.</p>
<p>A likely explanation for the lower performance of the late components in our MEG decoding model, which do not reflect the powerful P300 response seen in EEG data, is that while each EEG sensor collects and integrates data from a rather distributed and deep set of sources in the brain (<xref ref-type="bibr" rid="B56">Malmivuo and Plonsey, 1995</xref>), each MEG sensor can only capture the activities of sources in its close proximity beneath the scalp (<xref ref-type="bibr" rid="B78">Schomer and Da Silva, 2012</xref>). The surprise generation mechanisms of the brain transmit signals to a number of different regions of the brain, which in turn produce the late components of surprise, which are distributed and diffuse. Since these late components are generated by distributed sources, MEG sensors may not be able to capture them adequately, which can explain their relatively lower decoding power in MEG recordings (<xref ref-type="bibr" rid="B92">Wacongne et al., 2011</xref>; <xref ref-type="bibr" rid="B43">Ilmoniemi and Sarvas, 2019</xref>).</p>
</sec>
<sec id="S4.SS6">
<title>Spatial Signatures of Surprise</title>
<p>Frontal regions of the cortex (including the dorsal cingulate cortex) were reported in fMRI studies (e.g., by <xref ref-type="bibr" rid="B80">Schwartenbeck et al., 2016</xref>) to modulate activities related to information-theoretic (Shannon) surprise. The posterior parietal cortex (<xref ref-type="bibr" rid="B66">O&#x2019;Reilly et al., 2013</xref>) and the inferior frontal gyrus are proposed as two regions that correlate with both the Shannon and Bayesian surprises (<xref ref-type="bibr" rid="B91">Visalli et al., 2019</xref>). Our observations on data collected from the scalp by MEG sensors are in agreement with these fMRI-based studies. Magnetometers placed on the two sides of the frontal midline may detect the surprise-related activity of the dorsal cingulate cortex, which is located closer to the scalp, while magnetometers placed on the two sides of the parietal midline may detect the activities of the posterior parietal cortex (see <xref ref-type="fig" rid="F5">Figures 5</xref>, <xref ref-type="fig" rid="F6">6</xref>). However, making interpretations about the sources evoked by auditory stimulation which result in such topographic maps is subject to ambiguity as discussed in the literature for some time (<xref ref-type="bibr" rid="B34">H&#x00E4;m&#x00E4;l&#x00E4;inen et al., 1995</xref>). On the one hand, interpretations such as above may be challenged in light of the implied orientation of the underlying sources, i.e., to have both the cingulate and posterior parietal source dipoles be oriented along the anterior-posterior axis, which is not expected anatomically. 
Accordingly, an alternative interpretation of the topographic maps in <xref ref-type="fig" rid="F5">Figures 5</xref>, <xref ref-type="fig" rid="F6">6</xref> could be that they might reveal activities corresponding to bilateral superior temporal lobe sources as maps similar to those are typically evoked by auditory stimuli and are reported to indicate bilateral auditory cortex sources (<xref ref-type="bibr" rid="B93">Zevin, 2009</xref>). On the other hand, some studies argue for not attributing MEG sources to deep regions of the brain (like temporal lobe) by pointing out that the MEG data acquisition is most sensitive to superficial sources, and that its sensitivity is much reduced for deep sources (<xref ref-type="bibr" rid="B14">Cohen and Cuffin, 1983</xref>; <xref ref-type="bibr" rid="B17">de Jongh et al., 2005</xref>; <xref ref-type="bibr" rid="B1">Ahlfors et al., 2010</xref>). According to such observations, attributing the four maxima in the maps of <xref ref-type="fig" rid="F5">Figures 5</xref>, <xref ref-type="fig" rid="F6">6</xref> to frontal and parietal source pairs may be a possibility. However, and adding to the complexity of making interpretations on the MEG topographical maps, one could also mention the possibility that bilateral sources in the auditory cortices may also produce an extra deflection in these maps close to the posterior midline due to the proximity of fields from the two sources which have opposite directions (<xref ref-type="bibr" rid="B34">H&#x00E4;m&#x00E4;l&#x00E4;inen et al., 1995</xref>).</p>
</sec>
</sec>
<sec id="S5" sec-type="conclusion">
<title>Conclusion</title>
<p>Surprise and its impact have been well characterized in many studies on social interactions as well as in computational frameworks using recorded brain signals. However, an information-theoretical model to describe and predict the surprise level of an external stimulus in recorded MEG data has not been reported to date. The current study proposed a regression model for decoding the level of the brain&#x2019;s surprise in response to sensory sequences using optimally selected temporal components of recorded MEG data. Three surprise quantification definitions, Shannon, Bayesian, and confidence-corrected, were assessed in offering decoding power in modeling the recorded data. Four different regimes for selecting temporal samples were used to evaluate which parts of the recorded data contain signatures that best represent the brain&#x2019;s surprise. We found that the middle temporal components of the MEG response offer the strongest power for decoding surprise. The best magnetometers for collecting the activities related to all three concepts of surprise were found to be in the right and left fronto-central regions. Measuring surprise of the brain by decoding techniques such as the method proposed in the current study can complement data obtained <italic>via</italic> behavioral observations in order to devise computational models for evaluating the effect of surprise in social interactions.</p>
</sec>
<sec id="S6" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: <ext-link ext-link-type="uri" xlink:href="https://osf.io/wtnke/">https://osf.io/wtnke/</ext-link>.</p>
</sec>
<sec id="S7">
<title>Author Contributions</title>
<p>MK and ZM contributed to material preparation and data analysis. ZM wrote the first draft of the manuscript. HA supervised the work and edited the manuscript. All authors contributed to the study conception and design, read, and approved the final manuscript.</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="pudiscl1" sec-type="disclaimer">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack><p>We would like to thank Maxime Maheu and his colleagues for making the MEG oddball dataset and the Bayesian observer code publicly available (<xref ref-type="bibr" rid="B54">Maheu et al., 2019</xref>).</p>
</ack>
<sec id="S9" sec-type="supplementary-material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fnsys.2022.865453/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fnsys.2022.865453/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.docx" id="TS1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahlfors</surname> <given-names>S. P.</given-names></name> <name><surname>Han</surname> <given-names>J.</given-names></name> <name><surname>Belliveau</surname> <given-names>J. W.</given-names></name> <name><surname>H&#x00E4;m&#x00E4;l&#x00E4;inen</surname> <given-names>M. S.</given-names></name></person-group> (<year>2010</year>). <article-title>Sensitivity of MEG and EEG to source orientation.</article-title> <source><italic>Brain Topography</italic></source> <volume>23</volume> <fpage>227</fpage>&#x2013;<lpage>232</lpage>.</citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amini</surname> <given-names>Z.</given-names></name> <name><surname>Abootalebi</surname> <given-names>V.</given-names></name> <name><surname>Sadeghi</surname> <given-names>M. T.</given-names></name></person-group> (<year>2013</year>). <article-title>Comparison of performance of different feature extraction methods in detection of P300.</article-title> <source><italic>Biocybern. Biomed. Eng.</italic></source> <volume>33</volume> <fpage>3</fpage>&#x2013;<lpage>20</lpage>.</citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baldi</surname> <given-names>P.</given-names></name></person-group> (<year>2002</year>). <source><italic>A Computational Theory of Surprise. In Information, Coding and Mathematics.</italic></source> <publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Springer</publisher-name>, <fpage>1</fpage>&#x2013;<lpage>25</lpage>.</citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baldi</surname> <given-names>P.</given-names></name> <name><surname>Itti</surname> <given-names>L.</given-names></name></person-group> (<year>2010</year>). <article-title>Of bits and wows: A Bayesian theory of surprise with applications to attention.</article-title> <source><italic>Neural Netw.</italic></source> <volume>23</volume> <fpage>649</fpage>&#x2013;<lpage>666</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2009.12.007</pub-id> <pub-id pub-id-type="pmid">20080025</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barascud</surname> <given-names>N.</given-names></name> <name><surname>Pearce</surname> <given-names>M. T.</given-names></name> <name><surname>Griffiths</surname> <given-names>T. D.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name> <name><surname>Chait</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>Brain responses in humans reveal ideal observer-like sensitivity to complex acoustic patterns.</article-title> <source><italic>Proc. Nat. Acad. Sci. U. S. A.</italic></source> <volume>113</volume> <fpage>E616</fpage>&#x2013;<lpage>E625</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1508523113</pub-id> <pub-id pub-id-type="pmid">26787854</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barcelo</surname> <given-names>F.</given-names></name> <name><surname>Escera</surname> <given-names>C.</given-names></name> <name><surname>Corral</surname> <given-names>M. J.</given-names></name> <name><surname>Peri&#x00E1;&#x00F1;ez</surname> <given-names>J. A.</given-names></name></person-group> (<year>2006</year>). <article-title>Task switching and novelty processing activate a common neural network for cognitive control.</article-title> <source><italic>J. Cogn. Neurosci.</italic></source> <volume>18</volume> <fpage>1734</fpage>&#x2013;<lpage>1748</lpage>.</citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bell</surname> <given-names>A. J.</given-names></name> <name><surname>Sejnowski</surname> <given-names>T. J.</given-names></name></person-group> (<year>1995</year>). <article-title>An information-maximization approach to blind separation and blind deconvolution.</article-title> <source><italic>Neural Computation</italic></source> <volume>7</volume> <fpage>1129</fpage>&#x2013;<lpage>1159</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1995.7.6.1129</pub-id> <pub-id pub-id-type="pmid">7584893</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Behrens</surname> <given-names>T. E.</given-names></name> <name><surname>Woolrich</surname> <given-names>M. W.</given-names></name> <name><surname>Walton</surname> <given-names>M. E.</given-names></name> <name><surname>Rushworth</surname> <given-names>M. F.</given-names></name></person-group> (<year>2007</year>). <article-title>Learning the value of information in an uncertain world.</article-title> <source><italic>Nat. Neurosci.</italic></source> <volume>10</volume> <fpage>1214</fpage>&#x2013;<lpage>1221</lpage>. <pub-id pub-id-type="doi">10.1038/nn1954</pub-id> <pub-id pub-id-type="pmid">17676057</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bestmann</surname> <given-names>S.</given-names></name> <name><surname>Harrison</surname> <given-names>L. M.</given-names></name> <name><surname>Blankenburg</surname> <given-names>F.</given-names></name> <name><surname>Mars</surname> <given-names>R. B.</given-names></name> <name><surname>Haggard</surname> <given-names>P.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>Influence of uncertainty and surprise on human corticospinal excitability during preparation for action.</article-title> <source><italic>Curr. Biol.</italic></source> <volume>18</volume> <fpage>775</fpage>&#x2013;<lpage>780</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2008.04.051</pub-id> <pub-id pub-id-type="pmid">18485711</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonferroni</surname> <given-names>C.</given-names></name></person-group> (<year>1936</year>). <article-title>Teoria statistica delle classi e calcolo delle probabilit&#x00E0;.</article-title> <source><italic>Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze</italic></source> <volume>8</volume> <fpage>3</fpage>&#x2013;<lpage>62</lpage>.</citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bradley</surname> <given-names>M. M.</given-names></name> <name><surname>Greenwald</surname> <given-names>M. K.</given-names></name> <name><surname>Petry</surname> <given-names>M. C.</given-names></name> <name><surname>Lang</surname> <given-names>P. J.</given-names></name></person-group> (<year>1992</year>). <article-title>Remembering pictures: pleasure and arousal in memory.</article-title> <source><italic>J. Exp. Psychol.</italic></source> <volume>18</volume>:<issue>379</issue>. <pub-id pub-id-type="doi">10.1037//0278-7393.18.2.379</pub-id> <pub-id pub-id-type="pmid">1532823</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cecotti</surname> <given-names>H.</given-names></name> <name><surname>Graser</surname> <given-names>A.</given-names></name></person-group> (<year>2010</year>). <article-title>Convolutional neural networks for P300 detection with application to brain-computer interfaces.</article-title> <source><italic>IEEE Trans. Pattern Analysis Machine Intellig.</italic></source> <volume>33</volume> <fpage>433</fpage>&#x2013;<lpage>445</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2010.125</pub-id> <pub-id pub-id-type="pmid">20567055</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chait</surname> <given-names>M.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name> <name><surname>De Cheveign&#x00E9;</surname> <given-names>A.</given-names></name> <name><surname>Simon</surname> <given-names>J. Z.</given-names></name></person-group> (<year>2007</year>). <article-title>Processing asymmetry of transitions between order and disorder in human auditory cortex.</article-title> <source><italic>J. Neurosci.</italic></source> <volume>27</volume> <fpage>5207</fpage>&#x2013;<lpage>5214</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.0318-07.2007</pub-id> <pub-id pub-id-type="pmid">17494707</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohen</surname> <given-names>D.</given-names></name> <name><surname>Cuffin</surname> <given-names>B. N.</given-names></name></person-group> (<year>1983</year>). <article-title>Demonstration of useful differences between magnetoencephalogram and electroencephalogram.</article-title> <source><italic>Electroencephalogr. Clin. Neurophysiol.</italic></source> <volume>56</volume> <fpage>38</fpage>&#x2013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1016/0013-4694(83)90005-6</pub-id> <pub-id pub-id-type="pmid">6190632</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cover</surname> <given-names>T. M.</given-names></name></person-group> (<year>1999</year>). <source><italic>Elements of Information Theory.</italic></source> <publisher-loc>New York</publisher-loc>: <publisher-name>John Wiley &#x0026; Sons</publisher-name>.</citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>Den Ouden</surname> <given-names>H. E.</given-names></name> <name><surname>Pessiglione</surname> <given-names>M.</given-names></name> <name><surname>Kiebel</surname> <given-names>S. J.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Observing the observer (I): meta-Bayesian models of learning and decision-making.</article-title> <source><italic>PLoS One</italic></source> <volume>5</volume>:<issue>e15554</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0015554</pub-id> <pub-id pub-id-type="pmid">21179480</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Jongh</surname> <given-names>A.</given-names></name> <name><surname>de Munck</surname> <given-names>J. C.</given-names></name> <name><surname>Gon&#x00E7;alves</surname> <given-names>S. I.</given-names></name> <name><surname>Ossenblok</surname> <given-names>P.</given-names></name></person-group> (<year>2005</year>). <article-title>Differences in MEG/EEG epileptic spike yields explained by regional differences in signal-to-noise ratios.</article-title> <source><italic>J. Clin. Neurophysiol.</italic></source> <volume>22</volume> <fpage>153</fpage>&#x2013;<lpage>158</lpage>. <pub-id pub-id-type="doi">10.1097/01.wnp.0000158947.68733.51</pub-id> <pub-id pub-id-type="pmid">15805816</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Debener</surname> <given-names>S.</given-names></name> <name><surname>Makeig</surname> <given-names>S.</given-names></name> <name><surname>Delorme</surname> <given-names>A.</given-names></name> <name><surname>Engel</surname> <given-names>A. K.</given-names></name></person-group> (<year>2005</year>). <article-title>What is novel in the novelty oddball paradigm? Functional significance of the novelty P3 event-related potential as revealed by independent component analysis.</article-title> <source><italic>Cogn. Brain Res.</italic></source> <volume>22</volume> <fpage>309</fpage>&#x2013;<lpage>321</lpage>. <pub-id pub-id-type="doi">10.1016/j.cogbrainres.2004.09.006</pub-id> <pub-id pub-id-type="pmid">15722203</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Delorme</surname> <given-names>A.</given-names></name> <name><surname>Makeig</surname> <given-names>S.</given-names></name></person-group> (<year>2004</year>). <article-title>EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis.</article-title> <source><italic>J. Neurosci. Methods</italic></source> <volume>134</volume> <fpage>9</fpage>&#x2013;<lpage>21</lpage>.</citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Doya</surname> <given-names>K.</given-names></name> <name><surname>Ishii</surname> <given-names>S.</given-names></name> <name><surname>Pouget</surname> <given-names>A.</given-names></name> <name><surname>Rao</surname> <given-names>R. P.</given-names></name></person-group> (<role>eds</role>) (<year>2007</year>). <source><italic>Bayesian Brain: Probabilistic Approaches to Neural Coding.</italic></source> <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Duncan</surname> <given-names>C. C.</given-names></name> <name><surname>Barry</surname> <given-names>R. J.</given-names></name> <name><surname>Connolly</surname> <given-names>J. F.</given-names></name> <name><surname>Fischer</surname> <given-names>C.</given-names></name> <name><surname>Michie</surname> <given-names>P. T.</given-names></name> <name><surname>N&#x00E4;&#x00E4;t&#x00E4;nen</surname> <given-names>R.</given-names></name><etal/></person-group> (<year>2009</year>). <article-title>Event-related potentials in clinical research: guidelines for eliciting, recording, and quantifying mismatch negativity, P300, and N400.</article-title> <source><italic>Clin. Neurophysiol.</italic></source> <volume>120</volume> <fpage>1883</fpage>&#x2013;<lpage>1908</lpage>. <pub-id pub-id-type="doi">10.1016/j.clinph.2009.07.045</pub-id> <pub-id pub-id-type="pmid">19796989</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Faraji</surname> <given-names>M.</given-names></name> <name><surname>Preuschoff</surname> <given-names>K.</given-names></name> <name><surname>Gerstner</surname> <given-names>W.</given-names></name></person-group> (<year>2018</year>). <article-title>Balancing new against old information: the role of puzzlement surprise in learning.</article-title> <source><italic>Neural Computation</italic></source> <volume>30</volume> <fpage>34</fpage>&#x2013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1162/neco_a_01025</pub-id> <pub-id pub-id-type="pmid">29064784</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2005</year>). <article-title>A theory of cortical responses.</article-title> <source><italic>Philos. Trans. R. Soc. B Biol. Sci.</italic></source> <volume>360</volume> <fpage>815</fpage>&#x2013;<lpage>836</lpage>.</citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>The free-energy principle: a rough guide to the brain?</article-title> <source><italic>Trends Cogn. Sci.</italic></source> <volume>13</volume> <fpage>293</fpage>&#x2013;<lpage>301</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2009.04.005</pub-id> <pub-id pub-id-type="pmid">19559644</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2012</year>). <article-title>The history of the future of the Bayesian brain.</article-title> <source><italic>NeuroImage</italic></source> <volume>62</volume> <fpage>1230</fpage>&#x2013;<lpage>1233</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2011.10.004</pub-id> <pub-id pub-id-type="pmid">22023743</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name> <name><surname>FitzGerald</surname> <given-names>T.</given-names></name> <name><surname>Rigoli</surname> <given-names>F.</given-names></name> <name><surname>Schwartenbeck</surname> <given-names>P.</given-names></name> <name><surname>Pezzulo</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>Active inference: a process theory.</article-title> <source><italic>Neural Computation</italic></source> <volume>29</volume> <fpage>1</fpage>&#x2013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1162/NECO_a_00912</pub-id> <pub-id pub-id-type="pmid">27870614</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K. J.</given-names></name> <name><surname>Frith</surname> <given-names>C. D.</given-names></name></person-group> (<year>2015</year>). <article-title>Active inference, communication and hermeneutics.</article-title> <source><italic>Cortex</italic></source> <volume>68</volume> <fpage>129</fpage>&#x2013;<lpage>143</lpage>. <pub-id pub-id-type="doi">10.1016/j.cortex.2015.03.025</pub-id> <pub-id pub-id-type="pmid">25957007</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garrido</surname> <given-names>M. I.</given-names></name> <name><surname>Kilner</surname> <given-names>J. M.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2009</year>). <article-title>The mismatch negativity: a review of underlying mechanisms.</article-title> <source><italic>Clin. Neurophysiol.</italic></source> <volume>120</volume> <fpage>453</fpage>&#x2013;<lpage>463</lpage>. <pub-id pub-id-type="doi">10.1016/j.clinph.2008.11.029</pub-id> <pub-id pub-id-type="pmid">19181570</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garrido</surname> <given-names>M. I.</given-names></name> <name><surname>Teng</surname> <given-names>C. L. J.</given-names></name> <name><surname>Taylor</surname> <given-names>J. A.</given-names></name> <name><surname>Rowe</surname> <given-names>E. G.</given-names></name> <name><surname>Mattingley</surname> <given-names>J. B.</given-names></name></person-group> (<year>2016</year>). <article-title>Surprise responses in the human brain demonstrate statistical learning under high concurrent cognitive demand.</article-title> <source><italic>Npj Sci. Learn.</italic></source> <volume>1</volume> <fpage>1</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1038/npjscilearn.2016.6</pub-id> <pub-id pub-id-type="pmid">30792892</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giard</surname> <given-names>M. H.</given-names></name> <name><surname>Perrin</surname> <given-names>F.</given-names></name> <name><surname>Pernier</surname> <given-names>J.</given-names></name> <name><surname>Bouchet</surname> <given-names>P.</given-names></name></person-group> (<year>1990</year>). <article-title>Brain generators implicated in the processing of auditory stimulus deviance: a topographic event-related potential study.</article-title> <source><italic>Psychophysiology</italic></source> <volume>27</volume> <fpage>627</fpage>&#x2013;<lpage>640</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-8986.1990.tb03184.x</pub-id> <pub-id pub-id-type="pmid">2100348</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gijsen</surname> <given-names>S.</given-names></name> <name><surname>Grundei</surname> <given-names>M.</given-names></name> <name><surname>Lange</surname> <given-names>R. T.</given-names></name> <name><surname>Ostwald</surname> <given-names>D.</given-names></name> <name><surname>Blankenburg</surname> <given-names>F.</given-names></name></person-group> (<year>2021</year>). <article-title>Neural surprise in somatosensory Bayesian learning.</article-title> <source><italic>PLoS Computational Biol.</italic></source> <volume>17</volume>:<issue>e1008068</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1008068</pub-id> <pub-id pub-id-type="pmid">33529181</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gueorguieva</surname> <given-names>R.</given-names></name> <name><surname>Krystal</surname> <given-names>J. H.</given-names></name></person-group> (<year>2004</year>). <article-title>Move over ANOVA: progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry.</article-title> <source><italic>Archives General Psychiatry</italic></source> <volume>61</volume> <fpage>310</fpage>&#x2013;<lpage>317</lpage>. <pub-id pub-id-type="doi">10.1001/archpsyc.61.3.310</pub-id> <pub-id pub-id-type="pmid">14993119</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>H&#x00E4;m&#x00E4;l&#x00E4;inen</surname> <given-names>M.</given-names></name> <name><surname>Hari</surname> <given-names>R.</given-names></name> <name><surname>Ilmoniemi</surname> <given-names>R. J.</given-names></name> <name><surname>Knuutila</surname> <given-names>J.</given-names></name> <name><surname>Lounasmaa</surname> <given-names>O. V.</given-names></name></person-group> (<year>1993</year>). <article-title>Magnetoencephalography&#x2014;theory, instrumentation, and applications to noninvasive studies of the working human brain.</article-title> <source><italic>Rev. Modern Physics</italic></source> <volume>65</volume>:<issue>413</issue>.</citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>H&#x00E4;m&#x00E4;l&#x00E4;inen</surname> <given-names>M.</given-names></name> <name><surname>Hari</surname> <given-names>R.</given-names></name> <name><surname>Lounasmaa</surname> <given-names>O. V.</given-names></name> <name><surname>Williamson</surname> <given-names>S. J.</given-names></name></person-group> (<year>1995</year>). <article-title>Do auditory stimuli activate human parietal brain regions?</article-title> <source><italic>Neuro Rep.</italic></source> <volume>6</volume>:<issue>1712</issue>. <pub-id pub-id-type="doi">10.1097/00001756-199509000-00001</pub-id> <pub-id pub-id-type="pmid">8541465</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harrison</surname> <given-names>L.</given-names></name> <name><surname>Bestmann</surname> <given-names>S.</given-names></name> <name><surname>Rosa</surname> <given-names>M. J.</given-names></name> <name><surname>Penny</surname> <given-names>W.</given-names></name> <name><surname>Green</surname> <given-names>G. G.</given-names></name></person-group> (<year>2011</year>). <article-title>Time scales of representation in the human brain: Weighing past information to predict future events.</article-title> <source><italic>Front. Hum. Neurosci.</italic></source> <volume>5</volume>:<issue>37</issue>. <pub-id pub-id-type="doi">10.3389/fnhum.2011.00037</pub-id> <pub-id pub-id-type="pmid">21629858</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harrison</surname> <given-names>L. M.</given-names></name> <name><surname>Duggins</surname> <given-names>A.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2006</year>). <article-title>Encoding uncertainty in the hippocampus.</article-title> <source><italic>Neural Netw.</italic></source> <volume>19</volume> <fpage>535</fpage>&#x2013;<lpage>546</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2005.11.002</pub-id> <pub-id pub-id-type="pmid">16527453</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hartwig</surname> <given-names>M.</given-names></name> <name><surname>Peters</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>Cooperation and social rules emerging from the principle of surprise minimization.</article-title> <source><italic>Front. Psychol.</italic></source> <volume>11</volume>:<issue>3668</issue>. <pub-id pub-id-type="doi">10.3389/fpsyg.2020.606174</pub-id> <pub-id pub-id-type="pmid">33551917</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heath</surname> <given-names>C.</given-names></name> <name><surname>Bell</surname> <given-names>C.</given-names></name> <name><surname>Sternberg</surname> <given-names>E.</given-names></name></person-group> (<year>2001</year>). <article-title>Emotional selection in memes: the case of urban legends.</article-title> <source><italic>J. Person. Soc. Psychol.</italic></source> <volume>81</volume>:<issue>1028</issue>. <pub-id pub-id-type="doi">10.1037//0022-3514.81.6.1028</pub-id> <pub-id pub-id-type="pmid">11761305</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heilbron</surname> <given-names>M.</given-names></name> <name><surname>Chait</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Great expectations: is there evidence for predictive coding in auditory cortex?</article-title> <source><italic>Neuroscience</italic></source> <volume>389</volume> <fpage>54</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroscience.2017.07.061</pub-id> <pub-id pub-id-type="pmid">28782642</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Horstmann</surname> <given-names>G.</given-names></name></person-group> (<year>2002</year>). <article-title>Evidence for attentional capture by a surprising color singleton in visual search.</article-title> <source><italic>Psychol. Sci.</italic></source> <volume>13</volume> <fpage>499</fpage>&#x2013;<lpage>505</lpage>. <pub-id pub-id-type="doi">10.1111/1467-9280.00488</pub-id> <pub-id pub-id-type="pmid">12430832</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huettel</surname> <given-names>S. A.</given-names></name> <name><surname>Mack</surname> <given-names>P. B.</given-names></name> <name><surname>McCarthy</surname> <given-names>G.</given-names></name></person-group> (<year>2002</year>). <article-title>Perceiving patterns in random series: dynamic processing of sequence in prefrontal cortex.</article-title> <source><italic>Nat. Neurosci.</italic></source> <volume>5</volume> <fpage>485</fpage>&#x2013;<lpage>490</lpage>. <pub-id pub-id-type="doi">10.1038/nn841</pub-id> <pub-id pub-id-type="pmid">11941373</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hyv&#x00E4;rinen</surname> <given-names>A.</given-names></name></person-group> (<year>1999</year>). <article-title>Fast and robust fixed-point algorithms for independent component analysis.</article-title> <source><italic>IEEE Trans. Neural Netw.</italic></source> <volume>10</volume> <fpage>626</fpage>&#x2013;<lpage>634</lpage>. <pub-id pub-id-type="doi">10.1109/72.761722</pub-id> <pub-id pub-id-type="pmid">18252563</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ilmoniemi</surname> <given-names>R. J.</given-names></name> <name><surname>Sarvas</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <source><italic>Brain Signals: Physics and Mathematics of MEG and EEG.</italic></source> <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Itti</surname> <given-names>L.</given-names></name> <name><surname>Baldi</surname> <given-names>P.</given-names></name></person-group> (<year>2009</year>). <article-title>Bayesian surprise attracts human attention.</article-title> <source><italic>Vision Res.</italic></source> <volume>49</volume> <fpage>1295</fpage>&#x2013;<lpage>1306</lpage>. <pub-id pub-id-type="doi">10.1016/j.visres.2008.09.007</pub-id> <pub-id pub-id-type="pmid">18834898</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiebel</surname> <given-names>S. J.</given-names></name> <name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2008</year>). <article-title>A hierarchy of time-scales and the brain.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>4</volume>:<issue>e1000209</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1000209</pub-id> <pub-id pub-id-type="pmid">19008936</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Knill</surname> <given-names>D. C.</given-names></name> <name><surname>Pouget</surname> <given-names>A.</given-names></name></person-group> (<year>2004</year>). <article-title>The Bayesian brain: the role of uncertainty in neural coding and computation.</article-title> <source><italic>Trends Neurosci.</italic></source> <volume>27</volume> <fpage>712</fpage>&#x2013;<lpage>719</lpage>. <pub-id pub-id-type="doi">10.1016/j.tins.2004.10.007</pub-id> <pub-id pub-id-type="pmid">15541511</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kok</surname> <given-names>P.</given-names></name> <name><surname>de Lange</surname> <given-names>F. P.</given-names></name></person-group> (<role>eds</role>) (<year>2015</year>). <source><italic>Predictive Coding in Sensory Cortex. In an Introduction to Model-Based Cognitive Neuroscience.</italic></source> <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>, <fpage>221</fpage>&#x2013;<lpage>244</lpage>.</citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kolossa</surname> <given-names>A.</given-names></name> <name><surname>Fingscheidt</surname> <given-names>T.</given-names></name> <name><surname>Wessel</surname> <given-names>K.</given-names></name> <name><surname>Kopp</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>A model-based approach to trial-by-trial P300 amplitude fluctuations.</article-title> <source><italic>Front. Hum. Neurosci.</italic></source> <volume>6</volume>:<issue>359</issue>. <pub-id pub-id-type="doi">10.3389/fnhum.2012.00359</pub-id> <pub-id pub-id-type="pmid">23404628</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kolossa</surname> <given-names>A.</given-names></name> <name><surname>Kopp</surname> <given-names>B.</given-names></name> <name><surname>Fingscheidt</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). <article-title>A computational analysis of the neural bases of Bayesian inference.</article-title> <source><italic>Neuroimage</italic></source> <volume>106</volume> <fpage>222</fpage>&#x2013;<lpage>237</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2014.11.007</pub-id> <pub-id pub-id-type="pmid">25462794</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kullback</surname> <given-names>S.</given-names></name></person-group> (<year>1997</year>). <source><italic>Information Theory and Statistics.</italic></source> <publisher-loc>Massachusetts</publisher-loc>: <publisher-name>Courier Corporation</publisher-name>.</citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lieder</surname> <given-names>F.</given-names></name> <name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>Garrido</surname> <given-names>M. I.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name></person-group> (<year>2013</year>). <article-title>Modelling trial-by-trial changes in the mismatch negativity.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>9</volume>:<issue>e1002911</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1002911</pub-id> <pub-id pub-id-type="pmid">23436989</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Loewenstein</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>Surprise, recipes for surprise, and social influence.</article-title> <source><italic>Top. Cogn. Sci.</italic></source> <volume>11</volume> <fpage>178</fpage>&#x2013;<lpage>193</lpage>. <pub-id pub-id-type="doi">10.1111/tops.12312</pub-id> <pub-id pub-id-type="pmid">29411939</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Loewenstein</surname> <given-names>J.</given-names></name> <name><surname>Heath</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>The Repetition-Break plot structure: A cognitive influence on selection in the marketplace of ideas.</article-title> <source><italic>Cogn. Sci.</italic></source> <volume>33</volume> <fpage>1</fpage>&#x2013;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.1111/j.1551-6709.2008.01001.x</pub-id> <pub-id pub-id-type="pmid">21585461</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maheu</surname> <given-names>M.</given-names></name> <name><surname>Dehaene</surname> <given-names>S.</given-names></name> <name><surname>Meyniel</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>Brain signatures of a multiscale process of sequence learning in humans.</article-title> <source><italic>Elife</italic></source> <volume>8</volume>:<issue>e41541</issue>. <pub-id pub-id-type="doi">10.7554/eLife.41541</pub-id> <pub-id pub-id-type="pmid">30714904</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Malmivuo</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>Comparison of the properties of EEG and MEG in detecting the electric activity of the brain.</article-title> <source><italic>Brain Topography</italic></source> <volume>25</volume> <fpage>1</fpage>&#x2013;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.1007/s10548-011-0202-1</pub-id> <pub-id pub-id-type="pmid">21912974</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Malmivuo</surname> <given-names>J.</given-names></name> <name><surname>Plonsey</surname> <given-names>R.</given-names></name></person-group> (<year>1995</year>). <source><italic>Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields.</italic></source> <publisher-loc>Oxford, USA</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mars</surname> <given-names>R. B.</given-names></name> <name><surname>Debener</surname> <given-names>S.</given-names></name> <name><surname>Gladwin</surname> <given-names>T. E.</given-names></name> <name><surname>Harrison</surname> <given-names>L. M.</given-names></name> <name><surname>Haggard</surname> <given-names>P.</given-names></name> <name><surname>Rothwell</surname> <given-names>J. C.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>Trial-by-trial fluctuations in the event-related electroencephalogram reflect dynamic changes in the degree of surprise.</article-title> <source><italic>J. Neurosci.</italic></source> <volume>28</volume> <fpage>12539</fpage>&#x2013;<lpage>12545</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.2925-08.2008</pub-id> <pub-id pub-id-type="pmid">19020046</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mathys</surname> <given-names>C.</given-names></name> <name><surname>Daunizeau</surname> <given-names>J.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name> <name><surname>Stephan</surname> <given-names>K. E.</given-names></name></person-group> (<year>2011</year>). <article-title>A Bayesian foundation for individual learning under uncertainty.</article-title> <source><italic>Front. Hum. Neurosci.</italic></source> <volume>5</volume>:<issue>39</issue>. <pub-id pub-id-type="doi">10.3389/fnhum.2011.00039</pub-id> <pub-id pub-id-type="pmid">21629826</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meyniel</surname> <given-names>F.</given-names></name> <name><surname>Maheu</surname> <given-names>M.</given-names></name> <name><surname>Dehaene</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>Human inferences about sequences: A minimal transition probability model.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>12</volume>:<issue>e1005260</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005260</pub-id> <pub-id pub-id-type="pmid">28030543</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Modirshanechi</surname> <given-names>A.</given-names></name> <name><surname>Kiani</surname> <given-names>M. M.</given-names></name> <name><surname>Aghajan</surname> <given-names>H.</given-names></name></person-group> (<year>2019</year>). <article-title>Trial-by-trial surprise-decoding model for visual and auditory binary oddball tasks.</article-title> <source><italic>NeuroImage</italic></source> <volume>196</volume> <fpage>302</fpage>&#x2013;<lpage>317</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2019.04.028</pub-id> <pub-id pub-id-type="pmid">30980899</pub-id></citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Musiolek</surname> <given-names>L.</given-names></name> <name><surname>Blankenburg</surname> <given-names>F.</given-names></name> <name><surname>Ostwald</surname> <given-names>D.</given-names></name> <name><surname>Rabovsky</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Modeling the N400 brain potential as semantic Bayesian surprise</article-title>,&#x201D; in <source><italic>Proceedings of the 2019 Conference on Cognitive Computational Neuroscience</italic></source>, <publisher-loc>Berlin</publisher-loc>.</citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nassar</surname> <given-names>M. R.</given-names></name> <name><surname>Rumsey</surname> <given-names>K. M.</given-names></name> <name><surname>Wilson</surname> <given-names>R. C.</given-names></name> <name><surname>Parikh</surname> <given-names>K.</given-names></name> <name><surname>Heasly</surname> <given-names>B.</given-names></name> <name><surname>Gold</surname> <given-names>J. I.</given-names></name></person-group> (<year>2012</year>). <article-title>Rational regulation of learning dynamics by pupil-linked arousal systems.</article-title> <source><italic>Nat. Neurosci.</italic></source> <volume>15</volume>:<issue>1040</issue>. <pub-id pub-id-type="doi">10.1038/nn.3130</pub-id> <pub-id pub-id-type="pmid">22660479</pub-id></citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nassar</surname> <given-names>M. R.</given-names></name> <name><surname>Wilson</surname> <given-names>R. C.</given-names></name> <name><surname>Heasly</surname> <given-names>B.</given-names></name> <name><surname>Gold</surname> <given-names>J. I.</given-names></name></person-group> (<year>2010</year>). <article-title>An approximately Bayesian delta-rule model explains the dynamics of belief updating in a changing environment.</article-title> <source><italic>J. Neurosci.</italic></source> <volume>30</volume> <fpage>12366</fpage>&#x2013;<lpage>12378</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.0822-10.2010</pub-id> <pub-id pub-id-type="pmid">20844132</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nieuwenhuis</surname> <given-names>S.</given-names></name> <name><surname>Aston-Jones</surname> <given-names>G.</given-names></name> <name><surname>Cohen</surname> <given-names>J. D.</given-names></name></person-group> (<year>2005</year>). <article-title>Decision making, the P3, and the locus coeruleus&#x2013;norepinephrine system.</article-title> <source><italic>Psychol. Bull.</italic></source> <volume>131</volume>:<issue>510</issue>. <pub-id pub-id-type="doi">10.1037/0033-2909.131.4.510</pub-id> <pub-id pub-id-type="pmid">16060800</pub-id></citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oostenveld</surname> <given-names>R.</given-names></name> <name><surname>Fries</surname> <given-names>P.</given-names></name> <name><surname>Maris</surname> <given-names>E.</given-names></name> <name><surname>Schoffelen</surname> <given-names>J. M.</given-names></name></person-group> (<year>2011</year>). <article-title>FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data.</article-title> <source><italic>Comput. Intell. Neurosci.</italic></source> <volume>2011</volume>:<issue>156869</issue>. <pub-id pub-id-type="doi">10.1155/2011/156869</pub-id> <pub-id pub-id-type="pmid">21253357</pub-id></citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x2019;Reilly</surname> <given-names>J. X.</given-names></name> <name><surname>Sch&#x00FC;ffelgen</surname> <given-names>U.</given-names></name> <name><surname>Cuell</surname> <given-names>S. F.</given-names></name> <name><surname>Behrens</surname> <given-names>T. E.</given-names></name> <name><surname>Mars</surname> <given-names>R. B.</given-names></name> <name><surname>Rushworth</surname> <given-names>M. F.</given-names></name></person-group> (<year>2013</year>). <article-title>Dissociable effects of surprise and model update in parietal and anterior cingulate cortex.</article-title> <source><italic>Proc. Nat. Acad. Sci. U. S. A.</italic></source> <volume>110</volume> <fpage>E3660</fpage>&#x2013;<lpage>E3669</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1305373110</pub-id> <pub-id pub-id-type="pmid">23986499</pub-id></citation></ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ostwald</surname> <given-names>D.</given-names></name> <name><surname>Spitzer</surname> <given-names>B.</given-names></name> <name><surname>Guggenmos</surname> <given-names>M.</given-names></name> <name><surname>Schmidt</surname> <given-names>T. T.</given-names></name> <name><surname>Kiebel</surname> <given-names>S. J.</given-names></name> <name><surname>Blankenburg</surname> <given-names>F.</given-names></name></person-group> (<year>2012</year>). <article-title>Evidence for neural encoding of Bayesian surprise in human somatosensation.</article-title> <source><italic>NeuroImage</italic></source> <volume>62</volume> <fpage>177</fpage>&#x2013;<lpage>188</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2012.04.050</pub-id> <pub-id pub-id-type="pmid">22579866</pub-id></citation></ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patel</surname> <given-names>A. D.</given-names></name> <name><surname>Iversen</surname> <given-names>J. R.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Repp</surname> <given-names>B. H.</given-names></name></person-group> (<year>2005</year>). <article-title>The influence of metricality and modality on synchronization with a beat.</article-title> <source><italic>Exp. Brain Res.</italic></source> <volume>163</volume> <fpage>226</fpage>&#x2013;<lpage>238</lpage>. <pub-id pub-id-type="doi">10.1007/s00221-004-2159-8</pub-id> <pub-id pub-id-type="pmid">15654589</pub-id></citation></ref>
<ref id="B69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pereira</surname> <given-names>F.</given-names></name> <name><surname>Mitchell</surname> <given-names>T.</given-names></name> <name><surname>Botvinick</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>Machine learning classifiers and fMRI: a tutorial overview.</article-title> <source><italic>NeuroImage</italic></source> <volume>45</volume> <fpage>S199</fpage>&#x2013;<lpage>S209</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2008.11.007</pub-id> <pub-id pub-id-type="pmid">19070668</pub-id></citation></ref>
<ref id="B70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peters</surname> <given-names>A.</given-names></name> <name><surname>McEwen</surname> <given-names>B. S.</given-names></name> <name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2017</year>). <article-title>Uncertainty and stress: why it causes diseases and how it is mastered by the brain.</article-title> <source><italic>Prog. Neurobiol.</italic></source> <volume>156</volume> <fpage>164</fpage>&#x2013;<lpage>188</lpage>. <pub-id pub-id-type="doi">10.1016/j.pneurobio.2017.05.004</pub-id> <pub-id pub-id-type="pmid">28576664</pub-id></citation></ref>
<ref id="B71"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Petty</surname> <given-names>R. E.</given-names></name> <name><surname>Cacioppo</surname> <given-names>J. T.</given-names></name></person-group> (<role>eds</role>) (<year>1986</year>). &#x201C;<article-title>The elaboration likelihood model of persuasion</article-title>,&#x201D; in <source><italic>Communication and Persuasion</italic></source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>24</lpage>.</citation></ref>
<ref id="B72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rao</surname> <given-names>R. P.</given-names></name> <name><surname>Ballard</surname> <given-names>D. H.</given-names></name></person-group> (<year>1999</year>). <article-title>Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects.</article-title> <source><italic>Nat. Neurosci.</italic></source> <volume>2</volume> <fpage>79</fpage>&#x2013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1038/4580</pub-id> <pub-id pub-id-type="pmid">10195184</pub-id></citation></ref>
<ref id="B73"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roesch</surname> <given-names>M. R.</given-names></name> <name><surname>Esber</surname> <given-names>G. R.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Daw</surname> <given-names>N. D.</given-names></name> <name><surname>Schoenbaum</surname> <given-names>G.</given-names></name></person-group> (<year>2012</year>). <article-title>Surprise! Neural correlates of Pearce&#x2013;Hall and Rescorla&#x2013;Wagner coexist within the brain.</article-title> <source><italic>Eur. J. Neurosci.</italic></source> <volume>35</volume> <fpage>1190</fpage>&#x2013;<lpage>1200</lpage>. <pub-id pub-id-type="doi">10.1111/j.1460-9568.2011.07986.x</pub-id> <pub-id pub-id-type="pmid">22487047</pub-id></citation></ref>
<ref id="B74"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rouder</surname> <given-names>J. N.</given-names></name> <name><surname>Speckman</surname> <given-names>P. L.</given-names></name> <name><surname>Sun</surname> <given-names>D.</given-names></name> <name><surname>Morey</surname> <given-names>R. D.</given-names></name> <name><surname>Iverson</surname> <given-names>G.</given-names></name></person-group> (<year>2009</year>). <article-title>Bayesian t tests for accepting and rejecting the null hypothesis.</article-title> <source><italic>Psychon. Bull. Rev.</italic></source> <volume>16</volume> <fpage>225</fpage>&#x2013;<lpage>237</lpage>. <pub-id pub-id-type="doi">10.3758/PBR.16.2.225</pub-id> <pub-id pub-id-type="pmid">19293088</pub-id></citation></ref>
<ref id="B75"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rubin</surname> <given-names>J.</given-names></name> <name><surname>Ulanovsky</surname> <given-names>N.</given-names></name> <name><surname>Nelken</surname> <given-names>I.</given-names></name> <name><surname>Tishby</surname> <given-names>N.</given-names></name></person-group> (<year>2016</year>). <article-title>The representation of prediction error in auditory cortex.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>12</volume>:<issue>e1005058</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005058</pub-id> <pub-id pub-id-type="pmid">27490251</pub-id></citation></ref>
<ref id="B76"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Russell</surname> <given-names>J. A.</given-names></name> <name><surname>Barrett</surname> <given-names>L. F.</given-names></name></person-group> (<year>1999</year>). <article-title>Core affect, prototypical emotional episodes, and other things called emotion: dissecting the elephant.</article-title> <source><italic>J. Pers. Soc. Psychol.</italic></source> <volume>76</volume>:<issue>805</issue>. <pub-id pub-id-type="doi">10.1037//0022-3514.76.5.805</pub-id> <pub-id pub-id-type="pmid">10353204</pub-id></citation></ref>
<ref id="B77"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schmidhuber</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>Formal theory of creativity, fun, and intrinsic motivation (1990&#x2013;2010).</article-title> <source><italic>IEEE Trans. Auton. Ment. Dev.</italic></source> <volume>2</volume> <fpage>230</fpage>&#x2013;<lpage>247</lpage>.</citation></ref>
<ref id="B78"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schomer</surname> <given-names>D. L.</given-names></name> <name><surname>Da Silva</surname> <given-names>F. L.</given-names></name></person-group> (<year>2012</year>). <source><italic>Niedermeyer&#x2019;s Electroencephalography: Basic Principles, Clinical Applications, and Related Fields.</italic></source> <publisher-loc>Philadelphia</publisher-loc>: <publisher-name>Lippincott Williams &#x0026; Wilkins</publisher-name>.</citation></ref>
<ref id="B79"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sch&#x00FC;tzwohl</surname> <given-names>A.</given-names></name></person-group> (<year>1998</year>). <article-title>Surprise and schema strength.</article-title> <source><italic>J. Exp. Psychol. Learn. Mem. Cogn.</italic></source> <volume>24</volume>:<issue>1182</issue>. <pub-id pub-id-type="doi">10.1037//0278-7393.24.5.1182</pub-id> <pub-id pub-id-type="pmid">9747529</pub-id></citation></ref>
<ref id="B80"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwartenbeck</surname> <given-names>P.</given-names></name> <name><surname>FitzGerald</surname> <given-names>T. H.</given-names></name> <name><surname>Dolan</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>Neural signals encoding shifts in beliefs.</article-title> <source><italic>NeuroImage</italic></source> <volume>125</volume> <fpage>578</fpage>&#x2013;<lpage>586</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2015.10.067</pub-id> <pub-id pub-id-type="pmid">26520774</pub-id></citation></ref>
<ref id="B81"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwartenbeck</surname> <given-names>P.</given-names></name> <name><surname>FitzGerald</surname> <given-names>T. H.</given-names></name> <name><surname>Mathys</surname> <given-names>C.</given-names></name> <name><surname>Dolan</surname> <given-names>R.</given-names></name> <name><surname>Kronbichler</surname> <given-names>M.</given-names></name> <name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2015</year>). <article-title>Evidence for surprise minimization over value maximization in choice behavior.</article-title> <source><italic>Sci. Rep.</italic></source> <volume>5</volume>:<issue>16575</issue>. <pub-id pub-id-type="doi">10.1038/srep16575</pub-id> <pub-id pub-id-type="pmid">26564686</pub-id></citation></ref>
<ref id="B82"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seer</surname> <given-names>C.</given-names></name> <name><surname>Lange</surname> <given-names>F.</given-names></name> <name><surname>Boos</surname> <given-names>M.</given-names></name> <name><surname>Dengler</surname> <given-names>R.</given-names></name> <name><surname>Kopp</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title>Prior probabilities modulate cortical surprise responses: a study of event-related potentials.</article-title> <source><italic>Brain Cogn.</italic></source> <volume>106</volume> <fpage>78</fpage>&#x2013;<lpage>89</lpage>. <pub-id pub-id-type="doi">10.1016/j.bandc.2016.04.011</pub-id> <pub-id pub-id-type="pmid">27266394</pub-id></citation></ref>
<ref id="B83"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shannon</surname> <given-names>C. E.</given-names></name></person-group> (<year>1948</year>). <article-title>A mathematical theory of communication.</article-title> <source><italic>Bell Syst. Tech. J.</italic></source> <volume>27</volume> <fpage>379</fpage>&#x2013;<lpage>423</lpage>.</citation></ref>
<ref id="B84"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Squires</surname> <given-names>K. C.</given-names></name> <name><surname>Wickens</surname> <given-names>C.</given-names></name> <name><surname>Squires</surname> <given-names>N. K.</given-names></name> <name><surname>Donchin</surname> <given-names>E.</given-names></name></person-group> (<year>1976</year>). <article-title>The effect of stimulus sequence on the waveform of the cortical event-related potential.</article-title> <source><italic>Science</italic></source> <volume>193</volume> <fpage>1142</fpage>&#x2013;<lpage>1146</lpage>. <pub-id pub-id-type="doi">10.1126/science.959831</pub-id> <pub-id pub-id-type="pmid">959831</pub-id></citation></ref>
<ref id="B85"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strange</surname> <given-names>B. A.</given-names></name> <name><surname>Duggins</surname> <given-names>A.</given-names></name> <name><surname>Penny</surname> <given-names>W.</given-names></name> <name><surname>Dolan</surname> <given-names>R. J.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2005</year>). <article-title>Information theory, novelty and hippocampal responses: unpredicted or unpredictable?</article-title> <source><italic>Neural Networks</italic></source> <volume>18</volume> <fpage>225</fpage>&#x2013;<lpage>230</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2004.12.004</pub-id> <pub-id pub-id-type="pmid">15896570</pub-id></citation></ref>
<ref id="B86"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strauss</surname> <given-names>M.</given-names></name> <name><surname>Sitt</surname> <given-names>J. D.</given-names></name> <name><surname>King</surname> <given-names>J. R.</given-names></name> <name><surname>Elbaz</surname> <given-names>M.</given-names></name> <name><surname>Azizi</surname> <given-names>L.</given-names></name> <name><surname>Buiatti</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Disruption of hierarchical predictive coding during sleep.</article-title> <source><italic>Proc. Nat. Acad. Sci. U. S. A.</italic></source> <volume>112</volume> <fpage>E1353</fpage>&#x2013;<lpage>E1362</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1501026112</pub-id> <pub-id pub-id-type="pmid">25737555</pub-id></citation></ref>
<ref id="B87"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sur</surname> <given-names>S.</given-names></name> <name><surname>Sinha</surname> <given-names>V. K.</given-names></name></person-group> (<year>2009</year>). <article-title>Event-related potential: An overview.</article-title> <source><italic>Ind. Psychiatry J.</italic></source> <volume>18</volume>:<issue>70</issue>. <pub-id pub-id-type="doi">10.4103/0972-6748.57865</pub-id> <pub-id pub-id-type="pmid">21234168</pub-id></citation></ref>
<ref id="B88"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Todorovic</surname> <given-names>A.</given-names></name> <name><surname>de Lange</surname> <given-names>F. P.</given-names></name></person-group> (<year>2012</year>). <article-title>Repetition suppression and expectation suppression are dissociable in time in early auditory evoked fields.</article-title> <source><italic>J. Neurosci.</italic></source> <volume>32</volume> <fpage>13389</fpage>&#x2013;<lpage>13395</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.2227-12.2012</pub-id> <pub-id pub-id-type="pmid">23015429</pub-id></citation></ref>
<ref id="B89"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Todorovic</surname> <given-names>A.</given-names></name> <name><surname>van Ede</surname> <given-names>F.</given-names></name> <name><surname>Maris</surname> <given-names>E.</given-names></name> <name><surname>de Lange</surname> <given-names>F. P.</given-names></name></person-group> (<year>2011</year>). <article-title>Prior expectation mediates neural adaptation to repeated sounds in the auditory cortex: an MEG study.</article-title> <source><italic>J. Neurosci.</italic></source> <volume>31</volume> <fpage>9118</fpage>&#x2013;<lpage>9123</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.1425-11.2011</pub-id> <pub-id pub-id-type="pmid">21697363</pub-id></citation></ref>
<ref id="B90"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Turnip</surname> <given-names>A.</given-names></name> <name><surname>Hong</surname> <given-names>K. S.</given-names></name> <name><surname>Jeong</surname> <given-names>M. Y.</given-names></name></person-group> (<year>2011</year>). <article-title>Real-time feature extraction of P300 component using adaptive nonlinear principal component analysis.</article-title> <source><italic>Biomed. Eng. Online</italic></source> <volume>10</volume>:<issue>83</issue>. <pub-id pub-id-type="doi">10.1186/1475-925X-10-83</pub-id> <pub-id pub-id-type="pmid">21939560</pub-id></citation></ref>
<ref id="B91"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Visalli</surname> <given-names>A.</given-names></name> <name><surname>Capizzi</surname> <given-names>M.</given-names></name> <name><surname>Ambrosini</surname> <given-names>E.</given-names></name> <name><surname>Mazzonetto</surname> <given-names>I.</given-names></name> <name><surname>Vallesi</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Bayesian modeling of temporal expectations in the human brain.</article-title> <source><italic>NeuroImage</italic></source> <volume>202</volume>:<issue>116097</issue>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2019.116097</pub-id> <pub-id pub-id-type="pmid">31415885</pub-id></citation></ref>
<ref id="B92"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wacongne</surname> <given-names>C.</given-names></name> <name><surname>Labyt</surname> <given-names>E.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name> <name><surname>Bekinschtein</surname> <given-names>T.</given-names></name> <name><surname>Naccache</surname> <given-names>L.</given-names></name> <name><surname>Dehaene</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Evidence for a hierarchy of predictions and prediction errors in human cortex.</article-title> <source><italic>Proc. Nat. Acad. Sci. U. S. A.</italic></source> <volume>108</volume> <fpage>20754</fpage>&#x2013;<lpage>20759</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1117807108</pub-id> <pub-id pub-id-type="pmid">22147913</pub-id></citation></ref>
<ref id="B93"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zevin</surname> <given-names>J.</given-names></name></person-group> (<year>2009</year>). &#x201C;<article-title>Word recognition</article-title>,&#x201D; in <source><italic>Encyclopedia of Neuroscience</italic></source>, ed. <person-group person-group-type="editor"><name><surname>Squire</surname> <given-names>L. R.</given-names></name></person-group> (<publisher-loc>Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>517</fpage>&#x2013;<lpage>522</lpage>.</citation></ref>
</ref-list>
</back>
</article>