<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3-mathml3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="methods-article" dtd-version="1.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title-group>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2026.1643463</article-id>
<article-version article-version-type="Version of Record" vocab="NISO-RP-8-2008"/>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Hierarchical Bayesian Regression for experimental psychology: a case study of cognitive control</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Dudey</surname>
<given-names>Thomas A.</given-names>
</name>
<xref ref-type="aff" rid="aff1"/>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/3076696"/>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Formal analysis" vocab-term-identifier="https://credit.niso.org/contributor-roles/formal-analysis/">Formal analysis</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Visualization" vocab-term-identifier="https://credit.niso.org/contributor-roles/visualization/">Visualization</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; original draft" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-original-draft/">Writing &#x2013; original draft</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Investigation" vocab-term-identifier="https://credit.niso.org/contributor-roles/investigation/">Investigation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Methodology" vocab-term-identifier="https://credit.niso.org/contributor-roles/methodology/">Methodology</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Validation" vocab-term-identifier="https://credit.niso.org/contributor-roles/validation/">Validation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; review &#x0026; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-review-editing/">Writing &#x2013; review &#x0026; editing</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Software" vocab-term-identifier="https://credit.niso.org/contributor-roles/software/">Software</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Conceptualization" vocab-term-identifier="https://credit.niso.org/contributor-roles/conceptualization/">Conceptualization</role>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jackson</surname>
<given-names>Joshua J.</given-names>
</name>
<xref ref-type="aff" rid="aff1"/>
<uri xlink:href="https://loop.frontiersin.org/people/92877"/>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Validation" vocab-term-identifier="https://credit.niso.org/contributor-roles/validation/">Validation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Supervision" vocab-term-identifier="https://credit.niso.org/contributor-roles/supervision/">Supervision</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Formal analysis" vocab-term-identifier="https://credit.niso.org/contributor-roles/formal-analysis/">Formal analysis</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; review &#x0026; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-review-editing/">Writing &#x2013; review &#x0026; editing</role>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cooper</surname>
<given-names>Shelly R.</given-names>
</name>
<xref ref-type="aff" rid="aff1"/>
<uri xlink:href="https://loop.frontiersin.org/people/455500"/>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; review &#x0026; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-review-editing/">Writing &#x2013; review &#x0026; editing</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Supervision" vocab-term-identifier="https://credit.niso.org/contributor-roles/supervision/">Supervision</role>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Braver</surname>
<given-names>Todd S.</given-names>
</name>
<xref ref-type="aff" rid="aff1"/>
<uri xlink:href="https://loop.frontiersin.org/people/478"/>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Supervision" vocab-term-identifier="https://credit.niso.org/contributor-roles/supervision/">Supervision</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Investigation" vocab-term-identifier="https://credit.niso.org/contributor-roles/investigation/">Investigation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Conceptualization" vocab-term-identifier="https://credit.niso.org/contributor-roles/conceptualization/">Conceptualization</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Funding acquisition" vocab-term-identifier="https://credit.niso.org/contributor-roles/funding-acquisition/">Funding acquisition</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Resources" vocab-term-identifier="https://credit.niso.org/contributor-roles/resources/">Resources</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Project administration" vocab-term-identifier="https://credit.niso.org/contributor-roles/project-administration/">Project administration</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; review &#x0026; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-review-editing/">Writing &#x2013; review &#x0026; editing</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Data curation" vocab-term-identifier="https://credit.niso.org/contributor-roles/data-curation/">Data curation</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; original draft" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-original-draft/">Writing &#x2013; original draft</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Methodology" vocab-term-identifier="https://credit.niso.org/contributor-roles/methodology/">Methodology</role>
</contrib>
</contrib-group>
<aff id="aff1"><institution>Department of Psychological and Brain Sciences, Washington University in St. Louis</institution>, <city>St. Louis</city>, <state>MO</state>, <country country="US">United States</country></aff>
<author-notes>
<corresp id="c001"><label>&#x002A;</label>Correspondence: Thomas A. Dudey, <email xlink:href="mailto:dudey@wustl.edu">dudey@wustl.edu</email></corresp>
</author-notes>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2026-03-19">
<day>19</day>
<month>03</month>
<year>2026</year>
</pub-date>
<pub-date publication-format="electronic" date-type="collection">
<year>2026</year>
</pub-date>
<volume>17</volume>
<elocation-id>1643463</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>06</month>
<year>2025</year>
</date>
<date date-type="rev-recd">
<day>02</day>
<month>03</month>
<year>2026</year>
</date>
<date date-type="accepted">
<day>04</day>
<month>03</month>
<year>2026</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2026 Dudey, Jackson, Cooper and Braver.</copyright-statement>
<copyright-year>2026</copyright-year>
<copyright-holder>Dudey, Jackson, Cooper and Braver</copyright-holder>
<license>
<ali:license_ref start_date="2026-03-19">https://creativecommons.org/licenses/by/4.0/</ali:license_ref>
<license-p>This is an open-access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License (CC BY)</ext-link>. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</license-p>
</license>
</permissions>
<abstract>
<p>In the wake of the so-called &#x2018;replication crisis&#x2019; in the experimental psychology literature, there have been growing calls to reassess whether specific analytic practices might enhance the accuracy and precision of reported findings. This issue is explored here through a case study examination of two previously collected datasets from the Dual Mechanisms of Cognitive Control (DMCC) task battery. This case study highlights the unique advantages afforded by Hierarchical Bayesian Regression (HBR) models as a potentially more rigorous analytic approach to statistical inference. For the DMCC datasets, two sets of HBR models are presented, with the estimates of the former used as priors for the latter. In addition to systematically generating cumulative posterior distributions for all effects of theoretical interest, we further illustrate how our particular application of HBR models provides novel insights regarding specific indicators of proactive/reactive control in each of the four DMCC tasks, by: (1) estimating the consistency of effects across datasets; (2) estimating the relative strength of null effects; (3) accurately modeling the specific properties of response time distributions; and (4) appropriately modeling accuracy patterns at the trial level.</p>
</abstract>
<kwd-group>
<kwd>cognitive control</kwd>
<kwd>dual mechanisms of control (DMC)</kwd>
<kwd>Hierarchical Bayesian Regression</kwd>
<kwd>hierarchical models</kwd>
<kwd>proactive control</kwd>
<kwd>reactive control</kwd>
<kwd>sequential updating</kwd>
</kwd-group>
<funding-group>
<funding-statement>The author(s) declared that financial support was received for this work and/or its publication. This research was supported by National Institutes of Health grants R37 MH066078 and T32 NS115672, Office of Naval Research grant MURI N00014-22-S-F0, and the McDonnell Center for Systems Neuroscience.</funding-statement>
</funding-group>
<counts>
<fig-count count="5"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="67"/>
<page-count count="18"/>
<word-count count="17535"/>
</counts>
<custom-meta-group>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Cognition</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="sec1">
<title>Introduction</title>
<p>Over the past decade, the field of experimental psychology has grappled with the discovery that a startling proportion of highly influential findings do not hold up under increased scrutiny. One large-scale project found that, among 100 published psychology experiments with mostly significant results, more than half no longer met the conventional significance criterion (<italic>p</italic>&#x202F;&#x003C;&#x202F;0.05) and showed diminished effect sizes after undergoing well-powered replications (<xref ref-type="bibr" rid="ref40">Open Science Collaboration, 2015</xref>). This &#x2018;replication crisis&#x2019; may seem like a significant step backwards in our current knowledge of psychological phenomena. Alternatively, however, it can be viewed as an opportunity to reassess the analytic practices that best ensure the validity and robustness of a given finding. It is becoming increasingly appreciated that even though <italic>t</italic>-tests and ANOVAs are among the most frequently used approaches in the psychological literature, due to the relative ease with which they can be computed, they are ultimately limited in terms of the statistical inferences they enable (<xref ref-type="bibr" rid="ref9001">Lindley, 1957</xref>; <xref ref-type="bibr" rid="ref61">Wacholder et al., 2004</xref>; <xref ref-type="bibr" rid="ref23">Ioannidis, 2005</xref>; <xref ref-type="bibr" rid="ref13">Erceg-Hurn and Mirosevich, 2008</xref>; <xref ref-type="bibr" rid="ref66">Zhou and Skidmore, 2017</xref>).</p>
<p>Two sets of critiques have been leveled against the prevalent adoption of such conventional statistical analysis methods. First, their validity is contingent upon a set of underlying assumptions regarding the observed data (e.g., normality and independence). As a result, the systematic violation of these assumptions will contribute to the inflation of false positive and negative rates within the field (<xref ref-type="bibr" rid="ref13">Erceg-Hurn and Mirosevich, 2008</xref>; <xref ref-type="bibr" rid="ref66">Zhou and Skidmore, 2017</xref>). In response to this issue, hierarchical models<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> assuming non-Gaussian distributions (i.e., generalized hierarchical models) have arisen as an alternative analytic tool that can more accurately and fully model the properties of the data (<xref ref-type="bibr" rid="ref51">Singmann and Kellen, 2019</xref>; <xref ref-type="bibr" rid="ref36">Meteyard and Davies, 2020</xref>; <xref ref-type="bibr" rid="ref6">Brown, 2021</xref>).</p>
<p>A second and more general concern regarding both conventional and more contemporary statistical approaches is that they are situated within the frequentist framework, which has its own set of inherent limitations (<xref ref-type="bibr" rid="ref39">O&#x2019;Hagan, 2008</xref>). Besides relying on isolated samples to make inferences about the entire population, they also do not enable estimation of the relative probability that the alternative or null hypothesis is true (see <xref ref-type="bibr" rid="ref18">Greenland et al., 2016</xref>; <xref ref-type="bibr" rid="ref64">Wagenmakers et al., 2018</xref>; <xref ref-type="bibr" rid="ref46">Rouder et al., 2018</xref>, for extensive discussions about the limitations of NHST). Due to such problems, there has been another emerging movement within the field to transition toward the more intuitive and flexible Bayesian framework, which not only allows for the incorporation of prior information outside of the available data to generate more precise findings, but also enables direct probabilistic estimates regarding specific parameters and hypotheses of interest.</p>
<p>The benefits afforded by generalized hierarchical models and the Bayesian framework are not mutually exclusive. Indeed, the broad goal of this paper is to describe how the intersection of these methods through Hierarchical Bayesian Regression (HBR) modeling can provide a comprehensive and rigorous analytic approach that addresses the limitations of previous methods (<xref ref-type="bibr" rid="ref47">Rouder and Lu, 2005</xref>; <xref ref-type="bibr" rid="ref38">Nalborczyk et al., 2019</xref>; <xref ref-type="bibr" rid="ref57">Veenman et al., 2024</xref>). Although HBR is more computationally intensive than standard approaches, due to the increased complexity of its inferential algorithms, the goal here is to illustrate its manifold advantages through concrete examples. In particular, we provide an illustrative case study utilizing two datasets (N&#x202F;&#x003E;&#x202F;100) that were previously acquired to validate the online task battery for the Dual Mechanisms of Cognitive Control (DMCC) project described below.</p>
<p>The Dual Mechanisms of Control (DMC) framework provides a neurobiologically based, mechanistic account of cognitive control that postulates two qualitatively distinct modes&#x2014;proactive and reactive. Proactive control is characterized as a sustained and anticipatory mode of cognitive control that actively maintains goal-related information, while reactive control reflects a more transient mode that retrieves goal-related information in response to conflict (<xref ref-type="bibr" rid="ref4">Braver, 2012</xref>). In addition to individual- and group-level differences influencing the tendency or ability to adopt one control mode versus the other, careful experimental manipulations have revealed contextual or situational factors that can elicit intra-individual modulations in how the control modes are deployed (<xref ref-type="bibr" rid="ref4">Braver, 2012</xref>; <xref ref-type="bibr" rid="ref16">Gonthier et al., 2016a</xref>, <xref ref-type="bibr" rid="ref17">2016b</xref>). Based on these findings, the Dual Mechanisms of Cognitive Control (DMCC) project was initiated by our group to develop and validate paradigms that would reliably produce within-subject shifts in cognitive control settings across different conditions (<xref ref-type="bibr" rid="ref5">Braver et al., 2021</xref>).</p>
<p>The DMCC project involved the development and validation of a new cognitive control task battery, including baseline, proactive, and reactive variants for each of four different experimental paradigms, reflecting distinct cognitive domains: the AX-CPT for context processing, the Sternberg task for working memory, the Stroop task for selective attention, and the Cued-TS paradigm for multi-tasking. While the baseline versions of each task were constructed to maximize variability in the cognitive control mode utilized, the proactive and reactive versions were manipulated to bias participants toward each specific mode of control. One aim of the DMCC project was to demonstrate the task battery&#x2019;s replicability and generalizability, at least in terms of behavioral metrics. As such, each task and its variants were assessed via multiple behavioral indicators of proactive or reactive control that were compared across conditions to identify general indices of cognitive control (i.e., proactive and reactive versus baseline conditions), as well as double dissociations in control modes (i.e., proactive versus reactive conditions).</p>
<p><xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref> provided the first systematic validation of the online DMCC task battery, with an initial dataset collected in 2018. However, the analysis approach utilized conventional NHST methods, with paired <italic>t</italic>-tests used to identify within- and between-condition effects. Since that time, an additional sample was collected (in 2020), with participants undergoing a nearly identical protocol, to test the generalizability of the findings. Here, we examine both datasets in relation to each other, with one goal being to replicate and extend the results of <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, using an alternative HBR framework for analyses<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>. Yet a more central goal was to highlight the various advantageous features of the HBR framework, as detailed further below.</p>
<p>A primary advantage of HBR models is their use of the Bayesian framework for statistical inference. In parameter estimation, both frequentist and Bayesian regression models can compute point estimates with surrounding intervals to determine the influence of predictors upon an outcome variable. In Bayesian models, however, these estimates are directly constructed as probability distributions, through Bayes&#x2019; theorem. By utilizing both initial prior beliefs about parameter estimates, <inline-formula>
<mml:math id="M1">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">(</mml:mo>
<mml:mi>&#x03B8;</mml:mi>
<mml:mo stretchy="true">)</mml:mo>
</mml:math>
</inline-formula>, and the likelihood of the data, <inline-formula>
<mml:math id="M2">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">(</mml:mo>
<mml:mtext mathvariant="italic">data</mml:mtext>
<mml:mspace width="0.25em"/>
<mml:mo>&#x2223;</mml:mo>
<mml:mspace width="0.25em"/>
<mml:mi>&#x03B8;</mml:mi>
<mml:mo stretchy="true">)</mml:mo>
<mml:mo>,</mml:mo>
</mml:math>
</inline-formula> it is possible to compute posterior distributions, <inline-formula>
<mml:math id="M3">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">(</mml:mo>
<mml:mi>&#x03B8;</mml:mi>
<mml:mspace width="0.33em"/>
<mml:mo>&#x2223;</mml:mo>
<mml:mspace width="0.33em"/>
<mml:mtext mathvariant="italic">data</mml:mtext>
<mml:mo stretchy="true">)</mml:mo>
</mml:math>
</inline-formula>. These derived posteriors inherently offer a richer understanding of each estimate, providing the relative certainty of its possible values rather than a single most likely value and its standard error. Critically, these distributions can be continually updated with new information, through a sequential updating procedure (i.e., previous posteriors serving as priors for the analysis of subsequent datasets), which ultimately reduces uncertainty regarding possible values and enables more precise and valid predictions about future observations (<xref ref-type="bibr" rid="ref63">Wagenmakers et al., 2016</xref>).</p>
<p>In the analyses of the DMCC datasets, we utilized this sequential updating procedure to inform our understanding regarding the degree of replicability and consistency in key effects of interest. In particular, two types of &#x201C;objective priors&#x201D; were employed in a complementary manner across the two datasets (see <xref ref-type="bibr" rid="ref3">Berger et al., 2015</xref>; <xref ref-type="bibr" rid="ref15">Goldstein, 2006</xref>; <xref ref-type="bibr" rid="ref11">Efron, 2012</xref>; <xref ref-type="bibr" rid="ref55">Torsen, 2015</xref>, for debates regarding the usage of &#x201C;objective&#x201D; versus &#x201C;subjective&#x201D; priors). The first set of priors was noninformative/weak, which minimizes the degree to which the prior influences the posterior distribution. The use of noninformative/weak priors causes the computed parameters to closely match the central values estimated by data-driven approaches such as frequentist maximum likelihood estimation (MLE) (<xref ref-type="bibr" rid="ref10">Dunson, 2001</xref>; <xref ref-type="bibr" rid="ref39">O&#x2019;Hagan, 2008</xref>). Consequently, we used noninformative/weak priors as a conservative choice for our re-analysis of the 2018 DMCC dataset, to better compare with the findings reported by <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>. However, researchers often <italic>do</italic> have some degree of information that should be used to shape predictions regarding the most likely estimates of particular phenomena (<xref ref-type="bibr" rid="ref30">Lindley, 2004</xref>; <xref ref-type="bibr" rid="ref64">Wagenmakers et al., 2018</xref>). Indeed, this was the case for the current study, as we were able to use the posterior estimates derived from the 2018 DMCC dataset as data-based priors in analyses of the 2020 dataset. This approach optimizes the available data to generate posterior distributions with relatively narrow credible intervals; such distributions represent the accumulated state of current knowledge regarding behavioral indicators of proactive and reactive control.</p>
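The sequential updating logic can be made concrete with a minimal sketch. The Python snippet below (illustrative only, with synthetic numbers and a simple conjugate normal-normal model of known observation variance, not the HBR models or data used in this study) updates a weakly informative prior against a first dataset, then reuses that posterior as the prior for a second dataset; the posterior standard deviation shrinks at each stage.

```python
import math

def update_normal(prior_mean, prior_sd, data, sigma=1.0):
    """Conjugate normal-normal update with known observation SD `sigma`."""
    n = len(data)
    xbar = sum(data) / n
    prior_prec = 1.0 / prior_sd**2          # precision = 1 / variance
    data_prec = n / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * xbar) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Stage 1: weakly informative prior, analyzed against a first (synthetic) dataset
data_2018 = [0.9, 1.1, 1.3, 0.8, 1.0, 1.2]
m1, s1 = update_normal(prior_mean=0.0, prior_sd=10.0, data=data_2018)

# Stage 2: the stage-1 posterior serves as the prior for the second dataset
data_2020 = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0]
m2, s2 = update_normal(prior_mean=m1, prior_sd=s1, data=data_2020)

print(round(m1, 3), round(s1, 3))
print(round(m2, 3), round(s2, 3))  # narrower: uncertainty shrinks as data accumulate
```

In the actual analyses this updating is carried out over full posterior distributions estimated by MCMC rather than closed-form conjugate summaries, but the principle is the same: the stage-2 credible interval is narrower because it pools information from both samples.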
<p>A related advantage of the Bayesian framework is that it enables a more diverse set of hypothesis tests to be conducted. With the replication crisis as an underlying problem shaping the goals of this project, one important hypothesis of interest is how well new results align with previous findings on the same phenomenon. Indeed, one of the main reasons behind our choice to implement a sequential updating procedure was to simultaneously generate posterior estimates and provide a more nuanced test of replication across datasets, in terms of the relationship between prior and posterior distributions. In particular, the Savage-Dickey Ratio (SDR) metric approximates a Bayes Factor (BF) testing the strength of a specific alternative hypothesis, by computing the ratio of the posterior and prior distributions at a selected point value. For our purposes, this value was chosen at the mean of the prior, as a form of meta-analysis (<xref ref-type="bibr" rid="ref62">Wagenmakers et al., 2010</xref>; <xref ref-type="bibr" rid="ref58">Verhagen and Wagenmakers, 2014</xref>; <xref ref-type="bibr" rid="ref33">Ly et al., 2019</xref>; <xref ref-type="bibr" rid="ref29">Lin et al., 2024</xref>). With this approach, an SDR value greater than one indicates an increased probability of the original estimate with the addition of new information (i.e., there is inter-sample reliability in the magnitude and direction of the given effect), whereas an SDR value less than one suggests that the additional data caused a notable shift away from the original estimate and toward a new, more probable one (i.e., there is inter-sample variability regarding the quantitative properties of the given effect). In cases where theoretical predictions of the DMCC task battery were strongly validated during initial analyses, this approach addresses the goal of determining whether key DMCC task manipulations induce estimates of proactive and reactive control that are reliable and consistent, or instead vary substantially, across datasets and participant samples.</p>
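The SDR computation itself is simple once prior and posterior densities are available. The following illustrative Python snippet (made-up normal prior and posterior summaries, not estimates from the DMCC analyses; in practice the densities would typically be approximated from MCMC samples, e.g., by kernel density estimation) evaluates both densities at the prior mean and takes their ratio.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def savage_dickey_ratio(point, post_mean, post_sd, prior_mean, prior_sd):
    """Posterior density divided by prior density at `point`.
    > 1: the new data increased support for that value;
    < 1: the posterior mass shifted away from it."""
    return normal_pdf(point, post_mean, post_sd) / normal_pdf(point, prior_mean, prior_sd)

# Prior centered on a hypothetical first-dataset estimate
prior_mean, prior_sd = 0.50, 0.20

# Consistent replication: the posterior stays near the prior mean but tightens
sdr_replicated = savage_dickey_ratio(prior_mean, 0.52, 0.10, prior_mean, prior_sd)

# Shifted estimate: the posterior mass moves away from the prior mean
sdr_shifted = savage_dickey_ratio(prior_mean, 0.95, 0.10, prior_mean, prior_sd)

print(round(sdr_replicated, 2), sdr_shifted)  # first > 1, second well below 1
```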
<p>Another unique aspect of the Bayesian framework is that it can be used to test the strength of evidence both in favor of and against a hypothesis of interest. First, probability of direction (pd) scores and highest density intervals (HDIs) can be employed in a manner analogous to <italic>p</italic>-values and confidence intervals, respectively, but they offer more interpretable assessments of significance than their frequentist counterparts, via an empirical posterior distribution versus a hypothetical null distribution (<xref ref-type="bibr" rid="ref34">Makowski et al., 2019</xref>). While pd scores reference the proportion of the posterior with the same sign as the median, HDIs are the most probable values that make up a certain density within the posterior<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>. Conversely, Bayesian hypothesis tests can also be employed to assess whether there is instead evidence in favor of the null hypothesis (<xref ref-type="bibr" rid="ref64">Wagenmakers et al., 2018</xref>). There are several ways to perform hypothesis testing for null effects. The approach most frequently employed in the psychological research literature is a Bayes Factor (BF) model comparison approach that directly compares the predictive accuracy of two models: an alternative hypothesis model (<italic>M<sub>1</sub></italic>) that contains the parameter of interest, and a null hypothesis model (<italic>M<sub>0</sub></italic>) that does not. This approach provides a direct quantification of the strength of one hypothesis/model versus the other (<xref ref-type="bibr" rid="ref49">Schmalz et al., 2023</xref>).</p>
<p>Yet it is also possible to utilize a complementary approach within a Bayesian framework to evaluate null effects. After estimating the HDI, we can easily assess the degree to which this interval falls inside or outside a region of practical equivalence (ROPE). Since the ROPE is an interval around zero that represents a negligible effect (i.e., one of no interest), this assessment can check the proportion of parameter values within a given posterior that support or challenge the given null hypothesis. By using both Bayesian metrics in a convergent manner, more nuanced inferences are possible regarding the evidence for a null effect than with NHST. Specifically, results can be divided into three possible outcomes: (1) strong evidence for a meaningful effect (i.e., BF<sub>10</sub>&#x202F;&#x003E;&#x202F;10, or the HDI is fully outside the ROPE); (2) strong evidence for a result practically equivalent to zero (i.e., BF<sub>10</sub>&#x202F;&#x003C;&#x202F;1/10, or the HDI is fully inside the ROPE); and (3) little or inconclusive evidence in either direction (i.e., 1/10&#x202F;&#x003C;&#x202F;BF<sub>10</sub>&#x202F;&#x003C;&#x202F;10, or partial overlap of the HDI and ROPE). By distinguishing between these latter two possibilities, it becomes easier to determine the most practical next step for future investigations. For instance, inconclusive evidence for an effect of interest may indicate that collecting more data is necessary to ascertain its presence, while strong evidence for a null effect would indicate that current task manipulations did not successfully induce it. More generally, the convergent use of both ROPE and BF model comparison methods offers the opportunity to provide fine-grained evidence for a strong, inconclusive, or null effect (<xref ref-type="bibr" rid="ref26">Kruschke, 2014</xref>; <xref ref-type="bibr" rid="ref46">Rouder et al., 2018</xref>).</p>
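All three quantities (pd, HDI, and the HDI-versus-ROPE decision) can be computed directly from posterior samples. The Python sketch below is purely illustrative: the posterior draws are synthetic, and the ROPE bounds are arbitrary placeholders rather than the thresholds used in this study.

```python
import random

def prob_direction(samples):
    """Proportion of posterior samples sharing the sign of the median."""
    med = sorted(samples)[len(samples) // 2]
    sign = 1 if med >= 0 else -1
    return sum(1 for s in samples if s * sign > 0) / len(samples)

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = int(mass * n)
    width, i = min((xs[j + k] - xs[j], j) for j in range(n - k))
    return xs[i], xs[i + k]

def rope_decision(interval, rope=(-0.1, 0.1)):
    lo, hi = interval
    if lo > rope[1] or hi < rope[0]:
        return "effect (HDI outside ROPE)"
    if lo >= rope[0] and hi <= rope[1]:
        return "practically null (HDI inside ROPE)"
    return "inconclusive (partial overlap)"

random.seed(1)
posterior = [random.gauss(0.4, 0.1) for _ in range(10_000)]  # synthetic draws
pd_score = prob_direction(posterior)
interval = hdi(posterior)
print(pd_score, interval, rope_decision(interval))
```

With these synthetic draws centered well away from zero, the pd approaches 1 and the 95% HDI falls entirely outside the ROPE, corresponding to outcome (1) above.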
<p>A broader advantage of the HBR approach is that it also incorporates all the benefits that accrue from generalized hierarchical modeling. While conventional methods like paired <italic>t</italic>-tests and repeated-measures ANOVAs require a first stage of data aggregation across trials for each participant and condition, hierarchical models enable the full use of trial-level data. Specifically, the hierarchical structuring of this trial-level data will produce more reliable estimates, by not only incorporating random intercepts and slopes to better model subject-level variability, but also by shrinking outlying estimates toward their average/fixed effects (<xref ref-type="bibr" rid="ref57">Veenman et al., 2024</xref>). The application of these hierarchical models has become increasingly common within cognitive control research, such as in studies examining reliability and individual differences in Stroop task effects (<xref ref-type="bibr" rid="ref45">Rouder and Haaf, 2019</xref>; <xref ref-type="bibr" rid="ref59">Viviani et al., 2023b</xref>; <xref ref-type="bibr" rid="ref60">Viviani et al., 2024</xref>). Moreover, generalized versions of these models can utilize various likelihood functions to better model data that do not follow a normal/Gaussian distribution.</p>
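The shrinkage behavior described above can be illustrated with a simple precision-weighted compromise between each subject's own mean and the group mean (an empirical-Bayes-style sketch with made-up subject means, trial counts, and variance components; full HBR models estimate all of these quantities jointly rather than fixing them).

```python
# Shrinkage sketch: each subject's estimate is pulled toward the group mean,
# with noisier (smaller-n) subjects pulled harder. Synthetic numbers, not DMCC data.
subject_means = {"s1": 320.0, "s2": 480.0, "s3": 860.0}   # per-subject mean RT (ms)
subject_n = {"s1": 100, "s2": 100, "s3": 5}               # trials per subject
within_var = 90.0 ** 2    # assumed trial-to-trial variance
between_var = 60.0 ** 2   # assumed between-subject variance

grand_mean = sum(subject_means.values()) / len(subject_means)

shrunk = {}
for s, m in subject_means.items():
    # weight on the subject's own data: near 1 when the subject mean is precise
    w = between_var / (between_var + within_var / subject_n[s])
    shrunk[s] = w * m + (1 - w) * grand_mean

for s in sorted(shrunk):
    print(s, round(subject_means[s]), "->", round(shrunk[s]))
```

Subject s3, an outlier estimated from only 5 trials, is pulled strongly toward the group mean, while the well-estimated subjects barely move; this is the mechanism by which hierarchical models stabilize noisy subject-level estimates.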
<p>Within cognitive science, the domain of response time (RT) analysis has been strongly influenced by these advances in modeling. Specifically, RTs tend to follow a skewed, rather than symmetric, distribution (<xref ref-type="bibr" rid="ref42">Ratcliff, 1979</xref>). Although there have been longstanding attempts to address this factor by trimming or transforming the data, these approaches bring with them their own complications and limitations (<xref ref-type="bibr" rid="ref20">Heathcote et al., 1991</xref>; <xref ref-type="bibr" rid="ref43">Ratcliff, 1993</xref>; <xref ref-type="bibr" rid="ref31">Lo and Andrews, 2015</xref>; <xref ref-type="bibr" rid="ref65">Zhou and Krott, 2016</xref>; <xref ref-type="bibr" rid="ref50">Schramm and Rouder, 2019</xref>). As a result, there has been increased support for the use of generalized models that can directly model the properties of the RT data through a set of parameters that appropriately represent its distributional shape. Indeed, such models also make it possible to specify and compare results when assuming different distributions (i.e., likelihood functions) for the data. In analyses of the DMCC dataset, for cases in which it was useful to explore whether observed effects were related to the form of the RT distribution, HBR models utilizing more skewed distributions (i.e., shifted log-normal, ex-Gaussian) were compared to examine whether these modeling choices would affect the results. More broadly, generalized hierarchical models offer the opportunity to implement likelihood functions that more accurately capture the true properties of the RT data, and as such, can increase confidence in the validity of HBR analysis results.</p>
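The consequence of assuming the wrong likelihood for RTs can be sketched numerically. The illustrative Python snippet below (arbitrary parameter values, not DMCC estimates) simulates right-skewed RTs from an ex-Gaussian (a Gaussian plus an exponential tail) and compares the total log-likelihood of the data under a moment-matched Gaussian versus the generating ex-Gaussian density; the parameters are treated as known here rather than estimated.

```python
import math
import random

def exgauss_logpdf(x, mu, sigma, tau):
    """Log-density of the ex-Gaussian (exponentially modified Gaussian)."""
    lam = 1.0 / tau
    arg = (mu + lam * sigma**2 - x) / (sigma * math.sqrt(2))
    return (math.log(lam / 2)
            + (lam / 2) * (2 * mu + lam * sigma**2 - 2 * x)
            + math.log(math.erfc(arg)))

def gauss_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

# Simulate right-skewed RTs (in seconds): Gaussian core plus exponential tail
random.seed(7)
rts = [random.gauss(0.45, 0.05) + random.expovariate(1 / 0.15) for _ in range(2000)]

# Moment-matched Gaussian vs. the generating ex-Gaussian
m = sum(rts) / len(rts)
sd = (sum((r - m) ** 2 for r in rts) / len(rts)) ** 0.5
ll_gauss = sum(gauss_logpdf(r, m, sd) for r in rts)
ll_exg = sum(exgauss_logpdf(r, 0.45, 0.05, 0.15) for r in rts)
print(round(ll_gauss, 1), round(ll_exg, 1))  # the skewed likelihood fits better
```

The large log-likelihood gap in favor of the ex-Gaussian mirrors the model-comparison logic used when selecting among Gaussian, shifted log-normal, and ex-Gaussian likelihoods for RT data.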
<p>A parallel benefit arises with error rate data: the same advantages of generalized mixed-effects models apply to binary outcomes, such as correct versus incorrect responses. Standard approaches operationalize these outcomes as proportions, which is sub-optimal relative to hierarchical logistic regression, even with the incorporation of available transformation methods (e.g., arcsine, empirical logit) (<xref ref-type="bibr" rid="ref24">Jaeger, 2008</xref>; <xref ref-type="bibr" rid="ref9">Dixon, 2008</xref>; <xref ref-type="bibr" rid="ref21">Houpt and Bittner, 2018</xref>). First, logistic regression models provide better fits of the outcome data, by operationalizing the binary outcomes as probabilities and assuming a Bernoulli rather than Gaussian distribution. Furthermore, they can more appropriately capture any non-linear relationships between predictors and the dependent variable, via the &#x2018;logit&#x2019; link function. Finally, logistic regression integrates well with hierarchical modeling, by allowing predictors to vary both by subject and condition, through access to each person&#x2019;s trial-level data. This information is otherwise lost in typical aggregation-based procedures for accuracy/error rate data.</p>
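<p>A brief numerical illustration of the &#x2018;logit&#x2019; link helps convey why logistic models are more sensitive than proportion-based analyses when accuracy is near ceiling (the accuracy values below are hypothetical):</p>

```python
import math

def logit(p):
    """Log-odds transform of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse logit: maps any real number back into (0, 1)."""
    return 1 / (1 + math.exp(-x))

# Near ceiling, equal steps on the probability scale correspond to much
# larger steps on the log-odds scale, which is why modeling trial-level
# Bernoulli outcomes with a logit link is more sensitive to effects near
# floor/ceiling than Gaussian models of aggregated proportions.
d_mid = logit(0.55) - logit(0.50)       # 5-point difference at 50% accuracy
d_ceiling = logit(0.99) - logit(0.94)   # same 5-point difference near ceiling
assert d_ceiling > 3 * d_mid

# The logit link also guarantees predicted probabilities stay inside (0, 1).
assert 0.0 < inv_logit(4.0) < 1.0
```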
<p>The overarching goal of this paper is to illustrate the combined power and flexibility afforded by an HBR approach, relative to more conventional ones, through a set of case study examples that directly highlight such benefits, taken from representative analyses of the DMCC task battery. To provide a roadmap of the results, the key conceptual principle linked to each task is described next. First, within the context of the AX-CPT paradigm, we highlight the advantages of the sequential updating approach and use of SDR to test for replication patterns across the 2018 and 2020 datasets, focusing on the BX error interference effect as a general cognitive control index. We demonstrate the different conclusions drawn regarding the stability of this effect, when considering either the proactive or reactive control indicator. Second, within the context of the Sternberg working memory (WM) task, we apply tests of evidence in favor of the null hypothesis, focusing on the novel positive (NP) RT effect as an indicator of proactive control. We demonstrate how strong evidence for and against the null hypothesis can be obtained using both ROPE and BF approaches in a convergent manner. Third, within the context of the Stroop task, we examine how more precisely modeling RT distributions can impact statistical inference, focusing on the congruency cost as an indicator of proactive control. We demonstrate how theoretically aligned RT distributions, such as the shifted log-normal and ex-Gaussian, can provide stronger evidence for a reliable effect than standard Gaussian distributions, which are clearly inappropriate. Fourth, within the context of cued task-switching (Cued-TS), we examine how the use of hierarchical (i.e., trial-by-trial) modeling of error rate data can also impact statistical inference, focusing on the error task-rule congruency effect (TRCE) as an index of reactive control. 
We demonstrate how hierarchical logistic regression approaches more sensitively model effects near floor or ceiling, relative to conventional analyses of error rate. Together, these collective findings not only provide additional validation of the DMCC task battery across four key dimensions, beyond that established in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, but also, perhaps more importantly, demonstrate the inherent and wide-ranging advantages of the HBR modeling framework to the broader experimental psychology community.</p>
</sec>
<sec sec-type="methods" id="sec2">
<title>Methods</title>
<sec id="sec3">
<title>Participants</title>
<p>The data comprise two participant samples, who each completed the online DMCC task battery. The samples were collected through the Amazon Mechanical Turk (MTurk) online platform, with 178 participants collected in 2018 and 185 collected in 2020. Participants were not excluded by age range<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref> for either the 2018 sample<xref ref-type="fn" rid="fn0005"><sup>5</sup></xref> (21&#x2013;64, M&#x202F;=&#x202F;36.03, SD&#x202F;=&#x202F;9.92, 104 F and 74&#x202F;M) or the 2020 sample (18&#x2013;77, M&#x202F;=&#x202F;38.41, SD&#x202F;=&#x202F;10.87, 91 F and 93&#x202F;M and 1 &#x2018;prefer not to say&#x2019;). For each sample and task, only participants with complete datasets for the battery were retained for analysis (the 2018 and 2020 datasets, respectively, yielded the following sample sizes for AX-CPT: 132 and 125; Sternberg: 133 and 131; Stroop: 123 and 127; Cued-TS: 133 and 135).</p>
</sec>
<sec id="sec4">
<title>Design and procedure</title>
<p>Besides the different time points in which participants completed the study, there were additional differences between the 2018 and 2020 samples. First, while participants in the 2018 sample completed the DMCC task battery twice, when possible (i.e., finishing 15 sessions in the &#x2018;test&#x2019; phase and 15 in the &#x2018;retest&#x2019; phase), those in the 2020 sample went through the task battery only once. <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref> only used the &#x2018;test&#x2019; phase data from the 2018 sample to control for potential practice effects. In contrast, we chose to include all of the available trial-level data for each participant in the 2018 dataset, given that one of the conceptual goals of this paper was to utilize all the data within and across samples, in order to generate cumulative posterior distributions based on the full breadth of our current knowledge regarding behavioral metrics of proactive and reactive control.</p>
<p>Another key difference between samples is the order in which participants completed their sessions: the 2018 sample completed the task battery in the Baseline-Reactive-Proactive order, while the 2020 sample completed it in the Baseline-Proactive-Reactive order to counter-balance potential carry-over order effects. Thus, when aggregated across the 2018 and 2020 datasets, analyses should provide a more comprehensive and robust estimate of differences between proactive and reactive control indicators that controls for session order. Conversely, any differences in estimates between the 2018 and 2020 datasets could be due to session order differences, although without additional data it would be impossible to draw this inference conclusively.</p>
<p>All participants were expected to perform approximately 5 sessions per week, ensuring that the study would take 3&#x2013;6&#x202F;weeks to complete. Each session was 20&#x2013;40&#x202F;min long, except for a 1-h first session that included a Stroop practice session to validate vocal responses as well as a battery of demographic and self-report questionnaires. Completed sessions were examined for accuracy and compliance (see <xref ref-type="bibr" rid="ref54">Tang et al., 2023</xref>). Subjects were discontinued from the study if they did not complete the sessions in a timely manner or did not comply with task instructions. The AX-CPT, Sternberg, and Cued-TS were programmed with in-house JavaScript code (available upon request at <ext-link xlink:href="https://sites.wustl.edu/dualmechanisms/request-form/" ext-link-type="uri">https://sites.wustl.edu/dualmechanisms/request-form/</ext-link>), while the Stroop task was programmed and delivered using Inquisit software, as it allowed for the collection of online vocal responses (also available at the above link).</p>
</sec>
<sec id="sec5">
<title>Tasks</title>
<p>As a detailed description of the task manipulations and theoretical rationale for the DMCC battery has already been provided in other companion papers (<xref ref-type="bibr" rid="ref5">Braver et al., 2021</xref>; <xref ref-type="bibr" rid="ref54">Tang et al., 2023</xref>; <xref ref-type="bibr" rid="ref52">Snijder et al., 2023</xref>), here we provide only a brief description of each task paradigm and its variants below, along with a task illustration of the battery (<xref ref-type="fig" rid="fig1">Figure 1</xref>).</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Illustration of the DMCC task battery.</p>
</caption>
<graphic xlink:href="fpsyg-17-1643463-g001.tif" mimetype="image" mime-subtype="tiff">
<alt-text content-type="machine-generated">Panel A shows the Stroop task with trials where participants name the color of words with either congruent or incongruent color-word associations. Panel B illustrates the AX-CPT task, detailing different letter sequence trials identifying target and nontarget pairs based on first and second letters. Panel C depicts the Cued-TS task, where participants attend to numbers or letters with congruent and incongruent conditions, followed by feedback. Panel D represents the Sternberg working memory task, presenting word lists and requiring recognition of probes as present or absent. Each panel labels timing, task structure, and cognitive control index measured.</alt-text>
</graphic>
</fig>
<sec id="sec6">
<title>AX-CPT</title>
<p>In the AX-CPT, each trial includes a cue (i.e., A or B), a delay period, and a probe (i.e., X, Y, or number). When an &#x201C;A&#x201D; cue precedes the &#x201C;X&#x201D; probe, a target response should be made; however, the presence of a &#x201C;B&#x201D; cue (i.e., any cue letter besides &#x201C;A&#x201D;) and/or a &#x201C;Y&#x201D; probe (i.e., any letter besides &#x201C;X&#x201D;) indicates that a non-target response should instead be made; digit probes (i.e., 1&#x2013;9) indicate no-go trials, for which participants were instructed to withhold their response. These no-go trials were included to reduce the inherent bias to fully prepare probe responses following the cue. The proactive and reactive versions of the task had specific modifications designed to induce each mode of control, respectively. The proactive variant explicitly instructed participants to prepare a target response after seeing an &#x201C;A&#x201D; cue and a non-target response after seeing a non-A cue; practice sessions and reminders to use the strategy were meant to reinforce the use of proactive control for this session. The reactive variant instead included a red border color at a unique location presented immediately before probe onset to indicate an upcoming high-conflict trial (i.e., AY, BX, no-go trials). The proactive version was designed to bias participants to engage and maintain cognitive control during the cue and delay period; instead, the reactive version included a specific probe-linked feature that alerted participants to actively retrieve the cue and abstract rule only at the time of probe presentation.</p>
</sec>
<sec id="sec7">
<title>Sternberg</title>
<p>In the Sternberg task, participants had to determine whether a probe word was also present in the previous word list (memory set) on trials that varied possible working memory loads (i.e., the lists could contain 2&#x2013;8 words, which were presented across two encoding displays). If the probe word was indeed part of the memory set, the trial would be categorized as a novel positive (NP) one, and a target response would be correct. On both novel negative (NN) and recent negative (RN) trials, the probe item was not part of the memory set for the current trial; however, the latter was characterized as RN because the probe item <italic>was</italic> present in the memory set of the immediately preceding trial (i.e., was recently encoded into memory), thereby causing increased familiarity and false positives for RN versus NN trials. Both the number of items in a word list and the proportion of RN, NN, and NP trials were manipulated in different ways to induce a proactive, reactive, or neither/baseline mode of control. In the proactive condition, most trials had low-load (2&#x2013;4 items) memory sets, which biased participants to actively maintain these items as an effective strategy to bias attention toward the probe. In both the baseline and reactive conditions, most trials instead had high-load (6&#x2013;8 items) memory sets, which biased participants to instead use a familiarity strategy to determine whether to make a target response to the probe item. In the reactive condition, however, the high frequency of RN trials would instead cause probe familiarity to serve as an alerting signal for full memory set retrieval. Finally, each condition included a matched set of 5-item lists that were treated as &#x201C;critical items&#x201D; and used for cross-condition comparisons.</p>
</sec>
<sec id="sec8">
<title>Stroop</title>
<p>Although all versions of the color-word Stroop task utilized the classic contrast in performance between low-conflict congruent items (i.e., the font color and word match, e.g., BLUE in blue font) and high-conflict incongruent items (i.e., the font color and word do not match, e.g., BLUE in red font), the proportion congruence (PC) within a task block was uniquely manipulated across conditions to induce different degrees of cognitive control use. A subset of diagnostic items that were equally likely to be congruent or incongruent (i.e., PC-50/diagnostic items) was included, so that directly matched comparisons could be made between conditions. Another subset of items was either mostly congruent or mostly incongruent (i.e., biased/inducer items), which distinguished the different conditions from one another. The baseline condition was designed to have high list-wide PC, in which congruent trials were relatively frequent and incongruent trials were rare, so that cognitive control demands would be relatively low throughout the task block. In contrast, the proactive variant was designed to have low list-wide PC, which would bias participants to prospectively (i.e., before stimulus onset in a trial) utilize cognitive control throughout the task block, by attenuating attention toward the distracting &#x2018;word&#x2019; dimension. In the reactive condition, the PC manipulation was item-specific such that the inducer items were mostly incongruent colors, while other filler items were always congruent; thus, the list-wide PC was matched to the baseline condition, but specific colors (inducer items) indicated high control demands. As a result, participants were expected to use the color feature (i.e., detected after stimulus onset) as an indicator of whether cognitive control should be utilized (reactively) on that trial.</p>
</sec>
<sec id="sec9">
<title>Cued-TS</title>
<p>All stimuli in this task consisted of letter-digit pairs (e.g., A-1, B-2); however, a task rule cued at the beginning of each trial indicated the relevant feature dimension to attend on that trial (i.e., attend to the letter to make a consonant/vowel discrimination, or attend to the number to make an odd/even discrimination). Two overlapping manual responses were used for each task; thus, the meaning of each manual response was dependent on the task rule, with stimuli that could either be congruent (i.e., the two task rules are associated with the same response) or incongruent (i.e., the two task rules are associated with different responses). The baseline version closely followed this general task structure, with the task cues appearing in red font and the task stimuli appearing in black font. In contrast, within the proactive and reactive variants, a subset of incentive trials was included to bias participants toward one mode of control. The proactive version included reward incentive trials, pre-cued by the green font of the task rule, so that participants would engage in anticipatory cognitive control before stimulus onset, to make faster and more accurate responses. In contrast, the reactive version included punishment incentive trials (if an error was made), cued by the green font of the <italic>target stimulus</italic>, and primarily on incongruent trials, so that the processing of incongruence itself was associated with potential monetary loss. A matched set of non-incentivized trials was utilized for cross-condition comparisons. These non-incentivized trials were also biased to be mostly congruent, increasing the likelihood of conflict and interference on incongruent trials.</p>
</sec>
</sec>
<sec id="sec10">
<title>Data preprocessing</title>
<p>To better accommodate hierarchical modeling, the preprocessing approach used here differed somewhat from that applied in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>. However, preprocessing choices were also designed to be somewhat conservative, with the goal of ensuring that any findings that differed from <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref> would be driven by modeling rather than preprocessing choices. Datasets were first preprocessed by removing trials that contained outlying response times (RTs). Outlier determination was first set in a task-based manner, relative to the task RT distribution. In particular, relative to <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, more restrictive cutoffs were chosen for tasks in which some trials showed clear discontinuities from the respective RT distribution, whereas more relaxed cutoffs were used in tasks for which the trials were clearly continuous with the RT distribution. Based on these criteria, the upper limit cutoffs were as follows: RT&#x202F;&#x003C;&#x202F;2,000&#x202F;ms for AX-CPT and Sternberg; RT&#x202F;&#x003C;&#x202F;5,000&#x202F;ms for Stroop; RT&#x202F;&#x003C;&#x202F;6,000&#x202F;ms for Cued-TS. The lower limit cutoff was the same for all tasks: RT&#x202F;&#x003E;&#x202F;200&#x202F;ms. Additional subject-level thresholds (i.e., RTs that were 3 SDs above/below an individual&#x2019;s mean RT) were then put in place to ensure that individual RT distributions did not include outlying observations with respect to the participants&#x2019; own performance patterns within a given task. Together, the percentages of removed trials for the 2018 and 2020 DMCC datasets were as follows for each task: AX-CPT: 3.7 and 3%; Sternberg: 3.8 and 4.2%; Stroop: 2.6 and 4%; Cued-TS: 1.9 and 1.9%.</p>
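<p>For concreteness, the two-stage trimming procedure can be sketched as follows. This is an illustrative re-implementation with simulated data, using the AX-CPT cutoffs; it is not the actual preprocessing code:</p>

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical trial-level data: subject IDs and RTs in ms.
ids = np.repeat(np.arange(20), 100)
rts = rng.lognormal(6.3, 0.4, ids.size) + 150

def trim_rts(ids, rts, lower=200.0, upper=2000.0, n_sd=3.0):
    """Two-stage RT trimming: task-level cutoffs, then per-subject +/- n_sd."""
    # Stage 1: task-based absolute cutoffs.
    keep = (rts > lower) & (rts < upper)
    ids, rts = ids[keep], rts[keep]
    # Stage 2: subject-level thresholds relative to each subject's own mean.
    keep = np.ones(rts.size, dtype=bool)
    for s in np.unique(ids):
        m = ids == s
        mu, sd = rts[m].mean(), rts[m].std()
        keep[m] = np.abs(rts[m] - mu) <= n_sd * sd
    return ids[keep], rts[keep]

clean_ids, clean_rts = trim_rts(ids, rts)
assert clean_rts.min() > 200 and clean_rts.max() < 2000
assert clean_rts.size <= rts.size
```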
<p>In addition to excluding individual trials, entire participants were removed if they exhibited unusual performance patterns for a given task, whether due to missing data, obvious performance-related deficits as specified by <xref ref-type="bibr" rid="ref52">Snijder et al. (2023)</xref>, or inconsistent/stereotypical response patterns<xref ref-type="fn" rid="fn0006"><sup>6</sup></xref>. More specifically, participants were removed if they: (1) had an incomplete number of trials/sessions within a task; (2) were missing more than 50% of correct responses after all trial-level filtering criteria were applied; (3) did not respond to more than 40% of trials requiring a response; (4) had incorrect responses for more than 40% of trials; (5) had more than 40% of responses faster than the task-specific minimum threshold; (6) had more than 20% of responses slower than the task-specific maximum threshold; (7) had conditional RT data forming mixed distributions with multiple peaks (i.e., inconsistent responses); and (8) showed systematic repeating or alternating responses independent of the stimulus content (i.e., stereotypical responses). Finally, participants completing the task battery on a macOS rather than Windows operating system were removed from both Stroop datasets, since the Mac platform had compatibility issues with Inquisit, leading to systematically slower RTs. Following these outlier exclusions, the derived 2018 and 2020 DMCC datasets, respectively, contained the following numbers of participants: AX-CPT: 132, 123; Sternberg: 139, 126; Stroop: 126, 114; Cued-TS: 135, 126.</p>
</sec>
<sec id="sec11">
<title>Data analysis</title>
<p>Hierarchical Bayesian regression (HBR) models were fit using the brms package in the R software environment (<xref ref-type="bibr" rid="ref8">B&#x00FC;rkner, 2017</xref>). Logistic models were applied to trial-level data when the outcome variable was binary (i.e., target versus non-target responses, or correct versus incorrect responses). Models with response time as the dependent variable were primarily analyzed with shifted log-normal likelihood functions on trial-level data. However, we also fit ex-Gaussian functions in cases where it was important to capture the skewed properties of the RT data yet preserve its original units (ms). Additionally, we fit Gaussian (and ex-Gaussian) distributions in cases for which we wished to directly compare the predictive accuracy of each distribution (these cases are described in the Results section). The categorical predictor &#x2018;mode&#x2019; (Baseline, Proactive and Reactive sessions) was dummy coded to enable direct comparisons between conditions on the outcome variables of interest; other relevant subject-level variables were also dummy coded to operationalize these behavioral indicators of proactive or reactive control. The specific models for each task are included in both the Results and <xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref> Results sections. While implementable Wilkinson notation is provided in both sections, fully indexed notation is also included in the <xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref>.</p>
<p>The hierarchical nature of the models was accommodated by including random intercepts and slopes. Since the inclusion of these parameters substantially increases the number of parameters that must be estimated, and consequently slows the runtime of each model, only difficulty [&#x2206;] <italic>&#x03BC;</italic> parameters besides &#x2018;mode&#x2019; were systematically added as both fixed and random effects (e.g., intercept, congruency, item type and trial type). This choice reflects the prioritization of modeling intrinsic subject-level differences within the reference condition (i.e., &#x2018;baseline&#x2019; performance unrelated to targeted DMCC experimental manipulations). For similar reasons, distribution-specific parameters for different likelihood functions (e.g., the shift parameter <inline-formula>
<mml:math id="M4">
<mml:mi>&#x03B8;</mml:mi>
</mml:math>
</inline-formula> within a shifted log-normal distribution, the exponential decay parameter <inline-formula>
<mml:math id="M5">
<mml:mi>&#x03C4;</mml:mi>
</mml:math>
</inline-formula> within the ex-Gaussian distribution, and the scale parameter <inline-formula>
<mml:math id="M6">
<mml:mi>&#x03C3;</mml:mi>
</mml:math>
</inline-formula> within all models) were fixed across condition and subject.</p>
<p>The posterior distributions for each relevant parameter were iteratively generated using Markov chain Monte Carlo (MCMC) chains. The number of iterations across chains differed by task and sample, but the validity of each HBR model output was confirmed by utilizing diagnostic metrics of appropriate chain mixing (i.e., Rhat &#x003C; 1.05), sufficient sampling across the parameter space (i.e., Bulk and Tail ESS&#x202F;&#x003E;&#x202F;400), and posterior predictive checks that simulated predictions from the model output and compared these estimations to the observed data. When needed, running additional iterations (typically 40,000) to achieve model convergence or reliable marginal likelihood estimates, and re-specifying the adapt_delta (from the default of 0.8 to 0.9&#x2013;0.99) and max_treedepth (from the default of 10 to 12&#x2013;15) arguments, ensured tractable MCMC sampling of the data (<xref ref-type="bibr" rid="ref53">Stan Development Team, 2025</xref>). Considering the novel utilization of these models for the online DMCC task battery, the 2018 HBR models used noninformative default priors for the fixed effects as well as weakly informative default priors for the random effects. In contrast, the 2020 HBR models constructed their priors based on the fixed effects of the posterior distributions generated by their 2018 counterparts, specifically utilizing informed Gaussian prior distributions shaped by the mean and standard error of the 2018 parameter estimates.</p>
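<p>For readers unfamiliar with the Rhat diagnostic, the following simplified sketch (split-Rhat without the rank-normalization that Stan additionally applies) shows how a chain stuck away from the target distribution inflates the statistic past the 1.05 criterion:</p>

```python
import numpy as np

def split_rhat(chains):
    """Split R-hat convergence diagnostic for a single parameter.

    chains: array of shape (n_chains, n_samples); each chain is split in
    half so that within-chain drift also inflates R-hat.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1] // 2
    halves = np.concatenate([chains[:, :n], chains[:, n:2 * n]], axis=0)
    w = halves.var(axis=1, ddof=1).mean()               # within-chain variance
    b = n * halves.mean(axis=1).var(ddof=1)             # between-chain variance
    var_plus = (n - 1) / n * w + b / n                  # pooled variance estimate
    return float(np.sqrt(var_plus / w))

rng = np.random.default_rng(0)
mixed = rng.normal(0, 1, size=(4, 1000))                # well-mixed chains
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])  # one chain off target
assert split_rhat(mixed) < 1.05
assert split_rhat(stuck) > 1.05
```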
<p>For each parameter estimate across both sets of HBR models, we report the mean, standard error, the 95% highest density interval (HDI), and probability of direction (pd) values (scores greater than 97.5% were considered strong evidence for an effect) (<xref ref-type="bibr" rid="ref25">Kelter, 2020</xref>). Additionally, we utilized SDR scores to implement point estimate hypothesis testing for the parameter of interest in each 2020 HBR model, specifically to determine if their prior and posterior distributions had sufficient overlap (SDR&#x202F;&#x003E;&#x202F;1), versus a significant shift in the posterior from the original estimate (SDR&#x202F;&#x003C;&#x202F;1). Visualizations of the relationships between the prior and posterior distributions are included for all measures of interest.</p>
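<p>The SDR logic can be sketched numerically as follows. This is an illustrative computation (a Savage&#x2013;Dickey style density ratio, using a normal approximation to the posterior samples and hypothetical parameter values), not the exact procedure implemented in the reported analyses:</p>

```python
import numpy as np

def gaussian_pdf(x, mu, sd):
    """Density of a Normal(mu, sd) distribution at x."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def density_ratio(samples, point, prior_mu, prior_sd):
    """Posterior density over prior density at `point`, with the posterior
    approximated as a Gaussian fit to MCMC samples."""
    post_mu, post_sd = np.mean(samples), np.std(samples, ddof=1)
    return gaussian_pdf(point, post_mu, post_sd) / gaussian_pdf(point, prior_mu, prior_sd)

rng = np.random.default_rng(5)
prior_mu, prior_sd = -0.4, 0.2    # hypothetical 2018-informed prior

# Posterior replicating the prior estimate: density at the prior mean rises,
# so the ratio exceeds 1 (sufficient overlap).
consistent = rng.normal(-0.4, 0.1, 8000)
assert density_ratio(consistent, prior_mu, prior_mu, prior_sd) > 1

# Posterior shifted away from the prior estimate: density there collapses,
# so the ratio falls below 1 (significant shift from the original estimate).
shifted = rng.normal(-0.9, 0.1, 8000)
assert density_ratio(shifted, prior_mu, prior_mu, prior_sd) < 1
```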
<p>The Bayesian hypothesis testing metrics, ROPE and BF model comparisons, were additionally utilized in cases for which we wished to test evidence in favor of the null hypothesis. With ROPE, an HDI fully outside the ROPE constitutes strong evidence for the alternative hypothesis, an HDI fully inside the ROPE constitutes strong evidence for the null hypothesis, and an HDI partially overlapping the ROPE yields inconclusive evidence for either hypothesis. Both default 89% and conservative 95% HDI intervals were utilized to compare their respective degrees of overlap with the prespecified ROPE range (<xref ref-type="bibr" rid="ref26">Kruschke, 2014</xref>, <xref ref-type="bibr" rid="ref27">2018</xref>; McElreath, 2018). For BF model comparisons in which the alternative hypothesis (<italic>M</italic><sub>1</sub>) is the numerator and the null hypothesis (<italic>M</italic><sub>0</sub>) is the denominator (BF<sub>10</sub>), BF<sub>10</sub>&#x202F;&#x003E;&#x202F;10 is standardly treated as strong evidence for the alternative hypothesis, whereas BF<sub>10</sub>&#x202F;&#x003C;&#x202F;1/10 is strong evidence for the null hypothesis; 1/10&#x202F;&#x003C;&#x202F;BF<sub>10</sub>&#x202F;&#x003C;&#x202F;10 is treated as inconclusive evidence for either hypothesis. Inversely, BF<sub>01</sub> represents the case where <italic>M</italic><sub>0</sub> is the numerator and <italic>M</italic><sub>1</sub> is the denominator.</p>
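<p>The ROPE and BF decision rules described above amount to simple classification functions, sketched here for clarity (the &#x00B1;0.1 ROPE bounds are placeholder values, not those used in the analyses):</p>

```python
def rope_decision(hdi_low, hdi_high, rope=(-0.1, 0.1)):
    """Classify an HDI against a region of practical equivalence (ROPE)."""
    if hdi_high < rope[0] or hdi_low > rope[1]:
        return "accept alternative"   # HDI fully outside the ROPE
    if rope[0] <= hdi_low and hdi_high <= rope[1]:
        return "accept null"          # HDI fully inside the ROPE
    return "inconclusive"             # partial overlap with the ROPE

assert rope_decision(0.25, 0.60) == "accept alternative"
assert rope_decision(-0.04, 0.06) == "accept null"
assert rope_decision(-0.05, 0.30) == "inconclusive"

def bf10_decision(bf10):
    """Conventional evidence thresholds for a Bayes factor BF10."""
    if bf10 > 10:
        return "strong evidence for alternative"
    if bf10 < 1 / 10:
        return "strong evidence for null"
    return "inconclusive"

assert bf10_decision(25.0) == "strong evidence for alternative"
assert bf10_decision(0.05) == "strong evidence for null"
assert bf10_decision(2.0) == "inconclusive"
```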
<p>Importantly, leave-one-out cross-validation (LOO-CV) approaches can be used alongside BF model comparison to provide converging evidence for the relative impact of various modeling choices (e.g., likelihood functions, the inclusion of a main/interaction term) on predictive accuracy. Although LOO-CV is distinct from ROPE/BF, as it does not explicitly quantify evidence in favor of a null effect, its conclusions are less affected by prior specifications, thus serving as a valuable sensitivity check when conducting any model comparisons (<xref ref-type="bibr" rid="ref28">Lartillot, 2023</xref>). When the absolute difference in the expected log pointwise predictive density (|&#x2206;elpd<sub>LOO</sub>|) is greater than four and more than twice its computed standard error (&#x2206;SE<sub>LOO</sub>), the more accurate model is considered a substantially better fit than its competitor(s) (<xref ref-type="bibr" rid="ref53">Stan Development Team, 2025</xref>). In contrast, either |&#x2206;elpd<sub>LOO</sub>|&#x202F;&#x003C;&#x202F;4 or |&#x2206;elpd<sub>LOO</sub>|&#x202F;&#x003C;&#x202F;2 &#x00D7; &#x2206;SE<sub>LOO</sub> would indicate that there is no meaningful difference in predictive accuracy between models. Finally, posterior predictive checks (PPCs) were additionally used to provide visualizations of how well models with different likelihood functions fit the observed data. Data and code for all analyses are publicly available on the OSF platform (<ext-link xlink:href="https://osf.io/bj6fy/" ext-link-type="uri">https://osf.io/bj6fy/</ext-link>).</p>
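<p>The LOO-CV decision rule reduces to a simple comparison, sketched here with hypothetical elpd difference values:</p>

```python
def loo_comparison(delta_elpd, delta_se):
    """Decision rule for LOO-CV model comparison as applied here: the better
    model is preferred only if |elpd difference| exceeds both 4 and twice
    its standard error."""
    if abs(delta_elpd) > 4 and abs(delta_elpd) > 2 * delta_se:
        return "meaningful difference"
    return "no meaningful difference"

assert loo_comparison(-12.3, 3.1) == "meaningful difference"
assert loo_comparison(-2.5, 1.0) == "no meaningful difference"   # |diff| < 4
assert loo_comparison(-6.0, 4.0) == "no meaningful difference"   # |diff| < 2*SE
```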
</sec>
</sec>
<sec sec-type="results" id="sec12">
<title>Results</title>
<p>Given the case study nature of this project, we focus the results on a subset of analyses conducted with the two DMCC datasets that best illustrate the impact of HBR analyses on findings and interpretations. These involved a specific metric within each of the four tasks in the DMCC battery (AX-CPT: BX error interference effect; Sternberg: NP RT effect; Stroop: congruency cost; and Cued-TS: TRCE error effect). Since these analyses entail dissociations across different levels of our predictors (e.g., proactive versus reactive or baseline, between specific trial/item types), we chose to fit separate models onto specific subsets of data within each DMCC task. Arguably, a more powerful approach is to fit a single model to the entire DMCC dataset and subsequently extract relevant contrasts via posterior linear combinations; however, we felt that fitting separate models better aligned with our various HBR modeling applications and expository goals. Nevertheless, in the <xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref> Results section, we implemented this alternative method to assess the impact of this modeling choice; any cases in which this choice might have impacted the interpretation of the primary results are also noted in the manuscript. Furthermore, our <xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref> Results provide a comprehensive set of analyses that address the complete findings and theoretical predictions tested in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, as well as descriptive statistics for each sample, and when all data are aggregated together. These comprehensive analyses almost fully replicated the patterns in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, except where discussed below.</p>
<sec id="sec13">
<title>Estimating replicability with HBR models: AX-CPT</title>
<sec id="sec14">
<title>Background</title>
<p>Within the AX-CPT, a primary focus of theoretical interest is the BX error interference effect, an index of cognitive control which reflects the ability to appropriately utilize contextual cue information to correctly bias nontarget responses to the probe. In particular, the BX error interference effect occurs due to the relative difficulty of high-conflict BX trials (i.e., the same probe combined with an &#x201C;A&#x201D; cue would have required a target response) versus low-conflict BY trials (i.e., neither the probe nor the cue indicate a target response). However, the deployment of either proactive or reactive control is theoretically predicted to lead to reduced BX error interference relative to the Baseline condition; this metric should serve as a general marker of cognitive control. From this prediction, we used the HBR modeling approach as a tool to explore the consistency of predicted effects across datasets, specifically examining the magnitude of BX error interference in Reactive and Proactive conditions across the DMCC 2018 and 2020 datasets.</p>
</sec>
<sec id="sec15">
<title>Analysis and results</title>
<p>In the HBR analysis, the BX interference effect was estimated as the log odds of making an incorrect response for BX versus BY trials, with the following model run on BX/BY trial-level data: <italic>probeCorrect ~ 1&#x202F;+&#x202F;trial type x mode + (1&#x202F;+&#x202F;trial type x mode | ID)</italic><xref ref-type="fn" rid="fn0007"><sup>7</sup></xref>. The outcome variable &#x2018;probeCorrect&#x2019; was binary coded so that the model predicts the log odds of an error (i.e., correct&#x202F;=&#x202F;1; incorrect&#x202F;=&#x202F;0). The categorical variable &#x2018;trial type&#x2019; was dummy coded, with BY trials as the reference category to estimate relative BX interference. The categorical variable &#x2018;mode&#x2019; was dummy coded with Baseline as the reference category to compare how the Proactive and Reactive modes affected this estimate; a separate HBR model was run to compare each mode of control to Baseline. The interaction effect between trial type and mode (i.e., trial type x mode) was the key parameter of interest to examine the difference in BX error interference between modes. This interaction term was also treated as a random effect within each subject to more fully account for trial-level variability.</p>
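<p>As a worked illustration of what the trial type x mode interaction captures on the model&#x2019;s log-odds scale, the following sketch computes the BX versus BY interference contrast in two modes from hypothetical cell error rates (the values are invented for illustration only, not taken from the DMCC data):</p>

```python
import math

def log_odds(p_error):
    """Log odds of an error given an error probability in (0, 1)."""
    return math.log(p_error / (1 - p_error))

# Hypothetical cell error rates for BX (high-conflict) and BY (low-conflict)
# trials in two modes; the interaction is a difference of differences on the
# log-odds scale.
baseline_bx, baseline_by = 0.12, 0.03
proactive_bx, proactive_by = 0.07, 0.03

interference_baseline = log_odds(baseline_bx) - log_odds(baseline_by)
interference_proactive = log_odds(proactive_bx) - log_odds(proactive_by)
interaction = interference_proactive - interference_baseline

# A negative interaction indicates reduced BX error interference in the
# Proactive mode relative to Baseline.
assert interaction < 0
```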
<p>In line with theoretical predictions and previous results from <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, the 2020 HBR models indicated strong evidence for a reduction in the BX error interference effect in both the Proactive (<inline-formula>
<mml:math id="M7">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> =&#x202F;&#x2212;&#x202F;0.43, se&#x202F;=&#x202F;0.13, 95% HDI&#x202F;=&#x202F;[&#x2212;0.70, &#x2212;0.20], pd.&#x202F;=&#x202F;99.97%) and Reactive modes (<inline-formula>
<mml:math id="M8">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> =&#x202F;&#x2212;&#x202F;0.52, se&#x202F;=&#x202F;0.15, 95% HDI&#x202F;=&#x202F;[&#x2212;0.82, &#x2212;0.22], pd.&#x202F;=&#x202F;99.96%) relative to Baseline. Our key focus of interest was to compare the prior and posterior distributions for both models of BX error interference reduction (<xref ref-type="fig" rid="fig2">Figures 2A</xref>,<xref ref-type="fig" rid="fig2">B</xref>). The posteriors of this measure had higher peaks and narrower distributions relative to their priors, signifying that the addition of new data reduced uncertainty in the estimate of this parameter.</p>
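The pd. and HDI summaries used throughout these results can be computed directly from posterior draws. The following sketch approximates the posterior with normal draws built from the reported Proactive coefficient (beta = -0.43, se = 0.13) purely for illustration; in practice these summaries are computed on the actual MCMC draws:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative posterior draws: a normal approximation using the reported
# Proactive coefficient (beta = -0.43, se = 0.13), not the fitted model's
# MCMC samples.
draws = rng.normal(-0.43, 0.13, size=100_000)

def prob_direction(x):
    """pd: share of posterior mass on the dominant side of zero."""
    return max((x > 0).mean(), (x < 0).mean())

def hdi(x, mass=0.95):
    """Shortest interval containing `mass` of the draws."""
    x = np.sort(x)
    n = int(np.floor(mass * len(x)))
    widths = x[n:] - x[:-n]
    i = np.argmin(widths)
    return x[i], x[i + n]

pd_val = prob_direction(draws)
lo, hi = hdi(draws)
```

For a symmetric posterior the HDI coincides with the central credible interval; the shortest-interval construction matters for skewed posteriors such as those produced by the RT models discussed later.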
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>AX-CPT analyses, focused on inter-sample reliability. <bold>(A)</bold> Posterior and prior distributions for the Pro &#x2013; Bas BX error interference reduction effect. The posterior overlaps with the prior, indicating high inter-sample consistency in the estimated parameter. <bold>(B)</bold> Posterior and prior distributions for the Rea &#x2013; Bas BX error interference reduction effect. The posterior shifts away from the prior and zero, indicating the prior underestimates this effect.</p>
</caption>
<graphic xlink:href="fpsyg-17-1643463-g002.tif" mimetype="image" mime-subtype="tiff">
<alt-text content-type="machine-generated">Two side-by-side density plots comparing baseline BX error interference reduction between proactive (panel A, left) and reactive (panel B, right) conditions. Each plot shows prior (light blue) and posterior (dark blue) distributions, with posterior curves peaking higher and with narrower distributions, indicating more precise estimates in both conditions.</alt-text>
</graphic>
</fig>
<p>Interestingly, there was a notable difference in the SDR results across the Proactive and Reactive BX error reduction models. The SDR of the Proactive versus Baseline BX error interference estimate was greater than one (1.44; <xref ref-type="fig" rid="fig2">Figure 2A</xref>), indicating a substantial overlap between the prior and posterior distributions. This result can be interpreted as evidence that the estimate of the BX error interference effect reduction in Proactive was quite consistent across the 2018 and 2020 DMCC datasets. By contrast, the SDR of the Reactive versus Baseline BX error interference estimate was clearly smaller than one (0.66; <xref ref-type="fig" rid="fig2">Figure 2B</xref>), indicating that the parameter estimate from the 2018 dataset underestimated the value obtained when both the 2018 and 2020 datasets were included. Furthermore, the pd. was less than 97.5% in the 2018 HBR model (pd.&#x202F;=&#x202F;92.1%) but 100% in the 2020 one. The SDR findings that compare the updated posterior with its original prior highlight how accumulating more data enables a direct test of consistency. In this case, the BX error interference effect can still be treated as a general indicator of cognitive control engagement; yet, at least for the Reactive condition, the results point to a lack of consistency in the presence and magnitude of the effect across our two datasets.</p>
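One common way to quantify prior-posterior agreement of this kind is to compare kernel density estimates of the two distributions at a point of interest. The sketch below uses made-up normal draws for both distributions; the exact SDR implementation underlying the reported values of 1.44 and 0.66 may differ in its details:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Illustrative draws: the "prior" stands in for the 2018-based estimate and
# the "posterior" for the cumulative 2018+2020 estimate (made-up parameters,
# not values from the fitted models).
prior_draws = rng.normal(-0.40, 0.20, size=50_000)
posterior_draws = rng.normal(-0.43, 0.13, size=50_000)

def density_ratio(posterior, prior, at):
    """Ratio of KDE-estimated posterior to prior density at a point.

    Values > 1 mean the posterior concentrates mass where the prior
    already placed it (consistency across datasets); values < 1 mean
    the new data shifted the estimate away from that point.
    """
    return gaussian_kde(posterior)(at)[0] / gaussian_kde(prior)(at)[0]

# Evaluate at the prior's point estimate.
sdr = density_ratio(posterior_draws, prior_draws, -0.40)
```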
</sec>
<sec id="sec16">
<title>Summary</title>
<p>The BX error interference effects from the AX-CPT illustrate how the sequential learning process inherent in a Bayesian framework can provide a cumulative quantification of the most likely values for a parameter of interest. Specifically, we demonstrate both a case where the accumulation of additional data increased our confidence in the initial estimate, as well as a case in which our confidence shifted from the initial estimate toward a new, more likely one. The magnitude of the Proactive BX interference effect was stable across both datasets, signaling that the Proactive variant generated a reduction in BX interference that was highly consistent across the two samples. By contrast, the magnitude of the Reactive BX interference effect was found to vary substantially across the datasets. Thus, one future direction would be to explore the reason(s) behind this discrepancy in the consistency of the Proactive and Reactive BX interference effects. The increased certainty in the original Proactive BX error interference estimates across these two datasets suggests that the effect should successfully generalize to new samples and contexts. Conversely, the contrasting magnitude of the Reactive BX interference effects could be due to either a lack of replicability or of generalizability. Interestingly, one meaningful difference between the two datasets was the order of experimental conditions. Within the 2018 dataset, the Reactive condition came last, after the Proactive condition, whereas in the 2020 dataset, the Reactive condition was performed before Proactive and immediately after Baseline. Consequently, it would be ideal to collect future datasets with the same or different condition orders to see whether this factor might serve as a significant moderating variable on Reactive BX error interference.
More generally, the sequential updating approach implemented in HBR can provide useful clues as to when it might be profitable to conduct additional replication studies, and how such studies might be designed in terms of the key factors to examine.</p>
</sec>
</sec>
<sec id="sec17">
<title>Evidence in favor of the null hypothesis: Sternberg</title>
<sec id="sec18">
<title>Background</title>
<p>In the Sternberg task, the novel positive (NP) effect in reaction times (RT) is of theoretical interest as a potential index of proactive control. The effect is defined as the difference in response time to make correct responses for NP trials across control modes. While low-load and high-load memory sets were used to bias participants toward or away from proactive control, each condition also included an equal number of critical trials (i.e., those with 5 items per memory set) that could be used to make direct comparisons across conditions. The key theoretical prediction was that, under proactive control, response decisions to probe items would be directly based on active and accessible WM representations (i.e., in the focus of attention), which would enable such decisions to be made more quickly than in other conditions, in which the relevant information had to be reactivated via a retrieval cue.</p>
<p>In <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, the NP effect was found to be statistically significant, but with unclear reliability in the Proactive versus Baseline contrast, while the effect for the Proactive versus Reactive contrast was numerically in the correct direction but not statistically significant. Here, we relied upon a key feature of HBR methods to explore these effects further, by directly assessing the evidence for each contrast, regarding both the alternative (Proactive &#x003C; Baseline NP RT) and null hypotheses (Proactive&#x202F;=&#x202F;Reactive NP RT) respectively.</p>
</sec>
<sec id="sec19">
<title>Analysis and results</title>
<p>To specifically test for the relative evidence in favor of each hypothesis, a Bayes Factor (BF) model comparison approach was utilized by running two types of models: one representing the null hypothesis, <italic>RT ~ 1&#x202F;+&#x202F;(1 | ID)</italic>, and the other representing the alternative hypothesis, <italic>RT ~ 1&#x202F;+&#x202F;mode + (1 | ID)</italic>. &#x2018;Mode&#x2019; was dummy coded with Baseline and Reactive as the reference conditions to show NP performance in the Proactive mode relative to other conditions. Since BF model comparison was employed to determine if the fixed effect &#x2018;mode&#x2019; had a strong impact on the NP RT results, we chose not to include &#x2018;mode&#x2019; as a random effect for these HBR models; therefore, only the intercept was entered as a random effect nested within subject to account for trial-level variability. Additionally, we employed the ROPE method as an alternative approach to assess evidence for the null hypothesis, assuming that an RT difference within a range of &#x2212;5 to 5 milliseconds would correspond to a negligible NP effect. Here we used an ex-Gaussian distribution to model RTs. As we discuss further for the Stroop task, the shifted log-normal distribution is also an excellent choice for modeling RTs; however, its transformation of the RT data into log-normal units makes it more challenging with regard to null hypothesis interpretability (i.e., which values should be used to define the ROPE). Both the conventional 89% and the more conservative 95% HDI thresholds were used to assess whether this choice affects the qualitative conclusions.</p>
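The ROPE decision rule can be sketched as follows, using the &#x2212;5 to 5 ms range adopted here and normal approximations built from the contrast estimates reported in this section (illustrative draws, not the fitted model's MCMC samples):

```python
import numpy as np

rng = np.random.default_rng(2)

def hdi(x, mass=0.89):
    """Shortest interval containing `mass` of the draws."""
    x = np.sort(x)
    n = int(np.floor(mass * len(x)))
    widths = x[n:] - x[:-n]
    i = np.argmin(widths)
    return x[i], x[i + n]

def rope_decision(draws, rope=(-5.0, 5.0), mass=0.89):
    """Classify an effect by where its HDI falls relative to the ROPE."""
    lo, hi = hdi(draws, mass)
    if hi < rope[0] or lo > rope[1]:
        return "reject null (HDI entirely outside ROPE)"
    if rope[0] <= lo and hi <= rope[1]:
        return "accept null (HDI entirely inside ROPE)"
    return "undecided (HDI overlaps ROPE boundary)"

# Normal approximations to the two contrasts reported in this section:
# Proactive vs. Baseline (beta = -19.92, se = 2.38) and
# Proactive vs. Reactive (beta = 0.73, se = 2.25).
strong = rng.normal(-19.92, 2.38, size=50_000)
null_ish = rng.normal(0.73, 2.25, size=50_000)
```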
<p>For this set of analyses, we focused on the 2020 HBR models which have the compounded benefits of incorporating informed priors to generate a cumulative posterior distribution, random effects to account for the hierarchical structure of the data, and an ex-Gaussian function to model its skewed properties. Even with these advantages, we did not find the NP RT effect to be a robust and general indicator of proactive control: although there were decisively faster responses for NP trials in Proactive versus Baseline (<inline-formula>
<mml:math id="M9">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> =&#x202F;&#x2212;&#x202F;19.92, se&#x202F;=&#x202F;2.38, 95% HDI&#x202F;=&#x202F;[&#x2212;24.57, &#x2212;15.26], pd.&#x202F;=&#x202F;100%), there was no evidence of an NP effect in the Proactive versus Reactive comparison (<inline-formula>
<mml:math id="M10">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> = 0.73, se&#x202F;=&#x202F;2.25, 95% HDI&#x202F;=&#x202F;[&#x2212;3.63, 5.19], pd.&#x202F;=&#x202F;62.67%).</p>
<p>In both cases, ROPE and BF model comparison tests were performed to illustrate how these methods can either provide converging evidence for the alternative hypothesis in the presence of a strong effect, or how they can instead show relative evidence for the null hypothesis when an effect is essentially zero. For the Proactive versus Baseline NP effect, the BF model comparison approach indicated decisive evidence for the alternative hypothesis (<italic>M<sub>1</sub></italic>), with a BF<sub>10</sub>&#x202F;=&#x202F;1.72 &#x00D7; 10<sup>8</sup>. In line with this result, the entirety of the 89 and 95% HDIs fell outside the ROPE region for this effect (see <xref ref-type="fig" rid="fig3">Figure 3A</xref>, for the 89% HDI result). In contrast, the Proactive versus Reactive NP effect exhibited strong evidence for the null hypothesis (<italic>M</italic><sub>0</sub>), both in terms of the computed Bayes Factor, which had a BF<sub>10</sub>&#x202F;=&#x202F;0.038, and the 89% HDI falling completely inside the ROPE range (<xref ref-type="fig" rid="fig3">Figure 3B</xref>). However, follow-up analyses with the 95% HDI indicated that a small portion of the posterior estimate did fall outside the ROPE range (i.e., 0.43%). While this more conservative threshold would technically indicate there is inconclusive evidence for <italic>M<sub>0</sub></italic> versus <italic>M<sub>1</sub></italic>, the default variant of this ROPE procedure combined with BF model comparison overall indicated strong evidence favoring <italic>M</italic><sub>0</sub> over <italic>M</italic><sub>1</sub>. 
Finally, a sensitivity analysis employing LOO-CV confirmed there was a meaningful Proactive &#x2013; Baseline NP RT effect (&#x2206;<sub>M1&#x2013;M0</sub>elpd<sub>LOO</sub>&#x202F;=&#x202F;&#x2212;20.4, &#x2206;<sub>M1&#x2013;M0</sub>SE<sub>LOO</sub>&#x202F;=&#x202F;4.8) but no evidence for a Proactive &#x2013; Reactive effect (&#x2206;<sub>M1&#x2013;M0</sub>elpd<sub>LOO</sub>&#x202F;=&#x202F;&#x2212;0.3, &#x2206;<sub>M1&#x2013;M0</sub>SE<sub>LOO</sub>&#x202F;=&#x202F;0.2). However, it is important to acknowledge that by itself, the LOO-CV metric does not directly quantify evidence for <italic>M</italic><sub>0</sub> in the latter case. Nevertheless, when considered together, the collective results demonstrate how the Bayesian framework provides various hypothesis testing methods, which can be used in a convergent manner to determine the degree of evidence favoring specific alternative or null hypotheses.</p>
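The LOO-CV differences above can be read against the common rule of thumb that an elpd difference is notable when it exceeds roughly twice its standard error; this heuristic, sketched below with the values reported in the text, is a convention rather than a formal test:

```python
def elpd_meaningful(delta_elpd, se, z=2.0):
    """Heuristic check: is a LOO-CV elpd difference large relative to its SE?

    The ~2-SE rule of thumb is a common convention for flagging a
    meaningful difference in predictive accuracy, not a formal test.
    """
    return abs(delta_elpd) > z * abs(se)

# Reported Sternberg contrasts (values taken from the text):
pro_vs_bas = elpd_meaningful(-20.4, 4.8)   # meaningful difference
pro_vs_rea = elpd_meaningful(-0.3, 0.2)    # not meaningful
```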
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>Sternberg WM analyses, focused on evaluating null effects. <bold>(A)</bold> The 89% highest density interval (HDI) of the posterior. The HDI is completely outside the pre-established region of practical equivalence (ROPE) range of &#x2212;5&#x202F;ms to 5&#x202F;ms, indicating strong evidence for the significance of this effect. <bold>(B)</bold> The 89% HDI of the posterior. The HDI is completely within the ROPE range, indicating strong evidence for a null effect.</p>
</caption>
<graphic xlink:href="fpsyg-17-1643463-g003.tif" mimetype="image" mime-subtype="tiff">
<alt-text content-type="machine-generated">Panel A is a density plot for &#x201C;Pro &#x2013; Bas NP RT effect&#x201D; showing posterior parameter values with the 89 percent highest density interval in red and 100 percent in cyan; the shaded region indicates the region of practical equivalence to zero. Panel B presents &#x201C;Pro &#x2013; Rea NP RT effect&#x201D; with the same intervals and equivalence region; the highest density interval is within the region of practical equivalence here, reflecting different parameter distributions. Both panels illustrate posterior distributions for comparative parameter analysis.</alt-text>
</graphic>
</fig>
</sec>
<sec id="sec20">
<title>Summary</title>
<p>The HBR analyses strengthened both conclusions reported in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref> regarding the NP RT effect. For the Proactive versus Baseline contrast, the HBR analyses confirmed that there was indeed very strong evidence for the reliability of the effect, both in terms of BF and ROPE methods. However, regarding the more theoretically and methodologically critical Proactive versus Reactive contrast, the HBR analyses even more strongly confirmed the presence of a null effect. The equivalence of Proactive and Reactive NP performance was already suggested in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, not only with RT patterns, but also with the results on NP error rates, which were in the opposite direction of theoretical predictions (and this pattern was confirmed with the current HBR analyses; see <xref ref-type="supplementary-material" rid="SM1">Supplementary materials</xref>). Yet, the hypothesis testing made possible through HBR analysis allowed the conclusion of a null effect in the Proactive versus Reactive comparison to be more strongly asserted beyond what was possible in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>.</p>
<p>In contrast with the null hypothesis significance testing (NHST) framework, which permits only a binary assertion regarding whether the null hypothesis can or cannot be rejected, Bayesian hypothesis testing enables more fine-grained statements regarding the strength of evidence for both the null and alternative hypotheses. This advantageous property of Bayesian frameworks has enabled the development of heuristic (i.e., post-hoc) approaches for applying BF metrics to data that were primarily analyzed with NHST approaches (e.g., as was done in <xref ref-type="bibr" rid="ref54">Tang et al., 2023</xref>, which did report BF indices for the NP RT and error measures). Yet it is important to note that some researchers prefer the ROPE approach as a complementary and potentially stronger metric for identifying effects that for all practical purposes can be treated as equivalent to zero. Estimation tests for ROPE are more directly aligned with the Bayesian than the NHST framework (given that they rely upon generation of a full posterior distribution); yet because of this feature, they cannot be computed in a post-hoc, heuristic manner (<xref ref-type="bibr" rid="ref26">Kruschke, 2014</xref>). Here, we took advantage of the fully-fledged HBR analyses, which enabled both BF and ROPE metrics to be computed.</p>
<p>Nevertheless, it is important to acknowledge that both metrics still inherit some of the same limitations associated with NHST, in that they also rely upon categorical criteria to make somewhat broad inferences about a tested hypothesis. Furthermore, the derivations that underlie NHST and Bayesian metrics can lead to diverging conclusions in the case of large sample sizes, which bias the former toward the alternative hypothesis and the latter toward the null (i.e., Lindley&#x2019;s paradox) (<xref ref-type="bibr" rid="ref9001">Lindley, 1957</xref>). However, we still believe that, when utilized together, our specific application of ROPE and BF model comparison not only aligns with previous conclusions, but also provides more granular inferences than those allowed by NHST. More specifically, these metrics provided convergent, and thus stronger, evidence in favor of a null Proactive versus Reactive NP effect (i.e., both BF&#x202F;&#x003C;&#x003C;&#x202F;1/10 and the entire 89% HDI falling within the ROPE)<xref ref-type="fn" rid="fn0008"><sup>8</sup></xref>. In particular, the cumulative results across samples and analytic approaches permit the strong conclusion that, despite the goal of the DMCC project to validate a robust indicator of proactive control for the Sternberg task, this goal was not accomplished. Future work should consider task design alterations that have greater potential to elicit unambiguous behavioral markers of proactive control.</p>
</sec>
</sec>
<sec id="sec21">
<title>Precise modeling of RT distributions: Stroop</title>
<sec id="sec22">
<title>Background</title>
<p>In the Stroop task, the congruency cost is of theoretical interest as a potential index of proactive control engagement. In particular, the deployment of proactive control is thought to bring not only performance benefits on high-demand incongruent trials, but also performance costs on congruent trials. This cost&#x2013;benefit pattern should be most pertinent when contrasting Proactive and Reactive conditions, given that proactive control is postulated to involve preparatory mechanisms that enable biasing away from the &#x2018;word&#x2019; dimension across all item types, whereas reactive control biasing is thought to occur only after detection of the relevant color feature and the presence of incongruence, so should not impact congruent items. Further, this key theoretical prediction is most stringently tested on diagnostic/PC-50 items.</p>
<p>Notably, the theoretical prediction regarding congruency cost was not confirmed in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>. Specifically, in that analysis, the congruency cost was found to be statistically unreliable. However, the unreliability of the congruency cost effect reported in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref> also contrasts with its observed robustness across multiple other companion papers (<xref ref-type="bibr" rid="ref16">Gonthier et al., 2016a</xref>, <xref ref-type="bibr" rid="ref17">2016b</xref>; <xref ref-type="bibr" rid="ref7">Bugg, 2017</xref>; <xref ref-type="bibr" rid="ref22">Ileri-Tayar et al., 2025</xref>). Since the congruency cost is a small effect (~10&#x202F;ms), and thus potentially sensitive to both measurement error and incorrect assumptions regarding data structure, an important question arises as to whether modeling the specific properties of the sample RT distribution impacts the identification of congruency cost effects, in terms of sensitivity and reliability. The flexibility of HBR models to assume both a hierarchical structure and a non-Gaussian distribution of the data may therefore make them a useful approach for elucidating the presence and strength of the congruency cost in these samples. We thus explored the robustness of the congruency cost metric as a function of the RT distribution employed in the analysis.</p>
</sec>
<sec id="sec23">
<title>Analysis and results</title>
<p>To evaluate the congruency cost, the RT data were first modeled using a shifted log-normal distribution to test for differences between the Proactive and Reactive control modes, restricting analyses to diagnostic congruent trials with correct responses. The following model was used for estimation: <italic>RT ~ 1&#x202F;+&#x202F;mode + (1&#x202F;+&#x202F;mode | ID)</italic>. Mode was dummy coded with Reactive as the reference category to demonstrate the relative congruency cost (i.e., increase in reaction time) of proactive control for congruent diagnostic items. The intercept and mode were entered as random effects nested within subject, so that the congruency cost was also treated as a random effect.</p>
<p>In contrast to the results found in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, but in line with theoretical predictions and other reported findings, the congruency cost was reliably present in the 2018 dataset when analyzed with an HBR model that assumed a shifted log-normal distribution for RTs (<inline-formula>
<mml:math id="M11">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> = 0.02, se&#x202F;=&#x202F;0.01, 95% HDI&#x202F;=&#x202F;[0, 0.05], pd.&#x202F;=&#x202F;98.99%). Conversely, when this same effect was modeled with a conventional Gaussian distribution, the effect was not statistically reliable (<inline-formula>
<mml:math id="M12">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> = 8.92, se&#x202F;=&#x202F;5.37, 95% HDI&#x202F;=&#x202F;[&#x2212;1.81, 19.27], pd.&#x202F;=&#x202F;95.09%). These contrasting patterns directly demonstrate that the choice of distribution can affect the inferences made about a small effect like the congruency cost.</p>
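The two estimates above are on different scales: a shifted log-normal coefficient is in log units and acts multiplicatively on the post-shift component of RT, so beta = 0.02 implies roughly a 2% slowdown of that component (exp(0.02) &#x2248; 1.02). The simulation below, with hypothetical shift, location, and scale parameters (not the fitted values), shows how a log-scale coefficient of this size can translate into a small millisecond-scale cost:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of a shifted log-normal RT model (hypothetical parameters):
# RT = shift + exp(mu + beta * mode + sigma * eps), with eps ~ N(0, 1).
shift, mu, sigma, beta = 300.0, 6.0, 0.3, 0.02

eps = rng.standard_normal(200_000)
rt_reactive = shift + np.exp(mu + sigma * eps)          # mode = 0
rt_proactive = shift + np.exp(mu + beta + sigma * eps)  # mode = 1

# beta acts multiplicatively on the post-shift component, so the implied
# cost in milliseconds scales with the size of that component.
mean_cost_ms = rt_proactive.mean() - rt_reactive.mean()
```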
<p>To more fully understand this finding, we next sought to isolate which properties of the shifted log-normal distribution were essential for appropriately modeling this effect. To do so, the congruency cost was analyzed with a third distributional function: the ex-Gaussian. This distribution also captures the positive skew of response time data and additionally has a long history of work illustrating its increased sensitivity in modeling Stroop RT patterns (<xref ref-type="bibr" rid="ref20">Heathcote et al., 1991</xref>). If capturing both the bulk of fast response times and the long tail of slow response times is necessary to accurately model the congruency cost, then the HBR model assuming an ex-Gaussian distribution should match the qualitative finding of the model using a shifted log-normal, but not Gaussian, function. Indeed, this prediction was confirmed, as a model assuming an ex-Gaussian distribution also found a statistically reliable congruency cost (<inline-formula>
<mml:math id="M13">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> = 9.87, se&#x202F;=&#x202F;3.03, 95% HDI&#x202F;=&#x202F;[3.98, 15.8], pd.&#x202F;=&#x202F;99.94%).</p>
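The ex-Gaussian's suitability follows from its construction: it is the sum of a normal component (capturing the fast bulk) and an exponential component (capturing the slow tail). The simulation below, with hypothetical parameters, contrasts its positive skew against a Gaussian matched on mean and variance:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical ex-Gaussian parameters, chosen only to illustrate the shape:
# mu/sigma for the normal component, tau for the exponential tail.
mu, sigma, tau, n = 500.0, 50.0, 150.0, 200_000
ex_gauss = rng.normal(mu, sigma, n) + rng.exponential(tau, n)

# A Gaussian with the same mean and variance, for comparison.
gauss = rng.normal(mu + tau, np.sqrt(sigma**2 + tau**2), n)

def skewness(x):
    """Sample skewness (third standardized moment)."""
    z = (x - x.mean()) / x.std()
    return (z**3).mean()
```

A symmetric Gaussian with matched moments has skewness near zero, whereas the ex-Gaussian retains the rightward skew typical of empirical RT distributions; this is the distributional property that the Gaussian likelihood fails to capture in the PPCs below.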
<p>A quick visual inspection of the posterior predictive checks (PPCs) for each of these models provides clear evidence that the shifted log-normal and ex-Gaussian distributions produce an improved fit to the observed RT data, relative to the Gaussian distribution (<xref ref-type="fig" rid="fig4">Figures 4A</xref>&#x2013;<xref ref-type="fig" rid="fig4">C</xref>). However, the difference in predictive accuracy between the ex-Gaussian and shifted log-normal is less clear with this approach; we therefore next sought to more directly quantify the relative predictive accuracy of each distributional function in modeling the congruency cost. Through BF and LOO-CV model comparisons, we defined the ex-Gaussian model as <italic>M</italic><sub>0</sub>, the shifted log-normal model as <italic>M</italic><sub>1</sub>, and the Gaussian model as <italic>M</italic><sub>2</sub>. The BF model comparison metric showed decisive evidence favoring the shifted log-normal (<italic>M</italic><sub>1</sub>) over the ex-Gaussian (<italic>M</italic><sub>0</sub>) (i.e., BF<sub>10</sub>&#x202F;=&#x202F;7.4 &#x00D7; 10<sup>24</sup>, or BF<sub>01</sub>&#x202F;=&#x202F;1.35 &#x00D7; 10<sup>&#x2212;25</sup>), while the Gaussian model had the worst fit (<italic>M</italic><sub>0</sub> / <italic>M</italic><sub>2</sub>; BF<sub>20</sub>&#x202F;=&#x202F;0, or BF<sub>02</sub>&#x202F;=&#x202F;Inf). However, LOO-CV model comparison revealed that the difference in predictive accuracy between the shifted log-normal and ex-Gaussian was not meaningful (&#x2206;<sub>M1&#x2013;M0</sub>elpd<sub>LOO</sub>&#x202F;=&#x202F;&#x2212;66.9, &#x2206;<sub>M1&#x2013;M0</sub>SE<sub>LOO</sub>&#x202F;=&#x202F;85.8), although the Gaussian was clearly worse than the shifted log-normal (&#x2206;<sub>M1&#x2013;M2</sub>elpd<sub>LOO</sub>&#x202F;=&#x202F;&#x2212;7563.9, &#x2206;<sub>M1&#x2013;M2</sub>SE<sub>LOO</sub>&#x202F;=&#x202F;338.3).</p>
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption>
<p>Stroop analyses, focused on modeling RT distributions. <bold>(A)</bold> The 2018 PPCs from a shifted lognormal distribution for RT show that the simulated data from this model consistently fits well onto the actual data. <bold>(B)</bold> The 2018 PPCs from a Gaussian distribution show that the simulated data from this model does not fit well onto the actual data in this case (i.e., the simulated data from a Gaussian function is more symmetric with a lower peak than the actual data). <bold>(C)</bold> The 2018 PPCs from an ex-Gaussian distribution show that the simulated data from this model consistently fits well onto the data. <bold>(D)</bold> The 2020 PPCs from a shifted lognormal distribution for RT show that the simulated data from this model consistently fits well onto the actual data. <bold>(E)</bold> The 2020 PPCs from a Gaussian distribution show that the simulated data from this model does not fit well onto the actual data in this case (i.e., the simulated data from a Gaussian function is more symmetric with a lower peak than the actual data). <bold>(F)</bold> The 2020 PPCs from an ex-Gaussian distribution show that the simulated data from this model consistently fits well onto the data.</p>
</caption>
<graphic xlink:href="fpsyg-17-1643463-g004.tif" mimetype="image" mime-subtype="tiff">
<alt-text content-type="machine-generated">Six-panel figure showing posterior predictive checks of Pro&#x2013;Rea Congruency Cost distributions for 2018 and 2020 using three fitting approaches: shifted log-normal (A, D), Gaussian (B, E), and ex-Gaussian (C, F). Each panel compares actual data (black) and simulated data (blue) with closely aligned lines, indicating model fit quality for each distribution and year.</alt-text>
</graphic>
</fig>
<p>Given the discrepancy between the BF and LOO-CV model comparison conclusions, we next sought to investigate how the incorporation of additional information, from the 2020 sample, would impact the results. Interestingly, while the same quantitative and qualitative conclusions regarding the congruency cost were drawn from the cumulative 2020 HBR models that assumed either a shifted log-normal (<inline-formula>
<mml:math id="M14">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> = 0.03, se&#x202F;=&#x202F;0.01, 95% HDI&#x202F;=&#x202F;[0.01, 0.04], pd.&#x202F;=&#x202F;99.97%) or ex-Gaussian (<inline-formula>
<mml:math id="M15">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> = 11.11, se&#x202F;=&#x202F;2.45, 95% HDI&#x202F;=&#x202F;[6.22, 15.77], pd.&#x202F;=&#x202F;100%) distribution, the 2020 HBR model with a Gaussian function also showed strong evidence for this effect (<inline-formula>
<mml:math id="M16">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> = 11.55, se&#x202F;=&#x202F;3.88, 95% HDI&#x202F;=&#x202F;[3.98, 19.21], pd.&#x202F;=&#x202F;99.83%), aligning with the qualitative conclusions of the other models. However, the PPCs for the 2020 HBR models again demonstrated the same pattern as before, with increased predictive accuracy for the skewed distributions (<xref ref-type="fig" rid="fig4">Figures 4D</xref>&#x2013;<xref ref-type="fig" rid="fig4">F</xref>). Furthermore, we once again ran BF and LOO-CV model comparisons. Despite the inclusion of additional data, we found the same contrasting pattern of results across model comparison metrics, in which the BF measure favored the shifted log-normal over the ex-Gaussian (i.e., BF<sub>10</sub>&#x202F;=&#x202F;8.4 &#x00D7; 10<sup>21</sup>), while the LOO-CV indicated no difference in predictive accuracy between the two distributions (&#x2206;<sub>M1&#x2013;M0</sub>elpd<sub>LOO</sub>&#x202F;=&#x202F;&#x2212;36.4, &#x2206;<sub>M1&#x2013;M0</sub>SE<sub>LOO</sub>&#x202F;=&#x202F;58.3), but both metrics again showed that the skewed distributions were superior to the Gaussian (BF<sub>20</sub>&#x202F;=&#x202F;0; &#x2206;<sub>M1&#x2013;M2</sub>elpd<sub>LOO</sub>&#x202F;=&#x202F;&#x2212;4079.3, &#x2206;<sub>M1&#x2013;M2</sub>SE<sub>LOO</sub>&#x202F;=&#x202F;226.9). This consistent discrepancy regarding the differential benefits of the shifted log-normal and ex-Gaussian likelihood functions reinforces the value of applying multiple analytic approaches to avoid drawing potentially premature conclusions. More importantly for the conceptual point of this section, both skewed distributions reliably identified the congruency cost within our data.</p>
</sec>
<sec id="sec24">
<title>Summary</title>
<p>The results provide clear evidence that the identification of a subtle RT effect, such as the congruency cost, can be impacted by the distribution chosen to model it. With HBR modeling approaches, it is quite easy to select from a range of possible distributions, and then directly compare their performance on the available data via posterior predictive checks (PPCs) as well as BF and/or LOO-CV model comparisons. While such metrics may not parse apart subtler differences in predictive accuracy between distributions that already capture the general properties of the data, the results from all model comparison metrics consistently showed that the shifted log-normal and ex-Gaussian functions were both superior to the Gaussian distribution in reliably identifying the congruency cost. However, we also demonstrated that even the advantages garnered from skewed distributions versus the standard Gaussian distribution were influenced by the inclusion of both datasets.</p>
<p>At first blush, the findings related to the inclusion of both datasets may seem to indicate diminishing benefits to the use of more accurate distributions for RT data. Yet a more important interpretation relates to the efficiency and flexibility of the HBR framework. Specifically, the use of more appropriate distributions, such as the shifted log-normal or ex-Gaussian, may reliably detect RT effects of interest with less data than a conventional Gaussian model requires. The degree of efficiency is likely to be moderated by the properties of each dataset. As a result, we advocate for a routine practice of directly comparing multiple likelihood functions to assess their relative predictive accuracy for the data at hand.</p>
<p>For this purpose, PPCs allow for a quick visual comparison of different likelihood functions on the observed data, while BF/LOO-CV model comparison approaches can be applied as computationally intensive, but more quantitatively direct, metrics to determine the best-fitting model. Critically however, these recommendations do not imply that predictive accuracy should be the decisive factor when selecting among different likelihood functions. Instead, likelihood functions can be selected and investigated in a flexible manner, to address various methodological goals. For example, it has been argued that a key advantage of the shifted log-normal function relates to its clear mechanistic interpretability (i.e., subject-level RT distributions can be decomposed into a non-decision-time component and an evidence-accumulation process that produces the rightward skew) (<xref ref-type="bibr" rid="ref19">Haines et al., 2025</xref>). Our goal here was primarily to illustrate the simultaneous predictive and theoretical limitations that come with fitting Gaussian distributions onto RT data. In addition to this more universal point however, these collective findings also practically demonstrate how the additional power gained by alternative likelihood functions can appropriately capture the congruency cost as a reliable metric of proactive control.</p>
</sec>
</sec>
<sec id="sec25">
<title>Precise modeling of error distributions: cued-TS</title>
<sec id="sec26">
<title>Background</title>
<p>Within the Cued-TS paradigm, a primary indicator of interest for reactive control was the task-rule congruency effect (TRCE) for errors. The inclusion of punishment cues, presented shortly before stimulus presentation and paired with high-conflict items, was hypothesized to elicit late-stage reactive control, corresponding to enhanced performance on incongruent trials in the Reactive mode, relative to the Baseline and Proactive modes. Since a strong association should form between incongruence and punishment, the performance benefits were expected to extend even to non-incentivized incongruent trials, based on the detection of perceived conflict. This improved performance on incongruent trials was in turn expected to cause a relative reduction in the TRCE error effect within the Reactive mode. Initial evidence for this Reactive TRCE error effect was reported in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>. However, the task design for the Reactive variant was novel at the time, so no prior findings were available for comparison. In such cases, the HBR approach is especially useful, either to provide further validation of initial findings from novel task manipulations, or to challenge them.</p>
<p>When considering binary outcome variables, such as a correct or error response, the use of a logistic function operationalizes each observation in terms of the underlying probability that one outcome versus the other will occur. This statistical approach not only circumvents the distortion of results that can arise when making incorrect modeling assumptions with proportion data, but it also enables direct consideration of trial-level variability within each subject when applied through a hierarchical model. The trial-level granularity achieved by hierarchical logistic models cannot be assessed through the conventional use of error rates, which instead rely upon condition-based averages that omit such information. Indeed, this access to trial-level data allows for the effective modeling of more complex random effects within each participant. In other words, an error rate model will be more likely than a logistic regression model to run into convergence issues while modeling interaction terms that vary by subject. To effectively illustrate the specific advantages of modeling accuracy data as log odds versus error rates, both the logistic regression and error rate models were matched to include the same simple random effects (see <xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref> Results, for qualitatively similar results from simultaneously modeling the TRCE error effect with a logistic function and maximal random effect structure). Here we illustrate how HBR models with a logistic function and a hierarchical structure (i.e., trials nested within conditions, further nested within individuals) can provide a more rigorous test of the Reactive TRCE effect than what was possible through the original analyses provided by <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>.</p>
</sec>
<sec id="sec27">
<title>Analysis and results</title>
<p>The task-rule congruency effect (TRCE) for errors was modeled as the relative log-odds of making an error on incongruent trials versus congruent ones, with the following model estimating the difference in this effect across modes: <italic>Correct ~ 1&#x202F;+&#x202F;con.id&#x002A;mode + (1&#x202F;+&#x202F;con.id | ID)</italic>. The outcome variable &#x2018;Correct&#x2019; was coded so that the model predicts the log odds of making an error (i.e., correct&#x202F;=&#x202F;0; incorrect&#x202F;=&#x202F;1). The predictor &#x2018;con.id&#x2019; was dummy coded to assess the difference in performance between incongruent and congruent trials (i.e., the TRCE effect). The predictor &#x2018;mode&#x2019; was dummy coded, with separate models using either Baseline or Proactive as the reference category to test the relative effect of the Reactive mode on the log odds of making an error. Finally, the interaction between &#x2018;mode&#x2019; and &#x2018;con.id&#x2019; reflects the key parameter of interest: the difference in the TRCE error effect between modes. Both the intercept and con.id were treated as random effects nested within subjects, to model the TRCE error as a random effect. Only non-incentivized trials were selected for analyses, to allow for direct comparisons of the Reactive mode versus Baseline and Proactive. Furthermore, only biased (i.e., mostly congruent) trials were included, since they were expected to elicit the largest TRCE error across modes. The key hypothesis was that the TRCE error would be reduced in the Reactive mode versus the Baseline and Proactive modes.</p>
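<p>The role of each dummy-coded term can be made concrete with a small Python sketch (the cell-level error probabilities below are hypothetical values chosen purely for illustration, not our estimates):</p>

```python
import math

def logit(p):
    """Log odds of probability p."""
    return math.log(p / (1 - p))

# Hypothetical cell-level error probabilities (illustration only).
p = {("baseline", "congruent"): 0.02,
     ("baseline", "incongruent"): 0.10,
     ("reactive", "congruent"): 0.03,
     ("reactive", "incongruent"): 0.12}

# With dummy coding, the coefficients of Correct ~ 1 + con.id*mode
# map directly onto cell log odds:
b0 = logit(p[("baseline", "congruent")])             # intercept (reference cell)
b_con = logit(p[("baseline", "incongruent")]) - b0   # TRCE in the reference mode
b_mode = logit(p[("reactive", "congruent")]) - b0    # mode shift on congruent trials
# The interaction is the difference in TRCE between modes (the key parameter):
b_int = (logit(p[("reactive", "incongruent")])
         - logit(p[("reactive", "congruent")])) - b_con
```

<p>Summing all four coefficients recovers the log odds of an error in the non-reference cell, which is exactly how the fitted model reconstructs each condition mean.</p>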
<p>In contrast to these predictions, however, the 2018 HBR models indicated that the TRCE error was actually increased, rather than decreased, in the Reactive as compared to the Baseline mode (<inline-formula>
<mml:math id="M17">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> = 0.37, se&#x202F;=&#x202F;0.11, 95% HDI&#x202F;=&#x202F;[0.15, 0.58], pd&#x202F;=&#x202F;99.98%) (<xref ref-type="fig" rid="fig5">Figure 5A</xref>); furthermore, there was little evidence for a difference between the Reactive and Proactive modes (<inline-formula>
<mml:math id="M18">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> = 0.14, se&#x202F;=&#x202F;0.11, 95% HDI&#x202F;=&#x202F;[&#x2212;0.08, 0.35], pd&#x202F;=&#x202F;89.62%) (<xref ref-type="fig" rid="fig5">Figure 5B</xref>). These findings were qualitatively distinct from those reported in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>. Specifically, in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, it was reported that the TRCE error was reduced in Reactive relative to both Baseline and Proactive, with the latter contrast showing high statistical reliability. Given this discrepancy in findings, we were interested in testing whether the logistic function provided a more accurate way to model the TRCE error.</p>
<fig position="float" id="fig5">
<label>Figure 5</label>
<caption>
<p>Cued-TS analyses, focused on modeling accuracy distributions. To generate these means, we removed the intercept from all models. <bold>(A)</bold> Marginal means of the reactive and baseline cued-TS errors in log odds units. There are significant main effects of mode and congruency as well as a positive interaction effect (opposite both to theoretical predictions and to the findings of <xref ref-type="bibr" rid="ref54">Tang et al., 2023</xref>). <bold>(B)</bold> Marginal means of the reactive and proactive cued-TS errors in log odds units. There are significant main effects of mode and congruency but no significant interaction effect (also differing from the findings of <xref ref-type="bibr" rid="ref54">Tang et al., 2023</xref>). <bold>(C)</bold> Marginal means of the reactive and baseline cued-TS errors in error rate units. There are significant main effects of mode and congruency as well as a negative interaction effect. <bold>(D)</bold> Marginal means of the reactive and proactive cued-TS errors in error rate units. There are significant main effects of mode and congruency as well as a negative interaction effect.</p>
</caption>
<graphic xlink:href="fpsyg-17-1643463-g005.tif" mimetype="image" mime-subtype="tiff">
<alt-text content-type="machine-generated">Four-panel figure displaying marginal means of cued-task-switching errors. Panels A and B show log odds errors; A compares baseline versus reactive and B compares proactive versus reactive, each for congruent and incongruent conditions. Panels C and D present corresponding error rates. Congruent and incongruent conditions are color-coded red and blue, respectively. Each panel includes error bars and axis labels for condition and error metric.</alt-text>
</graphic>
</fig>
<p>Rather than summarizing the error rates as proportions and incorrectly assuming a Gaussian distribution, logistic regression directly models the probability of an error through a Bernoulli distribution, transforming the data into log odds so that the function can more sensitively capture differences between low and high probabilities. While standard approaches assume a linear relationship between predictors and the outcome variable, probabilities typically follow a sigmoidal function, in which the magnitude of probability changes depends on their initial values (i.e., the largest absolute change will occur around <italic>p</italic> =&#x202F;0.5, and the smallest changes will occur near the extremes, due to the bounded nature of probabilities). By utilizing a logit link to transform the data onto a log-odds scale, logistic regression appropriately amplifies differences between conditions when the reference group is close to floor/ceiling. Since a ceiling effect particularly applies to congruent trials across modes versus incongruent trials, this transformation provides a potential explanation for why the current results were in the opposite direction of initial findings. To test this interpretation, the results were re-analyzed using error rates rather than log odds as the outcome variable. Indeed, when running an HBR model on the 2018 data, but with error rates, a pattern consistent with <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref> is observed, with a strong reduction in the TRCE effect for Reactive versus both Baseline (<inline-formula>
<mml:math id="M19">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> =&#x202F;&#x2212;&#x202F;0.02, se&#x202F;=&#x202F;0.01, 95% HDI&#x202F;=&#x202F;[&#x2212;0.03, 0], pd&#x202F;=&#x202F;99.56%) (see <xref ref-type="fig" rid="fig5">Figure 5C</xref>) and Proactive (<inline-formula>
<mml:math id="M20">
<mml:mi>&#x03B2;</mml:mi>
</mml:math>
</inline-formula> =&#x202F;&#x2212;&#x202F;0.06, se&#x202F;=&#x202F;0.01, 95% HDI&#x202F;=&#x202F;[&#x2212;0.07, &#x2212;0.04], pd&#x202F;=&#x202F;100%) modes (see <xref ref-type="fig" rid="fig5">Figure 5D</xref>).</p>
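<p>The floor/ceiling sensitivity of the logit link can be verified directly with a short Python sketch (the probabilities are arbitrary illustrative values): the same two-percentage-point difference in error rates corresponds to a much larger log-odds difference near the floor than in the mid-range:</p>

```python
import math

def logit(p):
    """Log odds of probability p."""
    return math.log(p / (1 - p))

# Identical 0.02 differences in error rate, at different baselines:
near_floor = logit(0.03) - logit(0.01)   # large log-odds difference (~1.12)
mid_range = logit(0.52) - logit(0.50)    # small log-odds difference (~0.08)
```

<p>This is why differences on very-low-error congruent trials, nearly invisible on the raw error-rate scale, can dominate the interaction when modeled as log odds.</p>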
</sec>
<sec id="sec28">
<title>Summary</title>
<p>The Cued-TS findings provide a clear demonstration that initial conclusions regarding effects from novel paradigms may not be consistent across different analytic approaches and metrics, thus highlighting the importance of carefully selecting statistical methods that lead to accurate and precise results. More specifically, the analyses modeling the difference in TRCE error between modes found only very weak evidence for a reduced Reactive effect (relative to Proactive) when utilizing a logistic function. Although these results conflict with previous conclusions drawn by <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>, greater confidence can be placed in the validity of the results obtained from the HBR analysis, due to the now well-established benefits of hierarchical logistic regression models (<xref ref-type="bibr" rid="ref24">Jaeger, 2008</xref>; <xref ref-type="bibr" rid="ref9">Dixon, 2008</xref>; <xref ref-type="bibr" rid="ref21">Houpt and Bittner, 2018</xref>).</p>
<p>However, it is also important to clarify that these findings are not in strong contradiction with the theoretical claims of the DMC framework, regarding benefits of Reactive control for Cued-TS performance. Indeed, when restricting analyses to just the incongruent trials, there were clear observed performance benefits present in the Reactive mode, regardless of whether the outcome variable was modeled in terms of error rates or the log odds of making an error (see <xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref> for additional analyses on the simple and main effects of condition on accuracy performance). The primary distinctions between the models were instead on congruent trials, which had very low overall error rates (i.e., less than 5%). As a result, differences between control modes for these trials were not reliably detected in the models using summary error rate measures but were consistently detected when modeled in terms of log odds. Since hierarchical logistic regression models are still not the norm for analyses of error rates in cognitive control tasks, it is possible that other effects involving low error rate conditions (e.g., congruent trials) have also been missed in prior studies. Consequently, an important direction for future work would be to better understand the circumstances in which task manipulations to the Cued-TS paradigm, such as that of the reactive control variant, might lead to performance improvements for both congruent and incongruent items.</p>
</sec>
</sec>
</sec>
<sec sec-type="discussion" id="sec29">
<title>Discussion</title>
<p>The goal of this paper was to demonstrate how Hierarchical Bayesian Regression (HBR) models could be utilized to provide more detailed and valid statistical inferences, in terms of both parameter estimation and hypothesis testing, using the DMCC task battery as a case study example. In particular, we leveraged the following advantageous features of the HBR approach, to build upon prior findings with the DMCC dataset, initially reported in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>: (1) the combined implementation of a sequential updating procedure and SDR metric to assess the replicability and consistency of the reduced Proactive and Reactive BX error interference effects; (2) the utilization of multiple null-hypothesis metrics (i.e., ROPE and BF model comparison) to test the relative strength of evidence against the Proactive NP RT effect (at least in contrast to Reactive); (3) the library of available RT likelihood functions (e.g., ex-Gaussian and shifted log-normal) to more accurately model the Proactive congruency cost; and (4) the application of hierarchical logistic regression to appropriately model the reduced Reactive TRCE error effect. For each of these effects, more refined quantitative and/or qualitative conclusions could be drawn regarding the corresponding DMCC task. Despite these clear benefits, a full assessment of the HBR approach also warrants an acknowledgement of its potential limitations, to which we turn next.</p>
<sec id="sec30">
<title>Limitations</title>
<p>While it is beyond the scope of this paper to compare other possible analytic approaches with our methods, we address here one clear alternative for how analyses comparing two independent samples might have been performed. In contrast to the sequential updating procedure used throughout this paper, the two available datasets could have been directly aggregated to perform a type of mega-analysis (<xref ref-type="bibr" rid="ref12">Eisenhauer, 2021</xref>). The direct integration of two datasets and inclusion of &#x2018;sample&#x2019; as an additional covariate (i.e., to compare the 2018 and 2020 samples) would enable a cumulative estimate to be computed, while also enabling detection of significant differences between the two samples. Consequently, it could be argued that the use of data-informative priors did not add anything meaningful to our results. Yet we believe the sequential updating procedure provided a more meaningful inference, regarding whether new information would significantly modify the original conclusions drawn regarding a particular estimate (i.e., we are not comparing the estimate between two datasets, but rather the estimate before and after the integration of new information), such as those reported for the BX error interference effect in <xref ref-type="bibr" rid="ref54">Tang et al. (2023)</xref>. Furthermore, since the original and cumulative estimates could be operationalized in terms of prior and posterior distributions, more specific hypotheses could be made regarding the relative probability of values within each distribution as well as the relative probability of values between the two distributions (<xref ref-type="bibr" rid="ref58">Verhagen and Wagenmakers, 2014</xref>; <xref ref-type="bibr" rid="ref63">Wagenmakers et al., 2016</xref>; <xref ref-type="bibr" rid="ref33">Ly et al., 2019</xref>; <xref ref-type="bibr" rid="ref29">Lin et al., 2024</xref>).</p>
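<p>For a simple conjugate case, the relation between the two strategies can be made concrete. In the following Python sketch (a normal model with known observation variance and purely illustrative numbers, far simpler than our HBR models), using the posterior from a first batch as the data-informative prior for a second batch recovers exactly the posterior obtained by aggregating all data under the original vague prior:</p>

```python
def update(prior_mean, prior_var, data, obs_var):
    """Conjugate normal update with known observation variance:
    precisions add, and the posterior mean is precision-weighted."""
    n = len(data)
    xbar = sum(data) / n
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + n * xbar / obs_var)
    return post_mean, post_var

batch_2018 = [0.42, 0.35, 0.51, 0.38, 0.47]  # illustrative effect values
batch_2020 = [0.31, 0.44, 0.29, 0.40]
obs_var = 0.04

# Sequential: the 2018 posterior becomes the data-informative prior for 2020
m1, v1 = update(0.0, 100.0, batch_2018, obs_var)
m_seq, v_seq = update(m1, v1, batch_2020, obs_var)

# Aggregated ("mega-analysis"): all data at once under the original vague prior
m_all, v_all = update(0.0, 100.0, batch_2018 + batch_2020, obs_var)
```

<p>In hierarchical models the equivalence is only approximate, because the posterior summaries used as priors are proxies for the full joint posterior, which is precisely the limitation discussed below; but the sketch shows why sequential updating is a principled route to cumulative estimates.</p>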
<p>Yet it is important to acknowledge a clear limitation of sequential updating, in that the data-informative priors serve as proxy metrics that may not fully represent the attributes of the original dataset (i.e., directly combining the datasets may lead to more accurate cumulative estimates). Conversely, a standard aggregation approach not only loses the advantages associated with sequential updating, but it is inherently restricted to contexts where one has direct access to the available data as well as sufficient time and CPU memory/power to continuously aggregate new datasets (<xref ref-type="bibr" rid="ref41">Oravecz et al., 2016</xref>). The benefits of informative priors come through their simultaneous efficiency and flexibility, which enable HBR models to not only have a stronger starting point from which to generate appropriate estimates, but also allow for the inclusion of more limited external information (e.g., fixed/random estimates from previous models). While care should be taken that these priors do not bias the results, with prior sensitivity analyses checking the degree to which the prior specifications may cause different outcomes, the ability to use data-informative priors should ultimately be considered as a clear strength of HBR models.</p>
<p>In our case, we took a conservative approach, only utilizing data-informed priors in the sequential updating analyses testing for inter-sample reliability. However, future analyses should more fully consider the use of weakly informative priors to enable more efficient and tractable parameter estimation. Prior predictive checks can be used to carefully select a set of intermediate priors that fall between default uninformative priors and model-based ones from previous samples (i.e., vaguely informative priors that are more widely dispersed than the latter but still more constrained than the former). The selection of such moderate priors not only leads to fewer convergence issues for more complex models but can also significantly speed up their runtime (<xref ref-type="bibr" rid="ref48">Schad et al., 2021</xref>).</p>
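<p>A prior predictive check of this kind can be sketched in a few lines of Python (the priors and the shifted log-normal parameterization below are illustrative placeholders, not the specifications used in our models): draw parameters from the candidate priors, simulate RTs, and check that the implied data stay in a plausible range:</p>

```python
import random

random.seed(7)

def prior_predictive(n_sims=1000):
    """Simulate RTs (in seconds) from a shifted log-normal model whose
    parameters are drawn from candidate weakly informative priors."""
    rts = []
    for _ in range(n_sims):
        shift = random.uniform(0.1, 0.3)             # non-decision time (s)
        mu = random.gauss(-0.5, 0.25)                # log-scale location
        sigma = abs(random.gauss(0.0, 0.3)) + 1e-6   # log-scale spread (> 0)
        rts.append(shift + random.lognormvariate(mu, sigma))
    return rts

sim = prior_predictive()
all_positive = all(rt > 0 for rt in sim)
frac_plausible = sum(0.2 < rt < 3.0 for rt in sim) / len(sim)
```

<p>If most prior-predictive RTs fell outside a plausible behavioral range, the priors would be tightened or re-centered before fitting; in brms the same check is typically run with <italic>sample_prior = "only"</italic>.</p>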
<p>The case-study results described here highlight advantages of the HBR approach. However, these advantages are also balanced by important tradeoffs in computation time, particularly when modeling different underlying data structures (e.g., multiple sources of variability, non-linear relationships between variables and structures, and non-Gaussian likelihood functions). In our case, we were able to conduct a comprehensive set of HBR analyses on the DMCC task battery by making use of a university-wide computing cluster (i.e., the &#x2018;Center for High Performance Computing&#x2019;), which provided additional CPU resources to both speed up individual model runtime and allow multiple models to be run in parallel. Additionally, the more sophisticated computing interface &#x2018;cmdstanr&#x2019; can utilize within-chain parallelization for each model (i.e., assigning multiple cores for each chain to reduce their execution time). Nevertheless, the inherent computational challenges of HBR models indicate the importance of acquiring additional resources for large-scale projects and following best practices to efficiently implement them (e.g., using weak rather than uninformative priors, prioritizing the most relevant random effects to model).</p>
</sec>
<sec id="sec31">
<title>General recommendations</title>
<p>For the interested researcher who wishes to switch away from conventional statistical tests to an HBR modeling approach, we recommend a gradual transition. By first understanding the different components of HBR and the benefits to be gained from each, one can flexibly utilize models of varying complexities to address the specific needs of a project. Through the use of DMCC case study examples, we sought to make the overarching point that conventional statistical approaches like t-tests and ANOVAs are not optimized to address the kinds of data that are routine within experimental psychology research, such as non-Gaussian outcome variables (i.e., accuracy, RT) and trial-level data nested within individuals. While the benefits of HBR modeling may be less notable for large group-level effects and sample sizes, we believe that conclusions drawn in experimental psychology research may be particularly susceptible to misinterpretations if systematic trial-level variability is ignored. In addition to each subject having notable differences in their baseline cognitive characteristics, there will also be changes that can be expected in behavioral performance as participants become more familiar with the underlying structure of a given task. For instance, <xref ref-type="bibr" rid="ref59">Viviani et al. (2023b</xref>, <xref ref-type="bibr" rid="ref60">2024)</xref> highlighted the value of explicitly modeling within-subject changes in Stroop performance as each participant learned the potential statistical regularities within the task environment. As a result of these factors, we suggest that the continued use of conventional approaches will contribute to the inflation of false negative/positive results, which were identified in our own work for the Stroop congruency cost and Cued-TS TRCE error effect, respectively.</p>
<p>Although more sophisticated approaches are necessary to properly address issues such as cross-trial learning, it is important to acknowledge that generalized hierarchical models by themselves (i.e., non-Bayesian) may be conceptually sufficient to accurately model response time and error rate data. For example, a non-Bayesian hierarchical logistic regression model could have been applied to the TRCE error effect to reach the same qualitative conclusion regarding its failed replication. Similarly, a non-Gaussian distribution from standard generalized hierarchical modeling packages like &#x2018;lme4&#x2019; could have been assumed for the Stroop RT data. Nevertheless, we do recommend the &#x2018;brms&#x2019; package as the most flexible option for RT modeling, as it offers a wider library of likelihood functions, including the shifted log-normal and ex-Gaussian functions that were specifically developed for RT data in cognitive psychology paradigms.</p>
<p>Since frequentist generalized hierarchical models can already yield novel insights within the task battery, our overarching goal is not necessarily to argue for a Bayesian framework over a frequentist one, but instead to highlight the possible advantages that can come with utilizing more advanced analytic approaches, relative to conventional statistical methods. Still, we felt that HBR models were particularly well-suited for drawing more nuanced conclusions regarding key phenomena of interest. Indeed, these models not only encompass the capabilities of earlier models but also extend beyond them to readily offer a set of additional advantages, such as providing cumulative parameter estimation across multiple sources of information, and allowing for more nuanced statements regarding inter-sample replicability and null evaluation.</p>
<p>Beyond our current application, HBR models can also incorporate even more sophisticated analytic approaches. For instance, Diffusion Decision models (DDM) extend beyond the likelihood functions described within this manuscript (i.e., shifted log-normal and ex-Gaussian, Bernoulli) by simultaneously modeling both trial-level accuracy and response time data (<xref ref-type="bibr" rid="ref44">Ratcliff and McKoon, 2008</xref>; <xref ref-type="bibr" rid="ref37">Myers et al., 2022</xref>). Through the inclusion of additional mechanistic parameters (i.e., boundary separation, starting point, drift rate and non-decision time), DDMs can more fully mimic the diffusion process that underlies most 2-choice cognitive control paradigms, in which participants will noisily accumulate evidence in favor of one decision over the other. Indeed, this approach can even more accurately capture the proactive/reactive control indicators within our Cued-TS, AX-CPT and SternbergWM task variants (e.g., TRCE error/RT effects, BX interference error/RT effects, and NP error/RT effects, respectively). While such models would not be appropriate for the DMCC Stroop tasks, which involve many potential vocal responses as well as ceiling-level accuracy rates, the inverse Gaussian/Wald likelihood function could be applied to simulate this diffusion process independently from the choice itself (<xref ref-type="bibr" rid="ref1">Anders et al., 2016</xref>; <xref ref-type="bibr" rid="ref32">Luce, 1991</xref>). Although these more complex models require additional computational time and resources to perform, DDMs and related models have become increasingly applied within experimental psychology, to gain an improved understanding of cognitive variability. 
For instance, <xref ref-type="bibr" rid="ref2">Aschenbrenner and Jackson (2025)</xref> recently implemented DDMs through the &#x2018;brms&#x2019; package to examine how healthy ageing and mild cognitive impairment impact sustained attentional performance across all diffusion model parameters. Specifically, ageing in general was shown to strongly impact mean drift rate, boundary separation and non-decision time, with mild cognitive impairment further exacerbating the effects on drift rate. Given their feasibility and practical benefits, we plan to apply such models to the DMCC datasets in a future project. More generally, the ability to also utilize DDMs and inverse Gaussian distributions further illustrates the flexibility and power of HBR approaches.</p>
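<p>The evidence-accumulation process that DDMs formalize can be sketched with a discretized random walk in Python (the parameter values are illustrative; real applications would fit the Wiener likelihood available in &#x2018;brms&#x2019; rather than simulate):</p>

```python
import random

random.seed(3)

def diffuse(drift, boundary, ndt, dt=0.001, noise=1.0):
    """One trial: noisy evidence accumulates from 0 until it crosses the
    upper (+boundary, e.g., correct) or lower (-boundary, error) bound."""
    x, t = 0.0, 0.0
    while abs(x) < boundary:
        x += drift * dt + noise * random.gauss(0.0, dt ** 0.5)
        t += dt
    return x > 0, ndt + t  # (upper boundary reached?, RT in seconds)

trials = [diffuse(drift=1.5, boundary=1.0, ndt=0.25) for _ in range(300)]
accuracy = sum(correct for correct, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
```

<p>Because the same latent walk generates both the choice and its latency, fitting this process jointly to accuracy and RT is what gives DDMs their mechanistic interpretability relative to modeling the two outcomes separately.</p>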
<p>Indeed, the primary contribution of the HBR framework is that it allows for the full incorporation of the advantages found within many different analytic techniques, providing a comprehensive foundation from which to assess and characterize phenomena of interest. We therefore recommend a scaffolding perspective to understand the benefits that can be gained from adopting analytic approaches of varying complexity. Although the many differences between standard and complex analytic approaches may initially seem daunting, there has been a growing movement toward making these methods more accessible with affordable hardware, user-friendly software/packages, and detailed tutorials on how to implement them (<xref ref-type="bibr" rid="ref14">Gelman and Hill, 2006</xref>; <xref ref-type="bibr" rid="ref26">Kruschke, 2014</xref>; <xref ref-type="bibr" rid="ref35">McElreath, 2018</xref>; <xref ref-type="bibr" rid="ref56">van de Schoot et al., 2014</xref>; <xref ref-type="bibr" rid="ref38">Nalborczyk et al., 2019</xref>; <xref ref-type="bibr" rid="ref57">Veenman et al., 2024</xref>). As such, interested researchers will be in a position to make better informed decisions regarding when the use of HBR modeling is warranted for their scientific questions of interest.</p>
</sec>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="sec32">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref>. Data and code for all analyses are publicly available on the OSF platform (<ext-link xlink:href="https://osf.io/bj6fy/" ext-link-type="uri">https://osf.io/bj6fy/</ext-link>); further inquiries can be directed to the corresponding author.</p>
</sec>
<sec sec-type="ethics-statement" id="sec33">
<title>Ethics statement</title>
<p>The studies involving humans were approved by the Institutional Review Board of Washington University in St. Louis. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.</p>
</sec>
<sec sec-type="author-contributions" id="sec34">
<title>Author contributions</title>
<p>TD: Formal analysis, Visualization, Writing &#x2013; original draft, Investigation, Methodology, Validation, Writing &#x2013; review &#x0026; editing, Software, Conceptualization. JJ: Validation, Supervision, Formal analysis, Writing &#x2013; review &#x0026; editing. SC: Writing &#x2013; review &#x0026; editing, Supervision. TB: Supervision, Investigation, Conceptualization, Funding acquisition, Resources, Project administration, Writing &#x2013; review &#x0026; editing, Data curation, Writing &#x2013; original draft, Methodology.</p>
</sec>
<ack>
<title>Acknowledgments</title>
<p>The authors would like to thank Erin Gourley for her contributions to DMCC task programming, Rongxiang Tang for her many contributions to data collection and initial analyses of the 2018 DMCC dataset, Rachel Brough for data collection and administration of the 2020 DMCC dataset, Joset Etzel for her expertise and advice regarding data management and quality control, and members of the Cognitive Control and Psychopathology lab for fruitful discussions and feedback regarding this project.</p>
</ack>
<sec sec-type="COI-statement" id="sec35">
<title>Conflict of interest</title>
<p>The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="ai-statement" id="sec36">
<title>Generative AI statement</title>
<p>The author(s) declared that Generative AI was not used in the creation of this manuscript.</p>
<p>Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.</p>
</sec>
<sec sec-type="disclaimer" id="sec37">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="sec38">
<title>Supplementary material</title>
<p>The Supplementary material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fpsyg.2026.1643463/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fpsyg.2026.1643463/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="data_sheet_1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Anders</surname><given-names>R.</given-names></name> <name><surname>Alario</surname><given-names>F.</given-names></name> <name><surname>Van Maanen</surname><given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>The shifted wald distribution for response time data analysis</article-title>. <source>Psychol. Methods</source> <volume>21</volume>:<fpage>309</fpage>. doi: <pub-id pub-id-type="doi">10.1037/met0000066</pub-id>, <pub-id pub-id-type="pmid">26867155</pub-id></mixed-citation></ref>
<ref id="ref2"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aschenbrenner</surname><given-names>A. J.</given-names></name> <name><surname>Jackson</surname><given-names>J.</given-names></name></person-group> (<year>2025</year>). <article-title>A diffusion model account of cognitive variability in healthy aging and mild cognitive impairment</article-title>. <source>Exp. Aging Res.</source> <volume>51</volume>, <fpage>285</fpage>&#x2013;<lpage>302</lpage>. doi: <pub-id pub-id-type="doi">10.1080/0361073X.2024.2409588</pub-id>, <pub-id pub-id-type="pmid">39344176</pub-id></mixed-citation></ref>
<ref id="ref3"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Berger</surname><given-names>J.</given-names></name> <name><surname>Bernardo</surname><given-names>J.</given-names></name> <name><surname>Sun</surname><given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>Overall objective priors</article-title>. <source>Bayesian Anal.</source> <volume>10</volume>, <fpage>189</fpage>&#x2013;<lpage>221</lpage>. doi: <pub-id pub-id-type="doi">10.1214/14-BA915</pub-id></mixed-citation></ref>
<ref id="ref4"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Braver</surname><given-names>T. S.</given-names></name></person-group> (<year>2012</year>). <article-title>The variable nature of cognitive control: a dual mechanisms framework</article-title>. <source>Trends Cogn. Sci.</source> <volume>16</volume>, <fpage>106</fpage>&#x2013;<lpage>113</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tics.2011.12.010</pub-id>, <pub-id pub-id-type="pmid">22245618</pub-id></mixed-citation></ref>
<ref id="ref5"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Braver</surname><given-names>T. S.</given-names></name> <name><surname>Kizhner</surname><given-names>A.</given-names></name> <name><surname>Tang</surname><given-names>R.</given-names></name> <name><surname>Freund</surname><given-names>M. C.</given-names></name> <name><surname>Etzel</surname><given-names>J. A.</given-names></name></person-group> (<year>2021</year>). <article-title>The dual mechanisms of cognitive control project</article-title>. <source>J. Cogn. Neurosci.</source> <volume>96</volume>, <fpage>434</fpage>&#x2013;<lpage>442</lpage>. doi: <pub-id pub-id-type="doi">10.1162/jocn_a_01768</pub-id>, <pub-id pub-id-type="pmid">15026468</pub-id></mixed-citation></ref>
<ref id="ref6"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname><given-names>V.</given-names></name></person-group> (<year>2021</year>). <article-title>An introduction to linear mixed-effects modeling in R</article-title>. <source>Adv. Methods Pract. Psychol. Sci.</source> <volume>4</volume>, <fpage>1</fpage>&#x2013;<lpage>19</lpage>. doi: <pub-id pub-id-type="doi">10.1177/2515245920960351</pub-id></mixed-citation></ref>
<ref id="ref7"><mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Bugg</surname><given-names>J. M.</given-names></name></person-group> (<year>2017</year>). <chapter-title>Context, conflict and control</chapter-title>, in <source>The Wiley Handbook of Cognitive Control</source>, ed. T. Egner (<publisher-loc>New York</publisher-loc>: <publisher-name>Wiley Blackwell</publisher-name>), <fpage>79</fpage>&#x2013;<lpage>96</lpage>. doi: <pub-id pub-id-type="doi">10.1002/9781118920497.ch5</pub-id></mixed-citation></ref>
<ref id="ref8"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>B&#x00FC;rkner</surname><given-names>P.-C.</given-names></name></person-group> (<year>2017</year>). <article-title>brms: an R package for Bayesian multilevel models using Stan</article-title>. <source>J. Stat. Softw.</source> <volume>80</volume>, <fpage>1</fpage>&#x2013;<lpage>28</lpage>. doi: <pub-id pub-id-type="doi">10.18637/jss.v080.i01</pub-id></mixed-citation></ref>
<ref id="ref9"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dixon</surname><given-names>P.</given-names></name></person-group> (<year>2008</year>). <article-title>Models of accuracy in repeated-measures designs</article-title>. <source>J. Mem. Lang.</source> <volume>59</volume>, <fpage>447</fpage>&#x2013;<lpage>456</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jml.2007.11.004</pub-id></mixed-citation></ref>
<ref id="ref10"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dunson</surname><given-names>D.</given-names></name></person-group> (<year>2001</year>). <article-title>Commentary: practical advantages of Bayesian analysis of epidemiologic data</article-title>. <source>Am. J. Epidemiol.</source> <volume>153</volume>, <fpage>1222</fpage>&#x2013;<lpage>1226</lpage>. doi: <pub-id pub-id-type="doi">10.1093/aje/153.12.1222</pub-id>, <pub-id pub-id-type="pmid">11415958</pub-id></mixed-citation></ref>
<ref id="ref11"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Efron</surname><given-names>B.</given-names></name></person-group> (<year>2012</year>). <article-title>Why isn&#x2019;t everyone a Bayesian?</article-title> <source>Am. Stat.</source> <volume>40</volume>, <fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi: <pub-id pub-id-type="doi">10.1080/00031305.1986.10475342</pub-id></mixed-citation></ref>
<ref id="ref12"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Eisenhauer</surname><given-names>J. G.</given-names></name></person-group> (<year>2021</year>). <article-title>Meta-analysis and mega-analysis: a simple introduction</article-title>. <source>Teach. Stat.</source> <volume>43</volume>, <fpage>21</fpage>&#x2013;<lpage>27</lpage>. doi: <pub-id pub-id-type="doi">10.1111/test.12242</pub-id></mixed-citation></ref>
<ref id="ref13"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Erceg-Hurn</surname><given-names>D. M.</given-names></name> <name><surname>Mirosevich</surname><given-names>V. M.</given-names></name></person-group> (<year>2008</year>). <article-title>Modern robust statistical methods: an easy way to maximize the accuracy and power of your research</article-title>. <source>Am. Psychol.</source> <volume>63</volume>, <fpage>591</fpage>&#x2013;<lpage>601</lpage>. doi: <pub-id pub-id-type="doi">10.1037/0003-066X.63.7.591</pub-id>, <pub-id pub-id-type="pmid">18855490</pub-id></mixed-citation></ref>
<ref id="ref14"><mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Gelman</surname><given-names>A.</given-names></name> <name><surname>Hill</surname><given-names>J.</given-names></name></person-group> (<year>2006</year>). <source>Data Analysis Using Regression and Multilevel/Hierarchical Models</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</mixed-citation></ref>
<ref id="ref15"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Goldstein</surname><given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>Subjective Bayesian analysis: principles and practice</article-title>. <source>Bayesian Anal.</source> <volume>1</volume>, <fpage>403</fpage>&#x2013;<lpage>420</lpage>. doi: <pub-id pub-id-type="doi">10.1214/06-BA116</pub-id></mixed-citation></ref>
<ref id="ref16"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gonthier</surname><given-names>C.</given-names></name> <name><surname>Braver</surname><given-names>T. S.</given-names></name> <name><surname>Bugg</surname><given-names>J. M.</given-names></name></person-group> (<year>2016a</year>). <article-title>Dissociating proactive and reactive control in the Stroop task</article-title>. <source>Mem. Cogn.</source> <volume>44</volume>, <fpage>778</fpage>&#x2013;<lpage>788</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13421-016-0591-1</pub-id>, <pub-id pub-id-type="pmid">26861210</pub-id></mixed-citation></ref>
<ref id="ref17"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gonthier</surname><given-names>C.</given-names></name> <name><surname>Macnamara</surname><given-names>B. N.</given-names></name> <name><surname>Chow</surname><given-names>M.</given-names></name> <name><surname>Conway</surname><given-names>A. R. A.</given-names></name> <name><surname>Braver</surname><given-names>T. S.</given-names></name></person-group> (<year>2016b</year>). <article-title>Inducing proactive control shifts in the AX-CPT</article-title>. <source>Front. Psychol.</source> <volume>7</volume>, <fpage>1</fpage>&#x2013;<lpage>14</lpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2016.01822</pub-id>, <pub-id pub-id-type="pmid">27920741</pub-id></mixed-citation></ref>
<ref id="ref18"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Greenland</surname><given-names>S.</given-names></name> <name><surname>Senn</surname><given-names>S. J.</given-names></name> <name><surname>Rothman</surname><given-names>K. J.</given-names></name> <name><surname>Carlin</surname><given-names>J. B.</given-names></name> <name><surname>Poole</surname><given-names>C.</given-names></name> <name><surname>Goodman</surname><given-names>S. N.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations</article-title>. <source>Eur. J. Epidemiol.</source> <volume>31</volume>, <fpage>337</fpage>&#x2013;<lpage>350</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10654-016-0149-3</pub-id>, <pub-id pub-id-type="pmid">27209009</pub-id></mixed-citation></ref>
<ref id="ref19"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Haines</surname><given-names>N.</given-names></name> <name><surname>Kvam</surname><given-names>P. D.</given-names></name> <name><surname>Irving</surname><given-names>L.</given-names></name> <name><surname>Smith</surname><given-names>C. T.</given-names></name> <name><surname>Beauchaine</surname><given-names>T. P.</given-names></name> <name><surname>Pitt</surname><given-names>M. A.</given-names></name> <etal/></person-group>. (<year>2025</year>). <article-title>A tutorial on using generative models to advance psychological science: lessons from the reliability paradox</article-title>. <source>Psychol. Methods</source>, advance online publication, <fpage>1</fpage>&#x2013;<lpage>22</lpage>. doi: <pub-id pub-id-type="doi">10.1037/met0000674</pub-id></mixed-citation></ref>
<ref id="ref20"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Heathcote</surname><given-names>A.</given-names></name> <name><surname>Popiel</surname><given-names>S. J.</given-names></name> <name><surname>Mewhort</surname><given-names>D. J.</given-names></name></person-group> (<year>1991</year>). <article-title>Analysis of response time distributions: an example using the Stroop task</article-title>. <source>Psychol. Bull.</source> <volume>109</volume>, <fpage>340</fpage>&#x2013;<lpage>347</lpage>. doi: <pub-id pub-id-type="doi">10.1037/0033-2909.109.2.340</pub-id></mixed-citation></ref>
<ref id="ref21"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Houpt</surname><given-names>J.</given-names></name> <name><surname>Bittner</surname><given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Analyzing thresholds and efficiency with hierarchical Bayesian logistic regression</article-title>. <source>Vis. Res.</source> <volume>148</volume>, <fpage>49</fpage>&#x2013;<lpage>58</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.visres.2018.04.004</pub-id>, <pub-id pub-id-type="pmid">29678536</pub-id></mixed-citation></ref>
<ref id="ref22"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ileri-Tayar</surname><given-names>M.</given-names></name> <name><surname>Bugg</surname><given-names>J.</given-names></name> <name><surname>Dudey</surname><given-names>T.</given-names></name> <name><surname>Braver</surname><given-names>T.</given-names></name></person-group> (<year>2025</year>). <article-title>Proactive control declines while reactive control is preserved across the adult lifespan</article-title>. <source>J. Exp. Psychol. Gen.</source> <volume>154</volume>, <fpage>3029</fpage>&#x2013;<lpage>3047</lpage>. doi: <pub-id pub-id-type="doi">10.1037/xge0001824</pub-id>, <pub-id pub-id-type="pmid">40853776</pub-id></mixed-citation></ref>
<ref id="ref23"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ioannidis</surname><given-names>J.</given-names></name></person-group> (<year>2005</year>). <article-title>Why most published research findings are false</article-title>. <source>PLoS Med.</source> <volume>2</volume>:<fpage>e124</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pmed.0020124</pub-id>, <pub-id pub-id-type="pmid">16060722</pub-id></mixed-citation></ref>
<ref id="ref24"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jaeger</surname><given-names>T.</given-names></name></person-group> (<year>2008</year>). <article-title>Categorical data analysis: away from ANOVAs (transformation or not) and towards logit mixed models</article-title>. <source>J. Mem. Lang.</source> <volume>59</volume>, <fpage>434</fpage>&#x2013;<lpage>446</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jml.2007.11.007</pub-id>, <pub-id pub-id-type="pmid">19884961</pub-id></mixed-citation></ref>
<ref id="ref25"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kelter</surname><given-names>R.</given-names></name></person-group> (<year>2020</year>). <article-title>Bayesian alternative to null hypothesis significance testing in biomedical research: a non-technical introduction to Bayesian inference with JASP</article-title>. <source>BMC Med. Res. Methodol.</source> <volume>20</volume>:<fpage>142</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12874-020-00980-6</pub-id>, <pub-id pub-id-type="pmid">32503439</pub-id></mixed-citation></ref>
<ref id="ref26"><mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Kruschke</surname><given-names>J. K.</given-names></name></person-group> (<year>2014</year>). <source>Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan</source>. <edition>2nd</edition> Edn. <publisher-loc>Burlington</publisher-loc>: <publisher-name>Academic Press</publisher-name>.</mixed-citation></ref>
<ref id="ref27"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kruschke</surname><given-names>J. K.</given-names></name></person-group> (<year>2018</year>). <article-title>Rejecting or accepting parameter values in Bayesian estimation</article-title>. <source>Adv. Methods Pract. Psychol. Sci.</source> <volume>1</volume>, <fpage>270</fpage>&#x2013;<lpage>280</lpage>. doi: <pub-id pub-id-type="doi">10.1177/2515245918771304</pub-id></mixed-citation></ref>
<ref id="ref28"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lartillot</surname><given-names>N.</given-names></name></person-group> (<year>2023</year>). <article-title>Identifying the best approximating model in Bayesian phylogenetics: Bayes factors, cross-validation or wAIC?</article-title> <source>Syst. Biol.</source> <volume>72</volume>, <fpage>616</fpage>&#x2013;<lpage>638</lpage>. doi: <pub-id pub-id-type="doi">10.1093/sysbio/syad004</pub-id>, <pub-id pub-id-type="pmid">36810802</pub-id></mixed-citation></ref>
<ref id="ref29"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname><given-names>Y.</given-names></name> <name><surname>Brough</surname><given-names>R. E.</given-names></name> <name><surname>Tay</surname><given-names>A.</given-names></name> <name><surname>Jackson</surname><given-names>J. J.</given-names></name> <name><surname>Braver</surname><given-names>T. S.</given-names></name></person-group> (<year>2024</year>). <article-title>Working memory capacity preferentially enhances implementation of proactive control</article-title>. <source>J. Exp. Psychol. Learn. Mem. Cogn.</source> <volume>50</volume>, <fpage>287</fpage>&#x2013;<lpage>305</lpage>. doi: <pub-id pub-id-type="doi">10.1037/xlm0001195</pub-id></mixed-citation></ref>
<ref id="ref9001"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lindley</surname><given-names>D. V.</given-names></name></person-group> (<year>1957</year>). <article-title>A statistical paradox</article-title>. <source>Biometrika</source> <volume>44</volume>, <fpage>187</fpage>&#x2013;<lpage>192</lpage>. doi: <pub-id pub-id-type="doi">10.2307/2333251</pub-id></mixed-citation></ref>
<ref id="ref30"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lindley</surname><given-names>D.</given-names></name></person-group> (<year>2004</year>). <article-title>That wretched prior</article-title>. <source>Significance</source> <volume>1</volume>, <fpage>85</fpage>&#x2013;<lpage>87</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1740-9713.2004.026.x</pub-id></mixed-citation></ref>
<ref id="ref31"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lo</surname><given-names>S.</given-names></name> <name><surname>Andrews</surname><given-names>S.</given-names></name></person-group> (<year>2015</year>). <article-title>To transform or not to transform: using generalized linear models to analyse reaction time data</article-title>. <source>Front. Psychol.</source> <volume>6</volume>, <fpage>1</fpage>&#x2013;<lpage>16</lpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2015.01171</pub-id></mixed-citation></ref>
<ref id="ref32"><mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Luce</surname><given-names>R. D.</given-names></name></person-group> (<year>1991</year>). <source>Response Times: Their Role in Inferring Elementary Mental Organization</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</mixed-citation></ref>
<ref id="ref33"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ly</surname><given-names>A.</given-names></name> <name><surname>Etz</surname><given-names>A.</given-names></name> <name><surname>Marsman</surname><given-names>M.</given-names></name> <name><surname>Wagenmakers</surname><given-names>E. J.</given-names></name></person-group> (<year>2019</year>). <article-title>Replication Bayes factors from evidence updating</article-title>. <source>Behav. Res. Methods</source> <volume>51</volume>, <fpage>2498</fpage>&#x2013;<lpage>2508</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13428-018-1092-x</pub-id>, <pub-id pub-id-type="pmid">30105445</pub-id></mixed-citation></ref>
<ref id="ref34"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Makowski</surname><given-names>D.</given-names></name> <name><surname>Ben-Shachar</surname><given-names>M. S.</given-names></name> <name><surname>Chen</surname><given-names>S. A.</given-names></name> <name><surname>L&#x00FC;decke</surname><given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Indices of effect existence and significance in the Bayesian framework</article-title>. <source>Front. Psychol.</source> <volume>10</volume>:<fpage>2767</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2019.02767</pub-id>, <pub-id pub-id-type="pmid">31920819</pub-id></mixed-citation></ref>
<ref id="ref35"><mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>McElreath</surname><given-names>R.</given-names></name></person-group> (<year>2018</year>). <source>Statistical Rethinking: A Bayesian Course with Examples in R and Stan</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Chapman and Hall/CRC</publisher-name>.</mixed-citation></ref>
<ref id="ref36"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Meteyard</surname><given-names>L.</given-names></name> <name><surname>Davies</surname><given-names>R. A.</given-names></name></person-group> (<year>2020</year>). <article-title>Best practice guidance for linear mixed-effects models in psychological science</article-title>. <source>J. Mem. Lang.</source> <volume>112</volume>:<fpage>104092</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jml.2020.104092</pub-id></mixed-citation></ref>
<ref id="ref37"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Myers</surname><given-names>C. E.</given-names></name> <name><surname>Interian</surname><given-names>A.</given-names></name> <name><surname>Moustafa</surname><given-names>A. A.</given-names></name></person-group> (<year>2022</year>). <article-title>A practical introduction to using the drift diffusion model of decision-making in cognitive psychology, neuroscience, and health sciences</article-title>. <source>Front. Psychol.</source> <volume>13</volume>, <fpage>1</fpage>&#x2013;<lpage>26</lpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2022.1039172</pub-id>, <pub-id pub-id-type="pmid">36571016</pub-id></mixed-citation></ref>
<ref id="ref38"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nalborczyk</surname><given-names>L.</given-names></name> <name><surname>Batailler</surname><given-names>C.</given-names></name> <name><surname>L&#x0153;venbruck</surname><given-names>H.</given-names></name> <name><surname>Vilain</surname><given-names>A.</given-names></name> <name><surname>B&#x00FC;rkner</surname><given-names>P. C.</given-names></name></person-group> (<year>2019</year>). <article-title>An introduction to Bayesian multilevel models using brms: a case study of gender effects on vowel variability in standard Indonesian</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>62</volume>, <fpage>1225</fpage>&#x2013;<lpage>1242</lpage>. doi: <pub-id pub-id-type="doi">10.1044/2018_JSLHR-S-18-0006</pub-id>, <pub-id pub-id-type="pmid">31082309</pub-id></mixed-citation></ref>
<ref id="ref39"><mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>O&#x2019;Hagan</surname><given-names>A.</given-names></name></person-group> (<year>2008</year>). <chapter-title>The Bayesian approach to statistics</chapter-title>, in <source>Handbook of Probability Theory and Applications</source>, ed. T. Rudas (<publisher-loc>Thousand Oaks</publisher-loc>: <publisher-name>Sage Publications</publisher-name>), <fpage>85</fpage>&#x2013;<lpage>100</lpage>.</mixed-citation></ref>
<ref id="ref40"><mixed-citation publication-type="journal"><collab id="coll1">Open Science Collaboration</collab> (<year>2015</year>). <article-title>Estimating the reproducibility of psychological science</article-title>. <source>Science</source> <volume>349</volume>:<fpage>aac4716</fpage>. doi: <pub-id pub-id-type="doi">10.1126/science.aac4716</pub-id></mixed-citation></ref>
<ref id="ref41"><mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Oravecz</surname><given-names>Z.</given-names></name> <name><surname>Huentelman</surname><given-names>M.</given-names></name> <name><surname>Vandekerckhove</surname><given-names>J.</given-names></name></person-group> (<year>2016</year>). <chapter-title>Sequential Bayesian updating for big data</chapter-title>, in <source>Big Data in Cognitive Science</source>, ed. M. N. Jones (<publisher-loc>New York</publisher-loc>: <publisher-name>Psychology Press</publisher-name>), <fpage>22</fpage>&#x2013;<lpage>42</lpage>. Available at: <ext-link xlink:href="https://escholarship.org/uc/item/3fk408t3" ext-link-type="uri">https://escholarship.org/uc/item/3fk408t3</ext-link></mixed-citation></ref>
<ref id="ref42"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ratcliff</surname><given-names>R.</given-names></name></person-group> (<year>1979</year>). <article-title>Group reaction time distributions and an analysis of distribution statistics</article-title>. <source>Psychol. Bull.</source> <volume>86</volume>, <fpage>446</fpage>&#x2013;<lpage>461</lpage>. doi: <pub-id pub-id-type="doi">10.1037/0033-2909.86.3.446</pub-id>, <pub-id pub-id-type="pmid">451109</pub-id></mixed-citation></ref>
<ref id="ref43"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ratcliff</surname><given-names>R.</given-names></name></person-group> (<year>1993</year>). <article-title>Methods for dealing with reaction time outliers</article-title>. <source>Psychol. Bull.</source> <volume>114</volume>, <fpage>510</fpage>&#x2013;<lpage>532</lpage>. doi: <pub-id pub-id-type="doi">10.1037/0033-2909.114.3.510</pub-id>, <pub-id pub-id-type="pmid">8272468</pub-id></mixed-citation></ref>
<ref id="ref44"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ratcliff</surname><given-names>R.</given-names></name> <name><surname>McKoon</surname><given-names>G.</given-names></name></person-group> (<year>2008</year>). <article-title>The diffusion decision model: theory and data for two-choice decision tasks</article-title>. <source>Neural Comput.</source> <volume>20</volume>, <fpage>873</fpage>&#x2013;<lpage>922</lpage>. doi: <pub-id pub-id-type="doi">10.1162/neco.2008.12-06-420</pub-id>, <pub-id pub-id-type="pmid">18085991</pub-id></mixed-citation></ref>
<ref id="ref45"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rouder</surname><given-names>J. N.</given-names></name> <name><surname>Haaf</surname><given-names>J. M.</given-names></name></person-group> (<year>2019</year>). <article-title>A psychometrics of individual differences in experimental tasks</article-title>. <source>Psychon. Bull. Rev.</source> <volume>26</volume>, <fpage>452</fpage>&#x2013;<lpage>467</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13423-018-1558-y</pub-id></mixed-citation></ref>
<ref id="ref46"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rouder</surname><given-names>J.</given-names></name> <name><surname>Haaf</surname><given-names>J.</given-names></name> <name><surname>Vandekerckhove</surname><given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Bayesian inference for psychology, part IV: parameter estimation and Bayes factors</article-title>. <source>Psychon. Bull. Rev.</source> <volume>25</volume>, <fpage>102</fpage>&#x2013;<lpage>113</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13423-017-1420-7</pub-id>, <pub-id pub-id-type="pmid">29441460</pub-id></mixed-citation></ref>
<ref id="ref47"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rouder</surname><given-names>J.</given-names></name> <name><surname>Lu</surname><given-names>J.</given-names></name></person-group> (<year>2005</year>). <article-title>An introduction to Bayesian hierarchical models with an application in the theory of signal detection</article-title>. <source>Psychon. Bull. Rev.</source> <volume>12</volume>, <fpage>573</fpage>&#x2013;<lpage>604</lpage>. doi: <pub-id pub-id-type="doi">10.3758/BF03196750</pub-id>, <pub-id pub-id-type="pmid">16447374</pub-id></mixed-citation></ref>
<ref id="ref48"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schad</surname><given-names>D.</given-names></name> <name><surname>Betancourt</surname><given-names>M.</given-names></name> <name><surname>Vasishth</surname><given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Toward a principled Bayesian workflow in cognitive science</article-title>. <source>Psychol. Methods</source> <volume>26</volume>, <fpage>103</fpage>&#x2013;<lpage>126</lpage>. doi: <pub-id pub-id-type="doi">10.1037/met0000275</pub-id>, <pub-id pub-id-type="pmid">32551748</pub-id></mixed-citation></ref>
<ref id="ref49"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schmalz</surname><given-names>X.</given-names></name> <name><surname>Biurrun Manresa</surname><given-names>J.</given-names></name> <name><surname>Zhang</surname><given-names>L.</given-names></name></person-group> (<year>2023</year>). <article-title>What is a Bayes factor?</article-title> <source>Psychol. Methods</source> <volume>28</volume>:<fpage>705</fpage>. doi: <pub-id pub-id-type="doi">10.1037/met0000421</pub-id>, <pub-id pub-id-type="pmid">34780246</pub-id></mixed-citation></ref>
<ref id="ref50"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schramm</surname><given-names>P.</given-names></name> <name><surname>Rouder</surname><given-names>J. N.</given-names></name></person-group> (<year>2019</year>). <article-title>Are reaction time transformations really beneficial?</article-title> <source>PsyArXiv Preprints.</source> <fpage>1</fpage>&#x2013;<lpage>28</lpage>. doi: <pub-id pub-id-type="doi">10.31234/osf.io/9ksa6</pub-id></mixed-citation></ref>
<ref id="ref51"><mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Singmann</surname><given-names>H.</given-names></name> <name><surname>Kellen</surname><given-names>D.</given-names></name></person-group> (<year>2019</year>). <chapter-title>An introduction to mixed models for experimental psychology</chapter-title>, in <source>New Methods in Cognitive Psychology</source>, eds D. H. Spieler and E. Schumacher (<publisher-loc>New York</publisher-loc>: <publisher-name>Psychology Press</publisher-name>), <fpage>4</fpage>&#x2013;<lpage>31</lpage>.</mixed-citation></ref>
<ref id="ref52"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Snijder</surname><given-names>J.-P.</given-names></name> <name><surname>Tang</surname><given-names>R.</given-names></name> <name><surname>Bugg</surname><given-names>J. M.</given-names></name> <name><surname>Conway</surname><given-names>A. R. A.</given-names></name> <name><surname>Braver</surname><given-names>T. S.</given-names></name></person-group> (<year>2023</year>). <article-title>On the psychometric evaluation of cognitive control tasks: an investigation with the dual mechanisms of cognitive control (DMCC) battery</article-title>. <source>Behav. Res. Methods</source> <volume>56</volume>, <fpage>1604</fpage>&#x2013;<lpage>1639</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13428-023-02111-7</pub-id>, <pub-id pub-id-type="pmid">37040066</pub-id></mixed-citation></ref>
<ref id="ref53"><mixed-citation publication-type="other"><collab id="coll2">Stan Development Team</collab> (<year>2025</year>). <article-title>Stan Reference Manual, 2.37</article-title>. Available online at: <ext-link xlink:href="https://mc-stan.org" ext-link-type="uri">https://mc-stan.org</ext-link> (Accessed March 11, 2025).</mixed-citation></ref>
<ref id="ref54"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname><given-names>R.</given-names></name> <name><surname>Bugg</surname><given-names>J. M.</given-names></name> <name><surname>Snijder</surname><given-names>J.-P.</given-names></name> <name><surname>Conway</surname><given-names>A. R. A.</given-names></name> <name><surname>Braver</surname><given-names>T. S.</given-names></name></person-group> (<year>2023</year>). <article-title>The dual mechanisms of cognitive control (DMCC) project: validation of an online behavioural task battery</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>76</volume>, <fpage>1457</fpage>&#x2013;<lpage>1480</lpage>. doi: <pub-id pub-id-type="doi">10.1177/17470218221114769</pub-id>, <pub-id pub-id-type="pmid">35815536</pub-id></mixed-citation></ref>
<ref id="ref55"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Torsen</surname><given-names>E.</given-names></name></person-group> (<year>2015</year>). <article-title>Objective versus subjective Bayesian inference: a comparative study</article-title>. <source>Int. J. Adv. Res.</source> <volume>3</volume>, <fpage>56</fpage>&#x2013;<lpage>65</lpage>.</mixed-citation></ref>
<ref id="ref56"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>van de Schoot</surname><given-names>R.</given-names></name> <name><surname>Kaplan</surname><given-names>D.</given-names></name> <name><surname>Denissen</surname><given-names>J.</given-names></name> <name><surname>Asendorpf</surname><given-names>J.</given-names></name> <name><surname>Neyer</surname><given-names>F.</given-names></name> <name><surname>van Aken</surname><given-names>M. A. G.</given-names></name></person-group> (<year>2014</year>). <article-title>A gentle introduction to Bayesian analysis: applications to developmental research</article-title>. <source>Child Dev.</source> <volume>85</volume>, <fpage>842</fpage>&#x2013;<lpage>860</lpage>. doi: <pub-id pub-id-type="doi">10.1111/cdev.12169</pub-id>, <pub-id pub-id-type="pmid">24116396</pub-id></mixed-citation></ref>
<ref id="ref57"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Veenman</surname><given-names>M.</given-names></name> <name><surname>Stefan</surname><given-names>A. M.</given-names></name> <name><surname>Haaf</surname><given-names>J. M.</given-names></name></person-group> (<year>2024</year>). <article-title>Bayesian hierarchical modeling: an introduction and reassessment</article-title>. <source>Behav. Res. Methods</source> <volume>56</volume>, <fpage>4600</fpage>&#x2013;<lpage>4631</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13428-023-02204-3</pub-id>, <pub-id pub-id-type="pmid">37749423</pub-id></mixed-citation></ref>
<ref id="ref58"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Verhagen</surname><given-names>J.</given-names></name> <name><surname>Wagenmakers</surname><given-names>E.-J.</given-names></name></person-group> (<year>2014</year>). <article-title>Bayesian tests to quantify the result of a replication attempt</article-title>. <source>J. Exp. Psychol. Gen.</source> <volume>143</volume>, <fpage>1457</fpage>&#x2013;<lpage>1475</lpage>. doi: <pub-id pub-id-type="doi">10.1037/a0036731</pub-id>, <pub-id pub-id-type="pmid">24867486</pub-id></mixed-citation></ref>
<ref id="ref59"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Viviani</surname><given-names>G.</given-names></name> <name><surname>Visalli</surname><given-names>A.</given-names></name> <name><surname>Finos</surname><given-names>L.</given-names></name> <name><surname>Vallesi</surname><given-names>A.</given-names></name> <name><surname>Ambrosini</surname><given-names>E.</given-names></name></person-group> (<year>2023b</year>). <article-title>A comparison between different variants of the spatial Stroop task: the influence of analytic flexibility on Stroop effect estimates and reliability</article-title>. <source>Behav. Res. Methods</source> <volume>56</volume>, <fpage>934</fpage>&#x2013;<lpage>951</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13428-023-02091-8</pub-id>, <pub-id pub-id-type="pmid">36894759</pub-id></mixed-citation></ref>
<ref id="ref60"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Viviani</surname><given-names>G.</given-names></name> <name><surname>Visalli</surname><given-names>A.</given-names></name> <name><surname>Montefinese</surname><given-names>M.</given-names></name> <name><surname>Vallesi</surname><given-names>A.</given-names></name> <name><surname>Ambrosini</surname><given-names>E.</given-names></name></person-group> (<year>2024</year>). <article-title>Tango of control: the interplay between proactive and reactive control</article-title>. <source>J. Exp. Psychol. Gen.</source> <volume>153</volume>, <fpage>1644</fpage>&#x2013;<lpage>1670</lpage>. doi: <pub-id pub-id-type="doi">10.1037/xge0001585</pub-id>, <pub-id pub-id-type="pmid">38661633</pub-id></mixed-citation></ref>
<ref id="ref61"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wacholder</surname><given-names>S.</given-names></name> <name><surname>Chanock</surname><given-names>S.</given-names></name> <name><surname>Garcia-Closas</surname><given-names>M.</given-names></name> <name><surname>El Ghormli</surname><given-names>L.</given-names></name> <name><surname>Rothman</surname><given-names>N.</given-names></name></person-group> (<year>2004</year>). <article-title>Assessing the probability that a positive report is false: an approach for molecular epidemiology studies</article-title>. <source>J. Natl. Cancer Inst.</source> <volume>96</volume>, <fpage>434</fpage>&#x2013;<lpage>442</lpage>. doi: <pub-id pub-id-type="doi">10.1093/jnci/djh075</pub-id>, <pub-id pub-id-type="pmid">15026468</pub-id></mixed-citation></ref>
<ref id="ref62"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wagenmakers</surname><given-names>E.-J.</given-names></name> <name><surname>Lodewyckx</surname><given-names>T.</given-names></name> <name><surname>Kuriyal</surname><given-names>H.</given-names></name> <name><surname>Grasman</surname><given-names>R.</given-names></name></person-group> (<year>2010</year>). <article-title>Bayesian hypothesis testing for psychologists: a tutorial on the Savage-Dickey method</article-title>. <source>Cogn. Psychol.</source> <volume>60</volume>, <fpage>158</fpage>&#x2013;<lpage>189</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cogpsych.2009.12.001</pub-id>, <pub-id pub-id-type="pmid">20064637</pub-id></mixed-citation></ref>
<ref id="ref63"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wagenmakers</surname><given-names>E.-J.</given-names></name> <name><surname>Morey</surname><given-names>R. D.</given-names></name> <name><surname>Lee</surname><given-names>M. D.</given-names></name></person-group> (<year>2016</year>). <article-title>Bayesian benefits for the pragmatic researcher</article-title>. <source>Curr. Dir. Psychol. Sci.</source> <volume>25</volume>, <fpage>169</fpage>&#x2013;<lpage>176</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0963721416643289</pub-id></mixed-citation></ref>
<ref id="ref64"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wagenmakers</surname><given-names>E.-J.</given-names></name> <name><surname>Marsman</surname><given-names>M.</given-names></name> <name><surname>Jamil</surname><given-names>T.</given-names></name> <name><surname>Ly</surname><given-names>A.</given-names></name> <name><surname>Verhagen</surname><given-names>J.</given-names></name> <name><surname>Love</surname><given-names>J.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Bayesian inference for psychology. Part I: advantages and practical ramifications</article-title>. <source>Psychon. Bull. Rev.</source> <volume>25</volume>, <fpage>35</fpage>&#x2013;<lpage>57</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13423-017-1343-3</pub-id></mixed-citation></ref>
<ref id="ref65"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname><given-names>B.</given-names></name> <name><surname>Krott</surname><given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Data trimming procedure can eliminate bilingual cognitive advantage</article-title>. <source>Psychon. Bull. Rev.</source> <volume>23</volume>, <fpage>1221</fpage>&#x2013;<lpage>1230</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13423-015-0981-6</pub-id>, <pub-id pub-id-type="pmid">26608692</pub-id></mixed-citation></ref>
<ref id="ref66"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname><given-names>Y.</given-names></name> <name><surname>Skidmore</surname><given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>A reassessment of ANOVA reporting practices: a review of three APA journals</article-title>. <source>J. Methods Meas. Soc. Sci.</source> <volume>8</volume>, <fpage>3</fpage>&#x2013;<lpage>19</lpage>. doi: <pub-id pub-id-type="doi">10.2458/v8i1.22019</pub-id></mixed-citation></ref>
</ref-list>
<fn-group>
<fn fn-type="custom" custom-type="edited-by" id="fn0009">
<p>Edited by: <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/129731/overview">Igor Douven</ext-link>, Universit&#x00E9; Paris-Sorbonne, France</p>
</fn>
<fn fn-type="custom" custom-type="reviewed-by" id="fn0010">
<p>Reviewed by: <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/3101762/overview">Mahbod Mehrvarz</ext-link>, University of California, Irvine, United States</p>
<p><ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/3207399/overview">Margherita Calderan</ext-link>, University of Padua, Italy</p>
</fn>
</fn-group>
<fn-group>
<fn id="fn0001">
<label>1</label>
<p>Note that our chosen term &#x2018;hierarchical&#x2019; is often used interchangeably with the terms &#x2018;multilevel&#x2019; and &#x2018;mixed-effects&#x2019; in this literature; all of these terms refer to statistical models that account for systematically clustered observations.</p>
</fn>
<fn id="fn0002">
<label>2</label>
<p>A second paper published on the 2018 dataset also used HBR approaches, but those analyses were focused on test&#x2013;retest reliability and cross-task correlations (<xref ref-type="bibr" rid="ref52">Snijder et al., 2023</xref>). Consequently, this is the first analysis of the 2018 dataset using HBR approaches for statistical inference regarding within-subject group-average effects.</p>
</fn>
<fn id="fn0003">
<label>3</label>
<p>HDIs are a type of credible interval (CI) that inherently incorporates the scale of the posterior distribution. While equal-tailed intervals (ETIs) are the default CI in &#x2018;brms&#x2019; and ensure evenly sized tails on both sides of the distribution, HDIs are better suited to potentially skewed distributions and to ROPE analyses (<xref ref-type="bibr" rid="ref26">Kruschke, 2014</xref>). We therefore report only HDIs in our analyses.</p>
</fn>
<fn id="fn0004">
<label>4</label>
<p>An astute reader may note that age is a variable known to impact the ability or tendency to implement cognitive control; allowing a wide age range could therefore impact the results. However, the analyses described here focus on the general advantages of HBR models and the role of replication datasets. As such, the issue of age effects is not germane and should not affect any of the conclusions being drawn.</p>
</fn>
<fn id="fn0005">
<label>5</label>
<p>Age was not collected in the 2018 wave 1 testing sample, so the age range, mean, and SD were derived from the wave 1 retesting sample and the wave 2 testing sample.</p>
</fn>
<fn id="fn0006">
<label>6</label>
<p>While subject-level filtering criteria were applied at the trial/item-type level in <xref ref-type="bibr" rid="ref52">Snijder et al. (2023)</xref>, these criteria were later deemed overly restrictive. As a result, all of our criteria were applied only at the condition level (i.e., session/mode and phase).</p>
</fn>
<fn id="fn0007">
<label>7</label>
<p>To ensure that the &#x2018;Intercept&#x2019; parameter represents the reference categories of each predictor rather than their centered values, the &#x2018;0 + Intercept&#x2019; syntax is included on the right-hand side of &#x2018;brms&#x2019; formulas, instead of the conventional &#x2018;1 + &#x2026;&#x2019; syntax used in frequentist modeling packages (<xref ref-type="bibr" rid="ref8">B&#x00FC;rkner, 2017</xref>). This syntax especially helps to ensure appropriate prior specifications (e.g., the intercept output from our other-sample model will match that of our current model, rather than its centered counterpart). However, we write the model formulas in the manuscript with the &#x2018;1&#x202F;+&#x202F;&#x2026;&#x2019; syntax to avoid confusion among readers unfamiliar with the &#x2018;brms&#x2019; package.</p>
</fn>
<fn id="fn0008">
<label>8</label>
<p>These results came from two separate models fit to different subsets of the data (i.e., Proactive and Baseline data; Proactive and Reactive data). Following the Reviewer&#x2019;s suggestion, we additionally ran models that included not only all levels of the &#x2018;mode&#x2019; predictor but also every trial and item type (i.e., NP/RN/NN and critical/non-critical items). While BF model comparison is not feasible for such a model, we did observe that the 89% HDI for the Proactive&#x2013;Reactive NP RT effect was now completely outside our ROPE interval. Still, this effect was in the opposite direction of theoretical predictions, thus serving as additional evidence against the effectiveness of our experimental manipulation.</p>
</fn>
</fn-group>
</back>
</article>