<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<?covid-19-tdm?>
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="brief-report" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2023.1268074</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Brief Research Report</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Uncovering Differential Item Functioning effects using MIMIC and mediated MIMIC models</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes"><name><surname>Tsaousis</surname> <given-names>Ioannis</given-names></name><xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/232808/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/conceptualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-&#x0026;-editing/"/>
</contrib>
<contrib contrib-type="author"><name><surname>Alahmandi</surname> <given-names>Maisaa Taleb S.</given-names></name><xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/978261/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-&#x0026;-editing/"/>
</contrib>
<contrib contrib-type="author"><name><surname>Asiri</surname> <given-names>Halimah</given-names></name><xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2414355/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/data-curation/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-&#x0026;-editing/"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Psychology, National and Kapodistrian University of Athens (NKUA)</institution>, <addr-line>Athens</addr-line>, <country>Greece</country></aff>
<aff id="aff2"><sup>2</sup><institution>Education and Training Evaluation Commission (ETEC)</institution>, <addr-line>Riyadh</addr-line>, <country>Saudi Arabia</country></aff>
<author-notes>
<fn fn-type="edited-by" id="fn0001">
<p>Edited by: Hamdollah Ravand, Vali-E-Asr University of Rafsanjan, Iran</p>
</fn>
<fn fn-type="edited-by" id="fn0002">
<p>Reviewed by: Yi-Hsin Chen, University of South Florida, United States; Wolfgang Lenhard, Julius Maximilian University of W&#x00FC;rzburg, Germany</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Ioannis Tsaousis, <email>ioantsaousis@psych.uoa.gr</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>10</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1268074</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>07</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>05</day>
<month>10</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Tsaousis, Alahmandi and Asiri.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Tsaousis, Alahmandi and Asiri</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>The aim of this study was twofold: first, to examine the presence of bias across gender in a scholastic achievement test, the Academic Achievement Test (AAT) for the Science Track; and second, to understand the underlying mechanism that causes these bias effects by examining general cognitive ability as a mediator. The sample consisted of 1,300 Saudi high school students randomly selected from a larger pool of 173,133 participants to reduce the effects of excessive power. To address both goals, the Multiple Indicators Multiple Causes (MIMIC) approach for detecting Differential Item Functioning (DIF) was used. The results showed that 13 AAT items exhibited DIF across gender. For most of these items, male participants were more likely to answer correctly than their female counterparts. Next, the mediated MIMIC approach was applied to explore possible underlying mechanisms that explain these DIF effects. The results showed that general cognitive ability, as measured by the General Aptitude Test (GAT), could explain why an AAT item exhibits DIF across gender. GAT scores fully explained the DIF effect in two AAT items (full mediation). In most other cases, GAT accounted for only a proportion of the DIF effect (partial mediation). These results can help experts improve the quality of their instruments by identifying DIF items and deciding how to revise them, considering the mediator&#x2019;s effect on participants&#x2019; responses.</p>
</abstract>
<kwd-group>
<kwd>Differential Item Functioning (DIF)</kwd>
<kwd>uniform DIF</kwd>
<kwd>MIMIC approach</kwd>
<kwd>mediation analysis</kwd>
<kwd>mediated MIMIC model</kwd>
</kwd-group>
<counts>
<fig-count count="1"/>
<table-count count="3"/>
<equation-count count="1"/>
<ref-count count="26"/>
<page-count count="6"/>
<word-count count="4695"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Quantitative Psychology and Measurement</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="sec1"><label>1.</label>
<title>Introduction</title>
<p>In modern psychometrics, there is increasing interest in identifying and understanding what causes a Differential Item Functioning (DIF) effect (<xref ref-type="bibr" rid="ref19">Raykov and Marcoulides, 2011</xref>). DIF refers to a situation where an item performs differently across groups of individuals even though those individuals are supposed to have the same level of the trait being measured (<xref ref-type="bibr" rid="ref10">Dorans and Holland, 1993</xref>). DIF can be caused by cultural, societal, or demographic variables, and it can undermine the fairness and validity of a test or assessment (<xref ref-type="bibr" rid="ref1">Ackerman, 1994</xref>). DIF can be categorized into two main types: uniform and non-uniform. An item shows uniform DIF when one group consistently outperforms the other at every ability level. In contrast, non-uniform DIF occurs when an item&#x2019;s bias varies across different levels of the latent trait. Therefore, it is important first to identify DIF items and remove them from the scale.</p>
<p>Several statistical methods for identifying items with DIF have been proposed within the Classical Test Theory (CTT) and the Item Response Theory (IRT). Within the IRT framework, the model-based likelihood ratio test is an approach that is typically used to evaluate the significance of observed differences in parameter estimates between groups (<xref ref-type="bibr" rid="ref23">Thissen et al., 1993</xref>). Other methods include the likelihood ratio goodness-of-fit test (<xref ref-type="bibr" rid="ref22">Thissen et al., 1986</xref>) and the simultaneous item bias test (SIBTEST) method (<xref ref-type="bibr" rid="ref20">Shealy and Stout, 1993</xref>). Within the CTT framework, the Mantel&#x2013;Haenszel (MH) approach (<xref ref-type="bibr" rid="ref13">Holland and Thayer, 1988</xref>) and the logistic regression (LR) procedure (<xref ref-type="bibr" rid="ref21">Swaminathan and Rogers, 1990</xref>) are some of the most popular approaches.</p>
<p>Structural Equation Modelling (SEM) also provides a comprehensive framework for examining and understanding the DIF issue (<xref ref-type="bibr" rid="ref5">Camilli and Shepard, 1994</xref>). Within this context, several different methods have been suggested, including the Multi-Group CFA method (MG-CFA; <xref ref-type="bibr" rid="ref18">Pae and Park, 2006</xref>), the modification indices method (<xref ref-type="bibr" rid="ref6">Chan, 2000</xref>), and the Multiple-Indicator, Multiple-Causes approach (MIMIC; <xref ref-type="bibr" rid="ref15">MacIntosh and Hashim, 2003</xref>). One of the major advantages of the MIMIC approach over the MG-CFA method is that it uses the entire sample of responses to estimate model parameters and test for DIF (<xref ref-type="bibr" rid="ref8">Chun et al., 2016</xref>). In this case, the total sample size needed for detecting DIF is smaller than that needed in the MG-CFA approach, where model parameters are estimated separately for each contrasted group (<xref ref-type="bibr" rid="ref002">Muth&#x00E9;n, 1989</xref>). Additionally, several explanatory variables (e.g., demographic) can be included within a MIMIC model, allowing us to identify possible causes of DIF. An example of a MIMIC DIF model is shown in <xref rid="fig1" ref-type="fig">Figure 1</xref> (upper panel), in which a grouping variable (Gender) has direct effects on the items of the scale (e.g., AAT<sub>i</sub>) and the latent mean (e.g., scholastic achievement) simultaneously.</p>
<fig position="float" id="fig1"><label>Figure 1</label>
<caption>
<p>MIMIC and mediated MIMIC models for testing DIF effects. <bold>(A)</bold> The standard MIMIC approach to detecting DIF. <bold>(B)</bold> The mediated MIMIC approach to detecting DIF.</p>
</caption>
<graphic xlink:href="fpsyg-14-1268074-g001.tif"/>
</fig>
<p>Recently, <xref ref-type="bibr" rid="ref7">Cheng et al. (2016)</xref> proposed a method for detecting DIF items in which they combined the MIMIC methodology with mediation analysis to uncover possible causes of DIF effects. In mediation analysis, it is hypothesized that the independent variable (e.g., Gender) affects the dependent variable (e.g., the item AAT<sub>i</sub>) <italic>via</italic> an intervening variable called the mediator (e.g., GAT Score) (<xref ref-type="bibr" rid="ref2">Baron and Kenny, 1986</xref>). The effect of the mediator in the relationship between the independent and dependent variables can be either full (the direct relationship between Gender and AAT<sub>i</sub> disappears after the effect of the mediator is controlled) or partial (the mediator can only explain a part of the relationship between the Gender and AAT<sub>i</sub>). This relationship constitutes a uniform DIF and is graphically presented in <xref rid="fig1" ref-type="fig">Figure 1</xref> (lower panel).</p>
</sec>
<sec id="sec2"><label>2.</label>
<title>Research purpose and specific aims</title>
<p>Gender is often assumed to affect students&#x2019; academic performance considerably, since many studies have shown that boys and girls perform differently (e.g., <xref ref-type="bibr" rid="ref003">Voyer and Voyer, 2014</xref>). Nevertheless, not all studies agree on the direction and magnitude of this difference (e.g., <xref ref-type="bibr" rid="ref12">Else-Quest et al., 2010</xref>), and the gender gap in academic attainment remains an open question. This study uses gender as a grouping variable to examine possible DIF effects on academic achievement. It was hypothesized that the response to an AAT item (e.g., AAT<sub>i</sub>), which measures scholastic achievement (i.e., the latent variable), also involves some level of general cognitive ability (i.e., the mediator). Thus, cognitive ability, as measured by the General Aptitude Test (GAT), was expected to mediate, completely or partially, the relationship between gender and the response to an AAT item when controlling for scholastic achievement. In this study, only uniform DIF was examined.</p>
</sec>
<sec sec-type="methods" id="sec3"><label>3.</label>
<title>Methods</title>
<sec id="sec4"><label>3.1.</label>
<title>Participants and procedure</title>
<p>Previous simulation studies on Differential Item Functioning (DIF) and mediation analysis suggested that with a sample size of 1,000 or more and a mediation effect of 0.10 or larger, the analysis has enough power to provide robust results (<xref ref-type="bibr" rid="ref7">Cheng et al., 2016</xref>). Therefore, to reduce the effects of excessive power, a sample of 1,300 participants was randomly selected from a larger pool of 173,133 high school students who completed an achievement test as part of a national examination process. Of these, 648 (49.8%) were males, and 652 (50.2%) were females. The participants&#x2019; mean age was 17.99 (SD&#x2009;=&#x2009;0.53). In terms of place of residence, participants originated from all 13 regions of Saudi Arabia. The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (or Ethics Committee) of the Education &#x0026; Training Evaluation Commission (Approval Code: TR369-2023, Approval Date: 20/11/2022).</p>
</sec>
<sec id="sec5"><label>3.2.</label>
<title>Measures</title>
<sec id="sec6"><label>3.2.1.</label>
<title>The academic achievement test for the science track (AAT; education and training evaluation commission - ETEC)</title>
<p>The AAT is a 44-item admission test that measures achievement level in accordance with university study readiness standards. It consists of four subscales that focus on the general outcomes of the following first-, second-, and third-year courses of the secondary stage (grades 10, 11, and 12): Biology (12 items), Chemistry (10 items), Physics (10 items), and Mathematics (12 items). The AAT items are in a multiple-choice format and are scored as correct (1) or wrong (0). The test lasts 50 min and is presented in Arabic.</p>
</sec>
<sec id="sec7"><label>3.2.2.</label>
<title>General aptitude test (GAT) for science major (education and training evaluation commission - ETEC)</title>
<p>This is a general cognitive ability test developed in the Arabic language that measures analytical and deductive skills. It is composed of two cognitive domains: (a) language-related skills (68 items) and (b) numerical-related skills (52 items). Each domain comprises several subdomains, including word meaning, sentence completion, reading comprehension, arithmetic, analysis, geometry, etc. The global cognitive ability factor composed of the scores from the two domain scales was the only available score from this test in this study. All scores were transformed into standard scores (T-scores), with a range of 0&#x2013;100.</p>
</sec>
</sec>
<sec id="sec8"><label>3.3.</label>
<title>Data analysis</title>
<p>Before examining DIF effects and possible causes within the Structural Equation Modeling (SEM) framework, the measurement model specification of each of the four AAT scales was examined. The following goodness-of-fit indices were used: the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR). CFI and TLI values higher than 0.90 indicate an acceptable fit (with values &#x003E;0.95 being ideal), and RMSEA and SRMR values up to 0.08 indicate a reasonable fit (with values &#x003C;0.05 indicating an excellent fit; <xref ref-type="bibr" rid="ref14">Hu and Bentler, 1999</xref>).</p>
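<p>As an illustration only, the cut-offs described above can be written as a small decision rule. The following Python sketch is not part of the reported analysis; the function name classify_fit and the labels "excellent", "acceptable", and "poor" are hypothetical, and requiring all four indices to meet a cut-off at the same time is a simplifying assumption.</p>
<preformat>
```python
# Illustrative sketch: apply the fit cut-offs cited in the text.
# The function name and the three labels are hypothetical.

def classify_fit(cfi, tli, rmsea, srmr):
    """Label a model's fit from four goodness-of-fit indices."""
    # CFI and TLI: values above 0.90 are acceptable, above 0.95 ideal.
    incremental_ok = min(cfi, tli) > 0.90
    incremental_ideal = min(cfi, tli) > 0.95
    # RMSEA and SRMR: values up to 0.08 are reasonable, below 0.05 excellent.
    residual_ok = 0.08 >= max(rmsea, srmr)
    residual_excellent = 0.05 > max(rmsea, srmr)
    if incremental_ideal and residual_excellent:
        return "excellent"
    if incremental_ok and residual_ok:
        return "acceptable"
    return "poor"


# Index values for two scales as reported in Table 1.
print(classify_fit(0.973, 0.967, 0.019, 0.038))  # Biology
print(classify_fit(0.924, 0.903, 0.041, 0.057))  # Physics
```
</preformat>
<p>Under these assumed labels, the Biology values meet the stricter cut-offs, whereas the Physics values meet only the more lenient ones.</p>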
<p>Next, the MIMIC model approach was used to detect DIF items across the different AAT scales. The <italic>MIMIC model with scale purification</italic> (<italic>M-SP</italic>) method was used (<xref ref-type="bibr" rid="ref24">Wang and Shih, 2010</xref>) for each scale separately. In this approach, the direct effect of the grouping variable (e.g., gender) on an item response (e.g., AAT<italic>
<sub>i</sub>
</italic>) is estimated. In <xref rid="fig1" ref-type="fig">Figure 1</xref> (upper panel), this relationship is represented by a direct path from Gender to item AAT<italic>
<sub>i</sub>
</italic>. The direct effect represents the difference in item response between the two levels of the grouping variable (i.e., males vs. females) at the same level of scholastic achievement (the latent variable). A significant direct effect indicates a DIF effect. The indirect effect is represented by a path from the grouping variable to the latent variable and indicates whether the mean of the latent variable differs across groups. The same procedure was followed for all AAT items, one at a time. A Bonferroni correction was adopted to control the Type I error rate (<xref ref-type="bibr" rid="ref11">Dunn, 1961</xref>).</p>
<p>After identifying DIF items, the mediated MIMIC approach was used to uncover possible causes of the emerging DIF effects. As discussed earlier, a mediator (e.g., GAT score) can mediate the relationship between group membership (e.g., gender) and an item response (AAT<italic>
<sub>i</sub>
</italic>), conditioning on the latent trait (e.g., scholastic achievement). Therefore, when a DIF item (identified in the previous analysis step) is fitted in the mediation model, direct and indirect effects are obtained for that item. If the indirect effect (from the grouping variable to the mediator and then to the item) is significant and the direct effect (from the grouping variable to the item) becomes non-significant once the mediator (i.e., GAT score) is taken into account, we have full mediation. This means that the mediator fully explains the DIF effect. On the other hand, if the direct effect is still significant when the mediator is entered into the model and the indirect effect is also significant, we have partial mediation. In this case, the mediator explains the DIF effect to some extent, but additional mediators may be needed to explain it fully. All analyses were conducted using Mplus 8.03 (<xref ref-type="bibr" rid="ref17">Muth&#x00E9;n and Muth&#x00E9;n, 1998-2018</xref>).</p>
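<p>The decision rule for full versus partial mediation can be sketched in a few lines of Python. This is a hedged illustration and not the Mplus code used in the study: the function name classify_mediation, the label "none", and the default significance level of 0.05 are assumptions introduced here, and the p-values passed in are hypothetical.</p>
<preformat>
```python
# Illustrative sketch of the full/partial mediation rule described above.
# Function name, the "none" label, and alpha = 0.05 are assumptions.

def classify_mediation(p_direct, p_indirect, alpha=0.05):
    """Classify mediation from the p-values of the direct and indirect effects."""
    direct_significant = alpha > p_direct
    indirect_significant = alpha > p_indirect
    if indirect_significant and not direct_significant:
        # The mediator accounts for the whole DIF effect.
        return "full"
    if indirect_significant and direct_significant:
        # The mediator accounts for only part of the DIF effect.
        return "partial"
    # No significant indirect effect: the mediator does not explain the DIF.
    return "none"


# Hypothetical p-values, for illustration only.
print(classify_mediation(p_direct=0.210, p_indirect=0.003))
print(classify_mediation(p_direct=0.004, p_indirect=0.010))
```
</preformat>
<p>With these hypothetical inputs, the first call corresponds to full mediation and the second to partial mediation.</p>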
</sec>
</sec>
<sec sec-type="results" id="sec9"><label>4.</label>
<title>Results</title>
<p>First, the measurement model of each AAT scale (i.e., Biology, Chemistry, Physics, and Mathematics) was examined <italic>via</italic> CFA. A unidimensional structure for each scale was hypothesized. In <xref rid="tab1" ref-type="table">Table 1</xref>, the results from the CFA are reported. The results showed that all measurement models fit the data very well.</p>
<table-wrap position="float" id="tab1"><label>Table 1</label>
<caption>
<p>Model fit indices for AAT scales.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Scales</th>
<th align="center" valign="top">&#x03C7;<sup>2</sup></th>
<th align="center" valign="top"><italic>df</italic></th>
<th align="center" valign="top">CFI</th>
<th align="center" valign="top">TLI</th>
<th align="center" valign="top">RMSEA (95% CIs)</th>
<th align="center" valign="top">SRMR</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Biology</td>
<td align="left" valign="top">79.610<sup>&#x002A;</sup></td>
<td align="center" valign="top">54</td>
<td align="char" valign="top" char=".">0.973</td>
<td align="char" valign="top" char=".">0.967</td>
<td align="char" valign="top" char="(">0.019 (0.009&#x2013;0.028)</td>
<td align="char" valign="top" char=".">0.038</td>
</tr>
<tr>
<td align="left" valign="top">Chemistry</td>
<td align="left" valign="top">77.599<sup>&#x002A;&#x002A;</sup></td>
<td align="center" valign="top">35</td>
<td align="char" valign="top" char=".">0.984</td>
<td align="char" valign="top" char=".">0.980</td>
<td align="char" valign="top" char="(">0.031 (0.021&#x2013;0.040)</td>
<td align="char" valign="top" char=".">0.043</td>
</tr>
<tr>
<td align="left" valign="top">Physics</td>
<td align="left" valign="top">111.354<sup>&#x002A;&#x002A;</sup></td>
<td align="center" valign="top">35</td>
<td align="char" valign="top" char=".">0.924</td>
<td align="char" valign="top" char=".">0.903</td>
<td align="char" valign="top" char="(">0.041 (0.033&#x2013;0.050)</td>
<td align="char" valign="top" char=".">0.057</td>
</tr>
<tr>
<td align="left" valign="top">Mathematics</td>
<td align="left" valign="top">84.707<sup>&#x002A;&#x002A;</sup></td>
<td align="center" valign="top">54</td>
<td align="char" valign="top" char=".">0.985</td>
<td align="char" valign="top" char=".">0.981</td>
<td align="char" valign="top" char="(">0.021 (0.012&#x2013;0.029)</td>
<td align="char" valign="top" char=".">0.037</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>&#x03C7;</italic><sup>2</sup>, chi-square goodness of fit statistic; <italic>df</italic>, degrees of freedom; CFI, Comparative Fit Index; TLI, Tucker Lewis Index; RMSEA, Root Mean Square Error of Approximation; 95% CIs, 95% Confidence Intervals; SRMR, Standardized Root Mean Square Residual. <sup>&#x002A;&#x002A;</sup> Models are significant at <italic>p</italic>&#x2009;&#x003C;&#x2009;0.001; <sup>&#x002A;</sup> Models are significant at <italic>p</italic>&#x2009;&#x003C;&#x2009;0.01.</p>
</table-wrap-foot>
</table-wrap>
<p>Next, the MIMIC approach was applied to detect uniform DIF items across gender for all AAT scales. To identify DIF items, every item within each scale was regressed on the grouping variable, with all other items presumed to be non-DIF items and serving as the anchor set. In the grouping variable (i.e., gender), males were coded as 0 (the reference group) and females as 1 (the focal group). A negative <italic>z</italic> value therefore indicates that males are more likely than females at the same level of scholastic achievement to respond to the item correctly. To identify potential DIF items, the following equation was applied:<disp-formula id="E1">
<mml:math id="M1">
<mml:msub>
<mml:mi>Y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>&#x03BB;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x2217;</mml:mo>
<mml:msub>
<mml:mi>&#x03B8;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</disp-formula></p>
<p>Where,</p>
<p>Y<italic>
<sub>ij</sub>
</italic>&#x2009;=&#x2009;the latent response for item <italic>j</italic> for participant <italic>i.</italic></p>
<p>&#x03BB;<italic>
<sub>j</sub>
</italic>&#x2009;=&#x2009;the factor loading of item <italic>j.</italic></p>
<p>&#x03B8;<italic>
<sub>i</sub>
</italic>&#x2009;=&#x2009;the latent ability of the participant <italic>i.</italic></p>
<p>z<italic>
<sub>i</sub>
</italic>&#x2009;=&#x2009;the grouping indicator of the participant <italic>i.</italic></p>
<p>&#x03B2;<italic>
<sub>j</sub>
</italic>&#x2009;=&#x2009;the regression coefficient of the corresponding grouping variable, and</p>
<p>e<italic>
<sub>ij</sub>
</italic>&#x2009;=&#x2009;the random error term.</p>
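<p>To make the role of each term concrete, the equation can be evaluated for two participants who share the same latent ability but belong to different groups. The Python sketch below is illustrative only: the function name latent_response and all parameter values are hypothetical, and the error term is set to zero so that the group difference is easy to read.</p>
<preformat>
```python
# Illustrative sketch of the MIMIC DIF equation:
#   Y_ij = lambda_j * theta_i + beta_j * z_i + e_ij
# All parameter values below are hypothetical.

def latent_response(loading, theta, beta, z, error=0.0):
    """Latent response of one participant to one item."""
    return loading * theta + beta * z + error


# Two participants with the same latent ability but different group codes
# (0 = reference group, 1 = focal group).
theta = 0.5
reference = latent_response(loading=0.7, theta=theta, beta=-0.2, z=0)
focal = latent_response(loading=0.7, theta=theta, beta=-0.2, z=1)

# With equal ability and no error, the gap between groups equals beta.
print(round(focal - reference, 3))
```
</preformat>
<p>Because the gap does not depend on the ability level, a non-zero regression coefficient in this sketch corresponds to uniform DIF.</p>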
<p>If &#x03B2;<italic>
<sub>j</sub>
</italic> is non-significant, item <italic>j</italic> functions equivalently across the groups defined by z<italic>
<sub>i</sub>
</italic>. However, if &#x03B2;<italic>
<sub>j</sub>
</italic> is significant, the response probabilities differ across the groups defined by z<italic>
<sub>i</sub>
</italic> at the same level of the latent trait, and item <italic>j</italic> is flagged as a DIF item. In practice, DIF is detected when the direct path from the grouping variable (gender) to the item in question is statistically significant. The Benjamini-Hochberg correction was applied to control the false discovery rate (<xref ref-type="bibr" rid="ref3">Benjamini and Hochberg, 1995</xref>). <xref rid="tab2" ref-type="table">Table 2</xref> presents the results from the DIF analysis.</p>
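<p>For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the step-up rule can be sketched as follows. This Python example is a generic illustration, not the code used in the study: the function name benjamini_hochberg, the false discovery rate level of 0.05, and the p-values are assumptions chosen for demonstration.</p>
<preformat>
```python
# Illustrative sketch of the Benjamini-Hochberg step-up procedure.
# The function name, q = 0.05, and the p-values are hypothetical.

def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda k: p_values[k])
    # Find the largest rank whose p-value is at or below q * rank / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if q * rank / m >= p_values[idx]:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if cutoff_rank >= rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four items, for illustration only.
print(benjamini_hochberg([0.001, 0.02, 0.03, 0.5]))
```
</preformat>
<p>In this hypothetical example, the first three items are flagged, whereas a Bonferroni threshold of 0.05/4 would flag only the first, which illustrates why the choice of correction matters for the number of DIF items identified.</p>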
<table-wrap position="float" id="tab2"><label>Table 2</label>
<caption>
<p>MIMIC examination for DIF across gender.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="top">Items</th>
<th align="center" valign="top">Estimate (&#x03B2;)</th>
<th align="center" valign="top">S.E.</th>
<th align="center" valign="top"><italic>z</italic> value</th>
<th align="center" valign="top"><italic>p</italic>-value</th>
</tr>
<tr>
<th align="center" valign="top" colspan="5">Biology scale</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Bio1</td>
<td align="char" valign="top" char=".">&#x2212;0.044</td>
<td align="char" valign="top" char=".">0.036</td>
<td align="char" valign="top" char=".">&#x2212;1.226</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Bio2</td>
<td align="char" valign="top" char=".">&#x2212;0.008</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">&#x2212;0.255</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Bio3</td>
<td align="char" valign="top" char=".">0.042</td>
<td align="char" valign="top" char=".">0.031</td>
<td align="char" valign="top" char=".">1.333</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Bio4</td>
<td align="char" valign="top" char=".">0.028</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">0.867</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Bio5</td>
<td align="char" valign="top" char=".">&#x2212;0.055</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">&#x2212;1.697</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Bio6</td>
<td align="char" valign="top" char=".">&#x2212;0.013</td>
<td align="char" valign="top" char=".">0.033</td>
<td align="char" valign="top" char=".">&#x2212;0.405</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Bio7</td>
<td align="char" valign="top" char=".">&#x2212;0.099</td>
<td align="char" valign="top" char=".">0.034</td>
<td align="char" valign="top" char=".">&#x2212;2.888</td>
<td align="center" valign="top">0.004</td>
</tr>
<tr>
<td align="left" valign="top">Bio8</td>
<td align="char" valign="top" char=".">0.095</td>
<td align="char" valign="top" char=".">0.031</td>
<td align="char" valign="top" char=".">3.027</td>
<td align="center" valign="top">0.002</td>
</tr>
<tr>
<td align="left" valign="top">Bio9</td>
<td align="char" valign="top" char=".">0.064</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">2.015</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Bio10</td>
<td align="char" valign="top" char=".">&#x2212;0.088</td>
<td align="char" valign="top" char=".">0.037</td>
<td align="char" valign="top" char=".">&#x2212;2.393</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Bio11</td>
<td align="char" valign="top" char=".">0.034</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">1.059</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Bio12</td>
<td align="char" valign="top" char=".">0.031</td>
<td align="char" valign="top" char=".">0.031</td>
<td align="char" valign="top" char=".">0.974</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top" colspan="5">Chemistry scale</td>
</tr>
<tr>
<td align="left" valign="top">Chem13</td>
<td align="char" valign="top" char=".">0.116</td>
<td align="char" valign="top" char=".">0.036</td>
<td align="char" valign="top" char=".">3.214</td>
<td align="center" valign="top">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Chem14</td>
<td align="char" valign="top" char=".">&#x2212;0.012</td>
<td align="char" valign="top" char=".">0.033</td>
<td align="char" valign="top" char=".">&#x2212;0.372</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Chem15</td>
<td align="char" valign="top" char=".">&#x2212;0.121</td>
<td align="char" valign="top" char=".">0.037</td>
<td align="char" valign="top" char=".">&#x2212;3.316</td>
<td align="center" valign="top">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Chem16</td>
<td align="char" valign="top" char=".">0.050</td>
<td align="char" valign="top" char=".">0.038</td>
<td align="char" valign="top" char=".">1.322</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Chem17</td>
<td align="char" valign="top" char=".">0.046</td>
<td align="char" valign="top" char=".">0.031</td>
<td align="char" valign="top" char=".">1.481</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Chem18</td>
<td align="char" valign="top" char=".">&#x2212;0.100</td>
<td align="char" valign="top" char=".">0.034</td>
<td align="char" valign="top" char=".">&#x2212;2.910</td>
<td align="center" valign="top">0.004</td>
</tr>
<tr>
<td align="left" valign="top">Chem19</td>
<td align="char" valign="top" char=".">0.065</td>
<td align="char" valign="top" char=".">0.031</td>
<td align="char" valign="top" char=".">2.0101</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Chem20</td>
<td align="char" valign="top" char=".">&#x2212;0.080</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">&#x2212;2.456</td>
<td align="center" valign="top">0.014</td>
</tr>
<tr>
<td align="left" valign="top">Chem21</td>
<td align="char" valign="top" char=".">&#x2212;0.022</td>
<td align="char" valign="top" char=".">0.033</td>
<td align="char" valign="top" char=".">&#x2212;0.668</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Chem22</td>
<td align="char" valign="top" char=".">0.023</td>
<td align="char" valign="top" char=".">0.034</td>
<td align="char" valign="top" char=".">0.067</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top" colspan="5">Physics scale</td>
</tr>
<tr>
<td align="left" valign="top">Phys23</td>
<td align="char" valign="top" char=".">0.056</td>
<td align="char" valign="top" char=".">0.034</td>
<td align="char" valign="top" char=".">1.638</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Phys24</td>
<td align="char" valign="top" char=".">0.018</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">0.570</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Phys25</td>
<td align="char" valign="top" char=".">&#x2212;0.166</td>
<td align="char" valign="top" char=".">0.040</td>
<td align="char" valign="top" char=".">&#x2212;4.114</td>
<td align="center" valign="top">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Phys26</td>
<td align="char" valign="top" char=".">&#x2212;0.177</td>
<td align="char" valign="top" char=".">0.041</td>
<td align="char" valign="top" char=".">&#x2212;4.330</td>
<td align="center" valign="top">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Phys27</td>
<td align="char" valign="top" char=".">&#x2212;0.083</td>
<td align="char" valign="top" char=".">0.045</td>
<td align="char" valign="top" char=".">&#x2212;1.845</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Phys28</td>
<td align="char" valign="top" char=".">&#x2212;0.117</td>
<td align="char" valign="top" char=".">0.037</td>
<td align="char" valign="top" char=".">&#x2212;3.199</td>
<td align="center" valign="top">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Phys29</td>
<td align="char" valign="top" char=".">0.140</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">4.409</td>
<td align="center" valign="top">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Phys30</td>
<td align="char" valign="top" char=".">&#x2212;0.048</td>
<td align="char" valign="top" char=".">0.035</td>
<td align="char" valign="top" char=".">&#x2212;1.371</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Phys31</td>
<td align="char" valign="top" char=".">0.186</td>
<td align="char" valign="top" char=".">0.031</td>
<td align="char" valign="top" char=".">5.921</td>
<td align="center" valign="top">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Phys32</td>
<td align="char" valign="top" char=".">&#x2212;0.063</td>
<td align="char" valign="top" char=".">0.037</td>
<td align="char" valign="top" char=".">&#x2212;1.689</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top" colspan="5">Mathematics scale</td>
</tr>
<tr>
<td align="left" valign="top">Math33</td>
<td align="char" valign="top" char=".">0.023</td>
<td align="char" valign="top" char=".">0.033</td>
<td align="char" valign="top" char=".">0.0700</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Math34</td>
<td align="char" valign="top" char=".">&#x2212;0.128</td>
<td align="char" valign="top" char=".">0.031</td>
<td align="char" valign="top" char=".">&#x2212;4.143</td>
<td align="center" valign="top">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Math35</td>
<td align="char" valign="top" char=".">&#x2212;0.068</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">&#x2212;2.146</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Math36</td>
<td align="char" valign="top" char=".">0.077</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">2.391</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Math37</td>
<td align="char" valign="top" char=".">&#x2212;0.042</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">&#x2212;1.319</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Math38</td>
<td align="char" valign="top" char=".">&#x2212;0.008</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">&#x2212;0.258</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Math39</td>
<td align="char" valign="top" char=".">&#x2212;0.023</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">&#x2212;0.718</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Math40</td>
<td align="char" valign="top" char=".">&#x2212;0.029</td>
<td align="char" valign="top" char=".">0.032</td>
<td align="char" valign="top" char=".">&#x2212;0.913</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Math41</td>
<td align="char" valign="top" char=".">0.052</td>
<td align="char" valign="top" char=".">0.031</td>
<td align="char" valign="top" char=".">1.706</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Math42</td>
<td align="char" valign="top" char=".">0.023</td>
<td align="char" valign="top" char=".">0.041</td>
<td align="char" valign="top" char=".">0.552</td>
<td align="center" valign="top">ns</td>
</tr>
<tr>
<td align="left" valign="top">Math43</td>
<td align="char" valign="top" char=".">0.085</td>
<td align="char" valign="top" char=".">0.030</td>
<td align="char" valign="top" char=".">2.784</td>
<td align="center" valign="top">0.005</td>
</tr>
<tr>
<td align="left" valign="top">Math44</td>
<td align="char" valign="top" char=".">0.012</td>
<td align="char" valign="top" char=".">0.033</td>
<td align="char" valign="top" char=".">0.375</td>
<td align="center" valign="top">ns</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Bio, Biology; Chem, Chemistry; Phys, Physics; Math, Mathematics.</p>
</table-wrap-foot>
</table-wrap>
<p>The analysis uncovered 13 DIF items. For example, in the Biology scale, items 7 and 8 were detected as DIF items. In item 7, the negative z value (&#x2212;2.888) indicates that, controlling for scholastic achievement, a male participant is more likely to respond correctly than a female participant. In item 8, by contrast, the positive z value indicates that female participants are more likely to respond correctly than male participants at the same level of scholastic achievement.</p>
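<p>As a minimal illustration of how the z values from the DIF analysis can be read, the Python sketch below converts a z statistic into a two-sided <italic>p</italic>-value under a standard normal reference distribution and reports the favoured group using the sign convention described in the text (negative values favour males). The function names are hypothetical, and the normal approximation is an assumption rather than a description of how the reported <italic>p</italic>-values were produced.</p>
<preformat>
```python
import math


def wald_z(estimate, std_error):
    """Wald z statistic: the estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(z_value):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z_value) / math.sqrt(2.0))


def favoured_group(z_value):
    """Sign convention taken from the text: negative z favours males."""
    return "female" if z_value > 0 else "male"


# (item, z) pairs copied from the DIF results table
reported = [("Chem18", -2.910), ("Chem20", -2.456), ("Math43", 2.784)]
for item, z_value in reported:
    print(item, round(two_sided_p(z_value), 3), favoured_group(z_value))

# Recomputing z from the rounded estimate and SE only approximates
# the tabulated value, because both inputs are rounded to three decimals.
print("Chem18 z from rounded inputs:", round(wald_z(-0.100, 0.034), 3))
```
</preformat>
<p>For the three items shown, the rounded two-sided values agree with the tabulated <italic>p</italic>-values (0.004, 0.014, and 0.005). The ratio recomputed from the rounded estimate and standard error for Chem18 (about &#x2212;2.94) differs slightly from the tabulated z (&#x2212;2.910), as expected when working from rounded inputs.</p>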
<p>After this step, the mediated MIMIC approach was applied to understand what causes DIF in these items. It was hypothesized that general cognitive ability (as measured by the GAT) mediates the relationship between the grouping variable and the response to a specific item. <xref rid="tab3" ref-type="table">Table 3</xref> presents the results of the mediation analysis within a MIMIC model.</p>
<table-wrap position="float" id="tab3"><label>Table 3</label>
<caption>
<p>Direct and indirect (mediation) effects for DIF items.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Item</th>
<th align="center" valign="top">Direct effect</th>
<th align="center" valign="top"><italic>p</italic>-value</th>
<th align="center" valign="top">Indirect effect</th>
<th align="center" valign="top"><italic>p</italic>-value</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Bio7</td>
<td align="char" valign="top" char=".">&#x2212;0.107</td>
<td align="char" valign="top" char=".">0.002</td>
<td align="char" valign="top" char=".">0.019</td>
<td align="char" valign="top" char=".">0.006</td>
</tr>
<tr>
<td align="left" valign="top">Bio8</td>
<td align="char" valign="top" char=".">0.090</td>
<td align="char" valign="top" char=".">0.005</td>
<td align="char" valign="top" char=".">0.022</td>
<td align="char" valign="top" char=".">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Chem13</td>
<td align="char" valign="top" char=".">0.113</td>
<td align="char" valign="top" char=".">0.003</td>
<td align="char" valign="top" char=".">0.039</td>
<td align="char" valign="top" char=".">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Chem15</td>
<td align="char" valign="top" char=".">&#x2212;0.092</td>
<td align="char" valign="top" char=".">0.017</td>
<td align="char" valign="top" char=".">0.026</td>
<td align="char" valign="top" char=".">0.004</td>
</tr>
<tr>
<td align="left" valign="top">Chem18</td>
<td align="char" valign="top" char=".">&#x2212;0.056</td>
<td align="char" valign="top" char=".">0.121</td>
<td align="char" valign="top" char=".">0.040</td>
<td align="char" valign="top" char=".">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Chem20</td>
<td align="char" valign="top" char=".">&#x2212;0.048</td>
<td align="char" valign="top" char=".">0.155</td>
<td align="char" valign="top" char=".">0.033</td>
<td align="char" valign="top" char=".">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Phys25</td>
<td align="char" valign="top" char=".">&#x2212;0.166</td>
<td align="char" valign="top" char=".">0.001</td>
<td align="char" valign="top" char=".">0.023</td>
<td align="char" valign="top" char=".">0.003</td>
</tr>
<tr>
<td align="left" valign="top">Phys26</td>
<td align="char" valign="top" char=".">&#x2212;0.178</td>
<td align="char" valign="top" char=".">0.001</td>
<td align="char" valign="top" char=".">&#x2212;0.003</td>
<td align="char" valign="top" char=".">0.595</td>
</tr>
<tr>
<td align="left" valign="top">Phys28</td>
<td align="char" valign="top" char=".">&#x2212;0.114</td>
<td align="char" valign="top" char=".">0.002</td>
<td align="char" valign="top" char=".">0.036</td>
<td align="char" valign="top" char=".">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Phys29</td>
<td align="char" valign="top" char=".">0.142</td>
<td align="char" valign="top" char=".">0.001</td>
<td align="char" valign="top" char=".">0.029</td>
<td align="char" valign="top" char=".">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Phys31</td>
<td align="char" valign="top" char=".">0.190</td>
<td align="char" valign="top" char=".">0.001</td>
<td align="char" valign="top" char=".">0.020</td>
<td align="char" valign="top" char=".">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Math34</td>
<td align="char" valign="top" char=".">&#x2212;0.136</td>
<td align="char" valign="top" char=".">0.001</td>
<td align="char" valign="top" char=".">0.021</td>
<td align="char" valign="top" char=".">0.002</td>
</tr>
<tr>
<td align="left" valign="top">Math43</td>
<td align="char" valign="top" char=".">0.081</td>
<td align="char" valign="top" char=".">0.010</td>
<td align="char" valign="top" char=".">0.025</td>
<td align="char" valign="top" char=".">0.001</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Bio, Biology; Chem, Chemistry; Phys, Physics; Math, Mathematics.</p>
</table-wrap-foot>
</table-wrap>
<p>The results suggest that cognitive ability can help explain why an AAT item exhibits DIF across gender. GAT fully explains the DIF effect in two AAT items (i.e., Chem18 and Chem20), since the direct effect is no longer significant once the mediator enters the equation (full mediation). In both cases, the effect of the GAT score on the probability of a correct response is positive (a<sub>7</sub>&#x2009;=&#x2009;0.323, SE&#x2009;=&#x2009;0.048, z&#x2009;=&#x2009;6.723, <italic>p</italic>&#x2009;=&#x2009;0.001, and a<sub>8</sub>&#x2009;=&#x2009;0.265, SE&#x2009;=&#x2009;0.034, z&#x2009;=&#x2009;6.074, <italic>p</italic>&#x2009;=&#x2009;0.001, respectively). This means that the higher the GAT score, the higher the probability of answering the item correctly. However, the direct effect on both items is negative, although not significant (&#x03B2;<sub>7</sub>&#x2009;=&#x2009;&#x2212;0.056, SE&#x2009;=&#x2009;0.036, <italic>p</italic>&#x2009;=&#x2009;0.121, and &#x03B2;<sub>8</sub>&#x2009;=&#x2009;&#x2212;0.048, SE&#x2009;=&#x2009;0.034, <italic>p</italic>&#x2009;=&#x2009;0.155). The sign of this residual effect suggests that females with the same GAT score are less likely than males to answer these items correctly.</p>
<p>In most other cases, GAT accounts for only a proportion of the DIF effect (partial mediation). This suggests that additional factors intervene in the relationship between gender and answering an item correctly and contribute to the DIF effects. Only in one case (i.e., Phys26) could GAT not explain why male participants are more likely to respond correctly to the item than female participants, although both are at the same underlying level of cognitive ability. Interestingly, males were more likely than females to respond correctly to some items (i.e., Bio7, Chem15, Chem18, Chem20, Phys28, and Math34). However, when the GAT score was taken into account (i.e., as a mediator), the probability of correctly answering these items was higher for females than for males.</p>
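<p>The distinction between full, partial, and no mediation drawn above can be sketched as a simple decision rule applied to the <italic>p</italic>-values in Table 3. The Python snippet below is only an illustration: the 0.05 threshold and the function name <monospace>mediation_type</monospace> are assumptions made for this example, not settings taken from the analysis.</p>
<preformat>
```python
ALPHA = 0.05  # assumed significance threshold, not stated in this section

# (item, direct effect, direct p, indirect effect, indirect p) from Table 3
TABLE_3 = [
    ("Bio7", -0.107, 0.002, 0.019, 0.006),
    ("Bio8", 0.090, 0.005, 0.022, 0.001),
    ("Chem13", 0.113, 0.003, 0.039, 0.001),
    ("Chem15", -0.092, 0.017, 0.026, 0.004),
    ("Chem18", -0.056, 0.121, 0.040, 0.001),
    ("Chem20", -0.048, 0.155, 0.033, 0.001),
    ("Phys25", -0.166, 0.001, 0.023, 0.003),
    ("Phys26", -0.178, 0.001, -0.003, 0.595),
    ("Phys28", -0.114, 0.002, 0.036, 0.001),
    ("Phys29", 0.142, 0.001, 0.029, 0.001),
    ("Phys31", 0.190, 0.001, 0.020, 0.001),
    ("Math34", -0.136, 0.001, 0.021, 0.002),
    ("Math43", 0.081, 0.010, 0.025, 0.001),
]


def mediation_type(p_direct, p_indirect, alpha=ALPHA):
    """Classify one item from the significance of its two effects."""
    indirect_significant = alpha > p_indirect
    direct_significant = alpha > p_direct
    if not indirect_significant:
        return "none"  # the mediator does not carry the group effect
    # A significant indirect effect with a remaining direct effect is
    # partial mediation; without a remaining direct effect it is full.
    return "partial" if direct_significant else "full"


by_type = {"full": [], "partial": [], "none": []}
for item, _direct, p_direct, _indirect, p_indirect in TABLE_3:
    by_type[mediation_type(p_direct, p_indirect)].append(item)

for label, items in by_type.items():
    print(label, items)
```
</preformat>
<p>Under the assumed threshold, this rule reproduces the pattern described in the text: Chem18 and Chem20 fall under full mediation, Phys26 shows no mediation, and the remaining ten items show partial mediation.</p>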
</sec>
<sec sec-type="discussion" id="sec10"><label>5.</label>
<title>Discussion</title>
<p>The aim of this study was twofold. The first aim was to examine whether there are gender differences in the probability of correctly answering an item of the AAT, that is, whether there are DIF items with respect to gender. The second was to understand the underlying mechanism that causes these DIF effects. The first aim, detecting DIF items, was examined <italic>via</italic> a MIMIC approach. MIMIC models have been used extensively for identifying items with DIF (<xref ref-type="bibr" rid="ref16">Muth&#x00E9;n, 1985</xref>), since they have been found to perform as well as other methods (<xref ref-type="bibr" rid="ref25">Woods, 2009</xref>). This study used a MIMIC model to detect possible DIF items across gender for a scholastic achievement test (i.e., AAT). The analysis revealed that 13 AAT items exhibited DIF across gender (i.e., two from the Biology scale, four from the Chemistry scale, five from the Physics scale, and two from the Mathematics scale). Furthermore, in most of these items (9 out of 13), male participants were more likely to answer correctly than their female counterparts.</p>
<p>The second aim of this study, to uncover possible causes of DIF, was examined <italic>via</italic> the mediated MIMIC approach. Mediation analysis is a statistical method that provides a framework for understanding why certain relationships among variables occur. Using this analysis within a MIMIC model for detecting DIF, we can explore possible underlying mechanisms that explain these DIF effects. It was hypothesized that general cognitive ability, as measured by the General Aptitude Test (GAT), could mediate the relationship between the grouping variable (e.g., gender) and the response to a specific item. If a mediation effect exists, we can explain why a DIF effect occurs, depending on the type of mediation (full or partial).</p>
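<p>The logic of splitting a group effect into a direct and an indirect (mediated) part can be illustrated with a small simulation. The Python sketch below is a deliberately simplified linear example with a continuous outcome and ordinary least squares. It is not the mediated MIMIC model estimated in this study, which involves a latent trait and categorical item responses. All population values, variable names, and the random seed are hypothetical.</p>
<preformat>
```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    """Covariance of two lists (the divisor cancels in the ratios below)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(2023)
n = 5000
# Hypothetical population values, chosen only for illustration
A_TRUE, B_TRUE, DIRECT_TRUE = 0.5, 0.4, -0.3

group = [rng.randint(0, 1) for _ in range(n)]  # 0/1 grouping variable
mediator = [A_TRUE * g + rng.gauss(0, 1) for g in group]
outcome = [DIRECT_TRUE * g + B_TRUE * m + rng.gauss(0, 1)
           for g, m in zip(group, mediator)]

s_gg, s_gm, s_mm = cov(group, group), cov(group, mediator), cov(mediator, mediator)
s_gy, s_my = cov(group, outcome), cov(mediator, outcome)

# Path a: mediator regressed on group
a = s_gm / s_gg
# Total effect: outcome regressed on group alone
total = s_gy / s_gg

# Direct effect and path b: outcome regressed on group and mediator,
# solved from the 2x2 normal equations by Cramer's rule
det = s_gg * s_mm - s_gm ** 2
direct = (s_gy * s_mm - s_gm * s_my) / det
b = (s_gg * s_my - s_gm * s_gy) / det

# Product-of-coefficients estimate of the indirect effect
indirect = a * b

print("direct:", round(direct, 3))
print("indirect:", round(indirect, 3))
print("total:", round(total, 3))
```
</preformat>
<p>In this linear setting the total effect equals the sum of the direct and indirect effects, so a negative direct effect and a positive indirect effect can partly offset each other. With categorical outcomes this decomposition holds only approximately, which is one reason the sketch should not be read as a substitute for the model used here.</p>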
<p>The results from this study showed that general cognitive ability fully explains the DIF effect in two AAT items (i.e., Chem18 and Chem20). In most other cases, GAT accounts for only a proportion of the DIF effect (partial mediation), which suggests that additional factors intervene in the relationship between gender and answering an item correctly and contribute to DIF effects. Of all detected DIF items, Phys26 was the only one for which GAT could not explain why the DIF effect occurred.</p>
<p>This study offers valuable information regarding DIF effects and their possible causes. Using the MIMIC approach, DIF effects were examined within the mediation analysis framework. The analysis revealed that general cognitive ability mediates the relationship between gender and the probability of success in an item, which provides a context for understanding why DIF effects occurred. This approach can therefore help experts improve the quality of their instruments by identifying DIF items and deciding how to revise them, considering the mediator&#x2019;s effect on participants&#x2019; responses. Taking the Biology scale as an example, when Subject Matter Experts (SMEs) are asked to generate items, they should pay careful attention to producing items that are purely related to specific knowledge (i.e., biology) rather than general cognitive ability.</p>
<p>The present study also has certain limitations. First, only GAT scores were available as potential mediators. Future studies should explore the role of other variables, including cognitive (e.g., GPA) and emotional (e.g., self-efficacy) constructs, that could explain the emergence of DIF effects. Second, only gender was examined as a potential grouping variable. In future studies, additional variables (e.g., type of school: public vs. private) could be examined as potential causes of DIF. Finally, only uniform DIF was investigated in this study. Future work could extend this approach to non-uniform DIF, which concerns whether an item discriminates differently between the groups in question. Important information about non-uniform DIF effects could be revealed by conceptualizing DIF within the context of moderated mediation analysis (<xref ref-type="bibr" rid="ref001">Montoya and Jeon, 2020</xref>).</p>
</sec>
<sec sec-type="data-availability" id="sec11">
<title>Data availability statement</title>
<p>The data analyzed in this study are subject to the following licenses/restrictions: The data that support the findings of this study are available from the Education and Training Evaluation Commission (ETEC). Restrictions apply to the availability of these data, which were used under license for this study. Data are available from the authors upon reasonable request and with the permission of the ETEC. Requests to access these datasets should be directed to MA, <email>m.ahmadi@etec.gov.sa</email>.</p>
</sec>
<sec sec-type="author-contributions" id="sec12">
<title>Author contributions</title>
<p>IT: Conceptualization, Formal analysis, Methodology, Writing &#x2013; original draft, Writing &#x2013; review &#x0026; editing. MA: Methodology, Writing &#x2013; review &#x0026; editing. HA: Data curation, Writing &#x2013; review &#x0026; editing.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="sec13">
<title>Funding</title>
<p>The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Education &#x0026; Training Evaluation Commission (ETEC).</p>
</sec>
<sec sec-type="COI-statement" id="sec14">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec100" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="sec15">
<title>Supplementary material</title>
<p>The Supplementary material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1268074/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1268074/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table_1.DOCX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ackerman</surname> <given-names>T. A.</given-names></name> <name><surname>Evans</surname> <given-names>J. A.</given-names></name></person-group> (<year>1994</year>). <article-title>The influence of conditioning scores in performing DIF analyses</article-title>. <source>Appl. Psychol. Meas.</source> <volume>18</volume>, <fpage>329</fpage>&#x2013;<lpage>342</lpage>.</citation></ref>
<ref id="ref2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baron</surname> <given-names>R. M.</given-names></name> <name><surname>Kenny</surname> <given-names>D. A.</given-names></name></person-group> (<year>1986</year>). <article-title>The moderator&#x2013;mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations</article-title>. <source>J. Pers. Soc. Psychol.</source> <volume>51</volume>, <fpage>1173</fpage>&#x2013;<lpage>1182</lpage>. doi: <pub-id pub-id-type="doi">10.1037/0022-3514.51.6.1173</pub-id></citation></ref>
<ref id="ref3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benjamini</surname> <given-names>Y.</given-names></name> <name><surname>Hochberg</surname> <given-names>Y.</given-names></name></person-group> (<year>1995</year>). <article-title>Controlling the false discovery rate: a practical and powerful approach to multiple testing</article-title>. <source>J. R. Stat. Soc. Ser. B</source> <volume>57</volume>, <fpage>289</fpage>&#x2013;<lpage>300</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.2517-6161.1995.tb02031.x</pub-id></citation></ref>
<ref id="ref5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Camilli</surname> <given-names>G.</given-names></name> <name><surname>Shepard</surname> <given-names>L. A.</given-names></name></person-group> (<year>1994</year>). <source>Methods for identifying biased test items</source>. <publisher-loc>London</publisher-loc>: <publisher-name>Sage</publisher-name>.</citation></ref>
<ref id="ref6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chan</surname> <given-names>D.</given-names></name></person-group> (<year>2000</year>). <article-title>Detection of differential item functioning on the Kirton adaption-innovation inventory using multiple-group mean and covariance structure analyses</article-title>. <source>Multivar. Behav. Res.</source> <volume>35</volume>, <fpage>169</fpage>&#x2013;<lpage>199</lpage>. doi: <pub-id pub-id-type="doi">10.1207/S15327906MBR3502_2</pub-id>, PMID: <pub-id pub-id-type="pmid">26754082</pub-id></citation></ref>
<ref id="ref7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Shao</surname> <given-names>C.</given-names></name> <name><surname>Lathrop</surname> <given-names>Q. N.</given-names></name></person-group> (<year>2016</year>). <article-title>The mediated MIMIC model for understanding the underlying mechanism of DIF</article-title>. <source>Educ. Psychol. Meas.</source> <volume>76</volume>, <fpage>43</fpage>&#x2013;<lpage>63</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0013164415576187</pub-id>, PMID: <pub-id pub-id-type="pmid">29795856</pub-id></citation></ref>
<ref id="ref8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chun</surname> <given-names>S.</given-names></name> <name><surname>Stark</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>E. S.</given-names></name> <name><surname>Chernyshenko</surname> <given-names>O. S.</given-names></name></person-group> (<year>2016</year>). <article-title>MIMIC methods for detecting DIF among multiple groups: exploring a new sequential-free baseline procedure</article-title>. <source>Appl. Psychol. Meas.</source> <volume>40</volume>, <fpage>486</fpage>&#x2013;<lpage>499</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0146621616659738</pub-id>, PMID: <pub-id pub-id-type="pmid">29881065</pub-id></citation></ref>
<ref id="ref10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dorans</surname> <given-names>N. J.</given-names></name> <name><surname>Holland</surname> <given-names>P. W.</given-names></name></person-group> (<year>1993</year>). &#x201C;<article-title>DIF detection and description: Mantel-Haenszel and standardization</article-title>&#x201D; in <source>Differential item functioning</source>. eds. <person-group person-group-type="editor"><name><surname>Holland</surname> <given-names>P. W.</given-names></name> <name><surname>Wainer</surname> <given-names>H.</given-names></name></person-group> (<publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Lawrence Erlbaum</publisher-name>), <fpage>35</fpage>&#x2013;<lpage>66</lpage>.</citation></ref>
<ref id="ref11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dunn</surname> <given-names>O. J.</given-names></name></person-group> (<year>1961</year>). <article-title>Multiple comparisons among means</article-title>. <source>J. Am. Stat. Assoc.</source> <volume>56</volume>, <fpage>52</fpage>&#x2013;<lpage>64</lpage>. doi: <pub-id pub-id-type="doi">10.1080/01621459.1961.10482090</pub-id></citation></ref>
<ref id="ref12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Else-Quest</surname> <given-names>N. M.</given-names></name> <name><surname>Hyde</surname> <given-names>J. S.</given-names></name> <name><surname>Linn</surname> <given-names>M. C.</given-names></name></person-group> (<year>2010</year>). <article-title>Cross-national patterns of gender differences in mathematics: a meta-analysis</article-title>. <source>Psychol. Bull.</source> <volume>136</volume>, <fpage>103</fpage>&#x2013;<lpage>127</lpage>. doi: <pub-id pub-id-type="doi">10.1037/a0018053</pub-id>, PMID: <pub-id pub-id-type="pmid">20063928</pub-id></citation></ref>
<ref id="ref13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Holland</surname> <given-names>P. W.</given-names></name> <name><surname>Thayer</surname> <given-names>D. T.</given-names></name></person-group> (<year>1988</year>). &#x201C;<article-title>Differential item performance and the Mantel-Haenszel procedure</article-title>&#x201D; in <source>Test validity</source>. eds. <person-group person-group-type="editor"><name><surname>Wainer</surname> <given-names>H.</given-names></name> <name><surname>Braun</surname> <given-names>H. I.</given-names></name></person-group> (<publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Lawrence Erlbaum</publisher-name>), <fpage>129</fpage>&#x2013;<lpage>145</lpage>.</citation></ref>
<ref id="ref14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>L. T.</given-names></name> <name><surname>Bentler</surname> <given-names>P. M.</given-names></name></person-group> (<year>1999</year>). <article-title>Cut-off criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives</article-title>. <source>Struct. Equ. Model.</source> <volume>6</volume>, <fpage>1</fpage>&#x2013;<lpage>55</lpage>. doi: <pub-id pub-id-type="doi">10.1080/10705519909540118</pub-id></citation></ref>
<ref id="ref15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>MacIntosh</surname> <given-names>R.</given-names></name> <name><surname>Hashim</surname> <given-names>S.</given-names></name></person-group> (<year>2003</year>). <article-title>Variance estimation for converting MIMIC model parameters to IRT parameters in DIF analysis</article-title>. <source>Appl. Psychol. Meas.</source> <volume>27</volume>, <fpage>372</fpage>&#x2013;<lpage>379</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0146621603256021</pub-id></citation></ref>
<ref id="ref001">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Montoya</surname> <given-names>A. K.</given-names></name> <name><surname>Jeon</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>MIMIC models for uniform and nonuniform DIF as moderated mediation models</article-title>. <source>Appl. Psychol. Meas.</source> <volume>44</volume>, <fpage>118</fpage>&#x2013;<lpage>136</lpage>.</citation></ref>
<ref id="ref002">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Muth&#x00E9;n</surname> <given-names>B. O.</given-names></name></person-group> (<year>1989</year>). <article-title>Latent variable modeling in heterogeneous populations</article-title>. <source>Psychometrika</source> <volume>54</volume>, <fpage>557</fpage>&#x2013;<lpage>585</lpage>.</citation></ref>
<ref id="ref16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Muth&#x00E9;n</surname> <given-names>B. O.</given-names></name></person-group> (<year>1985</year>). <article-title>A method for studying the homogeneity of test items with respect to other relevant variables</article-title>. <source>J. Educ. Stat.</source> <volume>10</volume>, <fpage>121</fpage>&#x2013;<lpage>132</lpage>. doi: <pub-id pub-id-type="doi">10.3102/10769986010002121</pub-id></citation></ref>
<ref id="ref17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Muth&#x00E9;n</surname> <given-names>L. K.</given-names></name> <name><surname>Muth&#x00E9;n</surname> <given-names>B. O.</given-names></name></person-group> (<year>1998-2018</year>). <source>Mplus User&#x2019;s Guide</source>. <edition>8th</edition> Edn. <publisher-loc>Los Angeles, CA</publisher-loc>: <publisher-name>Muth&#x00E9;n &#x0026; Muth&#x00E9;n</publisher-name>.</citation></ref>
<ref id="ref18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pae</surname> <given-names>T. I.</given-names></name> <name><surname>Park</surname> <given-names>G. P.</given-names></name></person-group> (<year>2006</year>). <article-title>Examining the relationship between differential item functioning and differential test functioning</article-title>. <source>Lang. Test.</source> <volume>23</volume>, <fpage>475</fpage>&#x2013;<lpage>496</lpage>. doi: <pub-id pub-id-type="doi">10.1191/0265532206lt338oa</pub-id></citation></ref>
<ref id="ref19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Raykov</surname> <given-names>T.</given-names></name> <name><surname>Marcoulides</surname> <given-names>G. A.</given-names></name></person-group> (<year>2011</year>). <source>Introduction to psychometric theory</source>. <publisher-name>Routledge</publisher-name>.</citation></ref>
<ref id="ref20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Shealy</surname> <given-names>R. T.</given-names></name> <name><surname>Stout</surname> <given-names>W. F.</given-names></name></person-group> (<year>1993</year>). &#x201C;<article-title>An item response theory model for test bias and differential item functioning</article-title>&#x201D; in <source>Differential item functioning</source>. eds. <person-group person-group-type="editor"><name><surname>Holland</surname> <given-names>P. W.</given-names></name> <name><surname>Wainer</surname> <given-names>H.</given-names></name></person-group> (<publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Lawrence Erlbaum Associates</publisher-name>), <fpage>197</fpage>&#x2013;<lpage>240</lpage>.</citation></ref>
<ref id="ref21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Swaminathan</surname> <given-names>H.</given-names></name> <name><surname>Rogers</surname> <given-names>H. J.</given-names></name></person-group> (<year>1990</year>). <article-title>Detecting item bias using logistic regression procedures</article-title>. <source>J. Educ. Meas.</source> <volume>27</volume>, <fpage>361</fpage>&#x2013;<lpage>370</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1745-3984.1990.tb00754.x</pub-id></citation></ref>
<ref id="ref22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thissen</surname> <given-names>D.</given-names></name> <name><surname>Steinberg</surname> <given-names>L.</given-names></name> <name><surname>Gerrard</surname> <given-names>M.</given-names></name></person-group> (<year>1986</year>). <article-title>Beyond group-mean differences: the concept of item bias</article-title>. <source>Psychol. Bull.</source> <volume>99</volume>, <fpage>118</fpage>&#x2013;<lpage>128</lpage>. doi: <pub-id pub-id-type="doi">10.1037/0033-2909.99.1.118</pub-id></citation></ref>
<ref id="ref23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Thissen</surname> <given-names>D.</given-names></name> <name><surname>Steinberg</surname> <given-names>L.</given-names></name> <name><surname>Wainer</surname> <given-names>H.</given-names></name></person-group> (<year>1993</year>). &#x201C;<article-title>Detection of differential item functioning using the parameters of item response models</article-title>&#x201D; in <source>Differential item functioning</source>. eds. <person-group person-group-type="editor"><name><surname>Holland</surname> <given-names>P. W.</given-names></name> <name><surname>Wainer</surname> <given-names>H.</given-names></name></person-group> (<publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Lawrence Erlbaum Associates</publisher-name>), <fpage>67</fpage>&#x2013;<lpage>113</lpage>.</citation></ref>
<ref id="ref003">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Voyer</surname> <given-names>D.</given-names></name> <name><surname>Voyer</surname> <given-names>S. D.</given-names></name></person-group> (<year>2014</year>). <article-title>Gender differences in scholastic achievement: a meta-analysis</article-title>. <source>Psychol. Bull.</source> <volume>140</volume>, <fpage>1174</fpage>&#x2013;<lpage>1204</lpage>. doi: <pub-id pub-id-type="doi">10.1037/a0036620</pub-id></citation></ref>
<ref id="ref24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>W. C.</given-names></name> <name><surname>Shih</surname> <given-names>C. L.</given-names></name></person-group> (<year>2010</year>). <article-title>MIMIC methods for assessing differential item functioning in polytomous items</article-title>. <source>Appl. Psychol. Meas.</source> <volume>34</volume>, <fpage>166</fpage>&#x2013;<lpage>180</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0146621609355279</pub-id></citation></ref>
<ref id="ref25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Woods</surname> <given-names>C. M.</given-names></name></person-group> (<year>2009</year>). <article-title>Evaluation of MIMIC-model methods for DIF testing with comparison to two-group analysis</article-title>. <source>Multivar. Behav. Res.</source> <volume>44</volume>, <fpage>1</fpage>&#x2013;<lpage>27</lpage>. doi: <pub-id pub-id-type="doi">10.1080/00273170802620121</pub-id>, PMID: <pub-id pub-id-type="pmid">26795105</pub-id></citation></ref>
</ref-list>
</back>
</article>