<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Educ.</journal-id>
<journal-title>Frontiers in Education</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Educ.</abbrev-journal-title>
<issn pub-type="epub">2504-284X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">748884</article-id>
<article-id pub-id-type="doi">10.3389/feduc.2021.748884</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Education</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Investigating the Distractors to Explain DIF Effects Across Gender in Large-Scale Tests With Non-Linear Logistic Regression Models</article-title>
<alt-title alt-title-type="left-running-head">Ozdemir and AlGhamdi</alt-title>
<alt-title alt-title-type="right-running-head">Differential Item and Distractor Functioning</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Ozdemir</surname>
<given-names>Burhanettin</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1420265/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>AlGhamdi</surname>
<given-names>Hanan M.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>
<institution>Department of Mathematics and Sciences, College of Humanities and Sciences, Prince Sultan University</institution>, <addr-line>Riyadh</addr-line>, <country>Saudi Arabia</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>
<institution>National Center for Assessment, Education and Training Evaluation Commission (ETEC)</institution>, <addr-line>Riyadh</addr-line>, <country>Saudi Arabia</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/714057/overview">Sedat Sen</ext-link>, Harran University, Turkey</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/779021/overview">Shawna Goodrich</ext-link>, Department of National Defence, Canada</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1004241/overview">Murat Dogan Sahin</ext-link>, Anadolu University, Turkey</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Burhanettin Ozdemir, <email>bozdemir@psu.edu.sa</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Assessment, Testing and Applied Measurement, a section of the journal Frontiers in Education</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>18</day>
<month>01</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>6</volume>
<elocation-id>748884</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>07</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>11</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Ozdemir and AlGhamdi.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Ozdemir and AlGhamdi</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>The purpose of this study is to examine the distractors of items that exhibit differential item functioning (DIF) across gender in order to explain possible sources of DIF in large-scale tests. To this end, two DIF methods based on non-linear logistic regression (NLR) models, the three-parameter (3PL-NLR) and four-parameter (4PL-NLR) models, were first used to detect DIF items, and the Mantel-Haenszel Delta (MH-Delta) method was used to calculate the DIF effect size for each DIF item. Then, the multinomial log-linear regression (MLR) model and the two-parameter nested logit model (2PL-NLM) were applied to items exhibiting DIF with moderate or large effect sizes. The ultimate goals are (a) to examine the behavior of distractors across gender and (b) to investigate whether distractors have any impact on DIF effects. DIF results for the Art Section of the General Aptitude Test (GAT-ART) based on both the 3PL-NLR and 4PL-NLR methods indicate that only 10 items had moderate to large DIF effect sizes. According to the MLR differential distractor functioning (DDF) results, all but one of these items exhibited DDF across gender. An interesting finding of this study is that DIF items related to <italic>verbal analogy</italic> and <italic>context analysis</italic> favored female students, while all DIF items in the <italic>reading comprehension</italic> subdomain favored male students, which may signal content-specific DIF or a true ability difference across gender. DDF results show that distractors have a significant effect on DIF results. Therefore, DDF analysis is recommended alongside DIF analysis, since it can point to the possible causes of DIF.</p>
</abstract>
<kwd-group>
<kwd>large-scale tests</kwd>
<kwd>item bias</kwd>
<kwd>DIF</kwd>
<kwd>differential distractor functioning</kwd>
<kwd>distractor analysis</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Many studies have examined the validity and reliability of large-scale assessments, because examinees&#x2019; performance on these tests has a critical impact on their educational admissions and future careers. Validity is a core requirement for any assessment that is intended to be accurate and fair (<xref ref-type="bibr" rid="B11">Bond et&#x20;al., 2003</xref>; <xref ref-type="bibr" rid="B36">Jamalzadeh et&#x20;al., 2021</xref>). Test developers and testing companies therefore aim to increase the validity and reliability of their tests by minimizing confounding factors and measurement errors, so as to ensure fairness across different subgroups.</p>
<p>As large-scale tests are used to make high-stakes decisions about test-takers, they require comprehensive and careful examination (<xref ref-type="bibr" rid="B60">Shohamy, 2001</xref>; <xref ref-type="bibr" rid="B62">Stobart, 2005</xref>; <xref ref-type="bibr" rid="B73">Weir, 2005</xref>; <xref ref-type="bibr" rid="B21">Fulcher and Davidson, 2007</xref>). Examining the factorial structure of a test, investigating differential item functioning (DIF) across subgroups, analyzing the behavior of distractors, and determining what causes these confounding factors all serve to increase the validity and fairness of score inferences. In addition, comparing subgroups, such as gender or nationality groups, on the underlying construct is necessary for fairness purposes.</p>
<p>Examining whether the structure underlying a scale holds across groups usually involves DIF analysis within the confirmatory factor analysis (CFA) and item response theory (IRT) frameworks (<xref ref-type="bibr" rid="B80">Dimitrov, 2017</xref>). An item is flagged as exhibiting DIF if students from different subgroups with the same ability level have different probabilities of answering it correctly (<xref ref-type="bibr" rid="B28">Hambleton and Rogers, 1989</xref>; <xref ref-type="bibr" rid="B13">Camilli and Shepard, 1994</xref>; <xref ref-type="bibr" rid="B22">Fulcher and Davidson, 2013</xref>).</p>
<p>DIF analysis has been employed in various contexts concerning different aspects of testing. It has mainly been used to investigate equity and test fairness across gender or racial groups, the presence of content that is unfair with respect to examinees&#x2019; backgrounds, the appropriateness of selection procedures, the adequacy of the criteria used, atmosphere effects, and testing conditions (<xref ref-type="bibr" rid="B67">Takala and Kaftandjieva, 2000</xref>; <xref ref-type="bibr" rid="B40">Kim, 2001</xref>; <xref ref-type="bibr" rid="B54">Pae, 2004</xref>; <xref ref-type="bibr" rid="B37">Karami, 2011</xref>; <xref ref-type="bibr" rid="B35">Jalili et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B71">Walker and G&#xf6;&#xe7;er, 2020</xref>).</p>
<p>Many different methods are used to detect DIF items (<xref ref-type="bibr" rid="B41">Kim et&#x20;al., 2007</xref>; <xref ref-type="bibr" rid="B43">Loken and Rulison, 2010</xref>; <xref ref-type="bibr" rid="B47">Magis and De Boeck, 2011</xref>; <xref ref-type="bibr" rid="B39">Kim and Oshima, 2013</xref>; <xref ref-type="bibr" rid="B48">Magis, 2013</xref>; <xref ref-type="bibr" rid="B49">Magis et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B9">Berger and Tutz, 2016</xref>; <xref ref-type="bibr" rid="B19">Drabinov&#xe1; and Martinkov&#xe1;, 2016</xref>; <xref ref-type="bibr" rid="B50">Martinkov&#xe1; et&#x20;al., 2017</xref>). These methods differ with respect to the measurement models and the criteria used to define DIF items (<xref ref-type="bibr" rid="B12">Borsboom, 2006</xref>; <xref ref-type="bibr" rid="B26">Hambleton, 2006</xref>; <xref ref-type="bibr" rid="B52">Millsap, 2006</xref>). In general, however, DIF methods are classified into two categories: parametric methods based on IRT models and non-parametric methods based on non-IRT measurement models (<xref ref-type="bibr" rid="B29">Hambleton et&#x20;al., 1991</xref>; <xref ref-type="bibr" rid="B34">Hunter, 2014</xref>). Given the many available methods, it is important to consider the strengths and limitations of a DIF method before using it.</p>
<p>DIF items are also classified as showing uniform or non-uniform DIF, based on the shape of the group-specific item characteristic curves (ICCs). An item shows uniform DIF if it favors the same group across the entire ability range, whereas it shows non-uniform DIF if it favors different groups at different ability levels, that is, if the group ICCs cross (<xref ref-type="bibr" rid="B27">Hambleton et&#x20;al., 1993</xref>). An item flagged for DIF cannot be claimed to be biased without investigating the potential underlying cause of the DIF: detection of DIF is based on statistical tests, whereas bias concerns systematic error and requires both expert judgment and statistical evidence (<xref ref-type="bibr" rid="B13">Camilli and Shepard, 1994</xref>; <xref ref-type="bibr" rid="B14">Clauser and Mazor, 1998</xref>; <xref ref-type="bibr" rid="B74">Wiberg, 2006</xref>).</p>
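<p>As an illustration of this distinction (not part of the original analysis), the Python sketch below compares two group-specific 2PL item characteristic curves over a grid of ability values. If the difference between the reference and focal curves keeps one sign, the item shows uniform DIF within that range; if the sign changes, the curves cross and the DIF is non-uniform. The function names, the 2PL form, the ability grid, and the parameter values are assumptions chosen for illustration.</p>
<preformat><![CDATA[
```python
import numpy as np


def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta) with discrimination a, difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def dif_type(ref, foc, theta=np.linspace(-3, 3, 121), tol=1e-6):
    """Classify DIF from reference and focal (a, b) pairs over an ability grid.

    Returns "uniform" if one group is favored everywhere on the grid,
    "non-uniform" if the favored group changes (the ICCs cross), and
    "no DIF" if the curves coincide (up to tol).
    """
    diff = icc_2pl(theta, *ref) - icc_2pl(theta, *foc)
    signs = set(np.sign(diff[np.abs(diff) > tol]))  # ignore points where curves touch
    if not signs:
        return "no DIF"
    return "uniform" if len(signs) == 1 else "non-uniform"


# Hypothetical items: equal slopes, shifted difficulty vs. equal difficulty, different slopes
print(dif_type((1.2, 0.0), (1.2, 0.5)))
print(dif_type((1.0, 0.0), (2.0, 0.0)))
```
]]></preformat>
<p>Because only a finite ability range is examined, curves that cross outside the grid are reported as uniform; in practice the grid should cover the range of abilities actually observed.</p>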
<p>DIF analysis mainly examines how the correct response functions across subgroups, whereas differential distractor functioning (DDF) analysis examines the distractors along with the correct response. The DDF approach was first proposed by <xref ref-type="bibr" rid="B25">Green et&#x20;al. (1989)</xref>, who compared the selection rates of distractors across groups. Although there are different methods for examining the behavior of distractors, DDF methods have generally been classified into two approaches (<xref ref-type="bibr" rid="B65">Suh and Talley, 2015</xref>): <italic>divide-by-total</italic> (<xref ref-type="bibr" rid="B69">Thissen and Steinberg, 1986</xref>) and <italic>divide-by-distractors</italic> (<xref ref-type="bibr" rid="B64">Suh and Bolt, 2011</xref>). <xref ref-type="bibr" rid="B72">Wang (2000)</xref> developed a DDF method based on a factorial model that examines multiple grouping effects and their interactions. In addition, an odds ratio (OR)-based method for DDF effects was proposed by <xref ref-type="bibr" rid="B56">Penfield (2008)</xref>, and a multi-step logistic regression-based DDF method was proposed by <xref ref-type="bibr" rid="B1">Abedi et&#x20;al. (2008)</xref>. <xref ref-type="bibr" rid="B38">Kato et&#x20;al. (2009)</xref> extended the multi-step multinomial logistic regression (MLR)-based DDF method proposed by <xref ref-type="bibr" rid="B1">Abedi et&#x20;al. (2008)</xref> so that both DIF and DDF effects can be detected. These methods are all considered <italic>divide-by-total</italic> DDF methods. <italic>Divide-by-distractors</italic> methods, on the other hand, have been developed more recently; examples include the likelihood-ratio-based nested logit approach (<xref ref-type="bibr" rid="B64">Suh and Bolt, 2011</xref>) and the odds-ratio-based nested logit approach (<xref ref-type="bibr" rid="B68">Terzi and Suh, 2015</xref>).
These two methods separate the parameters of the key (correct answer) from the distractor parameters, which makes it possible to evaluate DDF effects independently of DIF. They can therefore indicate whether DDF is a plausible cause or a consequence of DIF.</p>
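<p>The difference between the two families can be sketched in Python under stated assumptions. In a divide-by-total model such as the nominal response model, all options share one denominator, so changing a distractor&#x2019;s parameters also changes the probability of the key. In the two-parameter nested logit model, the key follows a 2PL curve and the distractors split only the remaining probability, 1 &#x2212; P(key), through a multinomial logit normalized over the distractors alone. The function names, parameter names (<italic>a</italic>, <italic>b</italic>, &#x3b6;, &#x3bb;), and values are illustrative assumptions, not estimates from the GAT-ART data.</p>
<preformat><![CDATA[
```python
import numpy as np


def nested_logit_2pl(theta, a, b, zeta, lam):
    """Two-parameter nested logit model (divide-by-distractors).

    Returns [P(key), P(distractor 1), ..., P(distractor m)] at ability theta.
    """
    zeta, lam = np.asarray(zeta, dtype=float), np.asarray(lam, dtype=float)
    p_key = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # 2PL curve for the key
    w = np.exp(zeta + lam * theta)
    p_given_wrong = w / w.sum()                       # normalized over distractors only
    return np.concatenate(([p_key], (1.0 - p_key) * p_given_wrong))


def nominal_response(theta, zeta, lam):
    """Nominal response model (divide-by-total): one denominator for all options.

    Element 0 is taken to be the key.
    """
    w = np.exp(np.asarray(zeta, dtype=float) + np.asarray(lam, dtype=float) * theta)
    return w / w.sum()


# Same key parameters, two different sets of (hypothetical) distractor parameters
p1 = nested_logit_2pl(0.5, 1.2, -0.3, [0.2, 0.0, -0.5], [-0.4, 0.1, 0.3])
p2 = nested_logit_2pl(0.5, 1.2, -0.3, [1.0, -1.0, 0.0], [0.5, -0.2, 0.0])
q1 = nominal_response(0.5, [0.0, 0.2, 0.0, -0.5], [1.0, -0.4, 0.1, 0.3])
q2 = nominal_response(0.5, [0.0, 1.0, -1.0, 0.0], [1.0, 0.5, -0.2, 0.0])
```
]]></preformat>
<p>Changing the distractor parameters leaves P(key) unchanged under the nested logit model but not under the nominal response model, which is why divide-by-distractors methods can evaluate DDF separately from DIF.</p>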
<p>DIF analyses are often combined with DDF analyses to explain DIF effects. The relationship between DIF and DDF effects is causal rather than correlational (<xref ref-type="bibr" rid="B16">Deng, 2020</xref>; <xref ref-type="bibr" rid="B36">Jamalzadeh et&#x20;al., 2021</xref>). <xref ref-type="bibr" rid="B57">Penfield (2010)</xref> found that uniform DIF may be caused, and partially explained, by DDF effects, whereas non-uniform DIF may indicate that the signs of the DDF effects vary across distractors. <xref ref-type="bibr" rid="B57">Penfield (2010)</xref> also suggests that DDF studies can shed light on the possible causes of DIF. Conducting DDF analysis alongside DIF analysis is therefore recommended to obtain more information about the potential underlying causes of DIF.</p>
<p>Similar to DIF effects, DDF effects are classified into two groups: uniform and non-uniform DDF. Uniform DDF indicates a constant DDF effect in the same direction across all distractors, whereas non-uniform DDF indicates an inconsistent DDF effect across different ability levels (<xref ref-type="bibr" rid="B70">Tsaousis et&#x20;al., 2018</xref>). Moreover, uniform DDF signals that DIF arises from characteristics of the correct response, while non-uniform DDF implies that DIF arises either from a non-functioning distractor or from an unexpected interaction among the distractors or with the item stem (<xref ref-type="bibr" rid="B56">Penfield, 2008</xref>). In this study, the MLR-based DDF method proposed by <xref ref-type="bibr" rid="B38">Kato et&#x20;al. (2009)</xref> and the multi-group nested logit model (NLM) proposed by <xref ref-type="bibr" rid="B64">Suh and Bolt (2011)</xref> were used to examine DDF effects.</p>
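<p>A simplified sketch of how the direction of DDF effects might be summarized is given below. For each distractor, it computes a Mantel-Haenszel common log-odds ratio contrasting that distractor with the key across matched score strata, loosely in the spirit of the odds-ratio approach of Penfield (2008) cited above, and then labels the pattern uniform when all non-negligible effects share a sign. It is not claimed to reproduce Penfield&#x2019;s estimator or any package&#x2019;s output; the function names, the coding of options (key = 0) and groups (reference = 0, focal = 1), and the tolerance are assumptions.</p>
<preformat><![CDATA[
```python
import numpy as np


def distractor_log_odds(choice, group, stratum, key=0):
    """Mantel-Haenszel common log-odds ratio for each distractor versus the key.

    choice: selected option per examinee (the key is coded as `key`);
    group: 0 = reference, 1 = focal; stratum: matching score level.
    Positive values: among matched examinees choosing this distractor or the key,
    reference examinees chose the distractor relatively more often.
    Assumes some reference examinees chose the key and some focal examinees chose
    each distractor, so that the denominator is nonzero.
    """
    choice, group, stratum = map(np.asarray, (choice, group, stratum))
    effects = {}
    for opt in np.unique(choice):
        if opt == key:
            continue
        num = den = 0.0
        for k in np.unique(stratum):
            s = (stratum == k) & np.isin(choice, [opt, key])
            n_k = s.sum()
            if n_k == 0:
                continue
            a = np.sum(s & (group == 0) & (choice == opt))  # reference, distractor
            b = np.sum(s & (group == 0) & (choice == key))  # reference, key
            c = np.sum(s & (group == 1) & (choice == opt))  # focal, distractor
            d = np.sum(s & (group == 1) & (choice == key))  # focal, key
            num += a * d / n_k
            den += b * c / n_k
        effects[int(opt)] = float(np.log(num / den))
    return effects


def ddf_direction(effects, tol=0.0):
    """'uniform' if all non-negligible distractor effects share one sign."""
    signs = {np.sign(v) for v in effects.values() if abs(v) > tol}
    return "uniform" if len(signs) <= 1 else "non-uniform"
```
]]></preformat>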
<sec id="s1-1">
<title>General Aptitude Test (GAT)</title>
<p>The General Aptitude Test (GAT) is a standardized test that has been administered in the Middle East since 2002. It is mainly administered to evaluate the college readiness of high-school graduates, and its scores are used to select college candidates during the admission process. The main goal of the GAT is to measure skills such as problem-solving, understanding logical relations, and drawing conclusions. Two versions of the GAT, Science and Art, were developed according to students&#x2019; majors. The GAT is mandatory for all high-school students who seek to pursue further study at colleges or universities. It is administered twice a year in paper-based format and year-round in computer-based format.</p>
<p>Various studies have examined the factorial structure and psychometric properties of the GAT, including its validity and reliability (e.g., <xref ref-type="bibr" rid="B4">Alqataee and Alharbi, 2012</xref>; <xref ref-type="bibr" rid="B17">Dimitrov, 2014</xref>; <xref ref-type="bibr" rid="B76">Dimitrov and Shamrani, 2015</xref>; <xref ref-type="bibr" rid="B61">Sideridis et&#x20;al., 2015</xref>). The test is assumed to be unidimensional, with one dominant factor explaining a large proportion of the variance. According to <xref ref-type="bibr" rid="B76">Dimitrov and Shamrani (2015)</xref>, a bifactor model consisting of one general factor and three verbal factors (reading comprehension, sentence completion, and analogy) fits the data well.</p>
</sec>
<sec id="s1-2">
<title>The Purpose of the Study</title>
<p>This study aims to investigate the behavior of the distractors of items that exhibited DIF across gender in the 2017 Art Section of the General Aptitude Test (GAT-ART) administered by the National Center for Assessment (NCA). To this end, two DIF methods based on non-linear logistic regression, the three-parameter (3PL-NLR) and four-parameter (4PL-NLR) models, were first used to detect DIF items. The Mantel-Haenszel (MH) Delta method was then used to calculate the DIF effect size for each DIF item. Finally, the multinomial log-linear regression (MLR) model and the two-parameter nested logit model (2PL-NLM) were applied to the DIF items with moderate or large effect sizes, both to investigate the behavior of distractors across gender and to examine how distractors affect DIF results. Accordingly, the following research questions were addressed:<list list-type="simple">
<list-item>
<p>1) Which CFA model (one-factor, two-factor, or bifactor) fits the GAT-ART data best?</p>
</list-item>
<list-item>
<p>2) Do GAT-ART items function differently across gender groups (female vs. male)?</p>
</list-item>
<list-item>
<p>3) Do GAT-ART items function differently across quantitative and verbal sections?</p>
</list-item>
<list-item>
<p>4) Do the distractors of GAT-ART items function differently across gender groups (female vs. male)?</p>
</list-item>
<list-item>
<p>5) How does the distribution of responses to distractors associated with DIF items affect DIF results?</p>
</list-item>
</list>
</p>
</sec>
</sec>
<sec id="s2">
<title>Methods</title>
<sec id="s2-1">
<title>Data</title>
<p>The GAT-ART 1521 test form was administered to 27,075&#x20;high-school students: 22,882 females (84.5%), 4,191 males (15.5%), and two examinees with missing gender information. The test consists of two domains, <italic>quantitative</italic> and <italic>verbal</italic>. Of the 96 GAT-ART items, 24 belong to the quantitative section and 72 to the verbal section. The quantitative section consists of four subdomains: <italic>arithmetic, geometry, mathematical analysis,</italic> and <italic>comparison</italic>. The verbal section also consists of four subdomains: <italic>verbal analogy</italic>, <italic>context analysis</italic>, <italic>sentence completion</italic>, and <italic>reading comprehension</italic>.</p>
</sec>
<sec id="s2-2">
<title>Data Screening</title>
<p>The distribution of missing data was examined before the factorial structure of the test was analyzed. The missing data analysis showed that only 303 of the 27,075 students had more than 5% missing responses. Because it is recommended to include only participants with less than 5% missing data in the analysis (<xref ref-type="bibr" rid="B3">Alice, 2015</xref>; <xref ref-type="bibr" rid="B45">Madley-Dowd et&#x20;al., 2019</xref>), examinees with more than 5% missing data were treated as outliers and excluded. The Mahalanobis distance method was then used for the outlier analysis: the Mahalanobis distance was calculated for each examinee and compared with a criterion value of 144.56, and examinees whose distance exceeded this value were to be flagged as outliers. All examinees had Mahalanobis distance values smaller than 144.56; therefore, there appeared to be no outliers in this data set.</p>
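<p>The two screening steps can be sketched in Python with NumPy and SciPy. The criterion of 144.56 coincides with the upper 0.001 quantile of the &#x3c7;<sup>2</sup> distribution with 96 degrees of freedom (one per item), so the sketch assumes that &#x3b1; = .001 was used. How the remaining sporadic omissions were handled is not stated; scoring them as 0 (incorrect) is a hypothetical choice made only so that the covariance matrix can be computed. The function name is also an assumption.</p>
<preformat><![CDATA[
```python
import numpy as np
from scipy.stats import chi2


def screen_examinees(responses, max_missing=0.05, alpha=0.001):
    """Drop examinees with too much missing data, then flag Mahalanobis outliers.

    responses: examinees x items array of 0/1 item scores, NaN = missing.
    Returns (retained rows, squared Mahalanobis distances, critical value).
    """
    responses = np.asarray(responses, dtype=float)
    missing_rate = np.isnan(responses).mean(axis=1)
    kept = responses[missing_rate <= max_missing]   # 5% rule described in the text

    # Hypothetical choice: score remaining omissions as 0 (incorrect)
    filled = np.nan_to_num(kept, nan=0.0)
    centered = filled - filled.mean(axis=0)

    # Pseudo-inverse guards against a singular covariance matrix
    cov_inv = np.linalg.pinv(np.cov(filled, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

    # Chi-square criterion with one degree of freedom per item
    critical = chi2.ppf(1 - alpha, df=responses.shape[1])
    return filled[d2 <= critical], d2, critical
```
]]></preformat>
<p>With 96 items and the default &#x3b1;, the returned critical value is approximately 144.56, matching the criterion reported above.</p>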
<p>Statistical tests are highly sensitive to sample size: with very large samples, even negligibly small differences become statistically significant. Therefore, a random subsample of 2,500 examinees, corresponding to approximately 10% of the full GAT-ART data set, was drawn, and the analyses were conducted on this subsample. The subsample consisted of 2,071 (82.8%) females and 429 (17.2%)&#x20;males.</p>
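<p>The sensitivity of significance tests to sample size can be illustrated with a two-proportion z test (a hypothetical example, not an analysis from the study). The same 2-percentage-point difference in the proportion of correct answers (60% vs. 62%, assumed values) is evaluated at the full-sample and subsample group sizes reported above; the sketch also shows one way to draw a simple random subsample, with an arbitrary seed. The function names are assumptions.</p>
<preformat><![CDATA[
```python
import numpy as np
from scipy.stats import norm


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return z, 2 * norm.sf(abs(z))


# Hypothetical proportions correct: 60% (females) vs 62% (males),
# evaluated at the full-sample and subsample group sizes from the text.
z_full, p_full = two_proportion_z_test(0.60, 22_882, 0.62, 4_191)
z_sub, p_sub = two_proportion_z_test(0.60, 2_071, 0.62, 429)


def draw_subsample(n_examinees, size=2_500, seed=2017):
    """Indices of a simple random subsample drawn without replacement (seed is arbitrary)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_examinees, size=size, replace=False))
```
]]></preformat>
<p>With these inputs, the full-sample group sizes give <italic>p</italic> &#x2248; 0.015, whereas the subsample sizes give <italic>p</italic> &#x2248; 0.44, even though the underlying difference is identical.</p>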
</sec>
<sec id="s2-3">
<title>Statistical Analysis</title>
<p>The factorial structure of the test was first examined with a <italic>one-factor confirmatory factor analysis</italic> (CFA) model to test whether the test is unidimensional. In addition, a <italic>two-factor CFA model</italic>, in which each domain (quantitative and verbal) was treated as a factor, and a <italic>bifactor model,</italic> in which a general factor along with two domain-specific factors accounts for the variance, were used to examine the factorial structure of the test. The chi-square statistic, comparative fit index (CFI), Tucker-Lewis index (TLI), and root mean square error of approximation (RMSEA) were used to evaluate how well each model fit the data. The lavaan R package developed by <xref ref-type="bibr" rid="B58">Rosseel (2012)</xref> was used to fit the CFA models. Regarding fit criteria, <xref ref-type="bibr" rid="B33">Hu and Bentler (1999)</xref> suggested that the goodness-of-fit statistics should satisfy the following criteria for an acceptable fit: RMSEA &#x2264; 0.06, CFI &#x2265; 0.95, and TLI &#x2265; 0.95. <xref ref-type="bibr" rid="B77">Marsh et al. (2004)</xref>, on the other hand, suggested less stringent criteria (RMSEA &#x2264; 0.08, CFI &#x2265; 0.90, and TLI &#x2265; 0.90). Moreover, <xref ref-type="bibr" rid="B53">Muthen and Muthen (2012)</xref> suggested reporting the weighted root mean square residual (WRMR) index when the items are categorical. The goodness-of-fit statistics for each CFA model are provided in <xref ref-type="table" rid="T1">Table&#x20;1</xref>.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>CFA results related to one-factor, two-factor, and bifactor models of GAT-ART data.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Model</th>
<th align="center">&#x3c7;<sup>2</sup></th>
<th align="center">df</th>
<th align="center">CFI</th>
<th align="center">TLI</th>
<th align="center">WRMR</th>
<th align="center">RMSEA</th>
<th align="center">RMSEA 90% CI LL</th>
<th align="center">RMSEA 90% CI UL</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">One-factor model</td>
<td align="char" char=".">10,161.588</td>
<td align="center">4,464</td>
<td align="char" char=".">0.969</td>
<td align="char" char=".">0.968</td>
<td align="char" char=".">1.477</td>
<td align="char" char=".">0.018</td>
<td align="char" char=".">0.017</td>
<td align="char" char=".">0.018</td>
</tr>
<tr>
<td align="left">Two-factor model</td>
<td align="char" char=".">9,686.151</td>
<td align="center">4,463</td>
<td align="char" char=".">0.972</td>
<td align="char" char=".">0.971</td>
<td align="char" char=".">1.442</td>
<td align="char" char=".">0.017</td>
<td align="char" char=".">0.017</td>
<td align="char" char=".">0.018</td>
</tr>
<tr>
<td align="left">Bifactor model</td>
<td align="char" char=".">7,109.559</td>
<td align="center">4,365</td>
<td align="char" char=".">0.985</td>
<td align="char" char=".">0.984</td>
<td align="char" char=".">1.236</td>
<td align="char" char=".">0.013</td>
<td align="char" char=".">0.012</td>
<td align="char" char=".">0.013</td>
</tr>
</tbody>
</table>
</table-wrap>
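<p>As a simple check, the Python snippet below applies the two sets of cutoffs quoted above to the CFI, TLI, and RMSEA values transcribed from Table 1 (WRMR is omitted because no cutoff for it is given in the text). It only compares numbers and does not refit any model; the variable and function names are assumptions.</p>
<preformat><![CDATA[
```python
# Fit indices transcribed from Table 1
TABLE_1 = {
    "One-factor": {"CFI": 0.969, "TLI": 0.968, "RMSEA": 0.018},
    "Two-factor": {"CFI": 0.972, "TLI": 0.971, "RMSEA": 0.017},
    "Bifactor":   {"CFI": 0.985, "TLI": 0.984, "RMSEA": 0.013},
}

# Cutoffs quoted in the text: minimum CFI and TLI, maximum RMSEA
CUTOFFS = {
    "Hu and Bentler (1999)": {"CFI": 0.95, "TLI": 0.95, "RMSEA": 0.06},
    "Marsh et al. (2004)":   {"CFI": 0.90, "TLI": 0.90, "RMSEA": 0.08},
}


def meets_cutoffs(fit, cut):
    """True if CFI and TLI reach their minimums and RMSEA stays below its maximum."""
    return (fit["CFI"] >= cut["CFI"] and fit["TLI"] >= cut["TLI"]
            and fit["RMSEA"] <= cut["RMSEA"])


def evaluate(table=TABLE_1, cutoffs=CUTOFFS):
    """Model -> {criterion set -> acceptable?}."""
    return {model: {name: meets_cutoffs(fit, cut) for name, cut in cutoffs.items()}
            for model, fit in table.items()}


best_by_cfi = max(TABLE_1, key=lambda m: TABLE_1[m]["CFI"])
```
]]></preformat>
<p>Under both sets of criteria, all three models show acceptable fit, and the bifactor model has the highest CFI and TLI and the lowest RMSEA.</p>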
</sec>
<sec id="s2-4">
<title>Differential Item Functioning (DIF) and Differential Distractor Functioning (DDF) Methods</title>
<p>In addition to IRT-based DIF methods, NLR-based DIF methods can be used to detect items that function differently across subgroups. In this study, DIF analyses were first conducted with two NLR-based DIF methods (3PL-NLR and 4PL-NLR) to detect items with significant DIF effects. The Mantel-Haenszel Delta (MH Delta) method was then used to determine the DIF effect size of each DIF item. The difNLR R package (<xref ref-type="bibr" rid="B30">Hladka and Martinkova, 2021</xref>) was used for the NLR-based DIF and DDF analyses, while the difR R package (<xref ref-type="bibr" rid="B46">Magis et&#x20;al., 2010</xref>) was used for the MH Delta DIF analyses. In addition, the Benjamini-Hochberg procedure (<xref ref-type="bibr" rid="B8">Benjamini and Hochberg, 1995</xref>) was used as the p-value adjustment method; it controls the false discovery rate, that is, the expected proportion of falsely flagged items among all flagged items.</p>
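<p>The two computations mentioned here can be sketched in Python. The MH D-DIF statistic is &#x394;<sub>MH</sub> = &#x2212;2.35 ln(&#x3b1;<sub>MH</sub>), where &#x3b1;<sub>MH</sub> is the Mantel-Haenszel common odds ratio computed over strata of the matching score; negative values indicate DIF favoring the reference group. The size labels below follow a simplified version of the ETS A/B/C rule (|&#x394;| below 1, from 1 to below 1.5, and 1.5 or above) that ignores the accompanying significance conditions. The Benjamini-Hochberg function implements the standard step-up adjustment, the same procedure as R&#x2019;s <monospace>p.adjust(method = "BH")</monospace>. This is an illustrative re-implementation, not the difR code used in the study; the function names and group coding (reference = 0, focal = 1) are assumptions.</p>
<preformat><![CDATA[
```python
import numpy as np


def mh_d_dif(item, group, matching):
    """Mantel-Haenszel D-DIF (delta metric) for one dichotomous item.

    item: 0/1 scores; group: 0 = reference, 1 = focal;
    matching: matching variable (e.g., total score) defining the strata.
    Negative values indicate DIF favoring the reference group.
    """
    item, group, matching = map(np.asarray, (item, group, matching))
    num = den = 0.0
    for k in np.unique(matching):
        s = matching == k
        n_k = s.sum()
        a = np.sum(s & (group == 0) & (item == 1))  # reference, correct
        b = np.sum(s & (group == 0) & (item == 0))  # reference, incorrect
        c = np.sum(s & (group == 1) & (item == 1))  # focal, correct
        d = np.sum(s & (group == 1) & (item == 0))  # focal, incorrect
        num += a * d / n_k
        den += b * c / n_k
    alpha_mh = num / den  # MH common odds ratio (reference vs focal)
    return -2.35 * np.log(alpha_mh)


def ets_size(delta):
    """Simplified ETS size labels (significance conditions ignored)."""
    size = abs(delta)
    if size < 1.0:
        return "A (negligible)"
    if size < 1.5:
        return "B (moderate)"
    return "C (large)"


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps adjusted values monotone
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted
```
]]></preformat>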
<p>Moreover, item purification, in which items flagged for DIF are iteratively removed from the matching criterion, was employed in the DIF and DDF analyses to improve the accuracy of the results. Then, the multinomial log-linear regression (MLR) model and the two-parameter nested logit model (2PL-NLM) were employed to detect DDF items. The MLR method takes all response categories, including the correct option and the distractors, into account, whereas the NLM excludes the correct option when evaluating the DDF effect. Using the latter method therefore makes it possible to determine whether a significant DDF effect is a potential underlying cause or a consequence of DIF. In addition, likelihood ratio tests comparing nested sub-models were used to examine the behavior of distractors across gender groups. The following sections provide more detailed information about the DIF and DDF methods employed in this&#x20;study.</p>
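<p>A minimal sketch of the divide-by-total (MLR) idea is shown below: a baseline-category multinomial logit for the chosen option is fitted with and without the group and score-by-group terms, and the two nested models are compared with a likelihood ratio test on 2(<italic>K</italic> &#x2212; 1) degrees of freedom for an item with <italic>K</italic> options. It is an independent illustration written with NumPy and SciPy, not the difNLR implementation used in the study, and it omits item purification; the function names, simulated data, and all parameter values are hypothetical.</p>
<preformat><![CDATA[
```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import chi2


def multinomial_loglik(X, y, n_options):
    """Maximized log-likelihood of a baseline-category multinomial logit model.

    Option 0 (the key) is the baseline; X must contain an intercept column.
    """
    n, p = X.shape
    Y = np.eye(n_options)[y]  # one-hot coded responses

    def nll_and_grad(flat):
        beta = flat.reshape(n_options - 1, p)
        eta = np.column_stack([np.zeros(n), X @ beta.T])
        lse = logsumexp(eta, axis=1)
        prob = np.exp(eta - lse[:, None])
        nll = lse.sum() - eta[np.arange(n), y].sum()
        grad = -(Y[:, 1:] - prob[:, 1:]).T @ X       # gradient of nll w.r.t. beta
        return nll, grad.ravel()

    start = np.zeros((n_options - 1) * p)
    res = minimize(nll_and_grad, start, jac=True, method="BFGS")
    return -res.fun


def mlr_ddf_test(choice, z, group, n_options):
    """LR test of the group and score-by-group terms (uniform + non-uniform DDF)."""
    ones = np.ones_like(z)
    X0 = np.column_stack([ones, z])                     # matching score only
    X1 = np.column_stack([ones, z, group, z * group])   # adds group effects
    lr = 2 * (multinomial_loglik(X1, choice, n_options)
              - multinomial_loglik(X0, choice, n_options))
    return lr, chi2.sf(lr, df=2 * (n_options - 1))


# Simulated illustration: option 0 is the key; only distractor 1 is
# more attractive to the focal group (all values hypothetical).
rng = np.random.default_rng(1)
n = 2000
g = rng.integers(0, 2, n)
z = rng.standard_normal(n)
eta = np.column_stack([np.zeros(n), -1.0 - z + 0.8 * g, -1.2 - z, -1.5 - 0.8 * z])
prob = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
u = rng.random(n)
choice = np.minimum((prob.cumsum(axis=1) < u[:, None]).sum(axis=1), 3)
lr_stat, p_value = mlr_ddf_test(choice, z, g, n_options=4)
```
]]></preformat>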
<sec id="s2-4-1">
<title>Non-linear Logistic Regression Based DIF Methods</title>
<p>NLR-based DIF methods can be considered an extension of the two-parameter logistic regression DIF method proposed by <xref ref-type="bibr" rid="B66">Swaminathan and Rogers (1990)</xref>. Compared with the traditional logistic regression model, the 3PL-NLR method accounts for guessing, while the 4PL-NLR method accounts for both guessing and inattention. Because they take these parameters into account, these two methods may have advantages over other DIF methods. <xref ref-type="bibr" rid="B19">Drabinov&#xe1; and Martinkov&#xe1; (2016)</xref> conducted a simulation study showing that the proposed NLR-based DIF methods yielded sufficient power and lower convergence-failure and rejection rates than IRT-based DIF methods. Therefore, these methods can be considered a robust alternative to IRT-based DIF methods.</p>
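<p>The nesting of these models can be seen from a single generalized logistic curve with a lower asymptote <italic>c</italic> (guessing) and an upper asymptote <italic>d</italic> (inattention). The Python sketch below is illustrative; the function name and the parameter values are assumptions.</p>
<preformat><![CDATA[
```python
import numpy as np


def generalized_logistic(z, a, b, c=0.0, d=1.0):
    """Item curve with lower asymptote c (guessing) and upper asymptote d.

    c = 0, d = 1: two-parameter logistic curve (standard logistic regression)
    d = 1:        three-parameter curve (guessing only, as in 3PL-NLR)
    otherwise:    four-parameter curve (guessing and inattention, as in 4PL-NLR)
    """
    return c + (d - c) / (1.0 + np.exp(-a * (z - b)))


z = np.array([-6.0, 0.0, 6.0])
probs = generalized_logistic(z, a=1.5, b=0.0, c=0.2, d=0.95)
```
]]></preformat>
<p>With <italic>c</italic> = 0.2 and <italic>d</italic> = 0.95, the probability approaches 0.2 at very low matching scores and 0.95 at very high ones, equals 0.575 at <italic>z</italic> = <italic>b</italic>, and never reaches 1; this is how the fourth parameter represents inattention.</p>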
<p>In the IRT framework, participants from different groups (reference and focal) are matched on their ability estimates (&#x3b8;), whereas in the NLR framework they are matched on their standardized total scores (z-scores). The 3PL-NLR model can be reparametrized in IRT terms; the 3PL-NLR DIF model is then written as follows:<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:mi mathvariant="bold-italic">P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x7c;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">Z</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">G</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mi mathvariant="bold-italic">i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mi mathvariant="bold-italic">i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold-italic">e</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b1;</mml:mi>
<mml:mi mathvariant="bold-italic">i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b1;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">DI</mml:mi>
<mml:msub>
<mml:mi mathvariant="bold-italic">F</mml:mi>
<mml:mi mathvariant="bold-italic">i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">G</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">Z</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi mathvariant="bold-italic">i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">DI</mml:mi>
<mml:msub>
<mml:mi mathvariant="bold-italic">F</mml:mi>
<mml:mi mathvariant="bold-italic">i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">G</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold-italic">e</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b1;</mml:mi>
<mml:mi mathvariant="bold-italic">i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b1;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">DI</mml:mi>
<mml:msub>
<mml:mi mathvariant="bold-italic">F</mml:mi>
<mml:mi mathvariant="bold-italic">i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">G</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">Z</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi mathvariant="bold-italic">i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">DI</mml:mi>
<mml:msub>
<mml:mi mathvariant="bold-italic">F</mml:mi>
<mml:mi mathvariant="bold-italic">i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">G</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where <italic>Z</italic><sub><italic>j</italic></sub> denotes the standardized total score of examinee <italic>j</italic> and <italic>G</italic><sub><italic>j</italic></sub> denotes the examinee&#x2019;s group membership. The parameters <italic>b</italic><sub><italic>i</italic></sub> and <italic>&#x3b1;</italic><sub><italic>i</italic></sub> correspond to the <italic>difficulty</italic> and <italic>discrimination</italic> of the <italic>i</italic>th item, respectively, while <italic>&#x3b1;</italic><sub>DIF<italic>i</italic></sub> and <italic>b</italic><sub>DIF<italic>i</italic></sub> represent the differences in these parameters between the focal and reference groups. Finally, <italic>c</italic><sub><italic>i</italic></sub> is the guessing parameter (the lower asymptote of the curve), that is, the probability that an examinee with a very low ability level answers the item correctly.</p>
<p>The 4PL-NLR model extends the 3PL-NLR model by adding an <italic>inattention parameter</italic>, <italic>d</italic><sub><italic>i</italic></sub>, which accounts for students&#x2019; inattention, that is, the possibility that even high-ability students answer an item incorrectly. This parameter is the upper asymptote of the item characteristic curve. The 4PL-NLR-based DIF method is defined as follows:<disp-formula id="e2">
<mml:math id="m2">
<mml:mrow>
<mml:mi mathvariant="bold-italic">P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">DIF</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub>
<mml:msub><mml:mi mathvariant="bold-italic">G</mml:mi><mml:mi mathvariant="bold-italic">j</mml:mi></mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub><mml:mi mathvariant="bold-italic">d</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub><mml:mi mathvariant="bold-italic">d</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">DIF</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub>
<mml:msub><mml:mi mathvariant="bold-italic">G</mml:mi><mml:mi mathvariant="bold-italic">j</mml:mi></mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">DIF</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub>
<mml:msub><mml:mi mathvariant="bold-italic">G</mml:mi><mml:mi mathvariant="bold-italic">j</mml:mi></mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold-italic">e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">DIF</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub>
<mml:msub><mml:mi mathvariant="bold-italic">G</mml:mi><mml:mi mathvariant="bold-italic">j</mml:mi></mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub><mml:mi mathvariant="bold-italic">Z</mml:mi><mml:mi mathvariant="bold-italic">j</mml:mi></mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">DIF</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub>
<mml:msub><mml:mi mathvariant="bold-italic">G</mml:mi><mml:mi mathvariant="bold-italic">j</mml:mi></mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>where <italic>Z</italic><sub><italic>j</italic></sub> denotes the standardized total score of person <italic>j</italic>, <italic>G</italic><sub><italic>j</italic></sub> denotes group membership, and <italic>d</italic><sub><italic>i</italic></sub> is the inattention parameter. Moreover, <italic>d</italic><sub>DIF<italic>i</italic></sub> and <italic>c</italic><sub>DIF<italic>i</italic></sub> represent the differences in the inattention and guessing parameters, respectively, between the reference and focal groups. The other parameters are defined as in Formula 1. This study used the 3PL-NLR and 4PL-NLR methods to detect DIF items in the GAT.</p>
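<p>To make the roles of the NLR parameters concrete, the sketch below evaluates the 4PL-NLR item characteristic curve in its standard form, P(y = 1) = (c + c_DIF G) + (d + d_DIF G &#x2212; c &#x2212; c_DIF G) / (1 + exp(&#x2212;(a + a_DIF G)(Z &#x2212; (b &#x2212; b_DIF G)))), for a reference group (G = 0) and a focal group (G = 1). Setting d = 1 and d_DIF = 0 gives the 3PL-NLR curve. The function name nlr_prob and all parameter values are hypothetical illustrations, not GAT estimates; the sketch only evaluates the curve and does not estimate the model, which in practice is done with dedicated software (for example, the difNLR package for R).</p>
<preformat preformat-type="code">
```python
import math


def logistic(x):
    # Numerically stable 1 / (1 + exp(-x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def nlr_prob(z, g, a, b, c=0.0, d=1.0,
             a_dif=0.0, b_dif=0.0, c_dif=0.0, d_dif=0.0):
    """P(y = 1) under a 4PL-NLR DIF model (hypothetical helper).

    z: standardized total score; g: 0 for the reference group, 1 for the focal group.
    d = 1 and d_dif = 0 reduce the model to the 3PL-NLR form.
    The location uses the b - b_dif * g sign convention of Formulas 1 and 2.
    """
    lower = c + c_dif * g      # lower asymptote (guessing)
    upper = d + d_dif * g      # upper asymptote (inattention)
    slope = a + a_dif * g      # discrimination
    location = b - b_dif * g   # difficulty
    return lower + (upper - lower) * logistic(slope * (z - location))


# Hypothetical item with uniform DIF: only the location differs between groups,
# so the focal curve is shifted left and lies above the reference curve at every z.
# A nonzero a_dif would let the two curves cross (non-uniform DIF).
for z in (-2.0, 0.0, 2.0):
    ref = nlr_prob(z, 0, a=1.5, b=0.0, c=0.2, d=0.95, b_dif=0.5)
    foc = nlr_prob(z, 1, a=1.5, b=0.0, c=0.2, d=0.95, b_dif=0.5)
    print(f"z={z:+.1f}  reference={ref:.3f}  focal={foc:.3f}")
```
</preformat>
<p>As z moves far below or above the item location, the curve approaches c and d respectively, which is why c is read as a guessing parameter and d as an inattention parameter.</p>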
</sec>
<sec id="s2-4-2">
<title>Mantel-Haenszel DIF Method</title>
<p>The Mantel-Haenszel (MH) method is one of the most popular non-parametric DIF methods (<xref ref-type="bibr" rid="B32">Holland and Thayer, 1988</xref>). It tests whether group membership and the response to a particular item are associated after conditioning on the total score. The MH statistic is computed from a set of 2&#x20;&#xd7;&#x20;2 contingency tables (group by correct/incorrect response), one for each total-score level, and asymptotically follows a chi-square distribution with one degree of freedom. An item is therefore flagged as showing DIF when the MH statistic exceeds the critical value of this distribution at the chosen significance level (&#x3b1;). A complementary statistic, &#x3b1;<sub>MH</sub>, is the MH estimate of the common odds ratio across score levels and serves as a DIF effect-size measure. It is given by the following formula:<disp-formula id="e3">
<mml:math id="m3">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b1;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">MH</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">A</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">D</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">T</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">B</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">C</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">T</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>where, at total-score level <italic>j</italic>, T<sub>j</sub> denotes the number of examinees, A<sub>j</sub> and B<sub>j</sub> are the numbers of correct and incorrect responses in the reference group, respectively, and C<sub>j</sub> and D<sub>j</sub> are the numbers of correct and incorrect responses in the focal group. The natural logarithm of &#x3b1;<sub>MH</sub> is asymptotically normally distributed (see, e.g., <xref ref-type="bibr" rid="B2">Agresti, 1990</xref>), and it is commonly rescaled to the ETS delta metric as &#x2206;<sub>MH</sub>&#x20;&#x3d;&#x20;&#x2212;2.35&#x20;ln(&#x3b1;<sub>MH</sub>) (MH delta). This delta statistic is widely used to express the effect size of DIF. The most commonly used criteria are those proposed by <xref ref-type="bibr" rid="B31">Holland and Thayer (1985)</xref>, which classify &#x7c;&#x2206;<sub>MH</sub>&#x7c;&#x20;&#x3c;&#x20;1 as negligible (Category A), 1&#x20;&#x2264;&#x20;&#x7c;&#x2206;<sub>MH</sub>&#x7c;&#x20;&#x3c;&#x20;1.5 as moderate (Category B), and &#x7c;&#x2206;<sub>MH</sub>&#x7c;&#x20;&#x2265;&#x20;1.5 as large (Category C). These criteria are also known as the ETS delta scale (<xref ref-type="bibr" rid="B32">Holland and Thayer, 1988</xref>).</p>
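<p>The MH computations described above can be sketched as follows. The function takes one (A, B, C, D) table per total-score level, using the cell definitions of Formula 3, and returns the common odds ratio, the delta statistic (&#x2212;2.35 times its natural logarithm), the continuity-corrected MH chi-square, and its p-value. The continuity correction and the hypergeometric variance follow the textbook MH test; which variant the study's software used is not stated, so treat this as an assumption. The helper names are hypothetical, and ets_category applies only the size thresholds given above (full ETS rules also involve significance tests).</p>
<preformat preformat-type="code">
```python
import math


def mantel_haenszel(tables):
    """MH DIF statistics from 2 x 2 tables, one per total-score level j.

    Each table is (A, B, C, D): A, B = reference-group correct, incorrect;
    C, D = focal-group correct, incorrect (as in Formula 3).
    alpha_MH above 1 (negative delta) means the reference group has higher
    odds of success than the focal group at matched total scores.
    """
    num = den = 0.0                   # numerator and denominator of alpha_MH
    obs_a = exp_a = var_a = 0.0       # pieces of the MH chi-square
    for A, B, C, D in tables:
        T = A + B + C + D
        if T >= 2:                    # levels with fewer than two examinees carry no information
            num += A * D / T
            den += B * C / T
            n_ref, n_foc = A + B, C + D
            n_right, n_wrong = A + C, B + D
            obs_a += A
            exp_a += n_ref * n_right / T
            var_a += n_ref * n_foc * n_right * n_wrong / (T * T * (T - 1))
    alpha_mh = num / den
    delta_mh = -2.35 * math.log(alpha_mh)                     # ETS delta metric
    chi2 = max(abs(obs_a - exp_a) - 0.5, 0.0) ** 2 / var_a   # continuity-corrected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))               # upper tail of chi-square(1)
    return alpha_mh, delta_mh, chi2, p_value


def ets_category(delta_mh):
    """Size-only A/B/C classification of |delta_MH|."""
    size = abs(delta_mh)
    if size >= 1.5:
        return "C"
    if size >= 1.0:
        return "B"
    return "A"


# Hypothetical counts at three total-score levels (not GAT data).
tables = [(30, 20, 20, 30), (40, 10, 30, 20), (45, 5, 40, 10)]
alpha, delta, chi2, p = mantel_haenszel(tables)
print(round(alpha, 3), round(delta, 3), round(chi2, 2), round(p, 4), ets_category(delta))
```
</preformat>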
</sec>
<sec id="s2-4-3">
<title>Differential Distractor Functioning (DDF)</title>
<p>DDF methods examine whether all response options, rather than only the correct/incorrect dichotomy, function invariantly across groups (<xref ref-type="bibr" rid="B42">Koon, 2010</xref>). In this study, DDF effects were examined with the MLR-based DDF method proposed by <xref ref-type="bibr" rid="B38">Kato et&#x20;al. (2009)</xref> and the multi-group NLM-based DDF method proposed by <xref ref-type="bibr" rid="B64">Suh and Bolt (2011)</xref>. The MLR method models an item response category characteristic curve (IRCC) for each response category, that is, the probability of selecting that category given the ability score (here, the standardized total score z). It compares two models: the first constrains the IRCCs of an item to be identical across groups, whereas the second allows them to differ. The pseudo <italic>R</italic><sup>2</sup> values obtained from the two models are compared, and a significant difference in <italic>R</italic><sup>2</sup> indicates that the item shows DDF. The IRCC of category <italic>k</italic> in the MLR-based DDF method is given as a function of the standardized total score (z) by:<disp-formula id="e4">
<mml:math id="m4">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">p</mml:mi>
<mml:mi mathvariant="bold-italic">k</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold-italic">z</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold-italic">e</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">a</mml:mi>
<mml:mi mathvariant="bold-italic">k</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi mathvariant="bold-italic">k</mml:mi>
</mml:msub>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi mathvariant="bold-italic">K</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mi mathvariant="bold-italic">e</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">a</mml:mi>
<mml:mi mathvariant="bold-italic">h</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi mathvariant="bold-italic">h</mml:mi>
</mml:msub>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>where z denotes the standardized total score, the sum in the denominator runs over all response categories of the item (the key and the distractors), and a<sub>k</sub> and b<sub>k</sub> are logistic regression coefficients representing the intercept and the slope, respectively, of the IRCC for category <italic>k</italic>.</p>
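<p>The IRCCs of Formula 4 are a softmax over the category logits a<sub>k</sub>&#x20;&#x2b;&#x20;b<sub>k</sub>z. The sketch below computes them for one item, assuming (as in any multinomial logit model) that one reference category has its coefficients fixed at zero for identification. In a DDF analysis, the constrained model would use one coefficient set for both groups and the augmented model a separate set per group; the function name and coefficients here are hypothetical.</p>
<preformat preformat-type="code">
```python
import math


def ircc_probs(z, intercepts, slopes):
    """Multinomial-logit IRCC probabilities p_k(z) for every category of one item.

    intercepts[k] and slopes[k] play the roles of a_k and b_k in Formula 4.
    Fixing one category at (0, 0) identifies the model.
    """
    logits = [a + b * z for a, b in zip(intercepts, slopes)]
    top = max(logits)                          # subtract the max to avoid overflow
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical 4-option item; category 0 (the key) is the reference category.
a_ref = [0.0, -0.4, -0.9, -1.3]
b_ref = [0.0, -1.1, -0.7, -1.5]
for z in (-1.0, 0.0, 1.0):
    print(z, [round(p, 3) for p in ircc_probs(z, a_ref, b_ref)])
```
</preformat>
<p>Because the distractor slopes are negative relative to the key, the probability of choosing the key rises with z while each distractor's probability eventually falls.</p>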
<p>
<xref ref-type="bibr" rid="B63">Suh and Bolt (2010)</xref> proposed a 2PL-NLM that estimates item parameters for both the correct response category and the distractor categories. They also proposed a multi-group extension of the NLM that allows item parameters to differ across groups, so the multi-group NLM can detect both DIF and DDF effects at the item level. For the multi-group 2PL nested model, the probability that participant <italic>j</italic> with ability &#x3b8;<sub><italic>j</italic></sub> answers item <italic>i</italic> correctly is:<disp-formula id="e5">
<mml:math id="m5">
<mml:mrow>
<mml:mi mathvariant="bold-italic">P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">u</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">ij</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2002;</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x7c;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b8;</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold-italic">G</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold-italic">g</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold-italic">e</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">ig</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b1;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">ig</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b8;</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold-italic">e</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b2;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">ig</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b1;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">ig</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b8;</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>where <italic>G</italic> denotes group membership, and &#x3b2;<sub><italic>ig</italic></sub> and &#x3b1;<sub><italic>ig</italic></sub> represent the intercept and slope parameters, respectively, of item <italic>i</italic> in group <italic>g</italic>. The conditional probability of selecting distractor <italic>v</italic>, given an incorrect response, is as follows:<disp-formula id="e6">
<mml:math id="m6">
<mml:mrow>
<mml:mi mathvariant="bold-italic">P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">d</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">ijv</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2002;</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x7c;</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">ij</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b8;</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold-italic">G</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold-italic">g</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold-italic">e</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">Z</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">igv</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b8;</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi mathvariant="bold-italic">k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi mathvariant="bold-italic">m</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mi mathvariant="bold-italic">e</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">Z</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">igk</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">&#x3b8;</mml:mi>
<mml:mi mathvariant="bold-italic">j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(6)</label>
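</disp-formula>where <italic>Z</italic><sub><italic>igv</italic></sub>(&#x3b8;<sub><italic>j</italic></sub>)&#x20;&#x3d;&#x20;&#x3b6;<sub><italic>igv</italic></sub>&#x20;&#x2b;&#x20;&#x3bb;<sub><italic>igv</italic></sub>&#x3b8;<sub><italic>j</italic></sub>, the sum in the denominator runs over the <italic>m</italic> distractors of the item, and the intercept and slope parameters of the distractors are constrained to sum to 0 within each group (<inline-formula id="inf1">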
<mml:math id="m7">
<mml:mrow>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:munderover>
<mml:msub>
<mml:mtext>&#x3b6;</mml:mtext>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf2">
<mml:math id="m8">
<mml:mrow>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:munderover>
<mml:msub>
<mml:mtext>&#x3bb;</mml:mtext>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>).</p>
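<p>Formulas 5 and 6 combine into the full set of option probabilities: the 2PL part gives P(correct), and the probability of choosing distractor <italic>v</italic> is P(incorrect) multiplied by the conditional softmax of Formula 6. The sketch below, with a hypothetical function name and made-up parameters, computes these probabilities for one group and enforces the sum-to-zero constraints. The two hypothetical groups share the correct-response parameters but differ in their distractor intercepts, so the example shows DDF without DIF.</p>
<preformat preformat-type="code">
```python
import math


def nlm_option_probs(theta, beta, alpha, zeta, lam):
    """Option probabilities for one item in one group under a 2PL nested logit model.

    beta, alpha: intercept and slope of the correct response (Formula 5).
    zeta, lam: distractor intercepts and slopes (Formula 6), each summing to zero.
    Returns [P(correct), P(distractor 1), ..., P(distractor m)].
    """
    if abs(sum(zeta)) > 1e-9 or abs(sum(lam)) > 1e-9:
        raise ValueError("distractor parameters must sum to zero within a group")
    p_correct = 1.0 / (1.0 + math.exp(-(beta + alpha * theta)))   # Formula 5
    logits = [z + l * theta for z, l in zip(zeta, lam)]
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    # Formula 6 gives P(distractor v | incorrect); multiply by P(incorrect).
    return [p_correct] + [(1.0 - p_correct) * w / total for w in weights]


# Hypothetical parameters for a reference and a focal group (not GAT estimates).
reference = dict(beta=0.2, alpha=1.2, zeta=[0.5, -0.1, -0.4], lam=[-0.6, 0.2, 0.4])
focal = dict(beta=0.2, alpha=1.2, zeta=[-0.3, 0.6, -0.3], lam=[-0.6, 0.2, 0.4])
for theta in (-1.0, 0.0, 1.0):
    print(theta,
          [round(p, 3) for p in nlm_option_probs(theta, **reference)],
          [round(p, 3) for p in nlm_option_probs(theta, **focal)])
```
</preformat>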
<p>Differences in item parameters across groups are evaluated with the likelihood ratio test. To this end, a constrained model, in which the distractor parameters of the item are constrained to be equal across groups, is compared with an augmented model, in which the parameters of all options, including the correct response and the distractors, are allowed to differ across groups. A significant likelihood ratio statistic (G<sup>2</sup>) for the difference between the two models indicates a DDF effect independent of the DIF effect.</p>
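<p>The likelihood ratio step can be sketched generically: G<sup>2</sup> is twice the difference between the augmented and constrained log-likelihoods, referred to a chi-square distribution whose degrees of freedom equal the number of extra free parameters. The log-likelihoods below are hypothetical; in practice they come from fitting both NLMs with IRT software. The degrees-of-freedom example assumes m&#x20;&#x3d;&#x20;3 distractors with the sum-to-zero constraints, under which freeing one group's distractor intercepts and slopes adds 2(m&#x20;&#x2212;&#x20;1)&#x20;&#x3d;&#x20;4 parameters. The chi-square tail is computed in closed form for integer degrees of freedom so that only the standard library is needed.</p>
<preformat preformat-type="code">
```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    y = max(x, 0.0) / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum over i = 0 .. df/2 - 1 of y**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: df = 1 gives erfc(sqrt(y)); each extra 2 df adds one gamma term.
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def likelihood_ratio_test(loglik_constrained, loglik_augmented, df):
    """G2 = 2 * (logL_augmented - logL_constrained), referred to chi-square(df)."""
    g2 = 2.0 * (loglik_augmented - loglik_constrained)
    return g2, chi2_sf(g2, df)


# Hypothetical log-likelihoods for one item with m = 3 distractors.
g2, p = likelihood_ratio_test(-10452.7, -10445.1, df=2 * (3 - 1))
print(round(g2, 2), round(p, 4))
```
</preformat>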
</sec>
</sec>
<sec id="s2-5">
<title>Descriptive Statistics and Reliability of GAT-ART Scores</title>
<p>
<xref ref-type="table" rid="T2">Table&#x20;2</xref> provides the reliability coefficients and descriptive statistics for the entire test and each domain, both for the full sample and for each gender group. Cronbach&#x2019;s &#x3b1; and latent variable modeling-based reliability coefficients (composite reliability) were calculated for the entire test and each subsection. The composite reliability coefficient is computed from the factor loadings of a unidimensional measurement model. Because Cronbach&#x2019;s &#x3b1; underestimates reliability when the assumption of essential tau-equivalence is not met, the composite reliability coefficient yields higher values than Cronbach&#x2019;s &#x3b1; in that case.</p>
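<p>The relationship between the two coefficients can be illustrated with a one-factor model. The sketch below computes Cronbach&#x2019;s &#x3b1; from an item covariance matrix and composite reliability as (&#x3a3;&#x3bb;)<sup>2</sup>/((&#x3a3;&#x3bb;)<sup>2</sup>&#x20;&#x2b;&#x20;&#x3a3;&#x3b8;), assuming standardized loadings and uncorrelated errors; this is a common composite reliability formula, but the exact estimator used for Table 2 is not stated. With equal loadings (essential tau-equivalence) the two coincide; with unequal loadings &#x3b1; falls below composite reliability. All function names and loadings are hypothetical.</p>
<preformat preformat-type="code">
```python
def alpha_from_cov(cov):
    """Cronbach's alpha from an item covariance matrix (KR-20 when items are 0/1)."""
    k = len(cov)
    total = sum(sum(row) for row in cov)        # variance of the total score
    trace = sum(cov[i][i] for i in range(k))    # sum of item variances
    return k / (k - 1) * (1.0 - trace / total)


def composite_reliability(loadings, error_variances):
    """Composite reliability: (sum of loadings)^2 / ((sum)^2 + sum of error variances)."""
    s = sum(loadings)
    return s * s / (s * s + sum(error_variances))


def implied_cov(loadings):
    """Correlation matrix implied by a one-factor model with standardized loadings."""
    return [[1.0 if i == j else li * lj for j, lj in enumerate(loadings)]
            for i, li in enumerate(loadings)]


# Hypothetical loadings: equal (tau-equivalent) versus unequal (congeneric).
for lam in ([0.6] * 10, [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5, 0.6, 0.7]):
    errors = [1.0 - l * l for l in lam]
    print(round(alpha_from_cov(implied_cov(lam)), 3),
          round(composite_reliability(lam, errors), 3))
```
</preformat>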
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Descriptive statistics and reliability of GAT-ART scores.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Test/domain</th>
<th align="center">Mean</th>
<th align="center">SD</th>
<th align="center">
<italic>&#x3c1;</italic>
</th>
<th align="center">Cronbach&#x2019;s &#x3b1;</th>
<th align="center">Cronbach&#x2019;s &#x3b1; LL (90% CI)</th>
<th align="center">Cronbach&#x2019;s &#x3b1; UL (90% CI)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3" align="left">Entire test</td>
<td align="left">GAT-ALL</td>
<td align="char" char=".">47.39</td>
<td align="char" char=".">13.34</td>
<td align="char" char=".">0.89</td>
<td align="char" char=".">0.89</td>
<td align="char" char=".">0.89</td>
<td align="char" char=".">0.90</td>
</tr>
<tr>
<td align="left">Quantitative</td>
<td align="char" char=".">12.33</td>
<td align="char" char=".">4.61</td>
<td align="char" char=".">0.78</td>
<td align="char" char=".">0.78</td>
<td align="char" char=".">0.77</td>
<td align="char" char=".">0.78</td>
</tr>
<tr>
<td align="left">Verbal</td>
<td align="char" char=".">35.06</td>
<td align="char" char=".">9.60</td>
<td align="char" char=".">0.84</td>
<td align="char" char=".">0.84</td>
<td align="char" char=".">0.84</td>
<td align="char" char=".">0.85</td>
</tr>
<tr>
<td rowspan="3" align="left">Male group</td>
<td align="left">GAT-ALL</td>
<td align="char" char=".">47.49</td>
<td align="char" char=".">12.91</td>
<td align="char" char=".">0.88</td>
<td align="char" char=".">0.88</td>
<td align="char" char=".">0.88</td>
<td align="char" char=".">0.89</td>
</tr>
<tr>
<td align="left">Quantitative</td>
<td align="char" char=".">12.70</td>
<td align="char" char=".">4.97</td>
<td align="char" char=".">0.77</td>
<td align="char" char=".">0.77</td>
<td align="char" char=".">0.75</td>
<td align="char" char=".">0.78</td>
</tr>
<tr>
<td align="left">Verbal</td>
<td align="char" char=".">35.77</td>
<td align="char" char=".">9.27</td>
<td align="char" char=".">0.83</td>
<td align="char" char=".">0.83</td>
<td align="char" char=".">0.82</td>
<td align="char" char=".">0.84</td>
</tr>
<tr>
<td rowspan="3" align="left">Female group</td>
<td align="left">GAT-ALL</td>
<td align="char" char=".">44.66</td>
<td align="char" char=".">13.95</td>
<td align="char" char=".">0.89</td>
<td align="char" char=".">0.90</td>
<td align="char" char=".">0.88</td>
<td align="char" char=".">0.91</td>
</tr>
<tr>
<td align="left">Quantitative</td>
<td align="char" char=".">11.83</td>
<td align="char" char=".">4.64</td>
<td align="char" char=".">0.77</td>
<td align="char" char=".">0.78</td>
<td align="char" char=".">0.75</td>
<td align="char" char=".">0.81</td>
</tr>
<tr>
<td align="left">Verbal</td>
<td align="char" char=".">34.86</td>
<td align="char" char=".">10.22</td>
<td align="char" char=".">0.86</td>
<td align="char" char=".">0.86</td>
<td align="char" char=".">0.84</td>
<td align="char" char=".">0.88</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The results in <xref ref-type="table" rid="T2">Table&#x20;2</xref> show that Cronbach&#x2019;s &#x3b1; is high for the entire test (0.89) and equals 0.84 for the verbal section and 0.78 for the quantitative section. Similar results were obtained for each gender group. The relatively lower reliability of the quantitative section may be due to its having only 24 items. In addition, the latent variable modeling-based reliability coefficients and Cronbach&#x2019;s &#x3b1; coefficients were almost identical, which suggests that the essential tau-equivalence assumption is reasonable for the GAT-ART data. Moreover, the difference in average GAT-ART scores between males and females was not statistically significant at the 0.05 level (<italic>F</italic>(1, 2498)&#x20;&#x3d;&#x20;3.129, <italic>p</italic>&#x20;&#x3d;&#x20;0.077), which suggests that the two groups had similar proficiency levels.</p>
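<p>The test-length explanation for the lower quantitative reliability can be checked with the Spearman-Brown prophecy formula. The sketch below projects the quantitative section&#x2019;s &#x3b1; of 0.78 (24 items) to the length of the verbal section (96&#x20;&#x2212;&#x20;24&#x20;&#x3d;&#x20;72 items). This is a hypothetical projection that assumes the added items would be parallel to the existing ones; it is an illustration, not a result of the study.</p>
<preformat preformat-type="code">
```python
def spearman_brown(reliability, length_factor):
    """Projected reliability after lengthening a test by length_factor with parallel items."""
    r, k = reliability, length_factor
    return k * r / (1.0 + (k - 1.0) * r)


# Quantitative section: 24 items, alpha = 0.78 (Table 2); verbal section: 72 items.
projected = spearman_brown(0.78, 72 / 24)
print(round(projected, 3))
```
</preformat>
<p>Under these assumptions the projected reliability is about 0.91, above the verbal section&#x2019;s 0.84, which is consistent with test length accounting for much of the gap.</p>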
</sec>
</sec>
<sec id="s3">
<title>Results</title>
<p>This study consists of three stages. In the first stage, the factorial structure of the test was examined with the <italic>one-factor confirmatory factor analysis (CFA) model</italic>, the <italic>two-factor CFA model,</italic> and the <italic>bifactor model.</italic> In the second stage, DIF analyses were conducted to detect items exhibiting DIF across gender groups. In the third stage, DDF analyses were conducted to examine how the distractors of the detected DIF items functioned across gender.</p>
<sec id="s3-1">
<title>Confirmatory Factor Analysis (CFA) Results</title>
<p>The factorial structure of the test was examined with the <italic>one-factor CFA model, two-factor CFA model,</italic> and <italic>bifactor model</italic>. The fit measures provided in <xref ref-type="table" rid="T1">Table&#x20;1</xref> were used to determine how well the data fit each model.</p>
<p>The results for the CFA models given in <xref ref-type="table" rid="T1">Table&#x20;1</xref> show that the comparative fit index (CFI) and Tucker-Lewis index (TLI) values are higher than 0.95 and the root mean square error of approximation (RMSEA) values are smaller than 0.06. Based on the <xref ref-type="bibr" rid="B33">Hu and Bentler (1999)</xref> criteria, these results indicate a good fit between the data and each CFA model. Unlike the other fit indices, the chi-square statistics were statistically significant, likely because of the large sample size, with which even a very minor discrepancy tends to be statistically significant. According to <xref ref-type="bibr" rid="B18">DiStefano et al. (2017)</xref>, a weighted root mean square residual (WRMR) greater than 1 indicates misspecification, and smaller values indicate a better fit. However, less stringent criteria can be applied because the items are dichotomously scored and thus have only two categories (0 and 1). Despite the good fit indices of the bifactor model, many misfitting items were detected when its factor loadings were examined. These results suggest that the test can be considered unidimensional, with a single factor underlying participants&#x2019; scores, although it can also be considered multidimensional (see the two-factor and bifactor models), with the quantitative and verbal sections treated as separate factors. Therefore, DIF analyses were conducted assuming that the test is unidimensional, and DIF methods were applied to the entire test regardless of subdomains.</p>
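<p>The contrast between a significant chi-square and good RMSEA values can be illustrated numerically. Under standard maximum-likelihood assumptions, the expected model chi-square is roughly df&#x20;&#x2b;&#x20;(N&#x20;&#x2212;&#x20;1)F<sub>0</sub> for a fixed population misfit F<sub>0</sub>, so it grows with sample size, whereas RMSEA&#x20;&#x3d;&#x20;&#x221a;(max(&#x3c7;<sup>2</sup>&#x20;&#x2212;&#x20;df,&#x20;0)/(df(N&#x20;&#x2212;&#x20;1))) does not. The sketch below uses hypothetical values of df, F<sub>0</sub>, and N; the estimator used for the dichotomous CFA models may apply an adjusted chi-square, which this sketch does not reproduce. The cutoff helper encodes only the CFI, TLI, and RMSEA thresholds named above.</p>
<preformat preformat-type="code">
```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation from a model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def meets_hu_bentler(cfi, tli, rmsea_value):
    """CFI and TLI of at least 0.95 and RMSEA of at most 0.06."""
    return cfi >= 0.95 and tli >= 0.95 and 0.06 >= rmsea_value


# Same hypothetical misfit (F0 = 0.1, df = 100) at two sample sizes:
# the chi-square grows with N while RMSEA stays the same.
df, f0 = 100, 0.1
for n in (500, 2500):
    chi2 = (n - 1) * f0 + df
    print(n, round(chi2, 1), round(rmsea(chi2, df, n), 4))
```
</preformat>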
</sec>
<sec id="s3-2">
<title>Differential Item Functioning (DIF) Results</title>
<p>The DIF results for the entire test are given in <xref ref-type="sec" rid="s9">Supplementary Appendix S1A</xref>. Items are named with abbreviations that indicate the domain to which each item belongs. The first column in <xref ref-type="sec" rid="s9">Supplementary Appendix S1A</xref> gives the item numbers along with abbreviations representing each domain and subdomain. For the quantitative section, MAR, MGE, MAN, and MCO stand for <italic>arithmetic, geometry, mathematical analysis,</italic> and <italic>comparison,</italic> respectively. For the verbal section, VAN, VCA, VSC, and VRC stand for <italic>verbal analogy</italic>, <italic>context analysis, sentence completion,</italic> and <italic>reading comprehension,</italic> respectively. The other columns present the DIF statistics and <italic>p</italic>-values obtained from the DIF methods. The significance level for detecting DIF items was set at 0.01, with a detection threshold of 9.21, to reduce the effect of the large sample size on the chi-square-based test statistics, which might otherwise lead to non-DIF items being flagged as DIF items.</p>
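<p>The threshold of 9.21 coincides with the 1% critical value of a chi-square distribution with 2 degrees of freedom, the reference distribution of a likelihood ratio test that adds two DIF parameters (for example, <italic>a</italic><sub>DIF</sub> and <italic>b</italic><sub>DIF</sub>). The text does not state the degrees of freedom, so this link is an assumption. For 2 degrees of freedom the upper tail is exp(&#x2212;x/2), so the critical value is simply &#x2212;2&#x20;ln(&#x3b1;), as the short sketch below shows (function names are hypothetical).</p>
<preformat preformat-type="code">
```python
import math


def chi2_df2_critical(alpha):
    """Upper-alpha critical value of chi-square with 2 df: solves exp(-x / 2) = alpha."""
    return -2.0 * math.log(alpha)


def chi2_df2_pvalue(x):
    """Upper-tail probability of chi-square with 2 df."""
    return math.exp(-x / 2.0)


for alpha in (0.05, 0.01):
    print(alpha, round(chi2_df2_critical(alpha), 2))
print(round(chi2_df2_pvalue(9.21), 4))
```
</preformat>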
<p>
<xref ref-type="table" rid="T3">Table&#x20;3</xref> presents the items detected as showing DIF by the NLR methods. In addition, the MH-delta statistic was used to determine DIF effect sizes. According to the 3PL-NLR method, 22 of the 96 items were detected as DIF items: three quantitative items (MAR8, MAR9, MGE4) and 19 verbal items. Of these, 4 items showed non-uniform DIF and 18 showed uniform DIF; among the uniform DIF items, 10 favored males and 8 favored females. Based on the MH-delta values, 5 items were classified in Category A (negligible DIF), 12 in Category B (moderate DIF), and 5 in Category C (large DIF).</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Items detected as showing DIF by the NLR methods, with Mantel-Haenszel delta effect sizes.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Items</th>
<th colspan="2" align="center">3PL-NLR</th>
<th colspan="2" align="center">4PL-NLR</th>
<th colspan="2" align="center">Mantel-Haenszel</th>
</tr>
<tr>
<th align="center">Statistics</th>
<th align="center">
<italic>p</italic>-value</th>
<th align="center">Statistics</th>
<th align="center">
<italic>p</italic>-value</th>
<th align="center">MH-delta</th>
<th align="center">Effect size</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">MAR8</td>
<td align="char" char=".">13.826</td>
<td align="char" char=".">0.004<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">11.208</td>
<td align="char" char=".">0.006<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">1.0191</td>
<td align="center">B</td>
</tr>
<tr>
<td align="left">MAR9</td>
<td align="char" char=".">15.259</td>
<td align="char" char=".">0.003<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">8.963</td>
<td align="char" char=".">0.014</td>
<td align="char" char=".">&#x2212;0.8307</td>
<td align="center">A</td>
</tr>
<tr>
<td align="left">MGE4</td>
<td align="char" char=".">22.920</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">20.005</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">&#x2212;1.1845</td>
<td align="center">B</td>
</tr>
<tr>
<td align="left">VAN5</td>
<td align="char" char=".">21.609</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">8.990</td>
<td align="char" char=".">0.017</td>
<td align="char" char=".">&#x2212;1.2342</td>
<td align="center">B</td>
</tr>
<tr>
<td align="left">VAN6</td>
<td align="char" char=".">28.872</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">18.874</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">&#x2212;0.6668</td>
<td align="center">A</td>
</tr>
<tr>
<td align="left">VAN15</td>
<td align="char" char=".">19.900</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">10.191</td>
<td align="char" char=".">0.009<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">&#x2212;0.8718</td>
<td align="center">A</td>
</tr>
<tr>
<td align="left">VAN18</td>
<td align="char" char=".">31.807</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">20.694</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">&#x2212;1.2519</td>
<td align="center">B</td>
</tr>
<tr>
<td align="left">VAN19</td>
<td align="char" char=".">36.758</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">20.973</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">&#x2212;1.3223</td>
<td align="center">B</td>
</tr>
<tr>
<td align="left">VCA9</td>
<td align="char" char=".">49.179</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">15.269</td>
<td align="char" char=".">0.001<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">&#x2212;1.389</td>
<td align="center">B</td>
</tr>
<tr>
<td align="left">VCA15</td>
<td align="char" char=".">15.430</td>
<td align="char" char=".">0.002<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">0.520</td>
<td align="char" char=".">0.844</td>
<td align="char" char=".">&#x2212;0.6545</td>
<td align="center">A</td>
</tr>
<tr>
<td align="left">VSC6</td>
<td align="char" char=".">42.176</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">3.773</td>
<td align="char" char=".">0.125</td>
<td align="char" char=".">&#x2212;1.4731</td>
<td align="center">B</td>
</tr>
<tr>
<td align="left">VSC7</td>
<td align="char" char=".">25.595</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">13.063</td>
<td align="char" char=".">0.003<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">&#x2212;0.9144</td>
<td align="center">A</td>
</tr>
<tr>
<td align="left">VRC1</td>
<td align="char" char=".">27.169</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">10.976</td>
<td align="char" char=".">0.006<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">1.4739</td>
<td align="center">B</td>
</tr>
<tr>
<td align="left">VRC7</td>
<td align="char" char=".">41.081</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">40.924</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">2.0218</td>
<td align="center">C</td>
</tr>
<tr>
<td align="left">VRC9</td>
<td align="char" char=".">39.383</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">2.313</td>
<td align="char" char=".">0.235</td>
<td align="char" char=".">1.6211</td>
<td align="center">C</td>
</tr>
<tr>
<td align="left">VRC11</td>
<td align="char" char=".">15.494</td>
<td align="char" char=".">0.002<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">3.774</td>
<td align="char" char=".">0.125</td>
<td align="char" char=".">1.3588</td>
<td align="center">B</td>
</tr>
<tr>
<td align="left">VRC12</td>
<td align="char" char=".">35.774</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">16.254</td>
<td align="char" char=".">0.001<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">1.7118</td>
<td align="center">C</td>
</tr>
<tr>
<td align="left">VRC18</td>
<td align="char" char=".">109.525</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">19.842</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">2.8885</td>
<td align="center">C</td>
</tr>
<tr>
<td align="left">VRC20</td>
<td align="char" char=".">14.154</td>
<td align="char" char=".">0.004<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">12.411</td>
<td align="char" char=".">0.003<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">1.1129</td>
<td align="center">B</td>
</tr>
<tr>
<td align="left">VRC22</td>
<td align="char" char=".">48.999</td>
<td align="char" char=".">0.000<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">2.934</td>
<td align="char" char=".">0.172</td>
<td align="char" char=".">2.3687</td>
<td align="center">C</td>
</tr>
<tr>
<td align="left">VRC23</td>
<td align="char" char=".">13.828</td>
<td align="char" char=".">0.004<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="char" char=".">0.775</td>
<td align="char" char=".">0.553</td>
<td align="char" char=".">1.2677</td>
<td align="center">B</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Note: A &#x3d; negligibly small DIF effect, B &#x3d; moderate DIF effect, C &#x3d; large DIF effect.</p>
</fn>
<fn id="Tfn1">
<label>a</label>
<p>Significant at 0.05.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>When the 4PL-NLR method, which accounts for inattention through the upper-asymptote <italic>d</italic> parameter, was used to detect DIF items, the number of items flagged as exhibiting DIF decreased from 22 to 14. Of these 14 DIF items, only 2 (MAR8 and MGE4) were quantitative items; the remaining 12 were verbal items. In addition, all DIF items exhibited uniform DIF. Notably, all DIF items related to verbal analogy, context analysis, and sentence completion within the verbal section favored males, whereas the DIF items related to reading comprehension favored females. These results may reflect either gender-related DIF or content-specific DIF. According to the MH-Delta method, 4 of these 14 DIF items were classified in Category A (negligibly small DIF effect), 7 items in Category B (moderate DIF effect), and 3 items in Category C (large DIF effect).</p>
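<p>The MH-Delta categories used above can be illustrated with a short Python sketch. It applies the common ETS transform D-DIF &#x3d; &#x2212;2.35 ln(&#x3b1;<sub>MH</sub>) and a magnitude-only version of the A/B/C rule (an absolute value below 1 is A, from 1 up to 1.5 is B, 1.5 or more is C). The full ETS rule also involves significance tests, which are omitted here, so this is an approximation rather than the authors&#x2019; procedure. The function names are hypothetical; the D-DIF values are copied from the preceding table.</p>
<preformat preformat-type="code">
```python
import math


def mh_d_dif(alpha_mh: float) -> float:
    """ETS delta-scale transform of the Mantel-Haenszel common odds ratio."""
    return -2.35 * math.log(alpha_mh)


def ets_category(d_dif: float) -> str:
    """Magnitude-only ETS label; the significance tests of the full rule are omitted."""
    size = abs(d_dif)
    if size >= 1.5:
        return "C"  # large DIF effect
    if size >= 1.0:
        return "B"  # moderate DIF effect
    return "A"      # negligibly small DIF effect


# MH D-DIF values for a few items in the preceding table
reported = {"VAN15": -0.8718, "VAN18": -1.2519, "VCA15": -0.6545,
            "VRC1": 1.4739, "VRC7": 2.0218, "VRC18": 2.8885}

for item, d in reported.items():
    print(item, d, ets_category(d))
```
</preformat>
<p>For these items, the magnitude-only rule reproduces the A, B, and C labels reported in the table.</p>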
<p>The ICCs in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> depict the probability of a correct response across the entire ability range for males (reference group) and females (focal group) for the 10 DIF items with moderate to large effect sizes. The circles in the ICC plots represent standardized total scores, with larger circles indicating more test-takers at that standardized total score. As the ICCs of the DIF items show, the gap between the males&#x2019; and females&#x2019; curves consistently favored the same group across the entire ability range, indicating uniform DIF effects.</p>
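<p>The distinction between uniform and non-uniform DIF can be sketched with a four-parameter non-linear logistic curve in which the group effect enters either the intercept only (uniform DIF) or the slope as well (non-uniform DIF). This Python sketch uses an assumed, illustrative parameterization and made-up parameter values; it is not necessarily the exact model implemented in the software used for the analyses, and the function and argument names are hypothetical.</p>
<preformat preformat-type="code">
```python
import math


def nlr4_prob(x, group, b0, b1, c, d, b_dif=0.0, a_dif=0.0):
    """Illustrative 4PL-NLR probability of a correct answer.

    x     : standardized total score (matching criterion)
    group : 0 = reference (males), 1 = focal (females)
    c, d  : lower (guessing) and upper (inattention) asymptotes
    b_dif : group shift in the intercept -> uniform DIF
    a_dif : group change in the slope    -> non-uniform DIF
    """
    eta = b0 + b_dif * group + (b1 + a_dif * group) * x
    return c + (d - c) / (1.0 + math.exp(-eta))


grid = [i / 10 for i in range(-30, 31)]  # standardized scores from -3 to 3


def gaps(**dif):
    """Reference-minus-focal probability gap at each grid point."""
    return [nlr4_prob(x, 0, 0.0, 1.5, 0.2, 0.95, **dif)
            - nlr4_prob(x, 1, 0.0, 1.5, 0.2, 0.95, **dif) for x in grid]


uniform = gaps(b_dif=-0.5)                # intercept shift only
nonuniform = gaps(b_dif=-0.5, a_dif=0.8)  # slope differs as well

print(all(g > 0 for g in uniform))  # one group favored everywhere
print(any(g > 0 for g in nonuniform) and any(0 > g for g in nonuniform))  # curves cross
```
</preformat>
<p>With an intercept shift only, the reference-minus-focal gap keeps the same sign over the whole score range, which is the pattern described for Figure&#x20;1; adding a slope difference makes the two curves cross.</p>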
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>ICC associated with DIF items.</p>
</caption>
<graphic xlink:href="feduc-06-748884-g001.tif"/>
</fig>
</sec>
<sec id="s3-3">
<title>Differential Distractor Functioning (DDF) Results</title>
<p>Likelihood ratio statistics based on the MLR and NLM DDF methods were computed for the 10 DIF items with moderate to large DIF effect sizes. These items were flagged by both the 3PL-NLR and 4PL-NLR methods. Using the NLM (divide-by-distractor) method alongside the MLR (divide-by-total) method makes it possible to identify whether DDF is a potential underlying cause or a consequence of DIF. In addition, the likelihood ratio test of sub-model methods was used to examine the distractors&#x2019; behavior across gender.</p>
<p>Since the GAT items have 4 response options (the correct response and 3 distractors), the corresponding critical values for the NLM method, based on the chi-square distribution with 4 degrees of freedom, are 9.49 at the 0.05 significance level (&#x3b1; &#x3d; 0.05) and 13.28 at the 0.01 significance level (&#x3b1; &#x3d; 0.01). In addition, item response characteristic curves (IRCCs), which show the curves of both the correct option and the distractors, were examined. IRCCs show how responses to each distractor are distributed across the entire ability range, which makes it possible to inspect both DIF and DDF effects. The DIF and DDF results for each of the 10 DIF items are provided in <xref ref-type="table" rid="T4">Table&#x20;4</xref>, and their IRCCs are given in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>. <xref ref-type="table" rid="T4">Table&#x20;4</xref> also reports the percentage of male and female students selecting each option.</p>
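<p>The critical values quoted above coincide with the upper 5% and 1% points of a chi-square distribution with 4 degrees of freedom. The Python sketch below, which relies only on the standard library, recovers them from the closed-form chi-square survival function for 4 df and then compares the NLM G<sup>2</sup> values reported in <xref ref-type="table" rid="T4">Table&#x20;4</xref> against them. The helper names are hypothetical; where SciPy is available, <monospace>scipy.stats.chi2.ppf(0.95, 4)</monospace> returns the same critical value.</p>
<preformat preformat-type="code">
```python
import math


def chi2_sf_df4(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 4 df.

    For even df the survival function has a closed form; with df = 4
    it is exp(-x/2) * (1 + x/2).
    """
    return math.exp(-x / 2) * (1 + x / 2)


def chi2_crit_df4(alpha: float) -> float:
    """Critical value for df = 4, found by bisection on the survival function."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_df4(mid) > alpha:
            lo = mid  # tail probability still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


crit05, crit01 = chi2_crit_df4(0.05), chi2_crit_df4(0.01)
print(round(crit05, 2), round(crit01, 2))  # 9.49 13.28

# NLM G2 statistics reported in Table 4
nlm_g2 = {"MAR8": 31.11, "MGE4": 4.54, "VAN18": 12.21, "VAN19": 11.94,
          "VCA9": 14.09, "VRC1": 13.59, "VRC7": 14.99, "VRC12": 0.88,
          "VRC18": 34.39, "VRC20": 3.58}


def nlm_flag(g2: float) -> str:
    if g2 >= crit01:
        return "significant at 0.01"
    if g2 >= crit05:
        return "significant at 0.05"
    return "not significant"


flags = {item: nlm_flag(g2) for item, g2 in nlm_g2.items()}
for item, label in flags.items():
    print(f"{item}: G2 = {nlm_g2[item]} ({label})")
```
</preformat>
<p>Under this check, seven of the ten items reach significance, which agrees with the NLM flags in <xref ref-type="table" rid="T4">Table&#x20;4</xref>.</p>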
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>DDF results of DIF&#x20;items.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Item</th>
<th colspan="2" align="center">DIF results</th>
<th colspan="3" align="center">DDF results</th>
<th colspan="3" align="center">Percent (%)</th>
</tr>
<tr>
<th align="center">4PL-NLR</th>
<th align="center">MH-delta effect size</th>
<th align="center">Likelihood ratio value</th>
<th align="center">MLR</th>
<th align="center">NLM (G<sup>2</sup>)</th>
<th align="center">Options</th>
<th align="center">Females (%)</th>
<th align="center">Males (%)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="4" align="left">MAR8</td>
<td rowspan="4" align="char" char=".">11.208&#x2a;&#x2a;</td>
<td rowspan="4" align="center">B</td>
<td align="center">2.878</td>
<td rowspan="4" align="center">27.065&#x2a;&#x2a;</td>
<td rowspan="4" align="char" char=".">31.11&#x2a;&#x2a;</td>
<td align="center">A</td>
<td align="char" char=".">18.0</td>
<td align="char" char=".">19.5</td>
</tr>
<tr>
<td align="center">(<italic>p</italic>&#x20;&#x3d; 0.411)</td>
<td align="center">B</td>
<td align="char" char=".">18.1</td>
<td align="char" char=".">15.6</td>
</tr>
<tr>
<td align="center">&#x2014;</td>
<td align="center">C</td>
<td align="char" char=".">32.2</td>
<td align="char" char=".">32.7</td>
</tr>
<tr>
<td align="center">&#x2014;</td>
<td align="center">D</td>
<td align="char" char=".">31.7</td>
<td align="char" char=".">32.1</td>
</tr>
<tr>
<td rowspan="4" align="left">MGE4</td>
<td rowspan="4" align="char" char=".">20.005&#x2a;&#x2a;</td>
<td rowspan="4" align="center">B</td>
<td rowspan="4" align="center">69.853</td>
<td rowspan="4" align="center">45.353&#x2a;&#x2a;</td>
<td rowspan="4" align="char" char=".">4.54</td>
<td align="center">A</td>
<td align="char" char=".">15.2</td>
<td align="char" char=".">20.3</td>
</tr>
<tr>
<td align="center">B</td>
<td align="char" char=".">18.0</td>
<td align="char" char=".">20.7</td>
</tr>
<tr>
<td align="center">C</td>
<td align="char" char=".">48.1</td>
<td align="char" char=".">31.5</td>
</tr>
<tr>
<td align="center">D</td>
<td align="char" char=".">18.7</td>
<td align="char" char=".">27.5</td>
</tr>
<tr>
<td rowspan="4" align="left">VAN18</td>
<td rowspan="4" align="char" char=".">20.694&#x2a;&#x2a;</td>
<td rowspan="4" align="center">B</td>
<td rowspan="4" align="center">66.909&#x2a;&#x2a;</td>
<td rowspan="4" align="center">42.875&#x2a;&#x2a;</td>
<td rowspan="4" align="char" char=".">12.21<xref ref-type="table-fn" rid="Tfn2">
<sup>a</sup>
</xref>
</td>
<td align="center">A</td>
<td align="char" char=".">5.1</td>
<td align="char" char=".">8.3</td>
</tr>
<tr>
<td align="center">B</td>
<td align="char" char=".">59.7</td>
<td align="char" char=".">42.2</td>
</tr>
<tr>
<td align="center">C</td>
<td align="char" char=".">21.1</td>
<td align="char" char=".">28.2</td>
</tr>
<tr>
<td align="center">D</td>
<td align="char" char=".">14.2</td>
<td align="char" char=".">21.3</td>
</tr>
<tr>
<td rowspan="4" align="left">VAN19</td>
<td rowspan="4" align="char" char=".">20.973&#x2a;&#x2a;</td>
<td rowspan="4" align="center">B</td>
<td rowspan="4" align="center">119.024&#x2a;&#x2a;</td>
<td rowspan="4" align="center">53.433&#x2a;&#x2a;</td>
<td rowspan="4" align="char" char=".">11.94<xref ref-type="table-fn" rid="Tfn2">
<sup>a</sup>
</xref>
</td>
<td align="center">A</td>
<td align="char" char=".">18.7</td>
<td align="char" char=".">30.8</td>
</tr>
<tr>
<td align="center">B</td>
<td align="char" char=".">8.2</td>
<td align="char" char=".">13.7</td>
</tr>
<tr>
<td align="center">C</td>
<td align="char" char=".">70.5</td>
<td align="char" char=".">48.6</td>
</tr>
<tr>
<td align="center">D</td>
<td align="char" char=".">2.6</td>
<td align="char" char=".">6.9</td>
</tr>
<tr>
<td rowspan="4" align="left">VCA9</td>
<td rowspan="4" align="char" char=".">15.269&#x2a;&#x2a;</td>
<td rowspan="4" align="center">B</td>
<td rowspan="4" align="center">108.576&#x2a;&#x2a;</td>
<td rowspan="4" align="center">49.769&#x2a;&#x2a;</td>
<td rowspan="4" align="char" char=".">14.09&#x2a;&#x2a;</td>
<td align="center">A</td>
<td align="char" char=".">2.0</td>
<td align="char" char=".">6.8</td>
</tr>
<tr>
<td align="center">B</td>
<td align="char" char=".">83.8</td>
<td align="char" char=".">66.8</td>
</tr>
<tr>
<td align="center">C</td>
<td align="char" char=".">7.0</td>
<td align="char" char=".">16.2</td>
</tr>
<tr>
<td align="center">D</td>
<td align="char" char=".">7.3</td>
<td align="char" char=".">10.2</td>
</tr>
<tr>
<td rowspan="4" align="left">VRC1</td>
<td rowspan="4" align="char" char=".">10.976&#x2a;&#x2a;</td>
<td rowspan="4" align="center">B</td>
<td rowspan="4" align="center">13.861<xref ref-type="table-fn" rid="Tfn2">
<sup>a</sup>
</xref>
</td>
<td rowspan="4" align="center">31.560&#x2a;&#x2a;</td>
<td rowspan="4" align="char" char=".">13.59&#x2a;&#x2a;</td>
<td align="center">A</td>
<td align="char" char=".">17.1</td>
<td align="char" char=".">15.3</td>
</tr>
<tr>
<td align="center">B</td>
<td align="char" char=".">21.8</td>
<td align="char" char=".">20.6</td>
</tr>
<tr>
<td align="center">C</td>
<td align="char" char=".">48.1</td>
<td align="char" char=".">55.0</td>
</tr>
<tr>
<td align="center">D</td>
<td align="char" char=".">13.0</td>
<td align="char" char=".">9.2</td>
</tr>
<tr>
<td rowspan="4" align="left">VRC7</td>
<td rowspan="4" align="char" char=".">40.924&#x2a;&#x2a;</td>
<td rowspan="4" align="center">C</td>
<td align="center">5.160</td>
<td rowspan="4" align="center">37.102&#x2a;&#x2a;</td>
<td rowspan="4" align="char" char=".">14.99&#x2a;&#x2a;</td>
<td align="center">A</td>
<td align="char" char=".">17.1</td>
<td align="char" char=".">15.3</td>
</tr>
<tr>
<td align="center">(<italic>p</italic>&#x20;&#x3d; 0.160)</td>
<td align="center">B</td>
<td align="char" char=".">21.8</td>
<td align="char" char=".">20.6</td>
</tr>
<tr>
<td align="center">&#x2014;</td>
<td align="center">C</td>
<td align="char" char=".">48.1</td>
<td align="char" char=".">55.0</td>
</tr>
<tr>
<td align="center">&#x2014;</td>
<td align="center">D</td>
<td align="char" char=".">13.0</td>
<td align="char" char=".">9.2</td>
</tr>
<tr>
<td rowspan="4" align="left">VRC12</td>
<td rowspan="4" align="char" char=".">16.254&#x2a;&#x2a;</td>
<td rowspan="4" align="center">C</td>
<td rowspan="4" align="center">25.331&#x2a;&#x2a;</td>
<td rowspan="4" align="center">41.068&#x2a;&#x2a;</td>
<td rowspan="4" align="char" char=".">0.88</td>
<td align="center">A</td>
<td align="char" char=".">44.9</td>
<td align="char" char=".">55.4</td>
</tr>
<tr>
<td align="center">B</td>
<td align="char" char=".">23.9</td>
<td align="char" char=".">18.5</td>
</tr>
<tr>
<td align="center">C</td>
<td align="char" char=".">19.3</td>
<td align="char" char=".">16.7</td>
</tr>
<tr>
<td align="center">D</td>
<td align="char" char=".">11.9</td>
<td align="char" char=".">9.5</td>
</tr>
<tr>
<td rowspan="4" align="left">VRC18</td>
<td rowspan="4" align="char" char=".">19.842&#x2a;&#x2a;</td>
<td rowspan="4" align="center">C</td>
<td rowspan="4" align="center">117.124&#x2a;&#x2a;</td>
<td rowspan="4" align="center">140.143&#x2a;&#x2a;</td>
<td rowspan="4" align="char" char=".">34.39&#x2a;&#x2a;</td>
<td align="center">A</td>
<td align="char" char=".">21.5</td>
<td align="char" char=".">22.1</td>
</tr>
<tr>
<td align="center">B</td>
<td align="char" char=".">42.6</td>
<td align="char" char=".">23.3</td>
</tr>
<tr>
<td align="center">C</td>
<td align="char" char=".">22.8</td>
<td align="char" char=".">39.8</td>
</tr>
<tr>
<td align="center">D</td>
<td align="char" char=".">13.1</td>
<td align="char" char=".">14.9</td>
</tr>
<tr>
<td rowspan="4" align="left">VRC20</td>
<td rowspan="4" align="char" char=".">12.411&#x2a;&#x2a;</td>
<td rowspan="4" align="center">B</td>
<td align="center">1.978</td>
<td align="center">12.149</td>
<td rowspan="4" align="char" char=".">3.58</td>
<td align="center">A</td>
<td align="char" char=".">53.7</td>
<td align="char" char=".">54.4</td>
</tr>
<tr>
<td align="center">(p &#x3d; 0.577)</td>
<td align="center">(<italic>p</italic>&#x20;&#x3d; 0.059)</td>
<td align="center">B</td>
<td align="char" char=".">17.1</td>
<td align="char" char=".">16.5</td>
</tr>
<tr>
<td align="center">&#x2014;</td>
<td align="center">&#x2014;</td>
<td align="center">C</td>
<td align="char" char=".">12.9</td>
<td align="char" char=".">14.4</td>
</tr>
<tr>
<td align="center">&#x2014;</td>
<td align="center">&#x2014;</td>
<td align="center">D</td>
<td align="char" char=".">16.3</td>
<td align="char" char=".">14.7</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="Tfn2">
<label>a</label>
<p>Significant at 0.05.</p>
</fn>
<fn>
<p>&#x2a;&#x2a;Significant at 0.01.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Item response characteristic curves (IRCCs) associated with DIF items.</p>
</caption>
<graphic xlink:href="feduc-06-748884-g002.tif"/>
</fig>
<p>The DDF results in <xref ref-type="table" rid="T4">Table&#x20;4</xref> indicate that item MAR8, which belongs to the arithmetic subdomain of the quantitative section, exhibited uniform DIF in favor of male students (see <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>). Although the correct option of this item was D, slightly more male and female students selected distractor C (32.7% and 32.2%) than the correct option (32.1% and 31.7%). Moreover, both the MLR and NLM statistics for this item were significant, indicating a DDF effect. The significant NLM DDF effect for item MAR8 suggests that the distractors may be the underlying cause of DIF. As <xref ref-type="fig" rid="F2">Figure&#x20;2</xref> shows, the discrepancies between the males&#x2019; and females&#x2019; curves for the distractors also point to a significant DDF effect. In particular, distractor C functioned differently than expected. Therefore, the DDF effect of distractor C, rather than the item stem or the correct option, might have caused the DIF (<xref ref-type="bibr" rid="B57">Penfield, 2010</xref>).</p>
<p>The results for item MGE4 in the quantitative section indicate that it exhibited uniform DIF with a moderate effect size in favor of females. However, the selection proportion for each distractor was higher for males than for females. Although the MLR method yielded a significant DDF effect, the NLM DDF effect was not statistically significant (G<sup>2</sup> &#x3d; 4.54), which indicates that the distractors might not be contributing to the DIF effect. Thus, the non-significant NLM result implies that either the stem or the correct option is the likely underlying cause of&#x20;DIF.</p>
<p>Regarding the DIF and DDF results for the analogy subdomain of the verbal section, items VAN18 and VAN19 both showed uniform DIF with a moderate effect size in favor of females (see <xref ref-type="table" rid="T4">Table&#x20;4</xref>). The selection proportion of each distractor was higher for males than for females, which is consistent with a DDF effect among male students. Moreover, both the MLR and NLM statistics were statistically significant for both items, indicating a DDF effect. The significant NLM DDF effect suggests that the distractors may be a potential underlying cause of DIF. However, these DDF effects were non-uniform, implying that the gender difference in how the distractors functioned was not constant across the ability range. For item VAN18, distractor A was the least frequently selected option in both gender groups, whereas for item VAN19 the selection proportions of distractor A differed substantially between males and females (30.8% vs. 18.7%). Therefore, these results suggest that the distractors, together with the correct option rather than the item stem, might have caused the&#x20;DIF.</p>
<p>Only one item (VCA9) in the context analysis subdomain of the verbal section was flagged as exhibiting DIF. The results indicate that item VCA9 exhibited uniform DIF with a moderate effect size in favor of females. In addition, the selection proportion of each distractor was higher for males than for females. Both the MLR and NLM statistics were significant, indicating a DDF effect. The significant NLM DDF effect for VCA9 suggests that the distractors functioned as a potential cause of DIF, and this DDF effect was uniform, that is, consistent across the entire ability range. Additionally, distractor A was less likely to be selected by both gender groups than the other distractors. Therefore, the DDF effect of the distractors, rather than the item stem, might have caused the&#x20;DIF.</p>
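<p>The descriptive pattern reported for items MGE4, VAN18, VAN19, and VCA9 can be checked against the option percentages in <xref ref-type="table" rid="T4">Table&#x20;4</xref>. The Python sketch below computes the male-minus-female gap for each option. These raw proportions are not conditioned on ability, so they describe distractor selection but do not by themselves establish DDF. The answer keys for these items are not listed in the text; the single option chosen more often by females is presumably, but not verifiably, the key. The names in the sketch are hypothetical.</p>
<preformat preformat-type="code">
```python
# Percent choosing each option as (females, males), copied from Table 4
table4_pct = {
    "MGE4":  {"A": (15.2, 20.3), "B": (18.0, 20.7), "C": (48.1, 31.5), "D": (18.7, 27.5)},
    "VAN18": {"A": (5.1, 8.3),   "B": (59.7, 42.2), "C": (21.1, 28.2), "D": (14.2, 21.3)},
    "VAN19": {"A": (18.7, 30.8), "B": (8.2, 13.7),  "C": (70.5, 48.6), "D": (2.6, 6.9)},
    "VCA9":  {"A": (2.0, 6.8),   "B": (83.8, 66.8), "C": (7.0, 16.2),  "D": (7.3, 10.2)},
}


def male_minus_female(options):
    """Male-minus-female gap in percentage points for each option."""
    return {opt: round(m - f, 1) for opt, (f, m) in options.items()}


for item, options in table4_pct.items():
    gaps = male_minus_female(options)
    # Options with a negative gap were chosen more often by females
    chosen_more_by_females = [o for o, g in gaps.items() if 0 > g]
    print(item, gaps, "chosen more often by females:", chosen_more_by_females)
```
</preformat>
<p>For each of the four items, exactly one option has a negative gap, and males chose every other option more often than females did.</p>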
<p>Among the subdomains, reading comprehension had the largest number of items flagged as DIF (5 items). All DIF items in the reading comprehension subdomain (VRC1, VRC7, VRC12, VRC18, and VRC20) exhibited uniform DIF in favor of females (see <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>). Both the MLR and NLM statistics were significant for VRC1, VRC7, and VRC18, indicating significant DDF effects. The significant NLM DDF effects for these three items suggest that the distractors, rather than the item stems or the correct options, might have caused the DIF effects. However, as for item MAR8, the likelihood ratio statistic of the sub-model test was not significant for item VRC7 (<italic>p</italic>&#x20;&#x3d; 0.160), which indicates that the selection proportions of the distractors did not differ significantly across gender groups. For item VRC12, the MLR DDF statistic was significant, while the NLM statistic was not. Among all 10 DIF items, only item VRC20 had non-significant DDF effects under both the MLR and NLM methods. The non-significant NLM result indicates that the distractors did not function differently when responses to the correct option were excluded. Therefore, for items VRC12 and VRC20, the correct option rather than the item stem might have caused the&#x20;DIF.</p>
</sec>
</sec>
<sec id="s4">
<title>Conclusion and Discussion</title>
<p>The objective of this study was to detect items exhibiting DIF across gender groups and to examine these DIF items with DDF methods in order to identify possible sources of the DIF effects. For this purpose, DIF analyses were first conducted with two non-linear logistic regression-based DIF methods (3PL-NLR and 4PL-NLR) to detect items with significant DIF effects, and the MH-Delta DIF method was used to determine the effect size of each DIF item. Second, the MLR method, the NLM method, and the likelihood ratio test of sub-model methods were used to detect DDF items and to examine the behavior of item distractors across gender groups. The MLR method is classified as a divide-by-total method, which evaluates DIF and DDF effects simultaneously, whereas the NLM is classified as a divide-by-distractor method, which evaluates the DDF effect independently of DIF and can therefore determine whether the item distractors contributed to or caused&#x20;DIF.</p>
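<p>Why a divide-by-total comparison can register DDF that is only a by-product of DIF can be shown with toy numbers. The Python sketch below assumes a nested-logit structure (correct vs. incorrect first, then a choice among the distractors) with identical distractor preferences in both groups and a difference only in the probability of a correct response, that is, pure DIF. It is a conceptual illustration with hypothetical values and helper names, not an implementation of the MLR or NLM models used in this study.</p>
<preformat preformat-type="code">
```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nested_logit_options(p_correct, distractor_logits):
    """Distractor probabilities under a two-step (nested) choice.

    Step 1: correct vs. incorrect, with probability p_correct.
    Step 2: given an incorrect answer, choose among the distractors.
    Returns (unconditional, conditional-on-incorrect) probabilities.
    """
    conditional = softmax(distractor_logits)
    unconditional = [(1 - p_correct) * q for q in conditional]
    return unconditional, conditional


# Same distractor preferences in both groups; only P(correct) differs.
logits = [0.2, -0.4, 0.5]
ref_uncond, ref_cond = nested_logit_options(0.60, logits)
foc_uncond, foc_cond = nested_logit_options(0.50, logits)

print([round(r - f, 3) for r, f in zip(ref_uncond, foc_uncond)])  # nonzero gaps
print([round(r - f, 3) for r, f in zip(ref_cond, foc_cond)])      # all zero
```
</preformat>
<p>The unconditional distractor proportions, which a divide-by-total comparison examines, differ between the groups, whereas the proportions among incorrect responses, which a divide-by-distractor comparison examines, do not.</p>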
<p>DIF results for the GAT-ART show that 22 of the 96 items were flagged as exhibiting DIF by the 3PL-NLR method. However, the number of DIF items decreased from 22 to 14 when the 4PL-NLR DIF method, which accounts for student inattention, was applied. All 14 DIF items detected by the 4PL-NLR method exhibited uniform DIF. According to the MH-Delta effect sizes, 4 of the 14 DIF items had a negligibly small DIF effect, 7 items had a moderate DIF effect, and 3 items had a large DIF effect. When the DIF results of the 3PL-NLR, 4PL-NLR, and MH-Delta methods were compared, only 10 items were detected as DIF items with moderate to large DIF effect sizes, all of which exhibited uniform DIF. The remaining 4 items with negligibly small DIF effect sizes were excluded from the DDF analyses. In general, DDF methods are employed to determine the potential causes of DIF or to investigate whether options function differently (<xref ref-type="bibr" rid="B59">Schmitt and Dorans, 1990</xref>; <xref ref-type="bibr" rid="B6">Banks, 2009</xref>; <xref ref-type="bibr" rid="B57">Penfield, 2010</xref>; <xref ref-type="bibr" rid="B64">Suh and Bolt, 2011</xref>; <xref ref-type="bibr" rid="B65">Suh and Talley, 2015</xref>; <xref ref-type="bibr" rid="B55">Park, 2017</xref>). Therefore, DDF analyses were conducted on these 10 items to provide insight into the behavior of the item distractors and their effects on the DIF results.</p>
<p>Regarding the DDF analyses, the MLR results indicate that all DIF items exhibited DDF across gender except item VRC20 (p<sub>ddf</sub> &#x3d; 0.059), which had a moderate DIF effect size and favored females. According to the NLM method, 7 DIF items showed significant DDF, while the other 3 DIF items (MGE4, VRC12, and VRC20) did not. The non-significant NLM results indicate that the distractors did not contribute to the DIF effect for these 3 items; thus, the stem or the correct option might have caused their DIF. The significant DDF effects of the other 7 items suggest that the distractors either contributed to the DIF effect or were its potential&#x20;cause.</p>
<p>When the DDF results of the MLR and NLM methods were compared, only 2 items (MGE4 and VRC12) had significant DDF effects according to the MLR method but non-significant DDF effects according to the NLM method. For these 2 items, therefore, the significant DDF effect obtained from the divide-by-total method signals that the DDF is a consequence of DIF rather than its cause (<xref ref-type="bibr" rid="B64">Suh and Bolt, 2011</xref>).</p>
<p>Regarding the DIF and DDF results for each subdomain of the verbal section, 2 DIF items were related to <italic>verbal analogy</italic> (VAN18, VAN19), 1 item to <italic>context analysis</italic> (VCA9), and 5 items to <italic>reading comprehension</italic> (VRC1, VRC7, VRC12, VRC18, VRC20). All DIF items associated with the reading comprehension subdomain exhibited uniform DIF in favor of females. Moreover, 3 of these items (VRC1, VRC7, and VRC18) showed DDF, suggesting that the DDF effect of the distractors, rather than the item stems, might have caused the DIF. However, unlike the other reading comprehension items, 2 items (VRC12 and VRC20) showed no significant NLM DDF effect but only significant DIF effects. Thus, for these two items, either the correct option or the item stem, rather than the distractors, is the potential cause of the DIF.</p>
<p>One interesting outcome of this study is that all DIF items related to the <italic>verbal analogy</italic>, <italic>context analysis,</italic> and <italic>sentence completion</italic> subdomains of the verbal section favored male students, whereas all DIF items related to <italic>reading comprehension</italic> favored female students. These results may reflect either gender-related DIF or content-specific DIF. As <xref ref-type="bibr" rid="B24">G&#xf3;mez-Benito et&#x20;al. (2018)</xref> note, DIF might occur when items share specific characteristics, such as being related to the same content (<xref ref-type="bibr" rid="B5">American Educational Research Association, American Psychological Association and National Council on Measurement in Education, 2014</xref>). These findings require further investigation and need to be examined in light of the existing literature. Along with DDF analysis, an alternative approach to validating DIF results is a mixed-methods approach that integrates qualitative and quantitative methods (<xref ref-type="bibr" rid="B15">Creswell, 2015</xref>). For instance, <xref ref-type="bibr" rid="B7">Ben&#xed;tez et&#x20;al. (2016)</xref> investigated DIF items in PISA 2006 with subject matter experts to identify potential sources of DIF. Likewise, <xref ref-type="bibr" rid="B44">Maddox et&#x20;al. (2015)</xref> compared DIF results with an ethnographic transcript to determine how students dealt with items in literacy tests (<xref ref-type="bibr" rid="B24">G&#xf3;mez-Benito et&#x20;al., 2018</xref>).</p>
<p>Overall, when a significant DDF effect is obtained only from the <italic>divide-by-total</italic> method (MLR), it indicates that the DDF effect of the item is a consequence of DIF rather than its cause (<xref ref-type="bibr" rid="B64">Suh and Bolt, 2011</xref>). In contrast, a significant NLM-based DDF effect implies that the distractors, rather than the item stem or the correct option, might have caused the DIF (<xref ref-type="bibr" rid="B57">Penfield, 2010</xref>). For such items, it is suggested either to revise the distractors to eliminate the DIF effect or to exclude the items from the test. On the other hand, non-significant NLM DDF results indicate that the item distractors did not contribute to the DIF effect and, thus, that the stem or correct option might have caused the DIF. A detailed investigation of the distractors of the items with significant DDF effects also reveals that some distractors were more likely to be selected, while others were less likely to be selected (see MAR8, VCA9, and VRC18). The discrepancy between the ICCs of these distractors also indicates a DDF effect, since these distractors functioned differently than expected.</p>
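<p>The interpretive rules in this section can be summarized as a small decision function applied to the significance flags in <xref ref-type="table" rid="T4">Table&#x20;4</xref> (&#x3b1; &#x3d; 0.05). This Python sketch is a hypothetical helper that restates the reading of the MLR and NLM results given above; it is not a formal statistical procedure.</p>
<preformat preformat-type="code">
```python
def interpret(mlr_sig: bool, nlm_sig: bool) -> str:
    """Restate the MLR/NLM interpretation used in this section (hypothetical helper)."""
    if nlm_sig:
        return "distractors may contribute to or cause the DIF"
    if mlr_sig:
        return "DDF likely a consequence of DIF; stem or key is the likely source"
    return "no DDF; stem or key is the likely source"


# (MLR significant, NLM significant) at alpha = 0.05, from Table 4
table4_flags = {
    "MAR8": (True, True), "MGE4": (True, False), "VAN18": (True, True),
    "VAN19": (True, True), "VCA9": (True, True), "VRC1": (True, True),
    "VRC7": (True, True), "VRC12": (True, False), "VRC18": (True, True),
    "VRC20": (False, False),
}

for item, (mlr, nlm) in table4_flags.items():
    print(f"{item}: {interpret(mlr, nlm)}")
```
</preformat>
<p>Only MGE4 and VRC12 fall into the "consequence of DIF" branch, and only VRC20 shows no DDF under either method, matching the summary above.</p>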
<sec id="s4-1">
<title>The Implication of This Study</title>
<p>Examining the role of distractors in DIF and the effect of distractors on test bias through DDF analysis has attracted considerable attention from practitioners over the last few decades (<xref ref-type="bibr" rid="B25">Green et&#x20;al., 1989</xref>; <xref ref-type="bibr" rid="B35">Jalili et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B50">Martinkov&#xe1; et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B51">Middleton and Laitusis, 2007</xref>; <xref ref-type="bibr" rid="B70">Tsaousis et&#x20;al., 2018</xref>, among others). However, very few studies have employed divide-by-distractor and divide-by-total DDF approaches together to determine the effect of distractors on DIF results and to identify the potential sources of DIF. This study therefore contributes to the existing literature in this regard.</p>
<p>Another distinctive aspect of this study is its use of the 4PL-NLR DIF method, which is intended to remove the effect of inattention on the estimated item parameters. The main characteristic of the 4PL model is that it allows a non-zero probability of an incorrect response to an item even for high-performing students. <xref ref-type="bibr" rid="B78">Rulison and Loken (2009)</xref> showed that, with the 4PL model, the effect of early mistakes made by high-ability students because of stress and carelessness could be reduced. Therefore, the 4PL model can decrease bias in ability estimation (<xref ref-type="bibr" rid="B48">Magis, 2013</xref>).</p>
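<p>The effect of the upper asymptote can be illustrated by the log-likelihood contribution of an incorrect answer, log(1 &#x2212; P), for a high-ability examinee. Under the 3PL, P approaches 1 as ability increases, so a single careless error is heavily penalized and pulls the ability estimate down; under the 4PL, P cannot exceed <italic>d</italic>, so 1 &#x2212; P never falls below 1 &#x2212; <italic>d</italic>. The Python sketch below uses hypothetical item parameters and a hypothetical function name.</p>
<preformat preformat-type="code">
```python
import math


def irt_prob(theta, a, b, c, d=1.0):
    """4PL item response function; d = 1.0 reduces it to the 3PL."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: a = 1.5, b = 0.0, c = 0.2; 4PL upper asymptote d = 0.95
theta = 3.0  # a high-ability examinee
p3 = irt_prob(theta, 1.5, 0.0, 0.2)
p4 = irt_prob(theta, 1.5, 0.0, 0.2, d=0.95)

# Log-likelihood contribution of an incorrect answer: log(1 - P).
# The 4PL value is less negative because 1 - P stays at or above 1 - d.
penalty_3pl = math.log(1 - p3)
penalty_4pl = math.log(1 - p4)
print(round(penalty_3pl, 2), round(penalty_4pl, 2))
```
</preformat>
<p>Because the 4PL penalty for the same careless error is smaller, one early mistake shifts the likelihood, and hence the ability estimate, less than under the 3PL.</p>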
<p>
<xref ref-type="bibr" rid="B16">Deng (2020)</xref> claims that there is no evidence of a DDF effect occurring without DIF. Therefore, DDF analysis is typically conducted after DIF analysis to provide more insight into the potential sources of significant DIF effects. Likewise, <xref ref-type="bibr" rid="B38">Kato et&#x20;al. (2009)</xref> state that DDF analysis is a supplementary analysis that plays a secondary role in studying test fairness while providing important information about the potential underlying causes or sources of DIF effects.</p>
<p>DDF analyses can also be used to investigate how items are perceived, to understand what attracts examinees to particular options, and to identify the cognitive steps taken by different subgroups. Additionally, they can be employed to understand differences across subgroups in the cognitive processes used to respond to items on achievement tests (<xref ref-type="bibr" rid="B55">Park, 2017</xref>). In this study, DIF and DDF effects were examined in terms of group differences and the attributes being measured. However, other factors not directly related to the content being measured, such as differences in teaching practices, teaching environment, and socioeconomic status, might contribute to unexpected differences in response behavior (<xref ref-type="bibr" rid="B75">Zumbo, 2007</xref>; <xref ref-type="bibr" rid="B55">Park, 2017</xref>). <xref ref-type="bibr" rid="B75">Zumbo (2007)</xref> classifies such factors under the third generation of DIF studies. Therefore, it is suggested that both DIF and DDF methods be applied in different contexts that incorporate such factors, to provide more insight into the potential sources of DIF and DDF effects.</p>
<p>Overall, the literature recommends examining DDF alongside DIF (e.g., <xref ref-type="bibr" rid="B56">Penfield, 2008</xref>, <xref ref-type="bibr" rid="B57">2010</xref>; <xref ref-type="bibr" rid="B64">Suh and Bolt, 2011</xref>; <xref ref-type="bibr" rid="B65">Suh and Talley, 2015</xref>; <xref ref-type="bibr" rid="B68">Terzi and Suh, 2015</xref>). Examining both phenomena provides more accurate information about how the correct option and the distractors of an item function, which can inform the test development process. Moreover, studying DIF and DDF together provides insight into whether DDF arises from DIF in the correct option or whether DIF arises from significant DDF effects.</p>
</sec>
</sec>
</body>
<back>
<sec id="s5">
<title>Data Availability Statement</title>
<p>The datasets presented in this article are not readily available because they are owned by the National Center for Assessment (NCA, KSA); they can be shared with the approval of NCA. Requests to access the datasets should be directed to <ext-link ext-link-type="uri" xlink:href="research@etec.gov.sa">research@etec.gov.sa</ext-link>.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>BO conducted the research project and drafted the manuscript. HA provided critical revisions. Both authors approved the final version of the manuscript for submission.</p>
</sec>
<sec sec-type="COI-statement" id="s7">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ack>
<p>Part of this study has been submitted to the National Center for Assessment (NCA) as a research project titled &#x201c;Examining Differential Distractor Functioning of GAT-ART Items with Non-Linear Logistic Regression Models.&#x201d; I would like to thank Abdullah Al-Qatee for his continued support throughout this project. I would also like to thank Khaleel Alharbi for providing the research data.</p>
</ack>
<sec id="s9">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/feduc.2021.748884/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/feduc.2021.748884/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.docx" id="SM1" mimetype="application/docx" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Abedi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Leon</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kao</surname>
<given-names>J.&#x20;C.</given-names>
</name>
</person-group> (<year>2008</year>). <source>Examining Differential Distractor Functioning in Reading Assessments for Students with Disabilities (CRESST Tech. Rep. No. 743)</source>. <publisher-loc>Los Angeles, CA</publisher-loc>: <publisher-name>National Center for Research on Evaluation, Standards, and Student Testing</publisher-name>. </citation>
</ref>
<ref id="B2">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Agresti</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>1990</year>). <source>Categorical Data Analysis</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>John Wiley and Sons</publisher-name>. <pub-id pub-id-type="doi">10.1002/0471249688</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Alice</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Imputing Missing Data With R; MICE Package</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.r-bloggers.com/imputing-missing-data-with-r-mice-package/">https://www.r-bloggers.com/imputing-missing-data-with-r-mice-package/</ext-link>
</comment>. </citation>
</ref>
<ref id="B4">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Alqataee</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Alharbi</surname>
<given-names>K. A.</given-names>
</name>
</person-group> (<year>2012</year>). <source>The Ability of Admission Criteria to Predict the First-Year College Grade Point Average in Some Saudi Universities (Technical Report No. TR009-2012)</source>. <publisher-loc>Riyadh, Saudi Arabia</publisher-loc>: <publisher-name>The National Center for Assessment in Higher Education</publisher-name>. </citation>
</ref>
<ref id="B5">
<citation citation-type="book">
<collab>American Educational Research Association, American Psychological Association and National Council on Measurement in Education</collab> (<year>2014</year>). <source>Standards for Educational and Psychological Testing</source>. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>American Educational Research Association</publisher-name>. </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Banks</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Using DDF in a Post Hoc Analysis to Understand Sources&#x20;of&#x20;DIF</article-title>. <source>Educ. Assess.</source> <volume>14</volume> (<issue>2</issue>), <fpage>103</fpage>&#x2013;<lpage>118</lpage>. <pub-id pub-id-type="doi">10.1080/10627190903035229</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ben&#xed;tez</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Padilla</surname>
<given-names>J.-L.</given-names>
</name>
<name>
<surname>Hidalgo Montesinos</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Sireci</surname>
<given-names>S. G.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Using Mixed Methods to Interpret Differential Item Functioning</article-title>. <source>Appl. Meas. Education.</source> <volume>29</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1080/08957347.2015.1102915</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Benjamini</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Hochberg</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing</article-title>. <source>J.&#x20;R. Stat. Soc. Ser. B (Methodological).</source> <volume>57</volume> (<issue>1</issue>), <fpage>289</fpage>&#x2013;<lpage>300</lpage>. <pub-id pub-id-type="doi">10.2307/2346101</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berger</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tutz</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Detection of Uniform and Non-Uniform Differential Item Functioning by Item Focused Trees</article-title>. <source>J.&#x20;Educ. Behav. Stat.</source> <volume>41</volume> (<issue>6</issue>). <pub-id pub-id-type="doi">10.3102/1076998616659371</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bond</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Validity and Assessment: A Rasch Measurement Perspective</article-title>. <source>Metodolog&#xed;a de las Ciencias del Comportamiento.</source> <volume>5</volume> (<issue>2</issue>), <fpage>179</fpage>&#x2013;<lpage>194</lpage>. </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Borsboom</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>When Does Measurement Invariance Matter?</article-title> <source>Med. Care.</source> <volume>44</volume> (<issue>11 Suppl. 3</issue>), <fpage>S176</fpage>&#x2013;<lpage>S181</lpage>. <pub-id pub-id-type="doi">10.1097/01.mlr.0000245143.08679.cc</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Camilli</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Shepard</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>1994</year>). <source>Methods for Identifying Biased Test Items</source>. <publisher-loc>Thousand Oaks, CA</publisher-loc>: <publisher-name>Sage Publications</publisher-name>. </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Clauser</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Mazor</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Using Statistical Procedures to Identify Differentially Functioning Test Items</article-title>. <source>Educ. Meas. Issues Pract.</source> <volume>17</volume> (<issue>1</issue>), <fpage>31</fpage>&#x2013;<lpage>44</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3992.1998.tb00619.x</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Creswell</surname>
<given-names>J.&#x20;W.</given-names>
</name>
</person-group> (<year>2015</year>). <source>A Concise Introduction to Mixed Methods Research</source>. <publisher-loc>Thousand Oaks, CA</publisher-loc>: <publisher-name>Sage Publications</publisher-name>. </citation>
</ref>
<ref id="B16">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Deng</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <source>The Relationship between Differential Distractor Functioning (DDF) and Differential Item Functioning (DIF): If DDF Occurs, Must DIF Occur?</source> <publisher-loc>Lawrence, KS</publisher-loc>: <publisher-name>Doctoral dissertation, University of Kansas</publisher-name>. </citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Dimitrov</surname>
<given-names>D. M.</given-names>
</name>
</person-group> (<year>2014</year>). <source>Testing for Unidimensionality of GAT Data</source>. <comment>(Technical Report: TR029-2013)</comment>. <publisher-loc>Riyadh, KSA</publisher-loc>: <publisher-name>National Center for Assessment in Higher Education</publisher-name>. </citation>
</ref>
<ref id="B76">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dimitrov</surname>
<given-names>D. M.</given-names>
</name>
<name>
<surname>Shamrani</surname>
<given-names>A. R.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Psychometric Features of the General Aptitude Test&#x2013;Verbal Part (GAT-V)</article-title>. <source>Meas. Eval. Couns. Dev.</source> <volume>48</volume> (<issue>2</issue>), <fpage>79</fpage>&#x2013;<lpage>94</lpage>. <pub-id pub-id-type="doi">10.1177/0748175614563317</pub-id> </citation>
</ref>
<ref id="B80">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dimitrov</surname>
<given-names>D. M.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Examining Differential Item Functioning: IRT-Based Detection in the Framework of Confirmatory Factor Analysis</article-title>. <source>Meas. Eval. Couns. Dev.</source> <volume>50</volume> (<issue>3</issue>), <fpage>183</fpage>&#x2013;<lpage>200</lpage>. <pub-id pub-id-type="doi">10.1080/07481756.2017.1320946</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>DiStefano</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Examination of the Weighted Root Mean Square Residual: Evidence for Trustworthiness?</article-title> <source>Struct. Equation Model. A Multidisciplinary J.</source> <volume>25</volume> (<issue>3</issue>), <fpage>453</fpage>&#x2013;<lpage>466</lpage>. <pub-id pub-id-type="doi">10.1080/10705511.2017.1390394</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Drabinov&#xe1;</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Martinkov&#xe1;</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Detection of Differential Item Functioning with Non-linear Regression: Non-IRT Approach Accounting for Guessing</source>. <comment>(Technical report No. V1229)</comment>. <publisher-name>Institute of Computer Science the Czech Academy of Sciences</publisher-name>. </citation>
</ref>
<ref id="B21">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Fulcher</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Davidson</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2007</year>). <source>Language Testing and Assessment</source>. <publisher-loc>London, NY</publisher-loc>: <publisher-name>Routledge</publisher-name>. </citation>
</ref>
<ref id="B22">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Fulcher</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Davidson</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2013</year>). <source>The Routledge Handbook of Language Testing</source>. <publisher-loc>Abingdon, UK</publisher-loc>: <publisher-name>Routledge</publisher-name>. <pub-id pub-id-type="doi">10.4324/9780203181287</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>G&#xf3;mez-Benito</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sireci</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Padilla</surname>
<given-names>J.&#x20;L.</given-names>
</name>
<name>
<surname>Hidalgo</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Ben&#xed;tez</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Differential Item Functioning: Beyond Validity Evidence Based on Internal Structure</article-title>. <source>Psicothema.</source> <volume>30</volume> (<issue>1</issue>), <fpage>104</fpage>&#x2013;<lpage>109</lpage>. <pub-id pub-id-type="doi">10.7334/psicothema2017.183</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Green</surname>
<given-names>B. F.</given-names>
</name>
<name>
<surname>Crone</surname>
<given-names>C. R.</given-names>
</name>
<name>
<surname>Folk</surname>
<given-names>V. G.</given-names>
</name>
</person-group> (<year>1989</year>). <article-title>A Method for Studying Differential Distractor Functioning</article-title>. <source>J.&#x20;Educ. Meas.</source> <volume>26</volume> (<issue>2</issue>), <fpage>147</fpage>&#x2013;<lpage>160</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3984.1989.tb00325.x</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hambleton</surname>
<given-names>R. K.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Good Practices for Identifying Differential Item Functioning</article-title>. <source>Med. Care.</source> <volume>44</volume> (<issue>11</issue>), <fpage>S182</fpage>&#x2013;<lpage>S188</lpage>. <pub-id pub-id-type="doi">10.1097/01.mlr.0000245443.86671.c4</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hambleton</surname>
<given-names>R. K.</given-names>
</name>
<name>
<surname>Clauser</surname>
<given-names>B. E.</given-names>
</name>
<name>
<surname>Mazor</surname>
<given-names>K. M.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>R. W.</given-names>
</name>
</person-group> (<year>1993</year>). <article-title>Advances in the Detection of Differentially Functioning Test Items</article-title>. <source>Eur. J.&#x20;Psychol. Assess.</source> <volume>9</volume>, <fpage>1</fpage>&#x2013;<lpage>18</lpage>. </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hambleton</surname>
<given-names>R. K.</given-names>
</name>
<name>
<surname>Rogers</surname>
<given-names>H. J.</given-names>
</name>
</person-group> (<year>1989</year>). <article-title>Detecting Potentially Biased Test Items: Comparison of IRT Area and Mantel-Haenszel Methods</article-title>. <source>Appl. Meas. Education.</source> <volume>2</volume> (<issue>4</issue>), <fpage>313</fpage>&#x2013;<lpage>334</lpage>. <pub-id pub-id-type="doi">10.1207/s15324818ame0204_4</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hambleton</surname>
<given-names>R. K.</given-names>
</name>
<name>
<surname>Swaminathan</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Rogers</surname>
<given-names>H. J.</given-names>
</name>
</person-group> (<year>1991</year>). <source>Fundamentals of Item Response Theory</source>. <publisher-loc>Newbury Park, CA</publisher-loc>: <publisher-name>Sage Publications</publisher-name>. </citation>
</ref>
<ref id="B30">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Hladka</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Martinkova</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>difNLR: DIF and DDF Detection by Non-linear Regression Models</article-title>. <comment>R package version 1.3.7, Available at: <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=difNLR">https://CRAN.R-project.org/package&#x3d;difNLR</ext-link>
</comment>. </citation>
</ref>
<ref id="B31">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Holland</surname>
<given-names>P. W.</given-names>
</name>
<name>
<surname>Thayer</surname>
<given-names>D. T.</given-names>
</name>
</person-group> (<year>1985</year>). <source>An Alternate Definition of the ETS Delta Scale of Item Difficulty (Research Report RR-85-43)</source>. <publisher-loc>Princeton, NJ</publisher-loc>: <publisher-name>Educational Testing Service</publisher-name>. </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holland</surname>
<given-names>P. W.</given-names>
</name>
<name>
<surname>Thayer</surname>
<given-names>D. T.</given-names>
</name>
</person-group> (<year>1988</year>). &#x201c;<article-title>Differential item performance and the Mantel-Haenszel procedure</article-title>,&#x201d; in <source>Test Validity</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Wainer</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Braun</surname>
<given-names>H. I.</given-names>
</name>
</person-group> (<publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Lawrence Erlbaum Associates, Inc.</publisher-name>), <fpage>129</fpage>&#x2013;<lpage>145</lpage>. </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>L. t.</given-names>
</name>
<name>
<surname>Bentler</surname>
<given-names>P. M.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria versus New Alternatives</article-title>. <source>Struct. Equation Model. A Multidisciplinary J.</source> <volume>6</volume>, <fpage>1</fpage>&#x2013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1080/10705519909540118</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hunter</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2014</year>). <source>A Simulation Study Comparing Two Methods of Evaluating Differential Test Functioning (DTF): DFIT and the Mantel-Haenszel/Liu-Agresti Variance</source>. <publisher-loc>Atlanta, GA</publisher-loc>: <publisher-name>Unpublished Doctoral Dissertation, Georgia State University</publisher-name>. </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jalili</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Barati</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Moein Zadeh</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Using Multiple-Variable Matching to Identify EFL Ecological Sources of Differential Item Functioning</article-title>. <source>J.&#x20;Teach. Lang. Skills.</source> <volume>38</volume> (<issue>4</issue>), <fpage>1</fpage>&#x2013;<lpage>42</lpage>. <pub-id pub-id-type="doi">10.22099/jtls.2020.36702.2794</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jamalzadeh</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Lotfi</surname>
<given-names>A. R.</given-names>
</name>
<name>
<surname>Rostami</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Assessing the Validity of an IAU General English Achievement Test Through Hybridizing Differential Item Functioning and Differential Distractor Functioning</article-title>. <source>Lang. Test. Asia.</source> <volume>11</volume>, <fpage>8</fpage>. <pub-id pub-id-type="doi">10.1186/s40468-021-00124-7</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karami</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Detecting Gender Bias in a Language Proficiency Test</article-title>. <source>Int. J.&#x20;Lang. Stud.</source> <volume>5</volume> (<issue>2</issue>), <fpage>27</fpage>&#x2013;<lpage>38</lpage>. </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kato</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Moen</surname>
<given-names>R. E.</given-names>
</name>
<name>
<surname>Thurlow</surname>
<given-names>M. L.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Differentials of a State Reading Assessment: Item Functioning, Distractor Functioning, and Omission Frequency for Disability Categories</article-title>. <source>Educ. Meas. Issues Pract.</source> <volume>28</volume>, <fpage>28</fpage>&#x2013;<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3992.2009.00145.x</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Oshima</surname>
<given-names>T. C.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Effect of Multiple Testing Adjustment in Differential Item Functioning Detection</article-title>. <source>Educ. Psychol. Meas.</source> <volume>73</volume> (<issue>3</issue>), <fpage>458</fpage>&#x2013;<lpage>470</lpage>. <pub-id pub-id-type="doi">10.1177/0013164412467033</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Detecting DIF Across the Different Language Groups in a Speaking Test</article-title>. <source>Lang. Test.</source> <volume>18</volume> (<issue>1</issue>), <fpage>89</fpage>&#x2013;<lpage>114</lpage>. <pub-id pub-id-type="doi">10.1177/026553220101800104</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>S.-H.</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>A. S.</given-names>
</name>
<name>
<surname>Alagoz</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>DIF Detection and Effect Size Measures for Polytomously Scored Items</article-title>. <source>J.&#x20;Educ. Meas.</source> <volume>44</volume> (<issue>2</issue>), <fpage>93</fpage>&#x2013;<lpage>116</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3984.2007.00029.x</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Koon</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2010</year>). <source>A Comparison of Methods for Detecting Differential Distractor Functioning</source>. <publisher-loc>Tallahassee, FL</publisher-loc>: <publisher-name>Unpublished Doctoral Dissertation, Florida State University</publisher-name>. </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Loken</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Rulison</surname>
<given-names>K. L.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Estimation of a Four-Parameter Item Response Theory Model</article-title>. <source>Br. J.&#x20;Math. Stat. Psychol.</source> <volume>63</volume> (<issue>3</issue>), <fpage>509</fpage>&#x2013;<lpage>525</lpage>. <pub-id pub-id-type="doi">10.1348/000711009X474502</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maddox</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Zumbo</surname>
<given-names>B. D.</given-names>
</name>
<name>
<surname>Tay-Lim</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Qu</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>An Anthropologist Among the Psychometricians: Assessment Events, Ethnography, and Differential Item Functioning in the Mongolian Gobi</article-title>. <source>Int. J.&#x20;Test.</source> <volume>15</volume> (<issue>4</issue>), <fpage>291</fpage>&#x2013;<lpage>309</lpage>. <pub-id pub-id-type="doi">10.1080/15305058.2015.1017103</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Madley-Dowd</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Tilling</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Heron</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>The Proportion of Missing Data Should Not Be Used to Guide Decisions on Multiple Imputation</article-title>. <source>J.&#x20;Clin. Epidemiol.</source> <volume>110</volume>, <fpage>63</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1016/j.jclinepi.2019.02.016</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Magis</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>B&#xe9;land</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Tuerlinckx</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>De Boeck</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>A General Framework and an R Package for the Detection of Dichotomous Differential Item Functioning</article-title>. <source>Behav. Res. Methods.</source> <volume>42</volume>, <fpage>847</fpage>&#x2013;<lpage>862</lpage>. <pub-id pub-id-type="doi">10.3758/BRM.42.3.847</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Magis</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>De Boeck</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Identification of Differential Item Functioning in Multiple-Group Settings: A Multivariate Outlier Detection Approach</article-title>. <source>Multivariate Behav. Res.</source> <volume>46</volume> (<issue>5</issue>), <fpage>733</fpage>&#x2013;<lpage>755</lpage>. <pub-id pub-id-type="doi">10.1080/00273171.2011.606757</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Magis</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>A Note on the Item Information Function of the Four-Parameter Logistic Model</article-title>. <source>Appl. Psychol. Meas.</source> <volume>37</volume> (<issue>4</issue>), <fpage>304</fpage>&#x2013;<lpage>315</lpage>. <pub-id pub-id-type="doi">10.1177/0146621613475471</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Magis</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Tuerlinckx</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>De Boeck</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Detection of Differential Item Functioning Using the Lasso Approach</article-title>. <source>J.&#x20;Educ. Behav. Stat.</source> <volume>40</volume> (<issue>2</issue>), <fpage>111</fpage>&#x2013;<lpage>135</lpage>. <pub-id pub-id-type="doi">10.3102/1076998614559747</pub-id> </citation>
</ref>
<ref id="B77">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marsh</surname>
<given-names>H. W.</given-names>
</name>
<name>
<surname>Hau</surname>
<given-names>K.-T.</given-names>
</name>
<name>
<surname>Wen</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>In Search of Golden Rules: Comment on Hypothesis Testing Approaches to Setting Cutoff Values for Fit Indices and Dangers in Overgeneralising Hu and Bentler&#x2019;s (1999) Findings</article-title>. <source>Struct. Equ. Modeling</source> <volume>11</volume>, <fpage>320</fpage>&#x2013;<lpage>341</lpage>. <pub-id pub-id-type="doi">10.1207/s15328007sem1103_2</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martinkov&#xe1;</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Drabinov&#xe1;</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Liaw</surname>
<given-names>Y. L.</given-names>
</name>
<name>
<surname>Sanders</surname>
<given-names>E. A.</given-names>
</name>
<name>
<surname>McFarland</surname>
<given-names>J.&#x20;L.</given-names>
</name>
<name>
<surname>Price</surname>
<given-names>R. M.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments</article-title>. <source>CBE&#x2014;Life Sci. Education.</source> <volume>16</volume> (<issue>2</issue>), <fpage>rm2</fpage>. <pub-id pub-id-type="doi">10.1187/cbe.16-10-0307</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Middleton</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Laitusis</surname>
<given-names>C. C.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Examining Test Items for Differential Distractor Functioning Among Students With Learning Disabilities</article-title>. <source>ETS Res. Rep. Ser.</source> <volume>2007</volume> (<issue>2</issue>), <fpage>i</fpage>&#x2013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1002/j.2333-8504.2007.tb02085.x</pub-id> </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Millsap</surname>
<given-names>R. E.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Comments on Methods for the Investigation of Measurement Bias in the Mini-Mental State Examination</article-title>. <source>Med. Care.</source> <volume>44</volume> (<issue>11</issue>), <fpage>S171</fpage>&#x2013;<lpage>S175</lpage>. <pub-id pub-id-type="doi">10.1097/01.mlr.0000245441.76388.ff</pub-id> </citation>
</ref>
<ref id="B53">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Muthen</surname>
<given-names>L. K.</given-names>
</name>
<name>
<surname>Muthen</surname>
<given-names>B. O.</given-names>
</name>
</person-group> (<year>2012</year>). <source>Mplus User&#x2019;s Guide</source>. <edition>7th ed.</edition> <publisher-loc>Los Angeles, CA</publisher-loc>: <publisher-name>Muthen &#x26; Muthen</publisher-name>. </citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pae</surname>
<given-names>T.-I.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>DIF for Examinees With Different Academic Backgrounds</article-title>. <source>Lang. Test.</source> <volume>21</volume> (<issue>1</issue>), <fpage>53</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1191/0265532204lt274oa</pub-id> </citation>
</ref>
<ref id="B55">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Park</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Investigating Differential Options Functioning Based on Multinomial Logistic Regression With Widely Used Statistical Software</source>. [Master&#x2019;s thesis]. <publisher-loc>Vancouver, Canada</publisher-loc>: <publisher-name>University of British Columbia</publisher-name>. </citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Penfield</surname>
<given-names>R. D.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>An Odds Ratio Approach for Assessing Differential Distractor Functioning Effects Under the Nominal Response Model</article-title>. <source>J.&#x20;Educ. Meas.</source> <volume>45</volume>, <fpage>247</fpage>&#x2013;<lpage>269</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3984.2008.00063.x</pub-id> </citation>
</ref>
<ref id="B57">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Penfield</surname>
<given-names>R. D.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Modeling DIF Effects Using Distractor-Level Invariance Effects: Implications for Understanding the Causes of DIF</article-title>. <source>Appl. Psychol. Meas.</source> <pub-id pub-id-type="doi">10.1177/0146621609359284</pub-id> </citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rosseel</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Lavaan: An R Package for Structural Equation Modeling</article-title>. <source>J.&#x20;Stat. Softw.</source> <volume>48</volume> (<issue>2</issue>), <fpage>1</fpage>&#x2013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v048.i02</pub-id> </citation>
</ref>
<ref id="B78">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rulison</surname>
<given-names>K. L.</given-names>
</name>
<name>
<surname>Loken</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>I&#x2019;ve Fallen and I Can&#x2019;t Get Up: Can High-Ability Students Recover From Early Mistakes in Computer Adaptive Testing?</article-title>. <source>Appl. Psychol. Meas.</source> <volume>33</volume>, <fpage>83</fpage>&#x2013;<lpage>101</lpage>. <pub-id pub-id-type="doi">10.1177/0146621608324023</pub-id> </citation>
</ref>
<ref id="B59">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schmitt</surname>
<given-names>A. P.</given-names>
</name>
<name>
<surname>Dorans</surname>
<given-names>N. J.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>Differential Item Functioning for Minority Examinees on the SAT</article-title>. <source>J.&#x20;Educ. Meas.</source> <volume>27</volume>, <fpage>67</fpage>&#x2013;<lpage>81</lpage>. <ext-link ext-link-type="uri" xlink:href="http://www.jstor.org/stable/1434768">http://www.jstor.org/stable/1434768</ext-link> </citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shohamy</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Democratic Assessment as an Alternative</article-title>. <source>Lang. Test.</source> <volume>18</volume> (<issue>4</issue>), <fpage>373</fpage>&#x2013;<lpage>391</lpage>. <pub-id pub-id-type="doi">10.1177/026553220101800404</pub-id> </citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sideridis</surname>
<given-names>G. D.</given-names>
</name>
<name>
<surname>Tsaousis</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Al-harbi</surname>
<given-names>K. A.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Multi-Population Invariance With Dichotomous Measures</article-title>. <source>J.&#x20;Psychoeducational Assess.</source> <volume>33</volume> (<issue>6</issue>), <fpage>568</fpage>&#x2013;<lpage>584</lpage>. <pub-id pub-id-type="doi">10.1177/0734282914567871</pub-id> </citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stobart</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Fairness in Multicultural Assessment Systems</article-title>. <source>Assess. Educ. Principles, Pol. Pract.</source> <volume>12</volume> (<issue>3</issue>), <fpage>275</fpage>&#x2013;<lpage>287</lpage>. <pub-id pub-id-type="doi">10.1080/09695940500337249</pub-id> </citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Suh</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Bolt</surname>
<given-names>D. M.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Nested Logit Models for Multiple-Choice Item Response Data</article-title>. <source>Psychometrika.</source> <volume>75</volume>, <fpage>454</fpage>&#x2013;<lpage>473</lpage>. <pub-id pub-id-type="doi">10.1007/s11336-010-9163-7</pub-id> </citation>
</ref>
<ref id="B64">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Suh</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Bolt</surname>
<given-names>D. M.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>A Nested Logit Approach for Investigating Distractors as Causes of Differential Item Functioning</article-title>. <source>J.&#x20;Educ. Meas.</source> <volume>48</volume>, <fpage>188</fpage>&#x2013;<lpage>205</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3984.2011.00139.x</pub-id> </citation>
</ref>
<ref id="B65">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Suh</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Talley</surname>
<given-names>A. E.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>An Empirical Comparison of DDF Detection Methods for Understanding the Causes of DIF in Multiple-Choice Items</article-title>. <source>Appl. Meas. Education.</source> <volume>28</volume>, <fpage>48</fpage>&#x2013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1080/08957347.2014.973560</pub-id> </citation>
</ref>
<ref id="B66">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Swaminathan</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Rogers</surname>
<given-names>H. J.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>Detecting Differential Item Functioning Using Logistic Regression Procedures</article-title>. <source>J.&#x20;Educ. Meas.</source> <volume>27</volume> (<issue>4</issue>), <fpage>361</fpage>&#x2013;<lpage>370</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3984.1990.tb00754.x</pub-id> </citation>
</ref>
<ref id="B67">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Takala</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kaftandjieva</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Test Fairness: A DIF Analysis of an L2 Vocabulary Test</article-title>. <source>Lang. Test.</source> <volume>17</volume> (<issue>3</issue>), <fpage>323</fpage>&#x2013;<lpage>340</lpage>. <pub-id pub-id-type="doi">10.1177/026553220001700303</pub-id> </citation>
</ref>
<ref id="B68">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Terzi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Suh</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>An Odds Ratio Approach for Detecting DDF Under the Nested Logit Modeling Framework</article-title>. <source>J.&#x20;Educ. Meas.</source> <volume>52</volume> (<issue>4</issue>), <fpage>376</fpage>&#x2013;<lpage>398</lpage>. <pub-id pub-id-type="doi">10.1111/jedm.12091</pub-id> </citation>
</ref>
<ref id="B69">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thissen</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Steinberg</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>1986</year>). <article-title>A Taxonomy of Item Response Models</article-title>. <source>Psychometrika.</source> <volume>51</volume>, <fpage>567</fpage>&#x2013;<lpage>577</lpage>. <pub-id pub-id-type="doi">10.1007/bf02295596</pub-id> </citation>
</ref>
<ref id="B70">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tsaousis</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Sideridis</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Al-Saawi</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Differential Distractor Functioning as a Method for Explaining DIF: The Case of a National Admissions Test in Saudi Arabia</article-title>. <source>Int. J.&#x20;Test.</source> <volume>18</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1080/15305058.2017.1345914</pub-id> </citation>
</ref>
<ref id="B71">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Walker</surname>
<given-names>C. M.</given-names>
</name>
<name>
<surname>G&#xf6;&#xe7;er</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items</article-title>. <source>Educ. Psychol. Meas.</source> <volume>80</volume> (<issue>4</issue>), <fpage>808</fpage>&#x2013;<lpage>820</lpage>. <pub-id pub-id-type="doi">10.1177/0013164419899731</pub-id> </citation>
</ref>
<ref id="B72">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>W. C.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Factorial Modeling of Differential Distractor Functioning in Multiple-Choice Items</article-title>. <source>J.&#x20;Appl. Meas.</source> <volume>1</volume>, <fpage>238</fpage>&#x2013;<lpage>256</lpage>. </citation>
</ref>
<ref id="B73">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Weir</surname>
<given-names>C. J.</given-names>
</name>
</person-group> (<year>2005</year>). <source>Language Testing and Validation</source>. <publisher-name>Palgrave Macmillan</publisher-name>. <pub-id pub-id-type="doi">10.1057/9780230514577</pub-id> </citation>
</ref>
<ref id="B74">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wiberg</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Gender Differences in the Swedish Driving-License Test</article-title>. <source>J.&#x20;Saf. Res.</source> <volume>37</volume>, <fpage>285</fpage>&#x2013;<lpage>291</lpage>. <pub-id pub-id-type="doi">10.1016/j.jsr.2006.02.005</pub-id> </citation>
</ref>
<ref id="B75">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zumbo</surname>
<given-names>B. D.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Three Generations of DIF Analyses: Considering Where It Has Been, Where It Is Now, and Where It Is Going</article-title>. <source>Lang. Assess. Q.</source> <volume>4</volume> (<issue>2</issue>), <fpage>223</fpage>&#x2013;<lpage>233</lpage>. <pub-id pub-id-type="doi">10.1080/15434300701375832</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>