<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2017.01143</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Modified Logistic Regression Approaches to Eliminating the Impact of Response Styles on DIF Detection in Likert-Type Scales</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Chen</surname> <given-names>Hui-Fang</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/399113/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Jin</surname> <given-names>Kuan-Yu</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/400257/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Wen-Chung</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/423887/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Applied Social Sciences, City University of Hong Kong</institution> <country>Kowloon Tong, Hong Kong</country></aff>
<aff id="aff2"><sup>2</sup><institution>Assessment Research Centre, The Education University of Hong Kong</institution> <country>Tai Po, Hong Kong</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Holmes Finch, Ball State University, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Daniel Bolt, University of Wisconsin-Madison, United States; Chun Wang, University of Minnesota, United States</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Hui-Fang Chen <email>hfchen&#x00040;cityu.edu.hk</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology</p></fn></author-notes>
<pub-date pub-type="epub">
<day>07</day>
<month>07</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>8</volume>
<elocation-id>1143</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>12</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>06</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017 Chen, Jin and Wang.</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>Chen, Jin and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Extreme response styles (ERS) are prevalent in Likert- or rating-type data, but previous research has not adequately addressed their impact on differential item functioning (DIF) assessment. This study aimed to fill this gap by examining the influence of ERS on the performance of two logistic regression (LR) approaches to DIF detection: ordinal logistic regression (OLR) and logistic discriminant function analysis (LDFA). Results indicated that both the standard OLR and LDFA yielded severely inflated false positive rates as the difference in ERS between two groups increased. This study proposed a class of modified LR approaches to eliminate the ERS effect on DIF assessment. The proposed modifications showed satisfactory control of false positive rates when no DIF items existed, and under DIF conditions they yielded better control of false positive rates and more accurate true positive rates than the conventional LR approaches did. In conclusion, the proposed modifications are recommended for survey research involving multiple groups or cultures.</p>
</abstract>
<kwd-group>
<kwd>extreme response styles</kwd>
<kwd>logistic regression</kwd>
<kwd>Likert scale</kwd>
<kwd>differential item functioning</kwd>
<kwd>mild response style</kwd>
</kwd-group>
<contract-num rid="cn001">7004507</contract-num>
<contract-sponsor id="cn001">City University of Hong Kong<named-content content-type="fundref-id">10.13039/100007567</named-content></contract-sponsor>
<counts>
<fig-count count="4"/>
<table-count count="3"/>
<equation-count count="5"/>
<ref-count count="55"/>
<page-count count="11"/>
<word-count count="8125"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>A considerable number of survey-based studies have reported that the way respondents map their answers onto the response options of Likert-type items varies between individuals; this phenomenon is termed response style (RS; Paulhus, <xref ref-type="bibr" rid="B39">1991</xref>; De Jong et al., <xref ref-type="bibr" rid="B18">2008</xref>; Kieruj and Moors, <xref ref-type="bibr" rid="B28">2013</xref>). In other words, participants may tend to endorse specific response categories and systematically tick a certain rating option regardless of item content (Weijters, <xref ref-type="bibr" rid="B50">2006</xref>). As a result, scale scores may be inflated or deflated and fail to capture participants&#x00027; true attitudes or beliefs. RS &#x0201C;may be induced by context effects such as the item format or personality&#x0201D; (Paulhus, <xref ref-type="bibr" rid="B39">1991</xref>) and may further lead to biased conclusions about measurement invariance (Morren et al., <xref ref-type="bibr" rid="B36">2012</xref>).</p>
<p>Several types of RS have been identified in the literature, of which extreme response style (ERS) and mild response style (MRS) are the most widely investigated (Harzing et al., <xref ref-type="bibr" rid="B22">2009</xref>). ERS refers to a tendency to use the two extreme end points of a scale, for example, categories 0 and 4 when participants rate their agreement with a statement on a five-point scale from 0 (strongly disagree) to 4 (strongly agree). MRS respondents tend to avoid the two opposite extreme categories and consistently choose the middle range of categories across all items (e.g., 1, 2, and 3). ERS and MRS are mutually exclusive, because they cannot be observed simultaneously on one item: participants with a strong tendency toward ERS show a weak tendency toward MRS, and vice versa (Jin and Wang, <xref ref-type="bibr" rid="B25">2014</xref>). Therefore, hereafter the term ERS refers to this bipolar tendency.</p>
<p>The degree of ERS may vary with gender, culture, educational or intellectual level, and age. Moors (<xref ref-type="bibr" rid="B35">2012</xref>) found that women show a greater tendency than men to choose the two extreme rating options when rating passive/laissez-faire leadership, whereas Baumgartner and Steenkamp (<xref ref-type="bibr" rid="B3">2001</xref>) found no gender differences in marketing surveys. In other studies, Americans were more likely than Chinese respondents with the same level of self-esteem to be ERS respondents (Song et al., <xref ref-type="bibr" rid="B43">2011</xref>), resulting in lower scores among Chinese and Japanese respondents than among Americans and Canadians (Cai et al., <xref ref-type="bibr" rid="B10">2007</xref>; Brown et al., <xref ref-type="bibr" rid="B7">2009</xref>). Educational level is negatively related to ERS (De Beuckelaer et al., <xref ref-type="bibr" rid="B17">2010</xref>): respondents with a lower education level tended to select mainly the middle points of the scale, most likely to simplify the task (Weijters, <xref ref-type="bibr" rid="B50">2006</xref>). Meisenberg and Williams (<xref ref-type="bibr" rid="B33">2008</xref>) further investigated education-related factors, such as high intelligence and self-confidence, and found that they suppressed ERS. Regarding age, some studies reported that younger respondents used ERS more often than older respondents (Rossi et al., <xref ref-type="bibr" rid="B42">2001</xref>; Austin et al., <xref ref-type="bibr" rid="B2">2006</xref>), whereas others found no significant relationship between age and ERS (Johnson et al., <xref ref-type="bibr" rid="B27">2005</xref>). Because observed scores on self-report scales may be contaminated by ERS, it is necessary to investigate its impact before drawing conclusions from these scores.</p>
<p>ERS complicates the interpretation of item scores because it creates uncertainty over whether the answers given accurately reflect participants&#x00027; true opinions. Baumgartner and Steenkamp (<xref ref-type="bibr" rid="B3">2001</xref>) argued that ERS constitutes error variance, which may attenuate correlations among variables and thus affect many correlation-based statistical methods, such as Cronbach&#x00027;s alpha, regression analysis, factor analysis, and structural equation modeling. As a result, the mean levels of responses and the correlations among constructs of interest are biased (Baumgartner and Steenkamp, <xref ref-type="bibr" rid="B3">2001</xref>). Cheung and Rensvold (<xref ref-type="bibr" rid="B14">2000</xref>) further pointed out that higher (lower) ERS can increase (decrease) factor loadings. Composite scores that combine problematic and sound items therefore lead to misinformed conclusions when they are used to rank participants for personnel selection or school admission, or to compare groups (e.g., by gender or ethnicity) when evaluating the effectiveness of curricula, programs, or educational reforms.</p>
<p>ERS also threatens psychometric properties of scales, such as metric and scalar invariance, by inflating or reducing scale variances (Baumgartner and Steenkamp, <xref ref-type="bibr" rid="B3">2001</xref>). Empirical and simulation studies have reported that ERS introduces an additional dimension beyond the intended-to-be-measured latent trait (Bolt and Johnson, <xref ref-type="bibr" rid="B5">2009</xref>; Morren et al., <xref ref-type="bibr" rid="B36">2012</xref>; Wetzel et al., <xref ref-type="bibr" rid="B52">2013</xref>). Consequently, all items may be identified as non-invariant when groups show different degrees of ERS. In other words, ERS causes measurement non-invariance even when the intercepts, slopes, and variances of the intended-to-be-measured latent trait are invariant across groups (Morren et al., <xref ref-type="bibr" rid="B36">2012</xref>). Standard approaches to testing measurement invariance (e.g., multiple-group confirmatory factor analysis) will therefore tend to indicate a violation of measurement invariance. These findings suggest that standard methods of matching participants on the latent trait are inappropriate when the impact of ERS is not taken into account. Because measurement invariance is required for meaningful comparisons across groups, the influence of ERS should be considered and corrected in data analyses (Weijters et al., <xref ref-type="bibr" rid="B51">2008</xref>).</p>
<p>Although ERS has been recognized in survey research, it is seldom well controlled in studies (Van Vaerenbergh and Thomas, <xref ref-type="bibr" rid="B47">2013</xref>), including the Programme for International Student Assessment (PISA) (Buckley, <xref ref-type="bibr" rid="B8">2009</xref>; Bolt and Newton, <xref ref-type="bibr" rid="B6">2011</xref>). Therefore, the current study addressed the need to control ERS in survey research, focusing on measurement invariance in Likert- or rating-type scales. We first explain why ERS leads to misleading conclusions about the psychometric properties of Likert-type scales and then propose two modified logistic regression methods to eliminate the impact of ERS on tests of measurement invariance. The remainder of the paper is organized as follows: an introduction to differential item functioning (DIF) assessment; the limitations of standard logistic regression methods for DIF assessment when ERS occurs; the proposed modified logistic regression methods; the results of a series of simulation studies comparing the standard and modified methods; and conclusions and suggestions for future studies.</p>
</sec>
<sec id="s2">
<title>Differential item functioning (DIF) assessment</title>
<p>Measurement invariance is a prerequisite for comparisons across groups or countries. DIF assessment is one approach routinely conducted in large-scale assessment programs to ensure that observed or scaled scores are comparable across groups or countries. To meet the assumption of measurement invariance, item responses should be free from bias associated with group membership (e.g., gender, ethnicity, or country), so that differences in performance on tests or questionnaires can be attributed to group differences in the intended-to-be-measured latent trait (e.g., ability, attitude, or interest). However, although researchers often examine DIF in cognitive assessments, they seldom address it in self-report inventories, even though both types of assessment are used in large-scale assessment programs to monitor achievement trends and students&#x00027; learning and living environments worldwide.</p>
<p>An item is identified as having DIF when participants from different groups who have the same level of the intended-to-be-measured latent trait have different probabilities of endorsing that item. In DIF assessment, participants are first placed on the same metric using a matching variable, and then the performance of a focal group and a reference group on the studied item is compared. If participants are matched on a biased metric, the subsequent DIF assessment will be misleading (Clauser et al., <xref ref-type="bibr" rid="B16">1993</xref>; Wang, <xref ref-type="bibr" rid="B48">2004</xref>). The total score of an instrument usually serves as the matching variable, but it does not always function adequately. Prior studies have shown that the total score yields satisfactory DIF assessment only if it is a sufficient statistic for, or strongly related to, the latent trait and the latent trait distributions are identical or very similar across groups (Bolt and Gierl, <xref ref-type="bibr" rid="B4">2006</xref>; Magis and De Boeck, <xref ref-type="bibr" rid="B30">2014</xref>). Similarly, when participants exhibit ERS, the total score is contaminated and becomes a biased matching variable.</p>
</sec>
<sec id="s3">
<title>Logistic regression for DIF detection</title>
<p>The logistic regression (LR) procedure (Swaminathan and Rogers, <xref ref-type="bibr" rid="B45">1990</xref>; Rogers and Swaminathan, <xref ref-type="bibr" rid="B41">1993</xref>) is one of the most popular approaches to DIF assessment. It does not require a specific form of item response function or a large sample (Narayanan and Swaminathan, <xref ref-type="bibr" rid="B38">1994</xref>). It is computationally simple and can easily be implemented in commercial software (e.g., SPSS, SAS, and STATA) or free software (e.g., R) without additional programming. For these reasons, the LR has frequently been used in empirical studies.</p>
<p>The LR was originally developed for dichotomous data. Two extensions of the LR, namely ordinal logistic regression (OLR; French and Maller, <xref ref-type="bibr" rid="B20">1996</xref>; Zumbo, <xref ref-type="bibr" rid="B54">1999</xref>) and logistic discriminant function analysis (LDFA; Miller and Spray, <xref ref-type="bibr" rid="B34">1993</xref>), have been proposed for ordinal responses (e.g., Likert-type scales). When a set of items has <italic>M</italic> ordinal response categories, the OLR for DIF assessment can be expressed as:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">log&#x000A0;</mml:mtext><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>Y</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02264;</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>Y</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0003E;</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B3;</mml:mo></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B3;</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mo>&#x003B3;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mi>G</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B3;</mml:mo></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mi>G</mml:mi><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>which models the logarithm of the ratio of the probability of obtaining a score <italic>Y</italic> (<italic>Y</italic> &#x0003D; 0, 1,&#x02026;, <italic>M</italic>&#x02212;1) less than or equal to <italic>j</italic> to the probability of obtaining a score higher than <italic>j</italic>. <italic>X</italic> is the total score; <italic>G</italic> is the grouping variable, whose coefficient tests for uniform DIF; and <italic>XG</italic> is the interaction between the grouping variable and the total score, whose coefficient tests for non-uniform DIF. If &#x003B3;<sub>2</sub> is statistically significantly different from zero (e.g., at the 0.05 nominal level) but &#x003B3;<sub>3</sub> is not, there is uniform DIF between groups. If &#x003B3;<sub>3</sub> is statistically significantly different from zero, there is non-uniform DIF between groups.</p>
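<p>To make the roles of &#x003B3;<sub>2</sub> and &#x003B3;<sub>3</sub> concrete, the following minimal Python sketch evaluates Equation (1) at fixed, hypothetical coefficient values; it does not fit the model. As is standard for cumulative logit models, it assumes a separate intercept &#x003B3;<sub>0<italic>j</italic></sub> for each <italic>j</italic>; the function names and numbers are illustrative only. With &#x003B3;<sub>3</sub> &#x0003D; 0, the focal-minus-reference log-odds equal &#x003B3;<sub>2</sub> at every total score (uniform DIF), whereas a non-zero &#x003B3;<sub>3</sub> makes the gap change with <italic>X</italic> (non-uniform DIF).</p>
<preformat preformat-type="code">
```python
import math


def cumulative_probs(x, g, intercepts, g1, g2, g3):
    """Cumulative probabilities P(Y at most j), j = 0..M-2, under Eq. (1).

    intercepts: hypothetical category-specific intercepts gamma_0j
    (increasing in j); g1, g2, g3 stand for gamma_1, gamma_2, gamma_3.
    x is the total score X; g is the group indicator G (0 = reference, 1 = focal).
    """
    probs = []
    for g0 in intercepts:
        eta = g0 + g1 * x + g2 * g + g3 * x * g
        probs.append(1.0 / (1.0 + math.exp(-eta)))
    return probs


def group_gap(x, intercepts, g1, g2, g3, j=0):
    """Focal-minus-reference log-odds of P(Y at most j) at total score x."""
    focal = cumulative_probs(x, 1, intercepts, g1, g2, g3)[j]
    ref = cumulative_probs(x, 0, intercepts, g1, g2, g3)[j]
    return math.log(focal / (1 - focal)) - math.log(ref / (1 - ref))


if __name__ == "__main__":
    cuts = [2.0, 4.0, 6.0]  # hypothetical intercepts for a four-category item
    for x in (5, 15, 25):
        uniform = group_gap(x, cuts, -0.3, 0.5, 0.0)      # gamma_3 = 0: constant gap
        nonuniform = group_gap(x, cuts, -0.3, 0.5, 0.05)  # gamma_3 != 0: gap grows with x
        print(x, round(uniform, 3), round(nonuniform, 3))
```
</preformat>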
<p>In the LDFA, the observed response <italic>Y</italic> of the studied item, the total score (<italic>X</italic>), and their interaction (<italic>XY</italic>) are used to predict the probability of being in a specific group (<italic>G</italic>). The full model of the logistic function is defined as:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">log&#x000A0;</mml:mtext><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B1;</mml:mo></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B1;</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mo>&#x003B1;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mi>Y</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B1;</mml:mo></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mi>Y</mml:mi><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Likewise, if &#x003B1;<sub>2</sub> is significantly different from zero but &#x003B1;<sub>3</sub> is not, there is uniform DIF between groups; if &#x003B1;<sub>3</sub> is significantly different from zero, there is non-uniform DIF.</p>
<p>Although the OLR and LDFA were both developed within the LR framework to detect DIF in ordinal responses, they are distinct approaches. Note that group membership serves as a predictor in the OLR, as in the conventional LR approach for dichotomous data, but as the outcome variable in the LDFA. Operationally, practitioners may prefer the LDFA because it retains the simplicity of binary LR, in which the outcome is binary (0 and 1).</p>
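<p>The LDFA test of uniform DIF can be sketched as a comparison of nested binary logistic regressions. The pure-Python code below is a minimal illustration under our own assumptions, not the implementation used in this study: it fits Equation (2) with &#x003B1;<sub>3</sub> fixed at 0 by Newton-Raphson and tests &#x003B1;<sub>2</sub> with a 1-df likelihood-ratio statistic. The function names and the toy data are hypothetical; in practice, a standard routine such as R&#x00027;s <monospace>glm()</monospace> with a binomial family would be used instead.</p>
<preformat preformat-type="code">
```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(rows, y, iters=25):
    """Binary logistic regression fitted by Newton-Raphson.

    rows: predictor values per observation (an intercept is prepended here).
    Returns (coefficients, maximized log-likelihood).
    """
    design = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(design[0])
    coef = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(design, y):
            p = sigmoid(sum(c * v for c, v in zip(coef, xi)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for b in range(k):
                    hess[a][b] += w * xi[a] * xi[b]
        step = solve(hess, grad)  # Newton step: H^-1 times gradient
        coef = [c + s for c, s in zip(coef, step)]
    loglik = 0.0
    for xi, yi in zip(design, y):
        eta = sum(c * v for c, v in zip(coef, xi))
        # y * eta - log(1 + exp(eta)), with the log term computed stably
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return coef, loglik


def ldfa_uniform_dif(total, item, group):
    """LR test of alpha_2 in Eq. (2), with alpha_3 fixed at 0.

    Reduced model: logit P(G=1) = a0 + a1*X
    Full model:    logit P(G=1) = a0 + a1*X + a2*Y
    Returns (chi-square statistic on 1 df, p-value).
    """
    _, ll_reduced = fit_logistic([[x] for x in total], group)
    _, ll_full = fit_logistic([[x, v] for x, v in zip(total, item)], group)
    stat = max(0.0, 2.0 * (ll_full - ll_reduced))
    # Chi-square(1) survival function: P(chi2 above s) = erfc(sqrt(s / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))


def toy_data(n=400, seed=2017):
    """Hypothetical data with uniform DIF built in (illustration only)."""
    rng = random.Random(seed)
    total, item, group = [], [], []
    for _ in range(n):
        g = rng.randint(0, 1)
        x = rng.randint(0, 30)  # stand-in for the scale total score
        # Focal members (g = 1) score about one category higher at the same x
        y = min(3, max(0, round(x / 10) + g + rng.choice([-1, 0, 0, 1])))
        total.append(x)
        item.append(y)
        group.append(g)
    return total, item, group


if __name__ == "__main__":
    print(ldfa_uniform_dif(*toy_data()))
```
</preformat>
<p>In a real analysis, <italic>X</italic> would be the observed total score of the scale rather than an independent random number, and the non-uniform test would add the <italic>XY</italic> term to the full model.</p>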
<p>Compared with dichotomous items, fewer studies have investigated the performance of these approaches in detecting DIF in polytomous data. Most have found that the approaches show similar power in detecting uniform DIF (Kristjansson et al., <xref ref-type="bibr" rid="B29">2005</xref>) and that their performance is influenced by sample size, DIF magnitude, and latent trait distributions (Zwick et al., <xref ref-type="bibr" rid="B55">1997</xref>; Ankenmann et al., <xref ref-type="bibr" rid="B1">1999</xref>; Wang and Su, <xref ref-type="bibr" rid="B49">2004</xref>; Kristjansson et al., <xref ref-type="bibr" rid="B29">2005</xref>; Su and Wang, <xref ref-type="bibr" rid="B44">2005</xref>). High item slope parameters and large group differences in ability usually inflate Type I error rates (Zwick et al., <xref ref-type="bibr" rid="B55">1997</xref>; Ankenmann et al., <xref ref-type="bibr" rid="B1">1999</xref>). Most procedures gain power as item slopes increase (Kristjansson et al., <xref ref-type="bibr" rid="B29">2005</xref>).</p>
<p>Why ignoring ERS leads to erroneous DIF judgments can be explained with the ERS model based on the generalized partial credit model (ERS-GPCM; Jin and Wang, <xref ref-type="bibr" rid="B25">2014</xref>). The ERS-GPCM is expressed as:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">log&#x000A0;</mml:mtext><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B2;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mo>&#x003B8;</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mo>&#x003B4;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003C9;</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mo>&#x003C4;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>P</italic><sub><italic>nij</italic></sub> and <italic>P</italic><sub><italic>ni</italic>(<italic>j</italic>&#x02212;1)</sub> are the probabilities of person <italic>n</italic> selecting options <italic>j</italic> and <italic>j</italic>&#x02212;1 of item <italic>i</italic>, respectively; &#x003B8;<sub><italic>n</italic></sub> is the latent trait of person <italic>n</italic> and is assumed to follow a normal distribution with a mean of zero and variance of <inline-formula><mml:math id="M4"><mml:msubsup><mml:mrow><mml:mo>&#x003C3;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x003B8;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>; &#x003C9;<sub><italic>n</italic></sub> denotes the ERS tendency of person <italic>n</italic> and is assumed to follow a log-normal distribution, that is, log(&#x003C9;<sub><italic>n</italic></sub>) is normally distributed with a mean of zero and variance of <inline-formula><mml:math id="M5"><mml:msubsup><mml:mrow><mml:mo>&#x003C9;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x003C9;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>; &#x003B4;<sub><italic>i</italic></sub> and &#x003B2;<sub><italic>i</italic></sub> are the mean location parameter and the slope parameter of item <italic>i</italic>, respectively; and &#x003C4;<sub><italic>ij</italic></sub> is the <italic>j</italic>th threshold of item <italic>i</italic>. In addition, &#x003C9; and &#x003B8; are both random variables and are assumed to be independent. As pointed out by Jin and Wang (<xref ref-type="bibr" rid="B25">2014</xref>), &#x003C9;<sub><italic>n</italic></sub> can be treated as a personal factor controlling the occurrence of ERS: a participant with a larger &#x003C9; is more likely to choose middle response categories, whereas a participant with a smaller &#x003C9; is more likely to choose extreme response categories.
A larger <inline-formula><mml:math id="M6"><mml:msubsup><mml:mrow><mml:mo>&#x003C3;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x003C9;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> implies that participants in a group show more heterogeneous ERS. When <inline-formula><mml:math id="M7"><mml:msubsup><mml:mrow><mml:mo>&#x003C3;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x003C9;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> is 0, &#x003C9; equals 1 for all participants (i.e., all participants have identical ERS levels), and Equation (3) reduces to the generalized partial credit model (Muraki, <xref ref-type="bibr" rid="B37">1992</xref>). Notably, the ERS-GPCM does not make the assumption, questionable but required in other models for ERS (e.g., Bolt and Johnson, <xref ref-type="bibr" rid="B5">2009</xref>; Johnson and Bolt, <xref ref-type="bibr" rid="B26">2010</xref>; Bolt and Newton, <xref ref-type="bibr" rid="B6">2011</xref>), that the ERS tendency and the latent trait are compensatory.</p>
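<p>The following minimal Python sketch computes the category probabilities implied by Equation (3) for a single person and item. The function name is hypothetical, and the threshold values (&#x02212;0.6, 0, 0.6) are borrowed from the simulation design for illustration. It shows the role of &#x003C9;: multiplying the thresholds by a smaller &#x003C9; pulls them together, which shifts probability toward the two extreme categories, and setting &#x003C9; &#x0003D; 1 recovers the GPCM.</p>
<preformat preformat-type="code">
```python
import math


def ers_gpcm_probs(theta, omega, beta, delta, taus):
    """Category probabilities P(Y = 0..M-1) for one person and item under Eq. (3).

    taus: the M-1 thresholds tau_i1..tau_i(M-1) of the item. Each
    adjacent-category log-odds is beta * (theta - (delta + omega * tau_j)).
    """
    cum = [0.0]  # category 0 is the reference (empty sum)
    for tau in taus:
        cum.append(cum[-1] + beta * (theta - (delta + omega * tau)))
    top = max(cum)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    taus = (-0.6, 0.0, 0.6)
    for omega in (0.5, 1.0, 2.0):  # 1.0 gives the ordinary GPCM
        probs = ers_gpcm_probs(0.0, omega, 1.0, 0.0, taus)
        print(omega, [round(p, 3) for p in probs])
```
</preformat>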
<p>Assume that participants come from two groups: a focal group with a stronger tendency to endorse extreme responses (&#x003C9; &#x0003D; 0.5) and a reference group with a tendency to choose the middle range of categories (&#x003C9; &#x0003D; 2). Given the same level of the latent trait, participants with different values of &#x003C9; have different expected total scores (Figure <xref ref-type="fig" rid="F1">1</xref>). Thus, the correspondence between the latent trait and the total score is poor, and the total score is no longer a good indicator of the latent trait when ERS exists. For instance, in standard DIF assessment, participants with &#x003C9; of 0.5 and &#x003B8; of &#x02212;2 will be matched with participants with &#x003C9; of 2 and &#x003B8; of &#x02212;1.5 because their total scores are the same. As a result, the total score is not a valid matching variable.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Cumulative test characteristic curves for respondents with different ERS levels.</p></caption>
<graphic xlink:href="fpsyg-08-01143-g0001.tif"/>
</fig>
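<p>The mismatch described above can be checked numerically. The sketch below assumes a hypothetical 10-item scale with unit slopes, locations evenly spaced between &#x02212;2 and 2, and thresholds of &#x02212;0.6, 0, and 0.6, chosen only for illustration. It computes expected total scores under Equation (3) for &#x003C9; &#x0003D; 0.5 and &#x003C9; &#x0003D; 2 at several values of &#x003B8;. Wherever the two expected totals differ at the same &#x003B8;, matching on the total score pairs respondents with different latent trait levels.</p>
<preformat preformat-type="code">
```python
import math


def category_probs(theta, omega, beta, delta, taus):
    """ERS-GPCM category probabilities for one item, following Eq. (3)."""
    cum = [0.0]
    for tau in taus:
        cum.append(cum[-1] + beta * (theta - (delta + omega * tau)))
    top = max(cum)
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]


def expected_total(theta, omega, items, taus=(-0.6, 0.0, 0.6)):
    """Expected total score over items given as (beta, delta) pairs."""
    total = 0.0
    for beta, delta in items:
        probs = category_probs(theta, omega, beta, delta, taus)
        total += sum(j * p for j, p in enumerate(probs))
    return total


# Hypothetical 10-item scale: unit slopes, locations evenly spaced on [-2, 2]
ITEMS = [(1.0, -2.0 + 4.0 * i / 9) for i in range(10)]

if __name__ == "__main__":
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        focal = expected_total(theta, 0.5, ITEMS)      # stronger ERS
        reference = expected_total(theta, 2.0, ITEMS)  # milder responding
        print(f"theta={theta:+.1f}  focal={focal:5.2f}  reference={reference:5.2f}")
```
</preformat>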
<p>Because ERS is common and threatens test validity, leading to biased comparisons among individuals, it should be properly taken into account. To the best of our knowledge, no existing study has considered the effect of ERS on logistic regression approaches to DIF assessment. One notable difference between the current study and previous work (e.g., Bolt and Johnson, <xref ref-type="bibr" rid="B5">2009</xref>; Morren et al., <xref ref-type="bibr" rid="B36">2012</xref>) is whether a measurement model is specified. Morren et al. (<xref ref-type="bibr" rid="B36">2012</xref>) and Bolt and Johnson (<xref ref-type="bibr" rid="B5">2009</xref>) fitted to the data a measurement model that incorporated the mechanisms of both DIF and ERS. In contrast, our study proposes jointly using the sum and the variance of item scores as matching variables in the OLR or LDFA method; that is, no measurement model is specified.</p>
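<p>As a minimal illustration of why the variance of item scores carries information about ERS, the Python sketch below computes both proposed matching variables for two hypothetical respondents with the same total score. How the two variables enter the modified OLR and LDFA models is not specified in this section, so the sketch stops at computing them; the function name and the use of the population variance (dividing by the number of items) are our assumptions.</p>
<preformat preformat-type="code">
```python
from statistics import pvariance


def matching_variables(responses):
    """Return (sum, variance) of one respondent's item scores.

    The population variance (divide by the number of items) is an
    illustrative choice; the sample variance would serve equally well here.
    """
    return sum(responses), pvariance(responses)


if __name__ == "__main__":
    extreme = [0, 3, 0, 3, 3, 0]  # hypothetical ERS respondent
    mild = [1, 2, 1, 2, 2, 1]     # hypothetical MRS respondent
    # Same total score, so the total alone cannot tell them apart,
    # but the variance of item scores clearly separates the two styles.
    print(matching_variables(extreme), matching_variables(mild))
```
</preformat>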
<p>This study focused on the influence of group differences in ERS on DIF detection, because participants from different cultures or groups may differ in their tendency to choose extreme response categories (e.g., Chen et al., <xref ref-type="bibr" rid="B13">1995</xref>; Hamamura et al., <xref ref-type="bibr" rid="B21">2008</xref>; Song et al., <xref ref-type="bibr" rid="B43">2011</xref>; Kieruj and Moors, <xref ref-type="bibr" rid="B28">2013</xref>). We expected that group differences in ERS would inflate the false positive (FP) rates of DIF detection in the standard LR, and that the modified LR would yield well-controlled FP rates. Three simulation studies were conducted. Simulation study 1 examined the influence of the difference in ERS between the focal and reference groups on the performance of the standard LR in detecting (uniform) DIF; simulation study 2 demonstrated the superiority of the modified LR; and simulation study 3 evaluated the performance of the standard and modified LR when tests contained multiple DIF items. Finally, gender DIF was investigated in an anxiety scale, and results from the standard and modified LR methods were compared.</p>
</sec>
<sec id="s4">
<title>Simulation study 1</title>
<sec>
<title>Design</title>
<p>In the first simulation study, we examined the performance of the standard OLR and LDFA in detecting uniform DIF when two groups of participants exhibited different average levels of ERS. Accordingly, &#x003B3;<sub>3</sub> in Equation (1) and &#x003B1;<sub>3</sub> in Equation (2) were constrained to 0. Data were generated from Equation (3) for 500 participants in each group responding to four-point items. Tests had either 10 or 20 items. Item parameters were set as follows: (a) mean location parameters were generated from a uniform distribution on (&#x02212;2, 2); (b) slope parameters were generated from log-normal (0, 0.3<sup>2</sup>); and (c) the three threshold parameters were set at &#x02212;0.6, 0, and 0.6 for all items. In both the focal and reference groups, the latent trait &#x003B8; was randomly generated from the standard normal distribution. The ERS level &#x003C9; was generated from log-normal distributions with an identical variance of 0.6 but different means. On the log scale, the mean of &#x003C9; was set at 0, 0.2, and 0.3 for the reference group and 0, &#x02212;0.2, and &#x02212;0.3 for the focal group, respectively. Thus, mean differences of 0, 0.4, and 0.6 represented no, moderate, and large differences in ERS between groups (hereafter, no, moderate, and large difference in ERS, respectively). Each condition consisted of 1,000 replications, and all items in a test were assessed for DIF using the OLR and LDFA. Because the item location and slope parameters were identical across groups (i.e., all items were DIF-free), the FP rate of an item was computed as the proportion of the 1,000 replications in which the item was mistakenly identified as having DIF.</p>
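<p>The data-generating design of Simulation study 1 can be sketched in Python as follows. This is a minimal single-replication sketch under stated assumptions, not the authors&#x00027; code: we read the variance of 0.6 for &#x003C9; and the 0.3<sup>2</sup> for the slopes as variances on the log scale, draw category responses with <monospace>random.choices</monospace>, and leave the DIF tests themselves (OLR or LDFA) to a separate routine. The function names are hypothetical; the helper <monospace>fp_rate</monospace> shows how a DIF-free item&#x00027;s FP rate follows from its flags across replications.</p>
<preformat preformat-type="code">
```python
import math
import random


def category_probs(theta, omega, beta, delta, taus):
    """ERS-GPCM category probabilities for one item, following Eq. (3)."""
    cum = [0.0]
    for tau in taus:
        cum.append(cum[-1] + beta * (theta - (delta + omega * tau)))
    top = max(cum)
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_sim1(n_items=10, n_per_group=500, log_omega_means=(0.3, -0.3),
                  log_omega_var=0.6, seed=None):
    """One replication of the Simulation 1 design (all items DIF-free).

    log_omega_means: log-scale means of omega for (reference, focal);
    (0.3, -0.3) is the large-difference condition.
    Returns (responses, groups): responses[n][i] is in {0, 1, 2, 3};
    groups[n] is 0 for the reference group and 1 for the focal group.
    """
    rng = random.Random(seed)
    taus = (-0.6, 0.0, 0.6)
    deltas = [rng.uniform(-2.0, 2.0) for _ in range(n_items)]
    betas = [rng.lognormvariate(0.0, 0.3) for _ in range(n_items)]  # log-sd 0.3
    responses, groups = [], []
    for g, mu in enumerate(log_omega_means):
        for _ in range(n_per_group):
            theta = rng.gauss(0.0, 1.0)
            omega = rng.lognormvariate(mu, math.sqrt(log_omega_var))
            row = []
            for beta, delta in zip(betas, deltas):
                p = category_probs(theta, omega, beta, delta, taus)
                row.append(rng.choices(range(len(p)), weights=p)[0])
            responses.append(row)
            groups.append(g)
    return responses, groups


def fp_rate(flags):
    """Proportion of replications in which a DIF-free item was flagged."""
    return sum(flags) / len(flags)


if __name__ == "__main__":
    responses, groups = simulate_sim1(seed=1)
    for g, label in ((0, "reference"), (1, "focal")):
        rows = [r for r, gg in zip(responses, groups) if gg == g]
        n_cells = len(rows) * len(rows[0])
        extreme = sum(v in (0, 3) for r in rows for v in r) / n_cells
        # The focal group (smaller omega) is expected to show more extreme responses
        print(label, round(extreme, 3))
```
</preformat>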
</sec>
<sec>
<title>Results</title>
<p>The left panels in Table <xref ref-type="table" rid="T1">1</xref> show the performance of the standard OLR and the LDFA when tests had 10 items. Both methods performed well when there was no group difference in ERS, with the FP rates of all items close to the nominal level of 0.05. However, when the difference was moderate or large, both methods yielded severely inflated FP rates: with a moderate difference, the mean FP rate was 0.19 for the standard OLR and 0.11 for the LDFA; with a large difference, it was 0.31 and 0.18, respectively. Similar patterns were found when tests had 20 items, but the inflation of the FP rates was more serious (see the left panels in Table <xref ref-type="table" rid="T2">2</xref>), because the matching variable (total score) was more severely contaminated by ERS. In general, when the group difference in ERS was large, the mean FP rate exceeded the nominal level by more than 0.20 for the standard OLR and by roughly 0.13 to 0.18 for the LDFA.</p>
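<p>The mean FP rates quoted above follow directly from the per-mille entries in Table <xref ref-type="table" rid="T1">1</xref>. The short check below recomputes them; the dictionary simply copies the table's values for the standard OLR and LDFA under conditions II and III.</p>

```python
# Per-mille FP rates for items 1-10, copied from Table 1 (10-item test).
table1 = {
    ("OLR", "II"): [46, 614, 223, 111, 88, 495, 157, 50, 44, 113],
    ("OLR", "III"): [62, 903, 430, 167, 143, 808, 316, 56, 51, 179],
    ("LDFA", "II"): [52, 337, 106, 65, 51, 238, 85, 48, 47, 89],
    ("LDFA", "III"): [58, 617, 199, 86, 70, 439, 122, 53, 52, 139],
}

NOMINAL = 0.05
mean_fp = {}
for (method, condition), per_mille in table1.items():
    # average over items, then convert per mille to a proportion
    mean_fp[(method, condition)] = sum(per_mille) / len(per_mille) / 1000
    excess = mean_fp[(method, condition)] - NOMINAL
    print(f"{method:4s} {condition:3s} mean FP = {mean_fp[(method, condition)]:.2f} "
          f"(excess over nominal = {excess:.2f})")
```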
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>False positive rates (&#x02030;) in a 10-item test.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Generated values</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>OLR</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>LDFA</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>OLR-m</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>LDFA-m</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Item</bold></th>
<th valign="top" align="center"><bold>&#x003B2;</bold></th>
<th valign="top" align="center"><bold>&#x003B4;</bold></th>
<th valign="top" align="center"><bold>I</bold></th>
<th valign="top" align="center"><bold>II</bold></th>
<th valign="top" align="center"><bold>III</bold></th>
<th valign="top" align="center"><bold>I</bold></th>
<th valign="top" align="center"><bold>II</bold></th>
<th valign="top" align="center"><bold>III</bold></th>
<th valign="top" align="center"><bold>I</bold></th>
<th valign="top" align="center"><bold>II</bold></th>
<th valign="top" align="center"><bold>III</bold></th>
<th valign="top" align="center"><bold>I</bold></th>
<th valign="top" align="center"><bold>II</bold></th>
<th valign="top" align="center"><bold>III</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="center">0.634</td>
<td valign="top" align="center">&#x02212;1.112</td>
<td valign="top" align="center">41</td>
<td valign="top" align="center">46</td>
<td valign="top" align="center">62</td>
<td valign="top" align="center">43</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">45</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">43</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">46</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="center">1.372</td>
<td valign="top" align="center">1.483</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">614</td>
<td valign="top" align="center">903</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">337</td>
<td valign="top" align="center">617</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">111</td>
<td valign="top" align="center">178</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">71</td>
<td valign="top" align="center">85</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="center">0.962</td>
<td valign="top" align="center">&#x02212;1.173</td>
<td valign="top" align="center">57</td>
<td valign="top" align="center">223</td>
<td valign="top" align="center">430</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">106</td>
<td valign="top" align="center">199</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">63</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">68</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="center">0.655</td>
<td valign="top" align="center">1.674</td>
<td valign="top" align="center">63</td>
<td valign="top" align="center">111</td>
<td valign="top" align="center">167</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">65</td>
<td valign="top" align="center">86</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">83</td>
<td valign="top" align="center">98</td>
<td valign="top" align="center">59</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">62</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="center">1.198</td>
<td valign="top" align="center">&#x02212;0.046</td>
<td valign="top" align="center">37</td>
<td valign="top" align="center">88</td>
<td valign="top" align="center">143</td>
<td valign="top" align="center">40</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">70</td>
<td valign="top" align="center">41</td>
<td valign="top" align="center">42</td>
<td valign="top" align="center">44</td>
<td valign="top" align="center">39</td>
<td valign="top" align="center">41</td>
<td valign="top" align="center">46</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="center">0.872</td>
<td valign="top" align="center">0.447</td>
<td valign="top" align="center">62</td>
<td valign="top" align="center">495</td>
<td valign="top" align="center">808</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">238</td>
<td valign="top" align="center">439</td>
<td valign="top" align="center">44</td>
<td valign="top" align="center">131</td>
<td valign="top" align="center">219</td>
<td valign="top" align="center">46</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">64</td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="center">0.959</td>
<td valign="top" align="center">1.064</td>
<td valign="top" align="center">63</td>
<td valign="top" align="center">157</td>
<td valign="top" align="center">316</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">85</td>
<td valign="top" align="center">122</td>
<td valign="top" align="center">65</td>
<td valign="top" align="center">65</td>
<td valign="top" align="center">72</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">60</td>
<td valign="top" align="center">57</td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="center">0.745</td>
<td valign="top" align="center">0.074</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">65</td>
<td valign="top" align="center">47</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">53</td>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="center">1.130</td>
<td valign="top" align="center">&#x02212;0.813</td>
<td valign="top" align="center">43</td>
<td valign="top" align="center">44</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">40</td>
<td valign="top" align="center">47</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">44</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">41</td>
<td valign="top" align="center">42</td>
<td valign="top" align="center">45</td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="center">1.076</td>
<td valign="top" align="center">&#x02212;1.249</td>
<td valign="top" align="center">45</td>
<td valign="top" align="center">113</td>
<td valign="top" align="center">179</td>
<td valign="top" align="center">43</td>
<td valign="top" align="center">89</td>
<td valign="top" align="center">139</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">57</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">58</td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left" colspan="3">Average</td>
<td valign="top" align="center">51.1</td>
<td valign="top" align="center">194.1</td>
<td valign="top" align="center">311.5</td>
<td valign="top" align="center">47.2</td>
<td valign="top" align="center">111.8</td>
<td valign="top" align="center">183.5</td>
<td valign="top" align="center">51.0</td>
<td valign="top" align="center">69.0</td>
<td valign="top" align="center">90.0</td>
<td valign="top" align="center">48.8</td>
<td valign="top" align="center">53.3</td>
<td valign="top" align="center">58.4</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>I, II, and III denote that the mean differences in the magnitude of ERS were set at 0, 0.4, and 0.6, respectively; significance level at 0.05</italic>.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>False positive rates (&#x02030;) in a 20-item test.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Generated values</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>OLR</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>LDFA</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>OLR-m</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>LDFA-m</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Item</bold></th>
<th valign="top" align="center"><bold>&#x003B2;</bold></th>
<th valign="top" align="center"><bold>&#x003B4;</bold></th>
<th valign="top" align="center"><bold>I</bold></th>
<th valign="top" align="center"><bold>II</bold></th>
<th valign="top" align="center"><bold>III</bold></th>
<th valign="top" align="center"><bold>I</bold></th>
<th valign="top" align="center"><bold>II</bold></th>
<th valign="top" align="center"><bold>III</bold></th>
<th valign="top" align="center"><bold>I</bold></th>
<th valign="top" align="center"><bold>II</bold></th>
<th valign="top" align="center"><bold>III</bold></th>
<th valign="top" align="center"><bold>I</bold></th>
<th valign="top" align="center"><bold>II</bold></th>
<th valign="top" align="center"><bold>III</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="center">1.127</td>
<td valign="top" align="center">&#x02212;1.112</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">366</td>
<td valign="top" align="center">643</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">193</td>
<td valign="top" align="center">350</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">87</td>
<td valign="top" align="center">116</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">63</td>
<td valign="top" align="center">68</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="center">0.949</td>
<td valign="top" align="center">1.483</td>
<td valign="top" align="center">64</td>
<td valign="top" align="center">492</td>
<td valign="top" align="center">800</td>
<td valign="top" align="center">62</td>
<td valign="top" align="center">284</td>
<td valign="top" align="center">509</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">103</td>
<td valign="top" align="center">131</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">57</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="center">0.909</td>
<td valign="top" align="center">&#x02212;1.173</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">256</td>
<td valign="top" align="center">469</td>
<td valign="top" align="center">46</td>
<td valign="top" align="center">137</td>
<td valign="top" align="center">244</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">61</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">57</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="center">0.719</td>
<td valign="top" align="center">1.674</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">364</td>
<td valign="top" align="center">620</td>
<td valign="top" align="center">42</td>
<td valign="top" align="center">215</td>
<td valign="top" align="center">373</td>
<td valign="top" align="center">37</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">71</td>
<td valign="top" align="center">39</td>
<td valign="top" align="center">44</td>
<td valign="top" align="center">48</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="center">0.792</td>
<td valign="top" align="center">&#x02212;0.046</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">57</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">61</td>
<td valign="top" align="center">59</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">56</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="center">0.589</td>
<td valign="top" align="center">0.447</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">87</td>
<td valign="top" align="center">124</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">68</td>
<td valign="top" align="center">89</td>
<td valign="top" align="center">61</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">62</td>
<td valign="top" align="center">63</td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="center">0.560</td>
<td valign="top" align="center">1.064</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">160</td>
<td valign="top" align="center">268</td>
<td valign="top" align="center">47</td>
<td valign="top" align="center">103</td>
<td valign="top" align="center">161</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">62</td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="center">1.581</td>
<td valign="top" align="center">0.074</td>
<td valign="top" align="center">57</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">108</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">65</td>
<td valign="top" align="center">74</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">60</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">48</td>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="center">1.226</td>
<td valign="top" align="center">&#x02212;0.813</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">310</td>
<td valign="top" align="center">518</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">161</td>
<td valign="top" align="center">270</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">76</td>
<td valign="top" align="center">92</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">61</td>
<td valign="top" align="center">61</td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="center">0.505</td>
<td valign="top" align="center">&#x02212;1.249</td>
<td valign="top" align="center">45</td>
<td valign="top" align="center">99</td>
<td valign="top" align="center">151</td>
<td valign="top" align="center">46</td>
<td valign="top" align="center">66</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">46</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">59</td>
</tr>
<tr>
<td valign="top" align="left">11</td>
<td valign="top" align="center">0.651</td>
<td valign="top" align="center">&#x02212;1.677</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">217</td>
<td valign="top" align="center">412</td>
<td valign="top" align="center">42</td>
<td valign="top" align="center">120</td>
<td valign="top" align="center">213</td>
<td valign="top" align="center">46</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">60</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">67</td>
</tr>
<tr>
<td valign="top" align="left">12</td>
<td valign="top" align="center">0.716</td>
<td valign="top" align="center">0.954</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">188</td>
<td valign="top" align="center">341</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">121</td>
<td valign="top" align="center">201</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">63</td>
<td valign="top" align="center">63</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">66</td>
<td valign="top" align="center">72</td>
</tr>
<tr>
<td valign="top" align="left">13</td>
<td valign="top" align="center">1.315</td>
<td valign="top" align="center">&#x02212;0.235</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">60</td>
<td valign="top" align="center">69</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">57</td>
<td valign="top" align="center">65</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">55</td>
</tr>
<tr>
<td valign="top" align="left">14</td>
<td valign="top" align="center">1.340</td>
<td valign="top" align="center">&#x02212;1.367</td>
<td valign="top" align="center">61</td>
<td valign="top" align="center">651</td>
<td valign="top" align="center">920</td>
<td valign="top" align="center">57</td>
<td valign="top" align="center">342</td>
<td valign="top" align="center">609</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">112</td>
<td valign="top" align="center">186</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">45</td>
<td valign="top" align="center">45</td>
</tr>
<tr>
<td valign="top" align="left">15</td>
<td valign="top" align="center">0.623</td>
<td valign="top" align="center">1.520</td>
<td valign="top" align="center">77</td>
<td valign="top" align="center">243</td>
<td valign="top" align="center">445</td>
<td valign="top" align="center">77</td>
<td valign="top" align="center">144</td>
<td valign="top" align="center">250</td>
<td valign="top" align="center">65</td>
<td valign="top" align="center">61</td>
<td valign="top" align="center">62</td>
<td valign="top" align="center">70</td>
<td valign="top" align="center">69</td>
<td valign="top" align="center">73</td>
</tr>
<tr>
<td valign="top" align="left">16</td>
<td valign="top" align="center">1.063</td>
<td valign="top" align="center">&#x02212;0.904</td>
<td valign="top" align="center">64</td>
<td valign="top" align="center">251</td>
<td valign="top" align="center">457</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">130</td>
<td valign="top" align="center">223</td>
<td valign="top" align="center">57</td>
<td valign="top" align="center">66</td>
<td valign="top" align="center">84</td>
<td valign="top" align="center">57</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">70</td>
</tr>
<tr>
<td valign="top" align="left">17</td>
<td valign="top" align="center">0.568</td>
<td valign="top" align="center">&#x02212;0.343</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">46</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">57</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">45</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">51</td>
</tr>
<tr>
<td valign="top" align="left">18</td>
<td valign="top" align="center">1.247</td>
<td valign="top" align="center">&#x02212;0.816</td>
<td valign="top" align="center">65</td>
<td valign="top" align="center">290</td>
<td valign="top" align="center">515</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">276</td>
<td valign="top" align="center">62</td>
<td valign="top" align="center">72</td>
<td valign="top" align="center">93</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">60</td>
<td valign="top" align="center">63</td>
</tr>
<tr>
<td valign="top" align="left">19</td>
<td valign="top" align="center">1.398</td>
<td valign="top" align="center">0.515</td>
<td valign="top" align="center">54</td>
<td valign="top" align="center">300</td>
<td valign="top" align="center">524</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">167</td>
<td valign="top" align="center">277</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">87</td>
<td valign="top" align="center">88</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">53</td>
</tr>
<tr>
<td valign="top" align="left">20</td>
<td valign="top" align="center">1.250</td>
<td valign="top" align="center">0.319</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">151</td>
<td valign="top" align="center">246</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">97</td>
<td valign="top" align="center">133</td>
<td valign="top" align="center">59</td>
<td valign="top" align="center">59</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">52</td>
<td valign="top" align="center">54</td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left" colspan="3">Average</td>
<td valign="top" align="center">56.4</td>
<td valign="top" align="center">233.5</td>
<td valign="top" align="center">387.0</td>
<td valign="top" align="center">53.5</td>
<td valign="top" align="center">136.5</td>
<td valign="top" align="center">226.5</td>
<td valign="top" align="center">54.2</td>
<td valign="top" align="center">66.0</td>
<td valign="top" align="center">76.5</td>
<td valign="top" align="center">53.4</td>
<td valign="top" align="center">55.6</td>
<td valign="top" align="center">60.3</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>I, II, and III denote that the mean differences in the magnitude of ERS were set at 0, 0.4, and 0.6, respectively; significance level at 0.05</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>To better understand why a group difference in ERS might influence DIF assessment, a Monte Carlo procedure was used to compute, for each item, the average score difference (ASD) between the two groups. A total of 100,000 simulees were generated from the distributions of &#x003B8; and &#x003C9; for the focal and reference groups, and their expected item scores were computed; the ASD of an item was the difference between the mean scores of the focal and reference groups. The two scatter plots in Figure <xref ref-type="fig" rid="F2">2</xref> show a positive relationship between the ASD (in absolute value) and the FP rate for the standard OLR and LDFA. In the 20-item test, for example, the ASD on item 1 was 0.00, 0.04, and 0.05 when there was no, moderate, and large group difference in ERS, respectively. With no group difference, the ASD on all items was close to zero, indicating that the total score alone was still an effective matching variable. Under moderate and large differences in ERS, however, the ASD was non-zero, which undermined the total score as a matching variable. Moreover, the larger the ASD of an item, the more likely the item was to be flagged as DIF. Because neither standard LR method accounts for the ASD caused by the group difference in ERS, their FP rates became inflated.</p>
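<p>The ASD computation can be sketched as follows, under the same assumed threshold-rescaling model used for data generation (an assumption, since Equation (3) is not restated here). Antithetic &#x003B8; draws and shared random numbers for &#x003C9; across groups are choices of this sketch to reduce Monte Carlo noise, not features stated in the study; the item parameters in the usage line are made up.</p>

```python
import numpy as np

def expected_score(theta, omega, slope, location, tau):
    """Expected item score for each simulee under the assumed ERS model."""
    steps = slope * (theta[:, None] - (location + omega[:, None] * tau[None, :]))
    numer = np.hstack([np.zeros((theta.size, 1)), np.cumsum(steps, axis=1)])
    prob = np.exp(numer - numer.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    return prob @ np.arange(numer.shape[1])          # sum over j of j * P(Y = j)

def average_score_difference(slopes, locations, tau=(-0.6, 0.0, 0.6),
                             ers_mean_diff=0.4, ers_log_var=0.6,
                             n_sim=100_000, seed=0):
    """Monte Carlo ASD (focal mean minus reference mean) for each item."""
    rng = np.random.default_rng(seed)
    tau = np.asarray(tau, dtype=float)
    half = n_sim // 2
    z_theta = rng.standard_normal(half)
    z_omega = rng.standard_normal(half)
    theta = np.concatenate([z_theta, -z_theta])      # antithetic pairs: symmetric theta sample
    z = np.concatenate([z_omega, z_omega])           # same omega draw within each pair
    sd = np.sqrt(ers_log_var)
    omega_ref = np.exp(ers_mean_diff / 2 + sd * z)   # shared draws for both groups
    omega_foc = np.exp(-ers_mean_diff / 2 + sd * z)
    asd = [expected_score(theta, omega_foc, a, d, tau).mean()
           - expected_score(theta, omega_ref, a, d, tau).mean()
           for a, d in zip(slopes, locations)]
    return np.array(asd)


# Illustrative items: a central item (location 0) and a harder item (location 1.5).
print(average_score_difference(slopes=[1.0, 1.0], locations=[0.0, 1.5]))
```

<p>Under this assumed model, an item located at 0 with symmetric thresholds has an ASD of essentially zero, because the ERS difference pushes scores up and down equally; an off-center item shows a non-zero ASD, which is the kind of between-group gap that a total-score matching variable cannot absorb.</p>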
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Relationships between the average score difference (in absolute value) and the false positive rate in a 20-item test. <bold>(A)</bold> OLR and <bold>(B)</bold> LDFA. Conditions I, II, and III denote that the mean differences in the magnitude of ERS were set at 0, 0.4, and 0.6, respectively.</p></caption>
<graphic xlink:href="fpsyg-08-01143-g0002.tif"/>
</fig>
<p>We conclude that a total score contaminated by ERS is not an appropriate matching variable for either LR method, and that the standard OLR performs worse than the standard LDFA when there is an ERS effect. A group difference in ERS causes measurement non-invariance and invalidates the standard OLR and LDFA procedures, because DIF-free items are often misidentified as DIF items. The standard OLR and LDFA therefore need to be modified to partial out the group difference in ERS and improve DIF assessment.</p>
</sec>
</sec>
<sec id="s5">
<title>Simulation study 2</title>
<sec>
<title>Design</title>
<p>A new class of LR procedures for eliminating the impact of a group difference in ERS on DIF assessment was proposed, and its performance was evaluated in simulation study 2. As Jin and Wang (<xref ref-type="bibr" rid="B25">2014</xref>) indicated, the tendency toward ERS is related to the dispersion of item scores: the scores of a participant with ERS are more extreme, and their variance is larger, than those of a participant with MRS. Therefore, to rule out the interference of ERS/MRS in DIF assessment, the dispersion of item scores should be taken into account when the matching variable is constructed; here, the variance of a participant's item scores was used as an indicator of the ERS effect. The modified OLR (OLR-m) can be expressed as follows:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">log&#x000A0;</mml:mtext><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>Y</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02264;</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>Y</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0003E;</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B3;</mml:mo></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B3;</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B3;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mi>G</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B3;</mml:mo></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mi>G</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B3;</mml:mo></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub><mml:mi>S</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B3;</mml:mo></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mi>S</mml:mi><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>S</italic> denotes the variance of a participant&#x00027;s item scores, <italic>XS</italic> is the interaction between the total score and this variance, and the other terms are as defined previously. The modified LDFA (LDFA-m) then becomes:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">log&#x000A0;</mml:mtext><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">P&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">G</mml:mtext><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">P&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">G</mml:mtext><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mo>&#x003B1;</mml:mo></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B1;</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B1;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mi>Y</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B1;</mml:mo></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mi>Y</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B1;</mml:mo></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub><mml:mi>S</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003B1;</mml:mo></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mi>S</mml:mi><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The predictors <italic>S</italic> and <italic>XS</italic> are included to partial out the ERS effect, so that &#x003B3;<sub>2</sub> or &#x003B3;<sub>3</sub> in Equation (4), and &#x003B1;<sub>2</sub> or &#x003B1;<sub>3</sub> in Equation (5), can be used to detect uniform or non-uniform DIF, respectively. Because we focus on uniform DIF, &#x003B3;<sub>3</sub> in Equation (4) and &#x003B1;<sub>3</sub> in Equation (5) are constrained to zero. To evaluate whether the OLR-m and LDFA-m outperformed the standard OLR and LDFA, we examined the same conditions as in simulation study 1 to see whether the modified models yielded an acceptable FP rate.</p>
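<p>As an illustration of Equation (5), the sketch below fits the LDFA-m by Newton-Raphson and tests uniform DIF with a one-degree-of-freedom likelihood-ratio test of &#x003B1;<sub>2</sub>, leaving out the non-uniform term &#x003B1;<sub>3</sub><italic>XY</italic>. It is a minimal sketch under stated assumptions, not the authors&#x00027; implementation: the function names are hypothetical, the studied item is included in the total score <italic>X</italic>, <italic>S</italic> is taken as the population (ddof = 0) variance of each person&#x00027;s item scores, and the likelihood-ratio test is our choice of significance test. The toy data at the end inject a uniform group shift into item 0.</p>

```python
import math

import numpy as np


def fit_logit(design, y, n_iter=100, tol=1e-10):
    """Binary logistic regression by Newton-Raphson; returns (coefficients, log-likelihood)."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))
        grad = design.T @ (y - p)                                  # score vector
        info = design.T @ (design * (p * (1.0 - p))[:, None])      # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if tol > np.max(np.abs(step)):
            break
    eta = design @ beta
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))     # Bernoulli log-likelihood
    return beta, loglik


def ldfa_m_uniform_dif(responses, group, item):
    """Hypothetical helper: LR test of alpha_2 in Eq. (5), with the XY term (alpha_3) omitted.

    responses: (n_persons, n_items) item scores; group: 0/1 indicator G;
    item: column index of the studied item Y.
    """
    responses = np.asarray(responses, dtype=float)
    g = np.asarray(group, dtype=float)
    x = responses.sum(axis=1)    # total score X (studied item included; an assumption)
    s = responses.var(axis=1)    # S: variance of each person's item scores (ddof=0 assumed)
    y = responses[:, item]       # score on the studied item
    base = [np.ones_like(x), x, s, x * s]          # terms for alpha_0, alpha_1, alpha_4, alpha_5
    _, ll_reduced = fit_logit(np.column_stack(base), g)
    coef, ll_full = fit_logit(np.column_stack(base + [y]), g)
    g2 = max(2.0 * (ll_full - ll_reduced), 0.0)   # likelihood-ratio statistic, 1 df
    p_value = math.erfc(math.sqrt(g2 / 2.0))       # upper tail of chi-square(1)
    return coef[-1], g2, p_value


# Toy usage: 600 simulated respondents, 10 items scored 0-4, uniform DIF on item 0.
rng = np.random.default_rng(0)
resp = rng.integers(0, 5, size=(600, 10))
grp = rng.integers(0, 2, size=600)
resp[:, 0] = np.clip(resp[:, 0] + grp, 0, 4)
alpha2, g2, p = ldfa_m_uniform_dif(resp, grp, item=0)
print(f"alpha_2 = {alpha2:.3f}, G2 = {g2:.2f}, p = {p:.2e}")
```

Fitting the reduced and full models separately keeps the test in the usual nested-model form; a Wald test on &#x003B1;<sub>2</sub> from the full model would be a reasonable alternative.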
</sec>
<sec>
<title>Results</title>
<p>The right panels of Tables <xref ref-type="table" rid="T1">1</xref> and <xref ref-type="table" rid="T2">2</xref> show that the FP rates of the OLR-m and LDFA-m rose slightly as the difference in ERS increased. Nevertheless, the FP rate was generally well controlled around the nominal 0.05 level, except when the OLR-m was applied to 10-item tests with a large ERS effect, where the FP rate was inflated. The mean FP rate across items for the OLR-m ranged from 0.051 to 0.090 for 10-item tests and from 0.054 to 0.076 for 20-item tests. The LDFA-m yielded mean FP rates between 0.049 and 0.058 for 10-item tests and between 0.053 and 0.060 for 20-item tests. The LDFA-m therefore controlled the FP rate somewhat better than the OLR-m. Although the OLR-m yielded an inflated FP rate when the difference in ERS was large, both modified procedures generally outperformed their standard counterparts (OLR and LDFA) by a substantial margin.</p>
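<p>A rough yardstick for judging whether an FP rate is well controlled around 0.05 is the Monte Carlo margin of error of a rate estimated from a finite number of replications. The sketch below assumes 1,000 replications per condition (the number stated for simulation study 3; we assume the same here) and treats each item&#x00027;s rejection indicator as a Bernoulli trial with probability 0.05. Because the reported values are means across items, which vary less than a single item&#x00027;s rate, this single-item band is, if anything, lenient.</p>

```python
import math


def fp_band(nominal=0.05, n_reps=1000, z=1.96):
    """Approximate 95% Monte Carlo band for an empirical FP rate at the nominal level."""
    se = math.sqrt(nominal * (1.0 - nominal) / n_reps)   # binomial standard error
    return nominal - z * se, nominal + z * se


low, high = fp_band()
print(f"95% band: {low:.4f} to {high:.4f}")   # 95% band: 0.0365 to 0.0635

# Ranges (min, max) of the mean FP rates reported in the text.
reported = {
    "OLR-m, 10 items": (0.051, 0.090),
    "OLR-m, 20 items": (0.054, 0.076),
    "LDFA-m, 10 items": (0.049, 0.058),
    "LDFA-m, 20 items": (0.053, 0.060),
}
for label, (min_fp, max_fp) in reported.items():
    verdict = "exceeds band" if max_fp > high else "within band"
    print(f"{label}: max {max_fp:.3f} ({verdict})")
```

Under this yardstick the LDFA-m maxima stay inside the band, whereas both OLR-m maxima (0.090 and 0.076) fall above it.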
</sec>
</sec>
<sec id="s6">
<title>Simulation study 3</title>
<p>In simulation studies 1 and 2, all items in a test were assumed to be DIF-free. In reality, tests often (if not always) contain DIF items. In simulation study 3, we evaluated the performance of the standard and modified OLR and LDFA when tests contained uniform DIF items. We set 20% of the items as DIF items and manipulated two DIF patterns: balanced and unbalanced. In the balanced condition, half of the DIF items favored the focal group and the other half favored the reference group; in the unbalanced condition, all DIF items favored the focal group. In practice, most tests will show DIF patterns that fall between these two extremes. For each DIF item, the between-group difference in the mean location parameter was fixed at 0.2 logits. We expected the OLR-m and LDFA-m to control the FP rate well and to yield a high true positive (TP) rate, defined as the percentage of the 1,000 replications in which a DIF item was correctly flagged as having DIF.</p>
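<p>The two DIF patterns and the TP rate defined above can be sketched as follows. The helper names are hypothetical, and several details are our assumptions rather than the authors&#x00027; specification: the DIF items are taken to be the first 20% of items, a DIF item is generated by shifting all of its focal-group location (threshold) parameters by 0.2 logits, and a negative shift (easier endorsement for the focal group) is treated as favoring the focal group. The FP rate is computed analogously over the DIF-free items.</p>

```python
import numpy as np


def dif_shifts(n_items, pattern, prop_dif=0.2, delta=0.2):
    """Hypothetical helper: focal-group shift added to each item's location parameters."""
    n_dif = int(round(prop_dif * n_items))
    shifts = np.zeros(n_items)
    if pattern == "unbalanced":
        shifts[:n_dif] = -delta              # every DIF item favors the focal group
    elif pattern == "balanced":
        half = n_dif // 2
        shifts[:half] = -delta               # favors the focal group
        shifts[half:n_dif] = delta           # favors the reference group
    else:
        raise ValueError("pattern must be 'balanced' or 'unbalanced'")
    return shifts


def detection_rates(flags, is_dif):
    """flags: (n_reps, n_items) booleans, True when an item is flagged in a replication.

    Returns (FP rate over DIF-free items, TP rate over DIF items): the proportion
    of replications in which each item was flagged, averaged over items.
    """
    flags = np.asarray(flags, dtype=bool)
    is_dif = np.asarray(is_dif, dtype=bool)
    per_item = flags.mean(axis=0)
    return per_item[~is_dif].mean(), per_item[is_dif].mean()


# Toy usage: two replications of a four-item test in which only item 0 has DIF.
flags = [[True, False, True, False],
         [True, True, False, False]]
fp, tp = detection_rates(flags, is_dif=[True, False, False, False])
print(fp, tp)
```

In the toy usage, item 0 is flagged in both replications (TP rate 1.0), while the three DIF-free items are flagged at rates 0.5, 0.5, and 0, giving an FP rate of 1/3.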
<sec>
<title>Results</title>
<p>Figures <xref ref-type="fig" rid="F3">3</xref> and <xref ref-type="fig" rid="F4">4</xref> compare the FP and TP rates of the OLR-m and LDFA-m with those of the standard OLR and LDFA. When the DIF pattern was unbalanced, both the OLR-m and LDFA-m controlled the FP rate better and yielded higher TP rates than the standard OLR and LDFA across all conditions. The standard OLR and LDFA performed worse, with inflated FP rates and deflated TP rates, when there was a moderate or large difference in ERS. The standard OLR and LDFA flagged more than 10% of DIF-free items as having DIF, and their TP rates decreased substantially when there was a large difference in ERS.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>False positive rates and true positive rates under the conditions of unbalanced DIF. <bold>(A,B)</bold> 10 items and <bold>(C,D)</bold> 20 items.</p></caption>
<graphic xlink:href="fpsyg-08-01143-g0003.tif"/>
</fig>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>False positive rates and true positive rates under the conditions of balanced DIF. <bold>(A,B)</bold> 10 items and <bold>(C,D)</bold> 20 items.</p></caption>
<graphic xlink:href="fpsyg-08-01143-g0004.tif"/>
</fig>
<p>Similar results were found in the balanced DIF condition. The OLR-m and LDFA-m controlled the FP rate better and yielded higher TP rates than their standard counterparts, regardless of the magnitude of the ERS effect. The standard approaches flagged more than 10% of DIF-free items as having DIF, and their TP rates fell as the difference in ERS increased. In summary, the modified approaches outperformed their standard counterparts when tests contained DIF items.</p>
</sec>
</sec>
<sec id="s7">
<title>An empirical example</title>
<p>Data for this example came from the anxiety scale of the Patient-Reported Outcomes Measurement Information System (PROMIS) (Pilkonis et al., <xref ref-type="bibr" rid="B40">2011</xref>). PROMIS was developed by the National Institutes of Health and provides item banks for measuring patient-reported health outcomes (Cella et al., <xref ref-type="bibr" rid="B12">2007</xref>; Buysse et al., <xref ref-type="bibr" rid="B9">2010</xref>). The anxiety scale focuses on fear, anxious misery, hyperarousal, and somatic symptoms related to arousal (Cella et al., <xref ref-type="bibr" rid="B11">2010</xref>). Participants included 369 males and 397 females, who responded to 29 five-point Likert-type items. The dataset is available in the lordif R package (Choi et al., <xref ref-type="bibr" rid="B15">2011</xref>). On average, males and females endorsed 17.91 and 16.29 extreme responses, respectively, suggesting a gender difference in ERS (<italic>p</italic> &#x0003D; 0.01). The four DIF detection methods (the standard and modified OLR and LDFA) were applied to detect items functioning differentially between males and females.</p>
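<p>The ERS comparison above can be sketched as counting, for each respondent, the responses in the two end categories and then comparing the group means. The sketch assumes the five categories are coded 1 to 5. The article does not state which test produced <italic>p</italic> &#x0003D; 0.01, so the Welch statistic with a normal-approximation <italic>p</italic>-value (reasonable with roughly 370 to 400 respondents per group) is our assumption, and the function names are hypothetical.</p>

```python
import math

import numpy as np


def extreme_counts(responses, low=1, high=5):
    """Number of responses in the lowest or highest category for each respondent."""
    r = np.asarray(responses)
    return ((r == low) | (r == high)).sum(axis=1)


def welch_test(a, b):
    """Welch t statistic for mean(a) - mean(b), with a two-sided normal-approximation p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = math.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    t = (a.mean() - b.mean()) / se
    return t, math.erfc(abs(t) / math.sqrt(2.0))


# Toy data (not the PROMIS responses): four respondents, three items.
toy = [[1, 5, 3], [2, 3, 4], [5, 5, 1], [3, 3, 3]]
print(extreme_counts(toy))   # [2 0 3 0]
```

With the real data, one would compute the counts over the 29 anxiety items and pass the male and female counts to <monospace>welch_test</monospace>.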
<p>Table <xref ref-type="table" rid="T3">3</xref> summarizes the results of the four DIF detection methods on the anxiety scale. Each method flagged seven or eight of the 29 items as DIF items, and items 6, 19, and 20 were flagged by all four approaches. However, discrepancies emerged between the standard and modified approaches, and the standard LDFA diverged most from the others: for instance, only the LDFA flagged items 16, 28, and 29. These inconsistencies may reflect whether the DIF detection method accounts for ERS. Given the simulation findings, the results of the two modified LR methods are likely to be more reliable.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Results of the four DIF detection methods in the anxiety scale.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Item</bold></th>
<th valign="top" align="left"><bold>Description</bold></th>
<th valign="top" align="center"><bold>OLR</bold></th>
<th valign="top" align="center"><bold>LDFA</bold></th>
<th valign="top" align="center"><bold>OLR-m</bold></th>
<th valign="top" align="center"><bold>LDFA-m</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">I felt fearful</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">I felt frightened</td>
<td valign="top" align="center">&#x0002B;</td>
<td/>
<td valign="top" align="center">&#x0002B;</td>
<td valign="top" align="center">&#x0002B;</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">It scared me when I felt nervous</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left">I felt anxious</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="left">I felt like I needed help for my anxiety</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="left">I was concerned about my mental health</td>
<td valign="top" align="center">&#x02212;</td>
<td valign="top" align="center">&#x02212;</td>
<td valign="top" align="center">&#x02212;</td>
<td valign="top" align="center">&#x02212;</td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="left">I felt upset</td>
<td valign="top" align="center">&#x0002B;</td>
<td valign="top" align="center">&#x0002B;</td>
<td/>
<td valign="top" align="center">&#x0002B;</td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="left">I had a racing or pounding heart</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="left">I was anxious if my normal routine was disturbed</td>
<td valign="top" align="center">&#x02212;</td>
<td/>
<td valign="top" align="center">&#x02212;</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="left">I had sudden feelings of panic</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">11</td>
<td valign="top" align="left">I was easily startled</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">12</td>
<td valign="top" align="left">I had trouble paying attention</td>
<td/>
<td/>
<td valign="top" align="center">&#x02212;</td>
<td valign="top" align="center">&#x02212;</td>
</tr>
<tr>
<td valign="top" align="left">13</td>
<td valign="top" align="left">I avoided public places or activities</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">14</td>
<td valign="top" align="left">I felt fidgety</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">15</td>
<td valign="top" align="left">I felt something awful would happen</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">16</td>
<td valign="top" align="left">I felt worried</td>
<td/>
<td valign="top" align="center">&#x0002B;</td>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">17</td>
<td valign="top" align="left">I felt terrified</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">18</td>
<td valign="top" align="left">I worried about other people&#x00027;s reactions to me</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">19</td>
<td valign="top" align="left">I found it hard to focus on anything other than my anxiety</td>
<td valign="top" align="center">&#x02212;</td>
<td valign="top" align="center">&#x02212;</td>
<td valign="top" align="center">&#x02212;</td>
<td valign="top" align="center">&#x02212;</td>
</tr>
<tr>
<td valign="top" align="left">20</td>
<td valign="top" align="left">My worries overwhelmed me</td>
<td valign="top" align="center">&#x0002B;</td>
<td valign="top" align="center">&#x0002B;</td>
<td valign="top" align="center">&#x0002B;</td>
<td valign="top" align="center">&#x0002B;</td>
</tr>
<tr>
<td valign="top" align="left">21</td>
<td valign="top" align="left">I had twitching or trembling muscles</td>
<td valign="top" align="center">&#x02212;</td>
<td/>
<td valign="top" align="center">&#x02212;</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">22</td>
<td valign="top" align="left">I felt nervous</td>
<td/>
<td valign="top" align="center">&#x0002B;</td>
<td/>
<td valign="top" align="center">&#x0002B;</td>
</tr>
<tr>
<td valign="top" align="left">23</td>
<td valign="top" align="left">I felt indecisive</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">24</td>
<td valign="top" align="left">Many situations made me worry</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">25</td>
<td valign="top" align="left">I had difficulty sleeping</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">26</td>
<td valign="top" align="left">I had trouble relaxing</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">27</td>
<td valign="top" align="left">I felt uneasy</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">28</td>
<td valign="top" align="left">I felt tense</td>
<td/>
<td valign="top" align="center">&#x0002B;</td>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">29</td>
<td valign="top" align="left">I had difficulty calming down</td>
<td/>
<td valign="top" align="center">&#x02212;</td>
<td/>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>&#x0002B; indicates DIF items favoring females; &#x02212; indicates DIF items favoring males</italic>.</p>
</table-wrap-foot>
</table-wrap>
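<p>The counts discussed above can be checked directly against Table <xref ref-type="table" rid="T3">3</xref>. The sketch below transcribes the flagged entries of the table (items not listed were flagged by no method) and recomputes the number of items flagged by each method, the items flagged by all four methods, and the items flagged by only one method.</p>

```python
METHODS = ("OLR", "LDFA", "OLR-m", "LDFA-m")

# Flagged entries of Table 3: item -> (OLR, LDFA, OLR-m, LDFA-m).
# "+" = DIF favoring females, "-" = DIF favoring males, "" = not flagged.
FLAGS = {
    2: ("+", "", "+", "+"),
    6: ("-", "-", "-", "-"),
    7: ("+", "+", "", "+"),
    9: ("-", "", "-", ""),
    12: ("", "", "-", "-"),
    16: ("", "+", "", ""),
    19: ("-", "-", "-", "-"),
    20: ("+", "+", "+", "+"),
    21: ("-", "", "-", ""),
    22: ("", "+", "", "+"),
    28: ("", "+", "", ""),
    29: ("", "-", "", ""),
}

counts = {m: sum(1 for f in FLAGS.values() if f[i]) for i, m in enumerate(METHODS)}
flagged_by_all = sorted(item for item, f in FLAGS.items() if all(f))
flagged_by_one = {
    m: sorted(item for item, f in FLAGS.items() if f[i] and sum(map(bool, f)) == 1)
    for i, m in enumerate(METHODS)
}
# Items on which methods disagree about which group is favored.
sign_conflicts = sorted(item for item, f in FLAGS.items() if "+" in f and "-" in f)

print(counts)          # {'OLR': 7, 'LDFA': 8, 'OLR-m': 7, 'LDFA-m': 7}
print(flagged_by_all)  # [6, 19, 20]
print(flagged_by_one)  # {'OLR': [], 'LDFA': [16, 28, 29], 'OLR-m': [], 'LDFA-m': []}
print(sign_conflicts)  # []
```

Only the standard LDFA flagged items that no other method flagged, and no item was flagged in opposite directions by different methods.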
</sec>
<sec sec-type="discussion" id="s8">
<title>Discussion</title>
<p>Non-negligible individual differences in how respondents use response categories are a critical concern in cross-group and cross-cultural studies, because they lead to measurement non-invariance. In the present research, we investigated the impact of differences in ERS on DIF assessment and proposed a new class of procedures that eliminates this impact and improves the performance of the LR approach. As expected, even when item location and slope parameters were identical for the reference and focal groups (i.e., the items were DIF-free), the presence of ERS made the total score inappropriate as a matching variable. The standard LR approaches therefore yielded inflated FP rates and deflated TP rates. In addition, the standard OLR appeared more vulnerable to ERS than the standard LDFA, as evidenced by its more inflated FP rate. Because the total score that serves as the matching variable in the standard LR approaches is so heavily contaminated by ERS, DIF assessments based on these approaches are unreliable.</p>
<p>In contrast, the modified LDFA and OLR, which include item score variance and its interaction with the total score as predictors, can partial out the ERS effect so that the subsequent DIF assessment is appropriate. The simulation findings suggest that the OLR-m and LDFA-m not only outperform their standard counterparts but also generally maintain good control of the FP rate and yield high TP rates in DIF detection, even when ERS is present.</p>
<p>A reviewer asked why score variance should be a better indicator of ERS than the score standard deviation (<italic>SD</italic>). We therefore compared the proposed approaches with versions that included the <italic>SD</italic> of scores as a predictor. The correlation between score variance and &#x003C9; was &#x02212;0.42, slightly larger in absolute value than the correlation between score <italic>SD</italic> and &#x003C9; (&#x02212;0.40). In other words, &#x003C9; in the ERS-GPCM is more strongly related to score variance than to score <italic>SD</italic>. This may be the main reason why the modified LR approaches that include score variance performed better than those that include score <italic>SD</italic>. We therefore recommend the modified approaches with score variance as a predictor.</p>
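<p>Because the variance is the square of the <italic>SD</italic>, the two indicators rank respondents identically, and their Pearson correlations with &#x003C9; differ only through the nonlinearity of squaring. A minimal sketch of the comparison, assuming a response matrix and the simulated &#x003C9; values for the same respondents are available (the function name is hypothetical, and the toy values are illustrative only, not the simulation data):</p>

```python
import numpy as np


def ers_indicator_correlations(responses, omega):
    """Pearson correlations of omega with per-person score variance and score SD."""
    r = np.asarray(responses, dtype=float)
    var = r.var(axis=1)        # per-person variance of item scores
    sd = np.sqrt(var)          # per-person SD of item scores
    r_var = np.corrcoef(var, omega)[0, 1]
    r_sd = np.corrcoef(sd, omega)[0, 1]
    return r_var, r_sd


# Toy illustration: in these made-up values, more spread-out answers go with smaller omega.
toy = [[3, 3, 3, 3], [2, 3, 3, 4], [1, 3, 3, 5], [1, 1, 5, 5]]
omega = [4.0, 3.0, 2.0, 1.0]
r_var, r_sd = ers_indicator_correlations(toy, omega)
print(f"r(variance, omega) = {r_var:.3f}, r(SD, omega) = {r_sd:.3f}")
```

Which indicator correlates more strongly with &#x003C9; depends on how the spread of responses relates to &#x003C9; in the generating model, which is why the comparison has to be made empirically on simulated data.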
<p>ERS is reasonably stable across time and across scales. Wetzel et al. (<xref ref-type="bibr" rid="B53">2016</xref>) assessed the personality, vocational interests, and social interaction anxiety of students in German high schools and reported that ERS was stable over a period of 8 years. Weijters et al. (<xref ref-type="bibr" rid="B51">2008</xref>) proposed a longitudinal multiple-indicators, multiple-causes (MIMIC) model to investigate the effects of demographic variables on RS and to detect the biases that RS may introduce. Given the simplicity of our approaches and their modest sample-size requirements, future studies could apply the modified methods to partial out the ERS effect in longitudinal or multiple-scale studies. Moreover, although the data in this study were generated from the ERS-GPCM, other measurement models for ERS could be used, such as the multidimensional nominal response model (Thissen-Roe and Thissen, <xref ref-type="bibr" rid="B46">2013</xref>; Falk and Cai, <xref ref-type="bibr" rid="B19">2016</xref>). Future studies could investigate how the modified LDFA and OLR methods perform when data are simulated from such models.</p>
<p>The strategy of accounting for ERS may also be applied to other DIF detection methods, such as the generalized Mantel-Haenszel (GMH; Mantel and Haenszel, <xref ref-type="bibr" rid="B32">1959</xref>; Holland and Thayer, <xref ref-type="bibr" rid="B24">1988</xref>) and the Mantel method (Mantel, <xref ref-type="bibr" rid="B31">1963</xref>). The Mantel and GMH methods usually perform about as well as the LDFA and OLR in detecting uniform DIF (Kristjansson et al., <xref ref-type="bibr" rid="B29">2005</xref>), even when 40% of the items in a scale are DIF items (Wang and Su, <xref ref-type="bibr" rid="B49">2004</xref>). Further studies are required to investigate how the GMH and Mantel methods perform in DIF assessment when ERS occurs.</p>
<p>The present study focused on the assessment of uniform DIF and did not evaluate the assessment of non-uniform DIF. Future studies could extend our approaches to non-uniform DIF assessment and compare their performance with that of the GMH and Mantel approaches. Previous studies have reported conflicting findings regarding the performance of the LR and MH methods with polytomous data (e.g., Hidalgo and L&#x000F3;pez-Pina, <xref ref-type="bibr" rid="B23">2004</xref>; Kristjansson et al., <xref ref-type="bibr" rid="B29">2005</xref>). It is important to revisit these issues when ERS occurs.</p>
<p>The study could also be extended to examine two further factors: differences between the latent trait distributions of the reference and focal groups, and the slope parameter of the studied item. Previous studies have reported that a large difference in ability distributions and high item discrimination are associated with poorer performance in DIF assessment (Kristjansson et al., <xref ref-type="bibr" rid="B29">2005</xref>; Su and Wang, <xref ref-type="bibr" rid="B44">2005</xref>). The performance of the modified approaches might also be influenced by these two factors, which are worth investigating before the approaches are recommended for applied research.</p>
</sec>
<sec id="s9">
<title>Author contributions</title>
<p>HC contributed to the conception, design, and analysis of data as well as drafting and revising the manuscript; KJ contributed to the conception, design, analysis of data, and critically revising the manuscript; WW contributed to the conception, design, and revising the manuscript.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ankenmann</surname> <given-names>R. D.</given-names></name> <name><surname>Witt</surname> <given-names>E. A.</given-names></name> <name><surname>Dunbar</surname> <given-names>S. B.</given-names></name></person-group> (<year>1999</year>). <article-title>An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning</article-title>. <source>J. Educ. Meas.</source> <volume>36</volume>, <fpage>277</fpage>&#x02013;<lpage>300</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3984.1999.tb00558.x</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Austin</surname> <given-names>E. J.</given-names></name> <name><surname>Deary</surname> <given-names>I. J.</given-names></name> <name><surname>Egan</surname> <given-names>V.</given-names></name></person-group> (<year>2006</year>). <article-title>Individual differences in response scale use: mixed Rasch modelling of responses to NEO-FFI items</article-title>. <source>Pers. Individ. Dif.</source> <volume>40</volume>, <fpage>1235</fpage>&#x02013;<lpage>1245</lpage>. <pub-id pub-id-type="doi">10.1016/j.paid.2005.10.018</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baumgartner</surname> <given-names>H.</given-names></name> <name><surname>Steenkamp</surname> <given-names>J. B. E. M.</given-names></name></person-group> (<year>2001</year>). <article-title>Response styles in marketing research: a cross-national investigation</article-title>. <source>J. Market. Res.</source> <volume>38</volume>, <fpage>143</fpage>&#x02013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1509/jmkr.38.2.143.18840</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bolt</surname> <given-names>D. M.</given-names></name> <name><surname>Gierl</surname> <given-names>M. J.</given-names></name></person-group> (<year>2006</year>). <article-title>Testing features of graphical DIF: application of a regression correction to three nonparametric statistical tests</article-title>. <source>J. Educ. Meas.</source> <volume>43</volume>, <fpage>313</fpage>&#x02013;<lpage>333</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3984.2006.00019.x</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bolt</surname> <given-names>D. M.</given-names></name> <name><surname>Johnson</surname> <given-names>T. R.</given-names></name></person-group> (<year>2009</year>). <article-title>Addressing score bias and differential item functioning due to individual differences in response style</article-title>. <source>Appl. Psychol. Meas.</source> <volume>33</volume>, <fpage>335</fpage>&#x02013;<lpage>352</lpage>. <pub-id pub-id-type="doi">10.1177/0146621608329891</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bolt</surname> <given-names>D. M.</given-names></name> <name><surname>Newton</surname> <given-names>J. R.</given-names></name></person-group> (<year>2011</year>). <article-title>Multiscale measurement of extreme response style</article-title>. <source>Educ. Psychol. Meas.</source> <volume>71</volume>, <fpage>814</fpage>&#x02013;<lpage>833</lpage>. <pub-id pub-id-type="doi">10.1177/0013164410388411</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname> <given-names>J. D.</given-names></name> <name><surname>Cai</surname> <given-names>H.</given-names></name> <name><surname>Oakes</surname> <given-names>M. A.</given-names></name> <name><surname>Deng</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>Cultural similarities in self-esteem functioning: East is East and West is West, but sometimes the twain do meet</article-title>. <source>J. Cross Cult. Psychol.</source> <volume>40</volume>, <fpage>140</fpage>&#x02013;<lpage>157</lpage>. <pub-id pub-id-type="doi">10.1177/0022022108326280</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Buckley</surname> <given-names>J.</given-names></name></person-group> (<year>2009</year>). <article-title>Cross-national response styles in international educational assessments: Evidence from PISA 2006</article-title>, in <source>Paper Presented at the NCES Conference on the Program for International Student Assessment: What We Can Learn from PISA</source> (<publisher-loc>Washington, DC</publisher-loc>). Available online at: <ext-link ext-link-type="uri" xlink:href="https://edsurveys.rti.org/pisa/documents/buckley_pisaresponsestyle.pdf">https://edsurveys.rti.org/pisa/documents/buckley_pisaresponsestyle.pdf</ext-link></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Buysse</surname> <given-names>D. J.</given-names></name> <name><surname>Yu</surname> <given-names>L.</given-names></name> <name><surname>Moul</surname> <given-names>D. E.</given-names></name> <name><surname>Germain</surname> <given-names>A.</given-names></name> <name><surname>Stover</surname> <given-names>A.</given-names></name> <name><surname>Dodds</surname> <given-names>N. E.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Development and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments</article-title>. <source>Sleep</source> <volume>33</volume>, <fpage>781</fpage>&#x02013;<lpage>792</lpage>. <pub-id pub-id-type="doi">10.1093/sleep/33.6.781</pub-id><pub-id pub-id-type="pmid">20550019</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cai</surname> <given-names>H.</given-names></name> <name><surname>Brown</surname> <given-names>J. D.</given-names></name> <name><surname>Deng</surname> <given-names>C.</given-names></name> <name><surname>Oakes</surname> <given-names>M. A.</given-names></name></person-group> (<year>2007</year>). <article-title>Self-esteem and culture: differences in cognitive self-evaluations or affective self-regard?</article-title> <source>Asian J. Soc. Psychol.</source> <volume>10</volume>, <fpage>162</fpage>&#x02013;<lpage>170</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-839X.2007.00222.x</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cella</surname> <given-names>D.</given-names></name> <name><surname>Riley</surname> <given-names>W.</given-names></name> <name><surname>Stone</surname> <given-names>A.</given-names></name> <name><surname>Rothrock</surname> <given-names>N.</given-names></name> <name><surname>Reeve</surname> <given-names>B.</given-names></name> <name><surname>Yount</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008</article-title>. <source>J. Clin. Epidemiol.</source> <volume>63</volume>, <fpage>1179</fpage>&#x02013;<lpage>1194</lpage>. <pub-id pub-id-type="doi">10.1016/j.jclinepi.2010.04.011</pub-id><pub-id pub-id-type="pmid">20685078</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cella</surname> <given-names>D.</given-names></name> <name><surname>Yount</surname> <given-names>S.</given-names></name> <name><surname>Rothrock</surname> <given-names>N.</given-names></name> <name><surname>Gershon</surname> <given-names>R.</given-names></name> <name><surname>Cook</surname> <given-names>K.</given-names></name> <name><surname>Reeve</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years</article-title>. <source>Med. Care</source> <volume>45</volume>, <fpage>S3</fpage>&#x02013;<lpage>S11</lpage>. <pub-id pub-id-type="doi">10.1097/01.mlr.0000258615.42478.55</pub-id><pub-id pub-id-type="pmid">17443116</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>C.</given-names></name> <name><surname>Lee</surname> <given-names>S. Y.</given-names></name> <name><surname>Stevenson</surname> <given-names>H. W.</given-names></name></person-group> (<year>1995</year>). <article-title>Response style and cross-cultural comparisons of rating scales among East Asian and North American students</article-title>. <source>Psychol. Sci.</source> <volume>6</volume>, <fpage>170</fpage>&#x02013;<lpage>175</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9280.1995.tb00327.x</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheung</surname> <given-names>G. W.</given-names></name> <name><surname>Rensvold</surname> <given-names>R. B.</given-names></name></person-group> (<year>2000</year>). <article-title>Assessing extreme and acquiescence response sets in cross-cultural research using structural equations modelling</article-title>. <source>J. Cross Cult. Psychol.</source> <volume>31</volume>, <fpage>188</fpage>&#x02013;<lpage>213</lpage>. <pub-id pub-id-type="doi">10.1177/0022022100031002003</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choi</surname> <given-names>S. W.</given-names></name> <name><surname>Gibbons</surname> <given-names>L. E.</given-names></name> <name><surname>Crane</surname> <given-names>P. K.</given-names></name></person-group> (<year>2011</year>). <article-title>lordif: an R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations</article-title>. <source>J. Stat. Softw.</source> <volume>39</volume>, <fpage>1</fpage>&#x02013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v039.i08</pub-id><pub-id pub-id-type="pmid">21572908</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clauser</surname> <given-names>B.</given-names></name> <name><surname>Mazor</surname> <given-names>K.</given-names></name> <name><surname>Hambleton</surname> <given-names>R. K.</given-names></name></person-group> (<year>1993</year>). <article-title>The effects of purification of matching criterion on the identification of DIF using the Mantel-Haenszel procedure</article-title>. <source>Appl. Meas. Educ.</source> <volume>6</volume>, <fpage>269</fpage>&#x02013;<lpage>279</lpage>. <pub-id pub-id-type="doi">10.1207/s15324818ame0604_2</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Beuckelaer</surname> <given-names>A.</given-names></name> <name><surname>Weijters</surname> <given-names>B.</given-names></name> <name><surname>Rutten</surname> <given-names>A.</given-names></name></person-group> (<year>2010</year>). <article-title>Using <italic>ad-hoc</italic> measures for response styles: a cautionary note</article-title>. <source>Qual. Quant.</source> <volume>44</volume>, <fpage>761</fpage>&#x02013;<lpage>775</lpage>. <pub-id pub-id-type="doi">10.1007/s11135-009-9225-z</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Jong</surname> <given-names>M. G.</given-names></name> <name><surname>Steenkamp</surname> <given-names>J.-B. E. M.</given-names></name> <name><surname>Fox</surname> <given-names>J.-P.</given-names></name> <name><surname>Baumgartner</surname> <given-names>H.</given-names></name></person-group> (<year>2008</year>). <article-title>Using item response theory to measure extreme response style in marketing research: a global investigation</article-title>. <source>J. Market. Res.</source> <volume>45</volume>, <fpage>104</fpage>&#x02013;<lpage>115</lpage>. <pub-id pub-id-type="doi">10.1509/jmkr.45.1.104</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Falk</surname> <given-names>C. F.</given-names></name> <name><surname>Cai</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>A flexible full-information approach to the modeling of response styles</article-title>. <source>Psychol. Methods</source> <volume>21</volume>, <fpage>328</fpage>. <pub-id pub-id-type="doi">10.1037/met0000059</pub-id><pub-id pub-id-type="pmid">26641273</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>French</surname> <given-names>A. W.</given-names></name> <name><surname>Miller</surname> <given-names>T. R.</given-names></name></person-group> (<year>1996</year>). <article-title>Logistic regression and its use in detecting differential item functioning in polytomous items</article-title>. <source>J. Educ. Meas.</source> <volume>33</volume>, <fpage>315</fpage>&#x02013;<lpage>332</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3984.1996.tb00495.x</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hamamura</surname> <given-names>T.</given-names></name> <name><surname>Heine</surname> <given-names>S. J.</given-names></name> <name><surname>Paulhus</surname> <given-names>D. L.</given-names></name></person-group> (<year>2008</year>). <article-title>Cultural differences in response styles: the role of dialectical thinking</article-title>. <source>Pers. Individ. Dif.</source> <volume>44</volume>, <fpage>932</fpage>&#x02013;<lpage>942</lpage>. <pub-id pub-id-type="doi">10.1016/j.paid.2007.10.034</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harzing</surname> <given-names>A. W.</given-names></name> <name><surname>Baldueza</surname> <given-names>J.</given-names></name> <name><surname>Barner-Rasmussen</surname> <given-names>W.</given-names></name> <name><surname>Barzantny</surname> <given-names>C.</given-names></name> <name><surname>Canabal</surname> <given-names>A.</given-names></name> <name><surname>Davila</surname> <given-names>A.</given-names></name> <etal/></person-group> (<year>2009</year>). <article-title>Rating versus ranking: what is the best way to reduce response and language bias in cross-national research?</article-title> <source>Int. Business Rev.</source> <volume>18</volume>, <fpage>417</fpage>&#x02013;<lpage>432</lpage>. <pub-id pub-id-type="doi">10.1016/j.ibusrev.2009.03.001</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hidalgo</surname> <given-names>M. D.</given-names></name> <name><surname>L&#x000F3;pez-Pina</surname> <given-names>J. A.</given-names></name></person-group> (<year>2004</year>). <article-title>Differential item functioning detection and effect size: a comparison between logistic regression and Mantel-Haenszel procedures</article-title>. <source>Educ. Psychol. Meas.</source> <volume>64</volume>, <fpage>903</fpage>&#x02013;<lpage>915</lpage>. <pub-id pub-id-type="doi">10.1177/0013164403261769</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Holland</surname> <given-names>P. W.</given-names></name> <name><surname>Thayer</surname> <given-names>D. T.</given-names></name></person-group> (<year>1988</year>). <article-title>Differential item performance and the Mantel-Haenszel procedure</article-title>, in <source>Test Validity</source>, eds <person-group person-group-type="editor"><name><surname>Wainer</surname> <given-names>H.</given-names></name> <name><surname>Braun</surname> <given-names>H. I.</given-names></name></person-group> (<publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Lawrence Erlbaum</publisher-name>), <fpage>129</fpage>&#x02013;<lpage>145</lpage>.</citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jin</surname> <given-names>K.-Y.</given-names></name> <name><surname>Wang</surname> <given-names>W.-C.</given-names></name></person-group> (<year>2014</year>). <article-title>Generalized IRT models for extreme response style</article-title>. <source>Educ. Psychol. Meas.</source> <volume>74</volume>, <fpage>116</fpage>&#x02013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1177/0013164413498876</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>T. R.</given-names></name> <name><surname>Bolt</surname> <given-names>D. M.</given-names></name></person-group> (<year>2010</year>). <article-title>On the use of factor-analytic multinomial logit item response models to account for individual differences in response style</article-title>. <source>J. Educ. Behav. Stat.</source> <volume>35</volume>, <fpage>92</fpage>&#x02013;<lpage>114</lpage>. <pub-id pub-id-type="doi">10.3102/1076998609340529</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>T.</given-names></name> <name><surname>Kulesa</surname> <given-names>P.</given-names></name> <name><surname>Cho</surname> <given-names>Y. I.</given-names></name> <name><surname>Shavitt</surname> <given-names>S.</given-names></name></person-group> (<year>2005</year>). <article-title>The relation between culture and response styles: evidence from 19 countries</article-title>. <source>J. Cross Cult. Psychol.</source> <volume>36</volume>, <fpage>264</fpage>&#x02013;<lpage>277</lpage>. <pub-id pub-id-type="doi">10.1177/0022022104272905</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kieruj</surname> <given-names>N. D.</given-names></name> <name><surname>Moors</surname> <given-names>G.</given-names></name></person-group> (<year>2013</year>). <article-title>Response style behavior: question format dependent or personal style?</article-title> <source>Qual. Quant.</source> <volume>47</volume>, <fpage>193</fpage>&#x02013;<lpage>211</lpage>. <pub-id pub-id-type="doi">10.1007/s11135-011-9511-4</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kristjansson</surname> <given-names>E.</given-names></name> <name><surname>Aylesworth</surname> <given-names>R.</given-names></name> <name><surname>McDowell</surname> <given-names>I.</given-names></name> <name><surname>Zumbo</surname> <given-names>B. D.</given-names></name></person-group> (<year>2005</year>). <article-title>A comparison of four methods for detecting differential item functioning in ordered response items</article-title>. <source>Educ. Psychol. Meas.</source> <volume>65</volume>, <fpage>935</fpage>&#x02013;<lpage>953</lpage>. <pub-id pub-id-type="doi">10.1177/0013164405275668</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Magis</surname> <given-names>D.</given-names></name> <name><surname>De Boeck</surname> <given-names>P.</given-names></name></person-group> (<year>2014</year>). <article-title>Type I error inflation in DIF identification with Mantel-Haenszel: an explanation and a solution</article-title>. <source>Educ. Psychol. Meas.</source> <volume>74</volume>, <fpage>713</fpage>&#x02013;<lpage>728</lpage>. <pub-id pub-id-type="doi">10.1177/0013164413516855</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mantel</surname> <given-names>N.</given-names></name></person-group> (<year>1963</year>). <article-title>Chi-square tests with one degree of freedom: extensions of the Mantel-Haenszel procedure</article-title>. <source>J. Am. Stat. Assoc.</source> <volume>58</volume>, <fpage>690</fpage>&#x02013;<lpage>700</lpage>. <pub-id pub-id-type="doi">10.1080/01621459.1963.10500879</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mantel</surname> <given-names>N.</given-names></name> <name><surname>Haenszel</surname> <given-names>W.</given-names></name></person-group> (<year>1959</year>). <article-title>Statistical aspects of the analysis of data from retrospective studies of disease</article-title>. <source>J. Natl. Cancer Inst.</source> <volume>22</volume>, <fpage>719</fpage>&#x02013;<lpage>748</lpage>. <pub-id pub-id-type="doi">10.1093/jnci/22.4.719</pub-id><pub-id pub-id-type="pmid">13655060</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meisenberg</surname> <given-names>G.</given-names></name> <name><surname>Williams</surname> <given-names>A.</given-names></name></person-group> (<year>2008</year>). <article-title>Are acquiescent and extreme response styles related to low intelligence and education?</article-title> <source>Pers. Individ. Dif.</source> <volume>44</volume>, <fpage>1539</fpage>&#x02013;<lpage>1550</lpage>. <pub-id pub-id-type="doi">10.1016/j.paid.2008.01.010</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miller</surname> <given-names>T. R.</given-names></name> <name><surname>Spray</surname> <given-names>J. A.</given-names></name></person-group> (<year>1993</year>). <article-title>Logistic discriminant function analysis for DIF identification of polytomously scored items</article-title>. <source>J. Educ. Meas.</source> <volume>30</volume>, <fpage>107</fpage>&#x02013;<lpage>122</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3984.1993.tb01069.x</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moors</surname> <given-names>G.</given-names></name></person-group> (<year>2012</year>). <article-title>The effect of response style bias on the measurement of transformational, transactional, and laissez-faire leadership</article-title>. <source>Eur. J. Work Organ. Psychol.</source> <volume>21</volume>, <fpage>271</fpage>&#x02013;<lpage>298</lpage>. <pub-id pub-id-type="doi">10.1080/1359432X.2010.550680</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morren</surname> <given-names>M.</given-names></name> <name><surname>Gelissen</surname> <given-names>J.</given-names></name> <name><surname>Vermunt</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>The impact of controlling for extreme responding on measurement equivalence in cross-cultural research</article-title>. <source>Methodology</source> <volume>8</volume>, <fpage>159</fpage>&#x02013;<lpage>170</lpage>. <pub-id pub-id-type="doi">10.1027/1614-2241/a000048</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Muraki</surname> <given-names>E.</given-names></name></person-group> (<year>1992</year>). <article-title>A generalized partial credit model: application of an EM algorithm</article-title>. <source>Appl. Psychol. Meas.</source> <volume>16</volume>, <fpage>159</fpage>&#x02013;<lpage>176</lpage>. <pub-id pub-id-type="doi">10.1177/014662169201600206</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Narayanan</surname> <given-names>P.</given-names></name> <name><surname>Swaminathan</surname> <given-names>H.</given-names></name></person-group> (<year>1994</year>). <article-title>Performance of the Mantel-Haenszel and simultaneous item bias procedures for detecting differential item functioning</article-title>. <source>Appl. Psychol. Meas.</source> <volume>18</volume>, <fpage>315</fpage>&#x02013;<lpage>328</lpage>. <pub-id pub-id-type="doi">10.1177/014662169401800403</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Paulhus</surname> <given-names>D. L.</given-names></name></person-group> (<year>1991</year>). <article-title>Measurement and control of response bias</article-title>, in <source>Measurement of Personality and Social Psychological Attitudes</source>, eds <person-group person-group-type="editor"><name><surname>Robinson</surname> <given-names>J. P.</given-names></name> <name><surname>Shaver</surname> <given-names>P. R.</given-names></name> <name><surname>Wrightsman</surname> <given-names>L. S.</given-names></name></person-group> (<publisher-loc>San Diego, CA</publisher-loc>: <publisher-name>Academic Press</publisher-name>), <fpage>17</fpage>&#x02013;<lpage>59</lpage>.</citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pilkonis</surname> <given-names>P. A.</given-names></name> <name><surname>Choi</surname> <given-names>S. W.</given-names></name> <name><surname>Reise</surname> <given-names>S. P.</given-names></name> <name><surname>Stover</surname> <given-names>A. M.</given-names></name> <name><surname>Riley</surname> <given-names>W. T.</given-names></name> <name><surname>Cella</surname> <given-names>D.</given-names></name></person-group> (<year>2011</year>). <article-title>Item banks for measuring emotional distress from the patient-reported outcomes measurement information system (PROMIS&#x000AE;): depression, anxiety, and anger</article-title>. <source>Assessment</source> <volume>18</volume>, <fpage>263</fpage>&#x02013;<lpage>283</lpage>. <pub-id pub-id-type="doi">10.1177/1073191111411667</pub-id><pub-id pub-id-type="pmid">21697139</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rogers</surname> <given-names>H. J.</given-names></name> <name><surname>Swaminathan</surname> <given-names>H.</given-names></name></person-group> (<year>1993</year>). <article-title>A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning</article-title>. <source>Appl. Psychol. Meas.</source> <volume>17</volume>, <fpage>105</fpage>&#x02013;<lpage>116</lpage>. <pub-id pub-id-type="doi">10.1177/014662169301700201</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rossi</surname> <given-names>P. E.</given-names></name> <name><surname>Gilula</surname> <given-names>Z.</given-names></name> <name><surname>Allenby</surname> <given-names>G. M.</given-names></name></person-group> (<year>2001</year>). <article-title>Overcoming scale usage heterogeneity: a Bayesian hierarchical approach</article-title>. <source>J. Am. Stat. Assoc.</source> <volume>96</volume>, <fpage>20</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.2307/2670337</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>H.</given-names></name> <name><surname>Cai</surname> <given-names>H.</given-names></name> <name><surname>Brown</surname> <given-names>J. D.</given-names></name> <name><surname>Grimm</surname> <given-names>K. J.</given-names></name></person-group> (<year>2011</year>). <article-title>Differential item functioning of the Rosenberg Self-Esteem Scale in the US and China: measurement bias matters</article-title>. <source>Asian J. Soc. Psychol.</source> <volume>14</volume>, <fpage>176</fpage>&#x02013;<lpage>188</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-839X.2011.01347.x</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Su</surname> <given-names>Y.-H.</given-names></name> <name><surname>Wang</surname> <given-names>W.-C.</given-names></name></person-group> (<year>2005</year>). <article-title>Efficiency of the Mantel, generalized Mantel-Haenszel, and logistic discriminant function analysis methods in detecting differential item functioning for polytomous items</article-title>. <source>Appl. Meas. Educ.</source> <volume>18</volume>, <fpage>313</fpage>&#x02013;<lpage>350</lpage>. <pub-id pub-id-type="doi">10.1207/s15324818ame1804_1</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Swaminathan</surname> <given-names>H.</given-names></name> <name><surname>Rogers</surname> <given-names>H. J.</given-names></name></person-group> (<year>1990</year>). <article-title>Detecting differential item functioning using logistic regression procedures</article-title>. <source>J. Educ. Meas.</source> <volume>27</volume>, <fpage>361</fpage>&#x02013;<lpage>370</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3984.1990.tb00754.x</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thissen-Roe</surname> <given-names>A.</given-names></name> <name><surname>Thissen</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>). <article-title>A two-decision model for responses to Likert-type items</article-title>. <source>J. Educ. Behav. Stat.</source> <volume>38</volume>, <fpage>522</fpage>&#x02013;<lpage>547</lpage>. <pub-id pub-id-type="doi">10.3102/1076998613481500</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van Vaerenbergh</surname> <given-names>Y.</given-names></name> <name><surname>Thomas</surname> <given-names>T. D.</given-names></name></person-group> (<year>2013</year>). <article-title>Response styles in survey research: a literature review of antecedents, consequences, and remedies</article-title>. <source>Int. J. Public Opin. Res.</source> <volume>25</volume>, <fpage>195</fpage>&#x02013;<lpage>217</lpage>. <pub-id pub-id-type="doi">10.1093/ijpor/eds021</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>W.-C.</given-names></name></person-group> (<year>2004</year>). <article-title>Effects of anchor item methods on the detection of differential item functioning within the family of Rasch models</article-title>. <source>J. Exp. Educ.</source> <volume>72</volume>, <fpage>221</fpage>&#x02013;<lpage>261</lpage>. <pub-id pub-id-type="doi">10.3200/JEXE.72.3.221-261</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>W.-C.</given-names></name> <name><surname>Su</surname> <given-names>Y.-H.</given-names></name></person-group> (<year>2004</year>). <article-title>Factors influencing the Mantel and generalized Mantel-Haenszel methods for the assessment of differential item functioning in polytomous items</article-title>. <source>Appl. Psychol. Meas.</source> <volume>28</volume>, <fpage>450</fpage>&#x02013;<lpage>480</lpage>. <pub-id pub-id-type="doi">10.1177/0146621604269792</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="thesis"><person-group person-group-type="author"><name><surname>Weijters</surname> <given-names>B.</given-names></name></person-group> (<year>2006</year>). <source>Response Styles in Consumer Research.</source> Unpublished doctoral dissertation. <publisher-name>Ghent University</publisher-name>, <publisher-loc>Ghent</publisher-loc>.</citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weijters</surname> <given-names>B.</given-names></name> <name><surname>Geuens</surname> <given-names>M.</given-names></name> <name><surname>Schillewaert</surname> <given-names>N.</given-names></name></person-group> (<year>2008</year>). <article-title>Assessing response styles across modes of data collection</article-title>. <source>J. Acad. Market. Sci.</source> <volume>36</volume>, <fpage>409</fpage>&#x02013;<lpage>422</lpage>. <pub-id pub-id-type="doi">10.1007/s11747-007-0077-6</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wetzel</surname> <given-names>E.</given-names></name> <name><surname>B&#x000F6;hnke</surname> <given-names>J. R.</given-names></name> <name><surname>Carstensen</surname> <given-names>C. H.</given-names></name> <name><surname>Ziegler</surname> <given-names>M.</given-names></name> <name><surname>Ostendorf</surname> <given-names>F.</given-names></name></person-group> (<year>2013</year>). <article-title>Do individual response styles matter? Assessing differential item functioning for men and women in the NEO-PI-R</article-title>. <source>J. Individ. Dif.</source> <volume>34</volume>, <fpage>69</fpage>&#x02013;<lpage>81</lpage>. <pub-id pub-id-type="doi">10.1027/1614-0001/a000102</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wetzel</surname> <given-names>E.</given-names></name> <name><surname>L&#x000FC;dtke</surname> <given-names>O.</given-names></name> <name><surname>Zettler</surname> <given-names>I.</given-names></name> <name><surname>B&#x000F6;hnke</surname> <given-names>J. R.</given-names></name></person-group> (<year>2016</year>). <article-title>The stability of extreme response style and acquiescence over 8 years</article-title>. <source>Assessment</source> <volume>23</volume>, <fpage>279</fpage>&#x02013;<lpage>291</lpage>. <pub-id pub-id-type="doi">10.1177/1073191115583714</pub-id><pub-id pub-id-type="pmid">25986062</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zumbo</surname> <given-names>B. D.</given-names></name></person-group> (<year>1999</year>). <source>A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-type (ordinal) Item Scores</source>. <publisher-loc>Ottawa, ON</publisher-loc>: <publisher-name>Directorate of Human Resources Research and Evaluation; Department of National Defense</publisher-name>.</citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zwick</surname> <given-names>R.</given-names></name> <name><surname>Thayer</surname> <given-names>D. T.</given-names></name> <name><surname>Mazzeo</surname> <given-names>J.</given-names></name></person-group> (<year>1997</year>). <article-title>Descriptive and inferential procedures for assessing differential item functioning in polytomous items</article-title>. <source>Appl. Meas. Educ.</source> <volume>10</volume>, <fpage>321</fpage>&#x02013;<lpage>344</lpage>. <pub-id pub-id-type="doi">10.1207/s15324818ame1004_2</pub-id></citation>
</ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> This study was supported by a Strategic Research Grant (No. 7004507) from the City University of Hong Kong and an Early Career Scheme grant (CityU 21615416) from the Research Grants Council of the Hong Kong Special Administrative Region, China.</p>
</fn>
</fn-group>
</back>
</article>