<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3-mathml3.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" dtd-version="1.3" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Organ. Psychol.</journal-id>
<journal-title-group>
<journal-title>Frontiers in Organizational Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Organ. Psychol.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2813-771X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/forgp.2026.1787155</article-id>
<article-version article-version-type="Version of Record" vocab="NISO-RP-8-2008"/>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Mini Review</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Mini-review: considering impacts of artificial intelligence on the development of measurement scales</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Stanton</surname> <given-names>Jeffrey M.</given-names></name>
<xref ref-type="aff" rid="aff1"/>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Conceptualization" vocab-term-identifier="https://credit.niso.org/contributor-roles/conceptualization/">Conceptualization</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; review &amp; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-review-editing/">Writing &#x2013; review &#x00026; editing</role>
<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &#x2013; original draft" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-original-draft/">Writing &#x2013; original draft</role>
<uri xlink:href="https://loop.frontiersin.org/people/3344576"/>
</contrib>
</contrib-group>
<aff id="aff1"><institution>School of Information Studies, Syracuse University</institution>, <city>Syracuse, NY</city>, <country country="us">United States</country></aff>
<author-notes>
<corresp id="c001"><label>&#x0002A;</label>Correspondence: Jeffrey Stanton, <email xlink:href="mailto:jmstanto@syr.edu">jmstanto@syr.edu</email></corresp>
</author-notes>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2026-02-26">
<day>26</day>
<month>02</month>
<year>2026</year>
</pub-date>
<pub-date publication-format="electronic" date-type="collection">
<year>2026</year>
</pub-date>
<volume>4</volume>
<elocation-id>1787155</elocation-id>
<history>
<date date-type="received">
<day>13</day>
<month>01</month>
<year>2026</year>
</date>
<date date-type="rev-recd">
<day>09</day>
<month>02</month>
<year>2026</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>02</month>
<year>2026</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2026 Stanton.</copyright-statement>
<copyright-year>2026</copyright-year>
<copyright-holder>Stanton</copyright-holder>
<license>
<ali:license_ref start_date="2026-02-26">https://creativecommons.org/licenses/by/4.0/</ali:license_ref>
<license-p>This is an open-access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License (CC BY)</ext-link>. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</license-p>
</license>
</permissions>
<abstract>
<p>Numerous experiments are underway on uses of artificial intelligence in the design and validation of psychological scales, tests and measurements. New developments such as large language models offer novel tools for psychometric tasks such as item generation, validity assessment, and adaptive testing. Researchers have applied these technologies to routine psychometric tasks such as drafting candidate items as well as more nuanced activities such as making content validation judgments. While artificial intelligence holds promise for streamlining instrument development, several pitfalls exist with the potential of adversely impacting important processes such as employee selection and educational testing. This brief review cites recent literature to describe AI-supported content generation and analytics and includes consideration of data quality and ethical issues. The concluding discussion offers three suggestions for improving the responsible use of AI for psychometrics.</p></abstract>
<kwd-group>
<kwd>artificial intelligence</kwd>
<kwd>human in the loop</kwd>
<kwd>LLM</kwd>
<kwd>machine learning</kwd>
<kwd>psychometrics</kwd>
</kwd-group>
<funding-group>
 <funding-statement>The author(s) declared that financial support was not received for this work and/or its publication.</funding-statement>
</funding-group>
<counts>
<fig-count count="0"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="88"/>
<page-count count="7"/>
<word-count count="6289"/>
</counts>
<custom-meta-group>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Performance and Development</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Valid measurement instruments are a cornerstone of research in the social sciences. Traditionally, psychometric development has relied on teamwork, iterative testing, expert judgment, and statistical analysis grounded in frameworks such as classical test theory. These methods, while backed by decades of research, take time and are limited by the resource capacity of a research project. Despite progress in reusing validated scales, researchers still encounter challenges in building and adapting measures to suit project needs (<xref ref-type="bibr" rid="B34">Heggestad et al., 2019</xref>). In recent years, artificial intelligence (AI) technologies&#x02014;particularly large language models (LLMs) and machine learning (ML) algorithms&#x02014;have begun to augment item, scale, and instrument development (<xref ref-type="bibr" rid="B8">Beghetto et al., 2025</xref>; <xref ref-type="bibr" rid="B87">Zhao et al., 2025</xref>; <xref ref-type="bibr" rid="B72">Tan et al., 2024</xref>).</p>
<p>In brief, typical AI technologies comprise a huge array of interconnected mathematical structures known as artificial neurons whose coefficients can be successively adjusted by subjecting them to large amounts of training data. Research applying these structures to the production of natural language achieved a breakthrough with the 2017 introduction of the transformer architecture, which underpins &#x0201C;generative pre-trained transformer&#x0201D; (GPT) models (Wang Z. et al., <xref ref-type="bibr" rid="B81">2025</xref>). LLMs, trained on vast corpora of text, generate convincing linguistic outputs when prompted with the definition of a psychological construct provided by a researcher (<xref ref-type="bibr" rid="B44">Kim and Lee, 2024</xref>; <xref ref-type="bibr" rid="B48">Lee et al., 2025</xref>; <xref ref-type="bibr" rid="B72">Tan et al., 2024</xref>; <xref ref-type="bibr" rid="B73">Terry et al., 2025</xref>). ML techniques can predict factor structures, estimate reliability prior to collection of response data, and identify patterns in item responses that may elude traditional statistics (<xref ref-type="bibr" rid="B44">Kim and Lee, 2024</xref>; <xref ref-type="bibr" rid="B48">Lee et al., 2025</xref>; <xref ref-type="bibr" rid="B68">Stanton et al., 2024</xref>). Such innovations suggest a future where LLMs and ML might make psychometric tasks easier and quicker.</p>
<p>Unsurprisingly, there is no free lunch. AI also creates pitfalls and unintended consequences. Generative AI provides little purchase for a human who wants to understand how the technology works. As a result, assumptions embedded in AI models often cannot be directly evaluated. For example, when AI selects or discards items from a scale, it is difficult to assess whether underlying assumptions are fulfilled (<xref ref-type="bibr" rid="B36">Hernandez and Nie, 2023</xref>; <xref ref-type="bibr" rid="B60">Qu et al., 2025</xref>). These interpretability barriers raise questions about validity, replicability, and defensibility of instruments developed with AI (e.g., <xref ref-type="bibr" rid="B14">Circi et al., 2023</xref>). Ethical considerations also arise in the use of AI-based methods. The immense amounts of data harvested for AI training may contain copyrighted material or inadvertently include sensitive content such as personal information (<xref ref-type="bibr" rid="B41">Jiao et al., 2025</xref>). Training data may also reflect the inaccuracies, biases, and prejudices present in everyday speech (<xref ref-type="bibr" rid="B30">Hagerty and Rubinov, 2019</xref>; <xref ref-type="bibr" rid="B64">Sang and Stanton, 2020</xref>; <xref ref-type="bibr" rid="B74">Veale and Binns, 2017</xref>).</p>
<p>This review briefly synthesizes recent literature to examine how AI serves various stages of psychometrics, including item generation, validation, and analysis. The review highlights benefits, such as increased efficiency, as well as pitfalls, including lack of transparency, ethical concerns, and the risk of obscuring psychometric assumptions. Drawing from articles in psychology and education, as well as technical papers from fields like natural language processing, this review evaluates AI&#x00027;s emerging role in supporting social science measurement. The review is organized into themes addressing content generation, psychometric analytics, data quality, and ethical considerations, and concludes with a discussion.</p></sec>
<sec id="s2">
<title>Content generation</title>
<p>One promising application of AI in social science measurement lies in the use of large language models (LLMs) for early-stage psychometric tasks such as item generation. LLMs can produce a large set of candidate survey items in the few moments it takes a researcher to write a suitable prompt. Studies have demonstrated that LLMs can produce syntactically and grammatically correct texts that semantically align with target constructs. <xref ref-type="bibr" rid="B72">Tan et al. (2024)</xref> listed 60 distinct research efforts to generate educational assessment items. <xref ref-type="bibr" rid="B10">Bhandari et al. (2024)</xref> conducted psychometric evaluations suggesting that AI-generated items matched the performance of human-written items.</p>
<p>One pitfall of using LLMs for this purpose is semantic drift, where generated items may deviate from the intended construct due to the probabilistic nature of language modeling. In typical configurations, most LLMs &#x0201C;sample&#x0201D; from a set of probable responses. AI-generated items thus reflect stochastic pattern matching rather than an epistemological connection to a defined theoretical construct. This disconnect arises in part from the absence of world knowledge in AI systems: they lack an understanding of psychological theory, cultural nuance, and the context of test administration (<xref ref-type="bibr" rid="B19">Ferdaus et al., 2026</xref>; <xref ref-type="bibr" rid="B58">Owan et al., 2023</xref>). Semantic drift may lead to AI-generated items that exhibit face validity without meeting statistical quality criteria (<xref ref-type="bibr" rid="B60">Qu et al., 2025</xref>; <xref ref-type="bibr" rid="B67">Speer et al., 2025</xref>). Replicability problems can be corrected in some AI models with appropriate configuration changes, but with the next release of the LLM it will once again become difficult to reproduce earlier outputs.</p>
<p>In addition to new items, AI can also support the creation of content for alternate test forms. Educational testing researchers have experimented with automatic item generation even prior to the advent of LLMs (<xref ref-type="bibr" rid="B21">Gierl and Lai, 2012</xref>; <xref ref-type="bibr" rid="B24">Gorgun and Bulut, 2024</xref>). Using appropriate prompt engineering, LLMs can generate alternative items that attempt to assess an existing construct, skill, or knowledge area (<xref ref-type="bibr" rid="B72">Tan et al., 2024</xref>). For example, Wang L. et al. <xref ref-type="bibr" rid="B79">(2025)</xref> developed a system that incorporated contextual knowledge from teachers into an item generation prompt to enhance the veracity of LLM-generated items. <xref ref-type="bibr" rid="B12">Burke (2025)</xref> generated alternative test forms through an iterative process of prompt development including refinements like this: &#x0201C;Exams should test the following learning objectives: procedural fluency, emergent statistical thinking, and explanatory literacy&#x02026;&#x0201D; <xref ref-type="bibr" rid="B44">Kim and Lee (2024)</xref> and <xref ref-type="bibr" rid="B67">Speer et al. (2025)</xref> cautioned that empirical verification of the equivalence of alternate forms generated with the help of AI is a crucial analytic step.</p>
<p>The development of translated items applies a &#x0201C;cross language&#x0201D; LLM to creating items for a foreign language version of a scale. Such systems specialize in particular language pairs, enabling translations that attempt to preserve, in the target language, the definition of the construct reflected in the original items (<xref ref-type="bibr" rid="B25">Grobelny et al., 2025</xref>; <xref ref-type="bibr" rid="B78">Wang and Duan, 2024</xref>). Some tools create workflows that combine machine-generated drafts with human editing, a hybrid approach that may improve translation quality (<xref ref-type="bibr" rid="B35">Herbig et al., 2019</xref>). Semantic shifts induced by automated translation may affect item functioning across cultures, potentially compromising validity (<xref ref-type="bibr" rid="B83">Xu et al., 2024</xref>).</p>
<p>An emerging application of LLMs creates data sets of artificial item responses using &#x0201C;silicon respondents&#x0201D; (<xref ref-type="bibr" rid="B27">Gurdil et al., 2024</xref>; <xref ref-type="bibr" rid="B52">Liu et al., 2025</xref>; <xref ref-type="bibr" rid="B70">Sun et al., 2024</xref>). For example, <xref ref-type="bibr" rid="B31">H&#x000E4;m&#x000E4;l&#x000E4;inen et al. (2023)</xref> conducted a series of three experiments to evaluate the usefulness of participant data generated by an LLM and concluded that such data was of sufficiently high quality to include in pilot testing and experiment design efforts. <xref ref-type="bibr" rid="B69">Suh et al. (2025)</xref> reached a similar conclusion in an experiment on public opinion data. <xref ref-type="bibr" rid="B53">Lutz et al. (2025)</xref> examined prompting strategies and found that LLMs had difficulty generating data representing responses of minority groups. <xref ref-type="bibr" rid="B57">Neumann et al. (2025)</xref> evaluated nine LLMs&#x00027; ability to generate opinion data and found substantial variation in data quality by model.</p>
<p>In a parallel vein to LLMs, researchers have explored AI-based image generation for psychological assessments, such as facial expression scales and projective tests (<xref ref-type="bibr" rid="B46">Koweszko et al., 2025</xref>; <xref ref-type="bibr" rid="B51">Lin et al., 2025</xref>; <xref ref-type="bibr" rid="B77">Villegas-Ch et al., 2025</xref>). AI-generated inkblot-like stimuli have been examined as alternatives to traditional projective materials (<xref ref-type="bibr" rid="B4">Alsabouni and Ihara, 2023</xref>; <xref ref-type="bibr" rid="B49">Li et al., 2025</xref>). Researchers have also tested AI tools that assist with the evaluation of projective test responses created by participants (<xref ref-type="bibr" rid="B9">Beltoft et al., 2026</xref>). These innovations raise concerns similar to those surrounding LLM-based item generation: stimuli for projective tests require variety, whereas generative AI systems ingest graphical training data in ways that may produce homogeneous outputs (<xref ref-type="bibr" rid="B2">AlDahoul et al., 2025</xref>; <xref ref-type="bibr" rid="B55">Messingschlager and Appel, 2025</xref>).</p>
<sec id="s3">
<title>Psychometric analytics</title>
<p>Traditionally, scale validation procedures have relied on step-by-step application of statistical procedures such as reliability estimation and factor analysis. These methods are usually grounded in psychometric frameworks like item response theory (IRT). Recently, researchers have applied methods from a domain of AI known as machine learning (ML) to predict factor structures, assess internal consistency, and identify item clusters (e.g., <xref ref-type="bibr" rid="B23">Goretzko and B&#x000FC;hner, 2020</xref>; <xref ref-type="bibr" rid="B68">Stanton et al., 2024</xref>). Unlike traditional statistical methods, ML generally does not embed data assumptions such as normality or linearity (<xref ref-type="bibr" rid="B22">Giudici et al., 2025</xref>). ML can handle atypical data types, such as graphical and linguistic data. This flexibility has advantages, for example when working with unruly distributions, but also raises transparency and interpretability questions. For example, when an ML model predicts the reliability of a scale, the criteria or features it uses to make that judgment may ignore the theoretical underpinnings of scale development and may hinder replication efforts (<xref ref-type="bibr" rid="B88">Zhuang et al., 2025</xref>). Basic assumptions, such as unidimensionality or local independence, often remain unexamined and may not hold in the data being analyzed.</p>
<p>ML algorithms have been applied to individual problems of prediction and classification for decades (e.g., <xref ref-type="bibr" rid="B15">Cortes and Vapnik, 1995</xref>), but these algorithms represent just a subset of what is now termed AI. Taking a broader perspective, AI-assisted workflows are increasingly being tested to automate chains of tasks that would more usually be conducted in separate steps by a human analyst. These workflows integrate machine learning models, statistical code generation, and natural language interfaces to streamline procedures such as factor analysis. For example, <xref ref-type="bibr" rid="B88">Zhuang et al. (2025)</xref> described a framework that applied adaptive testing principles to AI model evaluation. <xref ref-type="bibr" rid="B3">Almheiri et al. (2024)</xref> described an application where the LLM recommended a sequence of spreadsheet-based analyses. Similarly, AI has been used to streamline statistical modeling (<xref ref-type="bibr" rid="B65">Sharpnack et al., 2024</xref>; <xref ref-type="bibr" rid="B85">Yancey et al., 2024</xref>; <xref ref-type="bibr" rid="B86">Zhan et al., 2025</xref>), mainly through code generation for R and/or Python.</p>
<p>Automated workflows raise concerns about rigor and replicability (<xref ref-type="bibr" rid="B16">Duque et al., 2023</xref>). When human analysts conduct a factor analysis, for example, opportunities arise for evaluating assumptions and crosschecking. AI analytics, in their present development stage, tend to obscure the rationales behind decisions, predictions, and estimates (<xref ref-type="bibr" rid="B86">Zhan et al., 2025</xref>). The likelihood that a na&#x000EF;ve user could generate nonsensical results may be unacceptably high until the automation also offers built-in crosschecking and tools to flag violated assumptions (<xref ref-type="bibr" rid="B88">Zhuang et al., 2025</xref>).</p>
<p>LLMs and supervised ML classification algorithms also offer new avenues for approaching content validation. Traditionally, content validation involves expert judgment of whether items represent the construct being measured (<xref ref-type="bibr" rid="B33">Haynes et al., 1995</xref>). AI can augment content validation by assessing the semantic coherence, construct coverage, and linguistic appropriateness of item pools. For example, researchers have assessed how LLMs evaluate alignment of items with construct definitions (<xref ref-type="bibr" rid="B25">Grobelny et al., 2025</xref>; <xref ref-type="bibr" rid="B47">Larsen et al., 2024</xref>). Additionally, natural language processing techniques have been used to identify redundancies, gaps, or ambiguities in item sets (<xref ref-type="bibr" rid="B61">Reimers and Gurevych, 2019</xref>). ML algorithms can also predict human categorization of items, offering probabilistic assessments that complement human judgment (<xref ref-type="bibr" rid="B1">Ahmad et al., 2020</xref>). While these approaches enhance efficiency, they may also reinforce biases present in training data (<xref ref-type="bibr" rid="B7">Barocas et al., 2019</xref>, p. 7).</p>
<p>Beyond instrument development and validation, AI can also assist with test administration. Adaptive testing, which dynamically adjusts item difficulty based on a respondent&#x00027;s performance on previously administered items, can reduce test administration time and increase measurement precision. AI has been tested for real-time item selection and scoring. For example, <xref ref-type="bibr" rid="B56">Mujtaba and Mahapatra (2020)</xref> described AI-driven item sequencing to maximize precision while minimizing test length. Other research has used AI to tailor assessments both to ability levels and individual item response patterns (<xref ref-type="bibr" rid="B17">El Msayer et al., 2024</xref>; <xref ref-type="bibr" rid="B20">Gan et al., 2019</xref>). These systems raise concerns about transparency and fairness, because the algorithms typically make item selections without documentation of the psychometric rationale.</p>
<p>As a final topic in this section, researchers have also applied AI directly to common, time-consuming measurement tasks. For example, open-ended survey responses can offer rich qualitative insights into participants&#x00027; beliefs, attitudes, personality, and reasoning processes, but their analysis has traditionally been labor-intensive (<xref ref-type="bibr" rid="B18">Fan et al., 2023</xref>; <xref ref-type="bibr" rid="B71">Sun, 2021</xref>). AI can automate scoring and classification of open-ended responses by recognizing patterns in textual data and assigning scores or categories (<xref ref-type="bibr" rid="B28">Gweon and Schonlau, 2022</xref>). Similarly, LLMs can automatically assign grades and generate feedback for short answers and essays. These models have achieved low error rates in predicting human-assigned grades and have generated written feedback judged to be similar in quality to that of human experts (<xref ref-type="bibr" rid="B43">Katuka et al., 2024</xref>).</p></sec>
<sec id="s4">
<title>Data quality</title>
<p>Given contemporary methods used to train AI models, developers may inadvertently create inappropriate item content when using LLMs and other generative AI tools. Researchers have long recognized that LLMs exhibit biases present in training data (<xref ref-type="bibr" rid="B38">Hovy and Prabhumoye, 2021</xref>; <xref ref-type="bibr" rid="B55">Messingschlager and Appel, 2025</xref>; <xref ref-type="bibr" rid="B63">Salinas et al., 2023</xref>). These biases can lead to output that participants may consider offensive (<xref ref-type="bibr" rid="B39">Huang and Huang, 2025</xref>). In the same vein, LLMs are known to &#x0201C;hallucinate&#x0201D; in ways that create counterfactual outputs (<xref ref-type="bibr" rid="B13">Chen et al., 2023</xref>). Developers have tested safety controls to limit the generation of inappropriate output, but research shows that these can be readily circumvented (<xref ref-type="bibr" rid="B40">Hundt et al., 2025</xref>).</p>
<p>Current LLM training processes require vast amounts of data (<xref ref-type="bibr" rid="B76">Villalobos et al., 2024</xref>). Companies harvest these data from the internet, but at the same time increasing amounts of internet content are themselves AI-generated. Thus, another training-related concern is &#x0201C;AI-slop&#x0201D;: degradation of training data and model quality caused by recursive ingestion of AI-generated material (<xref ref-type="bibr" rid="B26">Gross and Colson, 2025</xref>). This problem can amplify biases and cause stylistic homogenization (<xref ref-type="bibr" rid="B19">Ferdaus et al., 2026</xref>; <xref ref-type="bibr" rid="B32">Han, 2025</xref>), undermining the richness needed for full representation of a construct domain (<xref ref-type="bibr" rid="B66">Smith and Stanton, 1998</xref>). Poor data quality may also complicate validation workflows: when decision-making models are trained or tested on data that includes synthetic responses or AI-generated items, generalizability of the results may be compromised. This is particularly troubling in adaptive testing and IRT modeling, where item-level precision is essential. If an item pool is polluted with low-quality or derivative content, resulting trait estimates may be misleading (<xref ref-type="bibr" rid="B5">Bachmann et al., 2024</xref>; <xref ref-type="bibr" rid="B45">Kingston and Kramer, 2013</xref>).</p>
<sec id="s5">
<title>Ethical considerations</title>
<p>As AI becomes integrated into psychometric tasks, ethical concerns have emerged. The convenience of using automation in research processes has the potential to reduce researcher oversight (<xref ref-type="bibr" rid="B84">Xu and Peng, 2025</xref>). One pressing issue is algorithmic bias, which can arise when AI systems make decisions based on training data that reflect historical inequities. As AI systems take on tasks traditionally performed by humans&#x02014;such as item writing&#x02014;there is a risk that researchers may defer too readily to automated outputs. This could lead to a diminished role for critical thinking in scale development and deployment, such that AI-based testing reinforces stereotypes or excludes diverse cultural perspectives.</p>
<p>Ethical concerns also arise regarding privacy and autonomy, especially when AI systems infer psychological traits (<xref ref-type="bibr" rid="B44">Kim and Lee, 2024</xref>; <xref ref-type="bibr" rid="B54">Menard and Bott, 2025</xref>). In cases where AI infers personality traits or predicts future behavior, there is a risk of violating participant autonomy by analyzing personal data in the absence of informed consent by affected individuals (<xref ref-type="bibr" rid="B44">Kim and Lee, 2024</xref>). Researchers must also ensure that data handling&#x02014;for training data, input prompts, and output data&#x02014;complies with ethical standards and legal regulations, such as GDPR and HIPAA, and that participants affected by AI-based measures are informed about the role of AI in the research process and in testing services.</p>
<p>Moreover, because the internal mechanisms by which LLMs generate items are opaque, researchers may be unable to explain why certain outputs were produced, raising concerns about the rigor and legal defensibility of measures. Unlike traditional methods, which allow researchers to assess and adjust for bias through statistical procedures, AI systems make it difficult to detect or correct adverse patterns (<xref ref-type="bibr" rid="B60">Qu et al., 2025</xref>; <xref ref-type="bibr" rid="B88">Zhuang et al., 2025</xref>). These issues pose challenges for peer review, replication, ethical oversight, and informed consent, especially when AI is used in processes like educational placement and employee selection.</p>
<sec sec-type="discussion" id="s6">
<title>Discussion</title>
<p>Integration of AI into development and validation of measurement tools heralds the prospect of rapid methodological evolution for social science research. As this review suggests, AI tools show potential to enhance item generation, automate analysis, and support adaptive testing, among other tasks. These innovations enhance what a research team can accomplish with limited resources (<xref ref-type="bibr" rid="B44">Kim and Lee, 2024</xref>; <xref ref-type="bibr" rid="B80">Wang et al., 2024</xref>) but also introduce concerns that must be addressed to ensure the integrity and ethical use of AI in psychometrics.</p>
<p>A recurring theme throughout this review is that, while current AI systems can assist with many psychometric tasks, they do not do so transparently. Current systems may replicate traditional psychometric procedures in some contexts, but they typically cannot reveal the assumptions or decision rules underlying their outputs. This opacity raises questions about validity, replicability, and accountability&#x02014;core principles of psychometrics (<xref ref-type="bibr" rid="B11">Bringsjord, 2011</xref>; <xref ref-type="bibr" rid="B75">Veldkamp, 2023</xref>). The risk is particularly acute in tasks such as data analysis, where untested or invisible assumptions may distort results.</p>
<p>Another critical issue is the epistemological gap between AI systems and human researchers. AI models, despite linguistic fluency, lack world knowledge and contextual understanding. This deficit affects the ability of an AI model to generate items that are not simply grammatically correct but also societally and culturally appropriate. Consistent alignment between AI model outputs, theoretical frameworks, and ethical standards for the conduct of research remains an important future goal.</p>
<p>Ethical concerns compound current technical challenges. Bias in training data, lack of cultural sensitivity, and erosion of researcher oversight all threaten to undermine the fairness and defensibility of instruments created and/or administered with the assistance of AI (<xref ref-type="bibr" rid="B42">Kassir et al., 2023</xref>; <xref ref-type="bibr" rid="B56">Mujtaba and Mahapatra, 2020</xref>). Moreover, AI-slop may pose an eventual risk to the diversity and quality of psychometric tools (<xref ref-type="bibr" rid="B29">Hagendorff, 2021</xref>). Without careful curation of training data, AI-generated items may become increasingly homogenized, atheoretical, and ineffective in capturing psychological constructs.</p>
<p>Three mitigation strategies may help realize the benefits of AI in psychometrics: human-in-the-loop design, rigorous training data curation, and transparency engineering. First, experiments with AI-enabled systems suggest that interface designs can provide checkpoints for human validation of outputs and pending decisions (<xref ref-type="bibr" rid="B12">Burke, 2025</xref>; <xref ref-type="bibr" rid="B48">Lee et al., 2025</xref>; <xref ref-type="bibr" rid="B82">Wu et al., 2022</xref>; <xref ref-type="bibr" rid="B62">Rothschild et al., 2025</xref>). This design strategy, commonly known as human-in-the-loop, casts the AI as a subordinate team member in an interactive effort to achieve a project goal.</p>
<p>The second strategy, rigorous data and tool curation, might require cooperation by professional societies, publishers, and research funders. Heretofore, curation of training and evaluation data for LLMs and other AI technologies has often been left to the commercial firms that create AI models. Fine-tuning, a process in which additional training data (often from a specialized domain) is applied to a pretrained base model, allows the model to perform tasks that a general-purpose model cannot (<xref ref-type="bibr" rid="B37">Hommel and Arslan, 2025</xref>). Shared benchmarking data and tools are needed to support fine-tuning and to evaluate the performance of the resulting system. To support contemporary scientific standards, a concerted effort is needed by the social science community to curate high quality AI training data and evaluation tools.</p>
<p>Finally, emerging research suggests that the transparency challenges exhibited by current AI models may eventually be addressed through new capabilities (<xref ref-type="bibr" rid="B6">Bacon and Menon, 2025</xref>; <xref ref-type="bibr" rid="B50">Liao and Vaughan, 2023</xref>; <xref ref-type="bibr" rid="B59">Pigac, 2025</xref>). With appropriate engineering, an AI model can communicate to users the internal assumptions used in a computation as well as the uncertainty surrounding its estimates and outputs. Such features could eventually include assumption-checking protocols, post hoc simulation of distributional properties, and comparison/validation of results by multiple AI models. As an example of the latter, <xref ref-type="bibr" rid="B52">Liu et al. (2025)</xref> published research suggesting that an ensemble of LLMs produces better results than a single LLM working alone (see also, <xref ref-type="bibr" rid="B48">Lee et al., 2025</xref>). The concern for transparency extends to reports by human researchers using AI: we need scientific reporting standards that document model versions, configurations, training procedures, and known limitations.</p>
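The ensemble-comparison idea can be sketched as an agreement check across independent model judgments, with low agreement surfaced as uncertainty and routed to human review. The function name, labels, and threshold below are illustrative assumptions for this sketch, not the procedure used in the cited ensemble studies.

```python
from collections import Counter

def ensemble_verdict(model_labels, min_agreement=0.6):
    """Combine judgments from several models on one item.

    Returns the majority label, the fraction of models agreeing with it,
    and a flag marking low-agreement items for human review (a crude
    stand-in for communicating uncertainty about an output).
    """
    counts = Counter(model_labels)
    label, freq = counts.most_common(1)[0]
    agreement = freq / len(model_labels)
    return {
        "label": label,
        "agreement": agreement,
        "needs_review": agreement < min_agreement,
    }
```

For instance, three models rating a draft item as "keep", "keep", "revise" yield two-thirds agreement, while three disjoint ratings yield one-third agreement and trigger the review flag; richer schemes could weight models by calibration or report full label distributions.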
<p>Despite the challenges outlined above, AI-assisted psychometrics promises to enhance the efficiency and scope of social science measurement tasks. When combined with traditional methods and subjected to human oversight, AI tools can support efficient and ethical approaches to measurement. The future of AI in psychometrics must be guided by a hybrid approach&#x02014;one that leverages the strengths of AI while preserving the judgment, expertise, and ethical accountability of human researchers. Transparent documentation, assumption-checking protocols, and interdisciplinary collaboration will be essential to ensure that AI tools enhance rather than compromise the rigor of item, scale, and instrument development in the social sciences.</p>
</body>
<back>
<sec sec-type="author-contributions" id="s7">
<title>Author contributions</title>
<p>JS: Conceptualization, Writing &#x02013; review &#x00026; editing, Writing &#x02013; original draft.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="ai-statement" id="s9">
<title>Generative AI statement</title>
<p>The author(s) declared that generative AI was not used in the creation of this manuscript.</p>
<p>Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.</p></sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ahmad</surname> <given-names>F.</given-names></name> <name><surname>Abbasi</surname> <given-names>A.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Dobolyi</surname> <given-names>D. G.</given-names></name> <name><surname>Netemeyer</surname> <given-names>R. G.</given-names></name> <name><surname>Clifford</surname> <given-names>G. D.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>A deep learning architecture for psychometric natural language processing</article-title>. <source>ACM Trans. Inf. Syst.</source> <volume>38</volume>:<fpage>6</fpage>. doi: <pub-id pub-id-type="doi">10.1145/3365211</pub-id></mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>AlDahoul</surname> <given-names>N.</given-names></name> <name><surname>Rahwan</surname> <given-names>T.</given-names></name> <name><surname>Zaki</surname> <given-names>Y.</given-names></name></person-group> (<year>2025</year>). <article-title>AI-generated faces influence gender stereotypes and racial homogenization</article-title>. <source>Sci. Rep.</source> <volume>15</volume>:<fpage>14449</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-025-99623-3</pub-id><pub-id pub-id-type="pmid">40281283</pub-id></mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Almheiri</surname> <given-names>S. M. A. A.</given-names></name> <name><surname>AlAnsari</surname> <given-names>M.</given-names></name> <name><surname>AlHashmi</surname> <given-names>J.</given-names></name> <name><surname>Abdalmajeed</surname> <given-names>N.</given-names></name> <name><surname>Jalil</surname> <given-names>M.</given-names></name> <name><surname>Ertek</surname> <given-names>G.</given-names></name></person-group> (<year>2024</year>). <article-title>&#x0201C;Data analytics with large language models (LLM): a novel prompting framework,&#x0201D;</article-title> in <source>International Conference on Business Analytics in Practice</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer Nature Switzerland</publisher-name>), <fpage>243</fpage>&#x02013;<lpage>255</lpage>.</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Alsabouni</surname> <given-names>I.</given-names></name> <name><surname>Ihara</surname> <given-names>H.</given-names></name></person-group> (<year>2023</year>). <article-title>&#x0201C;Utilizing ambiguous visual stimuli for creative expression in collaborative teamwork,&#x0201D;</article-title> in <source>IASDR 2023: Life-Changing Design</source>, eds. D. De Sainz Molestina, L. Galluzzo, F. Rizzo, and D. Spallazzo (Milan: IASDR).</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bachmann</surname> <given-names>D.</given-names></name> <name><surname>van der Wal</surname> <given-names>O.</given-names></name> <name><surname>Chvojka</surname> <given-names>E.</given-names></name> <name><surname>Zuidema</surname> <given-names>W. H.</given-names></name> <name><surname>van Maanen</surname> <given-names>L.</given-names></name> <name><surname>Schulz</surname> <given-names>K.</given-names></name></person-group> (<year>2024</year>). <article-title>fl-IRT-ing with psychometrics to improve NLP bias measurement</article-title>. <source>Minds Mach.</source> <volume>34</volume>:<fpage>37</fpage>. doi: <pub-id pub-id-type="doi">10.1007/s11023-024-09695-9</pub-id></mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Bacon</surname> <given-names>G.</given-names></name> <name><surname>Menon</surname> <given-names>V.</given-names></name></person-group> (<year>2025</year>). <article-title>&#x0201C;Exploring transparency and AI assessment in LLM-assisted research applications,&#x0201D;</article-title> in <source>SoutheastCon 2025</source> (<publisher-loc>Charlotte, NC</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>346</fpage>&#x02013;<lpage>351</lpage>.</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="web"><person-group person-group-type="author"><name><surname>Barocas</surname> <given-names>S.</given-names></name> <name><surname>Hardt</surname> <given-names>M.</given-names></name> <name><surname>Narayanan</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <source>Fairness and machine learning: limitations and opportunities</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://fairmlbook.org/">https://fairmlbook.org/</ext-link> (Accessed February 9, 2026).</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Beghetto</surname> <given-names>R. A.</given-names></name> <name><surname>Ross</surname> <given-names>W.</given-names></name> <name><surname>Karwowski</surname> <given-names>M.</given-names></name> <name><surname>Gl&#x00103;veanu</surname> <given-names>V. P.</given-names></name></person-group> (<year>2025</year>). <article-title>Partnering with AI for instrument development: possibilities and pitfalls</article-title>. <source>New Ideas Psychol.</source> <volume>76</volume>:<fpage>101121</fpage>.</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Beltoft</surname> <given-names>S. L.</given-names></name> <name><surname>Nielsen</surname> <given-names>J.</given-names></name> <name><surname>Askegaard</surname> <given-names>S.</given-names></name> <name><surname>Schneider-Kamp</surname> <given-names>A.</given-names></name></person-group> (<year>2026</year>). <article-title>Drawing out what they struggle to say: AI-augmented analysis of projective techniques in qualitative health research</article-title>. <source>Int. J. Qual. Methods</source> <volume>25</volume>:<fpage>16094069261417903</fpage>. doi: <pub-id pub-id-type="doi">10.1177/16094069261417903</pub-id></mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bhandari</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Kwak</surname> <given-names>Y.</given-names></name> <name><surname>Pardos</surname> <given-names>Z. A.</given-names></name></person-group> (<year>2024</year>). <article-title>Evaluating the psychometric properties of ChatGPT-generated questions</article-title>. <source>Comput. Educ. Artif. Intell.</source> <volume>7</volume>:<fpage>100284</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.caeai.2024.100284</pub-id></mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bringsjord</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Psychometric artificial intelligence</article-title>. <source>J. Exp. Theor. Artif. Intell.</source> <volume>23</volume>, <fpage>271</fpage>&#x02013;<lpage>277</lpage>. doi: <pub-id pub-id-type="doi">10.1080/0952813X.2010.502314</pub-id></mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Burke</surname> <given-names>C. M.</given-names></name></person-group> (<year>2025</year>). <article-title>AI-assisted exam variant generation: a human-in-the-loop framework for automatic item creation</article-title>. <source>Educ. Sci.</source> <volume>15</volume>:<fpage>1029</fpage>. doi: <pub-id pub-id-type="doi">10.3390/educsci15081029</pub-id></mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Fu</surname> <given-names>Q.</given-names></name> <name><surname>Yuan</surname> <given-names>Y.</given-names></name> <name><surname>Wen</surname> <given-names>Z.</given-names></name> <name><surname>Fan</surname> <given-names>G.</given-names></name> <name><surname>Liu</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2023</year>). <article-title>&#x0201C;Hallucination detection: robustly discerning reliable answers in large language models,&#x0201D;</article-title> in <source>Proceedings of the 32nd ACM International Conference on Information and Knowledge Management</source> (<publisher-loc>Birmingham</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>245</fpage>&#x02013;<lpage>255</lpage>.</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Circi</surname> <given-names>R.</given-names></name> <name><surname>Hicks</surname> <given-names>J.</given-names></name> <name><surname>Sikali</surname> <given-names>E.</given-names></name></person-group> (<year>2023</year>). <article-title>Automatic item generation: foundations and machine learning-based approaches for assessments</article-title>. <source>Front. Educ.</source> <volume>8</volume>:<fpage>858273</fpage>. doi: <pub-id pub-id-type="doi">10.3389/feduc.2023.858273</pub-id></mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cortes</surname> <given-names>C.</given-names></name> <name><surname>Vapnik</surname> <given-names>V.</given-names></name></person-group> (<year>1995</year>). <article-title>Support-vector networks</article-title>. <source>Mach. Learn.</source> <volume>20</volume>, <fpage>273</fpage>&#x02013;<lpage>297</lpage>. doi: <pub-id pub-id-type="doi">10.1023/A:1022627411411</pub-id></mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Duque</surname> <given-names>A.</given-names></name> <name><surname>Syed</surname> <given-names>A.</given-names></name> <name><surname>Day</surname> <given-names>K. V.</given-names></name> <name><surname>Berry</surname> <given-names>M. J.</given-names></name> <name><surname>Katz</surname> <given-names>D. S.</given-names></name> <name><surname>Kindratenko</surname> <given-names>V. V.</given-names></name></person-group> (<year>2023</year>). <article-title>Leveraging large language models to build and execute computational workflows</article-title>. <source>arXiv [Preprint]</source>. arXiv:2312.07711. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2312.07711</pub-id></mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>El Msayer</surname> <given-names>M.</given-names></name> <name><surname>Aoula</surname> <given-names>E. S.</given-names></name> <name><surname>Bouihi</surname> <given-names>B.</given-names></name></person-group> (<year>2024</year>). <article-title>&#x0201C;Artificial intelligence in computerized adaptive testing to assess the cognitive performance of students: a systematic review,&#x0201D;</article-title> in <source>2024 International Conference on Intelligent Systems and Computer Vision (ISCV)</source> (<publisher-loc>Fez</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>8</lpage>.</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fan</surname> <given-names>J.</given-names></name> <name><surname>Sun</surname> <given-names>T.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Zhao</surname> <given-names>T.</given-names></name> <name><surname>Zhang</surname> <given-names>B.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2023</year>). <article-title>How well can an AI chatbot infer personality? Examining psychometric properties of machine-inferred personality scores</article-title>. <source>J. Appl. Psychol.</source> <volume>108</volume>:<fpage>1277</fpage>. doi: <pub-id pub-id-type="doi">10.1037/apl0001082</pub-id><pub-id pub-id-type="pmid">36745068</pub-id></mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ferdaus</surname> <given-names>M. M.</given-names></name> <name><surname>Abdelguerfi</surname> <given-names>M.</given-names></name> <name><surname>Loup</surname> <given-names>E.</given-names></name> <name><surname>Niles</surname> <given-names>K. N.</given-names></name> <name><surname>Pathak</surname> <given-names>K.</given-names></name> <name><surname>Sloan</surname> <given-names>S.</given-names></name></person-group> (<year>2026</year>). <article-title>Towards trustworthy AI: a review of ethical and robust large language models</article-title>. <source>ACM Comput. Surv.</source> <volume>58</volume>, <fpage>1</fpage>&#x02013;<lpage>43</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3777382</pub-id></mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Gan</surname> <given-names>W.</given-names></name> <name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Ye</surname> <given-names>S.</given-names></name> <name><surname>Fan</surname> <given-names>Y.</given-names></name> <name><surname>Sun</surname> <given-names>Y.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;AI-tutor: generating tailored remedial questions and answers based on cognitive diagnostic assessment,&#x0201D;</article-title> in <source>Proceedings of the 6th International Conference on Behavioral, Economic and Socio-Cultural Computing (BESC)</source> (<publisher-loc>Beijing</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gierl</surname> <given-names>M. J.</given-names></name> <name><surname>Lai</surname> <given-names>H.</given-names></name></person-group> (<year>2012</year>). <article-title>The role of item models in automatic item generation</article-title>. <source>Int. J. Test.</source> <volume>12</volume>, <fpage>273</fpage>&#x02013;<lpage>298</lpage>. doi: <pub-id pub-id-type="doi">10.1080/15305058.2011.635830</pub-id></mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Giudici</surname> <given-names>P.</given-names></name> <name><surname>Raffinetti</surname> <given-names>E.</given-names></name> <name><surname>Riani</surname> <given-names>M.</given-names></name></person-group> (<year>2025</year>). <article-title>Robust machine learning models: linear and nonlinear</article-title>. <source>Int. J. Data Sci. Anal.</source> <volume>20</volume>, <fpage>1043</fpage>&#x02013;<lpage>1050</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s41060-024-00512-1</pub-id></mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Goretzko</surname> <given-names>D.</given-names></name> <name><surname>B&#x000FC;hner</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>One model to rule them all? Using machine learning algorithms to determine the number of factors in exploratory factor analysis</article-title>. <source>Psychol. Methods</source> <volume>25</volume>, <fpage>776</fpage>. doi: <pub-id pub-id-type="doi">10.1037/met0000262</pub-id><pub-id pub-id-type="pmid">32134315</pub-id></mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gorgun</surname> <given-names>G.</given-names></name> <name><surname>Bulut</surname> <given-names>O.</given-names></name></person-group> (<year>2024</year>). <article-title>Exploring quality criteria and evaluation methods in automated question generation: a comprehensive survey</article-title>. <source>Educ. Inf. Technol.</source> <volume>29</volume>, <fpage>24111</fpage>&#x02013;<lpage>24142</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10639-024-12771-3</pub-id></mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Grobelny</surname> <given-names>J.</given-names></name> <name><surname>Szyma&#x00144;ski</surname> <given-names>K.</given-names></name> <name><surname>Strozyk</surname> <given-names>Z.</given-names></name></person-group> (<year>2025</year>). <article-title>Act as an expert in psychometry. The evaluation of large language models utility in psychological tests cross-cultural adaptations</article-title>. <source>Acta Psychol.</source> <volume>261</volume>:<fpage>105813</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.actpsy.2025.105813</pub-id></mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gross</surname> <given-names>E. C.</given-names></name> <name><surname>Colson</surname> <given-names>A. J.</given-names></name></person-group> (<year>2025</year>). <article-title>AI-slop and political propaganda: the role of AI-generated content in memes and influence campaigns</article-title>. <source>EON</source> <volume>6</volume>, <fpage>289</fpage>&#x02013;<lpage>298</lpage>. doi: <pub-id pub-id-type="doi">10.56177/eon.6.3.2025.art.1</pub-id></mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gurdil</surname> <given-names>H.</given-names></name> <name><surname>Soguksu</surname> <given-names>Y. B.</given-names></name> <name><surname>Salihoglu</surname> <given-names>S.</given-names></name> <name><surname>Coskun</surname> <given-names>F.</given-names></name></person-group> (<year>2024</year>). <article-title>Integration of artificial intelligence in educational measurement: efficacy of ChatGPT in data generation within the scope of item response theory</article-title>. <source>arXiv [Preprint]</source>. arXiv:2402.01731. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2402.01731</pub-id></mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gweon</surname> <given-names>H.</given-names></name> <name><surname>Schonlau</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>Automated classification for open-ended questions with BERT</article-title>. <source>arXiv [Preprint]</source>. arXiv:2209.06178. doi: <pub-id pub-id-type="doi">10.1093/jssam/smad015</pub-id></mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hagendorff</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>Linking human and machine behavior: a new approach to evaluate training data quality for beneficial machine learning</article-title>. <source>Minds Mach.</source> <volume>31</volume>, <fpage>563</fpage>&#x02013;<lpage>593</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s11023-021-09573-8</pub-id><pub-id pub-id-type="pmid">34602749</pub-id></mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hagerty</surname> <given-names>A.</given-names></name> <name><surname>Rubinov</surname> <given-names>I.</given-names></name></person-group> (<year>2019</year>). <article-title>Global AI ethics: a review of the social impacts and ethical implications of artificial intelligence</article-title>. <source>arXiv [Preprint]</source>. arXiv:1907.07892. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1907.07892</pub-id></mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>H&#x000E4;m&#x000E4;l&#x000E4;inen</surname> <given-names>P.</given-names></name> <name><surname>Tavast</surname> <given-names>M.</given-names></name> <name><surname>Kunnari</surname> <given-names>A.</given-names></name></person-group> (<year>2023</year>). <article-title>&#x0201C;Evaluating large language models in generating synthetic HCI research data: a case study,&#x0201D;</article-title> in <source>Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems</source> (<publisher-loc>Hamburg</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>19</lpage>.</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>B.</given-names></name></person-group> (<year>2025</year>). <article-title>Trustworthy machine learning in the era of foundation models</article-title>. <source>IEEE Intell. Syst.</source> <volume>40</volume>, <fpage>73</fpage>&#x02013;<lpage>79</lpage>. doi: <pub-id pub-id-type="doi">10.1109/MIS.2025.3621938</pub-id></mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Haynes</surname> <given-names>S. N.</given-names></name> <name><surname>Richard</surname> <given-names>D. C. S.</given-names></name> <name><surname>Kubany</surname> <given-names>E. S.</given-names></name></person-group> (<year>1995</year>). <article-title>Content validity in psychological assessment: a functional approach to concepts and methods</article-title>. <source>Psychol. Assess.</source> <volume>7</volume>, <fpage>238</fpage>&#x02013;<lpage>247</lpage>. doi: <pub-id pub-id-type="doi">10.1037/1040-3590.7.3.238</pub-id></mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Heggestad</surname> <given-names>E. D.</given-names></name> <name><surname>Scheaf</surname> <given-names>D. J.</given-names></name> <name><surname>Banks</surname> <given-names>G. C.</given-names></name> <name><surname>Monroe Hausfeld</surname> <given-names>M.</given-names></name> <name><surname>Tonidandel</surname> <given-names>S.</given-names></name> <name><surname>Williams</surname> <given-names>E. B.</given-names></name></person-group> (<year>2019</year>). <article-title>Scale adaptation in organizational science research: a review and best-practice recommendations</article-title>. <source>J. Manage.</source> <volume>45</volume>, <fpage>2596</fpage>&#x02013;<lpage>2627</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0149206319850280</pub-id></mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Herbig</surname> <given-names>N.</given-names></name> <name><surname>Pal</surname> <given-names>S.</given-names></name> <name><surname>van Genabith</surname> <given-names>J.</given-names></name> <name><surname>Kr&#x000FC;ger</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Integrating artificial and human intelligence for efficient translation</article-title>. <source>arXiv [Preprint]</source>. arXiv:1903.02978. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1903.02978</pub-id></mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hernandez</surname> <given-names>I.</given-names></name> <name><surname>Nie</surname> <given-names>W.</given-names></name></person-group> (<year>2023</year>). <article-title>The AI-IP: minimizing the guesswork of personality scale item development through artificial intelligence</article-title>. <source>Pers. Psychol.</source> <volume>76</volume>, <fpage>1011</fpage>&#x02013;<lpage>1035</lpage>. doi: <pub-id pub-id-type="doi">10.1111/peps.12543</pub-id></mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hommel</surname> <given-names>B. E.</given-names></name> <name><surname>Arslan</surname> <given-names>R. C.</given-names></name></person-group> (<year>2025</year>). <article-title>Language models accurately infer correlations between psychological items and scales from text alone</article-title>. <source>Adv. Methods Pract. Psychol. Sci.</source> <volume>8</volume>:<fpage>25152459251377093</fpage>. doi: <pub-id pub-id-type="doi">10.1177/25152459251377093</pub-id></mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hovy</surname> <given-names>D.</given-names></name> <name><surname>Prabhumoye</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Five sources of bias in natural language processing</article-title>. <source>Lang. Linguist. Compass</source> <volume>15</volume>:<fpage>e12432</fpage>. doi: <pub-id pub-id-type="doi">10.1111/lnc3.12432</pub-id><pub-id pub-id-type="pmid">35864931</pub-id></mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>L. T. L.</given-names></name> <name><surname>Huang</surname> <given-names>T. R.</given-names></name></person-group> (<year>2025</year>). <article-title>Generative bias: widespread, unexpected, and uninterpretable biases in generative models and their implications</article-title>. <source>AI Soc.</source> doi: <pub-id pub-id-type="doi">10.1007/s00146-025-02533-1</pub-id></mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hundt</surname> <given-names>A.</given-names></name> <name><surname>Azeem</surname> <given-names>R.</given-names></name> <name><surname>Mansouri</surname> <given-names>M.</given-names></name> <name><surname>Brand&#x000E3;o</surname> <given-names>M.</given-names></name></person-group> (<year>2025</year>). <article-title>LLM-driven robots risk enacting discrimination, violence, and unlawful actions</article-title>. <source>Int. J. Soc. Robot.</source> <volume>17</volume>, <fpage>2663</fpage>&#x02013;<lpage>2711</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s12369-025-01301-x</pub-id></mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jiao</surname> <given-names>J.</given-names></name> <name><surname>Afroogh</surname> <given-names>S.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2025</year>). <article-title>Navigating LLM ethics: advancements, challenges, and future directions</article-title>. <source>AI Ethics</source> <volume>5</volume>, <fpage>5795</fpage>&#x02013;<lpage>5819</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s43681-025-00814-5</pub-id></mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kassir</surname> <given-names>S.</given-names></name> <name><surname>Baker</surname> <given-names>L.</given-names></name> <name><surname>Dolphin</surname> <given-names>J.</given-names></name> <name><surname>Polli</surname> <given-names>F.</given-names></name></person-group> (<year>2023</year>). <article-title>AI for hiring in context: a perspective on overcoming the unique challenges of employment research to mitigate disparate impact</article-title>. <source>AI Ethics</source> <volume>3</volume>, <fpage>845</fpage>&#x02013;<lpage>868</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s43681-022-00208-x</pub-id></mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Katuka</surname> <given-names>G. A.</given-names></name> <name><surname>Gain</surname> <given-names>A.</given-names></name> <name><surname>Yu</surname> <given-names>Y.-Y.</given-names></name></person-group> (<year>2024</year>). <article-title>Investigating automatic scoring and feedback using large language models</article-title>. <source>arXiv [Preprint]</source>. arXiv:2405.00602. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2405.00602</pub-id></mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>J.</given-names></name> <name><surname>Lee</surname> <given-names>B.</given-names></name></person-group> (<year>2024</year>). <article-title>AI-augmented surveys: leveraging large language models and surveys for opinion prediction</article-title>. <source>arXiv [Preprint]</source>. arXiv:2305.09620. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2305.09620</pub-id></mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Kingston</surname> <given-names>N. M.</given-names></name> <name><surname>Kramer</surname> <given-names>L. B.</given-names></name></person-group> (<year>2013</year>). <article-title>&#x0201C;High-stakes test construction and test use,&#x0201D;</article-title> in <source>The Oxford Handbook of Quantitative Methods, Vol. 1</source>, ed. T. D. Little (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>), <fpage>189</fpage>&#x02013;<lpage>205</lpage>.</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Koweszko</surname> <given-names>T.</given-names></name> <name><surname>Kukulska</surname> <given-names>N.</given-names></name> <name><surname>Gierus</surname> <given-names>J.</given-names></name> <name><surname>Silczuk</surname> <given-names>A.</given-names></name></person-group> (<year>2025</year>). <article-title>Construction and initial psychometric validation of the Morana Scale: a multidimensional projective tool developed using AI-generated illustrations</article-title>. <source>J. Clin. Med.</source> <volume>14</volume>:<fpage>7069</fpage>. doi: <pub-id pub-id-type="doi">10.3390/jcm14197069</pub-id><pub-id pub-id-type="pmid">41096149</pub-id></mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Larsen</surname> <given-names>K. R.</given-names></name> <name><surname>Yen</surname> <given-names>S.</given-names></name> <name><surname>Lukyanenko</surname> <given-names>R.</given-names></name></person-group> (<year>2024</year>). <article-title>&#x0201C;Integrating LLMs and psychometrics: global construct validity,&#x0201D;</article-title> in <source>Proceedings of the 45th Conference on Information Systems</source> (<publisher-loc>Bangkok</publisher-loc>: <publisher-name>Association for Information Systems</publisher-name>).</mixed-citation>
</ref>
<ref id="B48">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>P.</given-names></name> <name><surname>Son</surname> <given-names>M.</given-names></name> <name><surname>Jia</surname> <given-names>Z.</given-names></name></person-group> (<year>2025</year>). <article-title>AI-powered automatic item generation for psychological tests: a conceptual framework for an LLM-based multi-agent AIG system</article-title>. <source>J. Bus. Psychol.</source> <fpage>1</fpage>&#x02013;<lpage>29</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10869-025-10067-y</pub-id></mixed-citation>
</ref>
<ref id="B49">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Yang</surname> <given-names>Z.</given-names></name> <name><surname>Yip</surname> <given-names>D. K. M.</given-names></name></person-group> (<year>2025</year>). <article-title>&#x0201C;Enhancing emotional exploration and self-expression through AI-generated dynamic visuals: a study inspired by the Rorschach Inkblot Test,&#x0201D;</article-title> in <source>Proceedings of the 18th International Symposium on Visual Information Communication and Interaction</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>5</lpage>.</mixed-citation>
</ref>
<ref id="B50">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Liao</surname> <given-names>Q. V.</given-names></name> <name><surname>Vaughan</surname> <given-names>J. W.</given-names></name></person-group> (<year>2023</year>). <article-title>AI transparency in the age of LLMs: a human-centered research roadmap</article-title>. <source>arXiv [Preprint]</source>. arXiv:2306.01941. doi: <pub-id pub-id-type="doi">10.1162/99608f92.8036d03b</pub-id></mixed-citation>
</ref>
<ref id="B51">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>Q.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Ong</surname> <given-names>Y. S.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name></person-group> (<year>2025</year>). <article-title>&#x0201C;Make me happier: evoking emotions through image diffusion models,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF International Conference on Computer Vision</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>16367</fpage>&#x02013;<lpage>16376</lpage>.</mixed-citation>
</ref>
<ref id="B52">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Bhandari</surname> <given-names>S.</given-names></name> <name><surname>Pardos</surname> <given-names>Z. A.</given-names></name></person-group> (<year>2025</year>). <article-title>Leveraging LLM respondents for item evaluation: a psychometric analysis</article-title>. <source>Br. J. Educ. Technol.</source> <volume>56</volume>, <fpage>1028</fpage>&#x02013;<lpage>1052</lpage>. doi: <pub-id pub-id-type="doi">10.1111/bjet.13570</pub-id></mixed-citation>
</ref>
<ref id="B53">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lutz</surname> <given-names>M.</given-names></name> <name><surname>Sen</surname> <given-names>I.</given-names></name> <name><surname>Ahnert</surname> <given-names>G.</given-names></name> <name><surname>Rogers</surname> <given-names>E.</given-names></name> <name><surname>Strohmaier</surname> <given-names>M.</given-names></name></person-group> (<year>2025</year>). <article-title>The prompt makes the person(a): a systematic evaluation of sociodemographic persona prompting for large language models</article-title>. <source>arXiv [Preprint]</source>. arXiv:2507.16076. doi: <pub-id pub-id-type="doi">10.18653/v1/2025.findings-emnlp.1261</pub-id></mixed-citation>
</ref>
<ref id="B54">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Menard</surname> <given-names>P.</given-names></name> <name><surname>Bott</surname> <given-names>G. J.</given-names></name></person-group> (<year>2025</year>). <article-title>Artificial intelligence misuse and concern for information privacy: new construct validation and future directions</article-title>. <source>Inf. Syst. J.</source> <volume>35</volume>, <fpage>322</fpage>&#x02013;<lpage>367</lpage>. doi: <pub-id pub-id-type="doi">10.1111/isj.12544</pub-id></mixed-citation>
</ref>
<ref id="B55">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Messingschlager</surname> <given-names>T. V.</given-names></name> <name><surname>Appel</surname> <given-names>M.</given-names></name></person-group> (<year>2025</year>). <article-title>Algorithmic bias in image-generating artificial intelligence: prevalence and user perceptions</article-title>. <source>Inf. Commun. Soc.</source> doi: <pub-id pub-id-type="doi">10.1080/1369118X.2025.2584146</pub-id></mixed-citation>
</ref>
<ref id="B56">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Mujtaba</surname> <given-names>D. F.</given-names></name> <name><surname>Mahapatra</surname> <given-names>N. R.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Artificial intelligence in computerized adaptive testing,&#x0201D;</article-title> in <source>2020 International Conference on Computational Science and Computational Intelligence (CSCI)</source> (<publisher-name>IEEE</publisher-name>), <fpage>649</fpage>&#x02013;<lpage>654</lpage>.</mixed-citation>
</ref>
<ref id="B57">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Neumann</surname> <given-names>T.</given-names></name> <name><surname>De-Arteaga</surname> <given-names>M.</given-names></name> <name><surname>Fazelpour</surname> <given-names>S.</given-names></name></person-group> (<year>2025</year>). <article-title>Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation</article-title>. <source>arXiv [Preprint]</source>. arXiv:2504.08954. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2504.08954</pub-id></mixed-citation>
</ref>
<ref id="B58">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Owan</surname> <given-names>V. J.</given-names></name> <name><surname>Abang</surname> <given-names>K. B.</given-names></name> <name><surname>Idika</surname> <given-names>D. O.</given-names></name> <name><surname>Etta</surname> <given-names>E. O.</given-names></name> <name><surname>Bassey</surname> <given-names>B. A.</given-names></name></person-group> (<year>2023</year>). <article-title>Exploring the potential of artificial intelligence tools in educational measurement and assessment</article-title>. <source>Eurasia J. Math. Sci. Technol. Educ.</source> <volume>19</volume>:<fpage>em2307</fpage>. doi: <pub-id pub-id-type="doi">10.29333/ejmste/13428</pub-id></mixed-citation>
</ref>
<ref id="B59">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pigac</surname> <given-names>T.</given-names></name></person-group> (<year>2025</year>). <article-title>Transparency in large language model (LLM)-powered digital human twins: the AI ethics perspective</article-title>. <source>AI Soc.</source> doi: <pub-id pub-id-type="doi">10.1007/s00146-025-02617-y</pub-id></mixed-citation>
</ref>
<ref id="B60">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Qu</surname> <given-names>C.</given-names></name> <name><surname>Dai</surname> <given-names>S.</given-names></name> <name><surname>Wei</surname> <given-names>X.</given-names></name> <name><surname>Cai</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Yin</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2025</year>). <article-title>Tool learning with large language models: a survey</article-title>. <source>Front. Comput. Sci.</source> <volume>19</volume>:<fpage>198343</fpage>. doi: <pub-id pub-id-type="doi">10.1007/s11704-024-40678-2</pub-id></mixed-citation>
</ref>
<ref id="B61">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Reimers</surname> <given-names>N.</given-names></name> <name><surname>Gurevych</surname> <given-names>I.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Sentence-BERT: sentence embeddings using Siamese BERT-networks,&#x0201D;</article-title> in <source>Proceedings of EMNLP-IJCNLP</source> (<publisher-loc>Hong Kong</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>3982</fpage>&#x02013;<lpage>3992</lpage>.</mixed-citation>
</ref>
<ref id="B62">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Rothschild</surname> <given-names>D.</given-names></name> <name><surname>Brand</surname> <given-names>J.</given-names></name> <name><surname>Schroeder</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name></person-group> (<year>2025</year>). <article-title>&#x0201C;Opportunities and risks of LLMs in survey research,&#x0201D;</article-title> in <source>80th Annual AAPOR Conference</source> (<publisher-loc>St. Louis, MO</publisher-loc>: <publisher-name>AAPOR</publisher-name>).</mixed-citation>
</ref>
<ref id="B63">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Salinas</surname> <given-names>A.</given-names></name> <name><surname>Penafiel</surname> <given-names>L.</given-names></name> <name><surname>McCormack</surname> <given-names>R.</given-names></name> <name><surname>Morstatter</surname> <given-names>F.</given-names></name></person-group> (<year>2023</year>). <article-title>&#x0201C;I&#x00027;m not racist but...&#x0201D;: discovering bias in the internal knowledge of large language models</article-title>. <source>arXiv [Preprint]</source>. arXiv:2310.08780. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2310.08780</pub-id></mixed-citation>
</ref>
<ref id="B64">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Sang</surname> <given-names>Y.</given-names></name> <name><surname>Stanton</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Analyzing hate speech with incel-hunters&#x00027; critiques,&#x0201D;</article-title> in <source>Proceedings of the International Conference on Social Media and Society</source> (<publisher-loc>Toronto</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>5</fpage>&#x02013;<lpage>13</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3400806.3400808</pub-id></mixed-citation>
</ref>
<ref id="B65">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sharpnack</surname> <given-names>J.</given-names></name> <name><surname>Hao</surname> <given-names>K.</given-names></name> <name><surname>Mulcaire</surname> <given-names>P.</given-names></name> <name><surname>Bicknell</surname> <given-names>K.</given-names></name> <name><surname>LaFlair</surname> <given-names>G.</given-names></name> <name><surname>Yancey</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>BanditCAT and AutoIRT: machine learning approaches to computerized adaptive testing and item calibration</article-title>. <source>arXiv [Preprint]</source>. arXiv:2410.21033. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2410.21033</pub-id></mixed-citation>
</ref>
<ref id="B66">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>P. C.</given-names></name> <name><surname>Stanton</surname> <given-names>J. M.</given-names></name></person-group> (<year>1998</year>). <article-title>Perspectives on the measurement of job attitudes: the long view</article-title>. <source>Hum. Resour. Manag. Rev.</source> <volume>8</volume>, <fpage>367</fpage>&#x02013;<lpage>386</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S1053-4822(99)00005-4</pub-id></mixed-citation>
</ref>
<ref id="B67">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Speer</surname> <given-names>A. B.</given-names></name> <name><surname>Oswald</surname> <given-names>F. L.</given-names></name> <name><surname>Putka</surname> <given-names>D. J.</given-names></name></person-group> (<year>2025</year>). <article-title>Reliability evidence for AI-based scores in organizational contexts: applying lessons learned from psychometrics</article-title>. <source>Organ. Res. Methods</source> doi: <pub-id pub-id-type="doi">10.1177/10944281251346404</pub-id></mixed-citation>
</ref>
<ref id="B68">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Stanton</surname> <given-names>J.</given-names></name> <name><surname>Ramnarine-Rieks</surname> <given-names>A.</given-names></name> <name><surname>Sang</surname> <given-names>Y.</given-names></name></person-group> (<year>2024</year>). <article-title>Evaluating item content and scale characteristics using a pretrained neural network model</article-title>. <source>Surv. Res. Methods</source> <volume>18</volume>, <fpage>153</fpage>&#x02013;<lpage>165</lpage>. doi: <pub-id pub-id-type="doi">10.18148/srm/2024.v18i2.8240</pub-id></mixed-citation>
</ref>
<ref id="B69">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Suh</surname> <given-names>J.</given-names></name> <name><surname>Jahanparast</surname> <given-names>E.</given-names></name> <name><surname>Moon</surname> <given-names>S.</given-names></name> <name><surname>Kang</surname> <given-names>M.</given-names></name> <name><surname>Chang</surname> <given-names>S.</given-names></name></person-group> (<year>2025</year>). <article-title>&#x0201C;Language model fine-tuning on scaled survey data for predicting distributions of public opinions,&#x0201D;</article-title> in <source>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>, <fpage>21147</fpage>&#x02013;<lpage>21170</lpage>.</mixed-citation>
</ref>
<ref id="B70">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>S.</given-names></name> <name><surname>Lee</surname> <given-names>E.</given-names></name> <name><surname>Nan</surname> <given-names>D.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name> <name><surname>Lee</surname> <given-names>W.</given-names></name> <name><surname>Jansen</surname> <given-names>B. J.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>Random silicon sampling: simulating human sub-population opinion using a large language model based on group-level demographic information</article-title>. <source>arXiv [Preprint]</source>. arXiv:2402.18144. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2402.18144</pub-id></mixed-citation>
</ref>
<ref id="B71">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>Artificial intelligence powered personality assessment: a multidimensional psychometric natural language processing perspective</article-title> (Doctoral dissertation). <publisher-name>University of Illinois at Urbana-Champaign</publisher-name>, Urbana, IL, United States.</mixed-citation>
</ref>
<ref id="B72">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tan</surname> <given-names>B.</given-names></name> <name><surname>Armoush</surname> <given-names>N.</given-names></name> <name><surname>Mazzullo</surname> <given-names>E.</given-names></name> <name><surname>Bulut</surname> <given-names>O.</given-names></name> <name><surname>Gierl</surname> <given-names>M.</given-names></name></person-group> (<year>2024</year>). <article-title>A review of automatic item generation techniques leveraging large language models</article-title>. <source>Int. J. Assess. Tools Educ.</source> <volume>12</volume>, <fpage>317</fpage>&#x02013;<lpage>340</lpage>. doi: <pub-id pub-id-type="doi">10.21449/ijate.1602294</pub-id></mixed-citation>
</ref>
<ref id="B73">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Terry</surname> <given-names>J.</given-names></name> <name><surname>Strait</surname> <given-names>G.</given-names></name> <name><surname>Alsarraf</surname> <given-names>S.</given-names></name> <name><surname>Weinmann</surname> <given-names>E.</given-names></name> <name><surname>Waychoff</surname> <given-names>A.</given-names></name></person-group> (<year>2025</year>). <article-title>Artificial intelligence in scale development: evaluating AI-generated survey items against gold standard measures</article-title>. <source>Curr. Psychol.</source> <volume>44</volume>, <fpage>16339</fpage>&#x02013;<lpage>16350</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s12144-025-08240-w</pub-id></mixed-citation>
</ref>
<ref id="B74">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Veale</surname> <given-names>M.</given-names></name> <name><surname>Binns</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>Fairer machine learning in the real world: mitigating discrimination without collecting sensitive data</article-title>. <source>Big Data Soc.</source> <volume>4</volume>, <fpage>1</fpage>&#x02013;<lpage>17</lpage>. doi: <pub-id pub-id-type="doi">10.1177/2053951717743530</pub-id></mixed-citation>
</ref>
<ref id="B75">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Veldkamp</surname> <given-names>B. P.</given-names></name></person-group> (<year>2023</year>). <article-title>&#x0201C;Trustworthy artificial intelligence in psychometrics,&#x0201D;</article-title> in <source>Essays on Contemporary Psychometrics</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>), <fpage>69</fpage>&#x02013;<lpage>87</lpage>.</mixed-citation>
</ref>
<ref id="B76">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Villalobos</surname> <given-names>P.</given-names></name> <name><surname>Ho</surname> <given-names>A.</given-names></name> <name><surname>Sevilla</surname> <given-names>J.</given-names></name> <name><surname>Besiroglu</surname> <given-names>T.</given-names></name> <name><surname>Heim</surname> <given-names>L.</given-names></name> <name><surname>Hobbhahn</surname> <given-names>M.</given-names></name></person-group> (<year>2024</year>). <article-title>&#x0201C;Position: will we run out of data? limits of LLM scaling based on human-generated data,&#x0201D;</article-title> in <source>Forty-first International Conference on Machine Learning</source> (<publisher-loc>Vienna</publisher-loc>: <publisher-name>ICML</publisher-name>).</mixed-citation>
</ref>
<ref id="B77">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Villegas-Ch</surname> <given-names>W.</given-names></name> <name><surname>Navarro</surname> <given-names>A. M.</given-names></name> <name><surname>Mera-Navarrete</surname> <given-names>A.</given-names></name></person-group> (<year>2025</year>). <article-title>Using generative adversarial networks for the synthesis of emotional facial expressions in virtual educational environments</article-title>. <source>Intell. Syst. Appl.</source> <volume>25</volume>:<fpage>200479</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.iswa.2025.200479</pub-id></mixed-citation>
</ref>
<ref id="B78">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Duan</surname> <given-names>Z.</given-names></name></person-group> (<year>2024</year>). <article-title>Agent AI with LangGraph: a modular framework for enhancing machine translation using large language models</article-title>. <source>arXiv [Preprint]</source>. arXiv:2412.03801. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2412.03801</pub-id></mixed-citation>
</ref>
<ref id="B79">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Song</surname> <given-names>R.</given-names></name> <name><surname>Guo</surname> <given-names>W.</given-names></name> <name><surname>Yang</surname> <given-names>H.</given-names></name></person-group> (<year>2025</year>). <article-title>Exploring prompt pattern for generative artificial intelligence in automatic question generation</article-title>. <source>Interact. Learn. Environ.</source> <volume>33</volume>, <fpage>2559</fpage>&#x02013;<lpage>2584</lpage>. doi: <pub-id pub-id-type="doi">10.1080/10494820.2024.2412082</pub-id></mixed-citation>
</ref>
<ref id="B80">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Guo</surname> <given-names>Q.</given-names></name> <name><surname>Yao</surname> <given-names>W.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Wu</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>AutoSurvey: large language models can automatically write surveys</article-title>. <source>Adv. Neural Inf. Process. Syst.</source> <volume>37</volume>, <fpage>115119</fpage>&#x02013;<lpage>115145</lpage>. doi: <pub-id pub-id-type="doi">10.52202/079017-3655</pub-id></mixed-citation>
</ref>
<ref id="B81">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Chu</surname> <given-names>Z.</given-names></name> <name><surname>Doan</surname> <given-names>T. V.</given-names></name> <name><surname>Ni</surname> <given-names>S.</given-names></name> <name><surname>Yang</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>W.</given-names></name></person-group> (<year>2025</year>). <article-title>History, development, and principles of large language models: an introductory survey</article-title>. <source>AI Ethics</source> <volume>5</volume>, <fpage>1955</fpage>&#x02013;<lpage>1971</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s43681-024-00583-7</pub-id></mixed-citation>
</ref>
<ref id="B82">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>X.</given-names></name> <name><surname>Xiao</surname> <given-names>L.</given-names></name> <name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Ma</surname> <given-names>T.</given-names></name> <name><surname>He</surname> <given-names>L.</given-names></name></person-group> (<year>2022</year>). <article-title>A survey of human-in-the-loop for machine learning</article-title>. <source>Future Gener. Comput. Syst.</source> <volume>135</volume>, <fpage>364</fpage>&#x02013;<lpage>381</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.future.2022.05.014</pub-id></mixed-citation>
</ref>
<ref id="B83">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>H.</given-names></name> <name><surname>Kim</surname> <given-names>Y. J.</given-names></name> <name><surname>Sharaf</surname> <given-names>A.</given-names></name> <name><surname>Awadalla</surname> <given-names>H. H.</given-names></name></person-group> (<year>2024</year>). <article-title>A paradigm shift in machine translation: boosting translation performance of large language models</article-title>. <source>arXiv [Preprint]</source>. arXiv:2309.11674. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2309.11674</pub-id></mixed-citation>
</ref>
<ref id="B84">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>R.</given-names></name> <name><surname>Peng</surname> <given-names>J.</given-names></name></person-group> (<year>2025</year>). <article-title>A comprehensive survey of deep research: systems, methodologies, and applications</article-title>. <source>arXiv [Preprint]</source>. arXiv:2506.12594. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2506.12594</pub-id></mixed-citation>
</ref>
<ref id="B85">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Yancey</surname> <given-names>K. P.</given-names></name> <name><surname>Runge</surname> <given-names>A.</given-names></name> <name><surname>LaFlair</surname> <given-names>G.</given-names></name> <name><surname>Mulcaire</surname> <given-names>P.</given-names></name></person-group> (<year>2024</year>). <article-title>&#x0201C;BERT-IRT: accelerating item piloting with BERT embeddings and explainable IRT models,&#x0201D;</article-title> in <source>Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)</source> (<publisher-loc>Mexico City</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>428</fpage>&#x02013;<lpage>438</lpage>.</mixed-citation>
</ref>
<ref id="B86">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Zhan</surname> <given-names>G.</given-names></name> <name><surname>Shi</surname> <given-names>G.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>K.</given-names></name></person-group> (<year>2025</year>). <article-title>&#x0201C;A survey on automated data analysis techniques powered by large language models,&#x0201D;</article-title> in <source>2025 21st International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)</source> (<publisher-loc>Guilin</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>503</fpage>&#x02013;<lpage>509</lpage>.</mixed-citation>
</ref>
<ref id="B87">
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>W. X.</given-names></name> <name><surname>Zhou</surname> <given-names>K.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Tang</surname> <given-names>T.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Hou</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2025</year>). <article-title>A survey of large language models</article-title>. <source>arXiv [Preprint]</source>. arXiv:2303.18223. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2303.18223</pub-id></mixed-citation>
</ref>
<ref id="B88">
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Zhuang</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>Q.</given-names></name> <name><surname>Pardos</surname> <given-names>Z.</given-names></name> <name><surname>Kyllonen</surname> <given-names>P. C.</given-names></name> <name><surname>Zu</surname> <given-names>J.</given-names></name> <name><surname>Huang</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2025</year>). <article-title>&#x0201C;Position: AI evaluation should learn from how we test humans,&#x0201D;</article-title> in <source>Proceedings of the Forty-second International Conference on Machine Learning Position Paper Track</source> (<publisher-loc>Vancouver, BC</publisher-loc>: <publisher-name>ICML</publisher-name>).</mixed-citation>
</ref>
</ref-list>
<fn-group>
<fn fn-type="custom" custom-type="edited-by" id="fn0001">
<p>Edited by: <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/2800521/overview">Charles A. Pierce</ext-link>, Oakland University, United States</p>
</fn>
<fn fn-type="custom" custom-type="reviewed-by" id="fn0002">
<p>Reviewed by: <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/395833/overview">Mark D. Reckase</ext-link>, Michigan State University, United States</p>
<p><ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/3011732/overview">Yishen Song</ext-link>, Beijing Normal University, China</p>
</fn>
</fn-group>
</back>
</article>