<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Educ.</journal-id>
<journal-title>Frontiers in Education</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Educ.</abbrev-journal-title>
<issn pub-type="epub">2504-284X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">668538</article-id>
<article-id pub-id-type="doi">10.3389/feduc.2021.668538</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Education</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A New Measurement Instrument for Music-Related Argumentative Competence: The MARKO Competency Test and Competency Model</article-title>
<alt-title alt-title-type="left-running-head">Ehninger et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">MARKO Competency Test and Model</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Ehninger</surname>
<given-names>Julia</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1135915/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Knigge</surname>
<given-names>Jens</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/630641/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Schurig</surname>
<given-names>Michael</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/842999/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Rolle</surname>
<given-names>Christian</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1279704/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Institute for Music Education, University of Cologne, <addr-line>Cologne</addr-line>, <country>Germany</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Department for Arts and Culture, Nord University, <addr-line>Levanger</addr-line>, <country>Norway</country>
</aff>
<aff id="aff3">
<label>
<sup>3</sup>
</label>Faculty of Rehabilitation Sciences, TU Dortmund University, <addr-line>Dortmund</addr-line>, <country>Germany</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/274105/overview">Shaljan Areepattamannil</ext-link>, Emirates College for Advanced Education, United Arab Emirates</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1075636/overview">Mar&#xed;a Isabel de Vicente-Yag&#xfc;e Jara</ext-link>, University of Murcia, Spain</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/31327/overview">Daniel M&#xfc;llensiefen</ext-link>, Goldsmiths University of London, United&#x20;Kingdom</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Julia Ehninger, <email>info@juliaehninger.de</email>; Jens Knigge, <email>jens.knigge@nord.no</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Educational Psychology, a section of the journal Frontiers in Education</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>02</day>
<month>06</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>6</volume>
<elocation-id>668538</elocation-id>
<history>
<date date-type="received">
<day>16</day>
<month>02</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>05</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Ehninger, Knigge, Schurig and Rolle.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Ehninger, Knigge, Schurig and Rolle</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>In this paper, we introduce the MARKO competency test and competency model, a new measurement instrument for music-related argumentative competence (MARKO: Musikbezogene ARgumentationsKOmpetenz; German for music-related argumentative competence). This competence, which plays an essential role in school curricula, refers to the ability to justify and defend judgments about music. The two main goals of this study were 1) to design an assessment test for music-related argumentation that fulfills psychometric criteria and 2) to derive competency levels from empirical data that describe the cognitive dispositions necessary for engaging in argumentation about music. Based on a theoretical framework, we developed a competency test to assess music-related argumentative competence. After two pretests (<italic>n</italic>&#x20;&#x3d; 391), we collected data from 440 students from grade nine to the university level. The final test consisted exclusively of open-ended items, which were rated with coding schemes designed for each item. After ensuring inter-rater reliability, we composed an item pool that met psychometric criteria (e.g., local stochastic independence and item homogeneity) and represented content-related aspects in a meaningful way. Based on this item pool, we estimated a one-dimensional partial credit model. Following a standard-setting approach, four competency levels were derived from the empirical data. Individuals at the lowest competency level expressed their own opinions about the music by referring to salient musical attributes. Participants at the highest level discussed different opinions on the music and considered its social and cultural context. Proficiency scores varied significantly between grades. Our findings empirically support some theoretical assumptions about music-related argumentation and challenge others.</p>
</abstract>
<kwd-group>
<kwd>music-related argumentation</kwd>
<kwd>competency</kwd>
<kwd>assessment</kwd>
<kwd>music</kwd>
<kwd>reasoning</kwd>
<kwd>item response theory</kwd>
<kwd>empirical research</kwd>
<kwd>musical judgment</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Music-related argumentative competence is &#x201c;the (learnable) ability to justify and defend aesthetic judgments about music in a comprehensive, plausible, and differentiated way&#x201d; (<xref ref-type="bibr" rid="B22">Kn&#xf6;rzer et&#x20;al., 2016</xref>, p. 2). In our everyday lives and in educational contexts, we frequently talk about music and give reasons for our opinions. After a concert, audience members might discuss whether they liked what they heard. On social media, comments posted below music videos can develop into interactive discussions about the video. At rehearsals, band members often talk about their musical progress. Argumentation also plays an integral role in music lessons at school.</p>
<p>Students&#x2019; argumentation skills are considered important for overall educational success (<xref ref-type="bibr" rid="B24">Kuhn, 2005</xref>), and music-related argumentation is crucial in music education. Talking about music is part of many musical practices, and language skills can enhance musical learning processes. Accordingly, argumentation is incorporated into German music curricula. There it plays an essential role, usually in the competency domain of &#x201c;reflection&#x201d; (e.g., <xref ref-type="bibr" rid="B28">Ministerium f&#xfc;r Schule und Berufsbildung des Landes Schleswig-Holstein, 2015</xref>; <xref ref-type="bibr" rid="B29">Ministerium f&#xfc;r Schule und Bildung des Landes Nordrhein-Westfalen, 2019</xref>).</p>
<sec id="s1-1">
<title>Music-Related Argumentation and Research on Competencies</title>
<p>Theories on argumentation date back to antiquity. <xref ref-type="bibr" rid="B43">Toulmin (2003)</xref>, one of the pioneers of modern argumentation theory, claimed that even though certain aspects of argumentation practice are field invariant, others vary from field to field. For example, a mathematician will have to deal with different &#x201c;forums,&#x201d; &#x201c;stakes,&#x201d; and &#x201c;contextual details&#x201d; (<xref ref-type="bibr" rid="B42">Toulmin, 1992</xref>, p. 9) when reasoning about a mathematical problem than a lawyer who appears in&#x20;court.</p>
<p>Music-related argumentation is a form of aesthetic argumentation. Claims to validity in this field differ from, for instance, claims to validity in the natural sciences. Suppose a person claims after a concert that they did not like the way the conductor interpreted the piece. The judgment then refers not only to the concert itself but also to their own impression of the music. Aesthetic judgments are neither merely subjective (e.g., expressing personal preferences) nor merely objective (e.g., referring to musical characteristics); they are relational (<xref ref-type="bibr" rid="B37">Rolle, 1999</xref>). They refer to the relationship between the person making a claim and the aesthetic object (<xref ref-type="bibr" rid="B17">Kant, 1790/2007</xref>, &#xa7; 1). Nevertheless, judgments about music can claim intersubjective validity. If someone states that a melody is lovely, they are articulating their aesthetic experience. They may try to convince others by making this experience comprehensible to them (<xref ref-type="bibr" rid="B22">Kn&#xf6;rzer et&#x20;al., 2016</xref>, p. 2). <xref ref-type="bibr" rid="B37">Rolle (1999</xref>, p. 115) suggested that aesthetic judgments are recommendations: they encourage others to perceive the aesthetic object in a certain way (see <xref ref-type="bibr" rid="B40">Stevenson, 1950</xref>).</p>
<p>Argumentation about music therefore requires a theoretical approach that takes into account interactive communication as well as field-dependent aspects. <xref ref-type="bibr" rid="B38">Rolle (2013)</xref> suggested a competency model for music-related argumentation that integrates several sources:</p>
<list list-type="bullet">
<list-item><p>theoretical assumptions on aesthetic argumentation (see above);</p></list-item>
<list-item><p>general argumentation theories emphasizing the dialogical structure of argumentation (<xref ref-type="bibr" rid="B5">Eemeren et&#x20;al., 2014</xref>, ch. 10; <xref ref-type="bibr" rid="B46">Wohlrapp, 2014</xref>);</p></list-item>
<list-item><p>research on reflective judgment (<xref ref-type="bibr" rid="B19">King and Kitchener, 2004</xref>);</p></list-item>
<list-item><p>concepts from art education (<xref ref-type="bibr" rid="B32">Parsons, 1987</xref>).</p></list-item>
</list>
<p>In his competency model, Rolle distinguished several levels of music-related argumentation. People at lower levels refer only to objective properties of the music, such as its musical attributes or its expressive qualities. People at higher levels combine these two aspects and can consider different aesthetic conventions or cultural practices (<xref ref-type="bibr" rid="B38">Rolle, 2013</xref>, p. 146). People at lower levels assume that differing musical judgments are a matter of taste. People at higher levels, in contrast, can reflect on their own musical preferences and integrate different perspectives and counterarguments into their reasoning.</p>
<p>Little research has been conducted in this field. <xref ref-type="bibr" rid="B22">Kn&#xf6;rzer et&#x20;al. (2016)</xref> carried out the first empirical study on Rolle&#x2019;s model. In their study, 37 participants listened to two versions of the same musical piece. They were then asked which of the two versions they liked better and why. Kn&#xf6;rzer et&#x20;al. divided their sample into three groups according to expertise: high school students; university students majoring in music education; and professional musicians and music educators. They analyzed which aspects of the music the participants referred to when justifying their judgments. Participants with the lowest expertise referred to subjective aspects in their reasoning. Participants with higher musical expertise more often took context-specific background knowledge into account. <xref ref-type="bibr" rid="B9">Gottschalk and Lehmann-Wermser (2013)</xref> investigated the music-related argumentative competence of ninth graders by analyzing discussions in the music classroom.</p>
<p>Much research has been devoted to developing empirically validated competency models since international large-scale assessments such as PISA and TIMSS started to use domain-specific competency models as theoretical frameworks (e.g., <xref ref-type="bibr" rid="B25">Leutner et&#x20;al., 2017</xref>). In this context, the definition of competency is mainly based on the theoretical work of <xref ref-type="bibr" rid="B45">Weinert (2001)</xref>, who suggested that competencies are &#x201c;context-specific cognitive dispositions that are acquired and needed to successfully cope with certain situations or tasks in specific domains&#x201d; (<xref ref-type="bibr" rid="B23">Koeppen et&#x20;al., 2008</xref>, p. 62; see also <xref ref-type="bibr" rid="B13">Hartig et&#x20;al., 2008</xref>). In the &#x201c;specific domain&#x201d; of music education, however, research on competencies has been scarce. Only two projects have addressed it empirically. The KoMus project investigated students&#x2019; competency to perceive and contextualize music (<xref ref-type="bibr" rid="B16">Jordan and Knigge, 2010</xref>; <xref ref-type="bibr" rid="B15">Jordan et&#x20;al., 2012</xref>). The KOPRA-M project dealt with music performance competency (<xref ref-type="bibr" rid="B11">Hasselhorn and Lehmann, 2015</xref>). Apart from these, no empirical research has been conducted on music-related competency modeling (for an overview, see <xref ref-type="bibr" rid="B10">Hasselhorn and Knigge, in press</xref>). Building on Weinert&#x2019;s conceptual work and on <xref ref-type="bibr" rid="B22">Kn&#xf6;rzer et&#x20;al.&#x2019;s (2016</xref>, p. 2) suggestion, we define music-related argumentative competence as follows: Music-related argumentative competence is the context-specific cognitive disposition that is acquired and needed to justify and defend aesthetic judgments about music in a comprehensive, plausible, and differentiated&#x20;way.</p>
</sec>
<sec id="s1-2">
<title>Research Goal</title>
<p>Our study was designed to empirically investigate theoretical assumptions about music-related argumentation. How is music-related argumentative competence structured? Which aspects play a role when people reason about music? Which characteristics make one argument better or worse than another? The empirical study is based on Rolle&#x2019;s theoretical framework on music-related argumentation (<xref ref-type="bibr" rid="B38">Rolle, 2013</xref>; see <italic>Music-Related Argumentation and Research on Competencies</italic>). In line with this overall goal, our first aim was to develop a competency test for music-related argumentation based on theoretical assumptions about its nature. After establishing the psychometric quality of the test (i.e.,&#x20;model fit and reliability), our second aim was to model competency levels based on empirical data. These levels show the challenges that participants face when reasoning about music. To our knowledge, this is the first empirical research endeavor on competency levels in this&#x20;field.</p>
</sec>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>Materials and Methods</title>
<p>In this section, we describe the test design, data collection, data analysis, and methodological procedure for the specification of the competency levels.</p>
<p>We conducted analyses using a partial credit model (PCM; <xref ref-type="bibr" rid="B3">Bond and Fox, 2015</xref>) from the item response theory (IRT) framework. In IRT, the probability of solving an item depends on the difficulty of the item and the ability of the person trying to solve it. This approach makes it possible to estimate person abilities (here as weighted likelihood estimates [WLE]) and item difficulties (Thurstonian thresholds in the PCM) on a common scale. Conclusions about the underlying latent&#x20;trait can then be drawn from these estimates.</p>
<sec id="s2-1">
<title>Test Design</title>
<p>For the MARKO competency test (Musikbezogene ARgumentationsKOmpetenz; German for music-related argumentative competence), over 60 test items were designed and tested during two piloting phases in 2017 and 2018. In developing the items, we examined several German state and federal-level school curricula as well as schoolbooks. We also considered items that had been developed in the context of the KoMus project (<xref ref-type="bibr" rid="B21">Knigge, 2010</xref>; <xref ref-type="bibr" rid="B15">Jordan et&#x20;al., 2012</xref>). In addition, we drew on preliminary empirical research on music-related argumentation by <xref ref-type="bibr" rid="B22">Kn&#xf6;rzer et&#x20;al. (2016)</xref>.</p>
<p>Middle school, high school, and university students participated in the 90-min piloting sessions (<italic>n</italic>&#x20;&#x3d; 391). Since the test was administered online, the testing sessions usually took place in the computer rooms of the cooperating institutions during regular music lessons. In the testing sessions, the participants wore headphones and worked individually at computers, listening to music and watching videos. During the test, they were asked to state and justify their judgments in writing. The majority of items tested in the first pilot phase had to be discarded. The items in the second phase were continuously revised based on feedback from teachers, participants, and research fellows.</p>
<p>Twenty-five items were successfully incorporated into the main study. The final test consisted exclusively of open-ended items. These items were analyzed in terms of inter-rater reliability, item fit indices, and item discrimination. The piloting phase showed that this type of item was especially suited for measuring music-related argumentative competence. Closed items only allow arguments to be evaluated, not produced. Processing times varied greatly across participants because of the different amounts of text they produced. Some students wrote only a single sentence per item, while others wrote lengthy paragraphs to justify their music-related judgments. Therefore, in the introduction to the final version of the test, we told the participants that it was not important to complete all items and asked them to take their time. We used a rotated test design to collect as much data as possible on all 25 items. The 25 items were split into three sets, and the sets appeared in a different order in each of the three final test booklets.</p>
<p>Each test session began with a short verbal introduction by the test supervisor. The online test also included an explanatory introduction that elaborated on the nature of music-related judgments and musical terminology.</p>
</sec>
<sec id="s2-2">
<title>Data Collection and Participants</title>
<p>The data collection for the main study took place in 2019 at nine public high schools, during regularly scheduled music lessons, and at two universities (with students enrolled in music education programs) in the state of North Rhine-Westphalia, Germany (<italic>n</italic>&#x20;&#x3d; 440; three of the high school students in the sample were visiting from other schools). Of the participants, 44.5% were female. The sample was distributed across grades as follows:</p>
<list list-type="bullet">
<list-item><p>grade nine (age: 14&#x2013;15): about one third;</p></list-item>
<list-item><p>grade ten (age: 15&#x2013;16): 24.5%;</p></list-item>
<list-item><p>grade eleven (age: 16&#x2013;17): 28.6%;</p></list-item>
<list-item><p>grade twelve (age: 17&#x2013;18): 5.5%;</p></list-item>
<list-item><p>university students: 7.7%.</p></list-item>
</list>
<p>The mean age was 16&#xa0;years (<italic>SD</italic> &#x3d; 2.79), and the test lasted 90&#xa0;min. In addition to the competency test, we collected demographic data (gender, age, family migration history, and language). We also collected data on musical experience (i.e.,&#x20;whether participants received musical instrument lessons) and musical sophistication (Gold-MSI general musical sophistication; <xref ref-type="bibr" rid="B31">M&#xfc;llensiefen et&#x20;al., 2014</xref>). The three test booklets were distributed almost evenly among the participating students (booklet I: 33.6%, booklet II: 33.9%, booklet III: 32.5%). The participants had been told that they did not have to respond to all items but should take their time. As a result, some participants did not complete all test items (26.2% missing values).</p>
</sec>
<sec id="s2-3">
<title>Data Analysis</title>
<sec id="s2-3-1">
<title>Sample Item</title>
<p>The test items were designed to measure different aspects of music-related argumentative competence. Some items aimed at assessing subjective perspectives. Others were designed to determine how participants referred to musical attributes, and still others focused on the dialogical aspects of argumentation. For example, the participants were asked to comment on a discussion below a YouTube video, to react to a concert review in a newspaper, or to justify why they believed a song did or did not generate a certain atmosphere. The test items thus provided various prompts for argumentation. The following sample item exemplifies the items used in the&#x20;test.</p>
<p>The sample item &#x201c;Star Wars&#x201d; was developed to assess how participants referred to musical attributes and to the generated musical atmosphere (<xref ref-type="fig" rid="F1">Figure&#x20;1</xref>). In the sample item, there is an explicit request to refer to the musical attributes of the piece. In addition, the item text mentions the expressive qualities of the music and the desired mood that is supposed to be created (&#x201c;atmosphere of outer space&#x201d;). Thus, two types of references are suggested in the item, which also play a role in the theoretical model: musical attributes and expressive qualities of the music (Stages 3 and 4 in <xref ref-type="bibr" rid="B38">Rolle, 2013</xref>, p. 146). The supplementary material includes another sample item (&#x201c;Eurovision Song Contest&#x201d;) focusing on the dialogical aspect of argumentation (see <xref ref-type="sec" rid="s9">Supplementary Material</xref>, section&#x20;1).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Test item &#x201c;Star Wars&#x201d; (English translation). The participants listen to an excerpt from the film score (&#x201c;Arrival at Naboo&#x201d; from Episode I). A screenshot from the scene was shown in the item but had to be removed here due to copyright issues. The screenshot shows the view of a planet from a space shuttle cockpit.</p>
</caption>
<graphic xlink:href="feduc-06-668538-g001.tif"/>
</fig>
<p>As the test consisted exclusively of open-ended items, coding schemes had to be developed to rate the&#x20;items.</p>
</sec>
<sec id="s2-3-2">
<title>Coding Process</title>
<p>Coding schemes were developed for each test item in a predominantly inductive and exploratory process. Developing the coding schemes inductively allowed us to account for the specifics of each item, since we also observed response behavior that could not be attributed to theoretical assumptions. <xref ref-type="table" rid="T1">Table&#x20;1</xref> gives an overview of the coding scheme used to rate answers to the sample item &#x201c;Star Wars&#x201d;:</p>
<list list-type="bullet">
<list-item><p>One point: the test taker referred only to the musical atmosphere or mentioned only salient (i.e.,&#x20;basic) musical attributes.</p></list-item>
<list-item><p>Two points: a connection between musical attributes and the generated atmosphere was established.</p></list-item>
<list-item><p>Three points: the participant went into further detail on specific aspects of the music.</p></list-item>
</list>
<p>Two raters coded approximately 15% of all collected test data. Inter-rater reliability ranged from good to very good (linearly weighted Cohen&#x2019;s <italic>&#x3ba;</italic> &#x3d; 0.73&#x2013;0.94). The coding scheme for another item (&#x201c;Eurovision Song Contest&#x201d;) is included in section 1 of the <xref ref-type="sec" rid="s9">Supplementary Material</xref>.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Coding scheme for the sample item &#x201c;Star Wars&#x201d; (condensed and simplified version).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Points</th>
<th align="center">Description</th>
<th align="center">Sample answers</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">0</td>
<td align="left">Tautological justification or no reason</td>
<td align="left">&#x201c;Yes, because of the atmosphere that exists in space. The composer presented this very well.&#x201d; (VP_661)</td>
</tr>
<tr>
<td rowspan="2" align="left">1</td>
<td align="left">Participants refer only to the musical atmosphere</td>
<td rowspan="2" align="left">&#x201c;I think so, because it sounds exciting and unusual, which, in my opinion, corresponds well with the atmosphere in outer space.&#x201d; (VP_714)</td>
</tr>
<tr>
<td align="left">If musical attributes are mentioned (or even a causal relationship is established between them and the atmosphere), this is done by referring to &#x201c;basic&#x201d; and superficial characteristics of the music (e.g., &#x201c;bright notes,&#x201d; &#x201c;long tones,&#x201d; &#x201c;loud,&#x201d; &#x201c;soft,&#x201d; &#x201c;instruments that create tension&#x201d;)</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">Participants relate the generated atmosphere to musical attributes. If instruments (e.g., &#x201c;quiet strings&#x201d;) are mentioned, the answer receives two points</td>
<td align="left">&#x201c;Yes, I find it very well done. The sound layers depict the infinite vastness of the universe &#x2026; the synthesizers give the piece a futuristic character &#x2026; single high notes to illustrate the stars.&#x201d; (VP_589)</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">Participants relate the generated atmosphere to musical attributes. A detailed description is provided (e.g., the musical form and the way the instruments are played)</td>
<td align="left">&#x201c;I find the composition convincing because the long notes (played by the violin) generate a feeling of width and yet (because of the high notes) sound quite excited and dramatic, especially at the beginning. The fast (xylophone?) notes that go up and down the scale have a bright sound and are reminiscent of stars. The flourish at the beginning could suggest that a scenery of spectacular surroundings is just revealing itself to the audience.&#x201d; (VP_610)</td>
</tr>
</tbody>
</table>
</table-wrap>
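<p>The linearly weighted Cohen&#x2019;s kappa reported above can be sketched in pure Python. This sketch uses the usual definition with disagreement weights |i - j| / (k - 1) for k ordered codes. It is not necessarily the routine the authors used, and the ratings below are invented.</p>
<preformat preformat-type="code">
```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    rater_a, rater_b -- codes assigned by two raters to the same answers
    categories       -- all possible codes, in their natural order
    """
    k = len(categories)
    position = {code: idx for idx, code in enumerate(categories)}
    n = len(rater_a)
    joint = Counter(zip(rater_a, rater_b))
    margin_a = Counter(rater_a)
    margin_b = Counter(rater_b)
    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for code_a in categories:
        for code_b in categories:
            weight = abs(position[code_a] - position[code_b]) / (k - 1)
            observed += weight * joint[(code_a, code_b)] / n
            expected += weight * (margin_a[code_a] / n) * (margin_b[code_b] / n)
    return 1.0 - observed / expected


# Invented codes (0-3 points) from two raters for eight answers.
rater_1 = [0, 1, 2, 3, 2, 1, 1, 3]
rater_2 = [0, 1, 2, 2, 2, 1, 0, 3]
print(round(linear_weighted_kappa(rater_1, rater_2, [0, 1, 2, 3]), 3))
```
</preformat>
<p>With linear weights, a disagreement of one point is penalized less than a disagreement of three points. This suits ordered codes such as the 0 to 3 scheme in Table 1.</p>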
</sec>
<sec id="s2-3-3">
<title>Item Selection</title>
<p>The analyses were carried out with R version 3.6.2 (<xref ref-type="bibr" rid="B35">R Core Team, 2019</xref>) with the packages TAM (<xref ref-type="bibr" rid="B36">Robitzsch et&#x20;al., 2020</xref>) and eRm (<xref ref-type="bibr" rid="B27">Mair et&#x20;al., 2020</xref>). A one-dimensional partial credit model was estimated with the data we collected for the main study. Due to computational reasons, we conducted analyses for participants that had values for at least eight test items (pairwise deletion). 27 participants had to be excluded and analyses were conducted with <italic>n</italic>&#x20;&#x3d; 440. Missing values were not imputed.</p>
<p>We ensured that standards related to classical test theory were met (<xref ref-type="bibr" rid="B47">Wu et&#x20;al., 2016</xref>, ch. 5). Item categories were collapsed if the relative frequency of the category was below 5%. We also monitored whether the item difficulty (i.e.,&#x20;Thurstonian thresholds) of the item categories appeared in the right order. As part of the criteria for Rasch conformity, Mean Squared Residual (MSQ) based item fit indices were calculated, considering conventional cut-off criteria (<xref ref-type="bibr" rid="B2">Ames and Penfield, 2015</xref>; <xref ref-type="bibr" rid="B3">Bond and Fox, 2015</xref>). In addition, item discrimination was determined as the point-biserial correlation of the item response category with the person ability (WLE) measured in the test. In a visual inspection, the expected item characteristic curves were compared with the empirically observed&#x20;ones.</p>
<p>The global fit of the model and the assumption of local stochastic independence were examined with Q3 statistics, graphical model tests, and the Wald test. While 2.33% (Q3) and 0.67% (aQ3) of all 600 item pairs showed values above the cut-off criterion &#x3e; 0.2 (<xref ref-type="bibr" rid="B4">Chen and Thissen, 1997</xref>), the mean of all Q3 and aQ3 values was close to zero (Q3: <italic>M &#x3d;</italic> &#x2212;0.05, <italic>SD</italic> &#x3d; 0.07; aQ3: <italic>M</italic>&#x20;&#x3c; 0.01, <italic>SD</italic> &#x3d; 0.07). Andersen&#x2019;s likelihood ratio test showed a significant result when the sample was split into two subsamples using a random split criterion (even vs. uneven case number) and gender as a split criterion (male vs. not male). Therefore, we also conducted graphical likelihood ratio model tests. No item categories were graphical outliers. The confidence ellipses intersected or were close to the identity line (<italic>x</italic>&#x20;&#x3d; <italic>y</italic> line; <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>). Wald test results did not show anomalies, except for one item response category in the gender subsamples.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Graphical model test (split criterion: even vs. odd case number).</p>
</caption>
<graphic xlink:href="feduc-06-668538-g002.tif"/>
</fig>
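<p>Q3 is the correlation, across persons, of two items&#x2019; residuals (observed minus model-expected scores). aQ3 subtracts the mean of all Q3 values. The following minimal sketch assumes that the residuals have already been computed from a fitted model and uses only persons who answered both items of a pair. The residuals are invented. With 25 items, the sketch enumerates 25 * 24 / 2 = 300 unordered pairs; counting each pair in both orders gives 600.</p>
<preformat preformat-type="code">
```python
import math
from itertools import combinations


def correlation(xs, ys):
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


def q3_statistics(residuals):
    """Yen's Q3 for every unordered item pair, plus the adjusted aQ3.

    residuals[p][i] -- observed minus model-expected score of person p on
                       item i, or None if the response is missing
    """
    n_items = len(residuals[0])
    q3 = {}
    for i, j in combinations(range(n_items), 2):
        # Use only persons who answered both items (pairwise complete).
        pairs = [(row[i], row[j]) for row in residuals
                 if row[i] is not None and row[j] is not None]
        q3[(i, j)] = correlation([a for a, _ in pairs], [b for _, b in pairs])
    mean_q3 = sum(q3.values()) / len(q3)
    # aQ3 centres Q3 on its mean, offsetting Q3's slightly negative
    # expected value under local independence.
    aq3 = {pair: value - mean_q3 for pair, value in q3.items()}
    return q3, aq3


def share_above(stats, cutoff=0.2):
    """Proportion of item pairs whose statistic exceeds the cut-off."""
    return sum(value > cutoff for value in stats.values()) / len(stats)


# Invented residuals for four persons and three items.
residuals = [
    [0.6, 0.5, -0.3],
    [-0.4, -0.6, 0.2],
    [0.3, None, -0.1],
    [-0.5, -0.2, 0.4],
]
q3, aq3 = q3_statistics(residuals)
print(share_above(q3), share_above(aq3))
```
</preformat>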
<p>To ensure the fairness of the test, we conducted analyses for differential item functioning (DIF) with the group variables gender, language use at home, and with a variable specifying whether participants received musical instrument lessons. We followed the categorization proposed by the Educational Testing Service, assuming that an effect size &#x2265; 0.64 logits indicates moderate to large DIF (<xref ref-type="bibr" rid="B44">Trendtel et&#x20;al., 2016</xref>, p. 131). Following this categorization, one item showed moderately significant DIF for participants who did not receive musical instrument lessons (&#x2212;0.69 logits), and one item showed significant DIF for male participants (&#x2212;0.72 logits). We kept both items in the item pool because the irregularities were not deemed detrimental.</p>
<p>None of the remaining 25 test items were eliminated from the item pool that was used for the final computation of the model. However, eight item categories had to be collapsed due to misfitting item characteristics in terms of item difficulty.</p>
</sec>
<sec id="s2-3-4">
<title>Modeling Competency Levels</title>
<p>Conclusions about the nature of music-related argumentation can only be drawn through a content-related description of competency levels. Following this approach, the requirements that the participants must meet during the test can be determined. An important prerequisite for the criterion-related descriptions of the competency levels is the IRT scaling of the&#x20;test.</p>
<p>In accordance with the bookmark method, a standard-setting procedure (e.g., <xref ref-type="bibr" rid="B26">Lewis et&#x20;al., 2012</xref>), criterion-referenced competency levels were derived from the empirical data using external criteria from the theoretical model (<xref ref-type="bibr" rid="B38">Rolle, 2013</xref>) and the coding schemes. In the standard-setting procedure, item response categories were ordered in an item-person map (Wright map) by their Thurstonian thresholds, that is, by the ability at which participants have a 65% probability of solving the item response category (Wright map created with the R package WrightMap; <xref ref-type="bibr" rid="B41">Torres Irribarra and Freund, 2020</xref>). During the test design process, we had already examined several skills and abilities that participants needed to master to solve the items. On the one hand, the coding schemes contained detailed information about which competencies were required to solve a given item response category. On the other hand, the test items had been designed based on Rolle&#x2019;s model and school curricula. These frameworks include assumptions about abilities and task characteristics that can be crucial to the item-solving process (e.g., reference to salient vs. differentiated musical attributes, taking into account the expressive qualities of the music, and dealing with different perspectives on the musical piece).</p>
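<p>The 65% criterion can be made concrete for a partial credit model. The Python sketch below is an illustration and not the WrightMap implementation. It computes PCM category probabilities from step parameters and then uses bisection to find the Thurstonian threshold of category <italic>k</italic>, i.e., the ability at which the probability of scoring <italic>k</italic> or higher equals 0.65. The step parameters are assumed to come from an already estimated model.</p>

```python
import math

def pcm_category_probs(theta, deltas):
    """Partial credit model probabilities P(X = x | theta) for x = 0..m.

    deltas: step parameters delta_1..delta_m of one item, in logits.
    """
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + theta - delta)  # cumulative sums
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def prob_at_least(theta, deltas, k):
    """P(X >= k | theta): probability of reaching category k or higher."""
    return sum(pcm_category_probs(theta, deltas)[k:])

def thurstonian_threshold(deltas, k, rp=0.65, low=-10.0, high=10.0, tol=1e-8):
    """Ability at which P(X >= k) equals the response probability rp.

    P(X >= k) increases monotonically with theta, so bisection finds the crossing.
    """
    if k not in range(1, len(deltas) + 1):
        raise ValueError("k must be a category between 1 and m")
    while high - low > tol:
        mid = (low + high) / 2
        if prob_at_least(mid, deltas, k) > rp:
            high = mid
        else:
            low = mid
    return (low + high) / 2

steps = [-1.2, 0.3, 1.5]  # hypothetical step parameters of one item
print([round(thurstonian_threshold(steps, k), 2) for k in (1, 2, 3)])
```

For a dichotomous item with difficulty <italic>d</italic>, the 65% threshold lies at <italic>d</italic> + ln(0.65/0.35), about 0.62 logits above the usual 50% location.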
<p>Based on these a priori assumptions, three of the authors discussed which cognitive processes played a crucial role in solving each item. Every item response category in the ordered item booklet was reviewed and discussed in depth in terms of the relevant &#x201c;knowledge, skills, and abilities&#x201d; (<xref ref-type="bibr" rid="B18">Karantonis and Sireci, 2006</xref>, p. 5) that participants had to master to solve it. In this manner, cut scores were set&#x20;for the competency levels: a student at a given level is&#x20;expected to have mastered the items below the cut score but not yet the items above the bookmark, where mastery means a 65% probability of solving the item. These cut scores were discussed and readjusted in an iterative process.</p>
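<p>Once the ordered item booklet exists, a bookmark can be translated into a cut score on the logit scale. The Python sketch below assumes that the locations are the 65% thresholds sorted from easiest to hardest, and it takes the midpoint between the last mastered category and the bookmarked one. This midpoint rule is an assumption (other variants use the location of the bookmarked category itself), and the function name is hypothetical.</p>

```python
def bookmark_cut_score(ordered_locations, bookmark_index):
    """Cut score (logits) implied by a bookmark in an ordered item booklet.

    ordered_locations: 65% threshold locations of the item response categories,
        sorted from easiest to hardest.
    bookmark_index: position of the first category that a borderline student
        is not yet expected to master; all earlier categories are mastered.
    Assumption: the cut score is the midpoint between the last mastered
    category and the bookmarked one.
    """
    locations = list(ordered_locations)
    if any(a > b for a, b in zip(locations, locations[1:])):
        raise ValueError("locations must be sorted from easiest to hardest")
    if bookmark_index not in range(1, len(locations)):
        raise ValueError("the bookmark must lie between two categories")
    return (locations[bookmark_index - 1] + locations[bookmark_index]) / 2

# Hypothetical booklet with five category locations and a bookmark before the fourth.
print(bookmark_cut_score([-1.5, -0.75, 0.25, 0.75, 1.75], 3))  # 0.5
```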
</sec>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec id="s3-1">
<title>The Test</title>
<p>The final MARKO test consisted of 25 items (23 polytomous and 2 dichotomous items). The selected items showed appropriate infit (0.79&#x2013;1.22) and outfit values (0.78&#x2013;1.36; see <xref ref-type="sec" rid="s9">Supplementary Table S2</xref>). Relative item frequency (the proportion of participants who solved an item response category) ranged from 0.05 to 0.79. The Thurstonian thresholds were correctly ordered. EAP/PV reliability was 0.91, and WLE reliability was 0.90 (see <xref ref-type="bibr" rid="B1">Adams, 2005</xref>; <xref ref-type="bibr" rid="B39">Rost, 2004</xref>, pp. 380&#x2013;382, for an overview of test reliability measures).</p>
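<p>For readers less familiar with these fit statistics, the following Python sketch shows how infit and outfit mean squares are commonly computed from residuals. Outfit is the unweighted mean of squared standardized residuals, and infit weights the squared residuals by the model variance. The expected scores and variances are assumed to be supplied by a fitted model, and missing responses (NaN) are skipped. This is an illustration, not the software used in the study.</p>

```python
import numpy as np

def item_fit_mean_squares(observed, expected, variance):
    """Infit and outfit mean squares for each item (column).

    observed, expected, variance: arrays of shape (n_persons, n_items) with the
    observed scores, model-expected scores E(X), and model variances Var(X).
    Missing responses are coded as NaN in `observed` and are ignored.
    """
    x = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    w = np.asarray(variance, dtype=float)
    answered = ~np.isnan(x)
    squared_residuals = np.where(answered, (x - e) ** 2, 0.0)
    info = np.where(answered, w, 0.0)
    # Outfit: unweighted mean of squared standardized residuals (x - e)^2 / w.
    outfit = (squared_residuals / w).sum(axis=0) / answered.sum(axis=0)
    # Infit: squared residuals weighted by the model variance (information).
    infit = squared_residuals.sum(axis=0) / info.sum(axis=0)
    return infit, outfit
```

Values close to 1 indicate response patterns about as variable as the model predicts. Infit is less sensitive than outfit to unexpected responses from persons far from an item's difficulty.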
</sec>
<sec id="s3-2">
<title>The Model</title>
<p>In addition to a one-dimensional model (model A), we also estimated two two-dimensional models (<xref ref-type="table" rid="T2">Table&#x20;2</xref>). In model B, four items addressing the social context of the presented music were assigned to a second dimension. In model C, single item response categories focusing on the social context of the music were assigned to a second dimension. In both models, the two dimensions correlated at <italic>r</italic>&#x20;&#x3d; 0.96. Although model B had a marginally lower AIC than model A, all other information criteria favored model A. Exploratory factor analyses did not yield interpretable results because of low or double loadings. Taken together, the analyses indicate that the item pool does not support multidimensionality. The resulting model is therefore a one-dimensional&#x20;PCM.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
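<p>Information criteria for the estimated models. Model A is a one-dimensional model. In model B, four items were assigned to a second dimension; in model C, single item response categories were assigned to a second dimension. Loglike, log-likelihood; Deviance, &#x2212;2 &#xd7; log-likelihood; Npars, number of estimated parameters; Nobs, number of observations; AIC, Akaike information criterion; BIC, Bayesian information criterion; AIC3, AIC with a penalty of three per parameter; AICc, AIC corrected for small samples; CAIC, consistent AIC. Lower values indicate better fit.</p>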
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Model</th>
<th align="center">Loglike</th>
<th align="center">Deviance</th>
<th align="center">Npars</th>
<th align="center">Nobs</th>
<th align="center">AIC</th>
<th align="center">BIC</th>
<th align="center">AIC3</th>
<th align="center">AICc</th>
<th align="center">CAIC</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Model A</td>
<td align="char" char=".">&#x2212;6,720.02</td>
<td align="char" char=".">13,440.03</td>
<td align="char" char=".">54</td>
<td align="char" char=".">440</td>
<td align="char" char=".">13,548.03</td>
<td align="char" char=".">13,768.72</td>
<td align="char" char=".">13,602.03</td>
<td align="char" char=".">13,563.46</td>
<td align="char" char=".">13,822.72</td>
</tr>
<tr>
<td align="left">Model B</td>
<td align="char" char=".">&#x2212;6,641.70</td>
<td align="char" char=".">13,283.39</td>
<td align="char" char=".">129</td>
<td align="char" char=".">440</td>
<td align="char" char=".">13,541.39</td>
<td align="char" char=".">14,068.59</td>
<td align="char" char=".">13,670.39</td>
<td align="char" char=".">13,649.58</td>
<td align="char" char=".">14,197.59</td>
</tr>
<tr>
<td align="left">Model C</td>
<td align="char" char=".">&#x2212;8,421.70</td>
<td align="char" char=".">16,843.40</td>
<td align="char" char=".">107</td>
<td align="char" char=".">440</td>
<td align="char" char=".">17,057.40</td>
<td align="char" char=".">17,494.69</td>
<td align="char" char=".">17,164.40</td>
<td align="char" char=".">17,127.02</td>
<td align="char" char=".">17,601.69</td>
</tr>
</tbody>
</table>
</table-wrap>
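<p>The information criteria in Table 2 follow from the deviance, the number of parameters, and the number of observations. The Python sketch below applies the standard formulas. With the deviances and parameter counts from the table, it reproduces the reported values up to rounding in the last digit.</p>

```python
import math

def information_criteria(deviance, n_pars, n_obs):
    """Information criteria computed from the deviance (-2 log-likelihood)."""
    log_n = math.log(n_obs)
    aic = deviance + 2 * n_pars
    return {
        "AIC": aic,
        "BIC": deviance + log_n * n_pars,
        "AIC3": deviance + 3 * n_pars,
        "AICc": aic + 2 * n_pars * (n_pars + 1) / (n_obs - n_pars - 1),
        "CAIC": deviance + (log_n + 1) * n_pars,
    }

# Deviance and number of parameters as reported in Table 2 (Nobs = 440).
TABLE_2 = {
    "Model A": (13440.03, 54),
    "Model B": (13283.39, 129),
    "Model C": (16843.40, 107),
}

for model, (deviance, n_pars) in TABLE_2.items():
    criteria = information_criteria(deviance, n_pars, n_obs=440)
    print(model, {name: round(value, 2) for name, value in criteria.items()})
```

With these numbers, only AIC is marginally lower for model B than for model A. BIC, AIC3, AICc, and CAIC all favor model A, and model C performs worst on every criterion.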
</sec>
<sec id="s3-3">
<title>Competency Levels and Proficiency Scores</title>
<p>Four competency levels were derived from our data following the standard-setting approach described in <italic>Modeling Competency Levels</italic>. The item categories were ordered by their item difficulty (i.e.,&#x20;the location at which the solution probability is 65%) in a Wright map (see <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>). For each item, the abilities that participants had to master to solve it were identified, and conclusions about the competency levels were&#x20;drawn from them.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Wright map (person-item map) with item categories ordered by 65% solution probability. On this map, the difficulties of the item response categories (Thurstonian thresholds) are placed on the same metric scale as the proficiency scores (person ability measures). The left side of the figure shows a histogram of the proficiency scores, and the right side shows the difficulties of the item categories.</p>
</caption>
<graphic xlink:href="feduc-06-668538-g003.tif"/>
</fig>
<p>Individuals on the lowest level, level A, express their own opinions about the music and refer to salient musical attributes (e.g., &#x201c;loud&#x201d; and &#x201c;fast&#x201d;) in their judgments. Participants on level B additionally report various opinions on the music. Whereas individuals on level B refer to several salient musical attributes, students on level C refer to musical attributes in detail in their judgments. Finally, individuals on level D discuss different opinions on the music and take into account the social and cultural context of the music (see <xref ref-type="table" rid="T3">Table&#x20;3</xref>).</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Description of the competency levels. The sample answers are taken from the two sample items &#x201c;Star Wars&#x201d; (<xref ref-type="fig" rid="F1">Figure&#x20;1</xref>) and &#x201c;Eurovision Song Contest&#x201d; (see section 1 in the <xref ref-type="sec" rid="s9">Supplementary Material</xref>; solution probability 65%). The second column (&#x201c;Logits&#x201d;) refers to the person ability scores (weighted likelihood estimation [WLE]). The table should be read from bottom to&#x20;top.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Level</th>
<th align="center">Logits</th>
<th align="center">Description of competency levels</th>
<th align="center">Sample answers</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="4" align="left">D</td>
<td rowspan="4" align="center">&#x3e;2.22</td>
<td align="left">Individuals have mastered competencies on levels A, B, and C and are able to</td>
<td rowspan="4" align="left">&#x201c;Women empowerment is a very current topic that is important. It is good that artists are setting an example. Sometimes the lyrics are one-dimensional because women also &#x201c;play&#x201d; with women. But often it is the other way around and has been the case for centuries due to the unfair distribution of power, where women are neglected. Maybe she should have sung &#x201c;I&#x2019;m not a toy, for no one&#x201d; or something like that, which emphasizes the idea of equality. She represents a strong image of women, which is definitely socially critical. Because of the &#x201c;crackling,&#x201d; as sascha calls it, the song is unusual and different and differs from the social norm that influences the masses, as sascha, and 367 other people show. Have fun with your followers and mainstream boredom.&#x201d; (VP_89)</td>
</tr>
<tr>
<td align="left">&#xa0;&#x2022;&#xa0;discuss different opinions on the music</td>
</tr>
<tr>
<td align="left">&#xa0;&#x2022;&#xa0;refer to musical norms and genre conventions in their reasoning</td>
</tr>
<tr>
<td align="left">&#xa0;&#x2022;&#xa0;take into account the social and cultural context of the music presented</td>
</tr>
<tr>
<td rowspan="3" align="left">C</td>
<td rowspan="3" align="center">&#x2264;2.22</td>
<td align="left">Individuals have mastered competencies on levels A and B and are able to</td>
<td rowspan="3" align="left">&#x201c;Yes, I find it very well done. The sound layers depict the infinite vastness of the universe &#x2026; the synthesizers give the piece a futuristic character &#x2026; single high notes to illustrate the stars.&#x201d; (VP_589)</td>
</tr>
<tr>
<td align="left">&#xa0;&#x2022;&#xa0;base music-related judgments on detailed references to musical attributes and link them to the expressive quality and function of the music</td>
</tr>
<tr>
<td align="left">&#xa0;&#x2022;&#xa0;refer to basic knowledge of musical norms and genre conventions in their reasoning</td>
</tr>
<tr>
<td rowspan="3" align="left">B</td>
<td rowspan="3" align="center">&#x2264;0.45</td>
<td align="left">Individuals have mastered competencies on level A and are able to</td>
<td rowspan="3" align="left">&#x201c;The singer addresses a very important and current topic: social equality. However, I think the point is not convincingly communicated. The lyrics are presented with humor and thus don&#x2019;t mean anything.&#x201d; (VP_142)</td>
</tr>
<tr>
<td align="left">&#xa0;&#x2022;&#xa0;report various opinions on the music</td>
</tr>
<tr>
<td align="left">&#xa0;&#x2022;&#xa0;base music-related judgments on several salient musical attributes (e.g., tempo, dynamics, intonation, and genre characteristics) and link them to the expressive quality and function of the music</td>
</tr>
<tr>
<td rowspan="4" align="left">A</td>
<td rowspan="4" align="center">&#x2264;&#x20;&#x2212;0.83</td>
<td align="left">Individuals are able to</td>
<td rowspan="4" align="left">&#x201c;Yes, because of the atmosphere that exists in space. The composer presented this very well.&#x201d; (VP_661)</td>
</tr>
<tr>
<td align="left">&#xa0;&#x2022;&#xa0;express their own opinions about the music</td>
</tr>
<tr>
<td align="left">&#xa0;&#x2022;&#xa0;base music-related judgments on salient musical attributes (e.g., tempo, dynamics, intonation, and genre characteristics)</td>
</tr>
<tr>
<td align="left">&#xa0;&#x2022;&#xa0;refer to the expressive quality and function of the music in their reasoning</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>While most ninth graders achieved competency level A, the majority of twelfth graders and university students obtained competency level C or D (<xref ref-type="fig" rid="F4">Figure&#x20;4</xref>).</p>
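<p>Assigning a participant to a competency level, as summarized in Figure 4, amounts to comparing the WLE score with the cut scores in Table 3. The Python sketch below assumes that the values in Table 3 are inclusive upper bounds, so that every score at or below &#x2212;0.83 logits counts as level A, and that level D covers all scores above 2.22 logits. The example scores are hypothetical.</p>

```python
from collections import Counter

# Inclusive upper bounds (WLE logits) of levels A to C, as listed in Table 3.
LEVEL_UPPER_BOUNDS = [("A", -0.83), ("B", 0.45), ("C", 2.22)]

def competency_level(wle_score):
    """Return the competency level (A-D) for a WLE proficiency score."""
    for level, upper_bound in LEVEL_UPPER_BOUNDS:
        if upper_bound >= wle_score:
            return level
    return "D"  # all scores above 2.22 logits

def level_shares(wle_scores):
    """Relative frequency of each competency level in a group of participants."""
    counts = Counter(competency_level(score) for score in wle_scores)
    return {level: counts[level] / len(wle_scores) for level in "ABCD"}

# Hypothetical WLE scores for a small group.
print(level_shares([-1.3, -0.2, 0.4, 1.1, 2.9]))
# {'A': 0.2, 'B': 0.4, 'C': 0.2, 'D': 0.2}
```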
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Relative frequency of competency levels among grades (<italic>n</italic>&#x20;&#x3d; 439; Grade 9: <italic>n</italic>&#x20;&#x3d; 147; Grade 10: <italic>n</italic>&#x20;&#x3d; 108; Grade 11: <italic>n</italic>&#x20;&#x3d; 126; Grade 12: <italic>n</italic>&#x20;&#x3d; 24; University: <italic>n</italic>&#x20;&#x3d; 34).</p>
</caption>
<graphic xlink:href="feduc-06-668538-g004.tif"/>
</fig>
<p>We estimated the participants&#x2019; proficiency scores (person ability) using WLE (<xref ref-type="fig" rid="F5">Figure&#x20;5</xref>). The proficiency scores differed significantly between students in different grades, <italic>F</italic> (4, 434) &#x3d; 44.85, <italic>p</italic>&#x20;&#x3c; 0.01, &#x3b7;<sup>2</sup> &#x3d; 0.29. Post hoc analyses were conducted using pairwise <italic>t</italic>&#x20;tests with pooled <italic>SD.</italic> With two exceptions (Grade 10 vs. Grade 11 and Grade 12 vs. university), the comparisons showed significant results with medium to large effect sizes (see <xref ref-type="sec" rid="s9">Supplementary Table S3</xref>). Female participants performed moderately better than male participants, <italic>t</italic> (418.58) &#x3d; &#x2212;4.21, <italic>p</italic>&#x20;&#x3c; 0.01, <italic>&#x3b4;</italic> &#x3d; 0.41. Students who were taking musical instrument or voice lessons also had considerably better results, <italic>t</italic> (319.66) &#x3d; &#x2212;6.74, <italic>p</italic>&#x20;&#x3c; 0.01, <italic>&#x3b4;</italic> &#x3d; 0.68. Participants who mostly spoke German at home also performed significantly better, <italic>t</italic> (419.13) &#x3d; &#x2212;4.42, <italic>p</italic>&#x20;&#x3c; 0.01, <italic>&#x3b4;</italic>&#x20;&#x3d; 0.42. The general musical sophistication of the participants was assessed with a subscale of the Gold-MSI (<xref ref-type="bibr" rid="B31">M&#xfc;llensiefen et&#x20;al., 2014</xref>) consisting of 18 items (<italic>&#x3b1;</italic> &#x3d; 0.87). One item had to be removed from the scale because of a low item-total correlation. The proficiency scores and the mean general musical sophistication scores were significantly correlated, <italic>r</italic>&#x20;&#x3d; 0.41, <italic>p</italic>&#x20;&#x3c;&#x20;0.01.</p>
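<p>Two of the reported quantities can be checked or reproduced with short calculations. The Python sketch below recovers &#x3b7;<sup>2</sup> from the <italic>F</italic> statistic and its degrees of freedom. It also computes Welch's <italic>t</italic> statistic with Welch&#x2013;Satterthwaite degrees of freedom. That the group comparisons used Welch's test is an inference from the fractional degrees of freedom, not something stated in the text.</p>

```python
import math
from statistics import mean, variance

def eta_squared_from_f(f_value, df_between, df_within):
    """Eta squared of a one-way ANOVA, recovered from F and its degrees of freedom."""
    scaled_f = f_value * df_between  # equals SS_between / MS_within
    return scaled_f / (scaled_f + df_within)

def welch_t_test(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    se2_a = variance(group_a) / n_a  # squared standard error of the mean of A
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Grade comparison reported above: F(4, 434) = 44.85.
print(round(eta_squared_from_f(44.85, 4, 434), 2))  # 0.29
```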
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Distribution of proficiency scores (weighted likelihood estimation) among grades as density plots.</p>
</caption>
<graphic xlink:href="feduc-06-668538-g005.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>The two main goals of this study were 1) to design an assessment test for music-related argumentation that fulfills psychometric criteria and 2) to model competency levels based on empirical data to describe the cognitive abilities that people must master when engaging in music-related argumentation. Competency modeling in this area of research is still in its infancy, and our project is one of the first empirical endeavors in this&#x20;field.</p>
<p>The test items were developed considering the competency requirements in German school curricula and the theoretical assumptions about music-related argumentation by <xref ref-type="bibr" rid="B38">Rolle (2013)</xref>. The test meets several psychometric criteria. The coding schemes for the exclusively open-ended test items yielded high inter-rater reliability. The empirical results show that the selected items represent a one-dimensional ability construct. EAP reliability was 0.91, and global fit indices ranged from acceptable to good. Local independence was examined with Q3 indices, and subgroup invariance with graphical model tests and DIF analyses. From the empirical data, we were able to model competency levels following a standard-setting procedure. The resulting model describes the competencies that people show when giving reasons for their judgments about&#x20;music.</p>
<p>We were able to model four competency levels that describe various aspects of music-related argumentative competence. While persons on the lowest level (A) are able to justify their judgments about music by referring to salient musical attributes or the overall atmosphere of the presented piece, persons on the highest level (D) are able to consider the social and cultural context of the music as well as genre conventions.</p>
<p>Our findings empirically support some assumptions of the theoretical competency model of music-related argumentation proposed by <xref ref-type="bibr" rid="B38">Rolle (2013)</xref> while challenging others. Whereas the theoretical model describes only in general terms the ability to refer to the &#x201c;objective&#x201d; properties of music, the empirical study shows that actual response behavior is more complex. Participants apparently find it easier to refer to the music as a whole by means of salient musical attributes (e.g., &#x201c;loud&#x201d; and &#x201c;fast&#x201d;). A differentiated reference to the presented music (e.g., to single passages) occurs only on levels C and D. This phenomenon was also described in the KoMus project, which dealt with the competence to perceive and contextualize music (<xref ref-type="bibr" rid="B15">Jordan et&#x20;al., 2012</xref>). In our empirical data, we also did not find the distinction made in the theoretical model, which assumes that on lower levels, people refer either to the objective properties of the music or to subjective impressions. At level A, for example, a salient musical attribute, such as &#x201c;quiet,&#x201d; is used to describe the expressive quality of the music. We also found an even higher competency level, level E, as predicted by the theoretical model. Unfortunately, because of its low relative response frequency (&#x3c;5%), the corresponding item categories had to be collapsed. Future studies could include more university students to collect enough data on higher competency levels. In the theoretical model, <xref ref-type="bibr" rid="B38">Rolle (2013)</xref> assumed that people at a very low achievement level refer to authorities in their statements. This level was not found in our project, in the pretest sessions, or by <xref ref-type="bibr" rid="B22">Kn&#xf6;rzer et&#x20;al. (2016)</xref>.</p>
<p>While almost no ninth graders achieved competency level D, most of the twelfth graders and university students reached level C or D. In line with general expectations, students from higher grade levels performed significantly better on the test. Female participants performed better than male participants. Students who took musical instrument lessons and participants who mostly spoke German at home also performed significantly better. <xref ref-type="bibr" rid="B11">Hasselhorn and Lehmann (2015)</xref> and <xref ref-type="bibr" rid="B14">Jordan (2014</xref>, pp. 141&#x2013;142) likewise showed in their studies on music-related competencies that female participants and students who took musical instrument lessons performed significantly better on music competency tests. While the latter finding is not surprising and suggests that students who have acquired skills on a musical instrument perform better on music-related assessment tests, more research is needed on the relationship between test performance and gender. Preliminary path analyses of our data show only a small effect of gender on proficiency scores when the variable &#x201c;musical instrument lessons&#x201d; is controlled for. Gender-specific aspects of the music classroom have hardly been researched, but <xref ref-type="bibr" rid="B12">He&#xdf; (2018)</xref> as well as <xref ref-type="bibr" rid="B7">Fiedler and Hasselhorn (2020)</xref> showed that girls have a higher musical self-concept than&#x20;boys.</p>
<p>Although our findings are promising, our study has some limitations. As mentioned earlier, the participants&#x2019; processing time varied greatly. More competent students tended to write longer statements and therefore did not process as many items as less competent students did. Hence, the data of this study contain systematic missing values that are correlated with the participants&#x2019; proficiency scores (<italic>r</italic>&#x20;&#x3d; &#x2212;0.51, <italic>p</italic>&#x20;&#x3c;&#x20;0.01).</p>
<p>In his theoretical model, <xref ref-type="bibr" rid="B38">Rolle (2013)</xref> took into account that argumentation is an interactive event and theorized about how individuals deal with counterarguments presented by an opponent. Although we designed several items imitating dialogical situations (e.g., the item &#x201c;Eurovision Song Contest&#x201d;; see section 1 in the <xref ref-type="sec" rid="s9">Supplementary Material</xref>), an assessment test can never be as interactive as a real conversation with an actual person. Interactive research settings, such as group discussions, could provide information about interactive verbal exchanges (see also <xref ref-type="bibr" rid="B6">Ehninger, 2021</xref>, for the impact of research settings and methodology on research on music-related argumentative competence).</p>
<p>The construct validity of the test instrument needs to be examined in future studies. General language skills likely play an essential role when reasoning about music. This interrelation should be explored in more detail, not least to assess the discriminant validity of the MARKO test. Future studies could develop a shortened version of the MARKO test that leaves extra time to assess participants&#x2019; language skills. With this approach, one could analyze the interaction of domain-specific and overall language competencies. Research has shown that students&#x2019; argumentative and linguistic skills are crucial to their domain-specific learning and overall educational success (<xref ref-type="bibr" rid="B30">Morek et&#x20;al., 2017</xref>). Furthermore, it remains uncertain which other music-related factors influence participants&#x2019; test results. In this study, we showed that participants who took musical instrument lessons performed significantly better on the test. Analyses showed a correlation between general musical sophistication and proficiency scores. It remains open whether musical preferences affected the test performance of the participants. In our study, musical preferences were assessed via a selection of six audio excerpts from six test items. The participants listened to these audio excerpts at the end of the session. However, we did not collect enough data to draw conclusions about the relationship between music-related argumentation and musical preferences.</p>
<p>Our study yielded important findings in a field that has been little researched. The one-dimensional model derived from empirical data allows a detailed description of four competency levels. On the basis of these levels, conclusions can be drawn about the attainment of music-related argumentative competence. The MARKO competency test and model can thus help enhance the understanding of learning processes and improve the assessment of music-related argumentative competence.</p>
</sec>
</body>
<back>
<sec id="s5">
<title>Data Availability Statement</title>
<p>The datasets presented in this study can be found in the following online repository: <ext-link ext-link-type="uri" xlink:href="https://osf.io/7tm9x/?view_only=87043a1db7c942dc9a8851af25025026">https://osf.io/7tm9x/?view_only&#x3d;87043a1db7c942dc9a8851af25025026</ext-link>
</p>
</sec>
<sec id="s6">
<title>Ethics Statement</title>
<p>The data for the study were collected in accordance with the guidelines of the state of North Rhine-Westphalia, Germany [Schulgesetz f&#xfc;r das Land Nordrhein-Westfalen (SchulG)]: BASS &#xa7; 120 Abs. 4 SchulG. Written informed consent to participate in this study was provided by the participants&#x2019; legal guardian/next of&#x20;kin.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>JE, JK, and CR contributed to the conception and design of the study. Test items for the first piloting study were designed by JE, JK, CR, and university students who also collected data for the first pilot study. JE revised and designed new items for the second pilot study, supervised by JK and CR. JE collected the data for the second piloting phase and the main study and performed statistical analyses. MS and JK supported the analyses. JE drafted the manuscript. All other authors critically revised the manuscript and improved it. All authors approved the final version of the manuscript.</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s9">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/feduc.2021.668538/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/feduc.2021.668538/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Adams</surname>
<given-names>R. J.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Reliability as a Measurement Design Effect</article-title>. <source>Stud. Educ. Eval.</source> <volume>31</volume> (<issue>2</issue>), <fpage>162</fpage>&#x2013;<lpage>172</lpage>. <pub-id pub-id-type="doi">10.1016/j.stueduc.2005.05.008</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ames</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Penfield</surname>
<given-names>R. D.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>An NCME Instructional Module on Item-Fit Statistics for Item Response Theory Models</article-title>. <source>Educ. Meas. Issues Pract.</source> <volume>34</volume> (<issue>3</issue>), <fpage>39</fpage>&#x2013;<lpage>48</lpage>. <pub-id pub-id-type="doi">10.1111/emip.12067</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bond</surname>
<given-names>T. G.</given-names>
</name>
<name>
<surname>Fox</surname>
<given-names>C. M.</given-names>
</name>
</person-group> (<year>2015</year>). <source>Applying the Rasch Model. Fundamental Measurement in the Human Sciences</source>. <publisher-loc>London</publisher-loc>: <publisher-name>Routledge</publisher-name>.
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>W.-H.</given-names>
</name>
<name>
<surname>Thissen</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Local Dependence Indexes for Item Pairs Using Item Response Theory</article-title>. <source>J.&#x20;Educ. Behav. Stat.</source> <volume>22</volume> (<issue>3</issue>), <fpage>265</fpage>&#x2013;<lpage>289</lpage>. <pub-id pub-id-type="doi">10.2307/1165285</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Eemeren</surname>
<given-names>F. H. v.</given-names>
</name>
<name>
<surname>Garssen</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Krabbe</surname>
<given-names>E. C. W.</given-names>
</name>
<name>
<surname>Snoeck Henkemans</surname>
<given-names>A. F.</given-names>
</name>
<name>
<surname>Verheij</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Wagemans</surname>
<given-names>J.&#x20;H. M.</given-names>
</name>
</person-group> (<year>2014</year>). <source>Handbook of Argumentation Theory</source>. <publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ehninger</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Wie l&#xe4;sst sich musikbezogene Argumentationskompetenz empirisch untersuchen? &#xdc;ber die empirische Erforschung einer facettenreichen Kompetenz</article-title>. <source>Beitr&#xe4;ge empirischer Musikp&#xe4;dagogik</source>. <volume>12</volume>, <fpage>1</fpage>&#x2013;<lpage>31</lpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://b-em.info/index.php/ojs/article/view/192">https://b-em.info/index.php/ojs/article/view/192</ext-link>
</comment> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fiedler</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Hasselhorn</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Zum Zusammenhang von musikalischem Selbstkonzept und Motivation im Musikunterricht</article-title>. <source>Beitr&#xe4;ge empirischer Musikp&#xe4;dagogik</source>. <volume>11</volume>, <fpage>1</fpage>&#x2013;<lpage>34</lpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://b-em.info/index.php/ojs/article/view/187">https://b-em.info/index.php/ojs/article/view/187</ext-link>
</comment> </citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gottschalk</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Lehmann-Wermser</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2013</year>). &#x201c;<article-title>Iteratives Forschen am Beispiel der F&#xf6;rderung musikalisch-&#xe4;sthetischer Diskursf&#xe4;higkeit</article-title>,&#x201d; in <source>Der lange Weg zum Unterrichtsdesign. Zur Begr&#xfc;ndung und Umsetzung fachdidaktischer Forschungs- und Entwicklungsprogramme</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Komorek</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Prediger</surname>
<given-names>S.</given-names>
</name>
</person-group> (<publisher-name>M&#xfc;nster: Waxmann</publisher-name>), <fpage>63</fpage>&#x2013;<lpage>78</lpage>. </citation>
</ref>
<ref id="B10">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hasselhorn</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Knigge</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>in press</year>). <source>&#x201c;Technology-Based Competency Assessment in Music Education: The KOPRA-M and KoMus Tests,&#x201d; in Testing and Feedback in Music Education &#x2013; Symposium Hannover 2017</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Lehmann-Wermser</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Breiter</surname>
<given-names>A.</given-names>
</name>
</person-group> (<publisher-name>Hannover: ifmpf</publisher-name>).</citation>
</ref>
<ref id="B11">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hasselhorn</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Lehmann</surname>
<given-names>A. C.</given-names>
</name>
</person-group> (<year>2015</year>). <source>&#x201c;Leistungsheterogenit&#xe4;t im Musikunterricht. Eine empirische Untersuchung zu Leistungsunterschieden im Bereich der Musikpraxis in Jahrgangsstufe 9,&#x201d; in Theoretische Rahmung und Theoriebildung in der musikp&#xe4;dagogischen Forschung</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Niessen</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Knigge</surname>
<given-names>J.</given-names>
</name>
</person-group> (<publisher-loc>M&#xfc;nster</publisher-loc>: <publisher-name>Waxmann</publisher-name>), <fpage>163</fpage>&#x2013;<lpage>176</lpage>.</citation>
</ref>
<ref id="B12">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>He&#xdf;</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2018</year>). <source>Gendersensibler Musikunterricht. Empirische Studien und didaktische Konsequenzen</source>. <publisher-loc>Wiesbaden</publisher-loc>: <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-3-658-19166-5</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hartig</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Klieme</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Leutner</surname>
<given-names>D.</given-names>
</name>
</person-group> <comment>Eds.</comment> (<year>2008</year>). <source>Assessment of Competencies in Educational Settings</source>. <publisher-loc>G&#xf6;ttingen</publisher-loc>: <publisher-name>Hogrefe</publisher-name>.</citation>
</ref>
<ref id="B14">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Jordan</surname>
<given-names>A.-K.</given-names>
</name>
</person-group> (<year>2014</year>). <source>Empirische Validierung eines Kompetenzmodells f&#xfc;r das Fach Musik &#x2013; Teilkompetenz, Wahrnehmen und Kontextualisieren von Musik</source>. <publisher-loc>M&#xfc;nster</publisher-loc>: <publisher-name>Waxmann</publisher-name>.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jordan</surname>
<given-names>A.-K.</given-names>
</name>
<name>
<surname>Knigge</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Lehmann</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Niessen</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lehmann-Wermser</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Entwicklung und Validierung eines Kompetenzmodells im Fach Musik: Wahrnehmen und Kontextualisieren von Musik</article-title>. <source>Z. f&#xfc;r P&#xe4;dagogik</source> <volume>58</volume> (<issue>4</issue>), <fpage>500</fpage>&#x2013;<lpage>521</lpage>. </citation>
</ref>
<ref id="B16">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Jordan</surname>
<given-names>A.-K.</given-names>
</name>
<name>
<surname>Knigge</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2010</year>). &#x201c;<article-title>The Development of Competency Models: An IRT-Based Approach to Competency Assessment in General Music Education</article-title>,&#x201d; in <source>The Practice of Assessment in Music Education: Frameworks, Models, and Designs</source>. Editor <person-group person-group-type="editor">
<name>
<surname>Brophy</surname>
<given-names>T. S.</given-names>
</name>
</person-group> (<publisher-loc>Chicago</publisher-loc>: <publisher-name>GIA</publisher-name>), <fpage>67</fpage>&#x2013;<lpage>86</lpage>. </citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kant</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>1790/2007</year>). <source>Critique of Judgment. Translated by J.&#x20;C. Meredith</source>. <publisher-loc>Oxford</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karantonis</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sireci</surname>
<given-names>S. G.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>The Bookmark Standard-Setting Method: A Literature Review</article-title>. <source>Educ. Meas. Issues Pract.</source> <volume>25</volume>, <fpage>4</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-3992.2006.00047.x</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>King</surname>
<given-names>P. M.</given-names>
</name>
<name>
<surname>Kitchener</surname>
<given-names>K. S.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Reflective Judgment: Theory and Research on the Development of Epistemic Assumptions through Adulthood</article-title>. <source>Educ. Psychol.</source> <volume>39</volume>, <fpage>5</fpage>&#x2013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1207/s15326985ep3901_2</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Knigge</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2010</year>). <source>Modellbasierte Entwicklung und Analyse von Testaufgaben zur Erfassung der Kompetenz &#x201c;Musik wahrnehmen und kontextualisieren&#x201d; [dissertation]</source>. <publisher-name>Universit&#xe4;t Bremen</publisher-name>. Available at: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://media.suub.uni-bremen.de/handle/elib/2844">https://media.suub.uni-bremen.de/handle/elib/2844</ext-link>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kn&#xf6;rzer</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Stark</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Rolle</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>&#x201c;I like Reggae and Bob Marley Is Already Dead&#x201d;: An Empirical Study on Music-Related Argumentation</article-title>. <source>Psychol. Music</source> <volume>44</volume> (<issue>5</issue>), <fpage>1158</fpage>&#x2013;<lpage>1174</lpage>. <pub-id pub-id-type="doi">10.1177/0305735615614095</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koeppen</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Hartig</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Klieme</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Leutner</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Current Issues in Competence Modeling and Assessment</article-title>. <source>Z. f&#xfc;r Psychol./J.&#x20;Psychol.</source> <volume>216</volume>, <fpage>61</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1027/0044-3409.216.2.61</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kuhn</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2005</year>). <source>Education for Thinking</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>Harvard University Press</publisher-name>.</citation>
</ref>
<ref id="B25">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Leutner</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Fleischer</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gr&#xfc;nkorn</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Klieme</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Competence Assessment in Education</source>. <publisher-loc>Basel</publisher-loc>: <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-3-319-50030-0</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname>
<given-names>D. M.</given-names>
</name>
<name>
<surname>Mitzel</surname>
<given-names>H. C.</given-names>
</name>
<name>
<surname>Mercado</surname>
<given-names>R. L.</given-names>
</name>
<name>
<surname>Patz</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Schulz</surname>
<given-names>E. M.</given-names>
</name>
</person-group> (<year>2012</year>). &#x201c;<article-title>The Bookmark Standard Setting Procedure</article-title>,&#x201d; in <source>Setting Performance Standards: Foundations, Methods, and Innovations</source>. Editor <person-group person-group-type="editor">
<name>
<surname>Cizek</surname>
<given-names>G. J.</given-names>
</name>
</person-group>. <edition>2nd ed.</edition> (<publisher-loc>New York</publisher-loc>: <publisher-name>Routledge</publisher-name>), <fpage>225</fpage>&#x2013;<lpage>253</lpage>. </citation>
</ref>
<ref id="B27">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mair</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hatzinger</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Maier</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Rusch</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Debelak</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2020</year>). <source>eRm: Extended Rasch Modeling</source>. <comment>R package version 1.0-1</comment>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=eRm">https://CRAN.R-project.org/package&#x003D;eRm</ext-link>
</comment>. </citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<collab>Ministerium f&#xfc;r Schule und Berufsbildung des Landes Schleswig-Holstein</collab> (<year>2015</year>). <source>Fachanforderungen Musik. Allgemeinbildende Schulen. Sekundarstufe I. Sekundarstufe II</source>. <publisher-loc>Kiel</publisher-loc>: <publisher-name>Ministerium f&#xfc;r Schule und Berufsbildung des Landes Schleswig-Holstein</publisher-name>.</citation>
</ref>
<ref id="B29">
<citation citation-type="book">
<collab>Ministerium f&#xfc;r Schule und Bildung des Landes Nordrhein-Westfalen</collab> (<year>2019</year>). <source>Musik. Kernlehrplan f&#xfc;r das Gymnasium Sekundarstufe I in Nordrhein-Westfalen</source>. <publisher-loc>D&#xfc;sseldorf</publisher-loc>: <publisher-name>Ministerium f&#xfc;r Schule und Bildung des Landes Nordrhein-Westfalen</publisher-name>.</citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Morek</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Heller</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Quasthoff</surname>
<given-names>U.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Erkl&#xe4;ren und Argumentieren. Modellierungen und empirische Befunde zu Strukturen und Varianzen</article-title>,&#x201d; in <source>Begr&#xfc;nden &#x2013; Erkl&#xe4;ren &#x2013; Argumentieren</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Mei&#xdf;ner</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Wyss</surname>
<given-names>E. L.</given-names>
</name>
</person-group> (<publisher-loc>T&#xfc;bingen</publisher-loc>: <publisher-name>Stauffenburg</publisher-name>), <fpage>11</fpage>&#x2013;<lpage>46</lpage>. </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>M&#xfc;llensiefen</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Gingras</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Musil</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Stewart</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>The Musicality of Non-Musicians: An Index for Assessing Musical Sophistication in the General Population</article-title>. <source>PLoS ONE</source> <volume>9</volume> (<issue>2</issue>), <fpage>e89642</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0089642</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Parsons</surname>
<given-names>M. J.</given-names>
</name>
</person-group> (<year>1987</year>). <source>How We Understand Art. A Cognitive Developmental Account of Aesthetic Experience</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation>
</ref>
<ref id="B35">
<citation citation-type="book">
<collab>R Core Team</collab> (<year>2019</year>). <source>R: A Language and Environment for Statistical Computing</source>. <comment>R&#x20;version 3.6.2</comment>. <publisher-loc>Vienna</publisher-loc>: <publisher-name>R Foundation for Statistical Computing</publisher-name>.</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robitzsch</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kiefer</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2020</year>). <source>TAM: Test Analysis Modules</source>. <comment>R package version 3.5-19</comment>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=TAM">https://CRAN.R-project.org/package&#x003D;TAM</ext-link>
</comment>.</citation>
</ref>
<ref id="B37">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Rolle</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>1999</year>). <source>Musikalisch-&#xe4;sthetische Bildung. &#xdc;ber die Bedeutung &#xe4;sthetischer Erfahrung f&#xfc;r musikalische Bildungsprozesse</source>. <publisher-loc>Kassel</publisher-loc>: <publisher-name>Bosse</publisher-name>.</citation>
</ref>
<ref id="B38">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Rolle</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2013</year>). &#x201c;<article-title>Argumentation Skills in the Music Classroom: A Quest for Theory</article-title>,&#x201d; in <source>European Perspectives on Music Education 2: Artistry</source>. Editors <person-group person-group-type="editor">
<name>
<surname>de Vugt</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Malmberg</surname>
<given-names>I.</given-names>
</name>
</person-group> (<publisher-loc>Innsbruck</publisher-loc>: <publisher-name>Helbling</publisher-name>), <fpage>137</fpage>&#x2013;<lpage>150</lpage>.</citation>
</ref>
<ref id="B39">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Rost</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2004</year>). <source>Lehrbuch Testtheorie &#x2013; Testkonstruktion</source>. <publisher-loc>Bern</publisher-loc>: <publisher-name>Huber</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-3-322-80662-8</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Stevenson</surname>
<given-names>C. L.</given-names>
</name>
</person-group> (<year>1950</year>). &#x201c;<article-title>Interpretation and Evaluation in Aesthetics</article-title>,&#x201d; in <source>Philosophical Analysis</source>. Editor <person-group person-group-type="editor">
<name>
<surname>Black</surname>
<given-names>M.</given-names>
</name>
</person-group> (<publisher-loc>Ithaca</publisher-loc>: <publisher-name>Cornell University Press</publisher-name>), <fpage>341</fpage>&#x2013;<lpage>383</lpage>.</citation>
</ref>
<ref id="B41">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Torres Irribarra</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Freund</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2020</year>). <source>WrightMap: IRT Item-Person Map with ConQuest Integration</source>. <comment>R package version 1.2.3</comment>.</citation>
</ref>
<ref id="B42">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Toulmin</surname>
<given-names>S. E.</given-names>
</name>
</person-group> (<year>1992</year>). &#x201c;<article-title>Logic, Rhetoric and Reason. Redressing the Balance</article-title>,&#x201d; in <source>Argumentation Illuminated</source>. Editors <person-group person-group-type="editor">
<name>
<surname>van Eemeren</surname>
<given-names>F. H.</given-names>
</name>
<name>
<surname>Grootendorst</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Blair</surname>
<given-names>J.&#x20;A.</given-names>
</name>
<name>
<surname>Willard</surname>
<given-names>C. A.</given-names>
</name>
</person-group> (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>Sicsat</publisher-name>), <fpage>3</fpage>&#x2013;<lpage>11</lpage>. </citation>
</ref>
<ref id="B43">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Toulmin</surname>
<given-names>S. E.</given-names>
</name>
</person-group> (<year>2003</year>). <source>The Uses of Argument</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>. <pub-id pub-id-type="doi">10.1017/cbo9780511840005</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Trendtel</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schwabe</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Fellinger</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Differenzielles Itemfunktionieren in Subgruppen</article-title>,&#x201d; in <source>Large-Scale Assessment mit R. Methodische Grundlagen der &#xf6;sterreichischen Bildungsstandard&#xfc;berpr&#xfc;fung</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Breit</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schreiner</surname>
<given-names>C.</given-names>
</name>
</person-group> (<publisher-loc>Wien</publisher-loc>: <publisher-name>Facultas</publisher-name>), <fpage>111</fpage>&#x2013;<lpage>147</lpage>. </citation>
</ref>
<ref id="B45">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Weinert</surname>
<given-names>F. E.</given-names>
</name>
</person-group> (<year>2001</year>). &#x201c;<article-title>Concept of Competence: A Conceptual Clarification</article-title>,&#x201d; in <source>Defining and Selecting Key Competencies</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Rychen</surname>
<given-names>D. S.</given-names>
</name>
<name>
<surname>Salganik</surname>
<given-names>L. H.</given-names>
</name>
</person-group> (<publisher-loc>G&#xf6;ttingen</publisher-loc>: <publisher-name>Hogrefe</publisher-name>), <fpage>45</fpage>&#x2013;<lpage>65</lpage>. </citation>
</ref>
<ref id="B46">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wohlrapp</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2014</year>). <source>The Concept of Argument</source>. <publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B47">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tam</surname>
<given-names>H. P.</given-names>
</name>
<name>
<surname>Jen</surname>
<given-names>T.-H.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Educational Measurement for Applied Researchers. Theory into Practice</source>. <publisher-loc>Singapore</publisher-loc>: <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-981-10-3302-5</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>