<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2018.01590</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Pitch Class and Envelope Effects in the Tritone Paradox Are Mediated by Differently Pronounced Frequency Preference Regions</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Malek</surname> <given-names>Stephanie</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/479464/overview"/>
</contrib>
</contrib-group>
<aff><institution>Psychology Department, Martin Luther University Halle-Wittenberg</institution>, <addr-line>Halle</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Bruno Lucio Giordano, University of Glasgow, United Kingdom</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Bernhard Englitz, Radboud University, Netherlands; Claire Pelofi, Aix-Marseille Universit&#x000E9;, France; Vincent Adam, University College London, United Kingdom</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Stephanie Malek <email>stephanie.malek&#x00040;psych.uni-halle.de</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Psychology</p></fn></author-notes>
<pub-date pub-type="epub">
<day>28</day>
<month>09</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="collection">
<year>2018</year>
</pub-date>
<volume>9</volume>
<elocation-id>1590</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>10</month>
<year>2017</year>
</date>
<date date-type="accepted">
<day>09</day>
<month>08</month>
<year>2018</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2018 Malek.</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Malek</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Shepard tones (octave complex tones) are well defined in pitch chroma but are ambiguous in pitch height. Pitch direction judgments of Shepard tones depend on the clockwise distance of the pitch classes on the pitch class circle, indicating the proximity principle in auditory perception. The tritone paradox emerges when two Shepard tones that form a tritone interval are presented successively. In this case, no proximity cue is available and judgments depend on the first tone and vary from person to person. A common explanation for the tritone paradox is the assumption of a specific pitch class comparison mechanism based on a pitch class template that is oriented differently from person to person. In contrast, psychoacoustic approaches (e.g., the Terhardt virtual pitch theory) explain it with common pitch-processing mechanisms. The present paper proposes a probabilistic threshold model, which estimates Shepard tone pitch height by a probabilistic fundamental frequency extraction. In the first processing stage, only those frequency components whose amplitudes exceed randomly distributed threshold values are selected for further processing; the expected values of these thresholds are determined by a threshold function. The lowest of these nonfiltered components determines the pitch height. The model is designed for tone pairs and provides occurrence probabilities for descending judgments. In a pitch-matching pretest, 12 Shepard tones (generated under a cosine envelope centered at 261 Hz) were compared to pure tones, whose frequencies were adjusted by an up-down staircase method. Matched frequencies corresponded to frequency components but were ambiguous in octave position. In order to test the model, Shepard tones were generated under six cosine envelopes centered over a wide frequency range (65.41, 261, 370, 440, 523.25, 1244.51 Hz). The model predicted pitch class effects and envelope effects. 
Steep threshold functions caused pronounced pitch class effects, whereas flat threshold functions caused pronounced envelope effects. The model provides an alternative explanation to the pitch class template theory and serves as a psychoacoustic framework for the perception of Shepard tones.</p></abstract>
<kwd-group>
<kwd>Shepard tones</kwd>
<kwd>tritone paradox</kwd>
<kwd>psychoacoustics</kwd>
<kwd>probabilistic models</kwd>
<kwd>pitch perception</kwd>
</kwd-group>
<counts>
<fig-count count="13"/>
<table-count count="2"/>
<equation-count count="4"/>
<ref-count count="61"/>
<page-count count="18"/>
<word-count count="11654"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Shepard tones or octave complex tones evoke some astonishing auditory illusions. These tones consist of several sinusoidal components spaced by octave intervals. Typically, the component amplitude is determined by a fixed bell-shaped spectral envelope over the logarithmic frequency axis. Shepard tones are constructed by equal-sized upward shifts of their components under this fixed envelope (see Figure <xref ref-type="fig" rid="F1">1A</xref>). Surprisingly, when participants listen to such stepwise increased Shepard tones several times successively, they are normally unaware of any repetitions and report a continuous ascending pitch or a continuous descending pitch for a sequence in reversed order (never-ending pitch illusion; Shepard, <xref ref-type="bibr" rid="B49">1964</xref>; Burns, <xref ref-type="bibr" rid="B2">1981</xref>). Shepard (<xref ref-type="bibr" rid="B49">1964</xref>) found that pitch direction judgments of Shepard tone pairs (ascending/descending) depend on the clockwise distance of their pitch classes on the pitch class circle. Specifically, participants judged the pitch direction of Shepard tone pairs as ascending when this distance was shorter clockwise than counterclockwise (see Figure <xref ref-type="fig" rid="F1">1A</xref>) and descending in the opposite condition. This finding has been replicated by several studies (Pollack, <xref ref-type="bibr" rid="B41">1978</xref>; Sugiyama and Ohgushi, <xref ref-type="bibr" rid="B51">1979</xref>; Burns, <xref ref-type="bibr" rid="B2">1981</xref>) and provides evidence for the proximity principle in auditory perception.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Spectral structure of Shepard tones: <italic>C</italic> and <italic>C&#x00023;</italic> are generated under an envelope centered at <italic>C</italic><sub>4</sub> (261.63 Hz); the components of <italic>C&#x00023;</italic> are shifted 1 semitone from components of <italic>C</italic>; the distance on the pitch class circle is shorter clockwise than counterclockwise; the frequency shift to the right is shorter than to the left <bold>(A)</bold>; <italic>A</italic> and <italic>D&#x00023;</italic> are generated under an envelope centered at <italic>A</italic><sub>4</sub> (440 Hz); the components of <italic>D&#x00023;</italic> are shifted 6 semitones from components of <italic>A</italic>; the distance on the pitch class circle is equal clockwise and counterclockwise (tritone interval, <bold>B</bold>). The envelope centered at <italic>A</italic><sub>4</sub> <bold>(B)</bold> is slightly shifted to the right in comparison with the envelope centered at <italic>C</italic><sub>4</sub> <bold>(A)</bold>. The relation of component amplitudes of <italic>C</italic> under <italic>C</italic><sub>4</sub> is the same as the relation of component amplitudes of <italic>A</italic> under <italic>A</italic><sub>4</sub>.</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0001.tif"/>
</fig>
<p>Proximity is not a valid cue for Shepard tones forming a tritone interval (separated by 6 semitones; tritone pairs; see Figure <xref ref-type="fig" rid="F1">1B</xref>). Accordingly, pitch judgments of tritone pairs should be ambiguous. However, Deutsch (<xref ref-type="bibr" rid="B13">1986</xref>, <xref ref-type="bibr" rid="B15">1988</xref>) revealed that participants were able to judge the pitch direction of tritone pairs reliably, based on the pitch class of the first tone (tritone paradox). The tritone paradox is, at best, only moderately affected by the spectral structure of Shepard tones (Deutsch, <xref ref-type="bibr" rid="B13">1986</xref>, <xref ref-type="bibr" rid="B14">1987</xref>, <xref ref-type="bibr" rid="B15">1988</xref>, <xref ref-type="bibr" rid="B16">1991</xref>; Cohen et al., <xref ref-type="bibr" rid="B10">1994</xref>; Giangrande, <xref ref-type="bibr" rid="B22">1998</xref>; Repp and Thompson, <xref ref-type="bibr" rid="B45">2010</xref>), ruling out a simple low-level mechanism. Interestingly, the resulting ascending-descending patterns differ from person to person, depending on the linguistic background (Deutsch et al., <xref ref-type="bibr" rid="B18">1987</xref>; Deutsch, <xref ref-type="bibr" rid="B16">1991</xref>; Dawe et al., <xref ref-type="bibr" rid="B12">1998</xref>; Giangrande, <xref ref-type="bibr" rid="B22">1998</xref>; Chalikia and Leinfelt, <xref ref-type="bibr" rid="B3">2000</xref>; Chalikia et al., <xref ref-type="bibr" rid="B5">2000</xref>, <xref ref-type="bibr" rid="B4">2001</xref>). Figure <xref ref-type="fig" rid="F2">2</xref> describes the extraction of the subjectively highest pitch class (SHPC) and the magnitude of effect from these patterns. The SHPC is given by the direction of the resultant vector of the data points (for more details see Fisher, <xref ref-type="bibr" rid="B20">1993</xref>; Repp and Thompson, <xref ref-type="bibr" rid="B45">2010</xref>). 
The magnitude of effect is the difference between maximum and minimum of the response pattern.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Subjectively highest pitch class (SHPC) and magnitude of effect are extracted from the proportion of <italic>lower</italic> judgments as a function of the initial pitch classes of tritone pairs, plotted in a radial response graph (concentric circles: increments of 20%, <bold>A</bold>) and as a response function <bold>(B)</bold>; the SHPC is given by the direction of the resultant vector of the data points <bold>(A)</bold> (for more details, see Fisher, <xref ref-type="bibr" rid="B20">1993</xref>) and lies near the point of the maximum proportion of <italic>lower</italic> responses <bold>(B)</bold>; the magnitude of effect is the difference between the maximum and the minimum of the response function; in this example, the SHPC is approximately at C&#x00023; and the magnitude of effect is about 40%.</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0002.tif"/>
</fig>
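<p>As an illustration, the two summary measures can be computed from a response pattern with a few lines of Python. This is a minimal sketch, not the analysis code of the cited studies: the function name, the placement of <italic>C</italic> at 0 degrees with 30 degrees per semitone, and the use of the raw proportions as vector lengths are assumptions made for the example.</p>
<preformat preformat-type="code">
```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def shpc_and_magnitude(prop_lower):
    """Return (angle in degrees, nearest pitch class, magnitude of effect).

    prop_lower[k] is the proportion of 'lower' judgments for tritone pairs
    whose first tone has pitch class PITCH_CLASSES[k] (12 values expected).
    """
    n = len(prop_lower)
    # Each data point is a vector pointing at its pitch class on the circle,
    # with a length equal to the proportion of 'lower' judgments.
    x = sum(p * math.cos(2 * math.pi * k / n) for k, p in enumerate(prop_lower))
    y = sum(p * math.sin(2 * math.pi * k / n) for k, p in enumerate(prop_lower))
    # Direction of the resultant vector, mapped to the range 0..360 degrees.
    angle = math.degrees(math.atan2(y, x)) % 360.0
    nearest = PITCH_CLASSES[round(angle / (360.0 / n)) % n]
    # Magnitude of effect: maximum minus minimum of the response pattern.
    magnitude = max(prop_lower) - min(prop_lower)
    return angle, nearest, magnitude
```
</preformat>
<p>For a hypothetical response pattern that varies sinusoidally between 30% and 70% with its peak at C&#x00023;, the sketch places the SHPC at C&#x00023; and gives a magnitude of effect of 40%.</p>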
<p>Another interesting finding is that tritone pairs are influenced by prior context (Repp, <xref ref-type="bibr" rid="B44">1997</xref>; Giangrande et al., <xref ref-type="bibr" rid="B23">2003</xref>; Englitz et al., <xref ref-type="bibr" rid="B19">2013</xref>; Chambers and Pressnitzer, <xref ref-type="bibr" rid="B7">2014</xref>; Chambers et al., <xref ref-type="bibr" rid="B6">2017</xref>). Repp (<xref ref-type="bibr" rid="B44">1997</xref>) found that tritone pairs were affected by a prior Shepard tone. Recently, studies have shown that preceding tone sequences also caused adaptation (Dawe et al., <xref ref-type="bibr" rid="B12">1998</xref>; Malek and Sperschneider, <xref ref-type="bibr" rid="B36">2018</xref>) and other context effects in tritone pairs (Englitz et al., <xref ref-type="bibr" rid="B19">2013</xref>; Chambers and Pressnitzer, <xref ref-type="bibr" rid="B7">2014</xref>; Chambers et al., <xref ref-type="bibr" rid="B6">2017</xref>). Two computational models were recently published to explain these context effects (Huang et al., <xref ref-type="bibr" rid="B26">2015</xref>; Chambers et al., <xref ref-type="bibr" rid="B6">2017</xref>).</p>
<p>This study intends to contribute to the theoretical discussion about the origin of the tritone paradox. One explanation for the tritone paradox posits that pitch judgments are based on the comparison of the pitch classes with an internal pitch class template that reflects an abstract form of an implicit absolute pitch and is possibly acquired through language experience (Deutsch, <xref ref-type="bibr" rid="B16">1991</xref>; Deutsch et al., <xref ref-type="bibr" rid="B17">2004</xref>). In other words, participants are assumed to compare two pitch classes instead of two pitch heights, which is usually assumed for unambiguous <italic>normal</italic> tones (e.g., musical tones), suggesting the importance of pitch class instead of pitch height. Thus, such a pitch class comparison mechanism differs from the known comparison mechanism in ordinary harmonic tones and represents a highly specific mechanism that is proposed for the selected class of Shepard tone comparisons. A contrasting explanation for the tritone paradox postulates no specific mechanism for Shepard tones but explains it with common pitch-processing theories (Terhardt, <xref ref-type="bibr" rid="B54">1991</xref>; Cohen et al., <xref ref-type="bibr" rid="B11">1995</xref>). Terhardt (<xref ref-type="bibr" rid="B54">1991</xref>), in particular, explained the tritone paradox with his <italic>virtual pitch theory</italic> (VPT; Terhardt et al., <xref ref-type="bibr" rid="B56">1982a</xref>,<xref ref-type="bibr" rid="B57">b</xref>), postulating that listeners extract fundamental frequencies from Shepard tones to determine their pitch heights. Here, I propose a model for the tritone paradox that emphasizes such a psychoacoustic explanation.</p>
<p>There is evidence that listeners can extract fundamental frequencies from Shepard tones. Terhardt et al. (<xref ref-type="bibr" rid="B55">1986</xref>) and Repp and Thompson (<xref ref-type="bibr" rid="B45">2010</xref>) conducted a pitch-matching task (listeners had to match frequencies of pure tones or harmonic complex tones to Shepard tones). Matched frequencies corresponded to Shepard tone components in different octaves. They ranged from 200 to 1500 Hz and were concentrated around 300 Hz. Matched frequencies were in accordance with the predictions of the Terhardt pitch-processing model (Terhardt et al., <xref ref-type="bibr" rid="B57">1982b</xref>). The current paper intends to replicate the finding that frequency matches for single Shepard tones are ambiguous with respect to octave position and, in the next step, to show that this ambiguity in octave position can cause the tritone paradox.</p>
<p>Repp and Thompson (<xref ref-type="bibr" rid="B45">2010</xref>) investigated whether the results of the pitch-matching task predict the results of the standard tritone paradox paradigm. Participants were asked for the best and the second best match out of four possible harmonic comparison tones (in different octaves) for each of the 12 Shepard tones. The authors estimated the subjective pitch height of each Shepard tone by calculating the sum of the MIDI pitch number of the matches, weighted by participants&#x00027; confidence rating, and they compared it with the subjective pitch height measured with the standard tritone paradigm. Considering the averaged sample data, both measures were consistent, indicating that the ambiguity found in matching tasks causes the ambiguity in the tritone paradox. However, there was no consistency in the individual data, indicating that the typical phenomenon of the tritone paradox might not be assessable with matching tasks. Thus, much uncertainty still exists about the relationship between the ambiguity in matching tasks and the tritone paradox.</p>
<p>To conclude, the studies supporting the psychoacoustic explanation of the tritone paradox have focused on the pitch of single Shepard tones. However, no study up to now has shown that the psychoacoustic approach can explain the typical response patterns of the standard tritone paradox paradigm (tone-pair comparison task). This paper focuses on a psychoacoustic explanation for the pair-comparison patterns. As stated above, the pitch-matching experiments of Terhardt et al. (<xref ref-type="bibr" rid="B55">1986</xref>) and Repp and Thompson (<xref ref-type="bibr" rid="B45">2010</xref>) have revealed that Shepard tones are perceived as harmonic complex tones but are more ambiguous. This paper intends to explain the tritone paradox by considering Shepard tones as harmonic complex tones. I introduce an algorithm, the threshold model, that assumes that Shepard tones are processed like harmonic complex tones and provides predictions for the typical tritone paradox response patterns. The algorithm is based on the comparison of probabilistic fundamental frequency estimates. It combines psychoacoustic and physiological findings and concepts with the classical threshold theory (Gescheider, <xref ref-type="bibr" rid="B21">1997</xref>) and probability theory. The aim is not to develop a new general elaborate pitch-processing theory; however, the main assumptions of the threshold model should not contradict the main ideas of Terhardt&#x00027;s algorithm or other pitch-processing theories. 
In the literature, there exist two main types of pitch-processing theories (for an overview, see Cheveign&#x000E9;, <xref ref-type="bibr" rid="B8">2005</xref>, <xref ref-type="bibr" rid="B9">2010</xref>): pattern-matching theories (Schroeder, <xref ref-type="bibr" rid="B47">1968</xref>; Goldstein, <xref ref-type="bibr" rid="B24">1973</xref>; Wightman, <xref ref-type="bibr" rid="B59">1973</xref>; Terhardt, <xref ref-type="bibr" rid="B52">1974</xref>) and autocorrelation theories (Licklider, <xref ref-type="bibr" rid="B33">1951</xref>; Meddis and Hewitt, <xref ref-type="bibr" rid="B37">1991</xref>; Meddis and O&#x00027;Mard, <xref ref-type="bibr" rid="B38">1997</xref>). The threshold model belongs to the pattern-matching type, referring to the VPT (Terhardt et al., <xref ref-type="bibr" rid="B56">1982a</xref>). However, it is greatly simplified, especially in the pattern-matching processing stage, where the harmonic template is simply realized by taking the lowest frequency component and is restricted to harmonic complex tones with resolved components. The aim of the paper is to show that even such a simple mathematical model based on basic pitch-processing mechanisms can explain the main findings of the tritone paradox.</p>
<sec>
<title>1.1. Threshold model</title>
<p>Figure <xref ref-type="fig" rid="F3">3</xref> provides an overview of the threshold model algorithm (see <xref ref-type="supplementary-material" rid="SM1">Appendix</xref> for details and reasoning). The threshold model estimates the Shepard tone pitch heights by a probabilistic fundamental frequency extraction mechanism. Commonly, harmonic complex tones consist of a fundamental frequency, associated with pitch height, and several harmonics, which are integer multiples of the fundamental frequency and are associated with timbre. The fundamental frequency is often, but not necessarily, the lowest tone partial. Even when the fundamental frequency is deleted or masked, the pitch height corresponds to the fundamental frequency (missing fundamental, Schouten, <xref ref-type="bibr" rid="B50">1940</xref>; Licklider, <xref ref-type="bibr" rid="B34">1959</xref>). Pitches of harmonic complex tones are therefore relatively robust: the fundamental frequency, which is the greatest common divisor of the harmonic frequencies, can be reconstructed from the harmonics even when the tone does not contain it. The threshold model supposes that Shepard tones&#x00027; fundamental frequencies likewise determine their pitch heights. Because the components of a Shepard tone are spaced by octaves, their greatest common divisor is the frequency of the lowest component. In contrast to harmonic complex tones, filtering out the lowest frequency component therefore changes the fundamental frequency and, with it, the Shepard tone&#x00027;s pitch height. Thus, the lowest frequency component determines Shepard tones&#x00027; pitch height.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Threshold model algorithm for a Shepard tone pair (<italic>s</italic><sub><italic>l</italic></sub>, <italic>s</italic><sub><italic>k</italic></sub>): three components of the first Shepard tone (blue solid line) and four components of the second Shepard tone (red dotted line) are filtered in the first processing stage, resulting in three unfiltered first tone components and two unfiltered second tone components. The frequency of the lowest unfiltered component is assigned the role of fundamental frequency in stage II. Because the first tone fundamental frequency is lower than the second tone fundamental frequency, the model output is an ascending judgment (stage III).</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0003.tif"/>
</fig>
<p>The fixed bell-shaped envelope attenuates low (and high) components of Shepard tones, resulting in uncertainty about which component is the lowest audible (and, hence, relevant) component. The threshold model implements a probabilistic component filtering mechanism. Frequency components with frequency <italic>f</italic> are filtered out when their amplitudes are beneath specific threshold values, which are realizations of random variables <italic>T</italic>. Their expected values &#x003BC;<sub><italic>t</italic></sub> are determined by a so-called threshold function <italic>g</italic>. Figure <xref ref-type="fig" rid="F4">4</xref> shows a frequency-dependent threshold function <italic>g</italic>(<italic>f</italic>). It also shows that the probability that a component is not filtered out corresponds to the area under the probability density function. The probability that a Shepard tone has a specific pitch height is given by the probability that the corresponding frequency component is the lowest nonfiltered component. The probability of an ascending judgment is the probability that the pitch height of the first Shepard tone is lower than the pitch height of the second Shepard tone.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Example of a monotonically decreasing threshold function <italic>g</italic> and sketched distributions of threshold values <bold>(A)</bold>; the distribution of <italic>T</italic><sub><italic>li</italic></sub> corresponding to the component <italic>c</italic><sub><italic>li</italic></sub> has an expected value &#x003BC;<sub><italic>t</italic></sub> &#x0003D; <italic>g</italic>(<italic>f</italic><sub><italic>li</italic></sub>) gray box in <bold>(A)</bold>, enlarged in <bold>(B)</bold>; the probability that the component <italic>c</italic><sub><italic>li</italic></sub> is not filtered, <italic>p</italic><sub><italic>li</italic></sub>, is the area under the probability density function of <italic>T</italic><sub><italic>li</italic></sub> for <italic>t</italic> &#x02264; <italic>a</italic><sub><italic>li</italic></sub> (the dark gray area in <bold>B</bold>).</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0004.tif"/>
</fig>
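<p>The filtering stage and the resulting judgment probability can be illustrated with a short Python sketch. It is not the implementation described in the Appendix: the normal distribution of the thresholds, the common standard deviation <italic>sigma</italic>, the independence of the thresholds across components and tones, and all function names are assumptions made for illustration. The threshold function <italic>g</italic> is passed in as an arbitrary callable.</p>
<preformat preformat-type="code">
```python
import math


def normal_cdf(x, mu, sigma):
    """P(T is at most x) for a normally distributed threshold T (assumed)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def lowest_audible_distribution(freqs, amps, g, sigma):
    """Probability that each component is the lowest non-filtered one.

    freqs must be sorted in ascending order. Component i passes stage I with
    probability p_i = P(T_i is at most a_i), where T_i has expected value g(f_i).
    """
    dist = []
    all_lower_filtered = 1.0  # probability that every lower component was filtered
    for f, a in zip(freqs, amps):
        p = normal_cdf(a, g(f), sigma)
        dist.append((f, all_lower_filtered * p))
        all_lower_filtered *= 1.0 - p
    return dist


def p_ascending(dist_first, dist_second):
    """P(pitch height of the first tone is lower than that of the second)."""
    return sum(p1 * p2
               for f1, p1 in dist_first
               for f2, p2 in dist_second
               if f2 > f1)
```
</preformat>
<p>In this sketch, cases in which all components of a tone are filtered out contribute to neither an ascending nor a descending judgment, so the two probabilities need not sum to one.</p>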
</sec>
<sec>
<title>1.2. Threshold function g</title>
<p>Within the threshold model, it is assumed that individual differences in the tritone paradox are caused by individual differences in the threshold function. The threshold function can be considered as an implementation of Terhardt&#x00027;s internal spectral weighting function, which causes a specific frequency region to be particularly sensitive (preference region). The mathematical description is given in the <xref ref-type="supplementary-material" rid="SM1">Appendix</xref>.</p>
</sec>
</sec>
<sec id="s2">
<title>2. Pretest: pitch-matching experiment</title>
<p>An initial pitch-matching experiment was conducted to replicate the findings of Repp and Thompson (<xref ref-type="bibr" rid="B45">2010</xref>) and Terhardt et al. (<xref ref-type="bibr" rid="B55">1986</xref>) that frequency matches correspond to Shepard tones&#x00027; frequency components and are around 300 Hz. Furthermore, the matched frequencies were compared to the predictions of Terhardt&#x00027;s VPT, which provides so-called spectral pitches (SPs) and virtual pitches (VPs). Spectral pitches correspond directly to partials and are weighted by a so-called internal spectral weighting function. Virtual pitches are determined by a pattern-matching procedure, a subharmonic coincidence detection. Here, all subharmonics of salient SPs are VP candidates, which are weighted. Only the most salient VPs are of interest. The VPT provides a weighted spectral pitch pattern and a weighted virtual pitch pattern, which are both relevant for pitch. Terhardt (<xref ref-type="bibr" rid="B54">1991</xref>) postulated that Shepard tones&#x00027; pitches are mostly determined by their VPs and not by their SPs. Thus, it is assumed that pure tone matches correspond to the VPs rather than to the SPs.</p>
<sec>
<title>2.1. Methods</title>
<sec>
<title>2.1.1. Participants</title>
<p>Normal-hearing undergraduate students from the Martin Luther University Halle-Wittenberg (<italic>n</italic> &#x0003D; 8; 7 women) participated in the study. They were aged between 19 and 36 years (<italic>M</italic> &#x0003D; 24.28, <italic>SD</italic> &#x0003D; 5.76). No professional musician participated in the study. At the time of the survey, four participants had never played an instrument; two had learnt an instrument but did not play regularly anymore; two played an instrument regularly. All participants lived and were raised in Germany. They received credit points for their psychology courses in exchange for their participation, as approved by the study board of the Department of Psychology, Martin Luther University Halle-Wittenberg. The experiments conducted do not require formal ethical approval according to German law and institutional requirements. Before participation, the students were informed that the collected data would be used in an anonymous form for publication. All students participated voluntarily and were free to opt out with no negative consequences at any time during the experiment. The study was conducted in accordance with the Declaration of Helsinki and the University Research Ethics Standards.</p>
</sec>
<sec>
<title>2.1.2. Equipment</title>
<p>The experiment was run on an Intel Core 2 Duo Windows computer with a VIA High Definition Audio sound card. Participants listened to a stereo signal via Sennheiser HD 202 headphones (frequency response: -8.96/&#x0002B;3.21 dB in 100 Hz&#x02013;10 kHz; Center/summary HDM1) in an acoustically silenced room at the university. The experiment was presented with PxLab (Irtel, <xref ref-type="bibr" rid="B27">2007</xref>). The stimuli were synthesized in Matlab (Version 7.40.287).</p>
</sec>
<sec>
<title>2.1.3. Stimuli</title>
<p>The Shepard tones were constructed according to the specifications of Deutsch et al. (<xref ref-type="bibr" rid="B18">1987</xref>). Each Shepard tone corresponded to one of the 12 chromatic pitch classes (<italic>C</italic>, <italic>C&#x00023;</italic>, &#x02026;, <italic>B</italic>) and consisted of six sinusoidal octave-spaced components (see Figure <xref ref-type="fig" rid="F1">1</xref>). The frequency <italic>f</italic><sub><italic>li</italic></sub> of the i<sup>th</sup> component of the Shepard tone <italic>l</italic> for all <italic>i</italic> &#x0003D; 1, &#x02026;, <italic>m</italic>, <italic>l</italic> &#x0003D; 1, &#x02026;, <italic>l</italic><sub><italic>max</italic></sub> is</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x000B7;</mml:mo><mml:msup><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">min</mml:mtext></mml:mrow></mml:msub><mml:mo>&#x000B7;</mml:mo><mml:msup><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:msub><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">max</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Their amplitudes were determined by a fixed, bell-shaped spectral envelope. The general form of this envelope is described by the following equation:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>A</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>5</mml:mn><mml:mo>-</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>5</mml:mn><mml:mo>&#x000B7;</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B3;</mml:mi></mml:mrow></mml:mfrac><mml:mo>&#x000B7;</mml:mo><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mrow><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">min</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>A</italic>(<italic>f</italic>) is the relative amplitude of a sinusoid at frequency <italic>f</italic> in Hz, &#x003B2; is the frequency ratio formed by adjacent sinusoids (&#x003B2; &#x0003D; 2, hence octave spacing), &#x003B3; is the number of &#x003B2; cycles spanned (&#x003B3; &#x0003D; 6), and <italic>f</italic><sub>min</sub> is the minimum frequency for which the amplitude is nonzero (<italic>f</italic><sub>min</sub> &#x0003D; 32.70 Hz, generating an envelope centered at <italic>C</italic><sub>4</sub>: 262 Hz).</p>
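<p>As an illustration of Equation (3), the following Python sketch evaluates the envelope for the octave-spaced components of one Shepard tone. It is not the synthesis code used in the study. The function names, the choice of six components per tone, and the zero amplitude outside the envelope are assumptions made for this example. The cosine coefficient of 0.5 is assumed because it is the value for which the amplitude vanishes at <italic>f</italic><sub>min</sub> and peaks at the envelope center, as described in the text.</p>
<preformat><![CDATA[
```python
import math

F_MIN = 32.70   # Hz, lower edge of the envelope (value given in the text)
BETA = 2.0      # frequency ratio of adjacent components (octave spacing)
GAMMA = 6.0     # number of beta cycles spanned by the envelope


def envelope_amplitude(freq_hz, f_min=F_MIN, beta=BETA, gamma=GAMMA):
    """Relative amplitude of a sinusoid under the bell-shaped envelope."""
    cycles_above_min = math.log(freq_hz / f_min, beta)
    if cycles_above_min < 0.0 or cycles_above_min > gamma:
        return 0.0  # assumption: no energy outside the envelope
    return 0.5 - 0.5 * math.cos(2.0 * math.pi / gamma * cycles_above_min)


def shepard_components(lowest_hz, n_components=6):
    """Return (frequency, amplitude) pairs for octave-spaced components.

    n_components=6 is an assumption for this sketch, not a value from the text.
    """
    freqs = [lowest_hz * BETA ** i for i in range(n_components)]
    return [(f, envelope_amplitude(f)) for f in freqs]
```
]]></preformat>
<p>Calling <monospace>shepard_components(55.00)</monospace>, for instance, lists the component frequencies of the Shepard tone A together with their relative amplitudes under the <italic>C</italic><sub>4</sub> envelope.</p>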
<p>Six Shepard tones were used. Their lowest frequencies were 34.65 Hz (C&#x00023;), 38.89 Hz (D&#x00023;), 43.65 Hz (F), 49.00 Hz (G), 55.00 Hz (A), and 61.74 Hz (B), respectively. All tones were 800 ms in duration with 53 ms sinusoidal amplitude ramps at the beginning and at the end. The sample rate was 44.1 kHz and the sound level was about 55 dB.</p>
</sec>
<sec>
<title>2.1.4. Research design and procedure</title>
<p>On each trial, a Shepard tone and a sinusoidal comparison tone were presented, separated by a silent period of 200 ms. Participants were asked to judge whether the second tone was <italic>higher</italic> or <italic>lower</italic> in pitch than the first tone by pressing the up arrow or the down arrow key on the number pad. They received visual feedback about the key they had pressed but no feedback about the correctness of their answer. Participants were able to repeat the tone pairs as often as they wished.</p>
<p>The frequency of the comparison tone was determined in each trial using an up-down staircase method (Levitt, <xref ref-type="bibr" rid="B32">1971</xref>) implemented in PxLab (Irtel, <xref ref-type="bibr" rid="B27">2007</xref>). Each adaptive sequence started with a randomly chosen frequency (310&#x02013;910 Hz). When the participant answered that the comparison tone was <italic>higher</italic> or <italic>lower</italic> than the Shepard tone, the frequency was reduced or increased by the current <italic>step size T</italic>, respectively. This step size was reduced at <italic>turnpoints</italic>, trials in which the response direction changed (from <italic>higher</italic> to <italic>lower</italic> responses or <italic>vice versa</italic>). Each adaptive sequence started with an initial step size <italic>T</italic><sub>0</sub> of 200 Hz. Subsequently, the step size <italic>T</italic><sub><italic>t</italic></sub> in trial <italic>t</italic> of the adaptive sequence was computed as <inline-formula><mml:math id="M4"><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula>. The adaptive sequence stopped when four turnpoints had occurred at a step size <italic>T</italic><sub><italic>t</italic></sub> smaller than 20 Hz or after 40 trials. The frequency of the comparison tone could vary in the range of 50&#x02013;3,000 Hz. The arithmetic mean and the standard deviation of the last four turnpoints in an adaptive sequence were used to estimate the PSE and its standard error (Wetherill, <xref ref-type="bibr" rid="B58">1963</xref>).</p>
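<p>The adaptive procedure can be sketched in Python as follows. This is a simplified illustration, not the PxLab implementation. It assumes that <italic>t</italic> counts trials within a sequence, that a turnpoint is recorded as the comparison frequency on the trial where the response direction changed, and that the listener is represented by a hypothetical callback <monospace>respond_higher</monospace>.</p>
<preformat><![CDATA[
```python
import random
import statistics

T0 = 200.0                      # initial step size in Hz
F_LOW, F_HIGH = 50.0, 3000.0    # allowed range of the comparison frequency in Hz
MAX_TRIALS = 40                 # a sequence ends after at most 40 trials


def run_staircase(respond_higher, rng=None):
    """Run one adaptive sequence.

    respond_higher(freq) is a hypothetical listener model that returns True
    when the comparison tone is judged higher than the Shepard tone.
    Returns (PSE estimate, spread of the last four turnpoints), or None when
    fewer than four turnpoints occurred.
    """
    rng = rng or random.Random()
    freq = rng.uniform(310.0, 910.0)   # random starting frequency
    turnpoints = []                    # comparison frequencies at reversals
    small_step_turns = 0               # reversals at a step size below 20 Hz
    previous = None
    for t in range(1, MAX_TRIALS + 1):
        step = T0 / t                  # assumed reading of T_t = T_0 / t
        higher = respond_higher(freq)
        if previous is not None and higher != previous:
            turnpoints.append(freq)
            if step < 20.0:
                small_step_turns += 1
        if small_step_turns >= 4:
            break                      # stopping rule reached
        previous = higher
        # A 'higher' response lowers the comparison tone, 'lower' raises it.
        freq = freq - step if higher else freq + step
        freq = min(max(freq, F_LOW), F_HIGH)
    if len(turnpoints) < 4:
        return None
    last_four = turnpoints[-4:]
    return statistics.mean(last_four), statistics.stdev(last_four)
```
]]></preformat>
<p>A deterministic listener such as <monospace>lambda f: f > 300.0</monospace> makes the sequence oscillate around 300 Hz, whereas a listener who never changes the response direction yields no turnpoints and therefore no PSE estimate.</p>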
<p>There were four adaptive sequences for each Shepard tone (C&#x00023;, D&#x00023;, F, G, A, and B). To control for order effects, the Shepard tone followed the comparison tone in two adaptive sequences, and the comparison tone followed the Shepard tone in the other two. Data for each participant were collected in two sessions on different days (1 h per session; 900 trials per session). Participants received brief training on the task at the beginning of the first session.</p>
</sec>
</sec>
<sec>
<title>2.2. Results</title>
<p>The estimated PSEs corresponded to Shepard tones&#x00027; frequency components in different octaves, revealing the octave ambiguity in pitch height of single Shepard tones (see Figure <xref ref-type="fig" rid="F5">5</xref>). In some cases, it was impossible to estimate PSEs because of missing turnpoints (Figure <xref ref-type="fig" rid="F5">5A</xref>). In other cases, the standard errors of the PSE estimates were large, indicating that the frequencies adjusted by the up-down staircase method did not converge over the trials (Figures <xref ref-type="fig" rid="F5">5A,D</xref>). The PSE estimates ranged from 52 Hz to 1356.67 Hz. Most PSEs were between 200 and 600 Hz, with a preference for the fourth component (around 300 Hz), corresponding to the envelope centered at 262 Hz (<italic>C</italic><sub>4</sub>). Interestingly, preferred matches at 300 Hz were also found by Terhardt et al. (<xref ref-type="bibr" rid="B55">1986</xref>) and by Repp and Thompson (<xref ref-type="bibr" rid="B45">2010</xref>).</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Point of subjective equality (PSE) estimates of the four adaptive sequences for each Shepard tone for <bold>(A)</bold> participant <italic>AH</italic>, <bold>(B)</bold> <italic>MK</italic>, <bold>(C)</bold> <italic>FM</italic> and <bold>(D)</bold> <italic>SM</italic>. The PSEs were estimated by the mean of the last four turnpoints in the up-down staircase method. The error bars represent the standard error of the four turnpoints used for PSE estimation. In some cases, standard errors are too small to be visible. Some Shepard tones show fewer than four data points for two reasons: either there were too few turnpoints to estimate the PSE, or similar PSE estimates produced overlapping data points.</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0005.tif"/>
</fig>
<p>Comparing the averaged PSEs with the prediction of the VPT (predicted by Terhardt, <xref ref-type="bibr" rid="B53">1990</xref>) revealed that the PSEs were between the SPs and VPs. The absolute values were more similar to the VPs than to the SPs (see Figure <xref ref-type="fig" rid="F6">6</xref>). However, the highest PSE was at Shepard tone <italic>G</italic>, which was in accordance with the highest SP and not the highest VP (<italic>D&#x00023;</italic>).</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Averaged PSE estimates compared to the spectral and virtual pitches with the highest weight predicted by the VPT. Error bars represent the standard error (<italic>n</italic> &#x0003D; 8).</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0006.tif"/>
</fig>
</sec>
<sec>
<title>2.3. Discussion</title>
<p>The experiment showed that listeners matched the frequencies of pure tones to components in different octaves, indicating octave ambiguity in pitch height for single Shepard tones. However, it remains unclear whether this ambiguity is a genuine property of Shepard tones or only an artifact of the matching task. For some participants, the up-down staircase method failed to converge on a frequency. Furthermore, the PSE estimates varied considerably for some participants, indicating a possible difficulty in comparing sounds of different timbre, which typically results in increased errors, especially for nonmusicians (Seither-Preisler et al., <xref ref-type="bibr" rid="B48">2007</xref>).</p>
<p>A further problem of the method is whether the frequency matches correspond to the &#x0201C;true&#x0201D; pitch or to frequency components emphasized by the sinusoidal comparison tone; that is, the problem is whether the pitch height assessable in the pure tone matching task corresponds to that in the tone comparison task. Terhardt (<xref ref-type="bibr" rid="B54">1991</xref>) postulated that Shepard tone pitches are mostly determined by their VPs and not by their SPs. The experiment showed that, overall, the frequency matches failed to correspond to VPs, indicating that at least some matches correspond to SPs, possibly emphasized by the frequency of the comparison tone. Possibly, participants were distracted by SPs, which increased noise and caused the ambiguity in octave position. To conclude, it remains unclear whether the octave ambiguity found in single Shepard tones is only an artifact of the pure tone matching method.</p>
</sec>
</sec>
<sec id="s3">
<title>3. Tone-pair comparison task</title>
<p>The goal of the second experiment was to ascertain whether the ambiguity found in the pitch height of single Shepard tones accounts for the tritone paradox, providing support for the psychoacoustic account. First, the highest Shepard tone assessable in the pure tone matching task should correspond to that in the tone-pair comparison task. Thus, the highest Shepard tone should be around <italic>G</italic> for a stimulus set generated under a fixed envelope centered at 261 Hz. Second, the threshold model, implementing pitch ambiguity in single Shepard tones, should predict the typical response patterns of the tritone paradox.</p>
<p>The typical finding when testing tritone pairs has been that some tone pairs are clearly judged as rising in pitch, some are ambiguous (i.e., in some cases judged as rising and in the other as falling) and some are clearly judged as falling in pitch, resulting in a sigmoid response function (Deutsch, <xref ref-type="bibr" rid="B16">1991</xref>; Repp, <xref ref-type="bibr" rid="B44">1997</xref>). Although the sigmoid form is a typical finding of the tritone paradox, most theories have not considered it. For example, Terhardt (<xref ref-type="bibr" rid="B54">1991</xref>) considered the dominant virtual pitches, causing a staircase response pattern but not the sigmoid pattern. Thus, the threshold model ought to predict the typical sigmoid function.</p>
<p>A much debated question is whether the spectral envelope affects the tritone paradox. Shepard tones were constructed under a fixed spectral envelope. Shifting the envelope center on the frequency axis changes the amplitude relation of the Shepard tones&#x00027; components (see Figure <xref ref-type="fig" rid="F1">1</xref>). In most studies, the pitch judgments of most participants were, at best, only minimally affected by envelope shifts (Deutsch, <xref ref-type="bibr" rid="B13">1986</xref>, <xref ref-type="bibr" rid="B14">1987</xref>, <xref ref-type="bibr" rid="B15">1988</xref>, <xref ref-type="bibr" rid="B16">1991</xref>; Cohen et al., <xref ref-type="bibr" rid="B10">1994</xref>; Giangrande, <xref ref-type="bibr" rid="B22">1998</xref>; Repp and Thompson, <xref ref-type="bibr" rid="B45">2010</xref>). Some studies, however, found pronounced envelope effects for at least some participants (Repp, <xref ref-type="bibr" rid="B43">1994</xref>, <xref ref-type="bibr" rid="B44">1997</xref>; Kr&#x000FC;ger and Lukas, <xref ref-type="bibr" rid="B30">2002</xref>; Kr&#x000FC;ger, <xref ref-type="bibr" rid="B29">2011</xref>). Repp (<xref ref-type="bibr" rid="B44">1997</xref>) quantified envelope effects by averaging <italic>lower</italic> response rates across envelope sets not as a function of the pitch classes but as a function of the clockwise distance of these pitch classes from the envelope center (e.g., the Shepard tone <italic>C</italic> is shifted by 0 semitones whereas the Shepard tone <italic>C&#x00023;</italic> is shifted by 1 semitone from envelope center <italic>C</italic><sub>4</sub>). In particular, Repp (<xref ref-type="bibr" rid="B44">1997</xref>) revealed that the individual highest pitch classes were shifted by about 6 semitones from the envelope center. The threshold model should predict that the response pattern depends on the pitch class and on the envelope center. 
Empirical evaluation and simulations were conducted in order to determine whether this is dependent on the form of the threshold function.</p>
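<p>The re-indexing step that underlies the quantification of envelope effects described above can be sketched in Python. This is only an illustration of aligning response rates to the envelope center before averaging; the function name, the dictionary layout, and the convention 0 &#x0003D; C, 1 &#x0003D; C&#x00023;, etc. are assumptions for this example and are not taken from Repp (<xref ref-type="bibr" rid="B44">1997</xref>).</p>
<preformat><![CDATA[
```python
def align_to_envelope_center(rates_by_envelope, center_index_by_envelope):
    """Average response rates by clockwise distance from the envelope center.

    rates_by_envelope maps an envelope label to 12 'lower' response rates
    ordered C, C#, ..., B. center_index_by_envelope maps the same label to the
    pitch class index of the envelope center (0 = C, ..., 11 = B).
    Entry d of the result is the mean rate for the pitch class lying d
    semitones clockwise from the respective envelope center.
    """
    n = 12
    aligned_sum = [0.0] * n
    for label, rates in rates_by_envelope.items():
        center = center_index_by_envelope[label]
        for distance in range(n):
            aligned_sum[distance] += rates[(center + distance) % n]
    return [total / len(rates_by_envelope) for total in aligned_sum]
```
]]></preformat>
<p>If every envelope produces its maximum of <italic>lower</italic> responses at the same distance from its own center, the aligned average keeps a clear peak at that distance, whereas averaging by pitch class alone would smear the peak across the pitch class circle.</p>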
<sec>
<title>3.1. Methods</title>
<sec>
<title>3.1.1. Participants</title>
<p>Normal-hearing undergraduate students from the Martin Luther University Halle-Wittenberg (<italic>n</italic> &#x0003D; 29; 22 women) participated in the study. They were aged between 18 and 31 years (<italic>M</italic> &#x0003D; 21.7, <italic>SD</italic> &#x0003D; 3.09). No professional musician participated in the study. At the time of the survey, most participants had some musical experience. Only four participants had never played an instrument or sung regularly. On average, the participants played an instrument for 1.57 h a week (<italic>SD</italic> &#x0003D; 2.53) and sang in a choir or received singing lessons for 0.69 h a week (<italic>SD</italic> &#x0003D; 1.32). On average, listeners made 5.60% errors (<italic>SD</italic> &#x0003D; 7.15%, range: 0&#x02212;25%) in a pure tone discrimination pretest. All participants lived and were raised in Germany. Payment and ethical standards were the same as in the first experiment.</p>
</sec>
<sec>
<title>3.1.2. Stimuli</title>
<p>The Shepard tones were constructed in the same way as in the pitch-matching experiment (see Equations 1&#x02013;3). The following 12 tritone pairs were formed: <italic>C</italic>&#x02212;<italic>F&#x00023;</italic>, <italic>C&#x00023;</italic>&#x02212;<italic>G</italic>, <italic>D</italic>&#x02212;<italic>G&#x00023;</italic>, <italic>D&#x00023;</italic>&#x02212;<italic>A</italic>, <italic>E</italic>&#x02212;<italic>A&#x00023;</italic>, <italic>F</italic>&#x02212;<italic>B</italic>, <italic>F&#x00023;</italic>&#x02212;<italic>C</italic>, <italic>G</italic>&#x02212;<italic>C&#x00023;</italic>, <italic>G&#x00023;</italic>&#x02212;<italic>D</italic>, <italic>A</italic>&#x02212;<italic>D&#x00023;</italic>, <italic>A&#x00023;</italic>&#x02212;<italic>E</italic>, and <italic>B</italic>&#x02212;<italic>F</italic>. Each tritone pair was synthesized under six spectral envelopes centered at different positions on the frequency axis. The envelope centers were chosen to cover a wide frequency region (see Figure <xref ref-type="fig" rid="F12">12</xref>) to test different threshold function forms. There was one envelope in the low frequency region centered at 65.41 Hz (<italic>C</italic><sub>2</sub>, <italic>f</italic><sub><italic>min</italic></sub> &#x0003D; 8.18), four envelopes in the middle frequency region centered at 261 Hz (<italic>C</italic><sub>4</sub>, <italic>f</italic><sub><italic>min</italic></sub> &#x0003D; 32.70), 370 Hz (<italic>F&#x00023;</italic><sub>4</sub>, <italic>f</italic><sub><italic>min</italic></sub> &#x0003D; 46.25), 440 Hz (<italic>A</italic><sub>4</sub>, <italic>f</italic><sub><italic>min</italic></sub> &#x0003D; 55), and 523.25 Hz (<italic>C</italic><sub>5</sub>, <italic>f</italic><sub><italic>min</italic></sub> &#x0003D; 65.41), and one envelope in the high frequency region centered at 1244.51 Hz (<italic>D&#x00023;</italic><sub>6</sub>, <italic>f</italic><sub><italic>min</italic></sub> &#x0003D; 155.56).
Each Shepard tone (sampling rate = 44.1 kHz) lasted 800 ms and had a constant amplitude, with the exception of 53.6 ms sinusoidal ramps at onset and offset to prevent clicks. The two Shepard tones of a pair were separated by a 200 ms interstimulus interval. They were presented at a level of approximately 55 dB sound pressure level (SPL).</p>
<p>White noise (20&#x02013;10,000 Hz; level = 40 dB SPL) was presented in the background of each tritone pair (Zwicker and Feldtkeller, <xref ref-type="bibr" rid="B61">1967</xref>; Hartmann, <xref ref-type="bibr" rid="B25">1998</xref>). Sample audio files are provided in the <xref ref-type="supplementary-material" rid="SM2">Supplementary Material</xref>. Background noise masks the potential products of nonlinear distortions in the ear canal such as combination tones (Zwicker and Feldtkeller, <xref ref-type="bibr" rid="B61">1967</xref>; Hartmann, <xref ref-type="bibr" rid="B25">1998</xref>) and attenuates the frequency dependence and interindividual differences of single-frequency hearing thresholds (Zwicker and Feldtkeller, <xref ref-type="bibr" rid="B61">1967</xref>), facilitating the fitting of the threshold function.</p>
</sec>
<sec>
<title>3.1.3. Pure tone pretest</title>
<p>To measure the ability to discriminate pitch direction for non-ambiguous stimuli, the experiment started with a pure tone pretest comprising 16 different tone pairs in randomized order. Each tone lasted 800 ms and had a level of 55 dB SPL. On each trial, a standard and a comparison tone were presented, separated by a 200 ms silent interval. Participants had to judge whether the comparison tone was <italic>higher</italic> or <italic>lower</italic> in pitch than the standard tone. They could repeat the tone pair as often as they wished by pressing a repeat key. The standard tone always preceded the comparison tone. The standard tones&#x00027; frequencies corresponded to the fourth component of four randomly chosen Shepard tones, because the pitch-matching experiment suggested that Shepard tones&#x00027; pitches lie in this region. There were four standard tones with frequencies of 261.62 Hz, 277.18 Hz, 392 Hz, and 493.88 Hz, which corresponded to the fourth component of the Shepard tones <italic>C</italic>, <italic>C&#x00023;</italic>, <italic>G</italic>, and <italic>B</italic>. The frequency of the comparison tones ranged from 207 Hz (<italic>G&#x00023;</italic><sub>3</sub>) to 622 Hz (<italic>D&#x00023;</italic><sub>5</sub>). The frequency differences between standard and comparison tone ranged from 1 to 4 semitones. The standard tones with frequencies of 261.62 Hz and 392 Hz formed decreasing tone pairs; the standard tones with frequencies of 277.18 Hz and 493.88 Hz formed increasing tone pairs. White noise was presented in the background of each tone pair. No tone pair was presented twice, except when participants pressed the repeat key. Thus, the pretest consisted of at least 16 trials.</p>
</sec>
<sec>
<title>3.1.4. Equipment</title>
<p>The equipment was the same as that used in the pure tone matching experiment.</p>
</sec>
<sec>
<title>3.1.5. Design and procedure</title>
<p>One tritone pair was presented in each trial. After the presentation, the participants were asked to judge whether the second tone was <italic>higher</italic> or <italic>lower</italic> in pitch than the first tone. After the response, the next trial started immediately.</p>
<p>The participants listened to 12 (pitch classes) &#x000D7; 6 (envelope centers) &#x000D7; 30 (repetitions) experimental trials (2160 trials) and 12 (pitch classes) &#x000D7; 6 (envelope centers) &#x000D7; 2 (repetitions) practice trials (144 trials). Thus, each tritone pair was presented 32 times, and the first two presentations of each tone pair were practice trials and were excluded from data analysis. The trials were presented in blocks according to the envelope center. The order of the blocks and the order of the tritone pairs within each block were randomized for each participant. Participants could rest after each block of trials and could continue with the experiment when they wanted. The experiment was divided into two sessions, each lasting 1.5 h and separated by at least 1 day.</p>
</sec>
</sec>
<sec>
<title>3.2. Results</title>
<p>Categorical lower/higher responses were analyzed with logistic regression (Jaeger, <xref ref-type="bibr" rid="B28">2008</xref>).</p>
<sec>
<title>3.2.1. Data analysis</title>
<sec>
<title>3.2.1.1. Sample data.</title>
<p>Averaged <italic>lower</italic> responses as a function of the first Shepard tone were sigmoid but only weakly pronounced for all envelope centers (see Figure <xref ref-type="fig" rid="F8">8</xref>), indicating high variability within or between the listeners. The highest Shepard tones depended on the envelope center (see Table <xref ref-type="table" rid="T1">1</xref>). The highest Shepard tone for the <italic>C</italic><sub>4</sub>-envelope (at 261 Hz) was at 7.43 (<italic>F&#x00023;</italic>&#x02212;<italic>G</italic>), corresponding, approximately, to the highest Shepard tone found in the pure tone matching experiment.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Highest Shepard tones in the data and as predicted by the threshold model using a horizontal and a logistic threshold function for the six envelope centers.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Envelope center</bold></th>
<th valign="top" align="center"><bold>Data</bold></th>
<th valign="top" align="center"><bold>Horizontal</bold></th>
<th valign="top" align="center"><bold>Logistic</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><italic>C</italic><sub>2</sub></td>
<td valign="top" align="center">3.68 (D-D&#x00023;)</td>
<td valign="top" align="center">5.73 (E-F)</td>
<td valign="top" align="center">3.68 (D-D&#x00023;)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>C</italic><sub>4</sub></td>
<td valign="top" align="center">7.43 (F&#x00023;-G)</td>
<td valign="top" align="center">5.73 (E-F)</td>
<td valign="top" align="center">6.54 (F-F&#x00023;)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>F&#x00023;</italic><sub>4</sub></td>
<td valign="top" align="center">11.78 (A&#x00023;-B)</td>
<td valign="top" align="center">11.73 (A&#x00023;-B)</td>
<td valign="top" align="center">12.54 (B-C)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>A</italic><sub>4</sub></td>
<td valign="top" align="center">4.12 (D&#x00023;)</td>
<td valign="top" align="center">2.73 (C&#x00023;-D)</td>
<td valign="top" align="center">3.54 (D-D&#x00023;)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>C</italic><sub>5</sub></td>
<td valign="top" align="center">5.74 (E-F)</td>
<td valign="top" align="center">5.73 (E-F)</td>
<td valign="top" align="center">6.54 (F-F&#x00023;)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>D&#x00023;</italic><sub>6</sub></td>
<td valign="top" align="center">1.79 (C-C&#x00023;)</td>
<td valign="top" align="center">8.73 (G-G&#x00023;)</td>
<td valign="top" align="center">9.54 (G&#x00023;-A)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The horizontal threshold function parameters were <inline-formula><mml:math id="M5"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>58</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M6"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>19</mml:mn></mml:math></inline-formula> and the logistic threshold function parameters were &#x000E2; &#x0003D; 5.67, <inline-formula><mml:math id="M7"><mml:mover accent="true"><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>13</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M8"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>62</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M9"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>18</mml:mn></mml:math></inline-formula></italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>A multilevel (i.e., mixed-effects) logistic regression model (Jaeger, <xref ref-type="bibr" rid="B28">2008</xref>) was fitted, with initial pitch class (<italic>C</italic>, <italic>C&#x00023;</italic>, <italic>D</italic>, <italic>D&#x00023;</italic>, <italic>E</italic>, <italic>F</italic>, <italic>F&#x00023;</italic>, <italic>G</italic>, <italic>G&#x00023;</italic>, <italic>A</italic>, <italic>A&#x00023;</italic>, <italic>B</italic>) and envelope center (<italic>C</italic><sub>2</sub>, <italic>C</italic><sub>4</sub>, <italic>F&#x00023;</italic><sub>4</sub>, <italic>A</italic><sub>4</sub>, <italic>C</italic><sub>5</sub>, <italic>D&#x00023;</italic><sub>6</sub>) as predictors and averaged <italic>lower</italic> responses as the outcome. The pitch classes significantly affected the proportion of <italic>lower</italic> responses, &#x003C7;<sup>2</sup>(1) &#x0003D; 75.96, <italic>p</italic> &#x0003C; 0.0001. Pitch class effects were quantified by averaging <italic>lower</italic> response rates across the envelope sets as a function of pitch classes. The difference between the maximum and the minimum of this response function was taken as the magnitude of the pitch class effect. To obtain a continuously varying measure of the highest pitch class, the rotation angle of the resultant vector was calculated from the averaged <italic>lower</italic> response rates as a function of Shepard tones&#x00027; pitch classes (see Figure <xref ref-type="fig" rid="F2">2</xref>, Fisher, <xref ref-type="bibr" rid="B20">1993</xref>; Repp and Thompson, <xref ref-type="bibr" rid="B45">2010</xref>). This measure ranged between 1 and 12 and corresponded to the consecutive pitch classes; for example, 1 corresponded to <italic>C</italic>, 2 to <italic>C&#x00023;</italic>, etc. The highest pitch class was 3.52 (between <italic>D</italic> and <italic>D&#x00023;</italic>), and the magnitude of effect was small (10.15%).
The envelope centers also significantly influenced the proportion of <italic>lower</italic> responses, &#x003C7;<sup>2</sup>(1) &#x0003D; 21.31, <italic>p</italic> &#x0003C; 0.0001. There were fewer <italic>lower</italic> responses under the envelope centered at <italic>C</italic><sub>2</sub> (<italic>M</italic> &#x0003D; 47.71%) than under those in the middle and high frequency regions (<italic>C</italic><sub>4</sub>: <italic>M</italic> &#x0003D; 50.22%, <italic>F&#x00023;</italic><sub>4</sub>: <italic>M</italic> &#x0003D; 51.78%, <italic>A</italic><sub>4</sub>: <italic>M</italic> &#x0003D; 51.23%, <italic>C</italic><sub>5</sub>: <italic>M</italic> &#x0003D; 50.91%, <italic>D&#x00023;</italic><sub>6</sub>: <italic>M</italic> &#x0003D; 51.16%). The interaction between pitch classes and envelope centers was also significant, indicating that the highest pitch class depended on the envelope center (envelope effects), &#x003C7;<sup>2</sup>(1) &#x0003D; 51.38, <italic>p</italic> &#x0003C; 0.0001. As described above, envelope effects were quantified using the procedure described by Repp (<xref ref-type="bibr" rid="B44">1997</xref>). The highest pitch class was shifted by 4.72 semitones from the envelope center, and the magnitude of effect (16.44%) was slightly larger than the magnitude of the pitch class effect.</p>
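<p>The two summary measures used above, the max-min magnitude of effect and the resultant-vector estimate of the highest pitch class, can be illustrated with a short Python sketch. It shows the general circular-statistics technique only; the function names and the mapping of angle 0 to pitch class <italic>C</italic> (value 1) are assumptions for this example, and any additional preprocessing used in the original analysis is not reproduced.</p>
<preformat><![CDATA[
```python
import cmath
import math


def effect_magnitude(lower_rates):
    """Difference between the largest and smallest 'lower' response rate."""
    return max(lower_rates) - min(lower_rates)


def highest_pitch_class(lower_rates):
    """Angle of the resultant vector, expressed on the pitch class scale.

    lower_rates holds one rate per pitch class in the order C, C#, ..., B.
    Each rate weights a unit vector pointing at its pitch class on the circle.
    The returned value is 1 for C, 2 for C#, and so on, with fractions for
    positions between neighbouring pitch classes.
    """
    n = len(lower_rates)
    resultant = sum(
        rate * cmath.exp(2j * math.pi * k / n)
        for k, rate in enumerate(lower_rates)
    )
    angle = cmath.phase(resultant) % (2.0 * math.pi)   # wrap into [0, 2*pi)
    return 1.0 + angle * n / (2.0 * math.pi)
```
]]></preformat>
<p>Because the estimate uses all 12 response rates rather than only the maximum, it varies continuously and is less sensitive to noise in a single pitch class.</p>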
</sec>
<sec>
<title>3.2.1.2. Individual data</title>
<p>Pitch class and envelope effects were calculated individually for each participant to investigate whether the small effects were due to small individual effects or high individual variability. Considering pitch class effects, the individual highest pitch classes were distributed nearly uniformly across the whole pitch class circle (see Figure <xref ref-type="fig" rid="F7">7A</xref>); the Rayleigh test for non-uniformity of circular data (Berens, <xref ref-type="bibr" rid="B1">2009</xref>) revealed that the distribution of highest pitch classes around the pitch class circle did not deviate significantly from a uniform distribution, <italic>R</italic> &#x0003D; 0.36, <italic>p</italic> &#x0003D; 0.70, <italic>n</italic> &#x0003D; 29. Thus, the listeners did not agree on which pitch class was the highest, despite having the same linguistic background.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p><bold>(A)</bold> Distribution of individual highest pitch classes and the magnitude of pitch class effect (<italic>n</italic> &#x0003D; 29). <bold>(B)</bold> Distribution of individual highest pitch classes shifted from the envelope center (in semitones, st) and the magnitude of envelope effects (<italic>n</italic> &#x0003D; 29). The horizontal line represents the median of the individual magnitude of effect. The vertical line represents the median of the clockwise distance by which the highest pitch class is shifted from the envelope center.</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0007.tif"/>
</fig>
<p>Considering envelope effects, the individual highest pitch classes were shifted, on average, 5.20 semitones from envelope center (see Figure <xref ref-type="fig" rid="F7">7B</xref>); the Rayleigh test for non-uniformity of circular data revealed a significant deviation from a uniform distribution, <italic>R</italic> &#x0003D; 8.44, <italic>p</italic> &#x0003C; 0.0001, <italic>n</italic> &#x0003D; 29. Thus, the listeners agreed that the highest pitch class was shifted about 5 semitones from envelope center.</p>
<p>There was no significant correlation between pitch class and envelope effects, <italic>r</italic> &#x0003D; 0.24, <italic>p</italic> &#x0003D; 0.203 or between error rates of the pure tone discrimination test and the magnitudes of pitch class effect, <italic>r</italic> &#x0003D; 0.16, <italic>p</italic> &#x0003D; 0.405 or envelope effects, <italic>r</italic> &#x0003D; &#x02212;0.292, <italic>p</italic> &#x0003D; 0.124.</p>
</sec>
</sec>
<sec>
<title>3.2.2. Model testing</title>
<p>The root mean squared deviation (<italic>RMSD</italic>) was used to assess each model&#x00027;s goodness of fit. Like the root mean squared error (<italic>RMSE</italic>), this measure is based on the sum of squared errors (<italic>SSE</italic>) between data and model predictions, but it additionally takes into account the number of model parameters <italic>k</italic> (Pitt et al., <xref ref-type="bibr" rid="B40">2002</xref>): <inline-formula><mml:math id="M10"><mml:mrow><mml:mi>RMSD</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mfrac><mml:mrow><mml:mi>SSE</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:msqrt></mml:mrow></mml:math></inline-formula>, where <italic>N</italic> is the number of data points.</p>
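<p>As a small worked illustration of this fit measure, the following Python sketch computes the RMSD from observed and predicted response proportions. The function and argument names are chosen for this example only, and the numbers in the usage note are made up for illustration rather than taken from the data.</p>
<preformat><![CDATA[
```python
import math


def rmsd(observed, predicted, n_params):
    """Root mean squared deviation: sqrt(SSE / (N - k)).

    observed and predicted are equally long sequences of values,
    n_params is the number of free model parameters k.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n_points = len(observed)
    if n_points <= n_params:
        raise ValueError("need more data points than free parameters")
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / (n_points - n_params))
```
]]></preformat>
<p>For four hypothetical data points 0.5, 0.6, 0.4, and 0.7, a model with two free parameters that predicts 0.5 throughout has an SSE of 0.06 and therefore an RMSD of the square root of 0.06/2, that is, about 0.17. With the same SSE, a model with more free parameters receives a larger RMSD.</p>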
<sec>
<title>3.2.2.1. Sample data</title>
<p>For testing the threshold model, it was necessary to choose the form of the threshold function (and the number of free parameters). The simplest assumption is that the threshold function (applied to each Shepard component) does not depend on frequency (horizontal threshold function). Thus, a horizontal function with the two parameters &#x003BC;<sub><italic>t</italic></sub> and &#x003C3;<sup>2</sup> was fitted, estimated by the least-squares method on the basis of the 72 sample data points (12 Shepard tones &#x000D7; 6 Envelope centers, <italic>n</italic> &#x0003D; 29). Model input was the logarithmic frequencies and the relative amplitudes of experimental stimuli. The threshold model predicted the sigmoid form of the response patterns and their shifts for the different envelope centers (see Figure <xref ref-type="fig" rid="F8">8</xref> and Table <xref ref-type="table" rid="T1">1</xref>, RMSD &#x0003D; 0.08, <italic>R</italic><sup>2</sup> &#x0003D; 0.31).</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Fitting the horizontal threshold function: average percentages of <italic>lower</italic> responses (blue dots and unbroken line) in comparison to model predictions (red diamonds and dotted line; estimated model parameters: <inline-formula><mml:math id="M11"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>58</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M12"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>19</mml:mn></mml:math></inline-formula>) for the six envelopes <bold>(A)</bold> <italic>C</italic><sub>2</sub>, <bold>(B)</bold> <italic>C</italic><sub>4</sub>, <bold>(C)</bold> <italic>F&#x00023;</italic><sub>4</sub>, <bold>(D)</bold> <italic>A</italic><sub>4</sub>, <bold>(E)</bold> <italic>C</italic><sub>5</sub>, <bold>(F)</bold> <italic>D&#x00023;</italic><sub>6</sub>. The error bars represent the standard error of mean (<italic>n</italic> &#x0003D; 29).</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0008.tif"/>
</fig>
<p>As can be seen in Table <xref ref-type="table" rid="T1">1</xref>, the predicted highest Shepard tones deviated by 2.02 semitones from the empirical highest Shepard tones. The largest deviation was found for the high envelope (<italic>D&#x00023;</italic><sub>6</sub>). However, the response function was nearly flat (see Figure <xref ref-type="fig" rid="F8">8F</xref>), and, thus, no reasonable highest Shepard tone could be assessed.</p>
<p>The model fit was poor for the lower envelope centers <italic>C</italic><sub>2</sub> and <italic>C</italic><sub>4</sub> (see Figures <xref ref-type="fig" rid="F8">8A,B</xref>), possibly because a horizontal threshold function is not appropriate in low frequency regions. Normally, psychoacoustic parameters depend on frequency, for example the hearing threshold is higher in low frequency regions than in middle frequency regions (Zwicker and Fastl, <xref ref-type="bibr" rid="B60">1999</xref>). Thus, the expected value &#x003BC;<sub><italic>t</italic></sub> was modeled using the logistic function</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>q</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>q</mml:mi><mml:mo>-</mml:mo><mml:mi>l</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x000B7;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mo class="qopname">log</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mi>a</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>with <italic>l</italic> &#x0003D; 0 to implement a frequency-dependent threshold function. The logistic function is often used to fit a measured psychometric function to psychoacoustic data, because it has suitable general properties. For example, it begins at 0 and increases to 1 following a sigmoidal function.</p>
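<p>To make the shape of Equation 4 concrete, the following minimal Python sketch evaluates &#x003BC;<sub><italic>t</italic></sub> for a given component frequency. The function name <italic>logistic_threshold</italic> and its argument names are illustrative assumptions and are not taken from the fitting code of the study; the parameter values in the usage lines are the sample estimates reported for the logistic threshold function (<italic>a</italic> &#x0003D; 5.67, <italic>b</italic> &#x0003D; 0.13, <italic>q</italic> &#x0003D; 0.62, <italic>l</italic> &#x0003D; 0).</p>
<preformat><![CDATA[
```python
import math


def logistic_threshold(freq_hz, a, b, q, l=0.0):
    """Expected threshold mu_t of Equation 4 for a component at freq_hz (in Hz).

    Hypothetical helper for illustration; a, b, q and l follow Equation 4.
    """
    x = (math.log2(freq_hz) - a) / b
    # 1 / (1 + e^(-x)) is the standard rising logistic. Equation 4 uses one
    # minus this term, so mu_t falls from (1 - l) at low frequencies to q at
    # high frequencies.
    rising = 1.0 / (1.0 + math.exp(-x))
    return q + (1.0 - q - l) * (1.0 - rising)


# Sample estimates reported for the logistic threshold function (l fixed at 0).
sample = dict(a=5.67, b=0.13, q=0.62, l=0.0)

for f in (20.0, 2 ** 5.67, 440.0):
    print(round(f, 2), "Hz ->", round(logistic_threshold(f, **sample), 3))
```
]]></preformat>
<p>Read this way, the expected threshold approaches 1 &#x02212; <italic>l</italic> far below the frequency 2<sup><italic>a</italic></sup>, approaches <italic>q</italic> far above it, and lies halfway between the two at log<sub>2</sub>(<italic>f</italic>) &#x0003D; <italic>a</italic>.</p>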
<p>Model predictions fitted the data points well in the middle and low frequency regions (see Figure <xref ref-type="fig" rid="F9">9</xref> and Table <xref ref-type="table" rid="T1">1</xref>, RMSD &#x0003D; 0.06, <italic>R</italic><sup>2</sup> &#x0003D; 0.64). In particular, model fit was improved for the <italic>C</italic><sub>2</sub> envelope (fit for the horizontal threshold function: <italic>RMSD</italic> &#x0003D; 0.14, <italic>R</italic><sup>2</sup> &#x0003D; 0.23; fit for the logistic threshold function: <italic>RMSD</italic> &#x0003D; 0.02, <italic>R</italic><sup>2</sup> &#x0003D; 1) and the <italic>C</italic><sub>4</sub> envelope (fit for the horizontal threshold function: <italic>RMSD</italic> &#x0003D; 0.08, <italic>R</italic><sup>2</sup> &#x0003D; 0.38; fit for the logistic threshold function: <italic>RMSD</italic> &#x0003D; 0.06, <italic>R</italic><sup>2</sup> &#x0003D; 0.8; also see Figures <xref ref-type="fig" rid="F9">9A,B</xref>), indicating that a falling, frequency-dependent threshold function is more appropriate than a horizontal, frequency-independent function for lower frequency regions. As can be seen in Table <xref ref-type="table" rid="T1">1</xref>, the predicted highest Shepard tones deviated by no more than 1 semitone from the empirical highest Shepard tones, again, except for the <italic>D&#x00023;</italic><sub>6</sub> envelope.</p>
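<p>For readers who wish to compute comparable fit statistics, the following Python sketch shows one common way to obtain an RMSD and an <italic>R</italic><sup>2</sup> from observed and predicted response proportions. It is an assumption that <italic>R</italic><sup>2</sup> is taken as one minus the ratio of residual to total sum of squares; this section does not state which definition was used. The numbers in the usage lines are made up for illustration and are not data from the study.</p>
<preformat><![CDATA[
```python
import math


def rmsd(observed, predicted):
    """Root-mean-square deviation between observed and predicted proportions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up proportions of "lower" responses, for illustration only.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]

print("RMSD:", round(rmsd(observed, predicted), 3))
print("R^2:", round(r_squared(observed, predicted), 3))
```
]]></preformat>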
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>Fitting the logistic threshold function: average percentages of <italic>lower</italic> responses (blue dots and unbroken line) in comparison to model predictions for the six envelopes <bold>(A)</bold> <italic>C</italic><sub>2</sub>, <bold>(B)</bold> <italic>C</italic><sub>4</sub>, <bold>(C)</bold> <italic>F&#x00023;</italic><sub>4</sub>, <bold>(D)</bold> <italic>A</italic><sub>4</sub>, <bold>(E)</bold> <italic>C</italic><sub>5</sub>, <bold>(F)</bold> <italic>D&#x00023;</italic><sub>6</sub> (red diamonds and dotted line; estimated model parameters: &#x000E2; &#x0003D; 5.67, <inline-formula><mml:math id="M14"><mml:mover accent="true"><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>13</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M15"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>62</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M16"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>18</mml:mn></mml:math></inline-formula>). The error bars represent the standard error of mean (<italic>n</italic> &#x0003D; 29).</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0009.tif"/>
</fig>
<p>Comparing the estimated horizontal and logistic threshold functions shows differences in the low frequency region (see Figure <xref ref-type="fig" rid="F12">12</xref>). The logistic function depended on frequency only in the low frequency region. The estimated threshold functions were quite similar in the middle and high frequency regions. Components with relative amplitudes below 0.6 tend to be filtered out (horizontal: &#x003BC;<sub><italic>t</italic></sub> &#x0003D; 0.58; logistic: <italic>q</italic> &#x0003D; 0.62).</p>
<p>The estimated logistic threshold function was analyzed more closely (see Figure <xref ref-type="fig" rid="F12">12</xref>). The parameter <italic>a</italic>, which determined the position of the logistic function on the frequency axis, was estimated at 50.91 Hz (<italic>a</italic> &#x0003D; 5.67 converted to the non-logarithmic frequency scale: 2<sup>5.67</sup> &#x0003D; 50.91), resulting in threshold function values close to or even greater than one for frequencies lower than 50.91 Hz. Thus, components with frequencies lower than 50.91 Hz were probably filtered out, even if component amplitudes were maximal. Accordingly, the function parameter <italic>a</italic> can be considered as the lower limit of a preference region. The parameter <italic>q</italic>, which determined the lower limit of the range of the threshold function, was 0.62. The parameter <italic>q</italic> can be considered as a lower loudness boundary, because for components where the amplitude is smaller than <italic>q</italic>, the probability of not being filtered is rather small (&#x0003C; 0.5). The parameter <italic>b</italic>, which determined the slope of the threshold function, was 0.13, indicating a threshold function of moderate steepness.</p>
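<p>The arithmetic behind these interpretations can be sketched in a few lines of Python. The sketch assumes that the threshold applied to a component is normally distributed with expected value &#x003BC;<sub><italic>t</italic></sub> and variance &#x003C3;<sup>2</sup>, and that a component passes when its relative amplitude exceeds the threshold; the distributional form is an assumption made here for illustration. The helper name <italic>pass_probability</italic> is hypothetical, and the parameter values are the sample estimates reported in the text.</p>
<preformat><![CDATA[
```python
import math


def pass_probability(amplitude, mu_t, sigma2):
    """P(threshold < amplitude), ASSUMING threshold ~ Normal(mu_t, sigma2)."""
    z = (amplitude - mu_t) / math.sqrt(sigma2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Sample estimates reported in the text for the logistic threshold function.
a_hat, q_hat, sigma2_hat = 5.67, 0.62, 0.18

# Back-transform a from the log2 frequency scale to Hz.
position_hz = 2 ** a_hat

# Far above 2**a the expected threshold approaches q, so q acts as a floor:
# a component with amplitude below q passes with probability below 0.5.
p_below_q = pass_probability(0.50, q_hat, sigma2_hat)
p_at_q = pass_probability(q_hat, q_hat, sigma2_hat)
p_max = pass_probability(1.00, q_hat, sigma2_hat)

print(round(position_hz, 2), "Hz")
print(round(p_below_q, 3), round(p_at_q, 3), round(p_max, 3))
```
]]></preformat>
<p>Because the expected threshold never falls below <italic>q</italic> when <italic>l</italic> &#x0003D; 0, the pass probability for an amplitude below <italic>q</italic> stays under 0.5 at every frequency under this assumption.</p>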
</sec>
<sec>
<title>3.2.2.2. Individual data</title>
<p>To fit individual threshold functions, the four parameters of the logistic function (see Equation 4) and &#x003C3;<sup>2</sup> were estimated by the least-squares method on the basis of the 72 individual data points (12 Shepard tones &#x000D7; 6 envelopes, <italic>n</italic> &#x0003D; 30). As can be seen in Figure <xref ref-type="fig" rid="F10">10</xref>, the general trend of the responses of the participant PG (<italic>R</italic><sup>2</sup> &#x0003D; 0.79), who showed pronounced pitch class effects (71.11%), was predicted well by the threshold model. Except for the <italic>C</italic><sub>2</sub> envelope, the threshold model predicted that the response patterns (see Figures <xref ref-type="fig" rid="F10">10B&#x02013;F</xref>) and the highest Shepard tones (see Table <xref ref-type="table" rid="T2">2</xref>) were not affected by the envelope center. The absolute predictions deviated from the data (<italic>RMSD</italic> &#x0003D; 0.17) because of the large deviation for the <italic>C</italic><sub>2</sub> envelope, where the threshold model predicted <italic>lower</italic> response rates of 0.5 for all tritone pairs (see Figure <xref ref-type="fig" rid="F10">10A</xref>). The prediction of the flat response function was due to the estimation of a very steep threshold function (see Figure <xref ref-type="fig" rid="F12">12</xref>), leading to a very high probability of all frequency components being filtered out for both Shepard tones and to an equal probability of the <italic>higher</italic> and <italic>lower</italic> responses (see Equation 19).</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>Fitted pronounced pitch class effects (participant <italic>PG</italic>): average percentages of <italic>lower</italic> responses (blue dots and unbroken line) in comparison to model predictions for the six envelopes <bold>(A)</bold> <italic>C</italic><sub>2</sub>, <bold>(B)</bold> <italic>C</italic><sub>4</sub>, <bold>(C)</bold> <italic>F&#x00023;</italic><sub>4</sub>, <bold>(D)</bold> <italic>A</italic><sub>4</sub>, <bold>(E)</bold> <italic>C</italic><sub>5</sub>, <bold>(F)</bold> <italic>D&#x00023;</italic><sub>6</sub> (red diamonds and dotted line; estimated model parameters: &#x000E2; &#x0003D; 8.67, <inline-formula><mml:math id="M17"><mml:mover accent="true"><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>11</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M18"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>29</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M19"><mml:mover accent="true"><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>57</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M20"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>27</mml:mn></mml:math></inline-formula>).</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0010.tif"/>
</fig>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Highest Shepard tones in the data and as predicted by the threshold model using logistic threshold functions, for each of the six envelope centers, for <italic>PG</italic>, who showed pronounced pitch class effects, and <italic>AM</italic>, who showed pronounced envelope effects.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Envelope</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><italic><bold>PG</bold></italic></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><italic><bold>AM</bold></italic></th>
</tr>
<tr style="border-bottom: thin solid #000000;">
<th/>
<th valign="top" align="center"><bold>Data</bold></th>
<th valign="top" align="center"><bold>Predicted</bold></th>
<th valign="top" align="center"><bold>Data</bold></th>
<th valign="top" align="center"><bold>Predicted</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><italic>C</italic><sub>2</sub></td>
<td valign="top" align="center">4.14 (D&#x00023;)</td>
<td valign="top" align="center">11.70 (A&#x00023;-B)</td>
<td valign="top" align="center">4.53 (D&#x00023;-E)</td>
<td valign="top" align="center">4.51 (D&#x00023;-E)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>C</italic><sub>4</sub></td>
<td valign="top" align="center">5.78 (E-F)</td>
<td valign="top" align="center">5.57 (E-F)</td>
<td valign="top" align="center">7.27 (F&#x00023;-G)</td>
<td valign="top" align="center">7.42 (F&#x00023;-G)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>F&#x00023;</italic><sub>4</sub></td>
<td valign="top" align="center">5.46 (E-F)</td>
<td valign="top" align="center">5.34 (E-F)</td>
<td valign="top" align="center">12.49 (B-C)</td>
<td valign="top" align="center">12.34 (B-C)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>A</italic><sub>4</sub></td>
<td valign="top" align="center">4.91 (E)</td>
<td valign="top" align="center">5.37 (E-F)</td>
<td valign="top" align="center">3.35 (D-D&#x00023;)</td>
<td valign="top" align="center">2.93 (D)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>C</italic><sub>5</sub></td>
<td valign="top" align="center">5.82 (E-F)</td>
<td valign="top" align="center">5.52 (E-F)</td>
<td valign="top" align="center">5.33 (E-F)</td>
<td valign="top" align="center">5.58 (E-F)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>D&#x00023;</italic><sub>6</sub></td>
<td valign="top" align="center">7.69 (F&#x00023;-G)</td>
<td valign="top" align="center">7.72 (F&#x00023;-G)</td>
<td valign="top" align="center">7.38 (F&#x00023;-G)</td>
<td valign="top" align="center">7.58 (F&#x00023;-G)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Estimated logistic-threshold-function parameters for &#x0201C;PG&#x0201D; were &#x000E2; &#x0003D; 8.67, <inline-formula><mml:math id="M21"><mml:mover accent="true"><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>11</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M22"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>29</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M23"><mml:mover accent="true"><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>57</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M24"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>27</mml:mn></mml:math></inline-formula> and for &#x0201C;AM&#x0201D; were &#x000E2; &#x0003D; 2.00, <inline-formula><mml:math id="M25"><mml:mover accent="true"><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>08</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M26"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>50</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M27"><mml:mover accent="true"><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mn>12</mml:mn><mml:mo>.</mml:mo><mml:mn>02</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M28"><mml:msup><mml:mrow><mml:mover 
accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>12</mml:mn></mml:math></inline-formula></italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>As can be seen in Figure <xref ref-type="fig" rid="F11">11</xref>, the general trend of the responses of the participant AM (<italic>R</italic><sup>2</sup> &#x0003D; 0.89), who showed pronounced envelope effects (52.22%), was predicted well by the threshold model. The threshold model predicted that the response patterns (see Figure <xref ref-type="fig" rid="F11">11</xref>) and the highest Shepard tones (see Table <xref ref-type="table" rid="T2">2</xref>) were affected by the envelope center. The absolute predictions deviated from the data (<italic>RMSD</italic> &#x0003D; 0.16), because the participant had an overall bias to give more <italic>lower</italic> responses (<italic>M</italic> &#x0003D; 62.45%), which is not implemented in the threshold model.</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Fitted pronounced envelope effects (participant <italic>AM</italic>): average percentages of <italic>lower</italic> responses (blue dots and unbroken line) in comparison to model predictions for the six envelopes <bold>(A)</bold> <italic>C</italic><sub>2</sub>, <bold>(B)</bold> <italic>C</italic><sub>4</sub>, <bold>(C)</bold> <italic>F&#x00023;</italic><sub>4</sub>, <bold>(D)</bold> <italic>A</italic><sub>4</sub>, <bold>(E)</bold> <italic>C</italic><sub>5</sub>, <bold>(F)</bold> <italic>D&#x00023;</italic><sub>6</sub> (red diamonds and dotted line; estimated model parameters: &#x000E2; &#x0003D; 2.00, <inline-formula><mml:math id="M29"><mml:mover accent="true"><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>08</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M30"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>50</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M31"><mml:mover accent="true"><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mn>12</mml:mn><mml:mo>.</mml:mo><mml:mn>02</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M32"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>12</mml:mn></mml:math></inline-formula>).</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0011.tif"/>
</fig>
<p>As can be seen in Figure <xref ref-type="fig" rid="F12">12</xref>, the threshold function used to predict pronounced pitch-class effects (the response pattern of <italic>PG</italic>) is much steeper than that used to predict pronounced envelope effects (the response pattern of <italic>AM</italic>). The steepness of the logistic function is determined by the parameter <italic>b</italic>. Possibly, this parameter determines the relationship between pitch class and envelope effects. Another important parameter is <italic>a</italic>, which determines the position of the logistic function on the frequency axis. Individual differences in this parameter possibly explain the individual differences among the highest pitch classes.</p>
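<p>One way to read the parameter <italic>b</italic> is as the width of the transition region of the logistic term in Equation 4. The Python sketch below computes the span, in octaves, over which that term moves between 10% and 90% of its range, which equals 2&#x000B7;ln(9)&#x000B7;<italic>b</italic>, and converts each estimate of <italic>a</italic> to Hz. The 10%&#x02013;90% convention and the helper name <italic>transition_width_octaves</italic> are assumptions chosen for illustration; the (<italic>a</italic>, <italic>b</italic>) pairs are the estimates reported for the sample, PG, and AM.</p>
<preformat><![CDATA[
```python
import math


def transition_width_octaves(b):
    """Octaves over which the logistic term goes from 10% to 90% of its range.

    Solving 1 / (1 + e^(-x)) = 0.1 and 0.9 gives x = -ln(9) and x = +ln(9);
    with x = (log2(f) - a) / b this spans 2 * ln(9) * b on the log2(f) axis.
    """
    return 2.0 * math.log(9.0) * b


# (a, b) estimates reported in the text.
estimates = {"sample": (5.67, 0.13), "PG": (8.67, 0.11), "AM": (2.00, 1.08)}

for name, (a, b) in estimates.items():
    position_hz = 2 ** a  # position of the function on the frequency axis
    width = transition_width_octaves(b)
    print(name, round(position_hz, 1), "Hz,", round(width, 2), "octaves")
```
]]></preformat>
<p>Under this reading, the transition for PG spans roughly half an octave, whereas that for AM spans almost five octaves. Note that <italic>b</italic> only sets the scale on the logarithmic frequency axis; the absolute slope also depends on the range 1 &#x02212; <italic>q</italic> &#x02212; <italic>l</italic>.</p>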
<fig id="F12" position="float">
<label>Figure 12</label>
<caption><p>The six envelopes used in the experiment (gray lines) along with the estimated horizontal and logistic sample threshold functions and the individual logistic threshold functions for participants PG and AM, with pronounced pitch class and envelope effects, respectively. The estimated parameters of the horizontal sample threshold function were <inline-formula><mml:math id="M33"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>58</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M34"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>19</mml:mn></mml:math></inline-formula>; those of the logistic sample threshold function were &#x000E2; &#x0003D; 5.67, <inline-formula><mml:math id="M35"><mml:mover accent="true"><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>13</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M36"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>62</mml:mn></mml:math></inline-formula>, <italic>l</italic> &#x0003D; 0, <inline-formula><mml:math id="M37"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>18</mml:mn></mml:math></inline-formula>; those of the logistic threshold function for &#x0201C;PG&#x0201D; were &#x000E2; &#x0003D; 8.67, <inline-formula><mml:math id="M38"><mml:mover accent="true"><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>11</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M39"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>29</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M40"><mml:mover accent="true"><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>57</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M41"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>27</mml:mn></mml:math></inline-formula>; and those of the logistic threshold function for &#x0201C;AM&#x0201D; were &#x000E2; &#x0003D; 2.00, <inline-formula><mml:math id="M42"><mml:mover accent="true"><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>08</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M43"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>50</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M44"><mml:mover accent="true"><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mn>12</mml:mn><mml:mo>.</mml:mo><mml:mn>02</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="M45"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>12</mml:mn></mml:math></inline-formula>.</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0012.tif"/>
</fig>
</sec>
</sec>
<sec>
<title>3.2.3. Simulations</title>
<p>Simulations were conducted to investigate the effect of the form of the threshold function on the relationship between pitch class and envelope effects. The model input consisted of the logarithmic frequencies and the relative amplitudes of each tritone pair under each of the six envelopes that were used in the experiment (see Figures <xref ref-type="fig" rid="F13">13A,B</xref>). Pitch class and envelope effects were quantified using the procedure of Repp (<xref ref-type="bibr" rid="B43">1994</xref>) as described above. Figure <xref ref-type="fig" rid="F13">13</xref> shows that the threshold model predicted pitch class and envelope effects, depending on threshold function parameters. Particular effects depended on more than one parameter. However, by considering rather extreme threshold functions, some systematic associations were revealed: steep threshold functions caused pronounced pitch class (see Figure <xref ref-type="fig" rid="F13">13C</xref>) and small envelope effects (Figure <xref ref-type="fig" rid="F13">13E</xref>); that is, the highest pitch class did not depend on envelope centers. Shifting steep threshold functions on the frequency axis (see Figure <xref ref-type="fig" rid="F13">13A</xref>) caused nearly reversed patterns of pitch class effects, that is, opposite highest pitch classes (6 semitones removed; see Figure <xref ref-type="fig" rid="F13">13C</xref>). In contrast, flat threshold functions (see Figure <xref ref-type="fig" rid="F13">13B</xref>) caused pronounced envelope (Figure <xref ref-type="fig" rid="F13">13F</xref>) and small pitch class effects (see Figure <xref ref-type="fig" rid="F13">13D</xref>); that is, the highest pitch class did depend on the envelope center and was shifted by a particular distance from the envelope center. Shifting flat threshold functions on the frequency axis (see Figure <xref ref-type="fig" rid="F13">13B</xref>) had no effect on pitch class effects (see Figure <xref ref-type="fig" rid="F13">13D</xref>) but had an effect on envelope effects (Figure <xref ref-type="fig" rid="F13">13F</xref>).</p>
<fig id="F13" position="float">
<label>Figure 13</label>
<caption><p>Simulation of pitch class <bold>(C,D)</bold> and envelope effects <bold>(E,F)</bold>: The steep (nearly vertical) threshold functions <bold>(A)</bold> cause pronounced pitch class <bold>(C)</bold> and small envelope effects <bold>(E)</bold>; shifting the threshold function on the frequency axis causes the shift in pitch class effects pattern in <bold>(C)</bold> (solid red or dashed green threshold functions lead to solid or dashed effects, respectively); the flat (nearly horizontal) threshold functions <bold>(B)</bold> cause small pitch class effects <bold>(D)</bold> and pronounced envelope effects <bold>(F)</bold>; shifting the threshold function on the frequency axis causes the shift in envelope effects pattern <bold>(F)</bold>. These simulations were performed for Shepard tones under the spectral envelopes used in the experiment (dotted line in <bold>A</bold>,<bold>B</bold>).</p></caption>
<graphic xlink:href="fpsyg-09-01590-g0013.tif"/>
</fig>
</sec>
</sec>
<sec>
<title>3.3. Discussion</title>
<p>The pitch-matching pretest revealed the octave ambiguity in pitch height of single Shepard tones. The further question was whether this ambiguity accounts for the tritone paradox, which would strengthen the psychoacoustic approach. The focus of the study was the introduction of a simple model that predicts responses to tone pairs by assuming the octave ambiguity in single Shepard tones. However, additionally, the study revealed further empirical evidence supporting the psychoacoustic approach.</p>
<p>The study replicated previous findings of an effect of the spectral envelope on the tritone paradox. The subjectively highest pitch classes were shifted about 5 semitones from the envelope center. Repp (<xref ref-type="bibr" rid="B44">1997</xref>) found shifts of about 6 semitones from the envelope center. The cause of this small deviation from the present study is unclear, but as in the Repp study, the Shepard tone that was nearly opposite to the envelope center was judged to be the highest. Thus, the spectral structure was the crucial factor in the present study. The finding of envelope effects supports the psychoacoustic approach, because, theoretically, changes in the spectral structure affect F0-extraction but not pitch classes.</p>
<p>Furthermore, the highest pitch class revealed in the pure tone matching task corresponded to that revealed in the tone-pair comparison task (pitch class <italic>G</italic> for <italic>C</italic><sub>4</sub> Shepard tones). In accordance with Repp and Thompson (<xref ref-type="bibr" rid="B45">2010</xref>), the highest Shepard tone assessable in the matching task corresponded to the highest Shepard tone of the tone-pair comparison task for averaged sample data.</p>
<p>The main finding of the study was that the threshold model, using a logistic threshold function, predicted the subjectively highest pitch classes, depending on the envelope center and the typical sigmoid response pattern for the sample and the individual data. Thus, in addition to the highest pitch class, the threshold model predicted that some tritone pairs were more ambiguous than other tritone pairs. Given that the threshold model implements the pitch ambiguity of single Shepard tones, the prediction of the typical sigmoid response pattern suggests that the octave ambiguity in single tones accounts for the tritone paradox.</p>
<p>The threshold model predicted the response patterns of participants who showed pronounced pitch class effects (the highest pitch class is unaffected by the envelope center) and of those who showed pronounced envelope effects (the highest pitch class is affected by the envelope center). The estimated threshold function for the former was rather steep, while the estimated threshold function for the latter was rather flat, indicating that the relationship between pitch class and envelope effects is mediated by the form of the threshold function.</p>
<p>Supporting this suggestion, the simulations showed that threshold functions that were mostly independent of frequency (horizontal or flat logistic threshold functions) account for small pitch class and pronounced envelope effects and that functions that are more closely dependent on frequency (steep logistic threshold functions) account for the reverse pattern. Furthermore, the simulation showed that the position of the threshold function on the frequency axis determines the highest pitch class.</p>
<p>Terhardt (<xref ref-type="bibr" rid="B54">1991</xref>) explained pitch class effects as a result of a frequency region where frequencies are especially salient (frequency preference region). Here, the VP (F0) extraction depends on this frequency preference region and only partly on the spectral structure of the Shepard tone. The findings of the present study add detail to this account. When participants possess a pronounced frequency preference region (implemented by steep threshold functions), that is, frequencies within a small frequency region are especially salient, pitch class effects are pronounced; however, when participants possess a wider frequency preference region (implemented by flat threshold functions), then envelope effects are pronounced. The position of the frequency preference region on the frequency axis (implemented by the position of the threshold function) determines the highest pitch class. Thus, the current approach supports and extends the explanation of the tritone paradox proposed by Terhardt (<xref ref-type="bibr" rid="B54">1991</xref>).</p>
<p>Two forms of threshold functions were tested: frequency-independent, horizontal and frequency-dependent, logistic threshold functions. The estimated logistic function was also nearly flat in the middle frequency region, possibly, because of the background noise. Zwicker and Feldtkeller (<xref ref-type="bibr" rid="B61">1967</xref>) showed that the hearing thresholds of pure tones, usually depending on their frequencies, become flat when background noise is present.</p>
<p>One cannot rule out that two findings of the current study, namely that the tritone paradox was mainly affected by spectral factors and that there was no common highest pitch class, were due to the background noise. However, these results were also found in studies where no background noise was presented (Repp, <xref ref-type="bibr" rid="B43">1994</xref>, <xref ref-type="bibr" rid="B44">1997</xref>; Kr&#x000FC;ger and Lukas, <xref ref-type="bibr" rid="B30">2002</xref>). Nevertheless, it would be interesting to test whether background noise affects pitch class and envelope effects, given that the threshold model predicts differences depending on flat or steep threshold functions.</p>
<p>Another topic to discuss is whether logistic functions are appropriate to approximate threshold functions. Possibly, the class of logistic functions is too restricted, and more complex threshold functions would be more appropriate. Trivially, more complex functions would improve fitting results by introducing more free parameters. The consequence, however, would be a more time-consuming fitting algorithm. Furthermore, the advantage of the logistic function is its simple form, which enables associations between specific function characteristics and specific effects in the tritone paradox, as revealed in the simulations.</p>
<p>Another aspect to discuss is whether the estimated threshold functions are in accordance with general psychoacoustic parameters. For example, the parameter <italic>a</italic> can be interpreted as the lower limit of the frequency preference region. The estimate was 5.67 for the logistic threshold function in the sample, which is about 50 Hz (2<sup>5.67</sup>, because of the logarithmic frequency scale) and corresponds, approximately, to the lower limit of the residual pitch (30&#x02013;40 Hz; Ritsma, <xref ref-type="bibr" rid="B46">1962</xref>; Moore, <xref ref-type="bibr" rid="B39">1973</xref>; Krumbholz et al., <xref ref-type="bibr" rid="B31">2000</xref>; Pressnitzer et al., <xref ref-type="bibr" rid="B42">2001</xref>). It seems reasonable to look for additional associations between function characteristics and psychoacoustic or physiological parameters in future research. However, it would be more complicated to find associations between function parameters and effect characteristics for more complex threshold functions. Thus, rather simple functions seem preferable. Considering the advantages and disadvantages, the logistic function still seems to be an appropriate way to approximate threshold functions.</p>
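<p>The conversion behind the estimate for <italic>a</italic> can be written out as a short worked step. This is only a sketch, under the assumption (suggested by the expression 2<sup>5.67</sup>) that <italic>a</italic> is given as the base-2 logarithm of a frequency in Hz; the symbol <italic>f</italic><sub><italic>a</italic></sub> is introduced here purely for illustration: <disp-formula><tex-math>f_a = 2^{a} = 2^{5.67} \approx 50.9\ \mathrm{Hz}</tex-math></disp-formula> On this reading, the estimated lower limit lies roughly 0.35 to 0.76 octaves above the 30&#x02013;40 Hz range cited for the residual pitch, since log<sub>2</sub>(40) &#x02248; 5.32 and log<sub>2</sub>(30) &#x02248; 4.91.</p>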
</sec>
</sec>
<sec id="s4">
<title>4. General discussion</title>
<p>The current study contributed to the theoretical discussion about the origin of the tritone paradox and strengthened the psychoacoustic explanation for Shepard tone phenomena. Even a very simplified pattern-matching model can explain the typical patterns found in the tritone paradox. Thus, the specific pitch class comparison mechanism postulated by Deutsch (<xref ref-type="bibr" rid="B16">1991</xref>) is not necessary to explain the tritone paradox. Although one cannot rule out that such a mechanism is at work in the tritone paradox, parsimony favors the simpler, more general model.</p>
<p>One could argue that the threshold model only worked because mainly spectral factors affected the tritone paradox in the current study. However, the model also worked well for participants with pronounced pitch class and small envelope effects. Furthermore, the simulations showed that the threshold model predicts response patterns characterized by pronounced pitch class effects.</p>
<p>The current study contributes to the understanding of the tritone paradox by providing further empirical data and by testing a theoretical model that is based on psychoacoustic assumptions. Whereas previous studies that support the psychoacoustic explanation of the tritone paradox focused on single Shepard tones, this study focused on comparisons of tone pairs. The typical response patterns of the tritone paradox were explained by considering Shepard tones as <italic>normal</italic> harmonic complex tones that are ambiguous with respect to their octave position.</p>
<p>One major drawback of the current study is that the individual data from the pitch-matching task and from the tone-pair comparison task (the standard tritone paradox) were not directly compared. Although such an approach seems reasonable and straightforward, it has several problems. Generally, individual data are subject to individual factors (e.g., response biases), which can be eliminated, at least partly, by averaging across participants but not by averaging across trials within one participant. For example, some participants have an overall bias to give more <italic>lower</italic> than <italic>higher</italic> responses or <italic>vice versa</italic>. Repp and Thompson (<xref ref-type="bibr" rid="B45">2010</xref>) found no sufficient match between individual highest pitch classes assessed in their pitch-matching task and their tone comparison task, possibly because of such methodological problems. Thus, the comparison of individual data is often less promising.</p>
<p>A problem of the threshold model is that it is hard to falsify, because it can be argued that a bad model fit is due to an inappropriate threshold function. More complex threshold functions with more free parameters necessarily improve the model fit. A possible solution would be to derive qualitative predictions from a specific form of threshold function. For example, assuming threshold functions to be logistic, the threshold functions are steeper within lower octaves and become flatter for higher octaves. Since steep threshold functions are associated with pronounced pitch class effects and flat threshold functions with pronounced envelope effects, more pronounced pitch class effects for lower octaves are expected than for higher octaves and the opposite pattern is expected for envelope effects.</p>
<p>A further limitation of the threshold model is that it applies only to tone pairs and ignores the context effects shown in the literature (Englitz et al., <xref ref-type="bibr" rid="B19">2013</xref>; Chambers and Pressnitzer, <xref ref-type="bibr" rid="B7">2014</xref>; Chambers et al., <xref ref-type="bibr" rid="B6">2017</xref>). The neuronal network model by Huang et al. (<xref ref-type="bibr" rid="B26">2015</xref>) predicts these context effects and, additionally, the sigmoid response pattern as a function of the pitch classes of tritone pairs without context. It is assumed that different Shepard tones correspond to different presynaptic strengths, affecting a synaptic weighting function. Such a weighting function corresponds to the threshold function implemented in the threshold model. In general, these models focus on different aspects of Shepard tone pitches. The neuronal network model focuses on the explanation of prior context effects, whereas the threshold model focuses on the explanation of pitch class and envelope effects.</p>
<p>The threshold model predicted a variety of ascending/descending patterns for tritone pairs, including patterns of pronounced pitch class and envelope effects and even more complex patterns, depending on the form of the threshold function. The threshold function can be interpreted as an internal spectral weighting function (Terhardt et al., <xref ref-type="bibr" rid="B57">1982b</xref>) or a synaptic weighting function (Huang et al., <xref ref-type="bibr" rid="B26">2015</xref>), with the result that a specific frequency region is particularly important for pitch. When such a preference region is clearly distinct, implemented through steeply decreasing threshold functions, the threshold model predicts the typical findings of Deutsch (<xref ref-type="bibr" rid="B16">1991</xref>), that is, pronounced pitch class and small envelope effects. Under these conditions, the highest pitch classes are barely affected by envelope position and depend on the position of the preference region on the frequency axis (implemented by the position of the threshold function on the frequency axis). In contrast, when the preference region is less distinct, implemented through flat threshold functions, the threshold model predicts the findings of Repp (<xref ref-type="bibr" rid="B43">1994</xref>, <xref ref-type="bibr" rid="B44">1997</xref>) or Kr&#x000FC;ger (<xref ref-type="bibr" rid="B29">2011</xref>), that is, pronounced envelope and small pitch class effects. Under these conditions, the highest pitch classes are largely affected by envelope center and no longer depend on the position of the preference region.</p>
<p>The threshold model leads to new suggestions about the connection between the tritone paradox and language. Remarkably, studies that revealed an effect of linguistic background also revealed pronounced pitch class and small envelope effects for most participants (Deutsch, <xref ref-type="bibr" rid="B16">1991</xref>), whereas in studies that revealed no effect of linguistic background, most participants showed pronounced envelope effects or mixed patterns (Repp, <xref ref-type="bibr" rid="B43">1994</xref>; Kr&#x000FC;ger and Lukas, <xref ref-type="bibr" rid="B30">2002</xref>). Within the framework of the threshold model, the distinctness of the preference region determines the relationship between pitch class and envelope effects. Thus, the effects of linguistic background are possibly mediated by preference region distinctness. Deutsch (<xref ref-type="bibr" rid="B16">1991</xref>) may have tested participants with distinct preference regions and, therefore, found a language connection, whereas Repp (<xref ref-type="bibr" rid="B43">1994</xref>) and Kr&#x000FC;ger and Lukas (<xref ref-type="bibr" rid="B30">2002</xref>) may have tested participants with less distinct preference regions and, therefore, found no language connection. In other words, to detect a connection between linguistic background and highest pitch classes, a sample of participants possessing distinct preference regions is required. Given that highest pitch classes depend on the position of the preference region, people of different linguistic backgrounds might differ in the position of the preference region. Perhaps specific frequency regions become especially sensitive during language development. The threshold model provides a potential link between language and the tritone paradox in the form of the position of the preference region on the frequency axis, which may explain the inconsistent findings regarding the influence of language on the paradox.</p>
</sec>
<sec id="s5">
<title>Author contributions</title>
<p>The author confirms being the sole contributor of this work and approved it for publication.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ack><p>I would like to thank Prof. Dr. Torsten Schubert for carefully reading my manuscript and for giving constructive comments that substantially helped to improve the quality of the paper. This paper includes content from the author&#x00027;s unpublished dissertation (Malek, <xref ref-type="bibr" rid="B35">2013</xref>). This paper was supported by the MLU publication fund.</p>
</ack>
<sec sec-type="supplementary-material" id="s6">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01590/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01590/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Appendix.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Audio_1.WAV" id="SM2" mimetype="audio/x-wav" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Audio_2.WAV" id="SM3" mimetype="audio/x-wav" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Audio_3.WAV" id="SM4" mimetype="audio/x-wav" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Berens</surname> <given-names>P.</given-names></name></person-group> (<year>2009</year>). <article-title>Circstat: a matlab toolbox for circular statistics</article-title>. <source>J. Stat. Softw.</source> <volume>31</volume>, <fpage>1</fpage>&#x02013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v031.i10</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burns</surname> <given-names>E. M.</given-names></name></person-group> (<year>1981</year>). <article-title>Circularity in relative pitch judgment for inharmonic complex tones: the shepard demonstration revisited, again</article-title>. <source>Percept. Psychophys.</source> <volume>30</volume>, <fpage>467</fpage>&#x02013;<lpage>472</lpage>. <pub-id pub-id-type="doi">10.3758/BF03204843</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chalikia</surname> <given-names>M. H.</given-names></name> <name><surname>Leinfelt</surname> <given-names>F.</given-names></name></person-group> (<year>2000</year>). <article-title>Listeners in Sweden perceive tritone stimuli in a manner different from that of Americans and similar to that of British listeners</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>108</volume>:<fpage>2572</fpage>. <pub-id pub-id-type="doi">10.1121/1.4743556</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chalikia</surname> <given-names>M. H.</given-names></name> <name><surname>Miller</surname> <given-names>K. J.</given-names></name> <name><surname>Vaid</surname> <given-names>J.</given-names></name></person-group> (<year>2001</year>). <article-title>The tritone paradox is perceived differently by Koreans and Americans</article-title>, in <source>Paper presented at the 101st Annual Convention of the American Psychological Association</source>, <publisher-loc>San Francisco, CA</publisher-loc>.</citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chalikia</surname> <given-names>M. H.</given-names></name> <name><surname>Norberg</surname> <given-names>A. M.</given-names></name> <name><surname>Paterakis</surname> <given-names>L.</given-names></name></person-group> (<year>2000</year>). <article-title>Greek bilingual listeners perceive the tritone stimuli differently from speakers of English</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>108</volume>:<fpage>2572</fpage>.</citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chambers</surname> <given-names>C.</given-names></name> <name><surname>Akram</surname> <given-names>S.</given-names></name> <name><surname>Adam</surname> <given-names>V.</given-names></name> <name><surname>Pelofi</surname> <given-names>C.</given-names></name> <name><surname>Sahani</surname> <given-names>M.</given-names></name> <name><surname>Shamma</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Prior context in audition informs binding and shapes simple features</article-title>. <source>Nat. Commun.</source> <volume>8</volume>:<fpage>15027</fpage>. <pub-id pub-id-type="doi">10.1038/ncomms15027</pub-id><pub-id pub-id-type="pmid">28425433</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chambers</surname> <given-names>C.</given-names></name> <name><surname>Pressnitzer</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>Perceptual hysteresis in the judgment of auditory pitch shift</article-title>. <source>Attent. Percept. Psychophys.</source> <volume>76</volume>, <fpage>1271</fpage>&#x02013;<lpage>1279</lpage>. <pub-id pub-id-type="doi">10.3758/s13414-014-0676-5</pub-id><pub-id pub-id-type="pmid">24874257</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cheveign&#x000E9;</surname> <given-names>A. D.</given-names></name></person-group> (<year>2005</year>). <article-title>Pitch perception models</article-title>, in <source>Pitch: Neural Coding and Perception</source>, eds <person-group person-group-type="editor"><name><surname>Fay</surname> <given-names>R. R.</given-names></name> <name><surname>Popper</surname> <given-names>A. N.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>169</fpage>&#x02013;<lpage>233</lpage>.</citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cheveign&#x000E9;</surname> <given-names>A. D.</given-names></name></person-group> (<year>2010</year>). <article-title>Pitch perception</article-title>, in <source>The Oxford Handbook of Auditory Science: Hearing</source>, ed <person-group person-group-type="editor"><name><surname>Plack</surname> <given-names>C. J.</given-names></name></person-group> (<publisher-loc>Oxford</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>), <fpage>71</fpage>&#x02013;<lpage>104</lpage>.</citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohen</surname> <given-names>A. J.</given-names></name> <name><surname>MacKinnon</surname> <given-names>K.</given-names></name> <name><surname>Swindale</surname> <given-names>N.</given-names></name></person-group> (<year>1994</year>). <article-title>The tritone paradox revisited: effects of musical training, envelope peak, and response mode</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>95</volume>:<fpage>2937</fpage>. <pub-id pub-id-type="doi">10.1121/1.409170</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohen</surname> <given-names>M. A.</given-names></name> <name><surname>Grossberg</surname> <given-names>S.</given-names></name> <name><surname>Wyse</surname> <given-names>L. L.</given-names></name></person-group> (<year>1995</year>). <article-title>A spectral network model of pitch perception</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>98</volume>, <fpage>862</fpage>&#x02013;<lpage>879</lpage>. <pub-id pub-id-type="doi">10.1121/1.413512</pub-id><pub-id pub-id-type="pmid">7642825</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dawe</surname> <given-names>L. A.</given-names></name> <name><surname>Platt</surname> <given-names>J. R.</given-names></name> <name><surname>Welsh</surname> <given-names>E.</given-names></name></person-group> (<year>1998</year>). <article-title>Spectral-motion aftereffects and the tritone paradox among Canadian subjects</article-title>. <source>Percept. Psychophys.</source> <volume>60</volume>, <fpage>209</fpage>&#x02013;<lpage>220</lpage>. <pub-id pub-id-type="doi">10.3758/BF03206030</pub-id><pub-id pub-id-type="pmid">9529905</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deutsch</surname> <given-names>D.</given-names></name></person-group> (<year>1986</year>). <article-title>A musical paradox</article-title>. <source>Music Percept.</source> <volume>3</volume>, <fpage>275</fpage>&#x02013;<lpage>280</lpage>. <pub-id pub-id-type="doi">10.2307/40285337</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deutsch</surname> <given-names>D.</given-names></name></person-group> (<year>1987</year>). <article-title>The tritone paradox: effects of spectral variables</article-title>. <source>Percept. Psychophys.</source> <volume>41</volume>, <fpage>563</fpage>&#x02013;<lpage>575</lpage>. <pub-id pub-id-type="doi">10.3758/BF03210490</pub-id><pub-id pub-id-type="pmid">3615152</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deutsch</surname> <given-names>D.</given-names></name></person-group> (<year>1988</year>). <article-title>The semitone paradox</article-title>. <source>Music Percept.</source> <volume>6</volume>, <fpage>115</fpage>&#x02013;<lpage>132</lpage>. <pub-id pub-id-type="doi">10.2307/40285421</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deutsch</surname> <given-names>D.</given-names></name></person-group> (<year>1991</year>). <article-title>The tritone paradox: an influence of language on music perception</article-title>. <source>Music Percept.</source> <volume>8</volume>, <fpage>335</fpage>&#x02013;<lpage>347</lpage>. <pub-id pub-id-type="doi">10.2307/40285517</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deutsch</surname> <given-names>D.</given-names></name> <name><surname>Henthorn</surname> <given-names>T.</given-names></name> <name><surname>Dolson</surname> <given-names>M.</given-names></name></person-group> (<year>2004</year>). <article-title>Speech patterns heard early in life influence later perception of the tritone paradox</article-title>. <source>Music Percept.</source> <volume>21</volume>, <fpage>357</fpage>&#x02013;<lpage>372</lpage>. <pub-id pub-id-type="doi">10.1525/mp.2004.21.3.357</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deutsch</surname> <given-names>D.</given-names></name> <name><surname>Kuyper</surname> <given-names>W.</given-names></name> <name><surname>Fisher</surname> <given-names>Y.</given-names></name></person-group> (<year>1987</year>). <article-title>The tritone paradox: its presence and form of distribution in a general population</article-title>. <source>Music Percept.</source> <volume>5</volume>, <fpage>79</fpage>&#x02013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.2307/40285386</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Englitz</surname> <given-names>B.</given-names></name> <name><surname>Akram</surname> <given-names>S.</given-names></name> <name><surname>David</surname> <given-names>S.</given-names></name> <name><surname>Chambers</surname> <given-names>C.</given-names></name> <name><surname>Pressnitzer</surname> <given-names>D.</given-names></name> <name><surname>Depireux</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Putting the tritone paradox into context: insights from neural population decoding and human psychophysics</article-title>, in <source>Basic Aspects of Hearing</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>157</fpage>&#x02013;<lpage>164</lpage>.</citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fisher</surname> <given-names>N. I.</given-names></name></person-group> (<year>1993</year>). <source>Statistical Analysis of Circular Data</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation></ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gescheider</surname> <given-names>G. A.</given-names></name></person-group> (<year>1997</year>). <source>Psychophysics. The Fundamentals</source>, <edition>3rd Edn.</edition> <publisher-loc>Mahwah, NJ</publisher-loc>: <publisher-name>Lawrence Erlbaum Associates</publisher-name>.</citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giangrande</surname> <given-names>J.</given-names></name></person-group> (<year>1998</year>). <article-title>The tritone paradox: effects of pitch class and position of the spectral envelope</article-title>. <source>Music Percept.</source> <volume>15</volume>, <fpage>253</fpage>&#x02013;<lpage>264</lpage>. <pub-id pub-id-type="doi">10.2307/40285767</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giangrande</surname> <given-names>J.</given-names></name> <name><surname>Tuller</surname> <given-names>B.</given-names></name> <name><surname>Kelso</surname> <given-names>J. A. S.</given-names></name></person-group> (<year>2003</year>). <article-title>Perceptual dynamics of circular pitch</article-title>. <source>Music Percept.</source> <volume>20</volume>, <fpage>241</fpage>&#x02013;<lpage>262</lpage>. <pub-id pub-id-type="doi">10.1525/mp.2003.20.3.241</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldstein</surname> <given-names>J. L.</given-names></name></person-group> (<year>1973</year>). <article-title>An optimum processor theory for the central formation of the pitch of complex tones</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>54</volume>, <fpage>1496</fpage>&#x02013;<lpage>1516</lpage>. <pub-id pub-id-type="doi">10.1121/1.1914448</pub-id><pub-id pub-id-type="pmid">4780803</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hartmann</surname> <given-names>W. M.</given-names></name></person-group> (<year>1998</year>). <source>Signals, Sound, and Sensation</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>C.</given-names></name> <name><surname>Englitz</surname> <given-names>B.</given-names></name> <name><surname>Shamma</surname> <given-names>S.</given-names></name> <name><surname>Rinzel</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>A neuronal network model for context-dependence of pitch change perception</article-title>. <source>Front. Comput. Neurosci.</source> <volume>9</volume>:<fpage>101</fpage>. <pub-id pub-id-type="doi">10.3389/fncom.2015.00101</pub-id><pub-id pub-id-type="pmid">26300767</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Irtel</surname> <given-names>H.</given-names></name></person-group> (<year>2007</year>). <source>Pxlab: The Psychological Experiments Laboratory (vers. 2.1.11)</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.pxlab.de">http://www.pxlab.de</ext-link> (accessed June 19, 2007).</citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jaeger</surname> <given-names>T. F.</given-names></name></person-group> (<year>2008</year>). <article-title>Categorical data analysis: away from anovas (transformation or not) and towards logit mixed models</article-title>. <source>J. Mem. Lang.</source> <volume>59</volume>, <fpage>434</fpage>&#x02013;<lpage>446</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2007.11.007</pub-id><pub-id pub-id-type="pmid">19884961</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="thesis"><person-group person-group-type="author"><name><surname>Kr&#x000FC;ger</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <source>Zur Tonh&#x000F6;henwahrnehmung von Oktav-Komplexen T&#x000F6;nen. Psychophysik, Psychoakustische Theorie und Computationale Modellierung.</source> Ph.D. thesis, Institut f&#x000FC;r Psychologie Martin-Luther-Universit&#x000E4;t Halle-Wittenberg.</citation></ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kr&#x000FC;ger</surname> <given-names>S.</given-names></name> <name><surname>Lukas</surname> <given-names>J.</given-names></name></person-group> (<year>2002</year>). <article-title>Zirkul&#x000E4;re Urteile bei der Tonh&#x000F6;henwahrnehmung: Ein interkultureller Vergleich</article-title>, in <source>Experimentelle Psychologie</source>, eds <person-group person-group-type="editor"><name><surname>Baumann</surname> <given-names>M.</given-names></name> <name><surname>Keinath</surname> <given-names>A.</given-names></name> <name><surname>Krems</surname> <given-names>J. F.</given-names></name></person-group> (<publisher-loc>Regensburg</publisher-loc>: <publisher-name>Roderer</publisher-name>), <fpage>153</fpage>.</citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krumbholz</surname> <given-names>K.</given-names></name> <name><surname>Patterson</surname> <given-names>R. D.</given-names></name> <name><surname>Pressnitzer</surname> <given-names>D.</given-names></name></person-group> (<year>2000</year>). <article-title>The lower limit of pitch as determined by rate discrimination</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>108</volume>(<issue>3 Pt 1</issue>):<fpage>1170</fpage>&#x02013;<lpage>1180</lpage>. <pub-id pub-id-type="doi">10.1121/1.1287843</pub-id><pub-id pub-id-type="pmid">11008818</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Levitt</surname> <given-names>H.</given-names></name></person-group> (<year>1971</year>). <article-title>Transformed up-down methods in psychoacoustics</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>49</volume>, <fpage>467</fpage>&#x02013;<lpage>476</lpage>. <pub-id pub-id-type="doi">10.1121/1.1912375</pub-id><pub-id pub-id-type="pmid">5541744</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Licklider</surname> <given-names>J. C. R.</given-names></name></person-group> (<year>1951</year>). <article-title>A duplex theory of pitch perception</article-title>. <source>Experientia</source> <volume>7</volume>, <fpage>128</fpage>&#x02013;<lpage>134</lpage>. <pub-id pub-id-type="doi">10.1007/BF02156143</pub-id><pub-id pub-id-type="pmid">14831572</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Licklider</surname> <given-names>J. C. R.</given-names></name></person-group> (<year>1959</year>). <article-title>Three auditory theories</article-title>, in <source>Psychology. A Study of a Science</source>, <volume>Vol. 1</volume>, ed <person-group person-group-type="editor"><name><surname>Koch</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>McGraw-Hill</publisher-name>), <fpage>41</fpage>&#x02013;<lpage>144</lpage>.</citation></ref>
<ref id="B35">
<citation citation-type="thesis"><person-group person-group-type="author"><name><surname>Malek</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <source>Shepard-Ph&#x000E4;nomene bei der Tonh&#x000F6;henwahrnehmung. Ein probabilistisches Modell und Seine Experimentelle &#x000DC;berpr&#x000FC;fung</source>. Ph.D. thesis, Martin Luther University Halle-Wittenberg.</citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Malek</surname> <given-names>S.</given-names></name> <name><surname>Sperschneider</surname> <given-names>K.</given-names></name></person-group> (<year>2018</year>). <article-title>Aftereffects of spectrally similar and dissimilar spectral motion adaptors in the tritone paradox</article-title>. <source>Front. Psychol.</source> <volume>9</volume>:<fpage>677</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2018.00677</pub-id><pub-id pub-id-type="pmid">29867653</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meddis</surname> <given-names>R.</given-names></name> <name><surname>Hewitt</surname> <given-names>M. J.</given-names></name></person-group> (<year>1991</year>). <article-title>Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>89</volume>, <fpage>2866</fpage>&#x02013;<lpage>2882</lpage>. <pub-id pub-id-type="doi">10.1121/1.400725</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meddis</surname> <given-names>R.</given-names></name> <name><surname>O&#x00027;Mard</surname> <given-names>L.</given-names></name></person-group> (<year>1997</year>). <article-title>A unitary model of pitch perception</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>102</volume>, <fpage>1811</fpage>&#x02013;<lpage>1820</lpage>. <pub-id pub-id-type="doi">10.1121/1.420088</pub-id><pub-id pub-id-type="pmid">9301058</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moore</surname> <given-names>B. C. J.</given-names></name></person-group> (<year>1973</year>). <article-title>Some experiments relating to the perception of complex tones</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>25</volume>, <fpage>451</fpage>&#x02013;<lpage>475</lpage>. <pub-id pub-id-type="doi">10.1080/14640747308400369</pub-id><pub-id pub-id-type="pmid">4767530</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pitt</surname> <given-names>M. A.</given-names></name> <name><surname>Myung</surname> <given-names>I. J.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name></person-group> (<year>2002</year>). <article-title>Toward a method of selecting among computational models of cognition</article-title>. <source>Psychol. Rev.</source> <volume>109</volume>:<fpage>472</fpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.109.3.472</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pollack</surname> <given-names>I.</given-names></name></person-group> (<year>1978</year>). <article-title>Decoupling of auditory pitch and stimulus frequency: the Shepard demonstration revisited</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>63</volume>, <fpage>202</fpage>&#x02013;<lpage>206</lpage>. <pub-id pub-id-type="doi">10.1121/1.381714</pub-id><pub-id pub-id-type="pmid">632412</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pressnitzer</surname> <given-names>D.</given-names></name> <name><surname>Patterson</surname> <given-names>R. D.</given-names></name> <name><surname>Krumbholz</surname> <given-names>K.</given-names></name></person-group> (<year>2001</year>). <article-title>The lower limit of melodic pitch</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>109</volume>(<issue>5 Pt 1</issue>), <fpage>2074</fpage>&#x02013;<lpage>2084</lpage>. <pub-id pub-id-type="doi">10.1121/1.1359797</pub-id><pub-id pub-id-type="pmid">11386559</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Repp</surname> <given-names>B. H.</given-names></name></person-group> (<year>1994</year>). <article-title>The tritone paradox and the pitch range of the speaking voice: a dubious connection</article-title>. <source>Music Percept.</source> <volume>12</volume>, <fpage>227</fpage>&#x02013;<lpage>255</lpage>. <pub-id pub-id-type="doi">10.2307/40285653</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Repp</surname> <given-names>B. H.</given-names></name></person-group> (<year>1997</year>). <article-title>Spectral envelope and context effects in the tritone paradox</article-title>. <source>Perception</source> <volume>26</volume>, <fpage>645</fpage>&#x02013;<lpage>665</lpage>. <pub-id pub-id-type="doi">10.1068/p260645</pub-id><pub-id pub-id-type="pmid">9488887</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Repp</surname> <given-names>B. H.</given-names></name> <name><surname>Thompson</surname> <given-names>J. M.</given-names></name></person-group> (<year>2010</year>). <article-title>Context sensitivity and invariance in perception of octave-ambiguous tones</article-title>. <source>Psychol. Res.</source> <volume>74</volume>, <fpage>437</fpage>&#x02013;<lpage>456</lpage>. <pub-id pub-id-type="doi">10.1007/s00426-009-0264-9</pub-id><pub-id pub-id-type="pmid">19941003</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ritsma</surname> <given-names>R. J.</given-names></name></person-group> (<year>1962</year>). <article-title>Existence region of the tonal residue</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>34</volume>, <fpage>1224</fpage>&#x02013;<lpage>1229</lpage>. <pub-id pub-id-type="doi">10.1121/1.1918307</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schroeder</surname> <given-names>M. R.</given-names></name></person-group> (<year>1968</year>). <article-title>Period histogram and product spectrum: new methods for fundamental-frequency measurement</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>43</volume>, <fpage>829</fpage>&#x02013;<lpage>834</lpage>. <pub-id pub-id-type="doi">10.1121/1.1910902</pub-id><pub-id pub-id-type="pmid">5645832</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seither-Preisler</surname> <given-names>A.</given-names></name> <name><surname>Krumbholz</surname> <given-names>K.</given-names></name> <name><surname>Patterson</surname> <given-names>R.</given-names></name> <name><surname>Johnson</surname> <given-names>L.</given-names></name> <name><surname>Nobbe</surname> <given-names>A.</given-names></name></person-group> (<year>2007</year>). <article-title>Tone sequences with conflicting fundamental pitch and timbre changes are heard differently by musicians and nonmusicians</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform.</source> <volume>33</volume>, <fpage>743</fpage>&#x02013;<lpage>751</lpage>. <pub-id pub-id-type="doi">10.1037/0096-1523.33.3.743</pub-id><pub-id pub-id-type="pmid">17563235</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shepard</surname> <given-names>R. N.</given-names></name></person-group> (<year>1964</year>). <article-title>Circularity in judgments of relative pitch</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>36</volume>, <fpage>2346</fpage>&#x02013;<lpage>2353</lpage>. <pub-id pub-id-type="doi">10.1121/1.1919362</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schouten</surname> <given-names>J. F.</given-names></name></person-group> (<year>1940</year>). <article-title>The perception of pitch</article-title>. <source>Philips Technical Rev.</source> <volume>5</volume>, <fpage>286</fpage>&#x02013;<lpage>294</lpage>.</citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sugiyama</surname> <given-names>M.</given-names></name> <name><surname>Ohgushi</surname> <given-names>K.</given-names></name></person-group> (<year>1979</year>). <article-title>Proximity analysis of pitch perception of complex tones in endless scale</article-title>. <source>Behaviormetrika</source> <volume>6</volume>, <fpage>35</fpage>&#x02013;<lpage>43</lpage>. <pub-id pub-id-type="doi">10.2333/bhmk.6.35</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Terhardt</surname> <given-names>E.</given-names></name></person-group> (<year>1974</year>). <article-title>Pitch, consonance and harmony</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>55</volume>, <fpage>1061</fpage>&#x02013;<lpage>1069</lpage>. <pub-id pub-id-type="doi">10.1121/1.1914648</pub-id><pub-id pub-id-type="pmid">4833699</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Terhardt</surname> <given-names>E.</given-names></name></person-group> (<year>1990</year>). <article-title>Vpitch2: Determination of virtual and spectral pitches (version 0.3) [computer software]</article-title>.</citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Terhardt</surname> <given-names>E.</given-names></name></person-group> (<year>1991</year>). <article-title>Music perception and sensory information acquisition: Relationships and low-level analogies</article-title>. <source>Music Percept.</source> <volume>8</volume>, <fpage>217</fpage>&#x02013;<lpage>240</lpage>. <pub-id pub-id-type="doi">10.2307/40285500</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Terhardt</surname> <given-names>E.</given-names></name> <name><surname>Stoll</surname> <given-names>G.</given-names></name> <name><surname>Schermbach</surname> <given-names>R.</given-names></name> <name><surname>Parncutt</surname> <given-names>R.</given-names></name></person-group> (<year>1986</year>). <article-title>Tonh&#x000F6;henmehrdeutigkeit, Tonverwandtschaft und Identifikation von Sukzessivintervallen</article-title>. <source>Acustica</source> <volume>61</volume>, <fpage>58</fpage>&#x02013;<lpage>66</lpage>.</citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Terhardt</surname> <given-names>E.</given-names></name> <name><surname>Stoll</surname> <given-names>G.</given-names></name> <name><surname>Seewann</surname> <given-names>M.</given-names></name></person-group> (<year>1982a</year>). <article-title>Algorithm for extraction of pitch and pitch salience from complex tonal signals</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>71</volume>, <fpage>679</fpage>&#x02013;<lpage>688</lpage>. <pub-id pub-id-type="doi">10.1121/1.387544</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Terhardt</surname> <given-names>E.</given-names></name> <name><surname>Stoll</surname> <given-names>G.</given-names></name> <name><surname>Seewann</surname> <given-names>M.</given-names></name></person-group> (<year>1982b</year>). <article-title>Pitch of complex signals according to virtual-pitch theory: tests, examples, and predictions</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>71</volume>, <fpage>671</fpage>&#x02013;<lpage>678</lpage>. <pub-id pub-id-type="doi">10.1121/1.387543</pub-id></citation></ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wetherill</surname> <given-names>G. B.</given-names></name></person-group> (<year>1963</year>). <article-title>Sequential estimation of quantal response curves</article-title>. <source>J. R. Stat. Soc.</source> <volume>B25</volume>, <fpage>1</fpage>&#x02013;<lpage>48</lpage>.</citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wightman</surname> <given-names>F. L.</given-names></name></person-group> (<year>1973</year>). <article-title>Pitch and stimulus fine structure</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>54</volume>, <fpage>397</fpage>&#x02013;<lpage>406</lpage>. <pub-id pub-id-type="doi">10.1121/1.1913591</pub-id><pub-id pub-id-type="pmid">4759013</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zwicker</surname> <given-names>E.</given-names></name> <name><surname>Fastl</surname> <given-names>H.</given-names></name></person-group> (<year>1999</year>). <source>Psychoacoustics: Facts and Models</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation></ref>
<ref id="B61">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zwicker</surname> <given-names>E.</given-names></name> <name><surname>Feldtkeller</surname> <given-names>R.</given-names></name></person-group> (<year>1967</year>). <source>Das Ohr als Nachrichtenempf&#x000E4;nger</source>. <publisher-loc>Stuttgart</publisher-loc>: <publisher-name>Hirzel Verlag</publisher-name>.</citation></ref>
</ref-list> 
</back>
</article>