<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychology</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychology</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Research Foundation</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2012.00203</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Tuned with a Tune: Talker Normalization via General Auditory Processes</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Laing</surname> <given-names>Erika J. C.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Ran</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Lotto</surname> <given-names>Andrew J.</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Holt</surname> <given-names>Lori L.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn001">&#x0002A;</xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Brain Mapping Center, University of Pittsburgh Medical Center</institution> <country>Pittsburgh, PA, USA</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University</institution> <country>Pittsburgh, PA, USA</country></aff>
<aff id="aff3"><sup>3</sup><institution>Speech, Language and Hearing Sciences, University of Arizona</institution> <country>Tucson, AZ, USA</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Josef P. Rauschecker, Georgetown University School of Medicine, USA</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Iiro P. J&#x000E4;&#x000E4;skel&#x000E4;inen, University of Helsinki, Finland; Elia Formisano, Maastricht University, Netherlands; Maria Chait, University College London, UK</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Lori L. Holt, Department of Psychology, Carnegie Mellon University, Baker Hall 254K, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA. e-mail: <email>lholt&#x00040;andrew.cmu.edu</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Frontiers in Auditory Cognitive Neuroscience, a specialty of Frontiers in Psychology.</p></fn>
</author-notes>
<pub-date pub-type="epreprint">
<day>01</day>
<month>04</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>22</day>
<month>06</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<volume>3</volume>
<elocation-id>203</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>02</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>31</day>
<month>05</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2012 Laing, Liu, Lotto and Holt.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="open-access" xlink:href="http://www.frontiersin.org/licenseagreement"><p>This is an open-access article distributed under the terms of the <uri xlink:href="http://creativecommons.org/licenses/by-nc/3.0/">Creative Commons Attribution Non Commercial License</uri>, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.</p></license>
</permissions>
<abstract>
<p>Voices have unique acoustic signatures, contributing to the acoustic variability listeners must contend with in perceiving speech, and it has long been proposed that listeners normalize speech perception to information extracted from a talker&#x02019;s speech. Initial attempts to explain talker normalization relied on extraction of articulatory referents, but recent studies of context-dependent auditory perception suggest that general auditory referents such as the long-term average spectrum (LTAS) of a talker&#x02019;s speech similarly affect speech perception. The present study aimed to differentiate the contributions of articulatory/linguistic versus auditory referents for context-driven talker normalization effects and, more specifically, to identify the specific constraints under which such contexts impact speech perception. Synthesized sentences manipulated to sound like different talkers influenced categorization of a subsequent speech target only when differences in the sentences&#x02019; LTAS were in the frequency range of the acoustic cues relevant for the target phonemic contrast. This effect held both for speech targets preceded by spoken sentence contexts and for targets preceded by non-speech tone sequences that were LTAS-matched to the spoken sentence contexts. Specific LTAS characteristics, rather than perceived talker, predicted the results, suggesting that general auditory mechanisms play an important role in effects considered to be instances of perceptual talker normalization.</p>
</abstract>
<kwd-group>
<kwd>speech perception</kwd>
<kwd>talker normalization</kwd>
<kwd>auditory perception</kwd>
</kwd-group>
<counts>
<fig-count count="3"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="57"/>
<page-count count="9"/>
<word-count count="15752"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction">
<title>Introduction</title>
<p>A long-standing, core theoretical problem in understanding speech perception is the lack of a one-to-one mapping between acoustic input and intended linguistic categories (Liberman et al., <xref ref-type="bibr" rid="B30">1967</xref>). One major source of this lack of invariance is acoustic variation across talkers who differ in vocal tract size and anatomy, age, gender, dialect, and idiosyncratic speech mannerisms (Johnson et al., <xref ref-type="bibr" rid="B21">1993</xref>). This results in substantially different acoustic realizations of the same linguistic unit (e.g., Peterson and Barney, <xref ref-type="bibr" rid="B42">1952</xref>). Yet, human listeners maintain remarkably accurate speech perception across an unlimited number of communication partners, even without having extensive experience with the talker. The mechanisms by which speech perception accommodates talker variability have been a central issue since the inception of the field (Potter and Steinberg, <xref ref-type="bibr" rid="B44">1950</xref>), but they are poorly understood. This is evident in the fact that even the most advanced computerized speech recognition systems require substantial talker-specific training to achieve high accuracy.</p>
<p>The problem, however, is not unconstrained &#x02013; a change in vocal tract anatomy or vocal fold physiology changes the acoustic signature <italic>systematically</italic>. For example, adult women tend to have shorter and differently proportioned vocal tracts than adult men. As a result, female-produced vowels have formant frequencies (peaks in energy of a voice spectrum; Fant, <xref ref-type="bibr" rid="B7">1960</xref>) shifted to higher frequencies relative to men&#x02019;s. It would seem likely that effective listeners make use of these regularities in acoustic variation to achieve more accurate and efficient speech categorization.</p>
<p>One early demonstration of talker-dependent speech categorization was made by Ladefoged and Broadbent (<xref ref-type="bibr" rid="B27">1957</xref>), who presented listeners with a constant target word (a relatively ambiguous vowel in a /b_t/ frame) at the end of a context phrase (<italic>Please say what this word is</italic>&#x02026;). The acoustic characteristics of the context phrase were manipulated by raising or lowering the first (F1) and/or second (F2) formant frequencies of the vowels. Shifting formant frequencies up and down can be roughly conceptualized as a decrease or increase in vocal tract length and, correspondingly, as a change in talker. When these phrases preceded a constant speech target, categorization of the vowel in the target word shifted as a function of the context phrase, suggesting that listeners compensate for vocal tract differences across talkers. The target was more often heard as &#x0201C;<italic>bit</italic>&#x0201D; following a higher formant frequency phrase (as might be produced by a shorter vocal tract), but more often as &#x0201C;<italic>bet</italic>&#x0201D; following the phrase with the lower formant frequencies. These classic results suggest that listeners extract some type of talker information from the context phrase and use it in perceiving the target word. The critical question, then, is: What type of information is extracted from the context phrase?</p>
<p>One possibility is that listeners construct an explicit representation of the talker-specific phonemic sound patterns produced by the talker, which could serve as a reference for subsequent speech perception. When an ambiguous vowel is encountered in the target word, the relative position of the formant frequencies in the remapped vowel space could reveal the intended vowel. Thus, talker-specific, speech-specific information gathered during the carrier phrase might tune speech perception to talker-specific patterns (Joos, <xref ref-type="bibr" rid="B23">1948</xref>; Ladefoged and Broadbent, <xref ref-type="bibr" rid="B27">1957</xref>).</p>
<p>Alternatively, listeners may estimate talker-specific vocal tract dimensions. Recent work by Story (<xref ref-type="bibr" rid="B50">2005</xref>) examining vocal tract shapes across talkers using magnetic resonance images reveals that most inter-talker variability is captured by the shape of the neutral, non-articulating vocal tract shape that, when excited by vocal fold vibration, results in a neutral schwa sound as in the second vowel of <italic>sofa</italic>. If one subtracts a talker&#x02019;s neutral vocal tract shape from other vowel vocal tract shapes, the resulting vocal air space shape for various vowels is quite consistent across talkers. Thus, estimating the neutral vocal tract shape of the talker from the carrier phrase and using this estimate to normalize the vocal tract shape determined for the target vowel might tune speech perception to talker-specific patterns. This might be accomplished by one of a number of mechanisms, such as reconstructing the intended articulatory movements of the vocal tract to identify speech sounds as described by the motor theory of speech perception (Liberman et al., <xref ref-type="bibr" rid="B30">1967</xref>; Liberman, <xref ref-type="bibr" rid="B29">1996</xref>), explicitly extracting vocal tract dimensions from the carrier phrase to rescale perception of subsequent speech sounds (McGowan, <xref ref-type="bibr" rid="B37">1997</xref>; McGowan and Cushing, <xref ref-type="bibr" rid="B38">1999</xref>), or creating an internal vocal tract model against which to compare ambiguous sounds to a set of possible targets (Halle and Stevens, <xref ref-type="bibr" rid="B9">1962</xref>; Poeppel et al., <xref ref-type="bibr" rid="B43">2008</xref>). Each of these strategies relies on explicit representation of some type of vocal tract-specific information. 
A challenge for these accounts is that it is notoriously difficult to solve the inverse problem of determining a unique vocal tract shape from speech acoustics (Atal et al., <xref ref-type="bibr" rid="B2">1978</xref>) and there is currently no good model of how listeners would retrieve the neutral vocal tract from speech that does not explicitly include an instance of a neutral production.</p>
<p>Recent studies in context-dependent auditory perception suggest that carrier phrases may provide an alternative type of information that is neither explicitly phonemic nor linked to speech production, but that may contribute to talker normalization effects in speech perception (Holt, <xref ref-type="bibr" rid="B13">2005</xref>, <xref ref-type="bibr" rid="B14">2006</xref>; Huang and Holt, <xref ref-type="bibr" rid="B16">2009</xref>, <xref ref-type="bibr" rid="B18">2012</xref>). These experiments mirrored the Ladefoged and Broadbent paradigm in that context sounds preceded speech targets. However, the contexts were not speech phrases, but rather a sequence of 21 non-speech sine-wave tones whose frequencies were sampled from one of two distributions. The resulting sounds were something like a simple tune. The distributions were either relatively high-frequency (mean 2800&#x02009;Hz, range 2300&#x02013;3300&#x02009;Hz) or relatively low-frequency (mean 1800&#x02009;Hz, range 1300&#x02013;2300&#x02009;Hz). When these tone sequences preceded target speech sounds drawn from a series varying perceptually from /ga/ to /da/, speech categorization was influenced by the distribution from which the context tones had been drawn. Tones with a higher mean frequency led to more /ga/ responses, whereas the same targets were more often categorized as /da/ when lower-frequency tones preceded them.</p>
<p>Of note in interpreting the results, the tones comprising the context were randomly ordered on a trial-by-trial basis. Thus, each context stimulus was unique, and only the long-term average spectrum (LTAS, the distribution of acoustic energy across frequency for the entire duration of the tone sequence) defined conditions. The distributional nature of the contexts in these studies indicates that auditory processing is sensitive to the LTAS of context stimuli and that perception of target speech sounds is <italic>relative</italic> to, and spectrally contrastive with, the LTAS. These results are consistent with demonstrations that speech categorization shifts when the LTAS of the carrier phrase is changed by applying a filter (Watkins, <xref ref-type="bibr" rid="B54">1991</xref>; Watkins and Makin, <xref ref-type="bibr" rid="B55">1994</xref>, <xref ref-type="bibr" rid="B56">1996</xref>; Kiefte and Kluender, <xref ref-type="bibr" rid="B24">2008</xref>), spectral tilt (Kiefte and Kluender, <xref ref-type="bibr" rid="B25">2001</xref>), or reverberation (Watkins and Makin, <xref ref-type="bibr" rid="B57">2007</xref>). They are also consonant with findings of classic adaptation effects on phoneme categorization (e.g., Eimas and Corbit, <xref ref-type="bibr" rid="B6">1973</xref>; Diehl et al., <xref ref-type="bibr" rid="B5">1978</xref>; Sawusch and Nusbaum, <xref ref-type="bibr" rid="B45">1979</xref>; Lotto and Kluender, <xref ref-type="bibr" rid="B32">1998</xref>) in that both effects are spectrally contrastive, but the tone sequence effects differ in their time course, persisting across silences as long as 1.3&#x02009;s, and even across intervening spectrally neutral sound (Holt and Lotto, <xref ref-type="bibr" rid="B15">2002</xref>; Holt, <xref ref-type="bibr" rid="B13">2005</xref>).</p>
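The LTAS as defined here is simply the power spectrum averaged over the full stimulus duration. A minimal sketch in Python (assuming NumPy; the frame length and hop are illustrative analysis choices, not values reported in the studies cited):

```python
import numpy as np

def ltas(signal, fs, frame_len=512, hop=256):
    """Long-term average spectrum: the power spectrum averaged over
    Hann-windowed frames spanning the whole signal duration."""
    win = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * win
              for i in range(0, len(signal) - frame_len + 1, hop)]
    power = np.mean([np.abs(np.fft.rfft(f)) ** 2 for f in frames], axis=0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, power
```

Because the average is taken over the entire signal, the LTAS is invariant to reordering of segments in time, which is why the randomly reordered tone sequences share an LTAS despite each being acoustically unique.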
<p>Holt (<xref ref-type="bibr" rid="B13">2005</xref>) and Lotto and Sullivan (<xref ref-type="bibr" rid="B33">2007</xref>) have speculated that the general auditory processes underlying these effects may prove useful for talker normalization. To put this into the context of the classic talker normalization effects reported by Ladefoged and Broadbent (<xref ref-type="bibr" rid="B27">1957</xref>), consider the acoustic consequences of long versus short vocal tracts. A talker with a long vocal tract produces speech with relatively greater low-frequency energy than a talker with a shorter vocal tract. In line with the pattern of spectral contrast described above, listeners&#x02019; sensitivity to the lower-frequency energy in the LTAS of the longer-vocal tract talker&#x02019;s speech should result in target speech being perceived as relatively higher-frequency. Applying this prediction to the stimulus scheme of Ladefoged and Broadbent, constant vowel targets should be more often perceived as &#x0201C;<italic>bet</italic>&#x0201D; following a phrase synthesized to mimic a long vocal tract (&#x0201C;bet&#x0201D; is characterized by higher formant frequencies than &#x0201C;bit&#x0201D;) whereas a phrase mimicking a talker with a shorter vocal tract should lead listeners to label the same speech targets more often as &#x0201C;<italic>bit</italic>.&#x0201D; These are, in fact, the results of Ladefoged and Broadbent (<xref ref-type="bibr" rid="B27">1957</xref>). Thus, the analogy between the non-speech context results of Holt (<xref ref-type="bibr" rid="B13">2005</xref>, <xref ref-type="bibr" rid="B14">2006</xref>) and talker normalization carrier phrase effects appears compelling, but explicit comparison of these two types of effects has not been made. In particular, talker normalization effects have been typically demonstrated with shifts in vowel categorization, whereas the non-speech categorization tasks have typically utilized consonant contrasts as targets. 
In addition, there has not been an effort to match speech and non-speech contexts on duration, frequency ranges, and other acoustic dimensions. As a result, to this point, the proposal of an LTAS account of talker normalization has primarily been supported through analogy.</p>
<p>The purpose of the present study is to test three predictions of an LTAS-based model of talker normalization (Lotto and Sullivan, <xref ref-type="bibr" rid="B33">2007</xref>). The first prediction is that the <italic>direction</italic> of the shift in target phoneme categorization is predictable from a comparison of the LTAS of the carrier phrase and the spectrum of the targets. In particular, carrier phrases with higher-frequency concentrations of energy should result in target representations that are shifted to lower-frequency concentrations of energy; a spectral contrast effect (Lotto and Holt, <xref ref-type="bibr" rid="B31">2006</xref>).</p>
<p>The second prediction is that not all talkers will elicit normalization for all speech targets. The LTAS model makes specific predictions about <italic>which</italic> talkers will produce perceptual normalization effects, and which will not. Although the Ladefoged and Broadbent findings are foundational in the talker normalization literature, they have a reputation for being difficult to replicate (Hawkins, <xref ref-type="bibr" rid="B11">2004</xref>). We suspect that this may arise because it may not be sufficient to simply change the talker of the carrier phrase if the relationship between LTAS of target and carrier phrase is not matched. We predict that pairs of talkers who vary in LTAS in the <italic>range</italic> of frequencies important for target speech categorization (e.g., in the vicinity of F3 for /ga/-/da/ consonant targets) will produce target speech categorization shifts typical of the Ladefoged and Broadbent results but that LTAS differences outside this range will produce highly discriminable talkers that do not elicit &#x0201C;talker normalization&#x0201D; effects.</p>
<p>The final prediction of the LTAS model to be tested is that similar shifts in target categorization will be elicited from <italic>non-speech</italic> contexts to the extent that the LTAS is matched to the speech contexts (in the relevant frequency region). Such a result would strongly suggest general auditory, as opposed to vocal tract-representation or acoustic-phonemic based, mechanisms of talker normalization, because the non-speech contexts carry no information about vocal tract anatomy, talker identity, neutral vowel patterns, or phoneme identity. Should non-speech contexts influence speech target categorization when listeners have no access to articulatory referents, it would provide evidence for contributions of general auditory processes to talker normalization.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and Methods</title>
<sec>
<title>Participants</title>
<p>Twenty volunteers from Carnegie Mellon University and the University of Pittsburgh participated for a small payment. All listeners reported normal hearing, were native monolingual English speakers, and provided written informed consent to participate.</p>
<p>The experiment was approved by the Carnegie Mellon University Institutional Review Board.</p>
</sec>
<sec>
<title>Stimuli</title>
<sec>
<title>Speech targets</title>
<p>Nine speech target stimuli were derived from natural /ga/ and /da/ recordings from a monolingual male native English speaker (Computer Speech Laboratory, Kay Elemetrics, Lincoln Park, NJ, USA; 20-kHz sampling rate, 16-bit resolution) and were identical to those utilized in several earlier studies (Holt, <xref ref-type="bibr" rid="B13">2005</xref>, <xref ref-type="bibr" rid="B14">2006</xref>; Wade and Holt, <xref ref-type="bibr" rid="B53">2005</xref>). To create the nine-step series, multiple natural productions of the syllables were recorded and, from this set, one /ga/ and one /da/ token were selected that were nearly identical in spectral and temporal properties except for the onset frequencies of F2 and F3. Linear predictive coding (LPC) analysis was performed on each of the tokens to determine a series of filters that spanned these endpoints (Analysis-Synthesis Laboratory, Kay Elemetrics) such that the onset frequencies of F2 and, primarily, F3 varied approximately linearly between /ga/ and /da/ endpoints. These filters were excited by the LPC residual of the original /ga/ production to create an acoustic series spanning the natural /ga/ and /da/ endpoints in approximately equal steps. Creating stimuli in this way provides the advantage of very natural-sounding speech tokens. These 411-ms speech series members served as categorization targets. Figure <xref ref-type="fig" rid="F1">1</xref>B shows spectrograms for the endpoints of the series, appended to two different types of context. The top spectrogram depicts the /ga/ endpoint whereas the bottom spectrogram shows the /da/ endpoint. Notice that the main difference between the targets is the onset F3 frequency.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>(A)</bold> Schematic illustration of stimulus construction; <bold>(B)</bold> Spectrograms of non-speech context followed by /ga/ (top) and speech context followed by /da/ (bottom).</p></caption>
<graphic xlink:href="fpsyg-03-00203-g001.tif"/>
</fig>
</sec>
<sec>
<title>Context stimuli: speech</title>
<p>The speech targets were preceded by one of eight context stimuli. Four of these contexts were the phrase &#x0201C;<italic>Please say what this word is</italic>&#x02026;,&#x0201D; mimicking the contexts studied by Ladefoged and Broadbent (<xref ref-type="bibr" rid="B27">1957</xref>). Variants were synthesized to sound as though the phrase was spoken by four different talkers. This was accomplished by raising or lowering the formant frequencies in the region of either F1 or F3. To create the voices, a 1700-ms phrase was generated by extracting formant frequencies and bandwidths from a recording of a male voice reciting &#x0201C;<italic>Please say what this word is</italic>&#x02026;,&#x0201D; and using these values to synthesize the phrase in the parallel branch of the Klatt and Klatt (<xref ref-type="bibr" rid="B26">1990</xref>) synthesizer. The phrase created with these natural parameters was spectrally manipulated by adjusting formant center frequencies and bandwidths to create the different &#x0201C;talkers.&#x0201D; These manipulations resulted in two independent variables: context frequency peak (High, Low) and context frequency range (F1, F3).</p>
<p>The context frequency peak manipulation arises from previous research indicating that context effects in speech categorization are spectrally contrastive (e.g., Lotto and Kluender, <xref ref-type="bibr" rid="B32">1998</xref>; Holt, <xref ref-type="bibr" rid="B13">2005</xref>; Lotto and Holt, <xref ref-type="bibr" rid="B31">2006</xref>). Lower-frequency contexts shift categorization responses toward higher-frequency alternatives (e.g., /da/) whereas higher-frequency contexts shift responses toward lower-frequency alternatives (e.g., /ga/). Figure <xref ref-type="fig" rid="F2">2</xref>B plots the LTAS for /ga/ and /da/ endpoint speech targets, demonstrating that the tokens are maximally distinct in two areas within the F3 frequency range (approximately 1800 and 2800&#x02009;Hz). Thus, following the hypotheses of the LTAS model, one &#x0201C;talker&#x0201D; was synthesized to possess relatively higher-frequency energy in the F3 region, with a peak in energy at about 2866&#x02009;Hz. Another &#x0201C;talker&#x0201D; was created with relatively lower-frequency F3 energy peaking at about 1886&#x02009;Hz. This manipulation is very similar to the type used by Ladefoged and Broadbent (<xref ref-type="bibr" rid="B27">1957</xref>) to synthesize talker differences in their classic study, although they manipulated only F1 and F2 frequencies.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>(A)</bold> Long-term average spectra (LTAS) of representative context stimuli from each condition; <bold>(B)</bold> LTAS of speech target endpoints /ga/ and /da/.</p></caption>
<graphic xlink:href="fpsyg-03-00203-g002.tif"/>
</fig>
<p>A similar manipulation was made in the F1 frequency region to create two additional synthesized voices. The peak frequencies in this region were chosen to match the perceptual distance of the High and Low peaks and bandwidths in the F3 frequency region (equating the peak frequency difference on the Mel scale, a psychoacoustic scale that may better model the non-linear characteristics of human auditory processing along the frequency dimension than the linear Hz scale; Stevens et al., <xref ref-type="bibr" rid="B49">1937</xref>). This resulted in phrases with peaks in the LTAS at 318&#x02009;Hz for the lower-frequency context and 808&#x02009;Hz for the higher-frequency context.</p>
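The Mel-scale matching can be illustrated with a common analytic approximation of the scale (the article does not report which Mel formula was used, so the specific function below is an assumption):

```python
import math

def hz_to_mel(f_hz):
    """A common analytic approximation of the Mel scale:
    mel = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Peak separations for the two context manipulations
f1_gap = hz_to_mel(808) - hz_to_mel(318)    # F1-region peaks
f3_gap = hz_to_mel(2866) - hz_to_mel(1886)  # F3-region peaks
```

On this formula the F1-region and F3-region peak separations come out to roughly 443 and 362 mel respectively, far closer to one another than the raw Hz differences (490 versus 980 Hz) would suggest.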
<p>The context frequency range manipulation (F1 versus F3) provided a test of the hypothesis that context-dependent speech categorization characterized as &#x0201C;talker normalization&#x0201D; is sensitive to spectral differences in the region of the spectra relevant to target speech categorization. Although both F1 and F3 manipulations are expected to produce discriminable differences in perceived talker, the LTAS model predicts that spectral peak differences in the region of F3 should influence categorization of /ga/-/da/ exemplars because F3 is critical to the /ga/-/da/ distinction. However, spectral peak differences in the region of F1 are predicted to have no influence on /ga/-/da/ categorization. Figure <xref ref-type="fig" rid="F1">1</xref>B presents representative spectrograms and Figure <xref ref-type="fig" rid="F2">2</xref>A shows the LTAS of each voice. Pairing these four contexts with the nine speech targets resulted in 36 unique stimuli; each was presented 10 times in the experiment, for a total of 360 speech context trials. Speech stimuli were sampled at 11,025&#x02009;Hz and converted to &#x0002A;.wav files using MATLAB (Mathworks, Inc.).</p>
</sec>
<sec>
<title>Context stimuli: non-speech</title>
<p>Following the methods of Holt (<xref ref-type="bibr" rid="B13">2005</xref>), four non-speech contexts composed of sequences of sine-wave tones, with frequencies chosen to mirror the Low and High-frequency peaks in the F1 and F3 regions of the speech contexts&#x02019; LTAS, were also synthesized. Figure <xref ref-type="fig" rid="F2">2</xref>A shows the LTAS of these tone sequences. Note that, whereas the LTAS of sentence contexts are somewhat difficult to manipulate because speech inherently possesses energy across the frequency spectrum, non-speech contexts are more easily controlled via explicit placement of sine-wave tones. Thus, for the non-speech contexts, acoustic energy may be focused on precisely the spectral regions predicted to have (or not to have) an effect on target /ga/-/da/ categorization. This may have important implications for the magnitude of the influence of speech versus non-speech contexts on speech categorization, as discussed below.</p>
<p>These sequences of tones, similar to those described by Holt (<xref ref-type="bibr" rid="B13">2005</xref>), did not sound like speech and did not possess articulatory or talker-specific information. Seventeen 70-ms tones (5&#x02009;ms linear onset/offset amplitude ramps) with 30&#x02009;ms silent intervals modeled the 1700-ms duration of the speech contexts. As in previous experiments (Holt, <xref ref-type="bibr" rid="B13">2005</xref>, <xref ref-type="bibr" rid="B14">2006</xref>; Huang and Holt, <xref ref-type="bibr" rid="B16">2009</xref>), the order of the tones making up the non-speech contexts was randomized on a trial-by-trial basis to minimize effects elicited by any particular tone ordering. Thus, any influence of the non-speech contexts on speech categorization is indicative of listeners&#x02019; sensitivity to the LTAS of the context and not merely to the simple acoustic characteristics of any particular segment of the tone sequence.</p>
<p>The bandwidth of frequency variation was approximately matched to the bandwidth of the peak in the corresponding speech context&#x02019;s LTAS, as measured 10&#x02009;dB below the peak. The low-frequency F1 range distribution sampled 150&#x02009;Hz in 10&#x02009;Hz steps (mean 200&#x02009;Hz, 125&#x02013;275&#x02009;Hz range), and the high-frequency F1 range distribution sampled 240&#x02009;Hz in 16&#x02009;Hz steps (mean 750&#x02009;Hz, range 630&#x02013;870&#x02009;Hz). The low-frequency F3 range distribution sampled 435&#x02009;Hz in 29&#x02009;Hz steps (mean 1873.5&#x02009;Hz, range 1656&#x02013;2091&#x02009;Hz), and the high-frequency F3 range distribution sampled 570&#x02009;Hz in 38&#x02009;Hz steps (mean 2785&#x02009;Hz, range 2500&#x02013;3070&#x02009;Hz).</p>
<p>Tones comprising the non-speech contexts were synthesized with 16-bit resolution, sampled at 11,025&#x02009;Hz, and concatenated to form random orderings. Ninety unique contexts were created so that each non-speech context could be paired with each of the nine speech targets 10 times. Across the four non-speech LTAS conditions, this resulted in 360 unique stimuli.</p>
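The construction described above can be sketched as follows (a hypothetical re-implementation assuming NumPy; the original stimuli were generated with the authors' own tools, and this sketch plays each frequency in the distribution once, whereas the actual stimuli contained 17 tones per sequence):

```python
import numpy as np

FS = 11025  # Hz, the sampling rate reported for the stimuli

def tone_sequence(freqs, tone_dur=0.070, gap_dur=0.030, ramp_dur=0.005,
                  rng=None):
    """Concatenate randomly ordered sine tones (linear onset/offset
    amplitude ramps) separated by silent gaps, as in the non-speech
    contexts."""
    if rng is None:
        rng = np.random.default_rng()
    n_tone, n_ramp = int(tone_dur * FS), int(ramp_dur * FS)
    env = np.ones(n_tone)
    env[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)   # onset ramp
    env[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp)  # offset ramp
    gap = np.zeros(int(gap_dur * FS))
    t = np.arange(n_tone) / FS
    pieces = []
    for f in rng.permutation(freqs):               # trial-unique ordering
        pieces.append(env * np.sin(2 * np.pi * f * t))
        pieces.append(gap)
    return np.concatenate(pieces)

# Low-frequency F3-region distribution: 1656-2091 Hz in 29-Hz steps
low_f3 = np.arange(1656, 2091 + 1, 29)
context = tone_sequence(low_f3, rng=np.random.default_rng(0))
```

Because each call shuffles the tone order, every context token is acoustically unique while its LTAS remains fixed by the frequency distribution.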
<p>All speech and non-speech contexts and speech targets were digitally matched to the RMS energy of the /da/ endpoint of the target speech series, and a 50-ms silent interval separated the context and the speech target. Figure <xref ref-type="fig" rid="F1">1</xref>A provides a schematic illustration of stimulus construction and Figures <xref ref-type="fig" rid="F1">1</xref>B and <xref ref-type="fig" rid="F2">2</xref>A show spectrograms and LTAS of representative stimuli from each condition. The LTAS of the speech target series endpoints, /ga/ and /da/, are shown in Figure <xref ref-type="fig" rid="F2">2</xref>B.</p>
</sec>
</sec>
<sec>
<title>Procedure</title>
<p>Listeners categorized the nine speech targets in each of the eight contexts. Trials were divided into four blocks so that listeners heard higher- and lower-frequency versions of each context condition [2 (speech/non-speech)&#x02009;&#x000D7;&#x02009;2 (LTAS peak in F1 region/F3 region)] within the same block. The order of the blocks was fully counterbalanced across participants and, within a block, trial order was random. On each trial, listeners heard a context plus speech target stimulus and categorized the speech target as /ga/ or /da/ using buttons on a computer keyboard.</p>
<p>The categorization blocks were followed by a brief discrimination test to measure the extent to which manipulations of the LTAS were successful in producing perceived talker differences among the speech contexts. On each trial, participants heard a pair of context sentences and judged whether the voice speaking the sentences was the same or different by pressing buttons on a computer keyboard. The task was divided into two blocks according to the LTAS peak region (F1 versus F3). Within a block, listeners heard both higher-frequency and lower-frequency versions of the sentences across 20 randomly ordered trials. One-half of the trials were different talker pairs (High-Low or Low-High, five repetitions each) and the remaining trials were identical voices (High-High, Low-Low, five repetitions each).</p>
<p>For both speech categorization and talker discrimination tests, acoustic presentation was under the control of E-Prime (Schneider et al., <xref ref-type="bibr" rid="B46">2002</xref>) and stimuli were presented diotically over linear headphones (Beyer DT-150) at approximately 70&#x02009;dB SPL (A). The experiment lasted approximately 1&#x02009;h.</p>
</sec>
</sec>
<sec>
<title>Results</title>
<p>The results of the talker discrimination task indicate that the synthesized voices were highly discriminable as different talkers (F1 manipulation <italic>d</italic>&#x02032;&#x02009;&#x0003D;&#x02009;3.46; F3 manipulation <italic>d</italic>&#x02032;&#x02009;&#x0003D;&#x02009;3.09). Moreover, participants&#x02019; ability to discriminate talkers did not differ for talkers created with manipulations to LTAS in the F1 versus F3 frequency regions, <italic>t</italic>(19)&#x02009;&#x0003D;&#x02009;1.603, <italic>p</italic>&#x02009;&#x0003D;&#x02009;0.126. Thus, there is sufficient information available in the synthesized speech contexts to support talker identity judgments, and this information does not differ depending on whether voices were synthesized via manipulations of the F1 versus F3 spectral regions. Each might be reasonably expected to elicit talker normalization.</p>
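<p>For readers unfamiliar with the sensitivity measure, <italic>d</italic>&#x02032; for a same&#x02013;different task is conventionally computed as the difference between the z-transformed hit and false-alarm rates. A minimal sketch follows; the rates in it are invented for illustration and are not the study&#x02019;s data.</p>

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate):
    """d' = z(hit rate) - z(false-alarm rate), via the inverse normal CDF."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical rates for illustration only (not the study's data):
# 0.95 hits and 0.05 false alarms yield a d' in the vicinity of the
# values reported above for the two talker manipulations.
print(round(d_prime(0.95, 0.05), 2))  # → 3.29
```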
<p>However, the results indicate that this was not the case. The patterns of speech target categorization in the context of these four talkers were assessed with a 2 (context frequency range, F1/F3 region)&#x02009;&#x000D7;&#x02009;2 (context frequency, High/Low)&#x02009;&#x000D7;&#x02009;9 (speech target, /ga/-/da/) repeated-measures ANOVA of percent /ga/ responses. The analysis revealed a significant main effect of speech target, <italic>F</italic>(8, 152)&#x02009;&#x0003D;&#x02009;130.196, <italic>p</italic>&#x02009;&#x0003C;&#x02009;0.0001, <inline-formula><mml:math id="M1"><mml:mrow><mml:msubsup><mml:mtext>&#x003B7;</mml:mtext><mml:mtext>p</mml:mtext><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> &#x0003D; 0.873, indicating that /ga/ responses varied as intended across the speech targets. Higher-order interactions involving speech target were also significant (<italic>p&#x02009;&#x0003C;</italic>&#x02009;0.05); however, since our predictions center on context-dependent speech target categorization, the focus of interpretation is placed on interactions that do not involve target. Figure <xref ref-type="fig" rid="F3">3</xref> plots listeners&#x02019; average percent /ga/ categorization responses across speech targets as a function of context.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Mean percent-/ga/ categorization (&#x000B1;1&#x02009;SEM)</bold>.</p></caption>
<graphic xlink:href="fpsyg-03-00203-g003.tif"/>
</fig>
<p>Overall, there was a robust main effect of speech context frequency (High, Low) on speech target categorization, <italic>F</italic>(1, 19)&#x02009;&#x0003D;&#x02009;34.66, <italic>p</italic>&#x02009;&#x0003C;&#x02009;0.0001, <inline-formula><mml:math id="M2"><mml:mrow><mml:msubsup><mml:mtext>&#x003B7;</mml:mtext><mml:mtext>p</mml:mtext><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> &#x0003D; 0.646, such that stimuli were more often labeled as /ga/ following high-frequency contexts (<italic>M</italic>&#x02009;&#x0003D;&#x02009;68.3%, S.E.&#x02009;&#x0003D;&#x02009;2.1%) than low-frequency contexts (<italic>M</italic>&#x02009;&#x0003D;&#x02009;63.4%, S.E.&#x02009;&#x0003D;&#x02009;2.1%). This is consistent with the spectrally contrastive pattern of results found for sentence-length contexts in previous research (Holt, <xref ref-type="bibr" rid="B13">2005</xref>, <xref ref-type="bibr" rid="B14">2006</xref>; Huang and Holt, <xref ref-type="bibr" rid="B16">2009</xref>) and extends the original Ladefoged and Broadbent effects from target vowels to target consonants. There was no main effect of context frequency range, <italic>F</italic>&#x02009;&#x0003C;&#x02009;1, indicating that the overall percent /ga/ responses did not vary between F1 and F3 conditions. Most importantly, however, context frequency (High, Low) significantly interacted with context range (F1, F3), <italic>F</italic>(1, 19)&#x02009;&#x0003D;&#x02009;7.467, <italic>p</italic>&#x02009;&#x0003D;&#x02009;0.013, <inline-formula><mml:math id="M3"><mml:mrow><mml:msubsup><mml:mtext>&#x003B7;</mml:mtext><mml:mtext>p</mml:mtext><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> &#x0003D; 0.282.
Whereas the voice differences created by high- versus low-frequency peaks in the F1 frequency range did not elicit a significant shift in speech target categorization [1.9%, <italic>F</italic>(1, 19)&#x02009;&#x0003D;&#x02009;1.717, <italic>p</italic>&#x02009;&#x0003D;&#x02009;0.206, <inline-formula><mml:math id="M4"><mml:mrow><mml:msubsup><mml:mtext>&#x003B7;</mml:mtext><mml:mtext>p</mml:mtext><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> &#x0003D; 0.083], the F3 range conditions did elicit a significant categorization shift [7.89%, <italic>F</italic>(1, 19)&#x02009;&#x0003D;&#x02009;37.110, <italic>p</italic>&#x02009;&#x0003C;&#x02009;0.001, <inline-formula><mml:math id="M5"><mml:mrow><mml:msubsup><mml:mtext>&#x003B7;</mml:mtext><mml:mtext>p</mml:mtext><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> &#x0003D; 0.661]. This is curious given that the voice discrimination task revealed that voices created via F1-range spectral manipulations were just as discriminable as different talkers as were those differing in the F3 frequency range. As described above, however, the LTAS model predicts this pattern of results.</p>
<p>A stronger test of an LTAS-based account rests with the non-speech contexts, which do not carry any speech or vocal tract information from which to accomplish talker normalization and, in fact, are perceived as sequences of non-linguistic tones. Qualitatively, the pattern of results for non-speech was very similar to that obtained for speech contexts (Figure <xref ref-type="fig" rid="F3">3</xref>). The effect of non-speech context frequency (High versus Low) was significant for the F3 range, <italic>F</italic>(1, 19)&#x02009;&#x0003D;&#x02009;202.836, <italic>p</italic>&#x02009;&#x0003C;&#x02009;0.0001, <inline-formula><mml:math id="M6"><mml:mrow><mml:msubsup><mml:mtext>&#x003B7;</mml:mtext><mml:mtext>p</mml:mtext><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> &#x0003D; 0.914, but not for the F1 range, <italic>F</italic>(1, 19)&#x02009;&#x0003D;&#x02009;2.862, <italic>p</italic>&#x02009;&#x0003D;&#x02009;0.107, <inline-formula><mml:math id="M7"><mml:mrow><mml:msubsup><mml:mtext>&#x003B7;</mml:mtext><mml:mtext>p</mml:mtext><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> &#x0003D; 0.131. Non-speech contexts elicited a shift in target categorization, but only when the spectral manipulations were in the F3 region.</p>
<p>One further difference between the influence of speech and non-speech contexts on speech categorization bears note. The effect of speech versus non-speech contexts varied as a function of context frequency (High, Low), <italic>F</italic>(1, 19)&#x02009;&#x0003D;&#x02009;83.54, <italic>p</italic>&#x02009;&#x0003C;&#x02009;0.0001, <inline-formula><mml:math id="M8"><mml:mrow><mml:msubsup><mml:mtext>&#x003B7;</mml:mtext><mml:mtext>p</mml:mtext><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> &#x0003D; 0.815, revealing that the size of the spectrally contrastive shift in speech target categorization differed across context type. The directionality of this difference is interesting: <italic>non-speech</italic> contexts had the bigger effect on speech categorization. Whereas the speech contexts elicited about a 7% shift in target categorization, the non-speech contexts elicited a 38% shift. The dramatic difference in effect size may be understood with respect to the LTAS differences between the speech and non-speech conditions. Whereas, by necessity, speech contexts possess more diffuse energy across the frequency spectrum, the non-speech contexts had extremely concentrated energy in the spectral region relevant to target speech categorization. This concentration of LTAS energy appears to be particularly effective in altering the subsequent perception of speech.</p>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>The classic effect of context on speech categorization observed by Ladefoged and Broadbent (<xref ref-type="bibr" rid="B27">1957</xref>) demonstrated that listeners extract information from precursor phrases that affects categorization of subsequent vowels. The nature of this information has remained in question. Do listeners extract talker-specific representations of vocal tract dimensions or of acoustic-phonemic mappings for different talkers? The current study tested the predictions of an alternative model &#x02013; that listeners compute a general auditory representation of the average energy across frequency, the LTAS. The LTAS then serves as a referent for subsequent perception. Three predictions of the LTAS model were tested: (1) the <italic>direction</italic> of the shift in target categorizations would be predictable from the distribution of energy in the carrier phrase relative to that in the targets; (2) only manipulations to the carrier phrase that affect relevant frequency <italic>ranges</italic> would result in categorization shifts; and (3) <italic>non-speech</italic> contexts matched in LTAS (in the correct frequency range) to the speech contexts would result in similar categorization shifts for speech targets.</p>
<p>The first prediction of the LTAS model was supported. Following the carrier phrase (<italic>Please say what this word is</italic>&#x02026;) synthesized with higher F3 frequencies, listeners categorized the target stimuli more often as /ga/, which has a lower F3 onset frequency than the alternative /da/. This is a spectrally contrastive pattern of results for which the greater relative energy in the higher-frequency region of F3 in the LTAS of the carrier phrase results in an effective lowering of the perceived F3 of the target stimulus (see Lotto and Holt, <xref ref-type="bibr" rid="B31">2006</xref>). Note that this result extends the original findings of Ladefoged and Broadbent from vowel to consonant targets. This extension allows a clearer link to be made between talker normalization effects and recent work on non-speech context effects on speech perception (e.g., Lotto and Kluender, <xref ref-type="bibr" rid="B32">1998</xref>; Holt, <xref ref-type="bibr" rid="B13">2005</xref>; Lotto and Holt, <xref ref-type="bibr" rid="B31">2006</xref>). Note that whereas this pattern of results supports the predictions of the LTAS model, it does not rule out a model based on vocal tract or phoneme-acoustic specific representations. For example, if listeners were to track F3 values for each consonant during the high-F3 carrier phrase to map the phonemic space of a talker, the target would have a relatively low F3 onset frequency when compared to these referents (and be more likely perceived as a /ga/).</p>
<p>The second prediction of the LTAS model was also supported by the data. Although each of the context phrase manipulations resulted in a discriminably different voice, not all of the phrases produced a shift in target categorization. As predicted, the F3 manipulation resulted in a categorization shift, but the F1 manipulation, which is not in the range of the acoustic energy relevant for the /ga/-/da/ distinction, did not. This implies that the observed change in speech categorization was not based simply on a perceived change in talker (or vocal tract shape) <italic>per se</italic>. Rather, a particular task-relevant acoustic characteristic (F3 range) of talkers seems to be the critical factor that drives the normalization effect across conditions. It is reasonable to suspect that the sensitivity of the effects to the F3 (and not F1) spectral range is primarily due to its match to the range of frequencies discriminating the target speech contrast. Supportive of this, experiments have demonstrated that carrier phrases differentiated by energy in other spectral ranges (e.g., F2: Holt and Lotto, <xref ref-type="bibr" rid="B15">2002</xref>; fundamental frequency, <italic>f</italic><sub>0</sub>: Huang and Holt, <xref ref-type="bibr" rid="B16">2009</xref>, <xref ref-type="bibr" rid="B18">2012</xref>) produce similar effects on speech categorization if they match the spectral range relevant to the target contrast. Other research has also emphasized the importance of acoustic details of the context in predicting effects of context on categorizing auditory targets (Sjerps et al., <xref ref-type="bibr" rid="B47">2011</xref>). A strength of the LTAS model is that it predicts which changes in talker will result in shifts in categorization and which will not. 
From the perspective of the spectral-based LTAS model, the target contrast should only be affected by shifts in carrier phrase LTAS if the spectral ranges of carrier and target are well-matched (Holt, <xref ref-type="bibr" rid="B12">1999</xref>). Simply changing the talker of the context phrase does not suffice &#x02013; the listener clearly is not just using a shift in formant values to recalibrate judgments about vocal tract size.</p>
<p>The third prediction of the LTAS model was tested by substituting a series of tones for the context phrases. The LTAS of these tone sequences had high- or low-frequency energy in the F1 or F3 regions that was similar to the LTAS for the respective speech conditions, but they sounded nothing like speech, and carried no information about talker, voice, vocal tract anatomy, or phonemes. As predicted, there was a significant contrastive shift in target categorization for the non-speech sequences that mimicked the F3 manipulations of the speech contexts (but no shift for the F1 region tone sequences). In fact, the perceptual shift observed was greater for the non-speech than the speech contexts. This difference was likely due to the fact that the peaks in the LTAS for the non-speech were greater in amplitude and more discrete than those of the more acoustically complex sentences (see Figure <xref ref-type="fig" rid="F2">2</xref>A). The fact that the prominence and focus of the spectral peaks, rather than any speech- or talker-specific characteristic, had the greatest effect on speech categorization provides further evidence that the processes and representations underlying these context-dependent effects may be of a general perceptual nature. This generality allows the LTAS model to extend naturally to other types of talker normalization that do not originate at the segmental (phoneme, syllable) level, such as normalization for lexical tone in tone languages (e.g., Leather, <xref ref-type="bibr" rid="B28">1983</xref>; Fox and Qi, <xref ref-type="bibr" rid="B8">1990</xref>; Moore and Jongman, <xref ref-type="bibr" rid="B39">1997</xref>; Huang and Holt, <xref ref-type="bibr" rid="B16">2009</xref>, <xref ref-type="bibr" rid="B18">2012</xref>). Furthermore, the spectral-context-based effects reported here are not constrained to the specific consonant contrast (/g/ versus /d/) reported in the present study. 
Other research has shown that categorization of vowel contrasts is shifted by the LTAS of preceding context (Vitela et al., <xref ref-type="bibr" rid="B52">2010</xref>; Huang and Holt, <xref ref-type="bibr" rid="B18">2012</xref>) and that categorization of Mandarin tones is shifted by the average voice pitch (fundamental frequency, <italic>f</italic><sub>0</sub>) in preceding context (Huang and Holt, <xref ref-type="bibr" rid="B16">2009</xref>, <xref ref-type="bibr" rid="B17">2011</xref>). In these studies, the effects were elicited by both speech and non-speech precursors and were in the directions predicted by the LTAS model (spectrally contrastive to the LTAS of preceding contexts).</p>
<p>The correspondence of the effects of speech and non-speech contexts on speech categorization strongly implicates the involvement of general auditory mechanisms. One ubiquitous neural mechanism consistent with the contrastive shifts in perception observed in these effects is neural adaptation (Harris and Dallos, <xref ref-type="bibr" rid="B10">1979</xref>; Smith, <xref ref-type="bibr" rid="B48">1979</xref>; Carandini, <xref ref-type="bibr" rid="B4">2000</xref>). In a pool of auditory neurons encoding frequency, a precursor with a higher-frequency LTAS would be better encoded by a particular subset of this pool. Having fired robustly to the precursor, neural adaptation would predict that this subset of neurons would exhibit decreased responsiveness to any subsequent stimuli. Thus, at the population level, the encoding of the subsequent speech target would be shifted relative to encoding in isolation or following a precursor with a lower-frequency LTAS.</p>
<p>However, the present results are unlikely to arise from sensory adaptation (e.g., at the level of the cochlea or auditory nerve) because they persist even when non-speech contexts and speech targets are presented to opposite ears (Holt and Lotto, <xref ref-type="bibr" rid="B15">2002</xref>; Lotto et al., <xref ref-type="bibr" rid="B34">2003</xref>), when silent intervals between context and target preclude peripheral interactions (Holt, <xref ref-type="bibr" rid="B13">2005</xref>), and when spectrally neutral sounds intervene between context and target (Holt, <xref ref-type="bibr" rid="B13">2005</xref>).</p>
<p>Stimulus-specific adaptation (SSA), a mechanism demonstrated in the inferior colliculus, thalamus, and cortex of the auditory system (Ulanovsky et al., <xref ref-type="bibr" rid="B51">2004</xref>; Perez-Gonzalez et al., <xref ref-type="bibr" rid="B41">2005</xref>; Malmierca et al., <xref ref-type="bibr" rid="B36">2009</xref>; Antunes et al., <xref ref-type="bibr" rid="B1">2010</xref>), may be a better candidate than neural fatigue for supporting the LTAS-driven context effects (Holt, <xref ref-type="bibr" rid="B14">2006</xref>). Research on SSA suggests that auditory neurons track statistical distributions of sounds across rather extended temporal windows and modulate their responsiveness in reaction to this regularity such that responses to infrequent sounds are exaggerated. Thus, SSA serves to enhance acoustic contrast, as observed in the present behavioral results (see Holt, <xref ref-type="bibr" rid="B14">2006</xref> for further discussion).</p>
<p>An important aspect of the present results is the finding that there is an interaction between the acoustic energy that elicits spectral contrast effects and the range of spectral information relevant for categorizing the speech targets; energy in the region of F3 exerted an influence whereas lower-frequency F1 energy did not. One possibility is that there may be limitations on the spectral range across which a mechanism such as SSA is effective. Another possibility is that there is a top-down, task-, or attention-driven modulation of the frequency range distinguishing speech targets (e.g., the F3 range, in the current experiment) such that the effects of adaptive mechanisms in this range are enhanced (or, conversely, the effects of adaptive mechanisms outside this range are attenuated). The current data do not differentiate between these possibilities and the accounts are not mutually exclusive. Our understanding of neural mechanisms supporting the range-specificity of context effects observed in the current data will benefit from continued development of models of the interaction between effects influencing perceptual encoding, such as adaptation, and top-down modulatory mechanisms (see J&#x000E4;&#x000E4;skel&#x000E4;inen et al., <xref ref-type="bibr" rid="B19">2011</xref> for further discussion of such interactions).</p>
<p>The present findings do not suggest that LTAS is the <italic>only</italic> information involved in talker normalization or phonetic context effects. Listeners exhibit long-term effects of talker familiarity (Nygaard and Pisoni, <xref ref-type="bibr" rid="B40">1998</xref>), and speech processing can be influenced even as a function of whether a listener <italic>imagines</italic> that speech is produced by a male versus female talker (Johnson, <xref ref-type="bibr" rid="B20">1990</xref>; Johnson et al., <xref ref-type="bibr" rid="B22">1999</xref>) or that there are one versus two talkers present (Magnuson and Nusbaum, <xref ref-type="bibr" rid="B35">2007</xref>). Whereas the present data demonstrate that talker-specific information is not necessary for the observed shifts in speech categorization, they do not preclude the possibility that voice- or speech-specific processes (e.g., Belin et al., <xref ref-type="bibr" rid="B3">2000</xref>) may also contribute. These expectation-based and long-term memory effects are not inconsistent with mechanisms that support LTAS effects. Rather, they are likely to complement talker normalization through other information sources.</p>
<p>The flip-side of the question of how a general auditory process affects speech processing is the question of what purpose LTAS computations may serve in auditory perception more generally. Lotto and Sullivan (<xref ref-type="bibr" rid="B33">2007</xref>) have proposed that sensitivity to LTAS may be useful for noise reduction in natural environments. If noise sources such as babbling brooks and ceiling fans have relatively constant spectra, then perception of sound events (including speech) would be more efficient by determining the LTAS of the noise and subtracting it off or, equivalently, using it as a reference so that all sounds are perceived relative to their spectral change from the ambient sound environment. Such a system would be very effective at dealing with other structured variance in auditory signals such as the filtering characteristics of communication channels (Watkins, <xref ref-type="bibr" rid="B54">1991</xref>) or the systematic acoustic differences among talkers. How significant a role this general process plays in speech communication and other complex sound processing remains to be described, but the data from the current experiment strongly support the significance of the phenomenon in talker normalization, one of the most enduring theoretical issues in speech perception research.</p>
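<p>The reference idea above can be sketched as simple spectral referencing: compute the context&#x02019;s average spectrum and interpret each incoming sound relative to it. The following toy illustration uses invented per-band dB levels and hypothetical function names; it is a schematic of the proposal, not a model fit to these data.</p>

```python
# Toy sketch of LTAS-as-reference: a sound's effective spectrum is its
# level in each frequency band minus the long-term average of the context.
# Band energies (dB) are invented for illustration.

def ltas(frames):
    """Average dB level per band across context frames (the LTAS)."""
    n = len(frames)
    return [sum(f[b] for f in frames) / n for b in range(len(frames[0]))]

def relative_spectrum(target, reference):
    """Target spectrum expressed as change from the context LTAS (dB)."""
    return [t - r for t, r in zip(target, reference)]

# A context whose average spectrum has extra energy in the highest band...
context_frames = [[60, 55, 70], [62, 57, 74]]
target = [61, 56, 66]

ref = ltas(context_frames)             # [61.0, 56.0, 72.0]
print(relative_spectrum(target, ref))  # → [0.0, 0.0, -6.0]
```

Relative to the context, the target's high band reads as low-energy even though its absolute level is substantial, which is the spectrally contrastive direction the behavioral results show.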
</sec>
<sec>
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack>
<p>The authors thank Christi Gomez and Anthony Kelly for their assistance. The research was supported by a grant from the National Institutes of Health (R01DC004674), training grants from the National Institutes of Health (T90DA022761, T32MH019983), and a National Science Foundation Graduate Research Fellowship to Ran Liu. Correspondence may be addressed to Lori L. Holt, <uri xlink:href="mailto:loriholt&#x00040;cmu.edu">loriholt&#x00040;cmu.edu</uri>.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Antunes</surname> <given-names>F. M.</given-names></name> <name><surname>Nelken</surname> <given-names>I.</given-names></name> <name><surname>Covey</surname> <given-names>E.</given-names></name> <name><surname>Malmierca</surname> <given-names>M. S.</given-names></name></person-group> (<year>2010</year>). <article-title>Stimulus-specific adaptation in the auditory thalamus of the anesthetized rat</article-title>. <source>PLoS ONE</source> <volume>5</volume>, <fpage>e14071</fpage>.<pub-id pub-id-type="doi">10.1371/journal.pone.0014071</pub-id><pub-id pub-id-type="pmid">21124913</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Atal</surname> <given-names>B. S.</given-names></name> <name><surname>Chang</surname> <given-names>J. J.</given-names></name> <name><surname>Mathews</surname> <given-names>M. V.</given-names></name> <name><surname>Tukey</surname> <given-names>J. W.</given-names></name></person-group> (<year>1978</year>). <article-title>Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer-sorting technique</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>63</volume>, <fpage>1535</fpage>&#x02013;<lpage>1555</lpage>.<pub-id pub-id-type="doi">10.1121/1.2016833</pub-id><pub-id pub-id-type="pmid">690333</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Belin</surname> <given-names>P.</given-names></name> <name><surname>Zatorre</surname> <given-names>R. J.</given-names></name> <name><surname>Lafaille</surname> <given-names>P.</given-names></name> <name><surname>Ahad</surname> <given-names>P.</given-names></name> <name><surname>Pike</surname> <given-names>B.</given-names></name></person-group> (<year>2000</year>). <article-title>Voice-selective areas in human auditory cortex</article-title>. <source>Nature</source> <volume>403</volume>, <fpage>309</fpage>&#x02013;<lpage>312</lpage>.<pub-id pub-id-type="doi">10.1038/35002078</pub-id><pub-id pub-id-type="pmid">10659849</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carandini</surname> <given-names>M.</given-names></name></person-group> (<year>2000</year>). <article-title>Visual cortex: fatigue and adaptation</article-title>. <source>Curr. Biol.</source> <volume>10</volume>, <fpage>605</fpage>&#x02013;<lpage>607</lpage>.<pub-id pub-id-type="doi">10.1016/S0960-9822(00)00637-0</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Diehl</surname> <given-names>R. L.</given-names></name> <name><surname>Elman</surname> <given-names>J. L.</given-names></name> <name><surname>McCusker</surname> <given-names>S. B.</given-names></name></person-group> (<year>1978</year>). <article-title>Contrast effects on stop consonant identification</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform.</source> <volume>4</volume>, <fpage>599</fpage>&#x02013;<lpage>609</lpage>.<pub-id pub-id-type="doi">10.1037/0096-1523.4.4.599</pub-id><pub-id pub-id-type="pmid">722250</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eimas</surname> <given-names>P. D.</given-names></name> <name><surname>Corbit</surname> <given-names>J. D.</given-names></name></person-group> (<year>1973</year>). <article-title>Selective adaptation of linguistic feature detectors</article-title>. <source>Cogn. Psychol.</source> <volume>4</volume>, <fpage>99</fpage>&#x02013;<lpage>109</lpage>.<pub-id pub-id-type="doi">10.1016/0010-0285(73)90006-6</pub-id></citation></ref>
<ref id="B7"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Fant</surname> <given-names>G.</given-names></name></person-group> (<year>1960</year>). <source>Acoustic Theory of Speech Production</source>. <publisher-loc>The Hague</publisher-loc>: <publisher-name>Mouton and Co</publisher-name>.</citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fox</surname> <given-names>R.</given-names></name> <name><surname>Qi</surname> <given-names>Y.</given-names></name></person-group> (<year>1990</year>). <article-title>Contextual effects in the perception of lexical tone</article-title>. <source>J. Chin. Linguist.</source> <volume>18</volume>, <fpage>261</fpage>&#x02013;<lpage>283</lpage>.</citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Halle</surname> <given-names>M.</given-names></name> <name><surname>Stevens</surname> <given-names>K. N.</given-names></name></person-group> (<year>1962</year>). <article-title>Speech recognition: a model and a program for research</article-title>. <source>IEEE Trans. Inf. Theory</source> <volume>8</volume>, <fpage>155</fpage>&#x02013;<lpage>159</lpage>.<pub-id pub-id-type="doi">10.1109/TIT.1962.1057686</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harris</surname> <given-names>D. M.</given-names></name> <name><surname>Dallos</surname> <given-names>P.</given-names></name></person-group> (<year>1979</year>). <article-title>Forward masking of auditory-nerve fiber responses</article-title>. <source>J. Neurophysiol.</source> <volume>42</volume>, <fpage>1083</fpage>&#x02013;<lpage>1107</lpage>.<pub-id pub-id-type="pmid">479921</pub-id></citation></ref>
<ref id="B11"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Hawkins</surname> <given-names>S.</given-names></name></person-group> (<year>2004</year>). <article-title>&#x0201C;Puzzles and patterns in 50 years of research on speech perception,&#x0201D;</article-title> in <source>From Sound to Sense: 50&#x0002B; Years of Discoveries in Speech Communication</source>, eds <person-group person-group-type="editor"><name><surname>Slifka</surname> <given-names>J.</given-names></name> <name><surname>Manuel</surname> <given-names>S.</given-names></name> <name><surname>Matthies</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>223</fpage>&#x02013;<lpage>246</lpage>.</citation></ref>
<ref id="B12"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Holt</surname> <given-names>L. L.</given-names></name></person-group> (<year>1999</year>). <source>Auditory Constraints on Speech Perception: An Examination of Spectral Contrast</source>. <publisher-loc>Doctoral dissertation. Madison</publisher-loc>: <publisher-name>University of Wisconsin-Madison</publisher-name>.</citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holt</surname> <given-names>L. L.</given-names></name></person-group> (<year>2005</year>). <article-title>Temporally non-adjacent non-linguistic sounds affect speech categorization</article-title>. <source>Psychol. Sci.</source> <volume>16</volume>, <fpage>305</fpage>&#x02013;<lpage>312</lpage>.<pub-id pub-id-type="doi">10.1111/j.0956-7976.2005.01532.x</pub-id><pub-id pub-id-type="pmid">15828978</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holt</surname> <given-names>L. L.</given-names></name></person-group> (<year>2006</year>). <article-title>The mean matters: effects of statistically-defined nonspeech spectral distributions on speech categorization</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>120</volume>, <fpage>2801</fpage>&#x02013;<lpage>2817</lpage>.<pub-id pub-id-type="doi">10.1121/1.2354071</pub-id><pub-id pub-id-type="pmid">17091133</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holt</surname> <given-names>L. L.</given-names></name> <name><surname>Lotto</surname> <given-names>A. J.</given-names></name></person-group> (<year>2002</year>). <article-title>Behavioral examinations of the neural mechanisms of speech context effects</article-title>. <source>Hear. Res.</source> <volume>167</volume>, <fpage>156</fpage>&#x02013;<lpage>169</lpage>.<pub-id pub-id-type="doi">10.1016/S0378-5955(02)00383-0</pub-id><pub-id pub-id-type="pmid">12117538</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>J.</given-names></name> <name><surname>Holt</surname> <given-names>L. L.</given-names></name></person-group> (<year>2009</year>). <article-title>General perceptual contributions to lexical tone normalization</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>125</volume>, <fpage>3983</fpage>&#x02013;<lpage>3994</lpage>.<pub-id pub-id-type="doi">10.1121/1.3097690</pub-id><pub-id pub-id-type="pmid">19507980</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>J.</given-names></name> <name><surname>Holt</surname> <given-names>L. L.</given-names></name></person-group> (<year>2011</year>). <article-title>Evidence for the central origin of lexical tone normalization</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>129</volume>, <fpage>1145</fpage>&#x02013;<lpage>1148</lpage>.<pub-id pub-id-type="doi">10.1121/1.3588802</pub-id><pub-id pub-id-type="pmid">21428475</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>J.</given-names></name> <name><surname>Holt</surname> <given-names>L. L.</given-names></name></person-group> (<year>2012</year>). <article-title>Listening for the norm: adaptive coding in speech categorization</article-title>. <source>Front. Psychol.</source> <volume>3</volume>:<fpage>10</fpage>.<pub-id pub-id-type="doi">10.3389/fpsyg.2012.00010</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>J&#x000E4;&#x000E4;skel&#x000E4;inen</surname> <given-names>I. P.</given-names></name> <name><surname>Ahveninen</surname> <given-names>J.</given-names></name> <name><surname>Andermann</surname> <given-names>M. L.</given-names></name> <name><surname>Belliveau</surname> <given-names>J. W.</given-names></name> <name><surname>Raij</surname> <given-names>T.</given-names></name> <name><surname>Sams</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <article-title>Short-term plasticity as a neural mechanism supporting memory and attentional functions</article-title>. <source>Brain Res.</source> <volume>1422</volume>, <fpage>66</fpage>&#x02013;<lpage>81</lpage>.<pub-id pub-id-type="doi">10.1016/j.brainres.2011.09.031</pub-id><pub-id pub-id-type="pmid">21985958</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>K.</given-names></name></person-group> (<year>1990</year>). <article-title>Contrast and normalization in vowel perception</article-title>. <source>J. Phon.</source> <volume>18</volume>, <fpage>229</fpage>&#x02013;<lpage>254</lpage>.</citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>K.</given-names></name> <name><surname>Ladefoged</surname> <given-names>P.</given-names></name> <name><surname>Lindau</surname> <given-names>M.</given-names></name></person-group> (<year>1993</year>). <article-title>Individual differences in vowel production</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>94</volume>, <fpage>701</fpage>&#x02013;<lpage>714</lpage>.<pub-id pub-id-type="doi">10.1121/1.407629</pub-id><pub-id pub-id-type="pmid">8370875</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>K.</given-names></name> <name><surname>Strand</surname> <given-names>E. A.</given-names></name> <name><surname>D&#x02019;Imperio</surname> <given-names>M.</given-names></name></person-group> (<year>1999</year>). <article-title>Auditory-visual integration of talker gender in vowel perception</article-title>. <source>J. Phon.</source> <volume>27</volume>, <fpage>359</fpage>&#x02013;<lpage>384</lpage>.<pub-id pub-id-type="doi">10.1006/jpho.1999.0100</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Joos</surname> <given-names>M.</given-names></name></person-group> (<year>1948</year>). <article-title>Acoustic phonetics</article-title>. <source>Language</source> <volume>24</volume>, <fpage>1</fpage>&#x02013;<lpage>136</lpage>.<pub-id pub-id-type="doi">10.2307/522229</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiefte</surname> <given-names>M.</given-names></name> <name><surname>Kluender</surname> <given-names>K. R.</given-names></name></person-group> (<year>2008</year>). <article-title>Absorption of reliable spectral characteristics in auditory perception</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>123</volume>, <fpage>366</fpage>&#x02013;<lpage>376</lpage>.<pub-id pub-id-type="doi">10.1121/1.2804951</pub-id><pub-id pub-id-type="pmid">18177166</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiefte</surname> <given-names>M. J.</given-names></name> <name><surname>Kluender</surname> <given-names>K. R.</given-names></name></person-group> (<year>2001</year>). <article-title>Spectral tilt versus formant frequency in static and dynamic vowels</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>109</volume>, <fpage>2294</fpage>&#x02013;<lpage>2295</lpage>.</citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klatt</surname> <given-names>D. H.</given-names></name> <name><surname>Klatt</surname> <given-names>L. C.</given-names></name></person-group> (<year>1990</year>). <article-title>Analysis, synthesis, and perception of voice quality variations among female and male talkers</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>87</volume>, <fpage>820</fpage>&#x02013;<lpage>857</lpage>.<pub-id pub-id-type="doi">10.1121/1.398894</pub-id><pub-id pub-id-type="pmid">2137837</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ladefoged</surname> <given-names>P.</given-names></name> <name><surname>Broadbent</surname> <given-names>D. E.</given-names></name></person-group> (<year>1957</year>). <article-title>Information conveyed by vowels</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>29</volume>, <fpage>98</fpage>&#x02013;<lpage>104</lpage>.<pub-id pub-id-type="doi">10.1121/1.1908694</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leather</surname> <given-names>J.</given-names></name></person-group> (<year>1983</year>). <article-title>Speaker normalization in perception of lexical tone</article-title>. <source>J. Phon.</source> <volume>11</volume>, <fpage>373</fpage>&#x02013;<lpage>382</lpage>.</citation></ref>
<ref id="B29"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Liberman</surname> <given-names>A. M.</given-names></name></person-group> (<year>1996</year>). <source>Speech: A Special Code</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>The MIT Press</publisher-name>.</citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liberman</surname> <given-names>A. M.</given-names></name> <name><surname>Cooper</surname> <given-names>F. S.</given-names></name> <name><surname>Shankweiler</surname> <given-names>D. P.</given-names></name> <name><surname>Studdert-Kennedy</surname> <given-names>M.</given-names></name></person-group> (<year>1967</year>). <article-title>Perception of the speech code</article-title>. <source>Psychol. Rev.</source> <volume>74</volume>, <fpage>431</fpage>&#x02013;<lpage>461</lpage>.<pub-id pub-id-type="doi">10.1037/h0020279</pub-id><pub-id pub-id-type="pmid">4170865</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lotto</surname> <given-names>A. J.</given-names></name> <name><surname>Holt</surname> <given-names>L. L.</given-names></name></person-group> (<year>2006</year>). <article-title>Putting phonetic context effects into context: a commentary on Fowler (2006)</article-title>. <source>Percept. Psychophys.</source> <volume>68</volume>, <fpage>178</fpage>&#x02013;<lpage>183</lpage>.<pub-id pub-id-type="doi">10.3758/BF03193667</pub-id><pub-id pub-id-type="pmid">16773891</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lotto</surname> <given-names>A. J.</given-names></name> <name><surname>Kluender</surname> <given-names>K. R.</given-names></name></person-group> (<year>1998</year>). <article-title>General contrast effects in speech perception: effect of preceding liquid on stop consonant identification</article-title>. <source>Percept. Psychophys.</source> <volume>60</volume>, <fpage>602</fpage>&#x02013;<lpage>619</lpage>.<pub-id pub-id-type="doi">10.3758/BF03206049</pub-id><pub-id pub-id-type="pmid">9628993</pub-id></citation></ref>
<ref id="B33"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Lotto</surname> <given-names>A. J.</given-names></name> <name><surname>Sullivan</surname> <given-names>S. C.</given-names></name></person-group> (<year>2007</year>). <article-title>&#x0201C;Speech as a sound source,&#x0201D;</article-title> in <source>Auditory Perception of Sound Sources</source>, eds <person-group person-group-type="editor"><name><surname>Yost</surname> <given-names>W. A.</given-names></name> <name><surname>Fay</surname> <given-names>R. R.</given-names></name> <name><surname>Popper</surname> <given-names>A. N.</given-names></name></person-group> (<publisher-loc>New York</publisher-loc>: <publisher-name>Springer Science and Business Media, LLC</publisher-name>), <fpage>281</fpage>&#x02013;<lpage>305</lpage>.</citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lotto</surname> <given-names>A. J.</given-names></name> <name><surname>Sullivan</surname> <given-names>S. C.</given-names></name> <name><surname>Holt</surname> <given-names>L. L.</given-names></name></person-group> (<year>2003</year>). <article-title>Central locus of non-speech context effects on phonetic identification</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>113</volume>, <fpage>53</fpage>&#x02013;<lpage>56</lpage>.<pub-id pub-id-type="doi">10.1121/1.1527959</pub-id><pub-id pub-id-type="pmid">12558245</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Magnuson</surname> <given-names>J. S.</given-names></name> <name><surname>Nusbaum</surname> <given-names>H. C.</given-names></name></person-group> (<year>2007</year>). <article-title>Acoustic differences, listener expectations, and the perceptual accommodation of talker variability</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform.</source> <volume>33</volume>, <fpage>391</fpage>&#x02013;<lpage>409</lpage>.<pub-id pub-id-type="doi">10.1037/0096-1523.33.2.391</pub-id><pub-id pub-id-type="pmid">17469975</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Malmierca</surname> <given-names>M. S.</given-names></name> <name><surname>Cristaudo</surname> <given-names>S.</given-names></name> <name><surname>Perez-Gonzalez</surname> <given-names>D.</given-names></name> <name><surname>Covey</surname> <given-names>E.</given-names></name></person-group> (<year>2009</year>). <article-title>Stimulus-specific adaptation in the inferior colliculus of the anesthetized rat</article-title>. <source>J. Neurosci.</source> <volume>29</volume>, <fpage>5483</fpage>&#x02013;<lpage>5493</lpage>.<pub-id pub-id-type="doi">10.1523/JNEUROSCI.4153-08.2009</pub-id><pub-id pub-id-type="pmid">19403816</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McGowan</surname> <given-names>R. S.</given-names></name></person-group> (<year>1997</year>). <article-title>Normalization for articulatory recovery</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>101</volume>, <fpage>3175</fpage>.<pub-id pub-id-type="doi">10.1121/1.418310</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McGowan</surname> <given-names>R. S.</given-names></name> <name><surname>Cushing</surname> <given-names>S.</given-names></name></person-group> (<year>1999</year>). <article-title>Vocal tract normalization for midsagittal articulatory recovery with analysis-by-synthesis</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>106</volume>, <fpage>1090</fpage>&#x02013;<lpage>1105</lpage>.<pub-id pub-id-type="doi">10.1121/1.427117</pub-id><pub-id pub-id-type="pmid">10462814</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moore</surname> <given-names>C. B.</given-names></name> <name><surname>Jongman</surname> <given-names>A.</given-names></name></person-group> (<year>1997</year>). <article-title>Speaker normalization in the perception of Mandarin Chinese tones</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>102</volume>, <fpage>1864</fpage>&#x02013;<lpage>1877</lpage>.<pub-id pub-id-type="doi">10.1121/1.420350</pub-id><pub-id pub-id-type="pmid">9301064</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nygaard</surname> <given-names>L. C.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name></person-group> (<year>1998</year>). <article-title>Talker-specific learning in speech perception</article-title>. <source>Percept. Psychophys.</source> <volume>60</volume>, <fpage>355</fpage>&#x02013;<lpage>376</lpage>.<pub-id pub-id-type="doi">10.3758/BF03206860</pub-id><pub-id pub-id-type="pmid">9599989</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Perez-Gonzalez</surname> <given-names>D.</given-names></name> <name><surname>Malmierca</surname> <given-names>M. S.</given-names></name> <name><surname>Covey</surname> <given-names>E.</given-names></name></person-group> (<year>2005</year>). <article-title>Novelty detector neurons in the mammalian auditory midbrain</article-title>. <source>Eur. J. Neurosci.</source> <volume>22</volume>, <fpage>2879</fpage>&#x02013;<lpage>2885</lpage>.<pub-id pub-id-type="doi">10.1111/j.1460-9568.2005.04472.x</pub-id><pub-id pub-id-type="pmid">16324123</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peterson</surname> <given-names>G. E.</given-names></name> <name><surname>Barney</surname> <given-names>H. L.</given-names></name></person-group> (<year>1952</year>). <article-title>Control methods used in a study of the vowels</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>24</volume>, <fpage>175</fpage>&#x02013;<lpage>184</lpage>.<pub-id pub-id-type="doi">10.1121/1.1906945</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Poeppel</surname> <given-names>D.</given-names></name> <name><surname>Idsardi</surname> <given-names>W. J.</given-names></name> <name><surname>van Wassenhove</surname> <given-names>V.</given-names></name></person-group> (<year>2008</year>). <article-title>Speech perception at the interface of neurobiology and linguistics</article-title>. <source>Philos. Trans. R. Soc. B Biol. Sci.</source> <volume>363</volume>, <fpage>1071</fpage>&#x02013;<lpage>1086</lpage>.<pub-id pub-id-type="doi">10.1098/rstb.2007.2160</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Potter</surname> <given-names>R.</given-names></name> <name><surname>Steinberg</surname> <given-names>J.</given-names></name></person-group> (<year>1950</year>). <article-title>Toward the specification of speech</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>22</volume>, <fpage>807</fpage>&#x02013;<lpage>820</lpage>.<pub-id pub-id-type="doi">10.1121/1.1906694</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sawusch</surname> <given-names>J. R.</given-names></name> <name><surname>Nusbaum</surname> <given-names>H. C.</given-names></name></person-group> (<year>1979</year>). <article-title>Contextual effects in vowel perception I: anchor-induced contrast effects</article-title>. <source>Percept. Psychophys.</source> <volume>25</volume>, <fpage>292</fpage>&#x02013;<lpage>302</lpage>.<pub-id pub-id-type="doi">10.3758/BF03198808</pub-id><pub-id pub-id-type="pmid">461089</pub-id></citation></ref>
<ref id="B46"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Schneider</surname> <given-names>W.</given-names></name> <name><surname>Eschman</surname> <given-names>A.</given-names></name> <name><surname>Zuccolotto</surname> <given-names>A.</given-names></name></person-group> (<year>2002</year>). <source>E-Prime User&#x02019;s Guide</source>. <publisher-loc>Pittsburgh</publisher-loc>: <publisher-name>Psychology Software Tools, Inc</publisher-name>.</citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sjerps</surname> <given-names>M. J.</given-names></name> <name><surname>Mitterer</surname> <given-names>H.</given-names></name> <name><surname>McQueen</surname> <given-names>J. M.</given-names></name></person-group> (<year>2011</year>). <article-title>Constraints on the processes responsible for the extrinsic normalization of vowels</article-title>. <source>Atten. Percept. Psychophys.</source> <volume>73</volume>, <fpage>1195</fpage>&#x02013;<lpage>1215</lpage>.<pub-id pub-id-type="doi">10.3758/s13414-011-0096-8</pub-id><pub-id pub-id-type="pmid">21321794</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>R. L.</given-names></name></person-group> (<year>1979</year>). <article-title>Adaptation, saturation, and physiological masking in single auditory-nerve fibers</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>65</volume>, <fpage>166</fpage>&#x02013;<lpage>178</lpage>.<pub-id pub-id-type="doi">10.1121/1.2017548</pub-id><pub-id pub-id-type="pmid">422812</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevens</surname> <given-names>S. S.</given-names></name> <name><surname>Volkmann</surname> <given-names>J.</given-names></name> <name><surname>Newman</surname> <given-names>E. B.</given-names></name></person-group> (<year>1937</year>). <article-title>A scale for the measurement of the psychological magnitude of pitch</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>8</volume>, <fpage>185</fpage>&#x02013;<lpage>190</lpage>.<pub-id pub-id-type="doi">10.1121/1.1915894</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Story</surname> <given-names>B. H.</given-names></name></person-group> (<year>2005</year>). <article-title>A parametric model of the vocal tract area function for vowel and consonant simulation</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>117</volume>, <fpage>3231</fpage>&#x02013;<lpage>3254</lpage>.<pub-id pub-id-type="doi">10.1121/1.1869752</pub-id><pub-id pub-id-type="pmid">15957790</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ulanovsky</surname> <given-names>N.</given-names></name> <name><surname>Las</surname> <given-names>L.</given-names></name> <name><surname>Farkas</surname> <given-names>D.</given-names></name> <name><surname>Nelken</surname> <given-names>I.</given-names></name></person-group> (<year>2004</year>). <article-title>Multiple time scales of adaptation in auditory cortex neurons</article-title>. <source>J. Neurosci.</source> <volume>24</volume>, <fpage>10440</fpage>&#x02013;<lpage>10453</lpage>.<pub-id pub-id-type="doi">10.1523/JNEUROSCI.1905-04.2004</pub-id><pub-id pub-id-type="pmid">15548659</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vitela</surname> <given-names>A. D.</given-names></name> <name><surname>Story</surname> <given-names>B. H.</given-names></name> <name><surname>Lotto</surname> <given-names>A. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Predicting the effect of talker differences on perceived vowel categories</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>128</volume>, <fpage>2349</fpage>.<pub-id pub-id-type="doi">10.1121/1.3508321</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wade</surname> <given-names>T.</given-names></name> <name><surname>Holt</surname> <given-names>L. L.</given-names></name></person-group> (<year>2005</year>). <article-title>Effects of later-occurring non-linguistic sounds on speech categorization</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>118</volume>, <fpage>1701</fpage>&#x02013;<lpage>1710</lpage>.<pub-id pub-id-type="doi">10.1121/1.2011156</pub-id><pub-id pub-id-type="pmid">16240828</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Watkins</surname> <given-names>A. J.</given-names></name></person-group> (<year>1991</year>). <article-title>Central, auditory mechanisms of perceptual compensation for spectral-envelope distortion</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>90</volume>, <fpage>2942</fpage>&#x02013;<lpage>2955</lpage>.<pub-id pub-id-type="doi">10.1121/1.401769</pub-id><pub-id pub-id-type="pmid">1787236</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Watkins</surname> <given-names>A. J.</given-names></name> <name><surname>Makin</surname> <given-names>S. J.</given-names></name></person-group> (<year>1994</year>). <article-title>Perceptual compensation for speaker differences and for spectral-envelope distortion</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>96</volume>, <fpage>1263</fpage>&#x02013;<lpage>1282</lpage>.<pub-id pub-id-type="doi">10.1121/1.410275</pub-id><pub-id pub-id-type="pmid">7962994</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Watkins</surname> <given-names>A. J.</given-names></name> <name><surname>Makin</surname> <given-names>S. J.</given-names></name></person-group> (<year>1996</year>). <article-title>Effects of spectral contrast on perceptual compensation for spectral-envelope distortions</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>99</volume>, <fpage>3749</fpage>&#x02013;<lpage>3757</lpage>.<pub-id pub-id-type="doi">10.1121/1.414981</pub-id><pub-id pub-id-type="pmid">8655806</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Watkins</surname> <given-names>A. J.</given-names></name> <name><surname>Makin</surname> <given-names>S. J.</given-names></name></person-group> (<year>2007</year>). <article-title>Steady-spectrum contexts and perceptual compensation for reverberation in speech identification</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>121</volume>, <fpage>257</fpage>&#x02013;<lpage>266</lpage>.<pub-id pub-id-type="doi">10.1121/1.2387134</pub-id><pub-id pub-id-type="pmid">17297781</pub-id></citation></ref>
</ref-list>
</back>
</article>
