<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychology</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychology</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Research Foundation</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2012.00238</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>On the Role of Theta-Driven Syllabic Parsing in Decoding Speech: Intelligibility of Speech with a Manipulated Modulation Spectrum</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Ghitza</surname> <given-names>Oded</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001">&#x0002A;</xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Hearing Research Center, Boston University</institution> <country>Boston, MA, USA</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Lucia Melloni, Max Planck Institute for Brain Research, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Ehud Ahissar, Weizmann Institute of Science, Israel; Peter Cariani, Harvard Medical School, USA</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Oded Ghitza, Hearing Research Center, Boston University, 44 Cummington Street, Boston, MA 02215, USA. e-mail: <email>oghitza&#x00040;bu.edu</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Frontiers in Language Sciences, a specialty of Frontiers in Psychology.</p></fn>
</author-notes>
<pub-date pub-type="epreprint">
<day>31</day>
<month>05</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>16</day>
<month>07</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<volume>3</volume>
<elocation-id>238</elocation-id>
<history>
<date date-type="received">
<day>08</day>
<month>05</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>06</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2012 Ghitza.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="open-access" xlink:href="http://www.frontiersin.org/licenseagreement"><p>This is an open-access article distributed under the terms of the <uri xlink:href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution License</uri>, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.</p></license>
</permissions>
<abstract>
<p>Recent hypotheses on the potential role of neuronal oscillations in speech perception propose that speech is processed on multi-scale temporal analysis windows formed by a cascade of neuronal oscillators locked to the input pseudo-rhythm. In particular, Ghitza (<xref ref-type="bibr" rid="B7">2011</xref>) proposed that the oscillators are in the theta, beta, and gamma frequency bands with the theta oscillator the master, tracking the input syllabic rhythm and setting a time-varying, hierarchical window structure synchronized with the input. In the study described here the hypothesized role of theta was examined by measuring the intelligibility of speech with a manipulated modulation spectrum. Each critical-band signal was manipulated by controlling the degree of temporal envelope flatness. Intelligibility of speech with critical-band envelopes that are flat is poor; inserting extra information, restricted to the input syllabic rhythm, markedly improves intelligibility. It is concluded that flattening the critical-band envelopes prevents the theta oscillator from tracking the input rhythm, hence the disruption of the hierarchical window structure that controls the decoding process. Reinstating the input-rhythm information revives the tracking capability, hence restoring the synchronization between the window structure and the input, resulting in the extraction of additional information from the flat modulation spectrum.</p>
</abstract>
<kwd-group>
<kwd>speech perception</kwd>
<kwd>intelligibility</kwd>
<kwd>syllabic parsing</kwd>
<kwd>modulation spectrum</kwd>
<kwd>cascaded neuronal oscillations</kwd>
<kwd>theta band</kwd>
<kwd>hierarchical window structure</kwd>
<kwd>synchronization</kwd>
</kwd-group>
<counts>
<fig-count count="10"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="16"/>
<page-count count="12"/>
<word-count count="7539"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction">
<title>Introduction</title>
<p>There is a remarkable correspondence between the time span of phonetic, syllabic, and phrasal units, on the one hand, and the frequency range of the gamma, beta, theta, and delta neuronal oscillations, on the other. Phonetic features (mean duration of 25&#x02009;ms) are associated with gamma (&#x0003E;40&#x02009;Hz) and beta (15&#x02013;35&#x02009;Hz) oscillations, syllables and words (mean duration of 250&#x02009;ms) with theta oscillations (4&#x02013;8&#x02009;Hz), and sequences of syllables and words embedded within a prosodic phrase (500&#x02013;2000&#x02009;ms) with delta oscillations (&#x0003C;3&#x02009;Hz). This correspondence inspired recent hypotheses on the potential role of neuronal oscillations in speech perception (e.g., Poeppel, <xref ref-type="bibr" rid="B12">2003</xref>; Ahissar and Ahissar, <xref ref-type="bibr" rid="B1">2005</xref>; Ghitza and Greenberg, <xref ref-type="bibr" rid="B8">2009</xref>; Ghitza, <xref ref-type="bibr" rid="B7">2011</xref>; Giraud and Poeppel, <xref ref-type="bibr" rid="B9">2012</xref>; Zion-Golumbic et al., <xref ref-type="bibr" rid="B16">2012</xref>). In particular, in an attempt to account for counterintuitive behavioral findings on the intelligibility of time-compressed speech as a function of &#x0201C;repackaging&#x0201D; rate (Ghitza and Greenberg, <xref ref-type="bibr" rid="B8">2009</xref>), Ghitza (<xref ref-type="bibr" rid="B7">2011</xref>) proposed a computation principle where the speech decoding process is controlled by a time-varying hierarchical window structure <italic>locked</italic> to the input pseudo-rhythm. The key property that enabled an explanation of the behavioral data was the capability of the window structure to stay <italic>synchronized</italic> with the input; performance is high so long as the oscillators are locked to the input rhythm (and within their intrinsic frequency range), and it drops once the oscillators are out of lock (e.g., hit their boundaries).</p>
<p>This computation principle was realized by the phenomenological model shown in Figure <xref ref-type="fig" rid="F1">1</xref>, termed <italic>Tempo</italic>. In this model the sensory stream is processed, simultaneously, by a <italic>parsing</italic> path and a <italic>decoding</italic> path, which correspond to the lower and upper parts of Figure <xref ref-type="fig" rid="F1">1</xref>. Conventional models of speech perception assume a strict decoding of the acoustic signal. The decoding path of Tempo conforms to this notion; the decoding process links chunks of sensory input of different durations with stored linguistic memory patterns. The additional, parsing path determines a temporal window structure (location and duration) that controls the decoding process. The windows are defined as the cycle-duration of oscillations in the theta, beta, and gamma frequency bands, all cascaded and locked to the pseudo-rhythmic speech input. The theta oscillator, capable of tracking the input syllabic rhythm, is the <italic>master</italic>; the other oscillators entrain to theta, thus forming a hierarchy of analysis windows synchronized with the input. The theta oscillator provides <italic>syllabic</italic> parsing; assuming perfect tracking, a theta cycle is aligned with a syllable that is often a [Vowel]&#x02013;[Consonant-cluster]&#x02013;[Vowel]. (This is so because the prominent energy peaks across the auditory channels, which presumably feed the theta tracker, are associated with vowels.) The <italic>phonetic</italic> temporal windows are determined by the cycles of the beta oscillator (entrained to theta). The rationale behind proposing the theta oscillator as the master is the robust presence of energy fluctuations in the 3- to 9-Hz range in the temporal auditory response to speech acoustics. 
Note that the term &#x0201C;parsing&#x0201D; as employed here does not refer to the exhaustive division of the incoming speech signal into candidate constituents, or even the inference of candidate constituents from the cues in the speech signal, but rather to the function of setting a time-varying hierarchical window structure synchronized to the speech input. (See Ghitza, <xref ref-type="bibr" rid="B7">2011</xref> for a detailed description of Tempo.)</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>A block diagram of the Tempo model</bold>. It comprises lower and upper paths that process the sensory stream generated by a model of the auditory periphery. The lower path extracts <bold>parsing</bold> information, which controls the <bold>decoding</bold> process performed in the upper path. The parsing is expressed in the form of a <italic>time-varying hierarchical window structure</italic>, realized as an array of cascaded oscillators locked to the input syllabic rhythm; the frequencies and relative phases of the oscillations determine the temporal windows (location and duration) that control the decoding process. See text for details.</p></caption>
<graphic xlink:href="fpsyg-03-00238-g001.tif"/>
</fig>
<p>The purpose of the study reported here was to provide evidence for the hypothesized role of theta in syllabic parsing. This was achieved by measuring the intelligibility of speech with a manipulated modulation spectrum. Two related observations, reported by Chait et al. (<xref ref-type="bibr" rid="B4">2005</xref>) and Saoud et al. (<xref ref-type="bibr" rid="B13">2012</xref>), are noteworthy. In both of these studies critical-band envelopes were decomposed into low (0&#x02013;4&#x02009;Hz) and high (22&#x02013;40&#x02009;Hz) components, each carrying syllabic or phonetic information, respectively. Subjects heard naturally spoken sentences (in Chait et al., <xref ref-type="bibr" rid="B4">2005</xref>) or words in isolation (in Saoud et al., <xref ref-type="bibr" rid="B13">2012</xref>) and were instructed to type the words heard. Importantly, correct scores of 18/41% (Chait et al., <xref ref-type="bibr" rid="B4">2005</xref>) and 20/95% (Saoud et al., <xref ref-type="bibr" rid="B13">2012</xref>) were reported for the high/low components, respectively, suggesting a dominant role of the low component (associated with the delta and theta rhythms). In the context of our study, a question arises as to whether this dominance should be attributed to the syllabic parsing function of the theta, to the improved performance of the decoding path (when presented with richer spectro-temporal information), or to both.</p>
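The low/high envelope decomposition used in those studies can be sketched as ideal modulation-frequency filtering of a critical-band envelope. The following Python sketch uses zero-phase FFT masking; the function name, band edges, and the masking approach are illustrative assumptions, not the filters actually used by Chait et al. (2005) or Saoud et al. (2012).

```python
import numpy as np

def split_modulation_bands(env, fs, low=(0.0, 4.0), high=(22.0, 40.0)):
    """Split a temporal envelope (sampled at fs Hz) into low- and
    high-modulation-frequency components by zeroing FFT bins outside
    each band (ideal, zero-phase filters)."""
    spec = np.fft.rfft(env)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)

    def band(lo_hz, hi_hz):
        # Keep only the modulation frequencies inside [lo_hz, hi_hz].
        mask = (freqs >= lo_hz) & (freqs <= hi_hz)
        return np.fft.irfft(np.where(mask, spec, 0.0), n=len(env))

    return band(*low), band(*high)
```

For an envelope that is a sum of a 2-Hz (syllabic-rate) and a 30-Hz (phonetic-rate) fluctuation, the two returned components recover the respective fluctuations.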
<p>To sharpen the association between the error patterns and the underlying mechanism causing them, we propose a different kind of manipulation. Two classes of stimuli are envisioned: Class-I stimuli, with acoustic-phonetic information sufficient for the function of the decoding path, but without information about the input syllabic rhythm; and Class-II stimuli, with extra information, restricted to the input syllabic rhythm alone, re-inserted into the modulation spectrum of Class-I stimuli. Intelligibility of Class-I stimuli is expected to be poor: the absence of input-rhythm information should prevent the theta oscillator from tracking the input rhythm, hence disrupting the hierarchical window structure that controls the decoding process. Improved intelligibility of Class-II stimuli could be attributed to the reinstatement of the theta parsing capability, hence the recovery of synchronization between the window structure and the input, resulting in the extraction of additional information from the flat modulation spectrum.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and Methods</title>
<sec>
<title>Stimulus preparation</title>
<p>The study comprises two experiments distinguished by the signal processing strategy used for the manipulation of the modulation spectrum.</p>
<sec id="s3">
<title>Experiment I: peak position coding (PPC) of critical-band envelopes</title>
<p>The block diagram of the system is shown in Figure <xref ref-type="fig" rid="F2">2</xref>A. The input waveform is filtered by a bank of critical-band filters, which span the 230- to 3800-Hz frequency range. Each critical-band channel is processed in the same manner: the envelope of the filter&#x02019;s band-limited output is derived first (e.g., the Hilbert envelope), followed by an operator O. The output of the operator modulates a narrow-band noise carrier centered at the mid-frequency of the critical band, with a bandwidth equal to that of the critical band and with an instantaneous phase that minimizes noise-envelope fluctuations. (The goal is to have a carrier with a fine structure that is independent of the fine structure of the band-limited signal, to prevent a regeneration of the original critical-band envelope at the listener&#x02019;s cochlear output; Ghitza, <xref ref-type="bibr" rid="B6">2001</xref>.) The signal at the system output is the linear sum of all critical-band channels.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>(A)</bold> A block diagram of the peak position coding (PPC) system used in Experiment I. The signal at the system output is the linear sum of all critical-band channels. The core experimental conditions are defined by the operator O. <bold>(B)</bold> The <bold>Control</bold> condition. Operator O<sub>1</sub> is a low-pass filter, with a cutoff frequency of 10&#x02009;Hz. <bold>(C)</bold> The <bold>No-Theta (No&#x00398;)</bold> condition. Operator O<sub>2</sub> is a stop-band filter with a 2- to 9-Hz frequency gap. <bold>(D)</bold> The <bold>Channel-Theta (Ch&#x00398;)</bold> condition. Operator O<sub>3</sub> comprises a linear low-pass filter (cutoff frequency of 10&#x02009;Hz) followed by a peak picking operation. The pulse train represents, in a minimalistic form, the input-rhythm information at this frequency channel. See text for details.</p></caption>
<graphic xlink:href="fpsyg-03-00238-g002.tif"/>
</fig>
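As an illustration of the processing chain just described, the following Python sketch implements one PPC channel under simplifying assumptions: the Hilbert envelope is computed via the standard FFT construction, operator O is omitted (assumed applied separately), and the phase constraint on the noise carrier is not modeled. The names ppc_channel and noise_carrier are hypothetical.

```python
import numpy as np

def hilbert_envelope(x):
    """Hilbert envelope |analytic signal|, via the standard FFT
    construction (equivalent to abs(scipy.signal.hilbert(x)))."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)            # weights that zero the negative frequencies
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))

def ppc_channel(band_signal, noise_carrier):
    """One critical-band channel (sketch): the envelope of the
    band-limited signal modulates an independent noise carrier."""
    return hilbert_envelope(band_signal) * noise_carrier
```

For a pure tone of amplitude 2, the computed envelope is the constant 2, so the channel output is simply the carrier scaled by 2.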
<p>Five experimental conditions were tested:</p>
<list list-type="order">
<list-item><p>A <bold>Control</bold> condition, generated by operator O<sub>1</sub> (Figure <xref ref-type="fig" rid="F2">2</xref>B). This operator is a low-pass filter, with a cutoff frequency of 10&#x02009;Hz and a frequency response shown in the lower-left insert. The band-limited signal, the filtered envelope and the modulated noise carrier are shown at the upper-left, upper-right, and lower-right inserts, respectively. The filtered envelope contains acoustic-phonetic information sufficient for speech comprehension (e.g., Drullman et al., <xref ref-type="bibr" rid="B5">1994</xref>). As such, stimuli in this condition are expected to be highly intelligible. Figure <xref ref-type="fig" rid="F4">4</xref>B shows the waveform (left) and the spectrogram (right) of a Control stimulus. An MP3 file of the depicted stimulus is available for listening as Supplementary Materials.</p></list-item>
<list-item><p>A <bold>No-Theta (No&#x00398;)</bold> condition, generated by operator O<sub>2</sub> (Figure <xref ref-type="fig" rid="F2">2</xref>C). This operator is the stop-band filter shown in the lower-left insert, with a 2- to 9-Hz frequency gap. The band-limited signal, the filtered envelope, and the modulated noise carrier are shown at the upper-left, upper-right, and lower-right inserts, respectively. As reflected in the upper-right insert the filtered envelope does not contain syllable-rate fluctuations. As such, the nullification of the 2- to 9-Hz modulation-frequency band is expected to considerably reduce intelligibility. See example in Figure <xref ref-type="fig" rid="F4">4</xref>C.</p></list-item>
<list-item><p>A <bold>Channel-Theta (Ch&#x00398;)</bold> condition. Operators O<sub>1</sub> and O<sub>2</sub> are linear operators. Operator O<sub>3</sub>, depicted in Figure <xref ref-type="fig" rid="F2">2</xref>D, is non-linear in nature, comprising a linear low-pass filter (cutoff frequency of 10&#x02009;Hz) followed by a peak picking operation. The lower-left insert illustrates the operator; the upper trace shows the filtered envelope and the lower trace &#x02013; a sequence of pulses, Hamming-window shaped and identical in amplitude, located at the peaks of the filtered envelope. This operator is termed <italic>Peak Position Coding</italic> (or <italic>PPC</italic>), and the pulse train represents the input rhythm in a minimalistic form. The band-limited signal, the PPC envelope, and the modulated noise carrier are shown at the upper-left, upper-right, and lower-right inserts, respectively. Figure <xref ref-type="fig" rid="F3">3</xref>B illustrates the sparseness of the pulse distribution in the time-frequency plane for a 2-s long speech input, whose spectrogram is shown in Figure <xref ref-type="fig" rid="F3">3</xref>A. The stimuli in this condition are expected to be barely intelligible. See example in Figure <xref ref-type="fig" rid="F4">4</xref>D.</p></list-item>
<list-item><p>A <bold>No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398;</bold> condition. This stimulus is a linear sum of a No&#x00398; stimulus plus the corresponding Ch&#x00398; stimulus. See example in Figure <xref ref-type="fig" rid="F4">4</xref>F.</p></list-item>
<list-item><p>A <bold>No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398;</bold> condition. Although much of the acoustic-phonetic information in Ch&#x00398; signals was removed, a residue of spectro-temporal information is still present (indicated by the curvature in the contours of Figure <xref ref-type="fig" rid="F3">3</xref>B). To further reduce the acoustic-phonetic information we replaced the channel-theta (Ch&#x00398;) with a <bold>Global-Theta (Glb&#x00398;)</bold> version: instead of placing pulses at the peak location of the filtered envelope per critical-band, a single pulse train is generated &#x02013; same for all channels &#x02013; with pulses at mid-vowel locations obtained by hand segmentation of the full-band signal (Figure <xref ref-type="fig" rid="F3">3</xref>C). Note that the location of the mid-vowel is loose, within a time interval in the order of a few pitch cycles. See example of Glb&#x00398; and No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398; signals in Figures <xref ref-type="fig" rid="F4">4</xref>E,G respectively.</p></list-item>
</list>
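The peak picking at the heart of operator O<sub>3</sub> can be sketched as follows, assuming a discrete-time envelope already low-pass filtered to 10&#x02009;Hz; the local-maximum criterion, the pulse length, and the function name are illustrative choices, not the paper&#x02019;s exact implementation.

```python
import numpy as np

def peak_position_code(env, pulse_len=21, amp=1.0):
    """Operator O3 (sketch): replace a low-pass-filtered envelope by
    Hamming-shaped pulses of identical amplitude, centered at the
    envelope's local maxima."""
    peaks = [i for i in range(1, len(env) - 1)
             if env[i - 1] < env[i] >= env[i + 1]]
    pulse = amp * np.hamming(pulse_len)   # odd length: peak at centre
    half = pulse_len // 2
    out = np.zeros(len(env))
    for p in peaks:
        # Place the pulse at p, clipping at the signal boundaries.
        a, b = max(0, p - half), min(len(env), p + half + 1)
        out[a:b] = np.maximum(out[a:b], pulse[a - p + half: b - p + half])
    return out
```

Applied to a single-humped envelope, the output is one pulse of height amp at the envelope peak and zero elsewhere, i.e., the peak positions alone survive.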
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Illustration of the PPC strategies used in Experiment I</bold>. <bold>(A)</bold> A spectrogram of the stimulus used for <bold>(B,C)</bold>. Abscissa is time and the ordinate is frequency, in linear scale. <bold>(B)</bold> A Ch&#x00398; strategy. Pulses are placed at the peak location of the filtered critical-band envelopes. Abscissa is time and the ordinate is critical-band (i.e., frequency in a critical-band scale). Note the curvature in the pulse contours, indicating a residue of spectro-temporal information. <bold>(C)</bold> A Glb&#x00398; strategy. A single pulse train is generated, same for all channels, with pulses at the mid-vowel locations obtained by hand-segmenting the full-band signal. Note that the location of the mid-vowel is loose, within a time interval in the order of a few pitch cycles.</p></caption>
<graphic xlink:href="fpsyg-03-00238-g003.tif"/>
</fig>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Waveforms (left hand side) and spectrograms (right hand side) generated by the PPC system (Figure <xref ref-type="fig" rid="F2">2</xref>A)</bold>. The stimuli are 4&#x02009;kHz wide. The frequency range of the spectrograms is 0&#x02013;5&#x02009;kHz. MP3 files are available for listening as Supplementary Materials. <bold>(A)</bold> The unprocessed waveform. <bold>(B)</bold> A <bold>Control</bold> stimulus, generated by the system shown in Figure <xref ref-type="fig" rid="F2">2</xref>B. <bold>(C)</bold> A <bold>No&#x00398;</bold> (No-Theta) stimulus, generated by the system shown in Figure <xref ref-type="fig" rid="F2">2</xref>C. <bold>(D)</bold> A <bold>Ch&#x00398;</bold> (Channel-Theta) stimulus, generated by the system shown in Figure <xref ref-type="fig" rid="F2">2</xref>D. <bold>(E)</bold> A <bold>Glb&#x00398;</bold> (Global-Theta) stimulus, generated by replacing Ch&#x00398; pulses with a single pulse train &#x02013; same for all channels &#x02013; located at mid-vowel locations obtained by hand segmentation of the full band signal (Figure <xref ref-type="fig" rid="F3">3</xref>C). <bold>(F)</bold> A <bold>No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398;</bold> stimulus &#x02013; a linear sum of a No&#x00398; stimulus plus the corresponding Ch&#x00398; stimulus. <bold>(G)</bold> A <bold>No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398;</bold> stimulus &#x02013; a linear sum of a No&#x00398; stimulus plus the corresponding Glb&#x00398; stimulus.</p></caption>
<graphic xlink:href="fpsyg-03-00238-g004.tif"/>
</fig>
<p>MP3 files are available for listening as Supplementary Materials.</p>
</sec>
<sec id="s5">
<title>Experiment II: infinite-clipping (InfC) of critical-band signals</title>
<p>In pursuit of a better separation between the role of syllabic parsing and the role of decoding, we used another speech-processing strategy, shown in Figure <xref ref-type="fig" rid="F5">5</xref>. In Experiment I, No&#x00398; stimuli were contrasted with No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; stimuli. Here, the intelligibility of stimuli with varying degrees of critical-band envelope flatness was measured. The system is a variation of a design used by Licklider and Pollack (<xref ref-type="bibr" rid="B11">1948</xref>); there, the full-band speech was passed through a 1-bit hard limiter (i.e., an infinite-clipping operation), resulting in highly intelligible stimuli. Here we generalize this principle, with the infinite-clipping operation applied to the critical-band signals (prior to summation).</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>A block diagram of the infinite-clipping (InfC) system used in Experiment II</bold>. The signal at the system output is the linear sum of all critical-band channels. The system is a generalization of a design used by Licklider and Pollack (<xref ref-type="bibr" rid="B11">1948</xref>), where an infinite-clipping operator was applied to the full-band signal. Here, the infinite-clipping operation is applied to each critical-band output (prior to summation). The parameter Gain determines the degree of critical-band envelope flatness. See text for details.</p></caption>
<graphic xlink:href="fpsyg-03-00238-g005.tif"/>
</fig>
<p>The critical-band filters span the 230- to 3800-Hz frequency range. All critical-band channels are processed in the same manner, as illustrated in Figure <xref ref-type="fig" rid="F6">6</xref>. Figures <xref ref-type="fig" rid="F6">6</xref>A&#x02013;C are the same in both columns; Figure <xref ref-type="fig" rid="F6">6</xref>A shows the envelope of the band-limited signal, low-pass filtered to 10&#x02009;Hz. Figure <xref ref-type="fig" rid="F6">6</xref>B shows the binary output of an infinite-clipping operator operating on the band-limited signal, but only at intervals with an envelope above a prescribed fixed threshold (Figure <xref ref-type="fig" rid="F6">6</xref>A, red horizontal line); those intervals are termed <italic>non-zero signal intervals</italic>. Figure <xref ref-type="fig" rid="F6">6</xref>C shows the binary output of an infinite-clipping operator operating on noise, only inside the gaps between non-zero signal intervals. The noise is centered at the mid-frequency of the critical band, with a bandwidth equal to that of the critical band. Figure <xref ref-type="fig" rid="F6">6</xref>D shows the sum of Figures <xref ref-type="fig" rid="F6">6</xref>B,C, with Gain-parameter values <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5 (left) and <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 (right). The channel output is the signal of Figure <xref ref-type="fig" rid="F6">6</xref>D band-pass filtered by a <italic>postfilter</italic>, identical to the channel critical-band filter. The envelope of the channel output (Figure <xref ref-type="fig" rid="F6">6</xref>E, red curve) is overlaid on top of the blue curve from Figure <xref ref-type="fig" rid="F6">6</xref>A.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>An illustration of the signal flow in one critical-band channel of the system in Figure <xref ref-type="fig" rid="F5">5</xref> for <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 (right column) and <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5 (left column)</bold>. <bold>(A&#x02013;C)</bold> are the same in both columns. <bold>(A)</bold> The envelope of a band-limited signal, low-pass filtered to 10&#x02009;Hz. <bold>(B)</bold> InfC(S) is the binary output of an infinite-clipping operator, operating on <italic>non-zero signal intervals</italic> &#x02013; intervals where the envelope is above a prescribed fixed threshold [<bold>(A)</bold>, red horizontal line]. <bold>(C)</bold> InfC(N) is the binary output of an infinite-clipping operator operating on band-limited noise, only inside the gaps in-between non-zero signal intervals. <bold>(D)</bold> The sum of <bold>(B,C)</bold>, with Gain values <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5 (left) and <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 (right). The channel output is the signal of <bold>(D)</bold>, band-pass filtered by a <italic>postfilter</italic> identical to the channel critical-band filter. <bold>(E)</bold> The envelope of the channel output, in red, overlaid on top of the blue curve from <bold>(A)</bold>.</p></caption>
<graphic xlink:href="fpsyg-03-00238-g006.tif"/>
</fig>
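The per-channel flow just described (before the postfilter) can be sketched in Python as follows, assuming precomputed band-limited signal and noise, a 10-Hz envelope, and a fixed threshold; all names are illustrative. Note that the samples inside the non-zero signal intervals come out identical for every <italic>G</italic>, since <italic>G</italic> scales only the noise in the gaps.

```python
import numpy as np

def infc_channel(band_signal, band_noise, env, thresh, G):
    """One critical-band channel of the InfC system, prior to the
    postfilter: 1-bit (sign) quantization of the band-limited signal
    inside non-zero signal intervals (envelope above thresh), and
    G-scaled 1-bit quantization of band-limited noise in the gaps."""
    active = env > thresh                     # non-zero signal intervals
    return np.where(active,
                    np.sign(band_signal),     # same for every G
                    G * np.sign(band_noise))  # gap filler, scaled by G
```

Changing G alters only the gap samples; the signal-interval samples (and hence, after the postfilter, the acoustic-phonetic information) are unaffected.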
<p>A few properties of the system are worth noting here:</p>
<list list-type="order">
<list-item><p>It is known that the fine structure (e.g., the Hilbert phase) of a band-limited signal contains information on its temporal envelope (e.g., Voelcker, <xref ref-type="bibr" rid="B15">1966</xref>). It is also known that when a signal &#x02013; generated by flattening the temporal envelope of a band-limited signal while keeping the fine structure untouched &#x02013; is passed through a band-pass filter with a center frequency equal to the center frequency of the band-limited signal, the rich temporal envelope of the original signal is regenerated, to a large extent, at the output (Ghitza, <xref ref-type="bibr" rid="B6">2001</xref>). In our context, the binary output of the non-zero signal intervals maintains the fine structure of the signal itself. Therefore, when passed through the postfilter, envelope information surfaces at the output. If the postfilter is viewed as a listener&#x02019;s cochlear channel at the corresponding frequency, then the postfilter output (red curve of Figure <xref ref-type="fig" rid="F6">6</xref>E) reflects the envelope at the listener&#x02019;s cochlear output. (Note that, strictly speaking, the critical-band filters are auditory filters derived from psychophysical data rather than cochlear filters derived from physiological measurements. In the context of our study, however, this difference is not relevant.)</p></list-item>
<list-item><p>The role of the postfilter is to limit the bandwidth of the binary signal in Figure <xref ref-type="fig" rid="F6">6</xref>D &#x02013; a wideband signal in nature &#x02013; to the critical-band frequency band. Without the inclusion of a postfilter, the summation of channels will result in the spill of noise from one channel onto neighboring channels, adding noise to the non-zero signal intervals of the neighboring channels. (This is so because the locations of the non-zero signal intervals are channel specific and not necessarily time aligned). The postfilter prevents such interference.</p></list-item>
<list-item><p>An important property of the system is that inside the non-zero signal intervals the critical-band binary output, and thus the envelope at the postfilter output (due to property 1), are the same for all <italic>G</italic> (this is so because binary noise, calibrated by <italic>G</italic>, is added only in gaps in-between the non-zero signal intervals). Therefore, the acoustic-phonetic information <italic>in the cochlear response</italic> to the system output signal (i.e., after summation) is also independent of <italic>G</italic> (due to property 2).</p></list-item>
<list-item><p>The degree of envelope flatness is controlled by the Gain parameter. For example, for <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 the envelope of the signal of Figure <xref ref-type="fig" rid="F6">6</xref>D is 100% flat; for <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5 the envelope exhibits the input syllabic rhythm. As stated in property 3, the spectro-temporal information in the cochlear response to non-zero signal intervals is independent of gain. Therefore, any difference in intelligibility between two stimuli with different gain values (say <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 vs. <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5) could be attributed only to the recovery of syllabic parsing.</p></list-item>
<list-item><p>For <italic>G</italic>&#x02009;&#x0003D;&#x02009;1, the temporal envelope should convey zero information about the input syllabic rhythm. Even though sharp envelope fluctuations may occur at the boundaries of the non-zero signal intervals, the temporal envelope at the output of the postfilter (red curve of Figure <xref ref-type="fig" rid="F6">6</xref>E) conveys little information on the input rhythm.</p></list-item>
</list>
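The dependence of envelope flatness on <italic>G</italic> (property 4) can be made concrete with a simple, illustrative modulation-depth measure applied to the pre-postfilter channel signal of Figure 6D, whose samples are ±1 inside non-zero signal intervals and ±<italic>G</italic> in the gaps; the measure and its name are assumptions for illustration, not one used in the paper.

```python
import numpy as np

def envelope_flatness(x):
    """Illustrative flatness measure: 1 - modulation depth of |x|,
    where depth = (max - min) / (max + min). A value of 1.0 means a
    perfectly flat envelope; 0.0 means the envelope reaches zero."""
    e = np.abs(np.asarray(x, dtype=float))
    return 1.0 - (e.max() - e.min()) / (e.max() + e.min())
```

For such a binary channel signal, G = 1 yields flatness 1 (100% flat), G = 0 yields flatness 0 (the gaps are silent, so the input rhythm is fully exposed), and intermediate gains fall in between.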
<p>In Experiment II four conditions were tested, with varying degrees of flatness: <bold><italic>G</italic></bold>&#x02009;&#x0003D;&#x02009;<bold>0</bold> (zero flatness), <bold><italic>G</italic></bold>&#x02009;&#x0003D;&#x02009;<bold>1</bold> (100% flatness), and two in-between conditions, <bold><italic>G</italic></bold>&#x02009;&#x0003D;&#x02009;<bold>0.5</bold> and <bold><italic>G</italic></bold>&#x02009;&#x0003D;&#x02009;<bold>0.8</bold>. The intelligibility of <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 stimuli is expected to be poor; significant improvement is expected for <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5 and <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.8 stimuli.</p>
<p>See examples of <italic>G</italic>&#x02009;&#x0003D;&#x02009;0, 1, 0.8, and 0.5 stimuli in Figures <xref ref-type="fig" rid="F7">7</xref>B&#x02013;E, respectively. MP3 files of the depicted stimuli are available for listening as Supplementary Materials.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p><bold>Waveforms (left hand side) and spectrograms (right hand side) generated by the InfC system (Figure <xref ref-type="fig" rid="F5">5</xref>)</bold>. The stimuli are 4&#x02009;kHz wide. The frequency range of the spectrograms is 0&#x02013;5&#x02009;kHz. MP3 files are available for listening as Supplementary Materials. Shown are the unprocessed waveform and four processed stimuli, with varying degree of critical-band envelope flatness controlled by the parameter <italic>G</italic>: <bold>(A)</bold> The unprocessed waveform. <bold>(B)</bold> A <bold><italic>G</italic>&#x02009;&#x0003D;&#x02009;0</bold> stimulus, with zero flatness. <bold>(C)</bold> A <bold><italic>G</italic>&#x02009;&#x0003D;&#x02009;1</bold> stimulus, with 100% flatness. <bold>(D)</bold> A <bold><italic>G</italic>&#x02009;&#x0003D;&#x02009;0.8</bold> stimulus, and <bold>(E)</bold> a <bold><italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5</bold> stimulus, with in between degree of flatness.</p></caption>
<graphic xlink:href="fpsyg-03-00238-g007.tif"/>
</fig>
</sec>
</sec>
<sec>
<title>Subjects</title>
<p>All listening subjects were young adults (college students) educated in the U.S., with normal hearing. Ten subjects participated in Experiment I; a different set of 10 subjects participated in Experiment II. Although the number of listeners is smaller than typical for this type of study, their results (as described in Section <xref ref-type="sec" rid="s1">&#x0201C;Results&#x0201D;</xref>) are consistent with each other.</p>

</sec>
<sec>
<title>Corpus</title>
<p>The experimental corpus comprised 100 digit strings spoken fluently by a male speaker. Each string is a seven-digit sequence and is approximately 2&#x02009;s long. It is uttered as a phone number in an American accent, i.e., a cluster of three digits followed by a cluster of four digits (for example: &#x0201C;two six two, seven O one eight&#x0201D;). It is a low perplexity corpus (a vocabulary of 11 words, 0&#x02013;9, and O) but without contextual information. For each signal-manipulation condition, 80 stimuli (out of 100) were chosen at random and concatenated in a sequence: [alert tone] [digit string] [5-s long silence gap] [alert tone] &#x02026;</p>
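The assembly of each test sequence can be sketched as below; the sampling rate, tone frequency, and durations other than the 5-s gap are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def make_session(strings, fs=16000, tone_hz=1000, tone_s=0.2, gap_s=5.0):
    """Concatenate [alert tone][digit string][5-s silence gap] ... into one signal.

    `strings` is a list of 1-D waveform arrays, one per digit string.
    The tone and sampling parameters here are illustrative only.
    """
    tone = 0.1 * np.sin(2 * np.pi * tone_hz * np.arange(int(fs * tone_s)) / fs)
    gap = np.zeros(int(fs * gap_s))
    return np.concatenate([np.concatenate([tone, s, gap]) for s in strings])
```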
</sec>
<sec>
<title>Paradigm</title>
<p>Subjects performed the experiment in an isolated office environment (no other occupants) using headphones. There were two listening sessions for each signal-manipulation condition, <italic>Training</italic> and <italic>Testing</italic>. A training set contained 10 digit strings and a testing set contained 80 digit strings (approximately 12&#x02009;min to complete). Training preceded testing; in the training phase, subjects had to perform above a prescribed threshold before proceeding to the testing phase. Subjects were instructed to listen to a digit string <italic>once</italic> and, during the 5-s long gap following the stimulus, to type into an electronic file the <italic>last four digits</italic> heard, in the order presented (always four digits, even those they were uncertain about). The rationale behind choosing the last four digits as target (as opposed to choosing the entire seven-digit string) was twofold. First, it was an attempt to provide the opportunity for the presumed (cortical) theta oscillator to entrain to the input rhythm prior to the occurrence of the target words (recall the inherent rhythm in the stimuli, being a seven-digit phone number uttered in an American accent). Second, it aimed at reducing the bias of memory load on the error patterns.</p>
<p>The human-subjects protocol for this study was approved by the Institutional Review Board of Boston University.</p>
</sec>
</sec>
<sec id="s1">
<title>Results</title>
<sec id="s4">
<title>Analysis procedure</title>
<p>Data are presented in terms of <italic>error rate</italic> and <italic>normalized error rate</italic>. Error rate was calculated by using two distinct error metrics: (i) <italic>digit-error rate</italic>, defined as the number of digits erroneously registered divided by the total number of digits (i.e., 80&#x02009;&#x000D7;&#x02009;4&#x02009;&#x0003D;&#x02009;320 digits), in percent, and (ii) <italic>string-error rate</italic>, defined as the number of four-digit strings that &#x02013; as a whole &#x02013; were erroneously registered, divided by the total number of strings (i.e., 80 strings), in percent. Error rates were calculated per metric, per condition, per subject. The bar charts show the mean and the standard deviation across subjects. (The standard deviation here is the square root of the unbiased estimator of the variance.)</p>
<p>In order to exclude inter-subject variability, <italic>normalized error rates</italic> were calculated by normalizing the raw scores per metric, per subject, relative to the flat envelope condition (No&#x00398; in Experiment I; <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 in Experiment II). Normalized errors show the <italic>pattern</italic> of change in performance re flat envelope condition.</p>
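The two error metrics and the per-subject normalization can be sketched as follows; the function names and the toy data are illustrative, not taken from the study.

```python
def digit_error_rate(responses, targets):
    """Percent of digits registered incorrectly, position by position."""
    errors = sum(r != t for resp, targ in zip(responses, targets)
                 for r, t in zip(resp, targ))
    return 100.0 * errors / (4 * len(targets))  # four target digits per string

def string_error_rate(responses, targets):
    """Percent of four-digit strings with at least one digit wrong."""
    wrong = sum(resp != targ for resp, targ in zip(responses, targets))
    return 100.0 * wrong / len(targets)

def normalized_rates(raw_by_condition, flat_condition):
    """Per-subject raw rates divided by the flat-envelope condition's rate."""
    ref = raw_by_condition[flat_condition]
    return {cond: rate / ref for cond, rate in raw_by_condition.items()}

# Usage with toy data: one wrong digit out of eight, one wrong string of two.
print(digit_error_rate(["1235", "5678"], ["1234", "5678"]))   # 12.5
print(string_error_rate(["1235", "5678"], ["1234", "5678"]))  # 50.0
```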
<p>The conclusiveness of the experimental data was tested via analysis of variance (ANOVA; see Section <xref ref-type="sec" rid="s2">&#x0201C;Statistical Analysis&#x0201D;</xref>).</p>
</sec>
<sec id="s6">
<title>Experiment I</title>
<p>Figure <xref ref-type="fig" rid="F8">8</xref> shows the error rates (Figure <xref ref-type="fig" rid="F8">8</xref>A) and the normalized error rates (Figure <xref ref-type="fig" rid="F8">8</xref>B) for the Control, No&#x00398;, No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398;, and No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398; conditions. As expected, the intelligibility of the Control stimuli is near perfect [see item 1 of Section <xref ref-type="sec" rid="s3">&#x0201C;Experiment I: peak position coding (PPC) of critical-band envelopes&#x0201D;</xref>]. Comparing No&#x00398; to Control, errors jump from 0 to 7% (digit) and from 0 to 36% (string). For the task at hand (a low perplexity corpus and a limited memory load) such an error rate is high. Re-inserting the Ch&#x00398; signal improves performance. Comparing No&#x00398; to No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398;, errors drop from 7 to 4% (digit), and from 36 to 19% (string). Normalized No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; errors (re No&#x00398;) are 0.45 (digit) and 0.50 (string). Is the cause of this improvement the reactivation of the parsing path, the improved performance of the decoding path (due to extra spectro-temporal information that still exists in Ch&#x00398;), or both?</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p><bold>Experiment I</bold>. Error rates <bold>(A)</bold> and Normalized error rates <bold>(B)</bold> for conditions Control, No&#x00398;, No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; and No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398;. Two error metrics, digit-, and string-error, are used (defined in Section <xref ref-type="sec" rid="s4">&#x0201C;Analysis Procedure&#x0201D;</xref>). Adding Ch&#x00398; or Glb&#x00398; stimulus to the corresponding No&#x00398; stimulus improves intelligibility. Normalized error patterns indicate strong degree of consistency across subjects for the No&#x00398; vs. No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; contrast. A drop in the degree of consistency is noticed for the No&#x00398; vs. No&#x00398;&#x02009;&#x0002B;&#x02009;Glb<bold>&#x00398;</bold> contrast. These observations are quantified in the analysis of variance (ANOVA) presented in Section <xref ref-type="sec" rid="s2">&#x0201C;Statistical Analysis&#x0201D;</xref> (see Table <xref ref-type="table" rid="T1">1</xref>).</p></caption>
<graphic xlink:href="fpsyg-03-00238-g008.tif"/>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>A <italic>post hoc</italic> Tukey/Kramer test, performed independently on each of the data sets behind the figures listed in the left column, and per error-metric</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"/>
<th colspan="2" align="center">Error-metric<hr/></th>
</tr>
<tr>
<th align="left"/>
<th align="left">Digit</th>
<th align="left">String</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Figure <xref ref-type="fig" rid="F8">8</xref>A</td>
<td align="left">No&#x00398;, No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; and No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398; are significantly different from Control</td>
<td align="left">Same</td>
</tr>
<tr>
<td align="left"/>
<td align="left">No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; is significantly different from No&#x00398;</td>
<td colspan="1" align="left"/>
</tr>
<tr>
<td align="left"/>
<td align="left">No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398; has no significant difference from No&#x00398;</td>
<td colspan="1" align="left"/>
</tr>
<tr>
<td align="left"/>
<td align="left">No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398; has no significant difference from No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398;</td>
<td colspan="1" align="left"/>
</tr>
<tr>
<td align="left">Figure <xref ref-type="fig" rid="F8">8</xref>B</td>
<td align="left">Control, No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; and No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398; are significantly different from each other</td>
<td align="left">Same</td>
</tr>
<tr>
<td align="left">Figure <xref ref-type="fig" rid="F9">9</xref></td>
<td align="left">Control, No&#x00398; and Ch&#x00398; are significantly different from each other</td>
<td align="left">Same</td>
</tr>
<tr>
<td align="left">Figure <xref ref-type="fig" rid="F10">10</xref>A</td>
<td align="left"><italic>G</italic>&#x02009;&#x0003D;&#x02009;0,&#x02009;&#x0003D;&#x02009;0.5,&#x02009;&#x0003D;&#x02009;0.8 are significantly different from <italic>G</italic>&#x02009;&#x0003D;&#x02009;1</td>
<td align="left">Same</td>
</tr>
<tr>
<td align="left"/>
<td align="left">No significant difference among <italic>G</italic>&#x02009;&#x0003D;&#x02009;0,&#x02009;&#x0003D;&#x02009;0.5,&#x02009;&#x0003D;&#x02009;0.8</td>
<td colspan="1" align="left"/>
</tr>
<tr>
<td align="left">Figure <xref ref-type="fig" rid="F10">10</xref>B</td>
<td align="left"><italic>G</italic>&#x02009;&#x0003D;&#x02009;0,&#x02009;&#x0003D;&#x02009;0.5 are significantly different from <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.8</td>
<td align="left">Same</td>
</tr>
<tr>
<td align="left"/>
<td align="left">No significant difference among <italic>G</italic>&#x02009;&#x0003D;&#x02009;0,&#x02009;&#x0003D;&#x02009;0.5</td>
<td colspan="1" align="left"/>
</tr>
</tbody>
</table>
</table-wrap>
<p>Figure <xref ref-type="fig" rid="F9">9</xref> provides a partial answer to this question. It shows errors, in percent, for the Control, No&#x00398;, and Ch&#x00398; (alone) conditions, with 0, 7, and 12 (digit) and 0, 36, and 50 (string), respectively. Intelligibility of Ch&#x00398; stimuli is worse than that of No&#x00398; stimuli, yet performance is better than chance, suggesting the existence of acoustic-phonetic information residue in the Ch&#x00398; stimuli, and indicating an imperfect disassociation between the roles of parsing and decoding.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p><bold>Experiment I</bold>. error rates <bold>(A)</bold> and normalized error rates <bold>(B)</bold> for conditions control, No&#x00398;, and Ch&#x00398; (alone). Intelligibility of Ch&#x00398; stimuli is worse than that of No&#x00398; stimuli, yet performance is better than chance, suggesting the existence of acoustic-phonetic information residue in the Ch&#x00398; stimuli, and indicating an imperfect disassociation between the roles of parsing and decoding.</p></caption>
<graphic xlink:href="fpsyg-03-00238-g009.tif"/>
</fig>
<p>In contrast, Glb&#x00398; stimuli carry minimal (if any) acoustic-phonetic information. Comparing No&#x00398; to No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398; (Figure <xref ref-type="fig" rid="F8">8</xref>), errors drop from 7 to 5% (digit), and from 36 to 29% (string); the corresponding normalized errors (re No&#x00398;) are 0.78 (digit) and 0.82 (string). As discussed in item 5 of Section <xref ref-type="sec" rid="s3">&#x0201C;Experiment I: peak position coding (PPC) of critical-band envelopes,&#x0201D;</xref> this improvement is exclusively due to the recovered function of syllabic parsing.</p>
<p>Three observations are noteworthy. First, note the similarity in normalized error patterns between the digit- and string-error metrics. Second, the standard deviation bars in the normalized error patterns indicate strong degree of consistency across subjects for the No&#x00398; vs. No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; contrast. And third, a drop in the degree of consistency is noticed for the No&#x00398; vs. No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398; contrast. These observations will be quantified in the ANOVA presented in Section <xref ref-type="sec" rid="s2">&#x0201C;Statistical Analysis.&#x0201D;</xref></p>
</sec>
<sec id="s7">
<title>Experiment II</title>
<p>Figure <xref ref-type="fig" rid="F10">10</xref> shows the error rates (Figure <xref ref-type="fig" rid="F10">10</xref>A) and the normalized error rates (Figure <xref ref-type="fig" rid="F10">10</xref>B) for the conditions <italic>G</italic>&#x02009;&#x0003D;&#x02009;0,&#x02009;&#x0003D;&#x02009;1,&#x02009;&#x0003D;&#x02009;0.8, and&#x02009;&#x0003D;&#x02009;0.5. As expected, the intelligibility of the <italic>G</italic>&#x02009;&#x0003D;&#x02009;0 stimuli is near perfect. Comparing <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 to <italic>G</italic>&#x02009;&#x0003D;&#x02009;0, errors jump from 0 to 4% (digit), and from 1 to 24% (string). For the task at hand (a low perplexity corpus and a limited memory load) such an error rate is high. Reducing the gain (i.e., reinstating critical-band envelope fluctuations) markedly improves intelligibility; comparing <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 to <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.8 (or <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5), errors drop from 4 to 1 (or 0)% (digit), and from 24 to 7 (or 1)% (string). Normalized errors (re <italic>G</italic>&#x02009;&#x0003D;&#x02009;1) are 0.07, 0.33, and 0.08 (digit), and 0.05, 0.33, and 0.06 (string), for <italic>G</italic>&#x02009;&#x0003D;&#x02009;0,&#x02009;&#x0003D;&#x02009;0.8, and&#x02009;&#x0003D;&#x02009;0.5, respectively. As discussed in property 4 of Section <xref ref-type="sec" rid="s5">&#x0201C;Experiment II: infinite-clipping (InfC) of critical-band signals,&#x0201D;</xref> this improvement is exclusively due to the recovered function of syllabic parsing. Note the consistency of normalized error patterns across subjects, and the similarity in normalized error patterns between the digit- and string-error metrics. This observation is quantified in the ANOVA presented next.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p><bold>Experiment II</bold>. Error rates <bold>(A)</bold> and normalized error rates <bold>(B)</bold> for conditions <italic>G</italic>&#x02009;&#x0003D;&#x02009;0,&#x02009;&#x0003D;&#x02009;1,&#x02009;&#x0003D;&#x02009;0.8, and&#x02009;&#x0003D;&#x02009;0.5. Two error metrics, digit-, and string-error, are used (defined in Section <xref ref-type="sec" rid="s4">&#x0201C;Analysis Procedure&#x0201D;</xref>). Reducing gain (i.e., reinstating critical-band envelope fluctuations) markedly improves intelligibility. Improvement is due exclusively to the recovered function of syllabic parsing (see property 4 of Section <xref ref-type="sec" rid="s5">&#x0201C;Experiment II: infinite-clipping (InfC) of critical-band signals&#x0201D;</xref>). Note the consistency of normalized error patterns across subjects. These observations are quantified in the analysis of variance (ANOVA) presented in Section <xref ref-type="sec" rid="s2">&#x0201C;Statistical Analysis&#x0201D;</xref> (see Table <xref ref-type="table" rid="T1">1</xref>).</p></caption>
<graphic xlink:href="fpsyg-03-00238-g010.tif"/>
</fig>
</sec>
<sec id="s2">
<title>Statistical analysis</title>
<p>An ANOVA was used to quantify the statistical significance of the data illustrated in Figures <xref ref-type="fig" rid="F8">8</xref>&#x02013;<xref ref-type="fig" rid="F10">10</xref>. Three factors were used, <italic>system-type</italic> (PPC vs. InfC), <italic>condition</italic> ([Control/<italic>G</italic>&#x02009;&#x0003D;&#x02009;0] vs. [No&#x00398;/<italic>G</italic>&#x02009;&#x0003D;&#x02009;1] vs. [No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398;/<italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5] vs. [No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398;/<italic>G</italic>&#x02009;&#x0003D;&#x02009;0.8]) and <italic>error-metric</italic> (digit vs. string). Note that the variables in the condition factor, in particular [No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398;/<italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5] and [No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398;/<italic>G</italic>&#x02009;&#x0003D;&#x02009;0.8], were lumped somewhat arbitrarily.</p>
<p>Mauchly&#x02019;s test for sphericity revealed that assumptions of sphericity were not violated. The three-way ANOVA revealed that there is a significant main effect of each of the factors: (i) system-type (<italic>F</italic>&#x02009;&#x0003D;&#x02009;49.08, <italic>p</italic>&#x02009;&#x0003C;&#x02009;0.0001), (ii) condition (<italic>F</italic>&#x02009;&#x0003D;&#x02009;46.19, <italic>p</italic>&#x02009;&#x0003C;&#x02009;0.0001), and (iii) error-metric (<italic>F</italic>&#x02009;&#x0003D;&#x02009;118.19, <italic>p</italic>&#x02009;&#x0003C;&#x02009;0.0001). It also revealed that there is a significant interaction between each of the pairs: (i) system-type and condition (<italic>F</italic>&#x02009;&#x0003D;&#x02009;7.05, <italic>p</italic>&#x02009;&#x0003D;&#x02009;0.0002), (ii) system-type and metric (<italic>F</italic>&#x02009;&#x0003D;&#x02009;22.36, <italic>p</italic>&#x02009;&#x0003C;&#x02009;0.0001), and (iii) condition and metric (<italic>F</italic>&#x02009;&#x0003D;&#x02009;21.27, <italic>p</italic>&#x02009;&#x0003C;&#x02009;0.0001). A <italic>post hoc</italic> Tukey/Kramer test, performed independently on each of the data sets and per error-metric, revealed the relationships summarized in Table <xref ref-type="table" rid="T1">1</xref>.</p>
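As a hedged illustration only, the sketch below runs a one-way omnibus test with Bonferroni-corrected pairwise comparisons on invented per-subject rates. The study itself used a three-way repeated-measures ANOVA and a Tukey/Kramer post hoc test, which require the full subject-by-condition design; the synthetic means below only loosely mimic the reported values.

```python
from itertools import combinations

import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(0)
# Hypothetical per-subject digit-error rates (10 subjects per condition).
conditions = {
    "G=0":   rng.normal(0.5, 0.5, 10),
    "G=1":   rng.normal(4.0, 1.0, 10),
    "G=0.8": rng.normal(1.0, 0.5, 10),
    "G=0.5": rng.normal(0.5, 0.5, 10),
}

# Omnibus test across conditions (one-way only; a stand-in for the
# three-way repeated-measures ANOVA reported in the text).
F, p = f_oneway(*conditions.values())
print(f"F = {F:.2f}, p = {p:.4g}")

# Bonferroni-corrected pairwise t-tests as a stand-in for Tukey/Kramer.
pairs = list(combinations(conditions, 2))
for a, b in pairs:
    _, p_pair = ttest_ind(conditions[a], conditions[b])
    verdict = "significant" if p_pair * len(pairs) < 0.05 else "n.s."
    print(f"{a} vs {b}: {verdict}")
```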
<p>The ANOVA reinforces the trends observed in Sections <xref ref-type="sec" rid="s6">&#x0201C;Experiment I&#x0201D;</xref> and <xref ref-type="sec" rid="s7">&#x0201C;Experiment II,&#x0201D;</xref> i.e., that the intelligibility of stimuli with flat critical-band envelopes is poor, and that the addition of extra information, restricted to the input syllabic rhythm alone, significantly improves intelligibility.</p>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>In the larger context, this study should be viewed as one more step toward the validation of a model of speech perception (Tempo) with a brain-rhythms function at the core. The model hypothesizes that the speech decoding process is controlled by a time-varying hierarchical window structure locked to the input pseudo-rhythm; and that the neuronal realization of the window structure is in the form of a cascade of oscillators, with theta as the master. Do behavioral data exist in support of this hypothesis? We are aware of one set of such data, Ghitza and Greenberg (<xref ref-type="bibr" rid="B8">2009</xref>), upon which Tempo was developed.</p>
<p>The present study focuses on providing psychophysical evidence for the presumed role of theta. The methodology exploited here was to compare the intelligibility of two distinct classes of stimuli which, ideally, would have the following properties. Class-I stimuli would carry acoustic-phonetic information sufficient for the function of the decoding path but, at the same time, would have zero information about the input rhythm, to neutralize the parsing path; members in this class are referred to as stimuli with &#x0201C;flat&#x0201D; critical-band envelopes. Class-II stimuli, termed stimuli with &#x0201C;fluctuating&#x0201D; critical-band envelopes, are the sum of two signals: the signal from the first class plus a signal that would carry information restricted to the input rhythm alone (i.e., without any acoustic-phonetic content), aiming at reactivating the parsing path. Two signal processing systems were used to generate these stimuli, the PPC system [see Section <xref ref-type="sec" rid="s3">&#x0201C;Experiment I: peak position coding (PPC) of critical-band envelopes&#x0201D;</xref>], and the InfC system [see Section <xref ref-type="sec" rid="s5">&#x0201C;Experiment II: infinite-clipping (InfC) of critical-band signals&#x0201D;</xref>]. The psychophysical task &#x02013; to listen to a seven-digit sentence (a telephone number) and to register the last four-digit sequence &#x02013; is one with low corpus perplexity and with limited memory load.</p>
<sec>
<title>Full-band vs. critical-bands</title>
<p>When considering the possible role of the temporal envelope of speech in speech perception, the term &#x0201C;envelope&#x0201D; often refers to the envelope of the waveform itself, i.e., of the full-band signal (e.g., Ahissar et al., <xref ref-type="bibr" rid="B3">2001</xref>; Giraud and Poeppel, <xref ref-type="bibr" rid="B9">2012</xref>; Zion-Golumbic et al., <xref ref-type="bibr" rid="B16">2012</xref>). In this study, the signal processing manipulations are performed on the critical-band outputs. The sole acoustic input available to the brain is, by necessity, the information conveyed by the auditory nerve. In the context of studying cortical mechanisms for speech perception, a more insightful approach would be to consider the information available to the brain at the cochlear output level. To exemplify the benefit of such an approach, consider the difference between the time-frequency representations caricatured in Figures <xref ref-type="fig" rid="F3">3</xref>B,C. Figure <xref ref-type="fig" rid="F3">3</xref>B depicts a minimalistic representation of the input rhythm at the cochlear output while Figure <xref ref-type="fig" rid="F3">3</xref>C depicts the input rhythm at the waveform level. If the hypothesized role of theta (i.e., tracking the input syllabic rhythm) is correct, does the neuronal mechanism that tracks the input rhythm (e.g., a neuronal PLL circuitry; Ahissar et al., <xref ref-type="bibr" rid="B2">1997</xref>) exploit the information embedded in the curvature of the pulse contours of Figure <xref ref-type="fig" rid="F3">3</xref>B?</p>
</sec>
<sec id="s9">
<title>The PPC system vs. the InfC system</title>
<p>Chronologically, the PPC system was developed first. Conceptually, it aimed at improving the disassociation between the roles of parsing and decoding compared to the isolation provided by the strategy used by Chait et al. (<xref ref-type="bibr" rid="B4">2005</xref>) and Saoud et al. (<xref ref-type="bibr" rid="B13">2012</xref>). The InfC system emerged later, as a solution to shortcomings of the PPC system.</p>
<p>With the PPC system, Class-I stimuli are the No&#x00398; stimuli, composed of critical-band envelopes with null energy inside the 2- to 9-Hz modulation-frequency band. As such, these signals are stripped not only of input-rhythm information but also of a significant amount of acoustic-phonetic information (e.g., Houtgast and Steeneken, <xref ref-type="bibr" rid="B10">1985</xref>; Drullman et al., <xref ref-type="bibr" rid="B5">1994</xref>). One consequence of this deficiency is that the PPC strategy cannot be generalized to tasks with higher perplexity (see Section <xref ref-type="sec" rid="s8">&#x0201C;Perplexity of Corpus&#x0201D;</xref> below). With the InfC system, on the other hand, the acoustic-phonetic information inside non-zero signal intervals for the <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 stimuli, as seen at the listener&#x02019;s cochlear output, is the same as for the corresponding <italic>G</italic>&#x02009;&#x0003D;&#x02009;0 stimuli [see property 3 of Section <xref ref-type="sec" rid="s5">&#x0201C;Experiment II: infinite-clipping (InfC) of critical-band signals&#x0201D;</xref>], thus allowing generalization to tasks with higher perplexity.</p>
<p>With the PPC system, the Class-II stimuli are the No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; stimuli, which are significantly more intelligible than the corresponding No&#x00398; stimuli. The existence of residual acoustic-phonetic information in the Ch&#x00398; signal [item 5 of Section <xref ref-type="sec" rid="s3">&#x0201C;Experiment I: peak position coding (PPC) of critical-band envelopes&#x0201D;</xref>] implies that association between the functions of parsing and decoding still exists, thus preventing a decisive validation of the role of input rhythm in syllabic parsing <italic>per se</italic>. With the InfC system, on the other hand, the acoustic-phonetic information inside non-zero signal intervals, as seen at the listener&#x02019;s cochlear output, remains the same for all values of <italic>G</italic>, i.e., for any prescribed degree of temporal envelope fluctuations [see property 3 of Section <xref ref-type="sec" rid="s5">&#x0201C;Experiment II: infinite-clipping (InfC) of critical-band signals&#x0201D;</xref>], implying a clear disassociation between the roles of parsing and decoding.</p>
</sec>
<sec id="s8">
<title>Perplexity of corpus</title>
<p>One drawback of using semantically plausible material (such as TIMIT or the Harvard-IEEE sentences) in perceptual studies that measure word error rate is the ability of listeners to guess some of the words using contextual information. Semantically unpredictable sentences (SUS) make it more difficult for the listener to decode individual words on the basis of semantic context. For this reason, Ghitza and Greenberg (<xref ref-type="bibr" rid="B8">2009</xref>) used SUS material in which the sentences conformed to standard rules of English grammar but were composed of word combinations that are semantically anomalous (e.g., &#x0201C;Where does the cost feel the low night?&#x0201D; and &#x0201C;The vast trade dealt the task&#x0201D;). The SUS is a high perplexity corpus.</p>
<p>In preliminary trials with the PPC system an attempt was made to use the SUS corpus. Informal listening revealed that, because of the extensive removal of acoustic-phonetic information in generating the No&#x00398; stimuli (see Section <xref ref-type="sec" rid="s9">&#x0201C;The PPC System vs. the InfC System&#x0201D;</xref>), the No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; stimuli became hardly intelligible. This finding led to the selection of a seven-digit string corpus for this study &#x02013; a low perplexity corpus but without context. On the other hand, informal listening to SUS stimuli generated by the InfC system showed that for a range of <italic>G</italic> values the stimuli are intelligible. Extending Experiment II to SUS material is beyond the scope of this study.</p>
</sec>
<sec>
<title>Interpretations</title>
<p>The behavioral data presented here show that intelligibility is improved by reinstating the input rhythm, either by adding a Ch&#x00398; (or a Glb&#x00398;, to a lesser degree) signal to the No&#x00398; signal (Experiment I), or by reducing the Gain parameter from <italic>G</italic>&#x02009;&#x0003D;&#x02009;1 to <italic>G</italic>&#x02009;&#x0003D;&#x02009;0.5 or&#x02009;&#x0003D;&#x02009;0.8 (Experiment II). What neuronal mechanism is capable of exploiting the re-inserted information in such an effective manner?</p>
<p>Our interpretation of the data argues for a crucial role of theta in syllabic parsing. We suggest that for stimuli with flat critical-band envelopes the tracking mechanism of the neuronal theta oscillator is deactivated. Consequently, the hierarchical window structure is situated in idle mode, i.e., not synchronized with the input, resulting in deterioration in performance. Reinstating the input-rhythm information (either by the PPC or by the InfC system) revives theta tracking hence the recovery of the synchronization between the window structure and the input, resulting in improved performance. This interpretation is consistent with the capability of the Tempo model to account for the behavioral data of Ghitza and Greenberg (<xref ref-type="bibr" rid="B8">2009</xref>), as detailed in Ghitza (<xref ref-type="bibr" rid="B7">2011</xref>).</p>
<p>Another possible interpretation of the data is in line with the classical view that assumes a direct role, <italic>in the decoding process per se</italic>, of neural onset responses to acoustic edges such as CV boundaries. According to this view phones are extracted from waveform segments (dyads) &#x0201C;centered&#x0201D; at markers triggered by acoustic edges. (It is worth recalling that a dyad is the acoustic reflection of the dynamic gesture of the articulators while moving from one phone to the next.) This view is supported by psychophysical reasoning, inferred from acoustics, on the importance of acoustically abrupt landmarks to speech perception (e.g., Stevens, <xref ref-type="bibr" rid="B14">2002</xref>). Indeed, re-inserting the input rhythm with the PPC system produces signals with acoustic edges (No&#x00398;&#x02009;&#x0002B;&#x02009;Ch&#x00398; and No&#x00398;&#x02009;&#x0002B;&#x02009;Glb&#x00398;; Figures <xref ref-type="fig" rid="F4">4</xref>F,G), which enable activation of neuronal circuitry sensitive to edges. Yet, two observations pose a challenge to the hypothesis that these edges are part of the decoding process: (i) re-inserting Glb&#x00398; signals, with pulses situated either at CV boundaries or at mid vowels, resulted in a similar level of performance, and (ii) reinstating temporal fluctuations by using the InfC system produces signals with a marginal presence of acoustic edges (see Figures <xref ref-type="fig" rid="F7">7</xref>D,E), yet with high intelligibility. These observations imply that the <italic>precise</italic> location of neural responses triggered by acoustic edges has little or no effect on intelligibility.</p>
<p>We suggest a different role for acoustic landmarks, in line with our interpretation of the data, in which they are part of the parsing (rather than decoding) process. We argue that neural responses triggered by acoustic landmarks serve as input to the mechanism that tracks the input syllabic rhythm. Note that this hypothesis includes all classes of acoustic landmarks (e.g., vocalic landmarks, glide landmarks, acoustically abrupt landmarks; Stevens, <xref ref-type="bibr" rid="B14">2002</xref>). Given the prominence of vocalic speech segments in the presence of environmental noise, it may be that vocalic landmarks are more important than others in securing a reliable theta tracking. If so, a theta cycle is robustly aligned with [Vowel]&#x02013;[Consonant-cluster]&#x02013;[Vowel] syllables, and phones are decoded in temporal windows defined by the cycles of the beta, entrained to theta (Ghitza, <xref ref-type="bibr" rid="B7">2011</xref>).</p>
</sec>
</sec>
<sec>
<title>Conclusion</title>
<p>Intelligibility (in terms of digit- and string-error rates) of the last four digits in seven-digit sequences was measured as a function of judiciously manipulated changes in critical-band envelope flatness, while attending to the disassociation between parsing and decoding. We found that the intelligibility of stimuli with flat critical-band envelopes is poor. The addition of extra information, restricted to the input syllabic rhythm, markedly improves intelligibility. We suggest that flattening the critical-band envelopes prevents the theta oscillator from tracking the speech pseudo-rhythm, hence disrupting the function of syllabic parsing. We argue that by reinstating the input-rhythm information the tracking capability of the theta oscillator is restored, hence the recovery of synchronization between the input and the hierarchical window structure, which governs the decoding process.</p>
<p>In conclusion, this study provides empirical support for the hypothesized role of theta in syllabic parsing. It provides further support for the hypothesis that neuronal oscillations are important in processing and decoding spoken language. No hypothesis about internal physiological processes can be fully validated using only psychophysical methods. Nevertheless, the perceptual consequences of the acoustic manipulations used in this study suggest a potential role for neuronal oscillations in speech perception and establish a behavioral context for future brain-imaging experiments using comparable speech material.</p>
</sec>
<sec>
<title>Conflict of Interest Statement</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at <uri xlink:href="http://www.frontiersin.org/Language_Sciences/10.3389/fpsyg.2012.00238/abstract">http://www.frontiersin.org/Language_Sciences/10.3389/fpsyg.2012.00238/abstract</uri></p>
</sec>
</body>
<back>
<ack>
<p>This study was funded by a research grant from the Air Force Office of Scientific Research. I would like to thank Willard Larkin and the two reviewers for providing detailed and constructive suggestions, which made the revised version stronger than the original. I would also like to thank Keith Doelling for providing valuable assistance with the statistical analyses.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Ahissar</surname> <given-names>E.</given-names></name> <name><surname>Ahissar</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>&#x0201C;Processing of the temporal envelope of speech,&#x0201D;</article-title> in <source>The Auditory Cortex. A Synthesis of Human and Animal Research</source>, Chap. 18, eds <person-group person-group-type="editor"><name><surname>Konig</surname> <given-names>R.</given-names></name> <name><surname>Heil</surname> <given-names>P.</given-names></name> <name><surname>Bundinger</surname> <given-names>E.</given-names></name> <name><surname>Scheich</surname> <given-names>H.</given-names></name></person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>Lawrence Erlbaum</publisher-name>), <fpage>295</fpage>&#x02013;<lpage>313</lpage>.</citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahissar</surname> <given-names>E.</given-names></name> <name><surname>Haidarliu</surname> <given-names>S.</given-names></name> <name><surname>Zacksenhouse</surname> <given-names>M.</given-names></name></person-group> (<year>1997</year>). <article-title>Decoding temporally encoded sensory input by cortical oscillations and thalamic phase comparators</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A.</source> <volume>94</volume>, <fpage>11633</fpage>&#x02013;<lpage>11638</lpage>.<pub-id pub-id-type="doi">10.1073/pnas.94.21.11633</pub-id><pub-id pub-id-type="pmid">9326662</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahissar</surname> <given-names>E.</given-names></name> <name><surname>Nagarajan</surname> <given-names>S.</given-names></name> <name><surname>Ahissar</surname> <given-names>M.</given-names></name> <name><surname>Protopapas</surname> <given-names>A.</given-names></name> <name><surname>Mahncke</surname> <given-names>H.</given-names></name> <name><surname>Merzenich</surname> <given-names>M. M.</given-names></name></person-group> (<year>2001</year>). <article-title>Speech comprehension is correlated with temporal response patterns recorded from auditory cortex</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A.</source> <volume>98</volume>, <fpage>13367</fpage>&#x02013;<lpage>13372</lpage>.<pub-id pub-id-type="doi">10.1073/pnas.221461598</pub-id><pub-id pub-id-type="pmid">11698688</pub-id></citation></ref>
<ref id="B4"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Chait</surname> <given-names>M.</given-names></name> <name><surname>Greenberg</surname> <given-names>S.</given-names></name> <name><surname>Arai</surname> <given-names>T.</given-names></name> <name><surname>Simon</surname> <given-names>J.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2005</year>). <article-title>Two time scales in speech processing</article-title>. <conf-name>Paper Presented at ISCA Workshop on Plasticity in Speech Perception</conf-name>, <conf-loc>London</conf-loc>.</citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Drullman</surname> <given-names>R.</given-names></name> <name><surname>Festen</surname> <given-names>J. M.</given-names></name> <name><surname>Plomp</surname> <given-names>R.</given-names></name></person-group> (<year>1994</year>). <article-title>Effect of temporal envelope smearing on speech reception</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>95</volume>, <fpage>1053</fpage>&#x02013;<lpage>1106</lpage>.<pub-id pub-id-type="doi">10.1121/1.408825</pub-id><pub-id pub-id-type="pmid">8132899</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghitza</surname> <given-names>O.</given-names></name></person-group> (<year>2001</year>). <article-title>On the upper cutoff frequency of the auditory critical-band envelope detectors in the context of speech perception</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>110</volume>, <fpage>1628</fpage>&#x02013;<lpage>1640</lpage>.<pub-id pub-id-type="doi">10.1121/1.1396325</pub-id><pub-id pub-id-type="pmid">11572372</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghitza</surname> <given-names>O.</given-names></name></person-group> (<year>2011</year>). <article-title>Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm</article-title>. <source>Front. Psychol.</source> <volume>2</volume>:<fpage>130</fpage>.<pub-id pub-id-type="doi">10.3389/fpsyg.2011.00130</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghitza</surname> <given-names>O.</given-names></name> <name><surname>Greenberg</surname> <given-names>S.</given-names></name></person-group> (<year>2009</year>). <article-title>On the possible role of brain rhythms in speech perception: intelligibility of time-compressed speech with periodic and aperiodic insertions of silence</article-title>. <source>Phonetica</source> <volume>66</volume>, <fpage>113</fpage>&#x02013;<lpage>126</lpage>.<pub-id pub-id-type="doi">10.1159/000208934</pub-id><pub-id pub-id-type="pmid">19390234</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giraud</surname> <given-names>A. L.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2012</year>). <article-title>Cortical oscillations and speech processing: emerging computational principles and operations</article-title>. <source>Nat. Neurosci.</source> <volume>15</volume>, <fpage>511</fpage>&#x02013;<lpage>517</lpage>.<pub-id pub-id-type="doi">10.1038/nn.3063</pub-id><pub-id pub-id-type="pmid">22426255</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Houtgast</surname> <given-names>T.</given-names></name> <name><surname>Steeneken</surname> <given-names>H. J. M.</given-names></name></person-group> (<year>1985</year>). <article-title>A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>77</volume>, <fpage>1069</fpage>&#x02013;<lpage>1077</lpage>.<pub-id pub-id-type="doi">10.1121/1.392224</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Licklider</surname> <given-names>J. C. R.</given-names></name> <name><surname>Pollack</surname> <given-names>I. P.</given-names></name></person-group> (<year>1948</year>). <article-title>Effects of differentiation, integration and infinite peak clipping upon the intelligibility of speech</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>20</volume>, <fpage>42</fpage>&#x02013;<lpage>51</lpage>.<pub-id pub-id-type="doi">10.1121/1.1916995</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2003</year>). <article-title>The analysis of speech in different temporal integration windows: cerebral lateralization as &#x02018;asymmetric sampling in time.&#x02019;</article-title> <source>Speech Commun.</source> <volume>41</volume>, <fpage>245</fpage>&#x02013;<lpage>255</lpage>.<pub-id pub-id-type="doi">10.1016/S0167-6393(02)00107-3</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saoud</surname> <given-names>H.</given-names></name> <name><surname>Josse</surname> <given-names>G.</given-names></name> <name><surname>Bertasi</surname> <given-names>E.</given-names></name> <name><surname>Truy</surname> <given-names>E.</given-names></name> <name><surname>Chait</surname> <given-names>M.</given-names></name> <name><surname>Giraud</surname> <given-names>A. L.</given-names></name></person-group> (<year>2012</year>). <article-title>Brain-speech alignment enhances auditory cortical responses and speech perception</article-title>. <source>J. Neurosci.</source> <volume>32</volume>, <fpage>275</fpage>&#x02013;<lpage>281</lpage>.<pub-id pub-id-type="doi">10.1523/JNEUROSCI.3970-11.2012</pub-id><pub-id pub-id-type="pmid">22219289</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevens</surname> <given-names>K. N.</given-names></name></person-group> (<year>2002</year>). <article-title>Toward a model for lexical access based on acoustic landmarks and distinctive features</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>111</volume>, <fpage>1872</fpage>&#x02013;<lpage>1891</lpage>.<pub-id pub-id-type="doi">10.1121/1.1479021</pub-id><pub-id pub-id-type="pmid">12002871</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Voelcker</surname> <given-names>H. B.</given-names></name></person-group> (<year>1966</year>). <article-title>Towards a unified theory of modulation. I. Phase envelope relationships</article-title>. <source>Proc. IEEE</source> <volume>54</volume>, <fpage>340</fpage>&#x02013;<lpage>354</lpage>.<pub-id pub-id-type="doi">10.1109/PROC.1966.4695</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zion-Golumbic</surname> <given-names>E. M.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name> <name><surname>Schroeder</surname> <given-names>C. E.</given-names></name></person-group> (<year>2012</year>). <article-title>Temporal context in speech processing and attentional stream selection: a behavioral and neural perspective</article-title>. <source>Brain Lang.</source><pub-id pub-id-type="doi">10.1016/j.bandl.2011.12.010</pub-id><pub-id pub-id-type="pmid">22285024</pub-id></citation></ref>
</ref-list>
</back>
</article>
