<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Neurosci.</journal-id>
<journal-title>Frontiers in Computational Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5188</issn>
<publisher>
<publisher-name>Frontiers Research Foundation</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fncom.2010.00126</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Demixing Population Activity in Higher Cortical Areas</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Machens</surname> <given-names>Christian K.</given-names></name>
<xref ref-type="author-notes" rid="fn001">&#x0002A;</xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Group for Neural Theory, INSERM Unit&#x000E9; 960, D&#x000E9;partement d&#x00027;Etudes Cognitives, &#x000C9;cole Normale Sup&#x000E9;rieure</institution> <country>Paris, France</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Jakob H. Macke, University College London, UK</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Byron Yu, Carnegie Mellon University, USA; Satish Iyengar, University of Pittsburgh, USA</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Christian K. Machens, D&#x000E9;partement d&#x00027;Etudes Cognitives, &#x000C9;cole Normale Sup&#x000E9;rieure, Paris, France. e-mail: <email>christian.machens&#x00040;ens.fr</email></p></fn>
</author-notes>
<pub-date pub-type="epreprint">
<day>19</day>
<month>11</month>
<year>2009</year>
</pub-date>
<pub-date pub-type="epub">
<day>06</day>
<month>10</month>
<year>2010</year>
</pub-date>
<pub-date pub-type="collection">
<year>2010</year>
</pub-date>
<volume>4</volume>
<elocation-id>126</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>11</month>
<year>2009</year>
</date>
<date date-type="accepted">
<day>27</day>
<month>07</month>
<year>2010</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2010 Machens.</copyright-statement>
<copyright-year>2010</copyright-year>
<license license-type="open-access" xlink:href="http://www.frontiersin.org/licenseagreement"><p>This is an open-access article subject to an exclusive license agreement between the authors and the Frontiers Research Foundation, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.</p></license>
</permissions>
<abstract>
<p>Neural responses in higher cortical areas often display a baffling complexity. In animals performing behavioral tasks, single neurons will typically encode several parameters simultaneously, such as stimuli, rewards, or decisions. When dealing with this large heterogeneity of responses, cells are conventionally classified into separate response categories using various statistical tools. However, this classical approach usually fails to account for the distributed nature of representations in higher cortical areas. Alternatively, principal component analysis (PCA) or related techniques can be employed to reduce the complexity of a data set while retaining the distributed aspect of the population activity. These methods, however, fail to explicitly extract the task parameters from the neural responses. Here we suggest a coordinate transformation that aims to address these problems by combining the advantages of both methods. Our basic insight is that variance in neural firing rates can have different origins (such as changes in a stimulus, a reward, or the passage of time), and that, instead of lumping these sources together, as PCA does, we need to treat them separately. We present a method that seeks an orthogonal coordinate transformation such that the variance captured from different sources falls into orthogonal subspaces and is maximized within these subspaces. Using simulated examples, we show how this approach can be used to demix heterogeneous neural responses. Our method may help to lift the fog of response heterogeneity in higher cortical areas.</p>
</abstract>
<kwd-group>
<kwd>prefrontal cortex</kwd>
<kwd>population code</kwd>
<kwd>principal component analysis</kwd>
<kwd>multi-electrode recordings</kwd>
<kwd>blind source separation</kwd>
</kwd-group>
<counts>
<fig-count count="3"/>
<table-count count="0"/>
<equation-count count="42"/>
<ref-count count="35"/>
<page-count count="8"/>
<word-count count="7646"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction">
<title>Introduction</title>
<p>Higher-order cortical areas such as the prefrontal cortex receive and integrate information from many other areas of the brain. The activity of neurons in these areas often reflects this mix of influences. Typical neural responses are shaped both by the internal dynamics of these systems and by various external events such as the perception of a stimulus or a reward (Rao et al., <xref ref-type="bibr" rid="B25">1997</xref>; Romo et al., <xref ref-type="bibr" rid="B26">1999</xref>; Brody et al., <xref ref-type="bibr" rid="B5">2003</xref>; Averbeck et al., <xref ref-type="bibr" rid="B1">2006</xref>; Feierstein et al., <xref ref-type="bibr" rid="B9">2006</xref>; Gold and Shadlen, <xref ref-type="bibr" rid="B11">2007</xref>; Seo et al., <xref ref-type="bibr" rid="B29">2009</xref>). As a result, neural responses are extremely complex and heterogeneous, even in animals performing relatively simple tasks such as stimulus&#x02013;response associations (Gold and Shadlen, <xref ref-type="bibr" rid="B11">2007</xref>).</p>
<p>To make sense of these data, researchers typically seek to relate the firing rate of a neuron to one of various experimentally controlled task parameters, such as a sensory stimulus, a reward, or a decision that the animal makes. To this end, a number of statistical tools are employed, such as regression (Romo et al., <xref ref-type="bibr" rid="B27">2002</xref>; Brody et al., <xref ref-type="bibr" rid="B5">2003</xref>; Sugrue et al., <xref ref-type="bibr" rid="B32">2004</xref>; Kiani and Shadlen, <xref ref-type="bibr" rid="B16">2009</xref>; Seo et al., <xref ref-type="bibr" rid="B29">2009</xref>), signal detection theory (Feierstein et al., <xref ref-type="bibr" rid="B9">2006</xref>; Kepecs et al., <xref ref-type="bibr" rid="B15">2008</xref>), or discriminant analysis (Rao et al., <xref ref-type="bibr" rid="B25">1997</xref>). The population response is then characterized by quantifying how each neuron in the population responds to a particular task parameter. Subsequently, neurons can be assigned to different (possibly overlapping) response categories, and population responses can be constructed by averaging the time-varying firing rates within such a category.</p>
<p>This classical, single-cell-based approach to electrophysiological population data has been quite successful in clarifying what information neurons in higher-order cortical areas represent. However, the approach rarely gives a complete account of the recorded activity at the population level. For instance, many interesting features of the population response may go unnoticed if they have not been explicitly looked for. Furthermore, the strongly distributed nature of the population response, in which individual neurons can respond to several task parameters at once, is often obscured.</p>
<p>Principal component analysis (PCA) and other dimensionality reduction techniques seek to alleviate these problems by providing methods that summarize neural activity at the population level (Nicolelis et al., <xref ref-type="bibr" rid="B23">1995</xref>; Friedrich and Laurent, <xref ref-type="bibr" rid="B10">2001</xref>; Zacksenhouse and Nemets, <xref ref-type="bibr" rid="B35">2008</xref>; Yu et al., <xref ref-type="bibr" rid="B34">2009</xref>; Machens et al., <xref ref-type="bibr" rid="B17">2010</xref>). However, such &#x0201C;unsupervised&#x0201D; techniques will usually neglect information about the relevant task variables. While the methods do provide a succinct and complete description of the population response, the description may yield only limited insights into how different task parameters are represented in the population of neurons.</p>
<p>In this paper, we propose an exploratory data analysis method that seeks to maintain the major benefits of PCA while also extracting the relevant task variables from the data. The primary goal of our method is to improve on dimensionality reduction techniques by explicitly taking knowledge about task parameters into account. The method has previously been applied to data from the prefrontal cortex to separate stimulus- from time-related activities (Machens et al., <xref ref-type="bibr" rid="B17">2010</xref>). Here, we describe the method in greater detail, derive it from first principles, investigate its performance under noise, and generalize it to more than two task parameters. Our hope is that this method provides a better visualization of a given data set, thereby yielding new insights into the function of higher-order areas. We will first explain the main ideas in the context of a simple example, then show how these ideas can be generalized, and finally discuss some caveats and limitations of our approach.</p>
</sec>
<sec>
<title>Results</title>
<sec>
<title>Response heterogeneity through linear mixing</title>
<p>Recordings from higher-order areas in awake behaving animals often yield a large variety of neural responses (see e.g., Miller, <xref ref-type="bibr" rid="B19">1999</xref>; Churchland and Shenoy, <xref ref-type="bibr" rid="B6">2007</xref>; Jun et al., <xref ref-type="bibr" rid="B14">2010</xref>; Machens et al., <xref ref-type="bibr" rid="B17">2010</xref>). These observations at the level of individual cells could imply a population-level response so intricate that no simplified description exists. Alternatively, the large heterogeneity of responses may be the result of a simple mixing procedure. For instance, response variety arises naturally if the responses of individual neurons are random, linear mixtures of a few generic response components (see e.g., Eliasmith and Anderson, <xref ref-type="bibr" rid="B8">2003</xref>).</p>
<p>To illustrate this insight, we will construct a simple toy model. Imagine an animal that performs a two-alternative forced-choice task (Newsome et al., <xref ref-type="bibr" rid="B22">1989</xref>; Uchida and Mainen, <xref ref-type="bibr" rid="B33">2003</xref>). In each trial of such a task, the animal receives a sensory stimulus, <italic>s</italic>, and then makes a binary decision, <italic>d</italic>, based on whether <italic>s</italic> falls into one of two response categories. If the animal decides correctly, it receives a reward. We will assume that the activity of the neurons in our toy model depends only on the stimulus <italic>s</italic> and the decision <italic>d</italic>.</p>
<p>To obtain response heterogeneity, we construct the response of each neuron as a random, linear mixture of two underlying response components, one that represents the stimulus, <italic>z</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>), and one that represents the decision, <italic>z</italic><sub>2</sub>(<italic>t</italic>,<italic>d</italic>), see Figure <xref ref-type="fig" rid="F1">1</xref>A. The time-varying firing rate of neuron <italic>i</italic> is then given by</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="m1"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mo>&#x003B7;</mml:mo><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Here, the parameters <italic>a</italic><sub><italic>i</italic>1</sub> and <italic>a</italic><sub><italic>i</italic>2</sub> are the mixing coefficients of the neuron, the bias parameter <italic>c<sub>i</sub></italic> describes a constant offset, and the term &#x003B7;<italic><sub>i</sub></italic>(<italic>t</italic>) denotes additive white noise. We assume that the noise of different neurons can be correlated, so that</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="m2"><mml:mrow><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:msub><mml:mo>&#x003B7;</mml:mo><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:msub><mml:mo>&#x003B7;</mml:mo><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x003C4;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x003B4;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003C4;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where the angular brackets denote averaging over time, and <italic>H<sub>ij</sub></italic> is the noise covariance between neuron <italic>i</italic> and <italic>j</italic>. We will assume that there are <italic>N</italic> neurons and, for notational compactness, we will assemble their activities into one large vector, <bold>r</bold>(<italic>t</italic>,<italic>s</italic>,<italic>d</italic>)&#x02009;&#x0003D;&#x02009;(<italic>r</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>,<italic>d</italic>),&#x02026;,<italic>r<sub>N</sub></italic>(<italic>t</italic>,<italic>s</italic>,<italic>d</italic>))<sup><italic>T</italic></sup>. After doing the same for the mixing coefficients, the constant offset, and the noise, we can write equivalently,</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="m3"><mml:mrow><mml:mtext mathvariant="bold">r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mtext mathvariant="bold">a</mml:mtext><mml:mn>1</mml:mn></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mtext mathvariant="bold">a</mml:mtext><mml:mn>2</mml:mn></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mtext mathvariant="bold">c</mml:mtext><mml:mo>+</mml:mo><mml:mtext mathvariant="bold">n</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Mixing and demixing of neural responses in a simulated two-alternative forced choice task</bold>. <bold>(A)</bold> We assume that neural responses are linear mixtures of two underlying components, one of which encodes the stimulus (left, colors representing different stimuli), and one of which encodes the binary decision (right) of a two-alternative-forced choice task. For concreteness, we assume that the task comprised <italic>M<sub>s</sub></italic>&#x02009;&#x0003D;&#x02009;8 stimuli and <italic>M<sub>d</sub></italic>&#x02009;&#x0003D;&#x02009;2 decisions. <bold>(B)</bold> Single cell responses are random combinations of these two components. We assume that <italic>N</italic>&#x02009;&#x0003D;&#x02009;50 neurons have been recorded, five of which are shown here. The noisy variability of the responses was obtained by transforming the deterministic, linear mixture of each neuron into 10 inhomogeneous Poisson spike trains, and then re-estimating the firing rates by low-pass filtering and averaging the spike trains. This type of noise may be considered more realistic, even if it deviates from the assumptions in the main text. In our numerical example, this did not prove to be a problem. To systematically address such problems, however, one may apply a variance-stabilizing transformation to the data, such as taking the square-root of the firing rates before computing the covariance matrix (see e.g., Efron, <xref ref-type="bibr" rid="B7">1982</xref>). <bold>(C)</bold> PCA uncovers the underlying two-dimensionality of the data, but the resulting coordinates do not demix the separate sources of firing rate variance. <bold>(D)</bold> By explicitly contrasting these separate sources, we can retrieve the original components up to a sign.</p></caption>
<graphic xlink:href="fncom-04-00126-g001.tif"/>
</fig>
<p>Without loss of generality, we can furthermore assume that the mixing coefficients are normalized so that <inline-formula><mml:math id="m4"><mml:mrow><mml:msubsup><mml:mtext>a</mml:mtext><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mtext>a</mml:mtext><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> for <italic>i</italic>&#x02208;{1,2}. Since we assume that the mixing coefficients are drawn at random, and independently of each other, the first and second coefficient will be uncorrelated, so that on average, <inline-formula><mml:math id="m5"><mml:mrow><mml:msubsup><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>, implying that <bold>a</bold><sub>1</sub> and <bold>a</bold><sub>2</sub> are approximately orthogonal.</p>
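<p>This near-orthogonality of random coefficient vectors is easy to check numerically. The following Python sketch (an illustration only; the number of neurons and the number of draws are assumed values, not taken from the original analysis) draws many pairs of normalized random vectors in <italic>N</italic>&#x02009;&#x0003D;&#x02009;50 dimensions and verifies that their dot products scatter around zero with a spread of roughly 1/&#x0221A;<italic>N</italic>.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50  # number of neurons (assumed, as in the toy example)

# Draw many pairs of random, normalized mixing-coefficient vectors
# and collect their dot products.
dots = []
for _ in range(1000):
    a1 = rng.standard_normal(N)
    a2 = rng.standard_normal(N)
    a1 /= np.linalg.norm(a1)
    a2 /= np.linalg.norm(a2)
    dots.append(a1 @ a2)
dots = np.asarray(dots)

# For unit vectors in N dimensions, the dot product has mean 0 and
# standard deviation close to 1/sqrt(N): the vectors are nearly
# orthogonal when N is large.
```

For <italic>N</italic>&#x02009;&#x0003D;&#x02009;50, the standard deviation of the dot products is about 0.14, which is why the assumption of approximate orthogonality holds only on average, not exactly, for any single pair.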
<p>With this formulation, individual neural responses mix information about the stimulus <italic>s</italic> and the decision <italic>d</italic>, leading to a variety of responses, as shown in Figure <xref ref-type="fig" rid="F1">1</xref>B. With only two underlying components, the overall heterogeneity of responses remains limited; it increases strongly when more components are allowed (see Figures <xref ref-type="fig" rid="F3">3</xref>A,B for an example with three components).</p>
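<p>The mixing model of Eqs 1&#x02013;3 can be simulated directly. The Python sketch below is a minimal illustration: the shapes of the two components, the firing-rate offsets, and the noise level are assumed for concreteness and do not reproduce Figure 1 exactly. It builds an array of firing rates <italic>r<sub>i</sub></italic>(<italic>t</italic>,<italic>s</italic>,<italic>d</italic>) from two components and random, normalized mixing coefficients.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Task dimensions (assumed values, matching the toy example):
# N neurons, M_t time bins, M_s stimuli, M_d decisions.
N, M_t, M_s, M_d = 50, 100, 8, 2
t = np.linspace(0.0, 1.0, M_t)

# Two illustrative components: z1 depends on (t, s), z2 on (t, d).
z1 = np.array([(s + 1) * np.sin(np.pi * t) for s in range(M_s)])  # (M_s, M_t)
z2 = np.array([(2 * d - 1) * t for d in range(M_d)])              # (M_d, M_t)

# Random mixing coefficients, normalized to unit length.
a1 = rng.standard_normal(N)
a2 = rng.standard_normal(N)
a1 /= np.linalg.norm(a1)
a2 /= np.linalg.norm(a2)
c = rng.uniform(5.0, 10.0, N)  # constant offsets c_i

# Eq. 3: r_i(t,s,d) = a_i1 z1(t,s) + a_i2 z2(t,d) + c_i + noise,
# stored as an array of shape (N, M_t, M_s, M_d).
r = (a1[:, None, None, None] * z1.T[None, :, :, None]
     + a2[:, None, None, None] * z2.T[None, :, None, :]
     + c[:, None, None, None]
     + 0.1 * rng.standard_normal((N, M_t, M_s, M_d)))
```

Each slice <code>r[i]</code> then plays the role of one recorded neuron, a random linear mixture of the stimulus and decision components plus noise.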
</sec>
<sec>
<title>Principal component analysis fails to demix the responses</title>
<p>The standard approach to deal with such data sets is to sort cells into categories. In our example, this approach may yield two overlapping categories of cells, one for cells that respond to the stimulus and one for cells that respond to the decision. While this approach tracks down which variables are represented in the population, it fails to quantify the exact nature of the population activity, such as how the activities of different neurons co-evolve over time.</p>
<p>A common way to address these problems is to use dimensionality reduction methods such as PCA (Nicolelis et al., <xref ref-type="bibr" rid="B23">1995</xref>; Friedrich and Laurent, <xref ref-type="bibr" rid="B10">2001</xref>; Hastie et al., <xref ref-type="bibr" rid="B12">2001</xref>; Zacksenhouse and Nemets, <xref ref-type="bibr" rid="B35">2008</xref>; Machens et al., <xref ref-type="bibr" rid="B17">2010</xref>). The main aim of PCA is to find a new coordinate system in which the data can be represented in a more compact fashion. In our toy example, even though we may have many neurons with different responses (<italic>N</italic>&#x02009;&#x0003D;&#x02009;50 in Figure <xref ref-type="fig" rid="F1">1</xref>, with five examples shown in Figure <xref ref-type="fig" rid="F1">1</xref>B), the activity of each neuron can be represented by a linear combination of only two components. In the <italic>N</italic>-dimensional space of neural activities, the two components, <italic>z</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>) and <italic>z</italic><sub>2</sub>(<italic>t</italic>,<italic>d</italic>), can be viewed as two coordinates of a coordinate system whose axes are given by the vectors of mixing coefficients, <bold>a</bold><sub>1</sub> and <bold>a</bold><sub>2</sub>. Since these two coordinates capture all the relevant information, the components live in a two-dimensional subspace. Using PCA, we can retrieve this two-dimensional subspace from the data. While the method allows us to reduce the dimensionality and complexity of the data dramatically, PCA will in general retrieve only the two-dimensional subspace, not the original coordinates, <italic>z</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>) and <italic>z</italic><sub>2</sub>(<italic>t</italic>,<italic>d</italic>).</p>
<p>To see this, we will briefly review PCA and show what it does to the data from our toy model. PCA begins by computing the covariances of the firing rates between all pairwise combinations of neurons. Let us define the mean firing rate of neuron <italic>i</italic> as the average number of spikes that this neuron emits, so that</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="m6"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:msub><mml:mi>M</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:msub><mml:mi>M</mml:mi><mml:mi>d</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:munderover><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:munderover><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>d</mml:mi></mml:msub></mml:mrow></mml:munderover><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle></mml:mrow></mml:mstyle></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<disp-formula id="E5"><label>(5)</label><mml:math id="m7"><mml:mrow><mml:mo>=</mml:mo><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>We will use the angular brackets in the second line as a short-hand for averaging. The variables averaged over are indicated as subscripts on the right bracket. Here, the average runs over all time points <italic>t</italic>, all stimuli <italic>s</italic>, and all decisions <italic>d</italic>. For the vector of mean firing rates we write <bold>r</bold>&#x02009;&#x0003D;&#x02009;(<italic>r</italic><sub>1</sub>,&#x02026;,<italic>r<sub>N</sub></italic>)<sup><italic>T</italic></sup>.</p>
<p>The covariance matrix of the data summarizes the second-order statistics of the data set,</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="m8"><mml:mrow><mml:mi>C</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mtext>r</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mtext>r</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>T</mml:mi></mml:msup></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>and has size <italic>N</italic>&#x02009;&#x000D7;&#x02009;<italic>N</italic> where <italic>N</italic> is the number of neurons in the data set. Given the covariance matrix, we can compute the firing rate variance that falls along arbitrary directions in state space. For instance, the variance captured by a coordinate axis given by a normalized vector <bold>u</bold> is simply <italic>L</italic>&#x02009;&#x0003D;&#x02009;<bold>u</bold><sup><italic>T</italic></sup><italic>C</italic><bold>u</bold>. We can then look for the axis that captures most of the variance of the data by maximizing the function <italic>L</italic> with respect to <bold>u</bold> subject to the normalization constraint <bold>u</bold><sup><italic>T</italic></sup><bold>u</bold>&#x02009;&#x0003D;&#x02009;1. The solution corresponds to the first axis of the coordinate system that PCA constructs. If we are looking for several mutually orthogonal axes, these can be conveniently summarized into an <italic>N</italic>&#x02009;&#x000D7;&#x02009;<italic>n</italic> orthogonal matrix, <italic>U</italic>&#x02009;&#x0003D;&#x02009;[<bold>u</bold><sub>1</sub>,&#x02026;,<bold>u</bold><italic><sub>n</sub></italic>]. To find the maximum amount of variance that falls into the subspace spanned by these axes, we need to maximize</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="m9"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:msubsup><mml:mtext>u</mml:mtext><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:mi>C</mml:mi><mml:msub><mml:mtext>u</mml:mtext><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mtext>tr</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>C</mml:mi><mml:mi>U</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02003;</mml:mo><mml:mtext>subject</mml:mtext><mml:mo>&#x02009;</mml:mo><mml:mtext>to</mml:mtext><mml:mo>&#x02003;</mml:mo><mml:msup><mml:mi>U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>U</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where the trace-operation, tr(&#x00B7;), sums over all the diagonal entries of a matrix, and <italic>I<sub>n</sub></italic> denotes the <italic>n</italic>&#x02009;&#x000D7;&#x02009;<italic>n</italic> identity matrix.</p>
<p>Mathematically, the <italic>principal</italic> axes <bold>u</bold><italic><sub>i</sub></italic> correspond to the eigenvectors of the covariance matrix, <italic>C</italic>, which can be computed efficiently with standard numerical routines. Subsequently, the data can be plotted in the new coordinate system. The new coordinates of the data are given by</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="m10"><mml:mrow><mml:mtext>y</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msup><mml:mi>U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mtext>r</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>These new coordinates are called the <italic>principal components</italic>. Note that the new coordinate system has a different origin from the old one, since we subtracted the vector of mean firing rates, <bold>r</bold>. Consequently, the principal components can take both negative and positive values. Note also that the principal components are only defined up to a minus sign, since every coordinate axis can be reflected about the origin. For our artificial data set, only two eigenvalues are non-zero, so that two principal components suffice to capture the complete variance of the data. The data in these two new coordinates, <italic>y</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>,<italic>d</italic>) and <italic>y</italic><sub>2</sub>(<italic>t</italic>,<italic>s</italic>,<italic>d</italic>), are shown in Figure <xref ref-type="fig" rid="F1">1</xref>C.</p>
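<p>Eqs 6&#x02013;8 translate directly into a few lines of numerical code. The following Python sketch is illustrative only: the data matrix here is a random placeholder rather than the simulated responses of Figure 1. It computes the covariance matrix from a data matrix with one column per (<italic>t</italic>,<italic>s</italic>,<italic>d</italic>) combination, extracts the principal axes by eigendecomposition, and projects the mean-subtracted data onto the first two axes.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: firing rates for N neurons, flattened into a
# 2-D array X of shape (N, K), one column per (t, s, d) combination.
N, K = 50, 100 * 8 * 2
X = rng.standard_normal((N, K))

# Eq. 6: covariance matrix of the mean-subtracted firing rates.
r_mean = X.mean(axis=1, keepdims=True)
C = (X - r_mean) @ (X - r_mean).T / K  # N x N

# The principal axes are the eigenvectors of C, here sorted by
# decreasing eigenvalue (np.linalg.eigh returns ascending order).
eigvals, U = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

# Eq. 8: project onto the first n axes to obtain the principal
# components y(t, s, d).
n = 2
Y = U[:, :n].T @ (X - r_mean)  # n x K

# Eq. 7: the variance captured by each axis equals its eigenvalue.
```

The variance of the first principal component, <code>Y[0].var()</code>, equals the largest eigenvalue, which is exactly the quantity maximized in Eq. 7.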
<p>Our toy model shows how PCA can succeed in summarizing the population response, yet it also illustrates the key problem of PCA: just like the individual neurons, the principal components mix information about the different task parameters (Figure <xref ref-type="fig" rid="F1">1</xref>C), even though the original components do not (Figure <xref ref-type="fig" rid="F1">1</xref>A). The underlying problem is that PCA ignores the causes of firing rate variability. Whether firing rates change due to the external stimulus <italic>s</italic>, due to the internally generated decision <italic>d</italic>, or due to some other cause, all changes enter equally into the computation of the covariance matrix; the origin of the variance therefore has no influence on the coordinate system constructed by PCA.</p>
<p>To make these notions more precise, we compute the covariance matrix of the simulated data. Inserting Eq. <xref ref-type="disp-formula" rid="E3">3</xref> into Eq. <xref ref-type="disp-formula" rid="E6">6</xref>, we obtain</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="m11"><mml:mrow><mml:mi>C</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn></mml:msub><mml:msubsup><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn></mml:msub><mml:msubsup><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mn>22</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mo>[</mml:mo> <mml:mrow><mml:msub><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn></mml:msub><mml:msubsup><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn></mml:msub><mml:msubsup><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup></mml:mrow> <mml:mo>]</mml:mo></mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>M</italic><sub>11</sub> and <italic>M</italic><sub>22</sub> denote the firing rate variance due to the first and second component, respectively, <italic>M</italic><sub>12</sub> denotes the covariance arising from the mixture of the two components, and <italic>H</italic> is the covariance matrix of the noise. Using the short-hand notations <italic>z</italic><sub>1</sub>(<italic>t</italic>)&#x02009;&#x0003D;&#x02009;&#x03008;<italic>z</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>)&#x03009;<italic><sub>s</sub></italic>, <italic>z</italic><sub>2</sub>(<italic>t</italic>)&#x02009;&#x0003D;&#x02009;&#x03008;<italic>z</italic><sub>2</sub>(<italic>t</italic>,<italic>d</italic>)&#x03009;<italic><sub>d</sub></italic>, and <italic>z<sub>i</sub></italic>&#x02009;&#x0003D;&#x02009;&#x03008;<italic>z<sub>i</sub></italic>(<italic>t</italic>)&#x03009;<italic><sub>t</sub></italic> for <italic>i</italic>&#x02208;[1,2], the different variances are given by</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="m12"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="E11"><label>(11)</label><mml:math id="m13"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mn>22</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="E12"><label>(12)</label><mml:math id="m14"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mi>t</mml:mi></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Principal component analysis will be able to segregate the stimulus- and decision-dependent variance only if the mixture term <italic>M</italic><sub>12</sub> vanishes and if the variances of the individual components, <italic>M</italic><sub>11</sub> and <italic>M</italic><sub>22</sub>, are sufficiently different from each other. However, if the two underlying components <italic>z</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>) and <italic>z</italic><sub>2</sub>(<italic>t</italic>,<italic>d</italic>) are temporally correlated, then the mixture term <italic>M</italic><sub>12</sub> will be non-zero. Its presence will force the eigenvectors of <italic>C</italic> away from <bold>a</bold><sub>1</sub> and <bold>a</bold><sub>2</sub>. Moreover, even if the mixture term vanishes, PCA may still fail to retrieve the original mixture coefficients if the variances of the individual components, <italic>M</italic><sub>11</sub> and <italic>M</italic><sub>22</sub>, are too close to each other relative to the magnitude of the noise: in this case the eigenvalue problem becomes degenerate. In general, the covariance matrix therefore mixes different origins of firing rate variance rather than separating them. While PCA allows us to reduce the dimensionality of the data, the coordinate system it finds may provide only limited insight into how the different task parameters are represented in the neural activities.</p>
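<p>The effect of the mixture term can be checked numerically. The following sketch (illustrative only; the mixing vectors and variances are arbitrary choices, and the noise term <italic>H</italic> of Eq. 9 is omitted) builds the covariance matrix of Eq. 9 and confirms that its leading eigenvector coincides with <bold>a</bold><sub>1</sub> only when <italic>M</italic><sub>12</sub> vanishes:</p>

```python
import numpy as np

# Arbitrary illustration of Eq. 9 (noise term H omitted): with mixing
# vectors a1, a2 and variances M11, M22, the leading eigenvector of C
# recovers a1 only while the mixture term M12 is zero.
a1 = np.array([1.0, 0.0])
a2 = np.array([0.0, 1.0])
M11, M22 = 1.0, 0.5

def leading_eigenvector(M12):
    C = (M11 * np.outer(a1, a1) + M22 * np.outer(a2, a2)
         + M12 * (np.outer(a1, a2) + np.outer(a2, a1)))
    evals, evecs = np.linalg.eigh(C)
    return evecs[:, -1]                    # eigh sorts eigenvalues ascending

aligned = abs(leading_eigenvector(0.0) @ a1)   # M12 = 0: recovers a1 exactly
rotated = abs(leading_eigenvector(0.4) @ a1)   # M12 > 0: eigenvector rotated
```

<p>With <italic>M</italic><sub>12</sub>&#x02009;&#x0003D;&#x02009;0.4 the overlap between the leading eigenvector and <bold>a</bold><sub>1</sub> drops well below one, showing how temporal correlations between the components rotate the PCA axes away from the mixing coefficients.</p>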
</sec>
<sec>
<title>Demixing responses using covariances over marginalized data</title>
<p>To solve these problems, we need to separate the different causes of firing rate variability. In the context of our example, we can attribute changes in the firing rates to two separate sources, both of which contribute to the covariance in Eq. <xref ref-type="disp-formula" rid="E6">6</xref>. First, firing rates may change due to the externally applied stimulus <italic>s</italic>. Second, firing rates may change due to the internally generated decision <italic>d</italic>.</p>
<p>To account for these separate sources of variance in the population response, we suggest estimating one covariance matrix for every source of interest. Such a covariance matrix needs to be specifically targeted toward extracting the relevant source of firing rate variance without contamination by other sources. Naturally, this step is somewhat problem-specific. For our example, we will first focus on the problem of estimating firing rate variance caused by the stimulus separately from firing rate variance caused by the decision. When averaging over all stimuli, we obtain the marginalized firing rates <bold>r</bold>(<italic>t</italic>,<italic>d</italic>)&#x02009;&#x0003D;&#x02009;&#x03008;<bold>r</bold>(<italic>t</italic>,<italic>s</italic>,<italic>d</italic>)&#x03009;<italic><sub>s</sub></italic>. The covariance caused by the stimulus is then given by the <italic>N</italic>&#x02009;&#x000D7;&#x02009;<italic>N</italic> matrix</p>
<disp-formula id="E13"><label>(13)</label><mml:math id="m15"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>T</mml:mi></mml:msup></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>We will refer to <italic>C<sub>s</sub></italic> as the marginalized covariance matrix for the stimulus. We can repeat the procedure for the decision-part of the task. Marginalizing over decisions, we obtain <bold>r</bold>(<italic>t</italic>,<italic>s</italic>)&#x02009;&#x0003D;&#x02009;&#x03008;<bold>r</bold>(<italic>t</italic>,<italic>s</italic>,<italic>d</italic>)&#x03009;<italic><sub>d</sub></italic> and</p>
<disp-formula id="E14"><label>(14)</label><mml:math id="m16"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>d</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>T</mml:mi></mml:msup></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Given these two covariance matrices, one can now perform two separate PCAs, one for each covariance matrix. In turn, one obtains two separate coordinate systems: in one, the principal axes point in the directions of state space along which firing rates vary when the stimulus changes; in the other, they point in the directions along which firing rates vary when the decision changes.</p>
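<p>The two marginalized covariance matrices can be estimated directly from a trial-averaged data array. The sketch below is a minimal numpy implementation of Eqs 13 and 14, not the code used for the analyses in this paper; it assumes (as a storage convention) that the firing rates are held in an array of shape (T,&#x02009;S,&#x02009;D,&#x02009;N) for time bins, stimuli, decisions, and neurons:</p>

```python
import numpy as np

def marginalized_covariances(R):
    """Estimate the marginalized covariance matrices of Eqs 13 and 14.

    R: trial-averaged firing rates of shape (T, S, D, N) -- time bins,
    stimuli, decisions, neurons (an assumed storage convention).
    """
    N = R.shape[-1]
    R_d = R.mean(axis=1, keepdims=True)    # r(t, d) = <r(t, s, d)>_s
    R_s = R.mean(axis=2, keepdims=True)    # r(t, s) = <r(t, s, d)>_d
    dev_s = (R - R_d).reshape(-1, N)       # stimulus-induced deviations
    dev_d = (R - R_s).reshape(-1, N)       # decision-induced deviations
    Cs = dev_s.T @ dev_s / len(dev_s)      # Eq. 13
    Cd = dev_d.T @ dev_d / len(dev_d)      # Eq. 14
    return Cs, Cd
```

<p>On noiseless toy data of the form of Eq. 3, the leading eigenvectors of the resulting <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic> recover the mixing coefficients <bold>a</bold><sub>1</sub> and <bold>a</bold><sub>2</sub> up to sign.</p>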
<p>For the toy model, it is readily seen that the marginalized covariance matrices are given by <inline-formula><mml:math id="m17"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn></mml:msub><mml:msubsup><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mn>11</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="m18"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>d</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn></mml:msub><mml:msubsup><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mo>,</mml:mo><mml:mn>22</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:math></inline-formula> with <italic>M</italic><sub><italic>s</italic>,11</sub>&#x02009;&#x0003D;&#x02009;&#x03008;(<italic>z</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>)&#x02009;&#x02212;&#x02009;<italic>z</italic><sub>1</sub>(<italic>t</italic>))<sup>2</sup>&#x03009; and <italic>M</italic><sub><italic>d</italic>,22</sub>&#x02009;&#x0003D;&#x02009;&#x03008;(<italic>z</italic><sub>2</sub>(<italic>t</italic>,<italic>d</italic>)&#x02009;&#x02212;&#x02009;<italic>z</italic><sub>2</sub>(<italic>t</italic>))<sup>2</sup>&#x03009;. Consequently, the principal eigenvectors of <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic> will be equivalent to the mixing coefficients <bold>a</bold><sub>1</sub> and <bold>a</bold><sub>2</sub>, at least as long as the variances <italic>M</italic><sub><italic>s</italic>,11</sub> and <italic>M</italic><sub><italic>d</italic>,22</sub> are much larger than the size of the noise, which is given by tr(<italic>H</italic>).</p>
<p>If the noise term is not negligible, it will force the eigenvectors away from the actual mixing coefficients. This problem can be alleviated by exploiting the orthogonality condition, <inline-formula><mml:math id="m19"><mml:mrow><mml:msubsup><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>, which implies that there are separate sources of variance for the stimulus- and decision-components. To this end, we seek to divide the full space into two subspaces: one that captures as much as possible of the stimulus-dependent covariance <italic>C<sub>s</sub></italic>, and another that captures as much as possible of the decision-dependent covariance <italic>C<sub>d</sub></italic>. Our goal will then be to maximize the function</p>
<disp-formula id="E15"><label>(15)</label><mml:math id="m20"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>U</mml:mi><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:msub><mml:mi>U</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>U</mml:mi><mml:mn>2</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mi>d</mml:mi></mml:msub><mml:msub><mml:mi>U</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>with respect to the two orthogonal matrices <italic>U</italic><sub>1</sub> and <italic>U</italic><sub>2</sub> whose columns contain the basis vectors of the respective subspaces. The first term in Eq. <xref ref-type="disp-formula" rid="E15">15</xref> captures the total variance falling into the subspace spanned by the columns of <italic>U</italic><sub>1</sub>, and the second term the total variance falling into the subspace given by <italic>U</italic><sub>2</sub>. Writing <italic>U</italic>&#x02009;&#x0003D;&#x02009;[<italic>U</italic><sub>1</sub>,<italic>U</italic><sub>2</sub>], we obtain an orthogonal matrix for the full space, and the orthogonality conditions are neatly summarized by <italic>UU</italic><sup><italic>T</italic></sup>&#x02009;&#x0003D;&#x02009;<italic>I</italic>. As shown in the Appendix, the maximization of Eq. <xref ref-type="disp-formula" rid="E15">15</xref> under these orthogonality constraints can be solved by computing the eigenvectors and eigenvalues of the difference of covariance matrices,</p>
<disp-formula id="E16"><label>(16)</label><mml:math id="m21"><mml:mrow><mml:mi>D</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mi>d</mml:mi></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>In this case, the eigenvectors belonging to the positive eigenvalues of <italic>D</italic> form the columns of <italic>U</italic><sub>1</sub> and the eigenvectors belonging to the negative eigenvalues of <italic>D</italic> form the columns of <italic>U</italic><sub>2</sub>. As with PCA, the positive or negative eigenvalues can be sorted according to the amount of variance they capture about <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic>.</p>
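<p>This eigenvalue-splitting step is straightforward to implement. The sketch below (an illustration, with arbitrary variable names) diagonalizes <italic>D</italic> of Eq. 16 and assigns eigenvectors to the two subspaces according to the sign of their eigenvalues:</p>

```python
import numpy as np

def demix_subspaces(Cs, Cd):
    """Split state space using the difference matrix D = Cs - Cd (Eq. 16).

    Eigenvectors of D with positive eigenvalues span the stimulus
    subspace U1; those with negative eigenvalues span the decision
    subspace U2. Within each set, columns are sorted by the amount of
    variance they capture, as described in the text.
    """
    evals, evecs = np.linalg.eigh(Cs - Cd)   # D is symmetric
    pos, neg = evals > 0, evals < 0
    U1 = evecs[:, pos][:, np.argsort(-evals[pos])]   # largest first
    U2 = evecs[:, neg][:, np.argsort(evals[neg])]    # most negative first
    return U1, U2
```

<p>In practice one may want to discard eigenvalues near zero with a noise threshold rather than splitting strictly at zero; the strict split is kept here for simplicity. Note that any isotropic noise term common to <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic> cancels in the difference, as in Eq. 17.</p>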
<p>For the simulated example, we obtain</p>
<disp-formula id="E17"><label>(17)</label><mml:math id="m22"><mml:mrow><mml:mi>D</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn></mml:msub><mml:msubsup><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mn>11</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn></mml:msub><mml:msubsup><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mo>,</mml:mo><mml:mn>22</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where the noise term <italic>H</italic> has now dropped out. Diagonalization of <italic>D</italic> results in two clearly separated eigenvalues, <italic>M</italic><sub><italic>s</italic>,11</sub> and &#x02212;<italic>M</italic><sub><italic>d</italic>,22</sub>, and in two eigenvectors, <bold>a</bold><sub>1</sub> and <bold>a</bold><sub>2</sub>, that correspond to the original mixing coefficients.</p>
</sec>
<sec>
<title>Linking the population level and the single cell level</title>
<p>As a result of the above method, we obtain a new coordinate system whose basis vectors are given by the columns of the matrix <italic>U</italic>. This coordinate system simply provides a different, and hopefully useful, way of representing the population response. One major advantage of orthogonality is that one can easily move back and forth between the single cell and population level descriptions of the neural activities. Just as in PCA, we can project the original firing rates of the neurons onto the new coordinates,</p>
<disp-formula id="E18"><label>(18)</label><mml:math id="m23"><mml:mrow><mml:mtext>y</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msup><mml:mi>U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mtext>r</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>and the two leading coordinates for the toy model are shown in Figure <xref ref-type="fig" rid="F1">1</xref>D. These components correspond approximately to the original components, <italic>z</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>) and <italic>z</italic><sub>2</sub>(<italic>t</italic>,<italic>d</italic>). In turn, we can reconstruct the activity of each neuron by inverting the coordinate transform,</p>
<disp-formula id="E19"><label>(19)</label><mml:math id="m24"><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>U</mml:mi><mml:mtext>y</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mtext>r</mml:mtext><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>For every neuron, this yields a set of <italic>N</italic> reconstruction coefficients, given by the corresponding row of <italic>U</italic>.</p>
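<p>Because <italic>U</italic> is orthogonal, the projection of Eq. 18 and the reconstruction of Eq. 19 form an exact round trip. The sketch below demonstrates this with random data (the dimensions and the random orthogonal basis are arbitrary choices for illustration):</p>

```python
import numpy as np

# Round trip between population coordinates (Eq. 18) and single-cell
# firing rates (Eq. 19). U is any orthogonal N x N matrix; r_mean holds
# the overall mean rate of each neuron.
rng = np.random.default_rng(2)
N, K = 6, 120                        # neurons, flattened (t, s, d) conditions
U, _ = np.linalg.qr(rng.standard_normal((N, N)))   # random orthogonal basis
R = rng.standard_normal((N, K))      # firing rates, one column per condition
r_mean = R.mean(axis=1, keepdims=True)

Y = U.T @ (R - r_mean)               # Eq. 18: project onto new coordinates
R_rec = U @ Y + r_mean               # Eq. 19: invert the coordinate transform

assert np.allclose(R, R_rec)         # lossless, since U U^T = I
```

<p>If only the leading columns of <italic>U</italic> are kept, the same two lines instead yield the best low-dimensional reconstruction within the retained subspace, which is how the two-component reconstruction of the toy model is obtained.</p>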
<p>Since two coordinates were sufficient to capture most of the variance in the toy example, the firing rate of every neuron can be reconstructed by a linear combination of these two components, <italic>y</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>,<italic>d</italic>) and <italic>y</italic><sub>2</sub>(<italic>t</italic>,<italic>s</italic>,<italic>d</italic>). For each neuron, we thereby obtain two reconstruction coefficients, <italic>u</italic><sub><italic>i</italic>1</sub> and <italic>u</italic><sub><italic>i</italic>2</sub>. The set of all reconstruction coefficients constitutes a cloud of points in a two-dimensional space. The distribution of this cloud, together with the activities of several example neurons, is shown in Figure <xref ref-type="fig" rid="F2">2</xref>. This plot allows us to link the single cell with the population level by visualizing how the activity of each neuron is composed out of the two underlying components.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Linking the single cell and population level</bold>. Using the coordinate system retrieved by separating variances, we can illuminate the contributions of each component to the individual neural responses. The center shows the contribution of the stimulus- and decision-related components to each individual neuron. The surrounding plots show the activity of eight example neurons, corresponding to the respective dots in the center.</p></caption>
<graphic xlink:href="fncom-04-00126-g002.tif"/>
</fig>
</sec>
<sec>
<title>Generalizations to more than two parameters</title>
<p>In our toy example, we have assumed that each task parameter is represented by a single component. We note that this is a feature of our specific example. In more realistic scenarios, a single task parameter could potentially be represented by more than one component. For instance, if one set of neurons fires transiently with respect to a stimulus <italic>s</italic>, but another set of neurons fires tonically, then the firing rate dynamics of the stimulus representation are already two-dimensional, even without taking the decision into account. In such a case, we can still use the method described above to retrieve the two subspaces in which the respective components lie.</p>
<p>However, the number of task parameters will often be larger than two. In the two-alternative-forced choice task, there are at least four parameters that could lead to changes in firing rates: the timing of the task, <italic>t</italic>, potentially related to anticipation or rhythmic aspects of a task; the stimulus, <italic>s</italic>; the decision, <italic>d</italic>; and the reward, <italic>r</italic>. Even more task parameters could be of interest, such as parameters carried over from previous trials.</p>
<p>These observations raise the question of how the method can be generalized if there are more than two task parameters to account for. To do so, we write the relevant parameters into one long vector <bold>&#x003B8;</bold>&#x02009;&#x0003D;&#x02009;(&#x003B8;<sub>1</sub>,&#x003B8;<sub>2</sub>,&#x02026;,&#x003B8;<italic><sub>M</sub></italic>), and assume that the firing rates of the neurons are linear mixtures of the form</p>
<disp-formula id="E20"><label>(20)</label><mml:math id="m25"><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mo>&#x02026;</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="E21"><label>(21)</label><mml:math id="m26"><mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mrow><mml:mn>21</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mn>21</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mrow><mml:mn>22</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mn>22</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mo>&#x02026;</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="E22"><label>(22)</label><mml:math id="m27"><mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mrow><mml:mi>M</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>M</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where each task parameter is now represented by more than one component. For each parameter, &#x003B8;<italic><sub>i</sub></italic>, we can compute the marginalized covariance matrix,</p>
<disp-formula id="E23"><label>(23)</label><mml:math id="m28"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>T</mml:mi></mml:msup></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>which measures the covariance in the firing rates due to changes in the parameter &#x003B8;<italic><sub>i</sub></italic>. Diagonalizing each of these covariance matrices will retrieve the various subspaces corresponding to the different mixture coefficients. For instance, when diagonalizing <italic>C</italic><sub>1</sub>, we obtain the subspace for the components that depend on the parameter &#x003B8;<sub>1</sub>. The relevant eigenvectors of <italic>C</italic><sub>1</sub> will therefore span the same subspace as the mixture coefficients <bold>a</bold><sub>11</sub>, <bold>a</bold><sub>12</sub>, etc., in Eq. <xref ref-type="disp-formula" rid="E22">22</xref>.</p>
<p>As before, the method&#x00027;s performance under additive noise can be enhanced by maximizing a single function (see <xref ref-type="app" rid="A1">Appendix</xref>)</p>
<disp-formula id="E24"><label>(24)</label><mml:math id="m29"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>subject to the orthogonality constraint <italic>U<sup>T</sup>U</italic>&#x02009;&#x0003D;&#x02009;<italic>I</italic> for <italic>U</italic>&#x02009;&#x0003D;&#x02009;[<italic>U</italic><sub>1</sub>,<italic>U</italic><sub>2</sub>,&#x02026;,<italic>U<sub>M</sub></italic>]. Maximization of this function will force the firing rate variance due to different parameters &#x003B8;<italic><sub>i</sub></italic> into orthogonal subspaces (as required by the model). If <italic>M</italic>&#x02009;&#x0003D;&#x02009;1, then maximization results in a standard PCA. In the case <italic>M</italic>&#x02009;&#x0003D;&#x02009;2, maximization requires the diagonalization of the difference of covariance matrices <italic>C</italic><sub>1</sub>&#x02009;&#x02212;&#x02009;<italic>C</italic><sub>2</sub>, as in Eq.&#x02009;16. In the case <italic>M</italic>&#x02009;&#x0003E;&#x02009;2, various algorithms can be constructed to find local maxima of <italic>L</italic> (see e.g., Bolla et al., <xref ref-type="bibr" rid="B4">1998</xref>). To our knowledge, a full understanding of the global solution structure of the maximization problem does not exist for <italic>M</italic>&#x02009;&#x0003E;&#x02009;2. In the Appendix, we show how to maximize Eq. <xref ref-type="disp-formula" rid="E24">24</xref> with standard gradient ascent methods. In any case, it may often be a good idea to use PCA on the full covariance matrix of the data, Eq. <xref ref-type="disp-formula" rid="E6">6</xref>, to reduce the dimensionality of the data set prior to the demixing procedure. Indeed, this preprocessing step was applied in Machens et al. (<xref ref-type="bibr" rid="B17">2010</xref>).</p>
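<p>The Appendix derives the actual maximization procedure; as a rough illustration only, the following sketch performs naive gradient ascent on Eq. 24 with a Euclidean gradient step followed by QR re-orthonormalization (a standard retraction onto the set of orthogonal matrices). The learning rate, iteration count, and function name are arbitrary choices, and the method finds local maxima only:</p>

```python
import numpy as np

def maximize_L(C_list, dims, n_iter=500, lr=0.01, seed=0):
    """Naive gradient ascent on L = sum_i tr(U_i^T C_i U_i), Eq. 24.

    C_list: one marginalized covariance matrix per task parameter.
    dims:   number of basis vectors allotted to each subspace U_i.
    The Euclidean gradient of each term is 2 C_i U_i; after every step
    the combined matrix U = [U_1, ..., U_M] is re-orthonormalized by a
    QR decomposition, which enforces U^T U = I.
    """
    N = C_list[0].shape[0]
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((N, sum(dims))))
    splits = np.cumsum(dims)[:-1]
    for _ in range(n_iter):
        blocks = np.split(U, splits, axis=1)
        grad = np.hstack([2.0 * C @ Ui for C, Ui in zip(C_list, blocks)])
        U, _ = np.linalg.qr(U + lr * grad)
    return np.split(U, splits, axis=1)
```

<p>For <italic>M</italic>&#x02009;&#x0003D;&#x02009;2 this iterative scheme and the eigendecomposition of <italic>C</italic><sub>1</sub>&#x02009;&#x02212;&#x02009;<italic>C</italic><sub>2</sub> should agree; the iterative form is only needed for <italic>M</italic>&#x02009;&#x0003E;&#x02009;2, where no closed-form solution is known.</p>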
</sec>
<sec>
<title>Further generalizations and limitations of the method</title>
<p>The above formulation of the problem may be further generalized by allowing individual components to mix parameters in non-trivial ways. To study this scenario in a simple example, imagine that in the above two-alternative-forced choice task, in addition to the stimulus- and decision-dependent component, there were a purely time-dependent component, <italic>z</italic><sub>3</sub>(<italic>t</italic>), locked to the time structure of the task, so that</p>
<disp-formula id="E25"><label>(25)</label><mml:math id="m30"><mml:mrow><mml:mtext>r</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mn>1</mml:mn></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mn>2</mml:mn></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mtext>a</mml:mtext><mml:mn>3</mml:mn></mml:msub><mml:msub><mml:mi>z</mml:mi><mml:mn>3</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mtext>c</mml:mtext><mml:mo>+</mml:mo><mml:mtext>n</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>This scenario is illustrated in Figures <xref ref-type="fig" rid="F3">3</xref>A,B. As before, we can compute marginalized covariance matrices that capture the covariance due to the stimuli <italic>s</italic>, the decisions <italic>d</italic>, or the time points <italic>t</italic>. While the marginalized covariance matrices for the stimuli and decisions, <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic>, have one significant eigenvalue each, and thereby capture the relevant component (Figure <xref ref-type="fig" rid="F3">3</xref>C), the marginalized covariance matrix for time, <italic>C<sub>t</sub></italic>, now has three significant eigenvalues, and therefore does not allow us to retrieve the purely time-dependent component <italic>z</italic><sub>3</sub>(<italic>t</italic>). The reason for this failure is that all three components in Eq. <xref ref-type="disp-formula" rid="E25">25</xref> have a time-dependence that cannot be averaged out. By design, the stimulus-averaged first component, <italic>z</italic><sub>1</sub>(<italic>t</italic>)&#x02009;&#x0003D;&#x02009;&#x03008;<italic>z</italic><sub>1</sub>(<italic>t</italic>,<italic>s</italic>)&#x03009;<italic><sub>s</sub></italic>, and the decision-averaged second component, <italic>z</italic><sub>2</sub>(<italic>t</italic>)&#x02009;&#x0003D;&#x02009;&#x03008;<italic>z</italic><sub>2</sub>(<italic>t</italic>,<italic>d</italic>)&#x03009;<italic><sub>d</sub></italic>, do not vanish. In other words, the stimulus- and decision-components have intrinsic time-dependent variance that cannot be separated from the stimulus- or decision-induced variance.</p>
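<p>This rank argument can be checked numerically. In the sketch below, the component shapes <italic>z</italic><sub>1</sub>, <italic>z</italic><sub>2</sub>, <italic>z</italic><sub>3</sub> and the mixing vectors are arbitrary choices satisfying the form of Eq. 25 (noiseless); the time-marginalized covariance then shows three significant eigenvalues while <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic> show one each, mirroring the situation in Figure 3C:</p>

```python
import numpy as np

# Hypothetical component shapes z1(t,s), z2(t,d), z3(t) for Eq. 25
# (illustration only; noiseless, random orthonormal mixing vectors).
T, N = 60, 8
rng = np.random.default_rng(3)
A, _ = np.linalg.qr(rng.standard_normal((N, 3)))   # orthonormal a1, a2, a3
a1, a2, a3 = A.T
t = np.linspace(0.0, 1.0, T)
z1 = np.outer(t, [1.0, 2.0, 3.0])            # z1(t, s), three stimuli
z2 = np.outer(np.cos(4 * t), [1.0, 2.0])     # z2(t, d), two decisions
z3 = np.sin(4 * t)                           # z3(t), time-locked component

# r(t, s, d) = a1 z1 + a2 z2 + a3 z3, broadcast to shape (T, S, D, N)
R = (z1[:, :, None, None] * a1
     + z2[:, None, :, None] * a2
     + z3[:, None, None, None] * a3)

def significant_eigs(C, tol=1e-10):
    """Count eigenvalues above a small numerical tolerance."""
    return int(np.sum(np.linalg.eigvalsh(C) > tol))

dev_s = (R - R.mean(axis=1, keepdims=True)).reshape(-1, N)
dev_d = (R - R.mean(axis=2, keepdims=True)).reshape(-1, N)
Cs = dev_s.T @ dev_s / len(dev_s)            # stimulus-marginalized, Eq. 13
Cd = dev_d.T @ dev_d / len(dev_d)            # decision-marginalized, Eq. 14
r_t = R.mean(axis=(1, 2))                    # condition-averaged rates r(t)
Ct = np.cov(r_t.T, bias=True)                # time-marginalized covariance

ranks = (significant_eigs(Cs), significant_eigs(Cd), significant_eigs(Ct))
```

<p>Here <italic>C<sub>t</sub></italic> picks up not only <italic>z</italic><sub>3</sub>(<italic>t</italic>) but also the non-vanishing time courses of the stimulus- and decision-averaged components, which is why its rank is three.</p>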
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Heterogeneous responses as a result of linear mixture of three components</bold>. <bold>(A)</bold> We now assume that neural responses are constructed from three underlying components, one of which encodes only the task rhythm (left), while the two others encode stimuli and responses, as in Figure <xref ref-type="fig" rid="F1">1</xref>. <bold>(B)</bold> Single cell responses are random combinations of these three components. We again assume that <italic>N</italic>&#x02009;&#x0003D;&#x02009;50 neurons have been recorded, five of which are shown here, and response noise was generated as in Figure <xref ref-type="fig" rid="F1">1</xref>. <bold>(C)</bold> Whereas the marginalized covariance matrices for stimulus and decision, <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic>, have only one significant eigenvalue, the one for time, <italic>C<sub>t</sub></italic>, features three eigenvalues above the noise floor. <bold>(D)</bold> The confusion matrix displays the percentage of variance captured by the different components (<italic>y</italic>-axis) with respect to the different marginalized covariance matrices (<italic>x</italic>-axis). The <italic>C<sub>s</sub></italic> row shows, for instance, that the first eigenvector of <italic>C<sub>s</sub></italic> captures a significant amount of variance from the <italic>C<sub>s</sub></italic> matrix (which is good), but it also captures a significant amount of variance from the <italic>C<sub>t</sub></italic> matrix (which is bad). <bold>(E)</bold> By estimating the time-dependent covariance over a time window limited to <italic>t</italic>&#x02208;[0,2], we can reduce the number of significant eigenvalues to one. <bold>(F)</bold> In turn, every row of the confusion matrix has only one entry, showing that every component captures the variance of one marginalized covariance matrix only. <bold>(G)</bold> The firing rates projected onto the components from <bold>(F)</bold>.</p></caption>
<graphic xlink:href="fncom-04-00126-g003.tif"/>
</fig>
<p>Consequently, the subspace spanned by the first three eigenvectors of <italic>C<sub>t</sub></italic> overlaps with the respective subspaces spanned by the first eigenvectors of <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic>. One way to visualize this overlap is to take the five relevant eigenvectors (three for <italic>C<sub>t</sub></italic>, one for <italic>C<sub>s</sub></italic>, and one for <italic>C<sub>d</sub></italic>) and compute how much of the variance of each marginalized covariance matrix they capture. To do so, we compute the &#x0201C;confusion matrix&#x0201D;</p>
<disp-formula id="E26"><label>(26)</label><mml:math id="m31"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mtext>u</mml:mtext><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:msub><mml:mtext>u</mml:mtext><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mtext>tr</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>This confusion matrix measures what percentage of the variance attributed to the <italic>j</italic>-th cause is captured by the <italic>i</italic>-th coordinate. For the above example, it is illustrated in Figure <xref ref-type="fig" rid="F3">3</xref>D. If more than one entry in a row of this matrix is significantly above zero, then more than one covariance matrix has significant variance along that direction of state space. Whereas the eigenvectors of the <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic> matrices do not interfere with each other, i.e., they are approximately orthogonal, the eigenvectors of the <italic>C<sub>t</sub></italic> matrix interfere with both the <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic> eigenvectors, i.e., the respective subspaces overlap. The method introduced above will still yield a result in this case; however, the new coordinate system will generally not retrieve the original components.</p>
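<p>A minimal numpy sketch of Eq. 26 (the function name and the toy matrices are ours, not part of the original analysis):</p>

```python
import numpy as np

def confusion_matrix(vectors, cov_matrices):
    """S[i, j] = u_i^T C_j u_i / tr(C_j), Eq. 26: the fraction of the
    variance attributed to the j-th cause that the i-th axis captures."""
    S = np.empty((len(vectors), len(cov_matrices)))
    for i, u in enumerate(vectors):
        for j, C in enumerate(cov_matrices):
            S[i, j] = u @ C @ u / np.trace(C)
    return S
```

<p>In the fully separable case, each row of this matrix has a single large entry; additional large entries in a row signal overlapping subspaces, as in Figure 3D.</p>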
<p>An <italic>ad hoc</italic> solution to this problem may be to search the three-dimensional eigenvector subspace of <italic>C<sub>t</sub></italic> for a direction that is orthogonal to the first eigenvectors of <italic>C<sub>s</sub></italic> and <italic>C<sub>d</sub></italic>; this direction will then correspond to the purely time-dependent component <italic>z</italic><sub>3</sub>(<italic>t</italic>). Alternatively, we could restrict the estimation of <italic>C<sub>t</sub></italic> to the time before stimulus onset, so that the covariance matrix is no longer contaminated by time-dependent variance from the stimulus- or decision-components. The rank of <italic>C<sub>t</sub></italic> then reduces to one, and the different components separate nicely (Figures <xref ref-type="fig" rid="F3">3</xref>E,F,G). While feasible in our toy scenario, these <italic>ad hoc</italic> procedures are not guaranteed to work for real data, where more dimensions are involved and more complex confusion matrices may result. However, the latter solution demonstrates that by a judicious choice of marginalized covariance matrices, one may sometimes be able to avoid such problems of non-separability.</p>
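<p>The first of these <italic>ad hoc</italic> procedures can be sketched as follows (a toy implementation; we assume that exactly three eigenvalues of <italic>C<sub>t</sub></italic> are significant, and the function name is ours):</p>

```python
import numpy as np

def purely_time_dependent_axis(Ct, Cs, Cd):
    """Search the top-3 eigenspace of Ct for a direction orthogonal to
    the leading eigenvectors of Cs and Cd (toy sketch of the ad hoc
    procedure; assumes Ct has exactly three significant eigenvalues)."""
    Et = np.linalg.eigh(Ct)[1][:, -3:]        # top-3 eigenvectors of Ct
    us = np.linalg.eigh(Cs)[1][:, -1]         # leading eigenvector of Cs
    ud = np.linalg.eigh(Cd)[1][:, -1]         # leading eigenvector of Cd
    # coordinates of us and ud within the Ct subspace
    B = Et.T @ np.column_stack([us, ud])      # shape (3, 2)
    # a null vector of B^T gives the in-subspace direction orthogonal to both
    w = np.linalg.svd(B.T)[2][-1]
    axis = Et @ w
    return axis / np.linalg.norm(axis)
```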
</sec>
<sec>
<title>Connection to blind source separation methods</title>
<p>In all of these scenarios, we assumed that the firing rates <bold>r</bold> are linear mixtures of a set of underlying sources <bold>z</bold>, each with mean 0, so that</p>
<disp-formula id="E27"><label>(27)</label><mml:math id="m32"><mml:mrow><mml:mtext>r</mml:mtext><mml:mo>=</mml:mo><mml:mi>A</mml:mi><mml:mtext>z</mml:mtext><mml:mo>+</mml:mo><mml:mtext>c</mml:mtext><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>The problem that we have been describing then consists in estimating the unknown sources, <bold>z</bold>, the unknown mixture coefficients, <italic>A</italic>, and the unknown bias parameters <bold>c</bold> from the observed data, <bold>r</bold>. Without loss of generality, we can assume that the sources are centered so that &#x03008;<italic>z</italic>&#x03009;&#x02009;&#x0003D;&#x02009;0. Ours is therefore a specific version of the much-studied blind source separation problem (see e.g., Molgedey and Schuster, <xref ref-type="bibr" rid="B20">1994</xref>; Bell and Sejnowski, <xref ref-type="bibr" rid="B2">1995</xref>). In many standard formulations of this problem, one assumes that the sources are uncorrelated, or even statistically independent, which implies that the covariance matrix of the sources, <italic>M</italic>&#x02009;&#x0003D;&#x02009;&#x03008;<bold>zz</bold><italic><sup>T</sup></italic>&#x03009;<italic><sub>t</sub></italic>, is diagonal.</p>
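<p>For concreteness, the generative model of Eq. 27 can be simulated as follows (a toy sketch; the sizes and variable names are our choices):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, T = 50, 3, 1000                      # neurons, sources, samples
z = rng.standard_normal((K, T))            # unknown sources
z -= z.mean(axis=1, keepdims=True)         # center the sources: <z> = 0
A = rng.standard_normal((N, K))            # unknown mixing coefficients
c = rng.uniform(0.0, 5.0, size=(N, 1))     # unknown biases
r = A @ z + c                              # observed firing rates, Eq. 27
M = z @ z.T / T                            # source covariance <z z^T>
```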
<p>In our case, we do not want to make this assumption, which rules out the use of many blind source separation methods, such as independent component analysis (Hyv&#x000E4;rinen et al., <xref ref-type="bibr" rid="B13">2001</xref>). On the upside, we do have additional information, in the form of <italic>n</italic> task parameters, that provide indirect clues toward the underlying sources. More specifically, we assume that the sources are of the form <italic>z<sub>k</sub></italic>(<italic>t</italic>,&#x003B8;<italic><sub>k</sub></italic>) where &#x003B8;<italic><sub>k</sub></italic> denotes a single task parameter, or a specific combination of task parameters. For each task parameter, we can estimate the marginalized covariance matrix <italic>C<sub>i</sub></italic>, which in turn is given by <italic>C<sub>i</sub></italic>&#x02009;&#x0003D;&#x02009;<italic>AM<sub>i</sub>A<sup>T</sup></italic> with</p>
<disp-formula id="E28"><label>(28)</label><mml:math id="m33"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo> <mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>z</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mtext>z</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>z</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mtext>z</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>T</mml:mi></mml:msup></mml:mrow> <mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></disp-formula>
<p>As long as different task parameters are distributed over different components, the matrix <italic>M<sub>i</sub></italic> will be block-diagonal. In the most general case, however, as discussed above, this will not be true. If one parameter is shared among several components, then the respective marginalized covariance matrix will capture variance from all of these components, and maximization of Eq. <xref ref-type="disp-formula" rid="E24">24</xref> will not necessarily retrieve the original components. Future work may show how this general, semi-blind source separation problem can be solved by using knowledge about the structure of the marginalized <italic>M</italic>-matrices. For now, we suggest that in many practical scenarios, a judicious choice of covariance measurements, for instance by focusing on particular time intervals of a task, may help to reduce the problem, at least partly, to one that is completely separable, as in Eq. <xref ref-type="disp-formula" rid="E22">22</xref>.</p>
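<p>In code, a marginalized covariance matrix of this kind can be estimated from a dense data array as follows (a sketch; the array layout and the helper name are our assumptions):</p>

```python
import numpy as np

def marginalized_cov(R, axis):
    """Marginalized covariance in the spirit of Eq. 28: subtract the
    average over one task-parameter axis, then average the outer
    products over all remaining indices.

    R    : array of shape (neurons, time, stimuli, decisions)
    axis : the parameter axis whose variance we want to isolate
    """
    dR = R - R.mean(axis=axis, keepdims=True)  # deviation from the axis-average
    X = dR.reshape(R.shape[0], -1)             # flatten all conditions
    return X @ X.T / X.shape[1]
```

<p>If a neuron's rate does not vary along the chosen axis, it contributes nothing to the corresponding marginalized covariance.</p>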
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>In this article, we addressed the problem of analyzing neural recordings with strong response heterogeneity. A key problem for these data sets is the difficulty of visualizing the neural activities at the population level. Simply parsing through individual neural responses is often not sufficient, hence the quest for methods that provide a useful and interpretable summary of the population response.</p>
<p>To provide such a summary, we made one crucial assumption. We assumed that the heterogeneity of neural responses is caused by a simple mixing procedure in which the firing rates of individual neurons are random, linear combinations of a few fundamental components. We believe that such a scenario is likely to be responsible for at least part of the observed response diversity. Higher-level areas of the brain are known to integrate and process information from many other areas. The presumed fundamental components could be given by the inputs and outputs of these areas. If such components are mixed at random at the level of single cells, then upstream or downstream areas can access the relevant information with simple linear and orthogonal read-outs. Such linear population read-outs have long been known to work quite well in various neural systems (Seung and Sompolinsky, <xref ref-type="bibr" rid="B30">1993</xref>; Salinas and Abbott, <xref ref-type="bibr" rid="B28">1994</xref>).</p>
<p>To retrieve the components from recorded neural activity, and thereby at least partly reduce the response heterogeneity, we suggest estimating the covariances in the firing rates that can be attributed to the experimentally controlled, external task parameters. Using these marginalized covariance matrices, we showed how to construct an orthogonal coordinate system such that individual coordinates capture the main aspects of the task-related neural activities and the coordinate system as a whole captures all aspects of the neural activities. In the new coordinate system, firing rate variance due to different task parameters is projected onto orthogonal coordinates, making visualization and interpretation of the data particularly easy. We note, though, that the existence of a useful, orthogonal coordinate system is not guaranteed by the method, but can only be a feature of the data. Our method will generally not return useful results if the mixing is linear but not orthogonal, or if it is non-linear. Nonetheless, the case of non-orthogonal, linear mixing may still be investigated through separate PCAs on the different marginalized covariance matrices.</p>
<p>Other methods exist that address similar goals. Most prominently, application of canonical correlation analysis (CCA) to the type of data discussed here would also construct a coordinate system whose choice is influenced by knowledge about the task structure. In our context, CCA would seek a coordinate axis in the state space of neural responses and a coordinate axis in the space of task parameters, such that the correlation between the two is maximized. Whether this method would yield a useful, i.e., interpretable, coordinate system for real data sets remains open to investigation. CCA has recently been proposed as a method to construct population responses in sensory systems (Macke et al., <xref ref-type="bibr" rid="B18">2008</xref>) and as a way to correlate electrophysiological with fMRI data (Biessmann et al., <xref ref-type="bibr" rid="B3">2009</xref>).</p>
<p>Further extensions and generalizations of PCA exist, some of which are specifically targeted to the type of data we have discussed here. The work of Yu et al. (<xref ref-type="bibr" rid="B34">2009</xref>), for instance, explicitly addresses the problems that are incurred by estimating firing rates prior to the dimensionality reduction. They show how to combine these two separate steps into a single one using the theory of Gaussian processes. Their work is therefore complementary to ours, and could potentially be incorporated into the methodology introduced here.</p>
<p>Methods to summarize population activity have been employed in many different neurophysiological settings (Friedrich and Laurent, <xref ref-type="bibr" rid="B10">2001</xref>; Stopfer et al., <xref ref-type="bibr" rid="B31">2003</xref>; Paz et al., <xref ref-type="bibr" rid="B24">2005</xref>; Narayanan and Laubach, <xref ref-type="bibr" rid="B21">2009</xref>; Yu et al., <xref ref-type="bibr" rid="B34">2009</xref>). Our main aim here was to modify these methods such that experimentally controlled parameters are taken into account and influence the construction of a new coordinate system. A first application of this method to neural responses from the prefrontal cortex revealed new aspects of a previously studied data set (Machens et al., <xref ref-type="bibr" rid="B17">2010</xref>). Many other data sets with strong response heterogeneity may be amenable to a similar analysis.</p>
</sec>
<sec>
<title>Conflict of Interest Statement</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<app-group>
<app id="A1">
<title>Appendix</title>
<sec>
<title>Maximization for two covariance measurements</title>
<p>Assume that our goal is to separate the state space into two mutually orthogonal subspaces, such that most of the variance measured by <italic>C</italic><sub>1</sub> falls into one subspace, and most of the variance measured by <italic>C</italic><sub>2</sub> into the orthogonal subspace. To do so, we define a matrix <italic>U</italic><sub>1</sub> whose columns contain a set of vectors <bold>u</bold><italic><sub>i</sub></italic> with <italic>i</italic>&#x02009;&#x0003D;&#x02009;1,&#x02026;,<italic>M</italic>, and a matrix <italic>U</italic><sub>2</sub> whose columns contain a set of vectors <bold>u</bold><italic><sub>i</sub></italic> with <italic>i</italic>&#x02009;&#x0003D;&#x02009;<italic>M</italic>&#x02009;&#x0002B;&#x02009;1,&#x02026;,<italic>N</italic>. All vectors are mutually orthonormal, so that <inline-formula><mml:math id="m34"><mml:mrow><mml:msubsup><mml:mtext>u</mml:mtext><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mtext>u</mml:mtext><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mo>&#x003B4;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. Our goal will then be to maximize</p>
<disp-formula id="E29"><label>(29)</label><mml:math id="m35"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>U</mml:mi><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msub><mml:mi>U</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>U</mml:mi><mml:mn>2</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:msub><mml:mi>U</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>The orthogonality constraint is given by the condition <inline-formula><mml:math id="m36"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mn>2</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mi>I</mml:mi></mml:mrow></mml:math></inline-formula>. By the rules of traces, and using this constraint, we obtain</p>
<disp-formula id="E30"><mml:math id="m37"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mn>2</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>=</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>=</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:msubsup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mtext>tr</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The last line is maximized if the matrix <italic>U</italic><sub>1</sub> contains all the eigenvectors that correspond to the positive eigenvalues of <italic>C</italic><sub>1</sub>&#x02009;&#x02212;&#x02009;<italic>C</italic><sub>2</sub>. Consequently, the matrix <italic>U</italic><sub>2</sub> will contain all the eigenvectors corresponding to the negative eigenvalues of <italic>C</italic><sub>1</sub>&#x02009;&#x02212;&#x02009;<italic>C</italic><sub>2</sub>. The extremal eigenvalues of the difference matrix, i.e., the largest and the smallest, correspond to the two eigenvectors that capture most of the variance in <italic>C</italic><sub>1</sub> and <italic>C</italic><sub>2</sub> under the given trade-off.</p>
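<p>This two-matrix solution thus reduces to a single eigendecomposition of the difference matrix (a minimal numpy sketch, with our naming; zero eigenvalues are assigned to <italic>U</italic><sub>2</sub> here, an arbitrary choice):</p>

```python
import numpy as np

def split_subspaces(C1, C2):
    """Split state space by the sign of the eigenvalues of C1 - C2:
    U1 collects eigenvectors with positive eigenvalues (C1 dominates),
    U2 those with negative or zero eigenvalues (C2 dominates)."""
    evals, evecs = np.linalg.eigh(C1 - C2)
    return evecs[:, evals > 0], evecs[:, evals <= 0]
```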
</sec>
<sec>
<title>Additive noise does not affect the maximum</title>
<p>To study the maximization problem in the presence of additive noise, we assume <italic>n</italic> covariance measurements so that</p>
<disp-formula id="E31"><label>(30)</label><mml:math id="m38"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>S<sub>i</sub></italic> is the signal part and <italic>H</italic> the noise part of the covariance matrix. Since the noise acts additively on the firing rates, every covariance measurement is polluted with the same noise term, <italic>H</italic> (compare Eq. <xref ref-type="disp-formula" rid="E23">23</xref>). When maximizing Eq. <xref ref-type="disp-formula" rid="E24">24</xref> with respect to an orthogonal transform, <italic>U</italic>&#x02009;&#x0003D;&#x02009;[<italic>U</italic><sub>1</sub>,&#x02026;,<italic>U<sub>n</sub></italic>], we will then target only the signal part of the covariance matrices, but not the noise part. To see this, we note that</p>
<disp-formula id="E32"><label>(31)</label><mml:math id="m39"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<disp-formula id="E33"><label>(32)</label><mml:math id="m40"><mml:mrow><mml:mo>=</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="E34"><label>(33)</label><mml:math id="m41"><mml:mrow><mml:mo>=</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup></mml:mrow></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="E35"><label>(34)</label><mml:math id="m42"><mml:mrow><mml:mo>=</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="E36"><label>(35)</label><mml:math id="m43"><mml:mrow><mml:mo>=</mml:mo><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mtext>tr</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>H</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Accordingly, the projection operators, <inline-formula><mml:math id="m44"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>, which project the variance into the relevant subspaces, target the difference of covariance matrices, <italic>C<sub>i</sub></italic>&#x02009;&#x02212;&#x02009;<italic>C<sub>n</sub></italic>, so that the noise drops out, since <italic>C<sub>i</sub></italic>&#x02009;&#x02212;&#x02009;<italic>C<sub>n</sub></italic>&#x02009;&#x0003D;&#x02009;<italic>S<sub>i</sub></italic>&#x02009;&#x02212;&#x02009;<italic>S<sub>n</sub></italic>.</p>
</sec>
<sec>
<title>Maximization for <italic>N</italic> covariance measurements</title>
<p>Maximization of Eq. <xref ref-type="disp-formula" rid="E24">24</xref>,</p>
<disp-formula id="E37"><label>(36)</label><mml:math id="m45"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02003;</mml:mo><mml:mtext>subject</mml:mtext><mml:mo>&#x02009;</mml:mo><mml:mtext>to</mml:mtext><mml:mo>&#x02003;</mml:mo><mml:mi>U</mml:mi><mml:msup><mml:mi>U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>is a quadratic optimization problem under quadratic constraints, which can be solved numerically with standard methods. A specific method to solve a related problem has been proposed in Bolla et al. (<xref ref-type="bibr" rid="B4">1998</xref>). Here, we present an algorithm based on a simple gradient ascent.</p>
<p>First, we need an initial guess for the <italic>U<sub>i</sub></italic>. We suggest using the first principal axis (the eigenvector with the largest eigenvalue) of each marginalized covariance matrix <italic>C<sub>i</sub></italic>. This procedure, however, will generally yield a set of matrices <italic>U<sub>i</sub></italic> that are not mutually orthogonal. To orthogonalize them, one can use the method of symmetric orthogonalization. Given the initial guess for the matrix, <italic>U</italic>&#x02009;&#x0003D;&#x02009;[<italic>U</italic><sub>1</sub>,&#x02026;,<italic>U<sub>n</sub></italic>], the transform</p>
<disp-formula id="E38"><label>(37)</label><mml:math id="m46"><mml:mrow><mml:mi>U</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>U</mml:mi><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>U</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></disp-formula>
<p>will yield a matrix with mutually orthogonal columns so that <italic>U<sup>T</sup>U</italic>&#x02009;&#x0003D;&#x02009;<italic>I</italic>. We will use this matrix <italic>U</italic> as our initial guess for the gradient ascent.</p>
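<p>A direct transcription of Eq. 37 (a sketch; the inverse square root is formed from the eigendecomposition of the Gram matrix <italic>U<sup>T</sup>U</italic>):</p>

```python
import numpy as np

def sym_orth(U):
    """Symmetric (Loewdin) orthogonalization, Eq. 37:
    U -> U (U^T U)^(-1/2), yielding orthonormal columns."""
    w, V = np.linalg.eigh(U.T @ U)             # Gram matrix eigendecomposition
    return U @ V @ np.diag(w ** -0.5) @ V.T    # apply (U^T U)^(-1/2)
```

<p>Unlike Gram&#x02013;Schmidt, this treats all columns symmetrically, so no single initial axis is privileged.</p>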
<p>Next, let us define the matrix <italic>Q<sub>i</sub></italic> as an <italic>n</italic>&#x02009;&#x000D7;&#x02009;<italic>n</italic> matrix of zeros in which only the entry in the <italic>i</italic>-th column and <italic>i</italic>-th row is 1. The maximization over the captured variances, Eq. <xref ref-type="disp-formula" rid="E37">36</xref>, can then be rewritten as</p>
<disp-formula id="E39"><label>(38)</label><mml:math id="m47"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:mtext>tr</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi>U</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02003;</mml:mo><mml:mtext>subject</mml:mtext><mml:mo>&#x02009;</mml:mo><mml:mtext>to</mml:mtext><mml:mo>&#x02003;</mml:mo><mml:msup><mml:mi>U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>U</mml:mi><mml:mo>=</mml:mo><mml:mi>I</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>which allows us to compactly write the matrix derivative of <italic>L</italic> as</p>
<disp-formula id="E40"><label>(39)</label><mml:math id="m48"><mml:mrow><mml:mfrac><mml:mrow><mml:mo>&#x02202;</mml:mo><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x02202;</mml:mo><mml:mi>U</mml:mi></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi>U</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>Hence, to maximize <italic>L</italic> on the manifold of orthogonal matrices, <italic>U</italic>, we need to iterate the equations,</p>
<disp-formula id="E41"><label>(40)</label><mml:math id="m49"><mml:mrow><mml:mi>U</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>U</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mfrac><mml:mrow><mml:mo>&#x02202;</mml:mo><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x02202;</mml:mo><mml:mi>U</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="E42"><label>(41)</label><mml:math id="m50"><mml:mrow><mml:mi>U</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>U</mml:mi><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>U</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>U</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where the first equation takes a step toward the maximum, with a step length set by the learning rate &#x003B1;, and the second equation projects <italic>U</italic> back onto the manifold of orthogonal matrices.</p>
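The iteration above can be sketched as a short NumPy routine: the gradient step of Eq. 40 uses the derivative from Eq. 39 (column <italic>i</italic> of the gradient is <italic>C<sub>i</sub>u<sub>i</sub></italic>, since <italic>Q<sub>i</sub></italic> selects the <italic>i</italic>-th column), and the projection of Eq. 41 is the symmetric orthogonalization also used for the initial guess in Eq. 37. The function names, the fixed learning rate, and the fixed iteration count are illustrative assumptions; in practice one may anneal &#x003B1; or monitor the convergence of <italic>L</italic>.

```python
import numpy as np

def sym_orth(U):
    # Project U onto the manifold of matrices with orthonormal columns:
    # U -> U (U^T U)^{-1/2}, via the eigendecomposition of the small
    # n x n matrix U^T U (Eqs. 37 and 41).
    w, V = np.linalg.eigh(U.T @ U)
    return U @ (V @ np.diag(w ** -0.5) @ V.T)

def demix(Cs, alpha=0.01, n_iter=2000, rng=None):
    # Gradient ascent on L = sum_i tr(U^T C_i U Q_i) subject to
    # U^T U = I (Eqs. 38-40). Cs is a list of n symmetric N x N
    # covariance matrices; column i of U captures the variance of C_i.
    rng = np.random.default_rng(rng)
    N, n = Cs[0].shape[0], len(Cs)
    U = sym_orth(rng.standard_normal((N, n)))   # random orthonormal start
    for _ in range(n_iter):
        # dL/dU = sum_i C_i U Q_i: column i of the gradient is C_i u_i
        grad = np.column_stack([C @ U[:, i] for i, C in enumerate(Cs)])
        U = sym_orth(U + alpha * grad)          # step, then re-orthogonalize
    return U
```

As a sanity check, with two covariances whose leading eigenvectors are distinct coordinate axes, the iteration recovers those axes (up to sign) while keeping the columns of <italic>U</italic> exactly orthonormal.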
</sec>
</app>
</app-group>
<ack><p>I thank Claudia Feierstein, Naoshige Uchida, and Ranulfo Romo for access to their multi-electrode data, which have been the main source of inspiration for the present work. I furthermore thank Carlos Brody, Matthias Bethge, Claudia Feierstein, and Thomas Schatz for helpful discussions during the various stages of this project. My work is supported by an Emmy Noether grant from the Deutsche Forschungsgemeinschaft and a Chaire d&#x00027;excellence grant from the Agence Nationale de la Recherche.</p>
</ack>
</back>
</article>