<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title>Frontiers in Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnins.2016.00619</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Interpretability of Multivariate Brain Maps in Linear Brain Decoding: Definition, and Heuristic Quantification in Multivariate Analysis of MEG Time-Locked Effects</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Kia</surname> <given-names>Seyed Mostafa</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/378424/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Vega Pons</surname> <given-names>Sandro</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/354592/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Weisz</surname> <given-names>Nathan</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/3120/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Passerini</surname> <given-names>Andrea</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Information Engineering and Computer Science, University of Trento</institution> <country>Trento, Italy</country></aff>
<aff id="aff2"><sup>2</sup><institution>Fondazione Bruno Kessler</institution> <country>Trento, Italy</country></aff>
<aff id="aff3"><sup>3</sup><institution>Pattern Analysis and Computer Vision, Istituto Italiano di Tecnologia</institution> <country>Genova, Italy</country></aff>
<aff id="aff4"><sup>4</sup><institution>Division of Physiological Psychology, Centre for Cognitive Neuroscience, University of Salzburg</institution> <country>Salzburg, Austria</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Alexandre Gramfort, Universit&#x000E9; Paris-Saclay (CNRS), France</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Denis A. Engemann, Unicog (CEA/INSERM) and ICM, France; Mads Jensen, Aarhus University, Denmark</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Seyed Mostafa Kia <email>seyedmostafa.kia&#x00040;unitn.it</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Brain Imaging Methods, a section of the journal Frontiers in Neuroscience</p></fn></author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>01</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>10</volume>
<elocation-id>619</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>09</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>27</day>
<month>12</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017 Kia, Vega Pons, Weisz and Passerini.</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>Kia, Vega Pons, Weisz and Passerini</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Brain decoding is a popular multivariate approach for hypothesis testing in neuroimaging. Linear classifiers are widely employed in the brain decoding paradigm to discriminate among experimental conditions. Then, the derived linear weights are visualized in the form of multivariate brain maps to further study spatio-temporal patterns of underlying neural activities. It is well known that the brain maps derived from weights of linear classifiers are hard to interpret because of high correlations between predictors, low signal-to-noise ratios, and the high dimensionality of neuroimaging data. Therefore, improving the interpretability of brain decoding approaches is of primary interest in many neuroimaging studies. Despite extensive studies of this type, at present, there is no formal definition for interpretability of multivariate brain maps. As a consequence, there is no quantitative measure for evaluating the interpretability of different brain decoding methods. In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness. Second, as an application of the proposed definition, we exemplify a heuristic for approximating the interpretability in multivariate analysis of evoked magnetoencephalography (MEG) responses. Third, we propose to combine the approximated interpretability and the generalization performance of the brain decoding into a new multi-objective criterion for model selection. Our results, on both simulated and real MEG data, show that optimizing the hyper-parameters of the regularized linear classifier based on the proposed criterion results in more informative multivariate brain maps. More importantly, the presented definition provides the theoretical background for quantitative evaluation of interpretability, and hence, facilitates the development of more effective brain decoding algorithms in the future.</p></abstract>
<kwd-group>
<kwd>brain decoding</kwd>
<kwd>brain mapping</kwd>
<kwd>interpretation</kwd>
<kwd>model selection</kwd>
<kwd>MEG</kwd>
</kwd-group>
<counts>
<fig-count count="11"/>
<table-count count="3"/>
<equation-count count="29"/>
<ref-count count="105"/>
<page-count count="22"/>
<word-count count="15863"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Understanding the mechanisms of the brain has been a crucial topic throughout the history of science. Ancient Greek philosophers envisaged different functionalities for the brain ranging from cooling the body to acting as the seat of the rational soul and the center of sensation (Crivellato and Ribatti, <xref ref-type="bibr" rid="B21">2007</xref>). Modern cognitive science, emerging in the twentieth century, provides better insight into the brain&#x00027;s functionality. In cognitive science, researchers usually analyze recorded brain activity and behavioral parameters to discover the answers to <italic>where, when</italic>, and <italic>how</italic> a brain region participates in a particular cognitive process.</p>
<p>To answer the key questions in cognitive science, scientists often employ mass-univariate hypothesis testing methods to test scientific hypotheses on a large set of independent variables (Groppe et al., <xref ref-type="bibr" rid="B30">2011a</xref>; Maris, <xref ref-type="bibr" rid="B56">2012</xref>). Mass-univariate hypothesis testing is based on performing multiple tests, e.g., <italic>t</italic>-tests, one for each unit of the neuroimaging data, i.e., independent variables. The high spatial and temporal granularity of the univariate tests provides a fair level of interpretability. On the down side, the high dimensionality of neuroimaging data requires a large number of tests, which reduces the sensitivity of these methods after multiple comparison correction (Bzdok et al., <xref ref-type="bibr" rid="B15">2016</xref>). Although techniques such as the non-parametric cluster-based permutation test (Bullmore et al., <xref ref-type="bibr" rid="B14">1996</xref>; Maris and Oostenveld, <xref ref-type="bibr" rid="B57">2007</xref>), by exerting weak rather than strong control of the family-wise error rate, offer more sensitivity, they still experience low sensitivity to brain activities that are narrowly distributed in time and space (Groppe et al., <xref ref-type="bibr" rid="B30">2011a</xref>,<xref ref-type="bibr" rid="B31">b</xref>). The multivariate counterpart of mass-univariate analysis, known generally as multivariate pattern analysis, has the potential to overcome these deficits. Multivariate approaches are capable of identifying complex spatio-temporal interactions between different brain areas with higher sensitivity and specificity than univariate analysis (van Gerven et al., <xref ref-type="bibr" rid="B86">2009</xref>), especially in group analysis of neuroimaging data (Davis et al., <xref ref-type="bibr" rid="B24">2014</xref>).</p>
<p><italic>Brain decoding</italic> (Haynes and Rees, <xref ref-type="bibr" rid="B42">2006</xref>) is a multivariate technique that delivers a model to predict the mental state of a human subject based on the recorded brain signal. There are two potential applications for brain decoding: (1) brain-computer interfaces (BCIs) (Wolpaw et al., <xref ref-type="bibr" rid="B96">2002</xref>), and (2) multivariate hypothesis testing (Bzdok et al., <xref ref-type="bibr" rid="B15">2016</xref>). In the first case, a brain decoder with maximum prediction power is desired. In the second case, in addition to the prediction power, extra information on the spatio-temporal nature of a cognitive process is desired. In this study, we are interested in the second application of brain decoding, which can be considered a multivariate alternative to mass-univariate hypothesis testing. Further, we mainly focus on linear brain decoding because of its wider usage in analyzing the inherently small-sample-size, high-dimensional neuroimaging data, compared to complex (Cox and Savoy, <xref ref-type="bibr" rid="B20">2003</xref>; LaConte et al., <xref ref-type="bibr" rid="B51">2005</xref>) and non-transparent (Lipton et al., <xref ref-type="bibr" rid="B55">2016</xref>) non-linear models.</p>
<p>In linear brain decoding, linear classifiers are used to assess the relation between independent variables, i.e., features, and dependent variables, i.e., cognitive tasks (Besserve et al., <xref ref-type="bibr" rid="B8">2007</xref>; Pereira et al., <xref ref-type="bibr" rid="B69">2009</xref>; Lemm et al., <xref ref-type="bibr" rid="B53">2011</xref>). This assessment is performed by solving an optimization problem that assigns weights to each independent variable. Currently, brain decoding is the gold standard in multivariate analysis for functional magnetic resonance imaging (fMRI) (Haxby et al., <xref ref-type="bibr" rid="B40">2001</xref>; Cox and Savoy, <xref ref-type="bibr" rid="B20">2003</xref>; Mitchell et al., <xref ref-type="bibr" rid="B60">2004</xref>; Norman et al., <xref ref-type="bibr" rid="B65">2006</xref>) and magnetoencephalogram/electroencephalogram (MEEG) studies (Parra et al., <xref ref-type="bibr" rid="B68">2003</xref>; Rieger et al., <xref ref-type="bibr" rid="B71">2008</xref>; Carroll et al., <xref ref-type="bibr" rid="B17">2009</xref>; Chan et al., <xref ref-type="bibr" rid="B18">2011</xref>; Huttunen et al., <xref ref-type="bibr" rid="B44">2013</xref>; Vidaurre et al., <xref ref-type="bibr" rid="B93">2013</xref>; Abadi et al., <xref ref-type="bibr" rid="B1">2015</xref>). It has been shown that brain decoding can be used in combination with brain encoding (Naselaris et al., <xref ref-type="bibr" rid="B64">2011</xref>) to infer the causal relationship between stimuli and responses (Weichwald et al., <xref ref-type="bibr" rid="B95">2015</xref>).</p>
<p>In <italic>brain mapping</italic> (Kriegeskorte et al., <xref ref-type="bibr" rid="B49">2006</xref>), the pre-computed quantities, e.g., univariate statistics or weights of a linear classifier, are assigned to the spatio-temporal representation of neuroimaging data in order to reveal functionally specialized brain regions that are activated by a certain cognitive task. In its multivariate form, brain mapping uses the learned parameters from brain decoding to produce brain maps, in which the engagement of different brain areas in a cognitive task is visualized. Intuitively, the interpretability of a brain decoder refers to the level of information that can be reliably derived by an expert from the resulting maps. From the cognitive neuroscience perspective, a brain map is considered <italic>interpretable</italic> if it enables a scientist to answer three key questions: &#x0201C;<italic>where, when</italic>, and <italic>how</italic> does a brain region contribute to a cognitive function?&#x0201D;</p>
<p>In fact, a classifier only answers the question of <italic>what</italic> is the most likely label of a given unseen sample (Baehrens et al., <xref ref-type="bibr" rid="B6">2010</xref>). This fact is generally known as knowledge extraction gap (Vellido et al., <xref ref-type="bibr" rid="B92">2012</xref>) in the machine learning context. Thus far, many efforts have been devoted to filling the knowledge extraction gap of linear and non-linear data modeling methods in different areas such as computer vision (Bach et al., <xref ref-type="bibr" rid="B5">2015</xref>), signal processing (Montavon et al., <xref ref-type="bibr" rid="B61">2013</xref>), chemometrics (Yu et al., <xref ref-type="bibr" rid="B102">2015</xref>), bioinformatics (Hansen et al., <xref ref-type="bibr" rid="B35">2011</xref>), and neuroinformatics (Haufe et al., <xref ref-type="bibr" rid="B38">2013</xref>). In the context of neuroimaging, the knowledge extraction gap in classification is generally known as the interpretation problem (Sabuncu, <xref ref-type="bibr" rid="B74">2014</xref>; Haynes, <xref ref-type="bibr" rid="B41">2015</xref>; Naselaris and Kay, <xref ref-type="bibr" rid="B63">2015</xref>). Therefore, improving the interpretability of linear brain decoding and associated brain maps is a primary goal in the brain imaging literature (Strother et al., <xref ref-type="bibr" rid="B77">2014</xref>). 
The lack of interpretability of multivariate brain maps is a direct consequence of low signal-to-noise ratios (SNRs), high dimensionality of whole-scalp recordings, high correlations among different dimensions of data, and cross-subject variability (Besserve et al., <xref ref-type="bibr" rid="B8">2007</xref>; Anderson et al., <xref ref-type="bibr" rid="B4">2011</xref>; Blankertz et al., <xref ref-type="bibr" rid="B10">2011</xref>; Brodersen et al., <xref ref-type="bibr" rid="B13">2011</xref>; Langs et al., <xref ref-type="bibr" rid="B52">2011</xref>; Lemm et al., <xref ref-type="bibr" rid="B53">2011</xref>; Varoquaux et al., <xref ref-type="bibr" rid="B88">2012</xref>; Kauppi et al., <xref ref-type="bibr" rid="B46">2013</xref>; Haufe et al., <xref ref-type="bibr" rid="B37">2014a</xref>; Olivetti et al., <xref ref-type="bibr" rid="B66">2014</xref>; Taulu et al., <xref ref-type="bibr" rid="B78">2014</xref>; Varoquaux and Thirion, <xref ref-type="bibr" rid="B91">2014</xref>; Haynes, <xref ref-type="bibr" rid="B41">2015</xref>; Wang et al., <xref ref-type="bibr" rid="B94">2015</xref>). At present, two main approaches are proposed to enhance the interpretability of multivariate brain maps: (1) introducing new metrics into the model selection procedure, and (2) introducing new hybrid penalty terms for regularization.</p>
<p>The first approach to improving the interpretability of brain decoding concentrates on the model selection procedure. Model selection is a procedure in which the best values for the hyper-parameters of a model are determined (Lemm et al., <xref ref-type="bibr" rid="B53">2011</xref>). The selection process is generally performed by considering the generalization performance, i.e., the accuracy, of a model as the decisive criterion. Rasmussen et al. (<xref ref-type="bibr" rid="B70">2012</xref>) showed that there is a trade-off between the spatial reproducibility and the prediction accuracy of a classifier; therefore, the reliability of maps cannot be assessed merely by focusing on their prediction accuracy. To utilize this finding, they incorporated the spatial reproducibility of brain maps into the model selection procedure. An analogous approach, using a different definition of spatial reproducibility, is proposed by Conroy et al. (<xref ref-type="bibr" rid="B19">2013</xref>). Besides spatial reproducibility, the stability of classifiers (Bousquet and Elisseeff, <xref ref-type="bibr" rid="B11">2002</xref>) is another criterion used in combination with generalization performance to enhance the interpretability. For example, Yu (<xref ref-type="bibr" rid="B101">2013</xref>) and Lim and Yu (<xref ref-type="bibr" rid="B54">2016</xref>) showed that incorporating the stability of models into cross-validation improves the interpretability of the parameters estimated by linear models.</p>
<p>The second approach to improving the interpretability of brain decoding focuses on the underlying mechanism of regularization. The main idea behind this approach is two-fold: (1) customizing the regularization terms to address the ill-posed nature of brain decoding problems (where the number of samples is much less than the number of features; M&#x000F8;rch et al., <xref ref-type="bibr" rid="B62">1997</xref>; Varoquaux and Thirion, <xref ref-type="bibr" rid="B91">2014</xref>), and (2) combining the structural and functional prior knowledge with the decoding process so as to enhance the neurophysiological plausibility of the models. Group Lasso (Yuan and Lin, <xref ref-type="bibr" rid="B103">2006</xref>) and the total-variation penalty (Tibshirani et al., <xref ref-type="bibr" rid="B81">2005</xref>) are two effective methods using this technique (Rish et al., <xref ref-type="bibr" rid="B72">2014</xref>; Xing et al., <xref ref-type="bibr" rid="B99">2014</xref>). Sparse penalized discriminant analysis (Grosenick et al., <xref ref-type="bibr" rid="B32">2008</xref>), group-wise regularization (van Gerven et al., <xref ref-type="bibr" rid="B86">2009</xref>), smoothed-sparse logistic regression (de Brecht and Yamagishi, <xref ref-type="bibr" rid="B25">2012</xref>), total-variation &#x02113;<sub>1</sub> penalization (Michel et al., <xref ref-type="bibr" rid="B59">2011</xref>; Gramfort et al., <xref ref-type="bibr" rid="B28">2013</xref>), the graph-constrained elastic-net (Grosenick et al., <xref ref-type="bibr" rid="B33">2009</xref>, <xref ref-type="bibr" rid="B34">2013</xref>), and social-sparsity (Varoquaux et al., <xref ref-type="bibr" rid="B89">2016</xref>) are examples of brain decoding methods in which regularization techniques are employed to improve the interpretability of linear brain decoding models.</p>
<p>Recently, taking a new approach to the problem, Haufe et al. questioned the interpretability of weights of linear classifiers because of the contribution of noise in the decoding process (Bie&#x000DF;mann et al., <xref ref-type="bibr" rid="B9">2012</xref>; Haufe et al., <xref ref-type="bibr" rid="B38">2013</xref>, <xref ref-type="bibr" rid="B39">2014b</xref>). To address this problem, they proposed a procedure to convert the linear brain decoding models into their equivalent generative models. Their experiments on the simulated and fMRI/EEG data illustrate that, whereas the direct interpretation of classifier weights may cause severe misunderstanding regarding the actual underlying effect, their proposed transformation effectively provides interpretable maps. Despite the theoretical soundness, the intricate challenge of estimating the empirical covariance matrix of the small sample size neuroimaging data (Blankertz et al., <xref ref-type="bibr" rid="B10">2011</xref>) limits the practical application of this method.</p>
<p>In spite of the aforementioned efforts to improve the interpretability of brain decoding, there is still no formal definition for the interpretability of brain decoding in the literature. Therefore, the interpretability of different brain decoding methods is evaluated either qualitatively or indirectly (i.e., by means of an intermediate property). In qualitative evaluation, to show the superiority of one decoding method over another (or over a univariate map), the corresponding brain maps are compared visually in terms of smoothness, sparseness, and coherency using already known facts (see for example, Varoquaux et al., <xref ref-type="bibr" rid="B88">2012</xref>). In the second approach, important factors in interpretability such as spatio-temporal reproducibility are evaluated to indirectly assess the interpretability of results (see for example, Langs et al., <xref ref-type="bibr" rid="B52">2011</xref>; Rasmussen et al., <xref ref-type="bibr" rid="B70">2012</xref>; Conroy et al., <xref ref-type="bibr" rid="B19">2013</xref>; Kia et al., <xref ref-type="bibr" rid="B47">2016</xref>). Despite their partial effectiveness, there is no general consensus regarding the quantification of these intermediate criteria. For example, in the case of spatial reproducibility, different methods such as correlation (Rasmussen et al., <xref ref-type="bibr" rid="B70">2012</xref>; Kia et al., <xref ref-type="bibr" rid="B47">2016</xref>), dice score (Langs et al., <xref ref-type="bibr" rid="B52">2011</xref>), or parameter variability (Conroy et al., <xref ref-type="bibr" rid="B19">2013</xref>; Haufe et al., <xref ref-type="bibr" rid="B38">2013</xref>) are used for quantifying the stability of brain maps, each of which considers different aspects of local or global reproducibility.</p>
<p>With the aim of filling this gap, our contribution is three-fold: (1) Assuming that the true solution of brain decoding is available, we present a theoretical definition of interpretability. The presented definition is based simply on cosine proximity in the parameter space. Furthermore, we show that the interpretability can be decomposed into the reproducibility and the representativeness of brain maps. (2) As a proof of concept, we exemplify a practical heuristic based on event-related fields for quantifying the interpretability of brain maps in time-locked analysis of MEG data. (3) Finally, we propose combining the interpretability and the performance of the brain decoder as a new Pareto-optimal multi-objective criterion for model selection. We show experimentally, on both simulated and real data, that incorporating interpretability into the model selection procedure provides more reproducible, more neurophysiologically plausible, and (as a result) more interpretable maps. Furthermore, in comparison with a standard univariate analysis, we show that the proposed paradigm offers more sensitivity while preserving the interpretability of results.</p>
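<p>To make the cosine-proximity notion concrete, it can be sketched in a few lines of Python. This is an illustrative sketch only: the reference map passed as <monospace>theta_true</monospace> is a simulated quantity, since in practice the true solution of the decoding problem is unknown.</p>

```python
import numpy as np

def cosine_proximity(theta_hat, theta_true):
    # Cosine of the angle between an estimated weight map and a
    # (simulated) reference map, both vectors in R^p.
    # 1 means identical direction, 0 means orthogonal maps.
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta_true = np.asarray(theta_true, dtype=float)
    return float(theta_hat @ theta_true
                 / (np.linalg.norm(theta_hat) * np.linalg.norm(theta_true)))
```

<p>Because the measure depends only on direction, rescaling the estimated weights (e.g., by a different amount of regularization) leaves it unchanged.</p>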
</sec>
<sec sec-type="materials and methods" id="s2">
<title>2. Materials and methods</title>
<sec>
<title>2.1. Notation and background</title>
<p>Let <inline-formula><mml:math id="M30"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">X</mml:mi></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> be a manifold in Euclidean space that represents the input space and <inline-formula><mml:math id="M31"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">Y</mml:mi></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mi>&#x0211D;</mml:mi></mml:math></inline-formula> be the output space, where <inline-formula><mml:math id="M32"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">Y</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">X</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. Then, let <italic>S</italic> &#x0003D; {<bold>Z</bold> &#x0003D; (<bold>X</bold>, <bold>Y</bold>) &#x0007C; <italic>z</italic><sub>1</sub> &#x0003D; (<italic>x</italic><sub>1</sub>, <italic>y</italic><sub>1</sub>), &#x02026;, <italic>z</italic><sub><italic>n</italic></sub> &#x0003D; (<italic>x</italic><sub><italic>n</italic></sub>, <italic>y</italic><sub><italic>n</italic></sub>)} be a training set of <italic>n</italic> independently and identically distributed (i.i.d) samples drawn from the joint distribution of <inline-formula><mml:math id="M33"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">Z</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">X</mml:mi></mml:mrow><mml:mo>&#x000D7;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">Y</mml:mi></mml:mrow></mml:math></inline-formula> based on an unknown Borel probability measure &#x003C1;. 
In the neuroimaging context, <bold>X</bold> indicates the trials of brain recording, e.g., fMRI, MEG, or EEG signals, <bold>Y</bold> represents the experimental conditions or dependent variables, and we have &#x003A6;<sub><italic>S</italic></sub> : <bold>X</bold> &#x02192; <bold>Y</bold> (note the difference between &#x003A6;<sub><italic>S</italic></sub> and &#x003A6;<sup>&#x0002A;</sup>). The goal of brain decoding is to find the function <inline-formula><mml:math id="M34"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>:</mml:mo><mml:mstyle class="text"><mml:mtext mathvariant="bold">X</mml:mtext></mml:mstyle><mml:mo>&#x02192;</mml:mo><mml:mstyle class="text"><mml:mtext mathvariant="bold">Y</mml:mtext></mml:mstyle></mml:math></inline-formula> as an estimation of &#x003A6;<sub><italic>S</italic></sub>. From here on, we refer to <inline-formula><mml:math id="M35"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> as a brain decoding model.</p>
<p>As is a common assumption in the neuroimaging context, we assume the true solution of a brain decoding problem is among the family of linear functions <inline-formula><mml:math id="M36"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M37"><mml:msup><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:math></inline-formula>). Therefore, the aim of brain decoding reduces to finding an empirical approximation of &#x003A6;<sub><italic>S</italic></sub>, indicated by <inline-formula><mml:math id="M38"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>, among all <inline-formula><mml:math id="M39"><mml:mo>&#x003A6;</mml:mo><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:math></inline-formula>. This approximation can be obtained by estimating the predictive conditional density &#x003C1;(<bold>Y</bold> &#x0007C; <bold>X</bold>) by training a parametric model &#x003C1;(<bold>Y</bold> &#x0007C; <bold>X</bold>, &#x00398;) (i.e., a likelihood function), where &#x00398; denotes the parameters of the model. Alternatively, &#x00398; can be estimated by solving a risk minimization problem:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">argmin</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant='bold'><mml:mtext>X</mml:mtext></mml:mstyle><mml:mo>&#x00398;</mml:mo><mml:mo>,</mml:mo><mml:mstyle mathvariant='bold'><mml:mtext>Y</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003BB;</mml:mi><mml:mi>&#x003A9;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M40"><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> is the parameter of <inline-formula><mml:math id="M41"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>, <inline-formula><mml:math id="M42"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mo>:</mml:mo><mml:mstyle class="text"><mml:mtext mathvariant="bold">Z</mml:mtext></mml:mstyle><mml:mo>&#x000D7;</mml:mo><mml:mstyle class="text"><mml:mtext mathvariant="bold">Z</mml:mtext></mml:mstyle><mml:mo>&#x02192;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> is the loss function, &#x003A9; : &#x0211D;<sup><italic>p</italic></sup> &#x02192; &#x0211D;<sup>&#x0002B;</sup> is the regularization term, and &#x003BB; is a hyper-parameter that controls the amount of regularization. There are various choices for &#x003A9;, each of which reduces the hypothesis space <inline-formula><mml:math id="M43"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M44"><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02282;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:math></inline-formula> by enforcing different prior functional or structural constraints on the parameters of the linear decoding model (see for example, Tibshirani, <xref ref-type="bibr" rid="B80">1996b</xref>; Tibshirani et al., <xref ref-type="bibr" rid="B81">2005</xref>; Zou and Hastie, <xref ref-type="bibr" rid="B105">2005</xref>; Jenatton et al., <xref ref-type="bibr" rid="B45">2011</xref>). 
The amount of regularization &#x003BB; is generally decided using cross-validation or other data perturbation methods in the model selection procedure.</p>
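<p>As a minimal illustration of Equation (1), the sketch below instantiates the loss as the squared error and the regularizer &#x003A9; as the squared 2-norm (a ridge penalty), for which the minimizer has a closed form; &#x003BB; is then chosen on a held-out split. The data are synthetic, and the squared loss and hold-out scheme are illustrative choices, not the only ones used in brain decoding.</p>

```python
import numpy as np

def ridge_decoder(X, Y, lam):
    # Closed-form minimizer of ||X @ theta - Y||^2 + lam * ||theta||^2:
    # theta = (X^T X + lam * I)^{-1} X^T Y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Synthetic data: 60 trials, 8 features, a sparse true weight vector.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 8))
true_theta = np.array([1.5, -2.0, 0.0, 0.0, 0.5, 0.0, 0.0, 1.0])
Y = X @ true_theta + 0.1 * rng.standard_normal(60)

# Choose lambda by validation error on a hold-out split.
X_tr, Y_tr, X_va, Y_va = X[:40], Y[:40], X[40:], Y[40:]
errors = {lam: np.mean((X_va @ ridge_decoder(X_tr, Y_tr, lam) - Y_va) ** 2)
          for lam in (0.01, 0.1, 1.0, 10.0)}
best_lam = min(errors, key=errors.get)
```

<p>Increasing &#x003BB; shrinks the magnitude of the estimated weights, which is one reason the magnitude itself carries no interpretable information (see Section 2.1).</p>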
<p>In the neuroimaging context, the estimated parameters of a linear decoding model <inline-formula><mml:math id="M45"><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> can be used in the form of a brain map so as to visualize the discriminative neurophysiological effect. Although the magnitude of <inline-formula><mml:math id="M46"><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> (i.e., the 2-norm of <inline-formula><mml:math id="M47"><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>) is affected by the dynamic range of data and the level of regularization, it has no effect on the predictive power and the interpretability of maps. On the other hand, the direction of <inline-formula><mml:math id="M48"><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> affects the predictive power and contains information regarding the importance of and relations among predictors. This type of relational information is very useful when interpreting brain maps in which the relation between different spatio-temporal independent variables can be used to describe how different brain regions interact over time for a certain cognitive process. 
Therefore, we refer to the normalized parameter vector of a linear brain decoder in the unit hyper-sphere as a multivariate brain map (MBM); we denote it by <inline-formula><mml:math id="M49"><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:math></inline-formula> where <inline-formula><mml:math id="M50"><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mfrac><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mo>&#x00398;</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo></mml:mrow></mml:mfrac></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> (||.||<sub>2</sub> represents the 2-norm of a vector).</p>
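In code, obtaining the MBM from an estimated parameter vector is a single normalization onto the unit hyper-sphere; a minimal sketch (the function name is illustrative, not from the original):

```python
import numpy as np

def to_mbm(theta):
    """Project an estimated parameter vector onto the unit hyper-sphere:
    the MBM keeps only the direction of theta, discarding its magnitude."""
    theta = np.asarray(theta, dtype=float)
    return theta / np.linalg.norm(theta, 2)
```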
<p>As shown in Equation (1), learning occurs using the sampled data. In other words, in the learning paradigm, we attempt to minimize the loss function with respect to &#x003A6;<sub><italic>S</italic></sub> (and not &#x003A6;<sup>&#x0002A;</sup>) (Cucker and Smale, <xref ref-type="bibr" rid="B22">2002</xref>). Therefore, all of the implicit assumptions (such as linearity) regarding &#x003A6;<sup>&#x0002A;</sup> might not hold on &#x003A6;<sub><italic>S</italic></sub>, and vice versa. The <italic>irreducible error</italic> &#x003B5; is the direct consequence of sampling; it sets a lower bound on the error, where we have:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant='bold'><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant='bold'><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B5;</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The distribution of &#x003B5; dictates the type of loss function <inline-formula><mml:math id="M51"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:math></inline-formula> in Equation (1). For example, assuming a Gaussian distribution with mean 0 and variance &#x003C3;<sup>2</sup> for &#x003B5; implies the least squares loss function (Wu et al., <xref ref-type="bibr" rid="B98">2006</xref>).</p>
</sec>
<sec>
<title>2.2. Interpretability of multivariate brain maps: theoretical definition</title>
<p>In this section, we present a theoretical definition for the interpretability of linear brain decoding models and their associated MBMs. Consider a linearly separable brain decoding problem in an ideal scenario where &#x003B5; &#x0003D; 0 and <italic>rank</italic>(<bold>X</bold>) &#x0003D; <italic>p</italic>. In this case, the ideal solution of brain decoding, &#x003A6;<sup>&#x0002A;</sup>, is linear and its parameters &#x00398;<sup>&#x0002A;</sup> are <italic>unique</italic> and neurophysiologically <italic>plausible</italic> (van Ede and Maris, <xref ref-type="bibr" rid="B85">2016</xref>). The unique parameter vector &#x00398;<sup>&#x0002A;</sup> can be computed as follows:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mstyle mathvariant='bold'><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:msup><mml:mrow><mml:mstyle mathvariant='bold'><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mstyle mathvariant='bold'><mml:mtext>Y</mml:mtext></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003A3;<sub><bold>X</bold></sub> represents the covariance of <bold>X</bold>. Using &#x00398;<sup>&#x0002A;</sup> as the reference, we define the <italic>strong-interpretability</italic> of an MBM as follows:
<list list-type="simple">
<list-item><p>Definition 1. An MBM <inline-formula><mml:math id="M52"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:math></inline-formula> associated with a linear brain decoding model <inline-formula><mml:math id="M53"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> is &#x0201C;strongly-interpretable&#x0201D; if and only if <inline-formula><mml:math id="M54"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mo>&#x0221D;</mml:mo><mml:msup><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>.</p></list-item>
</list></p>
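Under the idealized assumptions of this section (the irreducible error is zero and X has full column rank), Equation (3) can be evaluated directly. A minimal sketch; `ideal_solution` is an illustrative name, not from the original:

```python
import numpy as np

def ideal_solution(X, Y):
    """Theta* = Sigma_X^{-1} X^T Y (Equation 3), assuming the covariance
    of X is invertible, i.e., rank(X) = p."""
    sigma_x = np.cov(X, rowvar=False)         # p x p covariance of predictors
    return np.linalg.solve(sigma_x, X.T @ Y)  # solve instead of explicit inverse
```

For zero-mean X with Y = X&#x00398;, the result is proportional to &#x00398;, so its direction (the quantity relevant for interpretability) matches the generating parameters.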
<p>It can be shown that, in practice, the estimated solution of a linear brain decoding problem is not strongly-interpretable because of the inherent limitations of neuroimaging data, such as uncertainty (Aggarwal and Yu, <xref ref-type="bibr" rid="B3">2009</xref>) in the input and output space (&#x003B5; &#x02260; 0), the high dimensionality of data (<italic>n</italic> &#x0226A; <italic>p</italic>), and the high correlation between predictors (<italic>rank</italic>(<bold>X</bold>) &#x0003C; <italic>p</italic>). With these limitations in mind, one can nevertheless argue that some solutions are more interpretable than others. For example, in the case in which &#x00398;<sup>&#x0002A;</sup> &#x0221D; [0, 1]<sup><italic>T</italic></sup>, a linear classifier where <inline-formula><mml:math id="M55"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mo>&#x0221D;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> can be considered more interpretable than a linear classifier where <inline-formula><mml:math id="M56"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mo>&#x0221D;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. 
This issue raises the following question:
<list list-type="simple">
<list-item><p>Problem 1. Let <italic>S</italic> be a training set of <italic>n</italic> i.i.d. samples drawn from the joint distribution of <inline-formula><mml:math id="M57"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">Z</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">X</mml:mi></mml:mrow><mml:mo>&#x000D7;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">Y</mml:mi></mml:mrow></mml:math></inline-formula>, and <italic>P</italic>(<italic>S</italic>) be the probability of drawing a certain <italic>S</italic> from <inline-formula><mml:math id="M58"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">Z</mml:mi></mml:mrow></mml:math></inline-formula>. Assume <inline-formula><mml:math id="M59"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:math></inline-formula> is the MBM of a linear brain decoding model <inline-formula><mml:math id="M60"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> on <italic>S</italic> (estimated using Equation 1 for a certain loss function <inline-formula><mml:math id="M61"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:math></inline-formula>, regularization term &#x003A9;, and hyper-parameter &#x003BB;). How can we quantify the proximity of <inline-formula><mml:math id="M62"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> to the strongly-interpretable solution of the brain decoding problem &#x003A6;<sup>&#x0002A;</sup>?</p></list-item>
</list></p>
<p>To answer this question, considering the uniqueness and the plausibility of &#x003A6;<sup>&#x0002A;</sup> as the two main characteristics that convey its strong-interpretability, we define the interpretability as follows:
<list list-type="simple">
<list-item><p>Definition 2. Let <italic>S</italic>, <italic>P</italic>(<italic>S</italic>), and <inline-formula><mml:math id="M63"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:math></inline-formula> be as defined in Problem 1. Then, let &#x003B1; be the angle between <inline-formula><mml:math id="M64"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:math></inline-formula> and <inline-formula><mml:math id="M65"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>. The &#x0201C;interpretability&#x0201D; (0 &#x02264; &#x003B7;<sub>&#x003A6;</sub> &#x02264; 1) of a linear brain decoding model <inline-formula><mml:math id="M66"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> is defined as follows:
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x1D53C;</mml:mo></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo class="qopname">cos</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p></list-item>
</list></p>
<p>In practice, only a limited number of samples are available. Therefore, perturbation techniques are used to imitate the sampling procedure. Let <italic>S</italic><sup>1</sup>, &#x02026;, <italic>S</italic><sup><italic>m</italic></sup> be <italic>m</italic> perturbed training sets drawn from <italic>S</italic> via a certain perturbation scheme such as jackknife, bootstrapping (Efron, <xref ref-type="bibr" rid="B27">1992</xref>), or cross-validation (Kohavi, <xref ref-type="bibr" rid="B48">1995</xref>). Let <inline-formula><mml:math id="M67"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> be the <italic>m</italic> MBMs estimated on the corresponding perturbed training sets, and &#x003B1;<sup><italic>j</italic></sup> (<italic>j</italic> &#x0003D; 1, &#x02026;, <italic>m</italic>) be the angle between <inline-formula><mml:math id="M68"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M69"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>. 
Then, the empirical version of Equation (4) can be rewritten as follows:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mo class="qopname">cos</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Empirically, the interpretability is the mean of the cosine similarities between &#x00398;<sup>&#x0002A;</sup> and the MBMs derived from different samplings of the training set (see Figure <xref ref-type="fig" rid="F1">1A</xref> for a schematic illustration). In addition to the fact that cosine similarity is a common method for measuring the similarity between vectors, we have another strong motivation for this choice. It can be shown that, for large values of <italic>p</italic>, the distribution of the dot product in the unit hyper-sphere, i.e., the cosine similarity, converges to a normal distribution with mean 0 and variance <inline-formula><mml:math id="M70"><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula>, i.e., <inline-formula><mml:math id="M71"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msqrt><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msqrt></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. Because this variance is small for sufficiently large <italic>p</italic>, any similarity value that is significantly larger than zero represents a meaningful similarity between two high-dimensional vectors (see Appendix 6.3 for the mathematical demonstration).</p>
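The empirical estimate in Equation (5) can be sketched as follows, assuming the m perturbed-set parameter vectors are stacked row-wise (function and argument names are illustrative):

```python
import numpy as np

def interpretability(thetas, theta_star):
    """Empirical interpretability (Equation 5): mean cosine similarity
    between Theta* and the MBMs estimated on m perturbed training sets.

    thetas: (m, p) array, one estimated parameter vector per row.
    """
    thetas = np.asarray(thetas, dtype=float)
    u = thetas / np.linalg.norm(thetas, axis=1, keepdims=True)  # unit MBMs
    v = np.asarray(theta_star, dtype=float)
    v = v / np.linalg.norm(v)                                   # unit reference
    return float(np.mean(u @ v))
```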
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Schematic illustrations of (A)</bold> interpretability (&#x003B7;<sub>&#x003A6;</sub>), <bold>(B)</bold> reproducibility (&#x003C8;<sub>&#x003A6;</sub>), and <bold>(C)</bold> representativeness (&#x003B2;<sub>&#x003A6;</sub>) of a linear decoding model in two dimensions. <bold>(D)</bold> The independent effects of the reproducibility and the representativeness of a model on its interpretability.</p></caption>
<graphic xlink:href="fnins-10-00619-g0001.tif"/>
</fig>
<p>In what follows, we demonstrate how the definition of interpretability is geometrically related to the uniqueness and plausibility characteristics of the true solution of the brain decoding problem.</p>
</sec>
<sec>
<title>2.3. Interpretability decomposition into reproducibility and representativeness</title>
<p>The trustworthiness and informativeness of decoding models provide two important motivations for improving their interpretability (Lipton et al., <xref ref-type="bibr" rid="B55">2016</xref>). The trust of a learning algorithm refers to its ability to converge to a unique solution. The informativeness, on the other hand, refers to the amount of plausible information that can be derived from a model to assist or advise a human expert. Therefore, interpretability can alternatively be quantified by assessing the uniqueness and neurophysiological plausibility of a model. In this section, we first define reproducibility and representativeness as measures for quantifying the uniqueness and neurophysiological plausibility of a brain decoding model, respectively. We then show how these definitions relate to the definition of interpretability.</p>
<p>The high dimensionality and the high correlations between variables are two inherent characteristics of neuroimaging data that negatively affect the uniqueness of the solution of a brain decoding problem. Therefore, a certain configuration of hyper-parameters may result in different estimated parameters on different portions of the data. Here, we are interested in assessing this variability as a measure of uniqueness. We first define the <italic>main multivariate brain map</italic> as follows:
<list list-type="simple">
<list-item><p>Definition 3. Let <italic>S</italic>, <italic>P</italic>(<italic>S</italic>), and <inline-formula><mml:math id="M72"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:math></inline-formula> be as defined in Problem 1. The &#x0201C;main multivariate brain map&#x0201D; <inline-formula><mml:math id="M73"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> of a linear brain decoding model <inline-formula><mml:math id="M74"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> is defined as:
<disp-formula id="E6"><label>(6)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mo>&#x1D53C;</mml:mo></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>&#x1D53C;</mml:mo></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p></list-item>
</list></p>
<p>Let <inline-formula><mml:math id="M75"><mml:msubsup><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> be the <italic>i</italic>th (<italic>i</italic> &#x0003D; 1, &#x02026;, <italic>p</italic>) element of the MBM estimated on the <italic>j</italic>th (<italic>j</italic> &#x0003D; 1, &#x02026;, <italic>m</italic>) perturbed training set. Then, <inline-formula><mml:math id="M76"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> can be estimated empirically by summing the <inline-formula><mml:math id="M77"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>s (computed on the perturbed training sets <italic>S</italic><sup><italic>j</italic></sup>) and projecting the result onto the unit hyper-sphere:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M7"><mml:mrow><mml:msup><mml:mover accent='true'><mml:mi>&#x00398;</mml:mi><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mi>&#x003BC;</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>&#x003B8;</mml:mi><mml:mn>1</mml:mn><mml:mi>j</mml:mi></mml:msubsup></mml:mrow></mml:mstyle><mml:mtext>&#x02009;</mml:mtext><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>&#x003B8;</mml:mi><mml:mn>2</mml:mn><mml:mi>j</mml:mi></mml:msubsup></mml:mrow></mml:mstyle><mml:mo>&#x02026;</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>&#x003B8;</mml:mi><mml:mi>p</mml:mi><mml:mi>j</mml:mi></mml:msubsup></mml:mrow></mml:mstyle></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mi>T</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02016;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle 
displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>&#x003B8;</mml:mi><mml:mn>1</mml:mn><mml:mi>j</mml:mi></mml:msubsup></mml:mrow></mml:mstyle><mml:mtext>&#x02009;</mml:mtext><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>&#x003B8;</mml:mi><mml:mn>2</mml:mn><mml:mi>j</mml:mi></mml:msubsup></mml:mrow></mml:mstyle><mml:mo>&#x02026;</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>&#x003B8;</mml:mi><mml:mi>p</mml:mi><mml:mi>j</mml:mi></mml:msubsup></mml:mrow></mml:mstyle></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mi>T</mml:mi></mml:msup></mml:mrow><mml:mo>&#x02016;</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
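In code, Equation (7) amounts to summing the unit-norm MBMs element-wise and renormalizing the result; a minimal sketch, assuming the m MBMs are stacked row-wise in an array (names are illustrative):

```python
import numpy as np

def main_mbm(thetas):
    """Main multivariate brain map (Equation 7): element-wise sum of the
    unit-norm MBMs, re-projected onto the unit hyper-sphere."""
    thetas = np.asarray(thetas, dtype=float)
    u = thetas / np.linalg.norm(thetas, axis=1, keepdims=True)  # unit MBMs
    s = u.sum(axis=0)                                           # element-wise sums
    return s / np.linalg.norm(s)                                # renormalize
```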
<p><inline-formula><mml:math id="M78"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> provides a reference for quantifying the reproducibility of an MBM:
<list list-type="simple">
<list-item><p>Definition 4. Let <italic>S</italic>, <italic>P</italic>(<italic>S</italic>), and <inline-formula><mml:math id="M79"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:math></inline-formula> be as defined in Problem 1, and <inline-formula><mml:math id="M80"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> be the main multivariate brain map of <inline-formula><mml:math id="M81"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>. Then, assume &#x003B1; be the angle between <inline-formula><mml:math id="M82"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M83"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. The &#x0201C;reproducibility&#x0201D; &#x003C8;<sub>&#x003A6;</sub> (0 &#x02264; &#x003C8;<sub>&#x003A6;</sub> &#x02264; 1) of a linear brain decoding model <inline-formula><mml:math id="M84"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> is defined as follows:
<disp-formula id="E8"><label>(8)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo class="qopname">cos</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p></list-item>
</list></p>
<p>Let <inline-formula><mml:math id="M85"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> be the <italic>m</italic> MBMs estimated on the corresponding perturbed training sets, and &#x003B1;<sup><italic>j</italic></sup> (<italic>j</italic> &#x0003D; 1, &#x02026;, <italic>m</italic>) be the angle between <inline-formula><mml:math id="M86"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M87"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. Then, the empirical version of Equation (8) can be rewritten as follows:</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mo class="qopname">cos</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>In fact, reproducibility quantifies the dispersion of the MBMs, computed over different perturbed training sets, around the main multivariate brain map. Figure <xref ref-type="fig" rid="F1">1B</xref> shows a schematic illustration of the reproducibility of a linear brain decoding model.</p>
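A sketch of the empirical reproducibility (Equation 9), reusing the main-MBM construction of Equation (7); function and variable names are ours, for illustration:

```python
import numpy as np

def reproducibility(thetas):
    """Empirical reproducibility (Equation 9): mean cosine similarity
    between each perturbed-set MBM and the main MBM of Equation (7).

    thetas: (m, p) array, one estimated parameter vector per row.
    """
    thetas = np.asarray(thetas, dtype=float)
    u = thetas / np.linalg.norm(thetas, axis=1, keepdims=True)  # unit MBMs
    mu = u.sum(axis=0)
    mu = mu / np.linalg.norm(mu)                                # main MBM
    return float(np.mean(u @ mu))
```

Identical maps across perturbed sets give a reproducibility of 1; dispersed maps give smaller values.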
<p>On the other hand, the similarity between the main multivariate brain map of a decoder and the true solution can be employed as a measure for the neurophysiological plausibility of a model. We refer to this similarity as the <italic>representativeness</italic> of a linear brain decoding model:
<list list-type="simple">
<list-item><p>Definition 5. Let <inline-formula><mml:math id="M88"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> be the main multivariate brain map of <inline-formula><mml:math id="M89"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>. The &#x0201C;representativeness&#x0201D; &#x003B2;<sub>&#x003A6;</sub> (0 &#x02264; &#x003B2;<sub>&#x003A6;</sub> &#x02264; 1) of a linear brain decoding model <inline-formula><mml:math id="M90"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> is defined as the cosine similarity between its main multivariate brain map (<inline-formula><mml:math id="M91"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>) and the parameters of the true solution (<inline-formula><mml:math id="M92"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>) (see Figure <xref ref-type="fig" rid="F1">1C</xref>):
<disp-formula id="E10"><label>(10)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo>|</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup><mml:mo>.</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p></list-item>
</list></p>
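As a concrete illustration, the cosine similarity in Equation (10) reduces to a few lines of code. The following is a minimal NumPy sketch; the function name and the assumption that both maps are flattened into 1-D arrays are ours, not part of the original text.

```python
import numpy as np

def representativeness(theta_mu, theta_true):
    # Eq. (10): |<theta_mu, theta_true>| / (||theta_mu||_2 * ||theta_true||_2)
    num = abs(np.dot(theta_mu, theta_true))
    return num / (np.linalg.norm(theta_mu) * np.linalg.norm(theta_true))
```

A main map parallel to the true solution yields a representativeness of 1, while an orthogonal map yields 0, matching the stated bounds 0 &#x02264; &#x003B2;<sub>&#x003A6;</sub> &#x02264; 1.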
<p>As discussed before, the notion of interpretability is tightly related to the uniqueness and plausibility, and thus to the reproducibility and representativeness, of a decoding model. The following proposition analytically shows this relationship:
<list list-type="simple">
<list-item><p>Proposition 1. &#x003B7;<sub>&#x003A6;</sub> &#x0003D; &#x003B2;<sub>&#x003A6;</sub> &#x000D7; &#x003C8;<sub>&#x003A6;</sub>.</p></list-item>
</list></p>
<p>See Appendix 6.1 for a proof. Proposition 1 indicates that the interpretability of a linear brain decoding model can be decomposed into its representativeness and reproducibility. Figure <xref ref-type="fig" rid="F1">1D</xref> illustrates how the reproducibility and the representativeness of a decoding model independently affect its interpretability. Each colored region schematically represents the span of solutions of a certain linear model (for example, with a certain configuration of its hyper-parameters) over different perturbed training sets. The area of each region visualizes the reproducibility of the corresponding model: the smaller the area, the higher the reproducibility. Furthermore, the angular distance between the centroid of each region (&#x00398;<sup>&#x003BC;</sup>) and the true solution (&#x00398;<sup>&#x0002A;</sup>) visualizes the representativeness of the corresponding model. While &#x003A6;<sub>1</sub> and &#x003A6;<sub>2</sub> have similar reproducibility, &#x003A6;<sub>2</sub> has higher interpretability than &#x003A6;<sub>1</sub> because it is more representative of the true solution. On the other hand, &#x003A6;<sub>1</sub> and &#x003A6;<sub>3</sub> have similar representativeness, but &#x003A6;<sub>3</sub> is more interpretable due to its higher reproducibility.</p>
</sec>
<sec>
<title>2.4. A heuristic for practical quantification of interpretability in time-locked analysis of MEG data</title>
<p>In practice, it is impossible to evaluate the interpretability directly, as the true solution of the brain decoding problem &#x003A6;<sup>&#x0002A;</sup> is unknown. In this study, to provide a practical proof of the theoretical concepts, we propose the contrast event-related field (cERF) as a neurophysiologically plausible heuristic for the true parameters of the linear brain decoding problem (&#x00398;<sup>&#x0002A;</sup>) in a binary MEG decoding scenario in the time domain. Due to the nature of the proposed heuristic, its application is limited to brain responses that are time-locked to the stimulus onset, i.e., evoked responses.</p>
<p>MEEG data are a mixture of several simultaneous stimulus-related and stimulus-unrelated brain activities. Assessing the electro/magneto-physiological changes that are time-locked to events of interest is a common approach in the study of MEEG data. In general, stimulus-unrelated brain activities are considered Gaussian noise with zero mean and variance &#x003C3;<sup>2</sup>. One popular approach to canceling the noise component is to average over multiple trials. The assumption is that, when the effect of interest is time-locked to the stimulus onset, the independent noise components cancel out through averaging. The average is expected to converge to the true value of the signal with a variance of <inline-formula><mml:math id="M93"><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula> (where <italic>n</italic> is the number of trials). The result of the averaging process consists of a series of positive and negative peaks occurring at fixed times relative to the event onset, generally known as the ERF in the MEG context. These component peaks reflect phasic activity associated with different aspects of cognitive processing (Rugg and Coles, <xref ref-type="bibr" rid="B73">1995</xref>)<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref>.</p>
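The &#x003C3;<sup>2</sup>/<italic>n</italic> variance reduction of trial averaging can be verified numerically. In the following sketch, the sinusoidal "evoked" component, the noise level, and the trial counts are arbitrary choices of ours made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n_trials, n_times = 1.0, 400, 100
signal = np.sin(np.linspace(0.0, 2.0 * np.pi, n_times))        # time-locked component
trials = signal + rng.normal(0.0, sigma, size=(n_trials, n_times))  # noisy single trials
erf = trials.mean(axis=0)                                      # trial average ("ERF")
residual_var = np.var(erf - signal)                            # approaches sigma**2 / n_trials
```

With 400 trials, the residual noise variance of the average is close to &#x003C3;<sup>2</sup>/400, two orders of magnitude below the single-trial noise variance.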
<p>Assume <inline-formula><mml:math id="M94"><mml:msup><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mstyle class="text"><mml:mtext mathvariant="bold">X</mml:mtext></mml:mstyle><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup><mml:mo>&#x000D7;</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M95"><mml:msup><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mstyle class="text"><mml:mtext mathvariant="bold">X</mml:mtext></mml:mstyle><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msup><mml:mo>&#x000D7;</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> be sets of positive and negative samples in a binary MEG 
decoding scenario. Then, the cERF brain map <inline-formula><mml:math id="M96"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is computed as follows:</p>
<disp-formula id="E11"><label>(11)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:munder 
class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
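Equation (11) amounts to the difference of the class-conditional trial means, scaled to unit L2 norm. This is a minimal NumPy sketch in which the function name and the layout of the data (trials as rows of a 2-D array, labels &#x000B1;1) are our own illustrative conventions.

```python
import numpy as np

def cerf_map(X, y):
    # Eq. (11): difference of the class means, scaled to unit L2 norm
    diff = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
    return diff / np.linalg.norm(diff)
```

In practice each row of `X` would be a flattened sensors-by-time MEG trial, so the resulting map can be reshaped back to the sensor/time layout for visualization.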
<p>Generally speaking, <inline-formula><mml:math id="M97"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is a contrast ERF map between two experimental conditions. Using the core theory presented in Haufe et al. (<xref ref-type="bibr" rid="B38">2013</xref>), the equivalent generative model for the solution of linear brain decoding, i.e., the activation pattern (<italic>A</italic>), is unique and we have:</p>
<disp-formula id="E12"><label>(12)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>A</mml:mi><mml:mo>&#x0221D;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mstyle mathvariant='bold'><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow></mml:msub><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Assuming <inline-formula><mml:math id="M98"><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> to be the solution of least squares in a binary decoding scenario, then the following proposition describes the relation between <inline-formula><mml:math id="M99"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and the activation pattern <italic>A</italic>:
<list list-type="simple">
<list-item><p>Proposition 2. <inline-formula><mml:math id="M100"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup><mml:mo>&#x0221D;</mml:mo><mml:mi>A</mml:mi></mml:math></inline-formula>.</p></list-item>
</list></p>
<p>See Appendix 6.2 for the proof. Proposition 2 shows that, in a binary time-domain MEG decoding scenario, the cERF is proportional to the equivalent generative model for the solution of the least-squares classifier. Furthermore, <inline-formula><mml:math id="M101"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is proportional to the t-statistic that is widely used in the univariate analysis of neuroimaging data. Using <inline-formula><mml:math id="M102"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> as a heuristic for <inline-formula><mml:math id="M103"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>, the representativeness can be approximated as follows:</p>
<disp-formula id="E13"><label>(13)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo>|</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup><mml:mo>.</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M104"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula> is an approximation of the actual representativeness &#x003B2;<sub>&#x003A6;</sub>. In a similar manner, <inline-formula><mml:math id="M105"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> can be used to heuristically approximate the interpretability as follows:</p>
<disp-formula id="E14"><label>(14)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mo class="qopname">cos</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>&#x003B3;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003B3;<sub>1</sub>, &#x02026;, &#x003B3;<sub><italic>m</italic></sub> are the angles between <inline-formula><mml:math id="M106"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M107"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. It can be shown that <inline-formula><mml:math id="M108"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
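Equation (14) can likewise be computed directly from the perturbed solutions. This is a minimal NumPy sketch; the function name and the representation of the m perturbed solutions as a list of 1-D arrays are our own assumptions. Note that, unlike Equation (10), no absolute value is taken, so anti-correlated solutions contribute negatively.

```python
import numpy as np

def approx_interpretability(thetas, theta_cerf):
    # Eq. (14): mean cosine of the angles between each perturbed solution and the cERF map
    c = np.linalg.norm(theta_cerf)
    return float(np.mean([np.dot(t, theta_cerf) / (np.linalg.norm(t) * c)
                          for t in thetas]))
```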
<p>The proposed heuristic is only applicable to evoked responses in sensor- and source-space MEEG data. Despite this limitation, the cERF provides an empirical example of how the presented theoretical definitions can be applied in a real decoding scenario. The choice of heuristic has a direct effect on the approximation of interpretability: an inappropriate choice yields a very poor estimate of interpretability. Therefore, the choice of heuristic should be carefully justified based on accepted and well-defined facts regarding the nature of the collected data.</p>
<p>Since the labels are used in the computation of the cERF, a proper validation strategy should be employed to avoid the double-dipping issue (Kriegeskorte et al., <xref ref-type="bibr" rid="B50">2009</xref>). One possible approach is to exclude the entire test set from the model selection procedure using a nested cross-validation strategy. An alternative approach is to employ model averaging techniques to take advantage of the whole dataset (Varoquaux et al., <xref ref-type="bibr" rid="B90">2017</xref>). Since our focus is on model selection, in the remainder of the text we implicitly assume that the test data are excluded from the experiments; thus, all experimental results are reported on the training and validation sets.</p>
</sec>
<sec>
<title>2.5. Incorporating the interpretability into model selection</title>
<p>The procedure for evaluating the performance of a model so as to choose the best values of its hyper-parameters is known as <italic>model selection</italic> (Hastie et al., <xref ref-type="bibr" rid="B36">2009</xref>). This procedure generally involves numerical optimization of the model selection criterion on the training and validation sets (and not the test set). Let <italic>U</italic> be a set of hyper-parameter configurations; the goal of the model selection procedure then reduces to finding the best configuration <italic>u</italic><sup>&#x0002A;</sup> &#x02208; <italic>U</italic> that maximizes the model selection criterion (e.g., generalization performance) on the training set <italic>S</italic>. The most common model selection criterion is an estimator of generalization performance, i.e., the predictive power. In the context of brain decoding, especially when the interpretability of brain maps matters, employing the predictive power as the only decisive criterion in model selection is problematic (Gramfort et al., <xref ref-type="bibr" rid="B29">2012</xref>; Rasmussen et al., <xref ref-type="bibr" rid="B70">2012</xref>; Conroy et al., <xref ref-type="bibr" rid="B19">2013</xref>; Varoquaux et al., <xref ref-type="bibr" rid="B90">2017</xref>). Valverde-Albacete and Pel&#x000E1;ez-Moreno (<xref ref-type="bibr" rid="B84">2014</xref>) experimentally showed that, in a classification task, optimizing only the classification error rate is insufficient to capture the transfer of crucial information from the input to the output of a classifier. This fact highlights the importance of having some control over the estimated model weights during model selection. Here, we propose a multi-objective criterion for model selection that takes into account both the prediction accuracy and the MBM interpretability.</p>
<p>Let <inline-formula><mml:math id="M109"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula> and &#x003B4;<sub>&#x003A6;</sub> be the approximated interpretability and the generalization performance of a linear brain decoding model <inline-formula><mml:math id="M110"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>, respectively. We propose the use of the <italic>scalarization</italic> technique (Caramia and Dell&#x000B4; Olmo, <xref ref-type="bibr" rid="B16">2008</xref>) for combining <inline-formula><mml:math id="M111"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula> and &#x003B4;<sub>&#x003A6;</sub> into one scalar 0 &#x02264; &#x003B6;(&#x003A6;) &#x02264; 1 as follows:</p>
<disp-formula id="E15"><label>(15)</label><mml:math id="M15"><mml:mrow><mml:msub><mml:mi>&#x003B6;</mml:mi><mml:mo>&#x003A6;</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msub><mml:mover accent='true'><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo>&#x003A6;</mml:mo></mml:msub><mml:mtext>&#x02009;</mml:mtext><mml:mo>+</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mo>&#x003A6;</mml:mo></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mtext>&#x02009;</mml:mtext><mml:mo>+</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mo>&#x003A6;</mml:mo></mml:msub><mml:mo>&#x02265;</mml:mo><mml:mi>&#x003BA;</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>&#x003B4;</mml:mi><mml:mo>&#x003A6;</mml:mo></mml:msub><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x003BA;</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where &#x003C9;<sub>1</sub> and &#x003C9;<sub>2</sub> are weights specifying the relative importance of the interpretability and the performance, respectively, and &#x003BA; is a threshold on the performance that filters out solutions with poor performance. In classification scenarios, &#x003BA; can be set by adding a small safety margin to the chance level of classification. The hyper-parameters optimized based on &#x003B6;<sub>&#x003A6;</sub> are Pareto optimal (Marler and Arora, <xref ref-type="bibr" rid="B58">2004</xref>). We hypothesize that optimizing the hyper-parameters based on &#x003B6;<sub>&#x003A6;</sub>, rather than only &#x003B4;<sub>&#x003A6;</sub>, yields more informative MBMs.</p>
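Equation (15) itself is a simple thresholded weighted mean. A minimal sketch, in which the default weights and the threshold value &#x003BA; = 0.6 are illustrative placeholders of ours rather than values prescribed by the text:

```python
def scalarized_criterion(eta, delta, w1=1.0, w2=1.0, kappa=0.6):
    # Eq. (15): weighted mean of interpretability (eta) and performance (delta),
    # zeroed out when performance falls below the threshold kappa
    if delta < kappa:
        return 0.0
    return (w1 * eta + w2 * delta) / (w1 + w2)
```

Because both inputs lie in [0, 1] and the weights are normalized, the criterion also stays in [0, 1], as required by the definition of &#x003B6;<sub>&#x003A6;</sub>.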
<p>Algorithm <xref ref-type="table" rid="T3">1</xref> summarizes the proposed model selection scheme. The model selection procedure receives the training set <italic>S</italic> and a set of possible configurations for hyper-parameters <italic>U</italic>, and returns the best hyper-parameter configuration <italic>u</italic><sup>&#x0002A;</sup>.</p>
<table-wrap position="float" id="T3">
<label>Algorithm 1</label>
<caption><p>The model selection procedure</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top">1: &#x000A0;<bold>procedure</bold> M<sc>odel</sc>S<sc>election</sc>(<italic>S</italic>,<italic>U</italic>)</td></tr>
<tr><td align="left" valign="top">2: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; Compute <inline-formula><mml:math id="M112"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> on <italic>S</italic>. &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; using Equation (11)</td></tr>
<tr><td align="left" valign="top">3: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <bold>for all</bold>&#x000A0;<italic>u</italic><sub><italic>i</italic></sub> &#x02208; <italic>U</italic>&#x000A0;<bold>do</bold> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; For all hyper-parameter configurations.</td></tr>
<tr><td align="left" valign="top">4: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>for</bold>&#x000A0;<italic>j</italic> &#x02190; 1, <italic>m</italic>&#x000A0;<bold>do</bold> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; Data perturbation iterations.</td></tr>
<tr><td align="left" valign="top">5: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;Partition <italic>S</italic> into training <italic>S</italic><sub><italic>tr</italic></sub> and validation <italic>S</italic><sub><italic>vl</italic></sub></td></tr>
<tr><td align="left" valign="top">6: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; subsets via a perturbation method.</td></tr>
<tr><td align="left" valign="top">7: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; Compute <inline-formula><mml:math id="M113"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> on <italic>S</italic><sub><italic>tr</italic></sub> using <italic>u</italic><sub><italic>i</italic></sub> as the</td></tr>
<tr><td align="left" valign="top">8: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; hyper-parameter.</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>end</bold></td></tr>
<tr><td align="left" valign="top">9: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;Compute<inline-formula><mml:math id="M114"><mml:msubsup><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> of <inline-formula><mml:math id="M115"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>s on <italic>S</italic><sub><italic>vl</italic></sub>.</td></tr>
<tr><td align="left" valign="top">10: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;Compute <inline-formula><mml:math id="M116"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> of <inline-formula><mml:math id="M117"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>s using <inline-formula><mml:math id="M118"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. &#x000A0;&#x000A0;&#x022B3; using Equation (14)</td></tr>
<tr><td align="left" valign="top">11: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; Compute <inline-formula><mml:math id="M119"><mml:msubsup><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; using Equation (15)</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>end</bold></td></tr>
<tr><td align="left" valign="top">12: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;u<sup>&#x0002A;</sup> &#x0003D; argmax<sub><italic>u</italic><sub><italic>i</italic></sub> &#x02208; <italic>U</italic></sub>(&#x003B6;<sub>&#x003A6;</sub>).</td></tr>
<tr><td align="left" valign="top">13: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;return <italic>u</italic><sup>&#x0002A;</sup>.</td></tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>2.6. Experimental materials</title>
<sec>
<title>2.6.1. Toy dataset</title>
<p>We regenerate the simple 2-dimensional toy data presented in Haufe et al. (<xref ref-type="bibr" rid="B38">2013</xref>). Assume that the true underlying generative function &#x003A6;<sup>&#x0002A;</sup> is defined by:</p>
<disp-formula id="E16"><mml:math id="M16"><mml:mrow><mml:mi mathvariant='script'>Y</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mo>&#x02217;</mml:mo></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mi mathvariant='script'>X</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mtext>&#x02003;</mml:mtext><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn>1.5</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mtext>&#x02003;</mml:mtext><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1.5</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M120"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">X</mml:mi></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>5</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>5</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>; and <italic>x</italic><sub>1</sub> and <italic>x</italic><sub>2</sub> represent the first and the second dimension of the data, respectively. Furthermore, assume the data is contaminated by Gaussian noise with co-variance <inline-formula><mml:math id="M121"><mml:mrow><mml:mo>&#x003A3;</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>1.02</mml:mn></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>0.3</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>0.3</mml:mn></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mn>0.15</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>. In fact, the Gaussian noise adds uncertainty to the input space.</p>
</sec>
<sec>
<title>2.6.2. Simulated MEG data</title>
<p>We simulated two classes of MEG data, each composed of 250 epochs of length 330<italic>ms</italic> sampled at 300<italic>Hz</italic> (i.e., 100 time-points per epoch). For simplicity, the whole scalp topography is simulated with a single dipole located at &#x02212;4.7, &#x02212;3.7, and 5.3<italic>cm</italic> in the RAS (right, anterior, superior) coordinate system. The dipole is oriented in the [1, 1, 0] direction in the RA plane (see Figure <xref ref-type="fig" rid="F2">2A</xref>). The 102 magnetometer sensors of an Elekta Neuromag system are simulated using a standard forward model algorithm implemented in the Fieldtrip toolbox (Oostenveld et al., <xref ref-type="bibr" rid="B67">2010</xref>). The epochs of the positive class are constructed by adding three components to the dipole time-course: (1) a time-locked ERF effect, consisting of a positive 3<italic>Hz</italic> half-cycle sinusoid followed by a negative 5<italic>Hz</italic> half-cycle sinusoid, peaking 150 &#x000B1; 10<italic>ms</italic> and 250 &#x000B1; 10<italic>ms</italic> after the epoch onset, respectively; (2) uncorrelated background brain activity, simulated by summing 50 sinusoids with random frequencies between 1 and 125<italic>Hz</italic> and random phases between 0 and 2&#x003C0;. Following the data simulation procedure in Yeung et al. (<xref ref-type="bibr" rid="B100">2004</xref>), the amplitude of each frequency component of the signal (the ERF effect and the background noise) is set according to the empirical spectral power of human brain activity, to mimic the actual magnetic characteristics of the scalp surface; and (3) white Gaussian noise scaled by the root mean square of the signal in each epoch. The epochs of the negative class are constructed without the ERF effect, by adding only the noise components (i.e., the background activity and the white noise). Therefore, the ERF component is considered the discriminative ground-truth in our experiments (see Figure <xref ref-type="fig" rid="F2">2B</xref>).</p>
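The background-activity component can be sketched as follows. This is an illustrative stand-in, not the paper's code: the paper scales each sinusoid by the empirical power spectrum of human brain activity, whereas here we assume a simple 1/f amplitude profile.

```python
import numpy as np

def background_activity(n_time=100, fs=300.0, n_sinusoids=50, seed=0):
    """Sum of sinusoids with random frequencies (1-125 Hz) and random
    phases (0-2*pi), in the style of Yeung et al. (2004). NOTE: the
    amplitudes here follow an assumed 1/f profile, not the empirical
    spectral power used in the paper."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_time) / fs                      # epoch time axis (s)
    freqs = rng.uniform(1.0, 125.0, n_sinusoids)    # random frequencies
    phases = rng.uniform(0.0, 2 * np.pi, n_sinusoids)
    amps = 1.0 / freqs                              # assumed amplitude profile
    components = amps[:, None] * np.sin(
        2 * np.pi * freqs[:, None] * t + phases[:, None])
    return components.sum(axis=0)                   # shape: (n_time,)
```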
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>(A)</bold> The red circles show the dipole position, and the red stick shows the dipole direction. <bold>(B)</bold> The spatio-temporal pattern of the discriminative ground-truth effect.</p></caption>
<graphic xlink:href="fnins-10-00619-g0002.tif"/>
</fig>
</sec>
<sec>
<title>2.6.3. MEG data</title>
<p>We use the MEG dataset presented in Henson et al. (<xref ref-type="bibr" rid="B43">2011</xref>)<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>. The dataset was also used for the DecMeg2014 competition<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>. In this dataset, visual stimuli consisting of famous faces, unfamiliar faces, and scrambled faces were presented to 16 subjects while fMRI, EEG, and MEG signals were recorded. Here, we are interested only in the MEG recordings. The MEG data were recorded using a VectorView system (Elekta Neuromag, Helsinki, Finland) with a magnetometer and two orthogonal planar gradiometers at each of 102 positions in a hemispherical array, inside a light Elekta Neuromag magnetically shielded room.</p>
<p>Three major reasons motivated the choice of this dataset: (1) It is publicly available. (2) The spatio-temporal dynamics of the MEG signal for face vs. scrambled stimuli have been well studied. Event-related potential analysis of EEG/MEG shows that the <italic>N</italic>170 occurs 130&#x02013;200<italic>ms</italic> after stimulus presentation and reflects the neural processing of faces (Bentin et al., <xref ref-type="bibr" rid="B7">1996</xref>; Henson et al., <xref ref-type="bibr" rid="B43">2011</xref>). Therefore, the <italic>N</italic>170 component can be considered the ground truth for our analysis. (3) In the literature, non-parametric mass-univariate analyses such as cluster-based permutation tests are unable to identify effects that are narrowly distributed in space and time (e.g., an <italic>N</italic>170 component; Groppe et al., <xref ref-type="bibr" rid="B30">2011a</xref>,<xref ref-type="bibr" rid="B31">b</xref>). These facts motivate us to employ multivariate approaches that are more sensitive to such effects.</p>
<p>As in Olivetti et al. (<xref ref-type="bibr" rid="B66">2014</xref>), we created a balanced face vs. scrambled MEG dataset by randomly drawing from the trials of unscrambled (famous or unfamiliar) faces and scrambled faces in equal number. The samples in the face and scrambled face categories are labeled as 1 and &#x02212;1, respectively. The raw data is high-pass filtered at 1<italic>Hz</italic>, down-sampled to 250<italic>Hz</italic>, and trimmed from 200<italic>ms</italic> before the stimulus onset to 800<italic>ms</italic> after the stimulus. Thus, each trial has 250 time-points for each of the 306 MEG sensors (102 magnetometers and 204 planar gradiometers)<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref>. To create the feature vector of each sample, we pooled the temporal data of all 306 MEG sensors into one vector (i.e., we have <italic>p</italic> &#x0003D; 250 &#x000D7; 306 &#x0003D; 76500 features for each sample). Before training the classifier, all of the features are standardized to have a mean of 0 and standard deviation of 1.</p>
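The pooling and standardization step can be sketched as follows (a minimal illustration with our own function name; the paper does not specify the flattening order, so sensor-major ordering is an assumption):

```python
import numpy as np

def build_features(epochs):
    """Flatten (n_trials, 306 sensors, 250 time-points) epochs into
    (n_trials, 76500) feature vectors, then z-score each feature
    across trials (mean 0, standard deviation 1)."""
    n_trials = epochs.shape[0]
    X = epochs.reshape(n_trials, -1)   # pool sensors x time into one vector
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                  # guard against constant features
    return (X - mu) / sd
```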
</sec>
</sec>
<sec>
<title>2.7. Classification and evaluation</title>
<p>In all experiments, the Lasso classifier (Tibshirani, <xref ref-type="bibr" rid="B80">1996b</xref>) with &#x02113;<sub>1</sub> penalization is used for decoding. Lasso is a very popular classification method in the context of brain decoding, mainly because of its sparsity assumption. The choice of Lasso, a simple model with only one hyper-parameter, helps us better illustrate the importance of including interpretability in model selection (see the <xref ref-type="supplementary-material" rid="SM1">supplementary materials</xref> for results with the elastic-net classifier; Zou and Hastie, <xref ref-type="bibr" rid="B105">2005</xref>). The decoding solution is computed by solving the following optimization problem:</p>
<disp-formula id="E17"><label>(16)</label><mml:math id="M17"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">argmin</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant='bold'><mml:mtext>X</mml:mtext></mml:mstyle><mml:mo>&#x00398;</mml:mo><mml:mo>,</mml:mo><mml:mstyle mathvariant='bold'><mml:mtext>Y</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003BB;</mml:mi><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mo>&#x00398;</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where ||.||<sub>1</sub> denotes the &#x02113;<sub>1</sub>-norm, and &#x003BB; is the hyper-parameter that specifies the level of regularization. Therefore, the aim of model selection is to find the best value of &#x003BB; on the training set <italic>S</italic>. Here, we search for the best regularization parameter among &#x003BB; &#x02208; {0.001, 0.01, 0.1, 1, 10, 50, 100, 250, 500, 1000}.</p>
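Fitting the model over the &#x003BB; grid can be sketched with scikit-learn's least-squares Lasso on the &#x000B1;1 labels (an illustrative stand-in; the paper does not specify its Lasso solver, and scikit-learn's `alpha` plays the role of &#x003BB; only up to a constant scaling of the loss):

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_lasso_grid(X, y, lambdas=(0.001, 0.01, 0.1, 1.0)):
    """Fit an l1-penalized least-squares model (Tibshirani's Lasso) on
    +/-1 labels for each regularization value. Classification is by the
    sign of the decision value. Returns {lambda: weight vector}."""
    weights = {}
    for lam in lambdas:
        # alpha is sklearn's regularization strength, analogous to lambda
        model = Lasso(alpha=lam, max_iter=10000)
        model.fit(X, y)
        weights[lam] = model.coef_.copy()
    return weights
```

Model selection then amounts to scoring each weight vector in the returned dictionary with the chosen criterion (performance, interpretability, or their combination).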
<p>We use the out-of-bag (OOB) (Wolpert and Macready, <xref ref-type="bibr" rid="B97">1999</xref>; Breiman, <xref ref-type="bibr" rid="B12">2001</xref>) method for computing &#x003B4;<sub>&#x003A6;</sub>, &#x003C8;<sub>&#x003A6;</sub>, <inline-formula><mml:math id="M122"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="M123"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula>, and &#x003B6;<sub>&#x003A6;</sub> for different values of &#x003BB;. In OOB, given a training set (<bold>X</bold>, <bold>Y</bold>), <italic>m</italic> bootstrap replications (Efron, <xref ref-type="bibr" rid="B27">1992</xref>) are used to create perturbed training and validation sets (we set <italic>m</italic> &#x0003D; 50)<xref ref-type="fn" rid="fn0005"><sup>5</sup></xref>. In all of our experiments, we set &#x003C9;<sub>1</sub> &#x0003D; &#x003C9;<sub>2</sub> &#x0003D; 1 and &#x003BA; &#x0003D; 0.6 in the computation of &#x003B6;<sub>&#x003A6;</sub>. Furthermore, we set &#x003B4;<sub>&#x003A6;</sub> &#x0003D; 1&#x02212;<italic>EPE</italic>, where EPE denotes the expected prediction error; it is computed using the procedure explained in Appendix 6.4. Employing OOB also makes it possible to compute the bias and the variance of the model as contributing factors in the EPE.</p>
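Generating the OOB train/validation pairs can be sketched as follows (function name is ours; each bootstrap training set is drawn with replacement, and the samples it never touches form the validation set):

```python
import numpy as np

def oob_splits(n_samples, m=50, seed=0):
    """Generate m bootstrap train/validation index pairs (Breiman, 2001):
    training indices are drawn with replacement; the out-of-bag samples
    (those never drawn) form the validation set."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(m):
        train = rng.integers(0, n_samples, n_samples)   # with replacement
        oob = np.setdiff1d(np.arange(n_samples), train)  # never-drawn samples
        splits.append((train, oob))
    return splits
```

On average about 37% of the samples end up out-of-bag in each replication, so every split provides a non-trivial validation set without touching held-out test data.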
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>3. Results</title>
<sec>
<title>3.1. Performance-interpretability dilemma: a toy example</title>
<p>In the definition of &#x003A6;<sup>&#x0002A;</sup> on the toy dataset discussed in Section 2.6.1, <italic>x</italic><sub>1</sub> is the decisive variable and <italic>x</italic><sub>2</sub> has no effect on the classification of samples into target classes. Therefore, excluding the effect of noise and based on the theory of the maximal margin classifier (Vapnik and Kotz, <xref ref-type="bibr" rid="B87">1982</xref>), <inline-formula><mml:math id="M124"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>&#x0221D;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the true solution to the decoding problem. By accounting for the effect of noise, solving the decoding problem in (<bold>X</bold>, <bold>Y</bold>) space yields <inline-formula><mml:math id="M125"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mo>&#x0221D;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msqrt><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msqrt></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> as the parameters of the linear classifier. 
Although the estimated parameters on the noisy data provide the best generalization performance for the noisy samples, any attempt to interpret this solution fails, as it yields the wrong conclusion with respect to the ground truth (it implies that <italic>x</italic><sub>2</sub> has twice the influence of <italic>x</italic><sub>1</sub> on the results, whereas <italic>x</italic><sub>2</sub> actually has no effect). This simple experiment shows that the most accurate model is not always the most interpretable one, primarily because of the contribution of noise to the decoding process (Haufe et al., <xref ref-type="bibr" rid="B38">2013</xref>). On the other hand, the true solution of the problem <inline-formula><mml:math id="M126"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> does not provide the best generalization performance for the noisy data.</p>
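The gap between the two solutions can be made concrete with a similarity measure between weight vectors. The paper's exact interpretability measure is given by its Equation (5), which is not reproduced in this section; the sketch below uses plain cosine similarity as an illustrative stand-in:

```python
import numpy as np

def cosine_interpretability(theta_hat, theta_star):
    """Cosine similarity between the estimated weight vector and the
    true (ground-truth) weight vector. Illustrative stand-in for the
    paper's exact interpretability measure (its Equation 5)."""
    num = float(np.dot(theta_hat, theta_star))
    den = float(np.linalg.norm(theta_hat) * np.linalg.norm(theta_star))
    return num / den
```

For the noisy solution [1/&#x0221A;5, 2/&#x0221A;5] against the ground truth [1, 0], this yields 1/&#x0221A;5 &#x02248; 0.45, comparable in magnitude to the low &#x003B7;<sub>&#x003A6;</sub> values reported for weakly regularized models in Table 1, while the true solution itself scores 1.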
<p>To illustrate the effect of incorporating the interpretability in the model selection, a Lasso model with different &#x003BB; values is used for classifying the toy data. In this example, because <inline-formula><mml:math id="M127"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> is known, the exact value of interpretability can be computed using Equation (5). Table <xref ref-type="table" rid="T1">1</xref> compares the resulting performance and interpretability of Lasso. Lasso achieves its highest performance (&#x003B4;<sub>&#x003A6;</sub> &#x0003D; 0.9884) at &#x003BB; &#x0003D; 10 with <inline-formula><mml:math id="M128"><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mo>&#x0221D;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>4636</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>8660</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> (indicated by the black dashed line in Figure <xref ref-type="fig" rid="F3">3</xref>). Despite having the highest performance, this solution suffers from a lack of interpretability (&#x003B7;<sub>&#x003A6;</sub> &#x0003D; 0.4484). As &#x003BB; increases, the interpretability improves, so that at &#x003BB; &#x0003D; 500 and 1000 the classifier reaches its highest interpretability at the cost of 0.06 of its performance. Our observations highlight two main points:
<list list-type="order">
<list-item><p>In the case of noisy data, the interpretability of a decoding model may be incoherent with its performance. Thus, optimizing the model&#x00027;s hyper-parameter based on performance alone does not necessarily improve its interpretability. This observation confirms the previous finding by Rasmussen et al. (<xref ref-type="bibr" rid="B70">2012</xref>) regarding the trade-off between spatial reproducibility (as a measure of interpretability) and prediction accuracy in brain decoding.</p></list-item>
<list-item><p>If the right criterion is used in model selection, employing a proper regularization technique (a sparsity prior, in the case of the toy data) leads to more interpretable decoding models.</p></list-item>
</list></p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Comparison between &#x003B4;<sub>&#x003A6;</sub>, &#x003B7;<sub>&#x003A6;</sub>, and &#x003B6;<sub>&#x003A6;</sub> for different &#x003BB; values on the toy example shows the performance-interpretability dilemma, in which the most accurate classifier is not the most interpretable one</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="center"><bold>&#x003BB;</bold></th>
<th valign="top" align="center"><bold>0</bold></th>
<th valign="top" align="center"><bold>0.001</bold></th>
<th valign="top" align="center"><bold>0.01</bold></th>
<th valign="top" align="center"><bold>0.1</bold></th>
<th valign="top" align="center"><bold>1</bold></th>
<th valign="top" align="center"><bold>10</bold></th>
<th valign="top" align="center"><bold>50</bold></th>
<th valign="top" align="center"><bold>100</bold></th>
<th valign="top" align="center"><bold>250</bold></th>
<th valign="top" align="center"><bold>500</bold></th>
<th valign="top" align="center"><bold>1000</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">&#x003B4;(&#x003A6;)</td>
<td valign="top" align="center">0.9883</td>
<td valign="top" align="center">0.9883</td>
<td valign="top" align="center">0.9883</td>
<td valign="top" align="center">0.9883</td>
<td valign="top" align="center">0.9883</td>
<td valign="top" align="center"><bold>0.9884</bold></td>
<td valign="top" align="center">0.9880</td>
<td valign="top" align="center">0.9840</td>
<td valign="top" align="center">0.9310</td>
<td valign="top" align="center">0.9292</td>
<td valign="top" align="center">0.9292</td>
</tr>
<tr>
<td valign="top" align="left">&#x003B7;(&#x003A6;)</td>
<td valign="top" align="center">0.4391</td>
<td valign="top" align="center">0.4391</td>
<td valign="top" align="center">0.4391</td>
<td valign="top" align="center">0.4392</td>
<td valign="top" align="center">0.4400</td>
<td valign="top" align="center">0.4484</td>
<td valign="top" align="center">0.4921</td>
<td valign="top" align="center">0.5845</td>
<td valign="top" align="center">0.9968</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center"><bold>1</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x003B6;(&#x003A6;)</td>
<td valign="top" align="center">0.7137</td>
<td valign="top" align="center">0.7137</td>
<td valign="top" align="center">0.7137</td>
<td valign="top" align="center">0.7137</td>
<td valign="top" align="center">0.7142</td>
<td valign="top" align="center">0.7184</td>
<td valign="top" align="center">0.7400</td>
<td valign="top" align="center">0.7842</td>
<td valign="top" align="center">0.9639</td>
<td valign="top" align="center"><bold>0.9646</bold></td>
<td valign="top" align="center"><bold>0.9646</bold></td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M129"><mml:mrow><mml:mover accent='true'><mml:mover accent='true'><mml:mo>&#x00398;</mml:mo><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mo>&#x0221D;</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M130"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.4520</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.8920</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M131"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.4520</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.8920</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M132"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.4520</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.8920</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M133"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.4521</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.8919</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M134"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.4532</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.8914</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M135"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.4636</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.8660</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M136"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.4883</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.8727</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M137"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.5800</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.8146</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M138"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.99</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>0.02</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M139"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="M140"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The bold indicates the best values of different criteria</italic>.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Noisy samples of toy data</bold>. The red dashed line shows the true separator based on the generative model (&#x003A6;<sup>&#x0002A;</sup>). The black dashed line shows the most accurate classification solution. Because of the contribution of noise, any interpretation of the parameters of the most accurate classifier yields a misleading conclusion with respect to the true underlying phenomenon (Haufe et al., <xref ref-type="bibr" rid="B38">2013</xref>).</p></caption>
<graphic xlink:href="fnins-10-00619-g0003.tif"/>
</fig>
</sec>
<sec>
<title>3.2. Decoding on simulated MEG data</title>
<p>With the main aim of comparing the quality of the heuristically approximated interpretability with respect to its actual value, we solve the decoding problem on the simulated MEG data where the ground-truth discriminative effect is known. The ground truth effect <inline-formula><mml:math id="M141"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> is used to compute the actual interpretability of the decoding model. On the other hand, interpretability is approximated by means of <inline-formula><mml:math id="M142"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. The whole data simulation and decoding processes are repeated 25 times and the results are summarized in Figure <xref ref-type="fig" rid="F4">4</xref>. Figures <xref ref-type="fig" rid="F4">4A,B</xref> show the actual (&#x003B7;<sub>&#x003A6;</sub>) and the approximated (<inline-formula><mml:math id="M143"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula>) interpretability for different &#x003BB; values. 
Even though <inline-formula><mml:math id="M144"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula> consistently overestimates &#x003B7;<sub>&#x003A6;</sub>, the two measures co-vary significantly (Pearson&#x00027;s correlation <italic>p</italic>-value &#x0003D; 9 &#x000D7; 10<sup>&#x02212;4</sup>) as &#x003BB; increases. Thus, despite the overestimation, the heuristically approximated interpretability values are still reliable measures for quantitative comparisons between the interpretability levels of brain decoding models with different hyper-parameters. For example, both &#x003B7;<sub>&#x003A6;</sub> and <inline-formula><mml:math id="M145"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula> suggest the decoding model with &#x003BB; &#x0003D; 50 as the most interpretable model.</p>
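The reported co-variation is a plain Pearson correlation between the per-&#x003BB; actual and approximated interpretability values; the numbers below are hypothetical placeholders for illustration, not the paper's measurements:

```python
import numpy as np

# Hypothetical actual and approximated interpretability values across a
# lambda grid -- placeholders for illustration, NOT the paper's data.
eta_actual = np.array([0.30, 0.35, 0.42, 0.55, 0.61])
eta_approx = np.array([0.45, 0.52, 0.56, 0.72, 0.75])  # overestimates eta_actual

# Pearson's r between the two measures: a strong positive r means the
# approximation preserves the ranking of models even when it overestimates.
r = np.corrcoef(eta_actual, eta_approx)[0, 1]
```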
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>(A)</bold> The actual &#x003B7;<sub>&#x003A6;</sub>, and <bold>(B)</bold> the heuristically approximated interpretability <inline-formula><mml:math id="M146"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula> of decoding models across different &#x003BB; values. There is a significant co-variation (Pearson&#x00027;s correlation p-value &#x0003D; 9 &#x000D7; 10<sup>&#x02212;4</sup>) between &#x003B7;<sub>&#x003A6;</sub> and <inline-formula><mml:math id="M147"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula>. <bold>(C)</bold> The generalization performance of decoding models. The box gives the quartiles, while the whiskers give the 5 and 95 percentiles.</p></caption>
<graphic xlink:href="fnins-10-00619-g0004.tif"/>
</fig>
<p>Figure <xref ref-type="fig" rid="F4">4C</xref> shows that brain decoding models at &#x003BB; &#x0003D; 10 and &#x003BB; &#x0003D; 50 yield equivalent generalization performances (Wilcoxon rank-sum test <italic>p</italic>-value &#x0003D; 0.08), while the MBM resulting from &#x003BB; &#x0003D; 50 has significantly higher interpretability (Wilcoxon rank-sum test <italic>p</italic>-value &#x0003D; 4 &#x000D7; 10<sup>&#x02212;9</sup>). The benefit of this difference in interpretability levels is visualized in Figure <xref ref-type="fig" rid="F5">5</xref>, where topographic maps of the weights of brain decoding models with different &#x003BB; values are plotted by averaging the classifier weights in the 100&#x02013;200 ms time interval. The visual comparison shows that the MBM at &#x003BB; &#x0003D; 50 is more similar to the ground-truth map (see Figure <xref ref-type="fig" rid="F2">2B</xref>) than the MBMs computed at other &#x003BB; values. This superiority is well reflected in the corresponding approximated interpretability values, which confirms the effectiveness of the interpretability criterion in measuring the level of information in the MBMs.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>Topographic maps of weights of brain decoding models for (A)</bold> &#x003BB; &#x0003D; 1, <bold>(B)</bold> &#x003BB; &#x0003D; 10, <bold>(C)</bold> &#x003BB; &#x0003D; 50, and <bold>(D)</bold> &#x003BB; &#x0003D; 100.</p></caption>
<graphic xlink:href="fnins-10-00619-g0005.tif"/>
</fig>
<p>The results of this experiment again confirm that generalization performance is not a reliable criterion for measuring the level of information learned by a linear classifier. For example, consider the decoding model with &#x003BB; &#x0003D; 1, in which the performance of the model is significantly above chance level (see Figure <xref ref-type="fig" rid="F4">4C</xref>) while the corresponding MBM (Figure <xref ref-type="fig" rid="F5">5A</xref>) completely misrepresents the ground-truth effect (Figure <xref ref-type="fig" rid="F2">2B</xref>).</p>
</sec>
<sec>
<title>3.3. Single-subject decoding on MEG data</title>
<p>To investigate the behavior of the proposed model selection criterion &#x003B6;<sub>&#x003A6;</sub>, we benchmark it against the commonly used performance criterion &#x003B4;<sub>&#x003A6;</sub> in a single-subject decoding scenario. Assuming (<bold>X</bold><sub><italic>i</italic></sub>, <bold>Y</bold><sub><italic>i</italic></sub>) for <italic>i</italic> &#x0003D; 1, &#x02026;, 16 are MEG trial/label pairs for subject <italic>i</italic>, we separately train a Lasso model for each subject to estimate the parameter of the linear function <inline-formula><mml:math id="M148"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where <inline-formula><mml:math id="M149"><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">Y</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. 
We represent the optimized solution based on &#x003B4;<sub>&#x003A6;</sub> and &#x003B6;<sub>&#x003A6;</sub> by <inline-formula><mml:math id="M150"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M151"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, respectively. We also denote the MBM associated with <inline-formula><mml:math id="M152"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M153"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> by <inline-formula><mml:math id="M154"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M155"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover 
accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, respectively. Therefore, for each subject, we compare the resulting decoders and MBMs computed based on these two model selection criteria.</p>
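A minimal sketch of this per-subject setup, assuming scikit-learn's Lasso and random stand-in data in place of the actual MEG trial/label pairs (the shapes and the alpha value are illustrative assumptions, not the paper's settings):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)

# Hypothetical stand-ins for one subject's data: n trials by p features
# (sensors x time points, flattened), with labels coded as -1/+1.
n_trials, n_features = 100, 500
X_i = rng.normal(size=(n_trials, n_features))
y_i = rng.choice([-1.0, 1.0], size=n_trials)

# One Lasso decoder per subject; alpha plays the role of the lambda
# hyper-parameter swept in the experiments.
model = Lasso(alpha=0.1)
model.fit(X_i, y_i)

# The fitted coefficient vector is the multivariate brain map (MBM)
# associated with this subject's decoder.
theta_hat_i = model.coef_
print("non-zero weights:", np.count_nonzero(theta_hat_i))
```

Repeating this fit over a grid of alpha values, and scoring each fit with either criterion, gives the two optimized solutions compared in the rest of this section.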
<p>Figure <xref ref-type="fig" rid="F6">6A</xref> shows the mean and standard deviation of the performance and interpretability of Lasso across 16 subjects for different &#x003BB; values. The performance and interpretability curves further illustrate the performance-interpretability dilemma of the Lasso classifier in the single-subject decoding scenario, in which increasing the performance delivers less interpretability. The average performance across subjects improves as &#x003BB; approaches 1, but, on the other hand, the reproducibility and the representativeness of the models decline significantly (see Figure <xref ref-type="fig" rid="F6">6B</xref>; Wilcoxon rank sum test <italic>p</italic>-values &#x0003D; 9 &#x000D7; 10<sup>&#x02212;4</sup> and 8 &#x000D7; 10<sup>&#x02212;7</sup>, respectively). In fact, in this dataset a higher amount of sparsity increases the gap between the generalization performance and the interpretability.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>(A)</bold> Mean and standard deviation of the performance (&#x003B4;<sub>&#x003A6;</sub>), interpretability (&#x003B7;<sub>&#x003A6;</sub>), and &#x003B6;<sub>&#x003A6;</sub> of Lasso over 16 subjects. <bold>(B)</bold> Mean and standard deviation of the reproducibility (&#x003C8;<sub>&#x003A6;</sub>), representativeness (&#x003B2;<sub>&#x003A6;</sub>), and interpretability (&#x003B7;<sub>&#x003A6;</sub>) of Lasso over 16 subjects. The interpretability declines because of the decrease in both reproducibility and representativeness (see Proposition 1). <bold>(C)</bold> Mean and standard deviation of the bias, variance, and EPE of Lasso over 16 subjects. While the change in bias is correlated with that of EPE (Pearson&#x00027;s correlation coefficient &#x0003D; 0.9993), there is an anti-correlation between the trends of variance and EPE (Pearson&#x00027;s correlation coefficient &#x0003D; &#x02212;0.8884).</p></caption>
<graphic xlink:href="fnins-10-00619-g0006.tif"/>
</fig>
<p>One possible reason behind the performance-interpretability dilemma in this experiment is illustrated in Figure <xref ref-type="fig" rid="F6">6C</xref>, which shows the mean and standard deviation of the bias, variance, and EPE of Lasso across 16 subjects. The plot shows that while the change in bias is correlated with that of EPE (Pearson&#x00027;s correlation coefficient &#x0003D; 0.9993), there is an anti-correlation between the trends of variance and EPE (Pearson&#x00027;s correlation coefficient &#x0003D; &#x02212;0.8884). Furthermore, it suggests that the effect of variance is overwhelmed by bias in the computation of EPE: although the model with the best performance (minimum EPE), at &#x003BB; &#x0003D; 1, has the lowest bias, its variance is higher than for &#x003BB; &#x0003D; 0.001, 0.01, and 0.1. While this small increase in the variance has a negligible effect on the EPE of the model, Figure <xref ref-type="fig" rid="F6">6B</xref> shows its significant (Wilcoxon rank sum test <italic>p</italic>-value &#x0003D; 8 &#x000D7; 10<sup>&#x02212;7</sup>) negative effect on the reproducibility of maps from &#x003BB; &#x0003D; 0.1 to &#x003BB; &#x0003D; 1.</p>
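The bias/variance bookkeeping behind this argument can be sketched numerically. The curves below are illustrative stand-ins (not the measured values averaged over the 16 subjects), chosen only so that bias decreases and variance increases along the lambda sweep, as in the figure:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical squared-bias, variance, and EPE curves across a lambda sweep;
# in the paper these quantities are estimated and averaged over 16 subjects.
lambdas = np.array([0.001, 0.01, 0.1, 1.0])
bias2 = np.array([0.40, 0.32, 0.22, 0.15])        # decreasing with lambda
variance = np.array([0.010, 0.011, 0.013, 0.020]) # slowly increasing

# Standard decomposition: EPE = bias^2 + variance (+ irreducible noise).
epe = bias2 + variance

# Bias dominates EPE, so EPE tracks bias and anti-correlates with variance.
r_bias, _ = pearsonr(bias2, epe)
r_var, _ = pearsonr(variance, epe)
print(f"corr(bias, EPE) = {r_bias:.3f}, corr(variance, EPE) = {r_var:.3f}")
```

Because the variance term is an order of magnitude smaller than the bias term, its growth barely moves EPE, even though (as Figure 6B shows for the real data) the same growth can substantially harm reproducibility.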
<p>Table <xref ref-type="table" rid="T2">2</xref> summarizes the performance, reproducibility, representativeness, and interpretability of <inline-formula><mml:math id="M156"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M157"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> for the 16 subjects. The average result over 16 subjects shows that employing &#x003B6;<sub>&#x003A6;</sub> instead of &#x003B4;<sub>&#x003A6;</sub> in model selection provides higher reproducibility, representativeness, and (as a result) interpretability, at the cost of 0.04 in performance. The last column of the table (&#x003B4;<sub><italic>cERF</italic></sub>) summarizes the performance of decoding models over 16 subjects when <inline-formula><mml:math id="M158"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is used as the classifier weights. The comparison illustrates a significant difference (Wilcoxon rank sum test <italic>p</italic>-value &#x0003D; 1.5 &#x000D7; 10<sup>&#x02212;6</sup>) between &#x003B4;<sub><italic>cERF</italic></sub> and the &#x003B4;(&#x003A6;) values. 
These facts demonstrate that <inline-formula><mml:math id="M159"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is a good compromise between <inline-formula><mml:math id="M160"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="M161"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> in terms of classification performance and model interpretability.</p>
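The model-selection step this comparison rests on can be sketched as follows. The exact definition of the zeta criterion is given earlier in the paper; the simple average of performance and interpretability used here is only an illustrative stand-in, and the per-lambda scores are hypothetical:

```python
import numpy as np

# Hypothetical per-lambda scores: delta = generalization performance,
# eta = heuristic interpretability. The paper's zeta criterion combines
# both; we assume a plain average as an illustrative stand-in.
lambdas = np.array([0.001, 0.01, 0.1, 1.0])
delta = np.array([0.70, 0.74, 0.78, 0.83])  # improves with lambda
eta = np.array([0.65, 0.62, 0.55, 0.26])    # degrades with lambda

zeta = (delta + eta) / 2.0  # stand-in combination of the two criteria

best_by_delta = lambdas[np.argmax(delta)]  # performance-only selection
best_by_zeta = lambdas[np.argmax(zeta)]    # performance + interpretability

print(f"lambda chosen by delta: {best_by_delta}, by zeta: {best_by_zeta}")
```

With these stand-in curves, delta alone picks the largest lambda, while the combined criterion retreats to a smaller lambda, trading a little accuracy for a much more interpretable map, which is the pattern Table 2 reports on the real data.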
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p><bold>The performance, reproducibility, representativeness, and interpretability of <inline-formula><mml:math id="M162"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M163"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> over 16 subjects</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Subs</bold></th>
<th valign="top" align="center" colspan="5" style="border-bottom: thin solid #000000;"><bold>Criterion: &#x003B4;(&#x003A6;)</bold></th>
<th valign="top" align="center" colspan="5" style="border-bottom: thin solid #000000;"><bold>Criterion: &#x003B6;(&#x003A6;)</bold></th>
<th valign="top" align="center"><bold>&#x003B4;<sub><italic>cERF</italic></sub></bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>&#x003B4;(&#x003A6;)</bold></th>
<th valign="top" align="center"><bold>&#x003B6;(&#x003A6;)</bold></th>
<th valign="top" align="center"><bold><inline-formula><mml:math id="M164"><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></bold></th>
<th valign="top" align="center"><bold><inline-formula><mml:math id="M165"><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></bold></th>
<th valign="top" align="center"><bold>&#x003C8;(&#x003A6;)</bold></th>
<th valign="top" align="center"><bold>&#x003B4;(&#x003A6;)</bold></th>
<th valign="top" align="center"><bold>&#x003B6;(&#x003A6;)</bold></th>
<th valign="top" align="center"><bold><inline-formula><mml:math id="M166"><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></bold></th>
<th valign="top" align="center"><bold><inline-formula><mml:math id="M167"><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></bold></th>
<th valign="top" align="center"><bold>&#x003C8;(&#x003A6;)</bold></th>
<th/>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="char" char=".">0.81</td>
<td valign="top" align="char" char=".">0.53</td>
<td valign="top" align="char" char=".">0.26</td>
<td valign="top" align="char" char=".">0.42</td>
<td valign="top" align="char" char=".">0.62</td>
<td valign="top" align="char" char=".">0.78</td>
<td valign="top" align="char" char=".">0.70</td>
<td valign="top" align="char" char=".">0.63</td>
<td valign="top" align="char" char=".">0.76</td>
<td valign="top" align="char" char=".">0.83</td>
<td valign="top" align="char" char=".">0.56</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="char" char=".">0.80</td>
<td valign="top" align="char" char=".">0.70</td>
<td valign="top" align="char" char=".">0.60</td>
<td valign="top" align="char" char=".">0.72</td>
<td valign="top" align="char" char=".">0.83</td>
<td valign="top" align="char" char=".">0.80</td>
<td valign="top" align="char" char=".">0.70</td>
<td valign="top" align="char" char=".">0.60</td>
<td valign="top" align="char" char=".">0.72</td>
<td valign="top" align="char" char=".">0.83</td>
<td valign="top" align="char" char=".">0.54</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="char" char=".">0.81</td>
<td valign="top" align="char" char=".">0.63</td>
<td valign="top" align="char" char=".">0.45</td>
<td valign="top" align="char" char=".">0.64</td>
<td valign="top" align="char" char=".">0.71</td>
<td valign="top" align="char" char=".">0.78</td>
<td valign="top" align="char" char=".">0.71</td>
<td valign="top" align="char" char=".">0.64</td>
<td valign="top" align="char" char=".">0.78</td>
<td valign="top" align="char" char=".">0.83</td>
<td valign="top" align="char" char=".">0.57</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="char" char=".">0.84</td>
<td valign="top" align="char" char=".">0.52</td>
<td valign="top" align="char" char=".">0.20</td>
<td valign="top" align="char" char=".">0.31</td>
<td valign="top" align="char" char=".">0.66</td>
<td valign="top" align="char" char=".">0.76</td>
<td valign="top" align="char" char=".">0.70</td>
<td valign="top" align="char" char=".">0.64</td>
<td valign="top" align="char" char=".">0.77</td>
<td valign="top" align="char" char=".">0.83</td>
<td valign="top" align="char" char=".">0.55</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="char" char=".">0.80</td>
<td valign="top" align="char" char=".">0.54</td>
<td valign="top" align="char" char=".">0.29</td>
<td valign="top" align="char" char=".">0.44</td>
<td valign="top" align="char" char=".">0.65</td>
<td valign="top" align="char" char=".">0.78</td>
<td valign="top" align="char" char=".">0.69</td>
<td valign="top" align="char" char=".">0.61</td>
<td valign="top" align="char" char=".">0.73</td>
<td valign="top" align="char" char=".">0.83</td>
<td valign="top" align="char" char=".">0.54</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="char" char=".">0.79</td>
<td valign="top" align="char" char=".">0.52</td>
<td valign="top" align="char" char=".">0.24</td>
<td valign="top" align="char" char=".">0.39</td>
<td valign="top" align="char" char=".">0.63</td>
<td valign="top" align="char" char=".">0.74</td>
<td valign="top" align="char" char=".">0.67</td>
<td valign="top" align="char" char=".">0.61</td>
<td valign="top" align="char" char=".">0.74</td>
<td valign="top" align="char" char=".">0.82</td>
<td valign="top" align="char" char=".">0.57</td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="char" char=".">0.84</td>
<td valign="top" align="char" char=".">0.55</td>
<td valign="top" align="char" char=".">0.27</td>
<td valign="top" align="char" char=".">0.40</td>
<td valign="top" align="char" char=".">0.66</td>
<td valign="top" align="char" char=".">0.81</td>
<td valign="top" align="char" char=".">0.70</td>
<td valign="top" align="char" char=".">0.59</td>
<td valign="top" align="char" char=".">0.71</td>
<td valign="top" align="char" char=".">0.84</td>
<td valign="top" align="char" char=".">0.56</td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="char" char=".">0.87</td>
<td valign="top" align="char" char=".">0.55</td>
<td valign="top" align="char" char=".">0.24</td>
<td valign="top" align="char" char=".">0.35</td>
<td valign="top" align="char" char=".">0.68</td>
<td valign="top" align="char" char=".">0.85</td>
<td valign="top" align="char" char=".">0.68</td>
<td valign="top" align="char" char=".">0.52</td>
<td valign="top" align="char" char=".">0.61</td>
<td valign="top" align="char" char=".">0.84</td>
<td valign="top" align="char" char=".">0.56</td>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="char" char=".">0.80</td>
<td valign="top" align="char" char=".">0.55</td>
<td valign="top" align="char" char=".">0.31</td>
<td valign="top" align="char" char=".">0.46</td>
<td valign="top" align="char" char=".">0.67</td>
<td valign="top" align="char" char=".">0.77</td>
<td valign="top" align="char" char=".">0.67</td>
<td valign="top" align="char" char=".">0.57</td>
<td valign="top" align="char" char=".">0.69</td>
<td valign="top" align="char" char=".">0.82</td>
<td valign="top" align="char" char=".">0.57</td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="char" char=".">0.79</td>
<td valign="top" align="char" char=".">0.53</td>
<td valign="top" align="char" char=".">0.26</td>
<td valign="top" align="char" char=".">0.41</td>
<td valign="top" align="char" char=".">0.64</td>
<td valign="top" align="char" char=".">0.77</td>
<td valign="top" align="char" char=".">0.68</td>
<td valign="top" align="char" char=".">0.58</td>
<td valign="top" align="char" char=".">0.70</td>
<td valign="top" align="char" char=".">0.83</td>
<td valign="top" align="char" char=".">0.59</td>
</tr>
<tr>
<td valign="top" align="left">11</td>
<td valign="top" align="char" char=".">0.74</td>
<td valign="top" align="char" char=".">0.65</td>
<td valign="top" align="char" char=".">0.56</td>
<td valign="top" align="char" char=".">0.68</td>
<td valign="top" align="char" char=".">0.82</td>
<td valign="top" align="char" char=".">0.74</td>
<td valign="top" align="char" char=".">0.65</td>
<td valign="top" align="char" char=".">0.56</td>
<td valign="top" align="char" char=".">0.68</td>
<td valign="top" align="char" char=".">0.82</td>
<td valign="top" align="char" char=".">0.53</td>
</tr>
<tr>
<td valign="top" align="left">12</td>
<td valign="top" align="char" char=".">0.80</td>
<td valign="top" align="char" char=".">0.55</td>
<td valign="top" align="char" char=".">0.29</td>
<td valign="top" align="char" char=".">0.46</td>
<td valign="top" align="char" char=".">0.64</td>
<td valign="top" align="char" char=".">0.79</td>
<td valign="top" align="char" char=".">0.70</td>
<td valign="top" align="char" char=".">0.61</td>
<td valign="top" align="char" char=".">0.74</td>
<td valign="top" align="char" char=".">0.83</td>
<td valign="top" align="char" char=".">0.58</td>
</tr>
<tr>
<td valign="top" align="left">13</td>
<td valign="top" align="char" char=".">0.83</td>
<td valign="top" align="char" char=".">0.50</td>
<td valign="top" align="char" char=".">0.18</td>
<td valign="top" align="char" char=".">0.29</td>
<td valign="top" align="char" char=".">0.61</td>
<td valign="top" align="char" char=".">0.77</td>
<td valign="top" align="char" char=".">0.70</td>
<td valign="top" align="char" char=".">0.63</td>
<td valign="top" align="char" char=".">0.76</td>
<td valign="top" align="char" char=".">0.82</td>
<td valign="top" align="char" char=".">0.59</td>
</tr>
<tr>
<td valign="top" align="left">14</td>
<td valign="top" align="char" char=".">0.90</td>
<td valign="top" align="char" char=".">0.58</td>
<td valign="top" align="char" char=".">0.27</td>
<td valign="top" align="char" char=".">0.39</td>
<td valign="top" align="char" char=".">0.68</td>
<td valign="top" align="char" char=".">0.81</td>
<td valign="top" align="char" char=".">0.78</td>
<td valign="top" align="char" char=".">0.74</td>
<td valign="top" align="char" char=".">0.89</td>
<td valign="top" align="char" char=".">0.84</td>
<td valign="top" align="char" char=".">0.62</td>
</tr>
<tr>
<td valign="top" align="left">15</td>
<td valign="top" align="char" char=".">0.92</td>
<td valign="top" align="char" char=".">0.63</td>
<td valign="top" align="char" char=".">0.34</td>
<td valign="top" align="char" char=".">0.48</td>
<td valign="top" align="char" char=".">0.71</td>
<td valign="top" align="char" char=".">0.89</td>
<td valign="top" align="char" char=".">0.78</td>
<td valign="top" align="char" char=".">0.66</td>
<td valign="top" align="char" char=".">0.77</td>
<td valign="top" align="char" char=".">0.86</td>
<td valign="top" align="char" char=".">0.63</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td valign="top" align="left">16</td>
<td valign="top" align="char" char=".">0.87</td>
<td valign="top" align="char" char=".">0.55</td>
<td valign="top" align="char" char=".">0.23</td>
<td valign="top" align="char" char=".">0.37</td>
<td valign="top" align="char" char=".">0.62</td>
<td valign="top" align="char" char=".">0.81</td>
<td valign="top" align="char" char=".">0.74</td>
<td valign="top" align="char" char=".">0.67</td>
<td valign="top" align="char" char=".">0.81</td>
<td valign="top" align="char" char=".">0.83</td>
<td valign="top" align="char" char=".">0.65</td>
</tr> <tr>
<td valign="top" align="left">Mean</td>
<td valign="top" align="center"><bold>0.83</bold> &#x000B1; <bold>0.05</bold></td>
<td valign="top" align="center">0.57 &#x000B1; 0.05</td>
<td valign="top" align="center">0.31 &#x000B1; 0.12</td>
<td valign="top" align="center">0.45 &#x000B1; 0.13</td>
<td valign="top" align="center">0.68 &#x000B1; 0.07</td>
<td valign="top" align="center">0.79 &#x000B1; 0.04</td>
<td valign="top" align="center"><bold>0.70</bold> &#x000B1; <bold>0.04</bold></td>
<td valign="top" align="center"><bold>0.62</bold> &#x000B1; <bold>0.05</bold></td>
<td valign="top" align="center"><bold>0.74</bold> &#x000B1; <bold>0.06</bold></td>
<td valign="top" align="center"><bold>0.83</bold> &#x000B1; <bold>0.01</bold></td>
<td valign="top" align="center">0.58 &#x000B1; 0.03</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The bold indicates the best mean values over different criteria</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>These results are further analyzed in Figure <xref ref-type="fig" rid="F7">7</xref>, where <inline-formula><mml:math id="M168"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M169"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> are compared subject-wise in terms of their performance and interpretability. The comparison shows that adopting &#x003B6;<sub>&#x003A6;</sub> instead of &#x003B4;<sub>&#x003A6;</sub> as the criterion for model selection yields more interpretable models at the cost of a negligible amount of performance in 14 out of 16 subjects. Figure <xref ref-type="fig" rid="F7">7A</xref> shows that employing &#x003B4;<sub>&#x003A6;</sub> provides, on average, slightly more accurate models (Wilcoxon rank sum test <italic>p</italic>-value &#x0003D; 0.012) across subjects (0.83 &#x000B1; 0.05) than using &#x003B6;<sub>&#x003A6;</sub> (0.79 &#x000B1; 0.04). On the other hand, Figure <xref ref-type="fig" rid="F7">7B</xref> shows that employing &#x003B6;<sub>&#x003A6;</sub>, at the cost of 0.04 in performance, provides (on average) substantially higher (Wilcoxon rank sum test <italic>p</italic>-value &#x0003D; 5.6 &#x000D7; 10<sup>&#x02212;6</sup>) interpretability across subjects (0.62 &#x000B1; 0.05) compared to &#x003B4;<sub>&#x003A6;</sub> (0.31 &#x000B1; 0.12). 
For example, in the case of subject 1 (see Table <xref ref-type="table" rid="T2">2</xref>), using &#x003B4;<sub>&#x003A6;</sub> in model selection to select the best &#x003BB; value for the Lasso yields a model with &#x003B4;<sub>&#x003A6;</sub> &#x0003D; 0.81 and <inline-formula><mml:math id="M170"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>26</mml:mn></mml:math></inline-formula>. In contrast, using &#x003B6;<sub>&#x003A6;</sub> delivers a model with &#x003B4;<sub>&#x003A6;</sub> &#x0003D; 0.78 and <inline-formula><mml:math id="M171"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>63</mml:mn></mml:math></inline-formula>. This inverse relationship between performance and interpretability is a direct consequence of the model over-fitting the noise structure in a small-sample-size brain decoding problem (see Section 3.1).</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p><bold>(A)</bold> Comparison between generalization performances of <inline-formula><mml:math id="M172"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M173"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. Adopting &#x003B6;<sub>&#x003A6;</sub> instead of &#x003B4;<sub>&#x003A6;</sub> in model selection yields (on average) 0.04 less accurate classifiers over 16 subjects. <bold>(B)</bold> Comparison between interpretabilities of <inline-formula><mml:math id="M174"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M175"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. Adopting &#x003B6;<sub>&#x003A6;</sub> instead of &#x003B4;<sub>&#x003A6;</sub> in model selection yields on average 0.31 more interpretable classifiers over 16 subjects.</p></caption>
<graphic xlink:href="fnins-10-00619-g0007.tif"/>
</fig>
<p>The advantage of this exchange between performance and interpretability can be seen in the quality of the MBMs. Figures <xref ref-type="fig" rid="F8">8A,B</xref> show <inline-formula><mml:math id="M176"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M177"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> of subject 1, i.e., the spatio-temporal multivariate maps of the Lasso models with the maximum values of &#x003B4;<sub>&#x003A6;</sub> and &#x003B6;<sub>&#x003A6;</sub>, respectively. The maps are plotted for 102 magnetometer sensors. In each case, the time courses of the classifier weights associated with the MEG2041 and MEG1931 sensors are plotted. Furthermore, the topographic maps represent the spatial patterns of weights averaged between 184 and 236 ms after stimulus onset. While <inline-formula><mml:math id="M178"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is sparse in time and space, it fails to accurately represent the spatio-temporal dynamics of the N170 component. 
Furthermore, the multicollinearity problem arising from the correlation between the time courses of the MEG2041 and MEG1931 sensors causes extra attenuation of the N170 effect in the MEG1931 sensor. Therefore, the model is unable to capture the spatial pattern of the dipole in the posterior area. In contrast, <inline-formula><mml:math id="M179"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represents the dynamics of the N170 component in time, and it also captures the spatial patterns of two dipoles in the posterior and temporal areas. In summary, <inline-formula><mml:math id="M180"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> suggests a more representative pattern of the underlying neurophysiological effect than <inline-formula><mml:math id="M181"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>.</p>
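The spatial patterns discussed above are obtained by averaging the classifier weights over a post-stimulus window. A minimal sketch, assuming hypothetical array shapes (102 magnetometers, 1 ms sampling from stimulus onset) rather than the actual fitted weights:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical classifier weights reshaped to (n_sensors, n_timepoints);
# 102 magnetometers, 500 time points sampled at 1 ms from stimulus onset.
n_sensors, n_times = 102, 500
theta = rng.normal(size=(n_sensors, n_times))

# Average the weights within the 184-236 ms post-stimulus window to obtain
# one value per sensor, i.e., the spatial pattern shown as a topographic map.
t_start, t_end = 184, 236
topo = theta[:, t_start:t_end + 1].mean(axis=1)
print("topography shape:", topo.shape)
```

The same slicing along the sensor axis instead (e.g., the rows for MEG2041 and MEG1931) yields the per-sensor time courses plotted in Figure 8.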
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p><bold>Comparison between spatio-temporal multivariate maps of (A)</bold> the most accurate, and <bold>(B)</bold> the most interpretable classifiers for Subject 1. <inline-formula><mml:math id="M182"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> provides a better spatio-temporal representation of the N170 effect than <inline-formula><mml:math id="M183"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>.</p></caption>
<graphic xlink:href="fnins-10-00619-g0008.tif"/>
</fig>
<p>In addition, optimizing the hyper-parameters of brain decoding based on &#x003B6;<sub>&#x003A6;</sub> offers more reproducible brain decoders. According to Table <xref ref-type="table" rid="T2">2</xref>, using &#x003B6;<sub>&#x003A6;</sub> instead of &#x003B4;<sub>&#x003A6;</sub> provides (on average) 0.15 more reproducibility over the 16 subjects. To illustrate the benefit of higher reproducibility for the interpretability of maps, Figure <xref ref-type="fig" rid="F9">9</xref> visualizes <inline-formula><mml:math id="M184"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M185"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> across 4 perturbed training sets. The spatial maps (Figures <xref ref-type="fig" rid="F9">9A,C</xref>) are plotted for the magnetometer sensors, averaged in the time interval between 184<italic>ms</italic> and 236<italic>ms</italic> after stimulus onset. The temporal maps (Figures <xref ref-type="fig" rid="F9">9B,D</xref>) show the multivariate temporal maps of the MEG1931 and MEG2041 sensors. While <inline-formula><mml:math id="M186"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is unstable in time and space across the 4 perturbed training sets, <inline-formula><mml:math id="M187"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> provides more reproducible maps.</p>
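<p>The reproducibility compared above can be approximated in practice as the average pairwise similarity of the weight maps learned on perturbed training sets. The following minimal Python sketch illustrates this idea; it is not the implementation used in this study, and the choice of cosine similarity as the similarity measure is an assumption.</p>

```python
import numpy as np

def reproducibility(weight_maps):
    """Mean pairwise cosine similarity between weight maps learned on
    perturbed training sets (higher values = more reproducible maps)."""
    W = np.asarray(weight_maps, dtype=float)
    # Normalize each map to unit length so the comparison is scale-free.
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    G = W @ W.T                              # Gram matrix of cosine similarities
    n = len(W)
    return G[~np.eye(n, dtype=bool)].mean()  # average over off-diagonal pairs

# Identical maps are perfectly reproducible.
maps = [[1.0, 0.0, 2.0], [1.0, 0.0, 2.0], [1.0, 0.0, 2.0]]
print(round(reproducibility(maps), 3))   # 1.0
```

<p>Under this sketch, sign-flipped or orthogonal maps across perturbed training sets would score near zero or below, flagging an unstable decoder.</p>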
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p><bold>Comparison of the reproducibility of Lasso when &#x003B4;<sub>&#x003A6;</sub> and &#x003B6;<sub>&#x003A6;</sub> are used in the model selection procedure</bold>. <bold>(A,B)</bold> show the spatio-temporal patterns represented by <inline-formula><mml:math id="M188"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> across the 4 perturbed training sets. <bold>(C,D)</bold> show the spatio-temporal patterns represented by <inline-formula><mml:math id="M189"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> across the 4 perturbed training sets. Employing &#x003B6;<sub>&#x003A6;</sub> instead of &#x003B4;<sub>&#x003A6;</sub> in the model selection yields on average 0.15 more reproducibility of MBMs.</p></caption>
<graphic xlink:href="fnins-10-00619-g0009.tif"/>
</fig>
</sec>
<sec>
<title>3.4. Mass-univariate hypothesis testing on MEG data</title>
<p>Groppe et al. (<xref ref-type="bibr" rid="B30">2011a</xref>,<xref ref-type="bibr" rid="B31">b</xref>) showed that non-parametric mass-univariate analysis is unable to detect effects that are narrowly distributed in space and time (e.g., an <italic>N</italic>170 component). To illustrate the advantage of the proposed decoding framework in spotting such effects, we performed a non-parametric cluster-based permutation test (Maris and Oostenveld, <xref ref-type="bibr" rid="B57">2007</xref>) on our MEG dataset using the FieldTrip toolbox (Oostenveld et al., <xref ref-type="bibr" rid="B67">2010</xref>). In a single-subject analysis scenario, we considered the trials of MEG recordings as the unit of observation in a between-trials experiment. Independent-samples <italic>t</italic>-statistics were used to evaluate the effect at the sample level and to construct spatio-temporal clusters. The maximum of the cluster-level summed <italic>t</italic>-value was used as the cluster-level statistic, and the significance probability was computed using a Monte Carlo method. The minimum number of neighboring channels for computing the clusters was set to 2. Using 0.025 as the two-sided significance threshold and repeating the procedure separately for magnetometers and combined gradiometers, no significant result was found for any of the 16 subjects. This result motivates the search for more sensitive (and, at the same time, more interpretable) alternatives to univariate hypothesis testing.</p>
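<p>For readers unfamiliar with the procedure, the logic of the cluster-based permutation test can be sketched in a one-dimensional miniature: threshold sample-wise <italic>t</italic>-values, sum them within contiguous supra-threshold runs, and compare the largest observed cluster mass against a permutation distribution of maxima. The Python sketch below illustrates only this logic; it is not the FieldTrip implementation, and the threshold, trial counts, and effect size are arbitrary assumptions.</p>

```python
import numpy as np

def cluster_perm_test(a, b, n_perm=500, t_thresh=2.0, seed=0):
    """Simplified 1-D cluster-based permutation test (Maris & Oostenveld
    style): threshold sample-wise Welch t-values, sum |t| within contiguous
    supra-threshold runs, and compare the largest observed cluster mass
    against the permutation distribution of maxima."""
    rng = np.random.default_rng(seed)

    def tvals(x, y):
        se = np.sqrt(x.var(0, ddof=1) / len(x) + y.var(0, ddof=1) / len(y))
        return (x.mean(0) - y.mean(0)) / se

    def max_mass(t):
        best = run = 0.0
        for v in np.abs(t):                  # contiguous supra-threshold runs
            run = run + v if v > t_thresh else 0.0
            best = max(best, run)
        return best

    obs = max_mass(tvals(a, b))
    pooled = np.vstack([a, b])
    exceed = 0
    for _ in range(n_perm):                  # shuffle condition labels
        idx = rng.permutation(len(pooled))
        exceed += max_mass(tvals(pooled[idx[:len(a)]], pooled[idx[len(a):]])) >= obs
    return (exceed + 1) / (n_perm + 1)       # Monte Carlo p-value

# A narrow, strong effect is detected by the cluster test...
rng = np.random.default_rng(42)
a = rng.normal(size=(40, 100))
b = rng.normal(size=(40, 100))
b[:, 45:60] += 1.5                           # short "N170-like" deflection
print(cluster_perm_test(a, b) < 0.025)       # True
```

<p>The spatio-temporal version used in the paper extends the cluster construction over neighboring channels as well as time, which is where the minimum-neighbor setting enters.</p>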
</sec>
</sec>
<sec id="s4">
<title>4. Discussion</title>
<sec>
<title>4.1. Defining interpretability: theoretical advantages</title>
<p>An overview of the brain decoding literature shows frequent co-occurrence of the terms interpretation, interpretable, and interpretability with the terms model, classification, parameter, decoding, method, feature, and pattern (see the quick meta-analysis of the literature in the <xref ref-type="supplementary-material" rid="SM1">supplementary material</xref>); however, a formal definition of interpretability has never been presented. In this study, our primary interest is to present a simple theoretical definition of the interpretability of linear brain decoding models and their corresponding MBMs. Furthermore, we show how interpretability is related to the reproducibility and neurophysiological representativeness of MBMs. Our definition and quantification of interpretability remain theoretical, as we assume that the true solution of the brain decoding problem is available. Despite this limitation, we argue that the presented definition provides a concrete framework for a previously abstract concept and establishes a theoretical background for explaining an ambiguous phenomenon in the brain decoding context. We support our argument with an example in time-domain MEG decoding in which we show how the presented definition can be exploited to heuristically approximate interpretability. Our experimental results on MEG data show that accounting for the approximated measure of interpretability has a positive effect on the human interpretation of brain decoding models. This example shows how partial prior knowledge regarding the timing and location of neural activity can be used to find more plausible multivariate patterns in data. Furthermore, the proposed decomposition of the interpretability of MBMs into reproducibility and representativeness explains the relationship between the cooperative factors influencing the interpretability of brain decoding models and highlights the possibility of indirect and partial evaluation of interpretability by measuring these factors.</p>
</sec>
<sec>
<title>4.2. Application in model evaluation</title>
<p>Discriminative models in the brain decoding framework provide higher sensitivity and specificity than univariate analysis in hypothesis testing of neuroimaging data. Although multivariate hypothesis testing is performed based solely on the generalization performance of classifiers, the emerging need to extract reliable complementary information regarding the underlying neuronal activity has motivated a considerable amount of research on improving and assessing the interpretability of classifiers and their associated MBMs. Despite its ubiquitous use, the generalization performance of classifiers is not a reliable criterion for assessing the interpretability of brain decoding models (Rasmussen et al., <xref ref-type="bibr" rid="B70">2012</xref>; Varoquaux et al., <xref ref-type="bibr" rid="B90">2017</xref>); therefore, extra criteria may be required. However, because of the lack of a formal definition of interpretability, different characteristics of linear classifiers have been considered as the decisive criterion in assessing their interpretability. Reproducibility (Rasmussen et al., <xref ref-type="bibr" rid="B70">2012</xref>; Conroy et al., <xref ref-type="bibr" rid="B19">2013</xref>), stability selection (Varoquaux et al., <xref ref-type="bibr" rid="B88">2012</xref>; Wang et al., <xref ref-type="bibr" rid="B94">2015</xref>), sparsity (Dash et al., <xref ref-type="bibr" rid="B23">2015</xref>; Shervashidze and Bach, <xref ref-type="bibr" rid="B75">2015</xref>), and neurophysiological plausibility (Afshin-Pour et al., <xref ref-type="bibr" rid="B2">2011</xref>) are examples of such criteria.</p>
<p>Our definition of interpretability helps to fill this gap by introducing a new multi-objective model selection criterion as a weighted compromise between the interpretability and the generalization performance of linear models. Our experimental results on single-subject decoding showed that adopting the new criterion for optimizing the hyper-parameters of brain decoding models is an important step toward reliable visualization of models learned from neuroimaging data. This is not the first time in the neuroimaging context that a new metric has been proposed in combination with generalization performance for model selection. Several recent studies proposed combining the reproducibility of the maps (Rasmussen et al., <xref ref-type="bibr" rid="B70">2012</xref>; Conroy et al., <xref ref-type="bibr" rid="B19">2013</xref>; Strother et al., <xref ref-type="bibr" rid="B77">2014</xref>) or the stability of the classifiers (Yu, <xref ref-type="bibr" rid="B101">2013</xref>; Lim and Yu, <xref ref-type="bibr" rid="B54">2016</xref>; Varoquaux et al., <xref ref-type="bibr" rid="B90">2017</xref>) with the performance of discriminative models to enhance the interpretability of decoding models. Our definition of interpretability supports the claim that reproducibility is not the only factor affecting interpretability. Therefore, our contribution can be considered a complementary effort with respect to the state of the art in improving the interpretability of brain decoding at the model selection level.</p>
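<p>A multi-objective model selection of this kind can be sketched in a few lines. In the sketch below, the linear combination and the weight <italic>lam</italic> are simplifying assumptions for illustration only; the exact form of &#x003B6;<sub>&#x003A6;</sub> used in this study differs.</p>

```python
def compromise(accuracy, interpretability, lam=0.5):
    """Hypothetical weighted trade-off between generalization performance
    and (approximated) interpretability; lam controls the balance."""
    return (1.0 - lam) * accuracy + lam * interpretability

# Candidate regularization strengths with (accuracy, interpretability)
# scores, e.g., estimated by cross-validation over perturbed training sets.
candidates = {0.01: (0.92, 0.40), 0.10: (0.90, 0.75), 1.00: (0.80, 0.78)}

best_by_accuracy = max(candidates, key=lambda a: candidates[a][0])
best_by_compromise = max(candidates, key=lambda a: compromise(*candidates[a]))
print(best_by_accuracy, best_by_compromise)   # 0.01 0.1
```

<p>With these (made-up) scores, the compromise criterion selects a slightly less accurate but much more interpretable model, which is exactly the behavior observed with &#x003B6;<sub>&#x003A6;</sub> in the experiments.</p>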
<p>Furthermore, this work presents an effective approach for evaluating the quality of different regularization strategies for improving the interpretability of MBMs. As briefly reviewed in Section 1, there is a line of research in the brain decoding context in which prior knowledge is injected into the decoding process via the penalization term in order to improve the interpretability of decoding models. Thus far, the literature offers no dedicated method to directly compare the interpretability of MBMs resulting from different penalization techniques. Our findings provide a further step toward direct evaluation of the interpretability of currently proposed penalization strategies. Such an evaluation can highlight the advantages and disadvantages of applying different strategies to different data types and can facilitate the choice of an appropriate method for a given application.</p>
</sec>
<sec>
<title>4.3. Regularization and interpretability</title>
<p>Haufe et al. (<xref ref-type="bibr" rid="B38">2013</xref>) demonstrated that the weights of linear discriminative models cannot be used to accurately assess the relationship between the independent variables, primarily because of the contribution of noise to the decoding process. The authors concluded that the interpretability of brain decoding cannot be improved by regularization. The problem is primarily caused by the decoding process <italic>per se</italic>, which minimizes the classification error considering only the uncertainty in the output space (Zhang, <xref ref-type="bibr" rid="B104">2005</xref>; Aggarwal and Yu, <xref ref-type="bibr" rid="B3">2009</xref>; Tzelepis et al., <xref ref-type="bibr" rid="B82">2015</xref>) and not the uncertainty in the input space (i.e., noise). Our experimental results on the toy data (see Section 3.1) show that, if the right criterion is used for selecting the hyper-parameter values, an appropriate choice of regularization strategy can still play a significant role in improving the interpretability of the results. For example, in the case of the toy data, the true generative function behind the sampled data is sparse (see Section 2.6.1), but, because of the noise in the data, the sparse model is not the most accurate one. On the other hand, a more comprehensive criterion (in this case, &#x003B6;<sub>&#x003A6;</sub>) that also considers the interpretability of the model parameters facilitates the selection of correct prior assumptions about the distribution of the data via regularization. This observation encourages the following modification of the conclusion in Haufe et al. (<xref ref-type="bibr" rid="B38">2013</xref>): if the performance of the model is the only criterion in model selection, then interpretability cannot necessarily be improved by means of regularization. This modification offers a practical shift in methodology, in which we propose to replace the post-processing of weights proposed in Haufe et al. (<xref ref-type="bibr" rid="B38">2013</xref>) with a refinement of hyper-parameter selection based on the newly developed model selection criterion.</p>
</sec>
<sec>
<title>4.4. The performance-interpretability dilemma</title>
<p>The performance-interpretability dilemma refers to the trade-off between the generalization performance and the interpretability of a decoding model. In some applications of brain decoding, such as BCI, a more accurate model (even one with no interpretability) is desired. On the other hand, when brain decoding is employed for hypothesis testing purposes, an astute balance between the two factors is more favorable. The presented model selection metric (&#x003B6;<sub>&#x003A6;</sub>) provides the possibility of maintaining this balance. An important question at this point concerns the nature of the performance-interpretability dilemma: is it model-driven or data-driven? In other words, do some decoding models (e.g., sparse models) suffer from this deficit, or is the dilemma independent of the decoding model and determined by the distribution of the data rather than by the assumptions of the model?</p>
<p>Our experiments shed light on the fact that the performance-interpretability dilemma is driven by the <italic>uncertainty</italic> (Aggarwal and Yu, <xref ref-type="bibr" rid="B3">2009</xref>) in the data. The uncertainty in the data refers to the difference between the true solution of decoding &#x003A6;<sup>&#x0002A;</sup> and the solution of decoding in the sampled data space &#x003A6;<sub><italic>S</italic></sub>, and it is generally a consequence of noise in the input and/or output spaces. The gap between &#x003A6;<sup>&#x0002A;</sup> and &#x003A6;<sub><italic>S</italic></sub> is also known as the irreducible error (see Equation 2) in learning theory, and it cannot fundamentally be reduced by minimizing the training error. Therefore, any attempt to improve the classification performance in the sampled data space may push the estimated solution further away from the true one. As an example, our experiment on the toy data (see Section 3.1) shows the effect of noise in the input space on the performance-interpretability dilemma. Improving the performance of the model (i.e., fitting &#x003A6;<sub><italic>S</italic></sub>) drives the estimated solution of decoding <inline-formula><mml:math id="M190"><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> away from the true solution &#x003A6;<sup>&#x0002A;</sup>, thus reducing the interpretability of the decoding model. Furthermore, our experiments demonstrate that incorporating the interpretability of decoding models into model selection facilitates finding the best match between the decoding model and the distribution of the data. For example, in the classification of the toy data, the new model selection metric &#x003B6;<sub>&#x003A6;</sub> selects the sparser model with a better match to the true distribution of the data, despite its worse generalization performance.</p>
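<p>The toy-data observation above can be reproduced in miniature: when the generative model is sparse but the inputs are noisy, the Lasso solution with the best training fit is dense, while a stronger penalty recovers the true support at a small cost in fit. The following self-contained sketch illustrates this effect; the coordinate-descent solver, dimensions, noise level, and penalty values are illustrative assumptions, not the setup used in the paper.</p>

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent Lasso with soft-threshold updates
    (illustrative; not the solver used in the paper)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]   # partial residual excluding j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(1)
n, p = 80, 10
true_w = np.zeros(p)
true_w[:2] = [2.0, -1.5]                     # sparse ground truth
X = rng.normal(size=(n, p))
y = X @ true_w + rng.normal(scale=2.0, size=n)  # noisy observations

w_weak = lasso_cd(X, y, lam=1.0)     # near-OLS: best training fit, dense map
w_strong = lasso_cd(X, y, lam=40.0)  # sparser map, closer to the true support
print((np.abs(w_weak) > 1e-8).sum(), (np.abs(w_strong) > 1e-8).sum())
```

<p>Here the weakly penalized model attains the lower training error by fitting the noise with many small spurious weights, while the strongly penalized model mirrors the sparse generative structure, which is the essence of the dilemma.</p>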
</sec>
<sec>
<title>4.5. Advantage over mass-univariate analysis</title>
<p>Mass-univariate hypothesis testing methods are among the most popular tools for forward inference on neuroimaging data in cognitive neuroscience. Mass-univariate analyses consist of univariate statistical tests on single independent variables followed by multiple-comparison correction. In general, multiple-comparison correction reduces the sensitivity of mass-univariate approaches because of the large number of univariate tests involved. Cluster-based permutation testing (Maris and Oostenveld, <xref ref-type="bibr" rid="B57">2007</xref>) provides a more sensitive univariate analysis framework by making the cluster assumption in the multiple-comparison correction. Unfortunately, this method is unable to detect narrow spatio-temporal effects in the data (Groppe et al., <xref ref-type="bibr" rid="B30">2011a</xref>). As a remedy, brain decoding provides a very sensitive tool for hypothesis testing: it can detect multivariate patterns, but it suffers from a low level of interpretability. Our study proposes a possible solution to the interpretability problem of classifiers and therefore facilitates the application of brain decoding to the analysis of neuroimaging data. Our experimental results on the MEG data demonstrate that, although the non-parametric cluster-based permutation test is unable to detect the N170 effect, employing &#x003B6;<sub>&#x003A6;</sub> instead of &#x003B4;<sub>&#x003A6;</sub> in model selection not only detects the stimulus-relevant information in the data but also ensures a reproducible and representative spatio-temporal mapping of the timing and location of the underlying neurophysiological effect.</p>
</sec>
<sec>
<title>4.6. Limitations and future directions</title>
<p>Despite its theoretical and practical advantages, the proposed definition and quantification of interpretability suffer from some limitations. All of the presented concepts are defined for linear models, under the main assumption that <inline-formula><mml:math id="M191"><mml:msup><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:math></inline-formula> (where <inline-formula><mml:math id="M192"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:math></inline-formula> is a class of linear functions). This fact highlights the importance of linearizing the experimental protocol in the data collection phase (Naselaris et al., <xref ref-type="bibr" rid="B64">2011</xref>). Extending the definition of interpretability to non-linear models demands future research into the visualization of non-linear models in the form of brain maps; currently, our findings cannot be directly applied to non-linear models. Furthermore, the proposed heuristic for time-domain MEG data applies only to binary classification. One possible solution for multiclass classification is to decompose the decoding problem into several binary sub-problems. In addition, the quality of the proposed heuristic is limited for small-sample-size datasets. Of course, the proposed heuristic is just one example of the possible options for assessing the neurophysiological plausibility of MBMs in time-locked analysis of MEG data; thus, improving its quality would be of interest in future research. Finding physiologically relevant heuristics for other acquisition modalities, such as fMRI or frequency-domain MEEG data, is another possible direction for future work.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s5">
<title>5. Conclusions</title>
<p>We presented a novel theoretical definition of the interpretability of linear brain decoding and the associated multivariate brain maps. We demonstrated how interpretability relates to the representativeness and reproducibility of brain decoding. Although theoretical, the presented definition provides a first step toward a practical solution for filling the knowledge extraction gap in linear brain decoding. As a proof of concept, a heuristic approach based on the contrast event-related field is proposed for practical evaluation of interpretability in multivariate recovery of evoked MEG responses. We experimentally showed that adding the interpretability of brain decoding models as a criterion in the model selection procedure yields significantly more interpretable models while sacrificing a negligible amount of performance. Our methodological and experimental achievements can be considered a complementary theoretical and practical effort that contributes to research on enhancing the interpretability of multivariate pattern analysis.</p>
</sec>
<sec id="s6">
<title>Author contributions</title>
<p>SK contributed to developing the theoretical and experimental content of this study. SV and AP were involved in developing the theoretical machine learning aspects. NW advised on the MEG experimental aspects.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</sec>
</body>
<back>
<ack><p>The authors wish to thank Raffaele Tucciarelli, Emanuele Olivetti, and Paolo Avesani for valuable discussions and comments. We would also like to thank the reviewers for their insightful comments, which improved our work significantly. This work has been partially funded by the IIT-FBK grant n.0030882/14.</p>
</ack>
<sec sec-type="supplementary-material" id="s7">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="http://journal.frontiersin.org/article/10.3389/fnins.2016.00619/full#supplementary-material">http://journal.frontiersin.org/article/10.3389/fnins.2016.00619/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Presentation1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abadi</surname> <given-names>M.</given-names></name> <name><surname>Subramanian</surname> <given-names>R.</given-names></name> <name><surname>Kia</surname> <given-names>S.</given-names></name> <name><surname>Avesani</surname> <given-names>P.</given-names></name> <name><surname>Patras</surname> <given-names>I.</given-names></name> <name><surname>Sebe</surname> <given-names>N.</given-names></name></person-group> (<year>2015</year>). <article-title>DECAF: MEG-based multimodal database for decoding affective physiological responses</article-title>. <source>IEEE Trans. Affect. Comput.</source> <volume>6</volume>, <fpage>209</fpage>&#x02013;<lpage>222</lpage>. <pub-id pub-id-type="doi">10.1109/TAFFC.2015.2392932</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Afshin-Pour</surname> <given-names>B.</given-names></name> <name><surname>Soltanian-Zadeh</surname> <given-names>H.</given-names></name> <name><surname>Hossein-Zadeh</surname> <given-names>G.-A.</given-names></name> <name><surname>Grady</surname> <given-names>C. L.</given-names></name> <name><surname>Strother</surname> <given-names>S. C.</given-names></name></person-group> (<year>2011</year>). <article-title>A mutual information-based metric for evaluation of fMRI data-processing approaches</article-title>. <source>Hum. Brain Mapping</source> <volume>32</volume>, <fpage>699</fpage>&#x02013;<lpage>715</lpage>. <pub-id pub-id-type="doi">10.1002/hbm.21057</pub-id><pub-id pub-id-type="pmid">20533565</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aggarwal</surname> <given-names>C. C.</given-names></name> <name><surname>Yu</surname> <given-names>P. S.</given-names></name></person-group> (<year>2009</year>). <article-title>A survey of uncertain data algorithms and applications</article-title>. <source>IEEE Transac. Knowl. Data Eng.</source> <volume>21</volume>, <fpage>609</fpage>&#x02013;<lpage>623</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2008.190</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anderson</surname> <given-names>A.</given-names></name> <name><surname>Labus</surname> <given-names>J. S.</given-names></name> <name><surname>Vianna</surname> <given-names>E. P.</given-names></name> <name><surname>Mayer</surname> <given-names>E. A.</given-names></name> <name><surname>Cohen</surname> <given-names>M. S.</given-names></name></person-group> (<year>2011</year>). <article-title>Common component classification: what can we learn from machine learning?</article-title> <source>Neuroimage</source> <volume>56</volume>, <fpage>517</fpage>&#x02013;<lpage>524</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.05.065</pub-id><pub-id pub-id-type="pmid">20599621</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bach</surname> <given-names>S.</given-names></name> <name><surname>Binder</surname> <given-names>A.</given-names></name> <name><surname>Montavon</surname> <given-names>G.</given-names></name> <name><surname>Klauschen</surname> <given-names>F.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name> <name><surname>Samek</surname> <given-names>W.</given-names></name></person-group> (<year>2015</year>). <article-title>On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation</article-title>. <source>PLoS ONE</source> <volume>10</volume>:<fpage>e130140</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0130140</pub-id><pub-id pub-id-type="pmid">26161953</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baehrens</surname> <given-names>D.</given-names></name> <name><surname>Schroeter</surname> <given-names>T.</given-names></name> <name><surname>Harmeling</surname> <given-names>S.</given-names></name> <name><surname>Kawanabe</surname> <given-names>M.</given-names></name> <name><surname>Hansen</surname> <given-names>K.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name></person-group> (<year>2010</year>). <article-title>How to explain individual classification decisions</article-title>. <source>J. Mach. Learn. Res.</source> <volume>11</volume>, <fpage>1803</fpage>&#x02013;<lpage>1831</lpage>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bentin</surname> <given-names>S.</given-names></name> <name><surname>Allison</surname> <given-names>T.</given-names></name> <name><surname>Puce</surname> <given-names>A.</given-names></name> <name><surname>Perez</surname> <given-names>E.</given-names></name> <name><surname>McCarthy</surname> <given-names>G.</given-names></name></person-group> (<year>1996</year>). <article-title>Electrophysiological studies of face perception in humans</article-title>. <source>J. Cogn. Neurosci.</source> <volume>8</volume>, <fpage>551</fpage>&#x02013;<lpage>565</lpage>. <pub-id pub-id-type="pmid">20740065</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Besserve</surname> <given-names>M.</given-names></name> <name><surname>Jerbi</surname> <given-names>K.</given-names></name> <name><surname>Laurent</surname> <given-names>F.</given-names></name> <name><surname>Baillet</surname> <given-names>S.</given-names></name> <name><surname>Martinerie</surname> <given-names>J.</given-names></name> <name><surname>Garnero</surname> <given-names>L.</given-names></name></person-group> (<year>2007</year>). <article-title>Classification methods for ongoing EEG and MEG signals</article-title>. <source>Biol. Res.</source> <volume>40</volume>, <fpage>415</fpage>&#x02013;<lpage>437</lpage>. <pub-id pub-id-type="doi">10.4067/S0716-97602007000500005</pub-id><pub-id pub-id-type="pmid">18575676</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bie&#x000DF;mann</surname> <given-names>F.</given-names></name> <name><surname>D&#x000E4;hne</surname> <given-names>S.</given-names></name> <name><surname>Meinecke</surname> <given-names>F. C.</given-names></name> <name><surname>Blankertz</surname> <given-names>B.</given-names></name> <name><surname>G&#x000F6;rgen</surname> <given-names>K.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>On the interpretability of linear multivariate neuroimaging analyses: filters, patterns and their relationship</article-title>, in <source>Proceedings of the 2nd NIPS Workshop on Machine Learning and Interpretation in Neuroimaging</source> (<publisher-loc>Lake Tahoe</publisher-loc>: <publisher-name>Harrahs and Harveys</publisher-name>).</citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blankertz</surname> <given-names>B.</given-names></name> <name><surname>Lemm</surname> <given-names>S.</given-names></name> <name><surname>Treder</surname> <given-names>M.</given-names></name> <name><surname>Haufe</surname> <given-names>S.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name></person-group> (<year>2011</year>). <article-title>Single-trial analysis and classification of erp components a tutorial</article-title>. <source>Neuroimage</source> <volume>56</volume>, <fpage>814</fpage>&#x02013;<lpage>825</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.06.048</pub-id><pub-id pub-id-type="pmid">20600976</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bousquet</surname> <given-names>O.</given-names></name> <name><surname>Elisseeff</surname> <given-names>A.</given-names></name></person-group> (<year>2002</year>). <article-title>Stability and generalization</article-title>. <source>J. Mach. Learn. Res.</source> <volume>2</volume>, <fpage>499</fpage>&#x02013;<lpage>526</lpage>.</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name></person-group> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Mach. Learn.</source> <volume>45</volume>, <fpage>5</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brodersen</surname> <given-names>K. H.</given-names></name> <name><surname>Haiss</surname> <given-names>F.</given-names></name> <name><surname>Ong</surname> <given-names>C. S.</given-names></name> <name><surname>Jung</surname> <given-names>F.</given-names></name> <name><surname>Tittgemeyer</surname> <given-names>M.</given-names></name> <name><surname>Buhmann</surname> <given-names>J. M.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Model-based feature construction for multivariate decoding</article-title>. <source>Neuroimage</source> <volume>56</volume>, <fpage>601</fpage>&#x02013;<lpage>615</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.04.036</pub-id><pub-id pub-id-type="pmid">20406688</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bullmore</surname> <given-names>E.</given-names></name> <name><surname>Brammer</surname> <given-names>M.</given-names></name> <name><surname>Williams</surname> <given-names>S. C.</given-names></name> <name><surname>Rabe-Hesketh</surname> <given-names>S.</given-names></name> <name><surname>Janot</surname> <given-names>N.</given-names></name> <name><surname>David</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>1996</year>). <article-title>Statistical methods of estimation and inference for functional MR image analysis</article-title>. <source>Magn. Reson. Med.</source> <volume>35</volume>, <fpage>261</fpage>&#x02013;<lpage>277</lpage>. <pub-id pub-id-type="pmid">8622592</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Bzdok</surname> <given-names>D.</given-names></name> <name><surname>Varoquaux</surname> <given-names>G.</given-names></name> <name><surname>Thirion</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title>Neuroimaging research: from null-hypothesis falsification to out-of-sample generalization</article-title>. <source>Educ. Psychol. Meas.</source> <pub-id pub-id-type="doi">10.1177/0013164416667982</pub-id>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://journals.sagepub.com/doi/full/10.1177/0013164416667982">http://journals.sagepub.com/doi/full/10.1177/0013164416667982</ext-link></citation>
</ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Caramia</surname> <given-names>M.</given-names></name> <name><surname>Dell&#x00027;Olmo</surname> <given-names>P.</given-names></name></person-group> (<year>2008</year>). <source>Multi-objective Management in Freight Logistics: Increasing Capacity, Service Level and Safety with Optimization Algorithms</source>. <publisher-loc>London</publisher-loc>: <publisher-name>Springer</publisher-name>. <fpage>11</fpage>&#x02013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-84800-382-8</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carroll</surname> <given-names>M. K.</given-names></name> <name><surname>Cecchi</surname> <given-names>G. A.</given-names></name> <name><surname>Rish</surname> <given-names>I.</given-names></name> <name><surname>Garg</surname> <given-names>R.</given-names></name> <name><surname>Rao</surname> <given-names>A. R.</given-names></name></person-group> (<year>2009</year>). <article-title>Prediction and interpretation of distributed neural activity with sparse models</article-title>. <source>Neuroimage</source> <volume>44</volume>, <fpage>112</fpage>&#x02013;<lpage>122</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2008.08.020</pub-id><pub-id pub-id-type="pmid">18793733</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chan</surname> <given-names>A. M.</given-names></name> <name><surname>Halgren</surname> <given-names>E.</given-names></name> <name><surname>Marinkovic</surname> <given-names>K.</given-names></name> <name><surname>Cash</surname> <given-names>S. S.</given-names></name></person-group> (<year>2011</year>). <article-title>Decoding word and category-specific spatiotemporal representations from MEG and EEG</article-title>. <source>Neuroimage</source> <volume>54</volume>, <fpage>3028</fpage>&#x02013;<lpage>3039</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.10.073</pub-id><pub-id pub-id-type="pmid">21040796</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Conroy</surname> <given-names>B. R.</given-names></name> <name><surname>Walz</surname> <given-names>J. M.</given-names></name> <name><surname>Sajda</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>Fast bootstrapping and permutation testing for assessing reproducibility and interpretability of multivariate fMRI decoding models</article-title>. <source>PLoS ONE</source> <volume>8</volume>:<fpage>e79271</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0079271</pub-id><pub-id pub-id-type="pmid">24244465</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cox</surname> <given-names>D. D.</given-names></name> <name><surname>Savoy</surname> <given-names>R. L.</given-names></name></person-group> (<year>2003</year>). <article-title>Functional magnetic resonance imaging (fMRI) brain reading: detecting and classifying distributed patterns of fMRI activity in human visual cortex</article-title>. <source>Neuroimage</source> <volume>19</volume>, <fpage>261</fpage>&#x02013;<lpage>270</lpage>. <pub-id pub-id-type="doi">10.1016/S1053-8119(03)00049-1</pub-id><pub-id pub-id-type="pmid">12814577</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crivellato</surname> <given-names>E.</given-names></name> <name><surname>Ribatti</surname> <given-names>D.</given-names></name></person-group> (<year>2007</year>). <article-title>Soul, mind, brain: Greek philosophy and the birth of neuroscience</article-title>. <source>Brain Res. Bull.</source> <volume>71</volume>, <fpage>327</fpage>&#x02013;<lpage>336</lpage>. <pub-id pub-id-type="doi">10.1016/j.brainresbull.2006.09.020</pub-id><pub-id pub-id-type="pmid">17208648</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cucker</surname> <given-names>F.</given-names></name> <name><surname>Smale</surname> <given-names>S.</given-names></name></person-group> (<year>2002</year>). <article-title>On the mathematical foundations of learning</article-title>. <source>Bull. Am. Math. Soc.</source> <volume>39</volume>, <fpage>1</fpage>&#x02013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1090/S0273-0979-01-00923-5</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dash</surname> <given-names>S.</given-names></name> <name><surname>Malioutov</surname> <given-names>D. M.</given-names></name> <name><surname>Varshney</surname> <given-names>K. R.</given-names></name></person-group> (<year>2015</year>). <article-title>Learning interpretable classification rules using sequential rowsampling</article-title>, in <source>2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source> (<publisher-loc>South Brisbane, QLD</publisher-loc>: <publisher-name>IEEE</publisher-name>).</citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davis</surname> <given-names>T.</given-names></name> <name><surname>LaRocque</surname> <given-names>K. F.</given-names></name> <name><surname>Mumford</surname> <given-names>J. A.</given-names></name> <name><surname>Norman</surname> <given-names>K. A.</given-names></name> <name><surname>Wagner</surname> <given-names>A. D.</given-names></name> <name><surname>Poldrack</surname> <given-names>R. A.</given-names></name></person-group> (<year>2014</year>). <article-title>What do differences between multi-voxel and univariate analysis mean? how subject-, voxel-, and trial-level variance impact fMRI analysis</article-title>. <source>Neuroimage</source> <volume>97</volume>, <fpage>271</fpage>&#x02013;<lpage>283</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2014.04.037</pub-id><pub-id pub-id-type="pmid">24768930</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Brecht</surname> <given-names>M.</given-names></name> <name><surname>Yamagishi</surname> <given-names>N.</given-names></name></person-group> (<year>2012</year>). <article-title>Combining sparseness and smoothness improves classification accuracy and interpretability</article-title>. <source>Neuroimage</source> <volume>60</volume>, <fpage>1550</fpage>&#x02013;<lpage>1561</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2011.12.085</pub-id><pub-id pub-id-type="pmid">22261376</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Domingos</surname> <given-names>P.</given-names></name></person-group> (<year>2000</year>). <article-title>A unified bias-variance decomposition for zero-one and squared loss</article-title>. <source>AAAI/IAAI</source> <volume>2000</volume>, <fpage>564</fpage>&#x02013;<lpage>569</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.aaai.org/Library/AAAI/2000/aaai00-086.php">http://www.aaai.org/Library/AAAI/2000/aaai00-086.php</ext-link></citation>
</ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Efron</surname> <given-names>B.</given-names></name></person-group> (<year>1992</year>). <article-title>Bootstrap methods: another look at the jackknife</article-title>, in <source>Breakthroughs in Statistics: Methodology and Distribution</source>, eds <person-group person-group-type="editor"><name><surname>Kotz</surname> <given-names>S.</given-names></name> <name><surname>Johnson</surname> <given-names>N. L.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>569</fpage>&#x02013;<lpage>593</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-4612-4380-9_41</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gramfort</surname> <given-names>A.</given-names></name> <name><surname>Thirion</surname> <given-names>B.</given-names></name> <name><surname>Varoquaux</surname> <given-names>G.</given-names></name></person-group> (<year>2013</year>). <article-title>Identifying predictive regions from fMRI with TV-L1 prior</article-title>, in <source>International Workshop on Pattern Recognition in Neuroimaging (PRNI)</source> (<publisher-loc>Philadelphia, PA</publisher-loc>), <fpage>17</fpage>&#x02013;<lpage>20</lpage>.</citation>
</ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gramfort</surname> <given-names>A.</given-names></name> <name><surname>Varoquaux</surname> <given-names>G.</given-names></name> <name><surname>Thirion</surname> <given-names>B.</given-names></name></person-group> (<year>2012</year>). <article-title>Beyond brain reading: randomized sparsity and clustering to simultaneously predict and identify</article-title>, in <source>Machine Learning and Interpretation in Neuroimaging</source>, eds <person-group person-group-type="editor"><name><surname>Langs</surname> <given-names>G.</given-names></name> <name><surname>Rish</surname> <given-names>I.</given-names></name> <name><surname>Grosse-Wentrup</surname> <given-names>M.</given-names></name> <name><surname>Murphy</surname> <given-names>B.</given-names></name></person-group> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>). <fpage>9</fpage>&#x02013;<lpage>16</lpage>.</citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Groppe</surname> <given-names>D. M.</given-names></name> <name><surname>Urbach</surname> <given-names>T. P.</given-names></name> <name><surname>Kutas</surname> <given-names>M.</given-names></name></person-group> (<year>2011a</year>). <article-title>Mass univariate analysis of event-related brain potentials/fields I: a critical tutorial review</article-title>. <source>Psychophysiology</source> <volume>48</volume>, <fpage>1711</fpage>&#x02013;<lpage>1725</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-8986.2011.01273.x</pub-id><pub-id pub-id-type="pmid">21895683</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Groppe</surname> <given-names>D. M.</given-names></name> <name><surname>Urbach</surname> <given-names>T. P.</given-names></name> <name><surname>Kutas</surname> <given-names>M.</given-names></name></person-group> (<year>2011b</year>). <article-title>Mass univariate analysis of event-related brain potentials/fields II: simulation studies</article-title>. <source>Psychophysiology</source> <volume>48</volume>, <fpage>1726</fpage>&#x02013;<lpage>1737</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-8986.2011.01272.x</pub-id><pub-id pub-id-type="pmid">21895684</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grosenick</surname> <given-names>L.</given-names></name> <name><surname>Greer</surname> <given-names>S.</given-names></name> <name><surname>Knutson</surname> <given-names>B.</given-names></name></person-group> (<year>2008</year>). <article-title>Interpretable classifiers for fMRI improve prediction of purchases</article-title>. <source>IEEE Trans. Neural Syst. Rehabil. Eng.</source> <volume>16</volume>, <fpage>539</fpage>&#x02013;<lpage>548</lpage>. <pub-id pub-id-type="doi">10.1109/TNSRE.2008.926701</pub-id><pub-id pub-id-type="pmid">19144586</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grosenick</surname> <given-names>L.</given-names></name> <name><surname>Klingenberg</surname> <given-names>B.</given-names></name> <name><surname>Greer</surname> <given-names>S.</given-names></name> <name><surname>Taylor</surname> <given-names>J.</given-names></name> <name><surname>Knutson</surname> <given-names>B.</given-names></name></person-group> (<year>2009</year>). <article-title>Whole-brain sparse penalized discriminant analysis for predicting choice</article-title>. <source>Neuroimage</source> <volume>47</volume>, <fpage>S58</fpage>. <pub-id pub-id-type="doi">10.1016/S1053-8119(09)70232-0</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grosenick</surname> <given-names>L.</given-names></name> <name><surname>Klingenberg</surname> <given-names>B.</given-names></name> <name><surname>Katovich</surname> <given-names>K.</given-names></name> <name><surname>Knutson</surname> <given-names>B.</given-names></name> <name><surname>Taylor</surname> <given-names>J. E.</given-names></name></person-group> (<year>2013</year>). <article-title>Interpretable whole-brain prediction analysis with graphnet</article-title>. <source>Neuroimage</source> <volume>72</volume>, <fpage>304</fpage>&#x02013;<lpage>321</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2012.12.062</pub-id><pub-id pub-id-type="pmid">23298747</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hansen</surname> <given-names>K.</given-names></name> <name><surname>Baehrens</surname> <given-names>D.</given-names></name> <name><surname>Schroeter</surname> <given-names>T.</given-names></name> <name><surname>Rupp</surname> <given-names>M.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name></person-group> (<year>2011</year>). <article-title>Visual interpretation of kernel-based prediction models</article-title>. <source>Mol. Inform.</source> <volume>30</volume>, <fpage>817</fpage>&#x02013;<lpage>826</lpage>. <pub-id pub-id-type="doi">10.1002/minf.201100059</pub-id><pub-id pub-id-type="pmid">27467414</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hastie</surname> <given-names>T.</given-names></name> <name><surname>Tibshirani</surname> <given-names>R.</given-names></name> <name><surname>Friedman</surname> <given-names>J.</given-names></name></person-group> (<year>2009</year>). <source>The Elements of Statistical Learning</source>, <volume>Vol. 2</volume>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haufe</surname> <given-names>S.</given-names></name> <name><surname>D&#x000E4;hne</surname> <given-names>S.</given-names></name> <name><surname>Nikulin</surname> <given-names>V. V.</given-names></name></person-group> (<year>2014a</year>). <article-title>Dimensionality reduction for the analysis of brain oscillations</article-title>. <source>Neuroimage</source> <volume>101</volume>, <fpage>583</fpage>&#x02013;<lpage>597</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2014.06.073</pub-id><pub-id pub-id-type="pmid">25003816</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haufe</surname> <given-names>S.</given-names></name> <name><surname>Meinecke</surname> <given-names>F.</given-names></name> <name><surname>G&#x000F6;rgen</surname> <given-names>K.</given-names></name> <name><surname>D&#x000E4;hne</surname> <given-names>S.</given-names></name> <name><surname>Haynes</surname> <given-names>J.-D.</given-names></name> <name><surname>Blankertz</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>On the interpretation of weight vectors of linear models in multivariate neuroimaging</article-title>. <source>Neuroimage</source> <volume>87</volume>, <fpage>96</fpage>&#x02013;<lpage>110</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2013.10.067</pub-id><pub-id pub-id-type="pmid">24239590</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Haufe</surname> <given-names>S.</given-names></name> <name><surname>Meinecke</surname> <given-names>F.</given-names></name> <name><surname>Gorgen</surname> <given-names>K.</given-names></name> <name><surname>Dahne</surname> <given-names>S.</given-names></name> <name><surname>Haynes</surname> <given-names>J.-D.</given-names></name> <name><surname>Blankertz</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2014b</year>). <article-title>Parameter interpretation, regularization and source localization in multivariate linear models</article-title>, in <source>International Workshop on Pattern Recognition in Neuroimaging, (PRNI)</source> (<publisher-loc>Tubingen</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>4</lpage>.</citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haxby</surname> <given-names>J. V.</given-names></name> <name><surname>Gobbini</surname> <given-names>M. I.</given-names></name> <name><surname>Furey</surname> <given-names>M. L.</given-names></name> <name><surname>Ishai</surname> <given-names>A.</given-names></name> <name><surname>Schouten</surname> <given-names>J. L.</given-names></name> <name><surname>Pietrini</surname> <given-names>P.</given-names></name></person-group> (<year>2001</year>). <article-title>Distributed and overlapping representations of faces and objects in ventral temporal cortex</article-title>. <source>Science</source> <volume>293</volume>, <fpage>2425</fpage>&#x02013;<lpage>2430</lpage>. <pub-id pub-id-type="doi">10.1126/science.1063736</pub-id><pub-id pub-id-type="pmid">11577229</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haynes</surname> <given-names>J.-D.</given-names></name></person-group> (<year>2015</year>). <article-title>A primer on pattern-based approaches to fMRI: principles, pitfalls, and perspectives</article-title>. <source>Neuron</source> <volume>87</volume>, <fpage>257</fpage>&#x02013;<lpage>270</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2015.05.025</pub-id><pub-id pub-id-type="pmid">26182413</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haynes</surname> <given-names>J.-D.</given-names></name> <name><surname>Rees</surname> <given-names>G.</given-names></name></person-group> (<year>2006</year>). <article-title>Decoding mental states from brain activity in humans</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>7</volume>, <fpage>523</fpage>&#x02013;<lpage>534</lpage>. <pub-id pub-id-type="doi">10.1038/nrn1931</pub-id><pub-id pub-id-type="pmid">16791142</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Henson</surname> <given-names>R. N.</given-names></name> <name><surname>Wakeman</surname> <given-names>D. G.</given-names></name> <name><surname>Litvak</surname> <given-names>V.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name></person-group> (<year>2011</year>). <article-title>A Parametric Empirical Bayesian framework for the EEG/MEG inverse problem: generative models for multisubject and multimodal integration</article-title>. <source>Front. Hum. Neurosci.</source> <volume>5</volume>:<fpage>76</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2011.00076</pub-id><pub-id pub-id-type="pmid">21904527</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huttunen</surname> <given-names>H.</given-names></name> <name><surname>Manninen</surname> <given-names>T.</given-names></name> <name><surname>Kauppi</surname> <given-names>J.-P.</given-names></name> <name><surname>Tohka</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>Mind reading with regularized multinomial logistic regression</article-title>. <source>Mach. Vis. Appl.</source> <volume>24</volume>, <fpage>1311</fpage>&#x02013;<lpage>1325</lpage>. <pub-id pub-id-type="doi">10.1007/s00138-012-0464-y</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jenatton</surname> <given-names>R.</given-names></name> <name><surname>Audibert</surname> <given-names>J.-Y.</given-names></name> <name><surname>Bach</surname> <given-names>F.</given-names></name></person-group> (<year>2011</year>). <article-title>Structured variable selection with sparsity-inducing norms</article-title>. <source>J. Mach. Learn. Res.</source> <volume>12</volume>, <fpage>2777</fpage>&#x02013;<lpage>2824</lpage>.</citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kauppi</surname> <given-names>J.-P.</given-names></name> <name><surname>Parkkonen</surname> <given-names>L.</given-names></name> <name><surname>Hari</surname> <given-names>R.</given-names></name> <name><surname>Hyv&#x000E4;rinen</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Decoding magnetoencephalographic rhythmic activity using spectrospatial information</article-title>. <source>Neuroimage</source> <volume>83</volume>, <fpage>921</fpage>&#x02013;<lpage>936</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2013.07.026</pub-id><pub-id pub-id-type="pmid">23872494</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kia</surname> <given-names>S. M.</given-names></name> <name><surname>Vega-Pons</surname> <given-names>S.</given-names></name> <name><surname>Olivetti</surname> <given-names>E.</given-names></name> <name><surname>Avesani</surname> <given-names>P.</given-names></name></person-group> (<year>2016</year>). <source>Multi-Task Learning for Interpretation of Brain Decoding Models</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>.</citation>
</ref>
<ref id="B48">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kohavi</surname> <given-names>R.</given-names></name></person-group> (<year>1995</year>). <article-title>A study of cross-validation and bootstrap for accuracy estimation and model selection</article-title>, in <source>Proceedings of the 14th International Joint Conference on Artificial Intelligence, Vol. 2</source> (<publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>Morgan Kaufmann Publishers Inc.</publisher-name>), <fpage>1137</fpage>&#x02013;<lpage>1143</lpage>.</citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kriegeskorte</surname> <given-names>N.</given-names></name> <name><surname>Goebel</surname> <given-names>R.</given-names></name> <name><surname>Bandettini</surname> <given-names>P.</given-names></name></person-group> (<year>2006</year>). <article-title>Information-based functional brain mapping</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A.</source> <volume>103</volume>, <fpage>3863</fpage>&#x02013;<lpage>3868</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0600244103</pub-id><pub-id pub-id-type="pmid">16537458</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kriegeskorte</surname> <given-names>N.</given-names></name> <name><surname>Simmons</surname> <given-names>W. K.</given-names></name> <name><surname>Bellgowan</surname> <given-names>P. S.</given-names></name> <name><surname>Baker</surname> <given-names>C. I.</given-names></name></person-group> (<year>2009</year>). <article-title>Circular analysis in systems neuroscience: the dangers of double dipping</article-title>. <source>Nat. Neurosci.</source> <volume>12</volume>, <fpage>535</fpage>&#x02013;<lpage>540</lpage>. <pub-id pub-id-type="doi">10.1038/nn.2303</pub-id><pub-id pub-id-type="pmid">19396166</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LaConte</surname> <given-names>S.</given-names></name> <name><surname>Strother</surname> <given-names>S.</given-names></name> <name><surname>Cherkassky</surname> <given-names>V.</given-names></name> <name><surname>Anderson</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name></person-group> (<year>2005</year>). <article-title>Support vector machines for temporal classification of block design fMRI data</article-title>. <source>Neuroimage</source> <volume>26</volume>, <fpage>317</fpage>&#x02013;<lpage>329</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2005.01.048</pub-id><pub-id pub-id-type="pmid">15907293</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Langs</surname> <given-names>G.</given-names></name> <name><surname>Menze</surname> <given-names>B. H.</given-names></name> <name><surname>Lashkari</surname> <given-names>D.</given-names></name> <name><surname>Golland</surname> <given-names>P.</given-names></name></person-group> (<year>2011</year>). <article-title>Detecting stable distributed patterns of brain activation using Gini contrast</article-title>. <source>Neuroimage</source> <volume>56</volume>, <fpage>497</fpage>&#x02013;<lpage>507</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.07.074</pub-id><pub-id pub-id-type="pmid">20709176</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lemm</surname> <given-names>S.</given-names></name> <name><surname>Blankertz</surname> <given-names>B.</given-names></name> <name><surname>Dickhaus</surname> <given-names>T.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name></person-group> (<year>2011</year>). <article-title>Introduction to machine learning for brain imaging</article-title>. <source>Neuroimage</source> <volume>56</volume>, <fpage>387</fpage>&#x02013;<lpage>399</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.11.004</pub-id><pub-id pub-id-type="pmid">21172442</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lim</surname> <given-names>C.</given-names></name> <name><surname>Yu</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title>Estimation stability with cross validation (ESCV)</article-title>. <source>J. Comput. Graph. Stat.</source> <volume>25</volume>, <fpage>464</fpage>&#x02013;<lpage>492</lpage>. <pub-id pub-id-type="doi">10.1080/10618600.2015.1020159</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lipton</surname> <given-names>Z. C.</given-names></name> <name><surname>Kale</surname> <given-names>D. C.</given-names></name> <name><surname>Elkan</surname> <given-names>C.</given-names></name> <name><surname>Wetzell</surname> <given-names>R.</given-names></name> <name><surname>Vikram</surname> <given-names>S.</given-names></name> <name><surname>McAuley</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>The mythos of model interpretability</article-title>. <source>IEEE Spectrum.</source></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maris</surname> <given-names>E.</given-names></name></person-group> (<year>2012</year>). <article-title>Statistical testing in electrophysiological studies</article-title>. <source>Psychophysiology</source> <volume>49</volume>, <fpage>549</fpage>&#x02013;<lpage>565</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-8986.2011.01320.x</pub-id><pub-id pub-id-type="pmid">22176204</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maris</surname> <given-names>E.</given-names></name> <name><surname>Oostenveld</surname> <given-names>R.</given-names></name></person-group> (<year>2007</year>). <article-title>Nonparametric statistical testing of EEG-and MEG-data</article-title>. <source>J. Neurosci. Methods</source> <volume>164</volume>, <fpage>177</fpage>&#x02013;<lpage>190</lpage>. <pub-id pub-id-type="doi">10.1016/j.jneumeth.2007.03.024</pub-id><pub-id pub-id-type="pmid">17517438</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marler</surname> <given-names>R. T.</given-names></name> <name><surname>Arora</surname> <given-names>J. S.</given-names></name></person-group> (<year>2004</year>). <article-title>Survey of multi-objective optimization methods for engineering</article-title>. <source>Struc. Multidiscipl. Optimiz.</source> <volume>26</volume>, <fpage>369</fpage>&#x02013;<lpage>395</lpage>. <pub-id pub-id-type="doi">10.1007/s00158-003-0368-6</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Michel</surname> <given-names>V.</given-names></name> <name><surname>Gramfort</surname> <given-names>A.</given-names></name> <name><surname>Varoquaux</surname> <given-names>G.</given-names></name> <name><surname>Eger</surname> <given-names>E.</given-names></name> <name><surname>Thirion</surname> <given-names>B.</given-names></name></person-group> (<year>2011</year>). <article-title>Total variation regularization for fMRI-based prediction of behavior</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>30</volume>, <fpage>1328</fpage>&#x02013;<lpage>1340</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2011.2113378</pub-id><pub-id pub-id-type="pmid">21317080</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mitchell</surname> <given-names>T. M.</given-names></name> <name><surname>Hutchinson</surname> <given-names>R.</given-names></name> <name><surname>Niculescu</surname> <given-names>R. S.</given-names></name> <name><surname>Pereira</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Just</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2004</year>). <article-title>Learning to decode cognitive states from brain images</article-title>. <source>Mach. Learn.</source> <volume>57</volume>, <fpage>145</fpage>&#x02013;<lpage>175</lpage>. <pub-id pub-id-type="doi">10.1023/B:MACH.0000035475.85309.1b</pub-id></citation>
</ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Montavon</surname> <given-names>G.</given-names></name> <name><surname>Braun</surname> <given-names>M.</given-names></name> <name><surname>Krueger</surname> <given-names>T.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name></person-group> (<year>2013</year>). <article-title>Analyzing local structure in kernel-based learning: explanation, complexity, and reliability assessment</article-title>. <source>IEEE Signal Process. Mag.</source> <volume>30</volume>, <fpage>62</fpage>&#x02013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1109/MSP.2013.2249294</pub-id></citation>
</ref>
<ref id="B62">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>M&#x000F8;rch</surname> <given-names>N.</given-names></name> <name><surname>Hansen</surname> <given-names>L. K.</given-names></name> <name><surname>Strother</surname> <given-names>S. C.</given-names></name> <name><surname>Svarer</surname> <given-names>C.</given-names></name> <name><surname>Rottenberg</surname> <given-names>D. A.</given-names></name> <name><surname>Lautrup</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>1997</year>). <article-title>Nonlinear versus linear models in functional neuroimaging: Learning curves and generalization crossover</article-title>, in <source>Information Processing in Medical Imaging</source>, eds <person-group person-group-type="editor"><name><surname>Duncan</surname> <given-names>J.</given-names></name> <name><surname>Gindi</surname> <given-names>G.</given-names></name></person-group> (<publisher-loc>Berlin; Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>259</fpage>&#x02013;<lpage>270</lpage>.</citation>
</ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Naselaris</surname> <given-names>T.</given-names></name> <name><surname>Kay</surname> <given-names>K. N.</given-names></name></person-group> (<year>2015</year>). <article-title>Resolving ambiguities of MVPA using explicit models of representation</article-title>. <source>Trends Cogn. Sci.</source> <volume>19</volume>, <fpage>551</fpage>&#x02013;<lpage>554</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2015.07.005</pub-id><pub-id pub-id-type="pmid">26412094</pub-id></citation>
</ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Naselaris</surname> <given-names>T.</given-names></name> <name><surname>Kay</surname> <given-names>K. N.</given-names></name> <name><surname>Nishimoto</surname> <given-names>S.</given-names></name> <name><surname>Gallant</surname> <given-names>J. L.</given-names></name></person-group> (<year>2011</year>). <article-title>Encoding and decoding in fMRI</article-title>. <source>Neuroimage</source> <volume>56</volume>, <fpage>400</fpage>&#x02013;<lpage>410</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.07.073</pub-id><pub-id pub-id-type="pmid">20691790</pub-id></citation>
</ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Norman</surname> <given-names>K. A.</given-names></name> <name><surname>Polyn</surname> <given-names>S. M.</given-names></name> <name><surname>Detre</surname> <given-names>G. J.</given-names></name> <name><surname>Haxby</surname> <given-names>J. V.</given-names></name></person-group> (<year>2006</year>). <article-title>Beyond mind-reading: multi-voxel pattern analysis of fMRI data</article-title>. <source>Trends Cognitive Sci.</source> <volume>10</volume>, <fpage>424</fpage>&#x02013;<lpage>430</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2006.07.005</pub-id><pub-id pub-id-type="pmid">16899397</pub-id></citation>
</ref>
<ref id="B66">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Olivetti</surname> <given-names>E.</given-names></name> <name><surname>Kia</surname> <given-names>S. M.</given-names></name> <name><surname>Avesani</surname> <given-names>P.</given-names></name></person-group> (<year>2014</year>). <article-title>MEG decoding across subjects</article-title>, in <source>International Workshop on Pattern Recognition in Neuroimaging</source> (<publisher-loc>T&#x000FC;bingen</publisher-loc>: <publisher-name>IEEE</publisher-name>).</citation>
</ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oostenveld</surname> <given-names>R.</given-names></name> <name><surname>Fries</surname> <given-names>P.</given-names></name> <name><surname>Maris</surname> <given-names>E.</given-names></name> <name><surname>Schoffelen</surname> <given-names>J.-M.</given-names></name></person-group> (<year>2010</year>). <article-title>FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data</article-title>. <source>Comput. Intell. Neurosci.</source> <volume>2011</volume>:<fpage>156869</fpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.hindawi.com/journals/cin/2011/156869/cta/">https://www.hindawi.com/journals/cin/2011/156869/cta/</ext-link> <pub-id pub-id-type="doi">10.1155/2011/156869</pub-id><pub-id pub-id-type="pmid">21253357</pub-id></citation>
</ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parra</surname> <given-names>L.</given-names></name> <name><surname>Alvino</surname> <given-names>C.</given-names></name> <name><surname>Tang</surname> <given-names>A.</given-names></name> <name><surname>Pearlmutter</surname> <given-names>B.</given-names></name> <name><surname>Yeung</surname> <given-names>N.</given-names></name> <name><surname>Osman</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>Single-trial detection in EEG and MEG: keeping it linear</article-title>. <source>Neurocomputing</source> <volume>52&#x02013;54</volume>, <fpage>177</fpage>&#x02013;<lpage>183</lpage>. <pub-id pub-id-type="doi">10.1016/s0925-2312(02)00821-4</pub-id></citation>
</ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pereira</surname> <given-names>F.</given-names></name> <name><surname>Mitchell</surname> <given-names>T.</given-names></name> <name><surname>Botvinick</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>Machine learning classifiers and fMRI: a tutorial overview</article-title>. <source>NeuroImage</source> <volume>45</volume>, <fpage>199</fpage>&#x02013;<lpage>209</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2008.11.007</pub-id><pub-id pub-id-type="pmid">19070668</pub-id></citation>
</ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rasmussen</surname> <given-names>P. M.</given-names></name> <name><surname>Hansen</surname> <given-names>L. K.</given-names></name> <name><surname>Madsen</surname> <given-names>K. H.</given-names></name> <name><surname>Churchill</surname> <given-names>N. W.</given-names></name> <name><surname>Strother</surname> <given-names>S. C.</given-names></name></person-group> (<year>2012</year>). <article-title>Model sparsity and brain pattern interpretation of classification models in neuroimaging</article-title>. <source>Pattern Recogn.</source> <volume>45</volume>, <fpage>2085</fpage>&#x02013;<lpage>2100</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2011.09.011</pub-id></citation>
</ref>
<ref id="B71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rieger</surname> <given-names>J. W.</given-names></name> <name><surname>Reichert</surname> <given-names>C.</given-names></name> <name><surname>Gegenfurtner</surname> <given-names>K. R.</given-names></name> <name><surname>Noesselt</surname> <given-names>T.</given-names></name> <name><surname>Braun</surname> <given-names>C.</given-names></name> <name><surname>Heinze</surname> <given-names>H.-J.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>Predicting the recognition of natural scenes from single trial MEG recordings of brain activity</article-title>. <source>Neuroimage</source> <volume>42</volume>, <fpage>1056</fpage>&#x02013;<lpage>1068</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2008.06.014</pub-id><pub-id pub-id-type="pmid">18620063</pub-id></citation>
</ref>
<ref id="B72">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rish</surname> <given-names>I.</given-names></name> <name><surname>Cecchi</surname> <given-names>G. A.</given-names></name> <name><surname>Lozano</surname> <given-names>A.</given-names></name> <name><surname>Niculescu-Mizil</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <source>Practical Applications of Sparse Modeling</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B73">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rugg</surname> <given-names>M. D.</given-names></name> <name><surname>Coles</surname> <given-names>M. G.</given-names></name></person-group> (<year>1995</year>). <source>Electrophysiology of Mind: Event-Related Brain Potentials and Cognition</source>. <publisher-loc>Oxford</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation>
</ref>
<ref id="B74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sabuncu</surname> <given-names>M. R.</given-names></name></person-group> (<year>2014</year>). <article-title>A universal and efficient method to compute maps from image-based prediction models</article-title>. <source>Med. Image Comput. Comput. Assist. Intervent.</source> <volume>17</volume>(<issue>Pt 3</issue>), <fpage>353</fpage>&#x02013;<lpage>360</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-10443-0_45</pub-id><pub-id pub-id-type="pmid">25320819</pub-id></citation>
</ref>
<ref id="B75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shervashidze</surname> <given-names>N.</given-names></name> <name><surname>Bach</surname> <given-names>F.</given-names></name></person-group> (<year>2015</year>). <article-title>Learning the structure for structured sparsity</article-title>. <source>IEEE Trans. Signal Process.</source> <volume>63</volume>, <fpage>4894</fpage>&#x02013;<lpage>4902</lpage>. <pub-id pub-id-type="doi">10.1109/TSP.2015.2446432</pub-id></citation>
</ref>
<ref id="B76">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spruill</surname> <given-names>M. C.</given-names></name></person-group> (<year>2007</year>). <article-title>Asymptotic distribution of coordinates on high dimensional spheres</article-title>. <source>Electron. Commun. Probab.</source> <volume>12</volume>, <fpage>234</fpage>&#x02013;<lpage>247</lpage>. <pub-id pub-id-type="doi">10.1214/ECP.v12-1294</pub-id></citation>
</ref>
<ref id="B77">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Strother</surname> <given-names>S. C.</given-names></name> <name><surname>Rasmussen</surname> <given-names>P. M.</given-names></name> <name><surname>Churchill</surname> <given-names>N. W.</given-names></name> <name><surname>Hansen</surname> <given-names>K.</given-names></name></person-group> (<year>2014</year>). <source>Stability and Reproducibility in fMRI Analysis</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>.</citation>
</ref>
<ref id="B78">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Taulu</surname> <given-names>S.</given-names></name> <name><surname>Simola</surname> <given-names>J.</given-names></name> <name><surname>Nenonen</surname> <given-names>J.</given-names></name> <name><surname>Parkkonen</surname> <given-names>L.</given-names></name></person-group> (<year>2014</year>). <article-title>Novel noise reduction methods</article-title>, in <source>Magnetoencephalography: From Signals to Dynamic Cortical Networks</source>, eds <person-group person-group-type="editor"><name><surname>Supek</surname> <given-names>S.</given-names></name> <name><surname>Aine</surname> <given-names>C. J.</given-names></name></person-group> (<publisher-loc>Berlin; Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>35</fpage>&#x02013;<lpage>71</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-33045-2_2</pub-id></citation>
</ref>
<ref id="B79">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tibshirani</surname> <given-names>R.</given-names></name></person-group> (<year>1996a</year>). <source>Bias, Variance and Prediction Error for Classification Rules.</source> <publisher-loc>Toronto, ON</publisher-loc>: <publisher-name>University of Toronto; Department of Statistics</publisher-name>.</citation>
</ref>
<ref id="B80">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tibshirani</surname> <given-names>R.</given-names></name></person-group> (<year>1996b</year>). <article-title>Regression shrinkage and selection via the lasso</article-title>. <source>J. R. Statist. Soc. Ser. B (Methodol)</source> <volume>58</volume>, <fpage>267</fpage>&#x02013;<lpage>288</lpage>.</citation>
</ref>
<ref id="B81">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tibshirani</surname> <given-names>R.</given-names></name> <name><surname>Saunders</surname> <given-names>M.</given-names></name> <name><surname>Rosset</surname> <given-names>S.</given-names></name> <name><surname>Zhu</surname> <given-names>J.</given-names></name> <name><surname>Knight</surname> <given-names>K.</given-names></name></person-group> (<year>2005</year>). <article-title>Sparsity and smoothness via the fused lasso</article-title>. <source>J. R. Statist. Soc. Ser. B (Statist. Methodol.)</source> <volume>67</volume>, <fpage>91</fpage>&#x02013;<lpage>108</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9868.2005.00490.x</pub-id></citation>
</ref>
<ref id="B82">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tzelepis</surname> <given-names>C.</given-names></name> <name><surname>Mezaris</surname> <given-names>V.</given-names></name> <name><surname>Patras</surname> <given-names>I.</given-names></name></person-group> (<year>2015</year>). <article-title>Linear maximum margin classifier for learning from uncertain data</article-title>. <source>arXiv preprint arXiv:1504.03892</source>.</citation>
</ref>
<ref id="B83">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Valentini</surname> <given-names>G.</given-names></name> <name><surname>Dietterich</surname> <given-names>T. G.</given-names></name></person-group> (<year>2004</year>). <article-title>Bias-variance analysis of support vector machines for the development of svm-based ensemble methods</article-title>. <source>J. Mach. Learn. Res.</source> <volume>5</volume>, <fpage>725</fpage>&#x02013;<lpage>775</lpage>.</citation>
</ref>
<ref id="B84">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Valverde-Albacete</surname> <given-names>F. J.</given-names></name> <name><surname>Pel&#x000E1;ez-Moreno</surname> <given-names>C.</given-names></name></person-group> (<year>2014</year>). <article-title>100% classification accuracy considered harmful: The normalized information transfer factor explains the accuracy paradox</article-title>. <source>PLoS ONE</source> <volume>9</volume>:<fpage>e84217</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0084217</pub-id><pub-id pub-id-type="pmid">24427282</pub-id></citation>
</ref>
<ref id="B85">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Ede</surname> <given-names>F.</given-names></name> <name><surname>Maris</surname> <given-names>E.</given-names></name></person-group> (<year>2016</year>). <article-title>Physiological plausibility can increase reproducibility in cognitive neuroscience</article-title>. <source>Trends Cogn. Sci.</source> <volume>20</volume>, <fpage>567</fpage>&#x02013;<lpage>569</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2016.05.006</pub-id><pub-id pub-id-type="pmid">27233147</pub-id></citation>
</ref>
<ref id="B86">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Gerven</surname> <given-names>M.</given-names></name> <name><surname>Hesse</surname> <given-names>C.</given-names></name> <name><surname>Jensen</surname> <given-names>O.</given-names></name> <name><surname>Heskes</surname> <given-names>T.</given-names></name></person-group> (<year>2009</year>). <article-title>Interpreting single trial data using groupwise regularisation</article-title>. <source>NeuroImage</source> <volume>46</volume>, <fpage>665</fpage>&#x02013;<lpage>676</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2009.02.041</pub-id><pub-id pub-id-type="pmid">19285139</pub-id></citation>
</ref>
<ref id="B87">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vapnik</surname> <given-names>V. N.</given-names></name> <name><surname>Kotz</surname> <given-names>S.</given-names></name></person-group> (<year>1982</year>). <source>Estimation of Dependences Based on Empirical Data</source>, <volume>Vol. 40</volume>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>.</citation>
</ref>
<ref id="B88">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Varoquaux</surname> <given-names>G.</given-names></name> <name><surname>Gramfort</surname> <given-names>A.</given-names></name> <name><surname>Thirion</surname> <given-names>B.</given-names></name></person-group> (<year>2012</year>). <article-title>Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering</article-title>, in <source>Proceedings of the 29th International Conference on Machine Learning (ICML-12)</source> (<publisher-loc>Edinburgh</publisher-loc>), <fpage>1375</fpage>&#x02013;<lpage>1382</lpage>.</citation>
</ref>
<ref id="B89">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Varoquaux</surname> <given-names>G.</given-names></name> <name><surname>Kowalski</surname> <given-names>M.</given-names></name> <name><surname>Thirion</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title>Social-sparsity brain decoders: faster spatial sparsity</article-title>, in <source>2016 International Workshop on Pattern Recognition in Neuroimaging</source> (<publisher-name>IEEE</publisher-name>).</citation>
</ref>
<ref id="B90">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Varoquaux</surname> <given-names>G.</given-names></name> <name><surname>Raamana</surname> <given-names>P. R.</given-names></name> <name><surname>Engemann</surname> <given-names>D. A.</given-names></name> <name><surname>Hoyos-Idrobo</surname> <given-names>A.</given-names></name> <name><surname>Schwartz</surname> <given-names>Y.</given-names></name> <name><surname>Thirion</surname> <given-names>B.</given-names></name></person-group> (<year>2017</year>). <article-title>Assessing and tuning brain decoders: cross-validation, caveats, and guidelines</article-title>. <source>Neuroimage</source> <volume>145</volume>, <fpage>166</fpage>&#x02013;<lpage>179</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2016.10.038</pub-id><pub-id pub-id-type="pmid">27989847</pub-id></citation>
</ref>
<ref id="B91">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Varoquaux</surname> <given-names>G.</given-names></name> <name><surname>Thirion</surname> <given-names>B.</given-names></name></person-group> (<year>2014</year>). <article-title>How machine learning is shaping cognitive neuroimaging</article-title>. <source>GigaScience</source> <volume>3</volume>:<fpage>28</fpage>. <pub-id pub-id-type="doi">10.1186/2047-217X-3-28</pub-id><pub-id pub-id-type="pmid">25405022</pub-id></citation>
</ref>
<ref id="B92">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vellido</surname> <given-names>A.</given-names></name> <name><surname>Mart&#x000ED;n-Guerrero</surname> <given-names>J.</given-names></name> <name><surname>Lisboa</surname> <given-names>P.</given-names></name></person-group> (<year>2012</year>). <article-title>Making machine learning models interpretable</article-title>, in <source>Proceedings of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)</source> (<publisher-loc>Bruges</publisher-loc>), <fpage>163</fpage>&#x02013;<lpage>172</lpage>.</citation>
</ref>
<ref id="B93">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vidaurre</surname> <given-names>D.</given-names></name> <name><surname>Bielza</surname> <given-names>C.</given-names></name> <name><surname>Larra&#x000F1;aga</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>A survey of L1 regression</article-title>. <source>Int. Statist. Rev.</source> <volume>81</volume>, <fpage>361</fpage>&#x02013;<lpage>387</lpage>. <pub-id pub-id-type="doi">10.1111/insr.12023</pub-id></citation>
</ref>
<ref id="B94">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Zheng</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Duan</surname> <given-names>X.</given-names></name> <name><surname>Chen</surname> <given-names>H.</given-names></name></person-group> (<year>2015</year>). <article-title>Randomized structural sparsity via constrained block subsampling for improved sensitivity of discriminative voxel identification</article-title>. <source>Neuroimage</source> <volume>117</volume>, <fpage>170</fpage>&#x02013;<lpage>183</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2015.05.057</pub-id><pub-id pub-id-type="pmid">26027884</pub-id></citation>
</ref>
<ref id="B95">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weichwald</surname> <given-names>S.</given-names></name> <name><surname>Meyer</surname> <given-names>T.</given-names></name> <name><surname>&#x000D6;zdenizci</surname> <given-names>O.</given-names></name> <name><surname>Sch&#x000F6;lkopf</surname> <given-names>B.</given-names></name> <name><surname>Ball</surname> <given-names>T.</given-names></name> <name><surname>Grosse-Wentrup</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>Causal interpretation rules for encoding and decoding models in neuroimaging</article-title>. <source>Neuroimage</source> <volume>110</volume>, <fpage>48</fpage>&#x02013;<lpage>59</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2015.01.036</pub-id><pub-id pub-id-type="pmid">25623501</pub-id></citation>
</ref>
<ref id="B96">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wolpaw</surname> <given-names>J. R.</given-names></name> <name><surname>Birbaumer</surname> <given-names>N.</given-names></name> <name><surname>McFarland</surname> <given-names>D. J.</given-names></name> <name><surname>Pfurtscheller</surname> <given-names>G.</given-names></name> <name><surname>Vaughan</surname> <given-names>T. M.</given-names></name></person-group> (<year>2002</year>). <article-title>Brain&#x02013;computer interfaces for communication and control</article-title>. <source>Clin. Neurophysiol.</source> <volume>113</volume>, <fpage>767</fpage>&#x02013;<lpage>791</lpage>. <pub-id pub-id-type="doi">10.1016/S1388-2457(02)00057-3</pub-id><pub-id pub-id-type="pmid">12048038</pub-id></citation>
</ref>
<ref id="B97">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wolpert</surname> <given-names>D. H.</given-names></name> <name><surname>Macready</surname> <given-names>W. G.</given-names></name></person-group> (<year>1999</year>). <article-title>An efficient method to estimate bagging&#x00027;s generalization error</article-title>. <source>Mach. Learn.</source> <volume>35</volume>, <fpage>41</fpage>&#x02013;<lpage>55</lpage>.</citation>
</ref>
<ref id="B98">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>M. C.-K.</given-names></name> <name><surname>David</surname> <given-names>S. V.</given-names></name> <name><surname>Gallant</surname> <given-names>J. L.</given-names></name></person-group> (<year>2006</year>). <article-title>Complete functional characterization of sensory neurons by system identification</article-title>. <source>Annu. Rev. Neurosci.</source> <volume>29</volume>, <fpage>477</fpage>&#x02013;<lpage>505</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.neuro.29.051605.113024</pub-id><pub-id pub-id-type="pmid">16776594</pub-id></citation>
</ref>
<ref id="B99">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Xing</surname> <given-names>E. P.</given-names></name> <name><surname>Kolar</surname> <given-names>M.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name></person-group> (<year>2014</year>). <article-title>High-dimensional sparse structured input-output models, with applications to GWAS</article-title>, in <source>Practical Applications of Sparse Modeling</source> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>37</fpage>&#x02013;<lpage>64</lpage>.</citation>
</ref>
<ref id="B100">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yeung</surname> <given-names>N.</given-names></name> <name><surname>Bogacz</surname> <given-names>R.</given-names></name> <name><surname>Holroyd</surname> <given-names>C. B.</given-names></name> <name><surname>Cohen</surname> <given-names>J. D.</given-names></name></person-group> (<year>2004</year>). <article-title>Detection of synchronized oscillations in the electroencephalogram: an evaluation of methods</article-title>. <source>Psychophysiology</source> <volume>41</volume>, <fpage>822</fpage>&#x02013;<lpage>832</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-8986.2004.00239.x</pub-id><pub-id pub-id-type="pmid">15563335</pub-id></citation>
</ref>
<ref id="B101">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>Stability</article-title>. <source>Bernoulli</source> <volume>19</volume>, <fpage>1484</fpage>&#x02013;<lpage>1500</lpage>. <pub-id pub-id-type="doi">10.3150/13-BEJSP14</pub-id></citation>
</ref>
<ref id="B102">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>D.</given-names></name> <name><surname>Lee</surname> <given-names>S. J.</given-names></name> <name><surname>Lee</surname> <given-names>W. J.</given-names></name> <name><surname>Kim</surname> <given-names>S. C.</given-names></name> <name><surname>Lim</surname> <given-names>J.</given-names></name> <name><surname>Kwon</surname> <given-names>S. W.</given-names></name></person-group> (<year>2015</year>). <article-title>Classification of spectral data using fused lasso logistic regression</article-title>. <source>Chemometrics Intell. Lab. Sys.</source> <volume>142</volume>, <fpage>70</fpage>&#x02013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2015.01.006</pub-id></citation>
</ref>
<ref id="B103">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>M.</given-names></name> <name><surname>Lin</surname> <given-names>Y.</given-names></name></person-group> (<year>2006</year>). <article-title>Model selection and estimation in regression with grouped variables</article-title>. <source>J. R. Stat. Soc. Ser. B (Stat. Methodol.)</source> <volume>68</volume>, <fpage>49</fpage>&#x02013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9868.2005.00532.x</pub-id></citation>
</ref>
<ref id="B104">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bi</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>T.</given-names></name></person-group> (<year>2005</year>). <article-title>Support vector classification with input data uncertainty</article-title>, in <source>Advances in Neural Information Processing Systems</source>, eds <person-group person-group-type="editor"><name><surname>Saul</surname> <given-names>L. K.</given-names></name> <name><surname>Weiss</surname> <given-names>Y.</given-names></name> <name><surname>Bottou</surname> <given-names>L.</given-names></name></person-group> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>The MIT Press</publisher-name>), <volume>17</volume>, <fpage>161</fpage>&#x02013;<lpage>168</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://papers.nips.cc/paper/2743-support-vector-classification-with-input-data-uncertainty">http://papers.nips.cc/paper/2743-support-vector-classification-with-input-data-uncertainty</ext-link></citation>
</ref>
<ref id="B105">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zou</surname> <given-names>H.</given-names></name> <name><surname>Hastie</surname> <given-names>T.</given-names></name></person-group> (<year>2005</year>). <article-title>Regularization and variable selection via the elastic net</article-title>. <source>J. R. Stat. Soc. Ser. B</source> <volume>67</volume>, <fpage>301</fpage>&#x02013;<lpage>320</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9868.2005.00503.x</pub-id></citation>
</ref>
</ref-list>
<app-group>
<app>
<title>A. Appendices</title>
<sec>
<title>A.1. Proof of proposition 1</title>
<p>Throughout this proof, we assume that all of the parameter vectors are normalized to lie on the unit hypersphere (see Figure <xref ref-type="fig" rid="FA1">A1</xref> for an illustrative example in two dimensions). Let <inline-formula><mml:math id="M194"><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> be a set of <italic>m</italic> MBMs for <italic>m</italic> perturbed training sets, where <inline-formula><mml:math id="M195"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. 
Now, consider any arbitrary <italic>p</italic> &#x02212; 1 dimensional hyperplane <inline-formula><mml:math id="M196"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow></mml:math></inline-formula> that contains <inline-formula><mml:math id="M197"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. Clearly, <inline-formula><mml:math id="M198"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow></mml:math></inline-formula> divides the <italic>p</italic>-dimensional parameter space into two subspaces. Let &#x025BD; and &#x025BC; be binary operators where <inline-formula><mml:math id="M199"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo class="MathClass-ord">&#x025BD;</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> indicates that <inline-formula><mml:math id="M200"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M201"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are in the same subspace, and <inline-formula><mml:math id="M202"><mml:msup><mml:mrow><mml:mover 
accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo class="MathClass-ord">&#x025BC;</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> indicates that they are in different subspaces. Now, we define <inline-formula><mml:math id="M203"><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo class="MathClass-ord">&#x025BD;</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M204"><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mrow><mml:mover 
accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo class="MathClass-ord">&#x025BC;</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>. Let the cardinality of <italic>T</italic><sub><italic>L</italic></sub> denoted by <italic>n</italic>(<italic>T</italic><sub><italic>L</italic></sub>) be <italic>j</italic> (<italic>n</italic>(<italic>T</italic><sub><italic>L</italic></sub>) &#x0003D; <italic>j</italic>). Thus, <italic>n</italic>(<italic>T</italic><sub><italic>U</italic></sub>) &#x0003D; <italic>m</italic> &#x02212; <italic>j</italic>. Now, assume that <inline-formula><mml:math id="M205"><mml:mo class="MathClass-ord">&#x02221;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> are the angles between <inline-formula><mml:math id="M206"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover 
accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="M207"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow></mml:math></inline-formula>, and (similarly) &#x003B1;<sub><italic>j</italic>&#x0002B;1</sub>, &#x02026;, &#x003B1;<sub><italic>m</italic></sub> for <inline-formula><mml:math id="M208"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="M209"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow></mml:math></inline-formula>. Based on Equation (6), let <inline-formula><mml:math id="M210"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M211"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> be the main maps of <italic>T</italic><sub><italic>L</italic></sub> and <italic>T</italic><sub><italic>U</italic></sub>, respectively. 
Therefore, we obtain <inline-formula><mml:math id="M212"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msubsup><mml:mo>|</mml:mo><mml:mo>|</mml:mo></mml:mrow></mml:mfrac></mml:math></inline-formula> and <inline-formula><mml:math id="M213"><mml:mo class="MathClass-ord">&#x02221;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo class="MathClass-ord">&#x02221;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x003B1;</mml:mi></mml:math></inline-formula>. Furthermore, assume <inline-formula><mml:math id="M214"><mml:mo class="MathClass-ord">&#x02221;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x003B3;</mml:mi></mml:math></inline-formula>. As a result, &#x003C8;<sub>&#x003A6;</sub> &#x0003D; cos(&#x003B1;) and &#x003B2;<sub>&#x003A6;</sub> &#x0003D; cos(&#x003B3;). According to Equation (4) and using a cosine similarity definition, we have:</p>
<disp-formula id="E18"><label>(A1)</label><mml:math id="M18"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x003A6;</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>m</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:msup><mml:mover accent='true'><mml:mo>&#x00398;</mml:mo><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mo>*</mml:mo></mml:msup><mml:mo>.</mml:mo><mml:msup><mml:mover accent='true'><mml:mover accent='true'><mml:mo>&#x00398;</mml:mo><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mi>j</mml:mi></mml:msup></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003B1;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>+</mml:mo><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x003B1;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003B1;</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mo>&#x02026;</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr 
columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mtext>&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>+</mml:mo><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003B1;</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mi>m</mml:mi></mml:mfrac></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>sin</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>sin</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mi>cos</mml:mi><mml:mo 
stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mtext>&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02003;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>+</mml:mo><mml:mi>sin</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>sin</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>cos</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mo>&#x003A6;</mml:mo></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mi>&#x003C8;</mml:mi><mml:mo>&#x003A6;</mml:mo></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>A similar procedure can be used to prove <inline-formula><mml:math id="M215"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula> by replacing <inline-formula><mml:math id="M216"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> with <inline-formula><mml:math id="M217"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
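The two-map case of the derivation above can be checked numerically. The sketch below is a hypothetical two-dimensional configuration (the angles <monospace>gamma</monospace> and <monospace>alpha</monospace> are arbitrary illustrative choices, not values from the paper): two perturbed maps are placed at angles &#x003B3; + &#x003B1; and &#x003B3; &#x02212; &#x003B1; from the true map, and the mean absolute cosine similarity of Equation (A1) reduces to the product cos(&#x003B3;)cos(&#x003B1;).

```python
import math

gamma, alpha = 0.5, 0.3  # hypothetical angles (radians), both below pi/2

def unit(theta):
    """Unit vector in the plane at angle theta from the x-axis."""
    return (math.cos(theta), math.sin(theta))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

theta_star = unit(0.0)  # true map, placed on the x-axis w.l.o.g.
# Two perturbed maps, one on each side of the hyperplane A, whose angles
# to theta_star are gamma + alpha and gamma - alpha, respectively.
maps = [unit(gamma + alpha), unit(-(gamma - alpha))]

# eta_Phi: mean absolute cosine similarity to the true map (Equation A1) ...
eta = sum(abs(dot(theta_star, m)) for m in maps) / len(maps)

# ... equals the product form beta_Phi * psi_Phi = cos(gamma) * cos(alpha)
print(eta, math.cos(gamma) * math.cos(alpha))  # the two quantities agree
```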
<fig id="FA1" position="float">
<label>Figure A1</label>
<caption><p><bold>Relation between representativeness, reproducibility, and interpretability in two dimensions</bold>.</p></caption>
<graphic xlink:href="fnins-10-00619-a0001.tif"/>
</fig>
</sec>
<sec>
<title>A.2. Proof of proposition 2</title>
<p>According to Haufe et al. (<xref ref-type="bibr" rid="B38">2013</xref>), for a linear discriminative model with parameters <inline-formula><mml:math id="M218"><mml:mover accent="true"><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>, the unique equivalent generative model can be computed as follows:</p>
<disp-formula id="E19"><label>(A2)</label><mml:math id="M19"><mml:mrow><mml:mi>A</mml:mi><mml:mo>&#x0221D;</mml:mo><mml:msub><mml:mo>&#x003A3;</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>X</mml:mi></mml:mstyle></mml:msub><mml:mover accent='true'><mml:mo>&#x00398;</mml:mo><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:mrow></mml:math></disp-formula>
<p>In a binary (<bold>Y</bold> &#x0003D; {1, &#x02212;1}) least squares classification scenario, we have:</p>
<disp-formula id="E20"><label>(A3)</label><mml:math id="M20"><mml:mrow><mml:mi>A</mml:mi><mml:mo>&#x0221D;</mml:mo><mml:msub><mml:mo>&#x003A3;</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>X</mml:mi></mml:mstyle></mml:msub><mml:msubsup><mml:mo>&#x003A3;</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>X</mml:mi></mml:mstyle><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:msup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>X</mml:mi></mml:mstyle><mml:mi>T</mml:mi></mml:msup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>Y</mml:mi></mml:mstyle><mml:mo>=</mml:mo><mml:msup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>X</mml:mi></mml:mstyle><mml:mi>T</mml:mi></mml:msup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>Y</mml:mi></mml:mstyle><mml:mo>=</mml:mo><mml:msup><mml:mi>&#x003BC;</mml:mi><mml:mo>+</mml:mo></mml:msup><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x02212;</mml:mo></mml:msup></mml:mrow></mml:math></disp-formula>
<p>where &#x003A3;<sub><bold>X</bold></sub> represents the covariance of the input matrix <bold>X</bold>, and &#x003BC;<sup>&#x0002B;</sup> and &#x003BC;<sup>&#x02212;</sup> are the means of the positive and negative samples, respectively. Therefore, the equivalent generative model for the above classification problem can be derived by computing the difference between the means of the samples in the two classes, which is equivalent to the definition of the cERF in time-domain MEG data.</p>
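As a numerical illustration of Equation (A3), the following minimal sketch uses invented two-feature data with balanced classes (the matrix <monospace>X</monospace>, labels <monospace>y</monospace>, and helper functions are hypothetical, and the uncentered scatter matrix stands in for &#x003A3;<sub><bold>X</bold></sub> up to scaling): with least-squares weights &#x00398; = &#x003A3;<sub><bold>X</bold></sub><sup>&#x02212;1</sup><bold>X</bold><sup>T</sup><bold>Y</bold>, the generative pattern &#x003A3;<sub><bold>X</bold></sub>&#x00398; recovers <bold>X</bold><sup>T</sup><bold>Y</bold>, which is proportional to the difference of class means.

```python
# Hypothetical balanced two-class data: 4 samples, 2 features, labels in {+1, -1}.
X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Scatter matrix S = X^T X (stand-in for Sigma_X, up to scaling/centering)
S = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(2)]  # X^T Y

# Discriminative least-squares weights: theta = S^{-1} X^T Y (2x2 inverse)
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
Sinv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]
theta = matvec(Sinv, b)

# Generative pattern A = S theta recovers X^T Y ...
A = matvec(S, theta)

# ... which, for balanced classes (2 samples each), equals (n/2)(mu+ - mu-)
n = len(X)
mu_pos = [sum(row[i] for row, yi in zip(X, y) if yi == 1) / 2 for i in range(2)]
mu_neg = [sum(row[i] for row, yi in zip(X, y) if yi == -1) / 2 for i in range(2)]
diff = [(n / 2) * (mu_pos[i] - mu_neg[i]) for i in range(2)]
print(A, diff)  # the pattern and the scaled mean difference coincide
```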
</sec>
<sec>
<title>A.3. The distribution of cosine similarity</title>
<p>The aim of this section is to illustrate that the probability density function (PDF) of the cosine similarity between two randomly drawn vectors in a high dimensional space (large <italic>p</italic>) is very close to a normal distribution with zero mean and small variance. To do this, we first need to find the distribution of the dot product on the uniform unit hyper-sphere. Let <italic>a</italic> and <italic>b</italic> be two random vectors drawn uniformly from a unit hyper-sphere in &#x0211D;<sup><italic>p</italic></sup>. Assuming that &#x003B3; is the angle between <italic>a</italic> and <italic>b</italic>, the distribution of the cosine similarity is equivalent to that of the dot product &#x0003C; <italic>a</italic>.<italic>b</italic> &#x0003E;. Without loss of generality, let <italic>b</italic> lie along the positive x-axis of the coordinate system. Thus, the dot product &#x0003C; <italic>a</italic>.<italic>b</italic> &#x0003E; is the projection of <italic>a</italic> onto the x-axis, i.e., the <italic>x</italic> coordinate of <italic>a</italic>. Therefore, for a given value of &#x003B3;, the possible positions of <italic>a</italic> form a <italic>p</italic> &#x02212; 1 dimensional hyper-sphere that is orthogonal to the x-axis (the red circle in Figure <xref ref-type="fig" rid="FA2">A2</xref>), and the PDF of the dot product is proportional to the surface area of the <italic>p</italic> dimensional hyper-sphere constructed by these slices for different &#x003B3; values (the dashed blue sphere in Figure <xref ref-type="fig" rid="FA2">A2</xref>). To compute the area of this hyper-sphere, we sum the surface areas of the <italic>p</italic> dimensional conical frustums over small intervals <italic>dx</italic> (the gray area in Figure <xref ref-type="fig" rid="FA2">A2</xref>):</p>
<disp-formula id="E21"><label>(A4)</label><mml:math id="M21"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02264;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x02264;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mi>&#x003C0;</mml:mi><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x0222B;</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>1</mml:mn></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:mstyle><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mi>&#x003C0;</mml:mi><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x0222B;</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>1</mml:mn></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mo 
stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:mstyle><mml:mi>d</mml:mi><mml:mi>x</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where (1 &#x02212; <italic>x</italic><sup>2</sup>)<sup><italic>p</italic>&#x02212;2</sup> is the surface area of the base of the cone (e.g., the perimeter of the red circle in Figure <xref ref-type="fig" rid="FA2">A2</xref>) and <inline-formula><mml:math id="M219"><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:math></inline-formula> is the slope size. Setting <inline-formula><mml:math id="M220"><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula> we have:</p>
<disp-formula id="E22"><label>(A5)</label><mml:math id="M22"><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo>&#x02264;</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x02264;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msup><mml:mn>4</mml:mn><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mi>&#x003C0;</mml:mi><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x0222B;</mml:mo><mml:mn>0</mml:mn><mml:mn>1</mml:mn></mml:msubsup><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mrow><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>3</mml:mn></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:mstyle><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>3</mml:mn></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:math></disp-formula>
<p>which is a Beta distribution with <inline-formula><mml:math id="M221"><mml:mi>&#x003B1;</mml:mi><mml:mo>=</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula>, i.e., a symmetric and unimodal distribution with mean 0.5. Because the PDF of <italic>x</italic> &#x0003D; 2<italic>t</italic> &#x02212; 1 can be computed using a linear transformation of the above density function, it can be shown that the distribution of the dot product on the unit hyper-sphere, i.e., the cosine similarity, is also a symmetric and unimodal distribution with zero mean. Based on the asymptotic result of Spruill (<xref ref-type="bibr" rid="B76">2007</xref>), for large values of <italic>p</italic> this distribution converges to a normal distribution with <inline-formula><mml:math id="M222"><mml:msup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula>. Therefore, assuming large <italic>p</italic>, the distribution of the cosine similarity between uniformly random vectors drawn from a <italic>p</italic>-dimensional unit hyper-sphere is approximately <inline-formula><mml:math id="M223"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msqrt><mml:mrow><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:msqrt></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
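This approximation is easy to verify by simulation. The sketch below (the dimension <monospace>p</monospace> and the number of sampled pairs are arbitrary illustrative choices) draws pairs of uniformly random unit vectors, using the standard trick that a normalized Gaussian vector is uniformly distributed on the hyper-sphere, and checks that the cosine similarities have near-zero mean and a standard deviation close to 1/&#x0221A;<italic>p</italic>.

```python
import math
import random

random.seed(0)
p, n_pairs = 256, 2000  # arbitrary illustration values

def random_unit_vector(p):
    """A normalized Gaussian vector is uniform on the unit hyper-sphere."""
    v = [random.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

sims = []
for _ in range(n_pairs):
    a, b = random_unit_vector(p), random_unit_vector(p)
    sims.append(sum(x * y for x, y in zip(a, b)))  # cosine similarity

mean = sum(sims) / n_pairs
std = math.sqrt(sum((s - mean) ** 2 for s in sims) / n_pairs)
print(mean, std, 1 / math.sqrt(p))  # std should be close to 1/sqrt(p)
```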
<fig id="FA2" position="float">
<label>Figure A2</label>
<caption><p><bold>Two-dimensional geometrical illustration for computing the PDF of cosine similarity</bold>.</p></caption>
<graphic xlink:href="fnins-10-00619-a0002.tif"/>
</fig>
</sec>
<sec>
<title>A.4. Computing the bias and variance in binary classification</title>
<p>Here, using the out-of-bag (OOB) technique, and based on procedures proposed by Domingos (<xref ref-type="bibr" rid="B26">2000</xref>) and Valentini and Dietterich (<xref ref-type="bibr" rid="B83">2004</xref>), we compute the expected prediction error (EPE) for a linear binary classifier &#x003A6; under bootstrap perturbation of the training set. Let <italic>m</italic> be the number of perturbed training sets resulting from partitioning <italic>S</italic> &#x0003D; (<italic>X</italic>, <italic>Y</italic>) into <italic>S</italic><sub><italic>tr</italic></sub> &#x0003D; (<italic>X</italic><sub><italic>tr</italic></sub>, <italic>Y</italic><sub><italic>tr</italic></sub>) and <italic>S</italic><sub><italic>vl</italic></sub> &#x0003D; (<italic>X</italic><sub><italic>vl</italic></sub>, <italic>Y</italic><sub><italic>vl</italic></sub>), i.e., training and validation sets. If <inline-formula><mml:math id="M224"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the linear classifier estimated from the <italic>j</italic>th perturbed training set, then the main prediction <inline-formula><mml:math id="M225"><mml:msup><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> for each sample in the dataset can be computed as follows:</p>
<disp-formula id="E23"><label>(A6)</label><mml:math id="M23"><mml:mrow><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msubsup><mml:mrow><mml:msup><mml:mover accent='true'><mml:mo>&#x003A6;</mml:mo><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>j</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02265;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>k</italic><sub><italic>i</italic></sub> is the number of times that <bold>x</bold><sub><italic>i</italic></sub> appears in the test set<xref ref-type="fn" rid="fn0006"><sup>6</sup></xref>.</p>
<p>The computation of the bias is challenging because the optimal model &#x003A6;<sup>&#x0002A;</sup> is unknown. According to Tibshirani (<xref ref-type="bibr" rid="B79">1996a</xref>), the misclassification error is one of the loss measures that satisfies a Pythagorean-type equality, namely:</p>
<disp-formula id="E24"><label>(A7)</label><mml:math id="M24"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mi>&#x02112;</mml:mi></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mi>&#x02112;</mml:mi></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo 
stretchy='false'>)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mi>&#x02112;</mml:mi></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Because all terms of the above equation are non-negative, the mean loss between the main prediction and the actual labels provides an upper bound on the bias:</p>
<disp-formula id="E25"><label>(A8)</label><mml:math id="M25"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mi>&#x02112;</mml:mi></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02264;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mi>&#x02112;</mml:mi></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Therefore, a pessimistic approximation of the bias <italic>B</italic>(<bold>x</bold><sub><italic>i</italic></sub>) can be calculated as follows:</p>
<disp-formula id="E26"><label>(A9)</label><mml:math id="M26"><mml:mrow><mml:mi>B</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>Then, the unbiased and biased variances (see Domingos, <xref ref-type="bibr" rid="B26">2000</xref> for definitions) in each training set can be calculated by:</p>
<disp-formula id="E27"><label>(A10)</label><mml:math id="M27"><mml:mrow><mml:msubsup><mml:mi>V</mml:mi><mml:mi>u</mml:mi><mml:mi>j</mml:mi></mml:msubsup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:mi>B</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mtext>&#x02003;</mml:mtext><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02260;</mml:mo><mml:msup><mml:mover accent='true'><mml:mo>&#x003A6;</mml:mo><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>j</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="E28"><label>(A11)</label><mml:math id="M28"><mml:mrow><mml:msubsup><mml:mi>V</mml:mi><mml:mi>b</mml:mi><mml:mi>j</mml:mi></mml:msubsup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:mi>B</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mtext>&#x02003;</mml:mtext><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mtext>&#x02003;</mml:mtext><mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02260;</mml:mo><mml:msup><mml:mover accent='true'><mml:mo>&#x003A6;</mml:mo><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>j</mml:mi></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>Then, the expected prediction error of &#x003A6; can be computed as follows (ignoring the irreducible error):</p>
<disp-formula id="E29"><label>(A12)</label><mml:math id="M29"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mi>E</mml:mi><mml:mi>P</mml:mi><mml:msub><mml:mi>E</mml:mi><mml:mo>&#x003A6;</mml:mo></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:munder><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mi>B</mml:mi></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo stretchy='true'>&#x0FE38;</mml:mo></mml:munder><mml:mrow><mml:mi>B</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:munder><mml:mo>+</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:munder><mml:munder><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>n</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:mfrac><mml:mstyle mathsize='140%' displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:munderover><mml:mstyle mathsize='140%' displaystyle='true'><mml:mo>&#x02211;</mml:mo></mml:mstyle><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover></mml:mstyle><mml:mo stretchy='false'>[</mml:mo><mml:msubsup><mml:mi>V</mml:mi><mml:mi>u</mml:mi><mml:mi>j</mml:mi></mml:msubsup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo 
stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>V</mml:mi><mml:mi>b</mml:mi><mml:mi>j</mml:mi></mml:msubsup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>x</mml:mi></mml:mstyle><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>]</mml:mo></mml:mrow><mml:mo stretchy='true'>&#x0FE38;</mml:mo></mml:munder><mml:mrow><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:munder></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
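<p>As an illustrative sketch (not part of the original analysis), the procedure of Equations (A6)&#x02013;(A12) can be implemented numerically. The toy data, variable names, and the least-squares hyperplane used as a stand-in for the bootstrap-trained linear classifiers are all assumptions of this example:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data (illustrative only).
n, m = 200, 50
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=n) >= 0).astype(int)

# OOB predictions of m bootstrap-trained linear classifiers.
# preds[j, i] is NaN when sample i was inside the j-th training set.
preds = np.full((m, n), np.nan)
for j in range(m):
    idx = rng.integers(0, n, size=n)        # bootstrap training indices
    oob = np.setdiff1d(np.arange(n), idx)   # out-of-bag (validation) samples
    # Least-squares hyperplane as a stand-in for the classifier of run j.
    w, *_ = np.linalg.lstsq(X[idx], 2.0 * y[idx] - 1.0, rcond=None)
    preds[j, oob] = (X[oob] @ w >= 0).astype(float)

mask = np.logical_not(np.isnan(preds))      # where sample i was OOB in run j
k = mask.sum(axis=0)                        # k_i, on average about m/3
main = (np.nansum(preds, axis=0) / k >= 0.5).astype(int)   # Eq. (A6)
B = (main != y).astype(int)                 # Eq. (A9): pessimistic bias

disagree = np.logical_and(mask, preds != main)
Vu = np.logical_and(disagree, B == 0)       # Eq. (A10): unbiased variance
Vb = np.logical_and(disagree, B == 1)       # Eq. (A11): biased variance
epe = B.mean() + (Vu.sum() - Vb.sum()) / (n * m)   # Eq. (A12)
print("EPE estimate:", epe)
```

<p>Note that, following Equation (A12), the variance indicators are averaged over all <italic>nm</italic> classifier/sample pairs, even though each sample is out-of-bag in only about <italic>m</italic>/3 of the bootstrap runs; pairs where the sample was in the training set contribute zero.</p>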
</sec>
</app>
</app-group>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>The application of the presented heuristic to MEG data can be extended to EEG because of the inherent similarity of the measured neural correlates in these two devices. In the EEG context, the ERF can be replaced by the event-related potential (ERP).</p></fn>
<fn id="fn0002"><p><sup>2</sup>The full dataset is publicly available at <ext-link ext-link-type="uri" xlink:href="ftp://ftp.mrc-cbu.cam.ac.uk/personal/rik.henson/wakemandg_hensonrn/">ftp://ftp.mrc-cbu.cam.ac.uk/personal/rik.henson/wakemandg_hensonrn/</ext-link>.</p></fn>
<fn id="fn0003"><p><sup>3</sup>The competition data are available at <ext-link ext-link-type="uri" xlink:href="http://www.kaggle.com/c/decoding-the-human-brain">http://www.kaggle.com/c/decoding-the-human-brain</ext-link>.</p></fn>
<fn id="fn0004"><p><sup>4</sup>The preprocessing scripts in python and MATLAB are available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/FBK-NILab/DecMeg2014/">https://github.com/FBK-NILab/DecMeg2014/</ext-link>.</p></fn>
<fn id="fn0005"><p><sup>5</sup>The MATLAB code used for experiments is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/smkia/interpretability/">https://github.com/smkia/interpretability/</ext-link>.</p></fn>
<fn id="fn0006"><p><sup>6</sup>It is expected that each sample <bold>x</bold><sub><italic>i</italic></sub> &#x02208; <italic>X</italic> appears (on average) <inline-formula><mml:math id="M193"><mml:msub><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02248;</mml:mo><mml:mfrac><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula> times in the test sets.</p></fn>
</fn-group>
</back>
</article>