<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2021.636743</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>DEEPsc: A Deep Learning-Based Map Connecting Single-Cell Transcriptomics and Spatial Imaging Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Maseda</surname> <given-names>Floyd</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1256062/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Cang</surname> <given-names>Zixuan</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/995198/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Nie</surname> <given-names>Qing</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/869389/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Mathematics, University of California, Irvine</institution>, <addr-line>Irvine, CA</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine</institution>, <addr-line>Irvine, CA</addr-line>, <country>United States</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Developmental and Cell Biology, University of California, Irvine</institution>, <addr-line>Irvine, CA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Chunhe Li, Fudan University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Shihua Zhang, Academy of Mathematics and Systems Science (CAS), China; Xiaoqiang Sun, Sun Yat-sen University, China</p></fn>
<corresp id="c001">&#x002A;Correspondence: Zixuan Cang, <email>zcang@uci.edu</email></corresp>
<corresp id="c002">Qing Nie, <email>qnie@uci.edu</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Systems Biology, a section of the journal Frontiers in Genetics</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>03</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>636743</elocation-id>
<history>
<date date-type="received">
<day>02</day>
<month>12</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>02</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 Maseda, Cang and Nie.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Maseda, Cang and Nie</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Single-cell RNA sequencing (scRNA-seq) data provides unprecedented information on cell fate decisions; however, the spatial arrangement of cells is often lost. Several recent computational methods have been developed to impute spatial information onto a scRNA-seq dataset by analyzing known spatial expression patterns of a small subset of genes, known as a reference atlas. However, there is a lack of comprehensive analysis of the accuracy, precision, and robustness of the mappings, along with the generalizability of these methods, which are often designed for specific systems. We present a system-adaptive deep learning-based method (DEEPsc) to impute spatial information onto a scRNA-seq dataset from a given spatial reference atlas. By introducing a comprehensive set of metrics that evaluate the spatial mapping methods, we compare DEEPsc with four existing methods on four biological systems. We find that while DEEPsc has comparable accuracy to other methods, it achieves an improved balance between precision and robustness. DEEPsc provides a data-adaptive tool to connect scRNA-seq datasets and spatial imaging datasets to analyze cell fate decisions. All methods evaluated in this work are implemented as open-source software with a uniform API, which can serve as a portal for spatial exploration of cell fate decisions in scRNA-seq data.</p>
</abstract>
<kwd-group>
<kwd>spatial gene expression atlas</kwd>
<kwd>scRNA-seq data</kwd>
<kwd>spatial information imputation</kwd>
<kwd>deep learning</kwd>
<kwd>metric learning</kwd>
<kwd>comprehensive evaluation metric</kwd>
</kwd-group>
<contract-num rid="cn001">U01AR073159</contract-num>
<contract-num rid="cn001">P30AR075047</contract-num>
<contract-num rid="cn002">594598, QN</contract-num>
<contract-num rid="cn003">DMS1763272</contract-num>
<contract-num rid="cn003">MCB2028424</contract-num>
<contract-sponsor id="cn001">National Institutes of Health<named-content content-type="fundref-id">10.13039/100000002</named-content></contract-sponsor>
<contract-sponsor id="cn002">Simons Foundation<named-content content-type="fundref-id">10.13039/100000893</named-content></contract-sponsor>
<contract-sponsor id="cn003">National Science Foundation<named-content content-type="fundref-id">10.13039/100000001</named-content></contract-sponsor>
<counts>
<fig-count count="8"/>
<table-count count="2"/>
<equation-count count="2"/>
<ref-count count="54"/>
<page-count count="14"/>
<word-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1">
<title>Introduction</title>
<p>While cells of a biological system have access to the same genetic blueprint, they navigate through different developmental paths toward various cell fates. These diverse fate programs of cells are controlled by their own states, interactions with spatially neighboring cells, and other environmental cues (<xref ref-type="bibr" rid="B11">Guo et al., 2010</xref>). To decipher the processes of cell fate acquisition, observations of the transcriptome at single-cell resolution in spatial context are desired. The advent of sophisticated single-cell RNA sequencing (scRNA-seq) techniques now allows investigation of the transcriptomic landscape of tens of thousands of genes across tissues at the resolution of individual cells (<xref ref-type="bibr" rid="B36">Rosenberg et al., 2018</xref>; <xref ref-type="bibr" rid="B42">Svensson et al., 2018</xref>). However, a drawback to scRNA-seq methods is the necessity of dissociating the sample in question, thereby destroying any spatial context, which can be crucial to the understanding of cellular development and dynamics (<xref ref-type="bibr" rid="B53">Yuan et al., 2017</xref>). In common scRNA-seq analysis workflows, unsupervised clustering of cells is carried out, followed by identification of marker genes associated with each cell cluster (<xref ref-type="bibr" rid="B26">Luecken and Theis, 2019</xref>). While the list of marker genes for each cell cluster can be screened for genes associated with known spatial regions to estimate the spatial origin of the cluster, the spatial arrangement of individual cells remains unclear (<xref ref-type="bibr" rid="B19">Kiselev et al., 2019</xref>; <xref ref-type="bibr" rid="B26">Luecken and Theis, 2019</xref>). 
Several existing methods attempt to impute a pseudospatial or pseudotemporal axis onto the data (<xref ref-type="bibr" rid="B15">Joost et al., 2016</xref>; <xref ref-type="bibr" rid="B34">Puram et al., 2017</xref>; <xref ref-type="bibr" rid="B32">Pandey et al., 2018</xref>; <xref ref-type="bibr" rid="B48">Wang et al., 2019</xref>); however, little related to physical space is immediately discernible from scRNA-seq data alone.</p>
<p>The loss of spatial information in scRNA-seq data can be partially mitigated by referring to spatial staining data (<xref ref-type="bibr" rid="B39">Sprague et al., 2006</xref>; <xref ref-type="bibr" rid="B10">Fowlkes et al., 2008</xref>). Another promising solution is the emerging spatial transcriptomics methods such as osmFISH (<xref ref-type="bibr" rid="B7">Codeluppi et al., 2018</xref>), MERFISH (<xref ref-type="bibr" rid="B30">Moffitt et al., 2018</xref>), seqFISH (<xref ref-type="bibr" rid="B38">Shah et al., 2016</xref>), seqFISH+ (<xref ref-type="bibr" rid="B9">Eng et al., 2019</xref>), STARmap (<xref ref-type="bibr" rid="B49">Wang et al., 2018</xref>), and Slide-seq (<xref ref-type="bibr" rid="B35">Rodriques et al., 2019</xref>) that obtain <italic>in situ</italic> spatial expression patterns. Compared to scRNA-seq, current spatial techniques often cover fewer cells or genes, or have suboptimal resolution and depth. There is therefore growing interest in combining the strengths of both approaches to achieve high coverage and individual-cell resolution while retaining the spatial arrangement (<xref ref-type="bibr" rid="B53">Yuan et al., 2017</xref>; <xref ref-type="bibr" rid="B19">Kiselev et al., 2019</xref>). Because of the differences among scRNA-seq techniques, spatial techniques, and biological systems, it is challenging to derive a generally applicable computational method to integrate the two kinds of data.</p>
<p>Several recent computational methods have been developed to impute spatial data onto existing scRNA-seq datasets through analyzing known spatial expression patterns of a small subset of genes, termed a &#x201C;spatial reference atlas.&#x201D; Seminal methods were developed independently by <xref ref-type="bibr" rid="B1">Achim et al. (2015)</xref> and <xref ref-type="bibr" rid="B37">Satija et al. (2015)</xref> and were applied to the <italic>Platynereis dumerilii</italic> brain and zebrafish embryo, respectively, using binarized reference atlases derived from <italic>in situ</italic> hybridization (ISH) images. DistMap, another method that uses a binarized ISH-based reference atlas, was developed by <xref ref-type="bibr" rid="B16">Karaiskos et al. (2017)</xref> and applied to the <italic>Drosophila</italic> embryo. <xref ref-type="bibr" rid="B1">Achim et al. (2015)</xref> use an empirical correspondence score between each cell-location pair based on the specificity ratio of genes. <xref ref-type="bibr" rid="B37">Satija et al. (2015)</xref> (Seurat v1) fit a bimodal mixture model to the scRNA-seq data and then project cells to their spatial origins using a probabilistic score. DistMap applies the Matthews correlation coefficient to the binarized spatial imaging and scRNA-seq data to assign a cell-location score (<xref ref-type="bibr" rid="B16">Karaiskos et al., 2017</xref>). Several methods have also been developed that use spatial reference atlases directly measuring RNA counts, which are comparable to scRNA-seq data without binarization (<xref ref-type="bibr" rid="B33">Peng et al., 2016</xref>; <xref ref-type="bibr" rid="B12">Halpern et al., 2017</xref>). 
More recently, computational methods have been developed for imputing gene expression in spatial data (<xref ref-type="bibr" rid="B24">Lopez et al., 2019</xref>), transferring cell type labels from scRNA-seq data to spatial data (<xref ref-type="bibr" rid="B54">Zhu et al., 2018</xref>; <xref ref-type="bibr" rid="B8">Dries et al., 2019</xref>; <xref ref-type="bibr" rid="B2">Andersson et al., 2020</xref>), <italic>de novo</italic> spatial placement of single cells (<xref ref-type="bibr" rid="B31">Nitzan et al., 2019</xref>), and inferring spatial distances between single cells (<xref ref-type="bibr" rid="B5">Cang and Nie, 2020</xref>).</p>
<p>In addition to the methods designed specifically for integrating spatial data and scRNA-seq data, other computational methods have been developed recently for general data integration. Such methods focus on the general task of integrating RNA sequencing datasets obtained from the same biological system through different technologies, <italic>in situ</italic> data being one possibility among many, into one large dataset offering a more complete description of the system under study. These methods include newer versions of Seurat (<xref ref-type="bibr" rid="B4">Butler et al., 2018</xref>; <xref ref-type="bibr" rid="B41">Stuart et al., 2019</xref>), LIGER (<xref ref-type="bibr" rid="B51">Welch et al., 2019</xref>), Harmony (<xref ref-type="bibr" rid="B21">Korsunsky et al., 2019</xref>), and Scanorama (<xref ref-type="bibr" rid="B13">Hie et al., 2019</xref>) which are mainly based on correlation analyses and matrix factorizations. Another more specific task is to transfer high-level information such as cell types between datasets. Many machine learning- and deep learning-based methods have been developed for this task by formulating a supervised learning problem with the high-level information being the target (<xref ref-type="bibr" rid="B20">Kiselev et al., 2018</xref>; <xref ref-type="bibr" rid="B23">Lieberman et al., 2018</xref>; <xref ref-type="bibr" rid="B25">Lopez et al., 2018</xref>; <xref ref-type="bibr" rid="B47">Wagner and Yanai, 2018</xref>; <xref ref-type="bibr" rid="B43">Tan and Cahan, 2019</xref>; <xref ref-type="bibr" rid="B3">Boufea et al., 2020</xref>; <xref ref-type="bibr" rid="B14">Hu et al., 2020</xref>; <xref ref-type="bibr" rid="B27">Ma and Pellegrini, 2020</xref>).</p>
<p>Since the spatial characteristics of different biological systems could be significantly different, we aim to develop a system-adaptive method specifically designed for imputing spatial information onto scRNA-seq data. To this end, unlike other spatial integration methods that use predefined algorithms for computing scores, we learn a specialized correspondence score between cells and locations for a given biological system. This can then be regarded as a general metric learning task (<xref ref-type="bibr" rid="B22">Kulis, 2013</xref>). In addition to linear methods that learn a pseudometric (<xref ref-type="bibr" rid="B50">Weinberger and Saul, 2009</xref>), there has been increasing interest in applying deep learning to metric learning (<xref ref-type="bibr" rid="B17">Kaya and Bilge, 2019</xref>; <xref ref-type="bibr" rid="B6">Chicco, 2020</xref>). These methods are mostly designed for cases where the pair of data points to be compared are in the same space. Though the common genes from the spatial data and scRNA-seq data are used here, directly treating them as in the same space may cause inaccuracy due to differences in the original datasets such as scaling and noise.</p>
<p>Here we develop a system-adaptive deep learning-based method (DEEPsc) for imputing spatial data onto scRNA-seq data. A DEEPsc network accepts a low-dimensional feature vector corresponding to a single position in the spatial reference atlas along with a corresponding feature vector of the gene expression of a single cell and returns a likelihood that the input cell originated from the input position. The network is trained and validated using positions in the spatial reference atlas as simulated scRNA-seq data. The network is also validated through the task of predicting the scRNA-seq data from the spatial reference atlas, or vice versa. In addition, we implemented several strong baseline methods using different norms and linear metric learning for benchmark comparison. We further develop a comprehensive measure, called a performance score, for evaluating how well a given method maps scRNA-seq data to known spatial origins; such a measure was previously lacking. This score contains three components that measure the accuracy, precision, and robustness of a method, respectively. Using this score on four biological systems, we show that DEEPsc maintains a comparable accuracy to four existing methods while achieving a better balance between precision and robustness.</p>
</sec>
<sec id="S2">
<title>Results</title>
<sec id="S2.SS1">
<title>A Deep Learning-Based Method to Connect scRNA-seq Datasets and Spatial Imaging Data</title>
<p>Given any spatial reference atlas consisting of binary or continuous gene expression levels for a biological system on a set of locations with known spatial coordinates, and a scRNA-seq dataset consisting of binary or continuous gene expression levels for the same biological system, we introduce a <bold>D</bold>eep-learning based <bold>E</bold>nvironment for the <bold>E</bold>xtraction of <bold>P</bold>ositional information from <bold>sc</bold>RNA-seq data (DEEPsc) to impute the spatial information onto the scRNA-seq data.</p>
<p>In DEEPsc, we first select a common set of genes from the reference atlas and scRNA-seq data, then perform dimensionality reduction via principal component analysis (PCA) on the reduced reference atlas to shorten training time (<xref ref-type="fig" rid="F1">Figure 1A</xref>). The scRNA-seq data is then projected into the same PCA space, on which we learn a metric for comparison between cells and spatial positions. The DEEPsc network accepts a concatenated feature vector for a single cell and a single position and returns a likelihood that the input cell originated from the input position. The network contains two fully connected hidden layers with <italic>N</italic> nodes each, where <italic>N</italic> is the number of principal components kept from PCA, and a single node in the output layer. Sigmoid activation functions are applied to each node, including the output node, so that the resulting output lies in [0,1] and can be interpreted as this likelihood. To train the DEEPsc network, we use the spatial position feature vectors as simulated scRNA-seq data for comparison (<xref ref-type="fig" rid="F1">Figure 1B</xref>). Each simulated cell is compared pairwise with every position in the spatial reference atlas; if the simulated cell is an exact match to a given position, the target output is 1 (a high likelihood of origin), and otherwise the target output is 0 (a low likelihood of origin). Training is terminated when the error on a randomly chosen validation set is no longer improving.</p>
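<p>As a minimal sketch of the architecture and training-pair construction described above, the following Python fragment builds a network with 2<italic>N</italic> inputs, two hidden layers of <italic>N</italic> sigmoid nodes, and one sigmoid output node. It is an illustration under stated assumptions, not the released DEEPsc implementation: all function names, the random weight initialization, and the use of position indices to decide an exact match are hypothetical, and the training loop with validation-based stopping is omitted.</p>
<preformat><![CDATA[
```python
import math
import random


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Fully connected layer with a sigmoid applied to every node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Hypothetical initialization: small random weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_network(n_components, seed=0):
    """Layer sizes: 2N inputs -> N hidden -> N hidden -> 1 output."""
    rng = random.Random(seed)
    return [
        init_layer(2 * n_components, n_components, rng),
        init_layer(n_components, n_components, rng),
        init_layer(n_components, 1, rng),
    ]


def likelihood(network, cell, position):
    """Forward pass on the concatenated (cell, position) feature vector."""
    activations = list(cell) + list(position)
    for weights, biases in network:
        activations = dense(activations, weights, biases)
    return activations[0]


def training_pairs(atlas):
    """Pair every simulated cell (an atlas position) with every position.

    Assumption: an "exact match" is decided by index, so the target is 1
    when i == j and 0 otherwise.
    """
    return [
        (list(atlas[i]) + list(atlas[j]), 1.0 if i == j else 0.0)
        for i in range(len(atlas))
        for j in range(len(atlas))
    ]
```
]]></preformat>
<p>For an atlas with <italic>P</italic> positions, this construction yields <italic>P</italic><sup>2</sup> training pairs, of which only <italic>P</italic> carry a target of 1, so the two target classes are strongly imbalanced in this sketch.</p>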
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>The general workflow of training and implementing DEEPsc. <bold>(A)</bold> Given a spatial reference atlas of gene expression levels for some biological system and a scRNA-seq dataset, genes common to both are selected, and dimensionality of the data is reduced (e.g., by PCA, UMAP). Each spatial position in the reference atlas and each cell in the scRNA-seq dataset is associated with a feature vector in the reduced space. <bold>(B)</bold> The DEEPsc architecture takes as input the feature vectors of one single cell and one spatial position, returning a likelihood between 0 (low likelihood) and 1 (high likelihood) that the cell originated from the spatial position. A DEEPsc network is trained using the spatial position feature vectors as simulated scRNA-seq data. The target output is a 1 (high likelihood of origin) if the simulated input cell matches the input position, and 0 (low likelihood of origin) if they do not match. <bold>(C)</bold> Once the DEEPsc network is sufficiently trained, a feature vector associated with a cell in the scRNA-seq dataset can be fed into the network with each spatial position individually. The resulting likelihoods are displayed as a heatmap depicting the likelihood of origin of the cell from each position. The position with the highest likelihood is chosen as the origin of the cell. This process is repeated for each cell in the scRNA-seq dataset.</p></caption>
<graphic xlink:href="fgene-12-636743-g001.tif"/>
</fig>
<p>After training the DEEPsc network, a feature vector associated with an actual cell from the scRNA-seq data is fed in as input and compared to each position in the reference atlas individually. We display the results as a heatmap on the schematic diagram of the biological system, choosing the spatial position with the largest likelihood of origin according to DEEPsc as the determined origin of the cell. This process is repeated for each cell in the scRNA-seq dataset to assign spatial origins of all cells (<xref ref-type="fig" rid="F1">Figure 1C</xref>).</p>
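<p>The mapping step described above can be sketched as follows. This is a hedged illustration only: <italic>toy_score</italic> is a hypothetical placeholder standing in for a trained DEEPsc network (it is simply a decreasing function of squared distance), and the function names are assumptions introduced for this example.</p>
<preformat><![CDATA[
```python
import math


def toy_score(cell, position):
    """Placeholder score, NOT a trained DEEPsc network: exp(-squared distance)."""
    return math.exp(-sum((c - p) ** 2 for c, p in zip(cell, position)))


def map_cell(score, cell, atlas):
    """Score one cell against every atlas position.

    Returns the index of the position with the largest likelihood, together
    with the full list of likelihoods (the values shown in a heatmap).
    """
    likelihoods = [score(cell, position) for position in atlas]
    best = max(range(len(atlas)), key=likelihoods.__getitem__)
    return best, likelihoods


def map_all_cells(score, cells, atlas):
    """Repeat the mapping for each cell and keep only the assigned origins."""
    return [map_cell(score, cell, atlas)[0] for cell in cells]
```
]]></preformat>
<p>Any function returning a value in [0,1] for a cell-position pair could be passed in place of the placeholder, which is what allows different mapping methods to be compared through the same interface.</p>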
</sec>
<sec id="S2.SS2">
<title>Quantifying Spatial Mapping Performance</title>
<p>Each of the highlighted methods to impute spatial data onto scRNA-seq data, including DEEPsc, essentially reduces to the following: for some tissue with a well-defined standard spatial structure, given known binary or continuous expression levels of <italic>G</italic> genes at each of <italic>P</italic> spatial locations (the reference atlas), calculate a correspondence score, <italic>S</italic>, of how similar each of <italic>C</italic> cells in an scRNA-seq dataset is to each of the <italic>P</italic> positions in the atlas. That is, define a function, <italic>S</italic>:[0,1]<italic><sup>G</sup></italic>&#x00D7;[0,1]<italic><sup>G</sup></italic>&#x2192;[0,1], such that <italic>S</italic>(<italic>c</italic><sub><italic>i</italic></sub>,<italic>p</italic><sub><italic>j</italic></sub>), for <italic>i</italic> = 1,2,&#x2026;,<italic>C</italic> and <italic>j</italic> = 1,2,&#x2026;,<italic>P</italic>, describes the likelihood that cell <italic>c</italic><sub><italic>i</italic></sub> originated from position <italic>p</italic><sub><italic>j</italic></sub>, based on the similarity of the expression vectors of the cell and position.</p>
<p>To quantify how well a given method performs for a given spatial reference atlas, we use the reference atlas itself as simulated single cell data; that is, we generate a simulated scRNA-seq dataset with <italic>C=P</italic> cells, each an exact copy of a reference atlas position. This allows us to treat the simulated scRNA-seq data as having a known spatial origin, against which we can compare the output of each method. We define a system-adaptive, comprehensive performance score, consisting of three penalty terms: accuracy, which determines whether or not the known spatial origin was given a high likelihood of origin; precision, which determines whether or not other locations were given low likelihoods of origin; and robustness, which determines how sensitive a mapping method is to random noise in the input data. Each penalty term is represented by a number in [0,1], with 0 being no penalty and 1 being a worst-case scenario. The performance score is defined as <inline-formula><mml:math id="INEQ5"><mml:mrow><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>P</mml:mi></mml:mfrac><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>P</mml:mi></mml:munderover><mml:msub><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>, where</p>
<disp-formula id="S2.Ex1">
<mml:math id="M1">
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>3</mml:mn>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:munder>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo movablelimits="false">-</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo movablelimits="false">,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo movablelimits="false">&#x23DF;</mml:mo>
</mml:munder>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:munder>
<mml:munder accentunder="true">
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo movablelimits="false">-</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mo movablelimits="false">&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo movablelimits="false">=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>P</mml:mi>
</mml:msubsup>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo movablelimits="false">,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo movablelimits="false">-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mo movablelimits="false">&#x23DF;</mml:mo>
</mml:munder>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:munder>
<mml:munder accentunder="true">
<mml:msup>
<mml:mrow>
<mml:mo movablelimits="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo movablelimits="false">-</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x03C3;</mml:mi>
<mml:mo movablelimits="false">&#x002A;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo movablelimits="false">)</mml:mo>
</mml:mrow>
<mml:mn>4</mml:mn>
</mml:msup>
<mml:mo movablelimits="false">&#x23DF;</mml:mo>
</mml:munder>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p><italic>S</italic><sub><italic>i</italic>,<italic>j</italic></sub> = <italic>S</italic>(<italic>c</italic><sub><italic>i</italic></sub>,<italic>p</italic><sub><italic>j</italic></sub>) is the correspondence score of cell <italic>c</italic><sub><italic>i</italic></sub> to position <italic>p</italic><sub><italic>j</italic></sub>, and <italic>E</italic><sub><italic>i</italic></sub> is interpreted as the quality of the mapping of cell <italic>c</italic><sub><italic>i</italic></sub>, equal to one minus the mean of the three penalty terms. The quantity &#x03C3;<sup>&#x2217;</sup> in the robustness term is calculated by determining the accuracy and precision penalty terms with no Gaussian noise added to the input data, then calculating the same two penalties with various levels of Gaussian noise with standard deviation &#x03C3; &#x2208; [0,1]. The quantity &#x03C3;<sup>&#x2217;</sup> is defined to be the level of Gaussian noise required to raise the mean of the accuracy and precision penalties by 0.1 from their values with no added noise, or &#x03C3;<sup>&#x2217;</sup> = 1, whichever is smaller. The exponent of four in the robustness term was chosen empirically such that the robustness term does not dominate the performance score. Because expression levels are normalized to [0,1] before calculating the correspondence scores, &#x03C3;<sup>&#x2217;</sup> = 0.5, for example, means that a method requires noise on the order of half of the expression levels to raise the mean of the accuracy and precision penalties by 0.1. The performance score has a range of [0,1], where a performance score of <italic>E=1</italic> represents an ideal mapping that maps a cell to its known location with high confidence, to all other locations with low confidence, and does so in a manner robust to noise. An illustration of each term in the performance score is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
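<p>The performance score defined above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, <italic>scores_at</italic> is an assumed user-supplied callable that returns the <italic>P</italic>&#x00D7;<italic>P</italic> matrix of correspondence scores obtained after adding Gaussian noise of a given standard deviation to the simulated cells, and the mean of the accuracy and precision penalties is taken over both penalties and all cells.</p>
<preformat><![CDATA[
```python
def accuracy_penalty(scores, i):
    """1 - S[i][i]: zero when the known origin receives likelihood 1."""
    return 1.0 - scores[i][i]


def precision_penalty(scores, i):
    """|1 - sum_j S[i][j]| / (P - 1): zero when only one position scores 1."""
    n_positions = len(scores[i])
    return abs(1.0 - sum(scores[i])) / (n_positions - 1)


def mean_penalty(scores):
    """Assumed reading: average of both penalties, taken over all cells."""
    n = len(scores)
    total = sum(accuracy_penalty(scores, i) + precision_penalty(scores, i)
                for i in range(n))
    return total / (2 * n)


def sigma_star(scores_at, sigmas, threshold=0.1):
    """Smallest tested noise level raising the mean penalty by `threshold`.

    `scores_at(sigma)` is a hypothetical callable returning the score matrix
    for noise level sigma; 1.0 is returned if no tested level reaches it.
    """
    baseline = mean_penalty(scores_at(0.0))
    for sigma in sorted(sigmas):
        if mean_penalty(scores_at(sigma)) - baseline >= threshold:
            return min(sigma, 1.0)
    return 1.0


def performance_score(scores, sigma_star_value):
    """E = (1/P) * sum_i E_i, with E_i = 1 - (acc + prec + (1 - sigma*)^4) / 3."""
    robustness = (1.0 - sigma_star_value) ** 4
    n = len(scores)
    per_cell = [
        1.0 - (accuracy_penalty(scores, i) + precision_penalty(scores, i)
               + robustness) / 3.0
        for i in range(n)
    ]
    return sum(per_cell) / n
```
]]></preformat>
<p>With an identity score matrix and &#x03C3;<sup>&#x2217;</sup> = 1, every penalty vanishes and the sketch returns <italic>E</italic> = 1, the ideal case described in the text; a matrix of all ones with &#x03C3;<sup>&#x2217;</sup> = 1 incurs the full precision penalty and gives <italic>E</italic> = 2/3.</p>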
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Explanation of the terms constituting the performance score. In each hypothetical mapping heatmap, the known location of the input cell is highlighted in black. <bold>(A)</bold> The accuracy score measures whether or not the known location receives a high likelihood; the precision score measures whether or not other locations receive low likelihoods. <bold>(B)</bold> The robustness score measures how much the accuracy and precision scores change if random noise is added to the input cell. A mapping method which is accurate, precise, and robust is given a high performance score while a mapping method that lacks in any or all of the three areas is given a lower performance score.</p></caption>
<graphic xlink:href="fgene-12-636743-g002.tif"/>
</fig>
<p>This performance score is limited by the fact that it relies on ground truth knowledge of the spatial origin of a single cell/spot to determine the performance of a given mapping method. However, this ground truth knowledge is not available for actual scRNA-seq data. To more directly quantify the mapping performance on actual scRNA-seq datasets, we use a measure of predictive reproducibility, obtained from a <italic>k</italic>-fold cross validation scheme, in which we randomly split the common genes in the reference atlas and scRNA-seq data into <italic>k</italic> folds and calculate the correspondence score for each method using all but one fold. The correspondence scores are then used as coefficients in a weighted sum to predict the value of the dropped-out genes in each fold for each cell (scRNA-seq predictive reproducibility) or each spatial position (atlas predictive reproducibility) and determine the error in the predicted expression level. The predicted expression of gene <italic>k</italic> in cell <italic>c</italic><sub><italic>i</italic></sub> is computed as <inline-formula><mml:math id="INEQ14"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi>c</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>P</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x2062;</mml:mo><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>P</mml:mi></mml:munderover><mml:msubsup><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula> and the predicted expression of gene <italic>k</italic> in position <italic>p</italic><sub><italic>j</italic></sub> is computed as <inline-formula><mml:math id="INEQ15"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi>p</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi>j</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>C</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x2062;</mml:mo><mml:msubsup><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>C</mml:mi></mml:munderover><mml:msubsup><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula> where <inline-formula><mml:math id="INEQ16"><mml:msubsup><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> is the correspondence score between cell <italic>c</italic><sub><italic>i</italic></sub> and position <italic>p</italic><sub><italic>j</italic></sub> with genes in folds not containing gene <italic>k</italic> and <inline-formula><mml:math id="INEQ17"><mml:msubsup><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="INEQ18"><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> are the known expression values of gene <italic>k</italic> from the scRNA-seq and the spatial atlas data, respectively. To accommodate the sparsity of data, we compute the predictive reproducibility scores separately for cells or positions with zero expression values and with positive expression values. 
For example, we measure the predictive reproducibility for the task of reproducing gene <italic>k</italic> in scRNA-seq data on cells with zero expression using <inline-formula><mml:math id="INEQ19"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi mathvariant="normal">_</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>z</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>r</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mrow><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>I</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi mathvariant="normal">_</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>z</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>r</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:munder><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi>c</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:msubsup><mml:mi>I</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi 
mathvariant="normal">_</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>z</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>r</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula> where <inline-formula><mml:math id="INEQ20"><mml:mrow><mml:msubsup><mml:mi>I</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi mathvariant="normal">_</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>z</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>r</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mi>i</mml:mi><mml:mo>:</mml:mo><mml:mrow><mml:msubsup><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>. Taking the average over all common genes results in a single score <italic>R</italic><sub><italic>sc_ zero</italic></sub>, and in the same manner, we define <italic>R</italic><sub><italic>sc_ nonzero</italic></sub>, <italic>R</italic><sub><italic>atlas_ zero</italic></sub>, and <italic>R</italic><sub><italic>atlas_ nonzero</italic></sub>. When producing predictive reproducibility scores, we use the same k-fold split across all methods to ensure a fair comparison.</p>
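<p>As an illustration only, the weighted-sum prediction and the zero/nonzero reproducibility scores described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in this study: the function names, the toy correspondence scores, and the choice to predict zero when all scores for a cell vanish are hypothetical, and expression values are assumed to be scaled to [0, 1].</p>
<preformat>
```python
from typing import List, Sequence


def predict_cell_expression(scores: Sequence[Sequence[float]],
                            atlas_gene: Sequence[float]) -> List[float]:
    """Predict one held-out gene in every cell as a score-weighted mean
    of that gene's atlas values.

    scores[i][j] is the correspondence score between cell i and position j,
    computed without the fold that contains the held-out gene.
    atlas_gene[j] is the known atlas value of the held-out gene at position j.
    """
    predictions = []
    for row in scores:
        total = sum(row)
        if total == 0:
            # Assumption: a cell with no correspondence mass is predicted as 0.
            predictions.append(0.0)
            continue
        weighted = sum(s * p for s, p in zip(row, atlas_gene))
        predictions.append(weighted / total)
    return predictions


def reproducibility(predicted: Sequence[float],
                    observed: Sequence[float],
                    zero_subset: bool) -> float:
    """One minus the mean absolute error over the cells whose observed value
    is zero (zero_subset=True) or nonzero (zero_subset=False)."""
    errors = [abs(p - o) for p, o in zip(predicted, observed)
              if (o == 0) == zero_subset]
    if not errors:
        raise ValueError("no cells in the requested subset")
    return 1.0 - sum(errors) / len(errors)


# Toy example (hypothetical values): 3 cells, 2 positions, one held-out gene.
scores = [[0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
atlas_gene = [1.0, 0.0]
observed = [1.0, 0.0, 0.0]

predicted = predict_cell_expression(scores, atlas_gene)
r_zero = reproducibility(predicted, observed, zero_subset=True)
r_nonzero = reproducibility(predicted, observed, zero_subset=False)
print(predicted, r_zero, r_nonzero)
```
</preformat>
<p>The atlas-side scores follow by exchanging the roles of cells and positions, that is, by normalizing over cells instead of positions. Averaging the per-gene values over all common genes would then give a single score per method.</p>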
</sec>
<sec id="S2.SS3">
<title>Comparisons of Multiple Methods Using Simulated scRNA-seq Data</title>
<p>Using the performance score, we benchmarked the methods developed by <xref ref-type="bibr" rid="B1">Achim et al. (2015)</xref>, <xref ref-type="bibr" rid="B37">Satija et al. (2015)</xref> (Seurat v1), <xref ref-type="bibr" rid="B16">Karaiskos et al. (2017)</xref> (DistMap), and <xref ref-type="bibr" rid="B33">Peng et al. (2016)</xref>, together with our DEEPsc method, on four different biological systems: the zebrafish embryo (<xref ref-type="bibr" rid="B37">Satija et al., 2015</xref>), the <italic>Drosophila</italic> embryo (<xref ref-type="bibr" rid="B16">Karaiskos et al., 2017</xref>), the murine hair follicle (<xref ref-type="bibr" rid="B15">Joost et al., 2016</xref>), and the murine frontal cortex, downloaded from the 10x Genomics Spatial Gene Expression Datasets. The reference atlas for the zebrafish embryo contains the binarized expression of 47 genes on 64 spatial bins that together make up half of the hemisphere of the 6 hpf embryo (<xref ref-type="bibr" rid="B37">Satija et al., 2015</xref>). The <italic>Drosophila</italic> embryo reference atlas contains 84 genes on 3,039 spatial positions (<xref ref-type="bibr" rid="B16">Karaiskos et al., 2017</xref>). The spatial reference atlas generated with the Visium technology (<xref ref-type="bibr" rid="B40">St&#x00E5;hl et al., 2016</xref>) for the murine frontal cortex contains 32,285 genes on 961 spatial positions (a subset representing the frontal cortex from the original data), from which we kept the 2,755 of the 3,000 most variable genes in the spatial data that are also present in the scRNA-seq data. Segmenting a standard diagram of the follicle into 233 spatial positions and using FISH imaging of eight genes identified as spatially localized (<xref ref-type="bibr" rid="B15">Joost et al., 2016</xref>), we manually defined a continuous reference atlas for the follicle (section &#x201C;Materials and Methods&#x201D;). 
For mapping methods requiring a binary reference atlas, we binarized this follicle reference atlas using a cutoff expression of 0.2 for a gene to be considered on. We further implemented several baseline methods for comparison. These include methods based on predefined metrics, in which the correspondence score <italic>S</italic> is defined by the 2-norm, infinity norm, or mean percent difference between the input cell and the spatial position in the space of common genes. We also implemented a large margin nearest neighbor (LMNN) method that learns a linear metric (section &#x201C;Materials and Methods&#x201D;). <xref ref-type="fig" rid="F3">Figure 3</xref> shows a scatter plot of the penalty terms constituting the performance score of each implemented method on each of the four biological systems, as well as the average for each method across all four systems. <xref ref-type="table" rid="T1">Table 1</xref> includes the numerical values for each penalty term, as well as the calculated performance score for each method. <xref ref-type="fig" rid="F4">Figure 4</xref> includes example heatmaps of simulated cells for each of the biological systems. The penalty terms for the individual locations are shown in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
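<p>To make the predefined-metric baselines concrete, the following Python sketch computes the three distances between one cell and each atlas position and ranks the positions. It is a hedged illustration rather than the implementation used here: the exact form of the mean percent difference and the transform from a distance to a correspondence score are given in the Materials and Methods and are not reproduced in this section, so the per-gene normalization, the map 1/(1 + <italic>d</italic>), and all function names below are assumptions.</p>
<preformat>
```python
import math
from typing import Sequence


def two_norm(cell: Sequence[float], position: Sequence[float]) -> float:
    """Euclidean distance in the space of common genes."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(cell, position)))


def inf_norm(cell: Sequence[float], position: Sequence[float]) -> float:
    """Largest absolute per-gene difference."""
    return max(abs(c - p) for c, p in zip(cell, position))


def mean_percent_difference(cell: Sequence[float],
                            position: Sequence[float]) -> float:
    """Mean over genes of the absolute difference relative to the larger value."""
    ratios = []
    for c, p in zip(cell, position):
        scale = max(abs(c), abs(p))
        # Assumption: a gene that is zero in both vectors contributes no difference.
        ratios.append(abs(c - p) / scale if scale != 0 else 0.0)
    return sum(ratios) / len(ratios)


def distance_to_score(d: float) -> float:
    """Assumed monotone map from a distance to a score in (0, 1]."""
    return 1.0 / (1.0 + d)


# Toy data (hypothetical): one cell and three atlas positions over three genes.
cell = [1.0, 0.0, 0.5]
atlas = [[1.0, 0.4, 0.2], [0.0, 1.0, 1.0], [1.0, 0.0, 0.5]]

scores_2norm = [distance_to_score(two_norm(cell, p)) for p in atlas]
best_position = max(range(len(atlas)), key=lambda j: scores_2norm[j])
print(scores_2norm, best_position)
```
</preformat>
<p>Any of the three distance functions can be substituted in the list comprehension. Because none of them has trainable parameters, these baselines give a reference point for the learned LMNN and DEEPsc metrics.</p>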
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Summary of the robustness, precision, and accuracy scores of the implemented methods on four different biological systems <bold>(A)</bold>, as well as the simple average across all four <bold>(B)</bold>. These scores are each defined to be one minus the corresponding penalty term in the performance score, so that a higher score is better. Since most methods have near perfect accuracy scores, the x-axis shows a mean of the precision and accuracy scores. The y-axis shows the robustness scores for each method. Due to memory constraints, we were unable to run Seurat v1 on the cortex dataset.</p></caption>
<graphic xlink:href="fgene-12-636743-g003.tif"/>
</fig>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Numerical values of each of the three constituent terms of the performance score, as determined from simulated scRNA-seq data for each biological system, as well as the average across all systems.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Method (Author)</td>
<td valign="top" align="center">Accuracy Term</td>
<td valign="top" align="center">Precision Term</td>
<td valign="top" align="center">Robustness Term</td>
<td valign="top" align="center">Performance Score</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="5"><bold>Follicle</bold></td>
</tr>
<tr>
<td valign="top" align="left">(Achim)</td>
<td valign="top" align="center">0.0043</td>
<td valign="top" align="center">0.3484</td>
<td valign="top" align="center">0.4116</td>
<td valign="top" align="center">0.7452</td>
</tr>
<tr>
<td valign="top" align="left">Seurat v1 (Satija)</td>
<td valign="top" align="center">0.0795</td>
<td valign="top" align="center">0.1076</td>
<td valign="top" align="center">0.5704</td>
<td valign="top" align="center">0.7475</td>
</tr>
<tr>
<td valign="top" align="left">DistMap (Karaiskos)</td>
<td valign="top" align="center">0.0043</td>
<td valign="top" align="center">0.4076</td>
<td valign="top" align="center">0.3723</td>
<td valign="top" align="center">0.7386</td>
</tr>
<tr>
<td valign="top" align="left">(Peng)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.5118</td>
<td valign="top" align="center">0.4439</td>
<td valign="top" align="center">0.6814</td>
</tr>
<tr>
<td valign="top" align="left">2-norm (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.3255</td>
<td valign="top" align="center">0.2686</td>
<td valign="top" align="center">0.8020</td>
</tr>
<tr>
<td valign="top" align="left">Inf-norm (baseline)</td>
<td valign="top" align="center">0.0005</td>
<td valign="top" align="center">0.2299</td>
<td valign="top" align="center">0.3613</td>
<td valign="top" align="center">0.8028</td>
</tr>
<tr>
<td valign="top" align="left">% difference (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.2829</td>
<td valign="top" align="center">0.8722</td>
<td valign="top" align="center">0.6150</td>
</tr>
<tr>
<td valign="top" align="left">LMNN (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center"><bold>0.0002</bold></td>
<td valign="top" align="center">0.8455</td>
<td valign="top" align="center">0.7181</td>
</tr>
<tr>
<td valign="top" align="left">DEEPsc (ours)</td>
<td valign="top" align="center">0.0272</td>
<td valign="top" align="center">0.2684</td>
<td valign="top" align="center"><bold>0.1904</bold></td>
<td valign="top" align="center"><bold>0.8380</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="5"><bold>Zebrafish</bold></td>
</tr>
<tr>
<td valign="top" align="left">(Achim)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.4645</td>
<td valign="top" align="center">0.2516</td>
<td valign="top" align="center">0.7613</td>
</tr>
<tr>
<td valign="top" align="left">Seurat v1 (Satija)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center"><bold>0.0156</bold></td>
<td valign="top" align="center">0.0604</td>
<td valign="top" align="center"><bold>0.9747</bold></td>
</tr>
<tr>
<td valign="top" align="left">DistMap (Karaiskos)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.3989</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.8670</td>
</tr>
<tr>
<td valign="top" align="left">(Peng)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.4296</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.8568</td>
</tr>
<tr>
<td valign="top" align="left">2-norm (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.2902</td>
<td valign="top" align="center">0.0003</td>
<td valign="top" align="center">0.9302</td>
</tr>
<tr>
<td valign="top" align="left">Inf-norm (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.0536</td>
<td valign="top" align="center">0.1588</td>
<td valign="top" align="center">0.9292</td>
</tr>
<tr>
<td valign="top" align="left">% difference (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.4249</td>
<td valign="top" align="center">0.0095</td>
<td valign="top" align="center">0.8552</td>
</tr>
<tr>
<td valign="top" align="left">LMNN (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.0315</td>
<td valign="top" align="center">0.1689</td>
<td valign="top" align="center">0.9332</td>
</tr>
<tr>
<td valign="top" align="left">DEEPsc (ours)</td>
<td valign="top" align="center">0.0339</td>
<td valign="top" align="center">0.1281</td>
<td valign="top" align="center">0.0230</td>
<td valign="top" align="center">0.9383</td>
</tr>
<tr>
<td valign="top" align="left" colspan="5"><bold>Drosophila</bold></td>
</tr>
<tr>
<td valign="top" align="left">(Achim)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.3407</td>
<td valign="top" align="center">0.0759</td>
<td valign="top" align="center">0.8611</td>
</tr>
<tr>
<td valign="top" align="left">Seurat v1 (Satija)</td>
<td valign="top" align="center">0.6605</td>
<td valign="top" align="center">0.0848</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.7516</td>
</tr>
<tr>
<td valign="top" align="left">DistMap (Karaiskos)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.3496</td>
<td valign="top" align="center">0.0024</td>
<td valign="top" align="center">0.8827</td>
</tr>
<tr>
<td valign="top" align="left">(Peng)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.4313</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.8562</td>
</tr>
<tr>
<td valign="top" align="left">2-norm (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.2310</td>
<td valign="top" align="center">0.0130</td>
<td valign="top" align="center">0.9186</td>
</tr>
<tr>
<td valign="top" align="left">Inf-norm (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center"><bold>0.0006</bold></td>
<td valign="top" align="center">0.1671</td>
<td valign="top" align="center">0.9441</td>
</tr>
<tr>
<td valign="top" align="left">% difference (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.3597</td>
<td valign="top" align="center">0.0013</td>
<td valign="top" align="center">0.8797</td>
</tr>
<tr>
<td valign="top" align="left">LMNN (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.0052</td>
<td valign="top" align="center">0.0987</td>
<td valign="top" align="center"><bold>0.9653</bold></td>
</tr>
<tr>
<td valign="top" align="left">DEEPsc (ours)</td>
<td valign="top" align="center">0.0087</td>
<td valign="top" align="center">0.0179</td>
<td valign="top" align="center">0.1827</td>
<td valign="top" align="center">0.9303</td>
</tr>
<tr>
<td valign="top" align="left" colspan="5"><bold>Cortex</bold></td>
</tr>
<tr>
<td valign="top" align="left">(Achim)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.6357</td>
<td valign="top" align="center">0.0859</td>
<td valign="top" align="center">0.7594</td>
</tr>
<tr>
<td valign="top" align="left">Seurat v1 (Satija)</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">DistMap (Karaiskos)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.4778</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.8407</td>
</tr>
<tr>
<td valign="top" align="left">(Peng)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.4400</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.8533</td>
</tr>
<tr>
<td valign="top" align="left">2-norm (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.3008</td>
<td valign="top" align="center">0.1546</td>
<td valign="top" align="center">0.8482</td>
</tr>
<tr>
<td valign="top" align="left">Inf-norm (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center"><bold>0.0006</bold></td>
<td valign="top" align="center">0.3042</td>
<td valign="top" align="center">0.8984</td>
</tr>
<tr>
<td valign="top" align="left">% difference (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.4332</td>
<td valign="top" align="center">0.3817</td>
<td valign="top" align="center">0.7284</td>
</tr>
<tr>
<td valign="top" align="left">LMNN (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.0143</td>
<td valign="top" align="center">0.3376</td>
<td valign="top" align="center">0.8827</td>
</tr>
<tr>
<td valign="top" align="left">DEEPsc (ours)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.1167</td>
<td valign="top" align="center">0.0289</td>
<td valign="top" align="center"><bold>0.9515</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="5"><bold>Average</bold></td>
</tr>
<tr>
<td valign="top" align="left">(Achim)</td>
<td valign="top" align="center">0.0011</td>
<td valign="top" align="center">0.4473</td>
<td valign="top" align="center">0.2063</td>
<td valign="top" align="center">0.7818</td>
</tr>
<tr>
<td valign="top" align="left">Seurat v1 (Satija)</td>
<td valign="top" align="center">0.1850</td>
<td valign="top" align="center">0.0693</td>
<td valign="top" align="center">0.2103</td>
<td valign="top" align="center">0.8246</td>
</tr>
<tr>
<td valign="top" align="left">DistMap (Karaiskos)</td>
<td valign="top" align="center">0.0011</td>
<td valign="top" align="center">0.4085</td>
<td valign="top" align="center"><bold>0.0937</bold></td>
<td valign="top" align="center">0.8323</td>
</tr>
<tr>
<td valign="top" align="left">(Peng)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.4532</td>
<td valign="top" align="center">0.1110</td>
<td valign="top" align="center">0.8119</td>
</tr>
<tr>
<td valign="top" align="left">2-norm (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.2869</td>
<td valign="top" align="center">0.1091</td>
<td valign="top" align="center">0.8748</td>
</tr>
<tr>
<td valign="top" align="left">Inf-norm (baseline)</td>
<td valign="top" align="center">0.0001</td>
<td valign="top" align="center">0.0712</td>
<td valign="top" align="center">0.2479</td>
<td valign="top" align="center">0.8936</td>
</tr>
<tr>
<td valign="top" align="left">% difference (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center">0.3752</td>
<td valign="top" align="center">0.3162</td>
<td valign="top" align="center">0.7696</td>
</tr>
<tr>
<td valign="top" align="left">LMNN (baseline)</td>
<td valign="top" align="center"><bold>0.0000</bold></td>
<td valign="top" align="center"><bold>0.0128</bold></td>
<td valign="top" align="center">0.3627</td>
<td valign="top" align="center">0.8748</td>
</tr>
<tr>
<td valign="top" align="left">DEEPsc (ours)</td>
<td valign="top" align="center">0.0175</td>
<td valign="top" align="center">0.1328</td>
<td valign="top" align="center">0.1063</td>
<td valign="top" align="center"><bold>0.9145</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>For each term, a value closer to zero signifies lower error. For the performance score, a value closer to one indicates a better performing method. The best method for each term is bolded for each system.</italic></attrib>
</table-wrap-foot>
</table-wrap>
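<p>The rows of <xref ref-type="table" rid="T1">Table 1</xref> appear consistent with the performance score being one minus the unweighted mean of the three penalty terms. The short Python sketch below checks this reading against four rows copied from the table. The function name and the tolerance (chosen to absorb four-decimal rounding of the tabulated terms) are assumptions made for illustration.</p>
<preformat>
```python
import math


def performance_score(accuracy_term: float,
                      precision_term: float,
                      robustness_term: float) -> float:
    """One minus the unweighted mean of the three penalty terms."""
    return 1.0 - (accuracy_term + precision_term + robustness_term) / 3.0


# (accuracy, precision, robustness, reported performance score) from Table 1.
rows = {
    "DEEPsc, follicle": (0.0272, 0.2684, 0.1904, 0.8380),
    "Seurat v1, zebrafish": (0.0000, 0.0156, 0.0604, 0.9747),
    "LMNN, Drosophila": (0.0000, 0.0052, 0.0987, 0.9653),
    "DEEPsc, cortex": (0.0000, 0.1167, 0.0289, 0.9515),
}

# True for a row when the recomputed score matches the reported one
# to within rounding of the tabulated values.
agreement = {
    name: math.isclose(performance_score(a, p, r), reported, abs_tol=2e-4)
    for name, (a, p, r, reported) in rows.items()
}
print(agreement)
```
</preformat>
<p>Under this reading, each term contributes equally, so a method with a near-zero precision term can still score poorly if its robustness term is large, as seen for the LMNN baseline on the follicle.</p>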
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Example mappings of simulated single cells produced by various existing methods on four different biological systems, with DEEPsc mappings for comparison. The simulated input cell for the murine follicle system corresponds to position 228. For the zebrafish system (for which Seurat was introduced), the simulated input cell corresponds to position 34. For <italic>Drosophila</italic> (for which DistMap was introduced), the simulated input cell corresponds to position 1982. For the murine frontal cortex, the simulated input cell corresponds to position 458. The known position is highlighted in black in each heatmap.</p></caption>
<graphic xlink:href="fgene-12-636743-g004.tif"/>
</fig>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Heatmap representation of the components of the performance score on a per-position basis in <bold>(A)</bold> the follicle system, <bold>(B)</bold> the zebrafish embryo, <bold>(C)</bold> the <italic>Drosophila</italic> embryo, and <bold>(D)</bold> the murine frontal cortex. We were unable to run Seurat v1 on the <italic>Drosophila</italic> embryo and cortex data due to memory constraints. The penalty terms for each simulated cell, including robustness, were computed individually and plotted as a heatmap.</p></caption>
<graphic xlink:href="fgene-12-636743-g005.tif"/>
</fig>
<p>The majority of methods were able to project the simulated scRNA-seq cells to their known spatial origins with high accuracy. Specifically, Seurat v1 and DistMap achieve high performance scores on the zebrafish embryo and <italic>Drosophila</italic> embryo datasets, respectively, to which they were originally applied. Designed to be a system-adaptive method, DEEPsc has the best average performance score across the four datasets (<xref ref-type="table" rid="T1">Table 1</xref>). Moreover, while some methods are stronger in terms of robustness or precision, DEEPsc attains a balance between the two (<xref ref-type="fig" rid="F3">Figure 3</xref>). This balance is especially important when investigating the impact of cellular spatial neighborhood on cell fate acquisition, where it is desirable both to narrow down the inferred spatial neighborhood (precision) and to reduce the sensitivity to noise (robustness). The high precision and robustness of DEEPsc are consistently observed across all locations in the dataset (<xref ref-type="fig" rid="F5">Figure 5</xref>). Finally, it is worth mentioning that DEEPsc is notably more robust than the other methods on the follicle dataset, which has the smallest number of genes and is the noisiest of the four datasets.</p>
</sec>
<sec id="S2.SS4">
<title>Applications to Real scRNA-seq Datasets</title>
<p>We now map actual scRNA-seq data for each system and calculate the predictive reproducibility for each method (<xref ref-type="table" rid="T2">Table 2</xref> and <xref ref-type="fig" rid="F6">Figure 6</xref>). For the follicle, the scRNA-seq data contains 1,422 cells and 26,024 measured genes, including the eight genes in the spatial atlas (<xref ref-type="bibr" rid="B15">Joost et al., 2016</xref>). For the <italic>Drosophila</italic> embryo, we used the scRNA-seq dataset with 1,297 cells and 8,924 genes, among which all 84 spatial genes are present (<xref ref-type="bibr" rid="B16">Karaiskos et al., 2017</xref>). For the zebrafish embryo, the scRNA-seq dataset contains 1,152 cells and 11,978 genes, including all 47 spatial genes (<xref ref-type="bibr" rid="B37">Satija et al., 2015</xref>). For the murine frontal cortex, we used the scRNA-seq dataset provided by the Allen Institute (<xref ref-type="bibr" rid="B44">Tasic et al., 2016</xref>), generated with SMART-Seq2, which contains 14,249 cells and 34,617 genes, of which 2,755 are among the top 3,000 highly variable genes in the spatial atlas. These four datasets cover different situations. The follicle data contains a moderate number of locations, and the cells form well-defined layered structures, so that there can be long and thin spatial regions that contain the same cells. The zebrafish embryo spatial data has a coarse resolution, so that each spatial location contains multiple cells. This data helps to evaluate how the methods handle coarse spatial atlases. The <italic>Drosophila</italic> embryo data contains rich spatial characteristics. There is a well-defined global ventral-dorsal/anterior-posterior coordinate system. Locally, there is also a striped pattern on the lateral side of the embryo. 
The frontal cortex data measures spatial gene expression at the whole-transcriptome level and serves to demonstrate that DEEPsc can maintain high performance on high-dimensional datasets.</p>
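<p>The gene selection described for the cortex data (keeping the highly variable spatial genes that are also measured in the scRNA-seq data) can be sketched as follows. This is an illustrative Python sketch only: ranking genes by plain variance across positions is an assumed stand-in for whichever highly-variable-gene procedure was actually used, and the function name, gene names, and expression values are hypothetical.</p>
<preformat>
```python
from statistics import pvariance
from typing import Dict, List, Sequence, Set


def common_variable_genes(spatial_expr: Dict[str, Sequence[float]],
                          sc_genes: Set[str],
                          n_top: int) -> List[str]:
    """Keep the n_top most variable spatial genes that also occur in sc_genes.

    spatial_expr maps a gene name to its expression across spatial positions.
    """
    # Rank by decreasing variance; ties are broken by gene name for determinism.
    ranked = sorted(spatial_expr,
                    key=lambda g: (-pvariance(spatial_expr[g]), g))
    top = ranked[:n_top]
    return [g for g in top if g in sc_genes]


# Toy data (hypothetical gene names and values): 4 genes on 4 positions.
spatial_expr = {
    "geneA": [0.0, 1.0, 0.0, 1.0],
    "geneB": [0.5, 0.5, 0.5, 0.5],
    "geneC": [0.0, 0.0, 0.0, 1.0],
    "geneD": [0.2, 0.4, 0.2, 0.4],
}
sc_genes = {"geneA", "geneB", "geneD"}

selected = common_variable_genes(spatial_expr, sc_genes, n_top=3)
print(selected)
```
</preformat>
<p>Because the intersection is taken after the variance ranking, the number of retained genes can be smaller than <italic>n_top</italic>, in the same way that 2,755 of the top 3,000 spatial genes are retained for the cortex.</p>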
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Predictive reproducibility of each method for real scRNA-seq data.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Method (Author)</td>
<td valign="top" align="center">Follicle</td>
<td valign="top" align="center">Zebrafish</td>
<td valign="top" align="center">Drosophila</td>
<td valign="top" align="center">Cortex</td>
<td valign="top" align="center">Average</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="6"><bold>R<sub>sc_zero</sub></bold></td>
</tr>
<tr>
<td valign="top" align="left">(Achim)</td>
<td valign="top" align="center"><bold>0.8772</bold></td>
<td valign="top" align="center">0.5537</td>
<td valign="top" align="center">0.7798</td>
<td valign="top" align="center">0.8019</td>
<td valign="top" align="center">0.7531</td>
</tr>
<tr>
<td valign="top" align="left">Seurat v1 (Satija)</td>
<td valign="top" align="center">0.8335</td>
<td valign="top" align="center">0.6842</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.7589</td>
</tr>
<tr>
<td valign="top" align="left">DistMap (Karaiskos)</td>
<td valign="top" align="center">0.8404</td>
<td valign="top" align="center">0.6641</td>
<td valign="top" align="center">0.7850</td>
<td valign="top" align="center">0.8055</td>
<td valign="top" align="center">0.7738</td>
</tr>
<tr>
<td valign="top" align="left">(Peng)</td>
<td valign="top" align="center">0.8219</td>
<td valign="top" align="center">0.6375</td>
<td valign="top" align="center">0.7859</td>
<td valign="top" align="center">0.8092</td>
<td valign="top" align="center">0.7636</td>
</tr>
<tr>
<td valign="top" align="left">2-norm (baseline)</td>
<td valign="top" align="center">0.8017</td>
<td valign="top" align="center">0.6973</td>
<td valign="top" align="center">0.7874</td>
<td valign="top" align="center">0.8114</td>
<td valign="top" align="center">0.7745</td>
</tr>
<tr>
<td valign="top" align="left">Inf-norm (baseline)</td>
<td valign="top" align="center">0.8641</td>
<td valign="top" align="center">0.6180</td>
<td valign="top" align="center">0.7807</td>
<td valign="top" align="center">0.8141</td>
<td valign="top" align="center">0.7692</td>
</tr>
<tr>
<td valign="top" align="left">% difference (baseline)</td>
<td valign="top" align="center">0.8357</td>
<td valign="top" align="center">0.5657</td>
<td valign="top" align="center">0.7790</td>
<td valign="top" align="center">0.8079</td>
<td valign="top" align="center">0.7471</td>
</tr>
<tr>
<td valign="top" align="left">LMNN (baseline)</td>
<td valign="top" align="center">0.8254</td>
<td valign="top" align="center">0.6795</td>
<td valign="top" align="center">0.7917</td>
<td valign="top" align="center">0.8120</td>
<td valign="top" align="center">0.7772</td>
</tr>
<tr>
<td valign="top" align="left">DEEPsc (ours)</td>
<td valign="top" align="center">0.8344</td>
<td valign="top" align="center"><bold>0.7335</bold></td>
<td valign="top" align="center"><bold>0.7961</bold></td>
<td valign="top" align="center"><bold>0.8165</bold></td>
<td valign="top" align="center"><bold>0.7951</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="6"><bold>R<sub>sc_nonzero</sub></bold></td>
</tr>
<tr>
<td valign="top" align="left">(Achim)</td>
<td valign="top" align="center">0.7495</td>
<td valign="top" align="center">0.7698</td>
<td valign="top" align="center">0.8126</td>
<td valign="top" align="center">0.6693</td>
<td valign="top" align="center">0.7503</td>
</tr>
<tr>
<td valign="top" align="left">Seurat v1 (Satija)</td>
<td valign="top" align="center">0.7640</td>
<td valign="top" align="center">0.6975</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.7308</td>
</tr>
<tr>
<td valign="top" align="left">DistMap (Karaiskos)</td>
<td valign="top" align="center">0.7705</td>
<td valign="top" align="center">0.7619</td>
<td valign="top" align="center">0.8103</td>
<td valign="top" align="center">0.6685</td>
<td valign="top" align="center">0.7528</td>
</tr>
<tr>
<td valign="top" align="left">(Peng)</td>
<td valign="top" align="center">0.7801</td>
<td valign="top" align="center">0.7663</td>
<td valign="top" align="center">0.8114</td>
<td valign="top" align="center">0.6680</td>
<td valign="top" align="center">0.7565</td>
</tr>
<tr>
<td valign="top" align="left">Two-norm (baseline)</td>
<td valign="top" align="center"><bold>0.7891</bold></td>
<td valign="top" align="center">0.7386</td>
<td valign="top" align="center">0.8083</td>
<td valign="top" align="center">0.6667</td>
<td valign="top" align="center">0.7507</td>
</tr>
<tr>
<td valign="top" align="left">Inf-norm (baseline)</td>
<td valign="top" align="center">0.7496</td>
<td valign="top" align="center">0.7636</td>
<td valign="top" align="center"><bold>0.8128</bold></td>
<td valign="top" align="center"><bold>0.6695</bold></td>
<td valign="top" align="center">0.7489</td>
</tr>
<tr>
<td valign="top" align="left">% difference (baseline)</td>
<td valign="top" align="center">0.7740</td>
<td valign="top" align="center"><bold>0.7721</bold></td>
<td valign="top" align="center">0.8115</td>
<td valign="top" align="center">0.6690</td>
<td valign="top" align="center"><bold>0.7567</bold></td>
</tr>
<tr>
<td valign="top" align="left">LMNN (baseline)</td>
<td valign="top" align="center">0.7730</td>
<td valign="top" align="center">0.7477</td>
<td valign="top" align="center">0.8117</td>
<td valign="top" align="center">0.6643</td>
<td valign="top" align="center">0.7492</td>
</tr>
<tr>
<td valign="top" align="left">DEEPsc (ours)</td>
<td valign="top" align="center">0.7352</td>
<td valign="top" align="center">0.7026</td>
<td valign="top" align="center">0.8080</td>
<td valign="top" align="center">0.6691</td>
<td valign="top" align="center">0.7287</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6"><bold>R<sub>atlas_zero</sub></bold></td>
</tr>
<tr>
<td valign="top" align="left">(Achim)</td>
<td valign="top" align="center">0.7680</td>
<td valign="top" align="center">0.9042</td>
<td valign="top" align="center">0.9264</td>
<td valign="top" align="center">0.8360</td>
<td valign="top" align="center">0.8587</td>
</tr>
<tr>
<td valign="top" align="left">Seurat v1 (Satija)</td>
<td valign="top" align="center">0.7681</td>
<td valign="top" align="center">0.9088</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.8385</td>
</tr>
<tr>
<td valign="top" align="left">DistMap (Karaiskos)</td>
<td valign="top" align="center">0.7674</td>
<td valign="top" align="center">0.9005</td>
<td valign="top" align="center">0.9259</td>
<td valign="top" align="center">0.8374</td>
<td valign="top" align="center">0.8578</td>
</tr>
<tr>
<td valign="top" align="left">(Peng)</td>
<td valign="top" align="center">0.7707</td>
<td valign="top" align="center">0.9006</td>
<td valign="top" align="center">0.9267</td>
<td valign="top" align="center">0.8406</td>
<td valign="top" align="center">0.8597</td>
</tr>
<tr>
<td valign="top" align="left">Two-norm (baseline)</td>
<td valign="top" align="center">0.7681</td>
<td valign="top" align="center">0.9003</td>
<td valign="top" align="center">0.9278</td>
<td valign="top" align="center">0.8411</td>
<td valign="top" align="center">0.8593</td>
</tr>
<tr>
<td valign="top" align="left">Inf-norm (baseline)</td>
<td valign="top" align="center">0.7623</td>
<td valign="top" align="center">0.9050</td>
<td valign="top" align="center">0.9259</td>
<td valign="top" align="center">0.8343</td>
<td valign="top" align="center">0.8569</td>
</tr>
<tr>
<td valign="top" align="left">% difference (baseline)</td>
<td valign="top" align="center">0.7714</td>
<td valign="top" align="center">0.9035</td>
<td valign="top" align="center">0.9261</td>
<td valign="top" align="center"><bold>0.8438</bold></td>
<td valign="top" align="center">0.8612</td>
</tr>
<tr>
<td valign="top" align="left">LMNN (baseline)</td>
<td valign="top" align="center">0.7677</td>
<td valign="top" align="center">0.8937</td>
<td valign="top" align="center"><bold>0.9289</bold></td>
<td valign="top" align="center">0.8359</td>
<td valign="top" align="center">0.8566</td>
</tr>
<tr>
<td valign="top" align="left">DEEPsc (ours)</td>
<td valign="top" align="center"><bold>0.7881</bold></td>
<td valign="top" align="center"><bold>0.9148</bold></td>
<td valign="top" align="center">0.9257</td>
<td valign="top" align="center">0.8415</td>
<td valign="top" align="center"><bold>0.8675</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="6"><bold>R<sub>atlas_nonzero</sub></bold></td>
</tr>
<tr>
<td valign="top" align="left">(Achim)</td>
<td valign="top" align="center">0.7598</td>
<td valign="top" align="center">0.6658</td>
<td valign="top" align="center">0.8523</td>
<td valign="top" align="center">0.5124</td>
<td valign="top" align="center">0.6976</td>
</tr>
<tr>
<td valign="top" align="left">Seurat v1 (Satija)</td>
<td valign="top" align="center">0.7570</td>
<td valign="top" align="center">0.6776</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center"><bold>0.7173</bold></td>
</tr>
<tr>
<td valign="top" align="left">DistMap (Karaiskos)</td>
<td valign="top" align="center">0.7584</td>
<td valign="top" align="center">0.6709</td>
<td valign="top" align="center">0.8527</td>
<td valign="top" align="center">0.5127</td>
<td valign="top" align="center">0.6987</td>
</tr>
<tr>
<td valign="top" align="left">(Peng)</td>
<td valign="top" align="center">0.7570</td>
<td valign="top" align="center">0.6682</td>
<td valign="top" align="center">0.8530</td>
<td valign="top" align="center"><bold>0.5135</bold></td>
<td valign="top" align="center">0.6979</td>
</tr>
<tr>
<td valign="top" align="left">Two-norm (baseline)</td>
<td valign="top" align="center">0.7582</td>
<td valign="top" align="center">0.6755</td>
<td valign="top" align="center">0.8530</td>
<td valign="top" align="center">0.5135</td>
<td valign="top" align="center">0.7001</td>
</tr>
<tr>
<td valign="top" align="left">Inf-norm (baseline)</td>
<td valign="top" align="center">0.7583</td>
<td valign="top" align="center">0.6745</td>
<td valign="top" align="center">0.8534</td>
<td valign="top" align="center">0.5130</td>
<td valign="top" align="center">0.6998</td>
</tr>
<tr>
<td valign="top" align="left">% difference (baseline)</td>
<td valign="top" align="center">0.7573</td>
<td valign="top" align="center">0.6669</td>
<td valign="top" align="center">0.8524</td>
<td valign="top" align="center">0.5134</td>
<td valign="top" align="center">0.6975</td>
</tr>
<tr>
<td valign="top" align="left">LMNN (baseline)</td>
<td valign="top" align="center">0.7573</td>
<td valign="top" align="center">0.6764</td>
<td valign="top" align="center"><bold>0.8564</bold></td>
<td valign="top" align="center">0.5129</td>
<td valign="top" align="center">0.7008</td>
</tr>
<tr>
<td valign="top" align="left">DEEPsc (ours)</td>
<td valign="top" align="center"><bold>0.7724</bold></td>
<td valign="top" align="center"><bold>0.7079</bold></td>
<td valign="top" align="center">0.8527</td>
<td valign="top" align="center">0.5125</td>
<td valign="top" align="center">0.7114</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>A value closer to one signifies higher predictive reproducibility. A missing entry signifies that we were not able to run the relevant method on the given dataset. For each term and each system, the best-performing method is shown in bold.</italic></attrib>
</table-wrap-foot>
</table-wrap>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>Ridgeline plots of the zero <bold>(A)</bold> and nonzero <bold>(B)</bold> scRNA-seq predictive reproducibility of individual cells in the scRNA-seq datasets and zero <bold>(C)</bold> and nonzero <bold>(D)</bold> atlas predictive reproducibility of individual positions in the spatial atlas for the four studied systems. We were unable to run Seurat v1 on the Drosophila embryo and cortex data due to memory constraints.</p></caption>
<graphic xlink:href="fgene-12-636743-g006.tif"/>
</fig>
<p>For the baseline models, we linearly normalized each gene in the log-normalized scRNA-seq dataset onto the interval [0,1]. Continuous spatial atlases with expression values in the [0,1] range were used for the follicle, Drosophila embryo, and murine frontal cortex systems, the latter two having been linearly normalized to [0,1] in the same fashion as the scRNA-seq data. Since a continuous spatial atlas for the zebrafish embryo is lacking, we applied a spatial convolution to the binary atlas and added a small amount of Gaussian noise to simulate a continuous atlas. The 2-norm, Inf-norm, percent difference, and LMNN baseline models were then applied to the vectors of the commonly expressed genes in the spatial atlas and scRNA-seq data. For DEEPsc, we first applied a PCA reduction to the spatial atlas, and then applied the same linear transformation to the normalized expression values of the common genes in the scRNA-seq data. The feature vectors in the PCA space for the locations in the spatial atlas and the cells in the scRNA-seq data were then fed to the neural network. For the four existing methods, we followed the procedures described in the associated original publications, scaling the resulting correspondence scores to [0,1] for direct comparison with the baseline methods. For all methods, we computed the predictive reproducibility by iterating over all common genes, attempting to reconstruct the expression of one gene using the k-fold cross-validation scheme described in the previous section. We used <italic>k=4</italic> for the follicle and Drosophila embryo datasets, and <italic>k=5</italic> for the zebrafish embryo and cortex datasets.</p>
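<p>The per-gene linear normalization onto [0,1] used for the baseline models can be sketched as follows. This is a minimal Python illustration, not the code used in this work; the function name <italic>minmax_normalize_genes</italic>, the toy matrix, and the treatment of constant genes (mapped to 0) are assumptions made for the example.</p>
<preformat>
```python
def minmax_normalize_genes(expr):
    """Linearly rescale each gene (row) of a genes-by-cells matrix onto [0, 1]."""
    normalized = []
    for gene_values in expr:
        lo = min(gene_values)
        hi = max(gene_values)
        span = hi - lo
        if span == 0:
            # Assumption: a constant gene carries no information and is mapped to 0.
            normalized.append([0.0 for _ in gene_values])
        else:
            normalized.append([(v - lo) / span for v in gene_values])
    return normalized


# Toy log-normalized expression values (illustrative only).
log_expr = [
    [0.0, 1.0, 2.0, 4.0],  # gene 1 across four cells
    [3.0, 3.0, 3.0, 3.0],  # gene 2 is constant across cells
]
scaled = minmax_normalize_genes(log_expr)
```
</preformat>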
<p>DEEPsc achieves accuracy comparable to that of other methods and performs consistently across different systems (<xref ref-type="table" rid="T2">Table 2</xref> and <xref ref-type="fig" rid="F6">Figure 6</xref>). This consistent performance further demonstrates the system-adaptive advantage of DEEPsc and the benefit of using adaptive metrics over predefined ones. As in the simulated case, DEEPsc also achieves a balance between precision and robustness on real scRNA-seq data. For example, while it exhibits high precision by mapping the example cell to a specific local spot in the zebrafish embryo or a local strip in the Drosophila embryo, it also robustly maps a cell to the entire outer bulge of the follicle instead of only part of it (<xref ref-type="fig" rid="F7">Figure 7</xref>). The high precision ensures that we can resolve the heterogeneity in the spatial environment and relate it to the heterogeneity in cell fates. The high robustness prevents the identification of false correlations. Overall, DEEPsc achieves a high predictive reproducibility across all cells in the scRNA-seq dataset.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p>Example mappings of real single cells produced by various existing methods on four different biological systems, with DEEPsc mappings for comparison. The input cell for the murine follicle system is cell 710 from the Joost dataset. For the Zebrafish system (for which Seurat v1 was introduced), the input cell is cell 877 from the scRNA-seq dataset (<xref ref-type="bibr" rid="B37">Satija et al., 2015</xref>). For Drosophila (for which DistMap was introduced), the input cell is cell 130 from the scRNA-seq dataset (<xref ref-type="bibr" rid="B16">Karaiskos et al., 2017</xref>). For the murine frontal cortex, the input cell is cell 885 from the Allen reference dataset (<xref ref-type="bibr" rid="B44">Tasic et al., 2016</xref>).</p></caption>
<graphic xlink:href="fgene-12-636743-g007.tif"/>
</fig>
</sec>
<sec id="S2.SS5">
<title>Comparison of Dimensionality Reduction Methods</title>
<p>Dimension reduction is a crucial initial step of DEEPsc. Because training and prediction are separate steps, the dimension reduction method must be one that can be fitted on one dataset and then applied deterministically to another. Here, we explore two representative dimension reduction methods, one linear and one nonlinear: PCA and Uniform Manifold Approximation and Projection (UMAP; <xref ref-type="bibr" rid="B28">McInnes et al., 2018</xref>). To compare these two methods, we trained several networks with varying amounts of added noise on the reference atlases of the four studied biological systems (<xref ref-type="fig" rid="F8">Figure 8</xref>). We compared PCA (8 principal components), UMAP30 (n_components = 8, n_neighbors = 30), and UMAP5 (n_components = 8, n_neighbors = 5). On the follicle system all three reduction methods performed virtually identically, whereas on the other three systems PCA outperformed both UMAP variants by achieving a higher robustness score while maintaining similar accuracy.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption><p>A comparison of the performance of DEEPsc networks using different dimensionality reduction methods on each of the biological systems for various levels of added noise during training. We compare principal component analysis (PCA) to Uniform Manifold Approximation and Projection (UMAP) with n_neighbors = 30 (UMAP30) and n_neighbors = 5 (UMAP5). Each of these methods reduces the dimensionality of the initial dataset to n_dimensions = 8. Each score is defined as one minus the corresponding penalty term in the performance score, so that a higher score is better. Since most methods have near perfect accuracy scores, the x-axis shows a mean of the precision and accuracy scores. The y-axis shows the robustness scores for each method.</p></caption>
<graphic xlink:href="fgene-12-636743-g008.tif"/>
</fig>
</sec>
</sec>
<sec id="S3">
<title>Discussion</title>
<p>We have developed the DEEPsc framework, which trains a deep neural network using the known expression levels of a small subset of genes in a spatial context, then imputes that spatial information onto a non-spatial scRNA-seq dataset. Instead of using a predefined metric, DEEPsc learns a metric adapted to the data. This framework is system-adaptive and designed to be robust to noise. Based on our comprehensive performance measure, DEEPsc consistently performs at or above the level of several existing methods across all four biological systems studied herein, including systems for which existing methods were originally developed (<xref ref-type="fig" rid="F3">Figure 3</xref> and <xref ref-type="table" rid="T1">Tables 1</xref>, <xref ref-type="table" rid="T2">2</xref>). While DEEPsc achieves comparable accuracy and precision to other methods, it is significantly more robust to noise.</p>
<p>The source of DEEPsc&#x2019;s ability to perform well across multiple biological systems is likely the generality of its neural network architecture and the multiple checks for robustness employed during training on the reference atlas. The various parameters for training a DEEPsc network, though chosen empirically, appear to translate to multiple systems effectively, so we expect DEEPsc to continue to perform well across more biological systems in future studies.</p>
<p>One notable weakness of DEEPsc is the significant amount of training time required to produce a final mapping. While most existing reference atlas methods simply involve a deterministic calculation to produce a mapping, DEEPsc requires an initial training step, and the training time depends on the number of locations in the spatial atlas. The training process can be accelerated effectively by iterating over only a subset of the possible location pairs. Owing to the dimension reduction step, DEEPsc can still be trained efficiently on datasets with a large number of genes, such as the spatial transcriptomics data on the murine frontal cortex. Although predefined metrics such as the 2-norm and inf-norm perform well in terms of accuracy and precision, they are less robust to noise. This is even more pronounced for LMNN, which tends to amplify small variations. DEEPsc mitigates this drawback by controlling the balance between precision and robustness.</p>
<p>Learning a metric from high-dimensional datasets can be generally useful for the analysis and integration of omics datasets. A direction for future research is to decrease the training time in such a framework by developing a better method for reducing the training set to a small, targeted fraction of relevant examples, particularly for very large atlases such as those derived from spatial transcriptomics assays. Since the size of the training set can increase quadratically with the number of positions in the atlas, a more efficient training pipeline would be beneficial. We have developed a method of sparsifying the training set (section &#x201C;Materials and Methods&#x201D;), so that its size increases only linearly with the number of positions in the atlas, though further improvement may be warranted. The largest atlas studied here was that of Drosophila (<italic>P=3039</italic>), the training of which took several hours even with the sparsified training set. Typical numbers of distinct spatial locations in a spatial transcriptomics dataset can be orders of magnitude larger.</p>
<p>DEEPsc aside, the performance score we have created can serve as a comprehensive measure of mapping performance for future work. The performance score can be calculated for any mapping method that assigns a likelihood of origin for each spatial location, particularly within the reference atlas framework. It does not depend on any specific system or mapping method, and the individual terms that constitute it allow for a detailed analysis and comparison of various methods. Potential improvements might include incorporating some amount of spatial awareness into the calculation. Currently, each spatial position is treated as completely independent of every other spatial position, so the precision term can yield unintuitive results. For example, a method that maps a cell with high probability to two positions on opposite sides of a system, and with low probability everywhere else, is scored as more precise than a method that maps the same cell with high probability to five positions in a tightly clustered, spatially compact region of the system. If the correspondence scores for each high-probability position were weighted by their physical distance from the other high-probability positions, this term might more accurately reflect the intuitive idea of precision. Other improvements might include simplifying the calculation of the robustness term to require fewer intensive calculations.</p>
</sec>
<sec id="S4">
<title>Conclusion</title>
<p>DEEPsc achieves an accuracy comparable to several existing methods while attaining improved precision and robustness. It also performs more consistently across the four biological systems tested, thanks to its system-adaptive design. As spatially resolved gene expression data becomes more readily available, our method will serve as a useful tool to infer spatial origins from non-spatial scRNA-seq data.</p>
<p>Additionally, our comprehensive performance score and the collection of reproductions of previously developed methods in a single software framework will serve as useful tools for future comparisons of spatial mapping methods. This systematic approach to imputing spatial information to scRNA-seq data is crucial to studying the spatial impact on cell fate dynamics.</p>
</sec>
<sec id="S5" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="S5.SS1">
<title>Data Preparation for DEEPsc</title>
<p>Given a matrix of scRNA-seq read counts where each row is a different gene and each column is a different cell, and a matrix representing a spatial reference atlas where each row is a different gene and each column is a different spatial position, we first select common genes by eliminating from each matrix the rows corresponding to genes absent from the other. Once we have eliminated genes not in common, we are left with a matrix of size number of cells (<italic>C</italic>) &#x00D7; number of genes (<italic>G</italic>) for the scRNA-seq data and a matrix of size number of positions (<italic>P</italic>) &#x00D7; number of genes (<italic>G</italic>) for the spatial reference atlas.</p>
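<p>The selection of common genes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the gene names, and the toy values are hypothetical, and the final transposition to a cells &#x00D7; genes and positions &#x00D7; genes layout is our reading of the matrix sizes given in the text.</p>
<preformat>
```python
def select_common_genes(sc_genes, sc_matrix, atlas_genes, atlas_matrix):
    """Keep only rows (genes) present in both matrices, in a shared order."""
    atlas_index = {gene: i for i, gene in enumerate(atlas_genes)}
    common = [gene for gene in sc_genes if gene in atlas_index]
    sc_rows = {gene: row for gene, row in zip(sc_genes, sc_matrix)}
    sc_common = [sc_rows[gene] for gene in common]
    atlas_common = [atlas_matrix[atlas_index[gene]] for gene in common]
    return common, sc_common, atlas_common


def transpose(matrix):
    """Turn a genes-by-samples matrix into a samples-by-genes matrix."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical toy inputs: 3 genes x 2 cells and 3 genes x 3 positions.
sc_genes = ["Krt14", "Sox9", "Lgr5"]
sc_matrix = [[5, 0], [1, 2], [0, 7]]
atlas_genes = ["Lgr5", "Krt14", "Cd34"]
atlas_matrix = [[0.1, 0.9, 0.0], [1.0, 0.2, 0.3], [0.5, 0.5, 0.5]]

common, sc_common, atlas_common = select_common_genes(
    sc_genes, sc_matrix, atlas_genes, atlas_matrix
)
cells_by_genes = transpose(sc_common)         # C x G
positions_by_genes = transpose(atlas_common)  # P x G
```
</preformat>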
<p>We then apply dimensionality reduction to the atlas in the form of a PCA projection, selecting a user-configurable number of principal components to serve as feature vectors. We find in our analysis that keeping the top eight principal components yields satisfactory results. The same PCA coefficients are used to project the scRNA-seq matrix into these principal components. After projection, both matrices are normalized by dividing by the largest element in each, so that the elements are all in [0,1].</p>
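<p>Fitting the PCA on the atlas and reusing the same coefficients for the scRNA-seq data can be sketched as below. This is a minimal illustration that assumes NumPy is available; the function names and the random toy data are hypothetical, centering the scRNA-seq data with the atlas mean is an assumption on our part, and the subsequent rescaling step is omitted.</p>
<preformat>
```python
import numpy as np


def fit_pca(atlas, n_components):
    """Fit PCA on a positions-by-genes atlas; return the mean and the loadings."""
    mean = atlas.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(atlas - mean, full_matrices=False)
    return mean, vt[:n_components].T


def apply_pca(data, mean, loadings):
    """Project samples-by-genes data with a previously fitted transformation."""
    # Assumption: the atlas mean is reused to center any dataset being projected.
    return (data - mean) @ loadings


rng = np.random.default_rng(0)
atlas = rng.random((20, 12))  # toy atlas: 20 positions, 12 common genes
cells = rng.random((5, 12))   # toy scRNA-seq data: 5 cells, same 12 genes

mean, loadings = fit_pca(atlas, n_components=8)
atlas_features = apply_pca(atlas, mean, loadings)
cell_features = apply_pca(cells, mean, loadings)
```
</preformat>
<p>Because the loadings are computed once from the atlas, the projection of the scRNA-seq data is deterministic and does not require refitting.</p>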
<p>For the comparisons in section &#x201C;Comparison of Dimensionality Reduction Methods,&#x201D; we use the UMAP implementation by <xref ref-type="bibr" rid="B29">Meehan et al. (2021)</xref>, found on the MATLAB Central File Exchange at <ext-link ext-link-type="uri" xlink:href="https://www.mathworks.com/matlabcentral/fileexchange/71902">https://www.mathworks.com/matlabcentral/fileexchange/71902</ext-link>. Specifically, we ran the run_umap() function on the spatial reference atlas with n_dimensions = 8 and n_neighbors = 30 or n_neighbors = 5 for UMAP30 and UMAP5, respectively.</p>
</sec>
<sec id="S5.SS2">
<title>Training a DEEPsc Network</title>
<p>To train the DEEPsc network, we use the spatial position feature vectors themselves as simulated scRNA-seq data. The training data is a set of <italic>P</italic><sup>2</sup> vectors of length <italic>2N</italic>, where <italic>N</italic> is the reduced dimensionality of the reference atlas. The first <italic>N</italic> components correspond to a feature vector of one position in the reference atlas (functioning as a simulated cell) and the last <italic>N</italic> components correspond to some other position in the reference atlas. Each simulated cell is compared pairwise with every position in the spatial reference atlas; if the simulated cell is an exact match to a given position, the target output is chosen to be 1 (a high likelihood of origin), and if the simulated cell and chosen position are not an exact match, the target output is chosen to be 0 (a low likelihood of origin).</p>
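<p>The construction of the pairwise training data can be sketched as follows. The function name and toy feature vectors are hypothetical, and we assume that an &#x201C;exact match&#x201D; means the simulated cell and the position share the same index in the atlas.</p>
<preformat>
```python
def build_training_pairs(features):
    """Pair every position (as a simulated cell) with every position.

    Returns P*P input vectors of length 2N and the matching 0/1 targets.
    """
    inputs, targets = [], []
    for i, simulated_cell in enumerate(features):
        for j, position in enumerate(features):
            # First N entries: simulated cell; last N entries: candidate position.
            inputs.append(list(simulated_cell) + list(position))
            # Assumption: a match is defined by identical atlas index.
            targets.append(1 if i == j else 0)
    return inputs, targets


# Toy atlas with P = 3 positions and N = 2 reduced features.
features = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5]]
inputs, targets = build_training_pairs(features)
```
</preformat>
<p>For this toy atlas the result is nine input vectors of length four, of which three are exact matches, which already shows how non-matches come to dominate as the number of positions grows.</p>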
<p>The DEEPsc architecture is an artificial neural network with <italic>2N</italic> inputs, two fully connected hidden layers with <italic>N</italic> nodes each and a single node in the output layer. Sigmoid activation functions are attached to each node, including the output node, so that the resulting output is in [0,1] and can be interpreted as a likelihood that the input cell originated from the input spatial position. To preserve robustness and avoid overfitting the training data, a layer of Gaussian noise is added to the simulated cells so that the network is pushed to learn complex nonlinear relationships among the spatial positions in the reference atlas rather than simply activate when an exact match is encountered. This Gaussian noise layer allows the user to configure the standard deviation of the added noise, as well as to configure the probability that any noise will be added in a given training epoch. We find empirically that a noise level of about 0.10 and a probability of 0.5 yield reasonable robustness to noise, though this may vary from system to system.</p>
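<p>A forward pass through a network of this shape, together with the noise step, can be sketched in plain Python. This is not the MATLAB implementation: all function names are hypothetical, the uniform weight initialization is a placeholder, not the Glorot scheme, and deciding once per call whether to add noise is a simplification of the per-epoch behavior described above.</p>
<preformat>
```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_sigmoid(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid on every node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_out, n_in, rng):
    # Placeholder initialization for illustration only.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(pair, layers):
    """Map a length-2N input pair to a single likelihood in (0, 1)."""
    activation = pair
    for weights, biases in layers:
        activation = dense_sigmoid(activation, weights, biases)
    return activation[0]


def add_gaussian_noise(cell, sigma, probability, rng):
    """With the given probability, perturb a simulated cell with Gaussian noise."""
    if rng.random() >= probability:
        return list(cell)
    return [v + rng.gauss(0.0, sigma) for v in cell]


rng = random.Random(0)
N = 8
# 2N inputs, two hidden layers of N nodes, one output node.
layers = [random_layer(N, 2 * N, rng), random_layer(N, N, rng), random_layer(1, N, rng)]
pair = [rng.random() for _ in range(2 * N)]
noisy_cell = add_gaussian_noise(pair[:N], sigma=0.10, probability=1.0, rng=rng)
score = forward(noisy_cell + pair[N:], layers)
```
</preformat>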
<p>Since the training data will naturally consist of many more non-matches than matches, and thus the target data will contain many more zeros than ones, we use a novel custom objective function,</p>
<disp-formula id="S5.Ex2">
<mml:math id="M2">
<mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:munderover>
<mml:mo movablelimits="false">&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>P</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2062;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mn>1.001</mml:mn>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>y</italic><sub><italic>i</italic></sub> is the network&#x2019;s predicted output and <italic>t</italic><sub><italic>i</italic></sub> is the target output (<italic>t</italic><sub><italic>i</italic></sub> = 1 if exact match and <italic>t</italic><sub><italic>i</italic></sub> = 0 if not), to more heavily penalize the network when it gives a false negative (low likelihood when it should be high) than when it gives a false positive (high likelihood when it should be low). This counteracts the tendency of the network to &#x201C;learn&#x201D; to simply return 0 for every input and &#x201C;ignore&#x201D; the comparatively rare training data with <italic>t</italic><sub><italic>i</italic></sub> = 1.</p>
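<p>To make the asymmetry concrete: for a target of 1 the weight is 1/(1.001 &#x2212; 1) = 1000, whereas for a target of 0 it is 1/1.001, or roughly 0.999. A short Python sketch of the objective (the function name is hypothetical, and this is not the custom MATLAB layer itself) reproduces these numbers.</p>
<preformat>
```python
def weighted_squared_error(outputs, targets, offset=1.001):
    """Sum of squared errors, each weighted by 1 / (offset - t_i)."""
    return sum((y - t) ** 2 / (offset - t) for y, t in zip(outputs, targets))


# Worst-case false negative: the network outputs 0 for an exact match.
false_negative = weighted_squared_error([0.0], [1.0])
# Worst-case false positive: the network outputs 1 for a non-match.
false_positive = weighted_squared_error([1.0], [0.0])
```
</preformat>
<p>Here the false negative contributes about 1000 to the loss and the false positive about 0.999, so a single missed match costs roughly as much as a thousand spurious ones.</p>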
<p>To further account for the sparsity of exact matches in the training data, we split it into a training set and a validation set, the former consisting of a configurable fraction of the inputs corresponding to exact matches as well as a configurable multiple of that number of inputs corresponding to non-matches. If <italic>trainFrac = 0.9</italic> and <italic>trainingMultiple = 99</italic>, for example, 90% of the exact matches will be added to the training set along with 99 times as many non-matches, so that the exact matches make up 1% of the training set. The rest of the inputs are reserved for the (generally much larger) validation set. This reduces training time because it allows us to train with a much smaller fraction of the <italic>P</italic><sup>2</sup> input vectors, giving preference to the exact matches. Indeed, the size of the actual training set then scales linearly with the size of the atlas rather than quadratically.</p>
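<p>The split can be sketched as follows. The function name, the random sampling of non-matches, and the fixed seed are assumptions made for illustration; the parameter names mirror <italic>trainFrac</italic> and <italic>trainingMultiple</italic> from the text.</p>
<preformat>
```python
import random


def sparsify_training_set(targets, train_frac, training_multiple, seed=0):
    """Return (train_indices, validation_indices) into the list of P*P pairs."""
    rng = random.Random(seed)
    matches = [i for i, t in enumerate(targets) if t == 1]
    non_matches = [i for i, t in enumerate(targets) if t == 0]

    n_match_train = round(train_frac * len(matches))
    # Never request more non-matches than exist.
    n_non_match_train = min(training_multiple * n_match_train, len(non_matches))

    # Assumption: both groups are sampled uniformly at random.
    train = rng.sample(matches, n_match_train) + rng.sample(non_matches, n_non_match_train)
    train_set = set(train)
    validation = [i for i in range(len(targets)) if i not in train_set]
    return sorted(train), validation


P = 300
# Targets for all P*P pairs: 1 on the diagonal (exact match), 0 elsewhere.
targets = [1 if i == j else 0 for i in range(P) for j in range(P)]
train_idx, val_idx = sparsify_training_set(targets, train_frac=0.9, training_multiple=99)
```
</preformat>
<p>With these settings the training set contains 90<italic>P</italic> pairs (here 27,000 of the 90,000 available), so its size grows linearly in <italic>P</italic> while exact matches make up 1% of it.</p>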
<p>Training is performed in MATLAB using the <italic>trainNetwork()</italic> function in the Deep Learning Toolbox (<xref ref-type="bibr" rid="B45">The Mathworks, Inc, 2019a</xref>), for which we implemented the above-described custom network layers. Since the input data is already normalized in preprocessing, we disable the default normalization of <italic>trainNetwork()</italic>. We use the default Glorot (<xref ref-type="bibr" rid="B52">Xavier and Yoshua, 2010</xref>) initialization of weights and biases in the fully connected layers. We then train each network for a maximum of 50,000 epochs with a learning rate of &#x03B7;=0.01, shuffling the order of the data each epoch and using the ADAM optimization method (<xref ref-type="bibr" rid="B18">Kingma and Ba, 2014</xref>) with the default parameters &#x03B2;<sub>1</sub> = 0.9, &#x03B2;<sub>2</sub> = 0.999, and <italic>&#x03B5;</italic> = 10<sup>&#x2212;8</sup>. In addition to the custom objective function layer described above, <italic>trainNetwork()</italic> by default adds an <italic>L</italic><sup>2</sup>-regularization term to the loss with a regularization factor of &#x03BB;=0.0001. We monitor the RMSE of the validation set throughout training and manually stop training if it is no longer improving before the maximum number of epochs has been reached. The <italic>trainNetwork()</italic> function also allows for parallel computation via the Parallel Computing Toolbox (<xref ref-type="bibr" rid="B46">The Mathworks, Inc, 2019b</xref>), which is highly recommended but not strictly required for training.</p>
</sec>
<sec id="S5.SS3">
<title>Creating a Reference Atlas for the Murine Follicle</title>
<p>To create a spatial reference atlas for the murine follicle system, we based the spatial coordinates of each position in the atlas on a standard diagram of a mouse follicle found in Figure 1 of <xref ref-type="bibr" rid="B15">Joost et al. (2016)</xref>. We constructed a Voronoi diagram around the cell centers and manually adjusted the vertices where we judged it aesthetically appropriate. We then selected the eight genes in the atlas from the systematic staining catalog made available by Joost et al. We chose the genes based on a combination of high image quality and spatial diversity. Gene expression levels in [0,1] were chosen manually to best represent the images, though to eliminate any implicit bias we also added a small level of Gaussian noise to the atlas. For all methods requiring a binary atlas, we chose a cutoff of 0.2 to represent &#x201C;on&#x201D; expression in this atlas.</p>
</sec>
<sec id="S5.SS4">
<title>Large Margin Nearest Neighbor Baseline</title>
<p>To implement an LMNN baseline for benchmarking comparison, we used code from the MATLAB Toolbox for Dimensionality Reduction found at <ext-link ext-link-type="uri" xlink:href="https://lvdmaaten.github.io/drtoolbox/">https://lvdmaaten.github.io/drtoolbox/</ext-link> and modified it for our uses. Specifically, we used the <italic>lmnn()</italic> function in the &#x201C;techniques&#x201D; subfolder, and modified the code to set <italic>mu = 1</italic>, i.e., to remove the &#x201C;pull&#x201D; term, to set the number of targets to 1 (the point itself), and to treat all other points as imposters. Further, we modified the slack variables to enforce a minimum separation of <inline-formula><mml:math id="INEQ38"><mml:msqrt><mml:mi>D</mml:mi></mml:msqrt></mml:math></inline-formula>, where <italic>D</italic> is the dimensionality of the space (<italic>D=G</italic> for our applications). For the numerical experiments of the LMNN method with the cortex dataset, a PCA dimension reduction (50 PCs) was performed before applying LMNN to accommodate the large number of genes.</p>
</sec>
</sec>
<sec id="S6">
<title>Data Availability Statement</title>
<p>The original data used in this paper can be accessed through the following links: (1) zebrafish embryo spatial data: downloaded from (<ext-link ext-link-type="uri" xlink:href="https://www.dropbox.com/s/ev78jelev0jgu5s/seurat_files_zfin.zip?dl=1">https://www.dropbox.com/s/ev78jelev0jgu5s/seurat_files_zfin.zip?dl=1</ext-link>) (<xref ref-type="bibr" rid="B37">Satija et al., 2015</xref>); (2) zebrafish embryo scRNA-seq data: GEO accession code <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="GSE66688">GSE66688</ext-link> (<xref ref-type="bibr" rid="B37">Satija et al., 2015</xref>); (3) Drosophila embryo spatial and scRNA-seq data: accessible at the DREAM Single Cell Transcriptomics Challenge through Synapse ID (syn15665609) (<xref ref-type="bibr" rid="B16">Karaiskos et al., 2017</xref>); (4) mouse frontal cortex spatial data: downloaded from 10x Genomics Spatial Gene Expression Datasets (<ext-link ext-link-type="uri" xlink:href="https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Mouse_Brain_Sagittal_Anterior">https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Mouse_Brain_Sagittal_Anterior</ext-link>); (5) mouse frontal cortex scRNA-seq data: downloaded from (<ext-link ext-link-type="uri" xlink:href="https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1">https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1</ext-link>) (<xref ref-type="bibr" rid="B44">Tasic et al., 2016</xref>); (6) follicle scRNA-seq data and spatial imaging data from which the reference atlas was derived: downloaded from the supplementary material of the associated publication (<xref ref-type="bibr" rid="B15">Joost et al., 2016</xref>). All software developed for the purposes of this comparison is made freely available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/fmaseda/DEEPsc">https://github.com/fmaseda/DEEPsc</ext-link>.</p>
</sec>
<sec id="S7">
<title>Author Contributions</title>
<p>FM carried out computer implementation and data analysis. ZC and QN supervised the project. FM and ZC wrote the original manuscript. All authors conceived and designed the work, interpreted the simulation results, and contributed to the writing of the final manuscript.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> The work was partially supported by the NIH grants U01AR073159 and P30AR075047, NSF grants DMS1763272 and MCB2028424, and a grant from the Simons Foundation (594598, QN).</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Achim</surname> <given-names>K.</given-names></name> <name><surname>Pettit</surname> <given-names>J.-B.</given-names></name> <name><surname>Saraiva</surname> <given-names>L. R.</given-names></name> <name><surname>Gavriouchkina</surname> <given-names>D.</given-names></name> <name><surname>Larsson</surname> <given-names>T.</given-names></name> <name><surname>Arendt</surname> <given-names>D.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>33</volume> <fpage>503</fpage>&#x2013;<lpage>509</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.3209</pub-id> <pub-id pub-id-type="pmid">25867922</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andersson</surname> <given-names>A.</given-names></name> <name><surname>Bergenstr&#x00E5;hle</surname> <given-names>J.</given-names></name> <name><surname>Asp</surname> <given-names>M.</given-names></name> <name><surname>Bergenstr&#x00E5;hle</surname> <given-names>L.</given-names></name> <name><surname>Jurek</surname> <given-names>A.</given-names></name> <name><surname>Navarro</surname> <given-names>J. F.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography.</article-title> <source><italic>Commun. Biol.</italic></source> <volume>3</volume>:<issue>565</issue>. <pub-id pub-id-type="doi">10.1038/s42003-020-01247-y</pub-id> <pub-id pub-id-type="pmid">33037292</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boufea</surname> <given-names>K.</given-names></name> <name><surname>Seth</surname> <given-names>S.</given-names></name> <name><surname>Batada</surname> <given-names>N. N.</given-names></name></person-group> (<year>2020</year>). <article-title>scID uses discriminant analysis to identify transcriptionally equivalent cell types across single-cell RNA-seq data with batch effect.</article-title> <source><italic>iScience</italic></source> <volume>23</volume>:<issue>100914</issue>. <pub-id pub-id-type="doi">10.1016/j.isci.2020.100914</pub-id> <pub-id pub-id-type="pmid">32151972</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Butler</surname> <given-names>A.</given-names></name> <name><surname>Hoffman</surname> <given-names>P.</given-names></name> <name><surname>Smibert</surname> <given-names>P.</given-names></name> <name><surname>Papalexi</surname> <given-names>E.</given-names></name> <name><surname>Satija</surname> <given-names>R.</given-names></name></person-group> (<year>2018</year>). <article-title>Integrating single-cell transcriptomic data across different conditions, technologies, and species.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>36</volume> <fpage>411</fpage>&#x2013;<lpage>420</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.4096</pub-id> <pub-id pub-id-type="pmid">29608179</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cang</surname> <given-names>Z.</given-names></name> <name><surname>Nie</surname> <given-names>Q.</given-names></name></person-group> (<year>2020</year>). <article-title>Inferring spatial and signaling relationships between cells from single cell transcriptomic data.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>11</volume>:<issue>2084</issue>. <pub-id pub-id-type="doi">10.1038/s41467-020-15968-5</pub-id> <pub-id pub-id-type="pmid">32350282</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chicco</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>Siamese neural networks: an overview.</article-title> <source><italic>Methods Mol. Biol.</italic></source> <volume>2190</volume> <fpage>73</fpage>&#x2013;<lpage>94</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-0716-0826-5_3</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Codeluppi</surname> <given-names>S.</given-names></name> <name><surname>Borm</surname> <given-names>L. E.</given-names></name> <name><surname>Zeisel</surname> <given-names>A.</given-names></name> <name><surname>La Manno</surname> <given-names>G.</given-names></name> <name><surname>Van Lunteren</surname> <given-names>J. A.</given-names></name> <name><surname>Svensson</surname> <given-names>C. I.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Spatial organization of the somatosensory cortex revealed by osmFISH.</article-title> <source><italic>Nat. Methods</italic></source> <volume>15</volume> <fpage>932</fpage>&#x2013;<lpage>935</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-018-0175-z</pub-id> <pub-id pub-id-type="pmid">30377364</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dries</surname> <given-names>R.</given-names></name> <name><surname>Zhu</surname> <given-names>Q.</given-names></name> <name><surname>Eng</surname> <given-names>C.-H. L.</given-names></name> <name><surname>Sarkar</surname> <given-names>A.</given-names></name> <name><surname>Bao</surname> <given-names>F.</given-names></name> <name><surname>George</surname> <given-names>R. E.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Giotto, a pipeline for integrative analysis and visualization of single-cell spatial transcriptomic data.</article-title> <source><italic>bioRxiv</italic></source> <comment>[preprint]</comment>. <pub-id pub-id-type="doi">10.1101/701680</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eng</surname> <given-names>C.-H. L.</given-names></name> <name><surname>Lawson</surname> <given-names>M.</given-names></name> <name><surname>Zhu</surname> <given-names>Q.</given-names></name> <name><surname>Dries</surname> <given-names>R.</given-names></name> <name><surname>Koulena</surname> <given-names>N.</given-names></name> <name><surname>Takei</surname> <given-names>Y.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+.</article-title> <source><italic>Nature</italic></source> <volume>568</volume> <fpage>235</fpage>&#x2013;<lpage>239</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-1049-y</pub-id> <pub-id pub-id-type="pmid">30911168</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fowlkes</surname> <given-names>C. C.</given-names></name> <name><surname>Hendriks</surname> <given-names>C. L. L.</given-names></name> <name><surname>Ker&#x00E4;nen</surname> <given-names>S. V. E.</given-names></name> <name><surname>Weber</surname> <given-names>G. H.</given-names></name> <name><surname>R&#x00FC;bel</surname> <given-names>O.</given-names></name> <name><surname>Huang</surname> <given-names>M.-Y.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>A quantitative spatiotemporal atlas of gene expression in the <italic>Drosophila</italic> blastoderm.</article-title> <source><italic>Cell</italic></source> <volume>133</volume> <fpage>364</fpage>&#x2013;<lpage>374</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2008.01.053</pub-id> <pub-id pub-id-type="pmid">18423206</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>G.</given-names></name> <name><surname>Huss</surname> <given-names>M.</given-names></name> <name><surname>Tong</surname> <given-names>G. Q.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Li Sun</surname> <given-names>L.</given-names></name> <name><surname>Clarke</surname> <given-names>N. D.</given-names></name><etal/></person-group> (<year>2010</year>). <article-title>Resolution of cell fate decisions revealed by single-cell gene expression analysis from zygote to blastocyst.</article-title> <source><italic>Dev. Cell</italic></source> <volume>18</volume> <fpage>675</fpage>&#x2013;<lpage>685</lpage>. <pub-id pub-id-type="doi">10.1016/j.devcel.2010.02.012</pub-id> <pub-id pub-id-type="pmid">20412781</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Halpern</surname> <given-names>K. B.</given-names></name> <name><surname>Shenhav</surname> <given-names>R.</given-names></name> <name><surname>Matcovitch-Natan</surname> <given-names>O.</given-names></name> <name><surname>T&#x00F3;th</surname> <given-names>B.</given-names></name> <name><surname>Lemze</surname> <given-names>D.</given-names></name> <name><surname>Golan</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Single-cell spatial reconstruction reveals global division of labour in the mammalian liver.</article-title> <source><italic>Nature</italic></source> <volume>542</volume> <fpage>352</fpage>&#x2013;<lpage>356</lpage>. <pub-id pub-id-type="doi">10.1038/nature21065</pub-id> <pub-id pub-id-type="pmid">28166538</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hie</surname> <given-names>B.</given-names></name> <name><surname>Bryson</surname> <given-names>B.</given-names></name> <name><surname>Berger</surname> <given-names>B.</given-names></name></person-group> (<year>2019</year>). <article-title>Efficient integration of heterogeneous single-cell transcriptomes using Scanorama.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>37</volume> <fpage>685</fpage>&#x2013;<lpage>691</lpage>. <pub-id pub-id-type="doi">10.1038/s41587-019-0113-3</pub-id> <pub-id pub-id-type="pmid">31061482</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Hu</surname> <given-names>G.</given-names></name> <name><surname>Lyu</surname> <given-names>Y.</given-names></name> <name><surname>Susztak</surname> <given-names>K.</given-names></name> <name><surname>Li</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>Iterative transfer learning with neural network for clustering and cell type classification in single-cell RNA-seq analysis.</article-title> <source><italic>bioRxiv</italic></source> <comment>[preprint]</comment>. <pub-id pub-id-type="doi">10.1038/s42256-020-00233-7</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Joost</surname> <given-names>S.</given-names></name> <name><surname>Zeisel</surname> <given-names>A.</given-names></name> <name><surname>Jacob</surname> <given-names>T.</given-names></name> <name><surname>Sun</surname> <given-names>X.</given-names></name> <name><surname>La Manno</surname> <given-names>G.</given-names></name> <name><surname>L&#x00F6;nnerberg</surname> <given-names>P.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Single-cell transcriptomics reveals that differentiation and spatial signatures shape epidermal and hair follicle heterogeneity.</article-title> <source><italic>Cell Syst.</italic></source> <volume>3</volume> <fpage>221</fpage>&#x2013;<lpage>237.e9</lpage>. <pub-id pub-id-type="doi">10.1016/j.cels.2016.08.010</pub-id> <pub-id pub-id-type="pmid">27641957</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Karaiskos</surname> <given-names>N.</given-names></name> <name><surname>Wahle</surname> <given-names>P.</given-names></name> <name><surname>Alles</surname> <given-names>J.</given-names></name> <name><surname>Boltengagen</surname> <given-names>A.</given-names></name> <name><surname>Ayoub</surname> <given-names>S.</given-names></name> <name><surname>Kipar</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>The <italic>Drosophila</italic> embryo at single-cell transcriptome resolution.</article-title> <source><italic>Science</italic></source> <volume>358</volume> <fpage>194</fpage>&#x2013;<lpage>199</lpage>. <pub-id pub-id-type="doi">10.1126/science.aan3235</pub-id> <pub-id pub-id-type="pmid">28860209</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaya</surname> <given-names>M.</given-names></name> <name><surname>Bilge</surname> <given-names>H.</given-names></name></person-group> (<year>2019</year>). <article-title>Deep metric learning: a survey.</article-title> <source><italic>Symmetry</italic></source> <volume>11</volume>:<issue>1066</issue>. <pub-id pub-id-type="doi">10.3390/sym11091066</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Adam: a method for stochastic optimization.</article-title> <source><italic>arXiv</italic></source> <comment>[preprint]</comment> <volume>arXiv</volume>:<issue>1412.6980</issue>.</citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiselev</surname> <given-names>V. Y.</given-names></name> <name><surname>Andrews</surname> <given-names>T. S.</given-names></name> <name><surname>Hemberg</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Challenges in unsupervised clustering of single-cell RNA-seq data.</article-title> <source><italic>Nat. Rev. Genet.</italic></source> <volume>20</volume> <fpage>273</fpage>&#x2013;<lpage>282</lpage>. <pub-id pub-id-type="doi">10.1038/s41576-018-0088-9</pub-id> <pub-id pub-id-type="pmid">30617341</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiselev</surname> <given-names>V. Y.</given-names></name> <name><surname>Yiu</surname> <given-names>A.</given-names></name> <name><surname>Hemberg</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>scmap: projection of single-cell RNA-seq data across data sets.</article-title> <source><italic>Nat. Methods</italic></source> <volume>15</volume> <fpage>359</fpage>&#x2013;<lpage>362</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.4644</pub-id> <pub-id pub-id-type="pmid">29608555</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Korsunsky</surname> <given-names>I.</given-names></name> <name><surname>Millard</surname> <given-names>N.</given-names></name> <name><surname>Fan</surname> <given-names>J.</given-names></name> <name><surname>Slowikowski</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>F.</given-names></name> <name><surname>Wei</surname> <given-names>K.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Fast, sensitive and accurate integration of single-cell data with Harmony.</article-title> <source><italic>Nat. Methods</italic></source> <volume>16</volume> <fpage>1289</fpage>&#x2013;<lpage>1296</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-019-0619-0</pub-id> <pub-id pub-id-type="pmid">31740819</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kulis</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>Metric learning: a survey.</article-title> <source><italic>Found. Trends<sup>&#x00AE;</sup> Mach. Learn.</italic></source> <volume>5</volume> <fpage>287</fpage>&#x2013;<lpage>364</lpage>. <pub-id pub-id-type="doi">10.1561/2200000019</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lieberman</surname> <given-names>Y.</given-names></name> <name><surname>Rokach</surname> <given-names>L.</given-names></name> <name><surname>Shay</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>CaSTLe &#x2013; Classification of single cells by transfer learning: harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments.</article-title> <source><italic>PLoS One</italic></source> <volume>13</volume>:<issue>e0205499</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0205499</pub-id> <pub-id pub-id-type="pmid">30304022</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lopez</surname> <given-names>R.</given-names></name> <name><surname>Nazaret</surname> <given-names>A.</given-names></name> <name><surname>Langevin</surname> <given-names>M.</given-names></name> <name><surname>Samaran</surname> <given-names>J.</given-names></name> <name><surname>Regier</surname> <given-names>J.</given-names></name> <name><surname>Jordan</surname> <given-names>M. I.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements.</article-title> <source><italic>arXiv</italic></source> <comment>[preprint]</comment> <volume>arXiv</volume>:<issue>1905.02269</issue>.</citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lopez</surname> <given-names>R.</given-names></name> <name><surname>Regier</surname> <given-names>J.</given-names></name> <name><surname>Cole</surname> <given-names>M. B.</given-names></name> <name><surname>Jordan</surname> <given-names>M. I.</given-names></name> <name><surname>Yosef</surname> <given-names>N.</given-names></name></person-group> (<year>2018</year>). <article-title>Deep generative modeling for single-cell transcriptomics.</article-title> <source><italic>Nat. Methods</italic></source> <volume>15</volume> <fpage>1053</fpage>&#x2013;<lpage>1058</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-018-0229-2</pub-id> <pub-id pub-id-type="pmid">30504886</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luecken</surname> <given-names>M. D.</given-names></name> <name><surname>Theis</surname> <given-names>F. J.</given-names></name></person-group> (<year>2019</year>). <article-title>Current best practices in single-cell RNA-seq analysis: a tutorial.</article-title> <source><italic>Mol. Syst. Biol.</italic></source> <volume>15</volume>:<issue>e8746</issue>. <pub-id pub-id-type="doi">10.15252/msb.20188746</pub-id> <pub-id pub-id-type="pmid">31217225</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>F.</given-names></name> <name><surname>Pellegrini</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>ACTINN: automated identification of cell types in single cell RNA sequencing.</article-title> <source><italic>Bioinformatics</italic></source> <volume>36</volume> <fpage>533</fpage>&#x2013;<lpage>538</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz592</pub-id> <pub-id pub-id-type="pmid">31359028</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McInnes</surname> <given-names>L.</given-names></name> <name><surname>Healy</surname> <given-names>J.</given-names></name> <name><surname>Melville</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <source><italic>UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="https://ui.adsabs.harvard.edu/abs/2018arXiv180203426M">https://ui.adsabs.harvard.edu/abs/2018arXiv180203426M</ext-link> <comment>(accessed February 01, 2018)</comment>.</citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meehan</surname> <given-names>C.</given-names></name> <name><surname>Ebrahimian</surname> <given-names>J.</given-names></name> <name><surname>Moore</surname> <given-names>W.</given-names></name> <name><surname>Meehan</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <source><italic>Uniform Manifold Approximation and Projection (UMAP). MATLAB Central File Exchange</italic>.</source> Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.mathworks.com/matlabcentral/fileexchange/71902">https://www.mathworks.com/matlabcentral/fileexchange/71902</ext-link></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moffitt</surname> <given-names>J. R.</given-names></name> <name><surname>Bambah-Mukku</surname> <given-names>D.</given-names></name> <name><surname>Eichhorn</surname> <given-names>S. W.</given-names></name> <name><surname>Vaughn</surname> <given-names>E.</given-names></name> <name><surname>Shekhar</surname> <given-names>K.</given-names></name> <name><surname>Perez</surname> <given-names>J. D.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region.</article-title> <source><italic>Science</italic></source> <volume>362</volume>:<issue>eaau5324</issue>. <pub-id pub-id-type="doi">10.1126/science.aau5324</pub-id> <pub-id pub-id-type="pmid">30385464</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nitzan</surname> <given-names>M.</given-names></name> <name><surname>Karaiskos</surname> <given-names>N.</given-names></name> <name><surname>Friedman</surname> <given-names>N.</given-names></name> <name><surname>Rajewsky</surname> <given-names>N.</given-names></name></person-group> (<year>2019</year>). <article-title>Gene expression cartography.</article-title> <source><italic>Nature</italic></source> <volume>576</volume> <fpage>132</fpage>&#x2013;<lpage>137</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-1773-3</pub-id> <pub-id pub-id-type="pmid">31748748</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pandey</surname> <given-names>S.</given-names></name> <name><surname>Shekhar</surname> <given-names>K.</given-names></name> <name><surname>Regev</surname> <given-names>A.</given-names></name> <name><surname>Schier</surname> <given-names>A. F.</given-names></name></person-group> (<year>2018</year>). <article-title>Comprehensive identification and spatial mapping of habenular neuronal types using single-cell RNA-seq.</article-title> <source><italic>Curr. Biol.</italic></source> <volume>28</volume> <fpage>1052</fpage>&#x2013;<lpage>1065.e7</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2018.02.040</pub-id> <pub-id pub-id-type="pmid">29576475</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>G.</given-names></name> <name><surname>Suo</surname> <given-names>S.</given-names></name> <name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>Liu</surname> <given-names>C.</given-names></name> <name><surname>Yu</surname> <given-names>F.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Spatial transcriptome for the molecular annotation of lineage fates and cell identity in mid-gastrula mouse embryo.</article-title> <source><italic>Dev. Cell</italic></source> <volume>36</volume> <fpage>681</fpage>&#x2013;<lpage>697</lpage>. <pub-id pub-id-type="doi">10.1016/j.devcel.2016.02.020</pub-id> <pub-id pub-id-type="pmid">27003939</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Puram</surname> <given-names>S. V.</given-names></name> <name><surname>Tirosh</surname> <given-names>I.</given-names></name> <name><surname>Parikh</surname> <given-names>A. S.</given-names></name> <name><surname>Patel</surname> <given-names>A. P.</given-names></name> <name><surname>Yizhak</surname> <given-names>K.</given-names></name> <name><surname>Gillespie</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer.</article-title> <source><italic>Cell</italic></source> <volume>171</volume> <fpage>1611</fpage>&#x2013;<lpage>1624.e24</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2017.10.044</pub-id> <pub-id pub-id-type="pmid">29198524</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rodriques</surname> <given-names>S. G.</given-names></name> <name><surname>Stickels</surname> <given-names>R. R.</given-names></name> <name><surname>Goeva</surname> <given-names>A.</given-names></name> <name><surname>Martin</surname> <given-names>C. A.</given-names></name> <name><surname>Murray</surname> <given-names>E.</given-names></name> <name><surname>Vanderburg</surname> <given-names>C. R.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution.</article-title> <source><italic>Science</italic></source> <volume>363</volume> <fpage>1463</fpage>&#x2013;<lpage>1467</lpage>. <pub-id pub-id-type="doi">10.1126/science.aaw1219</pub-id> <pub-id pub-id-type="pmid">30923225</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosenberg</surname> <given-names>A. B.</given-names></name> <name><surname>Roco</surname> <given-names>C. M.</given-names></name> <name><surname>Muscat</surname> <given-names>R. A.</given-names></name> <name><surname>Kuchina</surname> <given-names>A.</given-names></name> <name><surname>Sample</surname> <given-names>P.</given-names></name> <name><surname>Yao</surname> <given-names>Z.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding.</article-title> <source><italic>Science</italic></source> <volume>360</volume> <fpage>176</fpage>&#x2013;<lpage>182</lpage>. <pub-id pub-id-type="doi">10.1126/science.aam8999</pub-id> <pub-id pub-id-type="pmid">29545511</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Satija</surname> <given-names>R.</given-names></name> <name><surname>Farrell</surname> <given-names>J. A.</given-names></name> <name><surname>Gennert</surname> <given-names>D.</given-names></name> <name><surname>Schier</surname> <given-names>A. F.</given-names></name> <name><surname>Regev</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>Spatial reconstruction of single-cell gene expression data.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>33</volume> <fpage>495</fpage>&#x2013;<lpage>502</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.3192</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shah</surname> <given-names>S.</given-names></name> <name><surname>Lubeck</surname> <given-names>E.</given-names></name> <name><surname>Zhou</surname> <given-names>W.</given-names></name> <name><surname>Cai</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus.</article-title> <source><italic>Neuron</italic></source> <volume>92</volume> <fpage>342</fpage>&#x2013;<lpage>357</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2016.10.001</pub-id> <pub-id pub-id-type="pmid">27764670</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sprague</surname> <given-names>J.</given-names></name> <name><surname>Bayraktaroglu</surname> <given-names>L.</given-names></name> <name><surname>Clements</surname> <given-names>D.</given-names></name> <name><surname>Conlin</surname> <given-names>T.</given-names></name> <name><surname>Fashena</surname> <given-names>D.</given-names></name> <name><surname>Frazer</surname> <given-names>K.</given-names></name><etal/></person-group> (<year>2006</year>). <article-title>The zebrafish information network: the zebrafish model organism database.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>34</volume> <fpage>D581</fpage>&#x2013;<lpage>D585</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkj086</pub-id> <pub-id pub-id-type="pmid">16381936</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>St&#x00E5;hl</surname> <given-names>P. L.</given-names></name> <name><surname>Salm&#x00E9;n</surname> <given-names>F.</given-names></name> <name><surname>Vickovic</surname> <given-names>S.</given-names></name> <name><surname>Lundmark</surname> <given-names>A.</given-names></name> <name><surname>Navarro</surname> <given-names>J. F.</given-names></name> <name><surname>Magnusson</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Visualization and analysis of gene expression in tissue sections by spatial transcriptomics.</article-title> <source><italic>Science</italic></source> <volume>353</volume> <fpage>78</fpage>&#x2013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1126/science.aaf2403</pub-id> <pub-id pub-id-type="pmid">27365449</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stuart</surname> <given-names>T.</given-names></name> <name><surname>Butler</surname> <given-names>A.</given-names></name> <name><surname>Hoffman</surname> <given-names>P.</given-names></name> <name><surname>Hafemeister</surname> <given-names>C.</given-names></name> <name><surname>Papalexi</surname> <given-names>E.</given-names></name> <name><surname>Mauck</surname> <given-names>W. M.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Comprehensive integration of single-cell data.</article-title> <source><italic>Cell</italic></source> <volume>177</volume> <fpage>1888</fpage>&#x2013;<lpage>1902.e21</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2019.05.031</pub-id> <pub-id pub-id-type="pmid">31178118</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Svensson</surname> <given-names>V.</given-names></name> <name><surname>Vento-Tormo</surname> <given-names>R.</given-names></name> <name><surname>Teichmann</surname> <given-names>S. A.</given-names></name></person-group> (<year>2018</year>). <article-title>Exponential scaling of single-cell RNA-seq in the past decade.</article-title> <source><italic>Nat. Protoc.</italic></source> <volume>13</volume> <fpage>599</fpage>&#x2013;<lpage>604</lpage>.</citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tan</surname> <given-names>Y.</given-names></name> <name><surname>Cahan</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>SingleCellNet: a computational tool to classify single cell RNA-Seq data across platforms and across species.</article-title> <source><italic>Cell Syst.</italic></source> <volume>9</volume> <fpage>207</fpage>&#x2013;<lpage>213.e2</lpage>.</citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tasic</surname> <given-names>B.</given-names></name> <name><surname>Menon</surname> <given-names>V.</given-names></name> <name><surname>Nguyen</surname> <given-names>T. N.</given-names></name> <name><surname>Kim</surname> <given-names>T. K.</given-names></name> <name><surname>Jarsky</surname> <given-names>T.</given-names></name> <name><surname>Yao</surname> <given-names>Z.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Adult mouse cortical cell taxonomy revealed by single cell transcriptomics.</article-title> <source><italic>Nat. Neurosci.</italic></source> <volume>19</volume> <fpage>335</fpage>&#x2013;<lpage>346</lpage>.</citation></ref>
<ref id="B45"><citation citation-type="journal"><collab>The MathWorks, Inc.</collab> (<year>2019a</year>). <source><italic>MATLAB Deep Learning Toolbox Release 2019b.</italic></source> <publisher-loc>Natick</publisher-loc>: <publisher-name>The MathWorks, Inc.</publisher-name></citation></ref>
<ref id="B46"><citation citation-type="journal"><collab>The MathWorks, Inc.</collab> (<year>2019b</year>). <source><italic>MATLAB Parallel Computing Toolbox Release 2019b.</italic></source> <publisher-loc>Natick</publisher-loc>: <publisher-name>The MathWorks, Inc.</publisher-name></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wagner</surname> <given-names>F.</given-names></name> <name><surname>Yanai</surname> <given-names>I.</given-names></name></person-group> (<year>2018</year>). <article-title>Moana: a robust and scalable cell type classification framework for single-cell RNA-Seq data.</article-title> <source><italic>bioRxiv</italic></source> <comment>[preprint]</comment>. <pub-id pub-id-type="doi">10.1101/456129</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Karikomi</surname> <given-names>M.</given-names></name> <name><surname>MacLean</surname> <given-names>A. L.</given-names></name> <name><surname>Nie</surname> <given-names>Q.</given-names></name></person-group> (<year>2019</year>). <article-title>Cell lineage and communication network inference via optimization for single-cell transcriptomics.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>47</volume>:<issue>e66</issue>.</citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Allen</surname> <given-names>W. E.</given-names></name> <name><surname>Wright</surname> <given-names>M. A.</given-names></name> <name><surname>Sylwestrak</surname> <given-names>E. L.</given-names></name> <name><surname>Samusik</surname> <given-names>N.</given-names></name> <name><surname>Vesuna</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Three-dimensional intact-tissue sequencing of single-cell transcriptional states.</article-title> <source><italic>Science</italic></source> <volume>361</volume>:<issue>eaat5691</issue>.</citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weinberger</surname> <given-names>K. Q.</given-names></name> <name><surname>Saul</surname> <given-names>L. K.</given-names></name></person-group> (<year>2009</year>). <article-title>Distance metric learning for large margin nearest neighbor classification.</article-title> <source><italic>J. Mach. Learn. Res.</italic></source> <volume>10</volume> <fpage>207</fpage>&#x2013;<lpage>244</lpage>.</citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Welch</surname> <given-names>J. D.</given-names></name> <name><surname>Kozareva</surname> <given-names>V.</given-names></name> <name><surname>Ferreira</surname> <given-names>A.</given-names></name> <name><surname>Vanderburg</surname> <given-names>C.</given-names></name> <name><surname>Martin</surname> <given-names>C.</given-names></name> <name><surname>Macosko</surname> <given-names>E. Z.</given-names></name></person-group> (<year>2019</year>). <article-title>Single-Cell multi-omic integration compares and contrasts features of brain cell identity.</article-title> <source><italic>Cell</italic></source> <volume>177</volume> <fpage>1873</fpage>&#x2013;<lpage>1887.e17</lpage>.</citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Glorot</surname> <given-names>X.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name></person-group> (<year>2010</year>). <article-title>&#x201C;Understanding the difficulty of training deep feedforward neural networks,&#x201D; in</article-title> <source><italic>Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings</italic></source>, <volume>Vol. 9</volume> (<publisher-loc>Sardinia</publisher-loc>: <publisher-name>Chia Laguna Resort</publisher-name>), <fpage>249</fpage>&#x2013;<lpage>256</lpage>.</citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>G.-C.</given-names></name> <name><surname>Cai</surname> <given-names>L.</given-names></name> <name><surname>Elowitz</surname> <given-names>M.</given-names></name> <name><surname>Enver</surname> <given-names>T.</given-names></name> <name><surname>Fan</surname> <given-names>G.</given-names></name> <name><surname>Guo</surname> <given-names>G.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Challenges and emerging directions in single-cell analysis.</article-title> <source><italic>Genome Biol.</italic></source> <volume>18</volume>:<issue>84</issue>.</citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>Q.</given-names></name> <name><surname>Shah</surname> <given-names>S.</given-names></name> <name><surname>Dries</surname> <given-names>R.</given-names></name> <name><surname>Cai</surname> <given-names>L.</given-names></name> <name><surname>Yuan</surname> <given-names>G.-C.</given-names></name></person-group> (<year>2018</year>). <article-title>Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>36</volume> <fpage>1183</fpage>&#x2013;<lpage>1190</lpage>.</citation></ref>
</ref-list>
</back>
</article>