<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">680535</article-id>
<article-id pub-id-type="doi">10.3389/fdata.2021.680535</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Deep Graph Mapper: Seeing Graphs Through the Neural Lens</article-title>
<alt-title alt-title-type="left-running-head">Bodnar et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Deep Graph Mapper</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Bodnar</surname>
<given-names>Cristian</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/782075/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Cangea</surname>
<given-names>C&#x103;t&#x103;lina</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1279839/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li&#xf2;</surname>
<given-names>Pietro</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/80755/overview"/>
</contrib>
</contrib-group>
<aff>Department of Computer Science and Technology, University of Cambridge, <addr-line>Cambridge</addr-line>, <country>United&#x20;Kingdom</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/918510/overview">Umberto Lupo</ext-link>, &#xc9;cole Polytechnique F&#xe9;d&#xe9;rale de Lausanne, Switzerland</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1275289/overview">Mustafa Hajij</ext-link>, Santa Clara University, United&#x20;States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1278552/overview">Stanislav Schmidt</ext-link>, Ecole polytechnique f&#xe9;d&#xe9;rale de Lausanne (EPFL), Switzerland</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Cristian Bodnar, <email>cb2015@cam.ac.uk</email>; C&#x103;t&#x103;lina Cangea, <email>ccc53@cam.ac.uk</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Machine Learning and Artificial&#x20;Intelligence, a section of the journal Frontiers in Big&#x20;Data</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>16</day>
<month>06</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>4</volume>
<elocation-id>680535</elocation-id>
<history>
<date date-type="received">
<day>14</day>
<month>03</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>05</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Bodnar, Cangea and Li&#xf2;.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Bodnar, Cangea and Li&#xf2;</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Graph summarization has received much attention lately, with various works tackling the challenge of defining pooling operators on data regions with arbitrary structures. These contrast with the grid-like structures encountered in image inputs, where techniques such as max-pooling have sufficed to show empirical success. In this work, we merge the Mapper algorithm with the expressive power of graph neural networks to produce topologically grounded graph summaries. We demonstrate the suitability of Mapper as a topological framework for graph pooling by proving that Mapper is a generalization of pooling methods based on soft cluster assignments. Building upon this, we show how easy it is to design novel pooling algorithms that obtain competitive results with other state-of-the-art methods. Additionally, we use our method to produce GNN-aided visualizations of attributed complex networks.</p>
</abstract>
<kwd-group>
<kwd>mapper</kwd>
<kwd>graph neural networks</kwd>
<kwd>pooling</kwd>
<kwd>graph summarization</kwd>
<kwd>graph classification</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>The abundance of relational information in the real world and the success of deep learning techniques have brought renewed interest in learning from graph-structured data. Efforts in this direction have primarily focused on replicating, within relational data domains, the hierarchy of convolutional filters and pooling operators that previously proved successful in computer vision (<xref ref-type="bibr" rid="B38">Sperduti, 1994</xref>; <xref ref-type="bibr" rid="B18">Goller and Kuchler, 1996</xref>; <xref ref-type="bibr" rid="B19">Gori et&#x20;al., 2005</xref>; <xref ref-type="bibr" rid="B34">Scarselli et&#x20;al., 2009</xref>; <xref ref-type="bibr" rid="B5">Bruna et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B27">Li et&#x20;al., 2015</xref>). In contrast to image processing applications, where the signal is defined on a grid-like structure, designing graph coarsening (pooling) operators is a much more difficult problem, due to the arbitrary structure typically present in graphs.</p>
<p>In this work, we introduce Structural Deep Graph Mapper (SDGM)<xref ref-type="fn" rid="FN1">
<sup>1</sup>
</xref>&#x2014;an adaptation of Mapper (<xref ref-type="bibr" rid="B37">Singh et&#x20;al., 2007</xref>), an algorithm from the field of Topological Data Analysis (TDA) (<xref ref-type="bibr" rid="B10">Chazal and Michel, 2017</xref>), to graph domains. First, we prove that SDGM is a generalization of pooling methods based on soft cluster assignments, which include state-of-the-art algorithms like minCUT (<xref ref-type="bibr" rid="B3">Bianchi et&#x20;al., 2019</xref>) and DiffPool (<xref ref-type="bibr" rid="B46">Ying et&#x20;al., 2018</xref>). Building upon this topological perspective of graph pooling, we propose two pooling algorithms leveraging fully differentiable and fixed PageRank-based &#x201c;lens&#x201d; functions, respectively. We demonstrate that these operators achieve results competitive with other state-of-the-art pooling methods on graph classification benchmarks. Furthermore, we show how our method offers a means to flexibly visualize graphs and the complex data living on them through a GNN &#x201c;lens&#x201d; function.</p>
</sec>
<sec id="s2">
<title>2 Related Work</title>
<p>In this section, we review existing work in the two broad areas that our method belongs to&#x2014;graph pooling (also known as hierarchical representation learning) and network visualization.</p>
<sec id="s2-1">
<title>2.1 Graph Pooling</title>
<p>Pooling algorithms have already been considerably explored within GNN frameworks for graph classification. <xref ref-type="bibr" rid="B28">Luzhnica et&#x20;al. (2019)</xref> propose a topological approach to pooling, which coarsens the graph by aggregating its maximal cliques into new clusters. However, cliques are local topological features, whereas our methods leverage a global perspective of the graph during pooling. Two paradigms stand out among learnable pooling layers: Top-<italic>k</italic> pooling based on a learnable ranking (<xref ref-type="bibr" rid="B17">Gao and Ji, 2019</xref>), and learning the cluster assignment (<xref ref-type="bibr" rid="B46">Ying et&#x20;al., 2018</xref>) with additional entropy and link prediction losses for more stable training (DiffPool). Following these two trends, several variants and incremental improvements have been proposed. The Top-<italic>k</italic> approach is explored in conjunction with jumping-knowledge networks (<xref ref-type="bibr" rid="B6">Cangea et&#x20;al., 2018</xref>), attention (<xref ref-type="bibr" rid="B21">Huang et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B25">Lee et&#x20;al., 2019</xref>) and self-attention for cluster assignment (<xref ref-type="bibr" rid="B33">Ranjan et&#x20;al., 2019</xref>). Similarly to DiffPool, the method suggested by <xref ref-type="bibr" rid="B3">Bianchi et&#x20;al. (2019)</xref> uses several loss terms to enforce clusters with strongly connected nodes, similar sizes and orthogonal assignments. A different approach is also proposed by <xref ref-type="bibr" rid="B29">Ma et&#x20;al. (2019)</xref>, who leverage spectral clustering.</p>
</sec>
<sec id="s2-2">
<title>2.2 Graph Visualization</title>
<p>Graph visualization is a vast topic in network science. We therefore refer the reader to existing surveys for a complete view of the field (<xref ref-type="bibr" rid="B31">Nobre et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B43">von Landesberger et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B2">Beck et&#x20;al., 2017</xref>), and focus here only on methods that, similarly to ours, produce node-link-based visual summaries through the aggregation of static graphs. Previous methods rely on grouping nodes into a set of predefined motifs (<xref ref-type="bibr" rid="B13">Dunne and Shneiderman, 2013</xref>), modules (<xref ref-type="bibr" rid="B14">Dwyer et&#x20;al., 2013</xref>) or clusters with basic topological properties (<xref ref-type="bibr" rid="B1">Batagelj et&#x20;al., 2010</xref>). Recent approaches have considered attribute-driven aggregation schemes for multivariate networks. For instance, PivotGraph (<xref ref-type="bibr" rid="B44">Wattenberg, 2006</xref>) groups the nodes based on categorical attributes, while <xref ref-type="bibr" rid="B40">van den Elzen and van Wijk (2014)</xref> propose a more sophisticated method using a combination of manually specified groupings and attribute queries. However, these mechanisms are severely constrained by the simple types of node groupings allowed and the limited integration between graph topology and attributes. Closest to our work, Mapper-based summaries for graphs have recently been considered by <xref ref-type="bibr" rid="B20">Hajij et&#x20;al. (2018)</xref>. Despite the advantages provided by Mapper, their approach relies on hand-crafted graph-theoretic &#x201c;lenses,&#x201d; such as the average geodesic distance, graph density functions or eigenvectors of the graph Laplacian. Not only are these functions unable to fully adapt to the graph of interest, but they are also computationally inefficient and do not take into account the attributes of the&#x20;graph.</p>
</sec>
</sec>
<sec id="s3">
<title>3 Background and Formal Problem Statement</title>
<sec id="s3-1">
<title>3.1 Formal Problem Statement</title>
<p>Consider a dataset whose samples are formed by a graph <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, a <italic>d</italic>-dimensional signal defined over the nodes of the graph <inline-formula id="inf2">
<mml:math id="m2">
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> and a label <inline-formula id="inf3">
<mml:math id="m3">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> associated with the graph, where <inline-formula id="inf4">
<mml:math id="m4">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, a finite indexing set for the dataset samples. We are interested in the setting where graph neural networks are used to classify such graphs using a sequence of (graph) convolutions and pooling operators. While convolutional operators act like filters of the graph signal, pooling operators coarsen the graph and reduce its spatial resolution. Unlike image processing tasks, where the inputs exhibit a regular grid structure, graph domains pose challenges for pooling. In this work, we design topologically inspired pooling operators based on Mapper. As an additional contribution, we also investigate graph pooling as a tool for the visualization of attributed graphs.</p>
<p>We briefly review the Mapper (<xref ref-type="bibr" rid="B37">Singh et&#x20;al., 2007</xref>) algorithm, with a focus on graph domains (<xref ref-type="bibr" rid="B20">Hajij et&#x20;al., 2018</xref>). We first introduce the required mathematical background.</p>
<p>
<statement content-type="definition" id="definition_3_1">
<label>
<bold>Definition 3.1:</bold>
</label>
<p> Let <inline-formula id="inf5">
<mml:math id="m5">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Z</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> be two topological spaces, <inline-formula id="inf6">
<mml:math id="m6">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>Z</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, a continuous function, and <inline-formula id="inf7">
<mml:math id="m7">
<mml:mrow>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> a cover of <italic>Z</italic>. Then, the pull back cover <inline-formula id="inf8">
<mml:math id="m8">
<mml:mrow>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>X</italic> induced by <inline-formula id="inf9">
<mml:math id="m9">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the collection of open sets <inline-formula id="inf10">
<mml:math id="m10">
<mml:mrow>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, for some indexing set <italic>I</italic>. For each <inline-formula id="inf11">
<mml:math id="m11">
<mml:mrow>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, let <inline-formula id="inf12">
<mml:math id="m12">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>J</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> be a partition of <inline-formula id="inf13">
<mml:math id="m13">
<mml:mrow>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> indexed by <inline-formula id="inf14">
<mml:math id="m14">
<mml:mrow>
<mml:msub>
<mml:mi>J</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. We refer to the elements of these partitions as clusters. The resulting collection of clusters forms another cover of <italic>X</italic> called the refined pull back cover <inline-formula id="inf15">
<mml:math id="m15">
<mml:mrow>
<mml:mi mathvariant="normal">&#x211b;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>J</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</statement>
</p>
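For finite graphs, Definition 3.1 can be computed directly. The sketch below is a minimal illustration (our own toy code, not an implementation from the paper): the lens is the identity on integer vertex labels, the cover is two overlapping open intervals, and the clusters of each preimage are the connected components of the induced subgraph.

```python
def preimage(f, vertices, interval):
    """Vertices whose lens value falls inside the open interval (a, b)."""
    a, b = interval
    return {v for v in vertices if a < f[v] < b}

def connected_components(vertices, edges):
    """Connected components of the subgraph induced by `vertices`."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        if u in vertices and w in vertices:
            adj[u].add(w)
            adj[w].add(u)
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                comp.add(u)
                stack.extend(adj[u] - seen)
        comps.append(comp)
    return comps

def refined_pull_back(f, vertices, edges, cover):
    """Clusters C_{i,j}: components of f^{-1}(U_i) for each cover element U_i."""
    clusters = []
    for i, U in enumerate(cover):
        for j, comp in enumerate(connected_components(preimage(f, vertices, U), edges)):
            clusters.append(((i, j), comp))
    return clusters

# Toy path graph 0-1-2-3-4 with lens f(v) = v and two overlapping intervals.
V = {0, 1, 2, 3, 4}
E = [(0, 1), (1, 2), (2, 3), (3, 4)]
f = {v: float(v) for v in V}
cover = [(-0.5, 2.5), (1.5, 4.5)]
clusters = refined_pull_back(f, V, E, cover)
```

On this path graph the two preimages {0, 1, 2} and {2, 3, 4} are each connected, so the refined pull back cover consists of exactly two clusters.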
<p>
<statement content-type="definition" id="definition_3_2">
<label>
<bold>Definition 3.2:</bold>
</label>
<p> Let <italic>X</italic> be a topological space with an open cover <inline-formula id="inf16">
<mml:math id="m16">
<mml:mrow>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. The 1-skeleton of the nerve <inline-formula id="inf17">
<mml:math id="m17">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of <inline-formula id="inf18">
<mml:math id="m18">
<mml:mi mathvariant="script">U</mml:mi>
</mml:math>
</inline-formula>, which we denote by <inline-formula id="inf19">
<mml:math id="m19">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>sk</mml:mtext>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, is the graph with vertices given by <inline-formula id="inf20">
<mml:math id="m20">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, where two vertices <inline-formula id="inf21">
<mml:math id="m21">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are connected if and only if <inline-formula id="inf22">
<mml:math id="m22">
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x2260;</mml:mo>
<mml:mo>&#x2205;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</statement>
</p>
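Definition 3.2 translates directly into code for finite covers. The following minimal sketch (illustrative, under our own naming) builds the 1-skeleton of the nerve by testing pairwise intersections of the cover elements.

```python
from itertools import combinations

def nerve_1_skeleton(cover):
    """Build sk1(N(U)): one vertex per cover element, with an edge (i, j)
    whenever U_i and U_j share at least one point."""
    sets = [set(U) for U in cover]
    vertices = list(range(len(sets)))
    edges = [(i, j) for i, j in combinations(vertices, 2) if sets[i] & sets[j]]
    return vertices, edges

# Three sets: the first two overlap in the point 3, the third is disjoint.
V, E = nerve_1_skeleton([{1, 2, 3}, {3, 4}, {5, 6}])
```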
</sec>
<sec id="s3-2">
<title>3.2 Mapper</title>
<p>Given a topological space <italic>X</italic>, a carefully chosen lens function <inline-formula id="inf23">
<mml:math id="m23">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>Z</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and a cover <inline-formula id="inf24">
<mml:math id="m24">
<mml:mi mathvariant="script">U</mml:mi>
</mml:math>
</inline-formula> of <italic>Z</italic>, Mapper produces a graph representation of the topological space by computing the 1-skeleton of the nerve of the refined pull back cover <inline-formula id="inf25">
<mml:math id="m25">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>sk</mml:mtext>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x211b;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, which we denote by <inline-formula id="inf26">
<mml:math id="m26">
<mml:mrow>
<mml:mi mathvariant="normal">&#x2133;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. We note that, more generally, the skeleton operator might be omitted, in which case the output of the algorithm becomes a simplicial complex. However, for the purpose of this work, we are only interested in graph outputs. Typically, the input to the Mapper algorithm is a point cloud and the connected components are inferred using a statistical clustering algorithm, with the help of a metric defined in the space where the points&#x20;live.</p>
<p>
<bold>Mapper for Graphs</bold>. More recently, <xref ref-type="bibr" rid="B20">Hajij et&#x20;al. (2018)</xref> considered the case when the input topological space <inline-formula id="inf27">
<mml:math id="m27">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>G</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is also a graph with vertices <italic>V</italic> and edge set <italic>E</italic>. In a typical point cloud setting, the relationships between points are statistically inferred; in a graph setting, the underlying relationships are given by the edges of the graph. The adaptation of Mapper for graphs proposed by <xref ref-type="bibr" rid="B20">Hajij et&#x20;al. (2018)</xref> uses a lens function <inline-formula id="inf28">
<mml:math id="m28">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>&#x211d;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> based on graph-theoretic functions and a cover <inline-formula id="inf29">
<mml:math id="m29">
<mml:mi mathvariant="script">U</mml:mi>
</mml:math>
</inline-formula> formed of open intervals of the real line. Additionally, the connected components <inline-formula id="inf30">
<mml:math id="m30">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>J</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are given by the vertices of the connected components of the subgraph induced by <inline-formula id="inf31">
<mml:math id="m31">
<mml:mrow>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
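Putting the two definitions together, the graph version of Mapper described above can be sketched as follows (a toy reimplementation for illustration, not the reference code of Hajij et&#x20;al.): pull the interval cover back through the lens, split each preimage into connected components of the induced subgraph, and connect two clusters whenever they share a vertex.

```python
from itertools import combinations

def graph_mapper(f, vertices, edges, intervals):
    """Mapper for graphs: clusters are connected components of the subgraphs
    induced by f^{-1}((a, b)); two clusters are linked when they share at
    least one vertex (which happens where the intervals overlap)."""
    clusters = []
    for a, b in intervals:
        pre = {v for v in vertices if a < f[v] < b}
        # Connected components of the induced subgraph via DFS.
        adj = {v: set() for v in pre}
        for u, w in edges:
            if u in pre and w in pre:
                adj[u].add(w)
                adj[w].add(u)
        seen = set()
        for v in pre:
            if v in seen:
                continue
            comp, stack = set(), [v]
            while stack:
                u = stack.pop()
                if u not in seen:
                    seen.add(u)
                    comp.add(u)
                    stack.extend(adj[u] - seen)
            clusters.append(comp)
    links = [(i, j) for i, j in combinations(range(len(clusters)), 2)
             if clusters[i] & clusters[j]]
    return clusters, links

# Cycle 0-1-2-3-0 with lens f(v) = v and two overlapping intervals.
V = {0, 1, 2, 3}
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
f = {v: float(v) for v in V}
clusters, links = graph_mapper(f, V, E, [(-0.5, 1.5), (0.5, 3.5)])
```

Here the preimages {0, 1} and {1, 2, 3} are each connected, and the shared vertex 1 links the two resulting clusters; note that the edge (3, 0) of the cycle leaves no trace in the output, which is exactly the structural loss discussed next.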
<p>However, the graph version of Mapper described above has two main limitations. Firstly, the graph-theoretic functions considered for <italic>f</italic> are rather limited, not taking into account the signals which are typically defined on the graph in signal processing tasks, such as graph classification. Secondly, by using a pull back cover only over the graph vertices, as opposed to a cover of the entire graph, the method relies exclusively on the lens function to capture the structure of the graph and the edge-connections between the clusters. This may end up discarding valuable structural information, as we later show in <xref ref-type="sec" rid="s7-7">Section&#x20;7.7</xref>.</p>
</sec>
</sec>
<sec id="s4">
<title>4 Structural Deep Graph Mapper</title>
<p><bold>Structural Graph Mapper</bold>. One of the disadvantages of the graph version of Mapper (described in the background section) is that its output does not explicitly capture the connections between the resulting collections of clusters. This is primarily because the lens function <italic>f</italic> is defined only over the set of vertices <italic>V</italic> and, consequently, the resulting pull-back cover only covers <italic>V</italic>. In contrast, one should aim to obtain a cover for the graph <italic>G</italic>, which automatically includes the edges. While this could be resolved by considering a lens function over the geometric realization of the graph, handling only a finite set of vertices is computationally convenient.</p>
<p>To balance these trade-offs, we add an extra step to the Mapper algorithm. Concretely, we extend the refined pull back cover into a cover over both nodes and edges. Given the set of refined clusters <inline-formula id="inf32">
<mml:math id="m32">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>J</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, we compute a new set of clusters <inline-formula id="inf33">
<mml:math id="m33">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:msub>
<mml:mo>&#x2032;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>J</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> where each cluster <inline-formula id="inf34">
<mml:math id="m34">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:msub>
<mml:mo>&#x2032;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> contains the elements of <inline-formula id="inf35">
<mml:math id="m35">
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> as well as all the edges incident to the vertices in <inline-formula id="inf36">
<mml:math id="m36">
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. We use <inline-formula id="inf37">
<mml:math id="m37">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x211b;</mml:mi>
<mml:mi>E</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (the edge-refined pull back cover) to refer to this open cover of the graph <italic>G</italic> computed from <inline-formula id="inf38">
<mml:math id="m38">
<mml:mrow>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Then, our algorithm can be written as <inline-formula id="inf39">
<mml:math id="m39">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>sk</mml:mtext>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x211b;</mml:mi>
<mml:mi>E</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and we denote it by <inline-formula id="inf40">
<mml:math id="m40">
<mml:mrow>
<mml:mtext>GM</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>
<statement content-type="remark" id="remark_1">
<label>
<bold>Remark 1:</bold>
</label>
<p> We note that Structural Mapper, unlike the original Mapper method, encodes two types of relationships via the edges of the output graph. The semantic connections highlight a similarity between clusters, according to the lens function (that is, two clusters have common nodes&#x2014;as before), while structural connections show how two clusters are connected (namely, two clusters have at least one edge in common). This latter type of connection is the result of considering the extended cover over the edges. The two types of connections are not mutually exclusive because two clusters might have both nodes and edges in common.</p>
<p>&#x2003;We now broadly outline our proposed method, using the three main degrees of freedom of the Mapper algorithm to guide our discussion: the lens function, the cover, and the clustering algorithm.</p>
</statement>
</p>
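<p>The two kinds of connections in Remark 1 can be made concrete with a small sketch. Assuming the clusters (the sets of nodes in each cover element) have already been computed, a hypothetical helper might build the 1-skeleton of the nerve as follows:</p>

```python
# Hypothetical sketch of the 1-skeleton of the nerve of an edge-refined pull
# back cover, distinguishing semantic edges (clusters sharing a node) from
# structural edges (a graph edge running between two clusters).

def nerve_1_skeleton(clusters, edges):
    """clusters: list of node sets; edges: list of (u, v) graph edges.
    Returns (semantic, structural) sets of cluster-index pairs."""
    semantic, structural = set(), set()
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if clusters[i] & clusters[j]:            # common node
                semantic.add((i, j))
            if any((u in clusters[i] and v in clusters[j]) or
                   (v in clusters[i] and u in clusters[j])
                   for u, v in edges):               # common edge
                structural.add((i, j))
    return semantic, structural

# Clusters 0 and 1 share node 2; cluster 2 is linked to cluster 1 only by
# the edge (3, 4), so that connection is purely structural.
clusters = [{0, 1, 2}, {2, 3}, {4, 5}]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
sem, struc = nerve_1_skeleton(clusters, edges)
```

As the remark notes, the two relations can coexist: the pair (0, 1) above is both semantic and structural.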
<sec id="s4-1">
<title>4.1 Lens</title>
<p>The lens is a function <inline-formula id="inf41">
<mml:math id="m41">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> over the vertices, which acts as a filter that emphasizes certain features of the graph. Typically, <italic>d</italic> is a small integer&#x2014;in our case, <inline-formula id="inf42">
<mml:math id="m42">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1,2</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The choice of <italic>f</italic> depends on the graph properties that should be highlighted by the visualization. In this work, we leverage the recent progress in the field of graph representation learning and propose a parameterized lens function based on graph neural networks (GNNs). We thus consider a function <inline-formula id="inf43">
<mml:math id="m43">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>v</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, where <italic>g</italic> is a GNN with parameters <italic>&#x3b8;</italic> taking as input a graph <inline-formula id="inf44">
<mml:math id="m44">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> with <italic>n</italic> nodes and node features <inline-formula id="inf45">
<mml:math id="m45">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. For visualization purposes, we often consider a function composition <inline-formula id="inf46">
<mml:math id="m46">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x2218;</mml:mo>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>v</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf47">
<mml:math id="m47">
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>:</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mi>d</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2192;</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is a dimensionality reduction algorithm like <italic>t</italic>-SNE (<xref ref-type="bibr" rid="B41">van der Maaten and Hinton, 2008</xref>).</p>
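<p>As a minimal illustration of such a lens (not the architecture used in our experiments), a single message-passing layer with assumed fixed weights can map node features to a scalar value per node:</p>

```python
# Illustrative sketch: one message-passing layer acting as a lens f: V -> R.
# Each node's output mixes its own feature with the mean of its neighbours'
# features; the weights w_self and w_neigh are assumed fixed for illustration
# (in practice they would be learned parameters theta of the GNN).

def gnn_lens(features, adjacency, w_self=0.5, w_neigh=0.5):
    """features: dict node -> float; adjacency: dict node -> neighbour list."""
    out = {}
    for v, x in features.items():
        neigh = adjacency.get(v, [])
        mean = sum(features[u] for u in neigh) / len(neigh) if neigh else 0.0
        out[v] = w_self * x + w_neigh * mean
    return out

# A path graph 0-1-2 with scalar node features.
features = {0: 0.0, 1: 1.0, 2: 2.0}
adjacency = {0: [1], 1: [0, 2], 2: [1]}
lens = gnn_lens(features, adjacency)  # e.g. lens[1] = 0.5*1.0 + 0.5*1.0 = 1.0
```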
<p>Unlike the traditional graph-theoretic lens functions proposed by <xref ref-type="bibr" rid="B20">Hajij et&#x20;al. (2018)</xref>, GNNs can naturally learn to integrate the features associated with the graph and its topology, while also scaling computationally to large, complex graphs. Additionally, visualisations can be flexibly tuned for the task of interest by adjusting the lens <inline-formula id="inf48">
<mml:math id="m48">
<mml:mrow>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> through the loss function of the&#x20;model.</p>
</sec>
<sec id="s4-2">
<title>4.2 Cover</title>
<p>The cover <inline-formula id="inf49">
<mml:math id="m49">
<mml:mi mathvariant="script">U</mml:mi>
</mml:math>
</inline-formula> determines the resolution of the output graph. For most purposes, we leverage the usual choice of cover for Mapper over <inline-formula id="inf50">
<mml:math id="m50">
<mml:mrow>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. When <inline-formula id="inf51">
<mml:math id="m51">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, we use a set of equally sized overlapping intervals over the real line. When <inline-formula id="inf52">
<mml:math id="m52">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, this is generalized to a grid of overlapping cells in the real plane. Using more cells will produce more detailed visualisations, while higher overlaps between the cells will increase the connectivity of the output graph. When chosen suitably, these hyperparameters are a powerful mechanism for obtaining multi-scale visualisations.</p>
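<p>A sketch of this cover construction for <italic>d</italic> = 1, together with the resulting pull back of a lens, might look as follows (the interval parameterization is an assumption for illustration):</p>

```python
# Sketch of the usual d = 1 cover: n equally sized intervals over [lo, hi]
# with a given fractional overlap, and the pull back of a lens over it.

def interval_cover(lo, hi, n, overlap):
    # Solve for the interval width so that n intervals, each overlapping its
    # neighbour by `overlap` of its width, tile [lo, hi] exactly.
    width = (hi - lo) / (n - (n - 1) * overlap)
    step = width * (1 - overlap)
    return [(lo + i * step, lo + i * step + width) for i in range(n)]

def pull_back(lens, cover):
    """lens: dict node -> value. Returns one node set per cover interval."""
    return [{v for v, x in lens.items() if a <= x <= b} for a, b in cover]

lens = {0: 0.1, 1: 0.45, 2: 0.55, 3: 0.9}
cover = interval_cover(0.0, 1.0, n=2, overlap=0.2)
sets = pull_back(lens, cover)  # nodes 1 and 2 land in the overlap region
```

Increasing `n` refines the visualization, while increasing `overlap` grows the intersections between pull back sets and hence the connectivity of the output graph.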
<p>Another choice, which we employ for designing differentiable pooling algorithms, is a set of overlapping RBF kernels whose centers (the second arguments of the kernel functions) are distributed over the real line. We introduce this in detail in <xref ref-type="sec" rid="s5-2">Section&#x20;5.2</xref>.</p>
</sec>
<sec id="s4-3">
<title>4.3 Clustering</title>
<p>Clustering statistically approximates the (topological) connected components of the cover sets <inline-formula id="inf53">
<mml:math id="m53">
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Mapper does not require a particular type of clustering algorithm; however, when the input topological space <italic>X</italic> is a graph, a natural choice, also adopted by <xref ref-type="bibr" rid="B20">Hajij et&#x20;al. (2018)</xref>, is to take the connected components of the subgraphs induced by the vertices <inline-formula id="inf54">
<mml:math id="m54">
<mml:mrow>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Therefore, in principle, there is no need to resort to statistical clustering techniques.</p>
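<p>As a sketch, the connected components of the subgraph induced by a pull back set can be recovered with a plain breadth-first search:</p>

```python
# Sketch: clusters as the connected components of the subgraph induced by a
# pull back set f^{-1}(U_i), computed with breadth-first search.
from collections import deque

def induced_components(nodes, adjacency):
    """nodes: vertex set of the pull back; adjacency: dict node -> neighbours."""
    unseen, components = set(nodes), []
    while unseen:
        root = unseen.pop()
        comp, queue = {root}, deque([root])
        while queue:
            v = queue.popleft()
            for u in adjacency.get(v, []):
                # Only follow edges that stay inside the induced subgraph.
                if u in nodes and u not in comp:
                    comp.add(u)
                    unseen.discard(u)
                    queue.append(u)
        components.append(comp)
    return components

# Two edges and an isolated node yield three components.
adjacency = {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
comps = induced_components({0, 1, 2, 3, 4}, adjacency)
```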
<p>However, relying on the topological connected components introduces certain challenges when the aim is to obtain a coarsened graph. Many real-world graphs comprise thousands of connected components, which is a lower bound on the number of connected components of the graph produced by <inline-formula id="inf55">
<mml:math id="m55">
<mml:mrow>
<mml:mtext>GM</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula>. In the most extreme case, a graph containing only isolated nodes (namely, a point cloud) would never be coarsened by this procedure. Therefore, it is preferable to employ statistical techniques where the number of clusters can be specified. In our pooling experiments, we draw motivation from the relationship with other pooling algorithms and opt to assign all the nodes in each pull back set to the same cluster (which corresponds to no clustering).</p>
<p>We broadly refer to this instance of Structural Graph Mapper, with the choices described above, as Structural Deep Graph Mapper (SDGM). We summarize it step-by-step in the cartoon example in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> and encourage the reader to refer to&#x20;it.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>A cartoon illustration of Structural Deep Graph Mapper (SDGM) where, for simplicity, a graph neural network (GNN) approximates a &#x201c;height&#x201d; function over the nodes in the plane of the diagram. The input graph <bold>(A)</bold> is passed through the GNN, which maps the vertices of the graph to a real number (the height) <bold>(B&#x2013;C)</bold>. Given a cover <inline-formula id="inf56">
<mml:math id="m56">
<mml:mi mathvariant="script">U</mml:mi>
</mml:math>
</inline-formula> of the image of the GNN <bold>(C)</bold>, the edge-refined pull back cover <inline-formula id="inf57">
<mml:math id="m57">
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is computed <bold>(D&#x2013;E)</bold>. The dotted edges in <bold>(D)</bold> illustrate connections between the node clusters (structural connections), while the dotted boxes show nodes that appear in multiple clusters (semantic connections). The 1-skeleton of the nerve of the edge-refined pull back cover provides the pooled graph <bold>(F)</bold>.</p>
</caption>
<graphic xlink:href="fdata-04-680535-g001.tif"/>
</fig>
</sec>
</sec>
<sec id="s5">
<title>5 Structural Graph Mapper for Pooling</title>
<p>We begin this section by introducing several theoretical results, which provide a connection between our version of Mapper and other graph pooling algorithms. We then use these results to show how novel pooling algorithms can be designed.</p>
<sec id="s5-1">
<title>5.1 Relationship to Graph Pooling Methods</title>
<p>An early suggestion that Mapper could be suitable for graph pooling is given by the fact that it constitutes a generalization of binary spectral clustering, as observed by <xref ref-type="bibr" rid="B20">Hajij et&#x20;al. (2018)</xref>. This link is a strong indicator that Mapper can compute &#x201c;useful&#x201d; clusters for pooling. We formally restate this observation below and provide a short&#x20;proof.</p>
<p>
<statement content-type="proposition" id="proposition_5_1">
<label>
<bold>Proposition 5.1:</bold>
</label>
<p> Let <italic>L</italic> be the Laplacian of a graph <inline-formula id="inf58">
<mml:math id="m58">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf59">
<mml:math id="m59">
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> the eigenvector corresponding to the second lowest eigenvalue of <italic>L</italic>, also known as the Fiedler vector (<xref ref-type="bibr" rid="B15">Fiedler, 1973</xref>). Then, for a function <inline-formula id="inf60">
<mml:math id="m60">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>&#x211d;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, outputting the entry in the eigenvector <inline-formula id="inf61">
<mml:math id="m61">
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> corresponding to node <italic>v</italic> and a cover <inline-formula id="inf62">
<mml:math id="m62">
<mml:mrow>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3b5;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, Mapper produces a spectral bi-partition of the graph for a sufficiently small positive <italic>&#x3f5;</italic>.</p>
<p>&#x2003;<italic>Proof</italic>: It is well known that the Fiedler vector can be used to obtain a &#x201c;good&#x201d; bi-partition of the graph based on the signs of the entries of the vector (i.e.,&#x20;<inline-formula id="inf63">
<mml:math id="m63">
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf64">
<mml:math id="m64">
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3c;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>) (please refer to <xref ref-type="bibr" rid="B11">Demmel (1995)</xref> for a proof). Therefore, by setting <italic>&#x3f5;</italic> to a sufficiently small positive number <inline-formula id="inf65">
<mml:math id="m65">
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mi>v</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, the obtained pull back cover is a spectral bi-partition of the&#x20;graph.</p>
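<p>The proposition can be checked numerically on a small path graph. The sketch below uses power iteration with deflation of the constant eigenvector (an assumed stand-in for a proper eigensolver) to approximate the Fiedler vector, whose signs then give the spectral bi-partition:</p>

```python
# Numerical sketch of Proposition 5.1: power iteration on c*I - L, after
# projecting out the constant eigenvector of L, converges to the Fiedler
# vector; thresholding its entries at zero recovers a spectral bi-partition.

def fiedler_vector(adjacency, n, steps=2000, c=10.0):
    # Build the graph Laplacian L = D - A.
    lap = [[0.0] * n for _ in range(n)]
    for v, neigh in adjacency.items():
        lap[v][v] = float(len(neigh))
        for u in neigh:
            lap[v][u] = -1.0
    x = [float(i) for i in range(n)]              # generic start vector
    for _ in range(steps):
        mean = sum(x) / n
        x = [xi - mean for xi in x]               # deflate the constant vector
        y = [c * x[i] - sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
        m = max(abs(v) for v in y) or 1.0
        x = [v / m for v in y]                    # renormalize
    return x

# Path graph 0-1-2-3: the Fiedler vector changes sign between nodes 1 and 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
l2 = fiedler_vector(adj, 4)
parts = ({v for v in range(4) if l2[v] < 0},
         {v for v in range(4) if l2[v] > 0})
```

With the cover of the proposition, the two pull back sets are exactly the sign classes of `l2`, here {0, 1} and {2, 3}.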
<p>&#x2003;The result above indicates that Mapper is a generalization of spectral clustering. As the latter is strongly related to min-cuts (<xref ref-type="bibr" rid="B26">Leskovec, 2016</xref>), the proposition also links min-cuts to Mapper. We now provide a much stronger result in that direction, showing that Structural Mapper is a generalization of all pooling methods based on soft-cluster assignments. These methods use a soft cluster assignment matrix <inline-formula id="inf66">
<mml:math id="m66">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf67">
<mml:math id="m67">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> encodes the probability that node <italic>i</italic> belongs to cluster <italic>j</italic>, <italic>N</italic> is the number of nodes in the graph and <italic>K</italic> is the number of clusters. The adjacency matrix of the pooled graph is computed via <inline-formula id="inf68">
<mml:math id="m68">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Below, we prove a helpful result concerning this class of methods.</p>
</statement>
</p>
<p>
<statement content-type="lemma" id="lemma_5_1">
<label>
<italic>Lemma 5.1:</italic>
</label>
<p> The adjacency matrix <inline-formula id="inf69">
<mml:math id="m69">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> defines a pooled graph, where the nodes corresponding to clusters encoded by <inline-formula id="inf70">
<mml:math id="m70">
<mml:mi>S</mml:mi>
</mml:math>
</inline-formula> are connected if and only if there is a common edge (including self-loops) between&#x20;them.</p>
<p>&#x2003;<italic>Proof:</italic> Let <inline-formula id="inf71">
<mml:math id="m71">
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>AS</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula>. Then, <inline-formula id="inf72">
<mml:math id="m72">
<mml:mrow>
<mml:msubsup>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:msubsup>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msubsup>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> if and only if <inline-formula id="inf73">
<mml:math id="m73">
<mml:mrow>
<mml:msubsup>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> (node <italic>k</italic> does not belong to cluster <italic>i</italic>) or <inline-formula id="inf74">
<mml:math id="m74">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> (node <italic>k</italic> is not connected to any node belonging to cluster <italic>j</italic>), for all <italic>k</italic>. Therefore, <inline-formula id="inf75">
<mml:math id="m75">
<mml:mrow>
<mml:msubsup>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mo>&#x2260;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> if and only if there exists a node <italic>k</italic> such that <italic>k</italic> belongs to cluster <italic>i</italic> and <italic>k</italic> is connected to a node from cluster <italic>j</italic>. Due to the added self-loops, <inline-formula id="inf76">
<mml:math id="m76">
<mml:mrow>
<mml:msubsup>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mo>&#x2260;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> also holds if there is a node <italic>k</italic> belonging to both clusters.</p>
</statement>
</p>
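<p>The lemma can be verified on a small example. With hard cluster assignments on a path graph, the off-diagonal entries of the pooled adjacency matrix are nonzero exactly for clusters that share a node or an edge:</p>

```python
# A small check of Lemma 5.1: an entry of A' = S^T (A + I) S is nonzero
# exactly when the two clusters share a node or an edge (the +I term adds
# the self-loops that account for shared nodes).

def pooled_adjacency(A, S):
    n, K = len(A), len(S[0])
    AI = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # L = (A + I) S, then A' = S^T L, written out with explicit loops.
    L = [[sum(AI[i][k] * S[k][j] for k in range(n)) for j in range(K)]
         for i in range(n)]
    return [[sum(S[k][i] * L[k][j] for k in range(n)) for j in range(K)]
            for i in range(K)]

# Path 0-1-2: cluster 0 = {0, 1}, cluster 1 = {2}; they share the edge (1, 2),
# so the off-diagonal entry of the pooled adjacency matrix is nonzero.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
S = [[1, 0], [1, 0], [0, 1]]
Ap = pooled_adjacency(A, S)  # Ap == [[4, 1], [1, 1]]
```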
<p>
<statement content-type="proposition" id="proposition_5_2">
<label>
<bold>Proposition 5.2:</bold>
</label>
<p> <inline-formula id="inf77">
<mml:math id="m77">
<mml:mrow>
<mml:mtext>GM</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> generalizes approaches based on soft-cluster assignments.</p>
<p>&#x2003;<italic>Proof:</italic> Let <inline-formula id="inf78">
<mml:math id="m78">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:msub>
<mml:mo>&#x25b3;</mml:mo>
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> be a soft cluster assignment function that maps the vertices to the <inline-formula id="inf79">
<mml:math id="m79">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>-dimensional unit simplex. We denote by <inline-formula id="inf80">
<mml:math id="m80">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> the probability that vertex <italic>v</italic> belongs to cluster <inline-formula id="inf81">
<mml:math id="m81">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf82">
<mml:math id="m82">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mi>K</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. This function can be completely specified by a cluster assignment matrix <inline-formula id="inf83">
<mml:math id="m83">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> with <inline-formula id="inf84">
<mml:math id="m84">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. This is the soft cluster assignment matrix computed by algorithms like minCut and DiffPool. Let <inline-formula id="inf85">
<mml:math id="m85">
<mml:mrow>
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> with <inline-formula id="inf86">
<mml:math id="m86">
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mo>&#x25b3;</mml:mo>
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2223;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2211;</mml:mo>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3bb;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2211;</mml:mo>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3bb;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#xa0;</mml:mo>
<mml:mtext>and&#xa0;</mml:mtext>
<mml:msub>
<mml:mi>&#x3bb;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> be an open cover of <inline-formula id="inf87">
<mml:math id="m87">
<mml:mrow>
<mml:msub>
<mml:mo>&#x25b3;</mml:mo>
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Then consider an instance of <inline-formula id="inf88">
<mml:math id="m88">
<mml:mrow>
<mml:mtext>GM</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> where everything is assigned to a single cluster (i.e. same as no clustering). Clearly, there is a one-to-one correspondence between the vertices of <inline-formula id="inf89">
<mml:math id="m89">
<mml:mrow>
<mml:mtext>GM</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and the soft clusters. By Remark 1, the nodes corresponding to the clusters are connected only if the clusters share at least one node or at least one edge. Then, by <xref ref-type="statement" rid="lemma_5_1">
<bold>Lemma 5.1</bold>
</xref> the adjacencies between the nodes of <inline-formula id="inf90">
<mml:math id="m90">
<mml:mrow>
<mml:mtext>GM</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">U</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> are the same as those described by <inline-formula id="inf91">
<mml:math id="m91">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Thus, the two pooled graphs are isomorphic.</p>
<p>&#x2003;We hope that this result will enable theoreticians to study pooling operators through the topological and statistical properties of Mapper (<xref ref-type="bibr" rid="B12">Dey et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B7">Carriere et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B8">Carri&#xe8;re and Oudot, 2018</xref>). At the same time, we encourage practitioners to take advantage of it and design new pooling methods in terms of a well-chosen lens function <italic>f</italic> and cover <inline-formula id="inf92">
<mml:math id="m92">
<mml:mi mathvariant="script">U</mml:mi>
</mml:math>
</inline-formula> for its image. To illustrate this idea and showcase the benefits of this new perspective over graph pooling methods, we introduce two Mapper-based operators.</p>
</statement>
</p>
</sec>
<sec id="s5-2">
<title>5.2 Differentiable Mapper Pooling</title>
<p>The main challenge for making pooling via Mapper differentiable is to differentiate through the pull back computation. To address this, we replace the cover of <italic>n</italic> overlapping intervals over the real line, described in the previous section, with a cover formed of overlapping RBF kernels <inline-formula id="inf93">
<mml:math id="m93">
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, evaluated at <italic>n</italic> fixed locations <inline-formula id="inf94">
<mml:math id="m94">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. The overlap between these kernels can be adjusted through the scale <italic>&#x3b4;</italic> of the kernels. The soft cluster assignment matrix <inline-formula id="inf95">
<mml:math id="m95">
<mml:mi>S</mml:mi>
</mml:math>
</inline-formula> is given by the normalized kernel values:<disp-formula id="e1">
<mml:math id="m96">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where the lens function <inline-formula id="inf96">
<mml:math id="m97">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is a GNN layer, <italic>&#x3c3;</italic> is a sigmoid function ensuring the outputs are in <inline-formula id="inf97">
<mml:math id="m98">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf98">
<mml:math id="m99">
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mtext>l</mml:mtext>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the node features at layer <italic>l</italic>. Intuitively, the more closely a node is mapped to a location <inline-formula id="inf99">
<mml:math id="m100">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, the more it belongs to cluster <italic>i</italic>. By <xref ref-type="statement" rid="proposition_5_2">
<bold>Proposition 5.2</bold>
</xref>, we can compute the adjacency matrix of the pooled graph as <inline-formula id="inf100">
<mml:math id="m101">
<mml:mrow>
<mml:msup>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>; the features are given by <inline-formula id="inf101">
<mml:math id="m102">
<mml:mrow>
<mml:msup>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. This method can also be thought of as a version of DiffPool (<xref ref-type="bibr" rid="B46">Ying et&#x20;al., 2018</xref>), where the low-entropy constraint on the cluster assignment distribution is topologically satisfied, since a point cannot be equally close to many other points on a line. Therefore, each node will belong only to a few clusters if the scale <italic>&#x3b4;</italic> is appropriately&#x20;set.</p>
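As a concrete illustration of Eq. (1), the following NumPy sketch computes the soft assignment matrix for a toy set of lens values. The evenly spaced cluster locations and the name `lens` (standing in for the sigmoid-squashed lens outputs) are our own illustrative assumptions, not details fixed by the text.

```python
import numpy as np

def soft_assignment(lens, n_clusters, delta):
    """Eq. (1): normalized RBF kernel values; each row (node) sums to one."""
    centres = np.linspace(0.0, 1.0, n_clusters)   # assumed cluster locations on [0, 1]
    d2 = (lens[:, None] - centres[None, :]) ** 2  # squared distance to each centre
    phi = np.exp(-d2 / delta)                     # RBF kernel values
    return phi / phi.sum(axis=1, keepdims=True)   # normalize per node

lens = np.array([0.05, 0.5, 0.95])                # toy lens values for three nodes
S = soft_assignment(lens, n_clusters=5, delta=0.01)
```

Shrinking `delta` concentrates each row of `S` on fewer clusters, which is the topological low-entropy effect described above.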
<p>In <xref ref-type="fig" rid="F2">Figure&#x20;2</xref> we show two examples of RBF kernel covers for the output space. The scale of the kernel, <italic>&#x3b4;</italic>, determines the amount of overlap between the cover elements. At larger scales, the clusters overlap more, as shown in the two plots. Because the line is one-dimensional, a point on the unit interval can only be part of a small number of clusters (that is, the kernels whose value is greater than zero at that point), assuming the scale <italic>&#x3b4;</italic> is not too large. Therefore, DMP can be seen as a DiffPool variant where the low-entropy constraint on the cluster assignment is satisfied topologically, rather than by a loss function enforcing&#x20;it.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Two covers of RBF kernels with different scales: <inline-formula id="inf102">
<mml:math id="m103">
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.002</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf103">
<mml:math id="m104">
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.01</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. The <italic>x</italic>-axis corresponds to the unit interval where the nodes of the graph are mapped. The <italic>y</italic>-axis represents the value of the normalized RBF kernels.</p>
</caption>
<graphic xlink:href="fdata-04-680535-g002.tif"/>
</fig>
</sec>
<sec id="s5-3">
<title>5.3&#x20;Mapper-Based PageRank Pooling</title>
<p>To evaluate the effectiveness of the differentiable pooling operator, we also consider a fixed and scalable non-differentiable lens function <inline-formula id="inf104">
<mml:math id="m105">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>&#x211d;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> that is given by the normalized PageRank (PR) (<xref ref-type="bibr" rid="B32">Page et&#x20;al., 1999</xref>) of the nodes. The PageRank function assigns an importance value to each of the nodes based on their connectivity, according to the well-known recurrence relation:<disp-formula id="e2">
<mml:math id="m106">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x394;</mml:mi>
</mml:mover>
</mml:mrow>
<mml:mi>P</mml:mi>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>where <inline-formula id="inf105">
<mml:math id="m107">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> represents the set of neighbors of the <italic>i</italic>th node in the graph and the damping factor was set to the typical value of <inline-formula id="inf106">
<mml:math id="m108">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.85</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. The resulting scores are values in <inline-formula id="inf107">
<mml:math id="m109">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, which reflect the probability that a random walk through the graph ends at a given node. Using the previously described cover of overlapping intervals <inline-formula id="inf108">
<mml:math id="m110">
<mml:mi mathvariant="script">U</mml:mi>
</mml:math>
</inline-formula>, the elements of the pull back cover form a soft cluster assignment matrix <inline-formula id="inf109">
<mml:math id="m111">
<mml:mi>S</mml:mi>
</mml:math>
</inline-formula>:<disp-formula id="e3">
<mml:math id="m112">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">I</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>where <inline-formula id="inf110">
<mml:math id="m113">
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the <italic>n</italic>th cover set in the cover <inline-formula id="inf111">
<mml:math id="m114">
<mml:mi mathvariant="script">U</mml:mi>
</mml:math>
</inline-formula> of <inline-formula id="inf112">
<mml:math id="m115">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. It can be observed that the resulting clusters contain nodes with similar PageRank scores. Intuitively, this pooling method merges the (usually few) highly connected nodes in the graph, while clustering together the (typically many) dangling nodes whose normalized PageRank score is closer to zero. Therefore, this method favors the information attached to the most &#x201c;important&#x201d; nodes of the graph. The adjacency matrix of the pooled graph and the features are computed in the same manner as for&#x20;DMP.</p>
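Equation (3) can be sketched as follows; the `interval_cover` construction (evenly spaced intervals of the unit interval with overlap fraction `g`) is our own illustrative choice, not the authors' released code.

```python
import numpy as np

def interval_cover(n, g):
    """n overlapping intervals tiling [0, 1]; g is the overlap fraction."""
    length = 1.0 / (n - (n - 1) * g)   # interval length so the cover spans [0, 1]
    step = length * (1.0 - g)          # shift between consecutive intervals
    return [(i * step, i * step + length) for i in range(n)]

def mpr_assignment(scores, n, g):
    """Eq. (3): uniform weight over the cover sets containing each score."""
    cover = interval_cover(n, g)
    S = np.zeros((len(scores), n))
    for i, s in enumerate(scores):
        hits = [j for j, (lo, hi) in enumerate(cover) if lo <= s <= hi]
        S[i, hits] = 1.0 / len(hits)
    return S
```

A node whose normalized PageRank score falls in the overlap of two intervals is assigned half to each, mirroring the soft cluster assignment described above.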
</sec>
<sec id="s5-4">
<title>5.4 Model</title>
<p>For the graph classification task, each example <italic>G</italic> is represented by a tuple <inline-formula id="inf113">
<mml:math id="m116">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf114">
<mml:math id="m117">
<mml:mi>X</mml:mi>
</mml:math>
</inline-formula> is the node feature matrix and <inline-formula id="inf115">
<mml:math id="m118">
<mml:mi>A</mml:mi>
</mml:math>
</inline-formula> is the adjacency matrix. Both our graph embedding and classification networks consist of a sequence of graph convolutional layers (<xref ref-type="bibr" rid="B24">Kipf and Welling, 2016</xref>); the <italic>l</italic>th layer operates on its input feature matrix as follows:<disp-formula id="e4">
<mml:math id="m119">
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:mtext>l</mml:mtext>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>D</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>D</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
</mml:mrow>
</mml:msup>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mtext>l</mml:mtext>
</mml:msub>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mtext>l</mml:mtext>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>where <inline-formula id="inf116">
<mml:math id="m120">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the adjacency matrix with self-loops, <inline-formula id="inf117">
<mml:math id="m121">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>D</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is the corresponding diagonal degree matrix, <inline-formula id="inf118">
<mml:math id="m122">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mtext>l</mml:mtext>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the weight matrix of the <italic>l</italic>-th layer and <italic>&#x3c3;</italic> is the activation function. After <italic>E</italic> layers, the embedding network simply outputs node features <inline-formula id="inf119">
<mml:math id="m123">
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>E</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, which are subsequently processed by a pooling layer to coarsen the graph. The classification network first takes as input node features of the Mapper-pooled graph,<xref ref-type="fn" rid="FN2">
<sup>2</sup>
</xref> <inline-formula id="inf120">
<mml:math id="m124">
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:mtext>MG</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and passes them through <inline-formula id="inf121">
<mml:math id="m125">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>C</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> graph convolutional layers. Following this, the network computes a graph summary given by the feature-wise node average and applies a final linear layer which predicts the class:<disp-formula id="e5">
<mml:math id="m126">
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>soft</mml:mtext>
<mml:mi>max</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mtext>MG</mml:mtext>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mtext>MG</mml:mtext>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>C</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
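A minimal NumPy sketch of the propagation rule in Eq. (4); the weight matrix and activation below are illustrative placeholders, not the trained parameters.

```python
import numpy as np

def gcn_layer(X, A, W, act=np.tanh):
    """One GCN step: act(D_hat^{-1/2} (A + I) D_hat^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])                 # adjacency with self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1)) # diagonal of D_hat^{-1/2}
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return act(A_norm @ X @ W)
```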
<p>We note that either of these pooling operators could readily be adapted to the recently proposed message passing simplicial neural networks (MPSNs) (<xref ref-type="bibr" rid="B4">Bodnar et&#x20;al., 2021</xref>) as a tool for coarsening simplicial complexes by dropping the 1-skeleton operator after computing the nerve. We leave this endeavor for future&#x20;work.</p>
</sec>
<sec id="s5-5">
<title>5.5 Complexity</title>
<p>The topology of the output graph can be computed in <inline-formula id="inf122">
<mml:math id="m127">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> time when using a cover over the unit interval, as described above. The output graph can be computed via (sparse) matrix multiplication given by <inline-formula id="inf123">
<mml:math id="m128">
<mml:mrow>
<mml:msup>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, to take advantage of GPU parallelism and compute the coefficients associated with the&#x20;edges.</p>
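A toy sketch of this pooled-graph computation with SciPy sparse matrices (the matrices below are illustrative stand-ins for a real graph and assignment):

```python
import numpy as np
import scipy.sparse as sp

def pool_graph(A, X, S):
    """Pooled topology S^T (A + I) S and pooled features S^T X."""
    A_loop = A + sp.eye(A.shape[0], format="csr")  # add self-loops
    A_pool = S.T @ A_loop @ S                      # adjacency of the pooled graph
    X_pool = S.T @ X                               # pooled node features
    return A_pool, X_pool

# 3-node path graph, pooled into 2 clusters: {0, 1} and {2}.
A = sp.csr_matrix(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float))
S = sp.csr_matrix(np.array([[1, 0], [1, 0], [0, 1]], dtype=float))
A_pool, X_pool = pool_graph(A, np.eye(3), S)
```

Keeping `A` and `S` sparse means the products above stay sparse, which is what makes the GPU-parallel formulation practical on large graphs.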
</sec>
</sec>
<sec id="s6">
<title>6 Pooling Experiments</title>
<sec id="s6-1">
<title>6.1 Tasks</title>
<p>We illustrate the applicability of the Mapper-GNN synthesis within a pooling framework, by evaluating DMP and MPR in a variety of settings: social (IMDB-Binary, IMDB-Multi, Reddit-Binary, Reddit-Multi-5k), citation networks (Collab) and chemical data (D&#x26;D, Mutag, NCI1, Proteins) (<xref ref-type="bibr" rid="B22">Kersting et&#x20;al., 2016</xref>).</p>
</sec>
<sec id="s6-2">
<title>6.2 Experimental Setup</title>
<p>We adopt a 10-fold cross-validation approach to evaluating the graph classification performance of DMP, MPR and other competitive state-of-the-art methods. The random seed was set to zero for all experiments (with respect to dataset splitting, shuffling and parameter initialisation), in order to ensure a fair comparison across architectures. All models were trained on a single Titan Xp GPU, using the Adam optimiser (<xref ref-type="bibr" rid="B23">Kingma and Ba, 2014</xref>) with early stopping on the validation set, for a maximum of 30 epochs. We report the classification accuracy using 95% confidence intervals calculated for a population size of 10 (the number of folds).</p>
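The reported statistic can be reproduced on toy numbers as in the sketch below; we assume a two-sided Student-t interval over the 10 fold accuracies (critical value 2.262 for 9 degrees of freedom), which is one common reading of "95% confidence intervals" rather than a detail stated in the text.

```python
import math
import statistics

def ci95(fold_accuracies):
    """Mean and half-width of a 95% t-interval over per-fold accuracies."""
    n = len(fold_accuracies)
    mean = statistics.mean(fold_accuracies)
    sem = statistics.stdev(fold_accuracies) / math.sqrt(n)  # standard error
    t_crit = 2.262  # two-sided 95% t value for n - 1 = 9 degrees of freedom
    return mean, t_crit * sem

mean_acc, half_width = ci95([0.72, 0.70, 0.68, 0.74, 0.71,
                             0.69, 0.73, 0.70, 0.72, 0.71])
```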
</sec>
<sec id="s6-3">
<title>6.3 Baselines</title>
<p>We compare the performance of DMP and MPR to two other pooling methods with which we identify mathematical connections: minCUT (<xref ref-type="bibr" rid="B3">Bianchi et&#x20;al., 2019</xref>) and DiffPool (<xref ref-type="bibr" rid="B46">Ying et&#x20;al., 2018</xref>). Additionally, we include Graph U-Net (<xref ref-type="bibr" rid="B17">Gao and Ji, 2019</xref>) in our evaluation, as it has been shown to yield competitive results while performing pooling from the perspective of a learnable node ranking; we denote this approach by Top-<italic>k</italic> in the remainder of this section. The non-pooling baselines evaluated are the WL kernel (<xref ref-type="bibr" rid="B36">Shervashidze et&#x20;al., 2011</xref>), a &#x201c;flat&#x201d; model (2&#xa0;MP steps and global average pooling) and an average-readout linear classifier.</p>
<p>We optimize both DMP and MPR with respect to the cover cardinality <italic>n</italic>, the cover overlap (<italic>&#x3b4;</italic> for DMP, overlap percentage <italic>g</italic> for MPR), learning rate and hidden size. The Top-<italic>k</italic> architecture is evaluated using the code provided in the official repository, where separate configurations are defined for each of the benchmarks. The minCUT architecture is represented by the sequence of operations described by <xref ref-type="bibr" rid="B3">Bianchi et&#x20;al. (2019)</xref>: MP(32)-pooling-MP(32)-pooling-MP(32)-GlobalAvgPool, followed by a linear softmax classifier. The MP(32) block represents a message-passing operation performed by a graph convolutional layer with 32 hidden units:<disp-formula id="e6">
<mml:math id="m129">
<mml:mrow>
<mml:msup>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>Re</mml:mi>
<mml:mtext>LU</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:msup>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>where <inline-formula id="inf124">
<mml:math id="m130">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mi>A</mml:mi>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is the symmetrically normalized adjacency matrix and <inline-formula id="inf125">
<mml:math id="m131">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are learnable weight matrices representing the message passing and skip-connection operations within the layer. The DiffPool model follows the same sequence of&#x20;steps.</p>
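A rough NumPy rendering of the MP block in Eq. (6); the weight matrices passed in stand in for the learned message-passing and skip-connection weights.

```python
import numpy as np

def mp_block(X, A, W_m, W_s):
    """ReLU(A_tilde X W_m + X W_s) with A_tilde = D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    # Guard against isolated nodes (degree zero) when normalizing.
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    A_tilde = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.maximum(A_tilde @ X @ W_m + X @ W_s, 0.0)  # ReLU
```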
</sec>
<sec id="s6-4">
<title>6.4 Evaluation Procedure</title>
<p>The best procedure for evaluating GNN pooling layers remains a matter of debate in the graph machine learning community. One may consider a fixed GNN architecture with a different pooling layer for each baseline; alternatively, the whole architecture can be optimized for each type of pooling layer. The first option, more akin to the typical procedure for evaluating pooling layers in CNNs on image domains, is used in papers like minCUT (<xref ref-type="bibr" rid="B3">Bianchi et&#x20;al., 2019</xref>). The second option is more particular to GNNs and is employed, for instance, by DiffPool (<xref ref-type="bibr" rid="B46">Ying et&#x20;al., 2018</xref>). In this work, we choose the latter option for our evaluation.</p>
<p>We argue that for non-Euclidean domains, such as graphs, the relationships between the nodes of the pooled graph and those of the input graph are semantically different from one pooling method to another. This is because pooling layers have different behaviors and may interact in various ways with the interleaved convolutional layers. Therefore, evaluating the same architecture with only the pooling layer(s) swapped is restrictive and might hide the benefits of certain operators. For example, Top-<italic>k</italic> pooling (one of our baselines) simply drops nodes from the input graph, instead of computing a smaller number of clusters from all nodes. Assume we fix the pooled graph to have only one node. Then Top-<italic>k</italic> would select only one node from the original graph. In contrast, DiffPool would combine the information from the entire graph into a single node. DiffPool would thus have access to additional information relative to Top-<italic>k</italic>, so it would be unfair to conclude that one model is better than the other in such a setting. These differences implicitly affect the features of the output graph at that layer, which in turn affect the next pooling layer, as its computation depends on the features. This can have a cascading effect on the overall performance of the model. One might also argue that this procedure makes the evaluated models more homogeneous and, therefore, easier to compare. While this is true, the conclusions one can draw from such a comparison are much more limited because they are restricted to the particular architecture that was chosen.</p>
<p>For this reason, we have either run models with hyperparameters as previously reported by the authors, or optimized them ourselves end-to-end, where applicable. The best-performing configurations were (Appendix A details the hyperparameter search):<list list-type="simple">
<list-item>
<p>&#x2022; MPR&#x2014;learning rate <inline-formula id="inf126">
<mml:math id="m132">
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, hidden sizes <inline-formula id="inf127">
<mml:math id="m133">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>128,128</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> (except for <inline-formula id="inf128">
<mml:math id="m134">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>64,64</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> on IMDB-Binary and <inline-formula id="inf129">
<mml:math id="m135">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>32,32</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> on IMDB-Multi), interval overlap <inline-formula id="inf130">
<mml:math id="m136">
<mml:mrow>
<mml:mn>25</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> on Proteins, Reddit-Binary, Mutag, IMDB-Multi and <inline-formula id="inf131">
<mml:math id="m137">
<mml:mrow>
<mml:mn>10</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> otherwise, batch size 32 (except for 128 on Proteins)&#x20;and:</p>
</list-item>
<list-item>
<p>&#x2022; D&#x26;D, Collab, Reddit-Binary, Reddit-Multi-5K: cover sizes <inline-formula id="inf132">
<mml:math id="m138">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>20,5</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>;</p>
</list-item>
<list-item>
<p>&#x2022; Proteins, NCI1: cover sizes <inline-formula id="inf133">
<mml:math id="m139">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>8,2</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>;</p>
</list-item>
<list-item>
<p>&#x2022; Mutag, IMDB-Binary, IMDB-Multi: cover sizes <inline-formula id="inf134">
<mml:math id="m140">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>4,1</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>;</p>
</list-item>
<list-item>
<p>&#x2022; DMP&#x2014;learning rate <inline-formula id="inf135">
<mml:math id="m141">
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, hidden sizes <inline-formula id="inf136">
<mml:math id="m142">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>128,128</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf137">
<mml:math id="m143">
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mtext>cluster</mml:mtext>
<mml:mo>_</mml:mo>
<mml:mtext>size</mml:mtext>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>&#x20;and:</p>
</list-item>
<list-item>
<p>&#x2022; Proteins: cover sizes <inline-formula id="inf138">
<mml:math id="m144">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>8,2</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, batch size&#x20;128;</p>
</list-item>
<list-item>
<p>&#x2022; Others: cover sizes <inline-formula id="inf139">
<mml:math id="m145">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>20,5</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, batch size&#x20;32;</p>
</list-item>
<list-item>
<p>&#x2022; Top-<italic>k</italic>&#x2014;specific dataset configurations, as provided in the official GitHub repository<xref ref-type="fn" rid="FN3">
<sup>3</sup>
</xref>;</p>
</list-item>
<list-item>
<p>&#x2022; minCUT&#x2014;learning rate <inline-formula id="inf140">
<mml:math id="m146">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mtext>e</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, same architecture as reported by the authors in the original work (<xref ref-type="bibr" rid="B3">Bianchi et&#x20;al., 2019</xref>);</p>
</list-item>
<list-item>
<p>&#x2022; DiffPool&#x2014;learning rate <inline-formula id="inf141">
<mml:math id="m147">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mtext>e</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, hidden size 32, two pooling steps, pooling ratio <inline-formula id="inf142">
<mml:math id="m148">
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> for D&#x26;D, Proteins, Collab and Reddit-Binary and <inline-formula id="inf143">
<mml:math id="m149">
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.25</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> for Mutag, NCI1, IMDB-Binary, IMDB-Multi and Reddit-Multi-5K, global average mean readout layer, with the exception of Collab and Reddit-Binary, where the hidden size was&#x20;128;</p>
</list-item>
<list-item>
<p>&#x2022; Flat: hidden size&#x20;32.</p>
</list-item>
</list>
</p>
</sec>
<sec id="s6-5">
<title>6.5 Pooling Results</title>
<p>The graph classification performance obtained by these models is reported in <xref ref-type="table" rid="T1">Table&#x20;1</xref>. We observe that MPR ranks either first or second on all social datasets, or achieves accuracy scores within <inline-formula id="inf144">
<mml:math id="m150">
<mml:mrow>
<mml:mn>0.5</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> of the best-performing model. This result confirms that PageRank-based pooling exploits the power-law distributions in this domain. The performance of DMP is similar on social data and generally higher on molecular graphs. We attribute this to the fact that all nodes in molecular graphs tend to have a similar PageRank score&#x2014;MPR is therefore likely to assign all nodes to one cluster, effectively performing a readout. In this domain, DMP performs particularly well on Mutag, where it is second-best and improves by <inline-formula id="inf145">
<mml:math id="m151">
<mml:mrow>
<mml:mn>3.7</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> over MPR, showing the benefits of having a differentiable lens in challenging data settings. Overall, MPR achieves the best accuracy on two datasets (D&#x26;D, Collab) and the next best result on three more (Proteins, Reddit-Binary and Reddit-Multi-5k). DMP improves on MPR by less than 1% on NCI1, Proteins, IMDB-Binary and IMDB-Multi, showing the perhaps surprising strength of the simple, fixed-lens MPR pooling operator.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Results obtained on classification benchmarks. Accuracy measures with 95% confidence intervals are reported. The highest result is bolded and the second highest is underlined. The first four columns are molecular graph datasets, while the others are social graphs. Our models perform competitively with other state-of-the-art models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Model</th>
<th align="center">D&#x26;D</th>
<th align="center">Mutag</th>
<th align="center">NCI1</th>
<th align="center">Proteins</th>
<th align="center">Collab</th>
<th align="center">IMDB-B</th>
<th align="center">IMDB-M</th>
<th align="center">Reddit-B</th>
<th align="center">Reddit-5k</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">DMP (ours)</td>
<td align="center">
<inline-formula id="inf146">
<mml:math id="m152">
<mml:mrow>
<mml:mn>77.3</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.6</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf147">
<mml:math id="m153">
<mml:mrow>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mn>84.0</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>8.6</mml:mn>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf148">
<mml:math id="m154">
<mml:mrow>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mn>70.4</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>4.2</mml:mn>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf149">
<mml:math id="m155">
<mml:mrow>
<mml:mn>75.3</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.3</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf150">
<mml:math id="m156">
<mml:mrow>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mn>81.4</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.2</mml:mn>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf151">
<mml:math id="m157">
<mml:mrow>
<mml:mn>73.8</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>4.5</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf152">
<mml:math id="m158">
<mml:mrow>
<mml:mn>50.9</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.5</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf153">
<mml:math id="m159">
<mml:mrow>
<mml:mn>86.2</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>6.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf154">
<mml:math id="m160">
<mml:mrow>
<mml:mn>51.9</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">MPR (ours)</td>
<td align="center">
<inline-formula id="inf155">
<mml:math id="m161">
<mml:mrow>
<mml:mn>78.2</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.4</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf156">
<mml:math id="m162">
<mml:mrow>
<mml:mn>80.3</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>6.0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf157">
<mml:math id="m163">
<mml:mrow>
<mml:mn>69.8</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf158">
<mml:math id="m164">
<mml:mrow>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mn>75.2</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.2</mml:mn>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf159">
<mml:math id="m165">
<mml:mrow>
<mml:mn>81.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf160">
<mml:math id="m166">
<mml:mrow>
<mml:mn>73.4</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.7</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf161">
<mml:math id="m167">
<mml:mrow>
<mml:mn>50.6</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf162">
<mml:math id="m168">
<mml:mrow>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mn>86.3</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>4.8</mml:mn>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf163">
<mml:math id="m169">
<mml:mrow>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mn>52.3</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.6</mml:mn>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">Top-<italic>k</italic>
</td>
<td align="center">
<inline-formula id="inf164">
<mml:math id="m170">
<mml:mrow>
<mml:mn>75.1</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf165">
<mml:math id="m171">
<mml:mrow>
<mml:mn>82.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>6.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf166">
<mml:math id="m172">
<mml:mrow>
<mml:mn>67.9</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.3</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf167">
<mml:math id="m173">
<mml:mrow>
<mml:mn>74.8</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf168">
<mml:math id="m174">
<mml:mrow>
<mml:mn>75.0</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf169">
<mml:math id="m175">
<mml:mrow>
<mml:mn>69.6</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf170">
<mml:math id="m176">
<mml:mrow>
<mml:mn>45.0</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf171">
<mml:math id="m177">
<mml:mrow>
<mml:mn>79.4</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>7.4</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf172">
<mml:math id="m178">
<mml:mrow>
<mml:mn>48.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">minCUT</td>
<td align="center">
<inline-formula id="inf173">
<mml:math id="m179">
<mml:mrow>
<mml:mn>77.6</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf174">
<mml:math id="m180">
<mml:mrow>
<mml:mn>82.9</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>6.0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf175">
<mml:math id="m181">
<mml:mrow>
<mml:mn>68.8</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf176">
<mml:math id="m182">
<mml:mrow>
<mml:mn>73.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.9</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf177">
<mml:math id="m183">
<mml:mrow>
<mml:mn>79.9</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf178">
<mml:math id="m184">
<mml:mrow>
<mml:mn>70.7</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.5</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf179">
<mml:math id="m185">
<mml:mrow>
<mml:mn>50.6</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf180">
<mml:math id="m186">
<mml:mrow>
<mml:mn>87.2</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>5.0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf181">
<mml:math id="m187">
<mml:mrow>
<mml:mn>52.9</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.3</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">DiffPool</td>
<td align="center">
<inline-formula id="inf182">
<mml:math id="m188">
<mml:mrow>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mn>77.9</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.4</mml:mn>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf183">
<mml:math id="m189">
<mml:mrow>
<mml:mn>94.7</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>7.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf184">
<mml:math id="m190">
<mml:mrow>
<mml:mn>68.1</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf185">
<mml:math id="m191">
<mml:mrow>
<mml:mn>74.2</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.3</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf186">
<mml:math id="m192">
<mml:mrow>
<mml:mn>81.3</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf187">
<mml:math id="m193">
<mml:mrow>
<mml:mn>72.4</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf188">
<mml:math id="m194">
<mml:mrow>
<mml:mn>50.3</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf189">
<mml:math id="m195">
<mml:mrow>
<mml:mn>79.0</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf190">
<mml:math id="m196">
<mml:mrow>
<mml:mn>50.4</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.7</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">WL</td>
<td align="center">
<inline-formula id="inf191">
<mml:math id="m197">
<mml:mrow>
<mml:mn>77.4</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.6</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf192">
<mml:math id="m198">
<mml:mrow>
<mml:mn>74.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>6.5</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf193">
<mml:math id="m199">
<mml:mrow>
<mml:mn>76.4</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.7</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf194">
<mml:math id="m200">
<mml:mrow>
<mml:mn>74.7</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf195">
<mml:math id="m201">
<mml:mrow>
<mml:mn>78.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf196">
<mml:math id="m202">
<mml:mrow>
<mml:mn>72.1</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>3.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf197">
<mml:math id="m203">
<mml:mrow>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mn>50.7</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.9</mml:mn>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf198">
<mml:math id="m204">
<mml:mrow>
<mml:mn>66.7</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>10.4</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf199">
<mml:math id="m205">
<mml:mrow>
<mml:mn>49.2</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.4</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">Flat</td>
<td align="center">
<inline-formula id="inf200">
<mml:math id="m206">
<mml:mrow>
<mml:mn>69.9</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf201">
<mml:math id="m207">
<mml:mrow>
<mml:mn>71.8</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>4.3</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf202">
<mml:math id="m208">
<mml:mrow>
<mml:mn>65.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.7</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf203">
<mml:math id="m209">
<mml:mrow>
<mml:mn>70.2</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.6</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf204">
<mml:math id="m210">
<mml:mrow>
<mml:mn>80.9</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.4</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf205">
<mml:math id="m211">
<mml:mrow>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mn>73.6</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>4.2</mml:mn>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf206">
<mml:math id="m212">
<mml:mrow>
<mml:mn>48.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.4</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf207">
<mml:math id="m213">
<mml:mrow>
<mml:mn>70.0</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>10.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf208">
<mml:math id="m214">
<mml:mrow>
<mml:mn>49.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.7</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">Avg-MLP</td>
<td align="center">
<inline-formula id="inf209">
<mml:math id="m215">
<mml:mrow>
<mml:mn>63.7</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.4</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf210">
<mml:math id="m216">
<mml:mrow>
<mml:mn>69.1</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>5.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf211">
<mml:math id="m217">
<mml:mrow>
<mml:mn>55.7</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf212">
<mml:math id="m218">
<mml:mrow>
<mml:mn>61.8</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.7</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf213">
<mml:math id="m219">
<mml:mrow>
<mml:mn>74.8</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.3</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf214">
<mml:math id="m220">
<mml:mrow>
<mml:mn>71.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.9</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf215">
<mml:math id="m221">
<mml:mrow>
<mml:mn>49.5</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2.2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf216">
<mml:math id="m222">
<mml:mrow>
<mml:mn>53.6</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>6.2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf217">
<mml:math id="m223">
<mml:mrow>
<mml:mn>45.9</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1.6</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s7">
<title>7 Mapper for Visualisations</title>
<p>Graph pooling methods and summarized graph visualisation methods can be seen as two sides of the same coin, since both aim to condense the information in the graph. We now turn our attention to the latter.</p>
<sec id="s7-1">
<title>7.1 Visualisations in Supervised Learning</title>
<p>The first application of DGM is in a supervised learning context,&#x20;where <inline-formula id="inf218">
<mml:math id="m224">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is trained via a cross entropy loss function to classify&#x20;the&#x20;nodes of the graph. When the classification is binary, <inline-formula id="inf219">
<mml:math id="m225">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> outputs the probability that a node belongs to the&#x20;positive class. This probability acts directly as the&#x20;parameterization of the graph nodes. An example is shown in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> (left) for a synthetic dataset: a network formed of spammers and non-spammers. Spammers are highly connected to many other nodes in the network, whereas non-spammers generally have fewer neighbors. For the lens function, we use a Graph Convolutional Network (GCN) (<xref ref-type="bibr" rid="B24">Kipf and Welling, 2016</xref>) with four layers (with <inline-formula id="inf220">
<mml:math id="m226">
<mml:mrow>
<mml:mn>32,64,128,128</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hidden units) and ReLU activations trained to classify the nodes of the graph. For the spammer graph, the lens is given by the predicted spam probability of each node and the cover consists of 10 intervals over <inline-formula id="inf221">
<mml:math id="m227">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, with <inline-formula id="inf222">
<mml:math id="m228">
<mml:mrow>
<mml:mn>10</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> overlap.</p>
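<p>The 1D cover described above can be sketched directly: 10 equal-length intervals spanning [0, 1], each pair of neighbours sharing 10% of their length, with nodes assigned by their lens value. The construction below is our reading of the text (interval length and spacing derived from the interval count and overlap fraction), not the authors' exact code:</p>

```python
# Build an overlapping interval cover of [0, 1] and assign each node to the
# intervals containing its lens value (here, a toy vector of predicted spam
# probabilities standing in for the GCN output).
import numpy as np

def interval_cover(lens, n_intervals=10, overlap=0.1):
    """Map each lens value to the indices of the overlapping intervals containing it."""
    length = 1.0 / (n_intervals - (n_intervals - 1) * overlap)  # interval length
    step = length * (1 - overlap)                               # spacing of left ends
    starts = np.arange(n_intervals) * step
    ends = starts + length                                      # last end lands at 1.0
    return [np.flatnonzero((starts <= x) & (x <= ends)) for x in lens]

lens = np.array([0.02, 0.48, 0.50, 0.97])  # toy spam probabilities
for node, cells in enumerate(interval_cover(lens)):
    print(node, cells.tolist())  # nodes near interval boundaries land in two cells
```

Nodes whose lens value falls in an overlap region belong to two intervals; these shared nodes are what create edges between the resulting Mapper clusters.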
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>SDGM visualization using as a lens function the GNN-predicted probability that a node in the network is a spammer. Panel <bold>(A)</bold> is colored with the average predicted spam probability in each cluster, whereas panel <bold>(B)</bold> is colored by the proportion of true spammers in each&#x20;node.</p>
</caption>
<graphic xlink:href="fdata-04-680535-g003.tif"/>
</fig>
<p>Through the central cluster node, the SDGM visualization correctly shows how spammers occupy an essential place in the network, while non-spammers tend to form many smaller disconnected communities. When labels are available, we also produce visualisations augmented with ground-truth information. These visualisations can provide a label-driven understanding of the graph. For instance, in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> (right) we color each node of the SDGM visualization according to the most frequent class in the corresponding cluster. This second visualization, augmented with the ground-truth information, can also be used to compare with the model predictions.</p>
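<p>The ground-truth coloring described above reduces to a majority vote over the labels inside each cluster. A minimal sketch, with made-up cluster contents rather than real data:</p>

```python
# Color each Mapper node by the most frequent ground-truth class among the
# graph nodes in its cluster. Cluster contents here are invented for
# illustration only.
from collections import Counter

clusters = {0: ["spam", "spam", "ham"], 1: ["ham", "ham", "ham", "spam"]}
majority = {node: Counter(labels).most_common(1)[0][0]
            for node, labels in clusters.items()}
print(majority)  # {0: 'spam', 1: 'ham'}
```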
</sec>
<sec id="s7-2">
<title>7.2 Visualization in Unsupervised Learning</title>
<p>The second application corresponds to an unsupervised learning scenario, where the challenge is obtaining a parameterization of the graph in the absence of labels. This is the typical use case for unsupervised graph representation learning models (<xref ref-type="bibr" rid="B9">Chami et&#x20;al., 2020</xref>). The approach we follow is to train a model to learn node embeddings in <inline-formula id="inf223">
<mml:math id="m229">
<mml:mrow>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>d</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> (in our experiments, <inline-formula id="inf224">
<mml:math id="m230">
<mml:mrow>
<mml:msup>
<mml:mi>d</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>512</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>), which can be reduced, as before, to a low-dimensional space via a dimensionality reduction method <italic>r</italic>. Unsupervised visualisations can be found in the qualitative evaluation in <xref ref-type="sec" rid="s7-3">Section&#x20;7.3</xref>.</p>
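<p>The unsupervised pipeline above is an embedding model followed by a dimensionality-reduction map <italic>r</italic>. A minimal stand-in, with random vectors in place of trained 512-dimensional node embeddings and PCA via SVD in place of the <italic>t</italic>-SNE used later (both substitutions are ours, chosen to keep the sketch self-contained):</p>

```python
# Reduce 512-d node "embeddings" to a 2D lens space with a simple PCA:
# center the rows, take the SVD, and project onto the top two right-singular
# vectors. The real pipeline would use trained embeddings and t-SNE.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 512))  # one 512-d vector per node

def pca_2d(x):
    """Project rows of x onto their top-2 principal components."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

lens_values = pca_2d(embeddings)  # the 2D space over which the cover is built
print(lens_values.shape)  # (300, 2)
```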
</sec>
<sec id="s7-3">
<title>7.3 Qualitative Evaluation</title>
<p>In this section, we qualitatively compare SDGM against the two best-performing graph theoretic lens functions proposed by <xref ref-type="bibr" rid="B20">Hajij et&#x20;al. (2018)</xref>, on the Cora and CiteSeer (<xref ref-type="bibr" rid="B35">Sen et&#x20;al., 2008</xref>) and PubMed (<xref ref-type="bibr" rid="B45">Yang et&#x20;al., 2016</xref>) citation networks. Namely, we compare against a PageRank (<xref ref-type="bibr" rid="B32">Page et&#x20;al., 1999</xref>) lens function and a graph density function <inline-formula id="inf225">
<mml:math id="m231">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, where <italic>D</italic> is the distance matrix of the graph. For SDGM, we use a composition of an unsupervised Deep Graph Infomax (DGI) (<xref ref-type="bibr" rid="B42">Veli&#x10d;kovi&#x107; et&#x20;al., 2018</xref>) model <inline-formula id="inf226">
<mml:math id="m232">
<mml:mrow>
<mml:msub>
<mml:mi>g</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mrow>
<mml:mn>512</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> and a dimensionality reduction function <inline-formula id="inf227">
<mml:math id="m233">
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>:</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mrow>
<mml:mn>512</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2192;</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> based on <italic>t</italic>-SNE. To aid the comparison, we mark the nodes with the color of the most frequent class in the corresponding cluster. Additionally, we include a Graphviz (<xref ref-type="bibr" rid="B16">Gansner and North, 2000</xref>) plot of the full graph. We carefully fine-tuned the covers for each combination of model and&#x20;graph.</p>
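<p>The graph density lens compared against above, <italic>f</italic>(<italic>v</italic>) = &#x2211;<sub><italic>u</italic>&#x2208;<italic>V</italic></sub> exp(&#x2212;<italic>D</italic>(<italic>u</italic>, <italic>v</italic>)/&#x3b4;), can be computed directly from the all-pairs shortest-path distance matrix. A minimal sketch on a 5-node path graph, whose distance matrix is simply |<italic>i</italic> &#x2212; <italic>j</italic>|; the graph and &#x3b4; are arbitrary illustrative choices:</p>

```python
# Compute the graph density lens f(v) = sum_u exp(-D(u, v) / delta) from a
# precomputed all-pairs shortest-path distance matrix D.
import numpy as np

def density_lens(dist, delta=1.0):
    """Per-node density scores from a distance matrix."""
    return np.exp(-dist / delta).sum(axis=1)

n = 5
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # path graph 0-1-2-3-4
f = density_lens(dist)
print(np.argmax(f))  # the central node of the path is the densest
```

As expected for a density function, the node in the middle of the path, which is closest on average to all others, receives the highest score.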
<p>As depicted by <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>, SDGM successfully summarizes many of the properties of the graphs that are also reflected by full graph visualisations. For instance, on Cora, Genetic Algorithms (in dark orange) are shown to be primarily connected to Reinforcement Learning (orange). At the same time, related classes that largely overlap in the full visualisation&#x2014;Probabilistic Methods and Neural Networks (NNs) on Cora or Information Retrieval (IR) and ML on CiteSeer&#x2014;are connected in the SDGM plot. In contrast, the baselines do not have the same level of granularity and fail to capture many such properties. Both PageRank and the graph density function tend to focus on the classes with the highest number of nodes, such as the IR class on CiteSeer or the NNs class on Cora, while largely de-emphasizing other classes.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Qualitative comparison between SDGM (first column), Mapper with an RBF graph density function (<xref ref-type="bibr" rid="B20">Hajij et&#x20;al., 2018</xref>) (second), and Mapper with a PageRank function (<xref ref-type="bibr" rid="B20">Hajij et&#x20;al., 2018</xref>) (third). Graphviz visualisations of the full graphs (fourth column) are added for reference. The rows show plots for Cora, CiteSeer, and PubMed, respectively. The graphs are colored based on the most frequent class in each cluster to aid the comparison. SDGM with an unsupervised lens makes all dataset classes appear more clearly separated in the visualization. This does not happen in the baseline visualisations, which mainly focus on the class with the highest number of nodes in each&#x20;graph.</p>
</caption>
<graphic xlink:href="fdata-04-680535-g004.tif"/>
</fig>
<sec id="s7-3-1">
<title>7.3.1 Limitations</title>
<p>The proposed visualisations also present certain limitations. In an unsupervised learning setting, in the absence of any labels or attributes for coloring the graph, the nodes have to be colored based on a colourmap associated with the abstract embedding space, thus affecting the interpretability of the visualisations. In contrast, even though the graph theoretic lens functions produce lower quality visualisations, their semantics are clearly understood mathematically. This is, however, a drawback shared even by some of the most widely used data visualization methods, such as <italic>t</italic>-SNE or UMAP (<xref ref-type="bibr" rid="B30">McInnes et&#x20;al., 2018</xref>). In what follows, we present additional visualisations and ablation studies.</p>
</sec>
</sec>
<sec id="s7-4">
<title>7.4 Ablation Study for Dimensionality Reduction</title>
<p>We study how the choice of the dimensionality reduction method for the unsupervised visualisations affects the output. To test this, we consider the following dimensionality reduction methods: <italic>t</italic>-SNE (<xref ref-type="bibr" rid="B41">van der Maaten and Hinton, 2008</xref>), UMAP (<xref ref-type="bibr" rid="B30">McInnes et&#x20;al., 2018</xref>), IsoMap (<xref ref-type="bibr" rid="B39">Tenenbaum et&#x20;al., 2000</xref>) and PCA. We use the same model as in <xref ref-type="sec" rid="s7-2">Section 7.2</xref> and a cover of 2D cells for all models. The overlap was set after fine-tuning to 0.2 for <italic>t</italic>-SNE and UMAP, and to 0.1 for the other two models. <xref ref-type="fig" rid="F5">Figure&#x20;5</xref> displays the four visualisations. As expected, <italic>t</italic>-SNE and UMAP produce more visually pleasing outputs, due to their superior ability to capture variation in the GNN embedding space. However, the features highlighted by all visualisations are largely similar, generally indicating the same binary relations between clusters. This demonstrates that the visualisations of the GNN embedding space are robust to the choice of the dimensionality reduction method.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Ablation for dimensionality reduction methods; left&#x2013;right, top&#x2013;bottom: <italic>t</italic>-SNE, PCA, Isomap, UMAP. While <italic>t</italic>-SNE and UMAP produce slightly better visualisations, the graph features displayed by the visualisations are roughly consistent across all of the four dimensionality reduction techniques.</p>
</caption>
<graphic xlink:href="fdata-04-680535-g005.tif"/>
</fig>
</sec>
<sec id="s7-5">
<title>7.5 Ablation for the Unsupervised Lens</title>
<p>To better understand the impact of GNNs on improving the quality of the Mapper visualisations, we perform an ablation study on the type of unsupervised lens functions used within Mapper. The first model we consider is simply the identity function taking as input only graph features. The second model is a randomly initialized DGI model. Despite the apparent simplicity of a randomly initialized model, it was shown that such a method produces reasonably good embeddings, often outperforming other more sophisticated baselines (<xref ref-type="bibr" rid="B42">Veli&#x10d;kovi&#x107; et&#x20;al., 2018</xref>). Finally, we use our trained DGI model from <xref ref-type="sec" rid="s7-2">Section 7.2</xref>. For all models, we perform a <italic>t</italic>-SNE reduction of their embedding space to obtain a 2D output space and use 81 overlapping cells that cover this space. An overlap of 0.2 is used across all models.</p>
<p>The three resulting visualisations are depicted in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>. The identity model and the untrained DGI model do not manage to exploit the dataset structure, and neither performs particularly well. In contrast, the trained DGI model emphasizes all the classes in the visualization, together with their main interactions.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Ablation for different types of unsupervised lenses (identity, untrained DGI, trained DGI). The trained DGI model significantly improves the quality of the visualisations.</p>
</caption>
<graphic xlink:href="fdata-04-680535-g006.tif"/>
</fig>
</sec>
<sec id="s7-6">
<title>7.6 Hierarchical Visualisations</title>
<p>One of the most powerful features of Mapper is the ability to produce multi-resolution visualisations through the flexibility offered by the cover hyperparameters. As described in <xref ref-type="sec" rid="s4">Section 4</xref>, having a higher number of cells covering the output space results in more granular visualisations containing more nodes, while a higher overlap between these cells results in increased connectivity. We highlight these trade-offs in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>, where we visualize the Cora citation network using nine combinations of cells and overlaps. These kinds of hierarchical visualisations can help one identify the persistent features of the graph. For instance, when inspecting the plots that use <inline-formula id="inf228">
<mml:math id="m234">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>64</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> cells, the connections between the light blue class and the yellow class persist for all 3 degrees of overlap, which indicates that this is a persistent feature of the graph. In contrast, the connection between the red and orange classes is relatively reduced (<inline-formula id="inf229">
<mml:math id="m235">
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.25</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>) or none (<inline-formula id="inf230">
<mml:math id="m236">
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>) for low values of overlap, but it clearly appears at <inline-formula id="inf231">
<mml:math id="m237">
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.35</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> in the top-right corner, suggesting that the semantic similarity between the two classes is very scale-sensitive (that is, less persistent).</p>
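<p>The trade-off between the number of cover cells <italic>n</italic> and the overlap <italic>g</italic> can be illustrated in 2D. In the sketch below (our own construction, with an arbitrary grid size and random points standing in for lens values), each of the <italic>k</italic> &#xd7; <italic>k</italic> cells is enlarged by a fraction <italic>g</italic> of its width; points falling in more than one cell are what connect clusters, so their count grows with <italic>g</italic>:</p>

```python
# Assign 2D lens values to an overlapping k-by-k grid cover of [0, 1]^2 and
# count how many points are shared between cells for increasing overlap g.
import numpy as np

def cell_membership(points, k=4, g=0.25):
    """For 2D points in [0, 1]^2, list the overlapping grid cells each falls in."""
    width = 1.0 / k
    pad = g * width / 2  # grow each cell by g * width in total
    members = []
    for x, y in points:
        ix = np.flatnonzero((np.arange(k) * width - pad <= x)
                            & (x <= (np.arange(k) + 1) * width + pad))
        iy = np.flatnonzero((np.arange(k) * width - pad <= y)
                            & (y <= (np.arange(k) + 1) * width + pad))
        members.append([(i, j) for i in ix for j in iy])
    return members

pts = np.random.default_rng(1).uniform(size=(200, 2))
for g in (0.1, 0.25, 0.35):
    shared = sum(len(m) > 1 for m in cell_membership(pts, k=4, g=g))
    print(f"overlap g={g}: {shared} points in more than one cell")
```

Increasing <italic>k</italic> instead splits the points over more cells, yielding more Mapper nodes; this mirrors the granularity/connectivity trade-off discussed above.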
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Hierarchical visualisations of the Cora citation network using various numbers of cover cells and degrees of overlap. Rows <bold>(top&#x2013;bottom)</bold> use different overlaps (<italic>g</italic>) between intervals: <inline-formula id="inf232">
<mml:math id="m238">
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf233">
<mml:math id="m239">
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.25</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf234">
<mml:math id="m240">
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.35</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>; columns (left&#x2013;right): <inline-formula id="inf235">
<mml:math id="m241">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>16</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf236">
<mml:math id="m242">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>64</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf237">
<mml:math id="m243">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>256</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</caption>
<graphic xlink:href="fdata-04-680535-g007.tif"/>
</fig>
</sec>
<sec id="s7-7">
<title>7.7 The Importance of Capturing Structural Information</title>
<p>In this section, we revisit the synthetic spammer dataset to illustrate the importance of capturing structural information via the edge-refined pull back cover operator. To that end, we compare SDGM with a version using the usual refined pull back cover as in <xref ref-type="bibr" rid="B20">Hajij et&#x20;al. (2018)</xref>, while using the same lens function for both (a GCN classifier). We refer to the latter as DGM. The visualisations produced by the two models are included in <xref ref-type="fig" rid="F8">Figure&#x20;8</xref>. We note that while both models capture the large cluster of spammers at the center of the network and the smaller communities of non-spammers, DGM does not capture the structural relationships between spammers and non-spammers since it encodes only semantic relations.</p>
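<p>The distinction between the two constructions can be sketched as follows; this is an illustrative toy example under our own naming, not the authors' implementation. The semantic (DGM-style) construction links two Mapper nodes only when their clusters share vertices, while the structural (SDGM-style) construction additionally links them when an edge of the original graph runs between the clusters:</p>

```python
# Illustrative sketch (not the authors' code) of the two edge
# constructions discussed above. `clusters` maps each Mapper node id
# to the set of original graph vertices it contains.

def semantic_edges(clusters):
    """DGM-style edges: link two Mapper nodes only if their clusters
    share vertices, i.e. they overlap in the cover."""
    ids = sorted(clusters)
    return {(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if clusters[a] & clusters[b]}

def structural_edges(clusters, graph_edges):
    """SDGM-style extra edges: link two Mapper nodes whenever the
    original graph has an edge running between their clusters."""
    ids = sorted(clusters)
    return {(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if any((u in clusters[a] and v in clusters[b]) or
                   (u in clusters[b] and v in clusters[a])
                   for u, v in graph_edges)}

# Two disjoint clusters joined by a single graph edge: invisible to the
# semantic construction, visible to the structural one.
clusters = {0: {1, 2}, 1: {3, 4}}
graph_edges = [(2, 3)]
assert semantic_edges(clusters) == set()
assert structural_edges(clusters, graph_edges) == {(0, 1)}
```

<p>This is precisely the failure mode visible in <xref ref-type="fig" rid="F8">Figure&#x20;8</xref>: spammer and non-spammer clusters rarely overlap semantically, so only the structural construction recovers the links between them.</p>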
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>DGM <bold>(A)</bold> vs SDGM <bold>(B)</bold> visualisation of the synthetic spammer dataset. DGM does not capture important relational information between spammers and non-spammers.</p>
</caption>
<graphic xlink:href="fdata-04-680535-g008.tif"/>
</fig>
</sec>
</sec>
<sec id="s8">
<title>8 Conclusion</title>
<p>We have introduced Deep Graph Mapper, a topologically grounded method for producing informative graph visualisations with the help of GNNs. We have shown that these visualisations are not only useful for understanding various graph properties, but can also aid in visually identifying classification mistakes. Additionally, we have proved that Mapper is a generalisation of soft cluster assignment methods, effectively providing a bridge between graph pooling and the TDA literature. Based on this connection, we have proposed two Mapper-based pooling operators: a simple one that scores nodes using PageRank and a differentiable one that uses RBF kernels to simulate the cover. Our experiments show that both layers yield architectures competitive with several state-of-the-art methods on graph classification benchmarks.</p>
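<p>As an illustration of the differentiable operator mentioned above, the following hedged sketch shows how an RBF kernel can relax a hard interval cover into a soft, row-stochastic cluster assignment through which gradients can flow; the kernel scale and all names are our own illustrative assumptions, not the paper's exact formulation:</p>

```python
# Hedged sketch (not the paper's implementation): a soft cover built by
# assigning each node's lens value to every interval via an RBF kernel
# centred on that interval, yielding a differentiable assignment matrix.
import numpy as np

def soft_cover_assignment(lens, n_intervals=8, delta=0.1):
    """Return an (N, n_intervals) row-stochastic assignment matrix."""
    centres = np.linspace(lens.min(), lens.max(), n_intervals)
    # RBF affinity of every lens value to every interval centre.
    k = np.exp(-((lens[:, None] - centres[None, :]) ** 2) / delta)
    return k / k.sum(axis=1, keepdims=True)  # normalise per node

lens = np.random.rand(50)
s = soft_cover_assignment(lens)
assert np.allclose(s.sum(axis=1), 1.0)       # each node fully assigned
```

<p>Because each row is a differentiable function of the lens value, gradients can propagate through the assignment, in contrast with a hard cover where membership is a discrete choice.</p>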
</sec>
</body>
<back>
<sec id="s9">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.</p>
</sec>
<sec id="s10">
<title>Author Contributions</title>
<p>CB and CC contributed equally to designing the model. CB performed the visualisation experiments and proved the theoretical results. CC performed the pooling experiments. PL is the senior author.</p>
</sec>
<sec id="s12">
<title>Funding</title>
<p>CC is supported by EPSRC NRAG/465 NERC CDT Dream (grant no. NE/M009009/1). PL is funded by EPSRC.</p>
</sec>
<sec sec-type="COI-statement" id="s11">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ack>
<p>We would like to thank both reviewers for their constructive feedback and useful iterations on the manuscript. We would like to thank Petar Veli&#x10d;kovi&#x107;, Ben Day, Felix Opolka, Simeon Spasov, Alessandro Di Stefano, Duo Wang, Jacob Deasy, Ramon Vi&#xf1;as, Alex Dumitru and Teodora Reu for their constructive comments. We are also grateful to Teo Stoleru for helping with the diagrams.</p>
</ack>
<fn-group>
<fn id="FN1">
<label>1</label>
<p>Code to reproduce models and experimental results is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/crisbodnar/dgm">https://github.com/crisbodnar/dgm</ext-link>.</p>
</fn>
<fn id="FN2">
<label>2</label>
<p>Note that one or more {embedding <inline-formula id="inf238">
<mml:math id="m244">
<mml:mo>&#x2192;</mml:mo>
</mml:math>
</inline-formula> pooling} operations may be sequentially performed in the pipeline.</p>
</fn>
<fn id="FN3">
<label>3</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://github.com/HongyangGao/Graph-U-Nets/blob/48aa171b16964a2466fceaf4cb06fc940d649294/run_GUNet.sh">https://github.com/HongyangGao/Graph-U-Nets/blob/48aa171b16964a2466fceaf4cb06fc940d649294/run_GUNet.sh</ext-link>
</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Batagelj</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Didimo</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Liotta</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Palladino</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Patrignani</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Visual Analysis of Large Graphs Using (X, Y)-Clustering and Hybrid Visualizations</article-title>. In <conf-name>IEEE Pacific Visualization Symposium (PacificVis)</conf-name>. <fpage>209</fpage>&#x2013;<lpage>216</lpage>. </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beck</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Burch</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Diehl</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Weiskopf</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A Taxonomy and Survey of Dynamic Graph Visualization</article-title>. <source>Comp. Graphics Forum</source> <volume>36</volume>, <fpage>133</fpage>&#x2013;<lpage>159</lpage>. <pub-id pub-id-type="doi">10.1111/cgf.12791</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bianchi</surname>
<given-names>F. M.</given-names>
</name>
<name>
<surname>Grattarola</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Alippi</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Mincut Pooling in Graph Neural Networks</source> (<comment>arXiv preprint arXiv:1907.00481</comment>).</citation>
</ref>
<ref id="B4">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bodnar</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Frasca</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y. G.</given-names>
</name>
<name>
<surname>Otter</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Mont&#xfa;far</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Li&#xf2;</surname>
<given-names>P.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <source>Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks</source> (<comment>arXiv preprint arXiv:2103.03212</comment>).</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bruna</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zaremba</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Szlam</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lecun</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Spectral Networks and Locally Connected Networks on Graphs</article-title>. <source>ICLR</source>. </citation>
</ref>
<ref id="B6">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Cangea</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Veli&#x10d;kovi&#x107;</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Jovanovi&#x107;</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Kipf</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Li&#xf2;</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2018</year>). <source>Towards Sparse Hierarchical Graph Classifiers</source> (<comment>arXiv preprint arXiv:1811.01287</comment>).</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carriere</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Michel</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Oudot</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Statistical Analysis and Parameter Selection for Mapper</article-title>. <source>J.&#x20;Machine Learn. Res.</source> <volume>19</volume>, <fpage>478</fpage>&#x2013;<lpage>516</lpage>. </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carri&#xe8;re</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Oudot</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Structure and Stability of the One-Dimensional Mapper</article-title>. <source>Found. Comput. Math.</source> <volume>18</volume>, <fpage>1333</fpage>&#x2013;<lpage>1396</lpage>. <pub-id pub-id-type="doi">10.1007/s10208-017-9370-z</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Chami</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Abu-El-Haija</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Perozzi</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>R&#xe9;</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Machine Learning on Graphs: A Model and Comprehensive Taxonomy</source> (<comment>arXiv preprint arXiv:2005.03675</comment>).</citation>
</ref>
<ref id="B10">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Chazal</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Michel</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2017</year>). <source>An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists</source> (<comment>arXiv preprint arXiv:1710.04019</comment>).</citation>
</ref>
<ref id="B11">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Demmel</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1995</year>). <source>UC Berkeley CS267 - Lecture 20: Partitioning Graphs without Coordinate Information II</source>.</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dey</surname>
<given-names>T. K.</given-names>
</name>
<name>
<surname>M&#xe9;moli</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Topological Analysis of Nerves, Reeb Spaces, Mappers, and Multiscale Mappers</article-title>. In <conf-name>Symposium on Computational Geometry</conf-name>. </citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Dunne</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Shneiderman</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Motif Simplification</article-title>. In <conf-name>Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI &#x2019;13)</conf-name>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>, <fpage>3247</fpage>&#x2013;<lpage>3256</lpage>. <pub-id pub-id-type="doi">10.1145/2470654.2466444</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dwyer</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Riche</surname>
<given-names>N. H.</given-names>
</name>
<name>
<surname>Marriott</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Mears</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Edge Compression Techniques for Visualization of Dense Directed Graphs</article-title>. <source>IEEE Trans. Vis. Comput. Graphics</source> <volume>19</volume>, <fpage>2596</fpage>&#x2013;<lpage>2605</lpage>. <pub-id pub-id-type="doi">10.1109/TVCG.2013.151</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fiedler</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>1973</year>). <article-title>Algebraic Connectivity of Graphs</article-title>. <source>Czech. Math. J.</source> <volume>23</volume>, <fpage>298</fpage>&#x2013;<lpage>305</lpage>. <pub-id pub-id-type="doi">10.21136/cmj.1973.101168</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gansner</surname>
<given-names>E. R.</given-names>
</name>
<name>
<surname>North</surname>
<given-names>S. C.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>An Open Graph Visualization System and its Applications to Software Engineering</article-title>. <source>Softw. Pract. Exper.</source> <volume>30</volume>, <fpage>1203</fpage>&#x2013;<lpage>1233</lpage>. <pub-id pub-id-type="doi">10.1002/1097-024x(200009)30:11&#x3c;1203::aid-spe338&#x3e;3.0.co;2-n</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Graph U-Nets</article-title>. In <conf-name>International Conference on Machine Learning</conf-name>, <fpage>2083</fpage>&#x2013;<lpage>2092</lpage>. </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goller</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Kuchler</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>1996</year>). <article-title>Learning Task-dependent Distributed Representations by Backpropagation through Structure</article-title>. <source>ICNN</source>. </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gori</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Monfardini</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Scarselli</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>A New Model for Learning in Graph Domains</article-title>. <source>ICNN</source>. </citation>
</ref>
<ref id="B20">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hajij</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Rosen</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2018</year>). <source>Mapper on Graphs for Network Visualization</source>.</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism</article-title>. In <conf-name>Proceedings of the IEEE International Conference on Computer Vision</conf-name>. <fpage>6480</fpage>&#x2013;<lpage>6489</lpage>. </citation>
</ref>
<ref id="B22">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kersting</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Kriege</surname>
<given-names>N. M.</given-names>
</name>
<name>
<surname>Morris</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Mutzel</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Neumann</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Benchmark Data Sets for Graph Kernels</source>.</citation>
</ref>
<ref id="B23">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kingma</surname>
<given-names>D. P.</given-names>
</name>
<name>
<surname>Ba</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2014</year>). <source>Adam: A Method for Stochastic Optimization</source> (<comment>arXiv preprint arXiv:1412.6980</comment>).</citation>
</ref>
<ref id="B24">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kipf</surname>
<given-names>T. N.</given-names>
</name>
<name>
<surname>Welling</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Semi-Supervised Classification with Graph Convolutional Networks</source> (<comment>arXiv preprint arXiv:1609.02907</comment>).</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Self-Attention Graph Pooling</article-title>. In <conf-name>International Conference on Machine Learning</conf-name>. <fpage>3734</fpage>&#x2013;<lpage>3743</lpage>. </citation>
</ref>
<ref id="B26">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Leskovec</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). <source>CS224W: Social and Information Network Analysis - Graph Clustering</source>.</citation>
</ref>
<ref id="B27">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tarlow</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Brockschmidt</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zemel</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2015</year>). <source>Gated Graph Sequence Neural Networks</source> (<comment>arXiv preprint arXiv:1511.05493</comment>).</citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Luzhnica</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Day</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Lio</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Clique Pooling for Graph Classification</source> (<comment>arXiv preprint arXiv:1904.00374</comment>).</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ma</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Aggarwal</surname>
<given-names>C. C.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Graph Convolutional Networks with EigenPooling</article-title>. In <conf-name>Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &#x26; Data Mining</conf-name>. <fpage>723</fpage>&#x2013;<lpage>731</lpage>. </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McInnes</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Healy</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Melville</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction</article-title>. <comment>ArXiv e-prints</comment>. </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nobre</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Streit</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Lex</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>The State of the Art in Visualizing Multivariate Networks</article-title>. <source>Comp. Graphics Forum</source> <volume>38</volume>, <fpage>807</fpage>&#x2013;<lpage>832</lpage>. <pub-id pub-id-type="doi">10.1111/cgf.13728</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Page</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Brin</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Motwani</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Winograd</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>1999</year>). <source>
<italic>The PageRank Citation Ranking: Bringing Order To the Web.</italic> Tech. Rep.</source> <publisher-loc>Stanford InfoLab</publisher-loc>.</citation>
</ref>
<ref id="B33">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ranjan</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Sanyal</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Talukdar</surname>
<given-names>P. P.</given-names>
</name>
</person-group> (<year>2019</year>). <source>ASAP: Adaptive Structure Aware Pooling for Learning Hierarchical Graph Representations</source> (<comment>arXiv preprint arXiv:1911.07979</comment>).</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Scarselli</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Gori</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tsoi</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Hagenbuchner</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Monfardini</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Computational Capabilities of Graph Neural Networks</article-title>. <source>IEEE Trans. Neural Netw.</source> <volume>20</volume>, <fpage>81</fpage>&#x2013;<lpage>102</lpage>. <pub-id pub-id-type="doi">10.1109/TNN.2008.2005141</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sen</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Namata</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Bilgic</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Getoor</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Galligher</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Eliassi-Rad</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Collective Classification in Network Data</article-title>. <source>AIMag</source> <volume>29</volume>, <fpage>93</fpage>. <pub-id pub-id-type="doi">10.1609/aimag.v29i3.2157</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shervashidze</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Schweitzer</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Van Leeuwen</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Mehlhorn</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Borgwardt</surname>
<given-names>K. M.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Weisfeiler-Lehman Graph Kernels</article-title>. <source>J.&#x20;Machine Learn. Res.</source> <volume>12</volume>, <fpage>2539</fpage>&#x2013;<lpage>2561</lpage>. </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Singh</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>M&#xe9;moli</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Carlsson</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition</article-title>. <source>SPBG</source>, <fpage>91</fpage>&#x2013;<lpage>100</lpage>. </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sperduti</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>1994</year>). <article-title>Encoding Labeled Graphs by Labeling Raam</article-title>. In <conf-name>NIPS</conf-name>. </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tenenbaum</surname>
<given-names>J.&#x20;B.</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>V. d.</given-names>
</name>
<name>
<surname>Langford</surname>
<given-names>J.&#x20;C.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>A Global Geometric Framework for Nonlinear Dimensionality Reduction</article-title>. <source>Science</source> <volume>290</volume>, <fpage>2319</fpage>&#x2013;<lpage>2323</lpage>. <pub-id pub-id-type="doi">10.1126/science.290.5500.2319</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van den Elzen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>van Wijk</surname>
<given-names>J.&#x20;J.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Multivariate Network Exploration and Presentation: From Detail to Overview via Selections and Aggregations</article-title>. <source>IEEE Trans. Vis. Comput. Graphics</source> <volume>20</volume>, <fpage>2310</fpage>&#x2013;<lpage>2319</lpage>. <pub-id pub-id-type="doi">10.1109/tvcg.2014.2346441</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van der Maaten</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Visualizing Data Using T-SNE</article-title>. <source>J.&#x20;Machine Learn. Res.</source> <volume>9</volume>, <fpage>2579</fpage>&#x2013;<lpage>2605</lpage>. </citation>
</ref>
<ref id="B42">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Veli&#x10d;kovi&#x107;</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Fedus</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Hamilton</surname>
<given-names>W. L.</given-names>
</name>
<name>
<surname>Li&#xf2;</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Hjelm</surname>
<given-names>R. D.</given-names>
</name>
</person-group> (<year>2018</year>). <source>Deep Graph Infomax</source> (<comment>arXiv preprint arXiv:1809.10341</comment>).</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>von Landesberger</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kuijper</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Schreck</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kohlhammer</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>van Wijk</surname>
<given-names>J.&#x20;J.</given-names>
</name>
<name>
<surname>Fekete</surname>
<given-names>J.-D.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>Visual Analysis of Large Graphs: State-Of-The-Art and Future Research Challenges</article-title>. <source>Comp.&#x20;Graphics&#x20;Forum</source> <volume>30</volume>, <fpage>1719</fpage>&#x2013;<lpage>1749</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-8659.2011.01898.x</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wattenberg</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Visual Exploration of Multivariate Graphs</article-title>. In <conf-name>Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI &#x2019;06)</conf-name>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>, <fpage>811</fpage>&#x2013;<lpage>819</lpage>. <pub-id pub-id-type="doi">10.1145/1124772.1124891</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>W. W.</given-names>
</name>
<name>
<surname>Salakhutdinov</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Revisiting Semi-supervised Learning with Graph Embeddings</article-title>. <source>ICML</source>. </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ying</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>You</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Morris</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Hamilton</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Leskovec</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Hierarchical Graph Representation Learning with Differentiable Pooling</article-title>. In <conf-name>Advances in Neural Information Processing Systems</conf-name>. <fpage>4800</fpage>&#x2013;<lpage>4810</lpage>. </citation>
</ref>
</ref-list>
<app-group>
<app id="app1">
<title>Appendix</title>
<sec>
<title>A Model Architecture and Hyperparameters.</title>
<p>We additionally performed a hyperparameter search for DiffPool on hidden sizes <inline-formula id="inf239">
<mml:math id="m245">
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mn>32</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>64</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>128</mml:mn>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and for DGM, over the following sets of possible values:<list list-type="simple">
<list-item>
<p>&#x2022; all datasets: cover sizes <inline-formula id="inf240">
<mml:math id="m246">
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>40</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>10</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>20</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, interval overlap <inline-formula id="inf241">
<mml:math id="m247">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>10</mml:mn>
<mml:mtext>%</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mn>25</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>;</p>
</list-item>
<list-item>
<p>&#x2022; D&#x26;D: learning rate <inline-formula id="inf242">
<mml:math id="m248">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>;</p>
</list-item>
<list-item>
<p>&#x2022; Proteins: learning rate <inline-formula id="inf243">
<mml:math id="m249">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mn>5</mml:mn>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, cover sizes <inline-formula id="inf244">
<mml:math id="m250">
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>24</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>6</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>16</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>4</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>12</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>3</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>8</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, hidden sizes <inline-formula id="inf245">
<mml:math id="m251">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>64,128</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
</list>
</p>
</sec>
</app>
</app-group>
</back>
</article>
