<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Phys.</journal-id>
<journal-title>Frontiers in Physics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Phys.</abbrev-journal-title>
<issn pub-type="epub">2296-424X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">763904</article-id>
<article-id pub-id-type="doi">10.3389/fphy.2021.763904</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Physics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Node Classification in Attributed Multiplex Networks Using Random Walk and Graph Convolutional Networks</article-title>
<alt-title alt-title-type="left-running-head">Han et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Attributed Multiplex Networks Node Classification</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Han</surname>
<given-names>Beibei</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1587116/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Wei</surname>
<given-names>Yingmei</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kang</surname>
<given-names>Lai</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1467638/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Qingyong</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1453990/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yang</surname>
<given-names>Yuxuan</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1463441/overview"/>
</contrib>
</contrib-group>
<aff>
<institution>College of Systems Engineering, National University of Defense Technology</institution>, <addr-line>Changsha</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1271072/overview">Shudong Li</ext-link>, Guangzhou University, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/101109/overview">Chengyi Xia</ext-link>, Tianjin University of Technology, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1587535/overview">Jin Huang</ext-link>, South China Normal University, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Yingmei Wei, <email>weiyingmei@nudt.edu.cn</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Social Physics, a section of the journal Frontiers in Physics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>01</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>9</volume>
<elocation-id>763904</elocation-id>
<history>
<date date-type="received">
<day>24</day>
<month>08</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>21</day>
<month>12</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Han, Wei, Kang, Wang and Yang.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Han, Wei, Kang, Wang and Yang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Node classification, a central task in graph data analysis, has been studied extensively using network embedding techniques for single-layer graph networks. However, extending single-layer network embedding techniques to attributed multiplex networks faces several obstacles. The classification of a given node in an attributed multiplex network must consider the network structure in different dimensions, the rich node attributes, and the correlations among the different dimensions. Moreover, the context information of distant nodes in each dimension also affects the classification of a given node. In this study, a novel network embedding approach for the node classification of attributed multiplex networks using random walk and graph convolutional networks (AMRG) is proposed. A random walk network embedding technique is used to extract distant node information, and the results are treated as pre-trained node features that are concatenated with the original node features and input into graph convolutional networks (GCNs) to learn node representations for each dimension. In addition, a consensus regularization is introduced to capture the similarities among different dimensions, and the learnable parameters of the GCNs for different dimensions are constrained by a regularization mechanism to capture the correlations. An attention mechanism is also used to infer the importance of a given node in different dimensions. Extensive experiments demonstrate that the proposed technique outperforms many competitive baselines on several real-world multiplex network datasets.</p>
</abstract>
<kwd-group>
<kwd>attributed multiplex network</kwd>
<kwd>node classification</kwd>
<kwd>random walk</kwd>
<kwd>graph convolutional networks</kwd>
<kwd>network embedding</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Node classification [<xref ref-type="bibr" rid="B1">1</xref>,<xref ref-type="bibr" rid="B2">2</xref>] is a basic and central task in graph data analysis, with applications such as user division in social networks [<xref ref-type="bibr" rid="B3">3</xref>] and paper classification in citation networks [<xref ref-type="bibr" rid="B4">4</xref>]. Network embedding techniques (also called network representation learning or graph embedding) represent each node with a dense low-dimensional vector [<xref ref-type="bibr" rid="B5">5</xref>&#x2013;<xref ref-type="bibr" rid="B7">7</xref>]. This provides an efficient way to solve various graph analytic problems, including node classification [<xref ref-type="bibr" rid="B5">5</xref>&#x2013;<xref ref-type="bibr" rid="B7">7</xref>], recommendation [<xref ref-type="bibr" rid="B8">8</xref>,<xref ref-type="bibr" rid="B9">9</xref>], and link prediction [<xref ref-type="bibr" rid="B10">10</xref>,<xref ref-type="bibr" rid="B11">11</xref>]. Most existing network embedding techniques for node classification are designed for standard single-layer graph networks [<xref ref-type="bibr" rid="B1">1</xref>,<xref ref-type="bibr" rid="B2">2</xref>,<xref ref-type="bibr" rid="B5">5</xref>,<xref ref-type="bibr" rid="B12">12</xref>&#x2013;<xref ref-type="bibr" rid="B14">14</xref>], such as DeepWalk [<xref ref-type="bibr" rid="B13">13</xref>], node2vec [<xref ref-type="bibr" rid="B10">10</xref>], LINE [<xref ref-type="bibr" rid="B12">12</xref>], and classical graph neural networks (GNNs) such as graph convolutional networks (GCNs) [<xref ref-type="bibr" rid="B5">5</xref>], GAT [<xref ref-type="bibr" rid="B15">15</xref>], and GraphSAGE [<xref ref-type="bibr" rid="B14">14</xref>]. 
However, most real-world complex interacting systems [<xref ref-type="bibr" rid="B16">16</xref>] are modeled as multilayer graph networks, including social networks [<xref ref-type="bibr" rid="B3">3</xref>] and citation-collaboration networks [<xref ref-type="bibr" rid="B4">4</xref>], which are formed by several layers describing interactions of various types. For example, two users could be connected to each other across multiple social networks (e.g., Twitter, Facebook, and LinkedIn). A multilayer graph network can provide a more comprehensive and accurate description of these two users. When the same set of nodes is connected by multiple link types or relationship types, the resulting multilayer graph network is also called a multiplex graph network or multiplex network [<xref ref-type="bibr" rid="B17">17</xref>,<xref ref-type="bibr" rid="B18">18</xref>].<xref ref-type="fn" rid="fn1">
<sup>1</sup>
</xref> If the nodes in a multiplex graph network carry attributes, the network is called an attributed multiplex graph network. The attributes can provide useful guidance for node classification. For example, if two users in a social network share hobbies or interests, they may belong to the same cluster.</p>
<p>Several studies have been conducted on multiplex network representation learning, but some issues require further consideration. Previous techniques such as PMNE [<xref ref-type="bibr" rid="B19">19</xref>], MELL [<xref ref-type="bibr" rid="B20">20</xref>], MVE [<xref ref-type="bibr" rid="B21">21</xref>], and MNE [<xref ref-type="bibr" rid="B11">11</xref>] integrate node embedding information from different dimensions into multiplex network representations. However, these techniques mostly overlook node attributes. Other models that consider node attributes (e.g., mGCN [<xref ref-type="bibr" rid="B22">22</xref>], MGCN [<xref ref-type="bibr" rid="B23">23</xref>], DMGI [<xref ref-type="bibr" rid="B24">24</xref>], and HAN [<xref ref-type="bibr" rid="B25">25</xref>]) either fail to consider interactions among diverse dimensions (mGCN) or focus on multiplex graphs with explicit adjacency links among different dimensions (MGCN [<xref ref-type="bibr" rid="B23">23</xref>]). DMGI and HAN consider heterogeneous graphs constructed from meta-paths between different nodes, which differ from multiplex networks. MGAT [<xref ref-type="bibr" rid="B26">26</xref>] adds a regularization term to GAT [<xref ref-type="bibr" rid="B15">15</xref>] that constrains the learnable parameters of the GATs in order to learn the interactions between different dimensions. However, MGAT fails to consider the similarity of node embedding matrices [<xref ref-type="bibr" rid="B27">27</xref>] among different relationships. Furthermore, while GAT can be computed in parallel with multi-head attention, its memory complexity for parameter storage is higher than that of the GCN model [<xref ref-type="bibr" rid="B5">5</xref>]. In addition, MGAT uses a two-layer GAT to integrate node information from neighbors, which only captures information from nodes up to 2 hops away. 
Including more than two layers in a GNN often results in over-smoothing [<xref ref-type="bibr" rid="B7">7</xref>]. In contrast, the node2vec network embedding [<xref ref-type="bibr" rid="B10">10</xref>], which is based on random walks, can be used to search for 10-hop contextual information [<xref ref-type="bibr" rid="B10">10</xref>] and to capture structural equivalence (i.e., two nodes are far apart from each other but have the same structural role). This suggests that information from a larger receptive field helps to capture more comprehensive node representations, which can be used to improve the results of node classification.</p>
<p>Recently, graph neural networks (GNNs), a deep nonlinear network embedding framework represented by GCN [<xref ref-type="bibr" rid="B5">5</xref>] that encodes node attributes and network structure simultaneously, have achieved great success in node classification on attributed single-layer networks. The most direct approach to node classification in attributed multiplex networks is to extend GCN [<xref ref-type="bibr" rid="B5">5</xref>] to multiplex networks. However, several obstacles exist. First, the different dimensions of a given attributed multiplex network share the same node set and node attributes. Hence, different dimensions are typically similar or have some characteristics in common [<xref ref-type="bibr" rid="B28">28</xref>]. For instance, citation networks represent citations between papers, while paper similarity networks represent the commonality among papers. Because articles that cite each other typically share a common research topic, the citation and paper similarity dimensions exhibit a certain degree of overlap. Second, different dimensions of multiplex networks are related [<xref ref-type="bibr" rid="B17">17</xref>]. For example, in social networks, the friend relationship dimension can determine the topology of the message forwarding dimension. In addition, the significance of a given node may vary across dimensions, so its degree of importance differs from one dimensional network to another. Third, two nodes may share similar structural roles but be far apart from each other (i.e., structural equivalence [<xref ref-type="bibr" rid="B10">10</xref>]). Two such distant nodes may belong to the same cluster. 
Therefore, the primary challenge for node classification in multiplex networks is to design a model, trained for the downstream node classification task, that generates a comprehensive (consensus) embedding accounting for node attributes, the interactions and similarities among different dimensions, the degree of importance of each node in the different dimensional networks, and the context information of distant nodes.</p>
<p>In this paper, we propose a novel graph embedding framework for node classification in attributed multiplex graphs to address the problems mentioned above. First, a random walk network embedding technique is included to compensate for the over-smoothing problem [<xref ref-type="bibr" rid="B7">7</xref>] in GCNs, which prevents them from capturing the context information of distant nodes (more than 2 hops away). We use the node2vec random walk network embedding to learn distant node context information for each dimension, and the resulting node embeddings are treated as pre-trained node features that encode the distant neighborhood of each node and capture structural equivalence. Then, the pre-trained node features are concatenated with the original node attributes to form new node attributes, which are input into two-layer GCNs [<xref ref-type="bibr" rid="B5">5</xref>] that learn each dimensional graph network separately. At the same time, a regularized consistency constraint is applied to the node embeddings from different dimensions to learn similarities [<xref ref-type="bibr" rid="B28">28</xref>] between nodes and their counterparts in other dimensions. The learnable weight parameters of the GCNs for different dimensional graph networks are also constrained using a regularization term [<xref ref-type="bibr" rid="B26">26</xref>]. Finally, an attention mechanism is used to adaptively learn the importance weights of a given node in different dimensions before integrating the node embeddings from different dimensions into global consensus node representations. The model can be trained end-to-end for the downstream node classification task. Together, these strategies yield a more comprehensive, informative, and high-quality node representation for node classification. The primary contributions of the proposed technique can be summarized as follows.<list list-type="simple">
<list-item>
<p>&#x2022; We provide a novel node classification methodology for attributed multiplex networks using random walk network embedding and graph convolutional networks (AMRG), which can fuse the node attributes and capture distant node context information.</p>
</list-item>
<list-item>
<p>&#x2022; We use regularized constraints to learn cross-dimensional similarities and correlations among different dimensions. The node embeddings from different dimensions are then integrated using an attention mechanism.</p>
</list-item>
<list-item>
<p>&#x2022; Extensive experiments were conducted to evaluate the effectiveness and efficiency of this approach, by comparing it with some competitive baselines on real-world attributed multiplex networks.</p>
</list-item>
</list>
</p>
<p>The remainder of this paper is organized as follows. First, we summarize related work in <xref ref-type="sec" rid="s2">Section 2</xref> and then introduce our approach in <xref ref-type="sec" rid="s3">Section 3</xref>. <xref ref-type="sec" rid="s4">Section 4</xref> provides experimental conditions and results. Finally, conclusions are discussed in <xref ref-type="sec" rid="s5">Section&#x20;5</xref>.</p>
</sec>
<sec id="s2">
<title>2 Related Work</title>
<p>This section summarizes related studies on network embedding for single-layer networks (<xref ref-type="fig" rid="F1">Figure&#x20;1A</xref>) and multiplex networks (<xref ref-type="fig" rid="F1">Figure&#x20;1B</xref>). The acquired node embeddings can be used to perform node classification&#x20;tasks.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Single-layer and multiplex networks with the same set of nodes but different link types. <bold>(A)</bold> The single-layer network. <bold>(B)</bold> Different nodes are connected to each other by a continuous line in each relationship. A single node is connected to its counterparts by a dashed line.</p>
</caption>
<graphic xlink:href="fphy-09-763904-g001.tif"/>
</fig>
<sec id="s2-1">
<title>2.1&#x20;Single-Layer Network Embedding</title>
<p>Network embedding methods [<xref ref-type="bibr" rid="B1">1</xref>,<xref ref-type="bibr" rid="B5">5</xref>,<xref ref-type="bibr" rid="B10">10</xref>,<xref ref-type="bibr" rid="B12">12</xref>,<xref ref-type="bibr" rid="B13">13</xref>] are used to learn low-dimensional and dense vector representations for nodes in real graph networks, while preserving network structure and facilitating further analysis of graph networks. Various network embedding techniques based on deep learning have been proposed, inspired by word2vec models such as Skip-Gram [<xref ref-type="bibr" rid="B29">29</xref>], including DeepWalk [<xref ref-type="bibr" rid="B13">13</xref>] and node2vec [<xref ref-type="bibr" rid="B10">10</xref>]. DeepWalk [<xref ref-type="bibr" rid="B13">13</xref>] first performs random walks on a network to generate unbiased random sequences of nodes. A neural network (Skip-Gram) is subsequently used to train node representations by treating nodes as words and node sequences as sentences. Node2vec [<xref ref-type="bibr" rid="B10">10</xref>] extends DeepWalk by introducing two parameters (<italic>p</italic> and <italic>q</italic>) that bias the random walk between breadth-first (BFS) and depth-first (DFS) exploration, so that the walk explores the graph structure more comprehensively. Other network embedding models have focused on mining and analysis for specific network structures. For example, LINE [<xref ref-type="bibr" rid="B12">12</xref>] is a classic approach that learns node embedding information by preserving both first-order and second-order proximities in the graph. Similarly, SDNE [<xref ref-type="bibr" rid="B30">30</xref>] utilizes a semi-supervised deep autoencoder model to capture first-order and second-order proximities. NetMF [<xref ref-type="bibr" rid="B31">31</xref>] unifies DeepWalk, LINE, and node2vec into a single matrix factorization framework.</p>
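<p>The biased walk used by node2vec can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph stored as a dictionary of neighbor sets; the function name <italic>node2vec_walk</italic> and its arguments are hypothetical and are not taken from any reference implementation. The next node is drawn with unnormalized weight 1/<italic>p</italic> if it is the previous node, 1 if it is adjacent to the previous node, and 1/<italic>q</italic> otherwise.</p>
<preformat><![CDATA[
```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Second-order biased walk on an unweighted, undirected graph.

    adj: dict mapping node -> set of neighbour nodes (illustrative format).
    p:   return parameter (small p makes stepping back more likely).
    q:   in-out parameter (q > 1 keeps the walk local, BFS-like;
         q < 1 pushes it outward, DFS-like).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted for reproducible sampling
        if not nbrs:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move to distance 2 from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```
]]></preformat>
<p>Setting <italic>p</italic>&#x20;&#x3d; <italic>q</italic>&#x20;&#x3d; 1 makes all three weights equal, which gives the unbiased walk of DeepWalk. The resulting node sequences would then be passed to a Skip-Gram model, which is omitted from this sketch.</p>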
<p>The techniques discussed above focus on mining graph structure, without considering node attribute information. However, nodes in real-world networks often contain rich attribute data, such as abstract text in a publication network and user profiles in social networks (called attributed networks). Considering node attribute information in the learning process has been shown to improve the quality of network representation learning and provide a more comprehensive node embedding strategy to facilitate downstream tasks [<xref ref-type="bibr" rid="B5">5</xref>,<xref ref-type="bibr" rid="B32">32</xref>&#x2013;<xref ref-type="bibr" rid="B34">34</xref>]. TADW [<xref ref-type="bibr" rid="B32">32</xref>] demonstrates the equivalence between DeepWalk and matrix factorization in attributed network representation learning. It was also the first algorithm used to jointly learn node attributes (textual features) and network structure, which is achieved via matrix factorization. However, TADW only considers second-order and higher-order proximity, leaving out first-order proximity considerations (i.e.,&#x20;homophily properties). HSCA [<xref ref-type="bibr" rid="B33">33</xref>] jointly learns homophily data, structural content, and node attributes to develop an effective node representation. MIRand [<xref ref-type="bibr" rid="B34">34</xref>] is an unsupervised algorithm for attributed single-layer graph network embedding based on random walk. This approach first establishes a two-layer graph network, one layer of which depicts structural information from the input graph while the other describes node attributes or content. MIRand performs the random walk according to node informativeness, intelligently traversing between structure and attribute layers.</p>
<p>Inspired by the success of CNNs in computer vision, graph neural networks (GNNs) [<xref ref-type="bibr" rid="B35">35</xref>] generalize 2D convolutions from Euclidean images to non-Euclidean graph data, which provides a powerful end-to-end method for learning node representations while addressing graph-related tasks. Graph convolutions are performed by aggregating neighborhood node information, which naturally considers node attributes. Representative work concerning convolution operators applied to graph data is found in Graph Convolutional Network (GCN) [<xref ref-type="bibr" rid="B5">5</xref>]. Michael Schlichtkrull et&#x20;al. first applied a GCN framework to model relational data, focusing on knowledge graph datasets, which is called Relational Graph Convolutional Networks (R-GCNs) [<xref ref-type="bibr" rid="B36">36</xref>]. In contrast, GAT [<xref ref-type="bibr" rid="B15">15</xref>] models specify different weights for varying neighborhood nodes when performing convolution operations. GAT assumes that different neighboring nodes exhibit different importance levels for the objective node while aggregating neighboring nodes. GraphSAGE [<xref ref-type="bibr" rid="B14">14</xref>] is an extension of the GCN framework that uses inductive node embedding. For a complete overview of network embedding techniques, readers are referred to recent studies on the topic [<xref ref-type="bibr" rid="B35">35</xref>,<xref ref-type="bibr" rid="B37">37</xref>].</p>
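<p>As an illustration of neighborhood aggregation, the following minimal NumPy sketch implements one layer of the propagation rule commonly associated with GCN [<xref ref-type="bibr" rid="B5">5</xref>]: self-loops are added to the adjacency matrix, it is symmetrically normalized by node degree, and the aggregated features are multiplied by a weight matrix. The function name <italic>gcn_layer</italic>, the dense matrix format, and the ReLU activation are assumptions made for this sketch only.</p>
<preformat><![CDATA[
```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (n, n) binary adjacency matrix without self-loops.
    H: (n, f_in) node feature matrix.
    W: (f_in, f_out) weight matrix (learned in practice, fixed here).
    """
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    deg = A_hat.sum(axis=1)               # degree including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)       # deg >= 1, so no division by zero
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)  # ReLU activation
```
]]></preformat>
<p>Each application of such a layer mixes features from direct neighbors only, so stacking two layers reaches nodes at most two hops away.</p>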
</sec>
<sec id="s2-2">
<title>2.2 Multiplex Network Embedding</title>
<p>Although these techniques have proven to be effective and efficient in various scenarios, they each attempt to process standard single-layer graphs, which implies the graph only consists of one type of relationship as shown in <xref ref-type="fig" rid="F1">Figure&#x20;1A</xref>. However, in practical applications, most networks exhibit multiple relationships between nodes. For example, in social networks, the relationship between two users could be friendship, co-worker, or simply advice [<xref ref-type="bibr" rid="B38">38</xref>]. Although these diverse relationships can independently form different networks to be analyzed separately, specific interactions and associations exist among them [<xref ref-type="bibr" rid="B17">17</xref>,<xref ref-type="bibr" rid="B28">28</xref>].</p>
<p>PMNE [<xref ref-type="bibr" rid="B19">19</xref>] uses three methods to learn global embedding information for analyzing multiplex networks: network aggregation, result aggregation, and layer co-analysis. Layer co-analysis considers interaction information and thus achieves better overall performance than network aggregation and result aggregation. Ryuta Matsuno et&#x20;al. proposed MELL [<xref ref-type="bibr" rid="B20">20</xref>] for multiplex networks. It first requires the embedding vectors of the same node in diverse relationships to be close to each other so that all layer structures are shared. It then introduces a layer vector that captures each layer&#x2019;s connectivity and is used to differentiate edge probabilities in each relationship. As such, MELL focuses specifically on link prediction tasks. MVE [<xref ref-type="bibr" rid="B21">21</xref>] is a novel collaboration framework for multiplex network embedding, which promotes the collaboration of different views and introduces an attention mechanism to learn the weights of different views, yielding robust node embeddings. However, MVE only considers the network structure (i.e., node attributes are ignored). MNE [<xref ref-type="bibr" rid="B11">11</xref>] uses one high-dimensional common embedding and a lower-dimensional additional embedding for each type of relationship, all of which are learned jointly in a unified framework. MANE [<xref ref-type="bibr" rid="B39">39</xref>] jointly models both connections in each relationship and network interactions from different relationships in a unified framework. Essentially, MANE focuses on processing heterogeneous graphs with different types of nodes and edges. However, node attributes are not considered by the models discussed&#x20;above.</p>
<p>HAN [<xref ref-type="bibr" rid="B25">25</xref>] uses a novel heterogeneous graph neural network with node-level and semantic-level attentions for attributed multiplex networks, generating node embeddings <italic>via</italic> aggregating features from meta-path based neighbors. This approach attempts to describe heterogeneous graphs generated from meta-paths considering the semantics between nodes. Similarly, mGCN [<xref ref-type="bibr" rid="B22">22</xref>] utilizes GCN to learn node representations for each relationship. In order to jointly learn cross-layer interactions, the authors used a weighted average over relation-specific representations to produce generalized descriptions in which weights were calculated based on projection matrices from different networks. Unlike in mGCN, our proposed approach adds regularized consensus constraints and trainable weight parameters constraints among GCN for different dimensional graph networks on the objective function. In fact, the mGCN only learns node embeddings for each dimension, using a weighted average of the embedding results to generate overall node representations without considering the interactions between different dimensional networks. Masha Ghorbani et&#x20;al. extended the GCN model to form a multi-layer graph embedding called MGCN [<xref ref-type="bibr" rid="B23">23</xref>]. This approach utilizes GCN models to learn node representations within relationships. However, MGCN focuses on multi-layer graphs with explicit adjacency links between nodes with different relationship types. The types of nodes found in these relationship networks can vary widely. For instance, one layer could denote an airport network, while another describes a power grid. MGAT [<xref ref-type="bibr" rid="B26">26</xref>], an extension of the GAT model, introduces regularization terms for model parameters on the objective function to optimize multiplex network embedding. 
The primary difference between our method and MGAT is that we extend the GCN model, which requires less memory than GAT for parameter storage [<xref ref-type="bibr" rid="B15">15</xref>]. Furthermore, our technique uses a random walk network embedding to learn 10-hop node information [<xref ref-type="bibr" rid="B10">10</xref>] as pre-trained node features, which are concatenated with the original node features and input into the GCN model, whereas MGAT only learns 2-hop information. MGAT also fails to consider the similarities of node embeddings between different dimensional graph networks.</p>
<p>The difference between single-layer and multiplex graph network embedding: In the era of big data, objects are interconnected through many different types of links. Such interacting objects are difficult to model as single-layer graph networks but can be modeled naturally as multiplex graph networks that describe the different links. For example, two users could be connected to each other across multiple social platforms (e.g., Twitter, Facebook, and LinkedIn), and each social relationship can be modeled as a graph network. Traditional analysis methods based on single-layer network embedding can be applied to these three graph networks separately, but this may produce incorrect results, because each single-layer graph network describes only partial, or even biased, information about the nodes. Another option is to merge the three graph networks (e.g., Twitter, Facebook, and LinkedIn) into a weighted or unweighted single-layer graph network, where the weight represents the number of link types (i.e., 0, 1, 2, or 3) between connected nodes. Single-layer network embedding methods can then be used on the merged graph network. Although this makes the analysis easy to perform, it ignores the interactive information that the shared nodes carry among different relationships. A multiplex graph network characterizes a complex system more comprehensively than a single-layer graph network. In addition, most multiplex graph network embedding methods capture the interactive and common information among different relationships as well as the differences between them. Therefore, graph analysis results based on multiplex graph network embedding are generally more accurate than those based on single-layer network embedding.</p>
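<p>The merging of several relationship layers into one single-layer network can be sketched as follows. The sketch assumes each layer is given as a binary adjacency matrix over the same node set; the function names and the three small example layers are hypothetical and serve only to show how the link-type counts arise.</p>
<preformat><![CDATA[
```python
import numpy as np

def aggregate_layers(layers):
    """Sum binary adjacency matrices of layers that share one node set.

    Entry (i, j) of the result counts the link types joining i and j.
    """
    return np.stack(layers).sum(axis=0)

def to_unweighted(weighted):
    """Collapse the counts to 0/1: an edge exists if any layer has it."""
    return (weighted > 0).astype(int)

# Three hypothetical layers over the same three users.
layer_a = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
layer_b = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
layer_c = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])

weighted = aggregate_layers([layer_a, layer_b, layer_c])
unweighted = to_unweighted(weighted)
```
]]></preformat>
<p>In the merged matrix, users 0 and 1 are joined with weight 3, but the matrix no longer records which layers contributed each link. This is the loss of per-relationship information described above.</p>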
</sec>
</sec>
<sec id="s3">
<title>3 Proposed Model</title>
<sec id="s3-1">
<title>3.1 Problem Statement and Framework</title>
<p>Attributed Multiplex Graph Network. A single-layer attributed network is formally denoted as <italic>G</italic>&#x20;&#x3d; {<italic>V</italic>, <italic>E</italic>, <italic>X</italic>}, where the vertex set <italic>V</italic> contains the <italic>n</italic> nodes of the graph. The term <italic>E</italic> is a set of edges representing the presence of a connection or relationship between two nodes, where <inline-formula id="inf1">
<mml:math id="m1">
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>E</mml:mi>
</mml:math>
</inline-formula> describes the relationship between node <italic>v</italic>
<sub>
<italic>i</italic>
</sub> and node <italic>v</italic>
<sub>
<italic>j</italic>
</sub>. In addition, &#x7c;<italic>V</italic>&#x7c; and &#x7c;<italic>E</italic>&#x7c; denote the sizes of the vertex and edge sets, respectively, and <italic>X</italic>&#x20;&#x2208; R<sup>
<italic>n</italic>&#xd7;<italic>F</italic>
</sup> is a matrix that represents attributes for the <italic>n</italic> nodes, <italic>F</italic> represents the dimension of node features, <italic>A</italic> is an adjacency matrix for the graph <italic>G</italic> (with a size of <inline-formula id="inf2">
<mml:math id="m2">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#xd7;</mml:mo>
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>), and <inline-formula id="inf3">
<mml:math id="m3">
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> denotes connections for unweighted network graphs. The condition <italic>A</italic>
<sub>
<italic>ij</italic>
</sub> &#x3d; 1 represents a link between <italic>v</italic>
<sub>
<italic>i</italic>
</sub> and <italic>v</italic>
<sub>
<italic>j</italic>
</sub>, otherwise <italic>A</italic>
<sub>
<italic>ij</italic>
</sub> &#x3d; 0. In practical applications, <italic>A</italic> is typically sparse and high-dimensional, especially for very large-scale networks. Attributed multiplex networks with <inline-formula id="inf4">
<mml:math id="m4">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> different relationship types can be represented as <italic>G</italic>&#x20;&#x3d; {<italic>V</italic>, <italic>E</italic>
<sup>(1)</sup>, <italic>E</italic>
<sup>(2)</sup>, &#x2026;, <italic>E</italic>
<sup>(<italic>M</italic>)</sup>, <italic>X</italic>}, where <italic>G</italic>
<sup>(<italic>r</italic>)</sup> &#x3d; {<italic>V</italic>, <italic>E</italic>
<sup>(<italic>r</italic>)</sup>, <italic>X</italic>} is a graph of the relation type (or dimension) <italic>r</italic> and <inline-formula id="inf5">
<mml:math id="m5">
<mml:mi>A</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> is a set of adjacency matrices for the graph <italic>G</italic>. Multiplex network embedding attempts to learn global consensus node representations for each node <italic>v</italic>
<sub>
<italic>i</italic>
</sub> &#x2208; <italic>V</italic> with a <italic>d</italic> dimensional dense vector, through better collaboration among different dimensions. This suggests the correlations among diverse relation types should be considered for a comprehensive and informative node representation. Low dimensional and consensus vectors can be represented as <italic>z</italic>
<sub>
<italic>i</italic>
</sub> &#x2208; <italic>Z</italic>&#x20;&#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>d</italic>
</sup> for each node <italic>v</italic>
<sub>
<italic>i</italic>
</sub> &#x2208; <italic>V</italic>, where <inline-formula id="inf6">
<mml:math id="m6">
<mml:mi>d</mml:mi>
<mml:mo>&#x226a;</mml:mo>
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>. These notations are summarized in <xref ref-type="table" rid="T1">Table&#x20;1</xref>.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Notations.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Notation</th>
<th align="center">Description</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<italic>G</italic>&#x20;&#x3d; {<italic>G</italic>
<sup>(1)</sup>, <italic>G</italic>
<sup>(2)</sup>, &#x2026;, <italic>G</italic>
<sup>(<italic>M</italic>)</sup>}</td>
<td align="left">The multiplex network</td>
</tr>
<tr>
<td align="left">
<italic>G</italic>
<sup>
<italic>r</italic>
</sup>
</td>
<td align="left">The network for dimension <italic>r</italic>
</td>
</tr>
<tr>
<td align="left">
<italic>V</italic>
</td>
<td align="left">Set of vertices</td>
</tr>
<tr>
<td align="left">
<italic>E</italic>&#x20;&#x3d; {<italic>E</italic>
<sup>(1)</sup>, <italic>E</italic>
<sup>(2)</sup>, &#x2026;, <italic>E</italic>
<sup>(<italic>M</italic>)</sup>}</td>
<td align="left">Set of edges</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf7">
<mml:math id="m7">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>
</td>
<td align="left">Number of vertices</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf8">
<mml:math id="m8">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>
</td>
<td align="left">Number of edges</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf9">
<mml:math id="m9">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>
</td>
<td align="left">Number of relationship types</td>
</tr>
<tr>
<td align="left">
<italic>A</italic>&#x20;&#x3d; {<italic>A</italic>
<sup>(1)</sup>, <italic>A</italic>
<sup>(2)</sup>, &#x2026;, <italic>A</italic>
<sup>(<italic>M</italic>)</sup>}</td>
<td align="left">Adjacency matrices for <italic>G</italic>
</td>
</tr>
<tr>
<td align="left">
<italic>F</italic>
</td>
<td align="left">Dimension of node features</td>
</tr>
<tr>
<td align="left">
<italic>d</italic>
</td>
<td align="left">Dimension of learned node representations</td>
</tr>
<tr>
<td align="left">
<italic>d</italic>
<sub>
<italic>rw</italic>
</sub>
</td>
<td align="left">Dimension of learned node representations for distant nodes</td>
</tr>
<tr>
<td align="left">
<italic>v</italic>
<sub>
<italic>i</italic>
</sub>
</td>
<td align="left">The node <italic>i</italic>
</td>
</tr>
<tr>
<td align="left">
<italic>A</italic>
<sup>
<italic>r</italic>
</sup> &#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>n</italic>
</sup>
</td>
<td align="left">Adjacency matrix for <italic>G</italic>
<sup>
<italic>r</italic>
</sup>
</td>
</tr>
<tr>
<td align="left">
<italic>H</italic>
<sub>
<italic>r</italic>
</sub> &#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>d</italic>
</sup>
</td>
<td align="left">Node representations matrix for <italic>G</italic>
<sup>
<italic>r</italic>
</sup>
</td>
</tr>
<tr>
<td align="left">
<italic>X</italic>&#x20;&#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>F</italic>
</sup>
</td>
<td align="left">The node feature matrix</td>
</tr>
<tr>
<td align="left">
<italic>z</italic>
<sub>
<italic>i</italic>
</sub> &#x2208; <italic>R</italic>
<sup>1&#xd7;<italic>d</italic>
</sup>
</td>
<td align="left">The global consensus node embedding for node <italic>i</italic>
</td>
</tr>
<tr>
<td align="left">
<italic>Z</italic>&#x20;&#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>d</italic>
</sup>
</td>
<td align="left">The global consensus node embedding matrix</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>An Overview of the Framework. The overall framework for AMRG is illustrated in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>. It is composed of four main components: 1) a random walk network embedding model that captures distant neighboring node information as pre-trained node features, 2) dimension-specific node embedding with the GCN model, 3) cross-dimension learning, and 4) an attention-based mechanism that learns the importance of each node in different dimensions so that the dimensions can be fused adequately.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>The framework for our proposed method of attributed multiplex network embedding.</p>
</caption>
<graphic xlink:href="fphy-09-763904-g002.tif"/>
</fig>
<p>We use the node2vec random walk network embedding technique to capture distant node features in each dimensional graph network, and then the average of the <inline-formula id="inf10">
<mml:math id="m10">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> node embeddings is concatenated with the original node features. The result is treated as the new node features, which are fed into the GCN model. The GCN model is then applied to the relation-type specific network <italic>G</italic>
<sup>
<italic>r</italic>
</sup>, to learn a set of node representations <italic>H</italic>
<sub>
<italic>r</italic>
</sub>. However, unlike the conventional GCN method [<xref ref-type="bibr" rid="B5">5</xref>], a weight is added to the self-connections, as is done in [<xref ref-type="bibr" rid="B24">24</xref>]. A large weight (<italic>w</italic>&#x20;&#x3e; 1) indicates that the node itself plays a more important role than its neighboring nodes in generating its embedding when neighbor information is aggregated. Furthermore, the learnable weight parameters of the <inline-formula id="inf11">
<mml:math id="m11">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> GCNs are constrained using a regularization term [<xref ref-type="bibr" rid="B26">26</xref>]. In addition, we introduce a regularized consistency constraint for each network embedding <italic>H</italic>
<sub>
<italic>r</italic>&#x2208;<italic>M</italic>
</sub> to capture node similarities from their counterparts. These two constraints across different dimensional graph networks can be beneficial [<xref ref-type="bibr" rid="B17">17</xref>,<xref ref-type="bibr" rid="B19">19</xref>,<xref ref-type="bibr" rid="B24">24</xref>] for downstream node classification, as they capture more comprehensive information about the multiplex network. Finally, a global consensus node embedding is generated as a weighted average of the different dimensional network embeddings, with the weights given by the attention mechanism. The resulting embedding is a comprehensive, higher-quality, informative node representation, which can be used for classification and visualization&#x20;tasks.</p>
</sec>
<sec id="s3-2">
<title>3.2 Capturing Distant Neighboring Node Information</title>
<p>The advantage of GCN is that it considers not only the network structure but also the node attributes. A GCN typically uses two convolution layers [<xref ref-type="bibr" rid="B5">5</xref>], which means that it can only capture 2-hop neighboring node information, and using more than two convolution layers leads to the over-smoothing problem. Node information from a larger receptive field (10-hop), obtained with the node2vec random walk network embedding technique [<xref ref-type="bibr" rid="B10">10</xref>], can help to capture richer node features. However, node2vec does not take node attributes into account. We thus combine the node2vec random walk network embedding technique with GCN to learn the node embeddings of a multiplex graph network. The resulting node embeddings not only fuse the node attributes but also compensate for GCN&#x2019;s inability to learn distant node information, allowing them to capture structural equivalence.</p>
<p>Given the <italic>r</italic>th dimensional network, we can utilize the node2vec random walk technique to learn 10-hop neighboring node information. The resulting network embedding can be represented as <inline-formula id="inf12">
<mml:math id="m12">
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> (<italic>d</italic>
<sub>
<italic>rw</italic>
</sub> &#x226a; <italic>n</italic>). For <inline-formula id="inf13">
<mml:math id="m13">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> different dimensional networks of multiplex graph network, we can use the same method to acquire <inline-formula id="inf14">
<mml:math id="m14">
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>. Then, we average <inline-formula id="inf15">
<mml:math id="m15">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> node embeddings as:<disp-formula id="e1">
<mml:math id="m16">
<mml:msub>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>M</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x22c5;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:mo>&#x22ef;</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:mo>&#x22ef;</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(1)</label>
</disp-formula>
</p>
<p>The <italic>X</italic>
<sub>
<italic>rw</italic>
</sub> is treated as the pre-trained node feature, which contains the distant neighboring node information. Then <italic>X</italic>
<sub>
<italic>rw</italic>
</sub> is concatenated with the original node attributes <italic>X</italic> (i.e.,&#x20;<inline-formula id="inf16">
<mml:math id="m17">
<mml:msub>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2225;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>). <italic>X</italic>
<sub>
<italic>new</italic>
</sub>, treated as the new node feature, is fed into the GCN&#x20;model.</p>
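<p>The averaging in Eq. 1 and the subsequent concatenation can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name build_gcn_input is hypothetical, and random matrices stand in for the node2vec embeddings of each layer, which would be produced by a separate node2vec run in practice.</p>
<preformat>
```python
import numpy as np

def build_gcn_input(x, rw_embeddings):
    """Average per-layer random-walk embeddings and append them to x."""
    # rw_embeddings: list of M arrays, each of shape (n, d_rw).
    x_rw = np.mean(np.stack(rw_embeddings), axis=0)  # Eq. 1, shape (n, d_rw)
    # Column-wise concatenation yields shape (n, F + d_rw).
    return np.concatenate([x, x_rw], axis=1)

rng = np.random.default_rng(0)
n, F, d_rw, M = 5, 3, 2, 3
x = rng.normal(size=(n, F))  # original node attributes
# Assumption: random stand-ins for the M per-layer node2vec embeddings.
rw = [rng.normal(size=(n, d_rw)) for _ in range(M)]

x_new = build_gcn_input(x, rw)
print(x_new.shape)
```
</preformat>
<p>Because the two blocks are joined column-wise, the original attributes and the random-walk features remain separately addressable in the input to the first GCN layer.</p>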
</sec>
<sec id="s3-3">
<title>3.3 Dimension Specific Node Embedding With GCN</title>
<p>Graph convolutional neural networks provide a powerful solution for generating node representations for a given graph [<xref ref-type="bibr" rid="B5">5</xref>], which naturally incorporate node attributes. In this section, we utilize multi-layer GCN to learn the dimension specific node embedding. For a given input graph (<italic>A</italic>, <italic>X</italic>), the layer-wise propagation rule can be expressed as [<xref ref-type="bibr" rid="B5">5</xref>]:<disp-formula id="e2">
<mml:math id="m18">
<mml:msup>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msup>
<mml:msup>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
<mml:msup>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(2)</label>
</disp-formula>where <inline-formula id="inf17">
<mml:math id="m19">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is an adjacency matrix with added self-connections, <italic>I</italic>
<sub>
<italic>n</italic>
</sub> is the identity matrix, <italic>X</italic>&#x20;&#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>F</italic>
</sup> is a feature matrix for the input graph, <inline-formula id="inf18">
<mml:math id="m20">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is the degree matrix for <inline-formula id="inf19">
<mml:math id="m21">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>, <italic>W</italic>
<sup>(<italic>l</italic>)</sup> is the trainable weight matrix for the <italic>l</italic>th layer in the GCN, <italic>&#x3c3;</italic>(&#x22c5;) denotes an activation function (e.g.,&#x20;<inline-formula id="inf20">
<mml:math id="m22">
<mml:mi>ReLU</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x2022;</mml:mo>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>max</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2022;</mml:mo>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>), and <italic>H</italic>
<sup>(<italic>l</italic>)</sup> &#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>D</italic>
</sup> is the activation matrix for the <italic>l</italic>th layer, initialized as <italic>H</italic>
<sup>(0)</sup> &#x3d; <italic>X</italic>. In this way, the resulting node embeddings capture the node attributes <italic>X</italic> and the graph structure <italic>A</italic> simultaneously&#x20;[<xref ref-type="bibr" rid="B5">5</xref>].</p>
<p>For a dimension specific network <italic>G</italic>
<sup>
<italic>r</italic>
</sup>, the representation learning model can be denoted as:<disp-formula id="e3">
<mml:math id="m23">
<mml:msubsup>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(3)</label>
</disp-formula>where <italic>A</italic>
<sup>
<italic>r</italic>
</sup> &#x2208; <italic>R</italic>
<sup>&#x7c;<italic>V</italic>&#x7c;&#xd7;&#x7c;<italic>V</italic>&#x7c;</sup> is an adjacency matrix for the graph <italic>G</italic>
<sup>
<italic>r</italic>
</sup> and <italic>D</italic>
<sup>
<italic>r</italic>
</sup> is the corresponding diagonal degree matrix. Unlike in conventional GCN, we modify <inline-formula id="inf21">
<mml:math id="m24">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> and define it as <inline-formula id="inf22">
<mml:math id="m25">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>w</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>, following Park and Kim [<xref ref-type="bibr" rid="B24">24</xref>]. In this expression, <italic>w</italic>&#x20;&#x2208; <italic>R</italic> is the weight of the self-connections, which measures the importance of a target node relative to its neighboring nodes in generating the target node embedding. A value of <italic>w</italic>&#x20;&#x3e; 1 implies that the target node itself is more important than its neighboring nodes, with increasing values of <italic>w</italic> representing higher importance. The term <inline-formula id="inf23">
<mml:math id="m26">
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> represents the degree matrix for <inline-formula id="inf24">
<mml:math id="m27">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>. In this paper, a two-layer GCN is used to learn dimension-specific node embeddings, and the node feature matrix of the input graph is <italic>X</italic>
<sub>
<italic>new</italic>
</sub>, i.e., <italic>H</italic>
<sup>(0)</sup> &#x3d; <italic>X</italic>
<sub>
<italic>new</italic>
</sub>. The output embedding matrix of the last layer is denoted <italic>H</italic>
<sub>
<italic>r</italic>
</sub>, which describes a dimension specific node embedding for the graph <italic>G</italic>
<sup>
<italic>r</italic>
</sup>. The resulting node embeddings <italic>H</italic>
<sub>
<italic>r</italic>
</sub> capture the original node attributes <italic>X</italic>, the distant neighboring node information <italic>X</italic>
<sub>
<italic>rw</italic>
</sub> and the graph structure <italic>A</italic> simultaneously&#x20;[<xref ref-type="bibr" rid="B5">5</xref>].</p>
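<p>The propagation rule in Eq. 3 with weighted self-connections can be sketched in NumPy as follows. This is an illustrative forward pass only, not the training code used in this work: the function names are hypothetical, the weight matrices are random rather than learned, and applying ReLU at both layers is an assumption.</p>
<preformat>
```python
import numpy as np

def normalized_adjacency(a, w=1.0):
    """Return D^(-1/2) (A + w I) D^(-1/2) for one layer; assumes w is positive."""
    a_tilde = a + w * np.eye(a.shape[0])   # weighted self-connections
    deg = a_tilde.sum(axis=1)              # diagonal of the degree matrix
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]

def gcn_forward(a, x, w0, w1, w=1.0):
    """Two-layer forward pass following Eq. 3 (ReLU at both layers is assumed)."""
    a_hat = normalized_adjacency(a, w)
    h1 = np.maximum(0.0, a_hat @ x @ w0)
    return np.maximum(0.0, a_hat @ h1 @ w1)

# Toy layer: a 3-node path graph with one-hot features (illustrative only).
a = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
x = np.eye(3)
rng = np.random.default_rng(0)
w0 = rng.normal(size=(3, 4))
w1 = rng.normal(size=(4, 2))

h_r = gcn_forward(a, x, w0, w1, w=2.0)
print(h_r.shape)
```
</preformat>
<p>For a two-node graph with a single edge, setting the self-connection weight to 1 gives equal weight to a node and its neighbor in the normalized matrix, whereas setting it to 3 shifts the weights to 0.75 for the node itself and 0.25 for the neighbor, which matches the interpretation of larger values given above.</p>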
</sec>
<sec id="s3-4">
<title>3.4 Cross Dimension Modeling</title>
<p>For a given dimension <italic>r</italic>&#x20;&#x2208; <italic>M</italic>, we can obtain the node representation <italic>H</italic>
<sub>
<italic>r</italic>&#x2208;<italic>M</italic>
</sub> &#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>d</italic>
</sup>, which encodes distant node information for <italic>G</italic>
<sup>
<italic>r</italic>
</sup>. Each <italic>H</italic>
<sub>
<italic>r</italic>&#x2208;<italic>M</italic>
</sub> is acquired independently by training a two-layer GCN model, as described by <xref ref-type="disp-formula" rid="e3">Eq. 3</xref>. However, these embedding matrices fail to take advantage of interactions and similarities between diverse dimensions. This inspired us to devise a way of jointly learning embedding information from diverse dimensional networks to develop a more comprehensive node representation. This was accomplished by adding two regularization constraints to the objective function: the consistency constraint among the <inline-formula id="inf25">
<mml:math id="m28">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> network embedding matrices <italic>H</italic>
<sub>
<italic>r</italic>&#x2208;<italic>M</italic>
</sub>, and the trainable weight parameter constraint among <inline-formula id="inf26">
<mml:math id="m29">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> GCNs. These two constraints are discussed&#x20;below.</p>
<p>Regularized consistency constraints: We first applied the normalization to each node embedding matrix to obtain the normalized matrices (i.e. transforming the <italic>H</italic>
<sub>
<italic>r</italic>
</sub> to <italic>H</italic>
<sub>
<italic>r</italic>&#x2212;<italic>nor</italic>
</sub>). The normalized matrices were then exploited to collect similarity information between each pair of counterpart nodes by <inline-formula id="inf27">
<mml:math id="m30">
<mml:msub>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>, and the result is the pairwise similarity matrix of the <italic>n</italic> nodes. The regularized consistency constraint can be defined as follows:<disp-formula id="e4">
<mml:math id="m31">
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mspace width="0.28em"/>
<mml:mspace width="0.28em"/>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>M</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>M</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:math>
<label>(4)</label>
</disp-formula>where &#x22c5;<sup>
<italic>T</italic>
</sup> denotes the transpose. When the proposed model is trained for the node classification task, this constraint adaptively captures node similarity information shared across the different dimensions of the multiplex network.</p>
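<p>As a minimal sketch of <xref ref-type="disp-formula" rid="e4">Eq. 4</xref>, the following pure-Python snippet normalizes the rows of two small embedding matrices, forms the two <italic>n</italic>&#x20;&#xd7;&#x20;<italic>n</italic> similarity matrices, and sums the squared entries of their difference. Row-wise L2 normalization, the reading of the norm as a sum of squared matrix entries, the function names, and the toy inputs are all assumptions made for illustration; this is not the authors&#x2019; PyTorch implementation.</p>
<preformat><![CDATA[
```python
import math


def l2_normalize_rows(H):
    """Scale every row of H to unit L2 norm (all-zero rows are left unchanged)."""
    out = []
    for row in H:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out


def similarity(H):
    """Return H @ H^T, the n x n matrix of pairwise row dot products."""
    return [[sum(a * b for a, b in zip(u, v)) for v in H] for u in H]


def consistency_loss(H_r, H_r2):
    """Sum of squared entries of the difference between the two similarity matrices."""
    S1 = similarity(l2_normalize_rows(H_r))
    S2 = similarity(l2_normalize_rows(H_r2))
    return sum((a - b) ** 2 for r1, r2 in zip(S1, S2) for a, b in zip(r1, r2))


# Toy inputs (assumed values): 3 nodes with 2-dimensional embeddings per relation.
H_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H_b = [[2.0, 0.0], [0.0, 3.0], [4.0, 4.0]]  # same row directions as H_a, other scales
H_c = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # different pairwise similarities

loss_same = consistency_loss(H_a, H_b)
loss_diff = consistency_loss(H_a, H_c)
```
]]></preformat>
<p>In this toy setting, <italic>loss_same</italic> is numerically zero because the two inputs differ only in the scale of their rows, whereas <italic>loss_diff</italic> is positive because the pairwise node similarities disagree.</p>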
<p>Regularized trainable weight parameter constraints among <inline-formula id="inf28">
<mml:math id="m32">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> GCNs: As in Xie et&#x20;al. [<xref ref-type="bibr" rid="B26">26</xref>], we utilize regularized constraints for trainable weight parameters among <inline-formula id="inf29">
<mml:math id="m33">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> GCN models. This constraint is defined as follows:<disp-formula id="e5">
<mml:math id="m34">
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msubsup>
<mml:mrow>
<mml:mfenced open="&#x2016;" close="&#x2016;">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
<label>(5)</label>
</disp-formula>where <italic>W</italic>
<sub>
<italic>r</italic>
</sub> and <italic>W</italic>
<sub>
<italic>r</italic>&#x2032;</sub> are the trainable GCN weight matrices for relation types <italic>r</italic> and <italic>r</italic>&#x2032;, respectively.</p>
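<p>The double sum in <xref ref-type="disp-formula" rid="e5">Eq. 5</xref> can be sketched in a few lines of pure Python. The helper names and the toy weight matrices below are hypothetical, and the norm is again read as a sum of squared matrix entries, which is an assumption.</p>
<preformat><![CDATA[
```python
def squared_distance(W_a, W_b):
    """Sum of squared entry-wise differences between two equally shaped matrices."""
    return sum((a - b) ** 2 for ra, rb in zip(W_a, W_b) for a, b in zip(ra, rb))


def weight_constraint(weights):
    """Sum the squared distance over every ordered pair (r, r') with r != r'."""
    total = 0.0
    for r, W_r in enumerate(weights):
        for r2, W_r2 in enumerate(weights):
            if r != r2:
                total += squared_distance(W_r, W_r2)
    return total


# Toy inputs (assumed values): three relation types with 2 x 2 weight matrices.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, 0.0], [0.0, 2.0]]
W3 = [[0.0, 0.0], [0.0, 1.0]]

l_wp = weight_constraint([W1, W2, W3])
```
]]></preformat>
<p>Because the sum runs over ordered pairs, each unordered pair of relation types contributes twice; the penalty vanishes only when all weight matrices coincide.</p>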
</sec>
<sec id="s3-5">
<title>3.5 Attention Mechanisms for Fusing Different Dimensions</title>
<p>Attention mechanisms were used to learn the importance weight of each dimension of the multiplex network at every training iteration. Once optimization of the proposed model has finished, the learned weights represent the importance of the individual dimensions. The learned dimensional attention matrices for the <italic>n</italic> nodes were used to fuse the embeddings <italic>H</italic>
<sub>
<italic>r</italic>&#x2208;<italic>M</italic>
</sub> into the final global consensus node representation as follows:<disp-formula id="e6">
<mml:math id="m35">
<mml:msub>
<mml:mrow>
<mml:mi>Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.28em"/>
<mml:mspace width="0.28em"/>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
<label>(6)</label>
</disp-formula>where <italic>&#x3b1;</italic>
<sub>
<italic>r</italic>
</sub> contains the learned importance of the <italic>n</italic> nodes in the <italic>r</italic>-th dimension of the network.</p>
<p>We take node <italic>v</italic>
<sub>
<italic>i</italic>
</sub> as an example to illustrate how its importance values are computed and how the attention matrix <italic>&#x3b1;</italic>
<sub>
<italic>r</italic>
</sub> is obtained for relation type <italic>r</italic>. For each <italic>r</italic>&#x20;&#x2208; <italic>M</italic>, the embedding of node <italic>v</italic>
<sub>
<italic>i</italic>
</sub> in <italic>H</italic>
<sub>
<italic>r</italic>
</sub> is given by the row vector <inline-formula id="inf30">
<mml:math id="m36">
<mml:msubsup>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>. We first transform <inline-formula id="inf31">
<mml:math id="m37">
<mml:msubsup>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> <italic>via</italic> a nonlinear transformation (i.e.,&#x20;the function <italic>f</italic>(<italic>x</italic>) in <xref ref-type="disp-formula" rid="e7">Eq. 7</xref>) and then apply a shared attention vector <italic>p</italic>&#x20;&#x2208; <italic>R</italic>
<sup>
<italic>h</italic>&#x2032;&#xd7;1</sup>, which determines the weight <inline-formula id="inf32">
<mml:math id="m38">
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>:<disp-formula id="e7">
<mml:math id="m39">
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mo>&#x22c5;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(7)</label>
</disp-formula>
</p>
<p>Here, <italic>W</italic>&#x20;&#x2208; <italic>R</italic>
<sup>
<italic>h</italic>&#x2032;&#xd7;<italic>d</italic>
</sup> is a weight matrix, <italic>b</italic>&#x20;&#x2208; <italic>R</italic>
<sup>
<italic>h</italic>&#x2032;&#xd7;1</sup> is a bias vector, and <inline-formula id="inf33">
<mml:math id="m40">
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula> is the tanh activation function. The <italic>Softmax</italic> function is then used to normalize the attention values across the different dimensions of the network. The final weight for node <italic>v</italic>
<sub>
<italic>i</italic>
</sub> can be calculated as:<disp-formula id="e8">
<mml:math id="m41">
<mml:msubsup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(8)</label>
</disp-formula>
</p>
<p>Larger values of <inline-formula id="inf34">
<mml:math id="m42">
<mml:msubsup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> imply that the corresponding embedding is more important. For the <italic>n</italic> nodes in the <italic>r</italic>-th dimension of the network, we first collect the learned weights into a column vector <inline-formula id="inf35">
<mml:math id="m43">
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> and then convert the column vector <italic>a</italic>
<sub>
<italic>r</italic>
</sub> into the diagonal matrix <italic>&#x3b1;</italic>
<sub>
<italic>r</italic>
</sub> &#x3d; <italic>diag</italic>(<italic>a</italic>
<sub>
<italic>r</italic>
</sub>), with <italic>&#x3b1;</italic>
<sub>
<italic>r</italic>
</sub> &#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>n</italic>
</sup>.</p>
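<p>The attention-based fusion in <xref ref-type="disp-formula" rid="e6">Eqs 6</xref>&#x2013;<xref ref-type="disp-formula" rid="e8">8</xref> can be illustrated with the following pure-Python sketch. Left-multiplying by a diagonal matrix scales each row, so the fused embedding of a node is computed here as a per-node weighted sum of its relation-specific embeddings. The function names, the toy embeddings, and the values chosen for <italic>W</italic>, <italic>b</italic>, and <italic>p</italic> are assumptions for illustration only.</p>
<preformat><![CDATA[
```python
import math


def attention_score(h, W, b, p):
    """Eq. 7: p^T . tanh(W . h^T + b) for one node embedding h of length d."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + b_k)
              for row, b_k in zip(W, b)]
    return sum(p_k * z for p_k, z in zip(p, hidden))


def softmax(scores):
    """Eq. 8: normalize one node's scores across the M relation types."""
    m = max(scores)  # subtracting the maximum improves numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(H_list, W, b, p):
    """Eq. 6: per-node weighted sum of the relation-specific embeddings."""
    n, d = len(H_list[0]), len(H_list[0][0])
    Z, A = [], []
    for i in range(n):
        a_i = softmax([attention_score(H[i], W, b, p) for H in H_list])
        A.append(a_i)
        Z.append([sum(a * H[i][k] for a, H in zip(a_i, H_list)) for k in range(d)])
    return Z, A


# Toy inputs (assumed values): M = 2 relations, n = 2 nodes, d = 2, h' = 2.
H1 = [[1.0, 0.0], [0.5, 0.5]]
H2 = [[0.0, 1.0], [0.5, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
p = [1.0, -1.0]

Z, A = fuse([H1, H2], W, b, p)
```
]]></preformat>
<p>In this example the second node has identical embeddings in both relations, so its two attention weights are equal, while the first node receives a larger weight for the first relation.</p>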
</sec>
<sec id="s3-6">
<title>3.6 Optimization Objective</title>
<p>Node Classification: The output embedding matrix in <xref ref-type="disp-formula" rid="e6">Eq. 6</xref> was used for node classification tasks in combination with a linear transformation and a <italic>Softmax</italic> function. Prediction results were then calculated as follows:<disp-formula id="e9">
<mml:math id="m44">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(9)</label>
</disp-formula>where <italic>W</italic>
<sub>
<italic>nc</italic>
</sub> &#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>n</italic>
</sup> is a weight matrix, <italic>b</italic>
<sub>
<italic>nc</italic>
</sub> &#x2208; <italic>R</italic>
<sup>
<italic>n</italic>&#xd7;<italic>C</italic>
</sup> is a bias vector, <inline-formula id="inf36">
<mml:math id="m45">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> is the predicted result for <italic>n</italic> nodes, and <inline-formula id="inf37">
<mml:math id="m46">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> represents the predicted probability of node <italic>v</italic>
<sub>
<italic>i</italic>
</sub> belonging to class <italic>c</italic>. The <inline-formula id="inf38">
<mml:math id="m47">
<mml:mi>s</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula> function normalizes the scores across all classes. The cross-entropy loss over all training nodes, defined as follows, was then minimized:<disp-formula id="e10">
<mml:math id="m48">
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(10)</label>
</disp-formula>where <italic>S</italic> represents the training set, <italic>y</italic>
<sub>
<italic>ic</italic>
</sub> is the ground-truth label indicator, equal to 1 if node <italic>v</italic>
<sub>
<italic>i</italic>
</sub> belongs to class <italic>c</italic> and 0 otherwise, and <inline-formula id="inf39">
<mml:math id="m49">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is the corresponding predicted probability.</p>
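<p>The softmax normalization and the cross-entropy loss over the training set can be sketched as follows. The snippet starts from given class scores (it does not model the linear transformation), stores ground-truth labels as class indices rather than indicator vectors, and uses hypothetical names and toy values throughout; it is an illustration under these assumptions rather than the authors&#x2019; code.</p>
<preformat><![CDATA[
```python
import math


def softmax(row):
    """Normalize one node's class scores into probabilities that sum to 1."""
    m = max(row)  # subtracting the maximum improves numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, labels, train_idx):
    """Sum of -ln(probability assigned to the true class) over the training nodes."""
    loss = 0.0
    for i in train_idx:
        probs = softmax(scores[i])
        loss -= math.log(probs[labels[i]])
    return loss


# Toy inputs (assumed values): 3 nodes, C = 3 classes; node 2 is not in the training set.
scores = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
labels = [0, 2, 1]
train_idx = [0, 1]

l_nc = cross_entropy(scores, labels, train_idx)
```
]]></preformat>
<p>Only the nodes listed in <italic>train_idx</italic> contribute to the loss, mirroring the restriction of the outer sum to the training set <italic>S</italic>.</p>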
<p>Overall Objective Function: The consistency constraint in <xref ref-type="disp-formula" rid="e4">Eq. 4</xref>, the trainable weight parameter constraint in <xref ref-type="disp-formula" rid="e5">Eq. 5</xref>, and the node classification loss in <xref ref-type="disp-formula" rid="e10">Eq. 10</xref> were optimized jointly. The overall objective function is given by:<disp-formula id="e11">
<mml:math id="m50">
<mml:munder>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mi>L</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(11)</label>
</disp-formula>
</p>
<p>Here, <italic>&#x3b1;</italic> and <italic>&#x3b2;</italic> are hyper-parameters that control the importance of the consistency constraint and the trainable weight parameter constraint, respectively. Labeled data were then used to guide the learnable parameters <inline-formula id="inf40">
<mml:math id="m51">
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>W</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>M</mml:mi>
</mml:math>
</inline-formula>, which were updated automatically via back-propagation and gradient descent. Once the overall objective function <italic>L</italic> converged, we obtained the global node representation <italic>Z</italic>
<sub>
<italic>global</italic>
</sub>. This process is summarized in <xref ref-type="statement" rid="algorithm_1">Algorithm&#x20;1</xref>.</p>
<p>
<statement content-type="algorithm" id="algorithm_1">
<label>Algorithm 1</label>
<p>The proposed technique.</p>
<p>
<inline-graphic xlink:href="fphy-09-763904-fx1.tif"/>
</p>
<p>For a given multiplex graph network <inline-formula id="inf45">
<mml:math id="m56">
<mml:mi>G</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> with <italic>M</italic> relationships, we first use the node2vec random walk technique to capture 10-hop neighboring node information, and then use <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> to obtain the pre-trained node features <italic>X</italic>
<sub>
<italic>rw</italic>
</sub>, which are concatenated with the original node attributes <italic>X</italic> to form <italic>X</italic>
<sub>
<italic>new</italic>
</sub>. The proposed multiplex graph convolutional network is then used to encode all nodes of the multiplex graph network into vectors. We first set the hyper-parameters, including the embedding dimension <italic>d</italic> and the parameters <italic>&#x3b1;</italic>, <italic>&#x3b2;</italic>, and <italic>w</italic>. When the proposed model is trained for the node classification task, the learnable parameters are randomly initialized and then optimized with the back-propagation algorithm. Until the loss function in <xref ref-type="disp-formula" rid="e11">Eq. 11</xref> converges, the model computes the node embedding of each dimension (relationship) with <xref ref-type="disp-formula" rid="e3">Eq. 3</xref>, the cross-dimension interactions with <xref ref-type="disp-formula" rid="e4">Eq. 4</xref> and <xref ref-type="disp-formula" rid="e5">Eq. 5</xref>, and the attention coefficients with <xref ref-type="disp-formula" rid="e7">Eq. 7</xref> and <xref ref-type="disp-formula" rid="e8">Eq. 8</xref>. The global node embedding <italic>Z</italic>
<sub>
<italic>global</italic>
</sub> for all nodes across the multiplex relationships is then generated according to the obtained attention coefficients.</p>
</statement>
</p>
</sec>
<sec id="s3-7">
<title>3.7 Time Complexity</title>
<p>Our proposed technique is composed of four main components: 1) the pre-training of node features using a random walk network embedding method, 2) dimension-specific node embedding using a GCN model, 3) the cross-dimension modeling terms, and 4) an attention-based mechanism that generates global node representations by integrating the embeddings of different dimensions. The time complexity of the random walk can be expressed as <italic>O</italic>(&#x7c;<italic>V</italic>&#x7c;). Learning the dimension-specific embeddings with a GCN model requires <italic>O</italic>(<italic>L</italic>&#x7c;<italic>E</italic>&#x7c;<italic>d</italic>&#x2032; &#x2b; <italic>L</italic>&#x7c;<italic>V</italic>&#x7c;(<italic>F</italic>&#x20;&#x2b; <italic>d</italic>
<sub>
<italic>rw</italic>
</sub>)<italic>d</italic>&#x2032;), where <italic>F</italic>&#x20;&#x2b; <italic>d</italic>
<sub>
<italic>rw</italic>
</sub> is the dimension of the input node features, <italic>d</italic>&#x2032; is the output dimension of one convolution layer, and <italic>L</italic> is the number of GCN layers (2 in this study). The time complexity of learning the global node representations is <italic>O</italic>(&#x7c;<italic>V</italic>&#x7c;<italic>d</italic>&#x2032;&#x7c;<italic>M</italic>&#x7c;), where &#x7c;<italic>M</italic>&#x7c; is the number of relation types. Updating the attention weights of the different dimensions during model training costs <italic>O</italic>(&#x7c;<italic>S</italic>&#x7c;<italic>d</italic>&#x2032;&#x7c;<italic>M</italic>&#x7c;), where &#x7c;<italic>S</italic>&#x7c; is the number of training nodes. In practice, the training set is typically small, such that &#x7c;<italic>S</italic>&#x7c; &#x226a; &#x7c;<italic>E</italic>&#x7c;, and in most practical networks &#x7c;<italic>V</italic>&#x7c; &#x226a; &#x7c;<italic>E</italic>&#x7c;. Because <italic>L</italic> and &#x7c;<italic>M</italic>&#x7c; are small constants, the total time complexity of the node classification task can be simplified to <italic>O</italic>(&#x7c;<italic>V</italic>&#x7c;(<italic>F</italic>&#x20;&#x2b; <italic>d</italic>
<sub>
<italic>rw</italic>
</sub>)<italic>d</italic>&#x2032; &#x2b; &#x7c;<italic>E</italic>&#x7c;<italic>d</italic>&#x2032;).</p>
</sec>
</sec>
<sec id="s4">
<title>4 Experiments</title>
<sec id="s4-1">
<title>4.1 Experimental Setup</title>
<p>We implemented our experiments in the PyTorch framework (<ext-link ext-link-type="uri" xlink:href="https://pytorch.org">https://pytorch.org</ext-link>). All experiments were performed on a computer with a 2.6&#xa0;GHz 4-core Intel Core i9 processor and an RTX 2080 GPU.</p>
<p>Datasets: The proposed method was evaluated on several real-world datasets, as described in <xref ref-type="table" rid="T2">Table&#x20;2</xref>. Lazega is a dense network, while the other datasets are sparse networks.<list list-type="simple">
<list-item>
<p>&#x2022; Citeseer [<xref ref-type="bibr" rid="B5">5</xref>]: Citeseer is a citation network consisting of 3,312 research papers, where nodes are publications divided into six different research areas [<xref ref-type="bibr" rid="B5">5</xref>]. Node features are bag-of-words representations of individual papers. We construct a multiplex graph network with two dimensions: a citation dimension (where edges represent citation links between papers) and a paper similarity dimension. The latter is a k-nearest neighbor (kNN) graph built from the cosine similarity of the node features, in which edges connect each paper to its 10 most similar papers (i.e.,&#x20;k&#x20;&#x3d;&#x20;10).</p>
</list-item>
<list-item>
<p>&#x2022; Cora [<xref ref-type="bibr" rid="B5">5</xref>]: Cora is a citation network containing 2,708 machine learning papers divided into seven classes [<xref ref-type="bibr" rid="B5">5</xref>]. Node features are bag-of-words representations of individual papers. We can utilize the same approach as for the Citeseer dataset to construct a multiplex network with two dimensions (i.e.,&#x20;citation and node similarity dimensions).</p>
</list-item>
<list-item>
<p>&#x2022; Lazega [<xref ref-type="bibr" rid="B38">38</xref>]: Lazega is a multiplex social network with three relationship types (i.e.,&#x20;strong coworker, advice, and friendship networks) among 71 attorneys (partners and associates) at a law firm. The law school attribute was selected as the node label for classification.</p>
</list-item>
<list-item>
<p>&#x2022; ACM [<xref ref-type="bibr" rid="B24">24</xref>,<xref ref-type="bibr" rid="B28">28</xref>]: ACM is a multiplex network of paper-paper relationships with two views: two papers are written by the same author, and two papers share the same subject. Node features are bag-of-words representations of keywords. The nodes are divided into three classes.</p>
</list-item>
<list-item>
<p>&#x2022; DBLP [<xref ref-type="bibr" rid="B24">24</xref>,<xref ref-type="bibr" rid="B28">28</xref>]: This dataset is another multiplex network, drawn from DBLP, that consists of three views of author-author relationships: two authors have worked together on papers, two authors have published papers with the same terms, and two papers have published papers with the same terms. The node classes represent the authors&#x2019; research areas: DM (KDD, WSDM, ICDM), AI (ICML, AAAI, IJCAI), CV (CVPR), and NLP (ACL, NAACL, EMNLP).</p>
</list-item>
<list-item>
<p>&#x2022; IMDB [<xref ref-type="bibr" rid="B24">24</xref>,<xref ref-type="bibr" rid="B28">28</xref>]: This is a movie network from the IMDB dataset. In this paper, the IMDB dataset is made up of two relationships (i.e.,&#x20;two movies share an actor, and two movies share a director). Node features are bag-of-words representations of plots. The nodes are divided according to the movies&#x2019;&#x20;genre.</p>
</list-item>
<list-item>
<p>&#x2022; Amazon [<xref ref-type="bibr" rid="B24">24</xref>,<xref ref-type="bibr" rid="B28">28</xref>]: This dataset is a multiplex network with three views between items (i.e.,&#x20;also-viewed, also-bought, and bought-together). The items are divided into four different categories, including Beauty, Automotive, and Patio Lawn and Garden. Item features are derived from the item descriptions.</p>
</list-item>
</list>
</p>
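<p>The paper-similarity dimension described above for Citeseer and Cora can be sketched as a cosine-similarity kNN graph. The pure-Python snippet below is a minimal illustration: the function names and the tiny bag-of-words matrix are hypothetical, <italic>k</italic> is set to 1 for readability (the text above uses <italic>k</italic>&#x20;&#x3d;&#x20;10), and storing the edges as an undirected set is an assumption.</p>
<preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_edges(X, k):
    """Return an undirected edge set linking each node to its k most similar nodes."""
    edges = set()
    for i, x_i in enumerate(X):
        sims = [(cosine(x_i, x_j), j) for j, x_j in enumerate(X) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first, ties by index
        for _, j in sims[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy bag-of-words features (assumed values): 4 papers, 4 vocabulary terms.
X = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

edges = knn_edges(X, k=1)
```
]]></preformat>
<p>For the toy matrix, papers 0 and 1 are each other&#x2019;s nearest neighbor, as are papers 2 and 3, so the similarity dimension contains exactly two edges. This brute-force version compares every pair of nodes and is meant only to make the construction concrete.</p>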
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Dataset statistics.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Nodes</th>
<th align="center">Links</th>
<th align="center">Features</th>
<th align="center">Relationships</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Citeseer</td>
<td align="char" char=".">3,312</td>
<td align="center">21,462</td>
<td align="char" char=".">3,703</td>
<td align="char" char=".">2</td>
</tr>
<tr>
<td align="left">Cora</td>
<td align="char" char=".">2,708</td>
<td align="center">19,023</td>
<td align="char" char=".">1,433</td>
<td align="char" char=".">2</td>
</tr>
<tr>
<td align="left">Lazega</td>
<td align="char" char=".">71</td>
<td align="center">2,571</td>
<td align="char" char=".">71</td>
<td align="char" char=".">3</td>
</tr>
<tr>
<td align="left">ACM</td>
<td align="char" char=".">3,025</td>
<td align="center">2,240,042</td>
<td align="char" char=".">1,870</td>
<td align="char" char=".">2</td>
</tr>
<tr>
<td align="left">DBLP</td>
<td align="char" char=".">4,057</td>
<td align="center">11,783,886</td>
<td align="char" char=".">334</td>
<td align="char" char=".">3</td>
</tr>
<tr>
<td align="left">IMDB</td>
<td align="char" char=".">4,780</td>
<td align="center">80,216</td>
<td align="char" char=".">2,000</td>
<td align="char" char=".">2</td>
</tr>
<tr>
<td align="left">Amazon</td>
<td align="char" char=".">7,621</td>
<td align="center">1,384,799</td>
<td align="char" char=".">2,000</td>
<td align="char" char=".">3</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>
<bold>Baseline</bold>: AMRG was compared with the following competitive baselines.<list list-type="simple">
<list-item>
<p>&#x2022; DeepWalk [<xref ref-type="bibr" rid="B13">13</xref>]: DeepWalk is designed for standard single-layer network embedding without considering node attributes [<xref ref-type="bibr" rid="B13">13</xref>]. This approach first utilizes random walk in the networks and then applies a skip-gram algorithm to learn node representations.</p>
</list-item>
<list-item>
<p>&#x2022; node2vec [<xref ref-type="bibr" rid="B10">10</xref>]: Building on DeepWalk, node2vec adds two parameters that control the random walk process, yielding a biased random walk.</p>
</list-item>
<list-item>
<p>&#x2022; NetMF [<xref ref-type="bibr" rid="B31">31</xref>]: NetMF is a general framework that unifies DeepWalk, LINE, node2vec, and PTE by recasting skip-gram models with negative sampling as matrix factorization for learning network representations.</p>
</list-item>
<list-item>
<p>&#x2022; MGAT [<xref ref-type="bibr" rid="B26">26</xref>]: MGAT is a multiplex network embedding method based on the Graph Attention Network (GAT)&#x20;model.</p>
</list-item>
</list>
</p>
<p>MGAT was implemented in PyTorch, though the source code is not provided here. For all other baselines, the source code published by the authors was used.</p>
<p>Parameter Settings: The output node embedding dimension was set to 32 for all datasets to provide a fair comparison. We carefully tuned the parameters of our proposed model to obtain optimal performance. For our method, a two-layer GCN was trained with hidden layer dimensions chosen from {64, 128, 256, 768, 512, 1024} and an output dimension of 32. The objective function in <xref ref-type="disp-formula" rid="e11">Eq. 11</xref> was minimized on the training set using the Adam optimizer with a learning rate in the range 0.000&#x2009;95&#x2013;0.005. A dropout rate of 0.5 was used, together with weight decay values of 0.000&#x2009;3 for Cora, 0.002 for Citeseer, 0.000&#x2009;7 for Lazega, 0.000&#x2009;5 for DBLP and ACM, 0.000&#x2009;9 for Amazon and 0.03 for IMDB. The coefficients of the consistency constraint and of the trainable weight parameter constraint (<italic>&#x3b1;</italic> and <italic>&#x3b2;</italic>) were searched over the sets <inline-formula id="inf46">
<mml:math id="m57">
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mn>0.001</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.05</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.9</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1.0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.1</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> and <inline-formula id="inf47">
<mml:math id="m58">
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mn>0.04</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.05</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.5</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.6</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.7</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1.0</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>, respectively. The self-connection weight was set to 2.0 for the Citeseer and Amazon datasets, 3.0 for the ACM and IMDB datasets, and 1.0 for the Cora, Lazega and DBLP datasets. The Lazega dataset contains node relationship types but no node features, so an identity matrix was used as its feature matrix. The node2vec random walk node embedding dimension (<italic>d</italic>
<sub>
<italic>rw</italic>
</sub>) for learning distant-node information was set to 8. These parameter values are summarized in <xref ref-type="table" rid="T3">Table&#x20;3</xref>.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Parameter values used in the experiments.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Parameters</th>
<th align="center">Values</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<italic>d</italic>
</td>
<td align="center">32</td>
</tr>
<tr>
<td align="left">learning rate</td>
<td align="center">0.000&#x2009;95&#x2013;0.005</td>
</tr>
<tr>
<td align="left">hidden layer dimensions</td>
<td align="center">{64,128,256,768,512,1024}</td>
</tr>
<tr>
<td align="left">dropout rate</td>
<td align="center">0.5</td>
</tr>
<tr>
<td align="left">weight decay (Cora)</td>
<td align="center">0.000&#x2009;3</td>
</tr>
<tr>
<td align="left">weight decay (Citeseer)</td>
<td align="center">0.002</td>
</tr>
<tr>
<td align="left">weight decay (Lazega)</td>
<td align="center">0.000&#x2009;7</td>
</tr>
<tr>
<td align="left">weight decay (DBLP,ACM)</td>
<td align="center">0.000&#x2009;5</td>
</tr>
<tr>
<td align="left">weight decay (Amazon)</td>
<td align="center">0.000&#x2009;9</td>
</tr>
<tr>
<td align="left">weight decay (IMDB)</td>
<td align="center">0.03</td>
</tr>
<tr>
<td align="left">
<italic>&#x3b1;</italic>
</td>
<td align="center">{0.001,0.05,0.2,0.9,1.0,0.1}</td>
</tr>
<tr>
<td align="left">
<italic>&#x3b2;</italic>
</td>
<td align="center">{0.04,0.05,0.1,0.5,0.6,0.7,1.0}</td>
</tr>
<tr>
<td align="left">
<italic>w</italic> (Citeseer, Amazon)</td>
<td align="center">2.0</td>
</tr>
<tr>
<td align="left">
<italic>w</italic> (ACM,IMDB)</td>
<td align="center">3.0</td>
</tr>
<tr>
<td align="left">
<italic>w</italic> (Cora, Lazega,DBLP)</td>
<td align="center">1.0</td>
</tr>
<tr>
<td align="left">
<italic>d</italic>
<sub>
<italic>rw</italic>
</sub>
</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">
<italic>p</italic>, <italic>q</italic>
</td>
<td align="center">{1,2,0.5}</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Among the baselines, DeepWalk was treated as a special case of node2vec with <italic>p</italic>&#x20;&#x3d; <italic>q</italic>&#x20;&#x3d; 1. For the conventional node2vec algorithm, the hyperparameters were set to <italic>p</italic>&#x20;&#x3d; 2 and <italic>q</italic>&#x20;&#x3d; 0.5, with a window size of 10 and five negative samples. Other baseline hyperparameters were set as in the original papers.</p>
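<p>To illustrate how <italic>p</italic> and <italic>q</italic> bias the walk, the following is a minimal sketch of a node2vec-style second-order step, written with the Python standard library only. It is not the implementation used in the experiments; the adjacency layout, the function names <italic>transition_weights</italic> and <italic>biased_walk</italic>, and the toy graph are assumptions made for illustration.</p>
<preformat><![CDATA[
```python
import random


def transition_weights(graph, prev, cur, p, q):
    """Unnormalized node2vec weights for leaving `cur` after arriving from `prev`.

    graph is a dict mapping each node to the set of its neighbours
    (undirected, unweighted); this layout is an assumption of the sketch.
    """
    weights = {}
    for nxt in graph[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p      # return to the previous node
        elif nxt in graph[prev]:
            weights[nxt] = 1.0          # stay at distance 1 from prev
        else:
            weights[nxt] = 1.0 / q      # move outward, distance 2 from prev
    return weights


def biased_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """Sample one walk of at most `length` nodes; p = q = 1 gives DeepWalk."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if not graph[cur]:
            break                       # dead end: isolated node
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(sorted(graph[cur])))
            continue
        weights = transition_weights(graph, walk[-2], cur, p, q)
        nodes = sorted(weights)
        walk.append(rng.choices(nodes, [weights[n] for n in nodes])[0])
    return walk


# Hypothetical toy graph: triangle 0-1-2 plus a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
walk = biased_walk(toy, start=0, length=10, p=2.0, q=0.5)
```
]]></preformat>
<p>With <italic>p</italic>&#x20;&#x3d; 2 and <italic>q</italic>&#x20;&#x3d; 0.5, stepping back to the previous node is down-weighted and moving outward is up-weighted, so the walk tends to explore nodes farther from its starting point.</p>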
</sec>
<sec id="s4-2">
<title>4.2 Node Classification</title>
<p>Accuracy (ACC) was used to evaluate node classification performance for all datasets. We randomly selected 10<italic>%</italic> of the nodes as the training set and 10<italic>%</italic> as the validation set; the remaining 80<italic>%</italic> formed the test set. A total of 120 epochs were used for each of the three datasets. The proposed approach was compared with state-of-the-art baselines and with the following variants of our method.<list list-type="simple">
<list-item>
<p>&#x2022; AMRG/rw: Our proposed method without the random walk strategy for learning distant neighboring node information.</p>
</list-item>
<list-item>
<p>&#x2022; AMRG/att: Our proposed method without the attention mechanism.</p>
</list-item>
<list-item>
<p>&#x2022; AMRG/cc: Our proposed method without the regularized consistency constraint.</p>
</list-item>
<list-item>
<p>&#x2022; AMRG/wp: Our proposed method without the regularized trainable weight parameter constraints among the &#x7c;<italic>M</italic>&#x7c;&#x20;GCNs.</p>
</list-item>
</list>
</p>
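<p>The random 10/10/80 split described above can be sketched as follows. This is an illustrative sketch only: the function name <italic>split_nodes</italic>, the rounding rule, and the fixed seed are assumptions and are not taken from the experimental code.</p>
<preformat><![CDATA[
```python
import random


def split_nodes(num_nodes, train_frac=0.10, val_frac=0.10, seed=0):
    """Randomly partition node ids into train / validation / test lists."""
    ids = list(range(num_nodes))
    random.Random(seed).shuffle(ids)          # seeded for reproducibility
    n_train = int(round(train_frac * num_nodes))
    n_val = int(round(val_frac * num_nodes))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]              # everything that remains
    return train, val, test


# Cora has 2,708 nodes (Table 2).
train, val, test = split_nodes(2708)
```
]]></preformat>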
<p>The node2vec and NetMF baselines were designed for standard single-layer networks. As such, we first learned node embeddings for each dimension separately and then averaged them to produce the overall node embeddings. This process was repeated 5&#x20;times, and the averaged results are reported for the baselines. For our proposed method, both the node2vec random walk used to capture 10-hop distant-node information and the GCN model produce slightly different outcomes on each run, so node classification performance was likewise evaluated 5&#x20;times and the results were averaged.</p>
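<p>As a minimal sketch of this averaging protocol, the snippet below averages per-dimension embeddings element-wise and then averages scores over repeated runs. The data layout (one dictionary per dimension, mapping a node to its vector) and both function names are assumptions made for illustration.</p>
<preformat><![CDATA[
```python
def average_layer_embeddings(layer_embeddings):
    """Element-wise mean of per-dimension embeddings.

    layer_embeddings: list with one dict per network dimension, each
    mapping node -> list of floats of equal length (assumed layout).
    """
    num_layers = len(layer_embeddings)
    averaged = {}
    for node in layer_embeddings[0]:
        vectors = [emb[node] for emb in layer_embeddings]
        averaged[node] = [sum(vals) / num_layers for vals in zip(*vectors)]
    return averaged


def mean_over_runs(scores):
    """Average a metric over repeated runs (five runs in the text)."""
    return sum(scores) / len(scores)


# Hypothetical two-dimension example with 2-d embeddings.
layer_a = {"u": [1.0, 2.0], "v": [0.0, 4.0]}
layer_b = {"u": [3.0, 4.0], "v": [2.0, 0.0]}
overall = average_layer_embeddings([layer_a, layer_b])
```
]]></preformat>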
<p>Node classification accuracy (ACC) values are reported in <xref ref-type="table" rid="T4">Table&#x20;4</xref>, where the bolded number denotes the best result. The following are evident from the table:<list list-type="simple">
<list-item>
<p>&#x2022; Compared with all baselines, our proposed method consistently achieved the best performance on all datasets. These results demonstrate the effectiveness of our model for attributed multiplex network embedding on node classification tasks. Comparing AMRG with AMRG/rw in <xref ref-type="table" rid="T4">Table&#x20;4</xref> shows that the random walk technique improved node classification accuracy by learning 10-hop node information. This suggests that node information from larger receptive fields is important.</p>
</list-item>
<list-item>
<p>&#x2022; AMRG/att generates node embeddings using a two-layer GCN for each dimensional network and simply averages the embeddings without an attention mechanism. <xref ref-type="table" rid="T4">Table&#x20;4</xref> suggests that attention-based weighting of each dimension can boost overall performance. This result is consistent with our assumption that node importance differs in each dimension. On DBLP, however, AMRG/att performs better than AMRG (93.04 vs. 91.44), which implies that the three dimensional networks of DBLP are of almost equal importance.</p>
</list-item>
<list-item>
<p>&#x2022; For AMRG/cc and AMRG/wp, the coefficients <italic>&#x3b1;</italic> and <italic>&#x3b2;</italic> in <xref ref-type="disp-formula" rid="e11">Eq. 11</xref> were set to zero, respectively. The results suggest that both variants are inferior to AMRG. This indicates that the consistency and weight parameter constraints are important for improving overall performance, demonstrating the need to capture similarities and interactions among diverse dimensions.</p>
</list-item>
<list-item>
<p>&#x2022; In addition, our technique improved on the classification accuracy of MGAT, the best-performing baseline on every dataset except Amazon (for which MGAT is not suitable), by as much as 6.69<italic>%</italic> on the DBLP dataset. On the Lazega data, it was only slightly better than MGAT. This is likely because Lazega is a small, dense multiplex network with only 71 nodes, so node information spreads quickly through the graph in the two-layer GCN. On the Cora dataset, the variant without pre-trained node features (AMRG/rw) had lower accuracy than the other variants, which again indicates the importance of node information from larger receptive fields. Finally, because the NetMF method relies on matrix factorization, the model did not converge on Amazon, which reveals a shortcoming of NetMF.</p>
</list-item>
<list-item>
<p>&#x2022; Our method differs from MGAT in that we use consistency constraints and embed each dimension network with pre-trained 10-hop node information. <xref ref-type="table" rid="T4">Table&#x20;4</xref> indicates that the overall performance of our method is superior to that of MGAT, which suggests that consistency constraints and random-walk pre-training are important mechanisms for learning high-quality node embeddings for multiplex networks. Furthermore, our approach achieved superior node embeddings using simple, general two-layer GCNs with fewer model parameters than MGAT, which is based on the GAT model, whose multiple attention heads lead to rapid growth in the number of parameters&#x20;[<xref ref-type="bibr" rid="B15">15</xref>].</p>
</list-item>
</list>
</p>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Node classification accuracy (<italic>%</italic>) (bold: Best).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Lazega</th>
<th align="center">Cora</th>
<th align="center">Citeseer</th>
<th align="center">ACM</th>
<th align="center">DBLP</th>
<th align="center">IMDB</th>
<th align="center">Amazon</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">DeepWalk</td>
<td align="char" char=".">91.02</td>
<td align="char" char=".">66.12</td>
<td align="char" char=".">57.25</td>
<td align="char" char=".">73.16</td>
<td align="char" char=".">52.01</td>
<td align="char" char=".">53.52</td>
<td align="char" char=".">63.33</td>
</tr>
<tr>
<td align="left">Node2vec</td>
<td align="char" char=".">90.25</td>
<td align="char" char=".">67.92</td>
<td align="char" char=".">58.01</td>
<td align="char" char=".">73.10</td>
<td align="char" char=".">53.91</td>
<td align="char" char=".">52.23</td>
<td align="char" char=".">65.12</td>
</tr>
<tr>
<td align="left">NetMF</td>
<td align="char" char=".">92.05</td>
<td align="char" char=".">72.95</td>
<td align="char" char=".">57.15</td>
<td align="char" char=".">74.65</td>
<td align="char" char=".">54.98</td>
<td align="char" char=".">57.64</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">MGAT</td>
<td align="char" char=".">96.32</td>
<td align="char" char=".">85.49</td>
<td align="char" char=".">72.92</td>
<td align="char" char=".">92.26</td>
<td align="char" char=".">84.75</td>
<td align="char" char=".">66.87</td>
<td align="char" char=".">51.30</td>
</tr>
<tr>
<td align="left">AMRG</td>
<td align="char" char=".">96.49</td>
<td align="char" char=".">86.25</td>
<td align="char" char=".">74.31</td>
<td align="char" char=".">94.05</td>
<td align="char" char=".">91.44</td>
<td align="char" char=".">70.28</td>
<td align="char" char=".">78.12</td>
</tr>
<tr>
<td align="left">AMRG/rw</td>
<td align="char" char=".">96.33</td>
<td align="char" char=".">85.56</td>
<td align="char" char=".">74.27</td>
<td align="char" char=".">93.18</td>
<td align="char" char=".">88.79</td>
<td align="char" char=".">69.05</td>
<td align="char" char=".">76.72</td>
</tr>
<tr>
<td align="left">AMRG/att</td>
<td align="char" char=".">94.74</td>
<td align="char" char=".">85.69</td>
<td align="char" char=".">73.93</td>
<td align="char" char=".">93.72</td>
<td align="char" char=".">93.04</td>
<td align="char" char=".">70.00</td>
<td align="char" char=".">76.14</td>
</tr>
<tr>
<td align="left">AMRG/cc</td>
<td align="char" char=".">92.23</td>
<td align="char" char=".">85.74</td>
<td align="char" char=".">73.89</td>
<td align="char" char=".">93.43</td>
<td align="char" char=".">91.37</td>
<td align="char" char=".">69.86</td>
<td align="char" char=".">78.09</td>
</tr>
<tr>
<td align="left">AMRG/wp</td>
<td align="char" char=".">92.98</td>
<td align="char" char=".">85.60</td>
<td align="char" char=".">72.58</td>
<td align="char" char=".">93.06</td>
<td align="char" char=".">87.06</td>
<td align="char" char=".">68.87</td>
<td align="char" char=".">77.58</td>
</tr>
</tbody>
</table>
</table-wrap>
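<p>The gains over MGAT quoted in the text can be recomputed directly from the AMRG and MGAT rows of Table 4. The short sketch below does this arithmetic; the variable and function names are illustrative only.</p>
<preformat><![CDATA[
```python
# Accuracy (%) copied from the MGAT and AMRG rows of Table 4.
DATASETS = ["Lazega", "Cora", "Citeseer", "ACM", "DBLP", "IMDB", "Amazon"]
MGAT = [96.32, 85.49, 72.92, 92.26, 84.75, 66.87, 51.30]
AMRG = [96.49, 86.25, 74.31, 94.05, 91.44, 70.28, 78.12]


def gains(ours, baseline):
    """Absolute accuracy difference in percentage points, two decimals."""
    return [round(o - b, 2) for o, b in zip(ours, baseline)]


gain_by_dataset = dict(zip(DATASETS, gains(AMRG, MGAT)))

# Amazon is left out of the comparison because the text states that
# MGAT is not suitable for that dataset.
comparable = {k: v for k, v in gain_by_dataset.items() if k != "Amazon"}
largest = max(comparable, key=comparable.get)
smallest = min(comparable, key=comparable.get)
```
]]></preformat>
<p>Among the six comparable datasets, the largest gain is on DBLP (6.69 points) and the smallest is on Lazega (0.17 points), in line with the discussion above.</p>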
<p>Similarly, node classification F1-score values are presented in <xref ref-type="table" rid="T5">Table&#x20;5</xref>, where the bolded number denotes the best result. These results demonstrate that our method is stable and competitive.</p>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Node classification F1-score (<italic>%</italic>) (bold: Best).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Lazega</th>
<th align="center">Cora</th>
<th align="center">Citeseer</th>
<th align="center">ACM</th>
<th align="center">DBLP</th>
<th align="center">IMDB</th>
<th align="center">Amazon</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">DeepWalk</td>
<td align="char" char=".">89.32</td>
<td align="char" char=".">64.78</td>
<td align="char" char=".">56.52</td>
<td align="char" char=".">72.67</td>
<td align="char" char=".">51.99</td>
<td align="char" char=".">51.54</td>
<td align="char" char=".">61.79</td>
</tr>
<tr>
<td align="left">Node2vec</td>
<td align="char" char=".">89.89</td>
<td align="char" char=".">66.01</td>
<td align="char" char=".">57.56</td>
<td align="char" char=".">72.76</td>
<td align="char" char=".">53.35</td>
<td align="char" char=".">51.33</td>
<td align="char" char=".">64.31</td>
</tr>
<tr>
<td align="left">NetMF</td>
<td align="char" char=".">92.01</td>
<td align="char" char=".">75.98</td>
<td align="char" char=".">61.41</td>
<td align="char" char=".">74.65</td>
<td align="char" char=".">54.11</td>
<td align="char" char=".">55.21</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">MGAT</td>
<td align="char" char=".">96.31</td>
<td align="char" char=".">84.06</td>
<td align="char" char=".">69.43</td>
<td align="char" char=".">91.01</td>
<td align="char" char=".">84.09</td>
<td align="char" char=".">65.67</td>
<td align="char" char=".">51.02</td>
</tr>
<tr>
<td align="left">AMRG</td>
<td align="char" char=".">96.32</td>
<td align="char" char=".">84.08</td>
<td align="char" char=".">71.32</td>
<td align="char" char=".">92.90</td>
<td align="char" char=".">90.51</td>
<td align="char" char=".">70.24</td>
<td align="char" char=".">72.85</td>
</tr>
</tbody>
</table>
</table-wrap>
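<p>For reference, the two evaluation metrics can be computed as in the following sketch. It assumes that the reported F1-score is the macro-averaged F1 (the unweighted mean of per-class F1 scores); the function names are illustrative and the snippet is not the evaluation code used in the experiments.</p>
<preformat><![CDATA[
```python
def accuracy(y_true, y_pred):
    """Fraction of nodes whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for four nodes and two classes.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```
]]></preformat>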
</sec>
<sec id="s4-3">
<title>4.3 Analysis of Attention Mechanisms</title>
<p>We use Lazega, a dense network, and three sparse networks (Cora, IMDB, and DBLP) as examples to analyze how attention values change during training for node classification tasks. The results are shown in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>, where the <italic>x</italic>-axis denotes the number of training epochs and the <italic>y</italic>-axis the corresponding attention value. As seen in <xref ref-type="fig" rid="F3">Figure&#x20;3A</xref>, the attention values for the advice, friendship, and strong coworker dimensional networks are nearly the same for the Lazega dataset at epoch 0. However, these attention values change as training proceeds. The attention value for the advice dimensional network increases at epoch 20 (compared to epoch 0), while that for the strong coworker dimensional network decreases. Subsequently, the attention values for the advice and strong coworker dimensional networks gradually decrease until the model converges, whereas the attention value for the friendship dimensional network keeps increasing throughout training. This illustrates that our approach can adaptively learn the importance weights of the embeddings of diverse relation types during training. For example, information provided by the friendship dimension network was more important than that of the advice and strong coworker dimensions for the Lazega dataset. Similar trends were observed for the Cora, IMDB and DBLP datasets. For Cora, the citation dimension network proved to be more important than the paper similarity dimension network, as shown in <xref ref-type="fig" rid="F3">Figure&#x20;3B</xref>.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>
<bold>(A)</bold> Lazega. <bold>(B)</bold> Cora. <bold>(C)</bold> IMDB. <bold>(D)</bold> DBLP.</p>
</caption>
<graphic xlink:href="fphy-09-763904-g003.tif"/>
</fig>
<p>For the IMDB dataset, node classification accuracy is best (70.28) at epoch 18. At this point, the attention values for the co-actor and co-director networks are almost equal. <xref ref-type="table" rid="T4">Table&#x20;4</xref> shows that when the two attention values are fixed to be equal (i.e., AMRG/att, with attention values of 0.5 and 0.5), the accuracy is 70.00, which is close to 70.28. This demonstrates that the attention mechanism is important. For DBLP, node classification accuracy is best at epoch 103.</p>
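<p>The relationship between learned attention and plain averaging can be made concrete with a small sketch. It assumes a generic softmax attention over one scalar score per dimension followed by a weighted sum of the per-dimension embeddings; the exact attention function of AMRG is defined earlier in the paper and may differ, and the names <italic>softmax</italic> and <italic>fuse</italic> are illustrative.</p>
<preformat><![CDATA[
```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(embeddings, scores):
    """Attention-weighted sum of one node's per-dimension embeddings.

    embeddings: list of equal-length vectors, one per network dimension.
    scores:     one unnormalized attention score per dimension (assumed).
    """
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [sum(w * emb[k] for w, emb in zip(weights, embeddings))
             for k in range(dim)]
    return fused, weights


# Hypothetical node with two dimensions (e.g. co-actor and co-director).
emb = [[1.0, 0.0], [0.0, 1.0]]
fused_equal, w_equal = fuse(emb, [0.3, 0.3])   # equal scores
fused_skew, w_skew = fuse(emb, [2.0, 0.0])     # first dimension favoured
```
]]></preformat>
<p>When the scores are equal, the weights are 0.5 and 0.5 and the fused vector is the plain average, which corresponds to the AMRG/att setting discussed above.</p>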
</sec>
<sec id="s4-4">
<title>4.4 Analysis of Variants</title>
<p>In this section, we analyze the effectiveness and sensitivity of the pre-trained node vector dimension, the two regularization constraints, and the self-connection weight. Specifically, we evaluate the performance (i.e.,&#x20;accuracy and macro-F1) of our method on node classification tasks with respect to <italic>&#x3b1;</italic>, <italic>&#x3b2;</italic>, <italic>w</italic> and <italic>d</italic>
<sub>
<italic>rw</italic>
</sub>. Since the datasets produced similar results, we use the Citeseer, ACM and Cora datasets as examples.</p>
<p>Analysis for <italic>&#x3b1;</italic>: In this test, the value of <italic>&#x3b2;</italic> was set to zero, which eliminates <italic>L</italic>
<sub>
<italic>WP</italic>
</sub> in <xref ref-type="disp-formula" rid="e11">Eq. 11</xref>. The parameter <italic>&#x3b1;</italic> was varied from 0.1 to 1.0, as shown in <xref ref-type="fig" rid="F4">Figure&#x20;4A</xref>. For Citeseer, the accuracy and macro-F1 score remained relatively stable with increasing <italic>&#x3b1;</italic>. For ACM, the accuracy and macro-F1 score were best at <italic>&#x3b1;</italic>&#x20;&#x3d; 0.3, as shown in <xref ref-type="fig" rid="F4">Figure&#x20;4D</xref>.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>The performance of our method for parameters <italic>&#x3b1;</italic>, <italic>&#x3b2;</italic>, <italic>w</italic> on the Citeseer <bold>(A&#x2013;C)</bold> and ACM <bold>(D&#x2013;F)</bold> datasets for node classification&#x20;task.</p>
</caption>
<graphic xlink:href="fphy-09-763904-g004.tif"/>
</fig>
<p>Analysis for <italic>&#x3b2;</italic>: Here, the value of <italic>&#x3b1;</italic> was set to zero, which eliminates the consistency constraint <italic>L</italic>
<sub>
<italic>CC</italic>
</sub> in <xref ref-type="disp-formula" rid="e11">Eq. 11</xref>. <xref ref-type="fig" rid="F4">Figure&#x20;4B</xref> shows how different values of the coefficient <italic>&#x3b2;</italic> affected node classification performance. As <italic>&#x3b2;</italic> was increased from 0.1 to 1.0, the accuracy slowly increased and then decreased (with a maximum at <italic>&#x3b2;</italic>&#x20;&#x3d; 0.7), while the macro-F1 score remained nearly constant. Similar trends can be seen in <xref ref-type="fig" rid="F4">Figure&#x20;4E</xref>, where the maximum occurred at <italic>&#x3b2;</italic>&#x20;&#x3d;&#x20;0.6.</p>
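<p>The two ablations above can be read as switching off one term of a weighted objective. The sketch below assumes that Eq. 11 has the additive form of a task loss plus <italic>&#x3b1;</italic> times the consistency term plus <italic>&#x3b2;</italic> times the weight parameter term, which is inferred from the statement that a zero coefficient eliminates the corresponding term. The numeric loss values, the score function, and the function names are hypothetical.</p>
<preformat><![CDATA[
```python
def total_loss(l_task, l_cc, l_wp, alpha, beta):
    """Weighted sum of the task loss and the two regularizers (assumed form)."""
    return l_task + alpha * l_cc + beta * l_wp


def best_coefficient(candidates, score_fn):
    """Return the candidate value with the highest validation score."""
    return max(candidates, key=score_fn)


# Ablations used in the sensitivity study: one coefficient is fixed at zero.
# The loss values 1.0, 0.4 and 0.2 are made up for illustration.
loss_alpha_only = total_loss(1.0, 0.4, 0.2, alpha=0.3, beta=0.0)  # L_WP removed
loss_beta_only = total_loss(1.0, 0.4, 0.2, alpha=0.0, beta=0.7)   # L_CC removed

# Hypothetical validation curve that peaks at 0.7, standing in for a
# real train-and-evaluate call.
picked = best_coefficient([0.1, 0.4, 0.7, 1.0], lambda b: -(b - 0.7) ** 2)
```
]]></preformat>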
<p>Analysis for <italic>w</italic>: The impact of the self-connection weight was observed by varying <italic>w</italic> from 0.0 to 10.0, as shown in <xref ref-type="fig" rid="F4">Figure&#x20;4C</xref> for Citeseer. A value of <italic>w</italic>&#x20;&#x3d; 0.0 means that a node considers only its neighboring nodes when generating its embedding, which produces low accuracy and macro-F1 scores. Larger values (<italic>w</italic>&#x20;&#x3e; 1) indicate that the node itself is more important than its neighboring nodes. The accuracy first increases and then remains relatively stable, before dropping quickly at <italic>w</italic>&#x20;&#x3d; 10.0. The maximum accuracy and macro-F1 scores occurred at <italic>w</italic>&#x20;&#x3d; 2 and <italic>w</italic>&#x20;&#x3d; 1, respectively, with macro-F1 decreasing slowly as <italic>w</italic> increased. For ACM in <xref ref-type="fig" rid="F4">Figure&#x20;4F</xref>, the best accuracy and macro-F1 occurred at <italic>w</italic>&#x20;&#x3d;&#x20;3.</p>
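<p>The role of <italic>w</italic> can be illustrated with a small sketch of a weighted self-loop in the propagation matrix. It assumes the common GCN construction in which <italic>w</italic> times the identity is added to the adjacency matrix before symmetric degree normalization; the function name and the two-node example are illustrative, and the exact normalization used by AMRG is given by the model definition earlier in the paper.</p>
<preformat><![CDATA[
```python
def normalized_adjacency(adj, w):
    """Return D^(-1/2) (A + w I) D^(-1/2) for a dense adjacency matrix.

    adj: square list-of-lists adjacency matrix A.
    w:   self-connection weight added on the diagonal.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (w if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    inv_sqrt = [d ** -0.5 if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


# Two nodes joined by one edge.
pair = [[0.0, 1.0], [1.0, 0.0]]
no_self = normalized_adjacency(pair, 0.0)     # node ignores its own features
strong_self = normalized_adjacency(pair, 3.0) # node dominates its neighbour
```
]]></preformat>
<p>In this example, <italic>w</italic>&#x20;&#x3d; 0 leaves a zero diagonal, so a node aggregates only from its neighbor, whereas <italic>w</italic>&#x20;&#x3d; 3 gives the node itself a weight of 0.75 and its neighbor 0.25.</p>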
<p>Analysis of random walk dimension <italic>d</italic>
<sub>
<italic>rw</italic>
</sub>: <xref ref-type="fig" rid="F5">Figure&#x20;5</xref> suggests that node classification accuracy was maximized at <italic>d</italic>
<sub>
<italic>rw</italic>
</sub>&#x20;&#x3d; 8 and decreased for higher dimensions. This demonstrates that pre-trained node features, which contain information about distant neighboring nodes, can improve model performance. When the embedding dimension <italic>d</italic>
<sub>
<italic>rw</italic>
</sub> is small, the pre-trained node features are insufficient to describe the distant neighborhood information. Conversely, when <italic>d</italic>
<sub>
<italic>rw</italic>
</sub> is large, the pre-trained node features introduce some noise.</p>
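<p>Because the pre-trained random walk features are concatenated with the original node attributes, the GCN input width grows by the random walk embedding dimension. The sketch below shows this bookkeeping for Cora (1,433 attributes plus 8 pre-trained dimensions); the function name and the all-zero placeholder matrices are assumptions for illustration.</p>
<preformat><![CDATA[
```python
def concat_features(attributes, pretrained):
    """Row-wise concatenation of node attributes and pre-trained features."""
    if len(attributes) != len(pretrained):
        raise ValueError("both inputs need one row per node")
    return [list(a) + list(r) for a, r in zip(attributes, pretrained)]


# Placeholder matrices for three nodes: 1,433 Cora attributes (Table 2)
# and d_rw = 8 pre-trained random walk dimensions (Table 3).
num_nodes, num_attrs, d_rw = 3, 1433, 8
attributes = [[0.0] * num_attrs for _ in range(num_nodes)]
pretrained = [[0.0] * d_rw for _ in range(num_nodes)]
features = concat_features(attributes, pretrained)
input_width = len(features[0])
```
]]></preformat>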
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>The performance of our method for parameters <italic>d</italic>
<sub>
<italic>rw</italic>
</sub> on the Cora dataset for node classification&#x20;task.</p>
</caption>
<graphic xlink:href="fphy-09-763904-g005.tif"/>
</fig>
<p>Analysis of different training nodes: We tested the performance of our method and its variants using training node fractions of 10, 20, and 40<italic>%</italic> for the attributed multiplex Citeseer network, as shown in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>. The proposed method consistently outperformed its variants, indicating the importance and effectiveness of the attention mechanism, the consistency constraints, and the trainable weight parameter constraints among the &#x7c;<italic>M</italic>&#x7c; GCNs (one per dimensional graph network) for boosting overall performance. The influence of the attention mechanism and the two constraints on node classification performance varied with the ratio of labeled training nodes.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>
<bold>(A)</bold> Accuracy. <bold>(B)</bold> Macro-F1.</p>
</caption>
<graphic xlink:href="fphy-09-763904-g006.tif"/>
</fig>
</sec>
<sec id="s4-5">
<title>4.5 Analysis of Key Factor</title>
<p>In the previous section, we analyzed the effect of each individual variant on the performance of the proposed model. Here, we take Citeseer, ACM and Amazon as examples to identify which of three factors (the regularized consistency constraints, the weight constraints, and the attention mechanism) contributes most to the performance of AMRG. The evaluation metric is node classification accuracy.<list list-type="simple">
<list-item>
<p>&#x2022; AMRG(wp): The AMRG only with the weight constraints.</p>
</list-item>
<list-item>
<p>&#x2022; AMRG(cc): The AMRG only with the regularized consistency constraints.</p>
</list-item>
<list-item>
<p>&#x2022; AMRG(att): The AMRG only with the attention mechanism.</p>
</list-item>
</list>
</p>
<p>
<xref ref-type="table" rid="T6">Table&#x20;6</xref> shows that these three strategies (i.e., regularized consistency constraints, weight constraints and attention mechanism) differ in importance across the three datasets. For Citeseer, the weight constraints are the key factor, as AMRG (wp) performs better than AMRG (att) and AMRG (cc). For ACM, the regularized consistency constraints are the key factor, as AMRG (cc) outperforms the other two variants. For Amazon, the attention mechanism is the key factor, as AMRG (att) outperforms the other two variants.</p>
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>Node classification accuracy (<italic>%</italic>) (Bold: Best; underline: Runner-up).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Citeseer</th>
<th align="center">ACM</th>
<th align="center">Amazon</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">AMRG</td>
<td align="char" char=".">74.31</td>
<td align="char" char=".">94.05</td>
<td align="char" char=".">78.12</td>
</tr>
<tr>
<td align="left">AMRG (att)</td>
<td align="char" char=".">71.90</td>
<td align="char" char=".">93.26</td>
<td align="char" char=".">77.23</td>
</tr>
<tr>
<td align="left">AMRG (cc)</td>
<td align="char" char=".">72.92</td>
<td align="char" char=".">93.72</td>
<td align="char" char=".">76.40</td>
</tr>
<tr>
<td align="left">AMRG (wp)</td>
<td align="char" char=".">73.70</td>
<td align="char" char=".">93.22</td>
<td align="char" char=".">75.53</td>
</tr>
</tbody>
</table>
</table-wrap>
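<p>The key factor for each dataset is simply the single-component variant with the highest accuracy in Table 6. The following sketch recomputes it from the table values; the dictionary layout and the function name are illustrative.</p>
<preformat><![CDATA[
```python
# Accuracy (%) of the single-component variants, copied from Table 6.
TABLE6 = {
    "Citeseer": {"att": 71.90, "cc": 72.92, "wp": 73.70},
    "ACM": {"att": 93.26, "cc": 93.72, "wp": 93.22},
    "Amazon": {"att": 77.23, "cc": 76.40, "wp": 75.53},
}


def key_factor(scores):
    """Name of the variant with the highest accuracy."""
    return max(scores, key=scores.get)


key_factors = {dataset: key_factor(scores) for dataset, scores in TABLE6.items()}
```
]]></preformat>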
</sec>
<sec id="s4-6">
<title>4.6 Visualization</title>
<p>Visualization was performed to provide a more intuitive comparison and to further show the effectiveness of our proposed method. As the results on different datasets exhibit similar trends, we take the Citeseer dataset as an example. The output embeddings of the last layer, prior to the <italic>Softmax</italic> operation, were used for node classification and were plotted using t-SNE&#x20;[<xref ref-type="bibr" rid="B40">40</xref>]. The results for the Citeseer dataset are shown in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>, with nodes colored according to their true labels.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>
<bold>(A)</bold> Node2vec. <bold>(B)</bold> NetMF. <bold>(C)</bold> MGAT. <bold>(D)</bold> AMRG. <bold>(E)</bold> AMRG/rw. <bold>(F)</bold> AMRG/cc. <bold>(G)</bold> AMRG/wp <bold>(H)</bold> AMRG/att.</p>
</caption>
<graphic xlink:href="fphy-09-763904-g007.tif"/>
</fig>
<p>It is evident from the figure that the results of the node2vec and NetMF baselines are not satisfactory, because nodes of different classes are mixed together. In contrast, MGAT and our method consider node features and produce better results than node2vec and NetMF. This further demonstrates the importance of node features for mining hidden graph information. The variants are inferior to our method, which is consistent with the results in <xref ref-type="table" rid="T4">Table&#x20;4</xref>. Our approach yields the clearest boundaries among the different classes, with nodes of the same class grouped together. This demonstrates the importance of the pre-training mechanism with random walk network embedding, the consistency constraints, the trainable weight parameter constraints among the <inline-formula id="inf48">
<mml:math id="m59">
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> GCNs, and the attention mechanism.</p>
</sec>
</sec>
<sec id="s5">
<title>5 Conclusion</title>
<p>In this paper, a new approach for node classification in attributed multiplex networks was developed using random walk network embedding and GCNs. Random walk network embedding was first used to capture 10-hop node information as pre-trained node features. These features were then concatenated with the original node attributes, and the result was fed into the GCN model as the new node features. A two-layer GCN was then used to learn an embedding for each dimensional network. To obtain more comprehensive, informative, and higher-quality node representations with global consensus, we introduced regularized consistency constraints to capture the similarities among different dimensional network embeddings. In addition, trainable weight parameter constraints were used to learn the interactive information from diverse dimensions. Furthermore, an attention mechanism was used to learn weights for the diverse dimensional networks and to fuse their embeddings according to these weights. Extensive experiments were conducted on real-world networks, covering node classification, visualization, analysis of attention mechanisms, and parameter sensitivity. The results demonstrated that these strategies boost the overall node classification performance on multiplex graph networks and that our approach outperforms many competitive baselines. In the future, we plan to extend this framework to larger, more complex, and time-varying graphs. Another promising direction involves learning node representations that incorporate edge features.</p>
</sec>
</body>
<back>
<sec id="s6">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>Conceptualization, BH, YW, and LK; Data curation, BH, QW, and YY; Funding acquisition, YW and LK; Methodology, BH, YW, LK, and QW; Validation, Visualization, BH and YW; Supervision, YW and LK; Writing&#x2014;Original draft preparation, BH and YW; Writing&#x2014;Reviewing and Editing, BH, YW, LK, QW, and YY; All authors have read and agreed to the published version of the manuscript.</p>
</sec>
<sec id="s8">
<title>Funding</title>
<p>This work was supported by the National Natural Science Foundation of China (NSFC) under grant number 61873274 and by the Postgraduate Scientific Research Innovation Project of Hunan Province under grant number CX20200075.</p>
</sec>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="fn1">
<label>1</label>
<p>In this paper, we use the terms <italic>graph network</italic>, <italic>network</italic>, and <italic>graph</italic> interchangeably.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>Z</given-names>
</name>
</person-group>. <article-title>Zero-shot Node Classification with Decomposed Graph Prototype Network</article-title>. In: <conf-name>KDD &#x2019;21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</conf-name>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2021</year>). p. <fpage>1769</fpage>&#x2013;<lpage>79</lpage>. <pub-id pub-id-type="doi">10.1145/3447548.3467230</pub-id> </citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L</given-names>
</name>
</person-group>. <article-title>Graph Contrastive Learning with Adaptive Augmentation</article-title>. In: <conf-name>WWW &#x2019;21: Proceedings of the Web Conference 2021</conf-name>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2021</year>). p. <fpage>2069</fpage>&#x2013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1145/3442381.3449802</pub-id> </citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tajeuna</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Bouguessa</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Modeling and Predicting Community Structure Changes in Time-Evolving Social Networks</article-title>. <source>IEEE Trans Knowl Data Eng</source> (<year>2019</year>) <volume>31</volume>:<fpage>1166</fpage>&#x2013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2018.2851586</pub-id> </citation>
</ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>Ranking-based Clustering of Heterogeneous Information Networks with Star Network Schema</article-title>. In: <conf-name>KDD &#x2019;09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2009</year>). p. <fpage>797</fpage>&#x2013;<lpage>806</lpage>. <pub-id pub-id-type="doi">10.1145/1557019.1557107</pub-id> </citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kipf</surname>
<given-names>TN</given-names>
</name>
<name>
<surname>Welling</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>Semi-supervised Classification with Graph Convolutional Networks</article-title>. In: <conf-name>International Conference on Learning Representations (ICLR)</conf-name> (<year>2017</year>). </citation>
</ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X-M</given-names>
</name>
</person-group>. <source>Deeper Insights into Graph Convolutional Networks for Semi-supervised Learning</source>. <publisher-loc>Menlo Park, California, USA</publisher-loc>: <publisher-name>AAAI</publisher-name> (<year>2018</year>). </citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Muller</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Thabet</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ghanem</surname>
<given-names>B</given-names>
</name>
</person-group>. <article-title>DeepGCNs: Can GCNs Go as Deep as CNNs?</article-title>. In: <conf-name>2019 IEEE/CVF International Conference on Computer Vision (ICCV)</conf-name>, <conf-loc>Seoul, South Korea</conf-loc>, <conf-date>October 27&#x2013;November 3, 2019</conf-date> (<year>2019</year>). p. <fpage>9266</fpage>&#x2013;<lpage>75</lpage>. <pub-id pub-id-type="doi">10.1109/ICCV.2019.00936</pub-id> </citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X-M</given-names>
</name>
</person-group>. <article-title>M2GRL: A Multi-Task Multi-View Graph Representation Learning Framework for Web-Scale Recommender Systems</article-title>. In: <conf-name>KDD &#x2019;20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2020</year>). p. <fpage>2349</fpage>&#x2013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.1145/3394486.3403284</pub-id> </citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Ying</surname>
<given-names>R</given-names>
</name>
<name>
<surname>He</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Eksombatchai</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hamilton</surname>
<given-names>WL</given-names>
</name>
<name>
<surname>Leskovec</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>Graph Convolutional Neural Networks for Web-Scale Recommender Systems</article-title>. In: <conf-name>Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>, <conf-loc>London</conf-loc>, <conf-date>August 19&#x2013;23, 2018</conf-date>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2018</year>). p. <fpage>974</fpage>&#x2013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1145/3219819.3219890</pub-id> </citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grover</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Leskovec</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>node2vec: Scalable Feature Learning for Networks</article-title>. <source>KDD</source> (<year>2016</year>) <volume>2016</volume>:<fpage>855</fpage>&#x2013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1145/2939672.2939754</pub-id> </citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Qiu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Yi</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Scalable Multiplex Network Embedding</article-title>. In: <conf-name>IJCAI&#x27;18: Proceedings of the 27th International Joint Conference on Artificial Intelligence</conf-name>, <conf-loc>Stockholm, Sweden</conf-loc>, <conf-date>July 13-19, 2018</conf-date>. <publisher-name>AAAI Press</publisher-name> (<year>2018</year>). p. <fpage>3082</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.24963/ijcai.2018/428</pub-id> </citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Qu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mei</surname>
<given-names>Q</given-names>
</name>
</person-group>. <article-title>LINE</article-title>. In: <conf-name>Proceedings of the 24th International Conference on World Wide Web (WWW&#x2019;15)</conf-name>, <conf-loc>Florence, Italy</conf-loc>, <conf-date>May 18&#x2013;22, 2015</conf-date> (<year>2015</year>). p. <fpage>1067</fpage>&#x2013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1145/2736277.2741093</pub-id> </citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Perozzi</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Al-Rfou</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Skiena</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>DeepWalk</article-title>. In: <conf-name>Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>, <conf-loc>New York, NY</conf-loc>, <conf-date>August 24 - 27, 2014</conf-date>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2014</year>). p. <fpage>701</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1145/2623330.2623732</pub-id> </citation>
</ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hamilton</surname>
<given-names>WL</given-names>
</name>
<name>
<surname>Ying</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Leskovec</surname>
<given-names>J</given-names>
</name>
</person-group>. <source>Inductive Representation Learning on Large Graphs</source>. <publisher-loc>Long Beach, CA, USA</publisher-loc>: <publisher-name>NIPS</publisher-name> (<year>2017</year>). </citation>
</ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="book">
<person-group person-group-type="author">
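<name>
<surname>Veli&#x10d;kovi&#x107;</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Cucurull</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Casanova</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Romero</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Li&#xf2;</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y</given-names>
</name>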
</person-group>. <source>Graph Attention Networks</source>. <publisher-loc>Vancouver, Canada</publisher-loc>: <publisher-name>ICLR</publisher-name> (<year>2018</year>). </citation>
</ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>C</given-names>
</name>
</person-group>. <article-title>Identification of Influential Spreaders Based on Classified Neighbors in Real-World Complex Networks</article-title>. <source>Appl Maths Comput</source> (<year>2018</year>) <volume>320</volume>:<fpage>512</fpage>&#x2013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1016/j.amc.2017.10.001</pub-id> </citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nicosia</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Latora</surname>
<given-names>V</given-names>
</name>
</person-group>. <article-title>Measuring and Modeling Correlations in Multiplex Networks</article-title>. <source>Phys Rev E</source> (<year>2015</year>) <volume>92</volume>:<fpage>032805</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevE.92.032805</pub-id> </citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>C</given-names>
</name>
</person-group>. <article-title>The Impact of Awareness Diffusion on SIR-like Epidemics in Multiplex Networks</article-title>. <source>Appl Maths Comput</source> (<year>2019</year>) <volume>349</volume>:<fpage>134</fpage>&#x2013;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1016/j.amc.2018.12.045</pub-id> </citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>P-Y</given-names>
</name>
<name>
<surname>Yeung</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Suzumura</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L</given-names>
</name>
</person-group>. <article-title>Principled Multilayer Network Embedding</article-title>. <source>2017 IEEE International Conference on Data Mining Workshops (ICDMW)</source> (<year>2017</year>) <volume>2017</volume>:<fpage>134</fpage>&#x2013;<lpage>41</lpage>. <pub-id pub-id-type="doi">10.1109/ICDMW.2017.23</pub-id> </citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Matsuno</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Murata</surname>
<given-names>T</given-names>
</name>
</person-group>. <article-title>MELL: Effective Embedding Method for Multiplex Networks</article-title>. In: <conf-name>WWW &#x27;18: Companion Proceedings of the The Web Conference 2018</conf-name>, <conf-loc>Lyon, France</conf-loc>, <conf-date>April 23&#x2013;27, 2018</conf-date>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2018</year>). p. <fpage>1261</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1145/3184558.3191565</pub-id> </citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Qu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Shang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>An Attention-Based Collaboration Framework for Multi-View Network Representation Learning</article-title>. In: <conf-name>Proceedings of the 2017 ACM on Conference on Information and Knowledge Management</conf-name>, <conf-loc>Singapore</conf-loc>, <conf-date>November 6&#x2013;10, 2017</conf-date>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2017</year>). p. <fpage>1767</fpage>&#x2013;<lpage>76</lpage>. <pub-id pub-id-type="doi">10.1145/3132847.3133021</pub-id> </citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="book">
<person-group person-group-type="author">
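<name>
<surname>Ma</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Aggarwal</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Yin</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J</given-names>
</name>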
</person-group>. <source>Multi-dimensional Graph Convolutional Networks</source> (<year>2018</year>). </citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Ghorbani</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Baghshah</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Rabiee</surname>
<given-names>HR</given-names>
</name>
</person-group>. <article-title>MGCN: Semi-supervised Classification in Multi-Layer Graphs with Graph Convolutional Networks</article-title>. In: <conf-name>ASONAM &#x27;19: International Conference on Advances in Social Networks Analysis and Mining</conf-name> (<year>2019</year>). p. <fpage>208</fpage>&#x2013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1145/3341161.3342942</pub-id> </citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="book">
<person-group person-group-type="author">
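<name>
<surname>Park</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>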
</person-group>. <source>Unsupervised Attributed Multiplex Network Embedding</source>. <publisher-loc>Menlo Park, California, USA</publisher-loc>: <publisher-name>AAAI</publisher-name> (<year>2020</year>). </citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>P</given-names>
</name>
<etal/>
</person-group>. <article-title>Heterogeneous Graph Attention Network</article-title>. In: <conf-name>The World Wide Web Conference (WWW &#x2019;19)</conf-name>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2019</year>). p. <fpage>2022</fpage>&#x2013;<lpage>32</lpage>. </citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>C</given-names>
</name>
</person-group>. <article-title>MGAT: Multi-View Graph Attention Networks</article-title>. <source>Neural Networks</source> (<year>2020</year>) <volume>132</volume>:<fpage>180</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2020.08.021</pub-id> </citation>
</ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gong</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>Heuristic 3D Interactive Walk for Multilayer Network Embedding</article-title>. <source>IEEE Trans Knowl Data Eng</source> (<year>2020</year>) <volume>2020</volume>:<fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2020.3021393</pub-id> </citation>
</ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Fan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>B</given-names>
</name>
</person-group>. <article-title>One2Multi Graph Autoencoder for Multi-View Graph Clustering</article-title>. In: <conf-name>Proceedings of The Web Conference 2020 (WWW &#x2019;20)</conf-name>, <conf-loc>Taipei, Taiwan</conf-loc>, <conf-date>April 20&#x2013;24, 2020</conf-date>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2020</year>). p. <fpage>3070</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1145/3366423.3380079</pub-id> </citation>
</ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="book">
<person-group person-group-type="author">
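<name>
<surname>Mikolov</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Corrado</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Dean</surname>
<given-names>J</given-names>
</name>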
</person-group>. <source>Distributed Representations of Words and Phrases and Their Compositionality</source>. <publisher-loc>Lake Tahoe, NV, USA</publisher-loc>: <publisher-name>NIPS</publisher-name> (<year>2013</year>). p. <fpage>3111</fpage>&#x2013;<lpage>9</lpage>. </citation>
</ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>W</given-names>
</name>
</person-group>. <article-title>Structural Deep Network Embedding</article-title>. In: <conf-name>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016</conf-name>, <conf-loc>San Francisco, CA</conf-loc>, <conf-date>August 13&#x2013;17, 2016</conf-date>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2016</year>). p. <fpage>1225</fpage>&#x2013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1145/2939672.2939753</pub-id> </citation>
</ref>
<ref id="B31">
<label>31.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Qiu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec</article-title>. In: <conf-name>WSDM &#x27;18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining</conf-name>, <conf-loc>Marina Del Rey, CA</conf-loc>, <conf-date>February 5&#x2013;9, 2018</conf-date>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2018</year>). p. <fpage>459</fpage>&#x2013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1145/3159652.3159706</pub-id> </citation>
</ref>
<ref id="B32">
<label>32.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>EY</given-names>
</name>
</person-group>. <article-title>Network Representation Learning with Rich Text Information</article-title>. In: <conf-name>Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI&#x0027;15)</conf-name>, <conf-loc>Buenos Aires, Argentina</conf-loc>, <conf-date>July 25&#x2013;31, 2015</conf-date>. <publisher-name>AAAI Press</publisher-name> (<year>2015</year>). p. <fpage>2111</fpage>&#x2013;<lpage>7</lpage>. </citation>
</ref>
<ref id="B33">
<label>33.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Yin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>C</given-names>
</name>
</person-group>. <article-title>Homophily, Structure, and Content Augmented Network Representation Learning</article-title>. In: <conf-name>2016 IEEE 16th International Conference on Data Mining (ICDM)</conf-name>, <conf-loc>Barcelona, Spain</conf-loc>, <conf-date>December 12&#x2013;15, 2016</conf-date>. <publisher-name>IEEE</publisher-name> (<year>2016</year>). p. <fpage>609</fpage>&#x2013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1109/ICDM.2016.0072</pub-id> </citation>
</ref>
<ref id="B34">
<label>34.</label>
<citation citation-type="book">
<person-group person-group-type="author">
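<name>
<surname>Bandyopadhyay</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Biswas</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kara</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Murty</surname>
<given-names>M</given-names>
</name>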
</person-group>. <source>A Multilayered Informative Random Walk for Attributed Social Network Embedding</source>. <publisher-loc>Santiago de Compostela</publisher-loc>: <publisher-name>ECAI</publisher-name> (<year>2020</year>). </citation>
</ref>
<ref id="B35">
<label>35.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
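<name>
<surname>Zhou</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>M</given-names>
</name>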
</person-group>. <article-title>Graph Neural Networks: A Review of Methods and Applications</article-title>. <source>arXiv:1812.08434 [Cs, Stat] 2019</source> (<year>2019</year>). </citation>
</ref>
<ref id="B36">
<label>36.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Schlichtkrull</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kipf</surname>
<given-names>TN</given-names>
</name>
<name>
<surname>Bloem</surname>
<given-names>P</given-names>
</name>
<name>
<surname>van den Berg</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Titov</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Welling</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>Modeling Relational Data with Graph Convolutional Networks</article-title>. In: <conf-name>European Semantic Web Conference</conf-name>. <publisher-loc>Heraklion, Greece</publisher-loc>: <publisher-name>Springer, Cham</publisher-name> (<year>2018</year>). p. <fpage>593</fpage>&#x2013;<lpage>607</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-93417-4_38</pub-id> </citation>
</ref>
<ref id="B37">
<label>37.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goyal</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Ferrara</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>Graph Embedding Techniques, Applications, and Performance: A Survey</article-title>. <source>Knowledge-Based Syst</source> (<year>2018</year>) <volume>151</volume>:<fpage>78</fpage>&#x2013;<lpage>94</lpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2018.03.022</pub-id> </citation>
</ref>
<ref id="B38">
<label>38.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grossetti</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lazega</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>The Collegial Phenomenon. The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership</article-title>. <source>Revue Fran&#xe7;aise de Sociologie</source> (<year>2003</year>) <volume>44</volume>:<fpage>186</fpage>. <pub-id pub-id-type="doi">10.2307/3323128</pub-id> </citation>
</ref>
<ref id="B39">
<label>39.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Tong</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>H</given-names>
</name>
</person-group>. <article-title>Multi-layered Network Embedding</article-title>. In: <conf-name>Proceedings of the 2018 SIAM International Conference on Data Mining (SDM)</conf-name>. <publisher-loc>San Diego</publisher-loc>: <publisher-name>Society for Industrial and Applied Mathematics</publisher-name> (<year>2018</year>). p. <fpage>684</fpage>&#x2013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1137/1.9781611975321.77</pub-id> </citation>
</ref>
<ref id="B40">
<label>40.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van der Maaten</surname>
<given-names>LJP</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G</given-names>
</name>
</person-group>. <article-title>Visualizing High-Dimensional Data Using t-SNE</article-title>. <source>J&#x20;Machine Learn Res</source> (<year>2008</year>) <volume>9</volume>:<fpage>2579</fpage>&#x2013;<lpage>605</lpage>. </citation>
</ref>
</ref-list>
</back>
</article>