<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2024.1489306</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Promoting fairness in link prediction with graph enhancement</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Yezi</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2831576/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/conceptualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/data-curation/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
<role content-type="https://credit.niso.org/contributor-roles/investigation/"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
<role content-type="https://credit.niso.org/contributor-roles/validation/"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Chen</surname> <given-names>Hanning</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2783078/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Imani</surname> <given-names>Mohsen</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1392248/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/funding-acquisition/"/>
<role content-type="https://credit.niso.org/contributor-roles/project-administration/"/>
<role content-type="https://credit.niso.org/contributor-roles/resources/"/>
<role content-type="https://credit.niso.org/contributor-roles/supervision/"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Electrical Engineering and Computer Science, University of California, Irvine</institution>, <addr-line>Irvine, CA</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Donald Bren School of Information and Computer Sciences, University of California, Irvine</institution>, <addr-line>Irvine, CA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Wei Jin, Emory University, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Jiaxin Yang, Facebook, United States</p>
<p>Haochen Liu, Fidelity Investments, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Mohsen Imani <email>m.imani&#x00040;uci.edu</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>24</day>
<month>10</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>7</volume>
<elocation-id>1489306</elocation-id>
<history>
<date date-type="received">
<day>31</day>
<month>08</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>03</day>
<month>10</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2024 Liu, Chen and Imani.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Liu, Chen and Imani</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Link prediction is a crucial task in network analysis, but it has been shown to be prone to biased predictions, particularly when links are unfairly predicted between nodes from different sensitive groups. In this paper, we study the fair link prediction problem, which aims to ensure that the predicted link probability is independent of the sensitive attributes of the connected nodes. Existing methods typically incorporate debiasing techniques within graph embeddings to mitigate this issue. However, training on large real-world graphs is already challenging, and adding fairness constraints can further complicate the process. To overcome this challenge, we propose <monospace>FairLink</monospace>, a method that learns a fairness-enhanced graph to bypass the need for debiasing during the link predictor&#x00027;s training. <monospace>FairLink</monospace> maintains link prediction accuracy by ensuring that the enhanced graph follows a training trajectory similar to that of the original input graph. Meanwhile, it enhances fairness by minimizing the absolute difference in link probabilities between node pairs within the same sensitive group and those between node pairs from different sensitive groups. Our extensive experiments on multiple large-scale graphs demonstrate that <monospace>FairLink</monospace> not only promotes fairness but also often achieves link prediction accuracy comparable to baseline methods. Most importantly, the enhanced graph exhibits strong generalizability across different GNN architectures. <monospace>FairLink</monospace> is highly scalable, making it suitable for deployment in real-world large-scale graphs, where maintaining both fairness and accuracy is critical.</p></abstract>
<kwd-group>
<kwd>fairness</kwd>
<kwd>large-scale graphs</kwd>
<kwd>link prediction</kwd>
<kwd>trustworthy graph neural network</kwd>
<kwd>data-centric machine learning</kwd>
</kwd-group>
<counts>
<fig-count count="4"/>
<table-count count="4"/>
<equation-count count="9"/>
<ref-count count="70"/>
<page-count count="13"/>
<word-count count="9309"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Data Mining and Management</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1 Introduction</title>
<p>The scale of graph-structured data has expanded rapidly across various disciplines, including social networks (Liben-Nowell and Kleinberg, <xref ref-type="bibr" rid="B39">2003</xref>), citation networks (Yang et al., <xref ref-type="bibr" rid="B58">2016</xref>), knowledge graphs (Liu et al., <xref ref-type="bibr" rid="B43">2023</xref>; Zhang et al., <xref ref-type="bibr" rid="B65">2022</xref>), and telecommunication networks (Nanavati et al., <xref ref-type="bibr" rid="B48">2006</xref>; Xie et al., <xref ref-type="bibr" rid="B57">2022</xref>). This growth has spurred the development of advanced computational techniques for modeling, discovering, and extracting complex structural patterns hidden within large graph datasets. Consequently, research has increasingly focused on inferring potential connections between nodes, leading to algorithms that improve the accuracy of link prediction (Mara et al., <xref ref-type="bibr" rid="B45">2020</xref>; Li et al., <xref ref-type="bibr" rid="B37">2024</xref>). Despite their strong performance, these link prediction models can produce biased predictions (Angwin et al., <xref ref-type="bibr" rid="B2">2022</xref>; Bose and Hamilton, <xref ref-type="bibr" rid="B3">2019</xref>). Such biases can harm historically disadvantaged and underserved communities, particularly in areas such as ranking (Karimi et al., <xref ref-type="bibr" rid="B30">2018</xref>), social perception (Lee et al., <xref ref-type="bibr" rid="B35">2019</xref>), and job promotion (Clifton et al., <xref ref-type="bibr" rid="B7">2019</xref>). Given the widespread application of these models, it is crucial to address fairness in link prediction.</p>
<p>Many existing studies have introduced notions of fairness in link prediction and proposed algorithms to achieve them. For instance, <monospace>FairAdj</monospace> (Li et al., <xref ref-type="bibr" rid="B38">2021</xref>) introduces <italic>dyadic fairness</italic>, which requires that links between two nodes from different sensitive groups and links between two nodes from the same sensitive group be predicted at equal rates. These approaches are predominantly model-centric, incorporating debiasing methods into the training process (Rahman et al., <xref ref-type="bibr" rid="B51">2019</xref>; Masrour et al., <xref ref-type="bibr" rid="B46">2020</xref>; Tsioutsiouliklis et al., <xref ref-type="bibr" rid="B54">2021</xref>; Li et al., <xref ref-type="bibr" rid="B38">2021</xref>; Current et al., <xref ref-type="bibr" rid="B9">2022</xref>). However, promoting fairness in models trained on large-scale graphs is particularly challenging. State-of-the-art link predictors, often deep learning methods such as GNNs, are already difficult to train on large graphs (Zhang S. et al., <xref ref-type="bibr" rid="B66">2021</xref>; Hu et al., <xref ref-type="bibr" rid="B24">2021</xref>; Ferludin et al., <xref ref-type="bibr" rid="B16">2022</xref>; Han et al., <xref ref-type="bibr" rid="B21">2022</xref>), and fairness considerations add another layer of complexity. Model-centric approaches that enforce fairness during training may therefore be impractical, as their additional objectives further complicate an already demanding training process (Liu, <xref ref-type="bibr" rid="B40">2023</xref>).</p>
<p>To address this challenge, we propose <monospace>FairLink</monospace>, a data-centric approach that incorporates a dyadic fairness regularizer into the learning of the enhanced graph. This is achieved by optimizing a fairness loss jointly with a utility loss. The utility loss is the gradient distance (Zhao et al., <xref ref-type="bibr" rid="B67">2020</xref>; Jin et al., <xref ref-type="bibr" rid="B26">2023</xref>, <xref ref-type="bibr" rid="B25">2022a</xref>), which measures how much the gradients of the link predictor&#x00027;s loss with respect to its parameters differ between the enhanced graph and the original graph. Keeping this distance small ensures that task-specific performance is maintained in the learned graph (Zhao et al., <xref ref-type="bibr" rid="B67">2020</xref>). The dyadic fairness loss thus steers the learning process toward a <italic>fair</italic> graph for link prediction, while the utility loss preserves link prediction performance. In contrast to model-centric approaches (Zha et al., <xref ref-type="bibr" rid="B62">2023a</xref>,<xref ref-type="bibr" rid="B63">b</xref>; Jin et al., <xref ref-type="bibr" rid="B27">2022b</xref>), which focus on designing fairness-aware link predictors, <monospace>FairLink</monospace> emphasizes the creation of a generalizable fair graph specifically for link prediction tasks. We summarize our contributions as follows:</p>
<list list-type="bullet">
<list-item><p>This paper addresses the challenge of fair link prediction. While most existing methods concentrate on developing fairness-aware link predictors, we propose a novel data-centric approach. Our method focuses on constructing a fairness-enhanced graph. This graph can subsequently be used to train a link predictor without the need for debiasing techniques, while still ensuring fair link prediction.</p></list-item>
<list-item><p>To ensure fairness in the fairness-enhanced graph, <monospace>FairLink</monospace> optimizes a dyadic fairness loss function. Additionally, to preserve utility, <monospace>FairLink</monospace> minimizes the gradient distance between the fairness-enhanced graph and the original input graph. To improve the measurement of gradient distance, we introduce a novel scale-sensitive distance function.</p></list-item>
<list-item><p>Extensive experiments validate that (1) link prediction on the enhanced graph generated by <monospace>FairLink</monospace> is comparable to link prediction on the input graph; (2) the enhanced graph achieves a better fairness-utility trade-off than baselines trained on the input graph; and (3) the enhanced graph generalizes well, achieving good fairness and utility with a test GNN architecture even when it was learned with a different GNN architecture.</p></list-item>
</list></sec>
<sec id="s2">
<title>2 Preliminaries</title>
<p>In this section, we first introduce the notation used in our study. We then discuss fairness in link prediction, the task of estimating the probability of a connection between two nodes in a network, by extending the principles of fair machine learning to this setting.</p>
<sec>
<title>2.1 Notation</title>
<p>Let <inline-formula><mml:math id="M8"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> denote a graph, where <inline-formula><mml:math id="M9"><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:math></inline-formula> is the set of <italic>N</italic> nodes, <inline-formula><mml:math id="M10"><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow><mml:mo>&#x02286;</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow><mml:mo>&#x000D7;</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:math></inline-formula> is the edge set, and <italic>X</italic>&#x02208;&#x0211D;<sup><italic>N</italic>&#x000D7;<italic>D</italic></sup> is the node feature matrix with <italic>D</italic>-dimensional features. <italic>A</italic>&#x02208;{0, 1}<sup><italic>N</italic>&#x000D7;<italic>N</italic></sup> is the adjacency matrix, where <italic>A</italic><sub><italic>uv</italic></sub> &#x0003D; 1 if there is an edge between nodes <italic>u</italic> and <italic>v</italic> and <italic>A</italic><sub><italic>uv</italic></sub> &#x0003D; 0 otherwise. (<italic>u, v</italic>) denotes an edge between node <italic>u</italic> and node <italic>v</italic>. <italic>S</italic>&#x02208;&#x0211D;<sup><italic>N</italic>&#x000D7;<italic>K</italic></sup> is the matrix encoding the sensitive attribute of each node, where <italic>K</italic> is the number of values the sensitive attribute can take (e.g., <italic>S</italic><sub><italic>u</italic></sub>&#x02208;{Female, Male, Unknown} for node <italic>u</italic>).
<italic>g</italic>(&#x000B7;, &#x000B7;):&#x0211D;<sup><italic>H</italic></sup>&#x000D7;&#x0211D;<sup><italic>H</italic></sup> &#x02192; &#x0211D; is the bivariate link predictor, and <italic>g</italic>(<italic>z</italic><sub><italic>u</italic></sub>, <italic>z</italic><sub><italic>v</italic></sub>) is the predicted probability of an edge <inline-formula><mml:math id="M11"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow></mml:math></inline-formula> in a given graph, where <italic>z</italic><sub><italic>u</italic></sub> and <italic>z</italic><sub><italic>v</italic></sub> are the <italic>H</italic>-dimensional embedding vectors of nodes <italic>u</italic> and <italic>v</italic>. The <italic>problem</italic> of fair link prediction considered here is to learn a synthetic graph <inline-formula><mml:math id="M12"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> such that a link predictor <italic>g</italic>(&#x000B7;, &#x000B7;) trained on <inline-formula><mml:math id="M13"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> achieves performance comparable to that of the same predictor trained on the original graph <inline-formula><mml:math id="M14"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula>, while its link predictions are fair.
In our experiments, <inline-formula><mml:math id="M15"><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow><mml:mo>|</mml:mo></mml:math></inline-formula> and <inline-formula><mml:math id="M16"><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>.</p></sec>
<sec>
<title>2.2 Fairness in link prediction</title>
<p>Previous research in fair machine learning has typically defined fairness for binary classification as the condition that the predicted label is independent of the sensitive attribute. In link prediction, which estimates the probability of a link between pairs of nodes in a graph, this notion extends naturally: the estimated probability should be independent of the sensitive attributes of the two nodes involved. In this subsection, we introduce two fairness notions relevant to link prediction: demographic parity and equal opportunity.</p>
<sec>
<title>2.2.1 Demographic parity</title>
<p>Demographic Parity (DP) requires that predictions be independent of the sensitive attribute. It has been widely used in fair machine learning and extends to link prediction by replacing the classification probability with the link prediction probability. In this setting, DP requires that the predicted probability of a link&#x00027;s existence be independent of the sensitive attributes of both nodes in the link. This concept is also referred to as <italic>dyadic fairness</italic> in prior literature (Li et al., <xref ref-type="bibr" rid="B38">2021</xref>) and is defined as follows:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M17"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02260;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
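<p>To make the conditioning in <xref ref-type="disp-formula" rid="E1">Equation 1</xref> concrete, the Python sketch below scores candidate node pairs with a bivariate link predictor and splits them into intra-group pairs (<italic>S</italic><sub><italic>u</italic></sub> &#x0003D; <italic>S</italic><sub><italic>v</italic></sub>) and inter-group pairs (<italic>S</italic><sub><italic>u</italic></sub>&#x02260;<italic>S</italic><sub><italic>v</italic></sub>). It is a minimal illustration under stated assumptions, not the predictor used in this paper: <italic>g</italic>(<italic>z</italic><sub><italic>u</italic></sub>, <italic>z</italic><sub><italic>v</italic></sub>) is assumed to be the sigmoid of the inner product of the two embeddings, each node&#x00027;s sensitive attribute is stored as a single categorical label rather than a row of <italic>S</italic>, and the helper names <monospace>dot_product_predictor</monospace> and <monospace>split_by_group</monospace> are hypothetical.</p>

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def dot_product_predictor(z_u: Sequence[float], z_v: Sequence[float]) -> float:
    """Assumed bivariate predictor g(z_u, z_v) = sigmoid(z_u . z_v)."""
    score = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-score))


def split_by_group(pairs: List[Pair], sensitive: Sequence[str]) -> Dict[str, List[Pair]]:
    """Partition pairs into intra-group (S_u == S_v) and inter-group (S_u != S_v) sets."""
    groups: Dict[str, List[Pair]] = {"intra": [], "inter": []}
    for u, v in pairs:
        key = "intra" if sensitive[u] == sensitive[v] else "inter"
        groups[key].append((u, v))
    return groups


# Hypothetical toy input: 4 nodes, 2-dimensional embeddings, one categorical attribute.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
sensitive = ["Female", "Female", "Male", "Male"]
candidates = [(0, 1), (2, 3), (0, 2), (1, 3)]

groups = split_by_group(candidates, sensitive)
for name, group_pairs in groups.items():
    # Predicted link probability g(u, v) for every pair in this group.
    probs = [dot_product_predictor(embeddings[u], embeddings[v]) for u, v in group_pairs]
    print(name, [round(p, 3) for p in probs])
```

<p>In this toy example, the intra-group pairs receive higher predicted probabilities than the inter-group pairs, which is the kind of disparity that dyadic fairness rules out.</p>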
<p>Ideally, dyadic fairness means that intra-group links (between nodes with the same sensitive attribute) and inter-group links (between nodes with different sensitive attributes) are predicted at the same rate from a set of candidate links. Dyadic fairness in link prediction is assessed with the following metric:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M18"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x00394;</mml:mi></mml:mrow><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>D</mml:mi><mml:mi>P</mml:mi></mml:mstyle></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02223;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02223;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02260;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
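<p>As an illustration of <xref ref-type="disp-formula" rid="E2">Equation 2</xref>, the Python sketch below estimates &#x00394;<sub><italic>DP</italic></sub> on a set of candidate pairs by comparing the average predicted link probability of intra-group and inter-group pairs. Reading <italic>P</italic>(<italic>g</italic>(<italic>u, v</italic>)&#x02223;&#x000B7;) as a mean predicted probability is an assumption (a thresholded positive-prediction rate is another common reading), and the function <monospace>delta_dp</monospace> and its toy inputs are hypothetical.</p>

```python
from statistics import mean
from typing import Sequence, Tuple


def delta_dp(
    pairs: Sequence[Tuple[int, int]],
    probs: Sequence[float],
    sensitive: Sequence[int],
) -> float:
    """Estimate Delta_DP as |mean g(u, v) over intra-group pairs - mean over inter-group pairs|.

    pairs:     candidate node pairs (u, v), linked or not
    probs:     predicted link probability g(u, v) for each pair
    sensitive: sensitive attribute label of each node
    """
    intra = [p for (u, v), p in zip(pairs, probs) if sensitive[u] == sensitive[v]]
    inter = [p for (u, v), p in zip(pairs, probs) if sensitive[u] != sensitive[v]]
    if not intra or not inter:
        raise ValueError("need at least one intra-group and one inter-group pair")
    return abs(mean(intra) - mean(inter))


# Hypothetical toy input: nodes 0 and 1 are in group 0, nodes 2 and 3 in group 1.
sensitive = [0, 0, 1, 1]
pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]
probs = [0.9, 0.7, 0.4, 0.2]
print("Delta_DP:", delta_dp(pairs, probs, sensitive))
```

<p>Here the intra-group mean is 0.8 and the inter-group mean is 0.3, so the estimated gap is about 0.5; a value of 0 would indicate dyadic fairness on this candidate set.</p>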
</sec>
<sec>
<title>2.2.2 Equal opportunity</title>
<p>Demographic Parity requires the probability of an instance receiving a positive outcome to be equal for both sensitive groups. Equal Opportunity (EO) instead requires that, <italic>among instances from the positive class</italic>, the probability of being assigned a positive outcome is equal for both sensitive groups. In other words, EO ensures that the true positive rate is independent of the sensitive attribute. In link prediction, EO requires that, among node pairs that are actually linked, the predicted link probability be the same for node pairs within the same sensitive group and for node pairs from different sensitive groups. The formal definition of EO in link prediction is as follows:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M19"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02223;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02223;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02260;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Specifically, for link prediction, EO requires that the predicted probability <italic>g</italic>(<italic>u, v</italic>) of an existing link <inline-formula><mml:math id="M20"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow></mml:math></inline-formula> be equal for node pairs within the same sensitive group (<italic>S</italic><sub><italic>u</italic></sub> &#x0003D; <italic>S</italic><sub><italic>v</italic></sub>) and for node pairs from different sensitive groups (<italic>S</italic><sub><italic>u</italic></sub>&#x02260;<italic>S</italic><sub><italic>v</italic></sub>). This aims to prevent any group from being unfairly disadvantaged. The EO fairness gap in link prediction is measured as follows:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M21"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x00394;</mml:mi></mml:mrow><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>E</mml:mi><mml:mi>O</mml:mi></mml:mstyle></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02223;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02223;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02260;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
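<p>Because <xref ref-type="disp-formula" rid="E4">Equation 4</xref> conditions on (<italic>u, v</italic>) being an existing edge, it can be estimated like &#x00394;<sub><italic>DP</italic></sub> after discarding unlinked pairs. The Python sketch below does this under stated assumptions: <italic>P</italic>(&#x000B7;) is read as a mean predicted probability, edges are treated as undirected, and the function <monospace>delta_eo</monospace> and its toy inputs are hypothetical.</p>

```python
from statistics import mean
from typing import List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def delta_eo(
    pairs: Sequence[Pair],
    probs: Sequence[float],
    sensitive: Sequence[int],
    edges: Set[Pair],
) -> float:
    """Estimate Delta_EO: the intra/inter gap in mean g(u, v), restricted to (u, v) in E."""
    intra: List[float] = []
    inter: List[float] = []
    for (u, v), p in zip(pairs, probs):
        # Only true links (the positive class) count; edges are treated as undirected.
        if (u, v) not in edges and (v, u) not in edges:
            continue
        if sensitive[u] == sensitive[v]:
            intra.append(p)
        else:
            inter.append(p)
    if not intra or not inter:
        raise ValueError("need linked pairs from both the intra- and inter-group sets")
    return abs(mean(intra) - mean(inter))


# Hypothetical toy input: only (0, 1) and (0, 2) are true links.
sensitive = [0, 0, 1, 1]
pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]
probs = [0.9, 0.7, 0.6, 0.2]
edges = {(0, 1), (0, 2)}
print("Delta_EO:", delta_eo(pairs, probs, sensitive, edges))
```

<p>With these inputs only the linked pairs (0, 1) and (0, 2) contribute, giving a gap of about 0.3, whereas averaging over all four candidate pairs as in &#x00394;<sub><italic>DP</italic></sub> would give |0.8 - 0.4|, about 0.4.</p>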
</sec></sec></sec>
<sec id="s3">
<title>3 Fairness-enhanced graph learning</title>
<p>In this section, we describe <monospace>FairLink</monospace> in detail. Our objectives are twofold: (1) ensuring fairness within the fairness-enhanced graph and (2) preserving the utility of the fairness-enhanced graph. Specifically, our approach constructs a fairness-enhanced graph from the input graph to improve fairness in link prediction. For the first objective, <monospace>FairLink</monospace> incorporates a dyadic fairness regularization term. For the second objective, <monospace>FairLink</monospace> maintains utility by minimizing the gradient distance between the input graph and the enhanced graph, for which we introduce a novel scale-sensitive distance function. To simplify notation, we omit the training epoch <italic>t</italic> when introducing the loss function at a specific epoch. An overview of the <monospace>FairLink</monospace> framework is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The overall framework of <monospace>FairLink</monospace>, which learns a fairness-enhanced graph that promotes fairness while preserving utility. A synthetic graph <inline-formula><mml:math id="M1"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is first initialized with the same size as the input graph <inline-formula><mml:math id="M2"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula> and random link connections. Both graphs are then fed into a trainable link predictor, and the gradient of the cross-entropy loss with respect to the predictor&#x00027;s parameters is computed for both <inline-formula><mml:math id="M3"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M4"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The synthetic graph <inline-formula><mml:math id="M5"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is optimized by jointly minimizing a fairness loss and the gradient distance between <inline-formula><mml:math id="M6"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M7"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-07-1489306-g0001.tif"/>
</fig>
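<p>The utility term in <xref ref-type="fig" rid="F1">Figure 1</xref> compares the gradients of the link predictor&#x00027;s cross-entropy loss on the input graph and on the synthetic graph. The Python sketch below illustrates this gradient-matching idea under simplifying assumptions. The predictor is a hypothetical logistic model over the elementwise product of two node feature vectors, chosen only because its gradient has a closed form, and the distance is a cosine distance similar to those used in gradient matching (Zhao et al., <xref ref-type="bibr" rid="B67">2020</xref>), serving as a stand-in; it is not the scale-sensitive distance proposed for <monospace>FairLink</monospace>. The names <monospace>bce_gradient</monospace> and <monospace>cosine_distance</monospace> and all toy inputs are hypothetical.</p>

```python
import math
from typing import List, Sequence, Tuple

# A labeled node pair (u, v, y): y = 1 for a link and y = 0 for a non-link.
Example = Tuple[int, int, int]


def bce_gradient(
    w: Sequence[float],
    features: Sequence[Sequence[float]],
    examples: Sequence[Example],
) -> List[float]:
    """Gradient of the mean binary cross-entropy with respect to w for the assumed
    predictor g(u, v) = sigmoid(w . (x_u * x_v)), where * is elementwise."""
    grad = [0.0] * len(w)
    for u, v, y in examples:
        h = [a * b for a, b in zip(features[u], features[v])]
        p = 1.0 / (1.0 + math.exp(-sum(wi * hi for wi, hi in zip(w, h))))
        # For a logistic model, d(BCE)/dw = (p - y) * h.
        for i, hi in enumerate(h):
            grad[i] += (p - y) * hi / len(examples)
    return grad


def cosine_distance(g1: Sequence[float], g2: Sequence[float]) -> float:
    """1 - cos(g1, g2): 0 for parallel gradients, 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm = math.sqrt(sum(a * a for a in g1)) * math.sqrt(sum(b * b for b in g2))
    # Cosine is undefined for a zero gradient; falling back to 1.0 is an assumption.
    return 1.0 - dot / norm if norm > 0 else 1.0


# Hypothetical toy input graph and synthetic graph over the same 3 nodes.
w = [0.1, -0.2]
x_orig = [[1.0, 0.5], [0.8, 0.4], [0.2, 1.0]]
x_syn = [[0.9, 0.6], [0.7, 0.5], [0.3, 0.9]]
examples_orig = [(0, 1, 1), (0, 2, 0)]
examples_syn = [(0, 1, 1), (1, 2, 0)]

g_orig = bce_gradient(w, x_orig, examples_orig)
g_syn = bce_gradient(w, x_syn, examples_syn)
print("gradient distance:", cosine_distance(g_orig, g_syn))
```

<p>In <monospace>FairLink</monospace>, the synthetic graph is updated to reduce this gradient distance together with the fairness loss, which would require differentiating the distance with respect to the synthetic graph through an automatic-differentiation framework; this sketch only evaluates the distance for fixed inputs.</p>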


<sec>
<title>3.1 Fairness enhancement</title>
<p>In this subsection, we describe how to equip the learned graph with fairness-aware properties. We achieve this by incorporating a dyadic fairness regularizer, derived from the criterion in <xref ref-type="disp-formula" rid="E1">Equation 1</xref>, into the learning process of the fairness-enhanced graph; this criterion is described in detail in Section 2.2. The schematic diagram in <xref ref-type="fig" rid="F2">Figure 2</xref> illustrates the fairness objective of fairness-enhanced graph learning in <monospace>FairLink</monospace>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Fair link prediction objective in <monospace>FairLink</monospace>: Ensure equal probability for links between nodes from different sensitive groups and those from the same group.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-07-1489306-g0002.tif"/>
</fig>


<p>Fairness constraints have been investigated by Zafar et al. (<xref ref-type="bibr" rid="B60">2015</xref>, <xref ref-type="bibr" rid="B59">2017</xref>) for fair linear classifiers, where they control the covariance between users&#x00027; sensitive attributes and the signed distance from the users&#x00027; feature vectors to the decision boundary. In this paper, we incorporate a fairness regularizer derived from &#x00394;<sub><italic>DP</italic></sub> (Chuang and Mroueh, <xref ref-type="bibr" rid="B6">2021</xref>; Zemel et al., <xref ref-type="bibr" rid="B61">2013</xref>), which quantifies the difference in the average predicted probability between demographic groups. In the dyadic setting, the two groups being compared are intra-group node pairs (<italic>s</italic><sub><italic>u</italic></sub> &#x0003D; <italic>s</italic><sub><italic>v</italic></sub>) and inter-group node pairs (<italic>s</italic><sub><italic>u</italic></sub> &#x02260; <italic>s</italic><sub><italic>v</italic></sub>). The fairness loss function <inline-formula><mml:math id="M22"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>f</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi></mml:mstyle></mml:mrow></mml:msub></mml:math></inline-formula> at training epoch <italic>t</italic> is defined as follows:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M23"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>f</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi></mml:mstyle></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D53C;</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow><mml:mo>&#x000D7;</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D53C;</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow><mml:mo>&#x000D7;</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02260;</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where each expectation <inline-formula><mml:math id="M24"><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow><mml:mo>&#x000D7;</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> is estimated empirically from sampled node pairs (<italic>u, v</italic>), i.e., from potential links between any two nodes in the graph, and <italic>g</italic>(<italic>u, v</italic>) is the link predictor&#x00027;s output for the pair. More generally, <inline-formula><mml:math id="M25"><mml:mover accent="true"><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">Y</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> denotes the predicted probability for the downstream task. Each conditional expectation is computed as a sample average: the variable <italic>N</italic> denotes the total number of instances, while <italic>N</italic><sub><bold>s</bold> &#x0003D; 0/1</sub> denotes the number of samples in the group whose sensitive attribute value is 0/1, respectively.
The fundamental requirement for &#x00394;<sub><italic>DP</italic></sub> is that the average predicted probability <inline-formula><mml:math id="M26"><mml:mover accent="true"><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">Y</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> within each sensitive-attribute group be a reliable approximation of the true conditional probability <inline-formula><mml:math id="M27"><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">P</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">Y</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>|</mml:mo><mml:mstyle class="text"><mml:mtext mathvariant="bold">S</mml:mtext></mml:mstyle><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> or <inline-formula><mml:math id="M28"><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">P</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">Y</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>|</mml:mo><mml:mstyle class="text"><mml:mtext mathvariant="bold">S</mml:mtext></mml:mstyle><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, respectively.</p></sec>
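<p>As an illustration only, the following Python sketch estimates the dyadic fairness loss of Equation 5 from a finite sample of node pairs: each conditional expectation is replaced by the mean predicted score over the sampled intra-group or inter-group pairs. The function name <monospace>dyadic_dp_loss</monospace>, the dictionary-of-scores input, and the toy values are hypothetical; during training the scores would come from a differentiable link predictor rather than from plain floats.</p>

```python
def dyadic_dp_loss(scores, sensitive, pairs):
    """Empirical estimate of the dyadic fairness loss (Equation 5).

    scores:    dict mapping a node pair (u, v) to its predicted score g(u, v)
    sensitive: sequence where sensitive[u] is the sensitive attribute s_u
    pairs:     node pairs (u, v) sampled from V x V
    """
    intra = [scores[(u, v)] for u, v in pairs if sensitive[u] == sensitive[v]]
    inter = [scores[(u, v)] for u, v in pairs if sensitive[u] != sensitive[v]]
    if not intra or not inter:
        raise ValueError("need at least one intra-group and one inter-group pair")
    # Each conditional expectation becomes a sample mean over its pair group.
    mean_intra = sum(intra) / len(intra)
    mean_inter = sum(inter) / len(inter)
    return abs(mean_intra - mean_inter)


# Toy example (hypothetical values): nodes 0 and 1 share s = 0, nodes 2 and 3 share s = 1.
sensitive = [0, 0, 1, 1]
scores = {(0, 1): 0.9, (2, 3): 0.7, (0, 2): 0.4, (1, 3): 0.2}
print(round(dyadic_dp_loss(scores, sensitive, list(scores)), 6))  # 0.5
```

<p>Because the loss is the absolute gap between two group means, it is zero whenever intra-group and inter-group pairs receive the same average score, however the individual pairs are scored.</p>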
<sec>
<title>3.2 Utility preserving</title>
<p>In this section, we address the <italic>first objective</italic>: determining how to learn a fairness-enhanced graph such that a link predictor trained on it exhibits comparable performance to one trained on the input graph. <monospace>FairLink</monospace> first computes the link prediction loss for the original graph, denoted as <inline-formula><mml:math id="M29"><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, by calculating the cross-entropy loss between the predicted link distribution (based on the dot product scores of the node embeddings) and the actual link distribution. Similarly, the link prediction loss for the synthetic graph, <inline-formula><mml:math id="M30"><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, is computed in the same manner. 
The gradients of these two losses with respect to the link predictor&#x00027;s parameters &#x003B8;, denoted as <inline-formula><mml:math id="M31"><mml:msub><mml:mrow><mml:mo>&#x02207;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M32"><mml:msub><mml:mrow><mml:mo>&#x02207;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, are then obtained. We define the utility loss <inline-formula><mml:math id="M33"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi></mml:mstyle></mml:mrow></mml:msub></mml:math></inline-formula> as the sum of the distances between these gradients across all training epochs.</p>
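<p>To make the two losses and their gradients concrete, the sketch below assumes a hypothetical one-layer linear encoder <italic>z</italic><sub><italic>u</italic></sub> &#x0003D; <bold>W</bold><italic>x</italic><sub><italic>u</italic></sub>, so that &#x003B8; &#x0003D; <bold>W</bold>. It scores each pair by the dot product of the two embeddings and applies binary cross-entropy to observed links (label 1) and sampled non-links (label 0). The encoder, the function names, and the toy pairs are illustrative assumptions; the predictor used with <monospace>FairLink</monospace> may be a deeper model. Evaluating the same weights on the inputs of <italic>G</italic> and of <italic>G</italic><sub><italic>f</italic></sub> yields the two gradients to be matched.</p>

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    # W is a list of rows, so this returns the matrix-vector product W x.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def link_loss_and_grad(W, X, pairs, labels):
    """Mean binary cross-entropy of dot-product link scores and its gradient in W.

    Hypothetical one-layer predictor: z_u = W x_u and logit(u, v) = z_u . z_v.
    W: d_out rows of length d_in; X: node feature vectors;
    pairs: node pairs (u, v); labels: 1 for an observed link, 0 for a non-link.
    """
    d_out, d_in = len(W), len(W[0])
    grad = [[0.0] * d_in for _ in range(d_out)]
    loss = 0.0
    for (u, v), y in zip(pairs, labels):
        z_u, z_v = matvec(W, X[u]), matvec(W, X[v])
        p = sigmoid(sum(a * b for a, b in zip(z_u, z_v)))
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        # Chain rule: dloss/dlogit = p - y, dlogit/dW[i][j] = z_v[i] x_u[j] + z_u[i] x_v[j]
        for i in range(d_out):
            for j in range(d_in):
                grad[i][j] += (p - y) * (z_v[i] * X[u][j] + z_u[i] * X[v][j])
    n = len(pairs)
    return loss / n, [[g / n for g in row] for row in grad]


# The same weights theta = W evaluated on toy "original" and "synthetic" inputs.
W = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
X_real = [[1.0, 0.0, 0.5], [0.2, 1.0, 0.0], [0.0, 0.3, 1.0]]
X_syn = [[0.9, 0.1, 0.4], [0.1, 0.8, 0.2], [0.2, 0.2, 0.7]]
loss_real, grad_real = link_loss_and_grad(W, X_real, [(0, 1), (0, 2)], [1, 0])
loss_syn, grad_syn = link_loss_and_grad(W, X_syn, [(0, 1), (1, 2)], [1, 0])
```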
<p>Previous studies have used Cosine Distance to measure the distance between two gradients (Zhao et al., <xref ref-type="bibr" rid="B67">2020</xref>; Jin et al., <xref ref-type="bibr" rid="B26">2023</xref>; Liu and Shen, <xref ref-type="bibr" rid="B42">2024b</xref>,<xref ref-type="bibr" rid="B41">a</xref>). Although effective, Cosine Distance is scale-insensitive: it ignores the magnitudes of the vectors. The magnitude of a gradient matters for optimization because it determines the size of each parameter update, so two gradients that point in the same direction but have very different norms move the predictor&#x00027;s parameters by different amounts. To address this limitation, we combine Cosine Distance with Euclidean Distance, which accounts for vector magnitudes. The revised distance function <italic>D</italic> is thus defined as:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M34"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mo>&#x02207;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02207;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi></mml:mstyle></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>e</mml:mi><mml:mi>u</mml:mi><mml:mi>c</mml:mi></mml:mstyle></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>D</italic><sub>cos</sub> denotes the Cosine Distance, <italic>D</italic><sub>euc</sub> denotes the Euclidean Distance, &#x003B3; is a trade-off hyperparameter, and &#x003B8;<sub><italic>t</italic></sub> denotes the trainable parameters of the link predictor at training epoch <italic>t</italic>. These distances are defined as follows, where the subscript <italic>i</italic> indexes the gradient blocks over which the distances are accumulated:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M35"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mo>&#x02207;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>&#x02112;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi mathvariant='script'>G</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mo>&#x02207;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>&#x02112;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi mathvariant='script'>G</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mo>&#x02207;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>&#x02112;</mml:mi><mml:msub><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi mathvariant='script'>G</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mo>&#x02207;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>&#x02112;</mml:mi><mml:msub><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi mathvariant='script'>G</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mrow><mml:mo>&#x02016;</mml:mo><mml:mrow><mml:msub><mml:mo>&#x02207;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>&#x02112;</mml:mi><mml:msub><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi mathvariant='script'>G</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x02016;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x02016;</mml:mo><mml:mrow><mml:msub><mml:mo>&#x02207;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>&#x02112;</mml:mi><mml:msub><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi mathvariant='script'>G</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x02016;</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>e</mml:mi><mml:mi>u</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mo>&#x02207;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>&#x02112;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi mathvariant='script'>G</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mo>&#x02207;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>&#x02112;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi mathvariant='script'>G</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>&#x02016;</mml:mo><mml:mrow><mml:msub><mml:mo>&#x02207;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>&#x02112;</mml:mi><mml:msub><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi mathvariant='script'>G</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mo>&#x02207;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B8;</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>&#x02112;</mml:mi><mml:msub><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi mathvariant='script'>G</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x02016;</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
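<p>A minimal Python sketch of the combined distance in Equations 6 and 7 follows. It assumes that each gradient is given as a list of flattened blocks (for example, one block per parameter tensor) and reads the block index <italic>i</italic> as summed for the Euclidean term as well as for the cosine term; both points are our interpretation. The small constant <monospace>eps</monospace> is our addition for numerical safety, and the default value of &#x003B3; is arbitrary.</p>

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gradient_distance(grads_real, grads_syn, gamma=1.0, eps=1e-12):
    """D = D_cos + gamma * D_euc (Equations 6 and 7), accumulated over gradient blocks.

    grads_real, grads_syn: matching lists of gradient blocks, each a flat list of floats.
    eps is a small constant (our addition) that avoids division by zero.
    """
    d_cos, d_euc = 0.0, 0.0
    for g, g_f in zip(grads_real, grads_syn):
        norm_g, norm_f = math.sqrt(_dot(g, g)), math.sqrt(_dot(g_f, g_f))
        # Cosine distance of this block: 1 minus the cosine similarity (ignores scale).
        d_cos += 1.0 - _dot(g, g_f) / (norm_g * norm_f + eps)
        # Euclidean distance of this block: sensitive to the gradient magnitudes.
        d_euc += math.sqrt(sum((x - y) ** 2 for x, y in zip(g, g_f)))
    return d_cos + gamma * d_euc


g = [[1.0, 2.0], [0.5, -0.5]]
g_scaled = [[2.0, 4.0], [1.0, -1.0]]  # same directions, twice the magnitude
print(gradient_distance(g, g_scaled, gamma=0.0))  # close to 0: cosine alone sees no difference
print(gradient_distance(g, g_scaled, gamma=0.5))  # about 1.47: the Euclidean term sees it
```

<p>The example mirrors the motivation above: scaling a gradient leaves the cosine term essentially unchanged, and only the Euclidean term separates the two.</p>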
<p>The utility loss at a specific epoch <italic>t</italic>, denoted as <inline-formula><mml:math id="M36"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">util</mml:mtext></mml:mstyle></mml:mrow></mml:msub></mml:math></inline-formula>, is the gradient distance between <inline-formula><mml:math id="M37"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M38"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> at that epoch; these per-epoch losses are summed over all training epochs in the final objective (<xref ref-type="disp-formula" rid="E9">Equation 8</xref>). Formally, it is defined as follows:</p>
<disp-formula id="E8"><mml:math id="M39"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi></mml:mstyle></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mo>&#x02207;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02207;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:math></disp-formula>
<p>Minimizing the utility loss encourages the training trajectory on <inline-formula><mml:math id="M40"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to closely follow that on <inline-formula><mml:math id="M41"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula>, so that the parameters learned on <inline-formula><mml:math id="M42"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> closely approximate those learned on <inline-formula><mml:math id="M43"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula>. As a result, <inline-formula><mml:math id="M44"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is expected to preserve the information in the input graph <inline-formula><mml:math id="M45"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula> that is essential for link prediction.</p></sec>
<sec>
<title>3.3 Optimization</title>
<p>Optimizing a fairness-enhanced graph directly is computationally expensive because of the quadratic complexity of learning <bold>A</bold><sub><italic>f</italic></sub>. To address this challenge, previous work (Jin et al., <xref ref-type="bibr" rid="B26">2023</xref>) proposed modeling <bold>A</bold><sub><italic>f</italic></sub> as a function of <bold>X</bold><sub><italic>f</italic></sub>. We initialize the node feature matrix <bold>X</bold><sub><italic>f</italic></sub> by randomly selecting original features from each class. Learning a fairness-aware feature matrix is itself important for fair link prediction: <bold>X</bold><sub><italic>f</italic></sub> is used to compute node embeddings whenever a new link predictor is trained on the fairness-enhanced graph, so any bias it encodes would carry over to that predictor. We further simplify this approach by modeling the relationship between <bold>A</bold><sub><italic>f</italic></sub> and <bold>X</bold><sub><italic>f</italic></sub> with a multi-layer perceptron parameterized by &#x003C8; with a sigmoid activation function, thereby reducing the computational burden. The final loss function is thus as follows:</p>
<disp-formula id="E9"><label>(8)</label><mml:math id="M46"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant='bold-italic'><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x003C8;</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>&#x1D53C;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0007E;</mml:mo><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi></mml:mstyle></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle 
mathvariant="italic"><mml:mi>f</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi></mml:mstyle></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msup><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
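<p>where the expectation is taken over random initializations &#x003B8;<sub>0</sub> of the link predictor drawn from <italic>P</italic><sub>&#x003B8;<sub>0</sub></sub>, <italic>T</italic> is the total number of training epochs, and &#x003B1; and &#x003B2; are hyperparameters that weight the fairness loss and the <italic>L</italic><sub>2</sub> norm regularization, respectively, relative to the gradient-matching (utility) loss.</p>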
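<p>The parameterization of <bold>A</bold><sub><italic>f</italic></sub> as a function of <bold>X</bold><sub><italic>f</italic></sub> can be sketched in Python as follows. Purely for illustration, we assume a one-hidden-layer ReLU MLP &#x003C8; that maps the concatenated features of a node pair to a logit, averaged over both orderings of the pair so that the adjacency is symmetric, followed by a sigmoid; the hidden size, the Gaussian initialization, and the exclusion of self-loops are hypothetical choices. The learnable quantities are then <bold>X</bold><sub><italic>f</italic></sub> and &#x003C8;, whose size does not grow with the number of node pairs. The last function assembles one summand of Equation 8 from precomputed loss values; the expectation over &#x003B8;<sub>0</sub> is not modeled.</p>

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_mlp(d_in, d_hidden, seed=0):
    """Hypothetical one-hidden-layer MLP psi that reads a concatenated feature pair."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(2 * d_in)] for _ in range(d_hidden)],
        "b1": [0.0] * d_hidden,
        "w2": [rng.gauss(0.0, 0.1) for _ in range(d_hidden)],
        "b2": 0.0,
    }


def mlp_logit(psi, x_pair):
    # ReLU hidden layer followed by a linear output unit.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, x_pair)) + b)
              for row, b in zip(psi["W1"], psi["b1"])]
    return sum(w * h for w, h in zip(psi["w2"], hidden)) + psi["b2"]


def adjacency_from_features(psi, X_f):
    """A_f[u][v] = sigmoid of the MLP logit averaged over (x_u, x_v) and (x_v, x_u).

    Averaging both orderings makes A_f symmetric; the diagonal (self-loops) stays 0.
    """
    n = len(X_f)
    A = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            logit = 0.5 * (mlp_logit(psi, X_f[u] + X_f[v]) + mlp_logit(psi, X_f[v] + X_f[u]))
            A[u][v] = A[v][u] = sigmoid(logit)
    return A


def epoch_objective(l_util, l_fair, theta_sq_norm, alpha, beta):
    """One summand of Equation 8: L_util + alpha * L_fair + beta * ||theta_t||^2."""
    return l_util + alpha * l_fair + beta * theta_sq_norm


X_f = [[0.1, 0.2, 0.3], [0.0, -0.5, 1.0], [1.0, 1.0, 0.0]]  # toy synthetic features
A_f = adjacency_from_features(init_mlp(d_in=3, d_hidden=4), X_f)
```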
<p>Jointly optimizing <bold>X</bold><sub><italic>f</italic></sub> and &#x003C8; is often challenging due to the interdependence between them. To overcome this, we employ an alternating optimization strategy: we first update <bold>X</bold><sub><italic>f</italic></sub> for &#x003C4;<sub>1</sub> epochs and then update &#x003C8; for &#x003C4;<sub>2</sub> epochs, repeating this alternation until a stopping criterion is met.</p></sec>
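<p>The alternating schedule can be illustrated on a toy problem in which two scalars stand in for <bold>X</bold><sub><italic>f</italic></sub> and &#x003C8;. This is a hypothetical sketch: the coupled toy loss, its hand-derived gradients, the learning rate, the values of &#x003C4;<sub>1</sub> and &#x003C4;<sub>2</sub>, and the stopping rule (stop once a full round no longer changes the loss) are our assumptions, since the stopping criterion is not specified here.</p>

```python
import math


def alternating_optimization(x, psi, loss, grad_x, grad_psi,
                             tau1=5, tau2=5, lr=0.05, tol=1e-10, max_rounds=500):
    """Alternate tau1 gradient steps on x (standing in for X_f) with tau2 steps
    on psi, until one full round changes the loss by no more than tol."""
    prev = loss(x, psi)
    for _ in range(max_rounds):
        for _ in range(tau1):   # update X_f while psi is frozen
            x -= lr * grad_x(x, psi)
        for _ in range(tau2):   # update psi while X_f is frozen
            psi -= lr * grad_psi(x, psi)
        cur = loss(x, psi)
        if math.isclose(prev, cur, rel_tol=0.0, abs_tol=tol):
            break
        prev = cur
    return x, psi, loss(x, psi)


# Hypothetical coupled toy loss with minima at x = psi = 1 and x = psi = -1.
def toy_loss(x, p):
    return (x * p - 1.0) ** 2 + 0.5 * (x - p) ** 2


def toy_grad_x(x, p):
    return 2.0 * (x * p - 1.0) * p + (x - p)


def toy_grad_p(x, p):
    return 2.0 * (x * p - 1.0) * x - (x - p)


x, p, final = alternating_optimization(0.5, 2.0, toy_loss, toy_grad_x, toy_grad_p)
print(x, p, final)
```

<p>Freezing one block while updating the other turns each phase into a simpler subproblem, which is the usual motivation for alternating schemes when two sets of variables are strongly coupled.</p>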
<sec>
<title>3.4 Fair link prediction</title>
<p>To achieve fair link prediction, we first use the fairness-enhanced graph <inline-formula><mml:math id="M47"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to train a link predictor. This link predictor may differ in architecture from the model used to produce <inline-formula><mml:math id="M48"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and need not incorporate any fairness considerations. In this paper, we define the link prediction function <italic>g</italic>(&#x000B7;, &#x000B7;) as an inner product between the embeddings of two nodes <italic>u</italic> and <italic>v</italic>, for each node pair <inline-formula><mml:math id="M49"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow><mml:mo>&#x000D7;</mml:mo><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:math></inline-formula>. Specifically, <italic>g</italic>(<italic>u, v</italic>) &#x0003D; <italic>u</italic><sup>&#x022A4;</sup>&#x003A3;<italic>v</italic>, where <italic>u</italic> and <italic>v</italic> here denote the node embeddings and &#x003A3; is a positive-definite matrix that defines the inner product by weighting and mixing the embedding dimensions. In our implementation, &#x003A3; is set to the identity matrix, which reduces <italic>g</italic>(&#x000B7;, &#x000B7;) to the dot product commonly used in link prediction research (Trouillon et al., <xref ref-type="bibr" rid="B53">2016</xref>; Kipf and Welling, <xref ref-type="bibr" rid="B33">2016b</xref>).</p></sec></sec>
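<p>For concreteness, the scoring function <italic>g</italic>(<italic>u, v</italic>) &#x0003D; <italic>u</italic><sup>&#x022A4;</sup>&#x003A3;<italic>v</italic> can be written in a few lines of Python. The sketch is illustrative: embeddings are plain lists, &#x003A3; is passed as an optional list of rows whose positive definiteness is assumed rather than checked, and the sigmoid wrapper that turns a score into a link probability is a hypothetical addition for use with a cross-entropy loss. With &#x003A3; set to the identity, the score reduces to the plain dot product.</p>

```python
import math


def bilinear_score(z_u, z_v, sigma=None):
    """g(u, v) = z_u^T Sigma z_v for node embeddings z_u and z_v.

    sigma: square matrix as a list of rows (assumed positive definite);
           None stands for the identity, which reduces g to the dot product.
    """
    if sigma is None:
        return sum(a * b for a, b in zip(z_u, z_v))
    sigma_z_v = [sum(s_ij * b for s_ij, b in zip(row, z_v)) for row in sigma]
    return sum(a * b for a, b in zip(z_u, sigma_z_v))


def link_probability(z_u, z_v, sigma=None):
    # Hypothetical: squash the score into (0, 1) with a logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-bilinear_score(z_u, z_v, sigma)))


z_u, z_v = [1.0, 2.0], [3.0, -1.0]
print(bilinear_score(z_u, z_v))                            # 1.0 (dot product)
print(bilinear_score(z_u, z_v, [[1.0, 0.0], [0.0, 1.0]]))  # 1.0 (identity Sigma)
print(bilinear_score(z_u, z_v, [[2.0, 0.0], [0.0, 1.0]]))  # 4.0 (first dimension weighted twice)
```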
<sec sec-type="discussion" id="s4">
<title>4 Discussion</title>
<p>In this paper, the fairness-enhanced graph produced by <monospace>FairLink</monospace> retains the same size as the input graph, as discussed in Section 2.1. To facilitate fairness-aware training on large-scale graphs, our approach concentrates on learning a fairness-enhanced graph that can be reused, thereby eliminating the need for repeated debiasing in future training with different link predictors. Future work could investigate methods for learning a smaller, fairness-enhanced graph derived from large-scale real-world graphs.</p></sec>
<sec id="s5">
<title>5 Experiments</title>
<p>In this section, we evaluate the effectiveness of <monospace>FairLink</monospace> on four large-scale real-world graphs. We assess its link prediction performance, its fairness, and the trade-off between fairness and utility by comparing <monospace>FairLink</monospace> with seven baseline methods. Additionally, we examine the generalizability of the graphs generated by <monospace>FairLink</monospace> by applying them to various GNN architectures.</p>
<sec>
<title>5.1 Experimental setup</title>
<sec>
<title>5.1.1 Datasets</title>
<p>We consider four large-scale graphs that have been extensively used in previous studies on fair link prediction. These graphs span a diverse range of domains, including citation networks, co-authorship networks, and social networks, and each has a different sensitive attribute. We treat the minority nodes as the protected node group (e.g., female nodes in <sans-serif>Google&#x0002B;</sans-serif> and male nodes in <sans-serif>Facebook</sans-serif>). Dataset statistics are reported in <xref ref-type="table" rid="T1">Table 1</xref>.</p>



<list list-type="bullet">
<list-item><p><sans-serif><bold>Pubmed</bold></sans-serif><xref ref-type="fn" rid="fn0001"><sup>1</sup></xref>: <sans-serif>Pubmed</sans-serif> is a dataset where each node represents an article, characterized by a bag-of-words feature vector. An edge between two nodes indicates a citation between the corresponding articles, regardless of direction. The topic of an article, i.e., its category, is used as the sensitive attribute in this dataset.</p></list-item>
<list-item><p><sans-serif><bold>DBLP</bold></sans-serif><xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> (Tang et al., <xref ref-type="bibr" rid="B52">2008</xref>): <sans-serif>DBLP</sans-serif> is a co-authorship network constructed from the DBLP computer science bibliography database. The network comprises nodes representing authors extracted from papers accepted at eight different conferences. An edge exists between two nodes if the corresponding authors have collaborated on at least one paper. The sensitive attribute in this dataset is the continent of the author&#x00027;s institution.</p></list-item>
<list-item><p><sans-serif><bold>Google&#x0002B;</bold></sans-serif><xref ref-type="fn" rid="fn0003"><sup>3</sup></xref> (Leskovec and Mcauley, <xref ref-type="bibr" rid="B36">2012</xref>): <sans-serif>Google&#x0002B;</sans-serif> is a social network dataset. The data were collected from users who chose to share their social circles, i.e., the groups into which they manually sorted their friends on the Google&#x0002B; platform. The sensitive attribute in this dataset is gender.</p></list-item>
<list-item><p><sans-serif><bold>Facebook</bold></sans-serif><xref ref-type="fn" rid="fn0004"><sup>4</sup></xref> (Leskovec and Mcauley, <xref ref-type="bibr" rid="B36">2012</xref>): <sans-serif>Facebook</sans-serif> is a social network dataset in which each node carries an anonymized feature vector representing attributes of a user&#x00027;s Facebook profile. The sensitive attribute in this dataset is gender.</p></list-item>
</list>

<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Link prediction and fairness results on large-scale graphs.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919498;color:#ffffff">
<th valign="top" align="left"><bold>Metric</bold></th>
<th valign="top" align="left"><bold>VGAE</bold></th>
<th valign="top" align="left"><bold>Node2vec</bold></th>
<th valign="top" align="left"><bold>FairPR</bold></th>
<th valign="top" align="left"><bold>Fairwalk</bold></th>
<th valign="top" align="left"><bold>FairAdj</bold></th>
<th valign="top" align="left"><bold>FLIP</bold></th>
<th valign="top" align="left"><bold>FairEGM</bold></th>
<th valign="top" align="left"><bold>FairLink</bold></th>
</tr>
</thead>
<tbody>
<tr style="background-color:#dee1e1;color:#ffffff">
<td valign="top" align="left" colspan="9"><sans-serif><bold>Pubmed</bold></sans-serif> <bold>&#x00023;Nodes:</bold> 19,717 <bold>&#x00023;Edges:</bold> 88,648 <bold>Sensitive Attribute: Topic</bold></td>
</tr> <tr>
<td valign="top" align="left">F1 (&#x02191;)</td>
<td valign="top" align="left"><bold>93.18</bold>&#x000B1;1.07</td>
<td valign="top" align="left">86.50 &#x000B1; 1.48</td>
<td valign="top" align="left">83.33 &#x000B1; 2.79</td>
<td valign="top" align="left">85.20 &#x000B1; 2.53</td>
<td valign="top" align="left">84.25 &#x000B1; 1.21</td>
<td valign="top" align="left">83.48 &#x000B1; 1.79</td>
<td valign="top" align="left">83.70 &#x000B1; 1.68</td>
<td valign="top" align="left"><underline>90.46&#x000B1;1.67</underline></td>
</tr> <tr>
<td valign="top" align="left">AUC (&#x02191;)</td>
<td valign="top" align="left"><bold>96.20</bold>&#x000B1;0.85</td>
<td valign="top" align="left">93.27 &#x000B1; 1.23</td>
<td valign="top" align="left">88.21 &#x000B1; 0.62</td>
<td valign="top" align="left">91.43 &#x000B1; 1.11</td>
<td valign="top" align="left">90.53 &#x000B1; 1.03</td>
<td valign="top" align="left">87.44 &#x000B1; 1.36</td>
<td valign="top" align="left">88.12 &#x000B1; 2.33</td>
<td valign="top" align="left"><underline>95.24&#x000B1;1.65</underline></td>
</tr>
<tr>
<td valign="top" align="left">&#x00394;<sub><italic>DP</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">20.88 &#x000B1; 12.48</td>
<td valign="top" align="left">19.14 &#x000B1; 11.93</td>
<td valign="top" align="left">17.31 &#x000B1; 6.32</td>
<td valign="top" align="left">18.42 &#x000B1; 8.65</td>
<td valign="top" align="left"><underline>14.73&#x000B1;5.98</underline></td>
<td valign="top" align="left">15.42 &#x000B1; 7.69</td>
<td valign="top" align="left">17.52 &#x000B1; 6.30</td>
<td valign="top" align="left"><bold>5.42</bold>&#x000B1;2.65</td>
</tr> <tr>
<td valign="top" align="left">&#x00394;<sub><italic>EO</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">18.84 &#x000B1; 10.98</td>
<td valign="top" align="left">20.33 &#x000B1; 8.74</td>
<td valign="top" align="left"><underline>15.39&#x000B1;9.52</underline></td>
<td valign="top" align="left">20.18 &#x000B1; 7.75</td>
<td valign="top" align="left">16.39 &#x000B1; 4.64</td>
<td valign="top" align="left">19.43 &#x000B1; 8.01</td>
<td valign="top" align="left">19.29 &#x000B1; 9.44</td>
<td valign="top" align="left"><bold>4.86</bold>&#x000B1;1.34</td>
</tr> <tr style="background-color:#dee1e1;color:#ffffff">
<td valign="top" align="left" colspan="9"><sans-serif><bold>DBLP</bold></sans-serif> <bold>&#x00023;Nodes:</bold> 13,015 <bold>&#x00023;Edges:</bold> 79,972 <bold>Sensitive Attribute: Continent</bold></td>
</tr> <tr>
<td valign="top" align="left">F1 (&#x02191;)</td>
<td valign="top" align="left"><bold>82.23</bold>&#x000B1;1.66</td>
<td valign="top" align="left">78.15 &#x000B1; 1.72</td>
<td valign="top" align="left">80.05 &#x000B1; 1.27</td>
<td valign="top" align="left">80.88 &#x000B1; 2.81</td>
<td valign="top" align="left">81.62 &#x000B1; 1.58</td>
<td valign="top" align="left">77.62 &#x000B1; 1.71</td>
<td valign="top" align="left">80.45 &#x000B1; 0.92</td>
<td valign="top" align="left"><underline>81.69&#x000B1;1.55</underline></td>
</tr> <tr>
<td valign="top" align="left">AUC (&#x02191;)</td>
<td valign="top" align="left"><bold>90.77</bold>&#x000B1;1.82</td>
<td valign="top" align="left">83.21 &#x000B1; 2.94</td>
<td valign="top" align="left">72.43 &#x000B1; 1.30</td>
<td valign="top" align="left">88.39 &#x000B1; 1.59</td>
<td valign="top" align="left">84.51 &#x000B1; 2.25</td>
<td valign="top" align="left">78.14 &#x000B1; 3.41</td>
<td valign="top" align="left">80.43 &#x000B1; 2.62</td>
<td valign="top" align="left"><underline>88.72&#x000B1;1.76</underline></td>
</tr> <tr>
<td valign="top" align="left">&#x00394;<sub><italic>DP</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">7.42 &#x000B1; 3.95</td>
<td valign="top" align="left">8.43 &#x000B1; 5.25</td>
<td valign="top" align="left">11.65 &#x000B1; 4.33</td>
<td valign="top" align="left">9.86 &#x000B1; 4.04</td>
<td valign="top" align="left"><underline>3.55&#x000B1;3.37</underline></td>
<td valign="top" align="left">6.34 &#x000B1; 4.22</td>
<td valign="top" align="left">5.82 &#x000B1; 5.33</td>
<td valign="top" align="left"><bold>1.32</bold>&#x000B1;0.45</td>
</tr> <tr>
<td valign="top" align="left">&#x00394;<sub><italic>EO</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">8.53 &#x000B1; 3.60</td>
<td valign="top" align="left">7.22 &#x000B1; 4.37</td>
<td valign="top" align="left">9.37 &#x000B1; 5.24</td>
<td valign="top" align="left">7.10 &#x000B1; 3.57</td>
<td valign="top" align="left">5.82 &#x000B1; 3.91</td>
<td valign="top" align="left"><underline>5.39&#x000B1;4.37</underline></td>
<td valign="top" align="left">7.33 &#x000B1; 6.32</td>
<td valign="top" align="left"><bold>2.19</bold>&#x000B1;1.01</td>
</tr>
<tr style="background-color:#dee1e1;color:#ffffff">
<td valign="top" align="left" colspan="9"><sans-serif><bold>Google&#x0002B;</bold></sans-serif> <bold>&#x00023;Nodes:</bold> 4,938 <bold>&#x00023;Edges:</bold> 547,923 <bold>Sensitive Attribute: Gender</bold></td>
</tr> <tr>
<td valign="top" align="left">F1 (&#x02191;)</td>
<td valign="top" align="left"><bold>88.33</bold>&#x000B1;1.21</td>
<td valign="top" align="left">81.11 &#x000B1; 1.50</td>
<td valign="top" align="left">76.22 &#x000B1; 1.36</td>
<td valign="top" align="left">82.47 &#x000B1; 1.08</td>
<td valign="top" align="left">84.77 &#x000B1; 1.19</td>
<td valign="top" align="left">78.35 &#x000B1; 2.02</td>
<td valign="top" align="left">80.69 &#x000B1; 1.53</td>
<td valign="top" align="left"><underline>85.34&#x000B1;0.81</underline></td>
</tr> <tr>
<td valign="top" align="left">AUC (&#x02191;)</td>
<td valign="top" align="left"><bold>94.85</bold>&#x000B1;0.91</td>
<td valign="top" align="left">88.74 &#x000B1; 2.84</td>
<td valign="top" align="left">67.29 &#x000B1; 1.53</td>
<td valign="top" align="left">93.01 &#x000B1; 0.58</td>
<td valign="top" align="left">93.37 &#x000B1; 0.22</td>
<td valign="top" align="left">81.86 &#x000B1; 1.54</td>
<td valign="top" align="left">80.26 &#x000B1; 1.61</td>
<td valign="top" align="left"><underline>94.42&#x000B1;1.86</underline></td>
</tr> <tr>
<td valign="top" align="left">&#x00394;<sub><italic>DP</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">6.42 &#x000B1; 2.05</td>
<td valign="top" align="left">7.88 &#x000B1; 4.72</td>
<td valign="top" align="left">7.14 &#x000B1; 1.83</td>
<td valign="top" align="left">5.61 &#x000B1; 4.20</td>
<td valign="top" align="left">3.79 &#x000B1; 1.22</td>
<td valign="top" align="left"><bold>1.19</bold>&#x000B1;1.93</td>
<td valign="top" align="left">4.55 &#x000B1; 2.11</td>
<td valign="top" align="left"><underline>1.42&#x000B1;0.96</underline></td>
</tr> <tr>
<td valign="top" align="left">&#x00394;<sub><italic>EO</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">7.92 &#x000B1; 4.48</td>
<td valign="top" align="left">9.35 &#x000B1; 3.19</td>
<td valign="top" align="left">6.35 &#x000B1; 3.09</td>
<td valign="top" align="left">4.42 &#x000B1; 1.93</td>
<td valign="top" align="left">3.76 &#x000B1; 1.47</td>
<td valign="top" align="left"><underline>2.21&#x000B1;1.12</underline></td>
<td valign="top" align="left">5.37 &#x000B1; 3.65</td>
<td valign="top" align="left"><bold>1.01</bold>&#x000B1;0.75</td>
</tr> <tr style="background-color:#dee1e1;color:#ffffff">
<td valign="top" align="left" colspan="9"><sans-serif><bold>Facebook</bold></sans-serif> <bold>&#x00023;Nodes:</bold> 1,045 <bold>&#x00023;Edges:</bold> 53,498 <bold>Sensitive Attribute: Gender</bold></td>
</tr>
<tr>
<td valign="top" align="left">F1 (&#x02191;)</td>
<td valign="top" align="left"><bold>82.41</bold>&#x000B1;1.23</td>
<td valign="top" align="left">79.35 &#x000B1; 0.95</td>
<td valign="top" align="left">76.22 &#x000B1; 1.30</td>
<td valign="top" align="left">78.11 &#x000B1; 0.78</td>
<td valign="top" align="left">81.14 &#x000B1; 1.23</td>
<td valign="top" align="left">78.5 &#x000B1; 1.42</td>
<td valign="top" align="left">79.77 &#x000B1; 2.92</td>
<td valign="top" align="left"><underline>82.37&#x000B1;0.41</underline></td>
</tr> <tr>
<td valign="top" align="left">AUC (&#x02191;)</td>
<td valign="top" align="left"><bold>94.66</bold>&#x000B1;0.55</td>
<td valign="top" align="left">90.57 &#x000B1; 1.24</td>
<td valign="top" align="left">70.30 &#x000B1; 1.09</td>
<td valign="top" align="left">91.56 &#x000B1; 0.63</td>
<td valign="top" align="left">92.53 &#x000B1; 1.49</td>
<td valign="top" align="left">83.0 &#x000B1; 1.51</td>
<td valign="top" align="left">85.42 &#x000B1; 1.45</td>
<td valign="top" align="left"><underline>93.73&#x000B1;1.72</underline></td>
</tr> <tr>
<td valign="top" align="left">&#x00394;<sub><italic>DP</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">2.03 &#x000B1; 0.81</td>
<td valign="top" align="left">1.70 &#x000B1; 1.43</td>
<td valign="top" align="left">2.33 &#x000B1; 1.91</td>
<td valign="top" align="left">1.97 &#x000B1; 1.51</td>
<td valign="top" align="left">1.77 &#x000B1; 0.81</td>
<td valign="top" align="left"><underline>1.17&#x000B1;0.55</underline></td>
<td valign="top" align="left">2.21 &#x000B1; 1.05</td>
<td valign="top" align="left"><bold>0.83</bold>&#x000B1;0.36</td>
</tr> <tr>
<td valign="top" align="left">&#x00394;<sub><italic>EO</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">3.78 &#x000B1; 2.15</td>
<td valign="top" align="left">2.10 &#x000B1; 1.60</td>
<td valign="top" align="left">2.95 &#x000B1; 1.10</td>
<td valign="top" align="left">1.83 &#x000B1; 1.39</td>
<td valign="top" align="left"><bold>1.25</bold>&#x000B1;0.74</td>
<td valign="top" align="left">2.21 &#x000B1; 1.52</td>
<td valign="top" align="left">2.55 &#x000B1; 1.34</td>
<td valign="top" align="left"><underline>1.56&#x000B1;2.21</underline></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>Values are reported as mean &#x000B1; standard deviation over 10 runs. An upward arrow (&#x02191;) indicates that a higher value is better, while a downward arrow (&#x02193;) indicates that a lower value is better. For each metric, the best result is shown in bold and the runner-up is underlined.</p>
</table-wrap-foot>
</table-wrap>


</sec></sec></sec>
<sec id="s6">
<title>6 Baselines</title>
<p>We compare with two link prediction approaches, <monospace>VGAE</monospace> and <monospace>Node2vec</monospace>, and five state-of-the-art fair link prediction methods, <monospace>FairPR</monospace>, <monospace>Fairwalk</monospace>, <monospace>FairAdj</monospace>, <monospace>FLIP</monospace>, and <monospace>FairEGM</monospace>.</p>
<list list-type="bullet">
<list-item><p><bold>Link prediction methods</bold>: We consider two popular link prediction baselines: (1) The Variational Graph Autoencoder (<monospace>VGAE</monospace>) (Kipf and Welling, <xref ref-type="bibr" rid="B33">2016b</xref>), which is based on the variational autoencoder model. <monospace>VGAE</monospace> uses a GNN as the inference model and employs latent variables to reconstruct graph connections. (2) <monospace>Node2vec</monospace> (Grover and Leskovec, <xref ref-type="bibr" rid="B18">2016</xref>), a widely-used graph embedding approach based on random walks. It represents nodes as low-dimensional vectors that capture proximity information, enabling link prediction through these node embeddings.</p></list-item>
<list-item><p><bold>Fair link prediction methods</bold>: To evaluate fairness in link prediction, we compare against five state-of-the-art approaches: (1) <monospace>FairPR</monospace> (Tsioutsiouliklis et al., <xref ref-type="bibr" rid="B54">2021</xref>) extends the PageRank algorithm with group fairness considerations. (2) <monospace>Fairwalk</monospace> (Rahman et al., <xref ref-type="bibr" rid="B51">2019</xref>), built upon <monospace>Node2vec</monospace>, modifies the transition probabilities of random walks according to the sensitive attributes of a node&#x00027;s neighbors to promote fairness. (3) <monospace>FairAdj</monospace> (Li et al., <xref ref-type="bibr" rid="B38">2021</xref>) is a regularization-based link prediction algorithm that preserves link prediction utility through a <monospace>VGAE</monospace> while enforcing dyadic fairness with a dyadic loss regularizer. (4) <monospace>FLIP</monospace> (Masrour et al., <xref ref-type="bibr" rid="B46">2020</xref>) enhances structural fairness in graphs by reducing homophily and evaluates fairness by measuring reductions in modularity. (5) <monospace>FairEGM</monospace> (Current et al., <xref ref-type="bibr" rid="B9">2022</xref>) is a collection of three methods that emulate different types of graph modification to improve fairness. In our experiments, we use Constrained Fairness Optimization (GFO) as the representative method from this collection.</p></list-item>
</list>
<p>For a detailed discussion of the fair link prediction baselines, please refer to Section 10.3.</p></sec>
<sec id="s7">
<title>7 Metrics</title>
<p>To evaluate link prediction accuracy, we use two metrics: the F1-score and the area under the receiver operating characteristic curve (AUC) (Current et al., <xref ref-type="bibr" rid="B9">2022</xref>; Li et al., <xref ref-type="bibr" rid="B38">2021</xref>; Masrour et al., <xref ref-type="bibr" rid="B46">2020</xref>). To assess group fairness, we adopt two additional metrics introduced in Section 2.2: the difference in demographic parity (&#x00394;<sub><italic>DP</italic></sub>) (Feldman et al., <xref ref-type="bibr" rid="B15">2015</xref>) and the difference in equal opportunity (&#x00394;<sub><italic>EO</italic></sub>) (Hardt et al., <xref ref-type="bibr" rid="B22">2016</xref>). Lower values of &#x00394;<sub><italic>DP</italic></sub> and &#x00394;<sub><italic>EO</italic></sub> indicate a fairer model.</p></sec>
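<p>The two fairness gaps can be made concrete with a short NumPy sketch. It assumes the dyadic formulation commonly used for link prediction (e.g., by <monospace>FairAdj</monospace>). Under that formulation, a candidate link is intra-group when both endpoints share the same sensitive attribute value and inter-group otherwise. The definitions actually used in this paper are those of Section 2.2. The function name <monospace>dyadic_fairness_gaps</monospace> and the toy data are illustrative and are not part of <monospace>FairLink</monospace>.</p>
<preformat preformat-type="code">
```python
import numpy as np


def dyadic_fairness_gaps(scores, y_true, s_u, s_v):
    """Return (delta_dp, delta_eo) for a set of candidate links (u, v).

    scores: predicted link labels (0/1) or predicted link probabilities.
    y_true: ground-truth link labels (0/1).
    s_u, s_v: sensitive attribute values of the two endpoints.
    Assumed dyadic grouping: intra-group if s_u == s_v, inter-group otherwise.
    Both groups (and both groups among true links) must be non-empty.
    """
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    intra = np.asarray(s_u) == np.asarray(s_v)
    inter = np.logical_not(intra)

    # Demographic parity: gap in mean prediction between intra- and inter-group links.
    delta_dp = abs(scores[intra].mean() - scores[inter].mean())

    # Equal opportunity: the same gap, restricted to links that actually exist.
    positive = y_true == 1
    intra_pos = np.logical_and(intra, positive)
    inter_pos = np.logical_and(inter, positive)
    delta_eo = abs(scores[intra_pos].mean() - scores[inter_pos].mean())
    return float(delta_dp), float(delta_eo)


# Toy example with eight candidate links (values are illustrative only).
dp, eo = dyadic_fairness_gaps(
    scores=[1, 1, 0, 1, 0, 0, 1, 0],
    y_true=[1, 1, 0, 1, 1, 0, 1, 0],
    s_u=[0, 0, 1, 1, 0, 1, 0, 1],
    s_v=[0, 0, 1, 0, 1, 0, 0, 1],
)
print(round(dp, 4), round(eo, 4))  # 0.2667 0.5
```
</preformat>
<p>Passing probabilities instead of hard labels gives the soft variant of both gaps. Multiplying by 100 puts them on a percentage scale.</p>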
<sec id="s8">
<title>8 Protocols</title>
<p>We implement <monospace>FairLink</monospace> with a two-layer GraphSAGE (Hamilton et al., <xref ref-type="bibr" rid="B20">2017</xref>) as the feature embedding and inference model. For <monospace>VGAE</monospace> and <monospace>Node2vec</monospace>, we adopt the hyperparameter settings of Masrour et al. (<xref ref-type="bibr" rid="B46">2020</xref>); for the other baselines, we follow the configurations in their original papers. For each dataset, we tune the hyperparameters &#x003B1;, &#x003B2;, and &#x003B3; by grid search: &#x003B1; and &#x003B2; are selected from {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 1.5, 2, 2.5}, and &#x003B3; from {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.5}. All models are trained for 200 epochs on every dataset. The learning rate is 0.005 for <sans-serif>Pubmed</sans-serif> and 0.01 for <sans-serif>DBLP</sans-serif>, <sans-serif>Google&#x0002B;</sans-serif>, and <sans-serif>Facebook</sans-serif>. All losses are optimized with the Adam optimizer (Kingma and Ba, <xref ref-type="bibr" rid="B31">2014</xref>). We follow the data splits of previous studies (Gurukar et al., <xref ref-type="bibr" rid="B19">2019</xref>; Current et al., <xref ref-type="bibr" rid="B9">2022</xref>) and repeat every experiment 10 times, using a different random seed and train/test split in each run.</p>
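<p>The tuning protocol above amounts to an exhaustive search over 10 &#x000D7; 10 &#x000D7; 11 = 1,100 configurations per dataset. The Python sketch below enumerates that grid with the stated learning rates and epoch budget. Several details are assumptions, because the selection criterion is not stated: the callback <monospace>train_and_evaluate</monospace>, the use of a validation score, and choosing the configuration by its mean over the 10 seeds. The <monospace>toy_objective</monospace> function is only a stand-in used to exercise the loop.</p>
<preformat preformat-type="code">
```python
import itertools
import statistics

# Search spaces and settings as listed in the protocol.
ALPHAS = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 1.5, 2.0, 2.5]
BETAS = ALPHAS
GAMMAS = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.5]
LEARNING_RATES = {"Pubmed": 0.005, "DBLP": 0.01, "Google+": 0.01, "Facebook": 0.01}
EPOCHS = 200
NUM_RUNS = 10


def grid_search(dataset, train_and_evaluate):
    """Return ((alpha, beta, gamma), mean_score, std_score) of the best configuration.

    `train_and_evaluate` is a hypothetical callback that trains FairLink once
    and returns a scalar validation score where higher is better.
    """
    lr = LEARNING_RATES[dataset]
    best = None
    for alpha, beta, gamma in itertools.product(ALPHAS, BETAS, GAMMAS):
        # Repeat each configuration with NUM_RUNS different seeds.
        scores = [
            train_and_evaluate(dataset, alpha, beta, gamma, lr, EPOCHS, seed)
            for seed in range(NUM_RUNS)
        ]
        mean, std = statistics.mean(scores), statistics.stdev(scores)
        if best is None or mean > best[1]:
            best = ((alpha, beta, gamma), mean, std)
    return best


def toy_objective(dataset, alpha, beta, gamma, lr, epochs, seed):
    # Stand-in for real training: peaks at alpha=0.1, beta=0.5, gamma=1.0.
    return -((alpha - 0.1) ** 2 + (beta - 0.5) ** 2 + (gamma - 1.0) ** 2)


print(len(ALPHAS) * len(BETAS) * len(GAMMAS))  # 1100
best_config, best_mean, best_std = grid_search("DBLP", toy_objective)
print(best_config)  # (0.1, 0.5, 1.0)
```
</preformat>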
<sec>
<title>8.1 Link prediction and fairness performance of <monospace>FairLink</monospace></title>
<p>To evaluate both link prediction and fairness performance, we compare <monospace>FairLink</monospace> with the baselines above on four real-world datasets. <xref ref-type="table" rid="T1">Table 1</xref> reports the mean and standard deviation of each metric for all models. We make the following observations:</p>
<list list-type="bullet">
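<list-item><p><monospace>FairLink</monospace> demonstrates consistently strong fairness in terms of both &#x00394;<sub><italic>DP</italic></sub> and &#x00394;<sub><italic>EO</italic></sub>. It achieves the best result in six of the eight dataset-metric combinations and the second best in the remaining two: &#x00394;<sub><italic>DP</italic></sub> on <sans-serif>Google&#x0002B;</sans-serif>, where <monospace>FLIP</monospace> is best, and &#x00394;<sub><italic>EO</italic></sub> on <sans-serif>Facebook</sans-serif>, where <monospace>FairAdj</monospace> is best. For example, compared to <monospace>VGAE</monospace>, <monospace>FairLink</monospace> reduces &#x00394;<sub><italic>DP</italic></sub> by 74.0%, 82.2%, 77.9%, and 59.1% on the <sans-serif>Pubmed</sans-serif>, <sans-serif>DBLP</sans-serif>, <sans-serif>Google&#x0002B;</sans-serif>, and <sans-serif>Facebook</sans-serif> datasets, respectively.</p></list-item>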
<list-item><p>Regarding utility, <monospace>FairLink</monospace> achieves the second-best F1-score and AUC on all four datasets, behind only <monospace>VGAE</monospace>. For instance, <monospace>FairLink</monospace> retains 97.08%, 99.34%, 96.61%, and 99.95% of the F1-score of <monospace>VGAE</monospace> on the <sans-serif>Pubmed</sans-serif>, <sans-serif>DBLP</sans-serif>, <sans-serif>Google&#x0002B;</sans-serif>, and <sans-serif>Facebook</sans-serif> datasets, respectively.</p></list-item>
<list-item><p>The fair link prediction baselines (<monospace>FairPR</monospace>, <monospace>Fairwalk</monospace>, <monospace>FairAdj</monospace>, <monospace>FLIP</monospace>, and <monospace>FairEGM</monospace>) generally exhibit less predictive bias than standard link prediction models such as <monospace>VGAE</monospace> and <monospace>Node2vec</monospace>. Among them, <monospace>FairAdj</monospace> and <monospace>FLIP</monospace> are usually the strongest after <monospace>FairLink</monospace>. <monospace>FairAdj</monospace> even attains the lowest &#x00394;<sub><italic>EO</italic></sub> on <sans-serif>Facebook</sans-serif>, and <monospace>FLIP</monospace> the lowest &#x00394;<sub><italic>DP</italic></sub> on <sans-serif>Google&#x0002B;</sans-serif>.</p></list-item>
</list></sec>
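<p>The relative changes quoted in the first two observations of Section 8.1 can be recomputed directly from the means in <xref ref-type="table" rid="T1">Table 1</xref>; standard deviations are ignored. A minimal Python check:</p>
<preformat preformat-type="code">
```python
# Means from Table 1 for VGAE and FairLink.
TABLE1 = {
    # dataset: (VGAE F1, FairLink F1, VGAE Delta_DP, FairLink Delta_DP)
    "Pubmed": (93.18, 90.46, 20.88, 5.42),
    "DBLP": (82.23, 81.69, 7.42, 1.32),
    "Google+": (88.33, 85.34, 6.42, 1.42),
    "Facebook": (82.41, 82.37, 2.03, 0.83),
}


def relative_reduction(baseline, ours):
    """Percentage by which `ours` lowers a lower-is-better metric."""
    return 100.0 * (baseline - ours) / baseline


def retention(baseline, ours):
    """Percentage of a higher-is-better metric that `ours` keeps."""
    return 100.0 * ours / baseline


for name, (f1_vgae, f1_ours, dp_vgae, dp_ours) in TABLE1.items():
    print(f"{name}: Delta_DP reduced by {relative_reduction(dp_vgae, dp_ours):.1f}%, "
          f"F1 retained {retention(f1_vgae, f1_ours):.2f}%")
# Pubmed: Delta_DP reduced by 74.0%, F1 retained 97.08%
# DBLP: Delta_DP reduced by 82.2%, F1 retained 99.34%
# Google+: Delta_DP reduced by 77.9%, F1 retained 96.61%
# Facebook: Delta_DP reduced by 59.1%, F1 retained 99.95%
```
</preformat>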
<sec>
<title>8.2 Fairness-utility trade-off comparison</title>
<p>In <xref ref-type="fig" rid="F3">Figure 3</xref>, colors distinguish <monospace>FairPR</monospace>, <monospace>Fairwalk</monospace>, <monospace>FairAdj</monospace>, <monospace>FLIP</monospace>, <monospace>FairEGM</monospace>, and <monospace>FairLink</monospace>. Ideally, a debiasing method lies in the upper-left corner of the plot, combining high utility with low bias. As the figure shows, <monospace>FairLink</monospace> generally provides the most favorable trade-off between these two competing objectives. Among the baselines, <monospace>FairAdj</monospace> usually achieves strong debiasing with little utility loss, whereas <monospace>Fairwalk</monospace> maintains high utility but is less effective at reducing bias. <monospace>FairPR</monospace> achieves reasonable fairness but sacrifices substantially more utility than <monospace>FairLink</monospace>, as seen on the <sans-serif>DBLP</sans-serif> and <sans-serif>Google&#x0002B;</sans-serif> datasets. <monospace>FairEGM</monospace>, in contrast, shows no notable debiasing effect.</p>
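<p>The &#x0201C;upper-left corner&#x0201D; criterion can be formalized as Pareto dominance. One method dominates another if it is no worse in both utility and bias and strictly better in at least one. The sketch below applies this test to the <sans-serif>DBLP</sans-serif> means (AUC and &#x00394;<sub><italic>DP</italic></sub>) from <xref ref-type="table" rid="T1">Table 1</xref>, including the non-fair baselines. It is an illustration only, since <xref ref-type="fig" rid="F3">Figure 3</xref> may plot different operating points.</p>
<preformat preformat-type="code">
```python
# DBLP means from Table 1: method -> (AUC, Delta_DP).
DBLP = {
    "VGAE": (90.77, 7.42),
    "Node2vec": (83.21, 8.43),
    "FairPR": (72.43, 11.65),
    "Fairwalk": (88.39, 9.86),
    "FairAdj": (84.51, 3.55),
    "FLIP": (78.14, 6.34),
    "FairEGM": (80.43, 5.82),
    "FairLink": (88.72, 1.32),
}


def dominates(a, b):
    """True if a is no worse than b on both axes and strictly better on one.

    Utility (AUC) is higher-is-better; bias (Delta_DP) is lower-is-better.
    """
    auc_a, bias_a = a
    auc_b, bias_b = b
    no_worse = auc_a >= auc_b and bias_b >= bias_a
    strictly_better = auc_a > auc_b or bias_b > bias_a
    return no_worse and strictly_better


def pareto_front(points):
    """Names of the methods that no other method dominates."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


print(pareto_front(DBLP))  # ['FairLink', 'VGAE']
```
</preformat>
<p>On these means, only <monospace>VGAE</monospace> (highest AUC) and <monospace>FairLink</monospace> (lowest &#x00394;<sub><italic>DP</italic></sub>) are non-dominated.</p>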
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Trade-off between fairness and link prediction accuracy across four datasets. Results in the upper left corner, which exhibit both lower bias and higher accuracy, represent the ideal balance.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-07-1489306-g0003.tif"/>
</fig>


<p>We attribute the superior trade-off of <monospace>FairLink</monospace> to the design of its objective in <xref ref-type="disp-formula" rid="E9">Equation 8</xref>. Previous studies have shown that gradient matching can generate synthetic data on which machine learning models can be trained while maintaining prediction accuracy (Zhao et al., <xref ref-type="bibr" rid="B67">2020</xref>; Jin et al., <xref ref-type="bibr" rid="B26">2023</xref>, <xref ref-type="bibr" rid="B25">2022a</xref>; Liu and Shen, <xref ref-type="bibr" rid="B41">2024a</xref>). Importantly, the synthetic graph obtained by minimizing the gradient matching objective is not unique: many graphs can achieve a near-zero matching loss. Building on this observation, <monospace>FairLink</monospace> promotes fairness in the learned data (the fairness-enhanced graph) by adding a fairness constraint to the gradient matching objective. In essence, <monospace>FairLink</monospace> searches for a graph that lies close to the set of graphs with zero gradient matching loss while also minimizing the fairness loss. A prior study has likewise reported a favorable fairness-utility trade-off when this design is applied to node classification (Liu, <xref ref-type="bibr" rid="B40">2023</xref>).</p></sec>
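<p>A minimal NumPy sketch of this design follows. It assumes the gradient matching distance sums, over layers, a cosine distance plus a &#x003B3;-weighted Euclidean distance. This assumption is consistent with the ablation in Section 8.4, where &#x003B3; &#x0003D; 0 leaves only the cosine term. It also assumes the fairness loss is simply added with a weight. The gradients are those of the link prediction loss with respect to the model parameters, computed on the original and the fairness-enhanced graph and passed as lists of per-layer arrays. The names <monospace>matching_loss</monospace> and <monospace>fair_weight</monospace> are hypothetical. The exact distance and weighting are those of Equations 6 and 8, which this sketch does not reproduce.</p>
<preformat preformat-type="code">
```python
import numpy as np


def layer_distance(g_real, g_syn, gamma):
    """Cosine distance plus gamma-weighted Euclidean distance between two gradient arrays."""
    a, b = np.ravel(g_real), np.ravel(g_syn)
    # The small epsilon avoids division by zero for all-zero gradients.
    cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return (1.0 - cosine) + gamma * np.linalg.norm(a - b)


def matching_loss(grads_real, grads_syn, gamma):
    """Sum the per-layer distances between gradients on the original and synthetic graphs."""
    return float(sum(layer_distance(gr, gs, gamma) for gr, gs in zip(grads_real, grads_syn)))


def fairlink_style_objective(grads_real, grads_syn, fairness_loss, gamma, fair_weight):
    """The matching term preserves utility; the weighted fairness term pushes toward fairness."""
    return matching_loss(grads_real, grads_syn, gamma) + fair_weight * fairness_loss


# Hypothetical per-layer gradients (a weight matrix and a bias vector).
grads = [np.ones((2, 3)), np.array([1.0, -2.0])]
print(matching_loss(grads, grads, gamma=1.0))                   # ~0: identical gradients
print(matching_loss(grads, [2 * g for g in grads], gamma=1.0))  # sqrt(6) + sqrt(5), about 4.6856
```
</preformat>
<p>In practice, the gradients would come from automatic differentiation, and the synthetic graph would be updated by descending this objective. Any graph that drives the matching term to zero preserves the training signal, and the fairness term selects among those graphs.</p>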
<sec>
<title>8.3 Generalizability to other link prediction models</title>
<p>To validate the generalizability of the fairness-enhanced graph, we perform a cross-architecture analysis. We first use GraphSAGE (SAGE) to generate the synthetic graphs and then evaluate them on several architectures: GCN, GAT, and VGAE, as well as the original GraphSAGE model. We additionally apply <monospace>FairLink</monospace> with different architectures to all datasets and assess the cross-architecture generalization of the resulting fairness-enhanced graphs. The results are shown in <xref ref-type="fig" rid="F4">Figures 4A</xref>&#x02013;<xref ref-type="fig" rid="F4">D</xref>.</p>


<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Cross-architecture performance of <monospace>FairLink</monospace> with different test architectures on four datasets. <bold>(A)</bold> <sans-serif>Pubmed</sans-serif>. <bold>(B)</bold> <sans-serif>DBLP</sans-serif>. <bold>(C)</bold> <sans-serif>Google&#x0002B;</sans-serif>. <bold>(D)</bold> <sans-serif>Facebook</sans-serif>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-07-1489306-g0004.tif"/>
</fig>


<p>Compared with the results in <xref ref-type="table" rid="T1">Table 1</xref>, <monospace>FairLink</monospace> improves fairness over <monospace>VGAE</monospace> and <monospace>Node2vec</monospace> for most model-dataset combinations, indicating that its gains carry over to different architectures and datasets. Overall, the fairness-enhanced graphs yield strong results on both the fairness metrics (&#x00394;<sub><italic>DP</italic></sub> and &#x00394;<sub><italic>EO</italic></sub>) and the utility metrics (F1-score and AUC) across all datasets. In particular, GraphSAGE performs best in fairness on all datasets and achieves the best utility on <sans-serif>Pubmed</sans-serif>, <sans-serif>DBLP</sans-serif>, and <sans-serif>Google&#x0002B;</sans-serif>.</p></sec>
<sec>
<title>8.4 Ablation study</title>
<p>To evaluate whether generating a synthetic graph is necessary for fair link prediction with architectures that have no built-in fairness design, we conduct an ablation study comparing <monospace>FairLink</monospace> with two variants. (1) <monospace>FairLink</monospace><italic>m</italic> is a model-centric variant that omits the synthetic graph and directly applies the dyadic fairness loss in <xref ref-type="disp-formula" rid="E5">Equation 5</xref> when training a link predictor. (2) <monospace>FairLink</monospace><italic>cos</italic> sets &#x003B3; &#x0003D; 0 so that only the cosine distance is used in the gradient matching of <xref ref-type="disp-formula" rid="E6">Equation 6</xref>. We evaluate both link prediction accuracy and fairness across several architectures that do not explicitly account for fairness.</p>
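<p>Setting &#x003B3; &#x0003D; 0 removes the Euclidean term. A cosine distance alone cannot distinguish two gradients that point in the same direction but differ in size. A small NumPy illustration with a hypothetical three-dimensional gradient:</p>
<preformat preformat-type="code">
```python
import numpy as np

g_real = np.array([0.5, -1.0, 2.0])  # hypothetical gradient on the original graph
g_syn = 3.0 * g_real                 # same direction, three times the magnitude

cos_dist = 1.0 - g_real @ g_syn / (np.linalg.norm(g_real) * np.linalg.norm(g_syn))
euc_dist = np.linalg.norm(g_real - g_syn)

print(f"{abs(cos_dist):.6f}")  # 0.000000: cosine distance sees no mismatch
print(f"{euc_dist:.6f}")       # 4.582576: Euclidean distance exposes the scale gap
```
</preformat>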
<p>For <monospace>FairLink</monospace>, we first train GraphSAGE to obtain a fairness-enhanced graph. This graph is then used to train architectures without any fairness design (GraphSAGE, GAT, and VGAE), and each trained link predictor is evaluated on the test set. For <monospace>FairLink</monospace><italic>m</italic>, which does not generate a synthetic graph, we add the fairness loss directly to the training of GraphSAGE. As with <monospace>FairLink</monospace>, no fairness loss is used when training the other architectures (GAT and VGAE), to keep the comparison fair.</p>
<p><bold>(1) Data-centric vs. model-centric debiasing:</bold> The &#x0201C;(GAT)&#x0201D; and &#x0201C;(VGAE)&#x0201D; columns of <xref ref-type="table" rid="T2">Table 2</xref> correspond to architectures without fairness-aware designs. They show that <monospace>FairLink</monospace> carries its fairness gains over to other architectures, whereas <monospace>FairLink</monospace><italic>m</italic> does not: its &#x00394;<sub><italic>DP</italic></sub> and &#x00394;<sub><italic>EO</italic></sub> are higher than those of <monospace>FairLink</monospace> in every such column. The &#x0201C;(Self)&#x0201D; columns show that directly adding the fairness loss when training a link predictor clearly degrades accuracy. For example, the F1-score on <sans-serif>DBLP</sans-serif> drops from 81.69 (<monospace>FairLink</monospace>) to 78.57 (<monospace>FairLink</monospace><italic>m</italic>). This is consistent with prior studies in which applying a fairness loss directly during the training of node classifiers caused a similar drop in performance (Qian et al., <xref ref-type="bibr" rid="B50">2024</xref>; Dong et al., <xref ref-type="bibr" rid="B13">2024</xref>). <monospace>FairLink</monospace> instead uses a gradient matching loss to preserve link prediction accuracy on the fairness-enhanced graph, which alleviates this trade-off. As a result, <monospace>FairLink</monospace> achieves higher accuracy in the &#x0201C;(Self)&#x0201D; setting, comparable or higher accuracy on the other architectures, and better generalization of fairness across architectures.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>An ablation study comparing the proposed <monospace>FairLink</monospace> with its variants <monospace>FairLink</monospace><italic>m</italic> and <monospace>FairLink</monospace><italic>cos</italic>.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919498;color:#ffffff">
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="left"><bold>Metric</bold></th>
<th valign="top" align="left"><bold><sans-serif>DBLP</sans-serif>(Self)</bold></th>
<th valign="top" align="left"><bold><sans-serif>DBLP</sans-serif>(GAT)</bold></th>
<th valign="top" align="left"><bold><sans-serif>DBLP</sans-serif>(VGAE)</bold></th>
<th valign="top" align="left"><bold><sans-serif>Google&#x0002B;</sans-serif>(Self)</bold></th>
<th valign="top" align="left"><bold><sans-serif>Google&#x0002B;</sans-serif>(GAT)</bold></th>
<th valign="top" align="left"><bold><sans-serif>Google&#x0002B;</sans-serif>(VGAE)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><monospace>FairLink</monospace> </td>
<td valign="top" align="left">F1 (&#x02191;)</td>
<td valign="top" align="left">81.69</td>
<td valign="top" align="left">79.31</td>
<td valign="top" align="left">80.28</td>
<td valign="top" align="left">85.34</td>
<td valign="top" align="left">83.62</td>
<td valign="top" align="left">83.94</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">AUC (&#x02191;)</td>
<td valign="top" align="left">88.72</td>
<td valign="top" align="left">85.22</td>
<td valign="top" align="left">86.19</td>
<td valign="top" align="left">94.42</td>
<td valign="top" align="left">92.20</td>
<td valign="top" align="left">92.65</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">&#x00394;<sub><italic>DP</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">1.32</td>
<td valign="top" align="left">3.29</td>
<td valign="top" align="left">7.42</td>
<td valign="top" align="left">1.42</td>
<td valign="top" align="left">5.29</td>
<td valign="top" align="left">3.42</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">&#x00394;<sub><italic>EO</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">2.19</td>
<td valign="top" align="left">4.13</td>
<td valign="top" align="left">5.16</td>
<td valign="top" align="left">1.01</td>
<td valign="top" align="left">4.13</td>
<td valign="top" align="left">4.16</td>
</tr> <tr>
<td valign="top" align="left"><monospace>FairLink</monospace><italic>m</italic></td>
<td valign="top" align="left">F1 (&#x02191;)</td>
<td valign="top" align="left">78.57</td>
<td valign="top" align="left">79.90</td>
<td valign="top" align="left">80.35</td>
<td valign="top" align="left">83.80</td>
<td valign="top" align="left">80.04</td>
<td valign="top" align="left">83.23</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">AUC (&#x02191;)</td>
<td valign="top" align="left">84.32</td>
<td valign="top" align="left">84.34</td>
<td valign="top" align="left">86.36</td>
<td valign="top" align="left">91.05</td>
<td valign="top" align="left">89.36</td>
<td valign="top" align="left">91.90</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">&#x00394;<sub><italic>DP</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">4.25</td>
<td valign="top" align="left">7.27</td>
<td valign="top" align="left">11.31</td>
<td valign="top" align="left">3.83</td>
<td valign="top" align="left">7.72</td>
<td valign="top" align="left">8.71</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">&#x00394;<sub><italic>EO</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">5.73</td>
<td valign="top" align="left">8.14</td>
<td valign="top" align="left">9.75</td>
<td valign="top" align="left">3.92</td>
<td valign="top" align="left">8.10</td>
<td valign="top" align="left">9.22</td>
</tr> <tr>
<td valign="top" align="left"><monospace>FairLink</monospace><italic>cos</italic></td>
<td valign="top" align="left">F1 (&#x02191;)</td>
<td valign="top" align="left">79.15</td>
<td valign="top" align="left">78.63</td>
<td valign="top" align="left">78.06</td>
<td valign="top" align="left">81.17</td>
<td valign="top" align="left">79.84</td>
<td valign="top" align="left">80.85</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">AUC (&#x02191;)</td>
<td valign="top" align="left">86.24</td>
<td valign="top" align="left">82.25</td>
<td valign="top" align="left">85.32</td>
<td valign="top" align="left">89.23</td>
<td valign="top" align="left">87.48</td>
<td valign="top" align="left">88.45</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">&#x00394;<sub><italic>DP</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">1.41</td>
<td valign="top" align="left">5.19</td>
<td valign="top" align="left">8.11</td>
<td valign="top" align="left">1.31</td>
<td valign="top" align="left">5.97</td>
<td valign="top" align="left">4.37</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">&#x00394;<sub><italic>EO</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">2.76</td>
<td valign="top" align="left">4.78</td>
<td valign="top" align="left">5.73</td>
<td valign="top" align="left">1.25</td>
<td valign="top" align="left">5.68</td>
<td valign="top" align="left">3.28</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The columns labeled &#x0201C;(Self)&#x0201D; indicate that both the training and testing architectures are GraphSAGE, with fairness constraints applied during training. In contrast, the columns labeled &#x0201C;(GAT)&#x0201D; or &#x0201C;(VGAE)&#x0201D; indicate that the test architectures are GAT or VGAE, respectively, and do not incorporate fairness-aware design. An upward arrow (&#x02191;) indicates that a higher value is better, while a downward arrow (&#x02193;) signifies the opposite.</p>
</table-wrap-foot>
</table-wrap>


<p><bold>(2) Euclidean &#x00026; cosine distance vs. cosine distance:</bold> Comparing <monospace>FairLink</monospace> with <monospace>FairLink</monospace><italic>cos</italic>, we observe lower F1-scores and AUCs in every column of <xref ref-type="table" rid="T2">Table 2</xref> when the Euclidean distance is excluded from gradient matching. The cosine distance compares only the directions of the gradients. This finding therefore highlights the importance of also matching gradient magnitudes when optimizing the fairness-enhanced graph. In conclusion, <monospace>FairLink</monospace>, which uses both the Euclidean and cosine distances, better preserves the information of the original graph in the learned graph.</p></sec></sec>
<sec id="s9">
<title>9 Further discussions</title>
<sec>
<title>9.1 Complexity analysis</title>
<p>Let <italic>L</italic> denote the number of MLP layers used for learning the adjacency matrix, and let <italic>d</italic> denote the number of hidden units. The complexity of <monospace>FairLink</monospace> consists of the following components. (1) Computing <italic>A</italic>&#x02032; costs <italic>O</italic>(<italic>N</italic><sup>2</sup><italic>d</italic><sup>2</sup>). (2) Computing the forward pass of the GNN on the full graph costs <italic>O</italic>(<italic>kLNd</italic><sup>2</sup>), where <italic>k</italic> is the number of sampled nodes per training instance. (3) Training on the fairness-enhanced graph costs <italic>O</italic>(<italic>LNd</italic>). (4) The gradient matching strategy, including the computation of the additional matching metrics, costs <italic>O</italic>(2|&#x003B8;|&#x0002B;|<italic>A</italic>&#x02032;|&#x0002B;|<italic>X</italic>&#x02032;|). With <italic>T</italic> iterations and <italic>M</italic> different initializations, these costs are multiplied by <italic>MT</italic>. (5) For link prediction, computing the dot products of node embeddings adds <italic>O</italic>(<italic>N</italic><sup>2</sup><italic>D</italic>), where <italic>D</italic> is the embedding dimension. The overall complexity of <monospace>FairLink</monospace> is the sum of these components.</p></sec>
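<p>Collecting the terms gives a compact expression for the overall complexity. This rests on one reading of the list in Section 9.1: items (1) to (4) are repeated over <italic>T</italic> iterations and <italic>M</italic> initializations, while item (5) is incurred once.</p>
<preformat preformat-type="code">
```latex
O\Big( MT \big( N^{2}d^{2} + kLNd^{2} + LNd + 2|\theta| + |A'| + |X'| \big) + N^{2}D \Big)
```
</preformat>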
<sec>
<title>9.2 A smaller size of the fairness-enhanced graph</title>
<p>To evaluate whether a smaller fairness-enhanced graph can be learned, we run <monospace>FairLink</monospace> with a synthetic graph initialized with 75% of the nodes of the input training graph. In this experiment, all hyperparameters of <monospace>FairLink</monospace> are set to the values used for the full graph on the <sans-serif>DBLP</sans-serif> dataset. We then compare a link predictor trained on the full fairness-enhanced graph with one trained on the smaller graph. We use two testing architectures: the architecture used to generate the graph (labeled &#x0201C;Self&#x0201D;) and a different architecture, GAT (labeled &#x0201C;GAT&#x0201D;).</p>
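<p>A minimal sketch of such an initialization is given below. The section does not state how the 75% of nodes are selected or how the structure of the smaller graph is initialized. This example therefore assumes uniform sampling without replacement and uses the induced subgraph of the sampled nodes. <monospace>init_smaller_graph</monospace> is a hypothetical helper and not part of a released <monospace>FairLink</monospace> API.</p>

```python
import numpy as np


def init_smaller_graph(adj, features, ratio=0.75, seed=0):
    """Initialize a smaller synthetic graph from a random subset of training nodes.

    Hypothetical helper: samples round(ratio * N) nodes uniformly without replacement and
    returns the induced adjacency submatrix, the matching feature rows, and the kept indices.
    """
    rng = np.random.default_rng(seed)
    num_nodes = adj.shape[0]
    num_keep = int(round(ratio * num_nodes))
    keep = np.sort(rng.choice(num_nodes, size=num_keep, replace=False))
    adj_small = adj[np.ix_(keep, keep)]  # keep the rows and columns of the sampled nodes
    features_small = features[keep]
    return adj_small, features_small, keep


# Toy training graph: an 8-node ring with 3-dimensional node features.
num_nodes = 8
adj = np.zeros((num_nodes, num_nodes))
for u in range(num_nodes):
    v = (u + 1) % num_nodes
    adj[u, v] = adj[v, u] = 1.0
features = np.random.default_rng(1).normal(size=(num_nodes, 3))

adj_small, features_small, keep = init_smaller_graph(adj, features)
```

<p>Sampling without replacement keeps the node set free of duplicates. Taking rows and columns with <monospace>np.ix_</monospace> preserves the symmetry of an undirected adjacency matrix.</p>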
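<p>As shown in <xref ref-type="table" rid="T3">Table 3</xref>, reducing the fairness-enhanced graph to 75% of the nodes costs at most about one F1 point (81.69 to 80.67 for &#x0201C;Self&#x0201D; and 79.31 to 78.52 for GAT). Meanwhile, &#x00394;<sub><italic>DP</italic></sub> and &#x00394;<sub><italic>EO</italic></sub> increase moderately; for example, &#x00394;<sub><italic>DP</italic></sub> rises from 1.32 to 1.63 for &#x0201C;Self&#x0201D; and from 3.29 to 4.37 for GAT. Thus, <monospace>FairLink</monospace> can learn a more compact fairness-enhanced graph on <sans-serif>DBLP</sans-serif> that still transfers to a different architecture, which suggests that it could scale to larger graphs.</p>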
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Performance of <monospace>FairLink</monospace> on <sans-serif>DBLP</sans-serif> when the fairness-enhanced graph is initialized with all nodes (Full) or with 75% of the nodes (75%) of the training graph.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919498;color:#ffffff">
<th valign="top" align="left"><bold>Metric</bold></th>
<th valign="top" align="left"><bold><sans-serif>DBLP</sans-serif>(Full, Self)</bold></th>
<th valign="top" align="left"><bold><sans-serif>DBLP</sans-serif>(Full, GAT)</bold></th>
<th valign="top" align="left"><bold><sans-serif>DBLP</sans-serif>(75<italic>%</italic>, Self)</bold></th>
<th valign="top" align="left"><bold><sans-serif>DBLP</sans-serif>(75<italic>%</italic>, GAT)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">F1 (&#x02191;)</td>
<td valign="top" align="left">81.69</td>
<td valign="top" align="left">79.31</td>
<td valign="top" align="left">80.67</td>
<td valign="top" align="left">78.52</td>
</tr> <tr>
<td valign="top" align="left">AUC (&#x02191;)</td>
<td valign="top" align="left">88.72</td>
<td valign="top" align="left">85.22</td>
<td valign="top" align="left">88.02</td>
<td valign="top" align="left">83.29</td>
</tr> <tr>
<td valign="top" align="left">&#x00394;<sub><italic>DP</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">1.32</td>
<td valign="top" align="left">3.29</td>
<td valign="top" align="left">1.63</td>
<td valign="top" align="left">4.37</td>
</tr> <tr>
<td valign="top" align="left">&#x00394;<sub><italic>EO</italic></sub> (&#x02193;)</td>
<td valign="top" align="left">2.19</td>
<td valign="top" align="left">4.13</td>
<td valign="top" align="left">2.75</td>
<td valign="top" align="left">5.33</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The &#x0201C;Full&#x0201D; and &#x0201C;75%&#x0201D; columns correspond to fairness-enhanced graphs initialized with all nodes and with 75% of the nodes of the <sans-serif>DBLP</sans-serif> training graph, respectively. The columns labeled &#x0201C;Self&#x0201D; indicate that both the training and testing architectures are GraphSAGE, with fairness constraints applied during training. The columns labeled &#x0201C;GAT&#x0201D; indicate that the testing architecture is GAT, which does not incorporate any fairness-aware design. An upward arrow (&#x02191;) indicates that higher values are better, and a downward arrow (&#x02193;) indicates that lower values are better.</p>
</table-wrap-foot>
</table-wrap>

</sec>
<sec>
<title>9.3 Choice of <monospace>FairLink</monospace> architecture</title>
<p>To select the architecture used within <monospace>FairLink</monospace>, we conduct a cross-architecture experiment with different graph generation models. Specifically, we use one architecture to generate the fairness-enhanced graph and another to train a link predictor on the generated graph. We then evaluate the trained model on the test set.</p>
<p>As shown in <xref ref-type="table" rid="T4">Table 4A</xref>, for each testing architecture, the highest F1 score is achieved when the same architecture is used for both generation and testing. GraphSAGE and VGAE reach similar accuracy in this matched setting (82.3 and 81.8), but they differ in generalization: fairness-enhanced graphs generated by GraphSAGE transfer better to the other architectures (GCN, GAT, and VGAE). Furthermore, GCN and GraphSAGE achieve comparable fairness in the matched setting (&#x00394;<sub><italic>DP</italic></sub> of 1.21 and 1.32; <xref ref-type="table" rid="T4">Table 4B</xref>). However, graphs generated by GraphSAGE yield a lower &#x00394;<sub><italic>DP</italic></sub> than those generated by GCN when tested on GAT (3.29 vs. 3.52) and on VGAE (3.87 vs. 4.65), giving GraphSAGE a slight advantage in generalization.</p>
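<p>One way to quantify the generalization claim is to average each row of <xref ref-type="table" rid="T4">Table 4</xref> over its off-diagonal entries, that is, over the testing architectures that differ from the generation architecture. The short NumPy script below does this using the values reported in the table. The summary statistic, an unweighted mean over the other three architectures, is an illustrative choice.</p>

```python
import numpy as np

ARCHS = ["GCN", "GAT", "SAGE", "VGAE"]
# Rows: generation architecture; columns: testing architecture (values copied from Table 4).
F1 = np.array([
    [79.3, 77.4, 76.2, 78.4],
    [76.0, 79.5, 79.3, 77.8],
    [78.9, 79.3, 82.3, 80.3],
    [77.6, 78.6, 78.2, 81.8],
])
DELTA_DP = np.array([
    [1.21, 3.52, 1.79, 4.65],
    [3.19, 2.40, 2.26, 4.15],
    [2.12, 3.29, 1.32, 3.87],
    [2.71, 3.65, 3.25, 3.09],
])


def transfer_mean(table):
    """Mean of each row over its off-diagonal entries (testing on a different architecture)."""
    off_diag = ~np.eye(table.shape[0], dtype=bool)
    return np.array([row[mask].mean() for row, mask in zip(table, off_diag)])


f1_transfer = transfer_mean(F1)
dp_transfer = transfer_mean(DELTA_DP)
for name, f1, dp in zip(ARCHS, f1_transfer, dp_transfer):
    print(f"{name:5s} mean transfer F1 = {f1:.2f}, mean transfer Delta_DP = {dp:.2f}")
```

<p>On these values, GraphSAGE-generated graphs obtain the highest mean transfer F1 (79.50, compared with 77.33 for GCN, 77.70 for GAT, and 78.13 for VGAE). They also obtain the lowest mean transfer &#x00394;<sub><italic>DP</italic></sub> (3.09, compared with 3.32, 3.20, and 3.20). Both results are consistent with the discussion above.</p>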


<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Cross-architecture experiment results on various generation and testing architectures.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919498;color:#ffffff">
<th valign="top" align="left" colspan="5"><bold>(A) DBLP, F1</bold></th>
</tr>
</thead>
<tbody>
<tr style="background-color:#919498;color:#ffffff">
<td valign="top" align="left"><bold>Gen</bold>\<bold>Te</bold></td>
<td valign="top" align="left"><bold>GCN</bold></td>
<td valign="top" align="left"><bold>GAT</bold></td>
<td valign="top" align="left"><bold>SAGE</bold></td>
<td valign="top" align="left"><bold>VGAE</bold></td>
</tr> <tr>
<td valign="top" align="left">GCN</td>
<td valign="top" align="left"><bold>79.3</bold></td>
<td valign="top" align="left">77.4</td>
<td valign="top" align="left">76.2</td>
<td valign="top" align="left">78.4</td>
</tr> <tr>
<td valign="top" align="left">GAT</td>
<td valign="top" align="left">76.0</td>
<td valign="top" align="left"><bold>79.5</bold></td>
<td valign="top" align="left">79.3</td>
<td valign="top" align="left">77.8</td>
</tr> <tr>
<td valign="top" align="left">SAGE</td>
<td valign="top" align="left">78.9</td>
<td valign="top" align="left">79.3</td>
<td valign="top" align="left"><bold>82.3</bold></td>
<td valign="top" align="left">80.3</td>
</tr> <tr>
<td valign="top" align="left">VGAE</td>
<td valign="top" align="left">77.6</td>
<td valign="top" align="left">78.6</td>
<td valign="top" align="left">78.2</td>
<td valign="top" align="left"><bold>81.8</bold></td>
</tr> <tr style="background-color:#dee1e1;color:#ffffff">
<td valign="top" align="left" colspan="5"><bold>(B) DBLP</bold>, &#x00394;<sub><italic>DP</italic></sub></td>
</tr> <tr>
<td valign="top" align="left"><bold>Gen</bold>\<bold>Te</bold></td>
<td valign="top" align="left"><bold>GCN</bold></td>
<td valign="top" align="left"><bold>GAT</bold></td>
<td valign="top" align="left"><bold>SAGE</bold></td>
<td valign="top" align="left"><bold>VGAE</bold></td>
</tr> <tr>
<td valign="top" align="left">GCN</td>
<td valign="top" align="left"><bold>1.21</bold></td>
<td valign="top" align="left">3.52</td>
<td valign="top" align="left">1.79</td>
<td valign="top" align="left">4.65</td>
</tr> <tr>
<td valign="top" align="left">GAT</td>
<td valign="top" align="left">3.19</td>
<td valign="top" align="left"><bold>2.40</bold></td>
<td valign="top" align="left">2.26</td>
<td valign="top" align="left">4.15</td>
</tr> <tr>
<td valign="top" align="left">SAGE</td>
<td valign="top" align="left">2.12</td>
<td valign="top" align="left">3.29</td>
<td valign="top" align="left"><bold>1.32</bold></td>
<td valign="top" align="left">3.87</td>
</tr> <tr>
<td valign="top" align="left">VGAE</td>
<td valign="top" align="left">2.71</td>
<td valign="top" align="left">3.65</td>
<td valign="top" align="left">3.25</td>
<td valign="top" align="left"><bold>3.09</bold></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>We report the F1 score and &#x00394;<sub><italic>DP</italic></sub> on the <sans-serif>DBLP</sans-serif> dataset. &#x0201C;SAGE&#x0201D; refers to GraphSAGE, &#x0201C;Gen&#x0201D; to the generation architecture (rows), and &#x0201C;Te&#x0201D; to the testing architecture (columns). For each metric and testing architecture, the best result is highlighted in bold.</p>
</table-wrap-foot>
</table-wrap>

</sec></sec>
<sec id="s10">
<title>10 Related work</title>
<sec>
<title>10.1 Fairness in machine learning</title>
<p>In recent years, numerous fairness definitions have been proposed in machine learning. They generally fall into three categories. (1) <italic>Group fairness</italic> aims to ensure that certain statistical measures are approximately equal across protected groups (e.g., racial or gender groups) (Feldman et al., <xref ref-type="bibr" rid="B15">2015</xref>; Hardt et al., <xref ref-type="bibr" rid="B22">2016</xref>). (2) <italic>Individual fairness</italic> (Dwork et al., <xref ref-type="bibr" rid="B14">2012</xref>; Kang et al., <xref ref-type="bibr" rid="B29">2020</xref>; Dong et al., <xref ref-type="bibr" rid="B11">2021</xref>, <xref ref-type="bibr" rid="B12">2023</xref>) requires that similar individuals be treated similarly. Unlike group fairness, individual fairness does not take sensitive features into account. Instead, it emphasizes fairness at the level of individuals, such as each node in graph data. (3) <italic>Counterfactual fairness</italic> (Kusner et al., <xref ref-type="bibr" rid="B34">2017</xref>; Ma et al., <xref ref-type="bibr" rid="B44">2022</xref>; Zuo et al., <xref ref-type="bibr" rid="B70">2022</xref>) considers whether decisions would hold under alternative scenarios. Here, the &#x0201C;counterfactuals&#x0201D; of an individual are versions of that individual whose sensitive features have been changed to other values. Counterfactual fairness is achieved when the prediction for the individual is identical to the predictions for all of its counterfactuals.</p>
<p>In our experiments, we adopt two widely used definitions of group fairness: demographic parity and equal opportunity. Demographic parity (Feldman et al., <xref ref-type="bibr" rid="B15">2015</xref>) requires that members of different protected groups are assigned to the positive class at the same rate, so that the distribution of protected attributes in the positive class reflects that of the overall population. In contrast, equal opportunity (Hardt et al., <xref ref-type="bibr" rid="B22">2016</xref>) focuses on the model&#x00027;s performance rather than only on its outcomes. It requires equal true positive rates across protected groups, so that the model performs consistently for all groups.</p>
<p>Methodologically, existing bias mitigation techniques in machine learning fall into three broad approaches: (1) <italic>Pre-processing</italic>, where bias is mitigated at the data level before training begins (Calders et al., <xref ref-type="bibr" rid="B5">2009</xref>; Kamiran and Calders, <xref ref-type="bibr" rid="B28">2009</xref>; Feldman et al., <xref ref-type="bibr" rid="B15">2015</xref>); (2) <italic>In-processing</italic>, where the model itself is modified by incorporating fairness constraints during training (Zafar et al., <xref ref-type="bibr" rid="B59">2017</xref>; Goh et al., <xref ref-type="bibr" rid="B17">2016</xref>); and (3) <italic>Post-processing</italic>, where the outputs of a trained model are adjusted to achieve fairness across groups (Hardt et al., <xref ref-type="bibr" rid="B22">2016</xref>).</p></sec>
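<p>Both group fairness criteria can be computed directly from binary predictions. The NumPy sketch below implements the two gaps for a binary group label. For link prediction, it assumes the dyadic grouping described in Section 10.3, where each candidate link is labeled by whether its two endpoints share the sensitive attribute. <monospace>dyadic_group</monospace> and the toy data are hypothetical. The sketch is not necessarily the evaluation code behind the &#x00394;<sub><italic>DP</italic></sub> and &#x00394;<sub><italic>EO</italic></sub> values reported in this article.</p>

```python
import numpy as np


def demographic_parity_gap(y_pred, group):
    """|P(y_hat = 1 | group 0) - P(y_hat = 1 | group 1)| for binary predictions."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())


def equal_opportunity_gap(y_pred, y_true, group):
    """|TPR(group 0) - TPR(group 1)|: positive-prediction rate among true positives only."""
    y_pred, y_true, group = np.asarray(y_pred), np.asarray(y_true), np.asarray(group)
    tpr = [y_pred[np.logical_and(y_true == 1, group == g)].mean() for g in (0, 1)]
    return abs(tpr[0] - tpr[1])


def dyadic_group(sensitive, links):
    """Hypothetical helper: 1 if both endpoints of a link share the sensitive value, else 0."""
    sensitive, links = np.asarray(sensitive), np.asarray(links)
    return (sensitive[links[:, 0]] == sensitive[links[:, 1]]).astype(int)


# Toy link prediction example with a binary sensitive attribute on 4 nodes.
sensitive = [0, 0, 1, 1]
links = [[0, 1], [2, 3], [0, 2], [1, 3], [0, 3], [1, 2]]
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
group = dyadic_group(sensitive, links)  # intra-group links get 1, inter-group links get 0
dp_gap = demographic_parity_gap(y_pred, group)
eo_gap = equal_opportunity_gap(y_pred, y_true, group)
```

<p>In this example, intra-group links receive a positive prediction at rate 1.0 and inter-group links at rate 0.25, so the demographic parity gap is 0.75. Among true links only, the rates are 1.0 and 0.5, so the equal opportunity gap is 0.5.</p>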
<sec>
<title>10.2 Link prediction</title>
<p>Link prediction is the task of inferring new or previously unknown relationships within a network. It is a well-studied problem in network analysis, and various algorithms have been developed over the past two decades (Liben-Nowell and Kleinberg, <xref ref-type="bibr" rid="B39">2003</xref>; Al Hasan et al., <xref ref-type="bibr" rid="B1">2006</xref>; Hasan and Zaki, <xref ref-type="bibr" rid="B23">2011</xref>). <italic>Heuristic methods</italic> define a score based on the graph structure that indicates the likelihood of a link&#x00027;s existence (Liben-Nowell and Kleinberg, <xref ref-type="bibr" rid="B39">2003</xref>; Newman, <xref ref-type="bibr" rid="B49">2001</xref>; Zhou et al., <xref ref-type="bibr" rid="B68">2009</xref>). Their primary advantage is simplicity, and most of them require no training. <italic>Graph embedding methods</italic> learn low-dimensional node embeddings, typically trained to capture the structural properties of the graph, and use them to predict the likelihood of links between node pairs (Grover and Leskovec, <xref ref-type="bibr" rid="B18">2016</xref>; Menon and Elkan, <xref ref-type="bibr" rid="B47">2011</xref>). <italic>Deep neural network methods</italic> have become the state of the art for link prediction in recent years (Kipf and Welling, <xref ref-type="bibr" rid="B32">2016a</xref>,<xref ref-type="bibr" rid="B33">b</xref>; Hamilton et al., <xref ref-type="bibr" rid="B20">2017</xref>; Velickovic et al., <xref ref-type="bibr" rid="B55">2018</xref>). This category includes GNNs, which exploit multi-hop graph structure through the message-passing paradigm.
GNNs augmented with auxiliary information, such as pairwise information, have also been proposed. These methods incorporate additional data to better capture the relationships between nodes (Zhang M. et al., <xref ref-type="bibr" rid="B64">2021</xref>; Zhu et al., <xref ref-type="bibr" rid="B69">2021</xref>; Wang et al., <xref ref-type="bibr" rid="B56">2022</xref>).</p></sec>
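<p>The sketch below illustrates heuristic scoring with two classic training-free heuristics, common neighbors and Adamic-Adar (both covered by Liben-Nowell and Kleinberg, <xref ref-type="bibr" rid="B39">2003</xref>), computed from a dense adjacency matrix. It also includes a sigmoid inner-product score, one common way for embedding-based predictors to turn node embeddings into link probabilities. The helper names are hypothetical. Dense matrices are used only for readability; large graphs would call for sparse operations.</p>

```python
import numpy as np


def common_neighbors(adj):
    """CN(u, v) = number of shared neighbors = (A @ A)[u, v] for a 0/1 adjacency matrix."""
    return adj @ adj


def adamic_adar(adj):
    """AA(u, v) = sum over shared neighbors w of 1 / log(deg(w)).

    A node with degree below 2 cannot be a shared neighbor of two distinct nodes,
    so it gets weight 0 (this also avoids dividing by log(1) = 0).
    """
    deg = adj.sum(axis=1)
    weights = np.zeros(len(deg))
    mask = deg > 1
    weights[mask] = 1.0 / np.log(deg[mask])
    return adj @ np.diag(weights) @ adj


def embedding_scores(z):
    """Embedding-based score: sigmoid of the dot product between node embeddings."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
cn = common_neighbors(adj)
aa = adamic_adar(adj)
probs = embedding_scores(np.eye(2))  # two orthogonal toy embeddings
```

<p>For nodes 0 and 3, the only shared neighbor is node 2, which has degree 3. The common-neighbor count is therefore 1, and the Adamic-Adar score is 1/log 3 (about 0.91). Neither score involves any training, which is the simplicity advantage noted above.</p>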
<sec>
<title>10.3 Fair link prediction</title>
<p>With the success of GNNs, fairness in graph representation learning has received increasing attention (Dai et al., <xref ref-type="bibr" rid="B10">2022</xref>). Some works focus on learning fair node embeddings, which are subsequently used for link prediction (Bose and Hamilton, <xref ref-type="bibr" rid="B3">2019</xref>; Buyl and De Bie, <xref ref-type="bibr" rid="B4">2020</xref>; Cui et al., <xref ref-type="bibr" rid="B8">2018</xref>). Others directly target the task of fair link prediction (Masrour et al., <xref ref-type="bibr" rid="B46">2020</xref>; Li et al., <xref ref-type="bibr" rid="B38">2021</xref>). Specifically, <italic>dyadic fairness</italic> has been proposed for link prediction. It requires the prediction to be independent of whether the two vertices of a link share the same sensitive attribute (Li et al., <xref ref-type="bibr" rid="B38">2021</xref>). To achieve dyadic fairness, the same authors proposed <monospace>FairAdj</monospace> (Li et al., <xref ref-type="bibr" rid="B38">2021</xref>), which uses a variational graph auto-encoder (Kipf and Welling, <xref ref-type="bibr" rid="B33">2016b</xref>) to learn the graph structure and adds a dyadic loss regularizer to enforce fairness. FairPageRank (<monospace>FairPR</monospace>) (Tsioutsiouliklis et al., <xref ref-type="bibr" rid="B54">2021</xref>) is a fairness-sensitive variant of the PageRank algorithm that modifies the jump vector to ensure fairness both globally and locally. Its locally fair variant guarantees that each node individually behaves fairly. <italic>DeBayes</italic> (Buyl and De Bie, <xref ref-type="bibr" rid="B4">2020</xref>) adopts a Bayesian approach to model the structural properties of the graph and learns debiased embeddings by using a biased prior within conditional network embeddings.
Meanwhile, <monospace>Fairwalk</monospace> (Rahman et al., <xref ref-type="bibr" rid="B51">2019</xref>) adapts <monospace>Node2vec</monospace> (Grover and Leskovec, <xref ref-type="bibr" rid="B18">2016</xref>) to enhance fairness in node embeddings. It adjusts the transition probabilities of its random walks, weighting the neighbors of each node according to their sensitive attributes. <monospace>FLIP</monospace> (Masrour et al., <xref ref-type="bibr" rid="B46">2020</xref>) tackles graph structural debiasing by reducing homophily (the tendency of similar nodes to connect) in the graph. It assesses fairness by the resulting reduction in modularity, which measures the strength of the division of a graph into modules. Finally, <monospace>FairEGM</monospace> (Current et al., <xref ref-type="bibr" rid="B9">2022</xref>) is a collection of three methods that emulate the effects of a variety of graph modifications to improve graph fairness.</p></sec></sec>
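<p>The modularity criterion used to assess <monospace>FLIP</monospace> can be illustrated with Newman&#x00027;s modularity, computed with respect to a partition of the nodes by sensitive attribute. How the modules are defined is an assumption here. The NumPy sketch below is not the <monospace>FLIP</monospace> algorithm. It only shows that adding a cross-group link, which reduces homophily, lowers the modularity of the sensitive-attribute partition. The helper names are hypothetical.</p>

```python
import numpy as np


def modularity(adj, groups):
    """Newman modularity Q of an undirected graph w.r.t. a node partition.

    Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * [g_i == g_j]
    """
    adj = np.asarray(adj, dtype=float)
    groups = np.asarray(groups)
    deg = adj.sum(axis=1)
    two_m = deg.sum()
    same_group = groups[:, None] == groups[None, :]
    residual = adj - np.outer(deg, deg) / two_m
    return float((residual * same_group).sum() / two_m)


def add_edge(adj, u, v):
    """Return a copy of adj with the undirected edge (u, v) added."""
    out = np.array(adj, dtype=float)
    out[u, v] = out[v, u] = 1.0
    return out


# Two triangles, one per sensitive group, joined by a single cross-group edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj = add_edge(adj, u, v)
sensitive = [0, 0, 0, 1, 1, 1]

q_before = modularity(adj, sensitive)                 # 5/14, about 0.357
q_after = modularity(add_edge(adj, 0, 4), sensitive)  # 0.25 after one more cross-group link
```

<p>Adding the cross-group link (0, 4) lowers the modularity of the sensitive-attribute partition from 5/14 (about 0.357) to 0.25. This is the kind of reduction used as the fairness signal above.</p>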
<sec sec-type="conclusions" id="s11">
<title>11 Conclusion</title>
<p>We study fairness in link prediction. Existing methods primarily integrate debiasing techniques into training to learn unbiased graph embeddings. However, these methods complicate the training process, especially on large-scale graphs. Moreover, they are model-specific, so the debiasing approach must be redesigned whenever the model changes. To address these challenges, we propose <monospace>FairLink</monospace>, a data-centric debiasing method that enhances fairness in link prediction without modifying the training procedure of the link prediction model. <monospace>FairLink</monospace> optimizes both fairness and utility by learning a fairness-enhanced graph. It minimizes the difference between the training trajectories on the fairness-enhanced graph and on the input graph, while incorporating a fairness loss into the training of the fairness-enhanced graph. Extensive experiments on benchmark datasets demonstrate the effectiveness of <monospace>FairLink</monospace> and its ability to generalize across different GNN architectures.</p></sec>
</body>
<back>
<sec sec-type="data-availability" id="s12">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec sec-type="author-contributions" id="s13">
<title>Author contributions</title>
<p>YL: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing &#x02013; original draft, Writing &#x02013; review &#x00026; editing. HC: Writing &#x02013; review &#x00026; editing. MI: Funding acquisition, Project administration, Resources, Supervision, Visualization, Writing &#x02013; review &#x00026; editing.</p>
</sec>
<sec sec-type="funding-information" id="s14">
<title>Funding</title>
<p>The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported in part by the DARPA Young Faculty Award, the US Army Research Office, the National Science Foundation (NSF) under Grants &#x00023;2127780, &#x00023;2319198, &#x00023;2321840, &#x00023;2312517, and &#x00023;2235472, the Semiconductor Research Corporation (SRC), the Office of Naval Research through the Young Investigator Program Award, and Grants &#x00023;N00014-21-1-2225 and &#x00023;N00014-22-1-2067. Additionally, support was provided by the Air Force Office of Scientific Research under Award &#x00023;FA9550-22-1-0253, along with generous gifts from Xilinx and Cisco.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s15">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="fn0001"><p><sup>1</sup><sans-serif>Pubmed</sans-serif>: <ext-link ext-link-type="uri" xlink:href="https://linqs.org/datasets/">https://linqs.org/datasets/</ext-link>.</p></fn>
<fn id="fn0002"><p><sup>2</sup><sans-serif>DBLP</sans-serif>: <ext-link ext-link-type="uri" xlink:href="https://dblp.dagstuhl.de/xml/">https://dblp.dagstuhl.de/xml/</ext-link>.</p></fn>
<fn id="fn0003"><p><sup>3</sup><sans-serif>Google&#x0002B;</sans-serif>: <ext-link ext-link-type="uri" xlink:href="https://snap.stanford.edu/data/ego-Gplus.html">https://snap.stanford.edu/data/ego-Gplus.html</ext-link>.</p></fn>
<fn id="fn0004"><p><sup>4</sup><sans-serif>Facebook</sans-serif>: <ext-link ext-link-type="uri" xlink:href="https://snap.stanford.edu/data/ego-Facebook.html">https://snap.stanford.edu/data/ego-Facebook.html</ext-link>.</p></fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Al Hasan</surname> <given-names>M.</given-names></name> <name><surname>Chaoji</surname> <given-names>V.</given-names></name> <name><surname>Salem</surname> <given-names>S.</given-names></name> <name><surname>Zaki</surname> <given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>&#x0201C;Link prediction using supervised learning,&#x0201D;</article-title> in <source>SDM06: workshop on link analysis, counter-terrorism and security</source>, <fpage>798</fpage>&#x02013;<lpage>805</lpage>.</citation>
</ref>
<ref id="B2">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Angwin</surname> <given-names>J.</given-names></name> <name><surname>Larson</surname> <given-names>J.</given-names></name> <name><surname>Mattu</surname> <given-names>S.</given-names></name> <name><surname>Kirchner</surname> <given-names>L.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Machine bias,&#x0201D;</article-title> in <source>Ethics of Data and Analytics</source> (<publisher-loc>Auerbach Publications</publisher-loc>), <fpage>254</fpage>&#x02013;<lpage>264</lpage>. <pub-id pub-id-type="doi">10.1201/9781003278290-37</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bose</surname> <given-names>A.</given-names></name> <name><surname>Hamilton</surname> <given-names>W.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Compositional fairness constraints for graph embeddings,&#x0201D;</article-title> in <source>ICML</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>715</fpage>&#x02013;<lpage>724</lpage>.</citation>
</ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Buyl</surname> <given-names>M.</given-names></name> <name><surname>De Bie</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Debayes: a bayesian method for debiasing network embeddings,&#x0201D;</article-title> in <source>ICML</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>1220</fpage>&#x02013;<lpage>1229</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Calders</surname> <given-names>T.</given-names></name> <name><surname>Kamiran</surname> <given-names>F.</given-names></name> <name><surname>Pechenizkiy</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Building classifiers with independency constraints,&#x0201D;</article-title> in <source>2009 IEEE International Conference on Data Mining Workshops</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>13</fpage>&#x02013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1109/ICDMW.2009.83</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chuang</surname> <given-names>C.-Y.</given-names></name> <name><surname>Mroueh</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <article-title>Fair mixup: fairness via interpolation</article-title>. <source>arXiv preprint arXiv:2103.06503</source>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clifton</surname> <given-names>S. M.</given-names></name> <name><surname>Hill</surname> <given-names>K.</given-names></name> <name><surname>Karamchandani</surname> <given-names>A. J.</given-names></name> <name><surname>Autry</surname> <given-names>E. A.</given-names></name> <name><surname>McMahon</surname> <given-names>P.</given-names></name> <name><surname>Sun</surname> <given-names>G.</given-names></name></person-group> (<year>2019</year>). <article-title>Mathematical model of gender bias and homophily in professional hierarchies</article-title>. <source>Chaos</source> <volume>29</volume>:<fpage>450</fpage>. <pub-id pub-id-type="doi">10.1063/1.5066450</pub-id><pub-id pub-id-type="pmid">30823713</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cui</surname> <given-names>P.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Pei</surname> <given-names>J.</given-names></name> <name><surname>Zhu</surname> <given-names>W.</given-names></name></person-group> (<year>2018</year>). <article-title>A survey on network embedding</article-title>. <source>IEEE Trans. Knowl. Data Eng</source>. <volume>31</volume>, <fpage>833</fpage>&#x02013;<lpage>852</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2018.2849727</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Current</surname> <given-names>S.</given-names></name> <name><surname>He</surname> <given-names>Y.</given-names></name> <name><surname>Gurukar</surname> <given-names>S.</given-names></name> <name><surname>Parthasarathy</surname> <given-names>S.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Fairegm: fair link prediction and recommendation via emulated graph modification,&#x0201D;</article-title> in <source>Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization</source>, 1&#x02013;14. <pub-id pub-id-type="doi">10.1145/3551624.3555287</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>E.</given-names></name> <name><surname>Zhao</surname> <given-names>T.</given-names></name> <name><surname>Zhu</surname> <given-names>H.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Guo</surname> <given-names>Z.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>A comprehensive survey on trustworthy graph neural networks: privacy, robustness, fairness, and explainability</article-title>. <source>arXiv preprint arXiv:2204.08570</source>.</citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>Y.</given-names></name> <name><surname>Kang</surname> <given-names>J.</given-names></name> <name><surname>Tong</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Individual fairness for graph neural networks: a ranking based approach,&#x0201D;</article-title> in <source>Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &#x00026;Data Mining</source>, 300&#x02013;310. <pub-id pub-id-type="doi">10.1145/3447548.3467266</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>Y.</given-names></name> <name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Chen</surname> <given-names>C.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name></person-group> (<year>2023</year>). <article-title>Fairness in graph mining: a survey</article-title>. <source>IEEE Trans. Knowl. Data Eng</source>. <volume>35</volume>, <fpage>10583</fpage>&#x02013;<lpage>10602</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2023.3265598</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Lei</surname> <given-names>Z.</given-names></name> <name><surname>Zheng</surname> <given-names>Z.</given-names></name> <name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>A benchmark for fairness-aware graph learning</article-title>. <source>arXiv preprint arXiv:2407.12112</source>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dwork</surname> <given-names>C.</given-names></name> <name><surname>Hardt</surname> <given-names>M.</given-names></name> <name><surname>Pitassi</surname> <given-names>T.</given-names></name> <name><surname>Reingold</surname> <given-names>O.</given-names></name> <name><surname>Zemel</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Fairness through awareness,&#x0201D;</article-title> in <source>Proceedings of the 3rd Innovations in Theoretical Computer Science Conference</source>, 214&#x02013;226. <pub-id pub-id-type="doi">10.1145/2090236.2090255</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Feldman</surname> <given-names>M.</given-names></name> <name><surname>Friedler</surname> <given-names>S. A.</given-names></name> <name><surname>Moeller</surname> <given-names>J.</given-names></name> <name><surname>Scheidegger</surname> <given-names>C.</given-names></name> <name><surname>Venkatasubramanian</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Certifying and removing disparate impact,&#x0201D;</article-title> in <source>KDD</source>, 259&#x02013;268. <pub-id pub-id-type="doi">10.1145/2783258.2783311</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ferludin</surname> <given-names>O.</given-names></name> <name><surname>Eigenwillig</surname> <given-names>A.</given-names></name> <name><surname>Blais</surname> <given-names>M.</given-names></name> <name><surname>Zelle</surname> <given-names>D.</given-names></name> <name><surname>Pfeifer</surname> <given-names>J.</given-names></name> <name><surname>Sanchez-Gonzalez</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>TF-GNN: graph neural networks in tensorflow</article-title>. <source>arXiv preprint arXiv:2207.03522</source>.</citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goh</surname> <given-names>G.</given-names></name> <name><surname>Cotter</surname> <given-names>A.</given-names></name> <name><surname>Gupta</surname> <given-names>M.</given-names></name> <name><surname>Friedlander</surname> <given-names>M. P.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Satisfying real-world goals with dataset constraints,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source>, 29.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grover</surname> <given-names>A.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;node2vec: scalable feature learning for networks,&#x0201D;</article-title> in <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, 855&#x02013;864. <pub-id pub-id-type="doi">10.1145/2939672.2939754</pub-id><pub-id pub-id-type="pmid">27853626</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gurukar</surname> <given-names>S.</given-names></name> <name><surname>Vijayan</surname> <given-names>P.</given-names></name> <name><surname>Srinivasan</surname> <given-names>A.</given-names></name> <name><surname>Bajaj</surname> <given-names>G.</given-names></name> <name><surname>Cai</surname> <given-names>C.</given-names></name> <name><surname>Keymanesh</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Network representation learning: consolidation and renewed bearing</article-title>. <source>arXiv preprint arXiv:1905.00987</source>.</citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hamilton</surname> <given-names>W.</given-names></name> <name><surname>Ying</surname> <given-names>Z.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Inductive representation learning on large graphs,&#x0201D;</article-title> in <source>NeurIPS</source>, 30.</citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>X.</given-names></name> <name><surname>Zhao</surname> <given-names>T.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name> <name><surname>Shah</surname> <given-names>N.</given-names></name></person-group> (<year>2022</year>). <article-title>MLPInit: embarrassingly simple GNN training acceleration with MLP initialization</article-title>. <source>arXiv preprint arXiv:2210.00102</source>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hardt</surname> <given-names>M.</given-names></name> <name><surname>Price</surname> <given-names>E.</given-names></name> <name><surname>Srebro</surname> <given-names>N.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Equality of opportunity in supervised learning,&#x0201D;</article-title> in <source>NeurIPS</source>, <fpage>3315</fpage>&#x02013;<lpage>3323</lpage>.</citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hasan</surname> <given-names>M. A.</given-names></name> <name><surname>Zaki</surname> <given-names>M. J.</given-names></name></person-group> (<year>2011</year>). <article-title>&#x0201C;A survey of link prediction in social networks,&#x0201D;</article-title> in <source>Social Network Data Analytics</source>, 243&#x02013;275. <pub-id pub-id-type="doi">10.1007/978-1-4419-8462-3_9</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>Y.</given-names></name> <name><surname>You</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Zhou</surname> <given-names>E.</given-names></name> <name><surname>Gao</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <article-title>Graph-MLP: node classification without message passing in graph</article-title>. <source>arXiv preprint arXiv:2106.04051</source>.</citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jin</surname> <given-names>W.</given-names></name> <name><surname>Tang</surname> <given-names>X.</given-names></name> <name><surname>Jiang</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2022a</year>). <article-title>&#x0201C;Condensing graphs via one-step gradient matching,&#x0201D;</article-title> in <source>KDD</source>, 720&#x02013;730. <pub-id pub-id-type="doi">10.1145/3534678.3539429</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jin</surname> <given-names>W.</given-names></name> <name><surname>Zhao</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Shah</surname> <given-names>N.</given-names></name></person-group> (<year>2023</year>). <article-title>Graph condensation for graph neural networks</article-title>. <source>arXiv preprint arXiv:2110.07580</source>.</citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jin</surname> <given-names>W.</given-names></name> <name><surname>Zhao</surname> <given-names>T.</given-names></name> <name><surname>Ding</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Shah</surname> <given-names>N.</given-names></name></person-group> (<year>2022b</year>). <article-title>Empowering graph representation learning with test-time graph transformation</article-title>. <source>arXiv preprint arXiv:2210.03561</source>.</citation>
</ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kamiran</surname> <given-names>F.</given-names></name> <name><surname>Calders</surname> <given-names>T.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Classifying without discriminating,&#x0201D;</article-title> in <source>2009 2nd International Conference on Computer, Control and Communication</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/IC4.2009.4909197</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kang</surname> <given-names>J.</given-names></name> <name><surname>He</surname> <given-names>J.</given-names></name> <name><surname>Maciejewski</surname> <given-names>R.</given-names></name> <name><surname>Tong</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;InFoRM: individual fairness on graph mining,&#x0201D;</article-title> in <source>Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &#x00026; Data Mining</source>, 379&#x02013;389. <pub-id pub-id-type="doi">10.1145/3394486.3403080</pub-id><pub-id pub-id-type="pmid">36621750</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Karimi</surname> <given-names>F.</given-names></name> <name><surname>G&#x000E9;nois</surname> <given-names>M.</given-names></name> <name><surname>Wagner</surname> <given-names>C.</given-names></name> <name><surname>Singer</surname> <given-names>P.</given-names></name> <name><surname>Strohmaier</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Homophily influences ranking of minorities in social networks</article-title>. <source>Sci. Rep</source>. <volume>8</volume>:<fpage>11077</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-018-29405-7</pub-id><pub-id pub-id-type="pmid">30038426</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Adam: a method for stochastic optimization</article-title>. <source>arXiv preprint arXiv:1412.6980</source>.</citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kipf</surname> <given-names>T. N.</given-names></name> <name><surname>Welling</surname> <given-names>M.</given-names></name></person-group> (<year>2016a</year>). <article-title>Semi-supervised classification with graph convolutional networks</article-title>. <source>CoRR, abs/1609.02907</source>.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kipf</surname> <given-names>T. N.</given-names></name> <name><surname>Welling</surname> <given-names>M.</given-names></name></person-group> (<year>2016b</year>). <article-title>Variational graph auto-encoders</article-title>. <source>arXiv preprint arXiv:1611.07308</source>.</citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kusner</surname> <given-names>M. J.</given-names></name> <name><surname>Loftus</surname> <given-names>J.</given-names></name> <name><surname>Russell</surname> <given-names>C.</given-names></name> <name><surname>Silva</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Counterfactual fairness,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source>, 30.</citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>E.</given-names></name> <name><surname>Karimi</surname> <given-names>F.</given-names></name> <name><surname>Wagner</surname> <given-names>C.</given-names></name> <name><surname>Jo</surname> <given-names>H.-H.</given-names></name> <name><surname>Strohmaier</surname> <given-names>M.</given-names></name> <name><surname>Galesic</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Homophily and minority-group size explain perception biases in social networks</article-title>. <source>Nat. Hum. Behav</source>. <volume>3</volume>, <fpage>1078</fpage>&#x02013;<lpage>1087</lpage>. <pub-id pub-id-type="doi">10.1038/s41562-019-0677-4</pub-id><pub-id pub-id-type="pmid">31406337</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leskovec</surname> <given-names>J.</given-names></name> <name><surname>McAuley</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Learning to discover social circles in ego networks,&#x0201D;</article-title> in <source>NeurIPS</source>, 25.</citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Shomer</surname> <given-names>H.</given-names></name> <name><surname>Mao</surname> <given-names>H.</given-names></name> <name><surname>Zeng</surname> <given-names>S.</given-names></name> <name><surname>Ma</surname> <given-names>Y.</given-names></name> <name><surname>Shah</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>&#x0201C;Evaluating graph neural networks for link prediction: current pitfalls and new benchmarking,&#x0201D;</article-title> in <source>NeurIPS</source>, 36.</citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>P.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Zhao</surname> <given-names>H.</given-names></name> <name><surname>Hong</surname> <given-names>P.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;On dyadic fairness: exploring and mitigating bias in graph connections,&#x0201D;</article-title> in <source>ICLR</source>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liben-Nowell</surname> <given-names>D.</given-names></name> <name><surname>Kleinberg</surname> <given-names>J.</given-names></name></person-group> (<year>2003</year>). <article-title>&#x0201C;The link prediction problem for social networks,&#x0201D;</article-title> in <source>CIKM</source>, 556&#x02013;559. <pub-id pub-id-type="doi">10.1145/956863.956972</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Y.</given-names></name></person-group> (<year>2023</year>). <article-title>&#x0201C;FairGraph: automated graph debiasing with gradient matching,&#x0201D;</article-title> in <source>Proceedings of the 32nd ACM International Conference on Information and Knowledge Management</source>, 4135&#x02013;4139. <pub-id pub-id-type="doi">10.1145/3583780.3615176</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Shen</surname> <given-names>Y.</given-names></name></person-group> (<year>2024a</year>). <article-title>&#x0201C;TinyData: joint dataset condensation with dimensionality reduction,&#x0201D;</article-title> in <source>2024 32nd European Signal Processing Conference (EUSIPCO)</source>, <fpage>2037</fpage>&#x02013;<lpage>2041</lpage>.</citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Shen</surname> <given-names>Y.</given-names></name></person-group> (<year>2024b</year>). <article-title>TinyGraph: joint feature and node condensation for graph neural networks</article-title>. <source>arXiv preprint arXiv:2407.08064</source>.</citation>
</ref>
<ref id="B43">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>Q.</given-names></name> <name><surname>Du</surname> <given-names>M.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name></person-group> (<year>2023</year>). <article-title>&#x0201C;Error detection on knowledge graphs with triple embedding,&#x0201D;</article-title> in <source>2023 31st European Signal Processing Conference (EUSIPCO)</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>1604</fpage>&#x02013;<lpage>1608</lpage>. <pub-id pub-id-type="doi">10.23919/EUSIPCO58844.2023.10289852</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Guo</surname> <given-names>R.</given-names></name> <name><surname>Wan</surname> <given-names>M.</given-names></name> <name><surname>Yang</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>A.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Learning fair node representations with graph counterfactual fairness,&#x0201D;</article-title> in <source>Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining</source>, 695&#x02013;703. <pub-id pub-id-type="doi">10.1145/3488560.3498391</pub-id><pub-id pub-id-type="pmid">39388994</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mara</surname> <given-names>A. C.</given-names></name> <name><surname>Lijffijt</surname> <given-names>J.</given-names></name> <name><surname>De Bie</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Benchmarking network embedding models for link prediction: are we making progress?&#x0201D;</article-title> in <source>2020 IEEE 7th International conference on data science and advanced analytics (DSAA)</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>138</fpage>&#x02013;<lpage>147</lpage>. <pub-id pub-id-type="doi">10.1109/DSAA49011.2020.00026</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Masrour</surname> <given-names>F.</given-names></name> <name><surname>Wilson</surname> <given-names>T.</given-names></name> <name><surname>Yan</surname> <given-names>H.</given-names></name> <name><surname>Tan</surname> <given-names>P.-N.</given-names></name> <name><surname>Esfahanian</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Bursting the filter bubble: fairness-aware network link prediction,&#x0201D;</article-title> in <source>AAAI</source>, 841&#x02013;848. <pub-id pub-id-type="doi">10.1609/aaai.v34i01.5429</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Menon</surname> <given-names>A. K.</given-names></name> <name><surname>Elkan</surname> <given-names>C.</given-names></name></person-group> (<year>2011</year>). <article-title>&#x0201C;Link prediction via matrix factorization,&#x0201D;</article-title> in <source>Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part II 22</source> (<publisher-loc>Springer</publisher-loc>), <fpage>437</fpage>&#x02013;<lpage>452</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-23783-6_28</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nanavati</surname> <given-names>A. A.</given-names></name> <name><surname>Gurumurthy</surname> <given-names>S.</given-names></name> <name><surname>Das</surname> <given-names>G.</given-names></name> <name><surname>Chakraborty</surname> <given-names>D.</given-names></name> <name><surname>Dasgupta</surname> <given-names>K.</given-names></name> <name><surname>Mukherjea</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>&#x0201C;On the structural properties of massive telecom call graphs: findings and implications,&#x0201D;</article-title> in <source>CIKM</source>, 435&#x02013;444. <pub-id pub-id-type="doi">10.1145/1183614.1183678</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Newman</surname> <given-names>M. E.</given-names></name></person-group> (<year>2001</year>). <article-title>Clustering and preferential attachment in growing networks</article-title>. <source>Phys. Rev. E</source> <volume>64</volume>:<fpage>025102</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevE.64.025102</pub-id><pub-id pub-id-type="pmid">11497639</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qian</surname> <given-names>X.</given-names></name> <name><surname>Guo</surname> <given-names>Z.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Mao</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>B.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2024</year>). <article-title>&#x0201C;Addressing shortcomings in fair graph learning datasets: towards a new benchmark,&#x0201D;</article-title> in <source>Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</source>, 5602&#x02013;5612. <pub-id pub-id-type="doi">10.1145/3637528.3671616</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rahman</surname> <given-names>T.</given-names></name> <name><surname>Surma</surname> <given-names>B.</given-names></name> <name><surname>Backes</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Fairwalk: towards fair graph embedding,&#x0201D;</article-title> in <source>International Joint Conference on Artificial Intelligence</source>. <pub-id pub-id-type="doi">10.24963/ijcai.2019/456</pub-id><pub-id pub-id-type="pmid">37915376</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Yao</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Su</surname> <given-names>Z.</given-names></name></person-group> (<year>2008</year>). <article-title>&#x0201C;ArnetMiner: extraction and mining of academic social networks,&#x0201D;</article-title> in <source>Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, 990&#x02013;998. <pub-id pub-id-type="doi">10.1145/1401890.1402008</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Trouillon</surname> <given-names>T.</given-names></name> <name><surname>Welbl</surname> <given-names>J.</given-names></name> <name><surname>Riedel</surname> <given-names>S.</given-names></name> <name><surname>Gaussier</surname> <given-names>&#x000C9;.</given-names></name> <name><surname>Bouchard</surname> <given-names>G.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Complex embeddings for simple link prediction,&#x0201D;</article-title> in <source>ICML</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>2071</fpage>&#x02013;<lpage>2080</lpage>.</citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsioutsiouliklis</surname> <given-names>S.</given-names></name> <name><surname>Pitoura</surname> <given-names>E.</given-names></name> <name><surname>Tsaparas</surname> <given-names>P.</given-names></name> <name><surname>Kleftakis</surname> <given-names>I.</given-names></name> <name><surname>Mamoulis</surname> <given-names>N.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Fairness-aware PageRank,&#x0201D;</article-title> in <source>Proceedings of the Web Conference</source> <volume>2021</volume>, <fpage>3815</fpage>&#x02013;<lpage>3826</lpage>. <pub-id pub-id-type="doi">10.1145/3442381.3450065</pub-id><pub-id pub-id-type="pmid">34587074</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Velickovic</surname> <given-names>P.</given-names></name> <name><surname>Cucurull</surname> <given-names>G.</given-names></name> <name><surname>Casanova</surname> <given-names>A.</given-names></name> <name><surname>Romero</surname> <given-names>A.</given-names></name> <name><surname>Li&#x000F3;</surname> <given-names>P.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Graph attention networks,&#x0201D;</article-title> in <source>ICLR</source>.</citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Yin</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>P.</given-names></name></person-group> (<year>2022</year>). <article-title>Equivariant and stable positional encoding for more powerful graph neural networks</article-title>. <source>arXiv preprint arXiv:2203.00199</source>.</citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Shen</surname> <given-names>Y.</given-names></name></person-group> (<year>2022</year>). <article-title>Explaining dynamic graph neural networks via relevance back-propagation</article-title>. <source>arXiv preprint arXiv:2207.11175</source>.</citation>
</ref>
<ref id="B58">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>Z.</given-names></name> <name><surname>Cohen</surname> <given-names>W.</given-names></name> <name><surname>Salakhudinov</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Revisiting semi-supervised learning with graph embeddings,&#x0201D;</article-title> in <source>International Conference on Machine Learning</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>40</fpage>&#x02013;<lpage>48</lpage>.</citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zafar</surname> <given-names>M. B.</given-names></name> <name><surname>Valera</surname> <given-names>I.</given-names></name> <name><surname>Gomez Rodriguez</surname> <given-names>M.</given-names></name> <name><surname>Gummadi</surname> <given-names>K. P.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Fairness beyond disparate treatment &#x00026; disparate impact: learning classification without disparate mistreatment,&#x0201D;</article-title> in <source>Proceedings of the 26th International Conference on World Wide Web</source>, 1171&#x02013;1180. <pub-id pub-id-type="doi">10.1145/3038912.3052660</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zafar</surname> <given-names>M. B.</given-names></name> <name><surname>Valera</surname> <given-names>I.</given-names></name> <name><surname>Rodriguez</surname> <given-names>M. G.</given-names></name> <name><surname>Gummadi</surname> <given-names>K. P.</given-names></name></person-group> (<year>2015</year>). <article-title>Fairness constraints: mechanisms for fair classification</article-title>. <source>arXiv preprint arXiv:1507.05259</source>.</citation>
</ref>
<ref id="B61">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zemel</surname> <given-names>R.</given-names></name> <name><surname>Wu</surname> <given-names>Y.</given-names></name> <name><surname>Swersky</surname> <given-names>K.</given-names></name> <name><surname>Pitassi</surname> <given-names>T.</given-names></name> <name><surname>Dwork</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>&#x0201C;Learning fair representations,&#x0201D;</article-title> in <source>ICML</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>325</fpage>&#x02013;<lpage>333</lpage>.</citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zha</surname> <given-names>D.</given-names></name> <name><surname>Bhat</surname> <given-names>Z. P.</given-names></name> <name><surname>Lai</surname> <given-names>K.-H.</given-names></name> <name><surname>Yang</surname> <given-names>F.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name></person-group> (<year>2023a</year>). <article-title>Data-centric AI: perspectives and challenges</article-title>. <source>arXiv preprint arXiv:2301.04819</source>.</citation>
</ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zha</surname> <given-names>D.</given-names></name> <name><surname>Bhat</surname> <given-names>Z. P.</given-names></name> <name><surname>Lai</surname> <given-names>K.-H.</given-names></name> <name><surname>Yang</surname> <given-names>F.</given-names></name> <name><surname>Jiang</surname> <given-names>Z.</given-names></name> <name><surname>Zhong</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2023b</year>). <article-title>Data-centric artificial intelligence: a survey</article-title>. <source>arXiv preprint arXiv:2303.10158</source>.</citation>
</ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>P.</given-names></name> <name><surname>Xia</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>K.</given-names></name> <name><surname>Jin</surname> <given-names>L.</given-names></name></person-group> (<year>2021</year>). <article-title>Labeling trick: a theory of using graph neural networks for multi-node representation learning</article-title>. <source>NeurIPS</source> <volume>34</volume>, <fpage>9061</fpage>&#x02013;<lpage>9073</lpage>.</citation>
</ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Q.</given-names></name> <name><surname>Dong</surname> <given-names>J.</given-names></name> <name><surname>Duan</surname> <given-names>K.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>L.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Contrastive knowledge graph error detection,&#x0201D;</article-title> in <source>Proceedings of the 31st ACM International Conference on Information &#x00026; Knowledge Management</source>, 2590&#x02013;2599. <pub-id pub-id-type="doi">10.1145/3511808.3557264</pub-id></citation></ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Shah</surname> <given-names>N.</given-names></name></person-group> (<year>2021</year>). <article-title>Graph-less neural networks: teaching old MLPs new tricks via distillation</article-title>. <source>arXiv preprint arXiv:2110.08727</source>.</citation>
</ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>B.</given-names></name> <name><surname>Mopuri</surname> <given-names>K. R.</given-names></name> <name><surname>Bilen</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>Dataset condensation with gradient matching</article-title>. <source>arXiv preprint arXiv:2006.05929</source>.<pub-id pub-id-type="pmid">37836977</pub-id></citation></ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>T.</given-names></name> <name><surname>L&#x000FC;</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.-C.</given-names></name></person-group> (<year>2009</year>). <article-title>Predicting missing links via local information</article-title>. <source>Eur. Phys. J. B</source> <volume>71</volume>, <fpage>623</fpage>&#x02013;<lpage>630</lpage>. <pub-id pub-id-type="doi">10.1140/epjb/e2009-00335-8</pub-id></citation>
</ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Xhonneux</surname> <given-names>L.-P.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Neural Bellman-Ford networks: a general graph neural network framework for link prediction,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source>, <fpage>29476</fpage>&#x02013;<lpage>29490</lpage>.</citation>
</ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zuo</surname> <given-names>A.</given-names></name> <name><surname>Wei</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>T.</given-names></name> <name><surname>Han</surname> <given-names>B.</given-names></name> <name><surname>Zhang</surname> <given-names>K.</given-names></name> <name><surname>Gong</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Counterfactual fairness with partially known causal graph,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source>.</citation>
</ref>
</ref-list>
</back>
</article>