<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Phys.</journal-id>
<journal-title>Frontiers in Physics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Phys.</abbrev-journal-title>
<issn pub-type="epub">2296-424X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1623161</article-id>
<article-id pub-id-type="doi">10.3389/fphy.2025.1623161</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Physics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Network intrusion detection based on relative mutual K-nearest neighbor density peak clustering</article-title>
<alt-title alt-title-type="left-running-head">Ren et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fphy.2025.1623161">10.3389/fphy.2025.1623161</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ren</surname>
<given-names>Chunhua</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/3007329/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Wang</surname>
<given-names>Changyuan</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/3049171/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yu</surname>
<given-names>Yang</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yang</surname>
<given-names>Wanan</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Guo</surname>
<given-names>Ruiqi</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Yibin University</institution>, <institution>School of Computer Science and Technology</institution>, <addr-line>Yibin</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Southwest Petroleum University</institution>, <institution>School of Computer and Software</institution>, <addr-line>Chengdu</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1006916/overview">Jianping Gou</ext-link>, Southwest University, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/3063721/overview">Tian Ran</ext-link>, Northwest Normal University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/3066420/overview">Rui Lyu</ext-link>, Chengdu University of Technology, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Changyuan Wang, <email>184780948@qq.com</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>07</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>13</volume>
<elocation-id>1623161</elocation-id>
<history>
<date date-type="received">
<day>05</day>
<month>05</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>09</day>
<month>06</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2025 Ren, Wang, Yu, Yang and Guo.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Ren, Wang, Yu, Yang and Guo</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Network security is the core guarantee for the stable operation of Cyber-Physical-Social Systems (CPSS), and intrusion detection technology, as a key link in network security, is crucial to ensuring the security and reliability of CPSS. The application of traditional clustering algorithms in intrusion detection usually relies on a preset number of clusters. However, network intrusion data is highly random and dynamic, and the number and distribution structure of clusters are often difficult to determine in advance, resulting in limited detection accuracy and adaptability. To tackle this issue, this paper introduces a density peak clustering algorithm, RMKNN-FDPC, which integrates relative mutual K-nearest neighbor local density with a fuzzy allocation strategy for network intrusion detection, aiming to enhance the capability of identifying unknown attack patterns. Firstly, in the local density calculation stage, the relative mutual K-nearest neighbor method replaces the traditional truncation distance method, characterizing the local density distribution more accurately by considering the mutual neighborhood relationships between data points. Secondly, in the remaining point allocation stage, the mutual K-nearest neighbor fuzzy allocation strategy effectively avoids the error propagation caused by chain allocation in the traditional density peak clustering (DPC) algorithm. Finally, extensive experiments were conducted, covering KDD-CUP-1999, synthetic datasets, real-world datasets, and face datasets, together with parameter analysis and runtime analysis. The experimental results show that the proposed method performs exceptionally well in clustering tasks and can effectively mine network intrusion information.</p>
</abstract>
<kwd-group>
<kwd>CPSS</kwd>
<kwd>network intrusion detection</kwd>
<kwd>relative mutual k-nearest neighbor</kwd>
<kwd>fuzzy allocation strategy</kwd>
<kwd>density peak clustering</kwd>
</kwd-group>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Social Physics</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>CPSS integrate computing, physical devices, and human interactions, presenting unique challenges for network intrusion detection due to their complex interdependencies and dynamic nature [<xref ref-type="bibr" rid="B1">1</xref>]. Traditional intrusion detection methods often struggle to interpret the high-dimensional and sequential data of CPSS, urgently requiring advanced artificial intelligence-driven approaches to detect network anomalies [<xref ref-type="bibr" rid="B2">2</xref>].</p>
<p>Clustering algorithms play a pivotal role in network intrusion detection by autonomously identifying hidden attack patterns from unlabeled traffic data, enabling efficient anomaly detection without prior knowledge of attack signatures [<xref ref-type="bibr" rid="B3">3</xref>]. Clustering algorithms aim to divide a dataset into clusters with similar features [<xref ref-type="bibr" rid="B4">4</xref>], and are widely applied in fields such as data mining [<xref ref-type="bibr" rid="B5">5</xref>], fraud detection [<xref ref-type="bibr" rid="B6">6</xref>], and image processing [<xref ref-type="bibr" rid="B7">7</xref>]. Traditional clustering algorithms (such as K-means and hierarchical clustering) perform well on simple datasets but often struggle to achieve satisfactory results when dealing with complex structures, such as non-spherical clusters, multi-density clusters, or noisy data. While traditional clustering methods face challenges in handling high-dimensional and imbalanced intrusion data, recent advances in density-based and deep clustering techniques offer promising solutions to improve detection accuracy and adaptability in dynamic network environments [<xref ref-type="bibr" rid="B8">8</xref>].</p>
<p>In recent years, the density peaks clustering algorithm (DPC) has attracted significant attention due to its ability to automatically identify cluster centers and handle clusters of arbitrary shapes [<xref ref-type="bibr" rid="B9">9</xref>]. DPC achieves efficient clustering by calculating the local density and relative distance of data points and selecting points with high density and large relative distance as cluster centers [<xref ref-type="bibr" rid="B10">10</xref>]. The DPC algorithm requires only one parameter and can obtain clusters of any shape. It can also handle noisy data and has great application prospects in network intrusion detection. However, DPC still has some limitations when dealing with high-dimensional or manifold data. For instance, the calculation of local density relies on the truncation distance parameter, making the algorithm sensitive to parameter selection. Meanwhile, the traditional DPC algorithm may cause a &#x201c;domino effect&#x201d; when allocating non-central points, leading to error propagation.</p>
<p>To address these issues, researchers have proposed various improvement methods, such as optimizing the calculation of local density and enhancing the allocation method for the non-central points [<xref ref-type="bibr" rid="B11">11</xref>,<xref ref-type="bibr" rid="B12">12</xref>].</p>
<p>In terms of local density calculation, the density peaks clustering based on K-nearest neighbor (DPC-KNN) is an important early achievement [<xref ref-type="bibr" rid="B13">13</xref>]. This algorithm redefines local density using the K-nearest neighbor method, fully considering the distribution differences among data points and thereby avoiding the difficulty of choosing the truncation distance parameter in the traditional DPC algorithm. Compared with DPC, DPC-KNN has improved clustering performance, but it still struggles to accurately identify the true cluster centers on datasets with uneven density distributions. In 2016, Xie proposed the classic fuzzy weighted K-nearest neighbor density peak clustering algorithm (FKNN-DPC) [<xref ref-type="bibr" rid="B14">14</xref>], which combines the K-nearest neighbor method with fuzzy set theory to design a new way of calculating local density. In the same year, Liu proposed the shared-nearest-neighbor-based clustering by fast search and find of density peaks (SNN-DPC) [<xref ref-type="bibr" rid="B15">15</xref>], which redefines local density based on the concepts of nearest neighbors and shared nearest neighbors and can better adapt to the local environment of sample points. However, both algorithms require a fixed K-nearest neighbor parameter throughout the clustering process, which to some extent limits their adaptability to complex datasets. Additionally, numerous derivative algorithms have focused on optimizing local density. For instance, the comparative density peak clustering algorithm (CDP) improves local density estimation with a comparative density method [<xref ref-type="bibr" rid="B16">16</xref>]. Subsequent enhancements to the DPC algorithm framework have incorporated various technical approaches. 
These include the residual error-based density peak clustering algorithm (REDPC) [<xref ref-type="bibr" rid="B17">17</xref>], the density peaks clustering algorithm based on fuzzy and weighted shared neighbor (DPC-FWSN) [<xref ref-type="bibr" rid="B18">18</xref>], the standard deviation weighted distance based density peak clustering algorithm (SFKNN-DPC) [<xref ref-type="bibr" rid="B19">19</xref>], and the adaptive nearest neighbor density clustering algorithm (ANN-DPC) [<xref ref-type="bibr" rid="B20">20</xref>]. These algorithms refine local density in different ways, but all of them compute an absolute density.</p>
<p>In handling the allocation of remaining points, the FKNN-DPC algorithm adopts a two-stage strategy to improve accuracy [<xref ref-type="bibr" rid="B14">14</xref>]. However, this strategy uses a fixed k-value in each allocation stage, failing to fully consider the local distribution characteristics of sample points. To address density imbalance in datasets, the relative density-based clustering algorithm for identifying diverse density clusters (IDDC) was proposed [<xref ref-type="bibr" rid="B21">21</xref>]. This algorithm searches for unallocated points from the perspective of clusters and designs a new allocation strategy, but it requires manual specification of two parameters. Building on FKNN-DPC, Xie further proposed the SFKNN-DPC algorithm [<xref ref-type="bibr" rid="B19">19</xref>]. This algorithm accounts for the contribution of each feature to the distance between data points and designs a divide-and-conquer allocation strategy, thereby achieving better robustness. To address the problem that DPC cannot find the cluster centers of sparse clusters, ANN-DPC was proposed [<xref ref-type="bibr" rid="B20">20</xref>]. This algorithm adopts an adaptive nearest neighbor method and combines breadth-first search with a fuzzy weighted adaptive nearest neighbor scheme to design a new allocation strategy. Although ANN-DPC performs well, it still requires the number of clusters to be specified in advance. In addition, Zhu proposed a density peak clustering algorithm based on shared proximity and probability allocation (SP-DPC) [<xref ref-type="bibr" rid="B22">22</xref>], which uses a transfer probability allocation strategy and an evidence probability allocation strategy to jointly optimize the allocation of remaining data points. However, the parameter k of this algorithm still needs to be specified manually.</p>
<p>To further address the limitations of the DPC algorithm and enhance the accuracy of intrusion detection, this paper proposes a density peak clustering algorithm integrating relative mutual K-nearest neighbor local density and a fuzzy allocation strategy (RMKNN-FDPC). The main contributions are as follows:<list list-type="simple">
<list-item>
<p>
<inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mo>&#x2022;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> First, the RMKNN-FDPC algorithm mitigates the dependence on parameter selection inherent in traditional DPC methods by incorporating relative mutual K-nearest neighbor for local density estimation, thereby enabling more accurate identification of cluster centers.</p>
</list-item>
<list-item>
<p>
<inline-formula id="inf2">
<mml:math id="m2">
<mml:mrow>
<mml:mo>&#x2022;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> Second, RMKNN-FDPC employs a fuzzy allocation strategy for data point allocation, which effectively reduces the likelihood of error propagation through its two-stage allocation mechanism.</p>
</list-item>
<list-item>
<p>
<inline-formula id="inf3">
<mml:math id="m3">
<mml:mrow>
<mml:mo>&#x2022;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> Third, empirical validation was carried out across multiple dimensions: (1) the KDD-CUP-1999 intrusion detection test, (2) synthetic dataset testing, (3) real-world dataset verification, and (4) facial image dataset evaluation, accompanied by systematic parameter sensitivity assessment and runtime analysis. The results show that RMKNN-FDPC achieves excellent clustering performance and enhanced robustness in intrusion detection, and performs particularly well across diverse data scenarios.</p>
</list-item>
</list>
</p>
<p>This paper presents an original framework aimed at advancing density peak clustering methodologies and lays a theoretical foundation for their application in intrusion detection. This study is organized as follows: <xref ref-type="sec" rid="s2">Section 2</xref> provides a comprehensive analysis of the traditional DPC algorithm. <xref ref-type="sec" rid="s3">Section 3</xref> presents the proposed algorithm in detail. Experimental results and corresponding analyses are discussed in <xref ref-type="sec" rid="s4">Section 4</xref>. The paper concludes with a discussion of its contributions in <xref ref-type="sec" rid="s5">Section 5</xref>, followed by recommendations for further exploration in this research domain.</p>
</sec>
<sec id="s2">
<title>2 Traditional DPC algorithm and analysis</title>
<p>DPC is an unsupervised clustering algorithm based on local density and relative distance, proposed by Rodriguez and Laio in 2014. The core idea is that cluster centers usually have high local density and are far away from other high-density points. DPC can automatically identify cluster centers and handle clusters of any shape, while maintaining a degree of robustness to noisy data. This section introduces the DPC algorithm from three aspects: algorithm principle, formula definition, and algorithm analysis.</p>
<sec id="s2-1">
<title>2.1 The principle of DPC algorithm</title>
<p>The principle of the DPC algorithm is based on two assumptions: (1) The local density of a cluster center (density peak) should be much higher than that of its neighboring points; this indicates that a cluster center usually lies in a region where data points are relatively dense, while the density of surrounding points is lower. (2) The relative distance between different cluster centers is large; the distance from one cluster center to other higher-density points is relatively far, indicating sufficient separation between cluster centers.</p>
<p>Based on these two assumptions, the core clustering process of DPC is as follows: (1) Measure local density: measure the density around each data point. (2) Calculate relative distance: compute the minimum distance from each data point to the nearest higher-density point. (3) Select cluster centers: choose cluster centers based on local density and relative distance. (4) Allocate remaining points: assign each non-center point to the cluster of its nearest higher-density point.</p>
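<p>The four steps above can be sketched end-to-end in a short NumPy program. This is a minimal illustration of the classic DPC pipeline, not the authors' implementation; the function and variable names are ours, density ties are broken by index order, and the sketch assumes the densest point is always selected as a center.</p>

```python
import numpy as np

def dpc(X, dc, n_centers):
    """Minimal DPC sketch: cutoff-kernel density, relative distance,
    gamma-based center selection, and label propagation."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Step 1: cutoff-kernel local density (neighbors within dc, minus self)
    rho = (d < dc).sum(axis=1) - 1
    # Step 2: distance to the nearest point ranked denser (ties by index)
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    delta[order[0]] = d[order[0]].max()   # densest point: max distance
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]             # points ranked denser than i
        j = denser[np.argmin(d[i, denser])]
        delta[i], nearest_denser[i] = d[i, j], j
    # Step 3: centers are the points with the largest gamma = rho * delta
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    # Step 4: sweep in decreasing density; each non-center inherits the
    # label of its nearest denser point (already labeled at this stage)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels
```

On two well-separated blobs this recovers the expected partition without a preset distance threshold beyond dc.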
</sec>
<sec id="s2-2">
<title>2.2 The calculation formula of DPC algorithm</title>
<sec id="s2-2-1">
<title>2.2.1 Local density measure</title>
<p>The local density <inline-formula id="inf4">
<mml:math id="m4">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the density around data point <inline-formula id="inf5">
<mml:math id="m5">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. DPC offers two commonly used calculation methods.</p>
<p>One method is the truncation distance method:<disp-formula id="e1">
<mml:math id="m6">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mi>&#x3c7;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>
</p>
<p>where <inline-formula id="inf6">
<mml:math id="m7">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the distance between data points <inline-formula id="inf7">
<mml:math id="m8">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf8">
<mml:math id="m9">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf9">
<mml:math id="m10">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the truncation distance, which is also the sole user-defined input parameter, <inline-formula id="inf10">
<mml:math id="m11">
<mml:mrow>
<mml:mi>&#x3c7;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is a function that takes the value of 1 when <inline-formula id="inf11">
<mml:math id="m12">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and 0 otherwise.</p>
<p>The second method is the Gaussian kernel function method:<disp-formula id="e2">
<mml:math id="m13">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<p>The Gaussian kernel method yields a smoother local density estimate and is better suited to noisy data.</p>
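<p>Both density estimators can be written in a few lines of NumPy. This is an illustrative sketch of Equations 1, 2 only; the function name and variables are ours.</p>

```python
import numpy as np

def local_densities(X, dc):
    """Cutoff-kernel (Equation 1) and Gaussian-kernel (Equation 2)
    local densities for every point in X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # cutoff kernel: count neighbors strictly closer than dc, minus self
    rho_cutoff = (d < dc).sum(axis=1) - 1
    # Gaussian kernel: smooth, real-valued density; exclude j == i
    np.fill_diagonal(d, np.inf)
    rho_gauss = np.exp(-((d / dc) ** 2)).sum(axis=1)
    return rho_cutoff, rho_gauss
```

The cutoff kernel returns integer counts, which produces many ties on small datasets; the Gaussian kernel avoids this, which is why it is preferred for noisy data.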
</sec>
<sec id="s2-2-2">
<title>2.2.2 Relative distance calculation</title>
<p>Relative distance <inline-formula id="inf12">
<mml:math id="m14">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the minimum distance from data point <inline-formula id="inf13">
<mml:math id="m15">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> to other high-density points.<disp-formula id="e3">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
<p>For the point with the highest density, its relative distance is defined as its maximum distance to any other point.</p>
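<p>Equation 3 translates directly into code, given a precomputed distance matrix and density vector. This is our illustrative sketch; the names are not from the paper.</p>

```python
import numpy as np

def relative_distance(d, rho):
    """delta_i (Equation 3): distance to the nearest strictly denser
    point; the densest point instead gets its maximum distance."""
    n = len(rho)
    delta = np.zeros(n)
    for i in range(n):
        denser = np.where(rho > rho[i])[0]   # strictly denser points
        if denser.size:
            delta[i] = d[i, denser].min()
        else:
            delta[i] = d[i].max()            # globally densest point
    return delta
```

Note that points tied at the maximum density all receive the max-distance branch; practical implementations usually break such ties by a fixed ordering.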
</sec>
<sec id="s2-2-3">
<title>2.2.3 Cluster center selection</title>
<p>By drawing a decision graph, that is, a two-dimensional graph of <inline-formula id="inf14">
<mml:math id="m17">
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf15">
<mml:math id="m18">
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, points with both large local density and large relative distance are selected as cluster centers. The cluster-center reference value <inline-formula id="inf16">
<mml:math id="m19">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is as follows.<disp-formula id="e4">
<mml:math id="m20">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2217;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
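<p>Center selection via Equation 4 reduces to ranking the product of the two quantities. A minimal sketch (our naming, not the paper's):</p>

```python
import numpy as np

def select_centers(rho, delta, n_centers):
    """Rank points by gamma_i = rho_i * delta_i (Equation 4) and
    return the indices of the top n_centers candidates."""
    gamma = rho * delta
    return np.argsort(-gamma, kind="stable")[:n_centers]
```

In the original DPC the centers are picked visually from the decision graph; sorting by gamma is the common automated equivalent when the number of clusters is fixed.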
</sec>
<sec id="s2-2-4">
<title>2.2.4 Remaining points allocation</title>
<p>Each non-center point is allocated to the cluster of its nearest higher-density point.<disp-formula id="e5">
<mml:math id="m21">
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>L</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mspace width="0.17em"/>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>arg</mml:mi>
<mml:munder>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
<p>
<inline-formula id="inf17">
<mml:math id="m22">
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> represents the cluster label of data point <inline-formula id="inf18">
<mml:math id="m23">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
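<p>The single-sweep allocation of Equation 5 can be sketched as follows (an illustrative reading with our names; it assumes the densest point is a center so every non-center has a strictly denser point). Processing points in decreasing density guarantees that the nearest denser point is already labeled, which is also the source of the chain-allocation errors discussed below.</p>

```python
import numpy as np

def assign_labels(d, rho, centers):
    """Equation 5: sweep points in decreasing density; each non-center
    inherits the label of its nearest strictly denser point."""
    n = len(rho)
    labels = np.full(n, -1)
    labels[list(centers)] = np.arange(len(centers))
    for i in np.argsort(-rho):                   # decreasing density
        if labels[i] != -1:
            continue                             # cluster centers keep theirs
        denser = np.where(rho > rho[i])[0]
        j = denser[np.argmin(d[i, denser])]      # nearest denser point
        labels[i] = labels[j]                    # j was labeled earlier
    return labels
```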
</sec>
</sec>
<sec id="s2-3">
<title>2.3 Analysis of DPC algorithm</title>
<p>The DPC algorithm has three major advantages. Firstly, it does not require presetting the number of clusters: cluster centers can be selected intuitively from the decision graph, avoiding the need of traditional algorithms (such as K-means) to specify the number of clusters in advance. Secondly, it can handle clusters of any shape: based on density characteristics, it can identify non-spherical clusters. Thirdly, it is robust: it has a certain tolerance to noisy data. However, DPC has two critical drawbacks. First, it is highly sensitive to the parameter <inline-formula id="inf19">
<mml:math id="m24">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>: the local density, the key quantity in the algorithm, depends on the truncation distance. Second, there is a chain allocation problem: when allocating the remaining points, a single incorrect assignment can propagate and affect the final clustering result.</p>
</sec>
</sec>
<sec id="s3">
<title>3 The proposed RMKNN-FDPC algorithm</title>
<p>This section elaborates on RMKNN-FDPC, which integrates relative mutual K-nearest neighbor local density and fuzzy allocation. The core contributions of the algorithm lie in two aspects. First, we design a local density calculation method based on relative mutual K-nearest neighbors, which can effectively distinguish data points at different density levels and provides a reliable basis for accurate selection of cluster centers. Second, to counter the error propagation caused by the chain-like allocation of the traditional DPC algorithm, we propose a fuzzy allocation strategy for the remaining points based on mutual K-nearest neighbors, which significantly improves the accuracy of the clustering results. The following subsections discuss these technical details in depth.</p>
<sec id="s3-1">
<title>3.1 Relative mutual K-nearest neighbor local density</title>
<p>In DPC, the calculation of local density usually relies on the key parameter <inline-formula id="inf20">
<mml:math id="m25">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, the truncation distance. However, determining the optimal value of <inline-formula id="inf21">
<mml:math id="m26">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is often challenging. To address this issue, methods such as DPC-KNN and FKNN-DPC adopt K-nearest neighbor to calculate local density, thereby avoiding the selection of <inline-formula id="inf22">
<mml:math id="m27">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Nevertheless, the local density calculated by these methods, and by most derived DPC algorithms, is still an absolute density, which makes it difficult to distinguish datasets with different density levels. Therefore, this section proposes a new relative density calculation method, called relative mutual K-nearest neighbor local density, which aims to enhance the ability to distinguish multi-density-level data such as nested clusters.</p>
<p>Firstly, the K-nearest neighbor set of data point <inline-formula id="inf23">
<mml:math id="m28">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, denoted as <inline-formula id="inf24">
<mml:math id="m29">
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, is mathematically established in <xref ref-type="disp-formula" rid="e6">Equation 6</xref>.<disp-formula id="e6">
<mml:math id="m30">
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>
</p>
<p>where <inline-formula id="inf25">
<mml:math id="m31">
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> represents the size of the dataset; <inline-formula id="inf26">
<mml:math id="m32">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> indicates the Euclidean distance between data points <inline-formula id="inf27">
<mml:math id="m33">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf28">
<mml:math id="m34">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>; <inline-formula id="inf29">
<mml:math id="m35">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the distance between data point <inline-formula id="inf30">
<mml:math id="m36">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and the <inline-formula id="inf31">
<mml:math id="m37">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-th nearest neighbor.</p>
<p>Subsequently, the inverse K-nearest neighbor set <inline-formula id="inf32">
<mml:math id="m38">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of data point <inline-formula id="inf33">
<mml:math id="m39">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is defined; the calculation is given in <xref ref-type="disp-formula" rid="e7">Equation 7</xref>.<disp-formula id="e7">
<mml:math id="m40">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>
</p>
<p>The inverse K-nearest neighbor set reflects how often a data point is regarded as a neighbor by other points, offering a deeper view of the relationships and structures among data points.</p>
<p>Then, the mutual K-nearest neighbor set <inline-formula id="inf34">
<mml:math id="m41">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of data point <inline-formula id="inf35">
<mml:math id="m42">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is defined, and the specific computation process is explicitly given by <xref ref-type="disp-formula" rid="e8">Equation 8</xref>.<disp-formula id="e8">
<mml:math id="m43">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>R</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mspace width="0.17em"/>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>&#x2205;</mml:mi>
<mml:mspace width="1em"/>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
<mml:mspace width="1em"/>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>
</p>
<p>
<inline-formula id="inf36">
<mml:math id="m44">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> represents the intersection of <inline-formula id="inf37">
<mml:math id="m45">
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf38">
<mml:math id="m46">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, provided that the intersection is not empty; otherwise, <inline-formula id="inf39">
<mml:math id="m47">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> reduces to <inline-formula id="inf40">
<mml:math id="m48">
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The mutual K-nearest neighbor approach provides enhanced local structure characterization by incorporating both direct neighborhood relationships and their reciprocal connections between data points.</p>
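As a concrete illustration, the three neighbor sets of Equations 6&#x2013;8 can be sketched in a few lines of Python. This is a minimal sketch with illustrative names (<monospace>knn</monospace>, <monospace>rknn</monospace>, <monospace>mknn</monospace> are our own helper functions, not from a reference implementation):

```python
import math

def knn(points, k):
    """KNN(i): indices of the k points nearest to i (Euclidean, Eq. 6)."""
    n = len(points)
    sets = []
    for i in range(n):
        order = sorted(range(n),
                       key=lambda j: math.dist(points[i], points[j]))
        sets.append(set(order[1:k + 1]))  # skip the point itself
    return sets

def rknn(knn_sets):
    """RKNN(i) = {j | i is in KNN(j)} (Eq. 7)."""
    n = len(knn_sets)
    return [{j for j in range(n) if i in knn_sets[j]} for i in range(n)]

def mknn(knn_sets, rknn_sets):
    """MKNN(i): KNN(i) intersected with RKNN(i) when non-empty,
    otherwise fall back to KNN(i) (Eq. 8)."""
    return [ks & rs if ks & rs else ks
            for ks, rs in zip(knn_sets, rknn_sets)]
```

Note that an outlying point (one that is nobody's neighbor) has an empty RKNN set, so Equation 8's fallback keeps its MKNN set non-empty.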
<p>Next, the absolute density <inline-formula id="inf41">
<mml:math id="m49">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">abs</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> of data point <inline-formula id="inf42">
<mml:math id="m50">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is defined as in <xref ref-type="disp-formula" rid="e9">Equation 9</xref>.<disp-formula id="e9">
<mml:math id="m51">
<mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">abs</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
<mml:msup>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>
</p>
<p>Finally, based on <xref ref-type="disp-formula" rid="e1">Equations 1</xref>, <xref ref-type="disp-formula" rid="e2">2</xref>, the relative mutual K-nearest neighbor local density <inline-formula id="inf43">
<mml:math id="m52">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> of data point <inline-formula id="inf44">
<mml:math id="m53">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is defined, and the calculation method is as shown in <xref ref-type="disp-formula" rid="e10">Equation 10</xref>.<disp-formula id="e10">
<mml:math id="m54">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">abs</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
<mml:msub>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">abs</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>
</p>
<p>The relative mutual K-nearest neighbor local density <inline-formula id="inf45">
<mml:math id="m55">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the ratio of the density of data point <inline-formula id="inf46">
<mml:math id="m56">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> to the average density of the points in its mutual K-nearest neighbor set. The advantage of this relative density is that it can distinguish complex manifold datasets with uneven density distributions and also facilitates the extraction of cluster centers.</p>
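Assuming the MKNN sets have already been computed as above, Equations 9, 10 translate directly into code; the function names are illustrative, and degenerate cases (empty MKNN sets or duplicate points) are assumed away:

```python
import math

def absolute_density(points, mknn_sets):
    """Eq. 9: rho_abs(i) = 1 / (|MKNN(i)| * sum of squared distances
    from i to its mutual K-nearest neighbors)."""
    dens = []
    for i, nbrs in enumerate(mknn_sets):
        s = sum(math.dist(points[i], points[j]) ** 2 for j in nbrs)
        dens.append(1.0 / (len(nbrs) * s))
    return dens

def relative_density(abs_dens, mknn_sets):
    """Eq. 10: rho(i) = rho_abs(i) divided by the mean absolute
    density over MKNN(i)."""
    return [abs_dens[i] / (sum(abs_dens[j] for j in nbrs) / len(nbrs))
            for i, nbrs in enumerate(mknn_sets)]
```

A point whose density matches its neighborhood average gets a relative density near 1, regardless of whether it sits in a sparse or a dense cluster; this is what makes the measure robust across density levels.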
</sec>
<sec id="s3-2">
<title>3.2 The fuzzy allocation strategy for the remaining points of mutual K-nearest neighbor</title>
<p>The DPC algorithm suffers from error propagation during the allocation of the remaining points: the incorrect allocation of a single data point may trigger a chain reaction that significantly degrades clustering performance. To address this limitation, this paper proposes a fuzzy remaining-point allocation strategy based on mutual K-nearest neighbors. The strategy consists of two stages: first, Strategy 1 (<xref ref-type="table" rid="T1">Table 1</xref>) uses mutual K-nearest neighbors to perform a priority allocation of data points; second, Strategy 2 (<xref ref-type="table" rid="T2">Table 2</xref>) performs a secondary allocation by calculating the fuzzy membership degree of the data points. Any points still unallocated after these two stages are finally handled with the allocation method of the DPC algorithm. This strategy effectively reduces the risk of error propagation and improves the accuracy of the clustering results.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Strategy 1.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Strategy 1. Priority allocation based on mutual K-nearest neighbor</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Step 1: Select an unvisited cluster center <inline-formula id="inf47">
<mml:math id="m57">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> from the cluster center set <inline-formula id="inf48">
<mml:math id="m58">
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, and mark <inline-formula id="inf49">
<mml:math id="m59">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> as visited</td>
</tr>
<tr>
<td align="left">Step 2: Using <xref ref-type="disp-formula" rid="e6">Equations 6</xref>&#x2013;<xref ref-type="disp-formula" rid="e8">8</xref>, calculate the mutual K-nearest neighbor set <inline-formula id="inf50">
<mml:math id="m60">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of <inline-formula id="inf51">
<mml:math id="m61">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and sequentially enter the data points of this set into the queue <inline-formula id="inf52">
<mml:math id="m62">
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Meanwhile, set the labels of the data points in the set as the category of <inline-formula id="inf53">
<mml:math id="m63">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">Step 3: Remove the head element <inline-formula id="inf54">
<mml:math id="m64">
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>; process any data point <inline-formula id="inf55">
<mml:math id="m65">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in a loop. If <inline-formula id="inf56">
<mml:math id="m66">
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> has not been allocated a category, set its category to be the same as that of <inline-formula id="inf57">
<mml:math id="m67">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and add <inline-formula id="inf58">
<mml:math id="m68">
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> to the queue <inline-formula id="inf59">
<mml:math id="m69">
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula></td>
</tr>
<tr>
<td align="left">Step 4: If the queue <inline-formula id="inf60">
<mml:math id="m70">
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is not empty, proceed to step 3</td>
</tr>
<tr>
<td align="left">Step 5: If there are still unvisited cluster centers, return to step 1; otherwise, terminate Strategy 1</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Strategy 2.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Strategy 2. Secondary allocation based on mutual K-nearest neighbor and fuzzy membership</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Step 1: For any unallocated data point <inline-formula id="inf67">
<mml:math id="m79">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, calculate its mutual K-nearest neighbor <inline-formula id="inf68">
<mml:math id="m80">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">Step 2: Calculate the fuzzy membership degree <inline-formula id="inf69">
<mml:math id="m81">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> of each data point <inline-formula id="inf70">
<mml:math id="m82">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> using <xref ref-type="disp-formula" rid="e11">Equation 11</xref>
</td>
</tr>
<tr>
<td align="left">Step 3: Obtain the cluster label <inline-formula id="inf71">
<mml:math id="m83">
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of data point <inline-formula id="inf72">
<mml:math id="m84">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> by using <xref ref-type="disp-formula" rid="e12">Equation 12</xref>
</td>
</tr>
<tr>
<td align="left">Step 4: If there are still data points to be allocated, return to Step 1</td>
</tr>
<tr>
<td align="left">Step 5: If there are isolated data points (meaning data points without K-nearest neighbor), then use <xref ref-type="disp-formula" rid="e5">Equation 5</xref> to allocate the remaining points; otherwise, terminate Strategy 2</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Strategy 1 mainly utilizes mutual K-nearest neighbors and a queue; its detailed procedure is outlined in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
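The queue-based propagation in Strategy 1 amounts to a breadth-first search over mutual K-nearest neighbors, started from each cluster center in turn. A minimal sketch, assuming the MKNN sets are precomputed and using hypothetical names, might look as follows:

```python
from collections import deque

def strategy1(centers, mknn_sets, n):
    """Priority allocation (Table 1). Returns a label list in which
    -1 marks points left for Strategy 2."""
    labels = [-1] * n
    for cls, c in enumerate(centers):      # Step 1: visit each center
        labels[c] = cls
        q = deque()
        for p in mknn_sets[c]:             # Step 2: label and enqueue MKNN(c)
            if labels[p] == -1:
                labels[p] = cls
                q.append(p)
        while q:                           # Steps 3-4: BFS expansion
            i = q.popleft()
            for p in mknn_sets[i]:
                if labels[p] == -1:
                    labels[p] = cls
                    q.append(p)
        # Step 5: the outer loop proceeds to the next unvisited center
    return labels
```

Because a point is only ever labeled once, the propagation never overwrites an earlier assignment, and points outside every mutual neighborhood remain unallocated for Strategy 2.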
<p>Assume that allocation Strategy 1 has been completed; for the data points that have been allocated, <inline-formula id="inf61">
<mml:math id="m71">
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> clusters <inline-formula id="inf62">
<mml:math id="m72">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>C</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>C</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are formed. For any unallocated data point <inline-formula id="inf63">
<mml:math id="m73">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, its fuzzy membership degree <inline-formula id="inf64">
<mml:math id="m74">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is defined as follows in <xref ref-type="disp-formula" rid="e11">Equation 11</xref>.<disp-formula id="e11">
<mml:math id="m75">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>C</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>K</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(11)</label>
</disp-formula>
</p>
<p>The cluster label <inline-formula id="inf65">
<mml:math id="m76">
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of data point <inline-formula id="inf66">
<mml:math id="m77">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> can be obtained based on the fuzzy membership degree. The calculation method is shown as <xref ref-type="disp-formula" rid="e12">Equation 12</xref>.<disp-formula id="e12">
<mml:math id="m78">
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>arg</mml:mi>
<mml:munder>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(12)</label>
</disp-formula>
</p>
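Equations 11, 12 can be sketched as follows. The helper names are illustrative, and <monospace>clusters</monospace> is assumed to be a list of index sets produced by Strategy 1:

```python
def fuzzy_membership(i, mknn_sets, clusters):
    """Eq. 11: p_it = |MKNN(i) intersected with CL_t| / |MKNN(i)|
    for each existing cluster t."""
    nbrs = mknn_sets[i]
    return [len(nbrs & c) / len(nbrs) for c in clusters]

def fuzzy_label(i, mknn_sets, clusters):
    """Eq. 12: Label(i) = argmax over t of p_it."""
    p = fuzzy_membership(i, mknn_sets, clusters)
    return max(range(len(p)), key=p.__getitem__)
```

Intuitively, an unallocated point joins the cluster that accounts for the largest share of its mutual K-nearest neighbors; a point whose MKNN set is empty has no membership defined and falls through to the DPC-style allocation of Equation 5.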
<p>The specific steps of the secondary allocation based on mutual K-nearest neighbors and fuzzy membership degrees are summarized in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
</sec>
<sec id="s3-3">
<title>3.3 RMKNN-FDPC algorithm and analysis</title>
<p>Combining the relative mutual K-nearest neighbor local density and the remaining point fuzzy allocation strategies, this section introduces the execution process of the RMKNN-FDPC algorithm, as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. Meanwhile, <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref> provides detailed steps for the algorithm.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Flowchart of Algorithm 1.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g001.tif">
<alt-text content-type="machine-generated">Flowchart illustrating a clustering process. It starts with inputting a dataset and parameter k, then standardizing data and calculating the Euclidean distance matrix. It proceeds to calculate K-nearest neighbors and related metrics. Next, it constructs a decision diagram to select cluster centers. An allocation strategy is implemented; if all data points are allocated, it ends, otherwise, another strategy is applied, finally outputting data labels.</alt-text>
</graphic>
</fig>
<p>
<statement content-type="algorithm" id="Algorithm_1">
<label>Algorithm 1</label>
<p>Density Peak Clustering Algorithm Integrating Relative Mutual K-Nearest Neighbor Local Density and Fuzzy Allocation Strategy.<list list-type="simple">
<list-item>
<p>Require: Dataset <inline-formula id="inf73">
<mml:math id="m85">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, parameter <inline-formula id="inf74">
<mml:math id="m86">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>Ensure: Label of data points.</p>
</list-item>
<list-item>
<p>&#x2003;1: Normalize the dataset <inline-formula id="inf75">
<mml:math id="m87">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and calculate the Euclidean distance between data points.</p>
</list-item>
<list-item>
<p>&#x2003;2: Utilize <xref ref-type="disp-formula" rid="e6">Equations 6</xref>&#x2013;<xref ref-type="disp-formula" rid="e8">8</xref> to calculate the K-nearest neighbor, inverse K-nearest neighbor and mutual K-nearest neighbor of each data point <inline-formula id="inf76">
<mml:math id="m88">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> in sequence.</p>
</list-item>
<list-item>
<p>&#x2003;3: Utilize <xref ref-type="disp-formula" rid="e9">Equations 9</xref>, <xref ref-type="disp-formula" rid="e10">10</xref> to calculate the relative mutual K-nearest neighbor local density <inline-formula id="inf77">
<mml:math id="m89">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> of data point <inline-formula id="inf78">
<mml:math id="m90">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>&#x2003;4: Utilize <xref ref-type="disp-formula" rid="e3">Equation 3</xref> to calculate the relative distance <inline-formula id="inf79">
<mml:math id="m91">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> of data point <inline-formula id="inf80">
<mml:math id="m92">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>&#x2003;5: Utilize <xref ref-type="disp-formula" rid="e4">Equation 4</xref> to construct the decision graph and select the appropriate cluster centers.</p>
</list-item>
<list-item>
<p>&#x2003;6: Prioritize the allocation of data points by applying Strategy 1.</p>
</list-item>
<list-item>
<p>&#x2003;7: Conduct a secondary allocation of data points by using Strategy 2.</p>
</list-item>
</list>
</p>
</statement>
</p>
<p>Next, we analyze the computational complexity of each step and the overall time complexity of the RMKNN-FDPC algorithm. Suppose the dataset size is <inline-formula id="inf81">
<mml:math id="m93">
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and the K-nearest neighbor parameter is <inline-formula id="inf82">
<mml:math id="m94">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>: (1) Step 1 is to calculate the Euclidean distance between all data points, which requires calculating the distance matrix of <inline-formula id="inf83">
<mml:math id="m95">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, and the time complexity is <inline-formula id="inf84">
<mml:math id="m96">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. (2) Step 2 computes the K-nearest neighbor, inverse K-nearest neighbor, and mutual K-nearest neighbor sets. Finding the K-nearest neighbors of each data point has, in the worst case, a time complexity of <inline-formula id="inf85">
<mml:math id="m97">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>; computing the inverse K-nearest neighbors is analogous, and its time complexity is also <inline-formula id="inf86">
<mml:math id="m98">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>; computing the mutual K-nearest neighbors mainly checks whether the K-nearest neighbors of each data point are also its inverse K-nearest neighbors, with a time complexity of <inline-formula id="inf87">
<mml:math id="m99">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. (3) Step 3 calculates the relative mutual K-nearest neighbor local density, and the time complexity is <inline-formula id="inf88">
<mml:math id="m100">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. (4) Step 4 calculates the relative distance, and the time complexity is <inline-formula id="inf89">
<mml:math id="m101">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. (5) Step 5 builds the two-dimensional decision graph, and the time complexity is <inline-formula id="inf90">
<mml:math id="m102">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and the time complexity of selecting cluster centers is also <inline-formula id="inf91">
<mml:math id="m103">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. (6) Step 6 prioritizes the allocation of data points, and in the worst case, the time complexity is <inline-formula id="inf92">
<mml:math id="m104">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. (7) Step 7 performs the secondary allocation of data points, and in the worst case, the time complexity is <inline-formula id="inf93">
<mml:math id="m105">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
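<p>The neighbor computations of Step 2 can be sketched as follows. This is a minimal NumPy illustration under our own naming (the function <monospace>mutual_knn</monospace> and its signature are not from the paper); it shows why the pairwise-distance and KNN-search steps cost O(n&#x00B2;) while the mutual check costs only O(kn).</p>

```python
import numpy as np

def mutual_knn(X, k):
    """Brute-force K-nearest, inverse K-nearest, and mutual K-nearest
    neighbors. The pairwise-distance step is O(n^2); the mutual check
    is O(kn), matching the per-step costs discussed above."""
    n = len(X)
    # O(n^2) pairwise Euclidean distances
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    knn = [set(row) for row in np.argsort(d, axis=1)[:, :k]]
    # inverse KNN: j is an inverse neighbor of i if i is in KNN(j)
    rknn = [set() for _ in range(n)]
    for j in range(n):
        for i in knn[j]:
            rknn[i].add(j)
    # mutual KNN: neighbors of i that also count i among their neighbors
    mknn = [knn[i] & rknn[i] for i in range(n)]
    return knn, rknn, mknn
```

<p>Note that the mutual check touches at most k neighbors per point, which is why the overall cost of the algorithm stays dominated by the O(n&#x00B2;) distance computation.</p>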
<p>Ultimately, our analysis reveals that RMKNN-FDPC preserves the <inline-formula id="inf94">
<mml:math id="m106">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> time complexity characteristic of the original DPC methodology.</p>
</sec>
</sec>
<sec id="s4">
<title>4 Experiment and analysis</title>
<sec id="s4-1">
<title>4.1 Experimental preparation</title>
<p>To evaluate the network intrusion detection performance and clustering ability of the proposed RMKNN-FDPC method, this paper selects the KDD-CUP-1999 dataset for network intrusion detection, six artificially synthesized two-dimensional datasets with varying shapes and uneven densities, and six real datasets from the UCI database [<xref ref-type="bibr" rid="B23">23</xref>] for the experiments. The specific information of the datasets is shown in <xref ref-type="table" rid="T3">Tables 3</xref>&#x2013;<xref ref-type="table" rid="T5">5</xref>. Four classic clustering evaluation indicators are selected, namely, Accuracy (ACC) [<xref ref-type="bibr" rid="B11">11</xref>], the Adjusted Rand Index (ARI) [<xref ref-type="bibr" rid="B24">24</xref>], the Adjusted Mutual Information (AMI) [<xref ref-type="bibr" rid="B25">25</xref>], and the Fowlkes&#x2013;Mallows Index (FMI) [<xref ref-type="bibr" rid="B26">26</xref>,<xref ref-type="bibr" rid="B27">27</xref>]. The maximum value of each indicator is 1, and the closer an indicator value is to 1, the better the clustering result. Five algorithms are chosen for comparison: the original DPC [<xref ref-type="bibr" rid="B9">9</xref>], DPC-KNN [<xref ref-type="bibr" rid="B13">13</xref>], FKNN-DPC [<xref ref-type="bibr" rid="B14">14</xref>], density peaks clustering based on a weighted local density sequence and nearest neighbor assignment (DPCSA) [<xref ref-type="bibr" rid="B28">28</xref>], and density peaks clustering based on local fair density and a fuzzy K-nearest neighbor membership allocation strategy (LF-DPC) [<xref ref-type="bibr" rid="B29">29</xref>].</p>
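<p>As a concrete reference for one of these indicators, the FMI can be computed from pairwise co-assignment counts. The following is a small pure-Python sketch (the function name <monospace>fowlkes_mallows</monospace> is our own, not from the cited works): TP counts point pairs placed together in both the ground truth and the prediction, FP pairs together only in the prediction, and FN pairs together only in the ground truth.</p>

```python
from itertools import combinations
from math import sqrt

def fowlkes_mallows(labels_true, labels_pred):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)) over all point pairs."""
    tp = fp = fn = 0
    for (t1, p1), (t2, p2) in combinations(list(zip(labels_true, labels_pred)), 2):
        same_t, same_p = t1 == t2, p1 == p2
        tp += same_t and same_p        # together in both partitions
        fp += same_p and not same_t    # together only in the prediction
        fn += same_t and not same_p    # together only in the ground truth
    return tp / sqrt((tp + fp) * (tp + fn))
```

<p>A perfect clustering yields 1 regardless of how the predicted cluster ids are permuted, which is why the tables below treat 1 as the optimum.</p>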
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>KDD-CUP-1999 dataset.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Attack types</th>
<th align="center">Size</th>
<th align="center">Dimension</th>
<th align="center">Clusters</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Normal</td>
<td align="center">97,278</td>
<td align="center">41</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">Probe</td>
<td align="center">4,107</td>
<td align="center">41</td>
<td align="center">4</td>
</tr>
<tr>
<td align="left">Dos</td>
<td align="center">391,458</td>
<td align="center">41</td>
<td align="center">6</td>
</tr>
<tr>
<td align="left">R2L</td>
<td align="center">1,126</td>
<td align="center">41</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">U2R</td>
<td align="center">52</td>
<td align="center">41</td>
<td align="center">4</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Artificially synthesized datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Size</th>
<th align="center">Dimension</th>
<th align="center">Clusters</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Jain</td>
<td align="center">373</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">Spiral</td>
<td align="center">312</td>
<td align="center">2</td>
<td align="center">3</td>
</tr>
<tr>
<td align="left">Pathbased</td>
<td align="center">300</td>
<td align="center">2</td>
<td align="center">3</td>
</tr>
<tr>
<td align="left">Compound</td>
<td align="center">399</td>
<td align="center">2</td>
<td align="center">6</td>
</tr>
<tr>
<td align="left">Twomoons</td>
<td align="center">1,502</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">Ring</td>
<td align="center">1,000</td>
<td align="center">2</td>
<td align="center">2</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Real datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Size</th>
<th align="center">Dimension</th>
<th align="center">Clusters</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Libras</td>
<td align="center">360</td>
<td align="center">91</td>
<td align="center">15</td>
</tr>
<tr>
<td align="left">SCADI</td>
<td align="center">70</td>
<td align="center">206</td>
<td align="center">7</td>
</tr>
<tr>
<td align="left">Ecoli</td>
<td align="center">336</td>
<td align="center">8</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">Banknote</td>
<td align="center">1,372</td>
<td align="center">4</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">WDBC</td>
<td align="center">569</td>
<td align="center">30</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">Dermatology</td>
<td align="center">366</td>
<td align="center">33</td>
<td align="center">6</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4-2">
<title>4.2 Experimental and analytical study on KDD-CUP-1999 dataset</title>
<p>For this experiment, the KDD-CUP-1999 dataset was selected. This dataset is mainly used for network intrusion detection and includes four types of attacks: U2R, DoS, R2L, and Probe. The KDD-CUP-1999 dataset presents a highly imbalanced density distribution: normal traffic accounts for only 19.69%, while attack traffic accounts for 80.31%, dominated by DoS attacks (79.24%), with the remaining attack types (Probe, R2L, U2R) making up a very small proportion. Numerical features mostly exhibit a long-tailed distribution concentrated near zero, while categorical features such as TCP and HTTP dominate. Different attack types show significant differences in protocol and traffic patterns.</p>
<p>Due to the large scale of the dataset, two test sets (U1 and U2) were randomly selected from it, containing 1,000 and 800 data records, respectively. Each group includes both normal and abnormal records.</p>
<p>The evaluation indicators for this experiment are ACC and FMI. The experimental results are presented in <xref ref-type="table" rid="T6">Table 6</xref>, with the optimal clustering value highlighted in bold. The experiment shows that, on both U1 and U2, the detection accuracy of the RMKNN-FDPC algorithm is higher than that of the other comparison algorithms, so it can effectively handle network intrusion detection. This is attributable to the improved local density calculation and the optimized allocation of the remaining points.</p>
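<p>Since predicted cluster ids are arbitrary, ACC is conventionally computed after matching predicted ids to true class ids. A minimal brute-force sketch follows (the helper <monospace>clustering_accuracy</monospace> is our own, not code from the paper; practical implementations use the Hungarian algorithm instead of enumerating permutations).</p>

```python
from itertools import permutations

def clustering_accuracy(labels_true, labels_pred):
    """Fraction of points whose mapped predicted label matches the truth,
    maximized over one-to-one mappings of predicted ids to true ids.
    Brute force over permutations: fine for a handful of clusters."""
    true_ids = sorted(set(labels_true))
    pred_ids = sorted(set(labels_pred))
    best = 0
    # assumes no more predicted clusters than true classes
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for p, t in zip(labels_pred, labels_true))
        best = max(best, hits)
    return best / len(labels_true)
```

<p>Under the best mapping, relabeling every predicted cluster consistently leaves ACC unchanged, so the score depends only on how the partitions agree.</p>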
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>Cluster results on U1 and U2 from KDD-CUP-1999 dataset.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Algorithm</th>
<th align="center">ACC</th>
<th align="center">FMI</th>
<th align="center">Arg-</th>
<th align="center">ACC</th>
<th align="center">FMI</th>
<th align="center">Arg-</th>
</tr>
<tr>
<th colspan="4" align="center">U1</th>
<th colspan="3" align="center">U2</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">DPC</td>
<td align="center">0.8343</td>
<td align="center">0.8429</td>
<td align="center">2%</td>
<td align="center">0.8613</td>
<td align="center">0.8725</td>
<td align="center">8%</td>
</tr>
<tr>
<td align="left">DPC-KNN</td>
<td align="center">0.8186</td>
<td align="center">0.8227</td>
<td align="center">2%</td>
<td align="center">0.7875</td>
<td align="center">0.8579</td>
<td align="center">1%</td>
</tr>
<tr>
<td align="left">FKNN-DPC</td>
<td align="center">0.8143</td>
<td align="center">0.8182</td>
<td align="center">4</td>
<td align="center">0.8613</td>
<td align="center">0.8725</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">DPCSA</td>
<td align="center">0.8043</td>
<td align="center">0.7708</td>
<td align="center">-</td>
<td align="center">0.8613</td>
<td align="center">0.8725</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">LF-DPC</td>
<td align="center">0.8214</td>
<td align="center">0.8249</td>
<td align="center">4</td>
<td align="center">0.6825</td>
<td align="center">0.5882</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">RMKNN-FDPC</td>
<td align="center">
<bold>0.8586</bold>
</td>
<td align="center">
<bold>0.8692</bold>
</td>
<td align="center">4</td>
<td align="center">
<bold>0.8950</bold>
</td>
<td align="center">
<bold>0.9147</bold>
</td>
<td align="center">8</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Bold values indicate that the corresponding algorithm achieved optimal performance on specific evaluation metrics (ACC, FMI) on U1 and U2 from KDD-CUP-1999 dataset.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s4-3">
<title>4.3 Experimental and analytical study on synthetic datasets</title>
<p>This section presents the visualized clustering effect of the RMKNN-FDPC algorithm and five comparison algorithms on the synthetic datasets, as shown in <xref ref-type="fig" rid="F2">Figures 2</xref>&#x2013;<xref ref-type="fig" rid="F7">7</xref> respectively. The cluster centers in each sub-figure are represented by red squares. The specific evaluation values of clustering indicators are shown in <xref ref-type="table" rid="T7">Table 7</xref>. The bold font indicates the best clustering result for each dataset.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Visualization of clustering results of different algorithms on the Jain dataset. <bold>(a)</bold> DPC, <bold>(b)</bold> DPC-KNN, <bold>(c)</bold> FKNN-DPC, <bold>(d)</bold> DPCSA, <bold>(e)</bold> LF-DPC, <bold>(f)</bold> RMKNN-FDPC.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g002.tif">
<alt-text content-type="machine-generated">Six scatter plots show different clustering methods: (a) DPC, (b) DPC-KNN, (c) FKNN-DPC, (d) DPCSA, (e) LF-DPC, (f) RMKNN-FDPC. Each plot displays data points in two colors, illustrating distinct clusters. Notable center points are labeled with numbers. The arrangement and density of clusters vary slightly across methods.</alt-text>
</graphic>
</fig>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Visualization of clustering results of different algorithms on the Spiral dataset. <bold>(a)</bold> DPC, <bold>(b)</bold> DPC-KNN, <bold>(c)</bold> FKNN-DPC, <bold>(d)</bold> DPCSA, <bold>(e)</bold> LF-DPC, <bold>(f)</bold> RMKNN-FDPC.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g003.tif">
<alt-text content-type="machine-generated">Six scatter plots labeled a) DPC, b) DPC-KNN, c) FKNN-DPC, d) DPCSA, e) LF-DPC, and f) RMKNN-FDPC display spiral patterns in yellow, blue and green. Each plot features marked data points with labels like 199, 204, 305, and 106, indicating variations in clustering techniques. Axes range from 0 to 1.</alt-text>
</graphic>
</fig>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Visualization of clustering results of different algorithms on the Pathbased dataset. <bold>(a)</bold> DPC, <bold>(b)</bold> DPC-KNN, <bold>(c)</bold> FKNN-DPC, <bold>(d)</bold> DPCSA, <bold>(e)</bold> LF-DPC, <bold>(f)</bold> RMKNN-FDPC.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g004.tif">
<alt-text content-type="machine-generated">Six scatter plots titled (a) DPC, (b) DPC-KNN, (c) FKNN-DPC, (d) DPCSA, (e) LF-DPC, and (f) RMKNN-FDPC show clustered data points. Clusters are denoted by different colors and each plot presents variations in cluster formation across different clustering methods.</alt-text>
</graphic>
</fig>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Visualization of clustering results of different algorithms on the Compound dataset. <bold>(a)</bold> DPC, <bold>(b)</bold> DPC-KNN, <bold>(c)</bold> FKNN-DPC, <bold>(d)</bold> DPCSA, <bold>(e)</bold> LF-DPC, <bold>(f)</bold> RMKNN-FDPC.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g005.tif">
<alt-text content-type="machine-generated">Six scatter plots display clustering results using different methods: DPC (a), DPC-KNN (b), FKNN-DPC (c), DPCSA (d), LF-DPC (e), RMKNN-FDPC (f). Each plot shows clusters in different colors, with center points marked by red squares and labeled numerically. Axes range from zero to one.</alt-text>
</graphic>
</fig>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Visualization of clustering results of different algorithms on the Twomoons dataset. <bold>(a)</bold> DPC, <bold>(b)</bold> DPC-KNN, <bold>(c)</bold> FKNN-DPC, <bold>(d)</bold> DPCSA, <bold>(e)</bold> LF-DPC, <bold>(f)</bold> RMKNN-FDPC.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g006.tif">
<alt-text content-type="machine-generated">Six scatter plots showing data clustering methods: (a) DPC, (b) DPC-KNN, (c) FKNN-DPC, (d) DPCSA, (e) LF-DPC, and (f) RMKNN-EDPC. Plots use yellow and green dots to represent clusters, with varying orientations and overlaps. Center points are labeled with numbers to indicate specific features or findings.</alt-text>
</graphic>
</fig>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Visualization of clustering results of different algorithms on the Ring dataset. <bold>(a)</bold> DPC, <bold>(b)</bold> DPC-KNN, <bold>(c)</bold> FKNN-DPC, <bold>(d)</bold> DPCSA, <bold>(e)</bold> LF-DPC, <bold>(f)</bold> RMKNN-FDPC.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g007.tif">
<alt-text content-type="machine-generated">Six scatter plots labeled a to f show variations of clustering algorithms. Each plot displays two concentric circular clusters with green and yellow data points. The algorithms are DPC, DPC-KNN, FKNN-DPC, DPCSA, LF-DPC, and RMKNN-FDPC. Several highlighted data points within and outside clusters have numeric labels, indicating different cluster densities and boundaries.</alt-text>
</graphic>
</fig>
<table-wrap id="T7" position="float">
<label>TABLE 7</label>
<caption>
<p>Cluster results on Synthetic datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Algorithm</th>
<th align="center">ARI</th>
<th align="center">AMI</th>
<th align="center">FMI</th>
<th align="center">Arg-</th>
<th align="center">ARI</th>
<th align="center">AMI</th>
<th align="center">FMI</th>
<th align="center">Arg-</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td colspan="5" align="center">Jain</td>
<td colspan="4" align="center">Spiral</td>
</tr>
<tr>
<td align="left">DPC</td>
<td align="center">0.6183</td>
<td align="center">0.5396</td>
<td align="center">0.8386</td>
<td align="center">1%</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">2%</td>
</tr>
<tr>
<td align="left">DPC-KNN</td>
<td align="center">0.7146</td>
<td align="center">0.6183</td>
<td align="center">0.8819</td>
<td align="center">2%</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">2%</td>
</tr>
<tr>
<td align="left">FKNN-DPC</td>
<td align="center">0.8224</td>
<td align="center">0.7092</td>
<td align="center">0.9359</td>
<td align="center">43</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">6</td>
</tr>
<tr>
<td align="left">DPCSA</td>
<td align="center">0.0442</td>
<td align="center">0.2167</td>
<td align="center">0.5924</td>
<td align="center">-</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">LF-DPC</td>
<td align="center">0.4059</td>
<td align="center">0.2936</td>
<td align="center">0.8270</td>
<td align="center">40</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">5</td>
</tr>
<tr>
<td align="left">RMKNN-FDPC</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">8</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">9</td>
</tr>
<tr>
<td colspan="5" align="center">Pathbased</td>
<td colspan="4" align="center">Compound</td>
</tr>
<tr>
<td align="left">DPC</td>
<td align="center">0.4530</td>
<td align="center">0.4997</td>
<td align="center">0.6585</td>
<td align="center">2%</td>
<td align="center">0.5989</td>
<td align="center">0.7798</td>
<td align="center">0.6963</td>
<td align="center">2%</td>
</tr>
<tr>
<td align="left">DPC-KNN</td>
<td align="center">0.4602</td>
<td align="center">0.5080</td>
<td align="center">0.6617</td>
<td align="center">2%</td>
<td align="center">0.8087</td>
<td align="center">0.7913</td>
<td align="center">0.8661</td>
<td align="center">0.50%</td>
</tr>
<tr>
<td align="left">FKNN-DPC</td>
<td align="center">0.7323</td>
<td align="center">0.7744</td>
<td align="center">0.8226</td>
<td align="center">8</td>
<td align="center">0.8479</td>
<td align="center">0.8341</td>
<td align="center">0.8941</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">DPCSA</td>
<td align="center">0.6133</td>
<td align="center">0.7073</td>
<td align="center">0.7511</td>
<td align="center">-</td>
<td align="center">0.8284</td>
<td align="center">0.8392</td>
<td align="center">0.8707</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">LF-DPC</td>
<td align="center">
<bold>0.9699</bold>
</td>
<td align="center">
<bold>0.9525</bold>
</td>
<td align="center">
<bold>0.9799</bold>
</td>
<td align="center">8</td>
<td align="center">0.8409</td>
<td align="center">0.8231</td>
<td align="center">0.8891</td>
<td align="center">10</td>
</tr>
<tr>
<td align="left">RMKNN-FDPC</td>
<td align="center">0.9109</td>
<td align="center">0.8769</td>
<td align="center">0.9406</td>
<td align="center">11</td>
<td align="center">
<bold>0.9871</bold>
</td>
<td align="center">
<bold>0.9700</bold>
</td>
<td align="center">
<bold>0.9903</bold>
</td>
<td align="center">7</td>
</tr>
<tr>
<td colspan="5" align="center">Twomoons</td>
<td colspan="4" align="center">Ring</td>
</tr>
<tr>
<td align="left">DPC</td>
<td align="center">0.5896</td>
<td align="center">0.5524</td>
<td align="center">0.8075</td>
<td align="center">2%</td>
<td align="center">0.1248</td>
<td align="center">0.2041</td>
<td align="center">0.6473</td>
<td align="center">2%</td>
</tr>
<tr>
<td align="left">DPC-KNN</td>
<td align="center">0.4921</td>
<td align="center">0.4881</td>
<td align="center">0.7604</td>
<td align="center">2%</td>
<td align="center">0.3130</td>
<td align="center">0.3602</td>
<td align="center">0.6892</td>
<td align="center">3%</td>
</tr>
<tr>
<td align="left">FKNN-DPC</td>
<td align="center">0.3862</td>
<td align="center">0.4254</td>
<td align="center">0.7103</td>
<td align="center">6</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">DPCSA</td>
<td align="center">0.2746</td>
<td align="center">0.3647</td>
<td align="center">0.6607</td>
<td align="center">-</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">LF-DPC</td>
<td align="center">0.2746</td>
<td align="center">0.3647</td>
<td align="center">0.6607</td>
<td align="center">8</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">10</td>
</tr>
<tr>
<td align="left">RMKNN-FDPC</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">6</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">
<bold>1</bold>
</td>
<td align="center">8</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Bold values indicate that the corresponding algorithm achieved optimal performance on specific evaluation metric (ARI, AMI, FMI) of the synthetic dataset.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>The Jain dataset is a typical manifold dataset with an uneven density distribution, consisting of two semi-circular arcs with different densities. <xref ref-type="fig" rid="F2">Figure 2</xref> shows the clustering results of each algorithm on this dataset. Among DPC and its derivative algorithms, DPC, DPC-KNN, and DPCSA failed to correctly identify the cluster center of the sparse cluster, mainly because their local density calculations do not fully consider the local distribution characteristics of the sample points. Although FKNN-DPC and LF-DPC can correctly find the cluster center of the sparse cluster, in the upper semi-circle some sample points of the sparse cluster are wrongly allocated to the cluster in the lower semi-circle; this is mainly due to the limitations of the allocation strategy. In contrast, our proposed method not only precisely detects the centroids of both clusters but also completely and correctly allocates the remaining points. This advantage mainly benefits from the novel local density calculation method proposed in this paper and the fuzzy allocation strategy for the remaining points based on mutual K-nearest neighbors, which together effectively solve the key problems in the identification and allocation of sparse clusters.</p>
<p>The Spiral dataset consists of three spiral-shaped clusters and is a typical dataset with non-spherical cluster distribution. As can be seen from <xref ref-type="fig" rid="F3">Figure 3</xref>, all six algorithms can achieve perfect clustering results, with only minor differences in the selection of cluster centers. This result fully demonstrates the advantages of density-based clustering algorithms in dealing with complex manifold structures. At the same time, the experimental results also further verify that the RMKNN-FDPC algorithm proposed in this paper has significant superiority in detecting non-spherical clusters of any shape, and can accurately identify complex cluster structures while maintaining high clustering accuracy.</p>
<p>The Pathbased dataset is a typical manifold dataset whose structure consists of a circular cluster enclosing two spherical clusters. Because the sample points on the left and right sides of the spherical clusters lie in close proximity to the circular cluster, misallocation easily occurs, which poses a significant challenge for most clustering algorithms. As the experimental results in <xref ref-type="fig" rid="F4">Figure 4</xref> show, DPC and its derivative algorithms can all successfully identify the three cluster centers. However, in DPC and DPC-KNN, the sample points on both sides of the circular cluster are wrongly allocated to the spherical clusters; this mainly stems from an allocation strategy that relies solely on the distance principle. Although FKNN-DPC and DPCSA improve the allocation strategy for the remaining points and successfully avoid misallocating the sample points on the left side of the circular cluster, misallocations remain on the right side. In contrast, LF-DPC and RMKNN-FDPC allocate the sample points on both sides of the circular cluster to the correct clusters more accurately, although a few boundary points are still misallocated due to the adhesion problem. The RMKNN-FDPC algorithm ranks second among all compared algorithms, behind only LF-DPC, and demonstrates its strength in handling complex manifold structures.</p>
<p>
<xref ref-type="fig" rid="F5">Figure 5</xref> shows the clustering performance of RMKNN-FDPC compared with other benchmark algorithms on the Compound dataset. Featuring an asymmetric density distribution, the Compound dataset consists of six clusters with varying morphological characteristics. For most clustering algorithms, accurately detecting the clustering structure of this dataset is quite challenging. The DPC algorithm mistakenly identified two cluster centers in the cluster in the lower left corner. The main reason for this is that the local density calculation method failed to effectively handle the uneven density distribution situation. DPC-KNN only identified one cluster center in the two clusters in the lower left corner, but mistakenly found two cluster centers in the sparse clusters on the right side. This highlights the limitations of the local density calculation method in distinguishing datasets with uneven density distribution. FKNN-DPC, DPCSA, and LF-DPC have improved performance, but they still have a common problem: they cannot correctly identify the cluster centers of sparse clusters and mistakenly found two cluster centers in the large goose-shaped cluster in the upper right corner. This may be due to the use of a fixed k-value that cannot adapt to the local distribution of data points. Unlike conventional approaches, the RMKNN-FDPC algorithm performs exceptionally well in handling this dataset. It not only accurately identifies the cluster centers of sparse clusters but also correctly allocates the data points in sparse clusters. From the evaluation indicators, the performance of the RMKNN-FDPC algorithm is significantly superior to that of other benchmark algorithms, further verifying its superiority.</p>
<p>Twomoons is a manifold dataset composed of two semi-circles, one above the other. The two clusters have the same sparsity, but for most density-based clustering algorithms the multi-peak problem easily occurs. <xref ref-type="fig" rid="F6">Figure 6</xref> presents the clustering effects of the different algorithms on the Twomoons dataset. DPC and its derivative algorithms (except the algorithm proposed in this paper) all encounter the multi-peak problem: two cluster centers are incorrectly identified in the upper semi-circle cluster, while no cluster center is identified in the lower one. Analysis shows that this phenomenon is mainly caused by local density calculation methods that fail to fully consider the local distribution characteristics of the sample points. In contrast, the RMKNN-FDPC algorithm can not only accurately identify the cluster centers of the dataset but also correctly complete the allocation of the remaining points, thereby achieving perfect clustering on this dataset. This result fully demonstrates the superiority and robustness of the RMKNN-FDPC algorithm in processing manifold datasets.</p>
<p>The Ring dataset consists of two circular clusters. As shown in <xref ref-type="fig" rid="F7">Figure 7</xref>, FKNN-DPC, DPCSA, LF-DPC, and the proposed algorithm, by improving the local density calculation and optimizing the strategy for allocating the remaining points, all achieve perfect clustering. However, the original DPC algorithm suffers from the multi-peak problem on the central circular cluster, mainly owing to the limitations of its local density calculation method. Although DPC-KNN improves the local density calculation, its strategy for allocating the remaining points still follows the original DPC method, so some sample points in the outer circular cluster receive incorrect category allocations. This comparison further highlights the importance of optimizing both the local density calculation and the allocation strategy for the remaining points when improving clustering performance.</p>
</sec>
<sec id="s4-4">
<title>4.4 Experimental and analytical studies on real datasets</title>
<p>To test the clustering performance of RMKNN-FDPC, real datasets with different scales and dimensions were selected for the experiments in this section. Compared with artificially synthesized datasets, the real datasets from UCI are more complex and typically exhibit diverse feature patterns in their density distributions. The experimental results can be used to evaluate the effectiveness of the algorithm proposed in this paper.</p>
<p>
<xref ref-type="table" rid="T8">Table 8</xref> presents the clustering results of six algorithms on six real datasets. The best clustering indicators have been marked in bold. As evidenced by the experimental data, the developed algorithm obtains the best clustering results on the four datasets (Libras, SCADI, Ecoli, WDBC), demonstrating its robustness and superiority under different data distributions. In the experiment on the Banknote dataset, the performance of RMKNN-FDPC is second only to DPCSA, but its ARI value still reaches 0.9368, significantly superior to the other four comparison algorithms, reflecting its stability in handling complex datasets. In the experiments conducted on the Dermatology dataset, the clustering indicators ARI, AMI and FMI of RMKNN-FDPC were second only to those of FKNN-DPC. Overall, RMKNN-FDPC performs well on most datasets, demonstrating its strong competitiveness as a clustering algorithm, mainly due to the relative mutual K-nearest neighbor local density and the fuzzy allocation strategy based on the mutual K-nearest neighbor of the remaining points.</p>
<table-wrap id="T8" position="float">
<label>TABLE 8</label>
<caption>
<p>Cluster results on Real datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Algorithm</th>
<th align="center">ARI</th>
<th align="center">AMI</th>
<th align="center">FMI</th>
<th align="center">Arg-</th>
<th align="center">ARI</th>
<th align="center">AMI</th>
<th align="center">FMI</th>
<th align="center">Arg-</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td colspan="5" align="center">Libras</td>
<td colspan="4" align="center">SCADI</td>
</tr>
<tr>
<td align="left">DPC</td>
<td align="center">0.2984</td>
<td align="center">0.5138</td>
<td align="center">0.3682</td>
<td align="center">0.40%</td>
<td align="center">0.5618</td>
<td align="center">0.4966</td>
<td align="center">0.6684</td>
<td align="center">2%</td>
</tr>
<tr>
<td align="left">DPC-KNN</td>
<td align="center">0.3051</td>
<td align="center">0.5471</td>
<td align="center">0.3666</td>
<td align="center">1%</td>
<td align="center">0.5627</td>
<td align="center">0.4759</td>
<td align="center">0.6690</td>
<td align="center">2%</td>
</tr>
<tr>
<td align="left">FKNN-DPC</td>
<td align="center">0.3211</td>
<td align="center">0.5367</td>
<td align="center">0.3943</td>
<td align="center">10</td>
<td align="center">0.6191</td>
<td align="center">0.5319</td>
<td align="center">0.7122</td>
<td align="center">6</td>
</tr>
<tr>
<td align="left">DPCSA</td>
<td align="center">0.2683</td>
<td align="center">0.4939</td>
<td align="center">0.3572</td>
<td align="center">-</td>
<td align="center">0.5939</td>
<td align="center">0.4988</td>
<td align="center">0.6932</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">LF-DPC</td>
<td align="center">0.3437</td>
<td align="center">0.5406</td>
<td align="center">0.3996</td>
<td align="center">5</td>
<td align="center">0.6953</td>
<td align="center">0.5872</td>
<td align="center">0.7736</td>
<td align="center">6</td>
</tr>
<tr>
<td align="left">RMKNN-FDPC</td>
<td align="center">
<bold>0.3960</bold>
</td>
<td align="center">
<bold>0.6189</bold>
</td>
<td align="center">
<bold>0.4377</bold>
</td>
<td align="center">8</td>
<td align="center">
<bold>0.7418</bold>
</td>
<td align="center">
<bold>0.6276</bold>
</td>
<td align="center">
<bold>0.8054</bold>
</td>
<td align="center">6</td>
</tr>
<tr>
<td colspan="5" align="center">Ecoli</td>
<td colspan="4" align="center">Banknote</td>
</tr>
<tr>
<td align="left">DPC</td>
<td align="center">0.7054</td>
<td align="center">0.5816</td>
<td align="center">0.7983</td>
<td align="center">1%</td>
<td align="center">0.8008</td>
<td align="center">0.7751</td>
<td align="center">0.8968</td>
<td align="center">1%</td>
</tr>
<tr>
<td align="left">DPC-KNN</td>
<td align="center">0.6913</td>
<td align="center">0.5817</td>
<td align="center">0.7939</td>
<td align="center">5%</td>
<td align="center">0.3955</td>
<td align="center">0.3575</td>
<td align="center">0.6524</td>
<td align="center">0.8%</td>
</tr>
<tr>
<td align="left">FKNN-DPC</td>
<td align="center">0.5914</td>
<td align="center">0.5596</td>
<td align="center">0.7071</td>
<td align="center">7</td>
<td align="center">0.7702</td>
<td align="center">0.7576</td>
<td align="center">0.8793</td>
<td align="center">20</td>
</tr>
<tr>
<td align="left">DPCSA</td>
<td align="center">0.4883</td>
<td align="center">0.4229</td>
<td align="center">0.6787</td>
<td align="center">-</td>
<td align="center">
<bold>0.9653</bold>
</td>
<td align="center">
<bold>0.9359</bold>
</td>
<td align="center">
<bold>0.9828</bold>
</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">LF-DPC</td>
<td align="center">0.7060</td>
<td align="center">0.5877</td>
<td align="center">0.8014</td>
<td align="center">6</td>
<td align="center">0.7702</td>
<td align="center">0.7576</td>
<td align="center">0.8793</td>
<td align="center">10</td>
</tr>
<tr>
<td align="left">RMKNN-FDPC</td>
<td align="center">
<bold>0.7159</bold>
</td>
<td align="center">
<bold>0.6029</bold>
</td>
<td align="center">
<bold>0.8046</bold>
</td>
<td align="center">8</td>
<td align="center">0.9368</td>
<td align="center">0.8806</td>
<td align="center">0.9688</td>
<td align="center">18</td>
</tr>
<tr>
<td colspan="5" align="center">WDBC</td>
<td colspan="4" align="center">Dermatology</td>
</tr>
<tr>
<td align="left">DPC</td>
<td align="center">0.4705</td>
<td align="center">0.4146</td>
<td align="center">0.7860</td>
<td align="center">0.40%</td>
<td align="center">0.6622</td>
<td align="center">0.7167</td>
<td align="center">0.7487</td>
<td align="center">0.40%</td>
</tr>
<tr>
<td align="left">DPC-KNN</td>
<td align="center">0.4552</td>
<td align="center">0.4017</td>
<td align="center">0.7813</td>
<td align="center">1%</td>
<td align="center">0.6349</td>
<td align="center">0.7731</td>
<td align="center">0.7089</td>
<td align="center">1%</td>
</tr>
<tr>
<td align="left">FKNN-DPC</td>
<td align="center">0.4452</td>
<td align="center">0.3932</td>
<td align="center">0.7783</td>
<td align="center">6</td>
<td align="center">
<bold>0.8654</bold>
</td>
<td align="center">
<bold>0.8741</bold>
</td>
<td align="center">
<bold>0.8994</bold>
</td>
<td align="center">6</td>
</tr>
<tr>
<td align="left">DPCSA</td>
<td align="center">0.3771</td>
<td align="center">0.3361</td>
<td align="center">0.7595</td>
<td align="center">-</td>
<td align="center">0.6062</td>
<td align="center">0.7451</td>
<td align="center">0.6896</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">LF-DPC</td>
<td align="center">0.4756</td>
<td align="center">0.4189</td>
<td align="center">0.7875</td>
<td align="center">4</td>
<td align="center">0.8288</td>
<td align="center">0.8345</td>
<td align="center">0.8704</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">RMKNN-FDPC</td>
<td align="center">
<bold>0.7489</bold>
</td>
<td align="center">
<bold>0.6281</bold>
</td>
<td align="center">
<bold>0.8839</bold>
</td>
<td align="center">5</td>
<td align="center">0.8452</td>
<td align="center">0.8412</td>
<td align="center">0.8813</td>
<td align="center">8</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Bold values indicate that the corresponding algorithm achieved the best performance on a specific evaluation metric (ARI, AMI, or FMI) for the given real dataset.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s4-5">
<title>4.5 Experimental and analytical studies on Olivetti Faces Dataset</title>
<p>To further verify the clustering performance of RMKNN-FDPC, two types of comparative experiments were conducted on the Olivetti Faces dataset against the original DPC algorithm and two DPC-derived algorithms (DPCSA and FKNN-DPC). The Olivetti Faces dataset was chosen because it is a classic face dataset for clustering tests and provides intuitive clustering results. It contains 400 face images organized in groups of 10, each group recording the facial features of the same subject under different lighting conditions, expressions, and facial details. To reduce the test cost, we selected 10 groups of face images for the experiment.</p>
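<p>Subsetting the dataset to the first 10 of the 40 subjects can be done with a simple label filter; `select_groups` below is our own illustrative helper (the dataset itself is obtainable, e.g., via scikit-learn's `fetch_olivetti_faces`, which is not called here):</p>

```python
import numpy as np

def select_groups(images, labels, n_groups):
    """Keep only the images whose label is among the first n_groups subjects;
    in the Olivetti Faces dataset each subject (group) has exactly 10 images."""
    labels = np.asarray(labels)
    mask = labels < n_groups
    return np.asarray(images)[mask], labels[mask]
```
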
<p>The first type of comparative experiment is the selection of cluster centers. In this group of experiments, there are 10 real cluster centers. The experimental results are shown in <xref ref-type="fig" rid="F8">Figure 8</xref>. We can observe that in the decision graph of the DPC algorithm, the <inline-formula id="inf95">
<mml:math id="m107">
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-values of the 10th and 11th cluster centers are relatively close, thus it tends to select 11 cluster centers, resulting in an extra cluster center. In the decision graph of the DPCSA algorithm, the <inline-formula id="inf96">
<mml:math id="m108">
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>-values of the 10th, 11th, 12th, and 13th cluster centers are approximately the same, making it difficult to distinguish the first 10 cluster centers effectively, so the subsequent clustering may suffer from a multi-peak problem. Both FKNN-DPC and RMKNN-FDPC can efficiently extract the 10 real cluster centers, which is mainly attributable to the improved local density calculation in both algorithms.</p>
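<p>The γ-values compared in these decision graphs follow the standard DPC recipe, γ = ρ·δ, where δ is the distance to the nearest higher-density point. A minimal sketch, assuming ρ and the pairwise distance matrix are already computed (the function names are ours):</p>

```python
import numpy as np

def delta_from_rho(dist, rho):
    """delta_i = distance from point i to the nearest point of higher density;
    the global density peak instead gets its maximum distance to any point."""
    rho = np.asarray(rho)
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = dist[i].max() if higher.size == 0 else dist[i, higher].min()
    return delta

def select_centers(rho, delta, n_centers):
    """Rank points by gamma = rho * delta and take the top-n as cluster centers."""
    gamma = np.asarray(rho) * np.asarray(delta)
    return np.argsort(gamma)[::-1][:n_centers]
```

<p>When several candidates tie in γ, as the DPCSA graph shows for the 10th to 13th candidates, this ranking cannot separate them, which is why a sharper local density ρ matters.</p>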
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Experiment on selection of clustering centers. <bold>(a)</bold> DPC, <bold>(b)</bold> DPCSA, <bold>(c)</bold> FKNN-DPC, <bold>(d)</bold> RMKNN-FDPC.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g008.tif">
<alt-text content-type="machine-generated">Four decision graphs labeled a to d display clustering results for the Olivetti Faces dataset, comparing different methods. Each graph plots decision values against the number of data points. The methods are labeled: a. DPC, b. DPCSA, c. FKNN-DPC, d. RMKNN-FDPC, with varying numbers of centers: 11, 13, 10, and 10, respectively. Data points in each graph are represented with different colors indicating clusters.</alt-text>
</graphic>
</fig>
<p>The second type of experiment is the clustering of the Olivetti Faces dataset. The experimental results are shown in <xref ref-type="fig" rid="F9">Figure 9</xref>, where each cluster center is marked with a small red square at the upper right corner of the image. We can observe that DPC suffers from the multiple-peaks problem on two groups of face images and fails to identify cluster centers for three groups; DPCSA shows multiple peaks on five groups and misses the real cluster centers on three; FKNN-DPC shows multiple peaks on three groups and misses cluster centers on three; our algorithm has only one instance of multiple peaks and fails to find the cluster center for only one group.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>The clustering results of the Olivetti Faces Dataset. <bold>(a)</bold> DPC, <bold>(b)</bold> DPCSA, <bold>(c)</bold> FKNN-DPC, <bold>(d)</bold> RMKNN-FDPC.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g009.tif">
<alt-text content-type="machine-generated">Four grids labeled a. DPC, b. DPCSA, c. FKNN-DPC, and d. RMKNN-FDPC display multiple rows of faces, each grid tinted with different hues. Red squares highlight certain images across the grids, representing cluster centers.</alt-text>
</graphic>
</fig>
<p>The clustering indicator values in <xref ref-type="table" rid="T9">Table 9</xref> show that RMKNN-FDPC outperforms the other three comparison algorithms on every indicator, further validating the effectiveness of the proposed algorithm. This is because our algorithm not only improves the local density calculation but also optimizes the remaining-point allocation method.</p>
<table-wrap id="T9" position="float">
<label>TABLE 9</label>
<caption>
<p>Clustering results on the Olivetti Faces dataset.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Algorithm</th>
<th align="center">ARI</th>
<th align="center">AMI</th>
<th align="center">FMI</th>
<th align="center">Arg-</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">DPC</td>
<td align="center">0.6329</td>
<td align="center">0.7445</td>
<td align="center">0.6741</td>
<td align="center">4%</td>
</tr>
<tr>
<td align="left">FKNN-DPC</td>
<td align="center">0.5967</td>
<td align="center">0.7477</td>
<td align="center">0.6512</td>
<td align="center">3</td>
</tr>
<tr>
<td align="left">DPCSA</td>
<td align="center">0.5610</td>
<td align="center">0.6845</td>
<td align="center">0.6002</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">RMKNN-FDPC</td>
<td align="center">
<bold>0.7972</bold>
</td>
<td align="center">
<bold>0.8691</bold>
</td>
<td align="center">
<bold>0.8198</bold>
</td>
<td align="center">4</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Bold values indicate that the corresponding algorithm achieved the best performance on a specific evaluation metric (ARI, AMI, or FMI) for the Olivetti Faces dataset.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s4-6">
<title>4.6 Parameter analysis</title>
<p>This section examines how the single parameter k of the RMKNN-FDPC algorithm influences the clustering results. Three synthetic datasets (Spiral, Compound, and Ring) and three real datasets (SCADI, Ecoli, and Dermatology) were selected for the parameter analysis, and each dataset was tested ten times with different k-values.</p>
<p>The results of the parameter analysis on the synthetic datasets are shown in <xref ref-type="fig" rid="F10">Figure 10</xref>. For the Spiral and Ring datasets, all three clustering evaluation indicators equal 1 for every tested k-value, indicating that RMKNN-FDPC is entirely unaffected by k on these data and the clustering results are very stable. On the Compound dataset, the proposed algorithm is also relatively stable, except when k is 8, 9, or 10. Therefore, on the synthetic datasets, RMKNN-FDPC is largely insensitive to the k-value, further demonstrating the stability of the proposed algorithm.</p>
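<p>Two of the three indicators tracked across these k-sweeps, ARI and FMI, are pair-counting indices. A standard-library sketch of how they are computed from a ground-truth and a predicted labeling (AMI is information-theoretic and omitted for brevity; the helper names are ours):</p>

```python
from math import comb
from collections import Counter

def _pair_counts(y_true, y_pred):
    """Numbers of same-cluster pairs in each labeling and in their overlap."""
    s_true = sum(comb(c, 2) for c in Counter(y_true).values())
    s_pred = sum(comb(c, 2) for c in Counter(y_pred).values())
    s_both = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    return s_both, s_true, s_pred, comb(len(y_true), 2)

def ari(y_true, y_pred):
    """Adjusted Rand Index: chance-corrected pairwise agreement."""
    s, st, sp, total = _pair_counts(y_true, y_pred)
    expected = st * sp / total
    return (s - expected) / ((st + sp) / 2 - expected)

def fmi(y_true, y_pred):
    """Fowlkes-Mallows Index: geometric mean of pairwise precision and recall."""
    s, st, sp, _ = _pair_counts(y_true, y_pred)
    return s / (st * sp) ** 0.5
```

<p>Both indices reach 1 for a perfect clustering regardless of how the label names are permuted, which is what the flat curves at 1 for the Spiral and Ring datasets correspond to.</p>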
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Parameter analysis on synthetic datasets. <bold>(a)</bold> Spiral dataset, <bold>(b)</bold> Compound dataset, <bold>(c)</bold> Ring dataset.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g010.tif">
<alt-text content-type="machine-generated">Three line graphs compare different parameters on three datasets: a) Spiral, b) Compound, and c) Ring. The graphs plot the index value against parameter K values. In the Spiral and Ring datasets, ARI, AMI, and FMI values remain constant. In the Compound dataset, ARI and FMI increase slightly, while AMI fluctuates more. Each dataset shows trends for ARI (red circles), AMI (blue triangles), and FMI (green squares).</alt-text>
</graphic>
</fig>
<p>The parameter analysis experiment results of the real datasets are presented in <xref ref-type="fig" rid="F11">Figure 11</xref>. We can observe that for the SCADI dataset, RMKNN-FDPC is relatively stable when <inline-formula id="inf97">
<mml:math id="m109">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is between 4 and 9, but its performance drops when <inline-formula id="inf98">
<mml:math id="m110">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is greater than 9. In the parameter analysis of the Ecoli dataset, the clustering effect of the algorithm also significantly declines when <inline-formula id="inf99">
<mml:math id="m111">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is greater than 9. For the Dermatology dataset, RMKNN-FDPC performs steadily except when <inline-formula id="inf100">
<mml:math id="m112">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> equals 9. Therefore, in the parameter analysis of the real datasets, the clustering performance of RMKNN-FDPC is affected by some k-values.</p>
<fig id="F11" position="float">
<label>FIGURE 11</label>
<caption>
<p>Parameter analysis on real datasets. <bold>(a)</bold> SCADI dataset, <bold>(b)</bold> Ecoli dataset, <bold>(c)</bold> Dermatology dataset.</p>
</caption>
<graphic xlink:href="fphy-13-1623161-g011.tif">
<alt-text content-type="machine-generated">Three line graphs compare different parameters on datasets, labeled a, b, and c. Graph a, SCADI dataset, b, Ecoli dataset, and c, Dermatology dataset, display index values against parameter K values. Each graph includes red circles for ARI, green squares for AMI, and blue triangles for FMI. The SCADI and Ecoli datasets show a decline as K increases, while the Dermatology dataset fluctuates.</alt-text>
</graphic>
</fig>
</sec>
<sec id="s4-7">
<title>4.7 Run time analysis</title>
<p>This section compares the running time of the proposed RMKNN-FDPC algorithm with those of FKNN-DPC, DPCSA, and LF-DPC. These algorithms were chosen because all three comparison algorithms likewise improve the local density calculation and optimize the remaining-point allocation strategy.</p>
<p>
<xref ref-type="table" rid="T10">Table 10</xref> presents the running times of the four algorithms on different datasets. Each reported time is the average of four runs, rounded to four decimal places and measured in seconds. The running time of DPCSA is low compared with the other three algorithms, mainly because it uses a fixed k-value to calculate local density and allocate the remaining points. The running times of FKNN-DPC and LF-DPC are broadly comparable, because the execution principles of the two algorithms are very similar. Although our algorithm runs slightly slower than the others on most datasets, on some datasets it is slightly faster than FKNN-DPC and LF-DPC. The extra cost arises because RMKNN-FDPC must compute mutual K-nearest neighbors, which adds overhead to the local density calculation and the allocation of data points. Overall, however, its running time is on the same level as that of FKNN-DPC and LF-DPC.</p>
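<p>The timing protocol just described (average of four runs, rounded to four decimal places, in seconds) can be sketched as follows; `mean_runtime` is our own illustrative helper, not code from the paper:</p>

```python
import time

def mean_runtime(fn, repeats=4):
    """Average wall-clock time of fn() over `repeats` runs, rounded to
    four decimal places, matching the protocol used for Table 10."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return round(total / repeats, 4)
```

<p>Averaging over several runs damps scheduler noise, which matters when the algorithms under comparison differ by only tens of milliseconds.</p>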
<table-wrap id="T10" position="float">
<label>TABLE 10</label>
<caption>
<p>Comparison results of running time.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">FKNN-DPC</th>
<th align="center">DPCSA</th>
<th align="center">LF-DPC</th>
<th align="center">RMKNN-FDPC</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">U1</td>
<td align="center">1.4089</td>
<td align="center">0.3096</td>
<td align="center">1.5135</td>
<td align="center">1.0380</td>
</tr>
<tr>
<td align="left">U2</td>
<td align="center">0.7829</td>
<td align="center">0.2044</td>
<td align="center">0.9597</td>
<td align="center">0.9487</td>
</tr>
<tr>
<td align="left">Jain</td>
<td align="center">0.1954</td>
<td align="center">0.1287</td>
<td align="center">0.2077</td>
<td align="center">0.2511</td>
</tr>
<tr>
<td align="left">Spiral</td>
<td align="center">0.2169</td>
<td align="center">0.1297</td>
<td align="center">0.2243</td>
<td align="center">0.2896</td>
</tr>
<tr>
<td align="left">Pathbased</td>
<td align="center">0.2094</td>
<td align="center">0.1235</td>
<td align="center">0.2202</td>
<td align="center">0.2562</td>
</tr>
<tr>
<td align="left">Compound</td>
<td align="center">0.2016</td>
<td align="center">0.1438</td>
<td align="center">0.2077</td>
<td align="center">0.2072</td>
</tr>
<tr>
<td align="left">Twomoons</td>
<td align="center">3.5474</td>
<td align="center">0.2630</td>
<td align="center">2.5862</td>
<td align="center">2.8402</td>
</tr>
<tr>
<td align="left">Ring</td>
<td align="center">1.4004</td>
<td align="center">0.1613</td>
<td align="center">0.7398</td>
<td align="center">1.9819</td>
</tr>
<tr>
<td align="left">Libras</td>
<td align="center">0.2108</td>
<td align="center">0.1445</td>
<td align="center">0.3529</td>
<td align="center">0.3567</td>
</tr>
<tr>
<td align="left">SCADI</td>
<td align="center">0.1408</td>
<td align="center">0.1252</td>
<td align="center">0.1511</td>
<td align="center">0.2072</td>
</tr>
<tr>
<td align="left">Ecoli</td>
<td align="center">0.2078</td>
<td align="center">0.1388</td>
<td align="center">0.2747</td>
<td align="center">0.2454</td>
</tr>
<tr>
<td align="left">Banknote</td>
<td align="center">4.1260</td>
<td align="center">0.3005</td>
<td align="center">4.6558</td>
<td align="center">4.7752</td>
</tr>
<tr>
<td align="left">WDBC</td>
<td align="center">0.4724</td>
<td align="center">0.1510</td>
<td align="center">0.8468</td>
<td align="center">1.0646</td>
</tr>
<tr>
<td align="left">Dermatology</td>
<td align="center">0.2349</td>
<td align="center">0.1320</td>
<td align="center">0.2465</td>
<td align="center">0.3177</td>
</tr>
<tr>
<td align="left">Olivetti Faces</td>
<td align="center">0.2244</td>
<td align="center">0.1754</td>
<td align="center">0.2287</td>
<td align="center">0.2980</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec sec-type="conclusion" id="s5">
<title>5 Conclusion</title>
<p>This paper presents an improved density peak clustering algorithm, RMKNN-FDPC, which can be applied to network intrusion detection. The algorithm addresses the limitations of traditional density peak clustering when dealing with complex datasets. By introducing the concept of the relative mutual K-nearest neighbor, RMKNN-FDPC depicts the local density distribution of data points more accurately and thereby effectively identifies random and complex data structures. Moreover, the algorithm uses mutual K-nearest neighbors to optimize the remaining-point allocation strategy, further enhancing the accuracy and robustness of the clustering results. Experimental results show that RMKNN-FDPC performs well on the KDD-CUP-1999 dataset, the synthetic datasets, the real datasets, and the Olivetti Faces dataset, especially when handling uneven density distributions, non-spherical clusters, and manifold structures, where it significantly outperforms traditional DPC and its derivative algorithms. Overall, RMKNN-FDPC not only inherits the simplicity and efficiency of the DPC algorithm but also markedly improves the clustering effect through its improved local density calculation and allocation strategy, providing an effective solution for clustering complex datasets.</p>
<p>In the future, we will further optimize RMKNN-FDPC. Since its performance currently depends on a single K-nearest-neighbor parameter, we will explore parameter-free density peak clustering methods to reduce the algorithm's sensitivity to parameter selection. We will also conduct in-depth research on the potential application of this algorithm in real-time network intrusion detection scenarios in CPSS.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.</p>
</sec>
<sec sec-type="author-contributions" id="s7">
<title>Author contributions</title>
<p>CR: Writing &#x2013; review and editing, Writing &#x2013; original draft. CW: Writing &#x2013; review and editing. YY: Writing &#x2013; review and editing. WY: Writing &#x2013; review and editing. RG: Writing &#x2013; review and editing.</p>
</sec>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>The author(s) declare that financial support was received for the research and/or publication of this article. This work is supported by the High-Level Departure Project of Yibin University (Grant Nos. 2023QH02 and 2020QH08) and the Science and Technology Project of Sichuan Province (Grant Nos. 2024YFHZ0022 and 2024ZYD0089).</p>
</sec>
<ack>
<p>The authors express their gratitude to the researchers who provided the source codes of the comparative algorithms and the experimental data for this paper.</p>
</ack>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="ai-statement" id="s10">
<title>Generative AI statement</title>
<p>The author(s) declare that no Generative AI was used in the creation of this manuscript.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>FR</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kuo</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Cyber-physical-social systems: a state-of-the-art survey, challenges and opportunities</article-title>. <source>IEEE Commun Surv Tutorials</source> (<year>2019</year>) <volume>22</volume>:<fpage>389</fpage>&#x2013;<lpage>425</lpage>. <pub-id pub-id-type="doi">10.1109/COMST.2019.2959013</pub-id>
</citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Islam</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Javeed</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Saeed</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Jolfaei</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Islam</surname>
<given-names>AN</given-names>
</name>
</person-group>. <article-title>Generative AI and cognitive computing-driven intrusion detection system in industrial CPS</article-title>. <source>Cogn Comput</source> (<year>2024</year>) <volume>16</volume>:<fpage>2611</fpage>&#x2013;<lpage>25</lpage>. <pub-id pub-id-type="doi">10.1007/s12559-024-10309-w</pub-id>
</citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>L</given-names>
</name>
</person-group>. <article-title>A new approach to intrusion detection using artificial neural networks and fuzzy clustering</article-title>. <source>Expert Syst Appl</source> (<year>2010</year>) <volume>37</volume>:<fpage>6225</fpage>&#x2013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2010.02.102</pub-id>
</citation>
</ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frey</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Dueck</surname>
<given-names>D</given-names>
</name>
</person-group>. <article-title>Clustering by passing messages between data points</article-title>. <source>Science</source> (<year>2007</year>) <volume>315</volume>:<fpage>972</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1126/science.1136800</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mu</surname>
<given-names>W</given-names>
</name>
</person-group>. <article-title>Customer segmentation using k-means clustering and the adaptive particle swarm optimization algorithm</article-title>. <source>Appl Soft Comput</source> (<year>2021</year>) <volume>113</volume>:<fpage>107924</fpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2021.107924</pub-id>
</citation>
</ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carcillo</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Le Borgne</surname>
<given-names>YA</given-names>
</name>
<name>
<surname>Caelen</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Kessaci</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Oble</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Bontempi</surname>
<given-names>G</given-names>
</name>
</person-group>. <article-title>Combining unsupervised and supervised learning in credit card fraud detection</article-title>. <source>Inf Sci Int J</source> (<year>2021</year>) <volume>557</volume>:<fpage>317</fpage>&#x2013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1016/j.ins.2019.05.042</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Den&#x153;ux</surname>
<given-names>T</given-names>
</name>
</person-group>. <article-title>Application of belief functions to medical image segmentation: a review</article-title>. <source>Inf fusion</source> (<year>2023</year>) <volume>91</volume>:<fpage>737</fpage>&#x2013;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.1016/j.inffus.2022.11.008</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Niu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Building an effective intrusion detection system using the modified density peak clustering algorithm and deep belief networks</article-title>. <source>Appl Sci</source> (<year>2019</year>) <volume>9</volume>:<fpage>238</fpage>. <pub-id pub-id-type="doi">10.3390/app9020238</pub-id>
</citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodriguez</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Laio</surname>
<given-names>A</given-names>
</name>
</person-group>. <article-title>Clustering by fast search and find of density peaks</article-title>. <source>Science</source> (<year>2014</year>) <volume>344</volume>:<fpage>1492</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1126/science.1242072</pub-id>
</citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Chao</surname>
<given-names>H</given-names>
</name>
</person-group>. <article-title>Density peak clustering based on relative density under progressive allocation strategy</article-title>. <source>Math Comput Appl</source> (<year>2022</year>) <volume>27</volume>:<fpage>84</fpage>. <pub-id pub-id-type="doi">10.3390/mca27050084</pub-id>
</citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Q</given-names>
</name>
</person-group>. <article-title>Effective density peaks clustering algorithm based on the layered k-nearest neighbors and subcluster merging</article-title>. <source>IEEE Access</source> (<year>2020</year>) <volume>8</volume>:<fpage>123449</fpage>&#x2013;<lpage>68</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.3006069</pub-id>
</citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>L</given-names>
</name>
</person-group>. <article-title>Density peaks clustering based on circular partition and grid similarity</article-title>. <source>Concurrency Comput Pract Experience</source> (<year>2019</year>) <volume>32</volume>:<fpage>e5567</fpage>. <pub-id pub-id-type="doi">10.1002/cpe.5567</pub-id>
</citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Du</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>H</given-names>
</name>
</person-group>. <article-title>Study on density peaks clustering based on k-nearest neighbors and principal component analysis</article-title>. <source>Knowledge-Based Syst</source> (<year>2016</year>) <volume>99</volume>:<fpage>135</fpage>&#x2013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2016.02.001</pub-id>
</citation>
</ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Grant</surname>
<given-names>PW</given-names>
</name>
</person-group>. <article-title>Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors</article-title>. <source>Inf Sci</source> (<year>2016</year>) <volume>354</volume>:<fpage>19</fpage>&#x2013;<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1016/j.ins.2016.03.011</pub-id>
</citation>
</ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>X</given-names>
</name>
</person-group>. <article-title>Shared-nearest-neighbor-based clustering by fast search and find of density peaks</article-title>. <source>Inf Sci</source> (<year>2018</year>) <volume>450</volume>:<fpage>200</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1016/j.ins.2018.03.031</pub-id>
</citation>
</ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Comparative density peaks clustering</article-title>. <source>Expert Syst Appl</source> (<year>2018</year>) <volume>95</volume>:<fpage>236</fpage>&#x2013;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2017.11.020</pub-id>
</citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parmar</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>A-H</given-names>
</name>
<name>
<surname>Miao</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>J</given-names>
</name>
<etal/>
</person-group>. <article-title>Redpc: a residual error-based density peak clustering algorithm</article-title>. <source>Neurocomputing</source> (<year>2019</year>) <volume>348</volume>:<fpage>82</fpage>&#x2013;<lpage>96</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2018.06.087</pub-id>
</citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>J-S</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>I</given-names>
</name>
</person-group>. <article-title>Density peaks clustering algorithm based on fuzzy and weighted shared neighbor for uneven density datasets</article-title>. <source>Pattern Recognition</source> (<year>2023</year>) <volume>139</volume>:<fpage>109406</fpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2023.109406</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>Sfknn-dpc: standard deviation weighted distance based density peak clustering algorithm</article-title>. <source>Inf Sci</source> (<year>2024</year>) <volume>653</volume>:<fpage>119788</fpage>. <pub-id pub-id-type="doi">10.1016/j.ins.2023.119788</pub-id>
</citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yan</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>Ann-dpc: density peak clustering by finding the adaptive nearest neighbors</article-title>. <source>Knowledge-Based Syst</source> (<year>2024</year>) <volume>294</volume>:<fpage>111748</fpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2024.111748</pub-id>
</citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Relative density-based clustering algorithm for identifying diverse density clusters effectively</article-title>. <source>Neural Comput Appl</source> (<year>2021</year>) <volume>33</volume>:<fpage>10141</fpage>&#x2013;<lpage>57</lpage>. <pub-id pub-id-type="doi">10.1007/s00521-021-05777-2</pub-id>
</citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hongxiang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Genxiu</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Zhaohui</surname>
<given-names>W</given-names>
</name>
</person-group>. <article-title>Density peaks clustering algorithm based on shared neighbor degree and probability assignment</article-title>. <source>J Comput Eng and Appl</source> (<year>2024</year>) <volume>60</volume>. <pub-id pub-id-type="doi">10.3778/j.issn.1002-8331.2305-0502</pub-id>
</citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hassan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>C</given-names>
</name>
<etal/>
</person-group>. <article-title>Density peak clustering algorithms: a review on the decade 2014&#x2013;2023</article-title>. <source>Expert Syst Appl</source> (<year>2024</year>) <volume>238</volume>:<fpage>121860</fpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2023.121860</pub-id>
</citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fr&#xe4;nti</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Rezaei</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Q</given-names>
</name>
</person-group>. <article-title>Centroid index: cluster level similarity measure</article-title>. <source>Pattern Recognition</source> (<year>2014</year>) <volume>47</volume>:<fpage>3034</fpage>&#x2013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2014.03.017</pub-id>
</citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vinh</surname>
<given-names>NX</given-names>
</name>
<name>
<surname>Epps</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance</article-title>. <source>J Mach Learn Res</source> (<year>2010</year>) <volume>11</volume>:<fpage>2837</fpage>&#x2013;<lpage>54</lpage>. <pub-id pub-id-type="doi">10.5555/1756006.1953024</pub-id>
</citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boudane</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Berrichi</surname>
<given-names>A</given-names>
</name>
</person-group>. <article-title>Gabriel graph-based connectivity and density for internal validity of clustering</article-title>. <source>Prog Artif Intell</source> (<year>2020</year>) <volume>9</volume>:<fpage>221</fpage>&#x2013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1007/s13748-020-00209-z</pub-id>
</citation>
</ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>B-Y</given-names>
</name>
<name>
<surname>Xin</surname>
<given-names>X-W</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>Y-Y</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Q</given-names>
</name>
</person-group>. <article-title>A novel tree structure-based multi-prototype clustering algorithm</article-title>. <source>J King Saud University-Computer Inf Sci</source> (<year>2024</year>) <volume>36</volume>:<fpage>102002</fpage>. <pub-id pub-id-type="doi">10.1016/j.jksuci.2024.102002</pub-id>
</citation>
</ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Yao</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Density peaks clustering based on weighted local density sequence and nearest neighbor assignment</article-title>. <source>IEEE Access</source> (<year>2019</year>) <volume>7</volume>:<fpage>34301</fpage>&#x2013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2019.2904254</pub-id>
</citation>
</ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Density peaks clustering based on local fair density and fuzzy k-nearest neighbors membership allocation strategy</article-title>. <source>J Intell and Fuzzy Syst</source> (<year>2022</year>) <volume>43</volume>:<fpage>21</fpage>&#x2013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.3233/JIFS-202449</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>