<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Microbiol.</journal-id>
<journal-title>Frontiers in Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">1664-302X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmicb.2023.1092143</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Gene differential co-expression analysis of male infertility patients based on statistical and machine learning methods</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Jia</surname><given-names>Xuan</given-names></name>
<uri xlink:href="https://loop.frontiersin.org/people/2073235/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>Yin</surname><given-names>ZhiXiang</given-names></name><xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Peng</surname><given-names>Yu</given-names></name>
<uri xlink:href="https://loop.frontiersin.org/people/2055189/overview"/>
</contrib>
</contrib-group>
<aff><institution>School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science</institution>, <addr-line>Shanghai</addr-line>, <country>China</country></aff>
<author-notes>
<fn id="fn0001" fn-type="edited-by">
<p>Edited by: Lihong Peng, Hunan University of Technology, China</p>
</fn>
<fn id="fn0002" fn-type="edited-by">
<p>Reviewed by: Guohua Huang, Shaoyang University, China; Zhen Tang, Shanghai Jiao Tong University, China</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: ZhiXiang Yin, &#x02709; <email>zxyin66@163.com</email></corresp>
<fn id="fn0003" fn-type="other">
<p>This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>27</day>
<month>01</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1092143</elocation-id>
<history>
<date date-type="received">
<day>07</day>
<month>11</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>01</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Jia, Yin and Peng.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Jia, Yin and Peng</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Male infertility has long been one of the main factors behind infertility in couples of childbearing age. The causes of male infertility include living habits, hereditary factors, and others. Identifying the genetic causes of male infertility can help us understand its biology and can inform genetic testing for diagnosis and the choice of clinical treatment. While current research has made significant progress on the genes that cause sperm defects in men, genetic studies of sperm content defects are still lacking. This article is based on a data set of X-chromosome gene expression in patients with azoospermia and with mild or severe oligospermia. Because patients differ in disease severity and possibly in genetic cause, classical clustering methods such as k-means and hierarchical clustering cannot effectively identify the relevant samples, since they do not cluster samples and features simultaneously. In this paper, we combine machine learning with statistical methods, including the hypergeometric distribution, Gibbs sampling, and Fisher&#x2019;s exact test, together with a gene interaction network, to cluster the gene expression data of male infertility patients; this approach has certain advantages over existing methods. The clusters were identified by differential co-expression analysis of the gene expression data, and the identified clusters were analyzed with multiple gene enrichment methods, showing different degrees of enrichment in pathways related to various enzyme activities, cancer, viruses, ATP and ADP production, and others. Because this paper is an unsupervised analysis of the genetic factors of male infertility, we also constructed a simulated data set with known clusters, which can be used to measure how well a model recovers them. 
A comparison on these data shows that the proposed model identifies the clusters better than the existing methods.</p>
</abstract>
<kwd-group>
<kwd>male infertility</kwd>
<kwd>hypergeometric distribution</kwd>
<kwd>Fisher test</kwd>
<kwd>Gibbs sampling</kwd>
<kwd>machine learning</kwd>
<kwd>gene interaction network</kwd>
<kwd>HPV</kwd>
</kwd-group>
<counts>
<fig-count count="4"/>
<table-count count="3"/>
<equation-count count="4"/>
<ref-count count="50"/>
<page-count count="8"/>
<word-count count="6511"/>
</counts>
</article-meta>
</front>
<body>
<sec id="sec1" sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>Infertility has long been a difficult problem for many couples of childbearing age, and its incidence is rising every year as the pressures of life increase. About 15% of couples of childbearing age experience infertility to varying degrees, and about 50% of these cases are attributable to male infertility (<xref ref-type="bibr" rid="ref10">Dada et al., 2003</xref>). About 7% of men in the general population suffer from some degree of infertility. Male infertility has many contributing factors, including disease, genetics, and living habits, which may act alone or in combination. Although men with this disorder cannot pass on their genetic information naturally, genetic factors can still contribute to male infertility. In approximately 15% of infertile men, a genetic defect is most likely the underlying cause of the pathology (<xref ref-type="bibr" rid="ref43">Tournaye et al., 2017</xref>; <xref ref-type="bibr" rid="ref27">Krausz and Riera-Escamilla, 2018</xref>). For example, autosomal recessive or X-linked mutations transmitted by unaffected parents can cause male infertility (<xref ref-type="bibr" rid="ref8">Chill&#x00F3;n et al., 1995</xref>; <xref ref-type="bibr" rid="ref46">Yatsenko et al., 2015</xref>). Genetic causes also play an important role in severe male infertility, such as severe oligospermia (&#x003C;5 million sperm cells per milliliter) or azoospermia (no sperm in the ejaculate; <xref ref-type="bibr" rid="ref32">Lopes et al., 2013</xref>; <xref ref-type="bibr" rid="ref27">Krausz and Riera-Escamilla, 2018</xref>). Identifying the genes responsible for male infertility is important for understanding the biology of the disease and for genetic testing in diagnosis and clinical treatment. 
Genes such as NLRP3 and BRD7 have been shown to affect male fertility (<xref ref-type="bibr" rid="ref4">Aquila et al., 2004</xref>; <xref ref-type="bibr" rid="ref44">Wang et al., 2016</xref>; <xref ref-type="bibr" rid="ref3">Antonuccio et al., 2021</xref>). With the rapid development of genetics, more than 3,000 genetic diseases have been discovered, of which about 250 occur only in men, while women are unaffected or only mildly affected. Because women have two X chromosomes, a pathogenic gene on one X chromosome can often be masked by the normal gene on the other, so they show no symptoms. Men, on the other hand, have only one X chromosome; if it carries a disease-causing gene, there is no corresponding normal gene to mask it, and the disease results. In recent years, about 521 genes have been reported to cause male infertility in different forms (<xref ref-type="bibr" rid="ref45">Xavier et al., 2021</xref>), many of which are related to the X chromosome. Examples include a mutation of the mouse androgen receptor gene, mapped to the X chromosome by linkage analysis, which leads to infertility in mice (<xref ref-type="bibr" rid="ref33">Lyon et al., 1970</xref>), and the presence of an extra X chromosome in males, giving the sex chromosome constitution XXY (<xref ref-type="bibr" rid="ref23">Jacobs and Strong, 1959</xref>).</p>
<p>Many researchers have used experimental methods to study the genetic causes of male infertility. In RNA interference or knockout experiments, a gene is prevented from being expressed normally, and cells or individuals are then observed for the target abnormality to test whether the gene is related to the disease. However, such experiments are generally time-consuming, labor-intensive, and expensive, and they can only be designed once the experimenter already has candidate genes to target. Technological advances and methodological developments in genomics are therefore critical for identifying genetic factors in male infertility.</p>
<p>In this paper, we use a data set from the Gene Expression Omnibus (GEO) that covers the expression levels of all genes on the male X chromosome. GEO is a public database that contains 659,203 gene sample records from 9,528 different platforms (<xref ref-type="bibr" rid="ref41">Ron et al., 2002</xref>). Using a variety of statistical methods and machine learning, we analyze the gene expression data of male infertility patients to identify clusters of interacting genes that may contribute, in different ways, to male infertility of various phenotypes. Common clustering algorithms such as hierarchical clustering and k-means assume that all samples share certain characteristics, so the genes in each identified cluster are assumed to behave in the same way across all samples. However, gene expression is affected by differences between individuals, between tissues of the same individual, and so on, so the measured expression differs across samples. Common clustering algorithms therefore cannot identify differentially expressed gene modules, because they cannot partition the data on the basis of only a subset of the samples. To identify differentially co-expressed modules, a biclustering algorithm can be used to screen for functionally related genes, genes involved in the same pathway, and genes affected by the same drug or pathological condition. Biclustering, first proposed by <xref ref-type="bibr" rid="ref19">Hartigan (1972)</xref>, is a two-dimensional data mining technique that simultaneously clusters the rows (representing genes) and columns (representing samples or conditions) of a gene expression matrix. 
Development continued in the following decades, and many biclustering algorithms were proposed (<xref ref-type="bibr" rid="ref7">Cheng and Church, 2000</xref>; <xref ref-type="bibr" rid="ref29">Lazzeroni and Owen, 2000</xref>; <xref ref-type="bibr" rid="ref5">Bergmann et al., 2003</xref>; <xref ref-type="bibr" rid="ref25">Kluger et al., 2003</xref>; <xref ref-type="bibr" rid="ref9">Chiu et al., 2004</xref>; <xref ref-type="bibr" rid="ref40">Preli&#x0107; et al., 2006</xref>; <xref ref-type="bibr" rid="ref12">Dhollander et al., 2007</xref>; <xref ref-type="bibr" rid="ref17">Gu and Liu, 2008</xref>; <xref ref-type="bibr" rid="ref30">Li et al., 2009</xref>; <xref ref-type="bibr" rid="ref22">Hochreiter et al., 2010</xref>; <xref ref-type="bibr" rid="ref34">Madeira et al., 2010</xref>; <xref ref-type="bibr" rid="ref35">Medina et al., 2010</xref>; <xref ref-type="bibr" rid="ref6">Chen et al., 2011</xref>; <xref ref-type="bibr" rid="ref11">De Smet and Marchal, 2011</xref>; <xref ref-type="bibr" rid="ref49">Zhao et al., 2011</xref>; <xref ref-type="bibr" rid="ref50">Zhou et al., 2012</xref>; <xref ref-type="bibr" rid="ref16">Goncalves and Madeira, 2014</xref>; <xref ref-type="bibr" rid="ref20">Henriques and Madeira, 2016a</xref>,<xref ref-type="bibr" rid="ref21">b</xref>; <xref ref-type="bibr" rid="ref2">Alzahrani et al., 2017</xref>; <xref ref-type="bibr" rid="ref18">Guo et al., 2021</xref>). Of these, BCPlaid (<xref ref-type="bibr" rid="ref29">Lazzeroni and Owen, 2000</xref>), QUBIC (<xref ref-type="bibr" rid="ref30">Li et al., 2009</xref>), C&#x0026;C (<xref ref-type="bibr" rid="ref7">Cheng and Church, 2000</xref>), and FABIA (<xref ref-type="bibr" rid="ref22">Hochreiter et al., 2010</xref>) are among the more popular biclustering algorithms. 
Clustering based on machine learning and deep learning has also been applied to genomics data, for example to identify cell subpopulations and for genomic analysis (<xref ref-type="bibr" rid="ref24">Jiang et al., 2020</xref>; <xref ref-type="bibr" rid="ref28">Lazareva et al., 2020</xref>; <xref ref-type="bibr" rid="ref36">Peng et al., 2020</xref>; <xref ref-type="bibr" rid="ref14">Gerniers et al., 2021</xref>; <xref ref-type="bibr" rid="ref38">Peng et al., 2021</xref>; <xref ref-type="bibr" rid="ref47">Yi et al., 2021</xref>; <xref ref-type="bibr" rid="ref37">Peng et al., 2022</xref>; <xref ref-type="bibr" rid="ref48">Zhai et al., 2022</xref>). One such study analyzed bronchoalveolar immune cells in COVID-19 patients on the basis of genetic data (<xref ref-type="bibr" rid="ref31">Liao et al., 2020</xref>). By processing the GSE37948 data set (<xref ref-type="bibr" rid="ref26">Krausz et al., 2012</xref>), which contains the expression levels of genes on the X chromosome in testicular tissue from patients with varying degrees of infertility, we identified 19 distinct biclusters. These biclusters are enriched in multiple pathways, and there are functional and organizational correlations between the enriched pathways. The performance of the method is verified using a simulated data set that resembles real gene expression data.</p>
</sec>
<sec id="sec2" sec-type="materials|methods">
<label>2.</label>
<title>Materials and methods</title>
<sec id="sec3">
<label>2.1.</label>
<title>Methods</title>
<p>Rank-rank hypergeometric overlap (RRHO; <xref ref-type="bibr" rid="ref39">Plaisier et al., 2010</xref>) is an unsupervised method that ranks the gene expression profiles of two samples from different categories and uses the hypergeometric distribution to iteratively calculate the <italic>p</italic>-values of all rank-threshold combinations, in order to find the optimal set of overlapping genes. In this paper, the expression data of two different genes across samples are passed to the RRHO method to find the optimal overlapping sample set, and the signal-to-noise ratio (SNR) of the resulting sample and gene set is calculated to determine whether the cluster is differentially expressed. For a single gene in the sample set, the SNR is defined as:</p>
<disp-formula id="E1">
<mml:math id="M1">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>&#x03BC;</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>&#x03BC;</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo stretchy="true">&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>&#x03C3;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">g</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>&#x03C3;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">g</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo stretchy="true">&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p><inline-formula>
<mml:math id="M2">
<mml:mrow>
<mml:msub>
<mml:mi>&#x03BC;</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math id="M3">
<mml:mrow>
<mml:msub>
<mml:mi>&#x03BC;</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo stretchy="true">&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the mean values in the delimited sample set <inline-formula>
<mml:math id="M4">
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:math>
</inline-formula> and in the data outside the sample set, respectively. <inline-formula>
<mml:math id="M5">
<mml:mrow>
<mml:msub>
<mml:mi>&#x03C3;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">g</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math id="M6">
<mml:mrow>
<mml:msub>
<mml:mi>&#x03C3;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">g</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo stretchy="true">&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the standard deviations of the data in the corresponding sets. The overall SNR of a cluster is the average of the SNRs of the individual genes in the sample set.</p>
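<p>As an illustration of the SNR definition above, the following minimal Python sketch computes the per-gene SNR and the cluster-level average. It is not the authors&#x2019; implementation: the function names gene_snr and cluster_snr, the Boolean mask in_set, and the use of the population standard deviation are assumptions made for this example.</p>
<preformat><![CDATA[
```python
from statistics import mean, pstdev


def gene_snr(values, in_set):
    """SNR of one gene: difference of means over sum of standard deviations.

    values -- expression of the gene in every sample
    in_set -- True where the sample belongs to the delimited set P'
    Assumption: population standard deviation; the text does not specify.
    """
    inside = [v for v, flag in zip(values, in_set) if flag]
    outside = [v for v, flag in zip(values, in_set) if not flag]
    # Note: raises ZeroDivisionError if both groups are constant.
    return (mean(inside) - mean(outside)) / (pstdev(inside) + pstdev(outside))


def cluster_snr(matrix, in_set):
    """Average the per-gene SNR over all genes (rows) of the cluster."""
    return mean(gene_snr(row, in_set) for row in matrix)
```
]]></preformat>
<p>For example, a gene with values (4, 6) inside the sample set and (1, 3) outside it has means 5 and 2 and standard deviations 1 and 1 under this assumption, so its SNR is 3/2.</p>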
<p>If the SNR of an identified sample and gene set is greater than the specified threshold, the set is retained, and the corresponding genes are considered to be related in the expression data. A gene that cannot form such a relationship with any other gene in the data is discarded in subsequent processing, which reduces the dimensionality of the gene data. However, because, according to <xref ref-type="bibr" rid="ref15">Ghiassian et al. (2015)</xref>, genes known to be associated with a disease form a compact but not tightly connected subgraph of the protein-protein interaction (PPI) network, this paper does not loop over all the genes in the data set but instead adds a gene interaction network to the data processing. The STRING database contains known and predicted gene-protein interaction networks. We query it for the interaction network of the genes in the data set and discard the isolated genes. The genes present in the network are combined in pairs, and hierarchical clustering is used for preliminary clustering to help determine the default SNR threshold: the SNR values of all gene and sample sets constructed by the preliminary clustering are averaged, and half of this mean is used as the threshold. If the SNR of a gene and sample set constructed by the RRHO method is greater than this threshold, the gene pair is retained and a new bicluster is obtained; otherwise, the corresponding edge in the gene network is discarded. Because the number of genes is large, only part of the gene network is shown in <xref rid="fig1" ref-type="fig">Figure 1</xref>. <xref rid="fig2" ref-type="fig">Figure 2</xref> briefly depicts the model&#x2019;s approach. 
The interaction data for all genes are presented in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table 1</xref>.</p>
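<p>The rank-overlap search applied to each connected gene pair can be sketched as follows. This is a simplified, one-sided illustration of the hypergeometric overlap idea and not the published RRHO implementation; the function names hypergeom_sf and best_overlap and the exhaustive scan over all rank cut-offs are assumptions made for this example.</p>
<preformat><![CDATA[
```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) when n items are drawn from N, of which K are marked."""
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


def best_overlap(rank_a, rank_b):
    """Scan all rank cut-offs (i, j) and return the most significant overlap.

    rank_a and rank_b are assumed to list the same sample IDs, ordered by the
    expression of gene a and of gene b respectively (highest first).
    Returns (p_value, i, j, overlapping_samples).
    """
    N = len(rank_a)
    best = (1.0, 0, 0, set())
    for i in range(1, N):
        top_a = set(rank_a[:i])
        for j in range(1, N):
            overlap = top_a & set(rank_b[:j])
            p = hypergeom_sf(len(overlap), N, i, j)
            if p < best[0]:
                best = (p, i, j, overlap)
    return best
```
]]></preformat>
<p>The overlapping sample set returned for a gene pair would then be scored with the SNR and compared against the threshold described above.</p>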
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Interaction network of some genes in GSE37948.</p>
</caption>
<graphic xlink:href="fmicb-14-1092143-g001.tif"/>
</fig>
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>Overview of the model workflow.</p>
</caption>
<graphic xlink:href="fmicb-14-1092143-g002.tif"/>
</fig>
<p>Because the RRHO step yields only gene pairs and their corresponding sample sets, Gibbs sampling (<xref ref-type="bibr" rid="ref42">Sheng et al., 2003</xref>) is applied to the output of the first step, under assumptions about the distribution of the gene sample data, to merge gene pairs into clusters. The statistical assumptions for sampling are as follows:</p>
<disp-formula id="E2">
<mml:math id="M7">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">ji</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>&#x03B8;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">ic</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:msub>
<mml:mo>&#x223C;</mml:mo>
<mml:mi mathvariant="normal">Bernoulli</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">ji</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>&#x03B8;</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">is</mml:mi>
</mml:mrow>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="E3">
<mml:math id="M8">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mo>&#x223C;</mml:mo>
<mml:mi mathvariant="normal">Categorical</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:mi mathvariant="normal">m</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:msub>
<mml:mi>&#x03B8;</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">ic</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x223C;</mml:mo>
<mml:mi mathvariant="normal">Beta</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x03B1;</mml:mi>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mo>&#x223C;</mml:mo>
<mml:mi mathvariant="normal">Dirichlet</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x03B2;</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi mathvariant="normal">K</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p><inline-formula>
<mml:math id="M9">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> represents the gene and <inline-formula>
<mml:math id="M10">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula> represents the sample. If the association exists after step 1, <inline-formula>
<mml:math id="M11">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">ji</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is set to 1; otherwise it is set to 0. <inline-formula>
<mml:math id="M12">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> indicates the module to which gene edge <inline-formula>
<mml:math id="M13">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula> belongs; it is updated in Gibbs sampling using the following transition probability:</p>
<disp-formula id="E4">
<mml:math id="M14">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">k</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi mathvariant="normal">X,</mml:mi>
<mml:mspace width="0.25em"/>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi mathvariant="normal">; </mml:mi>
<mml:mi>&#x03B1;</mml:mi>
<mml:mi mathvariant="normal">,</mml:mi>
<mml:mi>&#x03B2;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x221D;</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mo>&#x00D7;</mml:mo>
<mml:msub>
<mml:mo>&#x220F;</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">ji</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x03B1;</mml:mi>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi mathvariant="normal">l</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">l</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="normal">k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">l</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">li</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x03B1;</mml:mi>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">l</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">l</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="normal">k,l</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mspace width="0.25em"/>
<mml:mo>&#x00D7;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">l</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">l</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="normal">k,l</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>&#x03B2;</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi mathvariant="normal">K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mi>&#x03B2;</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:msub>
<mml:mo>&#x220F;</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">ji</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x03B1;</mml:mi>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi mathvariant="normal">l</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">l</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="normal">k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">l</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">li</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x03B1;</mml:mi>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">l</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">l</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="normal">k,l</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>Here, K is set to the number of clusters retained after the RRHO step. The statistical model behind Gibbs sampling assumes prior distributions that involve the parameters &#x03B1; and &#x03B2;; because gene expression data lack an established statistical basis for choosing them, &#x03B1; and &#x03B2; are treated as hyperparameters. At the end of data processing, Fisher&#x2019;s exact test is applied to the resulting sets: the sample data in each pair of clusters are used to calculate a <italic>p</italic>-value, and a preset threshold determines whether the two sets differ significantly. The genes of two sets without a significant difference are merged, and the sample data of the merged genes are extracted and passed to hierarchical clustering with the number of clusters set to 2. Because a gene that is up-regulated in half of the samples is, equivalently, differentially expressed in the other half, we require the samples in a cluster to make up less than 55% of all samples in the data set. In addition, to ensure that the cluster is differentially expressed relative to the whole data set, the SNR of the newly formed cluster must be greater than the threshold; otherwise, the sets are not merged. All identified clusters are merged iteratively until no new clusters are generated.</p>
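<p>The conditional probability used in the Gibbs update can be illustrated with a short Python sketch. It assumes the standard collapsed form for a Bernoulli mixture with symmetric Beta(&#x03B1;/2, &#x03B1;/2) and Dirichlet(&#x03B2;/K) priors, in which the product runs over entries equal to 1 and over entries equal to 0; the function name conditional_probs and the list-of-lists data layout are hypothetical and are not taken from the authors&#x2019; code.</p>
<preformat><![CDATA[
```python
def conditional_probs(X, s, j, K, alpha, beta):
    """Normalized P(s_j = k | X, s_-j) for k = 0..K-1.

    X -- binary matrix as a list of rows; X[j][i] is x_ji
    s -- current module label of every row
    Assumption: collapsed Gibbs update for a Bernoulli mixture with
    Beta(alpha/2, alpha/2) and Dirichlet(beta/K) priors.
    """
    n = len(X)
    weights = []
    for k in range(K):
        # Rows currently assigned to module k, excluding row j itself.
        members = [l for l in range(n) if l != j and s[l] == k]
        n_k = len(members)
        w = (n_k + beta / K) / (n - 1 + beta)
        for i, x_ji in enumerate(X[j]):
            ones = sum(X[l][i] for l in members)
            # Count the members that agree with x_ji in column i.
            count = ones if x_ji == 1 else n_k - ones
            w *= (alpha / 2 + count) / (alpha + n_k)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]
```
]]></preformat>
<p>In a full sampler, a new label for row j would be drawn from these probabilities, and the update would be repeated over all rows for a number of sweeps.</p>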
</sec>
<sec id="sec4">
<label>2.2.</label>
<title>Datasets</title>
<sec id="sec5">
<label>2.2.1.</label>
<title>Male infertility gene expression data</title>
<p>First, the gene expression data were obtained from a microarray gene expression database. In this paper, the GSE37948 (<xref ref-type="bibr" rid="ref26">Krausz et al., 2012</xref>) gene expression data set was selected. This data set contains gene expression data of 96 patients with different degrees of infertility, including 74 cases of azoospermia, 6 cases of mild oligozoospermia, and 16 cases of severe oligozoospermia. After excluding patients with known causes of impaired spermatogenesis, gene expression was profiled using testicular tissue from 47 men. The k-nearest neighbor (KNN) algorithm was used to impute missing values in the gene expression profiles, and the data for each gene were normalized to remove the effect of different units. The GSE37948 data set contains expression data for 1,855 genes from 200 male sperm samples. The identified genes cover the entire X chromosome. The gene network corresponding to the GSE37948 data set was extracted from the STRING database. Specific gene interaction data are shown in the <xref ref-type="supplementary-material" rid="SM1">Supplementary Table</xref>: Interrelation data among genes.</p>
</sec>
<sec id="sec6">
<label>2.2.2.</label>
<title>Synthetic datasets</title>
<p>Because the method in this paper is unsupervised and there are no reference results for the study of male infertility-related genes, we constructed simulation data similar in structure to GSE37948. The GSE37948 data set has a total of 1,855 genes and 200 samples, but the sizes of the biclusters it contains are unknown. To this end, simulated data containing 20 known differentially expressed modules were constructed with gene and sample dimensions of 2,000 and 200, respectively. Based on previous research (<xref ref-type="bibr" rid="ref40">Preli&#x0107; et al., 2006</xref>; <xref ref-type="bibr" rid="ref13">Eren et al., 2013</xref>), we generated the simulation data according to the following rules: the numbers of genes and samples in a module are drawn from (100, 50, 20, 10, 5) and (100, 50, 20, 10), respectively; the data within a cluster are sampled from N(2, 1); the rest of the data are sampled from N(0, 1); and different clusters are allowed to intersect. The simulated data are used to determine the hyperparameters, and statistics are used to evaluate the clustering results. Because the gene interaction network used in processing the gene data is a graph with a certain degree of connectivity, we construct a corresponding connected network graph for the clusters defined in the simulated data. Studies have shown that, in the gene interaction network, disease-related genes can form compact connected subnetworks (<xref ref-type="bibr" rid="ref15">Ghiassian et al., 2015</xref>), so we use the method proposed in <xref ref-type="bibr" rid="ref1">Bollob&#x00E1;s et al. (2003)</xref> to construct the network, which yields a reasonable gene network consistent with the cluster modules in the expression data.</p>
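<p>The generation rules for the expression matrix can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' generator: the function name make_synthetic, the seed, the uniform random placement of modules, and the choice of one module per combination of gene count and sample count (5 x 4 = 20 modules) are all hypothetical. The construction of the accompanying network is not covered.</p>
<preformat>
```python
import random


def make_synthetic(n_genes=2000, n_samples=200, seed=0):
    """Build a genes-by-samples matrix with 20 implanted modules."""
    rng = random.Random(seed)
    gene_sizes = (100, 50, 20, 10, 5)
    sample_sizes = (100, 50, 20, 10)
    # Background: every entry is drawn from N(0, 1).
    data = [
        [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        for _ in range(n_genes)
    ]
    modules = []
    for g in gene_sizes:
        for s in sample_sizes:
            # Independent draws, so different modules may intersect.
            genes = rng.sample(range(n_genes), g)
            samples = rng.sample(range(n_samples), s)
            for i in genes:
                for j in samples:
                    # Entries inside a module are drawn from N(2, 1);
                    # overlapping cells are simply redrawn.
                    data[i][j] = rng.gauss(2.0, 1.0)
            modules.append((sorted(genes), sorted(samples)))
    return data, modules
```
</preformat>
<p>The returned list of modules records the gene and sample indices of each implanted module, which is what makes a later comparison against identified clusters possible.</p>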
</sec>
</sec>
</sec>
<sec id="sec7" sec-type="results">
<label>3.</label>
<title>Results</title>
<sec id="sec8">
<label>3.1.</label>
<title>Experimental results of male infertility-related gene expression data</title>
<p>We processed the GSE37948 data set, which contains expression levels of X-chromosome genes in testicular tissue from patients with azoospermia and with mild and severe oligozoospermia, and identified 19 distinct biclusters. Multiple pathways are enriched, and the enriched pathways show functional and organizational correlations. For the hypergeometric test in the RRHO method, the significance level was selected from the set (0.01, 0.05); the parameters &#x03B1; and &#x03B2;/k of the statistical hypothesis in Gibbs sampling were selected from the sets (5.0, 1.0, 0.5, 0.1) and (100, 1.0, 0.01), respectively. According to the recognition performance of the model on the simulated data set, the final parameters <italic>p</italic>&#x2009;=&#x2009;0.01, <italic>&#x03B1;</italic>&#x2009;=&#x2009;0.5, and &#x03B2;/k&#x2009;=&#x2009;1.0 were chosen. The processed GSE37948 data were passed to the model to identify gene-sample modules, and the results were analyzed using a variety of annotation categories, including: Disease (OMIM_DISEASE, UP_KW_DISEASE), Functional_Annotations (COG_ONTOLOGY, UP_KW_BIOLOGICAL_PROCESS, UP_KW_CELLULAR_COMPONENT, UP_KW_MOLECULAR_FUNCTION, UP_KW_PTM, UP_SEQ_FEATURE), Gene_Ontology (GOTERM_BP, GOTERM_CC, GOTERM_MF), Interactions (UP_KW_LIGAND), Pathways (KEGG_PATHWAY, BBID, BIOCARTA), and Protein_Domains (INTERPRO, PIR_SUPERFAMILY, SMART, UP_KW_DOMAIN).</p>
<p>According to the enrichment analysis results for cluster 1 in <xref rid="tab1" ref-type="table">Table 1</xref>, GO and KEGG analysis yielded four significantly enriched pathways. Two of them are associated with autism spectrum proteins; the autism spectrum includes different phenotypic manifestations such as classic autism, Asperger&#x2019;s syndrome, childhood disintegrative disorder, Rett&#x2019;s syndrome, and pervasive developmental disorder not otherwise specified. The cluster was also significantly enriched in the axon, the site of neurotransmitter storage and release, and in the cell surface outside the cytoplasmic membrane, which refers to gene products attached to the plasma membrane or cell wall.</p>
<table-wrap position="float" id="tab1">
<label>Table 1</label>
<caption>
<p>Clustering results identified by the statistical method proposed in this paper on the GSE37948 male infertility data set.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">ID</th>
<th align="center" valign="top">avgSNR</th>
<th align="center" valign="top">Number of genes</th>
<th align="center" valign="top">Number of samples</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">1</td>
<td align="center" valign="top">0.700870148</td>
<td align="center" valign="top">13</td>
<td align="center" valign="top">56</td>
</tr>
<tr>
<td align="left" valign="top">2</td>
<td align="center" valign="top">0.816555484</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">110</td>
</tr>
<tr>
<td align="left" valign="top">3</td>
<td align="center" valign="top">0.775713429</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">88</td>
</tr>
<tr>
<td align="left" valign="top">4</td>
<td align="center" valign="top">0.745638081</td>
<td align="center" valign="top">8</td>
<td align="center" valign="top">101</td>
</tr>
<tr>
<td align="left" valign="top">5</td>
<td align="center" valign="top">0.743384851</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">72</td>
</tr>
<tr>
<td align="left" valign="top">6</td>
<td align="center" valign="top">0.743381552</td>
<td align="center" valign="top">4</td>
<td align="center" valign="top">71</td>
</tr>
<tr>
<td align="left" valign="top">7</td>
<td align="center" valign="top">0.730139247</td>
<td align="center" valign="top">351</td>
<td align="center" valign="top">20</td>
</tr>
<tr>
<td align="left" valign="top">8</td>
<td align="center" valign="top">0.718222619</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top">110</td>
</tr>
<tr>
<td align="left" valign="top">9</td>
<td align="center" valign="top">0.716803164</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">91</td>
</tr>
<tr>
<td align="left" valign="top">10</td>
<td align="center" valign="top">0.70627255</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">101</td>
</tr>
<tr>
<td align="left" valign="top">11</td>
<td align="center" valign="top">0.703721749</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">68</td>
</tr>
<tr>
<td align="left" valign="top">12</td>
<td align="center" valign="top">1.15234204</td>
<td align="center" valign="top">482</td>
<td align="center" valign="top">12</td>
</tr>
<tr>
<td align="left" valign="top">13</td>
<td align="center" valign="top">0.678448517</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top">95</td>
</tr>
<tr>
<td align="left" valign="top">14</td>
<td align="center" valign="top">0.678084094</td>
<td align="center" valign="top">11</td>
<td align="center" valign="top">103</td>
</tr>
<tr>
<td align="left" valign="top">15</td>
<td align="center" valign="top">0.67773126</td>
<td align="center" valign="top">25</td>
<td align="center" valign="top">110</td>
</tr>
<tr>
<td align="left" valign="top">16</td>
<td align="center" valign="top">0.674885829</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">38</td>
</tr>
<tr>
<td align="left" valign="top">17</td>
<td align="center" valign="top">0.671869245</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top">92</td>
</tr>
<tr>
<td align="left" valign="top">18</td>
<td align="center" valign="top">0.668664873</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">84</td>
</tr>
<tr>
<td align="left" valign="top">19</td>
<td align="center" valign="top">0.667155842</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">49</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>According to the enrichment analysis results for cluster 2 in <xref rid="tab1" ref-type="table">Table 1</xref>, this cluster is enriched in the chemical synaptic transmission, cell membrane, and plasma membrane pathways. Chemical synaptic transmission comprises the release of neurotransmitter molecules from presynaptic vesicles across chemical synapses, the subsequent postsynaptic activation of neurotransmitter receptors on target cells (neurons, muscles, or secretory cells), and the effect of this activation on the postsynaptic membrane potential and the ionic composition of the postsynaptic cytoplasm. This process includes spontaneous and evoked release of neurotransmitters and all parts of synaptic vesicle exocytosis. Evoked transmission begins when the action potential reaches the presynapse.</p>
<p>According to the enrichment analysis results for cluster 3 in <xref rid="tab1" ref-type="table">Table 1</xref>, the SMART, INTERPRO, and UP_KW_DOMAIN annotations all showed enrichment in the SH3 domain. The SH3 (src homology-3) domain is a small protein module of approximately 50 amino acid residues. SH3 domains are present in a variety of intracellular or membrane-associated proteins, for example, in a variety of proteins with enzymatic activity, in adaptor proteins, and in cytoskeletal proteins such as fodrin and the yeast actin-binding protein ABP-1. The SH3 domain has a characteristic fold, which consists of five or six &#x03B2;-strands arranged in two tightly packed antiparallel &#x03B2;-sheets. The linker region may contain short helices. The surface of the SH3 domain bears a flat, hydrophobic ligand-binding pocket consisting of three shallow grooves defined by conserved aromatic residues, in which the ligand adopts an extended left-handed helical arrangement. Ligands bind with low affinity, but this can be enhanced by multiple interactions. The region bound by the SH3 domain is proline-rich in all cases and contains PXXP as a core conserved binding motif. The function of SH3 domains is unclear, but they may mediate many different processes, such as increasing the local concentration of proteins, changing their subcellular location, and mediating the assembly of large multiprotein complexes.</p>
<p>Through enrichment analysis, we found that the gene sets of the identified clusters were enriched in a variety of enzyme activities, reactions that generate ADP and ATP, replication and translation of the genetic material DNA and RNA, neurotransmitter transmission, and other pathways. Multiple clusters were enriched in positive regulation of transcription by RNA polymerase II and other transcriptional regulatory pathways, protein tyrosine-related enzyme pathways, neural synapses, neurotransmitter transmission, and ATP and ADP synthesis. Two clusters of gene sets were enriched in the human papillomavirus infection pathway. One cluster was significantly enriched in calcium ion-related pathways, and another was significantly enriched in the inositol phosphate metabolism pathway. SH3 (src homology-3) domains, the proteoglycans in cancer pathway, the PDZ domain, the Hippo signaling pathway, the tight junction pathway, the PB1 domain, and other pathways were also enriched in some clusters. Each cluster enriched in the pathways described above is also enriched in other pathways with different functions. Interactions among multiple genes enriched in different pathways may therefore contribute to differences in sperm motility.</p>
<p>To determine whether the enrichment is significant, the <italic>p</italic>-values of the enrichment results are corrected using the Benjamini method and the Bonferroni method. The numbers of differentially expressed genes and samples identified in each cluster are shown in <xref rid="tab1" ref-type="table">Table 1</xref>. Specific gene and sample data are included in the <xref ref-type="supplementary-material" rid="SM1">Supplementary Table</xref>: The result of identification. <xref rid="tab2" ref-type="table">Table 2</xref> lists the cluster-related enrichment results, <xref rid="fig3" ref-type="fig">Figure 3</xref> visualizes them, and the enrichment analysis results of all clusters are shown in <xref ref-type="supplementary-material" rid="SM2">Supplementary Data</xref>.</p>
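<p>The two corrections can be sketched as follows using their textbook definitions. This is an illustration only: it assumes that the Benjamini method refers to the Benjamini-Hochberg procedure, the function names are hypothetical, and annotation tools may implement variants of either correction, so the output need not reproduce the values reported in this article.</p>
<preformat>
```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```
</preformat>
<p>A term is then called significantly enriched when its adjusted value falls below the chosen significance level.</p>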
<table-wrap position="float" id="tab2">
<label>Table 2</label>
<caption>
<p>Enrichment results of genes in a cluster identified by our method in the male infertility data set.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Category</th>
<th align="left" valign="top">Term</th>
<th align="left" valign="top">Genes</th>
<th align="left" valign="top">Bonferroni</th>
<th align="left" valign="top">Benjamini</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">GOTERM_CC_DIRECT</td>
<td align="left" valign="top">GO:0030424&#x2009;~&#x2009;axon</td>
<td align="left" valign="top">CNTNAP2, CNTN5, IL1RAPL1, DMD, SCN1A</td>
<td align="left" valign="top">0.002330526</td>
<td align="left" valign="top">0.002333212</td>
</tr>
<tr>
<td align="left" valign="top">GOTERM_CC_DIRECT</td>
<td align="left" valign="top">GO:0009986&#x2009;~&#x2009;cell surface</td>
<td align="left" valign="top">LGALS3, CNTNAP2, NLGN4X, IL1RAPL1, DMD</td>
<td align="left" valign="top">0.021009445</td>
<td align="left" valign="top">0.010615268</td>
</tr>
<tr>
<td align="left" valign="top">UP_KW_DISEASE</td>
<td align="left" valign="top">KW-1269&#x2009;~&#x2009;Autism</td>
<td align="left" valign="top">CNTNAP2, NLGN4X, SCN1A</td>
<td align="left" valign="top">0.002854718</td>
<td align="left" valign="top">0.002858289</td>
</tr>
<tr>
<td align="left" valign="top">UP_KW_DISEASE</td>
<td align="left" valign="top">KW-1268&#x2009;~&#x2009;Autism spectrum disorder</td>
<td align="left" valign="top">CNTNAP2, NLGN4X, SCN1A</td>
<td align="left" valign="top">0.014578999</td>
<td align="left" valign="top">0.007336422</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Only pathways that remained significantly enriched after Bonferroni and Benjamini correction are listed, together with the corrected values. The cluster shown is cluster 1 in <xref rid="tab1" ref-type="table">Table 1</xref>.</p>
</table-wrap-foot>
</table-wrap>
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>Enrichment circle plot of the genes in a cluster identified by our method in the male infertility data set, visualizing the relationship between genes and enriched pathways. The cluster shown is cluster 1 in <xref rid="tab1" ref-type="table">Table 1</xref>.</p>
</caption>
<graphic xlink:href="fmicb-14-1092143-g003.tif"/>
</fig>
</sec>
<sec id="sec9">
<label>3.2.</label>
<title>Simulation data experimental results</title>
<p>Because the method in this paper is unsupervised, there is no ground truth for the quantitative study of male sperm motility. To better determine the values of the hyperparameters in the statistical method, simulated data similar to gene expression profile data sets were constructed and analyzed with the proposed method. The clusters in the simulated data are known and can therefore be used to evaluate model performance. We compared the identification results on the simulated data set with those of similar methods. The results suggest that the proposed model may have higher accuracy in the analysis of genetic factors in the quantitative study of male sperm: its Jaccard coefficients reach 0.8715, whereas those of the compared methods do not exceed 0.0214 (<xref rid="tab3" ref-type="table">Table 3</xref>).</p>
<table-wrap position="float" id="tab3">
<label>Table 3</label>
<caption>
<p>Jaccard similarity coefficient between the clusters identified by the three methods on different simulated data sets and the true clusters. Each simulated data set is labeled as (number of samples, number of genes).</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Simulation data</th>
<th align="center" valign="top">BCPlaid</th>
<th align="center" valign="top">C&#x0026;C</th>
<th align="center" valign="top">COEXSML (this work)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">(10, 5)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0346</td>
</tr>
<tr>
<td align="left" valign="top">(10, 10)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0002</td>
<td align="center" valign="top">0.1089</td>
</tr>
<tr>
<td align="left" valign="top">(10, 20)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0005</td>
<td align="center" valign="top">0.0552</td>
</tr>
<tr>
<td align="left" valign="top">(10, 50)</td>
<td align="center" valign="top">0.0003</td>
<td align="center" valign="top">0.0012</td>
<td align="center" valign="top">0.1150</td>
</tr>
<tr>
<td align="left" valign="top">(10, 100)</td>
<td align="center" valign="top">0.0002</td>
<td align="center" valign="top">0.0022</td>
<td align="center" valign="top">0.2023</td>
</tr>
<tr>
<td align="left" valign="top">(20, 5)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0001</td>
<td align="center" valign="top">0.4509</td>
</tr>
<tr>
<td align="left" valign="top">(20, 10)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0005</td>
<td align="center" valign="top">0.6126</td>
</tr>
<tr>
<td align="left" valign="top">(20, 20)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0009</td>
<td align="center" valign="top">0.5373</td>
</tr>
<tr>
<td align="left" valign="top">(20, 50)</td>
<td align="center" valign="top">0.0004</td>
<td align="center" valign="top">0.0023</td>
<td align="center" valign="top">0.3382</td>
</tr>
<tr>
<td align="left" valign="top">(20, 100)</td>
<td align="center" valign="top">0.0012</td>
<td align="center" valign="top">0.0033</td>
<td align="center" valign="top">0.3195</td>
</tr>
<tr>
<td align="left" valign="top">(50, 5)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0003</td>
<td align="center" valign="top">0.5112</td>
</tr>
<tr>
<td align="left" valign="top">(50, 10)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0013</td>
<td align="center" valign="top">0.7917</td>
</tr>
<tr>
<td align="left" valign="top">(50, 20)</td>
<td align="center" valign="top">0.0020</td>
<td align="center" valign="top">0.0033</td>
<td align="center" valign="top">0.8291</td>
</tr>
<tr>
<td align="left" valign="top">(50, 50)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0047</td>
<td align="center" valign="top">0.8715</td>
</tr>
<tr>
<td align="left" valign="top">(50, 100)</td>
<td align="center" valign="top">0.0024</td>
<td align="center" valign="top">0.0061</td>
<td align="center" valign="top">0.8097</td>
</tr>
<tr>
<td align="left" valign="top">(100, 5)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0004</td>
<td align="center" valign="top">0.5123</td>
</tr>
<tr>
<td align="left" valign="top">(100, 10)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0042</td>
<td align="center" valign="top">0.6277</td>
</tr>
<tr>
<td align="left" valign="top">(100, 20)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0038</td>
<td align="center" valign="top">0.6794</td>
</tr>
<tr>
<td align="left" valign="top">(100, 50)</td>
<td align="center" valign="top">0.0000</td>
<td align="center" valign="top">0.0074</td>
<td align="center" valign="top">0.6938</td>
</tr>
<tr>
<td align="left" valign="top">(100, 100)</td>
<td align="center" valign="top">0.0007</td>
<td align="center" valign="top">0.0214</td>
<td align="center" valign="top">0.5455</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To identify the differentially expressed modules in the simulated data, we also clustered the data with the C&#x0026;C (<xref ref-type="bibr" rid="ref7">Cheng and Church, 2000</xref>) and BCPlaid (<xref ref-type="bibr" rid="ref29">Lazzeroni and Owen, 2000</xref>) methods and calculated the Jaccard similarity coefficient of the results. The Jaccard coefficient is commonly used to compare the similarity and difference between finite sample sets; the higher the value, the higher the similarity between the sets. The parameters of each model were tuned to their best stable values. The specific results are shown in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table 3</xref>, and the corresponding box plot is in <xref rid="fig4" ref-type="fig">Figure 4</xref>.</p>
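<p>The Jaccard similarity coefficient can be sketched as follows. This is a minimal illustration, and the way a bicluster is turned into a set is an assumption: here each bicluster is represented by its (gene, sample) cells, and the function names are hypothetical.</p>
<preformat>
```python
from itertools import product


def bicluster_cells(genes, samples):
    """Represent a bicluster as the set of its (gene, sample) cells."""
    return set(product(genes, samples))


def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a.union(set_b)
    if not union:
        # Convention chosen here: two empty sets have similarity 0.
        return 0.0
    return len(set_a.intersection(set_b)) / len(union)
```
</preformat>
<p>For example, a true module covering genes 0 to 9 and an identified cluster covering genes 5 to 14, both over the same 10 samples, share 50 of 150 distinct cells, which gives a coefficient of one third.</p>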
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption>
<p>Clustering performance on the simulated data set, grouped by the number of genes in the clusters. Performance is measured by the Jaccard similarity coefficient and compared with other methods. COEXSML is the method proposed in this paper.</p>
</caption>
<graphic xlink:href="fmicb-14-1092143-g004.tif"/>
</fig>
</sec>
</sec>
<sec id="sec10" sec-type="conclusions">
<label>4.</label>
<title>Conclusion</title>
<p>Based on the analysis of the GSE37948 male infertility-related gene detection data set in the GEO database, this paper proposes a biclustering method based on the hypergeometric distribution, Gibbs sampling, and machine learning, and constructs simulation data similar to the GSE37948 data set. The experimental results were compared with those of the common biclustering methods C&#x0026;C (<xref ref-type="bibr" rid="ref7">Cheng and Church, 2000</xref>) and BCPlaid (<xref ref-type="bibr" rid="ref29">Lazzeroni and Owen, 2000</xref>). The results show that the proposed method identifies biclusters on the constructed simulation data set with higher accuracy.</p>
<p>Through enrichment analysis, we found that the gene sets of the identified clusters were enriched in a variety of enzyme activities, reactions that generate ADP and ATP, replication and translation of the genetic material DNA and RNA, neurotransmitter transmission, and other pathways. Multiple clusters were enriched in positive regulation of transcription by RNA polymerase II and other transcriptional regulatory pathways, protein tyrosine-related enzyme pathways, neural synapses, neurotransmitter transmission, and ATP and ADP synthesis. Two clusters of gene sets were enriched in the human papillomavirus infection pathway. One cluster was significantly enriched in the inositol phosphate metabolism pathway. Each cluster enriched in the pathways described above is also enriched in other pathways with different functions. Interactions among multiple genes enriched in different pathways may therefore contribute to differences in sperm motility.</p>
<p>Infertility is a complex pathological condition that presents with a wide range of heterogeneous phenotypes, and identifying the genes that cause male infertility is important for improving our biological understanding and developing clinically relevant treatments. The genetic causes of male infertility include chromosomal abnormalities, gene mutations, and other factors, which may be present in autosomes or in sex chromosomes. Considering the particularity of male infertility, this article only studies related genes on the X chromosome. With the development of genetic testing technology, the amount of relevant data has increased significantly, and follow-up research can explore the information contained in the gene expression data of relevant patients from more aspects.</p>
</sec>
<sec id="sec11" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="SM2">Supplementary material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="sec12">
<title>Author contributions</title>
<p>XJ proposed the model and completed the manuscript writing. YP assisted in completing the model construction. YP and ZY reviewed and revised the manuscript. ZY provided financial support. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="sec13" sec-type="funding-information">
<title>Funding</title>
<p>This research was supported by the National Natural Science Foundation of China (No: 62072296).</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec100" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec id="sec15" sec-type="supplementary-material">
<title>Supplementary material</title>
<p>The Supplementary material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fmicb.2023.1092143/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fmicb.2023.1092143/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table_1.XLSX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_1.docx" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alzahrani</surname> <given-names>M.</given-names></name> <name><surname>Kuwahara</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Gao</surname> <given-names>X.</given-names></name></person-group> (<year>2017</year>). <article-title>Gracob: a novel graph-based constant-column biclustering method for mining growth phenotype data</article-title>. <source>Bioinformatics</source> <volume>33</volume>, <fpage>2523</fpage>&#x2013;<lpage>2531</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btx199</pub-id>, PMID: <pub-id pub-id-type="pmid">28379298</pub-id></citation></ref>
<ref id="ref3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Antonuccio</surname> <given-names>P.</given-names></name> <name><surname>Micali</surname> <given-names>A. G.</given-names></name> <name><surname>Romeo</surname> <given-names>C.</given-names></name> <name><surname>Freni</surname> <given-names>J.</given-names></name> <name><surname>Vermiglio</surname> <given-names>G.</given-names></name> <name><surname>Puzzolo</surname> <given-names>D.</given-names></name> <etal/></person-group> (<year>2021</year>). <article-title>NLRP3 inflammasome: a new pharmacological target for reducing testicular damage associated with varicocele</article-title>. <source>Int. J. Mol. Sci.</source> <volume>22</volume>. doi: <pub-id pub-id-type="doi">10.3390/ijms22031319</pub-id>, PMID: <pub-id pub-id-type="pmid">33525681</pub-id></citation></ref>
<ref id="ref4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aquila</surname> <given-names>S.</given-names></name> <name><surname>Sisci</surname> <given-names>D.</given-names></name> <name><surname>Gentile</surname> <given-names>M.</given-names></name> <name><surname>Middea</surname> <given-names>E.</given-names></name> <name><surname>Catalano</surname> <given-names>S.</given-names></name> <name><surname>Carpino</surname> <given-names>A.</given-names></name> <etal/></person-group> (<year>2004</year>). <article-title>Estrogen receptor (ER) alpha and ER beta are both expressed in human ejaculated spermatozoa: evidence of their direct interaction with phosphatidylinositol-3-OH kinase/Akt pathway</article-title>. <source>J. Clin. Endocrinol. Metab.</source> <volume>89</volume>, <fpage>1443</fpage>&#x2013;<lpage>1451</lpage>. doi: <pub-id pub-id-type="doi">10.1210/jc.2003-031681</pub-id>, PMID: <pub-id pub-id-type="pmid">15001646</pub-id></citation></ref>
<ref id="ref5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bergmann</surname> <given-names>S.</given-names></name> <name><surname>Ihmels</surname> <given-names>J.</given-names></name> <name><surname>Barkai</surname> <given-names>N.</given-names></name></person-group> (<year>2003</year>). <article-title>Iterative signature algorithm for the analysis of large-scale gene expression data</article-title>. <source>Phys. Rev. E Stat. Nonlinear Soft Matter Phys.</source> <volume>67</volume>:<fpage>031902</fpage>. doi: <pub-id pub-id-type="doi">10.1103/PhysRevE.67.031902</pub-id>, PMID: <pub-id pub-id-type="pmid">12689096</pub-id></citation></ref>
<ref id="ref1"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Bollob&#x00E1;s</surname> <given-names>B.</given-names></name> <name><surname>Borgs</surname> <given-names>C.</given-names></name> <name><surname>Chayes</surname> <given-names>J.</given-names></name></person-group> (<year>2003</year>). &#x201C;<article-title>Directed scale-free graphs</article-title>,&#x201D; in <source>Proceedings of the fourteenth annual ACM-SIAM symposium on discrete algorithms</source>. (Philadelphia, PA, USA). <fpage>132</fpage>&#x2013;<lpage>139</lpage>.</citation></ref>
<ref id="ref6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Mao</surname> <given-names>F.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name> <name><surname>Xu</surname> <given-names>Y. J. B. B.</given-names></name></person-group> (<year>2011</year>). <article-title>Genome-wide discovery of missing genes in biological pathways of prokaryotes</article-title>. <source>BMC Bioinformatics</source> <volume>12</volume>:<fpage>S1</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-12-S1-S1</pub-id>, PMID: <pub-id pub-id-type="pmid">21342538</pub-id></citation></ref>
<ref id="ref7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Church</surname> <given-names>G. M.</given-names></name></person-group> (<year>2000</year>). &#x201C;<article-title>Biclustering of expression data</article-title>,&#x201D; in <source>Proceedings of the eighth international conference on intelligent systems for molecular biology</source>. (AAAI Press).  <fpage>93</fpage>&#x2013;<lpage>103</lpage>.</citation></ref>
<ref id="ref8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chill&#x00F3;n</surname> <given-names>M.</given-names></name> <name><surname>Casals</surname> <given-names>T.</given-names></name> <name><surname>Mercier</surname> <given-names>B.</given-names></name> <name><surname>Bassas</surname> <given-names>L.</given-names></name> <name><surname>Lissens</surname> <given-names>W.</given-names></name> <name><surname>Silber</surname> <given-names>S.</given-names></name> <etal/></person-group> (<year>1995</year>). <article-title>Mutations in the cystic fibrosis gene in patients with congenital absence of the vas deferens</article-title>. <source>N. Engl. J. Med.</source> <volume>332</volume>, <fpage>1475</fpage>&#x2013;<lpage>1480</lpage>. doi: <pub-id pub-id-type="doi">10.1056/NEJM199506013322204</pub-id></citation></ref>
<ref id="ref9"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Chiu</surname> <given-names>H. S.</given-names></name> <name><surname>Chuang</surname> <given-names>H. Y.</given-names></name> <name><surname>Tsai</surname> <given-names>H. K.</given-names></name> <name><surname>Huang</surname> <given-names>T. W.</given-names></name> <name><surname>Kao</surname> <given-names>C. Y.</given-names></name></person-group> (<year>2004</year>). <article-title>Discovering statistically significant clusters by using iterative genetic algorithms in gene expression data</article-title>. In <conf-name>Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS</conf-name>, Las Vegas, Nevada, USA.</citation></ref>
<ref id="ref10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dada</surname> <given-names>R.</given-names></name> <name><surname>Gupta</surname> <given-names>N. P.</given-names></name> <name><surname>Kucheria</surname> <given-names>K.</given-names></name></person-group> (<year>2003</year>). <article-title>Molecular screening for Yq microdeletion in men with idiopathic oligozoospermia and azoospermia</article-title>. <source>J. Biosci.</source> <volume>28</volume>, <fpage>163</fpage>&#x2013;<lpage>168</lpage>. doi: <pub-id pub-id-type="doi">10.1007/BF02706215</pub-id>, PMID: <pub-id pub-id-type="pmid">12711808</pub-id></citation></ref>
<ref id="ref11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Smet</surname> <given-names>R.</given-names></name> <name><surname>Marchal</surname> <given-names>K.</given-names></name></person-group> (<year>2011</year>). <article-title>An ensemble biclustering approach for querying gene expression compendia with experimental lists</article-title>. <source>Bioinformatics</source> <volume>27</volume>, <fpage>1948</fpage>&#x2013;<lpage>1956</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btr307</pub-id>, PMID: <pub-id pub-id-type="pmid">21593133</pub-id></citation></ref>
<ref id="ref12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dhollander</surname> <given-names>T.</given-names></name> <name><surname>Sheng</surname> <given-names>Q.</given-names></name> <name><surname>Lemmens</surname> <given-names>K.</given-names></name> <name><surname>De Moor</surname> <given-names>B.</given-names></name> <name><surname>Marchal</surname> <given-names>K.</given-names></name> <name><surname>Moreau</surname> <given-names>Y.</given-names></name></person-group> (<year>2007</year>). <article-title>Query-driven module discovery in microarray data</article-title>. <source>Bioinformatics</source> <volume>23</volume>, <fpage>2573</fpage>&#x2013;<lpage>2580</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btm387</pub-id>, PMID: <pub-id pub-id-type="pmid">17686800</pub-id></citation></ref>
<ref id="ref13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eren</surname> <given-names>K.</given-names></name> <name><surname>Deveci</surname> <given-names>M.</given-names></name> <name><surname>Kucuktunc</surname> <given-names>O.</given-names></name> <name><surname>Catalyurek</surname> <given-names>U. V.</given-names></name></person-group> (<year>2013</year>). <article-title>A comparative analysis of biclustering algorithms for gene expression data</article-title>. <source>Brief. Bioinform.</source> <volume>14</volume>, <fpage>279</fpage>&#x2013;<lpage>292</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bib/bbs032</pub-id>, PMID: <pub-id pub-id-type="pmid">22772837</pub-id></citation></ref>
<ref id="ref14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gerniers</surname> <given-names>A.</given-names></name> <name><surname>Bricard</surname> <given-names>O.</given-names></name> <name><surname>Dupont</surname> <given-names>P.</given-names></name></person-group> (<year>2021</year>). <article-title>MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data</article-title>. <source>Bioinformatics</source> <volume>37</volume>, <fpage>3220</fpage>&#x2013;<lpage>3227</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btab239</pub-id>, PMID: <pub-id pub-id-type="pmid">33830183</pub-id></citation></ref>
<ref id="ref15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghiassian</surname> <given-names>S. D.</given-names></name> <name><surname>Menche</surname> <given-names>J.</given-names></name> <name><surname>Barabasi</surname> <given-names>A. L.</given-names></name></person-group> (<year>2015</year>). <article-title>A DIseAse MOdule Detection (DIAMOnD) algorithm derived from a systematic analysis of connectivity patterns of disease proteins in the human interactome</article-title>. <source>PLoS Comput. Biol.</source> <volume>11</volume>:<fpage>e1004120</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.1004120</pub-id>, PMID: <pub-id pub-id-type="pmid">25853560</pub-id></citation></ref>
<ref id="ref16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goncalves</surname> <given-names>J. P.</given-names></name> <name><surname>Madeira</surname> <given-names>S. C.</given-names></name></person-group> (<year>2014</year>). <article-title>LateBiclustering: efficient heuristic algorithm for time-lagged bicluster identification</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinform.</source> <volume>11</volume>, <fpage>801</fpage>&#x2013;<lpage>813</lpage>. doi: <pub-id pub-id-type="doi">10.1109/TCBB.2014.2312007</pub-id>, PMID: <pub-id pub-id-type="pmid">26356854</pub-id></citation></ref>
<ref id="ref17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gu</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>J. S.</given-names></name></person-group> (<year>2008</year>). <article-title>Bayesian biclustering of gene expression data</article-title>. <source>BMC Genomics</source> <volume>9</volume>:<fpage>S4</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2164-9-S1-S4</pub-id>, PMID: <pub-id pub-id-type="pmid">18366617</pub-id></citation></ref>
<ref id="ref18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>F.</given-names></name> <name><surname>Yin</surname> <given-names>Z.</given-names></name> <name><surname>Zhou</surname> <given-names>K.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <article-title>PLncWX: a machine-learning algorithm for plant lncRNA identification based on WOA-XGBoost</article-title>. <source>J. Chem.</source> <volume>2021</volume>, <fpage>1</fpage>&#x2013;<lpage>11</lpage>. doi: <pub-id pub-id-type="doi">10.1155/2021/6256021</pub-id></citation></ref>
<ref id="ref19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hartigan</surname> <given-names>J. A.</given-names></name></person-group> (<year>1972</year>). <article-title>Direct clustering of a data matrix</article-title>. <source>J. Am. Stat. Assoc.</source> <volume>67</volume>, <fpage>123</fpage>&#x2013;<lpage>129</lpage>. doi: <pub-id pub-id-type="doi">10.1080/01621459.1972.10481214</pub-id></citation></ref>
<ref id="ref20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Henriques</surname> <given-names>R.</given-names></name> <name><surname>Madeira</surname> <given-names>S. C.</given-names></name></person-group> (<year>2016a</year>). <article-title>BiC2PAM: constraint-guided biclustering for biological data analysis with domain knowledge</article-title>. <source>Algorithms Mol. Biol.</source> <volume>11</volume>:<fpage>23</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s13015-016-0085-5</pub-id>, PMID: <pub-id pub-id-type="pmid">27651825</pub-id></citation></ref>
<ref id="ref21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Henriques</surname> <given-names>R.</given-names></name> <name><surname>Madeira</surname> <given-names>S. C.</given-names></name></person-group> (<year>2016b</year>). <article-title>BicNET: flexible module discovery in large-scale biological networks using biclustering</article-title>. <source>Algorithms Mol. Biol.</source> <volume>11</volume>:<fpage>14</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s13015-016-0074-8</pub-id>, PMID: <pub-id pub-id-type="pmid">27213009</pub-id></citation></ref>
<ref id="ref22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hochreiter</surname> <given-names>S.</given-names></name> <name><surname>Bodenhofer</surname> <given-names>U.</given-names></name> <name><surname>Heusel</surname> <given-names>M.</given-names></name> <name><surname>Mayr</surname> <given-names>A.</given-names></name> <name><surname>Mitterecker</surname> <given-names>A.</given-names></name> <name><surname>Kasim</surname> <given-names>A.</given-names></name> <etal/></person-group> (<year>2010</year>). <article-title>FABIA: factor analysis for bicluster acquisition</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>1520</fpage>&#x2013;<lpage>1527</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btq227</pub-id>, PMID: <pub-id pub-id-type="pmid">20418340</pub-id></citation></ref>
<ref id="ref23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jacobs</surname> <given-names>P. A.</given-names></name> <name><surname>Strong</surname> <given-names>J. A.</given-names></name></person-group> (<year>1959</year>). <article-title>A case of human intersexuality having a possible XXY sex-determining mechanism</article-title>. <source>Nature</source> <volume>183</volume>, <fpage>302</fpage>&#x2013;<lpage>303</lpage>. doi: <pub-id pub-id-type="doi">10.1038/183302a0</pub-id></citation></ref>
<ref id="ref24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>J.</given-names></name> <name><surname>Pan</surname> <given-names>W.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Ni</surname> <given-names>C.</given-names></name> <name><surname>Xue</surname> <given-names>D.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name> <etal/></person-group> (<year>2020</year>). <article-title>Tumour-infiltrating immune cell-based subtyping and signature gene analysis in breast cancer based on gene expression profiles</article-title>. <source>J. Cancer</source> <volume>11</volume>, <fpage>1568</fpage>&#x2013;<lpage>1583</lpage>. doi: <pub-id pub-id-type="doi">10.7150/jca.37637</pub-id>, PMID: <pub-id pub-id-type="pmid">32047563</pub-id></citation></ref>
<ref id="ref25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kluger</surname> <given-names>Y.</given-names></name> <name><surname>Basri</surname> <given-names>R.</given-names></name> <name><surname>Chang</surname> <given-names>J. T.</given-names></name> <name><surname>Gerstein</surname> <given-names>M.</given-names></name></person-group> (<year>2003</year>). <article-title>Spectral biclustering of microarray data: coclustering genes and conditions</article-title>. <source>Genome Res.</source> <volume>13</volume>, <fpage>703</fpage>&#x2013;<lpage>716</lpage>. doi: <pub-id pub-id-type="doi">10.1101/gr.648603</pub-id>, PMID: <pub-id pub-id-type="pmid">12671006</pub-id></citation></ref>
<ref id="ref26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krausz</surname> <given-names>C.</given-names></name> <name><surname>Giachini</surname> <given-names>C.</given-names></name> <name><surname>Lo Giacco</surname> <given-names>D.</given-names></name> <name><surname>Daguin</surname> <given-names>F.</given-names></name> <name><surname>Chianese</surname> <given-names>C.</given-names></name> <name><surname>Ars</surname> <given-names>E.</given-names></name> <etal/></person-group> (<year>2012</year>). <article-title>High resolution X chromosome-specific array-CGH detects new CNVs in infertile males</article-title>. <source>PLoS One</source> <volume>7</volume>:<fpage>e44887</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0044887</pub-id>, PMID: <pub-id pub-id-type="pmid">23056185</pub-id></citation></ref>
<ref id="ref27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krausz</surname> <given-names>C.</given-names></name> <name><surname>Riera-Escamilla</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Genetics of male infertility</article-title>. <source>Nat. Rev. Urol.</source> <volume>15</volume>, <fpage>369</fpage>&#x2013;<lpage>384</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41585-018-0003-3</pub-id></citation></ref>
<ref id="ref28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lazareva</surname> <given-names>O.</given-names></name> <name><surname>Canzar</surname> <given-names>S.</given-names></name> <name><surname>Yuan</surname> <given-names>K.</given-names></name> <name><surname>Baumbach</surname> <given-names>J.</given-names></name> <name><surname>Blumenthal</surname> <given-names>D. B.</given-names></name> <name><surname>Tieri</surname> <given-names>P.</given-names></name> <etal/></person-group> (<year>2020</year>). <article-title>BiCoN: network-constrained biclustering of patients and omics data</article-title>. <source>Bioinformatics</source> <volume>37</volume>, <fpage>2398</fpage>&#x2013;<lpage>2404</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btaa1076</pub-id>, PMID: <pub-id pub-id-type="pmid">33367514</pub-id></citation></ref>
<ref id="ref29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lazzeroni</surname> <given-names>L.</given-names></name> <name><surname>Owen</surname> <given-names>A. J.</given-names></name></person-group> (<year>2000</year>). <article-title>Plaid models for gene expression data</article-title>. <source>Stat. Sin.</source> <volume>12</volume>, <fpage>61</fpage>&#x2013;<lpage>86</lpage>.</citation></ref>
<ref id="ref30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>G.</given-names></name> <name><surname>Ma</surname> <given-names>Q.</given-names></name> <name><surname>Tang</surname> <given-names>H.</given-names></name> <name><surname>Paterson</surname> <given-names>A. H.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name></person-group> (<year>2009</year>). <article-title>QUBIC: a qualitative biclustering algorithm for analyses of gene expression data</article-title>. <source>Nucleic Acids Res.</source> <volume>37</volume>:<fpage>e101</fpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkp491</pub-id>, PMID: <pub-id pub-id-type="pmid">19509312</pub-id></citation></ref>
<ref id="ref31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liao</surname> <given-names>M.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Yuan</surname> <given-names>J.</given-names></name> <name><surname>Wen</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>G.</given-names></name> <name><surname>Zhao</surname> <given-names>J.</given-names></name> <etal/></person-group> (<year>2020</year>). <article-title>Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19</article-title>. <source>Nat. Med.</source> <volume>26</volume>, <fpage>842</fpage>&#x2013;<lpage>844</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41591-020-0901-9</pub-id>, PMID: <pub-id pub-id-type="pmid">32398875</pub-id></citation></ref>
<ref id="ref32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lopes</surname> <given-names>A. M.</given-names></name> <name><surname>Aston</surname> <given-names>K. I.</given-names></name> <name><surname>Thompson</surname> <given-names>E. E.</given-names></name> <name><surname>Carvalho</surname> <given-names>F.</given-names></name> <name><surname>Gon&#x00E7;alves</surname> <given-names>J.</given-names></name> <name><surname>Huang</surname> <given-names>N.</given-names></name> <etal/></person-group> (<year>2013</year>). <article-title>Human spermatogenic failure purges deleterious mutation load from the autosomes and both sex chromosomes, including the gene DMRT1</article-title>. <source>PLoS Genet.</source> <volume>9</volume>:<fpage>e1003349</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pgen.1003349</pub-id>, PMID: <pub-id pub-id-type="pmid">23555275</pub-id></citation></ref>
<ref id="ref33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lyon</surname> <given-names>M. F.</given-names></name> <name><surname>Hawkes</surname> <given-names>S. G.</given-names></name></person-group> (<year>1970</year>). <article-title>X-linked gene for testicular feminization in the mouse</article-title>. <source>Nature</source> <volume>227</volume>, <fpage>1217</fpage>&#x2013;<lpage>1219</lpage>. doi: <pub-id pub-id-type="doi">10.1038/2271217a0</pub-id></citation></ref>
<ref id="ref34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Madeira</surname> <given-names>S. C.</given-names></name> <name><surname>Teixeira</surname> <given-names>M. C.</given-names></name> <name><surname>Sa-Correia</surname> <given-names>I.</given-names></name> <name><surname>Oliveira</surname> <given-names>A. L.</given-names></name></person-group> (<year>2010</year>). <article-title>Identification of regulatory modules in time series gene expression data using a linear time biclustering algorithm</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinform.</source> <volume>7</volume>, <fpage>153</fpage>&#x2013;<lpage>165</lpage>. doi: <pub-id pub-id-type="doi">10.1109/TCBB.2008.34</pub-id>, PMID: <pub-id pub-id-type="pmid">20150677</pub-id></citation></ref>
<ref id="ref35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Medina</surname> <given-names>I.</given-names></name> <name><surname>Carbonell</surname> <given-names>J.</given-names></name> <name><surname>Pulido</surname> <given-names>L.</given-names></name> <name><surname>Madeira</surname> <given-names>S. C.</given-names></name> <name><surname>Goetz</surname> <given-names>S.</given-names></name> <name><surname>Conesa</surname> <given-names>A.</given-names></name> <etal/></person-group> (<year>2010</year>). <article-title>Babelomics: an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling</article-title>. <source>Nucleic Acids Res.</source> <volume>38</volume>, <fpage>W210</fpage>&#x2013;<lpage>W213</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkq388</pub-id>, PMID: <pub-id pub-id-type="pmid">20478823</pub-id></citation></ref>
<ref id="ref36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>L.</given-names></name> <name><surname>Tian</surname> <given-names>X.</given-names></name> <name><surname>Tian</surname> <given-names>G.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Weng</surname> <given-names>Y.</given-names></name> <etal/></person-group> (<year>2020</year>). <article-title>Single-cell RNA-seq clustering: datasets, models, and algorithms</article-title>. <source>RNA Biol.</source> <volume>17</volume>, <fpage>765</fpage>&#x2013;<lpage>783</lpage>. doi: <pub-id pub-id-type="doi">10.1080/15476286.2020.1728961</pub-id>, PMID: <pub-id pub-id-type="pmid">32116127</pub-id></citation></ref>
<ref id="ref37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Tan</surname> <given-names>J.</given-names></name> <name><surname>Huang</surname> <given-names>L.</given-names></name> <name><surname>Tian</surname> <given-names>X.</given-names></name> <etal/></person-group> (<year>2022</year>). <article-title>Cell&#x2013;cell communication inference and analysis in the tumour microenvironments from single-cell transcriptomics: data resources and computational strategies</article-title>. <source>Brief. Bioinform.</source> <volume>23</volume>:<fpage>bbac234</fpage>. doi: <pub-id pub-id-type="doi">10.1093/bib/bbac234</pub-id>, PMID: <pub-id pub-id-type="pmid">35753695</pub-id></citation></ref>
<ref id="ref38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>L.</given-names></name> <name><surname>Yuan</surname> <given-names>R.</given-names></name> <name><surname>Shen</surname> <given-names>L.</given-names></name> <name><surname>Gao</surname> <given-names>P.</given-names></name> <name><surname>Zhou</surname> <given-names>L. J.</given-names></name></person-group> (<year>2021</year>). <article-title>LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification</article-title>. <source>BioData Min.</source> <volume>14</volume>:<fpage>50</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s13040-021-00277-4</pub-id>, PMID: <pub-id pub-id-type="pmid">34861891</pub-id></citation></ref>
<ref id="ref39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Plaisier</surname> <given-names>S. B.</given-names></name> <name><surname>Taschereau</surname> <given-names>R.</given-names></name> <name><surname>Wong</surname> <given-names>J. A.</given-names></name> <name><surname>Graeber</surname> <given-names>T. G.</given-names></name></person-group> (<year>2010</year>). <article-title>Rank-rank hypergeometric overlap: identification of statistically significant overlap between gene-expression signatures</article-title>. <source>Nucleic Acids Res.</source> <volume>38</volume>:<fpage>e169</fpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkq636</pub-id>, PMID: <pub-id pub-id-type="pmid">20660011</pub-id></citation></ref>
<ref id="ref40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Preli&#x0107;</surname> <given-names>A.</given-names></name> <name><surname>Bleuler</surname> <given-names>S.</given-names></name> <name><surname>Zimmermann</surname> <given-names>P.</given-names></name> <name><surname>Wille</surname> <given-names>A.</given-names></name> <name><surname>B&#x00FC;hlmann</surname> <given-names>P.</given-names></name> <name><surname>Gruissem</surname> <given-names>W.</given-names></name> <etal/></person-group> (<year>2006</year>). <article-title>A systematic comparison and evaluation of biclustering methods for gene expression data</article-title>. <source>Bioinformatics</source> <volume>22</volume>, <fpage>1122</fpage>&#x2013;<lpage>1129</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btl060</pub-id>, PMID: <pub-id pub-id-type="pmid">16500941</pub-id></citation></ref>
<ref id="ref41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ron</surname> <given-names>E.</given-names></name> <name><surname>Michael</surname> <given-names>D.</given-names></name> <name><surname>Lash</surname> <given-names>A. E.</given-names></name></person-group> (<year>2002</year>). <article-title>Gene expression omnibus: NCBI gene expression and hybridization array data repository</article-title>. <source>Nucleic Acids Res.</source> <volume>30</volume>, <fpage>207</fpage>&#x2013;<lpage>210</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/30.1.207</pub-id></citation></ref>
<ref id="ref42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sheng</surname> <given-names>Q.</given-names></name> <name><surname>Moreau</surname> <given-names>Y.</given-names></name> <name><surname>De Moor</surname> <given-names>B.</given-names></name></person-group> (<year>2003</year>). <article-title>Biclustering microarray data by Gibbs sampling</article-title>. <source>Bioinformatics</source> <volume>19</volume>, <fpage>ii196</fpage>&#x2013;<lpage>ii205</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btg1078</pub-id>, PMID: <pub-id pub-id-type="pmid">14534190</pub-id></citation></ref>
<ref id="ref43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tournaye</surname> <given-names>H.</given-names></name> <name><surname>Krausz</surname> <given-names>C.</given-names></name> <name><surname>Oates</surname> <given-names>R. D.</given-names></name></person-group> (<year>2017</year>). <article-title>Novel concepts in the aetiology of male reproductive impairment</article-title>. <source>Lancet Diabetes Endocrinol.</source> <volume>5</volume>, <fpage>544</fpage>&#x2013;<lpage>553</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S2213-8587(16)30040-7</pub-id>, PMID: <pub-id pub-id-type="pmid">27395771</pub-id></citation></ref>
<ref id="ref44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Zhao</surname> <given-names>R.</given-names></name> <name><surname>Guo</surname> <given-names>C.</given-names></name> <name><surname>Jiang</surname> <given-names>S.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <etal/></person-group> (<year>2016</year>). <article-title>Knockout of BRD7 results in impaired spermatogenesis and male infertility</article-title>. <source>Sci. Rep.</source> <volume>6</volume>:<fpage>21776</fpage>. doi: <pub-id pub-id-type="doi">10.1038/srep21776</pub-id>, PMID: <pub-id pub-id-type="pmid">26878912</pub-id></citation></ref>
<ref id="ref45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xavier</surname> <given-names>M. J.</given-names></name> <name><surname>Salas-Huetos</surname> <given-names>A.</given-names></name> <name><surname>Oud</surname> <given-names>M. S.</given-names></name> <name><surname>Aston</surname> <given-names>K. I.</given-names></name> <name><surname>Veltman</surname> <given-names>J. A.</given-names></name></person-group> (<year>2021</year>). <article-title>Disease gene discovery in male infertility: past, present and future</article-title>. <source>Hum. Genet.</source> <volume>140</volume>, <fpage>7</fpage>&#x2013;<lpage>19</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s00439-020-02202-x</pub-id>, PMID: <pub-id pub-id-type="pmid">32638125</pub-id></citation></ref>
<ref id="ref46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yatsenko</surname> <given-names>A. N.</given-names></name> <name><surname>Georgiadis</surname> <given-names>A. P.</given-names></name> <name><surname>R&#x00F6;pke</surname> <given-names>A.</given-names></name> <name><surname>Berman</surname> <given-names>A. J.</given-names></name> <name><surname>Jaffe</surname> <given-names>T.</given-names></name> <name><surname>Olszewska</surname> <given-names>M.</given-names></name> <etal/></person-group> (<year>2015</year>). <article-title>X-linked TEX11 mutations, meiotic arrest, and azoospermia in infertile men</article-title>. <source>N. Engl. J. Med.</source> <volume>372</volume>, <fpage>2097</fpage>&#x2013;<lpage>2107</lpage>. doi: <pub-id pub-id-type="doi">10.1056/NEJMoa1406192</pub-id>, PMID: <pub-id pub-id-type="pmid">25970010</pub-id></citation></ref>
<ref id="ref47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yi</surname> <given-names>H.</given-names></name> <name><surname>Huang</surname> <given-names>L.</given-names></name> <name><surname>Mishne</surname> <given-names>G.</given-names></name> <name><surname>Chi</surname> <given-names>E. C.</given-names></name></person-group> (<year>2021</year>). <article-title>COBRAC: a fast implementation of convex biclustering with compression</article-title>. <source>Bioinformatics</source> <volume>37</volume>, <fpage>3667</fpage>&#x2013;<lpage>3669</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btab248</pub-id>, PMID: <pub-id pub-id-type="pmid">33904580</pub-id></citation></ref>
<ref id="ref48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhai</surname> <given-names>Z.</given-names></name> <name><surname>Lei</surname> <given-names>Y. L.</given-names></name> <name><surname>Wang</surname> <given-names>R.</given-names></name> <name><surname>Xie</surname> <given-names>Y.</given-names></name></person-group> (<year>2022</year>). <article-title>Supervised capacity preserving mapping: a clustering guided visualization method for scRNA-seq data</article-title>. <source>Bioinformatics</source> <volume>38</volume>, <fpage>2496</fpage>&#x2013;<lpage>2503</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btac131</pub-id>, PMID: <pub-id pub-id-type="pmid">35253834</pub-id></citation></ref>
<ref id="ref49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>H.</given-names></name> <name><surname>Cloots</surname> <given-names>L.</given-names></name> <name><surname>Bulcke</surname> <given-names>T. V. D.</given-names></name> <name><surname>Wu</surname> <given-names>Y.</given-names></name> <name><surname>Marchal</surname> <given-names>K.</given-names></name></person-group> (<year>2011</year>). <article-title>Query-based biclustering of gene expression data using probabilistic relational models</article-title>. <source>BMC Bioinformatics</source> <volume>12</volume>:<fpage>S37</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-12-S1-S37</pub-id>, PMID: <pub-id pub-id-type="pmid">21342568</pub-id></citation></ref>
<ref id="ref50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>F.</given-names></name> <name><surname>Ma</surname> <given-names>Q.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name></person-group> (<year>2012</year>). <article-title>QServer: a biclustering server for prediction and assessment of co-expressed gene clusters</article-title>. <source>PLoS One</source> <volume>7</volume>:<fpage>e32660</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0032660</pub-id>, PMID: <pub-id pub-id-type="pmid">22403692</pub-id></citation></ref>
</ref-list>
</back>
</article>