AUTHOR=Achitouv Ixandra , Chavalarias David , Gaume Bruno 

TITLE=Testing network clustering algorithms with natural language processing

JOURNAL=Frontiers in Artificial Intelligence

VOLUME=Volume 8 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1635436

DOI=10.3389/frai.2025.1635436

ISSN=2624-8212

ABSTRACT=IntroductionWe propose a hybrid methodology to evaluate the alignment between structural communities inferred from interaction networks and the linguistic coherence of users' textual production in online social networks. Understanding whether community structure reflects language use allows for a more nuanced validation of Community Detection Algorithms (CDAs) beyond assuming their outputs as ground truth.MethodsUsing Twitter data on climate change discussions, we compare several CDAs by training Natural Language Processing Classification Algorithms (NLPCA), such as BERTweet-based models, on the communities they generate. Classification accuracy serves as a proxy for the semantic coherence of CDA-induced groups. This comparative scoring approach offers a self-consistent framework for evaluating CDA performance without requiring manually annotated labels. We also introduce a coverage–precision trade-off metric to assess community-level performance.ResultsOur results show that the best CDA/NLPCA combinations predict a user's community with over 85% accuracy using only three short sentences. This demonstrates a strong alignment between structural and linguistic patterns in online discourse.DiscussionOur framework enables scoring CDAs based on semantic predictability and allows prediction of community membership from minimal textual input. It offers practical benefits, such as providing proxy labels for low-supervision NLP tasks, and is adaptable to other social platforms. Limitations include potential noise in CDA-generated labels but the approach offers a generalizable method for evaluating CDA performance and the coherence of online social groups.