AUTHOR=Nguyen Khanh, Vu Binh, Chandna Swati, Schultz Jobst-Hendrik, Mayer Gwendolyn TITLE=Between the lines: investigating health beliefs and emotional expressions in online mental health communities JOURNAL=Frontiers in Psychology VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2025.1521623 DOI=10.3389/fpsyg.2025.1521623 ISSN=1664-1078 ABSTRACT=Introduction: Social media platforms play an important role in mental health discourse. Applying the Health Belief Model (HBM) to health-related discussions on Reddit could yield deeper insights into individuals' perceptions of mental health threats and barriers to seeking help. The primary objective of this research is to develop an efficient methodology not only for classifying key HBM components (perceived susceptibility, severity, benefits, barriers, cues to action, and self-efficacy) but also for examining emotional expressions within these discussions. Methods: A sample of 5,000 posts was selected for classification and a subset was manually labelled for further analysis. Multiple models were tested in classification tasks. Data analysis utilized visualization techniques, such as word clouds, heatmaps, and emotional content analysis, to identify thematic trends and emotional expressions in the discussions. Results: DistilBERT outperformed other approaches, achieving accuracy rates between 75 and 84% for most components. However, challenges persist in predicting perceived severity, with an accuracy of only 47% due to its multi-label nature; to address this, GPT-4-based keyword extraction was combined with human review, improving accuracy to 81%.
The emotional content analysis reveals patterns in mental health discussions, such as the attribution of personality as a root cause of anxiety by users and the urgent need for targeted interventions in cases of suicidal ideation. Discussion: Findings demonstrate that users tend to use more negative language in contexts with higher perceived severity. Future work should prioritize improving model adaptability to health-specific data, handling rare terms, conducting nuanced emotional analyses in written expressions, and addressing ethical implications in analyzing user-generated content.