AUTHOR=Elmitwalli Sherif , Mehegan John TITLE=Sentiment analysis of COP9-related tweets: a comparative study of pre-trained models and traditional techniques JOURNAL=Frontiers in Big Data VOLUME=Volume 7 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2024.1357926 DOI=10.3389/fdata.2024.1357926 ISSN=2624-909X ABSTRACT=Sentiment analysis has become a crucial area of research in natural language processing in recent years. Various sentiment analysis techniques have been developed, including lexicon-based, machine learning, Bi-LSTM, BERT, and GPT-3 approaches. This study compares the performance of these approaches on two commonly used sentiment analysis benchmark datasets: IMDB reviews and Sentiment140 (a Twitter corpus). It also seeks to identify the best-performing technique for an exemplar dataset: tweets associated with the WHO Framework Convention on Tobacco Control Ninth Conference of the Parties in 2021 (COP9). Each approach was applied to the datasets and evaluated using standard metrics such as accuracy, F1-score, and precision. To achieve this, we conducted a two-stage evaluation. The first stage compared the techniques on the standard sentiment analysis datasets. BERT achieved the highest F1-scores (0.9380 for IMDB and 0.8114 for Sentiment140), followed by GPT-3 (0.9119 and 0.7913) and Bi-LSTM (0.8971 and 0.7778). In the second stage, GPT-3 was the best-performing technique on the partially annotated COP9 tweets, with an F1-score of 0.8812. This model was then used to analyze sentiment across all the COP9 tweets. The results offer practical guidance to researchers and practitioners selecting sentiment analysis techniques across domains. Furthermore, the study highlights the robust performance of pre-trained models, which remain effective even when annotated data are limited or absent.