AUTHOR=Yousaf Wafa, Haseeb Abdul, Shen Yongheng, Li Hongquan, Fan Kuohai, Sun Na, Sun Panpan, Sun Yaogui, Yang Huizhen, Yin Wei, Zhang Hua, Zhang Zhenbiao, Zhong Jia, Wang Jianzhong, Huo Nairui TITLE=Data-driven discovery of antiviral peptides against PRRSV using multiple machine learning models JOURNAL=Frontiers in Veterinary Science VOLUME=Volume 12 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/veterinary-science/articles/10.3389/fvets.2025.1681083 DOI=10.3389/fvets.2025.1681083 ISSN=2297-1769 ABSTRACT=Introduction: Cellular machinery is built upon proteins and their functional interrelationships. Evaluating protein networks is essential for comprehensive insight into biological processes and may establish a foundation for predicting antivirulence. Antiviral peptides (AVPs) have robust, broad-spectrum antivirulence capabilities. Nevertheless, the existing database of predicted AVPs is insufficient and requires more precise, reliable annotations. This study aimed to screen differentially expressed proteins and peptides in healthy and porcine reproductive and respiratory syndrome virus (PRRSV)-infected tissues and to predict AVPs using machine learning- and deep learning-based computational methods. Methods: Lung, small intestine, and large intestine samples were collected to validate and quantify proteins and peptides through proteomics, followed by AVP prediction using machine learning (ML) and deep learning (DL). Models were developed using significant features based on physicochemical characteristics, encompassing amino acid composition (AAC), secondary structure, and hydrophilicity. Proteomics analysis facilitated peptide qualification through GO, KEGG, COG, and PPI analysis. 
To predict AVPs, we employed a DL graph neural network (GNN), its first application in this domain, and benchmarked its efficacy against conventional ML random forest (RF) and support vector machine (SVM) models. Results: Lysine, arginine, and leucine were each ranked at nearly 0.1 in feature importance, highlighting their significance in prediction. Additionally, the correlation heatmap showed that lysine and glutamate exhibited the strongest positive association (0.57). The RF model achieved an area under the curve (AUC) of 0.95 ± 2, verified via 5-fold cross-validation. The GNN and SVM models yielded an AUC of 0.94 ± 1, indicating comparable performance across models, with the RF model slightly ahead of the others. Discussion: Integrating proteomics with computational modeling revealed peptides with antiviral potential against PRRSV. The RF model demonstrated the best discriminative power, and amino acid composition played a key predictive role. Consequently, these comparative predictive results may serve as novel and distinctive resources for the experimental validation and identification of PRRSV AVPs as prospective therapeutics.
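
The abstract names amino acid composition (AAC) as a key predictive feature but does not say how it was computed. As a minimal sketch, assuming AAC means the fraction of each of the 20 standard residues in a peptide, the calculation might look like the following. The function name `amino_acid_composition` and the example sequence are hypothetical and are not taken from the study.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard residue in a peptide sequence.

    Assumption: non-standard characters (e.g. X, gaps) are ignored rather
    than counted; the study's own handling is not described in the abstract.
    """
    seq = sequence.strip().upper()
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Residues absent from the peptide get a fraction of 0.0.
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical peptide, used only to illustrate the calculation.
example_peptide = "KKLLRRAG"
aac = amino_acid_composition(example_peptide)
print({aa: round(frac, 3) for aa, frac in aac.items() if frac > 0})
```

Each peptide then becomes a 20-element vector that sums to 1, which is the kind of fixed-length input an RF or SVM classifier can accept.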