AUTHOR=Li Jin-Tong, Wei Ya-Wen, Wang Meng-Yu, Yan Chun-Xiao, Ren Xia, Fu Xian-Jun TITLE=Antibacterial Activity Prediction Model of Traditional Chinese Medicine Based on Combined Data-Driven Approach and Machine Learning Algorithm: Constructed and Validated JOURNAL=Frontiers in Microbiology VOLUME=Volume 12 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2021.763498 DOI=10.3389/fmicb.2021.763498 ISSN=1664-302X ABSTRACT=Traditional Chinese medicines (TCMs), a unique natural medicine resource, have a long history of use in China for preventing and treating bacterial diseases. To provide a prediction model for screening antibacterial TCMs in the design and discovery of novel antibacterial agents, the literature on antibacterial TCMs in the China National Knowledge Infrastructure (CNKI) and Web of Science databases was retrieved, and the data were extracted and standardized. A total of 28,786 pieces of data from 904 antibacterial TCMs were collected, most of which concerned plant medicines. Association rule mining showed a high correlation between antibacterial activity and cold nature, bitter and sour tastes, and hemostatic and fire-draining efficacies. Moreover, TCMs with antibacterial activity showed a certain aggregation in the phylogenetic tree: 92% of them came from Tracheophyta, of which 74% were mainly concentrated in rosids, asterids, Liliopsida and Ranunculales. Prediction models of anti-Escherichia coli and anti-Staphylococcus aureus activity, with AUC values (area under the ROC curve) of 77.5% and 80.0%, respectively, were constructed with the Neural Networks (NN) algorithm after Bagged Classification and Regression Tree (Bagged CART) and Linear Discriminant Analysis (LDA) selection. In vitro experiments showed that the prediction accuracy of these two models was 75% and 60%, respectively.
In addition, four TCMs (Cirsium japonicum Fisch. ex DC., Changium smyrnioides Wolff, Swertia pseudochinensis Hara, Callicarpa formosana Rolfe) were reported for the first time to show antibacterial activity against E. coli and/or S. aureus. The results suggest that the properties and families of TCMs are related to antibacterial activity, and that the prediction model of TCM antibacterial activity, based on a data-driven approach and machine learning algorithms, shows a certain predictive ability. This is of significance for screening antibacterial TCMs and can be used to discover novel antibacterial agents.
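The abstract reports both AUC values and in vitro prediction accuracy for the two models. As a rough illustration of how these two metrics differ, the following minimal Python sketch computes AUC as the fraction of (active, inactive) pairs that a model ranks correctly, alongside accuracy at a fixed cutoff. The function names `roc_auc` and `accuracy`, the 0.5 cutoff, and the toy labels and scores are all hypothetical; they are not the authors' code or data.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, cutoff=0.5):
    """Fraction of examples classified correctly when score >= cutoff means 'active'."""
    correct = sum(1 for y, s in zip(labels, scores) if (s >= cutoff) == (y == 1))
    return correct / len(labels)


# Hypothetical toy data: 1 = antibacterial activity observed, 0 = not observed.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical model scores (higher means "more likely active").
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
print(f"AUC = {auc:.3f}, accuracy at cutoff 0.5 = {acc:.3f}")
```

AUC depends only on how the scores rank active against inactive samples, whereas accuracy depends on the chosen cutoff, so the two figures need not agree for the same model.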