AUTHOR=Liu Yinbo, Liu Yufeng, Wang Gang-Ao, Cheng Yinchu, Bi Shoudong, Zhu Xiaolei
TITLE=BERT-Kgly: A Bidirectional Encoder Representations From Transformers (BERT)-Based Model for Predicting Lysine Glycation Sites in Homo sapiens
JOURNAL=Frontiers in Bioinformatics
VOLUME=2
YEAR=2022
URL=https://www.frontiersin.org/journals/bioinformatics/articles/10.3389/fbinf.2022.834153
DOI=10.3389/fbinf.2022.834153
ISSN=2673-7647
ABSTRACT=As one of the most important post-translational modifications (PTMs), protein lysine glycation alters the characteristics of proteins and can lead to their dysfunction, thereby causing disease. Accurately detecting glycation sites is of great benefit for understanding the biological function and potential mechanism of glycation in the treatment of disease. However, experimental methods for identifying lysine glycation sites are expensive and time-consuming. Computational methods, with their higher efficiency and lower cost, can therefore serve as an important supplement to experimental approaches. In this study, we propose a novel predictor, BERT-Kgly, for protein lysine glycation site prediction, developed by extracting embedding features of protein segments from pre-trained Bidirectional Encoder Representations from Transformers (BERT) models. Three pre-trained BERT models were explored to obtain embeddings with optimal representability, and three downstream deep networks were employed to build our models. Our results show that the model based on embeddings extracted from the BERT model pre-trained on 556,603 protein sequences from UniProt outperforms the other models. In addition, an independent test set was used to evaluate our model against other existing methods, indicating that our model is superior to existing models.