AUTHOR=Ebeid Islam Akef, Tang Haoteng, Gu Pengfei
TITLE=Inferred global dense residue transition graphs from primary structure sequences enable protein interaction prediction via directed graph convolutional neural networks
JOURNAL=Frontiers in Bioinformatics
VOLUME=Volume 5 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/bioinformatics/articles/10.3389/fbinf.2025.1651623
DOI=10.3389/fbinf.2025.1651623
ISSN=2673-7647
ABSTRACT=Introduction: Accurate prediction of protein-protein interactions (PPIs) is crucial for understanding cellular functions and advancing the development of drugs. While existing in-silico methods leverage direct sequence embeddings from Protein Language Models (PLMs) or apply Graph Neural Networks (GNNs) to 3D protein structures, the main focus of this study is to investigate less computationally intensive alternatives. This work introduces a novel framework for the downstream task of PPI prediction via link prediction. Methods: We introduce a two-stage graph representation learning framework, ProtGram-DirectGCN. First, we developed ProtGram, a novel approach that models a protein's primary structure as a hierarchy of globally inferred n-gram graphs. In these graphs, residue transition probabilities, aggregated from a large sequence corpus, define the edge weights of a directed graph of paired residues. Second, we propose a custom directed graph convolutional neural network, DirectGCN, which features a unique convolutional layer that processes information through separate path-specific (incoming, outgoing, undirected) and shared transformations, combined via a learnable gating mechanism.
DirectGCN is applied to the ProtGram graphs to learn residue-level embeddings, which are then pooled via an attention mechanism to generate protein-level embeddings for the prediction task. Results: The efficacy of the DirectGCN model was first established on standard node classification benchmarks, where its performance is comparable to that of established methods on general datasets, while demonstrating specialization for complex, directed, and dense heterophilic graph structures. When applied to PPI prediction, the full ProtGram-DirectGCN framework achieves robust predictive power despite being trained on limited data. Discussion: Our results suggest that a globally inferred, directed graph-based representation of sequence transitions offers a potent and computationally distinct alternative to resource-intensive PLMs for the task of PPI prediction. Future work will involve testing ProtGram-DirectGCN on a wider range of bioinformatics tasks.
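
The Methods summary describes directed graphs whose edge weights are residue transition probabilities aggregated over a sequence corpus. The following is a minimal sketch of that idea, not the authors' implementation. The function name `ngram_transition_graph`, the rule that each n-gram links to the n-gram starting one residue later, the row-wise normalization, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict


def ngram_transition_graph(sequences, n=1):
    """Return {source_ngram: {target_ngram: probability}} pooled over all sequences.

    Hypothetical helper: the abstract does not specify how n-grams are
    linked or normalized, so both choices below are assumptions.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Overlapping n-grams of the primary structure, in sequence order.
        grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
        # Assumption: a directed edge goes from each n-gram to its successor.
        for src, dst in zip(grams, grams[1:]):
            counts[src][dst] += 1

    graph = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        # Assumption: outgoing weights of each node are normalized to sum to 1.
        graph[src] = {dst: c / total for dst, c in targets.items()}
    return graph


# Toy sequences for illustration only, not real protein data.
corpus = ["MKVLA", "MKVA", "AKVL"]
graph = ngram_transition_graph(corpus, n=1)
bigram_graph = ngram_transition_graph(corpus, n=2)
```

Because the counts are pooled across the whole corpus, the resulting graph is global rather than per-protein; calling the function with several values of `n` gives one graph per level of the hierarchy mentioned in the abstract.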