AUTHOR=Chen Guo, Sun Kaixin

TITLE=Leveraging multimodal learning for enhanced drug-target interaction prediction

JOURNAL=Frontiers in Pharmacology

VOLUME=Volume 16 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/pharmacology/articles/10.3389/fphar.2025.1639979

DOI=10.3389/fphar.2025.1639979

ISSN=1663-9812

ABSTRACT=Introduction: The evolving landscape of artificial intelligence in drug discovery calls for increasingly sophisticated approaches to predicting drug-target interactions (DTIs) with high precision and generalizability. In line with the current interest in AI-driven pharmacological modeling and integrative biomedical data analysis, this study introduces a multimodal framework that enhances DTI prediction by fusing heterogeneous data sources. Conventional methods typically rely on unimodal inputs such as chemical structures or protein sequences; they fall short in capturing the complex, multifaceted nature of biochemical interactions and often adapt poorly to different tasks or incomplete datasets. These limitations impede generalization beyond narrow benchmarks and reduce interpretability when modalities are missing or noisy. Methods: To address these challenges, we propose a multimodal learning pipeline built on three principal innovations. First, the Unified Multimodal Molecule Encoder (UMME) jointly embeds molecular graphs, textual descriptions, transcriptomics, protein sequences, and bioassay data using modality-specific encoders followed by a hierarchical attention-based fusion strategy. This encoder aligns intra- and inter-modal representations while retaining the high-level semantic features critical for interaction prediction. Second, we introduce a robust training strategy, Adaptive Curriculum-guided Modality Optimization (ACMO), which dynamically prioritizes more reliable or informative modalities during early training and gradually incorporates less certain data via a curriculum mechanism. This allows the model to maintain strong performance when modalities are absent or noisy, thereby mimicking realistic drug screening conditions. Third, we employ a novel cross-modal contrastive alignment loss and modality dropout scheduling, which together enforce consistency and encourage generalization across diverse data settings. Results: Experiments on multiple benchmark datasets demonstrate that our framework achieves state-of-the-art performance in drug-target affinity estimation and binding prediction tasks, particularly under conditions of partial data availability. Discussion: Ablation studies confirm the effectiveness of both the UMME and ACMO components in improving accuracy and robustness.