AUTHOR=Jobarteh Bubacarr , Mincu-Iorga Madalina , Gavojdian Dinu , Neethirajan Suresh 

TITLE=Integrating multi-modal data fusion approaches for analysis of dairy cattle vocalizations

JOURNAL=Frontiers in Veterinary Science

VOLUME=Volume 12 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/veterinary-science/articles/10.3389/fvets.2025.1704031

DOI=10.3389/fvets.2025.1704031

ISSN=2297-1769

ABSTRACT=Non-invasive analysis of dairy cattle vocalizations offers a practical route to continuous assessment of stress and timely health interventions in precision livestock systems. We present a multi-modal AI framework that fuses standard acoustic features (e.g., frequency, duration, amplitude) with non-linguistic, transformer-based representations of call structure for behavior classification. The classification analysis represents the core contribution of this work, while the integration of the Whisper model serves as a complementary exploratory tool, highlighting its potential for future motif-based behavioral studies. Using contact calls recorded from a cohort of lactating Romanian Holsteins during a standardized, brief social-isolation paradigm, we developed an ontology distinguishing high-frequency calls (HFCs) associated with arousal from low-frequency calls (LFCs) associated with calmer states. Across cross-validated models, support vector machine and random-forest classifiers reliably separated call types, and fused acoustic + symbolic features consistently outperformed single-modality inputs. Feature-importance analyses highlighted frequency, loudness, and duration as dominant, interpretable predictors, aligning vocal patterns with established markers of arousal. From a clinical perspective, the system is designed to operate passively on barn audio to flag rising stress signatures in real time, enabling targeted checks, husbandry adjustments, and prioritization for veterinary examination. Integrated with existing sensor networks (e.g., milking robots, environmental monitors), these alerts can function as an early-warning layer that complements conventional surveillance for conditions where vocal changes may accompany pain, respiratory compromise, or maladaptive stress. While the present work validates behaviorally anchored discrimination, ongoing efforts will pair vocal alerts with physiological measures (e.g., cortisol, infrared thermography) and multi-site datasets to strengthen disease-specific inference and generalizability. This framework supports scalable, on-farm welfare surveillance and earlier intervention in emerging health and stress events.