AUTHOR=Zhang Yunwei, Tian Jing, Xiong Qiaochu
TITLE=A review of embodied intelligence systems: a three-layer framework integrating multimodal perception, world modeling, and structured strategies
JOURNAL=Frontiers in Robotics and AI
VOLUME=Volume 12 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2025.1668910
DOI=10.3389/frobt.2025.1668910
ISSN=2296-9144
ABSTRACT=Embodied intelligent systems build upon the foundations of behavioral robotics and classical cognitive architectures. They integrate multimodal perception, world modeling, and adaptive control to support closed-loop interaction in dynamic and uncertain environments. Recent breakthroughs in Multimodal Large Models (MLMs) and World Models (WMs) are profoundly transforming this field, providing the tools to achieve its long-envisioned capabilities of semantic understanding and robust generalization. Targeting the central challenge of how modern MLMs and WMs jointly advance embodied intelligence, this review provides a comprehensive overview across key dimensions, including multimodal perception, cross-modal alignment, adaptive decision-making, and Sim-to-Real transfer. Furthermore, we systematize these components into a three-stage theoretical framework termed “Dynamic Perception–Task Adaptation (DP-TA)”. This framework integrates multimodal perception modeling, causally driven world state prediction, and semantically guided strategy optimization, establishing a comprehensive “perception–modeling–decision” loop. To support this, we introduce a “Feature-Conditioned Modal Alignment (F-CMA)” mechanism to enhance cross-modal fusion under task constraints.