AUTHOR=Li Mengwei, Zhao Tongle, Liu Yingchao, Yu Ruteng, Wang Lukun
TITLE=RoCD: leveraging foundation vision models with refine-and-fuse framework for robust change detection
JOURNAL=Frontiers in Remote Sensing
VOLUME=Volume 6 - 2025
YEAR=2026
URL=https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2025.1744950
DOI=10.3389/frsen.2025.1744950
ISSN=2673-6187
ABSTRACT=In recent years, Foundation Vision Models (FVMs) have provided new technical approaches for change detection and understanding of remote sensing images, owing to their strong generalization and multi-scale representation capabilities. However, in complex spatiotemporal scenarios, existing methods still face two major challenges: insufficient feature interaction and an imbalance between global and detail representation. To address these challenges, this paper proposes RoCD, which introduces the Refine-and-Align Framework (RAF) and FusionR-Decoder on top of a frozen foundation vision model, the FastSAM encoder. First, RAF introduces a pairwise difference refinement (PDR) mechanism to enhance feature interactions and effectively suppress spurious changes caused by inter-domain differences. Second, FusionR-Decoder embeds a three-branch RBlock based on state space models (SSMs) in the multi-scale decoding stage to achieve long-range dependency modeling and global consistency constraints. Experimental results on three public datasets, LEVIR-CD, LEVIR-CD+, and WHU-CD, show that RoCD achieves F1/IoU scores of 92.11/85.38, 87.68/76.90, and 95.95/91.04, respectively.