AUTHOR=Jiang Li, Lu Wang TITLE=Sports competition tactical analysis model of cross-modal transfer learning intelligent robot based on Swin Transformer and CLIP JOURNAL=Frontiers in Neurorobotics VOLUME=Volume 17 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/neurorobotics/articles/10.3389/fnbot.2023.1275645 DOI=10.3389/fnbot.2023.1275645 ISSN=1662-5218 ABSTRACT=This paper introduces an intelligent robot sports competition tactical analysis model that leverages multimodal perception to address the challenge of analyzing opponent tactics in sports competitions. Sports competition analysis urgently needs better ways to analyze opponent tactics, but traditional methods are often limited to a single data source or modality and struggle to capture the details of those tactics. The system incorporates Swin Transformer and CLIP models, employing cross-modal transfer learning for comprehensive observation and analysis of opponent tactics. The Swin Transformer learns the opponent's action postures and behavior patterns in basketball or football games, while the CLIP model enhances the system's understanding of opponent tactical information by establishing semantic associations between images and text. To overcome potential imbalances and biases between the models, we propose a cross-modal transfer learning method that mitigates modal bias and improves the model's generalization performance on multimodal data. Through cross-modal transfer learning, tactical information learned from images by the Swin Transformer is effectively transferred to the CLIP model, providing coaches and athletes with comprehensive tactical insights. Our method is experimentally verified on the Sport UV, Sports-1M, HMDB51 and NPU RGB+D datasets. Experimental results show that the system performs well in terms of prediction accuracy, stability, training time, inference time, number of parameters, and computational complexity.
In particular, the system's mean absolute error (MAE) on the Kinetics dataset is 8.47% lower than that of other models, and its training time is reduced by 72.86 seconds. The system proves suitable for real-time sports competition assistance and analysis, offering a novel and effective approach to intelligent robot sports competition tactical analysis that maximizes the potential of multimodal perception technology.
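
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean absolute error and a relative reduction against a baseline might be computed. It is not the authors' evaluation code. The function names and the toy values are hypothetical, and the abstract does not state whether the reported 8.47% is a relative reduction or a difference in percentage points, so the relative form used here is an assumption.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction_percent(baseline_mae, model_mae):
    """Percentage by which model_mae is lower than baseline_mae (assumed relative form)."""
    if baseline_mae == 0:
        raise ValueError("baseline MAE must be non-zero")
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# Toy values for illustration only; these are not results from the paper.
targets = [1.0, 2.0, 3.0, 4.0]
baseline_preds = [1.5, 2.5, 2.0, 5.0]
model_preds = [1.25, 2.25, 2.5, 4.5]

baseline_mae = mean_absolute_error(targets, baseline_preds)
model_mae = mean_absolute_error(targets, model_preds)
reduction = relative_reduction_percent(baseline_mae, model_mae)

print(f"baseline MAE = {baseline_mae}, model MAE = {model_mae}, reduction = {reduction}%")
```

Reporting both the absolute MAE values and the relative reduction avoids the ambiguity noted above, since a percentage alone does not say which form of comparison was used.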