AUTHOR=Aglogallos Anastasis, Bousdekis Alexandros, Kontos Stefanos, Mentzas Gregoris
TITLE=Health state prediction with reinforcement learning for predictive maintenance
JOURNAL=Frontiers in Artificial Intelligence
VOLUME=Volume 8 - 2025
YEAR=2026
URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1720140
DOI=10.3389/frai.2025.1720140
ISSN=2624-8212
ABSTRACT=Introduction: Predictive maintenance has emerged as a critical strategy in modern manufacturing within the framework of Industry 4.0, enabling proactive intervention before equipment failure. However, traditional machine learning approaches require extensive labeled data and lack adaptability to evolving operational conditions. Reinforcement Learning (RL), on the other hand, enables agents to learn optimal policies through interaction with the environment, eliminating the need for labeled datasets and naturally capturing the sequential, uncertain dynamics of equipment degradation. Methods: In this paper, we propose an approach that incorporates four model-free RL algorithms, namely Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Soft Actor-Critic (SAC). We formulate the problem as a Markov Decision Process (MDP), which is then solved with these RL algorithms. Results: The proposed approach is validated in the context of CNC machine tool wear prediction, using sensor data from the 2010 PHM Society Data Challenge. We examine algorithmic performance across four custom-made environments (corrective and non-corrective, each with and without a delay correction mechanism) in order to compare learning dynamics, convergence behavior, and generalization. Our results reveal that PPO and SAC achieve the most stable and efficient performance, with SAC excelling in structured environments and PPO demonstrating robust generalization. A2C shows consistent long-term learning, while DDPG underperforms due to insufficient exploration. Discussion: The findings highlight the potential of RL for predictive maintenance applications and underscore the importance of aligning algorithm design with environment characteristics and reward structures.
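To make the MDP formulation described in the abstract concrete, the sketch below shows one plausible way to cast tool-wear health-state prediction as a Gymnasium environment and train it with a Stable-Baselines3 PPO agent. This is a minimal illustration under stated assumptions: the `ToolWearEnv` class, the feature layout, the discretized health states, and the accuracy-based reward are hypothetical choices for demonstration, not the corrective/non-corrective environments, delay correction mechanisms, or reward structures used in the paper.

```python
# Minimal sketch: health-state prediction as an MDP, assuming a Gymnasium-style
# environment and Stable-Baselines3 agents. All design choices below (features,
# five discretized health states, reward shaping) are illustrative assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO  # SAC/DDPG would require a continuous action space


class ToolWearEnv(gym.Env):
    """Hypothetical environment: the agent observes sensor-derived features for one
    cut and predicts the tool's health state (0 = healthy ... 4 = heavily worn)."""

    def __init__(self, features: np.ndarray, wear_labels: np.ndarray):
        super().__init__()
        self.features = features.astype(np.float32)   # shape: (n_cuts, n_features)
        self.labels = wear_labels                      # discretized wear level per cut
        self.n_states = int(self.labels.max()) + 1
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(features.shape[1],), dtype=np.float32
        )
        self.action_space = spaces.Discrete(self.n_states)  # predicted health state
        self.t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self.features[self.t], {}

    def step(self, action):
        # Reward a correct health-state prediction; penalize proportionally to the
        # distance between the predicted and true wear level.
        error = abs(int(action) - int(self.labels[self.t]))
        reward = 1.0 if error == 0 else -float(error) / (self.n_states - 1)
        self.t += 1
        terminated = self.t >= len(self.features)
        obs = self.features[-1] if terminated else self.features[self.t]
        return obs, reward, terminated, False, {}


if __name__ == "__main__":
    # Placeholder data standing in for features extracted from the PHM 2010
    # force / vibration / acoustic-emission signals and their wear labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 6)).astype(np.float32)
    y = np.arange(300) // 60  # monotone wear progression through 5 states

    env = ToolWearEnv(X, y)
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=5_000)
```

Swapping PPO for A2C is a one-line change in Stable-Baselines3; evaluating SAC or DDPG on the same task would additionally require recasting the action as a continuous (Box) wear estimate, since both algorithms assume continuous actions.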