AUTHOR=Grote Alexander, Hariharan Anuja, Weinhardt Christof
TITLE=Finding the needle in the haystack—An interpretable sequential pattern mining method for classification problems
JOURNAL=Frontiers in Big Data
VOLUME=Volume 8 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2025.1604887
DOI=10.3389/fdata.2025.1604887
ISSN=2624-909X
ABSTRACT=Introduction: The analysis of discrete sequential data, such as event logs and customer clickstreams, is often challenged by the vast number of possible sequential patterns. This complexity makes it difficult to identify meaningful sequences and derive actionable insights. Methods: We propose a novel feature selection algorithm that integrates unsupervised sequential pattern mining with supervised machine learning. Unlike existing interpretable machine learning methods, we determine important sequential patterns during the mining process, eliminating the need for post-hoc classification to assess their relevance. In contrast to existing interestingness measures, we introduce a local, class-specific interestingness measure that is inherently interpretable. Results: We evaluated the algorithm on three diverse datasets (churn prediction, malware sequence analysis, and a synthetic dataset) covering different sizes, application domains, and feature complexities. Our method achieved classification performance comparable to established feature selection algorithms while maintaining interpretability and reducing computational costs. Discussion: This study demonstrates a practical and efficient approach for uncovering important sequential patterns in classification tasks. By combining interpretability with competitive predictive performance, our algorithm offers practitioners an efficient alternative to existing methods, paving the way for new advances in sequential data analysis.