AUTHOR=Grote Alexander, Hariharan Anuja, Weinhardt Christof
  
TITLE=Finding the needle in the haystack—An interpretable sequential pattern mining method for classification problems
  
JOURNAL=Frontiers in Big Data
  
VOLUME=Volume 8 - 2025
  
YEAR=2025
  
URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2025.1604887
  
DOI=10.3389/fdata.2025.1604887
  
ISSN=2624-909X
  
ABSTRACT=Introduction: The analysis of discrete sequential data, such as event logs and customer clickstreams, is often challenged by the vast number of possible sequential patterns. This complexity makes it difficult to identify meaningful sequences and derive actionable insights. Methods: We propose a novel feature selection algorithm that integrates unsupervised sequential pattern mining with supervised machine learning. Unlike existing interpretable machine learning methods, we determine important sequential patterns during the mining process, eliminating the need for post-hoc classification to assess their relevance. In contrast to existing interestingness measures, we introduce a local, class-specific interestingness measure that is inherently interpretable. Results: We evaluated the algorithm on three diverse datasets (churn prediction, malware sequence analysis, and a synthetic dataset) covering different sizes, application domains, and feature complexities. Our method achieved classification performance comparable to established feature selection algorithms while maintaining interpretability and reducing computational costs. Discussion: This study demonstrates a practical and efficient approach for uncovering important sequential patterns in classification tasks. By combining interpretability with competitive predictive performance, our algorithm provides practitioners with an efficient alternative to existing methods, paving the way for new advances in sequential data analysis.