(2022) Omen: discovering sequential patterns with reliable prediction delays.

Text: omencueppers,kalofolias,vreekenkais.pdf (Accepted Version, 1MB)


Abstract
Suppose we are given a discrete-valued time series $X$ of observed events and an equally long binary sequence $Y$ that indicates whether something of interest happened at that particular point in time. We consider the problem of mining serial episodes, that is, sequential patterns allowing for gaps, from $X$ that reliably predict those interesting events. By reliable we mean patterns that not only predict that an interesting event is likely to follow, but in particular also allow us to tell accurately how long it will be until that event happens. In other words, we are specifically interested in patterns with a highly skewed distribution of delays between pattern occurrences and predicted events. As it is unlikely that a single pattern can explain a complex real-world process, we are after the smallest, least redundant set of such patterns that together explain the interesting events well. We formally define this problem in terms of the Minimum Description Length principle, by which we identify the best patterns as those that describe the occurrences of interesting events $Y$ most succinctly given the data over $X$. As neither discovering the optimal explanation of $Y$ given a set of patterns, nor discovering the optimal pattern set, is a problem that allows for straightforward optimization, we break the problem in two and propose effective heuristics for both. Through extensive empirical evaluation, we show that both our main method, Omen, and its fast approximation, fOmen, work well in practice and beat the state of the art both quantitatively and qualitatively.
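To make the notion of a "reliable prediction delay" concrete, the following is a minimal illustrative sketch and not the Omen algorithm or its MDL score. It finds occurrences of a serial episode in a toy sequence, measures the delay from each occurrence to the next interesting event, and uses the empirical entropy of those delays as a rough stand-in for how peaked the delay distribution is. The function names (`episode_ends`, `delays_to_next_event`, `entropy_bits`), the greedy matching rule, the `max_gap` parameter, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def episode_ends(x, pattern, max_gap):
    """Return end indices of greedy matches of `pattern` in `x`.

    Between two consecutive pattern symbols, at most `max_gap`
    other events may be skipped (an assumed gap constraint).
    """
    ends = set()
    for start, symbol in enumerate(x):
        if symbol != pattern[0]:
            continue
        pos = start
        matched = True
        for want in pattern[1:]:
            # Look at the next max_gap + 1 positions after the last match.
            window = x[pos + 1 : pos + 2 + max_gap]
            if want not in window:
                matched = False
                break
            pos = pos + 1 + window.index(want)
        if matched:
            ends.add(pos)
    return sorted(ends)


def delays_to_next_event(ends, y):
    """Delay from each occurrence end to the next index where y is 1."""
    out = []
    for end in ends:
        for t in range(end + 1, len(y)):
            if y[t]:
                out.append(t - end)
                break
    return out


def entropy_bits(values):
    """Empirical Shannon entropy in bits; lower means a more peaked distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


# Toy data (hypothetical): 'a' followed by 'b' is always followed by an event 2 steps later.
x = ["a", "b", "z", "a", "z", "b", "z", "z", "a", "b", "z", "z"]
y = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]

ends = episode_ends(x, ("a", "b"), max_gap=1)
delays = delays_to_next_event(ends, y)
print(ends, delays, entropy_bits(delays))
```

In this toy example every occurrence of the episode is followed by an event after the same delay, so the delay entropy is zero. A pattern whose delays were spread over many values would score higher and would be less reliable in the sense the abstract describes.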
Item Type:  Article 

Divisions:  Jilles Vreeken (Exploratory Data Analysis) 
Depositing User:  Sebastian Dalleiger 
Date Deposited:  15 Jul 2022 10:21 
Last Modified:  15 Jul 2022 10:22 
Primary Research Area:  NRA1: Trustworthy Information Processing 
URI:  https://publications.cispa.saarland/id/eprint/3725 