(2019) Sets of Robust Rules, and How to Find Them.
|
Text
grab-fischer,vreeken.pdf Download (1MB) | Preview |
Abstract
Association rules are among the most important concepts in data mining. Rules of the form X → Y are simple to understand, simple to act upon, yet can model important local dependencies in data. The problem is, however, that there are so many of them. Both traditional and state-of-the-art frameworks typically yield millions of rules, rather than identifying a small set of rules that capture the most important dependencies of the data. In this paper, we define the problem of association rule mining in terms of the Minimum Description Length principle. That is, we identify the best set of rules as the one that most succinctly describes the data. We show that the resulting optimization problem does not lend itself for exact search, and hence propose Grab, a greedy heuristic to efficiently discover good sets of rules directly from data. Through an extensive set of experiments we show that, unlike the state-of-the-art, Grab does reliably recover the ground truth. On real world data we show it finds reasonable numbers of rules, that upon close inspection give clear insight in the local distribution of the data.
Item Type: | Conference or Workshop Item (A Paper) (Paper) |
---|---|
Divisions: | Jilles Vreeken (Exploratory Data Analysis) |
Conference: | ECML PKDD European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Database (PKDD and ECML combined from 2008) |
Depositing User: | Sebastian Dalleiger |
Date Deposited: | 23 Jun 2019 18:39 |
Last Modified: | 11 May 2021 08:33 |
Primary Research Area: | NRA5: Empirical & Behavioral Security |
URI: | https://publications.cispa.saarland/id/eprint/2924 |
Actions
Actions (login required)
View Item |