Modern MDL meets Data Mining: Insights, Theory, and Practice

Vreeken, Jilles and Yamanishi, Kenji

(2019) Modern MDL meets Data Mining: Insights, Theory, and Practice.

In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).

Abstract

When considering a data set, it is often unknown how complex it is, and hence it is difficult to assess how rich a model for the data should be. Often these choices are swept under the carpet, ignored, or left to the domain expert, but in practice this is highly unsatisfactory; domain experts do not know how to set $k$, what prior to choose, or how many degrees of freedom are optimal any more than we do. The Minimum Description Length (MDL) principle can answer the model selection problem from an intuitively appealing and clear viewpoint of information theory and data compression. In a nutshell, it asserts that the best model is the one that best compresses both the data and that model. It not only implies the best strategy for model selection, but also gives a unifying viewpoint for designing optimal data mining algorithms for a wide range of issues, and it has been applied very successfully to a wide range of data mining tasks, from pattern mining, clustering, classification, text mining, graph mining, and anomaly detection to causal inference. In this tutorial we give an introduction to the basics of model selection, show important properties of MDL-based modelling, and present successful examples as well as pitfalls of applying MDL to data mining problems. We also introduce advanced topics on important new concepts in modern MDL (e.g., normalized maximum likelihood (NML), sequential NML, decomposed NML, and MDL change statistics) and emerging applications in dynamic settings.
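
As a rough illustration of the principle stated in the abstract (the best model minimizes the bits needed for the model plus the bits needed for the data given the model), the following minimal sketch picks the order $k$ of a Markov model for a binary string. It is not taken from the tutorial: the function names `two_part_code_length` and `best_order` are hypothetical, and the model cost uses the common asymptotic approximation of half of log2(n) bits per free parameter, which is a crude two-part code rather than the NML codes the tutorial covers.

```python
import math
from collections import Counter


def two_part_code_length(bits: str, k: int) -> float:
    """Crude two-part MDL score, in bits, for an order-k Markov model.

    Assumption (not from the tutorial): each of the 2**k free parameters
    costs 0.5 * log2(n) bits, and the first k symbols are sent literally.
    """
    n = len(bits) - k  # symbols that are encoded with a full context
    # Count how often each symbol follows each length-k context.
    counts = Counter((bits[i - k:i], bits[i]) for i in range(k, len(bits)))
    context_totals = Counter()
    for (context, _symbol), c in counts.items():
        context_totals[context] += c

    # L(D | M): negative log-likelihood under maximum-likelihood estimates.
    data_bits = -sum(
        c * math.log2(c / context_totals[context])
        for (context, _symbol), c in counts.items()
    )
    # L(M): one parameter per context, 0.5 * log2(n) bits each.
    model_bits = (2 ** k) * 0.5 * math.log2(n)
    # The first k symbols have no full context; send them at 1 bit each.
    return model_bits + data_bits + k


def best_order(bits: str, max_k: int) -> int:
    """Return the order k in 0..max_k with the shortest total code length."""
    return min(range(max_k + 1), key=lambda k: two_part_code_length(bits, k))


if __name__ == "__main__":
    data = "01" * 100  # strictly alternating sequence
    for k in range(4):
        print(k, round(two_part_code_length(data, k), 2))
    print("selected order:", best_order(data, 3))
```

On the alternating string, order 0 cannot compress the data at all, order 1 predicts every symbol from its predecessor, and higher orders only add parameter cost, so the total code length is shortest at order 1. The choice of $k$ falls out of the code lengths rather than being set by hand.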

Item Type:	Conference or Workshop Item (A Paper) (Lecture)
Divisions:	Jilles Vreeken (Exploratory Data Analysis)
Conference:	KDD ACM International Conference on Knowledge Discovery and Data Mining
Depositing User:	Jilles Vreeken
Date Deposited:	10 Mar 2020 13:54
Last Modified:	10 May 2021 11:31
Primary Research Area:	NRA5: Empirical & Behavioral Security
URI:	https://publications.cispa.saarland/id/eprint/3037
