What is Normal, What is Strange, and What is Missing in a Knowledge Graph

Belth, Caleb and Zheng, X and Vreeken, Jilles and Koutra, Danai

(2020) What is Normal, What is Strange, and What is Missing in a Knowledge Graph.

In: The Web Conference.

Conference: WWW The Web Conference (Formerly: International World Wide Web Conference)

2003.10412v1.pdf

Abstract

Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG. In this work, we introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG, and thus can be used to identify what is abnormal, whether it be strange or missing. Unlike first-order logic rules, our rules are labeled, rooted graphs, i.e., patterns that describe the expected neighborhood around a (seen or unseen) node, based on its type and information in the KG. Stepping away from the traditional support/confidence-based rule mining techniques, we propose KGist, Knowledge Graph Inductive SummarizaTion, which learns a summary of inductive rules that best compress the KG according to the Minimum Description Length principle, a formulation that we are the first to use in the context of KG rule mining. We apply our rules to three large KGs (NELL, DBpedia, and Yago), and to tasks such as compression, various types of error detection, and identification of incomplete information. We show that KGist outperforms task-specific, supervised and unsupervised baselines in error detection and incompleteness identification (identifying the location of up to 93% of missing entities, over 10% more than baselines), while also being efficient for large knowledge graphs.
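
To make the compression idea in the abstract concrete, the following is a minimal toy sketch of MDL-style rule selection, not the paper's actual encoding or algorithm. Everything in it is an assumption made for illustration: the tiny KG, the rule form (a node type paired with an expected outgoing predicate), the bit costs, and the hypothetical helper names `exceptions`, `description_length`, and `select_rules`. A rule is kept only if it lowers the total cost of describing the rules plus the graph given the rules, and nodes that violate a kept rule are candidates for missing information.

```python
import math

# Toy KG (illustrative data only): triples plus a type for every entity.
triples = [
    ("b1", "author", "p1"),
    ("b2", "author", "p2"),
    ("b3", "author", "p3"),
    ("b1", "translator", "p4"),
]
node_type = {
    "b1": "Book", "b2": "Book", "b3": "Book", "b4": "Book",
    "p1": "Person", "p2": "Person", "p3": "Person", "p4": "Person",
}


def exceptions(rule, triples, node_type):
    """Nodes of the rule's type that lack the expected outgoing predicate."""
    rule_type, predicate = rule
    have = {s for s, p, _ in triples if p == predicate}
    return sorted(n for n, t in node_type.items()
                  if t == rule_type and n not in have)


def description_length(rules, triples, node_type):
    """Toy two-part cost in bits: L(rules) + L(KG | rules). Assumed encoding."""
    entity_bits = math.log2(len(node_type))
    predicate_bits = math.log2(len({p for _, p, _ in triples}))
    type_bits = math.log2(len(set(node_type.values())))
    rule_set = set(rules)
    # Model cost: each rule names one type and one predicate.
    total = len(rule_set) * (type_bits + predicate_bits)
    for s, p, _ in triples:
        if (node_type[s], p) in rule_set:
            total += entity_bits  # rule implies subject and predicate
        else:
            total += 2 * entity_bits + predicate_bits  # spell out the triple
    for rule in rule_set:
        # Each exception to a rule must be named explicitly.
        total += entity_bits * len(exceptions(rule, triples, node_type))
    return total


def select_rules(triples, node_type):
    """Greedily keep each candidate rule only if it shrinks the total cost."""
    candidates = sorted({(node_type[s], p) for s, p, _ in triples})
    selected = []
    best = description_length(selected, triples, node_type)
    for rule in candidates:
        cost = description_length(selected + [rule], triples, node_type)
        if cost < best:
            selected.append(rule)
            best = cost
    return selected, best


selected, cost = select_rules(triples, node_type)
missing = exceptions(("Book", "author"), triples, node_type)
```

In this toy setting the rule "a Book has an author" pays for itself because most books follow it, while "a Book has a translator" does not, since listing its exceptions costs more than it saves. The one book without an author surfaces as an exception to the kept rule, which loosely mirrors how a summary of what is normal can point at what is missing.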

Item Type:	Conference or Workshop Item (A Paper) (Paper)
Divisions:	Jilles Vreeken (Exploratory Data Analysis)
Conference:	WWW The Web Conference (Formerly: International World Wide Web Conference)
Depositing User:	Jilles Vreeken
Date Deposited:	10 Mar 2020 13:58
Last Modified:	10 May 2021 11:13
Primary Research Area:	NRA5: Empirical & Behavioral Security
URI:	https://publications.cispa.saarland/id/eprint/3038
