Mining Input Grammars from Dynamic Control Flow

Gopinath, Rahul and Mathis, Björn and Zeller, Andreas
(2020) Mining Input Grammars from Dynamic Control Flow.
In: ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2020-11-09, virtual.
Conference: ESEC/FSE - European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering

Abstract

One of the key properties of a program is its input specification. Having a formal input specification can be critical in fields such as vulnerability analysis, reverse engineering, software testing, clone detection, or refactoring. Unfortunately, accurate input specifications for typical programs are often unavailable or out of date. In this paper, we present a general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program. We infer the syntactic input structure solely by observing accesses to input characters at different locations of the input parser. This works on all stack-based recursive descent input parsers, including parser combinators, and requires no program-specific heuristics. Our Mimid prototype produced accurate and readable grammars for a variety of evaluation subjects, including complex languages such as JSON, TinyC, and JavaScript.
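
The idea of recovering input structure from character accesses can be illustrated with a toy example. The following is a minimal sketch, not the Mimid implementation: the parser, the `Tracer` class, the `mine` function, and the rule "a character belongs to the parser call that accessed it last" are all assumptions made for illustration. It instruments a hand-written recursive descent parser for nested lists of digits, records which call touched each input index last, and reads grammar rules off the resulting call tree.

```python
from collections import defaultdict


class Tracer:
    """Hypothetical helper: tracks parser calls and per-character accesses."""

    def __init__(self, text):
        self.text = text
        self.root = {"name": "<start>", "children": []}
        self.stack = [self.root]
        self.owner = {}  # input index -> call node that accessed it last

    def call(self, name):
        node = {"name": f"<{name}>", "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def ret(self):
        self.stack.pop()

    def at(self, i):
        """Return the character at index i and record the accessing call."""
        if i < len(self.text):
            self.owner[i] = self.stack[-1]
            return self.text[i]
        return ""


# Toy recursive descent parser: value := digit | list, list := '(' value* ')'
def parse_value(t, i):
    t.call("value")
    i = parse_list(t, i) if t.at(i) == "(" else parse_digit(t, i)
    t.ret()
    return i


def parse_list(t, i):
    t.call("list")
    assert t.at(i) == "("
    i += 1
    while t.at(i) != ")":
        i = parse_value(t, i)
    t.ret()
    return i + 1  # skip the closing ')'


def parse_digit(t, i):
    t.call("digit")
    assert t.at(i).isdigit()
    t.ret()
    return i + 1


def mine(text):
    """Run the parser on one sample and derive rules from the call tree."""
    t = Tracer(text)
    assert parse_value(t, 0) == len(text)
    owned = defaultdict(list)  # id(call node) -> indices it accessed last
    for i, node in t.owner.items():
        owned[id(node)].append(i)
    grammar = defaultdict(set)

    def walk(node):
        # Collect the node's own characters and its non-empty child calls,
        # ordered by input position; return the first covered index.
        items = [(i, text[i]) for i in owned[id(node)]]
        for child in node["children"]:
            start = walk(child)
            if start is not None:
                items.append((start, child["name"]))
        if not items:
            return None
        items.sort()
        grammar[node["name"]].add(tuple(sym for _, sym in items))
        return items[0][0]

    walk(t.root)
    return grammar


if __name__ == "__main__":
    for nonterminal, alternatives in mine("(1(23))").items():
        print(nonterminal, "::=", sorted(alternatives))
```

For the sample input `(1(23))`, this sketch derives two alternatives for `<value>`, namely `<digit>` and `<list>`. It only records the expansions seen in the samples; generalizing repeated elements or merging similar rules is beyond this illustration.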
