Learning Input Tokens for Effective Fuzzing

Mathis, Björn and Gopinath, Rahul and Zeller, Andreas

(2020) Learning Input Tokens for Effective Fuzzing.

In: ISSTA - ACM SIGSOFT International Symposium on Software Testing and Analysis, Sat 18 - Wed 22 July 2020, Virtual.

(In Press)

There is a more recent version of this item available.

Abstract

Modern fuzzing tools like AFL operate at a lexical level: they explore the input space of tested programs one byte after another. For inputs with complex syntactical properties, this is very inefficient, as keywords and other tokens have to be composed one character at a time. Fuzzers thus allow users to specify dictionaries listing possible tokens the input can be composed from; such dictionaries speed up fuzzers dramatically. Fuzzers also make use of dynamic tainting to track input tokens and infer values that are expected in the input validation phase. Unfortunately, such tokens are usually implicitly converted to program-specific values, which causes a loss of the taints attached to the input data in the lexical phase. In this paper, we present a technique that extends dynamic tainting to not only track explicit data flows but also taint implicitly converted data, without suffering from taint explosion. This extension makes it possible to augment existing techniques and automatically infer a set of tokens and seed inputs for the input language of a program, given nothing but the source code. Specifically targeting the lexical analysis of an input processor, our lFuzzer test generator systematically explores branches of the lexical analysis, producing a set of tokens that fully covers all decisions seen. The resulting set of tokens can be used directly as a dictionary for fuzzing. Along with the token extraction, seed inputs are generated, which give further fuzzing processes a head start. In our experiments, the lFuzzer-AFL combination achieves up to 17% more coverage than AFL alone on complex input formats like JSON, LISP, tinyC, and JavaScript.
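
To make the idea of inferring tokens from comparisons more concrete, the following is a minimal sketch and not the lFuzzer implementation. It assumes a hypothetical toy lexer written in Python and a hypothetical TrackedStr class that records every string the input is compared against; the names toy_lexer, TrackedStr, and afl_dictionary are illustrative only. The recorded strings are then written in the name="value" format that AFL reads as a dictionary (passed via its -x option).

```python
class TrackedStr(str):
    """Hypothetical helper: a str that logs every string it is compared to."""
    log = []  # shared comparison log, used only for this illustration

    def __eq__(self, other):
        if isinstance(other, str):
            TrackedStr.log.append(str(other))
        return str.__eq__(self, other)

    # Defining __eq__ disables hashing unless __hash__ is restored.
    __hash__ = str.__hash__


def toy_lexer(text):
    """Hypothetical lexer: classify one word by comparing it to keywords."""
    if text == "true":
        return "BOOL"
    if text == "false":
        return "BOOL"
    if text == "null":
        return "NULL"
    return "UNKNOWN"


def afl_dictionary(tokens):
    """Format tokens as AFL dictionary lines of the form name="value"."""
    lines = []
    for i, tok in enumerate(sorted(set(tokens))):
        # Keep printable ASCII; hex-escape quotes, backslashes, and the rest.
        escaped = "".join(
            chr(b) if 32 <= b < 127 and b not in (0x22, 0x5C) else f"\\x{b:02x}"
            for b in tok.encode("utf-8")
        )
        lines.append(f'token_{i}="{escaped}"')
    return "\n".join(lines)


# Run the lexer once on an arbitrary input and collect what it was compared to.
TrackedStr.log.clear()
toy_lexer(TrackedStr("x"))
dictionary = afl_dictionary(TrackedStr.log)
print(dictionary)
```

Even a single run with a meaningless input reveals the three keywords here, because every failed comparison exposes the value the lexer expected. A real lexer compares character by character and converts tokens to internal values, which is where this naive logging breaks down and where the taint extension described in the abstract comes in.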

Item Type:	Conference or Workshop Item (A Paper) (Paper)
Uncontrolled Keywords:	fuzzing, test input generation, parser
Divisions:	Andreas Zeller (Software Engineering, ST)
Conference:	ISSTA International Symposium on Software Testing and Analysis
Depositing User:	Björn Mathis
Date Deposited:	09 Jun 2020 08:13
Last Modified:	09 Jun 2020 08:13
Primary Research Area:	NRA4: Secure Mobile and Autonomous Systems
URI:	https://publications.cispa.saarland/id/eprint/3098

Available Versions of this Item

Learning Input Tokens for Effective Fuzzing. (deposited 09 Jun 2020 08:13) [Currently Displayed]
- Learning Input Tokens for Effective Fuzzing. (deposited 03 Jul 2020 11:33)
