AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis

Samhi, Jordan and Bissyandé, Tegawendé F. and Klein, Jacques
(2024) AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis.
In: Mining Software Repositories, Lisbon, Portugal.
Conference: MSR IEEE International Working Conference on Mining Software Repositories
(In Press)

Abstract

Android app developers extensively employ code reuse, integrating many third-party libraries into their apps. While such integration is practical for developers, it can be challenging for static analyzers to achieve scalability and precision when libraries account for a large part of the code. As a direct consequence, it is common practice in the literature to consider developer code only during static analysis, with the assumption that the sought issues are in developer code rather than the libraries. However, analysts need to distinguish between library and developer code. Currently, many static analyses rely on white lists of libraries. However, these white lists are unreliable, inaccurate, and largely non-comprehensive. In this paper, we propose a new approach to address the lack of comprehensive and automated solutions for the production of accurate and "always up to date" sets of libraries. First, we demonstrate the continued need for a white list of libraries. Second, we propose an automated approach to produce an accurate and up-to-date set of third-party libraries in the form of a dataset called AndroLibZoo. Our dataset, which we make available to the community, contains to date 34 813 libraries and is meant to evolve.
