Discovering Vulnerabilities and Patches for Open Source Security

Gunkel, Tamara; Hupperich, Thomas


Abstract
Open source software is used in numerous systems, and security vulnerabilities in such software often affect many targets at once. Hence, it is crucial to find security vulnerabilities as soon as possible. A convenient method to check software for vulnerabilities is executing a static code analysis tool before deployment. However, verifying the reliability of such tools requires real-world data that includes labeled vulnerable and non-vulnerable code.
This paper introduces an approach to automatically create and enhance a labeled data set of open source projects. The ground truth of vulnerabilities is extracted from up-to-date CVEs. We identify repositories related to known vulnerabilities, select vulnerable versions, and take patch commits into account. In this context, we utilize Gradient Boosting based on regression trees as a meta classifier for associating patch commits with CWE categories. Given the high precision of this matching, we give insights into the impact of certain vulnerabilities and a general overview of open source code security. Our findings may be used for future studies, such as the impact of certain code design criteria, e.g., clean code, on the prevalence of vulnerabilities.

Keywords
Web Security, Data Set Generation, Commit Classification



Publication type
Research article in proceedings (conference)

Peer reviewed
Yes

Publication status
Published

Year
2022

Conference
International Conference on Software Technologies (ICSOFT)

Venue
Lisbon

Book title
Proceedings of the 17th International Conference on Software Technologies

Editor
SciTePress

Publisher
SciTePress

Place
Lisbon, Portugal

Language
English

ISSN
2184-2833