Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/146232
Title: Semantic driven vulnerability detection and patch analysis
Authors: Xu, Zhengzi
Keywords: Engineering::Computer science and engineering::Software::Software engineering
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Xu, Z. (2021). Semantic driven vulnerability detection and patch analysis. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: Software vulnerabilities have become a major threat to software security. Approaches have been proposed to search for vulnerabilities in both source code and binary programs. Code clone detection, which detects similar code between known vulnerabilities and target programs, is an effective approach to identifying 1-day vulnerabilities. However, existing approaches have limitations such as a lack of binary vulnerability data, inaccurate matching algorithms, noise-prone matching results, limited ability to detect new vulnerabilities, and a lack of mitigation methods. To address them, we propose a framework that provides a complete solution to collect known vulnerability information, to match 1-day and recurring vulnerabilities with high accuracy across different compilation settings in binary and source code, and to provide automatically generated hot patches to fix the flaws. First, when using matching to search for vulnerabilities, researchers need the signature or pattern of a known vulnerability as input. However, at the binary level, there is often limited information on known vulnerabilities. To obtain the binary vulnerability data for matching, we propose SPAIN, a tool that automatically analyzes program diffs across versions to identify security-related patches. It distinguishes vulnerability patches from other program changes using a partial trace execution technique. The experimental results show that it can efficiently distinguish security patches with 71% true positives and a false-positive rate below 22%. It can also find vulnerabilities that were secretly patched. Second, the same function may differ in binary form under different compilation settings. To detect vulnerabilities in binaries precisely, the framework needs to be able to match such functions. Therefore, we propose two cross-compiler, cross-optimization-level, and cross-architecture matching tools, Bingo and Bingo-E.
Bingo uses a selective inlining technique to construct the full semantics of a function. It then divides the function into traces of various lengths and matches the traces to measure function similarity. Bingo-E extends Bingo and introduces partial trace execution to capture the semantic features of functions. The evaluation results show that Bingo achieves 41.5% top-1 rank accuracy when matching CoreUtils programs across different compilers and optimization levels. Bingo-E further improves the results, which range from 70.1% to 99.7% under the same settings as Bingo. Third, existing works focus only on improving the accuracy of function matching without taking patch information into account. Therefore, patched functions are usually predicted as vulnerable, resulting in a high false-positive rate. To address the problem, we propose BinXray, a tool to detect and filter out patched functions from the vulnerable function candidates. It uses a novel block mapping algorithm to compute the differences between vulnerable and patched functions and builds patch signatures from these differences. It then uses the generated patch signatures with a length-sensitive similarity matching algorithm to identify patched functions. Experiments show that BinXray can effectively and efficiently identify patched functions in 12 projects with 93% accuracy at an average speed of 296.17 ms per function. Fourth, to fix the detected vulnerabilities efficiently and effectively, we propose a patch generation algorithm that leverages weakest precondition reasoning to learn from the official patches and convert them into binary hot patches. We develop Vulmet, a prototype that generates semantics-preserving patches. The experimental results show that 55 real-world vulnerabilities from Android kernels have been successfully converted into hot patches, which incur little performance overhead after being applied to the system.
Last, at the source code level, traditional code-clone-based approaches can only find known vulnerabilities. Because they rely on syntax-level information, they are also not robust when changes are introduced into the target functions. To this end, we propose MVP, a source code vulnerability matching tool that summarizes the semantics of known vulnerabilities and then matches new vulnerabilities that share similar logic. The experiment shows that MVP can detect 97 new vulnerabilities in 10 commonly used projects. It takes 17,272.82 milliseconds on average to extract the vulnerability signatures and less than 100 milliseconds to match them in the target programs.
URI: https://hdl.handle.net/10356/146232
DOI: 10.32657/10356/146232
Schools: School of Computer Science and Engineering 
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: NTU_PhD_Thesis_to_submitted.pdf
Size: 3.22 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.