Research Projects

Android Application Security

We have entered the era of mobile computing. I am particularly interested in security problems in mobile apps, such as mobile malware, privacy leakage, and vulnerabilities. Compared to traditional desktop systems, mobile platforms pose both opportunities and challenges in security. I have done the following research projects under this umbrella.

Virtualization-based Platform for Dynamic Android Malware Analysis. Compared to traditional desktop malware analysis, virtualization-based malware analysis on the Android platform poses one unique challenge: two levels of semantic information must be reconstructed. At the lower level, Android is a Linux operating system in which each Android app is encapsulated in a process. Within each app, a virtual machine (known as the Dalvik virtual machine) provides a runtime environment for the app's Java components. We therefore built a new analysis platform, called DroidScope, which reconstructs both OS-level and Java-level semantics to facilitate the analysis of Android malware.

Android Malware Classification. Existing automated Android malware detection and classification methods fall into two general categories: 1) signature-based and 2) machine learning-based. Signature-based approaches can be easily evaded by bytecode-level transformation attacks. Prior learning-based works extract features from application syntax rather than program semantics, and are also subject to evasion. We proposed a novel semantic-based approach that classifies Android malware via dependency graphs. To counter transformation attacks, we extract a weighted contextual API dependency graph as program semantics and use it to construct feature sets. To handle malware variants and zero-day malware, we introduced graph similarity metrics that uncover homogeneous application behaviors while tolerating minor implementation differences. We implemented a prototype system, DroidSIFT. Experiments show that our signature detection correctly labels 93% of malware instances, and that our anomaly detector detects zero-day malware with a low false negative rate (2%) and a false positive rate (5.15%) that is acceptable for vetting purposes.
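To illustrate how a weighted graph similarity can tolerate minor implementation differences, here is a minimal sketch. It is not DroidSIFT's metric: it uses a weighted Jaccard overlap over dependency edges as a simple stand-in, and the function name, the edge weights, and the API pairs in the example are all hypothetical.

```python
from typing import Dict, Tuple

# An edge is a (source API, dependent API) pair; its weight models how
# security-relevant the dependency is. All weights here are made up.
Edge = Tuple[str, str]


def weighted_edge_similarity(g1: Dict[Edge, float], g2: Dict[Edge, float]) -> float:
    """Weighted Jaccard overlap of two edge-weighted graphs, in [0, 1]."""
    edges = set(g1) | set(g2)
    if not edges:
        return 1.0  # two empty graphs are treated as identical
    shared = sum(min(g1.get(e, 0.0), g2.get(e, 0.0)) for e in edges)
    total = sum(max(g1.get(e, 0.0), g2.get(e, 0.0)) for e in edges)
    return shared / total if total else 1.0


# Hypothetical graphs: the variant adds one low-weight logging edge.
known = {
    ("getDeviceId", "HttpPost.execute"): 1.0,
    ("getLine1Number", "sendTextMessage"): 0.8,
}
variant = dict(known)
variant[("Log.d", "Log.d")] = 0.1

similarity = weighted_edge_similarity(known, variant)
```

Because the extra edge carries a low weight, the variant stays close to the known sample, whereas a graph with none of the high-weight edges would score 0.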

Binary Code Patching and Hardening

The objective of this research is to develop generic hardening techniques that can be applied directly to stripped binary programs, substantially raising the bar for attackers seeking to exploit vulnerabilities in those programs. We have made three advances to the state of the art: 1) we proposed binary code continent (BinCC) to significantly improve the strictness of the Control-Flow Integrity (CFI) policy for binary code; 2) we proposed a technique (vfGuard) to construct a sound CFI policy for protecting virtual function calls in C++ binary code; and 3) we proposed a Stack-Pointer Integrity technique to block stack pivoting attacks. These three techniques can be combined.
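As a rough illustration of the idea behind stack-pointer integrity, the sketch below checks that a stack pointer still lies inside the stack region assigned to the current thread. This is an assumption about how such a check could look, not the technique as implemented in this project; the function name, the bounds, and the addresses are hypothetical.

```python
def stack_pointer_ok(sp: int, stack_low: int, stack_high: int) -> bool:
    """Return True if sp lies within the thread's stack region (inclusive)."""
    return stack_low <= sp <= stack_high


# Hypothetical stack region and pointer values.
STACK_LOW, STACK_HIGH = 0x7FFF0000, 0x7FFF8000

legit = stack_pointer_ok(0x7FFF7F00, STACK_LOW, STACK_HIGH)
# A pivot redirects the stack pointer to attacker-controlled memory,
# for example a heap buffer, which falls outside the stack region.
pivoted = stack_pointer_ok(0x08A41000, STACK_LOW, STACK_HIGH)
```

In this sketch `legit` is True and `pivoted` is False; a hardened binary would need to perform an equivalent check wherever the stack pointer is explicitly modified.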

Memory Forensics

Memory forensics has become increasingly valuable in digital forensic analysis, as it extracts live digital evidence from the volatile memory state of a running system, which cannot be obtained from traditional hard-disk-based forensic analysis. However, memory forensics is an extremely challenging task, especially for closed-source operating systems (e.g., Microsoft Windows). We aim to use binary analysis and machine learning techniques to improve the quality and robustness of memory forensics.

We first developed a memory-based OS fingerprinting system that can be used in a cloud environment to precisely identify the operating system, and its version, of a running virtual machine. Compared to existing OS fingerprinting techniques, which primarily inspect network packets or CPU states, our memory-analysis-based approach is more precise and practical. This work was published in SoCC'12.

We then looked at how to improve coverage and robustness in memory forensic analysis. Memory analysis on commodity operating systems (such as Microsoft Windows) faces the following key challenges: (1) only partial knowledge of kernel data structures; (2) difficulty in handling ambiguous pointers; and (3) lack of robustness caused by relying on soft constraints that can be easily violated by kernel attacks. To address these challenges, we presented MACE, a memory analysis system that extracts a more complete view of the kernel data structures of closed-source operating systems. MACE significantly improves robustness by leveraging only pointer constraints (which are hard to manipulate) and by evaluating these constraints globally (so that it can tolerate even a certain amount of pointer attacks).
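The idea of evaluating pointer constraints globally can be sketched as follows. This is a simplified illustration, not MACE's algorithm: candidates are kept only while enough of their pointer fields target other surviving candidates, and the function name, the 0.5 threshold, and the addresses are assumptions made for the example.

```python
from typing import Dict, List, Set


def prune_candidates(pointers: Dict[int, List[int]], threshold: float = 0.5) -> Set[int]:
    """Keep candidates whose pointer fields mostly target surviving candidates.

    pointers maps a candidate object address to the values of its pointer
    fields. Pruning repeats until no candidate is removed, so the removal of
    one bogus object can propagate to others that depended on it.
    """
    alive = set(pointers)
    changed = True
    while changed:
        changed = False
        for addr in sorted(alive):
            targets = pointers[addr]
            if not targets:
                continue  # no pointer evidence either way; keep the candidate
            valid = sum(1 for t in targets if t in alive)
            if valid / len(targets) < threshold:
                alive.discard(addr)
                changed = True
    return alive


# Hypothetical candidates: 0x2000 has one tampered pointer, 0x9000 is bogus.
candidates = {
    0x1000: [0x2000, 0x3000],
    0x2000: [0x1000, 0xBAD0],
    0x3000: [0x1000, 0x2000],
    0x9000: [0xDEAD, 0xBEEF, 0x1000],
}
survivors = prune_candidates(candidates)
```

Here the object at 0x2000 survives despite one corrupted pointer, while the candidate at 0x9000, most of whose pointers lead nowhere, is discarded.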

Furthermore, we conducted a systematic study on the trustworthiness of memory analysis. Semantic values in kernel data structures are critical to many security applications, such as virtual machine introspection, malware analysis, and memory forensics. However, malware, or more specifically a kernel rootkit, can often directly tamper with the raw kernel data structures, an approach known as DKOM (Direct Kernel Object Manipulation), thereby significantly thwarting security analysis. In addition to manipulating pointer fields to hide certain kernel objects, DKOM attacks may also mutate semantic values, which are data values with important semantic meanings; we refer to these as Semantic Value Manipulation (SVM) attacks. Our experimental results show that the space of SVM attacks is vast for both Windows and Linux. Our proof-of-concept kernel rootkit further demonstrates that it can successfully evade all the security tools tested in our experiments, including recently proposed robust signature schemes. Moreover, our duplicate value analysis highlights the challenges in defeating SVM attacks: for example, an intuitive cross-checking approach on duplicate values provides only marginal detection improvement. Our study motivates revisiting existing security solutions and calls for more effective defenses against kernel threats. This work was published in DSN'13, and an extended version was published in TDSC.
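The intuitive cross-checking approach mentioned above can be sketched as a comparison of all copies of each semantic value. This is a minimal illustration under assumed inputs, not the analysis used in the study; the function name, field names, and values are hypothetical.

```python
from typing import Dict, Hashable, List


def inconsistent_fields(copies: Dict[str, List[Hashable]]) -> List[str]:
    """Return the names of semantic values whose duplicate copies disagree."""
    return sorted(name for name, values in copies.items() if len(set(values)) > 1)


# Hypothetical copies of three semantic values read from different
# kernel data structures in one memory snapshot.
observed = {
    "process_name": ["svchost.exe", "svchost.exe"],
    "pid": [1234, 4321],
    "create_time": [1357000000],
}
flagged = inconsistent_fields(observed)
```

Only `pid` is flagged in this example. A value with no duplicate, or one whose copies are all rewritten consistently, passes this check unflagged.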

More recently, we developed a deep-learning-based technique to identify kernel objects in memory dumps. It achieves high detection accuracy and good efficiency even when the memory dumps have been manipulated by attackers attempting to evade detection. This work was published in CCS'18.

Vulnerability Scanning and Discovery