CISA: Continuous Incremental Static Analyzer

Introduction

CISA is an LLVM IR-based static analysis framework that supports incremental analysis over a git commit history.

The basic philosophy is to run costly static analyses (e.g., indirect call graph analysis) incrementally while scanning through the commit history. Each analysis is recomputed only for the parts that a commit modifies (hence incremental) and, like LLVM IR passes, can refer to the results of other analyses.

CISA is still in its infancy and its feature set is limited (e.g., custom analyses can refer only to the call graph analysis, not to other custom analyses). Contributions are welcome.

Concept

As the introduction mentions, CISA aims to analyze only the parts that commits change. To do so, CISA scans the commit history within a given range in chronological order. For each entity X changed by the current commit (e.g., a changed function or module), it first updates the analysis for the changed part and then aggregates the up-to-date analysis results. For this, CISA requires each custom analyzer to implement the following two callbacks: Update(X) and Aggregate(X).

  • Update(X): updates the analysis for the changed entity X. This only touches the analysis state inside X.
  • Aggregate(X): aggregates the up-to-date analysis results for the changed entity X. This assembles the state produced by Update into the final analysis result. Within a commit, Aggregate is called only after Update has been called for every changed entity, so it is safe to assume that all entities in the source code have up-to-date analysis state.
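To make the division of labor concrete, here is a minimal sketch of the two callbacks. The real interface is C++ (under src/analyzer) and its signatures are not shown in this README, so the sketch uses Python with hypothetical names (CallCountAnalyzer, update, aggregate) and treats a function body as plain text. It only illustrates the idea: Update touches the state of the changed entity alone, and Aggregate combines the per-entity state into a final result.

```python
class CallCountAnalyzer:
    """Toy analyzer (hypothetical): counts call sites per function and totals them."""

    def __init__(self):
        self.per_function = {}  # function name -> number of call sites inside it
        self.total_calls = 0    # aggregated result over all known functions

    def update(self, name, body):
        # Update(X): recompute the state of the changed function only.
        # Assumption: a call site is any occurrence of "call " in the body text.
        self.per_function[name] = body.count("call ")

    def aggregate(self, name):
        # Aggregate(X): assemble the final result from per-function state,
        # which is up to date because every update ran before this point.
        self.total_calls = sum(self.per_function.values())
        return self.total_calls


def apply_commit(analyzer, changed):
    """Apply one commit: all updates first, then all aggregates."""
    for name, body in changed.items():
        analyzer.update(name, body)
    for name in changed:
        analyzer.aggregate(name)
    return analyzer.total_calls


if __name__ == "__main__":
    analyzer = CallCountAnalyzer()
    # First commit introduces f and g; second commit changes only g.
    print(apply_commit(analyzer, {"f": "call g\ncall h\n", "g": "ret\n"}))
    print(apply_commit(analyzer, {"g": "call h\nret\n"}))
```

In the second commit only g is re-analyzed; the count for f is reused from the earlier state, which is the saving that the incremental design is after.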

Workflow

The following is what developing and using a custom analyzer would look like.

  1. Write a custom analyzer (in src/analyzer) that implements Update and Aggregate.
  2. Build again ($ make).
  3. Run the CISA front-end ($ ./cisa <repo_path> -o <out_path>).
    • For each commit in the range, from the oldest to the newest, CISA first calls Update for every changed entity and then calls Aggregate.
  4. Inspect the printed analysis result in <out_path>.
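The call order in step 3 can be sketched as a small driver loop. This is not the CISA front-end; run_history and TraceAnalyzer are hypothetical names, commits are modeled as plain dictionaries, and calling Aggregate once per changed entity is an assumption based on the callback description rather than on the actual code.

```python
def run_history(commits, analyzer):
    """Replay commits (oldest first); each commit maps a changed entity to its new content."""
    for changed in commits:
        # Phase 1: bring every changed entity up to date.
        for entity, content in changed.items():
            analyzer.update(entity, content)
        # Phase 2: aggregate only after all updates of this commit are done.
        for entity in changed:
            analyzer.aggregate(entity)


class TraceAnalyzer:
    """Hypothetical analyzer that only records the order of callback invocations."""

    def __init__(self):
        self.trace = []

    def update(self, entity, content):
        self.trace.append(("update", entity))

    def aggregate(self, entity):
        self.trace.append(("aggregate", entity))


if __name__ == "__main__":
    analyzer = TraceAnalyzer()
    run_history([{"f": "v1", "g": "v1"}, {"g": "v2"}], analyzer)
    for step in analyzer.trace:
        print(step)
```

Keeping the two phases separate is what lets an Aggregate implementation read the state of any entity without worrying that an Update for the same commit is still pending.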

Features (so far)

  • Integrated call graph analysis [MLTA, CCS'19]
  • Nice C++ interface for custom function-level analyses

Requirements

  • LLVM 15.0.5
  • Python 3.8.0+
  • CMake 3.16.3+
  • Python packages: gitpython, termcolor, alive_progress
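A quick way to check the Python side of these requirements is a short script like the one below. It is a convenience sketch, not part of CISA; the helper names are hypothetical. The only package-specific knowledge it relies on is that gitpython is imported as git, while the other two packages are imported under their own names.

```python
import importlib.util
import sys

# Import names of the required packages (gitpython is imported as "git").
REQUIRED_MODULES = ["git", "termcolor", "alive_progress"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(version_info=sys.version_info):
    """Check the interpreter against the Python 3.8.0+ requirement."""
    return tuple(version_info[:3]) >= (3, 8, 0)


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.8.0 or newer is required")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All required Python modules were found")
```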

Build (native)

  1. Install prerequisites (assuming Ubuntu 20.04+).
    • Make sure that python resolves to python3 and pip to pip3.
$ sudo apt install python3 python3-pip python-is-python3 cmake
$ sudo pip install gitpython termcolor alive_progress
  2. Extract the prebuilt LLVM 15.0.5 release into a directory named llvm at the repository root.
    • Alternatively, create a symlink named llvm that points to your LLVM install directory (if you built LLVM yourself).
$ # Example for Ubuntu 20.04+; run at the repository root.
$ wget https://github.com/llvm/llvm-project/releases/download/llvmorg-15.0.5/clang+llvm-15.0.5-x86_64-linux-gnu-ubuntu-18.04.tar.xz
$ tar -xvf clang+llvm-15.0.5-x86_64-linux-gnu-ubuntu-18.04.tar.xz
$ rm clang+llvm-15.0.5-x86_64-linux-gnu-ubuntu-18.04.tar.xz
$ mv clang+llvm-15.0.5-x86_64-linux-gnu-ubuntu-18.04 llvm
  3. Build.
$ make # at the repository root

Build (dockerized)

See this page for a dockerized setting.

Documentation

Repository Structure

  • script: CISA front-end scripts (Python)
  • src: CISA back-end code (C++)
    • analyzer: where custom analyzers reside
    • callgraph: incremental call graph analysis (MLTA)
  • extern: external dependencies

TODO

  • Supporting references to LLVM objects (e.g., Function) in custom analyses
  • Supporting custom module-level analyses
  • Converting the integrated call graph analysis to a custom module-level analysis
  • Supporting interoperability between custom analyses
  • Improving initial checkout delay

Reference

  • Call graph analysis: code based on MLTA (0cfc662b51b4, 01/02/2023)
  • Front-end/back-end binding: pybind11 (31b0a5d94f60, 04/11/2023)