Update README
mafsbaptista committed Apr 3, 2024
1 parent a144aeb commit 030e33c
Showing 4 changed files with 169 additions and 104 deletions.
272 changes: 169 additions & 103 deletions README.md
@@ -1,69 +1,15 @@
# Graph.js: A Static Vulnerability Scanner for _npm_ packages
# Efficient Static Vulnerability Analysis for JavaScript with Multiversion Dependency Graphs

Graph.js is a static vulnerability scanner specialized in analyzing _npm_
packages and detecting taint-style and prototype pollution vulnerabilities.

- Currently, detects 4 types of vulnerabilities:
- _Path Traversal_ (CWE-22);
- _Command Injection_ (CWE-78);
- _Code Execution_ (CWE-94);
- _Prototype Pollution_ (CWE-1321).
- Our evaluation on two curated datasets (VulcaN [1]; SecBench) shows that it significantly
outperforms ODGen, the state-of-the-art tool, with lower false negatives and shorter analysis time.

---

### Publications and Open-Source Repositories

The development of Graph.js relates to additional research performed by this group.

#### 1. Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages
This work comprises an empirical study of static code analysis tools for detecting vulnerabilities in Node.js code.
We created a curated dataset of 957 Node.js code vulnerabilities, characterized and annotated by analyzing the information contained in _npm_ advisory reports.

The dataset is available [here](https://github.com/VulcaN-Study/Supplementary-Material).

The publication associated with this work is:
- <a href="https://ieeexplore.ieee.org/document/10168679">**VulcaN Dataset [1]**</a>: Tiago Brito, Mafalda Ferreira, Miguel Monteiro, Pedro Lopes, Miguel Barros, José Fragoso Santos, Nuno Santos:
*"Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages"*,
in *IEEE Transactions on Reliability 2023 (ToR 2023)*.
```
@article{vulcan_tor,
author = {Brito, Tiago and Ferreira, Mafalda and Monteiro, Miguel and Lopes, Pedro and Barros, Miguel and Santos, José Fragoso and Santos, Nuno},
journal = {IEEE Transactions on Reliability},
title = {Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages},
year = {2023},
pages = {1-16},
doi = {10.1109/TR.2023.3286301},
}
```
Mafalda Ferreira, Miguel Monteiro, Tiago Brito, Miguel E. Coimbra, Nuno Santos, Limin Jia, and José Fragoso
Santos. 2024. Efficient Static Vulnerability Analysis for JavaScript with Multiversion Dependency Graphs.
https://doi.org/XXXX


#### 2. RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks
In this work we developed a prototype of RuleKeeper, a GDPR-aware policy compliance system for web frameworks.
RuleKeeper uses Graph.js to automatically check for the presence of GDPR compliance bugs in Node.js servers.
## Artifact evaluation
The [Artifact Evaluation](./artifact-evaluation) folder contains all the necessary instructions and scripts used to reproduce the results and the figures from the original paper.

The prototype is available [here](https://github.com/rulekeeper/rulekeeper).
[//]: [![DOI](https://zenodo.org/badge/724237294.svg)](https://zenodo.org/badge/latestdoi/724237294)

The publication associated with this work is:
- <a href="https://www.computer.org/csdl/proceedings-article/sp/2023/933600b014/1Js0DzhaXNm">**RuleKeeper**</a>:
Mafalda Ferreira, Tiago Brito, José Fragoso Santos, Nuno Santos:
*"RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks"*,
in *Proceedings of 44th IEEE Symposium on Security and Privacy (S&P’23)*, 2023.
```
@inproceedings{ferreira_sp23,
author = {Ferreira, Mafalda and Brito, Tiago and Santos, José Fragoso and Santos, Nuno},
title = {RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks},
booktitle = {Proceedings of 44th IEEE Symposium on Security and Privacy (S&P'23)},
year = {2023},
doi = {10.1109/SP46215.2023.00058},
pages = {1014-1031},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
}
```

---
## Team

### Main Contributors
@@ -83,27 +29,93 @@ in *Proceedings of 44th IEEE Symposium on Security and Privacy (S&P’23)*, 2023
</table>

#### Collaborators
- [Tiago Brito](https://www.dpss.inesc-id.pt/blog/tiago-brito/)
- [Miguel Coimbra](https://www.dpss.inesc-id.pt/~mcoimbra/)
- [Limin Jia](https://www.andrew.cmu.edu/user/liminjia/)
- [Miguel Monteiro](https://www.linkedin.com/in/miguel-monteiro-229b86195/)
- [Tiago Brito](https://www.dpss.inesc-id.pt/blog/tiago-brito/)
- [Miguel Coimbra](https://www.dpss.inesc-id.pt/~mcoimbra/)
- [Limin Jia](https://www.andrew.cmu.edu/user/liminjia/)
- [Miguel Monteiro](https://www.linkedin.com/in/miguel-monteiro-229b86195/)

---

## Tool Installation
## Graph.js: A Static Vulnerability Scanner for _npm_ packages
Graph.js is a static vulnerability scanner specialized in analyzing _npm_
packages and detecting taint-style and prototype pollution vulnerabilities.

Graph.js generates a graph using [npm](https://www.npmjs.com/)/[node](https://nodejs.org/en) and uses [Neo4j](https://neo4j.com/) to query the graph. <br>
This last component can be executed in a docker container (easier setup) or locally.
Its execution flow is composed of two phases: **graph construction**
and **graph queries**. In the first phase, Graph.js builds a
Multiversion Dependency Graph (MDG) of the program to be analyzed.
This graph-based data structure coalesces into the same
representation the abstract syntax tree, control flow graph, and
data dependency graph. This phase has two outputs:
1. Graph output: nodes and edges in .csv format.
2. Graph metrics: graph_stats.json

In the second phase, Graph.js imports the graph to a Neo4j graph
database, and executes graph queries, written in Cypher, to capture
vulnerable code patterns, e.g. data dependency paths connecting
unreliable sources to dangerous sinks.
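To make the idea of a source-to-sink query concrete, here is a minimal Python sketch. It is not how Graph.js implements detection (Graph.js runs Cypher queries in Neo4j); it only illustrates the kind of path search such a query expresses. The toy graph, its node labels, and the function name `find_tainted_path` are all hypothetical and chosen for illustration.

```python
from collections import deque


def find_tainted_path(edges, source, sinks):
    """Return a shortest dependency path from `source` to any node in `sinks`.

    `edges` maps a node to the nodes that depend on it.
    Returns None when no sink is reachable from the source.
    """
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks:
            return path
        for successor in edges.get(node, []):
            if successor not in visited:
                visited.add(successor)
                queue.append(path + [successor])
    return None


# Hypothetical toy graph: an edge a -> b means "b depends on a".
toy_edges = {
    "param:filename": ["var:fullPath"],
    "const:baseDir": ["var:fullPath"],
    "var:fullPath": ["call:fs.readFile"],
}

# An attacker-controlled parameter reaching a file-system sink would be
# reported as a candidate path traversal.
print(find_tainted_path(toy_edges, "param:filename", {"call:fs.readFile"}))
```

A breadth-first search is used here only because it returns a shortest path, which keeps the reported trace easy to read; a real query language such as Cypher expresses the same reachability pattern declaratively.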

- Currently, Graph.js detects four types of vulnerabilities: prototype
pollution (CWE-1321), OS command injection (CWE-78),
arbitrary code execution (CWE-94), and path traversal (CWE-22).

---

#### Requirements

## Installation

Graph.js generates a graph using [Node](https://nodejs.org/en) and uses [Neo4j](https://neo4j.com/) to query the graph. <br>
It can be executed locally, or in a Docker container (easier and more robust setup).

### Using Docker
#### Requirements:
- [Python3](https://www.python.org/downloads/)
- [Docker](https://www.docker.com/)

Build the Docker container by running the command:
```
docker build -t graphjs .
```

### Run locally
#### Requirements:
- [Node](https://nodejs.org/en) (tested with v18+).
- [Python3](https://www.python.org/downloads/).
- **Option 1 (Local queries)**: [Neo4j v5](https://neo4j.com/). Instructions: https://neo4j.com/docs/operations-manual/current/installation/linux/
- **Option 2 (Docker)**: [Docker](https://www.docker.com/).
- [Neo4j v5](https://neo4j.com/). Instructions: https://neo4j.com/docs/operations-manual/current/installation/linux/

Set up the local environment by running the command:
```
./setup.sh
```

---

## Usage

### Using Docker

Graph.js provides a command-line interface. Run it with **-h** for a short description.

```console
Usage: ./graphjs_docker.sh -f <file> [options]
Description: Run Graph.js for a given file <file> in a Docker container.

Required:
-f <file> Filename (.js).

Options:
-o <path> Path to store analysis results.
-l Store docker logs.
-e Create exploit template.
-s Silent mode: Does not save graph .svg.
-h Print this help.
```

To run Graph.js, run the command:
```bash
./graphjs_docker.sh -f <file_to_analyze> [options]
```

### Run locally

Graph.js provides a command-line interface. Run it with **-h** for a short description.

```console
@@ -119,6 +131,12 @@ Options:
-e, --exploit Generates symbolic tests.
```

To run Graph.js, run the command:
```bash
python3 graphjs.py -f <file_to_analyze> [options]
```
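When several files need to be analyzed, the command above can be scripted. The following Python sketch is a convenience wrapper and not part of Graph.js: the helper names `build_graphjs_command` and `analyze_all`, the example file paths, and the one-folder-per-file output layout are assumptions. It relies only on the `-f` and `-o` flags, which appear in the `graphjs.py` invocation inside `graphjs_docker.sh`.

```python
import subprocess
from pathlib import Path


def build_graphjs_command(js_file, output_dir):
    """Build the argument list for one Graph.js run (does not execute it)."""
    return ["python3", "graphjs.py", "-f", str(js_file), "-o", str(output_dir)]


def analyze_all(js_files, results_root, dry_run=True):
    """Collect one Graph.js command per file and, unless dry_run, execute it."""
    commands = []
    for js_file in js_files:
        # Assumed layout: one output folder per file, named after the file stem.
        output_dir = Path(results_root) / Path(js_file).stem
        cmd = build_graphjs_command(js_file, output_dir)
        commands.append(cmd)
        if not dry_run:
            # Expected to be launched from the Graph.js root folder.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical input files; the dry run only prints the commands.
    for cmd in analyze_all(["examples/a.js", "examples/b.js"], "results"):
        print(" ".join(cmd))
```

The dry-run default makes it possible to inspect the generated commands before launching any analysis.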

---
By default, all the results are stored in a *graphjs-results* folder, in the root of the project, with the following structure:

```
@@ -131,43 +149,38 @@ graphjs-results
└── taint_summary_detection.json (detection results)
```


#### Run
- Execute inside the root folder
- If first time, execute the setup (`./setup.sh`)
- To run with docker:
- Have docker service running
- Use flag **-d**

```bash
python3 graphjs.py -f <file_to_analyze> -s [-d]
```

---

### Graph.js phases

#### 1. Build the code property graph (representation of source code)

This stage builds the code property graph of the program to be analyzed, a graph-based data structure that coalesces into the same representation the abstract syntax tree, control flow graph, and data dependency graph of the given program.
## Reusability

The code for the code property graph is in the [parser](./parser) folder.

This step outputs:
- Normalized javascript file of the program
- Graph outputs (svg and/or csv)
- Graph metrics (graph_stats.json)

#### 2. Query the graph

This stage queries the graph to capture vulnerable code patterns, e.g. data dependency paths connecting unreliable sources to dangerous sinks.

The code for the queries is in the [detection](./detection) folder.
Graph.js code is designed to enable straightforward usage by others, and can be easily adapted to accommodate
new scenarios. As described before, Graph.js is composed of two phases: graph construction and graph queries.
The graph construction code is located in the `graphjs/parser/src` folder, and the most relevant files are organized as follows:
```
src
├── parser.ts
├── output # Code to generate outputs (.csv and .svg)
├── traverse # Parsing algorithms
├── dependency
│ ├── structures/dependency_trackers.ts
│ └── dep_builder.ts
├── ast-builder.ts
├── cfg-builder.ts
└── cg-builder.ts
```
The code implementing the MDG construction algorithm is located
in `src/traverse/dependency`, where the file `structures/dependency_trackers.ts`
contains the rules and structures referred to in the paper.
The MDG is intended to be generic, so all the building steps can be
adapted to new scenarios by creating new types of nodes and edges.

This step uses the graph csv output and produces a summary file (*taint_summary.json*) with the detection results.
The code for the queries is located in the `graphjs/detection`
folder. The queries are entirely customizable, so it is possible not
only to modify the existing queries but also to create new queries that
search for new and different patterns in the graph.


### Generate only the graph
## Generate only the graph

- Execute inside the *parser* folder

@@ -189,3 +202,56 @@ npm start -- -f <file_to_be_analyzed> [options]
| Set array of functions to ignore in graph figure | --if=[...] | _[]_ | No | _graph_ |
| Show the code in each statement in graph figure | --sc | _false_ | No | _graph_ |
| Silent mode (not verbose) | --silent | _false_ | No | - |

---


### Publications and Open-Source Repositories

The development of Graph.js relates to additional research performed by this group.

#### 1. Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages
This work comprises an empirical study of static code analysis tools for detecting vulnerabilities in Node.js code.
We created a curated dataset of 957 Node.js code vulnerabilities, characterized and annotated by analyzing the information contained in _npm_ advisory reports.

The dataset is available [here](https://github.com/VulcaN-Study/Supplementary-Material).

The publication associated with this work is:
- <a href="https://ieeexplore.ieee.org/document/10168679">**VulcaN Dataset [1]**</a>: Tiago Brito, Mafalda Ferreira, Miguel Monteiro, Pedro Lopes, Miguel Barros, José Fragoso Santos, Nuno Santos:
*"Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages"*,
in *IEEE Transactions on Reliability 2023 (ToR 2023)*.
```
@article{vulcan_tor,
author = {Brito, Tiago and Ferreira, Mafalda and Monteiro, Miguel and Lopes, Pedro and Barros, Miguel and Santos, José Fragoso and Santos, Nuno},
journal = {IEEE Transactions on Reliability},
title = {Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages},
year = {2023},
pages = {1-16},
doi = {10.1109/TR.2023.3286301},
}
```


#### 2. RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks
In this work we developed a prototype of RuleKeeper, a GDPR-aware policy compliance system for web frameworks.
RuleKeeper uses Graph.js to automatically check for the presence of GDPR compliance bugs in Node.js servers.

The prototype is available [here](https://github.com/rulekeeper/rulekeeper).

The publication associated with this work is:
- <a href="https://www.computer.org/csdl/proceedings-article/sp/2023/933600b014/1Js0DzhaXNm">**RuleKeeper**</a>:
Mafalda Ferreira, Tiago Brito, José Fragoso Santos, Nuno Santos:
*"RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks"*,
in *Proceedings of 44th IEEE Symposium on Security and Privacy (S&P’23)*, 2023.
```
@inproceedings{ferreira_sp23,
author = {Ferreira, Mafalda and Brito, Tiago and Santos, José Fragoso and Santos, Nuno},
title = {RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks},
booktitle = {Proceedings of 44th IEEE Symposium on Security and Privacy (S&P'23)},
year = {2023},
doi = {10.1109/SP46215.2023.00058},
pages = {1014-1031},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
}
```
Empty file added artifact-evaluation/README.pdf
Empty file.
Empty file.
1 change: 0 additions & 1 deletion graphjs_docker.sh
@@ -80,7 +80,6 @@ if [ "$DOCKER_LOGS" = true ]; then
/bin/bash -c "python3 /graphjs/graphjs.py -f /input-file.js -o /output_path -s &> /docker_logs/graphjs-debug.log;
cp /var/log/neo4j/debug.log /docker_logs/neo4j-debug.log"
mv docker_logs ${output_path}/
docker system prune -f
else
docker run -it \
-v "${filename}":/input-file.js \