The dataset `dataset.json.gz` can be downloaded from Google Drive.
File `dataset.json.gz` contains the dataset compressed with `gzip`. Decompress it with:
```shell
gzip -d -k dataset.json.gz
```
File `dataset.json` will be generated after decompression. It contains the entire dataset of attacks we collected, as described in the paper.
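As an alternative to decompressing the file first, the archive can also be read directly from Python. The sketch below uses only the standard library; `load_attacks` is a hypothetical helper, not part of the released artifact, and it loads the whole array into memory at once.

```python
from __future__ import annotations

import gzip
import json
from pathlib import Path


def load_attacks(path: str | Path) -> list[dict]:
    """Load the attack array from dataset.json.gz or from an already decompressed dataset.json."""
    path = Path(path)
    # gzip.open in text mode ("rt") decompresses on the fly, so no intermediate file is written.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # The top-level value is documented as an array of front-running attacks.
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of attacks")
    return data
```

For example, `attacks = load_attacks("dataset.json.gz")` returns the list of attack objects whose fields are described below.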
The JSON file contains an array of front-running attacks. Each attack is a JSON object with the following fields:

- `_id`: the ID of this attack.
- `hash`: the keccak256 hash of the transaction hashes of $T_a$, $T_v$ (and $T_a^p$).
- `block`: the block height at which the attack is launched, i.e., the block containing $T_a$.
- `attacker`: the address of the attacker.
- `victim`: the address of the victim.
- `attackTx`: the transaction hash of $T_a$.
- `victimTx`: the transaction hash of $T_v$.
- `profitTx`: the transaction hash of $T_a^p$, if it exists.
- `attackerProfits`: the profits in digital assets obtained by the attacker in the attack and attack-free scenarios, respectively.
- `victimProfits`: the profits in digital assets obtained by the victim in the attack and attack-free scenarios, respectively.
- `outOfGas`: whether this is a gas estimation griefing attack.
- `analysis`: the vulnerability localization analysis results.
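Once the array is loaded, these fields can be queried directly. This is a minimal sketch with a hypothetical helper `summarize_attacks`; it assumes that `outOfGas` is a boolean and that `profitTx` is missing, `null`, or empty when no $T_a^p$ exists, neither of which is stated explicitly above. It deliberately ignores `attackerProfits` and `victimProfits`, whose inner structure is not specified here.

```python
from __future__ import annotations

from collections import Counter


def summarize_attacks(attacks: list[dict]) -> dict:
    """Compute simple counts over attack objects using the documented top-level fields."""
    # Assumption: outOfGas is a boolean flag marking gas estimation griefing attacks.
    griefing = [a for a in attacks if a.get("outOfGas")]
    # Assumption: profitTx is absent, null, or empty when the attack has no T_a^p.
    with_profit_tx = [a for a in attacks if a.get("profitTx")]
    # Count attacks per attacker address.
    per_attacker = Counter(a["attacker"] for a in attacks)
    return {
        "total": len(attacks),
        "gas_griefing": len(griefing),
        "with_profit_tx": len(with_profit_tx),
        "top_attackers": per_attacker.most_common(3),
    }
```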
The `analysis` field of each attack is an array of influence traces. Each influence trace is a JSON object containing the following fields:
- `_id`: the ID of this influence trace.
- `hash`: the keccak256 hash of the shared variable and the hash of the attack it belongs to.
- `sharedVariable`: the smart contract variable holding the attack-altered data that $T_v$ loads in the attack scenario. In other words, this is the taint source of the dynamic taint analysis.
- `addressingPath`: the computations that derive the address of the variable in contract storage, if `sharedVariable` is a contract storage variable.
- `originalValue`: the value of `sharedVariable` in the attack-free scenario.
- `alteredValue`: the value of `sharedVariable` in the attack scenario.
- `writePoint`: the program location where $T_a$ modifies `sharedVariable` in the attack scenario.
- `readPoint`: the program location where $T_v$ loads `sharedVariable` in the attack scenario.
- `consequencePoint`: the program location that directly affects the victim's profits. In other words, this is the taint sink of the dynamic taint analysis.
- `influenceTrace`: the taint flow trace from `readPoint` to `consequencePoint` (i.e., the influence trace), represented as a sequence of contract function invocations.
- `attack`: the ID of the attack in which this influence trace was identified.
- `influenceString`: the string representation of `influenceTrace` as a sequence of `${contract address}:${function id}:` entries.
- `influenceString1`: the string representation of `influenceTrace` as a sequence of `${contract code hash}:${function id}:` entries. This is used to identify duplicate influence traces.
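Because `influenceString1` is built from contract code hashes rather than addresses, it can serve as a grouping key when deduplicating influence traces across attacks. The helpers below (`group_duplicate_traces`, `unique_traces`) are hypothetical illustrations under that reading; they treat the string as an opaque key and do not parse its `${contract code hash}:${function id}:` entries.

```python
from __future__ import annotations

from collections import defaultdict


def group_duplicate_traces(attacks: list[dict]) -> dict[str, list[str]]:
    """Map each influenceString1 value to the _id of every influence trace that shares it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for attack in attacks:
        for trace in attack.get("analysis", []):
            # Code hashes instead of addresses: the same code deployed at
            # different addresses yields the same key.
            groups[trace["influenceString1"]].append(trace["_id"])
    return dict(groups)


def unique_traces(attacks: list[dict]) -> list[dict]:
    """Keep only the first influence trace seen for each distinct influenceString1."""
    seen: set[str] = set()
    result = []
    for attack in attacks:
        for trace in attack.get("analysis", []):
            key = trace["influenceString1"]
            if key not in seen:
                seen.add(key)
                result.append(trace)
    return result
```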
The benchmark `benchmark.tar.gz` can be downloaded from Google Drive.
File `benchmark.tar.gz` contains the benchmark as a gzip-compressed `tar` archive. Extract it with:
```shell
tar -xvzf benchmark.tar.gz
```
After extraction, the folder `benchmark` contains two subfolders:
- `attacks`: This folder contains all attacks included in the benchmark, each represented as a JSON file.
  - Since we focus on attacks with exactly one influence trace, each attack is represented by its influence trace. The file for each attack is named `${id of influence trace}.attack.json` and contains the fields of this influence trace, plus the following additional fields:
    - `attack`: the attack that the influence trace belongs to.
    - `decodedInfluence`: similar to the `influenceTrace` field of the influence trace, but with the inputs and outputs of each function invocation decoded.
    - `contractMetas`: the vulnerable contracts and functions identified from this influence trace. This field lists the vulnerable contracts involved in the influence trace together with their related vulnerable functions. The source code of each contract is available in the `contracts` folder, at the relative path given by its `relativePath` field.
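A benchmark entry can be loaded by its influence trace ID, and its contracts located through `contractMetas`. This sketch assumes that `contractMetas` is a list of objects, each carrying a `relativePath` relative to the `contracts` folder; both helper names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterator

SUFFIX = ".attack.json"


def iter_trace_ids(benchmark_dir: str | Path) -> Iterator[str]:
    """Yield the influence trace ID encoded in each attacks/*.attack.json file name."""
    for path in sorted((Path(benchmark_dir) / "attacks").glob("*" + SUFFIX)):
        yield path.name[: -len(SUFFIX)]


def load_benchmark_attack(benchmark_dir: str | Path, trace_id: str) -> tuple[dict, list[Path]]:
    """Read ${trace_id}.attack.json and resolve each contractMetas entry to a contract folder."""
    root = Path(benchmark_dir)
    with (root / "attacks" / f"{trace_id}{SUFFIX}").open(encoding="utf-8") as f:
        record = json.load(f)
    # Assumption: contractMetas is a list of objects whose relativePath is
    # relative to the contracts folder.
    contract_dirs = [root / "contracts" / meta["relativePath"] for meta in record.get("contractMetas", [])]
    return record, contract_dirs
```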
- `contracts`: This folder contains all contracts referenced by the `contractMetas` fields of the attacks in the `attacks` folder. Each contract is a folder structured as a Hardhat project.
  - Each contract has already been compiled and flattened.
  - The contract runtime bytecode is in the `deployedBytecode.bin` file, which is the input for bytecode analysis tools.
  - The flattened source code is in the `flattened.sol` file, which is the input for source code analysis tools.
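Both analysis inputs of a contract folder can be read as follows. This is a sketch assuming `deployedBytecode.bin` stores the runtime bytecode as hexadecimal text, optionally prefixed with `0x`, as solc and Hardhat typically emit it; if the file instead holds raw bytes, `bytes.fromhex` raises `ValueError` and a plain binary read is needed. `load_contract` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


def load_contract(contract_dir: str | Path) -> tuple[bytes, str]:
    """Return (runtime bytecode, flattened Solidity source) for one contract folder."""
    contract_dir = Path(contract_dir)
    # Assumption: deployedBytecode.bin holds hex text, possibly 0x-prefixed.
    hex_text = (contract_dir / "deployedBytecode.bin").read_text(encoding="utf-8").strip()
    if hex_text.startswith(("0x", "0X")):
        hex_text = hex_text[2:]
    bytecode = bytes.fromhex(hex_text)
    # flattened.sol is a single Solidity file with all imports inlined.
    source = (contract_dir / "flattened.sol").read_text(encoding="utf-8")
    return bytecode, source
```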