This repository contains 47,518 smart contracts extracted from the Ethereum network.
SmartBugs was used to analyze this dataset. The results are available at https://github.com/smartbugs/smartbugs-results. For more details on the analysis, please see the ICSE 2020 paper.
├─ contracts
│ └─ <contract_address>.sol
├─ contracts.csv.tar.gz # metadata of all the contracts
├─ script
│ ├─ get_contracts.py # collect the source code of the contracts from Etherscan
│ └─ get_balance.py # collect the balance of the contracts from Etherscan
- Collect the contract addresses
We use Google BigQuery to select all the contracts that have at least one transaction, using the following query (also available here: https://bigquery.cloud.google.com/savedquery/281902325312:47fd9afda3f8495184d98db6ae36a40c):
SELECT contracts.address, COUNT(1) AS tx_count
FROM `ethereum_blockchain.contracts` AS contracts
JOIN `ethereum_blockchain.transactions` AS transactions
ON (transactions.to_address = contracts.address)
GROUP BY contracts.address
ORDER BY tx_count DESC
- Download the source code related to the contract addresses
We use Etherscan to download the contracts (the collection scripts are available in the script directory).
- Filter the contracts by identifying duplicates
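The README does not say how duplicates are identified, so the following is only a minimal sketch of one plausible approach: hash each contract's source after light whitespace normalization and keep one address per hash. The function names (`normalize`, `group_by_source`, `load_sources`) and the normalization rule are assumptions made for illustration, not the authors' actual filter.

```python
import hashlib
from pathlib import Path


def normalize(source: str) -> str:
    """Drop trailing whitespace and blank lines (an assumed normalization)."""
    lines = (line.rstrip() for line in source.splitlines())
    return "\n".join(line for line in lines if line)


def group_by_source(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map the SHA-256 of each normalized source to the addresses sharing it."""
    groups: dict[str, list[str]] = {}
    for address in sorted(sources):
        digest = hashlib.sha256(normalize(sources[address]).encode("utf-8")).hexdigest()
        groups.setdefault(digest, []).append(address)
    return groups


def load_sources(contract_dir: str) -> dict[str, str]:
    """Read every <contract_address>.sol file in a directory into a dict."""
    return {
        path.stem: path.read_text(encoding="utf-8", errors="replace")
        for path in Path(contract_dir).glob("*.sol")
    }


if __name__ == "__main__":
    # Hypothetical in-memory example; real input would come from load_sources("contracts").
    demo = {
        "0xaaa": "contract A {}\n",
        "0xbbb": "contract A {}   \n\n",  # same code, different whitespace
        "0xccc": "contract B {}\n",
    }
    groups = group_by_source(demo)
    # Keep the first address of each group as the representative.
    print(sorted(addresses[0] for addresses in groups.values()))
```

Exact hashing only catches byte-identical (or whitespace-identical) copies; contracts that differ in comments or identifiers would need a fuzzier comparison.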
| Metric | Value |
|---|---|
| Solidity source not available | 1290074 |
| Solidity source available | 972975 |
| Inaccessible | 47 |
| Total | 2263096 |
| Unique Solidity Contracts | 47518 |
| LOC of the unique contracts | 9693457 |
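As a quick sanity check on the table, the short sketch below confirms that the three availability categories add up to the reported total and derives a few ratios from the figures. The variable names are illustrative; only the numbers come from the table.

```python
# Figures copied from the statistics table above.
SOURCE_NOT_AVAILABLE = 1290074
SOURCE_AVAILABLE = 972975
INACCESSIBLE = 47
TOTAL = 2263096
UNIQUE_CONTRACTS = 47518
UNIQUE_LOC = 9693457

# The three availability categories should partition the total.
classified = SOURCE_NOT_AVAILABLE + SOURCE_AVAILABLE + INACCESSIBLE
assert classified == TOTAL, "categories do not sum to the reported total"

# Derived ratios (not stated in the README itself).
share_available = SOURCE_AVAILABLE / TOTAL
share_unique = UNIQUE_CONTRACTS / SOURCE_AVAILABLE
avg_loc = UNIQUE_LOC / UNIQUE_CONTRACTS

print(f"Source available:      {share_available:.1%} of all contracts")
print(f"Unique after filtering: {share_unique:.1%} of contracts with source")
print(f"Average LOC per unique contract: {avg_loc:.0f}")
```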