Mostamhd/address-poisoning-model

Ethereum Address Poisoning Detection Model

An end-to-end machine learning pipeline to detect "Address Poisoning" attacks on the Ethereum blockchain.

Overview

Address poisoning is a deceptive tactic where attackers send small or zero-value transactions from addresses that mimic a user's recent counterparties (often by matching the first and last few characters). The goal is to "poison" the user's transaction history so that they might accidentally copy the attacker's address for future transfers.
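The look-alike trick described above can be illustrated with a small check. This is a minimal sketch and not code from this repository: the function names, the 4-character prefix/suffix lengths, and the sample addresses are all assumptions chosen for illustration.

```python
def normalize(address: str) -> str:
    """Lower-case an Ethereum address and drop the optional 0x prefix."""
    address = address.strip().lower()
    return address[2:] if address.startswith("0x") else address


def is_lookalike(candidate: str, known: str,
                 prefix_len: int = 4, suffix_len: int = 4) -> bool:
    """Return True when candidate differs from known but shares its
    first prefix_len and last suffix_len hex characters.

    The default lengths are assumptions; tune them to your own data.
    """
    a, b = normalize(candidate), normalize(known)
    if a == b:
        return False  # the same address is not a spoof of itself
    return a[:prefix_len] == b[:prefix_len] and a[-suffix_len:] == b[-suffix_len:]


# Made-up 40-character addresses, for illustration only.
KNOWN = "0x1234" + "a" * 32 + "abcd"
SPOOF = "0x1234" + "f" * 32 + "abcd"   # same ends, different middle
OTHER = "0x9999" + "a" * 32 + "abcd"   # different prefix

print(is_lookalike(SPOOF, KNOWN))
print(is_lookalike(OTHER, KNOWN))
```

A truncated wallet display such as 0x1234...abcd would show KNOWN and SPOOF identically, which is what makes the attack effective.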

This project provides tools to:

  1. Extract Data: Query a MySQL Ethereum database to collect transaction metadata.
  2. Engineer Features: Calculate metrics like counterparty frequency and transaction bursts.
  3. Train & Detect: Use a Support Vector Machine (SVM) classifier to label addresses as malicious or benign.

Tech Stack

  • Language: Python 3.7
  • Database: MySQL (Ethereum blockchain data)
  • Libraries: Pandas, Scikit-learn, Matplotlib, Seaborn
  • Environment: Pipenv, Jupyter Notebooks

Project Structure

  • scripts/: Python and Bash scripts for data collection.
    • gather_addresses_metadata.py: The primary data extraction engine.
    • start_dataset_generation.sh: Wrapper for the extraction process.
  • address_poisoning_dataset.ipynb: Notebook for data exploration and preprocessing.
  • address_poisining_model.ipynb: Notebook for model training (SVC) and evaluation.
  • docs/: Visual documentation and diagrams.
  • dataset/: (Required) Folder for input/output CSV data.

Getting Started

1. Prerequisites

  • Python 3.7 and Pipenv.
  • Access to an Ethereum MySQL database.
  • A dataset/ directory in the repository root (create it if it does not exist).
  • Environment Variables: Copy .env.example to .env and update with your database credentials.
    cp .env.example .env

2. Installation

# Install dependencies
pipenv install

# Enter the virtual environment
pipenv shell

3. Data Collection

Update dataset/address_poisoning_addresses_list.csv with the target phishing addresses, then run:

bash scripts/start_dataset_generation.sh

This will generate address_poisoning_transactions.csv and use address_poisoning_transactions_checkpoint.txt to track progress.

4. Model Training & Analysis

Launch Jupyter and open the notebooks:

jupyter notebook

  1. Run address_poisoning_dataset.ipynb to analyze the raw transaction data.
  2. Run address_poisining_model.ipynb to train the classifier and visualize detection performance.
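For orientation, training an SVC on the three engineered features might look roughly like the sketch below. It is not the notebook's code: the data is synthetic, and the RBF kernel, feature scaling, class weighting, and split ratio are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data, NOT real transactions. Label 1 = poisoning.
y = rng.integers(0, 2, size=n)
X = np.column_stack([
    rng.integers(0, 2, size=n),        # is_repeat_counterparty
    rng.poisson(3, size=n) + 5 * y,    # counterparty_tx_count (shifted so the toy data has signal)
    rng.integers(0, 2, size=n),        # burst_flag
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# SVMs are sensitive to feature scale, so standardize before fitting.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions[:10])
```

Scores on this toy data say nothing about real detection performance; the notebook's evaluation is the reference.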

Model Features

The classifier relies on several engineered features:

  • is_repeat_counterparty: Flags whether the same pair of addresses has already appeared in an earlier transaction.
  • counterparty_tx_count: The total number of interactions between two addresses.
  • burst_flag: Detects rapid-fire transactions within a short time threshold (5 minutes).
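The three features above could be derived with pandas along the following lines. This is a sketch under assumptions, not the notebook's implementation: the column names (from_address, to_address, timestamp) are hypothetical, pairs are treated as directed, and burst_flag is interpreted here as "the same sender's previous transaction was at most 5 minutes earlier".

```python
import pandas as pd

BURST_WINDOW = pd.Timedelta(minutes=5)  # threshold stated in the feature list


def add_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Return a time-sorted copy of tx with the three engineered columns."""
    out = tx.sort_values("timestamp").reset_index(drop=True)
    pair = out.groupby(["from_address", "to_address"])

    # cumcount() is 0 for a pair's first row, so > 0 means "seen before".
    is_repeat = (pair.cumcount() > 0).astype(int)
    # Total number of rows sharing the same (sender, receiver) pair.
    tx_count = pair["timestamp"].transform("size")
    # Time since the same sender's previous transaction (NaT for the first).
    gap = out.groupby("from_address")["timestamp"].diff()
    burst = (gap <= BURST_WINDOW).astype(int)

    return out.assign(
        is_repeat_counterparty=is_repeat,
        counterparty_tx_count=tx_count,
        burst_flag=burst,
    )


# Tiny made-up sample, for illustration only.
tx = pd.DataFrame({
    "from_address": ["0xaaa", "0xaaa", "0xbbb", "0xaaa"],
    "to_address": ["0xvictim", "0xvictim", "0xvictim", "0xother"],
    "timestamp": pd.to_datetime([
        "2024-01-01 12:00:00",
        "2024-01-01 12:02:00",
        "2024-01-01 12:30:00",
        "2024-01-01 13:00:00",
    ]),
})
features = add_features(tx)
print(features)
```

In this sample only the second row is both a repeat counterparty and part of a burst, because it reuses the 0xaaa to 0xvictim pair two minutes after the first transfer.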

Security

Database credentials are managed via environment variables using python-dotenv. A template is provided in .env.example. Never commit your .env file to version control.
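A script might read those credentials along these lines. This is a minimal sketch: the variable names (DB_HOST, DB_USER, DB_PASSWORD, DB_NAME) are assumptions, so check .env.example for the names this project actually uses.

```python
import os

try:
    # load_dotenv() comes from python-dotenv; it copies entries from a local
    # .env file into os.environ without overriding variables already set.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # Fall back to whatever is already set in the process environment.
    pass

# Hypothetical variable names, for illustration only.
REQUIRED = ("DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME")


def read_db_config(env=os.environ) -> dict:
    """Return the database settings, failing loudly if any are missing."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED}
```

Failing at startup when a variable is missing avoids a half-finished extraction run caused by a misconfigured connection.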
