This project implements a machine learning pipeline to detect anomalous transactions on the Ethereum blockchain. It uses historical transaction data to identify patterns that deviate from normal behavior and may indicate suspicious activity.
- Data Collection: Automated fetching of Ethereum transactions using Etherscan API
- Data Preprocessing: Comprehensive transaction data cleaning and feature engineering
- Anomaly Detection: Implementation of multiple detection algorithms:
  - Isolation Forest
  - DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- MLflow Integration: Experiment tracking and model versioning
- Interactive Dashboard: Streamlit-based visualization of results
- Docker Support: Containerized application for easy deployment
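As a rough illustration of the two detection algorithms listed above, here is a minimal sketch using scikit-learn on synthetic data. The feature columns (transfer value and gas price), the parameter values, and the injected outliers are assumptions made for this example; they are not taken from the project's `src/modeling.py`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for transaction features: [value in ETH, gas price in gwei].
normal = rng.normal(loc=[1.0, 30.0], scale=[0.3, 5.0], size=(500, 2))
# Three hand-made outliers, appended at row indices 500, 501 and 502.
outliers = np.array([[250.0, 30.0], [1.0, 900.0], [400.0, 1200.0]])
X = StandardScaler().fit_transform(np.vstack([normal, outliers]))

# Isolation Forest: fit_predict() returns -1 for anomalies and 1 for inliers.
iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
iso_labels = iso.fit_predict(X)
iso_scores = iso.score_samples(X)  # lower score = more anomalous

# DBSCAN: points that belong to no dense cluster receive the label -1 (noise).
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

iso_anomalies = np.flatnonzero(iso_labels == -1)
db_anomalies = np.flatnonzero(db_labels == -1)
print("Isolation Forest flagged rows:", iso_anomalies)
print("DBSCAN noise rows:", db_anomalies)
```

The two methods flag anomalies for different reasons: Isolation Forest scores how easily a point is separated by random splits, while DBSCAN marks points that lie outside every dense region. Comparing both sets of flagged rows is one way to cross-check results.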
- Python 3.9
- Core Libraries:
  - web3: Ethereum blockchain interaction
  - pandas: Data manipulation
  - scikit-learn: Machine learning algorithms
  - mlflow: ML experiment tracking
  - streamlit: Dashboard creation
- Infrastructure:
  - Docker
  - GitHub Actions (CI/CD)
  - MLflow server
  - SQLite database
- Python 3.9+
- Docker (optional)
- Etherscan API key
- Ethereum Node URL (Infura or other provider)
- Clone the repository:
git clone https://github.com/GuillaumeVerb/anomalie_eth.git
cd anomalie_eth
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Create a .env file with your credentials:
ETH_NODE_URL=your_ethereum_node_url
ETHERSCAN_API_KEY=your_etherscan_api_key
MLFLOW_TRACKING_URI=sqlite:///mlflow.db
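The three variables above have to reach the Python code somehow. The following is a minimal sketch of one way to read and validate them; the `Settings` class and `load_settings` function are hypothetical names for this example, and the project's actual `src/config.py` may be organized differently. Note that a `.env` file is not loaded into the process environment automatically; a helper such as python-dotenv's `load_dotenv()` or Docker Compose's `env_file` option is assumed to handle that step.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values defined in .env."""
    eth_node_url: str
    etherscan_api_key: str
    mlflow_tracking_uri: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing credentials."""
    required = ("ETH_NODE_URL", "ETHERSCAN_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return Settings(
        eth_node_url=env["ETH_NODE_URL"],
        etherscan_api_key=env["ETHERSCAN_API_KEY"],
        # Fall back to the local SQLite store shown in the .env example.
        mlflow_tracking_uri=env.get("MLFLOW_TRACKING_URI", "sqlite:///mlflow.db"),
    )
```

Failing at startup with a clear message tends to be easier to debug than a later authentication error from the node or the Etherscan API.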
Build and run using Docker Compose:
docker-compose up --build
This will start:
- MLflow server on port 5000
- Streamlit dashboard on port 8501
anomalie_eth/
├── data/
│ ├── raw/ # Raw transaction data
│ └── processed/ # Preprocessed datasets
├── src/
│ ├── config.py # Configuration parameters
│ ├── data_collection.py # Ethereum data collection
│ ├── preprocessing.py # Data cleaning and preprocessing
│ ├── modeling.py # Anomaly detection models
│ └── etherscan_api.py # Etherscan API wrapper
├── notebooks/
│ ├── 01_data_collection.ipynb # Data collection exploration
│ └── 02_modeling.ipynb # Model development
├── tests/
│ ├── test_preprocessing.py
│ └── test_modeling.py
├── dashboard/
│ └── app.py # Streamlit dashboard
├── mlruns/ # MLflow artifacts
├── Dockerfile
├── docker-compose.yml
└── requirements.txt
from src.data_collection import collect_transactions
# Collect transactions for a range of blocks
transactions = collect_transactions(
    start_block=12000000,
    end_block=12001000
)
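For readers curious about what block-range collection can look like underneath, here is a standard-library sketch that requests a single block through Etherscan's proxy module (`eth_getBlockByNumber`) and converts the hex-encoded fields to numbers. It is not the project's `collect_transactions` implementation. The base URL, the helper names, and the error handling are assumptions for this example, and the endpoint or API version may differ, so check the Etherscan documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL; the current Etherscan endpoint and version may differ.
ETHERSCAN_URL = "https://api.etherscan.io/api"
WEI_PER_ETH = 10**18
WEI_PER_GWEI = 10**9


def block_request_url(block_number: int, api_key: str) -> str:
    """URL asking the proxy module for one block with full transaction objects."""
    params = {
        "module": "proxy",
        "action": "eth_getBlockByNumber",
        "tag": hex(block_number),  # block number as a hex string
        "boolean": "true",         # return full transactions, not only hashes
        "apikey": api_key,
    }
    return f"{ETHERSCAN_URL}?{urlencode(params)}"


def parse_transaction(tx: dict) -> dict:
    """Convert hex-encoded JSON-RPC transaction fields into plain Python numbers."""
    return {
        "hash": tx["hash"],
        "from": tx["from"],
        "to": tx.get("to"),  # None for contract-creation transactions
        "block_number": int(tx["blockNumber"], 16),
        "value_eth": int(tx["value"], 16) / WEI_PER_ETH,
        "gas": int(tx["gas"], 16),
        "gas_price_gwei": int(tx["gasPrice"], 16) / WEI_PER_GWEI,
    }


def fetch_block_transactions(block_number: int, api_key: str) -> list:
    """Fetch one block over HTTP and return its parsed transactions."""
    with urlopen(block_request_url(block_number, api_key), timeout=30) as response:
        payload = json.load(response)
    block = payload.get("result")
    if not isinstance(block, dict):
        # Error responses (for example rate limiting) carry a message instead of a block.
        raise RuntimeError(f"Unexpected response for block {block_number}: {payload}")
    return [parse_transaction(tx) for tx in block.get("transactions", [])]
```

Calling `fetch_block_transactions` for each block number in a range, with a short pause between requests to respect rate limits, would yield rows that can be loaded into a pandas DataFrame for preprocessing.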
from src.modeling import train_anomaly_detector
# Train an anomaly detection model.
# `data` is assumed to be the preprocessed transaction dataset (see src/preprocessing.py).
model = train_anomaly_detector(
    data,
    algorithm="isolation_forest",
    experiment_id="your_experiment_id"
)
python -m pytest tests/
- Transaction volume visualization
- Anomaly score distribution
- Interactive transaction explorer
- Model performance metrics
- Real-time anomaly detection
The project includes a GitHub Actions workflow that:
- Runs all tests
- Builds the Docker image
- Pushes the image to Docker Hub
- Deploys the application (if configured)
MLflow is used to track:
- Model parameters
- Performance metrics
- Model artifacts
- Experiment history
Access the MLflow UI at http://localhost:5000 when running locally.
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
- Etherscan API for providing transaction data
- The Ethereum community for blockchain insights
- Open-source ML community for algorithms and tools