An automated data governance tool for cleaning, masking, and validating datasets with MinIO integration and multi-agent collaboration.
- Data Cleaning & Masking: Truncate and hash sensitive fields (e.g., SHA-256 for name fields).
- Null Value Imputation: Fill missing values in numeric columns using IterativeImputer.
- Interactive Reports: Generate JSON summaries and Plotly-powered HTML visualizations.
- MinIO Integration: Read files from and write files to MinIO buckets.
- Multi-Agent Workflow: Coordinate tasks across roles such as Orchestrator, DataCleaner, and QualityValidator.
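As a rough illustration of the null value imputation feature, the sketch below fills missing values in numeric columns with scikit-learn's IterativeImputer. The helper name impute_numeric, the sample columns, and the max_iter and random_state settings are assumptions made for this example, not the project's actual code.

```python
import numpy as np
import pandas as pd
# IterativeImputer is still experimental in scikit-learn and must be enabled first.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def impute_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaNs in numeric columns imputed (hypothetical helper)."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    if len(numeric_cols) == 0:
        return out  # nothing to impute
    # Assumed settings; note that columns that are entirely NaN are not handled here.
    imputer = IterativeImputer(max_iter=10, random_state=0)
    out[numeric_cols] = imputer.fit_transform(out[numeric_cols])
    return out


sample = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [3000.0, 4200.0, np.nan, 3900.0],
    "name": ["Ann", "Bob", "Cy", "Di"],  # non-numeric, left untouched
})
cleaned = impute_numeric(sample)
```

Non-numeric columns pass through unchanged, and the input frame is copied rather than modified in place.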
git clone https://github.com/madaibaba/dataagent.git
cd dataagent
pip install -r requirements.txt
Edit demo.py with your MinIO and AI service credentials:
client = DataGovernanceClient(
    minio_endpoint="your-minio-endpoint:9000",
    minio_access="your-access-key",
    minio_secret="your-secret-key",
    ollama_config={
        "model": "your-llm-model",
        "base_url": "http://your-llm-api-url/v1",
        "api_key": "your-api-key"
    },
    bucket="your-bucket-name",
    base_path="your-project-path"
)
result = client.process_directory(
    input_prefix="raw",                  # Process files in base_path/raw/
    sensitive_fields=["name", "email"],  # Fields to mask
    max_workers=4                        # Parallel threads
)
Batch Processing Result:
Total Files: 5
Succeeded: 5
Failed: 0
Reports saved to: base_path/report/
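The batch summary above suggests a per-file success/failure tally across parallel workers. The following is a minimal sketch of that pattern using only the standard library. The functions process_file and process_batch, the file keys, and the "cleaned/" prefix are hypothetical stand-ins; how process_directory is actually implemented is not shown in this README.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_file(key: str) -> str:
    """Placeholder for per-file cleaning (hypothetical); rejects non-CSV keys."""
    if not key.endswith(".csv"):
        raise ValueError(f"unsupported file type: {key}")
    return key.replace("raw/", "cleaned/", 1)


def process_batch(keys, max_workers=4):
    """Run process_file over keys in a thread pool and tally the outcomes."""
    summary = {"total": len(keys), "succeeded": 0, "failed": 0, "errors": {}}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_file, k): k for k in keys}
        for fut in as_completed(futures):
            key = futures[fut]
            try:
                fut.result()
                summary["succeeded"] += 1
            except Exception as exc:  # one bad file should not abort the batch
                summary["failed"] += 1
                summary["errors"][key] = str(exc)
    return summary


summary = process_batch(["raw/a.csv", "raw/b.csv", "raw/notes.txt"])
print(f"Total Files: {summary['total']}")
print(f"Succeeded: {summary['succeeded']}")
print(f"Failed: {summary['failed']}")
```

Catching exceptions per future keeps a single failing file from stopping the rest of the batch, and the errors map records which keys need attention.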
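import hashlib

# Truncate + Hash sensitive fields:
# keep the first 6 characters, then append the first 6 hex digits of the SHA-256 digest
df[col] = df[col].apply(
    lambda x: str(x)[:6] + hashlib.sha256(str(x).encode()).hexdigest()[:6]
)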
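To make the truncate-and-hash rule easier to try outside a DataFrame, here is a small standalone sketch of the same transformation. The function name mask_value and its keep and digest_len parameters are assumptions for illustration. Unlike the snippet above, this version passes None through instead of masking the string "None".

```python
import hashlib


def mask_value(value, keep=6, digest_len=6):
    """Keep the first `keep` characters and append a short SHA-256 hex digest."""
    if value is None:
        return None  # assumption: leave missing values untouched
    text = str(value)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return text[:keep] + digest[:digest_len]


masked = mask_value("alice@example.com")
print(masked)  # starts with "alice@" followed by 6 hex characters
```

The mapping is deterministic, so equal inputs still match after masking, which keeps joins and group-bys usable. Two caveats follow directly from the rule: values of six characters or fewer remain fully readable in the output, and a 6-hex-digit suffix carries only 24 bits, so distinct values sharing the same prefix can collide.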
Interactive HTML report generated with Plotly.
- Fork the repository
- Create a feature branch (git checkout -b feature/your-idea)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/your-idea)
- Open a Pull Request
This project is licensed under the MIT License. See LICENSE for details.
For questions or feedback:
📩 houalex@gmail.com
🌐 Project Wiki