Automated infrastructure for deploying MongoDB on AWS EC2 with EBS volumes, and safely cloning production data to staging environments with PII anonymization.
| Environment | Last Restored | Status | Documents | Anonymized | Duration |
|---|---|---|---|---|---|
| Staging | 2026-03-03 10:51 UTC | ✅ Success | 10 | ✅ Yes | 2m 56s |
💡 Tip: This table is automatically updated after each successful sync using the GitHub Actions workflow.
This project provides:
- Terraform Infrastructure: Deploy MongoDB on EC2 with dedicated EBS data volumes
- GitHub Actions Automation: Clone production MongoDB volumes to staging
- Data Anonymization: Automatically anonymize PII data in staging
- Resource Management: Track and cleanup old snapshots/volumes
- CI/CD Ready: Automated workflows with self-hosted runners
```
.
├── terraform/                    # Infrastructure as Code
│   ├── modules/
│   │   └── ec2/                  # EC2 + EBS module for MongoDB
│   └── stacks/
│       ├── production/           # Production environment
│       └── staging/              # Staging environment
├── mongodb/                      # Database scripts
│   ├── setup_database.js         # Create DB with mock PII data
│   ├── anonymize_data.js         # Simple anonymization
│   ├── anonymize_with_hash.js    # Hash-based anonymization
│   └── restore_original_data.js
├── .cleanup/                     # Resource tracking
│   ├── resource-tracker.json     # Tracked snapshots/volumes
│   └── README.md                 # Cleanup documentation
└── .github/
    └── workflows/
        ├── prod-to-staging-sync.yml  # Production to staging sync
        ├── cleanup-resources.yml     # Automated resource cleanup
        ├── README-SYNC.md            # Workflow documentation
        └── QUICKSTART-SYNC.md        # Quick start guide
```
```bash
# Deploy staging environment
cd terraform/stacks/staging
terraform init
terraform plan
terraform apply

# Deploy production environment
cd ../production
terraform init
terraform apply
```

What gets created:
- EC2 instance with Amazon Linux 2023
- Root volume (8GB) for OS
- Data volume (20GB) for MongoDB
- Security groups for SSH and MongoDB access
- MongoDB 7.0 installed and configured
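As a rough sketch, the dedicated data volume in the EC2 module could be wired up like the Terraform fragment below. Resource names are illustrative, not the module's actual code; the size, `/dev/sdf` device name, and encryption follow the description above.

```hcl
# Illustrative sketch only -- resource names are assumptions.
resource "aws_ebs_volume" "mongodb_data" {
  availability_zone = aws_instance.mongodb.availability_zone
  size              = 20     # GB, dedicated MongoDB data volume
  type              = "gp3"
  encrypted         = true   # encrypted at rest
  tags              = { Name = "mongodb-data" }
}

resource "aws_volume_attachment" "mongodb_data" {
  device_name = "/dev/sdf"   # the device the sync workflow looks for
  volume_id   = aws_ebs_volume.mongodb_data.id
  instance_id = aws_instance.mongodb.id
}
```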
```bash
# SSH into production instance
ssh ec2-user@<production-ip>

# Copy and run the setup script
mongosh < setup_database.js
```

Using GitHub Actions Workflow:
- Add AWS credentials to GitHub Secrets (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- Go to Actions → "Production to Staging DB Sync"
- Click "Run workflow"
- Select anonymization option (default: enabled)
- Monitor progress through 9 separate jobs
📖 Quick Start | Full Docs
What happens during sync:
- ✅ Validates AWS resources
- ✅ Stops staging MongoDB
- ✅ Creates snapshot from production
- ✅ Swaps EBS volumes
- ✅ Mounts new volume
- ✅ Starts MongoDB
- ✅ Anonymizes PII data (optional)
- ✅ Tracks old resources for cleanup
- ✅ Updates README with sync status
Infrastructure:
- ✅ EC2 instances with MongoDB 7.0
- ✅ Separate EBS volumes for data
- ✅ Auto-mounting and configuration via user_data
- ✅ Security groups with proper access controls
- ✅ Support for multiple environments (staging/production)
- ✅ XFS filesystem (MongoDB recommended)
Sync Workflow:
- ✅ Automated EBS snapshot creation
- ✅ Volume-based replication
- ✅ Zero-downtime for production
- ✅ Automatic volume swap
- ✅ Self-hosted runner on staging
- ✅ Built-in data anonymization
- ✅ Resource tracking for cleanup
- ✅ Auto-update README status
Database Scripts:
- ✅ Mock PII data generation
- ✅ Simple anonymization (User 1, User 2, etc.)
- ✅ Hash-based anonymization (non-reversible)
- ✅ Data restoration scripts
- ✅ 10 sample users with realistic PII
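As an illustration of what the mock-data setup might produce, here is a Node-runnable sketch. The field names and values are assumptions for illustration, not the project's actual `setup_database.js` schema.

```javascript
// Hypothetical sketch of mock PII generation; field names are assumptions.
function makeMockUsers(count = 10) {
  return Array.from({ length: count }, (_, i) => ({
    _id: i + 1,
    name: `Test User ${i + 1}`,
    email: `test.user${i + 1}@example.com`,
    ssn: `123-45-${String(6789 + i)}`,
    address: `${100 + i} Main Street, Springfield`,
  }));
}

// In mongosh, the documents could then be loaded with something like:
// db.users.insertMany(makeMockUsers());
console.log(makeMockUsers().length); // 10
```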
The project includes two anonymization strategies:
Simple Anonymization:
- Names → "User [ID]"
- Email → "user[ID]@anonymized.local"
- SSN → "XXX-XX-[ID]"
- Address → Redacted values
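A minimal sketch of the simple strategy above, written as a pure function over a user document (field names are assumptions, not the project's actual schema):

```javascript
// Hypothetical sketch of simple anonymization; field names are assumptions.
function simpleAnonymize(user) {
  return {
    ...user,
    name: `User ${user._id}`,
    email: `user${user._id}@anonymized.local`,
    ssn: `XXX-XX-${user._id}`,
    address: 'REDACTED',
  };
}

// In mongosh, the same transform could be applied with something like:
// db.users.find().forEach(u =>
//   db.users.replaceOne({ _id: u._id }, simpleAnonymize(u)));
```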
Hash-Based Anonymization:
- One-way hash functions
- Non-reversible transformation
- Maintains referential consistency
- Suitable for production-like testing
Security:
- EC2 security groups restrict access
- EBS volumes encrypted at rest
- SSM for secure command execution
- IAM roles with least privilege
- Snapshots properly tagged
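A least-privilege IAM policy for the sync credentials might look roughly like the fragment below. The exact action list is an assumption based on the operations described in this README; the real workflow may need more or fewer permissions.

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ec2:DescribeInstances",
      "ec2:DescribeVolumes",
      "ec2:DescribeSnapshots",
      "ec2:CreateSnapshot",
      "ec2:CreateVolume",
      "ec2:AttachVolume",
      "ec2:DetachVolume",
      "ec2:DeleteVolume",
      "ec2:DeleteSnapshot",
      "ec2:CreateTags"
    ],
    "Resource": "*"
  }]
}
```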
- Workflow Documentation: Complete workflow guide
- Quick Start Guide: Get started in 3 steps
- Cleanup Guide: Resource cleanup and management
- MongoDB Scripts: Database setup and anonymization
File: `prod-to-staging-sync.yml`
✨ Features:
- 9 separate jobs for granular control
- Self-hosted runner on staging EC2
- GitHub-hosted runner for AWS API calls
- Detailed per-job summaries
- Maximum visibility and debugging
- Fail-fast at job level
- ~3-10 minutes duration
- Automatic README updates
- Resource tracking for safe cleanup
📊 Jobs:
1. Setup & Validation - Discover volume IDs
2. Stop MongoDB - Stop service on staging
3. Create Snapshot - Snapshot production volume
4. Swap Volumes - Create and attach new volume
5. Mount Volume - Mount on staging EC2
6. Start & Verify - Start MongoDB and verify data
7. Anonymize Data - Optional PII anonymization
8. Track Resources - Track snapshot/volume for cleanup
9. Final Summary - Update README and report
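Under the hood these jobs are chained with `needs:`. The fragment below is an illustrative sketch only; the job names, the `anonymize` input, and the runner labels are assumptions about the real `prod-to-staging-sync.yml`:

```yaml
# Illustrative fragment, not the actual workflow file.
on:
  workflow_dispatch:
    inputs:
      anonymize:
        description: Anonymize PII after the sync
        type: boolean
        default: true

jobs:
  create-snapshot:
    needs: stop-mongodb
    runs-on: ubuntu-latest            # GitHub-hosted runner for AWS API calls
    steps:
      - run: echo "aws ec2 create-snapshot --volume-id $PROD_VOLUME_ID"
  anonymize-data:
    if: ${{ inputs.anonymize }}
    needs: start-verify
    runs-on: [self-hosted, staging]   # runs on the staging EC2 itself
    steps:
      - run: mongosh < mongodb/anonymize_data.js
```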
File: `cleanup-resources.yml`
✨ Features:
- Scheduled daily cleanup (2 AM UTC)
- Manual trigger with dry-run option
- Age-based deletion (default: 1 day)
- Tracks all snapshots and volumes
- Safe rollback capability
- Automatic tracker file updates
Cleanup Options:
- Scheduled: Runs daily at 2 AM UTC, deletes resources older than 1 day
- Manual: Run on-demand with custom age threshold and dry-run preview
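The two trigger modes could be wired up roughly as follows. `max_age_days` matches the setting referenced in Troubleshooting; the `dry_run` input name and defaults are assumptions:

```yaml
# Illustrative trigger block for cleanup-resources.yml.
on:
  schedule:
    - cron: '0 2 * * *'   # daily at 2 AM UTC
  workflow_dispatch:
    inputs:
      max_age_days:
        description: Delete tracked resources older than this many days
        default: '1'
      dry_run:
        description: Preview deletions without performing them
        type: boolean
        default: true
```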
| Feature | Description |
|---|---|
| Zero Production Impact | Works on snapshots, production untouched |
| Self-Hosted Execution | MongoDB operations run on staging EC2 |
| Resource Management | Track and cleanup old snapshots/volumes |
| Safety First | Dry-run mode, rollback instructions |
| Cost Efficient | Automated cleanup prevents AWS cost buildup |
| Visibility | Per-job logs and comprehensive summaries |
| Automation | Scheduled cleanup, auto-update README |
- Terraform 1.0+
- AWS CLI configured
- AWS credentials with EC2/VPC permissions
- AWS credentials (stored as GitHub Secrets: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- Self-hosted runner configured on staging EC2
- IAM permissions for EC2, EBS, Snapshots
- GitHub repository access to Actions
- Initial Setup: Deploy infrastructure with Terraform
- Populate Production: Load production data
- Clone to Staging: Use GitHub Actions workflow to sync and anonymize
- Test in Staging: Verify functionality with anonymized data
- Cleanup Resources: Run cleanup workflow to delete old snapshots/volumes
- Repeat: Run sync as needed (on-demand or scheduled)
Per Environment (Monthly):
- EC2 t3.large: ~$60/month
- EBS Root (8GB gp3): ~$0.64/month
- EBS Data (20GB gp3): ~$1.60/month
- Data transfer: Free (same region)
- Total: ~$62/month per environment
Snapshots:
- ~$0.05/GB/month (incremental)
- 20GB snapshot: ~$1/month
- Development/Testing: Safe staging environment with anonymized data
- Compliance: Meet GDPR/CCPA requirements for test data
- Disaster Recovery: Practice restoration procedures
- Performance Testing: Use production-sized datasets
- Training: Onboard new team members safely
Edit `terraform/stacks/staging/terraform.tfvars`:

```hcl
instance_type            = "t3.large"
root_volume_size         = 8
mongodb_data_volume_size = 20
```

Edit workflow environment variables in `.github/workflows/prod-to-staging-sync.yml`:

```yaml
env:
  AWS_REGION: us-west-2
  PROD_INSTANCE_ID: i-0e360e7615a63a796
  STAGING_INSTANCE_ID: i-05661b198eb8d9b0a
```

Edit the cron schedule in `.github/workflows/cleanup-resources.yml`:

```yaml
schedule:
  - cron: '0 2 * * *'  # Daily at 2 AM UTC
```

Check MongoDB status:
```bash
sudo systemctl status mongod
```

View the data volume:

```bash
df -h | grep mongodb
```

Count documents:

```bash
mongosh userdb --eval "db.users.countDocuments()"
```

Workflow fails to find volumes:
- Verify instance IDs in workflow env variables
- Check that volumes are attached to `/dev/sdf`
- Ensure AWS credentials have EC2 describe permissions
Self-hosted runner offline:
- SSH to staging EC2 and check runner status
- Restart runner service if needed
- Verify GitHub runner token hasn't expired
Cleanup finds 0 resources:
- Check the `max_age_days` setting (use 0 for same-day cleanup)
- Verify `.cleanup/resource-tracker.json` has entries
- Date comparison uses `<=`, so same-day resources are included
README not updating:
- Check final-summary job logs
- Verify git push succeeded (check for conflicts)
- Ensure the workflow has `contents: write` permission
See detailed documentation:
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is provided as-is for educational and internal use purposes.
For issues or questions:
- Check the documentation in `ansible/README.md`
- Review troubleshooting guides
- Check AWS CloudWatch logs
- Verify IAM permissions