Automated infrastructure for deploying MongoDB on AWS EC2 with EBS volumes, and safely cloning production data to staging environments with PII anonymization.
| Environment | Last Restored | Status | Documents | Anonymized | Duration |
|---|---|---|---|---|---|
| Staging | 2026-03-03 10:51 UTC | ✅ Success | 10 | ✅ Yes | 2m 56s |

💡 Tip: This table is automatically updated after each successful sync by the GitHub Actions workflow.
This project provides:
- Terraform Infrastructure: Deploy MongoDB on EC2 with dedicated EBS data volumes
- GitHub Actions Automation: Clone production MongoDB volumes to staging
- Data Anonymization: Automatically anonymize PII data in staging
- Resource Management: Track and cleanup old snapshots/volumes
- CI/CD Ready: Automated workflows with self-hosted runners
```
.
├── terraform/                   # Infrastructure as Code
│   ├── modules/
│   │   └── ec2/                 # EC2 + EBS module for MongoDB
│   └── stacks/
│       ├── production/          # Production environment
│       └── staging/             # Staging environment
├── mongodb/                     # Database scripts
│   ├── setup_database.js        # Create DB with mock PII data
│   ├── anonymize_data.js        # Simple anonymization
│   ├── anonymize_with_hash.js   # Hash-based anonymization
│   └── restore_original_data.js
├── .cleanup/                    # Resource tracking
│   ├── resource-tracker.json    # Tracked snapshots/volumes
│   └── README.md                # Cleanup documentation
└── .github/
    ├── workflows/
    │   ├── prod-to-staging-sync.yml # Production-to-staging sync
    │   └── cleanup-resources.yml    # Automated resource cleanup
    ├── README-SYNC.md           # Workflow documentation
    └── QUICKSTART-SYNC.md       # Quick start guide
```
```bash
# Deploy staging environment
cd terraform/stacks/staging
terraform init
terraform plan
terraform apply

# Deploy production environment
cd ../production
terraform init
terraform apply
```

What gets created:
- EC2 instance with Amazon Linux 2023
- Root volume (8GB) for OS
- Data volume (20GB) for MongoDB
- Security groups for SSH and MongoDB access
- MongoDB 7.0 installed and configured
```bash
# SSH into production instance
ssh ec2-user@<production-ip>

# Copy and run the setup script
mongosh < setup_database.js
```

Using GitHub Actions Workflow:
- Add AWS credentials to GitHub Secrets (`AWS_SECRET_ACCESS_ID`, `AWS_SECRET_ACCESS_KEY`)
- Go to Actions → "Production to Staging DB Sync"
- Click "Run workflow"
- Select anonymization option (default: enabled)
- Monitor progress through 9 separate jobs
📖 Quick Start | Full Docs
What happens during sync:
- ✅ Validates AWS resources
- ✅ Stops staging MongoDB
- ✅ Creates snapshot from production
- ✅ Swaps EBS volumes
- ✅ Mounts new volume
- ✅ Starts MongoDB
- ✅ Anonymizes PII data (optional)
- ✅ Tracks old resources for cleanup
- ✅ Updates README with sync status
- ✅ EC2 instances with MongoDB 7.0
- ✅ Separate EBS volumes for data
- ✅ Auto-mounting and configuration via user_data
- ✅ Security groups with proper access controls
- ✅ Support for multiple environments (staging/production)
- ✅ XFS filesystem (MongoDB recommended)

- ✅ Automated EBS snapshot creation
- ✅ Volume-based replication
- ✅ Zero downtime for production
- ✅ Automatic volume swap
- ✅ Self-hosted runner on staging
- ✅ Built-in data anonymization
- ✅ Resource tracking for cleanup
- ✅ Auto-updated README status

- ✅ Mock PII data generation
- ✅ Simple anonymization (User 1, User 2, etc.)
- ✅ Hash-based anonymization (non-reversible)
- ✅ Data restoration scripts
- ✅ 10 sample users with realistic PII
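The mock-PII seeding done by `setup_database.js` can be sketched as a simple generator. This is an illustrative sketch only: the field names (`name`, `email`, `ssn`, `address`) are assumptions and may not match the script's actual schema.

```javascript
// Illustrative generator for mock PII users. Field names are assumed
// for this sketch and may not match setup_database.js exactly.
function mockUsers(count) {
  const users = [];
  for (let i = 1; i <= count; i++) {
    users.push({
      _id: i,
      name: `Test User ${i}`,
      email: `test.user${i}@example.com`,
      ssn: `123-45-${String(i).padStart(4, "0")}`, // deliberately fake SSNs
      address: `${i} Main Street, Springfield`,
    });
  }
  return users;
}

// In mongosh, the result could be loaded with:
// db.users.insertMany(mockUsers(10));
```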
The project includes two anonymization strategies:
Simple Anonymization:
- Names → "User [ID]"
- Email → "user[ID]@anonymized.local"
- SSN → "XXX-XX-[ID]"
- Address → Redacted values
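As a rough sketch, the simple strategy is a pure per-document transform. The field names here are assumed for illustration; the real logic lives in `anonymize_data.js`.

```javascript
// Sketch of the simple anonymization strategy described above.
// Field names are assumed; see anonymize_data.js for the real transform.
function anonymizeSimple(user) {
  return {
    ...user,
    name: `User ${user._id}`,
    email: `user${user._id}@anonymized.local`,
    ssn: `XXX-XX-${user._id}`,
    address: "REDACTED",
  };
}
```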
Hash-Based Anonymization:
- One-way hash functions
- Non-reversible transformation
- Maintains referential consistency
- Suitable for production-like testing
- EC2 security groups restrict access
- EBS volumes encrypted at rest
- SSM for secure command execution
- IAM roles with least privilege
- Snapshots properly tagged
- Workflow Documentation: Complete workflow guide
- Quick Start Guide: Get started in 3 steps
- Cleanup Guide: Resource cleanup and management
- MongoDB Scripts: Database setup and anonymization
File: `prod-to-staging-sync.yml`

✨ Features:
- 9 separate jobs for granular control
- Self-hosted runner on staging EC2
- GitHub-hosted runner for AWS API calls
- Detailed per-job summaries
- Maximum visibility and debugging
- Fail-fast at job level
- ~3-10 minutes duration
- Automatic README updates
- Resource tracking for safe cleanup
📋 Jobs:
1. Setup & Validation - Discover volume IDs
2. Stop MongoDB - Stop service on staging
3. Create Snapshot - Snapshot production volume
4. Swap Volumes - Create and attach new volume
5. Mount Volume - Mount on staging EC2
6. Start & Verify - Start MongoDB and verify data
7. Anonymize Data - Optional PII anonymization
8. Track Resources - Track snapshot/volume for cleanup
9. Final Summary - Update README and report
File: `cleanup-resources.yml`

✨ Features:
- Scheduled daily cleanup (2 AM UTC)
- Manual trigger with dry-run option
- Age-based deletion (default: 1 day)
- Tracks all snapshots and volumes
- Safe rollback capability
- Automatic tracker file updates
Cleanup Options:
- Scheduled: Runs daily at 2 AM UTC, deletes resources older than 1 day
- Manual: Run on-demand with custom age threshold and dry-run preview
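The age-based selection can be sketched as a small filter. The tracker entry shape (a `createdAt` timestamp per resource) is an assumption for this sketch, not necessarily the actual `resource-tracker.json` schema; note how a `<=` comparison makes same-day resources eligible when the threshold is 0.

```javascript
// Sketch of age-based cleanup selection. The entry shape (createdAt)
// is assumed, not taken from resource-tracker.json's actual schema.
function resourcesToDelete(entries, maxAgeDays, now = new Date()) {
  const cutoff = new Date(now);
  cutoff.setUTCDate(cutoff.getUTCDate() - maxAgeDays);
  // "<=" keeps same-day resources eligible when maxAgeDays is 0
  return entries.filter((e) => new Date(e.createdAt) <= cutoff);
}
```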
| Feature | Description |
|---|---|
| Zero Production Impact | Works on snapshots, production untouched |
| Self-Hosted Execution | MongoDB operations run on staging EC2 |
| Resource Management | Track and cleanup old snapshots/volumes |
| Safety First | Dry-run mode, rollback instructions |
| Cost Efficient | Automated cleanup prevents AWS cost buildup |
| Visibility | Per-job logs and comprehensive summaries |
| Automation | Scheduled cleanup, auto-update README |
- Terraform 1.0+
- AWS CLI configured
- AWS credentials with EC2/VPC permissions
- AWS credentials (stored as GitHub Secrets: `AWS_SECRET_ACCESS_ID`, `AWS_SECRET_ACCESS_KEY`)
- Self-hosted runner configured on staging EC2
- IAM permissions for EC2, EBS, Snapshots
- GitHub repository access to Actions
- Initial Setup: Deploy infrastructure with Terraform
- Populate Production: Load production data
- Clone to Staging: Use GitHub Actions workflow to sync and anonymize
- Test in Staging: Verify functionality with anonymized data
- Cleanup Resources: Run cleanup workflow to delete old snapshots/volumes
- Repeat: Run sync as needed (on-demand or scheduled)
Per Environment (Monthly):
- EC2 t3.large: ~$60/month
- EBS Root (8GB gp3): ~$0.64/month
- EBS Data (20GB gp3): ~$1.60/month
- Data transfer: Free (same region)
- Total: ~$62/month per environment
Snapshots:
- ~$0.05/GB/month (incremental)
- 20GB snapshot: ~$1/month
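The storage figures above follow directly from per-GB rates; a quick back-of-envelope check (the unit prices are assumptions based on approximate us-west-2 list rates):

```javascript
// Back-of-envelope check of the monthly storage estimates above.
// Unit prices are assumed (approximate us-west-2 list rates).
const GP3_PER_GB_MONTH = 0.08;      // EBS gp3 storage, $/GB-month
const SNAPSHOT_PER_GB_MONTH = 0.05; // EBS snapshot storage, $/GB-month

const rootVolume = 8 * GP3_PER_GB_MONTH;     // ~$0.64
const dataVolume = 20 * GP3_PER_GB_MONTH;    // ~$1.60
const snapshot = 20 * SNAPSHOT_PER_GB_MONTH; // ~$1.00 (upper bound; snapshots are incremental)

console.log(rootVolume.toFixed(2), dataVolume.toFixed(2), snapshot.toFixed(2));
```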
- Development/Testing: Safe staging environment with anonymized data
- Compliance: Meet GDPR/CCPA requirements for test data
- Disaster Recovery: Practice restoration procedures
- Performance Testing: Use production-sized datasets
- Training: Onboard new team members safely
Edit `terraform/stacks/staging/terraform.tfvars`:

```hcl
instance_type            = "t3.large"
root_volume_size         = 8
mongodb_data_volume_size = 20
```

Edit workflow environment variables in `.github/workflows/prod-to-staging-sync.yml`:
```yaml
env:
  AWS_REGION: us-west-2
  PROD_INSTANCE_ID: i-0e360e7615a63a796
  STAGING_INSTANCE_ID: i-05661b198eb8d9b0a
```

Edit the cron schedule in `.github/workflows/cleanup-resources.yml`:
```yaml
schedule:
  - cron: '0 2 * * *' # Daily at 2 AM UTC
```

Check MongoDB status:
```bash
sudo systemctl status mongod
```

View data volume:
```bash
df -h | grep mongodb
```

Count documents:
```bash
mongosh userdb --eval "db.users.countDocuments()"
```

Workflow fails to find volumes:
- Verify instance IDs in workflow env variables
- Check that volumes are attached at `/dev/sdf`
- Ensure AWS credentials have EC2 describe permissions
Self-hosted runner offline:
- SSH to staging EC2 and check runner status
- Restart runner service if needed
- Verify GitHub runner token hasn't expired
Cleanup finds 0 resources:
- Check the `max_age_days` setting (use 0 for same-day cleanup)
- Verify that `.cleanup/resource-tracker.json` has entries
- The date comparison uses `<=`, so same-day resources are included
README not updating:
- Check final-summary job logs
- Verify git push succeeded (check for conflicts)
- Ensure the workflow has `contents: write` permission
See detailed documentation:
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is provided as-is for educational and internal use purposes.
For issues or questions:
- Check the documentation in `ansible/README.md`
- Review the troubleshooting guides
- Check AWS CloudWatch logs
- Verify IAM permissions