CFDB stack#5

Open
conradbzura wants to merge 11 commits into master from ecs-stack

@conradbzura conradbzura commented Feb 17, 2026

Summary

Deploy CFDB on ECS Fargate with DocumentDB, replacing the unused X.509 certificate auth with SCRAM (username/password) over TLS.

  • Replace X.509 authentication with SCRAM/TLS in the Python API and Rust materializer — credentials are now parsed from DATABASE_URL, and DocumentDB's retryWrites=false requirement is handled via MONGODB_RETRY_WRITES
  • Add GET /health endpoint for ECS target group health checks
  • Simplify Dockerfile.mongodb to dev-only (remove TLS/X.509 branch); add curl and AWS DocumentDB CA bundle to Dockerfile.api
  • Remove all X.509 artifacts: cert generation script, mongo TLS config, user creation script, -prod Makefile targets
  • Add three CloudFormation templates (cloudformation/): network (VPC, subnets, security groups), database (DocumentDB + Secrets Manager), backend (ECS, ALB, ACM, Route 53)
  • Add ECS deployment step to CI/CD — images tagged with both :latest and short git SHA, then aws ecs update-service --force-new-deployment

Local dev workflow (make mongodb && make api) is unchanged.
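
As a rough illustration of the first bullet, the sketch below derives connection settings from DATABASE_URL and MONGODB_RETRY_WRITES using only the standard library. The function name `mongo_settings`, the returned keys, and the rule that anything other than "false", "0", or "no" enables retryable writes are assumptions made for this example, not the code in this PR.

```python
import os
from urllib.parse import unquote, urlsplit


def mongo_settings(env=os.environ):
    """Derive connection settings from DATABASE_URL and MONGODB_RETRY_WRITES.

    Hypothetical helper: the names and the default for retry_writes are
    assumptions, not taken from the CFDB code base.
    """
    url = urlsplit(env["DATABASE_URL"])
    # Assumed convention: retryable writes stay on unless explicitly disabled.
    raw = env.get("MONGODB_RETRY_WRITES", "true").strip().lower()
    return {
        "host": url.hostname,
        "port": url.port or 27017,
        # Credentials in a URL are percent-encoded, so decode them here.
        "username": unquote(url.username) if url.username else None,
        "password": unquote(url.password) if url.password else None,
        "database": url.path.lstrip("/") or None,
        "retry_writes": raw not in {"false", "0", "no"},
    }
```

For DocumentDB the environment would set MONGODB_RETRY_WRITES=false, so `retry_writes` comes back as False; a local MongoDB can leave the variable unset.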

Post-merge deployment steps

1. Deploy CloudFormation stacks (in order)

# Network
aws cloudformation deploy \
  --stack-name cfdb-network \
  --template-file cloudformation/network.yml \
  --capabilities CAPABILITY_IAM \
  --region us-east-2

# Database
aws cloudformation deploy \
  --stack-name cfdb-database \
  --template-file cloudformation/database.yml \
  --capabilities CAPABILITY_IAM \
  --region us-east-2 \
  --parameter-overrides \
    NetworkStackName=cfdb-network

# Backend (sync API key is auto-generated in Secrets Manager)
aws cloudformation deploy \
  --stack-name cfdb-backend \
  --template-file cloudformation/backend.yml \
  --capabilities CAPABILITY_IAM \
  --region us-east-2 \
  --parameter-overrides \
    NetworkStackName=cfdb-network \
    DatabaseStackName=cfdb-database \
    ImageURI=605134458779.dkr.ecr.us-east-2.amazonaws.com/cfdb:latest
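
The three deploys above depend on each other, so the order matters. A minimal sketch of running them in sequence and stopping at the first failure follows; the `STACKS` table, `deploy_command`, and `deploy_all` are hypothetical names for this example, and the arguments simply mirror the commands above.

```python
import subprocess

REGION = "us-east-2"

# (stack name, template file, parameter overrides), listed in dependency order.
STACKS = [
    ("cfdb-network", "cloudformation/network.yml", []),
    ("cfdb-database", "cloudformation/database.yml",
     ["NetworkStackName=cfdb-network"]),
    ("cfdb-backend", "cloudformation/backend.yml", [
        "NetworkStackName=cfdb-network",
        "DatabaseStackName=cfdb-database",
        "ImageURI=605134458779.dkr.ecr.us-east-2.amazonaws.com/cfdb:latest",
    ]),
]


def deploy_command(stack, template, overrides):
    """Build the argument list for one `aws cloudformation deploy` call."""
    cmd = [
        "aws", "cloudformation", "deploy",
        "--stack-name", stack,
        "--template-file", template,
        "--capabilities", "CAPABILITY_IAM",
        "--region", REGION,
    ]
    if overrides:
        cmd += ["--parameter-overrides", *overrides]
    return cmd


def deploy_all(run=subprocess.run):
    """Deploy every stack in order; check=True aborts on the first failure."""
    for stack, template, overrides in STACKS:
        run(deploy_command(stack, template, overrides), check=True)
```

Passing a different `run` callable makes it possible to print or record the commands without touching AWS.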

2. Set GitHub repo secrets

The CI/CD workflow needs two new secrets:

# ECS_CLUSTER is the cluster name (deterministic from stack name)
# → cfdb-backend-cluster

# ECS_SERVICE comes from the backend stack output:
aws cloudformation describe-stacks \
  --stack-name cfdb-backend \
  --region us-east-2 \
  --query 'Stacks[0].Outputs[?OutputKey==`ECSServiceName`].OutputValue' \
  --output text

Set in GitHub → Settings → Secrets and variables → Actions:

Secret       Value
ECS_CLUSTER  cfdb-backend-cluster
ECS_SERVICE  (output from command above)

3. Retrieve the sync API key

The sync API key is auto-generated at stack creation. Retrieve it for use in X-API-Key headers:

aws secretsmanager get-secret-value \
  --secret-id cfdb-backend-sync-api-key \
  --region us-east-2 \
  --query SecretString \
  --output text

4. Update IAM deploy role permissions

The existing AWS_IAM_ROLE used by GitHub Actions needs these additional permissions:

  • ecs:UpdateService
  • ecs:DescribeServices

5. Initial data load

DocumentDB starts empty. Restore data from a host with VPC access (bastion or ECS Exec):

# Get the cluster endpoint
aws docdb describe-db-clusters \
  --db-cluster-identifier cfdb-database-cluster \
  --region us-east-2 \
  --query 'DBClusters[0].Endpoint' \
  --output text

# Get the master password
aws secretsmanager get-secret-value \
  --secret-id cfdb-database-master-credentials \
  --region us-east-2 \
  --query 'SecretString' \
  --output text | python3 -c "import sys,json; print(json.load(sys.stdin)['password'])"

# Restore (from a host inside the VPC; DocumentDB requires retryWrites=false)
mongorestore --gzip \
  --uri="mongodb://cfdbadmin:<password>@<endpoint>:27017/cfdb?tls=true&tlsCAFile=/etc/cfdb/certs/global-bundle.pem&retryWrites=false" \
  /path/to/database/dump
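
A generated password may contain characters such as "@", ":", or "/" that break a connection URI unless they are percent-encoded. The sketch below builds the mongorestore URI from the secret JSON and the cluster endpoint. `restore_uri` is a hypothetical helper; the default username and CA path are copied from the command above, and appending retryWrites=false is an assumption based on the DocumentDB requirement noted in the summary.

```python
import json
from urllib.parse import quote_plus, urlencode


def restore_uri(secret_string, endpoint, username="cfdbadmin", database="cfdb",
                ca_file="/etc/cfdb/certs/global-bundle.pem"):
    """Build a mongorestore URI with percent-encoded credentials."""
    # The secret is assumed to be a JSON object with a "password" key,
    # matching the python3 one-liner above.
    password = json.loads(secret_string)["password"]
    query = urlencode(
        {"tls": "true", "tlsCAFile": ca_file, "retryWrites": "false"},
        safe="/",  # keep the CA file path readable
    )
    return (
        f"mongodb://{quote_plus(username)}:{quote_plus(password)}"
        f"@{endpoint}:27017/{database}?{query}"
    )
```

The result can be passed to `mongorestore --gzip --uri=...` in place of the hand-written URI.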

6. Verify

# Health check
curl https://cfdb.vis-api.link/health

# API smoke test
curl -X POST https://cfdb.vis-api.link/metadata \
  -H 'Content-Type: application/json' \
  -d '{"query": "{ files(first: 1) { edges { node { filename } } } }"}'
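
The same two checks can be scripted, for example to run after each deployment. This is a minimal sketch using only the standard library; `metadata_request` and `smoke_test` are hypothetical names, and the response body is returned as parsed JSON without assuming its shape.

```python
import json
import urllib.request

BASE_URL = "https://cfdb.vis-api.link"
QUERY = "{ files(first: 1) { edges { node { filename } } } }"


def metadata_request(base_url=BASE_URL, query=QUERY):
    """Build the POST /metadata request used by the curl smoke test."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/metadata",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url=BASE_URL, timeout=10):
    """Return (health status code, parsed /metadata response).

    urlopen raises HTTPError for 4xx/5xx responses, so reaching the return
    statement means both requests succeeded.
    """
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        health_status = resp.status
    with urllib.request.urlopen(metadata_request(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return health_status, payload
```

Calling `smoke_test("http://localhost:8000")` would exercise the local dev stack from the test plan instead of the deployed service.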

Test plan

  • Local dev: make mongodb && make api starts successfully, curl http://localhost:8000/health returns 200
  • Docker build: docker build -f Dockerfile.api . succeeds (CA bundle downloaded)
  • CloudFormation: stacks deploy in order without errors
  • ECS: ALB target group shows healthy targets
  • HTTPS: curl https://cfdb.vis-api.link/health returns 200

Drop X.509 certificate authentication in favor of SCRAM (username/password
via DATABASE_URL) with TLS. Add /health endpoint for ECS health checks,
URL redaction for safe logging, and MONGODB_RETRY_WRITES env var support
(DocumentDB requires retryWrites=false).

Simplify Dockerfile.mongodb to dev-only (no TLS/X.509 branch). Add curl
and AWS DocumentDB CA bundle download to Dockerfile.api. Remove prod
targets and cert generation from Makefile. Delete X.509 cert scripts and
config files.

Three standalone templates deployed in order:
- network.yml: VPC (10.2.0.0/21), 2 public + 2 private subnets, ALB/ECS/DocumentDB
  security groups, DocumentDB subnet group, S3 VPC endpoint
- database.yml: DocumentDB 5.0 cluster with Secrets Manager credentials and
  Lambda-backed custom resource to build the full connection URL
- backend.yml: ECS Fargate service behind ALB with HTTPS (ACM), Route 53
  DNS (cfdb.vis-api.link), Secrets Manager injection for DATABASE_URL

After pushing the image to ECR, trigger a rolling ECS deployment via
update-service --force-new-deployment. Tag images with both :latest and
the short git SHA for traceability. ECS cluster/service names are read
from GitHub secrets.

Store the sync API key in Secrets Manager instead of as a plain
environment variable, matching the DATABASE_URL pattern. The execution
role is granted read access to both secrets.

Add VPC Flow Logs to CloudWatch (matching CVH reference pattern) for
network traffic monitoring. Expand the AZ region map from 3 regions to
the full set of 27 regions from the CVH template.
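
The first commit message above mentions URL redaction for safe logging. One way this could work is sketched below: mask only the password portion of the URL and leave everything else untouched. `redact_url` and the "***" mask are hypothetical choices for this example, not the implementation in this PR.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url, mask="***"):
    """Return the URL with any password replaced by a mask, for logging."""
    parts = urlsplit(url)
    # Split "user:password@host:port" at the last "@" so that an unescaped
    # "@" inside the password still ends up on the credentials side.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if not at or ":" not in userinfo:
        return url  # no password present, nothing to hide
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{user}:{mask}@{hostport}"))
```

Logging `redact_url(os.environ["DATABASE_URL"])` instead of the raw value keeps the host and database visible for debugging while hiding the secret.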

AWS EC2 security group descriptions only accept ASCII characters.
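
A check like the following could catch this before a stack deploy fails. It is a minimal sketch that flags non-ASCII characters (for example an arrow or a typographic dash) in a mapping of description strings; the function names and the input mapping are hypothetical, and it does not parse the CloudFormation templates themselves.

```python
def non_ascii_chars(text):
    """Return the distinct non-ASCII characters in text, sorted."""
    return sorted({ch for ch in text if ord(ch) > 127})


def check_descriptions(descriptions):
    """Map each offending description name to its non-ASCII characters."""
    return {
        name: bad
        for name, text in descriptions.items()
        if (bad := non_ascii_chars(text))
    }
```

An empty result means every description is plain ASCII.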
… instance count to 1; Drop DB instance size to t3.micro; Drop default DB admin username